Django Haystack AutoQuery: no results found with HAYSTACK_DEFAULT_OPERATOR = 'AND'

(Update: the issue is specific to the use of AutoQuery.)

As far as I can tell from the docs, and also from looking at the source, the HAYSTACK_DEFAULT_OPERATOR setting is supposed to control how .filter(...) clauses are combined when chained together on the queryset.

But when I also use AutoQuery, the setting seems to control whether all the words must match or any word in the phrase may match. (I'm on Elasticsearch.)

For example:

HAYSTACK_DEFAULT_OPERATOR = 'OR'
sqs = SearchQuerySet().filter(content=AutoQuery('some of these words are in my content'))
sqs.count()  # 53

HAYSTACK_DEFAULT_OPERATOR = 'AND'
sqs = SearchQuerySet().filter(content=AutoQuery('some of these words are in my content'))
sqs.count()  # 0
sqs = SearchQuerySet().filter(content=AutoQuery('all these words are in the content'))
sqs.count()  # 1

Weirdly, using filter_or or filter_and doesn't seem to make any difference, e.g.:

HAYSTACK_DEFAULT_OPERATOR = 'AND'
sqs = SearchQuerySet().filter_or(content=AutoQuery('some of these words are in my content'))
sqs.count()  # 0

The answer must be somewhere in the Haystack source code, and I'll keep looking. At the very least, this seems to be a deficiency in the docs.

Is this supposed to happen? Is there a way to keep chained filters defaulting to AND while still matching any word in an AutoQuery?


Unfortunately, no. Part of the purpose of AutoQuery is to split your query up by words and make each word a separate query. The operator it uses to join those per-word queries depends on your HAYSTACK_DEFAULT_OPERATOR setting, which is one of the most significant reasons to set it (as far as I know, HAYSTACK_DEFAULT_OPERATOR is the only way to change the operator AutoQuery uses).

When you add filter_or to it, all that happens is that your chain of ANDs gets wrapped in an OR. filter_or is useful for chaining multiple filter queries together when AND is the default operator, but in this case it won't do what you're expecting it to do.
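To make the behaviour described above concrete, here is a toy model in plain Python. It is only a sketch of the idea (one check per word, joined by a single operator) and is not Haystack's or Elasticsearch's actual implementation; the `matches` function and the sample document are made up for illustration.

```python
def matches(document, query, operator="AND"):
    """Toy stand-in for a per-word query; NOT Haystack's real logic."""
    doc_words = set(document.lower().split())
    # One boolean per query word: is that word present in the document?
    hits = [word in doc_words for word in query.lower().split()]
    # The operator decides whether every word or just one word must hit.
    return all(hits) if operator == "AND" else any(hits)


doc = "all these words are in the content"
partial = "some of these words are in my content"
exact = "all these words are in the content"

print(matches(doc, partial, "OR"))   # some words are present
print(matches(doc, partial, "AND"))  # "some", "of" and "my" are missing
print(matches(doc, exact, "AND"))    # every word is present
```

Under this model, wrapping the single AND group in an outer OR changes nothing, because there is only one group to OR together. That is consistent with filter_or making no difference in the question.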

I had a similar issue, and the way I ended up working around it was to implement something similar myself. You can use a parser library (I landed on shlex, the Python standard-library module for parsing shell-style strings, because it treats anything inside quotes as a single token), and then use the tokens to construct your own AutoQuery-style query.

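import shlex
from haystack.query import SQ, SearchQuerySet

sqs = SearchQuerySet()
sq = None
keywords = 'some of these words are in my content'
# shlex.split keeps quoted phrases together as single tokens
for phrase in shlex.split(keywords):
    if sq is None:
        sq = SQ(content=phrase)
    else:
        sq |= SQ(content=phrase)  # OR each term onto the query
if sq is not None:
    sqs = sqs.filter(sq)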
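For reference, this is roughly what the shlex tokenizing step gives you on its own. The `tokenize` helper is a hypothetical wrapper, not part of Haystack; it falls back to a plain whitespace split because shlex.split raises ValueError on unbalanced quotes, which also covers a stray apostrophe typed by a user.

```python
import shlex


def tokenize(keywords):
    """Split user input into search terms, keeping quoted phrases whole."""
    try:
        return shlex.split(keywords)
    except ValueError:
        # Unbalanced quote or stray apostrophe: fall back to a naive split.
        return keywords.split()


print(tokenize('django "search engine" haystack'))
print(tokenize("don't panic"))
```

The first call keeps "search engine" together as one term; the second hits the fallback because of the apostrophe.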

Modifying the SQ with |= (the OR operator) makes the SQ object chain the resulting clauses with OR. You can also use &= to chain them with AND.
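If you want to switch between the two operators without rewriting the loop, the same idea can be folded into a small helper with functools.reduce. This is a sketch under assumptions: `combine_terms` and `make_clause` are hypothetical names, and the demo uses plain sets of document IDs (a made-up `index`) as stand-ins, since sets support | and & the same way the SQ objects above do.

```python
import operator
import shlex
from functools import reduce


def combine_terms(keywords, make_clause, op=operator.or_):
    """Build one clause per token and join them all with `op`."""
    tokens = shlex.split(keywords)
    if not tokens:
        return None  # nothing to search for
    return reduce(op, (make_clause(token) for token in tokens))


# Stand-in "index": which (made-up) document IDs contain each word.
index = {"some": {1, 2}, "words": {2, 3}, "content": {2}}


def lookup(token):
    return index.get(token, set())


any_word = combine_terms("some words", lookup)                # OR: union
all_words = combine_terms("some words", lookup, operator.and_)  # AND: intersection
print(any_word, all_words)
```

With Haystack you would presumably pass something like `lambda token: SQ(content=token)` as `make_clause` and hand the result to `sqs.filter(...)`, skipping the filter when the helper returns None.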

Also, once you've used the SQ object to filter, you can chain whatever you want after it. If your default operator is AND, any filters that you chain after the filter on the SQ object will be ANDed on. Alternatively, you can use filter_and to state explicitly that a specific filter should be ANDed onto the query.