How to speed up querying for autocomplete fields with MongoDB

Published on May 25, 2018

In a project I am working on, I have an autocomplete input field on a form which searches a collection of about 200,000 documents of potential customers. The usual autocomplete logic is to call the back-end on every key press. The server performs a regular expression search on a database field (usually the name of the company) and returns the first n documents which meet the criteria.

The problem I encountered with this approach was that the responses were slow. In this particular case a response took about half a second, which was very noticeable when entering data. So, what can be done to speed up the queries?

 

All examples shown are written in Ruby on Rails with the Mongoid ODM for MongoDB. For the first two examples I have also added a search_field to the model. This field concatenates the customer name with the VAT number and other identification numbers which are used for searching customers.
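As a sketch of what such a concatenated field might look like, here is a hypothetical helper (the field names id_vat, id_1 and id_2 come from the model shown later; the helper itself is my illustration, not the post's actual code):

```ruby
# Hypothetical illustration: building a concatenated search_field
# from the customer name and identification numbers.
def build_search_field(name:, id_vat:, id_1:, id_2:)
  # Downcase so the /i regexp flag is not strictly required,
  # and squeeze runs of whitespace into single blanks.
  [name, id_vat, id_1, id_2].compact.join(' ').downcase.squeeze(' ').strip
end

build_search_field(name: 'ACME Ltd', id_vat: 'SI12345678', id_1: '001', id_2: '002')
# => "acme ltd si12345678 001 002"
```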

1. Standard query

 

The standard query used for an autocomplete field looks like this:

 

Customer.where(search_field: /#{params[:search]}/i).limit(20)

 

This is a standard MongoDB regular expression query. The field "search_field" is queried for the value params[:search], a parameter passed through the URL. In this particular case the query takes around 500ms, and the lag before results are shown is very noticeable. This is expected, since regular expression queries require all 200,000 documents to be read into memory.

2. Full text search index

 

Full text search has been available in MongoDB since version 2.6. A full text search index can simply be defined in the model:

 

index(search_field: 'text')

 

and search can be performed in Mongoid with this expression:

 

Customer.where({ :$text => { :$search => params[:search], :$language => "none" } }).limit(20)

 

This query takes 9ms, which is unnoticeable when typing in the input field. But full text search is not usable in this scenario, since it matches only values which exactly match the query parameter. Unfortunately, MongoDB FTS does not support searching for partial values.
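The limitation can be illustrated without a database: the text index tokenizes the field into whole words, and $text matches whole tokens only. A rough plain-Ruby simulation of that behaviour (not MongoDB's actual tokenizer):

```ruby
# Rough simulation of why FTS cannot do partial matches:
# the index stores whole tokens, and $text matches whole tokens.
tokens = 'acme trading company'.split(' ')

tokens.include?('acme')  # => true  (whole word matches)
tokens.include?('acm')   # => false (partial word does not)
```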

3. Array field indexing

 

One of the field types available in MongoDB is Array, and array fields can also be indexed. From here came the idea to break the search field into individual words and save them as an array. An index on an array field is defined the same as any other Mongoid index:

 

index(search_array: 1)

 

And searches can be performed like this:

 

Customer.where(search_array: /^#{params[:search]}/).limit(20)

 

Update: Regular expression queries have changed since Mongoid v4.2.0, and the query now looks like this:

Customer.where(search_array: BSON::Regexp::Raw.new("^#{params[:search]}")).limit(20)

Pay attention to the caret at the beginning of the regular expression. It anchors the match to the beginning of the word and thus allows the query to use the array index. Without it, the query takes about 50 times longer. Searching from the beginning of the word is fine here, because users always start entering the name of the company from its beginning and almost never look for words in the middle of the name. This query takes around 30ms and the lag is not noticeable.
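Why the caret matters can be sketched in plain Ruby: an index behaves like a sorted list of keys, so an anchored prefix can be located with a binary search, while an unanchored pattern forces an examination of every key (my own illustration, not Mongoid code):

```ruby
# A B-tree index is conceptually a sorted list of keys.
words = %w[alpha acme beta gamma].sort  # => ["acme", "alpha", "beta", "gamma"]

# Anchored search (what /^acm/ allows): binary-search to the first
# candidate key, then read matching keys in order - no full scan.
prefix = 'acm'
i = words.bsearch_index { |w| w >= prefix }
prefix_matches = i ? words[i..].take_while { |w| w.start_with?(prefix) } : []
# => ["acme"]

# Unanchored search (what /cm/ forces): every key must be examined.
substring_matches = words.select { |w| w.include?('cm') }
# => ["acme"]
```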

 

Below is the whole solution, which also includes the Mongoid model definition, clears unwanted characters and words from the search field, and is able to search for multiple words when more than one word is entered in the input field.

 

class Customer
  include Mongoid::Document
  include Mongoid::Timestamps

  field   :name,     type: String
  field   :id_vat,   type: String
  field   :id_1,     type: String
  field   :id_2,     type: String
  # .. lots of other fields
  field   :search_array, type: Array

  index search_array: 1

  before_save :prepare_search

  #####################
  # Will prepare the search_array field before the document is saved to database.
  #####################
  def prepare_search
    # downcase
    search = UnicodeUtils.downcase(name)
    # remove unwanted characters and words
    search.gsub!(/[,.():;?"']/, ' ')
    search.gsub!('ltd', ' ')
    search.gsub!('company', ' ')
    # ensure single blank between words
    search = search.squish
    # add id numbers
    search << " #{id_vat} #{id_1} #{id_2}"
    self.search_array = search.split(' ')
  end

  #####################
  # Create search query
  #####################
  def self.search_query(search)
    # downcase search
    search = UnicodeUtils.downcase(search.chomp)
    # more than one search word
    if search.include?(' ')
      # split words into array
      a = search.split(' ')
      # query foundation
      query = self.where(search_array: BSON::Regexp::Raw.new("^#{a.first}"))
      # and other query words
      1.upto(a.size - 1) { |i| query = query.and(search_array: BSON::Regexp::Raw.new("^#{a[i]}")) }
      query
    else
      # single word query
      self.where(search_array: BSON::Regexp::Raw.new("^#{search}"))
    end
  end

  #####################
  # Query method required for text autocomplete. The query must return
  # a two-dimensional array of names and ids. We also limit the query to
  # return only the name and id fields.
  #####################
  def self.search_autocomplete(search)
    self.search_query(search).only(:id, :name).limit(20).inject([]) do |result, value|
      result << [value.name, value.id]
    end
  end

end
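The normalisation in prepare_search can be exercised without a database. A stripped-down, plain-Ruby version of the same idea (using String#downcase instead of UnicodeUtils, and letting split handle repeated blanks; this is my sketch, not the model's actual code):

```ruby
# Plain-Ruby sketch of the prepare_search normalisation, for illustration.
def build_search_array(name, *ids)
  search = name.downcase
  search = search.gsub(/[,.():;?"']/, ' ')               # strip punctuation
  search = search.gsub('ltd', ' ').gsub('company', ' ')  # drop noise words
  # split(' ') collapses repeated blanks; append downcased id numbers
  search.split(' ') + ids.map { |id| id.to_s.downcase }
end

build_search_array('ACME Trading Company, Ltd.', 'SI12345678')
# => ["acme", "trading", "si12345678"]
```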

 

And this is all you need to put an autocomplete field on a DRG Form:

    80:
      name: customer_id
      type: text_autocomplete
      size: 80
      search:
        table: customer
        field: name
        method: search_autocomplete

This solution is not only faster but is probably also a lot less stressful for the MongoDB server. Regular expression queries require the whole collection to be read into memory and searched. With this solution, only the array index is used.

 
