rubygems.org is now using Elasticsearch
With its first release on 8 February 2010, Elasticsearch is old enough to feel almost nostalgic. Today, Elasticsearch is used by thousands of companies for full-text search, autocompletion, log processing and analytics.
Rubygems.org’s search functionality was not up to the mark. One should not expect much from a search built on the SQL LIKE operator anyway. It was slow and limited, and you could only search by gem name. The new search is, at the very least, much faster.
I would consider it an achievement for Elasticsearch that, once I went through their documentation and watched a couple of videos, I was able to make meaningful changes to the existing implementation. Also, docs with usage examples are the best kind of docs ❤
Search multiple fields
By default, queries are matched against the name, description and summary of gems. Each matched result has a relevance score, which is determined by the following three factors:
Term frequency - The more often the query term appears in the name, description or summary, the higher the relevance score.
Inverse document frequency - Terms that appear in many documents (words like the, a, gems etc.) get a lower score than less common terms.
Field length norm - A match in a longer field like summary or description gets a lower score than a match in a shorter field like name. (Read more)
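To make the three factors concrete, here is a toy sketch in Python. The formulas follow the general shape of Lucene's classic TF/IDF similarity; Elasticsearch 5 defaults to BM25, which differs in detail, so treat this as an illustration only. The function names and the tiny corpus are made up for the example.

```python
import math

def term_frequency(term, tokens):
    # More occurrences of the term in a field -> higher value (dampened by sqrt).
    return math.sqrt(tokens.count(term))

def inverse_document_frequency(term, docs):
    # Terms found in many documents -> lower value.
    containing = sum(1 for tokens in docs if term in tokens)
    return 1.0 + math.log(len(docs) / (containing + 1))

def field_length_norm(tokens):
    # Longer fields -> lower value, so a hit in a short name outweighs
    # a hit in a long summary.
    return 1.0 / math.sqrt(len(tokens)) if tokens else 0.0

def toy_score(term, tokens, docs):
    # Simplified combination of the three factors (not the exact Lucene formula).
    return (term_frequency(term, tokens)
            * inverse_document_frequency(term, docs)
            * field_length_norm(tokens))

# Hypothetical mini-corpus: each entry is one tokenized field.
name_field = ["rails"]
summary_field = ["full", "stack", "web", "framework", "rails"]
corpus = [name_field, summary_field, ["a", "json", "parser"], ["a", "gem", "for", "http"]]

print(toy_score("rails", name_field, corpus))     # match in the short name field
print(toy_score("rails", summary_field, corpus))  # match in the longer summary field
```

With this corpus, the match in the one-word name field scores higher than the same match in the five-word summary field, which is the field length norm at work.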
It gets better ;) We also use gem download counts to boost each result’s score, so that popular, community-approved gems rank higher on the search page.
Amazon Elasticsearch Service’s support for Elasticsearch 5 came just in time for us to update our instance before switching over to it. Elasticsearch 5 has made some nifty improvements to its relevance scoring.
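One way to express this kind of download boost in the Elasticsearch query DSL is a function_score query with field_value_factor. The sketch below only builds the request body as a Python dict. The field name downloads, the log1p modifier and the helper name are assumptions for illustration, not necessarily what rubygems.org uses.

```python
def build_boosted_query(text):
    # Match against the three default fields, then scale the relevance score
    # by a dampened function of the download count.
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        "fields": ["name", "description", "summary"],
                    }
                },
                "field_value_factor": {
                    "field": "downloads",   # assumed field name
                    "modifier": "log1p",    # dampens very large download counts
                },
                "boost_mode": "multiply",   # relevance score * download factor
            }
        }
    }

body = build_boosted_query("rails")
```

A logarithmic modifier keeps a gem with millions of downloads from drowning out every other result, while still letting popularity break ties between similar matches.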
Filter and advanced search page
In Elasticsearch, the query string supports a mini-language of its own, which can be used to customize your query. The filter options on the search page already provide useful defaults: they limit your search to name, description or summary, or to gems updated in the last week or the last month.
You can use the AND and OR operators to combine your query terms. The query string also supports wildcards, fuzziness, ranges, boosting and much more. Check out our advanced search page when you are in the mood for a little experimentation.
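As a rough illustration of the mini-language, the sketch below wraps a few query strings in a query_string request body. The syntax shown (field prefixes, AND/OR, wildcards, fuzziness, ranges, boosts) is standard query string syntax; the field names name, summary, description and updated, and the helper itself, are assumptions for the example.

```python
def build_query_string(expression, default_fields=("name", "summary", "description")):
    # Wrap a query-string expression in a query_string request body.
    return {
        "query": {
            "query_string": {
                "query": expression,
                "fields": list(default_fields),  # used when no field prefix is given
                "default_operator": "AND",
            }
        }
    }

# A few expressions the mini-language accepts (field names are assumed):
examples = [
    "name:rails AND summary:async",   # boolean operators with field prefixes
    "name:rail*",                     # wildcard
    "name:rials~",                    # fuzzy match, tolerates the typo
    "updated:[now-7d TO *]",          # range: updated in the last week
    "name:rails^2 OR summary:rails",  # boost matches on name
]

bodies = [build_query_string(e) for e in examples]
```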
Suggestions and analyzers
Have you been feeling too lazy to type all the characters of your query correctly? Fear not! Elasticsearch suggestions have your back. When no matching results are found, the search page will suggest queries you probably wanted to type.
Elasticsearch’s prepackaged analyzers have sensible defaults and can be used directly. Analyzers ensure that when you search for rails async, async rails or async-rail, you get the same results back. We use the pattern analyzer on name and the english analyzer on summary and description.
The Elasticsearch explain API came in handy while tracking down scoring issues. The Rails application templates in elasticsearch-rails are a great resource for reference. The elasticsearch-model gem also has some nice examples.
A considerable amount of work is yet to be done. We are yet to switch the rubygems.org search API to Elasticsearch. We already have a PR for adding autocomplete to the search field; however, I am a bit skeptical about the use of typeahead.js. I am not sure we need a suggestion engine when something usable can be written with much less code.
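To get a feel for what the pattern analyzer does, here is a rough Python approximation of its default behavior: lowercase the text and split on runs of non-word characters. It is a sketch, not the real implementation, and it does not cover the stemming the english analyzer adds (which is what lets rail and rails match each other).

```python
import re

def pattern_analyze(text):
    # Approximation of the default pattern analyzer:
    # lowercase, then split on one or more non-word characters.
    return [token for token in re.split(r"\W+", text.lower()) if token]

print(pattern_analyze("rails async"))
print(pattern_analyze("Async-Rails"))
```

Both inputs produce the same set of tokens, rails and async, so word order, case and the hyphen stop mattering when the query is matched against the index.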
Feel free to hit up our issue board if you encounter any issues or have suggestions for improvement.