Don’t Waste Your Time With MySQL Full-Text Search

Written by ngrilly | Published 2017/10/05
Tech Story Tags: programming | mysql | postgresql | full-text-search | innodb

TLDRvia the TL;DR App

I spent precious hours trying to use the InnoDB full-text search feature included in MySQL since version 5.6, to eventually discover the use cases where it works are rather limited

As an illustration, let’s create a very basic table:

CREATE TABLE document (id int PRIMARY KEY,content longtext NOT NULL,FULLTEXT KEY (content))

The table is filled with 2 million rows. The average length of content is 2 kilobytes.

Now, let’s run a simple query:

SELECT idFROM documentWHERE MATCH(content) AGAINST ('commercial' IN BOOLEAN MODE)LIMIT 50

It’s executed in 50 ms. No problem.

Now, let’s combine filtering on a “normal” column and full-text search in the same query:

SELECT idfrom documentWHERE MATCH(content) AGAINST ('commercial' IN BOOLEAN MODE)AND id > 10000000LIMIT 50

It’s executed in 50 seconds, which makes it unusable…

Please note this is the same query as above, with only one additional condition (id > 10000000). Moreover, the query doesn’t return any rows, because the additional condition excludes all rows (all IDs are below 10,000,000).

Why is it so slow? It’s because the term “commercial” used in my query is common (~300,000 documents contain the term) and because MySQL is not good at merging indexes. Basically, MySQL uses the full-text index to lookup for the term “commercial”, and then it does a kind of nested join to lookup for the 300,000 rows and check the condition on id. The former is very quick, but the latter is extremely slow.

It would be really great for MySQL to be able to combine multiple index using an in memory bitmap. This would make the built-in full-text search a lot more useful. In the meantime, I have to find an alternate solution.

If you have any idea about how to solve this without switching to another database engine, please reach out to me.

Thanks to Carlos Otero Barros for reporting a typo.


Published by HackerNoon on 2017/10/05