Fuzzy Searching With Ferret
Saturday, September 20th, 2008
I’ve spent the better part of this afternoon trying to suss out a fuzzy searching system for my Ruby on Rails application. What I want to do is return results that include slightly misspelled words. I started playing with Sphinx, but eventually realised that “fuzzy” in the land of Sphinx really just means wildcards. So I settled on Ferret with the acts_as_ferret (AAF) Rails plugin.
It was a bit of a battle to work out how to trigger a fuzzy search through AAF, and then a complete guess to work out how to change the minimum similarity score. So for your reference and mine, when making a multiple-word fuzzy search using acts_as_ferret:
- Suffix the two terms with a tilde (~) to indicate a fuzzy search
- Suffix the tilde with a minimum similarity threshold in the range [0, 1] to override the default threshold
- Replace spaces with + signs. I’m not 100% sure on this one, as I would have thought that surrounding the terms with quotes (") would turn the query into a fuzzy phrase search, but the results with quotes don’t match my expectations
E.g.
- Company.find_with_ferret('Sandalfr+Wine~0.7')
- Company.find_with_ferret('name:Sandalfr+Wine~0.7') # Search a column
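The rules above can be wrapped in a small helper so the query string is not assembled by hand each time. This is only a sketch: `fuzzy_query` and its `min_similarity` and `field` parameters are made-up names, not part of acts_as_ferret, and the helper just builds the string by the conventions listed above (spaces become + signs, then a tilde and threshold are appended). It does not need Ferret to run.

```ruby
# Hypothetical helper, NOT part of acts_as_ferret or Ferret.
# Builds a fuzzy query string using the conventions described above.
def fuzzy_query(phrase, min_similarity: 0.7, field: nil)
  # The threshold must lie within [0, 1].
  unless (0.0..1.0).cover?(min_similarity)
    raise ArgumentError, 'min_similarity must be within [0, 1]'
  end

  # Collapse runs of whitespace and join the words with "+".
  joined = phrase.strip.split(/\s+/).join('+')

  # Append the tilde and the minimum similarity threshold.
  query = "#{joined}~#{min_similarity}"

  # Optionally restrict the search to a single column.
  field ? "#{field}:#{query}" : query
end

puts fuzzy_query('Sandalfr Wine')                 # Sandalfr+Wine~0.7
puts fuzzy_query('Sandalfr Wine', field: 'name')  # name:Sandalfr+Wine~0.7
```

Inside a Rails application with the plugin loaded, the result would then be handed to the finder shown earlier, e.g. Company.find_with_ferret(fuzzy_query('Sandalfr Wine', field: 'name')).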
Note that I have found Ferret to be unstable, and have read that others have too. Fortunately I only need Ferret for offline processing. Sphinx looks really good for every type of textual query except fuzzy searches.
