Page MenuHome

Search: Improve results by using fuzzy matching and prefix matching.
ClosedPublic

Authored by Jacques Lucke (JacquesLucke) on Sep 7 2020, 3:54 PM.

Details

Summary

This adds a new fuzzy string matching functionality to blenlib in BLI_string_search.h.

At the core of the fuzzy matching is the damerau levenshtein distance. However, just using that algorithm does not solve the problem. The rest of the implementation is loosely based on other searches I regularly use (e.g. the one in vscode) and what felt right. Finding "good" matches that make sense to users does not seem to be an exact science (at least I haven't found a good resource yet). The current approach works quite well in my tests, but we might have to fine tune how strings are matched in the future.

Besides normal fuzzy search, "prefix based searching" is supported as well (not sure what this is officially called). This allows us to search for "iat" and get "Install Application Template" or to search for "anitr" and get "Animated Transforms to Deltas".

I updated a couple of different places where we do search to use the new system. That includes:

  • F3 search
  • ID data block search (e.g. in the material selector)
  • Enum search operator (also includes searches added via the Python API)
  • Node name search (ctrl+f)

One surprisingly important aspect of this is performance. In release builds performance is perfectly fine (~0-2ms), no issue there. However, debug build performance was not great at first (~100ms after every change). I implemented a couple of tricks to skip computations to make it faster, so it is also perfectly usable in debug builds now, but that is something we have to keep an eye on. In the future we could make things faster by doing some preprocessing at the beginning of each search and/or by using multithreading (which should be relatively easy to use).

The algorithm is utf8 aware in the sense that the fuzzy matching works on code points (1-4 bytes each) instead of individual bytes.

Diff Detail

Repository
rB Blender
Branch
string-matching (branched from master)
Build Status
Buildable 10050
Build 10050: arc lint + arc unit

Event Timeline

Jacques Lucke (JacquesLucke) requested review of this revision.Sep 7 2020, 3:54 PM
Jacques Lucke (JacquesLucke) created this revision.
Jacques Lucke (JacquesLucke) edited the summary of this revision. (Show Details)
  • Merge branch 'master' into string-matching
Jacques Lucke (JacquesLucke) retitled this revision from Search: Implement fuzzy search for operator menu search (WIP) to Search: Implement fuzzy matching for operator menu search..Sep 7 2020, 4:53 PM
Jacques Lucke (JacquesLucke) retitled this revision from Search: Implement fuzzy matching for operator menu search. to Search: Improve results by using fuzzy matching and prefix matching..
Jacques Lucke (JacquesLucke) planned changes to this revision.Sep 7 2020, 8:10 PM

Have to investigate what is happening when searching for "cube".

  • improve string match scoring
  • improve word splitting (also split on arrow symbol now)
Pablo Vazquez (pablovazquez) accepted this revision.EditedSep 8 2020, 1:10 PM

Nothing to add! This is just amazing as it is right now. Thanks for working on it.

In the video below I'm searching for gpencil and get all Grease Pencil commands. Also trying the first-letter of each word like bmgp to get Bake Mesh to Grease Pencil. It's awesome!

Looking forward to seeing this in more places! Like data-block search, property-search, nodes search, outliner, etc. Do you need us to make a list or will this be grabbed by these cases automagically?

And as we talked, saving the most-recent searches to be listed first :D

This revision is now accepted and ready to land.Sep 8 2020, 1:10 PM
  • rename to BLI_string_search
  • simplify c api
  • use new string search api in more places

Works great! The API is simple and can be used in more places as needed. Should there ever be performance issues, there are various ways of optimizing Levenshtein based string matching. But I don't think we'll run into serious performance issues anytime soon.

The only design tradeoff I see is with data-block search. Currently we keep items ordered alphabetically. That changes with this patch, since the "best" matches are displayed first. This seems fine still and we can always change or fine tune that if needed.

@Jacques Lucke (JacquesLucke)
It looks like fuzzy search is too excessive sometimes, when I search proper word.
In this case, Instead of relax it searches [rela* + all variants like ?relax, r?elax etc]

it is fun that using wildcard makes search strict (as i need)

this is good example of typo-compensation

Is it possible to add checkbox for strict search?
Or even smarter: use fuzzy wide search only in case if input wasn`t found as is.
If it is found (and it means word exist), then serch only for it, and *relax + relax*, but exclude rela* and other fuzzy variants.