Dictionary of Algorithms and Data Structures (1998)(xlinux.nist.gov) |
Dictionary of Algorithms and Data Structures (1998)(xlinux.nist.gov) |
I once implemented it in typical scenario where sales people had to look for a client, but it could be written as: 1/ That Client With A Strange Name, Ltd. 2/ The Client With Strange Name, Ltd. 3/ That Client With A Strange Name 4/ [etc]
It worked really well and avoided lots of duplicated entries.
As if to prove its utility, https://xlinux.nist.gov/dads/HTML/Levenshtein.html lists most of the same algorithms under "See also".
Besides, related to this, does anybody know of a good Javascript implementation of a 3-way merge of strings, and perhaps also of JSON-like structures?
So if your string is "abc", you'd follow the node for "a", then the one for "b", but _also_ the one for "c", at a cost of 1 (because dropping the "b" incurs an edit distance of 1).
I'm not sure I see what you're driving at there. If you had a finite set of strings that you might have to compare, you could (for example) populate a graph or something with weighted edges representing the Levenshtein distance between strings (vertices). Offhand, it seems like your search could basically use a hash table to find the position of the vertex representing your string on an already-populated adjacency list.
It'd be big, but in reality you'd probably only populate the edges with especially high or low weights, depending on the application?
I was wondering whether someone could recommend similar resources (or books) for parallel algorithms. These are still very underrepresented on websites and in books, often just an addition or mentioned in passing by.
Any recommendations?
It's worth noting that some standard libraries (Java's JDK for one [1]) will cache the value of String.GetHashCode(), meaning string lookup in a HashTable is constant time average (but O(n) worst-case due to collisions).
Quick edit: Our data sources are handwritten and typed names, often transcribed by a second party. So algorithms that detect transposition errors as well as phonetic errors are really helpful.
Building the index, or adding to it, is not fast (without some heuristics applied) but searching using the index isn't bad.
But what if a search needs to be fast regardless of whether the string has been searched for before?
I remember some discussion of this in previous HN threads but I don't know what it was about...