In this post I’ll detail some notes I accumulated while designing a browser-based plagiarism detector.
So if you want to quickly search for items in a dataset. A hacky way to achieve
access the fields via key.
Interestingly V8 (used in node.js and the Chrome browser) doesn’t use such a data-structure. Instead opting for a strategy called Hidden Classes. But it still exhibits the same performance characteristics.
The MIT course goes into Hashing algorithms as they are used to decide where in the Hash table to put the information. We studied rabin fingerprinting which uses a mathematical tool called a rolling-hash. Essentially hashing is a computationally expensive operation. But once you perform it once, with the rabin fingerprint, you can compute the hash for a string that differs by one character in O(1) time. A more in depth paper is available.
strstr method. But we do have indexOf.
A very cute implementation of strstr could be made
In any case, surely the rabin-karp algo is better than iteratively calling
indexOf on a string?! wrong. Unless I did something gross in the implementation,
indexOf like so is faster.
On my machine the above code can run on
/usr/share/dict/words in 10ms
whereas my rabin-karp algo takes 238ms. File is test.js
So why is V8 so fast? Under the hood V8 implements
indexOf using the
along with some heuristics for different input sizes.
I’d like to mention JSInspect which is a very cool tool in the same spirit.