Locality Sensitive Hashing
Published:
Locality-sensitive hashing (LSH) reduces the dimensionality of high-dimensional data, as it is the text data. LSH hashes input items so that similar items map to the same “buckets” with high probability (the number of buckets being much smaller than the universe of possible input items). LSH differs from conventional and cryptographic hash functions because it aims to maximize the probability of a “collision” for similar items. Locality-sensitive hashing has much in common with data clustering and nearest neighbor search.
See also
Material
Papers
- Datar, M.; Immorlica, N.; Indyk, P.; Mirrokni, V.S. (2004). Locality-Sensitive Hashing Scheme Based on p-Stable Distributions. Proceedings of the Symposium on Computational Geometry.
- Pauleve, L.; Jegou, H.; Amsaleg, L. (2010). Locality sensitive hashing: A comparison of hash function types and querying mechanisms. Pattern recognition Letters.
- Buhler, J. (2001). Efficient large-scale sequence comparison by locality-sensitive hashing. Bioinformatics, 17(5), 419-428.
- Slaney, M., & Casey, M. (2008). Locality-sensitive hashing for finding nearest neighbors [lecture notes]. IEEE Signal Processing Magazine, 25(2), 128-131.
- Kulis, B., & Grauman, K. (2012). Kernelized locality-sensitive hashing. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(6), 1092-1104.
- Satuluri, V., & Parthasarathy, S. (2012). Bayesian locality sensitive hashing for fast similarity search. Proceedings of the VLDB Endowment, 5(5), 430-441.
- Koga, H., Ishibashi, T., & Watanabe, T. (2007). Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing. Knowledge and Information Systems, 12(1), 25-53.
Books
- Rajaraman, A.; Ullman, J. (2010). Mining of Massive Datasets, Ch. 3. Cambridge University Press