
SimHash

2019


Near-duplicate with SimHash


Before turning to SimHash, let’s review some other methods that can also detect duplicates.

Longest Common Subsequence (LCS)

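This is the algorithm behind the diff command. It is equivalent to computing edit distance with insertion and deletion as the only two edit operations: for two sequences of lengths m and n whose LCS has length l, that distance is m + n - 2l.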
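
To make the link between LCS and edit distance concrete, here is a minimal sketch of the textbook dynamic-programming approach. The function names `lcs_length` and `indel_distance` are made up for illustration, and this is not how any particular diff implementation works internally; real tools use more optimized variants.

```python
def lcs_length(a, b):
    """Return the length of the longest common subsequence of a and b.

    Works on any two sequences (strings, lists of words, lists of lines).
    Uses O(len(a) * len(b)) time and O(len(b)) extra space.
    """
    # prev[j] holds the LCS length of the processed prefix of a and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching items extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one item from either a or b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a, b):
    """Edit distance when only insertion and deletion are allowed."""
    # Everything outside the LCS must be deleted from a or inserted from b.
    return len(a) + len(b) - 2 * lcs_length(a, b)


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))      # 4
    print(indel_distance("ABCBDAB", "BDCABA"))  # 5
```

The quadratic running time is the main drawback for duplicate detection: comparing every pair of documents this way quickly becomes too expensive for a large collection.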