Hamming Distance

Hamming distance is the number of positions at which two equal-length bit strings differ. For image hashes, it counts how many bits two fingerprints disagree on, giving a simple, fast similarity score: a small distance suggests the images look alike; a large distance suggests they do not.

Also known as: Hamming distance hashes, bit distance

  • Hamming distance counts the differing bits between two equal-length strings, computed via XOR plus a popcount.
  • On a 64-bit image hash, 0 means identical and 64 means every bit differs.
  • Dedup engines set a small threshold (often 5 or fewer differing bits) to flag near-duplicate images.

How it is computed

For two binary values, Hamming distance is calculated by XOR-ing them and counting the set bits in the result (a popcount). For example, 1011 XOR 1001 = 0010, which has one set bit, so the Hamming distance between 1011 and 1001 is 1.
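The XOR-plus-popcount calculation can be sketched in a few lines of Python. The function name hamming_distance is chosen here for illustration and is not taken from any particular library; the sketch assumes the inputs are non-negative integers.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two non-negative integers differ."""
    if a < 0 or b < 0:
        raise ValueError("expected non-negative integers")
    # XOR leaves a 1 in every position where the two inputs disagree.
    diff = a ^ b
    # Counting those 1 bits (the popcount) gives the distance.
    return bin(diff).count("1")


print(hamming_distance(0b1011, 0b1001))  # 1
```

On Python 3.10 or later, (a ^ b).bit_count() does the same counting without building a string.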

On a 64-bit perceptual hash, the distance ranges from 0 (identical hashes) to 64 (every bit differs). Most modern CPUs provide a dedicated popcount instruction, so comparing two hashes takes little more than one XOR and one popcount, which is why hash-based dedup scales to huge photo libraries.
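As an illustration of the 0 to 64 range, here is a minimal sketch that compares two 64-bit hashes stored as hexadecimal strings. The hex encoding, the HASH_BITS constant, and the function name hex_hash_distance are assumptions made for this example; a real engine may store hashes differently.

```python
HASH_BITS = 64  # assumed hash width, matching the 64-bit case in the text


def hex_hash_distance(a_hex: str, b_hex: str) -> int:
    """Hamming distance between two hex-encoded hashes of at most 64 bits."""
    a = int(a_hex, 16)
    b = int(b_hex, 16)
    # Reject negative values and anything wider than the assumed hash width.
    if a >> HASH_BITS or b >> HASH_BITS:
        raise ValueError("hash does not fit in 64 bits")
    return bin(a ^ b).count("1")


print(hex_hash_distance("0" * 16, "0" * 16))  # 0: identical hashes
print(hex_hash_distance("0" * 16, "f" * 16))  # 64: every bit differs
```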

Choosing a threshold

Image dedup engines pick a threshold at or below which two hashes are treated as the same image. A distance of 0 means the fingerprints are identical; a small value, such as 5 or fewer differing bits out of 64, typically indicates a near-duplicate, while higher distances indicate different images.

The right threshold depends on the hashing algorithm (for example, dHash or pHash) and on how aggressive you want grouping to be. A lower threshold favors precision (fewer false matches); a higher one favors recall (catching more edited copies) at the risk of grouping unrelated photos.
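The effect of the threshold can be seen in a small sketch. The function names, the default of 5, and the sample hash values below are illustrative assumptions, not settings taken from any specific engine.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


def is_near_duplicate(hash_a: int, hash_b: int, threshold: int = 5) -> bool:
    """Treat two hashes as the same image when they differ in at most
    `threshold` bits (hypothetical default of 5 out of 64)."""
    return hamming_distance(hash_a, hash_b) <= threshold


base = 0xF0F0F0F0F0F0F0F0      # made-up 64-bit hash
light_edit = base ^ 0b111      # differs from base in 3 bits
heavy_edit = base ^ 0xFF       # differs from base in 8 bits

print(is_near_duplicate(base, light_edit))                # True
print(is_near_duplicate(base, heavy_edit))                # False
print(is_near_duplicate(base, heavy_edit, threshold=10))  # True
```

Raising the threshold from 5 to 10 catches the more heavily edited copy, which is the recall gain described above; the same change also makes it more likely that unrelated photos fall under the cutoff.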

Where Cleanor uses it

Cleanor compares the perceptual hashes of your photos using Hamming distance to decide which images belong in the same similar-photo group. The threshold is tuned so genuine near-duplicates cluster together while distinct shots stay separate.

Because the comparison is just a bit count, Cleanor can evaluate millions of hash pairs quickly and entirely on-device, keeping similar-photo and duplicate detection fast and private.
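To make the grouping idea concrete, here is a generic sketch of clustering photos by pairwise Hamming distance. It is not Cleanor's implementation, which is not described here: the function name group_similar, the union-find approach, the threshold of 5, and the sample file names and hashes are all assumptions for illustration.

```python
from itertools import combinations


def group_similar(hashes: dict[str, int], threshold: int = 5) -> list[list[str]]:
    """Group photo names whose hashes are within `threshold` bits of each other.

    Illustrative only: compares every pair, which is quadratic in the
    number of photos, and links groups transitively.
    """
    parent = {name: name for name in hashes}

    def find(name: str) -> str:
        # Follow parent links to the group's root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (name_a, hash_a), (name_b, hash_b) in combinations(hashes.items(), 2):
        if bin(hash_a ^ hash_b).count("1") <= threshold:
            parent[find(name_a)] = find(name_b)  # merge the two groups

    groups: dict[str, list[str]] = {}
    for name in hashes:
        groups.setdefault(find(name), []).append(name)
    # Only groups with at least two photos count as similar-photo groups.
    return [sorted(members) for members in groups.values() if len(members) > 1]


photos = {
    "a.jpg": 0x0000000000000000,
    "b.jpg": 0x0000000000000003,  # 2 bits away from a.jpg
    "c.jpg": 0xFFFFFFFF00000000,
    "d.jpg": 0xFFFFFFFF00000001,  # 1 bit away from c.jpg
    "e.jpg": 0x00FF00FF00FF00FF,  # far from everything else
}
print(group_similar(photos))
```

Because groups are merged transitively, two photos can end up together through a chain of close neighbors even if their direct distance exceeds the threshold; whether that is desirable depends on how aggressive the grouping should be.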
