Perceptual Hash
Also known as: phash, image fingerprint
A perceptual hash (pHash) is a compact fingerprint, usually a 64-bit value, computed from an image's visual structure rather than its raw bytes. Unlike a cryptographic hash, it maps images that look alike to similar values, so small edits, resizes, or re-compressions change only a few bits.
- A perceptual hash is typically a 64-bit fingerprint derived from low-frequency DCT coefficients of a downscaled grayscale image.
- Visually similar images produce hashes that differ by only a few bits, measured via Hamming distance.
- Unlike MD5 or SHA-256, a small visual change produces a small hash change rather than a completely different value.
How a perceptual hash is built
A typical pHash pipeline first shrinks the image to a small grayscale grid (often 32x32) to discard color and fine detail. It then applies a Discrete Cosine Transform (DCT), which converts the pixel grid into frequency components.
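The preprocessing step can be sketched in pure Python. This is a minimal illustration, assuming the image has already been decoded into rows of (r, g, b) tuples with 0-255 channels; the function names to_grayscale and box_downscale are hypothetical, the ITU-R BT.601 luma weights are one common grayscale choice, and plain block averaging stands in for whatever resampling filter a real implementation uses.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (r, g, b) pixels to luminance using BT.601 luma weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_rows]


def box_downscale(gray, size=32):
    """Shrink a grayscale grid to size x size by averaging source blocks.

    Assumes the image is at least size x size, so every output cell covers
    at least one source pixel. Non-square inputs are squashed to a square,
    which discards the aspect ratio just as pHash does.
    """
    h, w = len(gray), len(gray[0])
    out = []
    for i in range(size):
        y0, y1 = i * h // size, (i + 1) * h // size
        row = []
        for j in range(size):
            x0, x1 = j * w // size, (j + 1) * w // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))  # mean of the source block
        out.append(row)
    return out
```

Averaging blocks acts as a crude low-pass filter, which is the point of this step: fine detail that would otherwise leak into the hash is smoothed away before the DCT.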
The algorithm keeps the top-left block of low-frequency DCT coefficients (commonly 8x8), computes their median, and sets each output bit to 1 if the corresponding coefficient is above the median and 0 otherwise. The result is a 64-bit hash that captures the image's coarse visual layout while ignoring noise and minor changes.
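The DCT-and-median step can be sketched in pure Python as follows. It assumes the input is an already downscaled N x N grayscale grid given as a list of rows (32x32 in the usual setup), computes only the 8x8 low-frequency DCT-II coefficients directly from the definition with orthonormal scaling, and packs the bits with F(0, 0) as the most significant bit. The names dct_coefficient and phash, the bit order, and including the DC term in the median are assumptions; libraries differ on these details and use fast DCT routines rather than this N*N loop per coefficient.

```python
import math
from statistics import median


def dct_coefficient(pixels, u, v):
    """2D DCT-II coefficient F(u, v) of an N x N grid, orthonormal scaling."""
    n = len(pixels)
    au = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
    av = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
    # Precompute the cosine basis along each axis for this (u, v) pair.
    cos_u = [math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
    cos_v = [math.cos((2 * y + 1) * v * math.pi / (2 * n)) for y in range(n)]
    total = 0.0
    for x in range(n):
        row = pixels[x]
        for y in range(n):
            total += row[y] * cos_u[x] * cos_v[y]
    return au * av * total


def phash(pixels, hash_size=8):
    """Hash an N x N grayscale grid into hash_size * hash_size bits (64 by default)."""
    # Keep only the top-left (low-frequency) block of coefficients.
    coeffs = [dct_coefficient(pixels, u, v)
              for u in range(hash_size) for v in range(hash_size)]
    med = median(coeffs)
    value = 0
    for c in coeffs:
        # One bit per coefficient: 1 if above the median, else 0.
        value = (value << 1) | (1 if c > med else 0)
    return value
```

The result is a plain Python int, so two hashes can be compared with XOR and a bit count. When all 64 coefficients are distinct, exactly 32 bits are set, because the median splits them into two equal halves.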
Why it tolerates edits
Because pHash is driven by low-frequency structure, operations that humans barely notice (slight scaling, JPEG re-encoding, brightness tweaks, small watermarks) leave most coefficients on the same side of the median. The hash therefore changes by only a handful of bits.
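For the brightness case, a short derivation shows why the effect is so small. It models the tweak as adding a constant c to every pixel f(x, y) of the N x N grid and uses the orthonormal DCT-II scaling alpha(0) = sqrt(1/N), alpha(k) = sqrt(2/N) for k >= 1. Both are assumptions: real brightness controls are often nonlinear and clip at the ends of the range, so this is an idealization.

```latex
F(u,v) = \alpha(u)\,\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)
         \cos\frac{(2x+1)u\pi}{2N} \cos\frac{(2y+1)v\pi}{2N}

f'(x,y) = f(x,y) + c
\;\Rightarrow\;
F'(u,v) = F(u,v) + c\,\alpha(u)\,\alpha(v)
          \Big(\sum_{x=0}^{N-1} \cos\frac{(2x+1)u\pi}{2N}\Big)
          \Big(\sum_{y=0}^{N-1} \cos\frac{(2y+1)v\pi}{2N}\Big)

\sum_{x=0}^{N-1} \cos\frac{(2x+1)u\pi}{2N} = 0 \quad (1 \le u \le N-1)
\;\Rightarrow\;
F'(u,v) = F(u,v) \ \text{for}\ (u,v) \ne (0,0), \qquad F'(0,0) = F(0,0) + cN
```

The vanishing cosine sum is a standard identity that follows from summing the geometric series of e^{i(2x+1)u\pi/2N}. With 64 distinct coefficients, the median test simply marks the 32 largest, so an offset that moves only F(0,0) leaves the hash unchanged unless F(0,0) crosses between the top and bottom halves, in which case two bits flip.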
To decide whether two images are near-duplicates, compare their hashes by Hamming distance (the number of differing bits): a small distance means the images look similar, and a large distance means they are unrelated. Similar-photo grouping therefore relies on a distance threshold rather than strict byte equality.
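Comparing two 64-bit hashes reduces to an XOR followed by a bit count. In this minimal sketch, the helper names and the cutoff of 10 bits are illustrative assumptions, not values taken from this page; a real threshold has to be tuned against examples of "same photo" and "different photo" pairs.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


# Hypothetical cutoff for 64-bit hashes; tune it on your own data.
SIMILARITY_THRESHOLD = 10


def is_near_duplicate(a: int, b: int, threshold: int = SIMILARITY_THRESHOLD) -> bool:
    """Treat two hashes as near-duplicates if they differ in at most `threshold` bits."""
    return hamming_distance(a, b) <= threshold
```

On Python 3.10 and later, `(a ^ b).bit_count()` does the same job as `bin(a ^ b).count("1")`.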
Where Cleanor uses it
Cleanor uses perceptual hashing to find similar photos in your camera roll, not just exact duplicates. Edited copies, screenshots of the same shot, and lightly re-saved images all hash close together, so Cleanor can cluster them for review.
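Clustering can be illustrated with union-find over pairwise Hamming distances. This is a hypothetical sketch, not Cleanor's implementation: cluster_by_hash and its default threshold of 10 bits are assumptions, and the all-pairs loop is only meant to show the idea.

```python
def hamming_distance(a, b):
    """Number of bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def cluster_by_hash(hashes, threshold=10):
    """Group indices whose hashes are within `threshold` bits, transitively.

    Compares all pairs (O(n^2)), which is fine for a sketch but would need
    an index or bucketing for very large photo libraries.
    """
    parent = list(range(len(hashes)))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(hashes)):
        for j in range(i + 1, len(hashes)):
            if hamming_distance(hashes[i], hashes[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(hashes)):
        groups.setdefault(find(i), []).append(i)
    # Only groups with two or more photos are worth showing for review.
    return [g for g in groups.values() if len(g) > 1]
```

Because merging is transitive, a chain of small differences can pull two fairly different photos into the same group; a stricter design could require every pair in a group to be within the threshold.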
Computing a pHash is cheap and the resulting fingerprint is tiny, so Cleanor can scan thousands of photos on-device and compare them quickly without uploading anything. This makes perceptual hashing the backbone of similar-photo detection alongside faster pre-filters like dHash.
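dHash is cheaper than pHash because it skips the DCT and simply compares neighboring pixels. Below is a sketch of the commonly described formulation, not Cleanor's code. It assumes the image has already been converted to grayscale and shrunk to 8 rows by 9 columns, so each row yields 8 comparisons. The bit convention (1 when the left pixel is brighter than its right neighbor) varies between implementations.

```python
def dhash(gray, hash_size=8):
    """Difference hash of a hash_size x (hash_size + 1) grayscale grid.

    Each bit records whether a pixel is brighter than its right-hand neighbor,
    giving hash_size * hash_size bits (64 by default).
    """
    if len(gray) != hash_size or any(len(r) != hash_size + 1 for r in gray):
        raise ValueError("expected a hash_size x (hash_size + 1) grid")
    value = 0
    for row in gray:
        for x in range(hash_size):
            value = (value << 1) | (1 if row[x] > row[x + 1] else 0)
    return value
```

dHash values are compared with the same Hamming distance as pHash values, so a cheap dHash pass can narrow down candidate pairs before the costlier pHash comparison.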