
File-Level Deduplication

Also known as: file dedup, single instance storage, file-level dedup, file level deduplication

File-level deduplication treats each whole file as the unit: it fingerprints entire files and keeps only one copy of any that are identical. Also called single-instance storage, it is simpler than block-level dedup but only catches files that match in full.

Treats each whole file as the unit and keeps one copy of identical files (single-instance storage).
Simpler and faster than block-level dedup but misses partial overlaps between files.
It is the model behind duplicate-file cleanup on phones, where storage does not dedupe on its own.

How file-level deduplication works

File-level deduplication (or *single-instance storage*) hashes each file in full and compares the fingerprints. When two files produce the same hash, and ideally pass a confirming byte-for-byte check, the system keeps one copy and replaces the others with references or simply flags them for removal. It is the model behind email systems that store one copy of an attachment sent to many recipients, and behind duplicate-file finders.
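The hash, compare, and confirm sequence described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's implementation: the function names `file_digest` and `find_duplicates` are hypothetical, SHA-256 is an assumed choice of fingerprint, and the size pre-filter is an added optimization that the description above does not mention.

```python
import filecmp
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in pieces to bound memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Return groups of files under root that are byte-for-byte identical."""
    # Assumed optimization: files of different sizes cannot be identical,
    # so only files that share a size need to be hashed at all.
    by_size = defaultdict(list)
    for p in sorted(root.rglob("*")):
        if p.is_file() and not p.is_symlink():
            by_size[p.stat().st_size].append(p)

    # Fingerprint each remaining candidate over its full contents.
    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) > 1:
            for p in paths:
                by_hash[file_digest(p)].append(p)

    # Confirm each hash match with a full content comparison against the
    # first file in the group, which is the copy that would be kept.
    groups = []
    for paths in by_hash.values():
        keeper = paths[0]
        confirmed = [keeper] + [
            p for p in paths[1:] if filecmp.cmp(keeper, p, shallow=False)
        ]
        if len(confirmed) > 1:
            groups.append(confirmed)
    return groups
```

Calling `find_duplicates(Path("some/folder"))` returns one list per set of identical files. What happens next (replacing the extra copies with references, or flagging them for removal) is left to the caller.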

It is lighter than block-level dedup: the index holds one entry per file rather than per chunk, and there is no need to manage content-defined chunk boundaries. The trade-off is granularity. If two files differ by even a single byte, their hashes differ and neither is deduplicated, so partial overlaps between files are not captured the way block-level dedup captures them.
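The granularity trade-off can be seen on two in-memory buffers that differ in a single byte. The sketch below is illustrative only: it uses fixed-size 4 KiB blocks as a stand-in for block-level dedup (real systems often use content-defined chunking instead), and `BLOCK_SIZE`, `whole_hash`, and `block_hashes` are hypothetical names.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen for illustration


def whole_hash(data: bytes) -> str:
    """Fingerprint the entire buffer, as file-level dedup does."""
    return hashlib.sha256(data).hexdigest()


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Fingerprint each fixed-size block separately."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


# Four distinct 4 KiB blocks: all 0x00, all 0x01, all 0x02, all 0x03.
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))

# A copy that differs from the original only in its final byte.
modified = bytearray(original)
modified[-1] ^= 0xFF
edited = bytes(modified)

# File-level view: the whole-file hashes differ, so nothing is deduplicated.
files_match = whole_hash(original) == whole_hash(edited)

# Block-level view: only the last block changed, so the first three still match.
shared_blocks = sum(
    a == b for a, b in zip(block_hashes(original), block_hashes(edited))
)
```

Here `files_match` is `False` while `shared_blocks` is 3 out of 4: a file-level pass stores both files in full, whereas a block-level pass would need to store only one additional block.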

File-level vs. block-level deduplication

The practical difference is the unit of comparison. Block-level dedup can collapse shared chunks inside otherwise-different files and so reclaims more space, at the cost of more CPU work and a much larger hash index. File-level dedup removes only exact whole-file duplicates, but it is fast, predictable, and cheap to run, which makes it a good fit for a resource-constrained device such as a phone.
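A rough back-of-the-envelope count shows why the block-level index is so much larger. The figures below (1,000 files of 4 MiB each, 4 KiB blocks) are hypothetical and chosen only to make the arithmetic easy, and `index_entries` is an illustrative helper. For the block-level case it counts the chunk hashes that must be computed and looked up, which is an upper bound on the number of unique entries the index ends up holding.

```python
import math


def index_entries(file_sizes, block_size=None):
    """Count fingerprints to compute: one per file, or one per block."""
    if block_size is None:
        # File-level: a single fingerprint per file.
        return len(file_sizes)
    # Block-level: one fingerprint per block, counting a trailing partial block.
    return sum(math.ceil(size / block_size) for size in file_sizes)


# Hypothetical library: 1,000 files of 4 MiB each.
sizes = [4 * 1024 * 1024] * 1000

file_level = index_entries(sizes)                    # 1,000 fingerprints
block_level = index_entries(sizes, block_size=4096)  # 1,024,000 fingerprints
```

Under these assumptions the block-level pass handles 1,024 times as many fingerprints as the file-level pass, which is where the extra CPU and index memory go.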

This is the model behind duplicate-file cleanup in Cleanor: it hashes your photos, videos, and downloads, groups exact matches, confirms them, and lets you delete the redundant copies. Because phones do not deduplicate storage automatically, a file-level pass is the most reliable way for a user to reclaim space lost to identical copies of the same screenshot, download, or saved image.
