Reference

Sparse File

A sparse file is a file whose unwritten regions, called holes, are not stored on disk and read back as zeros. It reports a large logical size but occupies far fewer real blocks, so its apparent size and on-disk size differ.

Also known as: sparse files, file holes

  • A sparse file's apparent (logical) size can be far larger than the disk blocks it actually uses.
  • Reading an unwritten hole returns zeros without any block being allocated for it.
  • ext4, APFS, NTFS, and F2FS support sparse files; copying with hole-unaware tools can inflate them to full size.

How sparse files work

Most modern filesystems track which blocks of a file have been written. When an application seeks past the end of a file and writes there, or explicitly punches a hole in existing data, the filesystem records the gap as metadata rather than allocating physical blocks full of zeros. Reading from a hole returns zeros, but no disk space is used to store them.
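
The seek-then-write pattern described above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the helper names `make_sparse_file` and `read_range` are hypothetical, and whether the skipped region actually becomes a hole depends on the underlying filesystem.

```python
import os
import tempfile


def make_sparse_file(path: str, hole_bytes: int, tail: bytes = b"end") -> int:
    """Write `tail` at offset `hole_bytes`, leaving everything before it unwritten.

    Returns the logical size of the resulting file.
    """
    with open(path, "wb") as f:
        f.seek(hole_bytes)  # move past the end of the file without writing
        f.write(tail)       # only this region has to be backed by real blocks
    return os.path.getsize(path)


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "sparse.bin")
        logical = make_sparse_file(target, 16 * 1024 * 1024)
        print("logical size:", logical)
        # Bytes inside the unwritten region read back as zeros.
        print("inside the gap:", read_range(target, 4096, 8))
```

The logical size is the seek offset plus the length of the written tail, regardless of how many blocks the filesystem chose to allocate.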

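This is why a file can report a logical size (the apparent length, e.g. the `st_size` field returned by `stat`) that is much larger than its allocated size (the real blocks used, reported as a block count in `st_blocks`). On Linux, `ls -ls` prints the allocated blocks in its first column next to the logical size, and `du --apparent-size` versus plain `du` shows the same contrast. On Windows, NTFS exposes sparse files through control codes: `FSCTL_SET_SPARSE` marks a file as sparse, and `FSCTL_SET_ZERO_DATA` zeroes a byte range, which lets NTFS deallocate it in a sparse file.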
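
As a rough sketch of the same comparison in code, the snippet below reads both numbers from `os.stat`. It assumes a Unix-like system where `st_blocks` is available and counted in 512-byte units (true on Linux and macOS, but not guaranteed everywhere); `size_report` and `looks_sparse` are hypothetical helper names.

```python
import os

# Assumption: st_blocks is expressed in 512-byte units, as on Linux and macOS.
BLOCK_UNIT = 512


def size_report(path: str) -> tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) for `path`."""
    st = os.stat(path)
    apparent = st.st_size
    allocated = st.st_blocks * BLOCK_UNIT
    return apparent, allocated


def looks_sparse(path: str) -> bool:
    """Heuristic: fewer bytes are allocated than the logical size.

    This can also be true for compressed or inline-stored files,
    so treat the result as a hint rather than proof of holes.
    """
    apparent, allocated = size_report(path)
    return allocated < apparent
```

Calling `size_report` on a disk image would typically show an apparent size in gigabytes next to a much smaller allocated figure.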

Where you encounter them

Sparse files are common for virtual machine disk images, database files, container images, and pre-allocated download files. Filesystems including ext4, APFS, NTFS, and F2FS all support sparseness in some form, while older or simpler filesystems such as FAT32 and exFAT do not.

Sparse-aware tools matter: copying a sparse file with a tool that does not understand holes (for example a naive byte-for-byte copy) can expand it to its full logical size and waste space. Utilities such as `cp --sparse=always`, `rsync -S`, and `tar -S` preserve sparseness.
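
To show what "understanding holes" can look like in practice, here is a hedged sketch of a hole-aware copy built on `lseek` with `SEEK_DATA` and `SEEK_HOLE`. It assumes a platform where Python exposes `os.SEEK_DATA` and `os.SEEK_HOLE` (Linux, for example); `copy_sparse` is a hypothetical name, and this is not how `cp`, `rsync`, or `tar` are implemented.

```python
import errno
import os


def copy_sparse(src: str, dst: str, chunk: int = 1 << 20) -> None:
    """Copy only the data extents of `src`, leaving holes unwritten in `dst`."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(in_fd).st_size
            offset = 0
            while offset < size:
                try:
                    # Find the next region that actually contains data.
                    data_start = os.lseek(in_fd, offset, os.SEEK_DATA)
                except OSError as exc:
                    if exc.errno == errno.ENXIO:
                        break  # nothing but a hole until end of file
                    raise
                # Find where that data region ends.
                hole_start = os.lseek(in_fd, data_start, os.SEEK_HOLE)
                pos = data_start
                while pos < hole_start:
                    buf = os.pread(in_fd, min(chunk, hole_start - pos), pos)
                    if not buf:
                        break
                    # Advance by the bytes actually written; a short write
                    # is retried from the new position on the next pass.
                    pos += os.pwrite(out_fd, buf, pos)
                offset = hole_start
            # Set the final length so a trailing hole is preserved too.
            os.ftruncate(out_fd, size)
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
```

On filesystems that do not report holes, `SEEK_DATA` treats the whole file as data, so the function degrades to an ordinary full copy rather than failing.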

Why it matters for storage cleanup

Sparse files are a major reason that summing reported file sizes does not match the real free space a deletion will recover. A storage tool that counts the logical size of every file will overstate how much room large sparse files occupy, and deleting one may free far less than its displayed size suggests.
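
The gap between the two totals can be illustrated with a small directory walk. This is only a sketch under stated assumptions, not the method of any particular cleanup tool: `tree_sizes` is a hypothetical helper, `st_blocks` is assumed to be in 512-byte units, and hard links are counted once by tracking device and inode numbers.

```python
import os


def tree_sizes(root: str) -> tuple[int, int]:
    """Return (apparent_total, allocated_total) in bytes for files under `root`."""
    seen: set[tuple[int, int]] = set()
    apparent = 0
    allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # hard link to a file that was already counted
            seen.add(key)
            apparent += st.st_size
            # Assumption: st_blocks uses 512-byte units (Linux, macOS).
            allocated += st.st_blocks * 512
    return apparent, allocated
```

When a tree contains large sparse files, the first number can exceed the second by a wide margin, and the second is the closer estimate of what deleting the tree would free.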

An accurate cleaner like Cleanor reasons about actual blocks consumed rather than apparent length, so the space it promises to recover reflects what the filesystem will truly return after a delete.
