Parquet
Also known as: .parquet file, Apache Parquet, columnar storage
Parquet is an open columnar storage format for large datasets, widely used in data analytics and big-data tools. By storing data column by column and compressing each column, it speeds up analytical queries and usually produces much smaller files than the equivalent CSV.
- Open columnar storage format for analytics
- Compresses well; typically smaller and faster to query than CSV
- Binary; read with data tools, not a text editor
Why columnar storage matters
A CSV stores data row by row. Parquet stores it column by column, so a query that reads only a few columns skips the rest entirely, and similar values in a column compress extremely well.
That design makes Parquet a common default storage format for analytics engines and data lakes. It is a binary format, not human-readable, and is meant to be read by data tools rather than opened in a text editor.
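As a rough illustration of the two layouts (not of Parquet's actual on-disk encoding), the following Python sketch builds the same toy table row by row and column by column using only the standard library. The data, the column names, and the `compressed_size` helper are made up for this example, and JSON plus zlib merely stand in for Parquet's real encodings and codecs.

```python
import json
import zlib

# Toy table: 1,000 rows with three columns (hypothetical data).
rows = [
    {"id": i, "country": ["DE", "FR", "US"][i % 3], "amount": i * 0.5}
    for i in range(1000)
]

# Row-oriented layout (like CSV): the values of each record sit together.
row_layout = [[r["id"], r["country"], r["amount"]] for r in rows]

# Column-oriented layout (like Parquet): the values of each column sit together.
column_layout = {
    "id": [r["id"] for r in rows],
    "country": [r["country"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# Reading one column from the row layout has to walk every record ...
countries_from_rows = [record[1] for record in row_layout]
# ... while the column layout hands that column over directly.
countries_from_columns = column_layout["country"]


def compressed_size(obj):
    """Serialize obj to JSON and return its zlib-compressed size in bytes."""
    return len(zlib.compress(json.dumps(obj).encode("utf-8")))


row_bytes = compressed_size(row_layout)
column_bytes = compressed_size(column_layout)
country_column_bytes = compressed_size(column_layout["country"])

print(f"row layout:          {row_bytes} bytes compressed")
print(f"column layout:       {column_bytes} bytes compressed")
print(f"country column only: {country_column_bytes} bytes compressed")
```

On this toy data the repetitive country column compresses far better than the table as a whole, which is the effect the columnar layout exploits. Actual ratios depend on the data and on the compression codec.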
Parquet vs CSV and other formats
Compared with CSV, Parquet files are typically far smaller for the same data and much faster to query. They also preserve column data types, because the schema is stored in the file itself, whereas CSV holds every value as text. The trade-off is that you need a library or tool to read them.
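Within the big-data world, Parquet is column-oriented while Avro is row-oriented and better suited to writing one record at a time. The two are often used together in data pipelines, for example Avro for ingestion and Parquet for analytical storage.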