Software Engineering Team CU Dept. of Biomedical Informatics

Blog


2024

Parquet: Crafting Data Bridges for Efficient Computation

Apache Parquet is a columnar, strongly typed tabular data storage format built for scalable processing. It is compatible with many data models, programming languages, and software systems. Parquet files (usually denoted with a .parquet filename extension) are typically compressed within the format itself and are often used in embedded or cloud-based high-performance scenarios. Parquet has grown in popularity since its introduction in 2013 and is now a core data storage technology in many organizations. This article introduces the Parquet format from a research data engineering perspective.
