Liquid Clustering: First Impressions

Current challenge

When designing your lakehouse tables, defining the partition strategy can be challenging.

The general rules for choosing partition and ZORDER columns are well known [1], but data requirements, data volume, and usage patterns often change over time.

When they do, a data layout that was defined and fixed up front may no longer fit, and workloads become inefficient.

Presented solution

Databricks has announced a new feature for Delta Lake 3.0 called Liquid Clustering [2].

This new data management technique can adapt the data layout based on changing patterns, making table design and management easier [2]. It also clusters new data incrementally [3].

In practice, you only need to choose the most frequently queried columns as clustering keys.
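To make the idea concrete, here is a minimal Python sketch. It assumes you can collect the columns each query filters on. The query log, the `sales` table, its schema, and the helper names are all hypothetical. The sketch ranks columns by how often queries filter on them and then builds `CLUSTER BY` statements in the syntax the Delta Lake documentation shows. The frequency heuristic is only an illustration, not how Databricks selects keys. The four-key cap is also an assumption to check against the documentation for your version.

```python
from collections import Counter

# Hypothetical query log: the columns each query filtered on.
QUERY_FILTERS = [
    ["event_date", "customer_id"],
    ["event_date"],
    ["customer_id", "region"],
    ["event_date", "region"],
    ["event_date"],
]

# Hypothetical table schema as (column, SQL type) pairs.
SALES_SCHEMA = [
    ("event_date", "DATE"),
    ("customer_id", "BIGINT"),
    ("region", "STRING"),
    ("amount", "DOUBLE"),
]


def pick_clustering_keys(query_filters, max_keys=4):
    """Return the most frequently filtered columns, at most max_keys of them.

    max_keys=4 is an assumed limit on clustering keys; check your version's docs.
    """
    # Count each column once per query, even if a query lists it twice.
    counts = Counter(col for cols in query_filters for col in set(cols))
    # Highest frequency first; break ties alphabetically so the output is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [col for col, _ in ranked[:max_keys]]


def create_table_sql(table, schema, keys):
    """Build a CREATE TABLE statement for a liquid-clustered Delta table."""
    columns = ", ".join(f"{name} {sql_type}" for name, sql_type in schema)
    return f"CREATE TABLE {table} ({columns}) USING DELTA CLUSTER BY ({', '.join(keys)})"


def alter_keys_sql(table, keys):
    """Build the statement that switches an existing table to new clustering keys."""
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(keys)})"


if __name__ == "__main__":
    keys = pick_clustering_keys(QUERY_FILTERS, max_keys=2)
    print(create_table_sql("sales", SALES_SCHEMA, keys))
    # If queries later shift toward region, change the keys rather than repartitioning.
    print(alter_keys_sql("sales", ["region", "event_date"]))
```

With the sample log, `event_date` appears in four queries. `customer_id` and `region` appear in two each, and the alphabetical tie-break ranks `customer_id` first. As a result, the two-key table is clustered by `(event_date, customer_id)`. You would then run the generated statements through Spark SQL. In Delta Lake, clustering is applied when `OPTIMIZE` is run on the table.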

Beyond the configuration benefits, both at setup and over time, ingestion is also reported to be 2.5x faster on a 1 TB table.
