How to read a Delta table's .snappy.parquet file in Databricks

Learn how to read the .snappy.parquet files of your Delta tables in Databricks.

TL;DR

  1. Copy the .snappy.parquet file you want to read from the table’s location to a different directory in your storage.
  2. Verify that the destination directory does not contain a “_delta_log” folder.
  3. Read the copied file with the spark.read.parquet() command.
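The steps above can be sketched as follows. The paths and file names are hypothetical, and the final `spark.read.parquet()` call assumes a Databricks/PySpark session, so it is shown as a comment to keep the sketch runnable anywhere:

```python
import shutil
from pathlib import Path

def copy_parquet_for_reading(src_file: Path, scratch_dir: Path) -> Path:
    """Copy a single .snappy.parquet file out of the Delta table's
    directory into a scratch location that has no _delta_log folder."""
    scratch_dir.mkdir(parents=True, exist_ok=True)
    dest = scratch_dir / src_file.name
    shutil.copy2(src_file, dest)
    # Step 2: confirm no _delta_log folder sits next to the copy;
    # on Databricks its presence makes the path resolve as a Delta table.
    assert not (scratch_dir / "_delta_log").exists(), \
        "scratch_dir must not contain a _delta_log folder"
    return dest

# Step 3 (inside Databricks, hypothetical paths):
# dest = copy_parquet_for_reading(
#     Path("/dbfs/mnt/tables/my_table/part-0000.snappy.parquet"),
#     Path("/dbfs/mnt/scratch/parquet_inspect"))
# df = spark.read.parquet(str(dest))  # plain Parquet read, no Delta resolution
```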

Detail

Here is a concise overview of why you might read a Delta table’s Snappy Parquet files directly, how to do it, and what to avoid.

Common reasons to directly read a .snappy.parquet file

  1. To access a prior version of the Delta table that is no longer reachable with a `SELECT * FROM table VERSION AS OF x` time-travel query.
  2. To restore data from older Parquet files after an otherwise unrecoverable data issue.
  3. To reverse engineer the operations that wrote to the Delta table by analyzing and reconstructing its individual Parquet files.

To avoid

Avoid using pandas’ read_parquet method on the table path. This is the most common suggestion I found while searching for a solution on the internet.
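A quick way to see why this is risky: reading the table directory as plain Parquet picks up every data file under the path while ignoring the `_delta_log`, so files that Delta has logically removed come back as live rows. A stdlib sketch with hypothetical file names:

```python
import tempfile
from pathlib import Path

# Simulate a Delta table directory: two data files plus the transaction log.
root = Path(tempfile.mkdtemp())
(root / "_delta_log").mkdir()
(root / "_delta_log" / "00000000000000000000.json").write_text("{}")
(root / "part-0000-aaaa.snappy.parquet").write_bytes(b"")  # still live
(root / "part-0001-bbbb.snappy.parquet").write_bytes(b"")  # removed in the log

# A naive directory scan (roughly what a plain Parquet reader does)
# returns BOTH files; only the _delta_log records that part-0001 was removed.
found = sorted(p.name for p in root.glob("*.parquet"))
print(found)  # both data files appear, regardless of the Delta log
```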


Tags: Databricks