Learn how to read the .snappy.parquet files of your Delta tables in Databricks.
TLDR
- Copy the .snappy.parquet file you want to read from the table’s location to a different directory in your storage.
- Verify that no “_delta_log” folder exists in the directory you copied the Parquet file to.
- Read the .snappy.parquet file by running the spark.read.parquet() command.
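The three steps above can be sketched as follows. This is a minimal local illustration: the table directory, file names, and scratch paths are hypothetical, and stdlib file operations stand in for dbutils.fs.cp; on Databricks you would copy between cloud storage paths and call spark.read.parquet on the result.

```python
# Sketch of the TLDR steps, using local paths for illustration.
# On Databricks, replace shutil.copy with dbutils.fs.cp and use
# dbfs:/ or abfss:// paths; names below are hypothetical.
import os
import shutil
import tempfile

# Simulate a Delta table directory: data files plus a _delta_log folder.
table_dir = tempfile.mkdtemp(prefix="delta_table_")
os.makedirs(os.path.join(table_dir, "_delta_log"))
part_file = os.path.join(table_dir, "part-00000-abc.snappy.parquet")
with open(part_file, "wb") as f:
    f.write(b"placeholder parquet bytes")

# Step 1: copy the .snappy.parquet file to a different directory.
scratch_dir = tempfile.mkdtemp(prefix="scratch_")
copied = shutil.copy(part_file, scratch_dir)

# Step 2: verify no _delta_log folder exists next to the copied file;
# if one is present, Spark treats the directory as a Delta table
# instead of plain Parquet.
assert not os.path.exists(os.path.join(scratch_dir, "_delta_log"))

# Step 3: read the copied file as plain Parquet (on Databricks):
# df = spark.read.parquet(scratch_dir)
print(os.path.basename(copied))
```

The copy into a clean directory is the important part: reading the file in place, inside the table's own folder, would make Spark resolve it through the Delta log rather than as a standalone Parquet file.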
Detail
Here is a concise overview of why you might read a Delta table’s Snappy Parquet files directly, how to do so, and what to avoid along the way.
Common reasons to directly read a .snappy.parquet file
- To forcibly access a prior version of the Delta table that is no longer reachable via the SELECT * FROM table VERSION AS OF X command.
- To restore data from older Parquet files when a data issue cannot be rolled back.
- To reverse engineer and recreate the source data by analyzing and reconstructing the operations from the Parquet files that were written to the Delta table.
To avoid
Using the read_parquet method in pandas. This was the most common solution I kept finding while searching the internet.