Delta-RS and DuckDB: Read and Write Delta Without Spark

I have used Apache Spark (often as Azure Databricks) for some years and see it as the sledgehammer of data processing. It is a reliable tool built on the JVM that processes data in memory and can spin up multiple workers to distribute the workload across a variety of use cases. Whether the dataset is small or large, Spark gets the job done, and it has a reputation as the de facto standard processing engine for running Data Lakehouses.

There is an alternative to Java, Scala, and the JVM, though: open-source libraries like delta-rs, duckdb, pyarrow, and polars, which are written in more performant languages (Rust and C++). These newcomers can be the faster option in specific scenarios such as low-latency ETL on small to medium-size datasets, data exploration, etc.

This article is a proof-of-concept exploration, with a bit of benchmarking, to see what else is currently achievable outside of Spark.
