Tag: Spark

What If Everything You Know About Reality Is Wrong?

My journey into the quantum world began with a simple question: how can something be in two places simultaneously? This seemingly nonsensical notion, central to the concept of superposition in quantum mechanics, piqued my curiosity and led me down a rabbit hole of exploration. Delving into books,...

This Brilliantly Simple Wind Turbine Could Spark A Revolution

Renewable energy only accounts for around 11% of global energy production. This is a staggeringly low figure, considering that if we are to reach net-zero by 2050 and save the world, renewable energy needs to account for at least 60% of global energy production in less than seven years’ time. ...

The Small Island Man That Helped Spark a Revolution

You might be familiar with Secretary of Transportation Pete Buttigieg. This 41-year-old former mayor of South Bend was born into a relatively comfortable life with well-educated parents in Indiana. However, his father’s early life was a sharp contrast. He was born in 1947 and grew up...

Introduction to “Partition” in “Apache Spark”

What is the “Importance” of “Partition”? “Apache Spark” is known for its “Speed”. The “Fast Speed” of “Computing” comes from the “Parallel Processing”. “Partition” is the “Key” for “...

Spark Performance Tuning: Spill

A spill happens when an RDD (resilient distributed dataset, the fundamental data structure in Spark) is moved from RAM to disk and then back to RAM again. Simply put, this occurs when a given data partition is too large to fit within the RAM of the executor. Spark w...

Writing PySpark logs in Apache Spark and Databricks

The closer your data product gets to production, the more important it becomes to collect and analyse logs properly. Logs help both when debugging in-depth issues and when analysing the behaviour of your application. For general Python applications the classical choice would be to use ...