A spill happens when Spark moves an RDD (resilient distributed dataset, the fundamental data structure in Spark) from RAM to disk and then back to RAM again.
Simply put, this behavior occurs when a given data partition is too large to fit within the RAM of the executor. Spark w...
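
To make the size condition concrete, here is a minimal sketch in plain Python (no Spark required) that estimates whether a partition is likely to fit in the memory available to a single task. The function names are hypothetical and only for illustration. The 300 MB reserved heap and the 0.6 default for `spark.memory.fraction` come from Spark's documented defaults; splitting the pool evenly across cores is a simplifying assumption, since Spark's real per-task share varies at runtime and storage competes for the same pool.

```python
# Rough, back-of-the-envelope estimate; not how Spark itself decides to spill.
RESERVED_MB = 300  # heap that Spark sets aside before sizing its memory pool


def unified_memory_mb(executor_heap_mb, memory_fraction=0.6):
    """Approximate execution + storage pool for one executor, in MB."""
    return max(executor_heap_mb - RESERVED_MB, 0) * memory_fraction


def per_task_memory_mb(executor_heap_mb, executor_cores, memory_fraction=0.6):
    """Assumption: the pool is shared evenly by one task per core."""
    return unified_memory_mb(executor_heap_mb, memory_fraction) / executor_cores


def likely_to_spill(partition_mb, executor_heap_mb, executor_cores,
                    memory_fraction=0.6):
    """True when the in-memory partition size exceeds one task's share."""
    share = per_task_memory_mb(executor_heap_mb, executor_cores, memory_fraction)
    return partition_mb > share


if __name__ == "__main__":
    # Hypothetical executor: 4 GB heap, 4 cores.
    for size_mb in (128, 1024):
        print(size_mb, "MB partition, likely to spill:",
              likely_to_spill(size_mb, executor_heap_mb=4096, executor_cores=4))
```

Note that `partition_mb` should be the size of the data once deserialized in memory, which is often noticeably larger than the same data on disk, so an estimate like this tends to be optimistic.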