Tag: Apache

Setting up Apache-Airflow in Windows using WSL 2

In the previous story, you learned to set up Ubuntu 20.04 on Windows 10 as a Linux Subsystem Distribution. In this article, I will walk you through the installation process of Apache Airflow in WSL 2 using a virtual environment. Installation of pip on WSL 2 To set up a virt...

Unlocking the Power of Spark on Kubernetes with Apache YuniKorn

On October 12, 2023, a significant event took place at the LinkedIn office in Bangalore, Karnataka. The Hadoop MeetUp featured a variety of engaging talks and discussions on cutting-edge technologies. Among them, one talk that stole the spotlight was “Unlocking the Power of Spark on ...

How to Set Up Apache in a Docker Container on Ubuntu 22.04

Setting up Apache in a Docker container on Ubuntu 22.04 can be a straightforward process if you follow the step-by-step tutorial below. Docker allows you to isolate applications within containers, making it easier to manage and deploy them across different environments. Step 1: Install Docker ...

Introduction to “Partition” in “Apache Spark”

What is the “Importance” of “Partition”? “Apache Spark” is known for its “Speed”. The “Fast Speed” of “Computing” comes from the “Parallel Processing”. “Partition” is the “Key” for ...

Apache Airflow: Custom Task Triggering for Efficient Data Pipelines

Apache Airflow is an indispensable tool for orchestrating data pipelines, making it a must-know tool for any data engineer in 2023. Like any tool, Airflow has its advantages and disadvantages. While it boasts excellent built-in functionality, there are situations where custom solutions are required ...

Apache Doris 2.0.0 is Production-Ready!

We are more than excited to announce that, after six months of coding, testing, and fine-tuning, Apache Doris 2.0.0 is now production-ready. Special thanks to the 275 committers who altogether contributed over 4100 optimizations and fixes to the project. This new version highlights: Auto-sy...

A beginner’s guide to using Apache Hudi for data lake management.

Data lakes have become an essential part of data management in today’s organisations. They provide a centralised repository that can store structured and unstructured data at any scale. However, managing data lakes can be a challenging task, especially for beginners. Apache Hudi is an open-sou...