Exploratory Data Analysis (EDA) is the single most important task to conduct at the beginning of every data science project.
In essence, it involves thoroughly examining your data to uncover its underlying characteristics, possible anomalies, and hidden patterns and relationships.
This understanding of your data is what will ultimately guide you through the following steps of your machine learning pipeline, from data preprocessing to model building and analysis of results.
The process of EDA fundamentally comprises three main tasks:
- Step 1: Dataset Overview and Descriptive Statistics
- Step 2: Feature Assessment and Visualization
- Step 3: Data Quality Evaluation
As you may have guessed, each of these tasks can entail a fairly extensive set of analyses, which will easily have you slicing, printing, and plotting your pandas dataframes like a madman.
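To make the three steps a bit more concrete, here is a minimal sketch of what each one might look like in pandas. The toy dataset and its column names (`age`, `income`, `segment`) are invented purely for illustration, and the handful of calls shown is only one possible starting point, not a complete EDA recipe.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset, invented purely for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 32],
    "income": [40000.0, 52000.0, 88000.0, 61000.0, 52000.0],
    "segment": ["a", "b", "b", "a", "b"],
})

# Step 1: dataset overview and descriptive statistics.
n_rows, n_cols = df.shape
dtypes = df.dtypes
summary = df.describe()  # summary statistics for the numeric columns

# Step 2: feature assessment (category frequencies and pairwise relationships).
segment_counts = df["segment"].value_counts()
correlations = df[["age", "income"]].corr()  # Pearson correlation by default

# Step 3: data quality evaluation (missing values and duplicated rows).
missing_per_column = df.isna().sum()
n_duplicate_rows = int(df.duplicated().sum())

print(f"{n_rows} rows x {n_cols} columns")
print(dtypes)
print(summary)
print(segment_counts)
print(correlations)
print(missing_per_column)
print(f"duplicate rows: {n_duplicate_rows}")
```

The visualization part of Step 2 is left out here to keep the sketch free of plotting dependencies. With matplotlib installed, something like `df.hist()` would be a natural next call.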