A Data Scientist's Essential Guide to Exploratory Data Analysis

Exploratory Data Analysis (EDA) is the single most important task to conduct at the beginning of every data science project.

In essence, it involves thoroughly examining your data to uncover its underlying characteristics, possible anomalies, and hidden patterns and relationships.

This understanding of your data is what will ultimately guide you through the following steps of your machine learning pipeline, from data preprocessing to model building and analysis of results.

The process of EDA fundamentally comprises three main tasks:

Step 1: Dataset Overview and Descriptive Statistics
Step 2: Feature Assessment and Visualization, and
Step 3: Data Quality Evaluation

As you may have guessed, each of these tasks can entail quite a comprehensive set of analyses, which will easily have you slicing, printing, and plotting your pandas dataframes like a madman.
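To make the three tasks a little more concrete, here is a minimal sketch of what each one might look like in pandas. The toy dataframe and its column names (`age`, `income`, `city`) are made up for illustration, and the checks shown are only a small sample of what a full EDA would cover.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 32, None, 51],
    "income": [40000, 52000, 88000, 52000, 61000, None],
    "city": ["Lisbon", "Porto", "Lisbon", "Porto", "Faro", "Lisbon"],
})

# Step 1: Dataset overview and descriptive statistics
n_rows, n_cols = df.shape              # size of the dataset
dtypes = df.dtypes                     # data type of each column
summary = df.describe(include="all")   # summary stats for all columns

# Step 2: Feature assessment (numeric results only; plotting is left out here)
city_counts = df["city"].value_counts()          # category frequencies
age_income_corr = df["age"].corr(df["income"])   # pairwise Pearson correlation

# Step 3: Data quality evaluation
missing_per_column = df.isna().sum()             # missing values per column
n_duplicate_rows = int(df.duplicated().sum())    # fully duplicated rows

print(summary)
print(city_counts)
print(missing_per_column)
print(f"Duplicate rows: {n_duplicate_rows}")
```

For the visualization part of Step 2, calls such as `df["age"].hist()` or `df.plot.scatter(x="age", y="income")` are a natural next step, assuming matplotlib is installed.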
