What is EDA in Data Science?

Everything about Exploratory Data Analysis for Data Science!

Aman Kharwal
3 min read · Jun 1, 2023

Exploratory data analysis (EDA) is the practice of analyzing a dataset to discover patterns, trends, and relationships within the data. This article covers what a Data Science professional should know about it: what EDA is and how it helps, which questions to ask of your data, and the process to follow.

What is EDA & How Does it Help?

By exploring the patterns, trends, and relationships in a dataset, EDA helps us understand the information the dataset contains. That understanding guides us in making informed decisions and formulating strategies to solve real business problems.

For example, suppose a retail business is facing a drop in sales.

By performing EDA on its sales data, we can explore factors that may be contributing to the decline, such as changes in customer preferences, shifts in market trends, or the impact of promotional campaigns.

EDA can help identify these factors, allowing us to design targeted marketing strategies and make data-driven decisions to increase sales.
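To make the retail example concrete, here is a minimal sketch using pandas. The data, column names, and numbers are all made up for illustration; real sales data would need its own cleaning and far more rows.

```python
import pandas as pd

# Hypothetical toy data: monthly revenue per product category (made-up numbers).
sales = pd.DataFrame({
    "month": ["2023-01", "2023-01", "2023-02", "2023-02", "2023-03", "2023-03"],
    "category": ["apparel", "electronics"] * 3,
    "promo": ["yes", "yes", "yes", "no", "no", "no"],
    "revenue": [120.0, 200.0, 115.0, 150.0, 110.0, 100.0],
})

# Total revenue per month: is there an overall decline?
monthly_revenue = sales.groupby("month")["revenue"].sum()

# Revenue per category and month, then the change from the first to the last month:
# which category accounts for most of the drop?
by_category = sales.pivot_table(
    index="month", columns="category", values="revenue", aggfunc="sum"
)
change_by_category = by_category.iloc[-1] - by_category.iloc[0]

# Average revenue with and without a promotion running.
revenue_by_promo = sales.groupby("promo")["revenue"].mean()

print(monthly_revenue)
print(change_by_category)
print(revenue_by_promo)
```

A gap between the promotion and no-promotion averages is only a lead to investigate further. On its own it does not show that promotions caused the difference.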

Below are some resources you can follow to learn about the practical implementation of Exploratory Data Analysis:

  1. EDA using Python
  2. EDA using SQL
  3. EDA using R

Questions to Ask of Your Data While Performing EDA

When we perform Exploratory Data Analysis, we put questions to the data using Data Science tools like Python, R, or SQL. The quality of the analysis depends on asking the right questions, so start with these:

  1. How many variables/features are present?
  2. What are the range, minimum, maximum, mean, and median values?
  3. Are the variables normally distributed or skewed?
  4. Are there any extreme values or outliers that need to be addressed?
  5. Are there any strong positive or negative correlations between variables?
  6. Which variables have the most significant impact on the target variable?
  7. Are there any seasonality or periodic patterns?
  8. Are there any increasing or decreasing trends over time?
  9. Are there any clusters or groups within the data?
  10. Are there any anomalies or unusual observations?
  11. How does the data vary across different categories or groups?
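Several of the questions above map onto one or two lines of pandas. The sketch below is one possible way to answer them, not a fixed recipe. The dataset and its column names are hypothetical, and the 1.5 × IQR rule used for outliers is a common convention rather than the only option. The time-based questions (7 and 8) are left out because this toy data has no date column.

```python
import pandas as pd

# Hypothetical toy dataset; column names and values are made up for illustration.
df = pd.DataFrame({
    "ad_spend": [10, 12, 11, 13, 12, 14, 13, 60],
    "visits": [100, 118, 111, 131, 119, 142, 128, 150],
    "region": ["north", "south", "north", "south", "north", "south", "north", "south"],
    "sales": [20, 24, 22, 26, 24, 28, 26, 30],
})
numeric = df.select_dtypes("number")  # numeric columns only

# Q1: how many variables/features (and rows) are present?
n_rows, n_cols = df.shape

# Q2: count, mean, std, min, quartiles (50% is the median), and max per column.
summary = numeric.describe()

# Q3: skewness near 0 suggests a symmetric distribution; large values suggest skew.
skewness = numeric.skew()

# Q4: flag values outside 1.5 * IQR of the quartiles as candidate outliers.
q1, q3 = df["ad_spend"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["ad_spend"] < q1 - 1.5 * iqr) | (df["ad_spend"] > q3 + 1.5 * iqr)]

# Q5/Q6: Pearson correlation of each feature with the target column "sales".
corr_with_target = numeric.corr()["sales"].drop("sales")

# Q11: how does the target vary across a categorical variable?
sales_by_region = df.groupby("region")["sales"].mean()

print(summary)
print(outliers)
print(corr_with_target)
print(sales_by_region)
```

Correlation with the target is only a first, linear proxy for "impact". A feature can matter a great deal and still show a weak Pearson correlation if the relationship is non-linear.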

Process of EDA

It doesn’t matter which language or tool you use for EDA. Below is the process you should follow while performing Exploratory Data Analysis:

  1. Data Collection: Gather relevant data from various sources, ensuring its accuracy and completeness.
  2. Data Cleaning: Handle missing values, manage outliers, and eliminate inconsistencies.
  3. Data Visualization: Create visual representations of data using graphs, histograms, scatterplots, or heatmaps. Visualization helps identify patterns, trends, and anomalies in the data set.
  4. Descriptive Statistics: Calculate and analyze key descriptive statistics, such as mean, median, mode, standard deviation, and quartiles. These statistics provide insight into the data’s central tendency, spread, and distribution.
  5. Correlation Analysis: Explore relationships between variables by calculating correlation coefficients. This analysis helps identify which variables move together and how strongly. Keep in mind that a correlation alone does not show that a change in one variable causes a change in another.
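The steps above can be sketched end to end in pandas. This is a minimal illustration under stated assumptions: the data is a hypothetical in-memory table standing in for the collection step, and filling missing values with the median is just one cleaning choice among many. The plotting calls are left commented out because they need matplotlib installed.

```python
import pandas as pd

# Step 1 (collection) is stood in for by hypothetical in-memory data.
raw = pd.DataFrame({
    "price": [10.0, 12.0, None, 11.0, 11.0, 15.0],
    "units": [100, 90, 95, 97, 97, 70],
})

# Step 2: cleaning. Drop exact duplicate rows, then fill missing prices
# with the median price (one simple strategy; others may suit your data better).
clean = raw.drop_duplicates().copy()
clean["price"] = clean["price"].fillna(clean["price"].median())

# Step 3: visualization (requires matplotlib, so shown as comments only).
# clean.hist()                                  # one histogram per numeric column
# clean.plot.scatter(x="price", y="units")      # relationship between two columns

# Step 4: descriptive statistics.
stats = clean.agg(["mean", "median", "std"])
quartiles = clean.quantile([0.25, 0.5, 0.75])
price_mode = clean["price"].mode()  # may contain several values if there are ties

# Step 5: correlation analysis (Pearson coefficients by default).
corr = clean.corr()

print(stats)
print(quartiles)
print(corr)
```

On this toy data, price and units are strongly negatively correlated. That is a pattern worth investigating, not proof that raising the price reduces the units sold.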

Summary

EDA is the practice of analyzing a dataset to discover the patterns, trends, and relationships within it. It starts with asking the right questions of the data and then follows a repeatable process: collect, clean, visualize, summarize, and check correlations. The understanding gained guides informed decisions and strategies for solving real business problems. I hope you liked this article on EDA in Data Science. Feel free to ask questions in the comments section below.
