Exploratory Data Analysis (EDA) is a very important step in data analysis and data visualization. It gives a brief summary of the data and its main characteristics. According to a survey, data scientists spend most of their time on EDA tasks. EDA involves many steps, including statistical tests, visualization of data using different kinds of plots, and more. Some of the steps of EDA are discussed below:

Data Quality Check: It can be done using pandas functions and attributes such as df.describe(), df.shape, df.info() and df.dtypes (note that dtypes and shape are attributes, so they take no parentheses). These are generally used to find missing values, duplicate values, features, data types, a summary of the data, etc.

Statistical Test: Statistical tests such as Pearson correlation, Spearman correlation and the Kendall test are used to measure the correlation between features. By correlation I mean how strongly one feature is related to another. It can be done in Python using the stats module of SciPy (scipy.stats).

Quantitative Test: Quantitative tests are used to find the spread of numerical features and the counts of categorical features. They can be implemented in Python using functions of the pandas library.

Visualization: Feature visualization is essential for understanding the data. Graphical techniques like bar plots and pie charts are used for categorical features, whereas scatter plots and histograms are used for numerical features.

To perform the above-mentioned tasks we need to type several lines of code. This is where the open-source library pandas-profiling comes into play: it can perform all these tasks with just one line of code.

Wow! Just one line of code! 🤔 Yes, you read that correctly: only one line of code. It's possible in Python thanks to this open-source library. The result of EDA using pandas-profiling can be displayed in a Jupyter notebook or converted to an HTML page.
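For comparison, here is a rough sketch of what the manual steps listed above (data quality check, statistical test, quantitative test) might look like in plain pandas. The small DataFrame and its column names are invented for illustration and are not the Titanic dataset. Plots are left out to keep the example short, and the correlations use pandas' own DataFrame.corr rather than scipy.stats.

```python
import pandas as pd

# Toy data invented for illustration (NOT the Titanic dataset).
df = pd.DataFrame(
    {
        "age": [22.0, 38.0, 26.0, None, 35.0, 22.0],
        "fare": [7.25, 71.28, 7.92, 8.05, 53.10, 7.25],
        "sex": ["male", "female", "female", "male", "female", "male"],
    }
)

# 1. Data quality check: size, data types, missing values, duplicate rows.
print(df.shape)
print(df.dtypes)  # dtypes is an attribute, not a method
df.info()
missing_per_column = df.isna().sum()
duplicate_rows = int(df.duplicated().sum())

# 2. Statistical test: correlation between the numerical features.
numeric = df[["age", "fare"]]
pearson = numeric.corr(method="pearson")
spearman = numeric.corr(method="spearman")

# 3. Quantitative test: spread of numerical features, counts of categorical ones.
spread = numeric.describe()
sex_counts = df["sex"].value_counts()

print(missing_per_column, duplicate_rows, pearson, spearman, spread, sex_counts, sep="\n\n")
```

Even on three columns this takes a fair number of lines, which is the repetition that a one-line profiling report is meant to replace.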
Now, without wasting any time, let's see how to do this 😲

Installation: There are many ways to install the pandas-profiling library, but we'll use the simplest one, pip:

    pip install pandas-profiling

Import libraries: To use the pandas-profiling library for EDA, we need to import some required libraries:

    import pandas as pd
    import numpy as np
    from pandas_profiling import ProfileReport

Now EDA using one line of code:

    profile = ProfileReport(pd.read_csv('titanic.csv'), title='Pandas Profiling Report', html={'style': {'full_width': True}}, sort="None")

Yes, that's it, we've completed the exploratory data analysis. Results can be viewed directly in a Jupyter notebook or Google Colab, or the report can be saved in HTML format and opened in a web browser.

    # to view the result in a Jupyter notebook or Google Colab
    profile.to_widgets()

    # to save the results of pandas-profiling to an HTML file
    profile.to_file("EDA.html")

EDA for the Titanic Dataset: The dataset used for exploratory data analysis with the pandas-profiling library is downloaded from Kaggle. Here is a work sample of EDA for the Titanic dataset:

https://gist.github.com/TheSkyFox3006/4181ce62b1d41fb4cfc8d011945cea0e

Output: The output of EDA for the Titanic dataset will look like this:

[Screenshot: the generated pandas-profiling report]

Note: If you are a beginner in data science, I wouldn't suggest performing EDA with pandas-profiling. I prefer to do my EDA with self-defined functions using several Python libraries. For beginners, it is better to start doing EDA with the pandas library and hand-written Python code before trying this library, as it is more important to be equipped with fundamental knowledge and good programming practices.

If you want to know about the NumPy library, I suggest the article "NumPy: Everything A Data Scientist Should Know". I have also written about how to convert PDFs into an audiobook.

Thank you so much for reading! Follow for more data science content.
Also published at https://medium.com/@jitendraballa2015/exploratory-data-analysis-eda-in-easiest-way-using-python-12ea25c633d8