What is Data Visualization?

Data visualization is a form of visual communication. It involves the creation and study of the visual representation of data.

We'll be implementing various data visualization techniques on the 'iris' dataset.

Different types of analysis:

- Univariate (U): In univariate analysis we use a single feature to analyze its properties.
- Bivariate (B): When we compare the data between exactly 2 features, it is called bivariate analysis.
- Multivariate (M): Comparing more than 2 variables is called multivariate analysis.

Most common types of plots used in data visualization:

- Scatter plot (B)
- Pair plot (M)
- Box plot (U)
- Violin plot (U)
- Distribution plot (U)
- Joint plot (U) & (B)
- Bar chart (B)
- Line plot (B)

Let us look at some of these plots used in data visualization one by one:

Import libraries for data visualization

First we need to import two important libraries for data visualization:

- matplotlib
- seaborn

Matplotlib is a Python library used extensively for the visualization of data, while Seaborn is a Python library based on matplotlib. Seaborn provides a high-level interface for drawing attractive and informative statistical graphics.

    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd  # needed for read_csv in the next step

Load file into a dataframe

    iris = pd.read_csv("iris.csv")

1. Scatter Plot:

It is one of the most commonly used plots for simple data visualization. It shows where each point in the dataset lies with respect to any 2 or 3 features (or columns). Scatter plots are available in 2D as well as 3D.
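As a minimal sketch of the idea, the snippet below draws a 2D scatter plot with one color per category, using one `scatter` call per group. The rows are a handful of illustrative values typed in by hand in the style of the iris columns, not a read of iris.csv, and the Agg backend line is only there so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Illustrative rows (assumed for this sketch): (sepal_length, sepal_width, species)
rows = [
    (5.1, 3.5, "setosa"),
    (4.9, 3.0, "setosa"),
    (7.0, 3.2, "versicolor"),
    (6.4, 3.2, "versicolor"),
    (6.3, 3.3, "virginica"),
    (5.8, 2.7, "virginica"),
]
colors = {"setosa": "red", "versicolor": "blue", "virginica": "green"}

fig, ax = plt.subplots()
for species, color in colors.items():
    # Collect the x and y values belonging to this species only
    xs = [r[0] for r in rows if r[2] == species]
    ys = [r[1] for r in rows if r[2] == species]
    ax.scatter(xs, ys, color=color, label=species)  # one call per group

ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
ax.legend()
# In an interactive session, plt.show() would display the figure.
```

Plotting group by group rather than point by point keeps the number of drawing calls small and gives each species a legend entry.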
    # Here we are plotting sepal_length vs sepal_width
    # setosa - 'red'; versicolor - 'blue'; virginica - 'green'
    for n in range(0, 150):
        if iris['species'][n] == 'setosa':
            plt.scatter(iris['sepal_length'][n], iris['sepal_width'][n], color = 'red')
        elif iris['species'][n] == 'versicolor':
            plt.scatter(iris['sepal_length'][n], iris['sepal_width'][n], color = 'blue')
        elif iris['species'][n] == 'virginica':
            plt.scatter(iris['sepal_length'][n], iris['sepal_width'][n], color = 'green')
    # The axis labels are the same for every point, so set them once
    plt.xlabel('sepal_length')
    plt.ylabel('sepal_width')

2. Pair Plot

Let's say we have n features in a dataset. A pair plot creates an (n x n) figure where the diagonal plots are histograms of the feature corresponding to that row, and the rest of the plots combine the feature from each row on the y axis with the feature from each column on the x axis.

On the Iris dataset, the pair plot can be drawn with seaborn's pairplot function.

3. Box Plot

A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be outliers.

Code for plotting the features using box plots:

    # Plotting the features using boxes
    plt.style.use('ggplot')
    plt.subplot(2, 2, 1)
    sns.boxplot(x = 'species', y = 'sepal_length', data = iris)
    plt.subplot(2, 2, 2)
    sns.boxplot(x = 'species', y = 'sepal_width', data = iris)
    plt.subplot(2, 2, 3)
    sns.boxplot(x = 'species', y = 'petal_length', data = iris)
    plt.subplot(2, 2, 4)
    sns.boxplot(x = 'species', y = 'petal_width', data = iris)

4. Violin Plots:

A violin plot can be understood as a combination of a box plot and a distribution plot.
The box plot sits in the middle, with distribution plots (kernel density estimates) on both sides of the data. This can reveal details of the distribution, such as whether it is multimodal or skewed.

The violin plot also comes from the seaborn package. The code is simple and as follows.

    # Representing data using violin form
    plt.style.use('ggplot')
    plt.subplot(2, 2, 1)
    sns.violinplot(x = 'species', y = 'sepal_length', data = iris)
    plt.subplot(2, 2, 2)
    sns.violinplot(x = 'species', y = 'sepal_width', data = iris)
    plt.subplot(2, 2, 3)
    sns.violinplot(x = 'species', y = 'petal_length', data = iris)
    plt.subplot(2, 2, 4)
    sns.violinplot(x = 'species', y = 'petal_width', data = iris)

5. Joint Plot

Joint plots can do both univariate as well as bivariate analysis. The main plot gives us a bivariate analysis, whereas on the top and right side we get univariate plots of both the variables that were considered. It makes our job easy by giving us a scatter plot for the bivariate analysis and distribution plots for the univariate analysis, all in a single plot.

There are a variety of options you can choose from, which can be tuned using the kind parameter in seaborn's jointplot function.

    # Joint plots show bivariate scatterplots
    # and univariate histograms
    sns.jointplot(x = 'sepal_length', y = 'sepal_width', data = iris)

6. Strip Plot

A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution.

It is a graphical data analysis technique for summarizing a univariate data set. It is typically used for small data sets (histograms and density plots are typically preferred for larger data sets).
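Strip plots are commonly drawn with jitter, a small random sideways offset that keeps observations sharing the same category and value from hiding one another. The sketch below illustrates the idea with the standard library only. The helper name `jittered_positions`, the width of 0.1, and the fixed seed are assumptions made for illustration, and seaborn's own jitter may differ in detail.

```python
import random

def jittered_positions(category_index, n_points, width=0.1, seed=0):
    """Return x positions spread uniformly within +/- width of a category slot.

    Hypothetical helper for illustration; not part of seaborn or matplotlib.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [category_index + rng.uniform(-width, width) for _ in range(n_points)]

# Three observations in the same category (slot 0) that would
# otherwise all be drawn at exactly x = 0
xs = jittered_positions(category_index=0, n_points=3)
print(xs)
```

Only the position along the categorical axis is perturbed. The measured value on the other axis is left untouched, so the plot still shows the true observations.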
    # Plotting data in strip plots
    # (a strip plot summarizes a univariate data set and suits small data sets)
    plt.subplot(2, 2, 1)
    sns.stripplot(x = 'species', y = 'sepal_length', data = iris, jitter = True)
    plt.subplot(2, 2, 2)
    sns.stripplot(x = 'species', y = 'sepal_width', data = iris, jitter = True)
    plt.subplot(2, 2, 3)
    sns.stripplot(x = 'species', y = 'petal_length', data = iris, jitter = True)
    plt.subplot(2, 2, 4)
    sns.stripplot(x = 'species', y = 'petal_width', data = iris, jitter = True)

lmplot() function in seaborn

Seaborn's lmplot is a 2D scatterplot with an optional overlaid regression line. Logistic regression for binary classification is also supported with lmplot. It is intended as a convenient interface to fit regression models across conditional subsets of a dataset.

The function can draw a scatterplot of two variables, x and y, then fit the regression model y ~ x and plot the resulting regression line with a 95% confidence interval for that regression. lmplot() has data as a required parameter, and the x and y variables must be specified as strings.

    # sepal_length vs sepal_width with a fitted regression line,
    # plotting the species separately
    sns.lmplot(x = 'sepal_length', y = 'sepal_width', data = iris, hue = 'species', col = 'species')

Conclusion:

So here you go, you have learned about the different kinds of plots that you can make using the seaborn and matplotlib libraries. Data visualization not only helps you understand your data well, but whenever you find any insights, you can use these visualization techniques to share your findings with other people.

Now go on and try creating such amazing plots on some real-world data sets.