**What is Data Visualization?**

Data visualization is a **form of visual communication**. It involves the creation and study of the **visual representation of data**. We'll be implementing various data visualization techniques on the 'iris' dataset.

**Different types of analysis:**

- **Univariate (U)**: we use a single feature and analyze its properties.
- **Bivariate (B)**: we compare the data between exactly 2 features.
- **Multivariate (M)**: we compare more than 2 variables.
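To make the three kinds of analysis concrete, here is a minimal sketch using pandas on its own. The column names `a`, `b` and `c` and the random values are placeholders chosen for illustration; they are not the iris data.

```python
import numpy as np
import pandas as pd

# Placeholder data: random numbers standing in for three numeric features.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, 3)), columns=["a", "b", "c"])

# Univariate: summarize a single feature on its own.
summary_a = df["a"].describe()

# Bivariate: relate exactly two features to each other.
corr_ab = df["a"].corr(df["b"])

# Multivariate: relate more than two features at once.
corr_matrix = df.corr()

print(summary_a, corr_ab, corr_matrix, sep="\n\n")
```

The plots in the rest of the article are visual counterparts of these three steps.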

**Most common types of plots used in data visualization:**

- Scatter plot (B)
- Pair plot (M)
- Box plot (U)
- Violin plot (U)
- Distribution plot (U)
- Joint plot (U) & (B)
- Bar chart (B)
- Line plot (B)

Let us look at some of these plots one by one.

**Import libraries for data visualization**

First we need to import two important libraries for data visualization:

- matplotlib
- seaborn

**Matplotlib** is a Python library used extensively for the **visualization of data**, while **Seaborn** is a Python library **based on matplotlib**. Seaborn provides a high-level interface for drawing attractive and informative statistical graphics.

```
import pandas as pd  # needed for pd.read_csv when loading the file
import matplotlib.pyplot as plt
import seaborn as sns
```

**Load file into a dataframe**

`iris = pd.read_csv("iris.csv")`

**1. Scatter Plot:**

It is one of the most commonly used plots for simple data visualization. It shows where each point in the dataset lies with respect to any 2 or 3 features (or columns). Scatter plots are available in 2D as well as 3D.

```
# Here we are plotting sepal_length vs sepal_width
# setosa - 'red'; versicolor - 'blue'; virginica - 'green'
for n in range(len(iris)):
    if iris['species'][n] == 'setosa':
        plt.scatter(iris['sepal_length'][n], iris['sepal_width'][n], color = 'red')
    elif iris['species'][n] == 'versicolor':
        plt.scatter(iris['sepal_length'][n], iris['sepal_width'][n], color = 'blue')
    elif iris['species'][n] == 'virginica':
        plt.scatter(iris['sepal_length'][n], iris['sepal_width'][n], color = 'green')
# The axis labels only need to be set once, after all points are drawn
plt.xlabel('sepal_length')
plt.ylabel('sepal_width')
```
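The same kind of plot can also be sketched with seaborn's `scatterplot`, which colors the points per category through its `hue` parameter instead of an explicit loop. The data below is random placeholder data that only reuses the column names of this article; it is not the real iris measurements.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Placeholder data (random, NOT real iris measurements) with the
# column names assumed throughout this article.
rng = np.random.default_rng(0)
iris = pd.DataFrame({
    "sepal_length": rng.normal(5.8, 0.8, 60),
    "sepal_width": rng.normal(3.0, 0.4, 60),
    "species": np.repeat(["setosa", "versicolor", "virginica"], 20),
})

# Same color assignment as in the loop-based version.
palette = {"setosa": "red", "versicolor": "blue", "virginica": "green"}

fig, ax = plt.subplots()
sns.scatterplot(data=iris, x="sepal_length", y="sepal_width",
                hue="species", palette=palette, ax=ax)
```

Seaborn labels the axes from the column names and adds a legend for the species, so those steps do not have to be written by hand.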

**2. Pair Plot**

Let's say we have n features in the data. A pair plot creates an (n x n) grid of plots where the **diagonal plots are histograms** of the feature corresponding to that row, and each of the remaining plots combines the feature of its row on the y axis with the feature of its column on the x axis. Seaborn provides this plot through its `pairplot` function.

**3. Box Plot**

A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that **facilitates comparisons between variables or across levels of a categorical variable**. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution.

Code for plotting the features using box plots:

```
# Plotting the features using boxes
plt.style.use('ggplot')
plt.subplot(2,2,1)
sns.boxplot(x = 'species', y = 'sepal_length', data = iris)
plt.subplot(2,2,2)
sns.boxplot(x = 'species', y = 'sepal_width', data = iris)
plt.subplot(2,2,3)
sns.boxplot(x = 'species', y = 'petal_length', data = iris)
plt.subplot(2,2,4)
sns.boxplot(x = 'species', y = 'petal_width', data = iris)
```
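To see what the box and whiskers stand for numerically, the sketch below computes them by hand with pandas. It assumes the common default rule in which whiskers reach the most extreme observations within 1.5 times the interquartile range of the box; the values are random placeholders, not real iris measurements.

```python
import numpy as np
import pandas as pd

# Placeholder data: random numbers, NOT real iris measurements.
rng = np.random.default_rng(0)
values = pd.Series(rng.normal(5.8, 0.8, 50), name="sepal_length")

# The box spans the first to the third quartile, with a line at the median.
q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Assumed default rule (whis=1.5): whiskers reach the most extreme
# observations that still lie within 1.5 * IQR of the box.
lower_limit = q1 - 1.5 * iqr
upper_limit = q3 + 1.5 * iqr
inside = values[(values >= lower_limit) & (values <= upper_limit)]
whisker_low, whisker_high = inside.min(), inside.max()

# Observations beyond the whiskers are drawn as individual points.
outliers = values[(values < lower_limit) | (values > upper_limit)]

print(f"box: {q1:.2f} to {q3:.2f}, median {median:.2f}")
print(f"whiskers: {whisker_low:.2f} to {whisker_high:.2f}, outliers: {len(outliers)}")
```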

**4. Violin Plots:**

A violin plot can be read as a combination of a **box plot in the middle and distribution plots (Kernel Density Estimation) on both sides of the data**. This reveals details of the distribution, such as whether it is multimodal or skewed. The violin plot also comes from the seaborn package. The code is simple and as follows.

```
# Representing data using violin form
plt.style.use('ggplot')
plt.subplot(2,2,1)
sns.violinplot(x = 'species', y = 'sepal_length', data = iris)
plt.subplot(2,2,2)
sns.violinplot(x = 'species', y = 'sepal_width', data = iris)
plt.subplot(2,2,3)
sns.violinplot(x = 'species', y = 'petal_length', data = iris)
plt.subplot(2,2,4)
sns.violinplot(x = 'species', y = 'petal_width', data = iris)
```
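The point about multimodal distributions can be illustrated with a small sketch that draws a box plot and a violin plot of the same data side by side. The data is synthetic and deliberately built from two separate clusters; it is not taken from the iris dataset.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic bimodal data: two clusters centered at 0 and 4.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(0, 0.5, 200), rng.normal(4, 0.5, 200)])

fig, (ax_box, ax_violin) = plt.subplots(1, 2, sharey=True)

# The box plot only shows quartiles, so the gap between clusters is hidden.
sns.boxplot(y=values, ax=ax_box)
ax_box.set_title("Box plot")

# The violin plot adds a density estimate, so both peaks become visible.
sns.violinplot(y=values, ax=ax_violin)
ax_violin.set_title("Violin plot")
```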

**5. Joint Plot**

Joint plots can do **both univariate as well as bivariate analysis**. The **main plot** gives us a **bivariate analysis**, whereas on the **top and right side** we get **univariate plots of both the variables that were considered**. This makes our job easy by providing a scatter plot for the bivariate analysis and distribution plots for the univariate analysis, all in a single figure. There are a variety of options to choose from, which can be selected using the **kind** parameter of seaborn's jointplot function.

```
# Joint plots show bivariate scatterplots
# and univariate histograms
sns.jointplot(x = 'sepal_length', y = 'sepal_width', data = iris)
```

**6. Strip Plot**

A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to **show all observations along with some representation of the underlying distribution**. It is a graphical data analysis technique for **summarizing a univariate data set**. It is typically used for **small data sets** (histograms and density plots are typically preferred for larger data sets).

```
# Plotting data in strips
plt.subplot(2,2,1)
sns.stripplot(x = 'species', y = 'sepal_length', data = iris, jitter = True)
plt.subplot(2,2,2)
sns.stripplot(x = 'species', y = 'sepal_width', data = iris, jitter = True)
plt.subplot(2,2,3)
sns.stripplot(x = 'species', y = 'petal_length', data = iris, jitter = True)
plt.subplot(2,2,4)
sns.stripplot(x = 'species', y = 'petal_width', data = iris, jitter = True)
```
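One way to use a strip plot as a complement to a box plot is to draw both on the same axes. The following is a minimal sketch under that assumption, with random placeholder data instead of the real iris measurements; the colors and point size are arbitrary choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Placeholder data (random, NOT real iris measurements).
rng = np.random.default_rng(0)
iris = pd.DataFrame({
    "sepal_length": rng.normal(5.8, 0.8, 60),
    "species": np.repeat(["setosa", "versicolor", "virginica"], 20),
})

fig, ax = plt.subplots()

# Box plot as the summary layer; outlier markers are switched off
# because the strip plot draws every observation anyway.
sns.boxplot(x="species", y="sepal_length", data=iris,
            color="white", showfliers=False, ax=ax)

# Strip plot on top: one jittered point per observation.
sns.stripplot(x="species", y="sepal_length", data=iris,
              jitter=True, color="black", size=3, ax=ax)
```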

**lmplot() function in seaborn**

Seaborn's lmplot is a **2D scatterplot with an optional overlaid regression line**. Logistic regression for binary classification is also supported with lmplot. It is intended as a **convenient interface to fit regression models** across conditional subsets of a dataset. The function can draw a scatterplot of two variables, x and y, and then fit the regression model y ~ x and plot the resulting regression line with a 95% confidence interval for that regression.

lmplot() has data as a required parameter and the x and y variables must be specified as strings.

```
# Scatterplot with a regression line, drawn in a separate panel for each species
sns.lmplot(x = 'sepal_length', y = 'sepal_width', data = iris, hue = 'species', col = 'species')
```
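The difference between the `hue` and `col` parameters can be seen in a short sketch: `hue` alone overlays one regression line per species in a single panel, while adding `col` splits the species into separate panels. The data is random placeholder data, not the real iris measurements.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Placeholder data (random, NOT real iris measurements).
rng = np.random.default_rng(0)
iris = pd.DataFrame({
    "sepal_length": rng.normal(5.8, 0.8, 60),
    "sepal_width": rng.normal(3.0, 0.4, 60),
    "species": np.repeat(["setosa", "versicolor", "virginica"], 20),
})

# hue only: all species share one panel, each with its own color and line.
single = sns.lmplot(data=iris, x="sepal_length", y="sepal_width",
                    hue="species")

# hue and col: one panel per species, laid out in a row.
faceted = sns.lmplot(data=iris, x="sepal_length", y="sepal_width",
                     hue="species", col="species")
```

Both calls return a grid object whose `axes` attribute holds the panels, here one panel in the first case and three in the second.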

**Conclusion:**

So there you go: you have learned about the different kinds of plots you can make using the seaborn and matplotlib libraries. Data visualization not only helps you **understand your data** well, but whenever you find any insights, you can use these visualization techniques to **share your findings with other people**. Now go on and try creating such amazing plots on some real-world data sets.
