_Featured_: interpolation, t-SNE projection (with gifs & examples!)

In the “_Deep Learning bits_” series, we will **not** see how to use deep learning to solve complex problems end-to-end as we do in [**_A.I. Odyssey_**](https://medium.com/@juliendespois/talk-to-you-computer-with-you-eyes-and-deep-learning-a-i-odyssey-part-2-7d3405ab8be1). Instead, we will look at different techniques, along with some **examples and applications**. Don’t forget to check out [_Deep Learning bits #1_](https://hackernoon.com/autoencoders-deep-learning-bits-1-11731e200694)!

> **_If you like Artificial Intelligence, make sure to_** [**_subscribe to the newsletter_**](http://eepurl.com/cATXvT) **_to receive updates on articles and much more!_**

### Introduction

[Last time](https://hackernoon.com/autoencoders-deep-learning-bits-1-11731e200694), we saw what autoencoders are and how they work. Today, we will see how they can help us **visualize the data** in some _very_ cool ways. For that, we will work on images, using the Convolutional Autoencoder architecture (_CAE_).

#### What’s the latent space again?

An autoencoder is made of two components; here’s a quick reminder. The **_encoder_** brings the data from a high-dimensional input down to a **_bottleneck_** layer, where the number of neurons is the smallest. Then, the **_decoder_** takes this encoded input and converts it back to the original input shape, in our case an image. The **_latent space_** is the space in which the data lies at the bottleneck layer.

Convolutional Encoder-Decoder architecture

The latent space contains a **compressed** representation of the image, which is **the only information** the decoder is allowed to use to reconstruct the input **as faithfully as possible**. To perform well, the network has to learn to extract the **most relevant** features at the bottleneck.

_Let’s see what we can do!_

### The dataset

We’ll change from the datasets of last time. Instead of looking at [my eyes](https://hackernoon.com/talk-to-you-computer-with-you-eyes-and-deep-learning-a-i-odyssey-part-2-7d3405ab8be1#.scd7s8ej4) or [blue squares](https://hackernoon.com/autoencoders-deep-learning-bits-1-11731e200694#.6qgkt12jm), we will work on probably the _most famous dataset in computer vision:_ the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset of _handwritten digits_. I usually prefer to work with **less conventional** datasets just for diversity, but MNIST is **really convenient** for what we will do today.

**_Note:_** Although MNIST visualizations are _pretty common_ on the internet, the images in this post are 100% generated **from the code**, so you can use these techniques with your own models.

MNIST is a labelled dataset of 28x28 images of handwritten digits

### Baseline: performance of the autoencoder

To understand what kind of features the encoder is capable of extracting from the inputs, we can first look at the **reconstructed images**. If this **sounds familiar**, that’s normal: we already did this last time. However, this step is **necessary** because it sets the baseline for our _expectations_ of the model.

**_Note:_** For this post, the bottleneck layer has only **32 units**, which is some _really_, _really_ brutal dimensionality reduction. If it were an image, it **wouldn’t even be 6x6** pixels.

Each digit is displayed next to its blurry reconstruction

We can see that the autoencoder **successfully** reconstructs the digits. The **reconstruction is blurry** because the input is **compressed** at the bottleneck layer.
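To make the setup concrete, here is a minimal sketch of what such a convolutional autoencoder could look like with the Keras functional API. This is not the exact model from the repository linked at the end of the post: the layer sizes, names (`encoder`, `autoencoder`, `bottleneck`) and hyperparameters are illustrative assumptions; only the 32-unit bottleneck matches the text.

```python
from tensorflow.keras import layers, Model

input_img = layers.Input(shape=(28, 28, 1))

# Encoder: convolutions + downsampling, then a 32-dimensional bottleneck
x = layers.Conv2D(16, 3, activation="relu", padding="same")(input_img)
x = layers.MaxPooling2D(2)(x)                                   # 14x14
x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
x = layers.MaxPooling2D(2)(x)                                   # 7x7
x = layers.Flatten()(x)
latent = layers.Dense(32, activation="relu", name="bottleneck")(x)

# Decoder: expand the 32-dimensional latent vector back into a 28x28 image
x = layers.Dense(7 * 7 * 32, activation="relu")(latent)
x = layers.Reshape((7, 7, 32))(x)
x = layers.Conv2DTranspose(32, 3, strides=2, activation="relu", padding="same")(x)  # 14x14
x = layers.Conv2DTranspose(16, 3, strides=2, activation="relu", padding="same")(x)  # 28x28
decoded = layers.Conv2D(1, 3, activation="sigmoid", padding="same")(x)

autoencoder = Model(input_img, decoded)   # full encoder-decoder, trained end-to-end
encoder = Model(input_img, latent)        # maps an image to its latent representation
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
```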
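And a rough sketch of how the baseline reconstructions could be produced, reusing the hypothetical `autoencoder` defined above (the number of epochs and batch size are arbitrary choices, not taken from the original code):

```python
import matplotlib.pyplot as plt
from tensorflow.keras.datasets import mnist

# Load MNIST, normalize to [0, 1] and add a channel dimension
(x_train, _), (x_val, y_val) = mnist.load_data()
x_train = x_train.astype("float32")[..., None] / 255.0   # (60000, 28, 28, 1)
x_val = x_val.astype("float32")[..., None] / 255.0       # (10000, 28, 28, 1)

# Train the autoencoder to reproduce its own input
autoencoder.fit(x_train, x_train, epochs=10, batch_size=128,
                validation_data=(x_val, x_val))

# Display a few validation digits next to their (blurry) reconstructions
reconstructions = autoencoder.predict(x_val[:10])
fig, axes = plt.subplots(2, 10, figsize=(10, 2))
for i in range(10):
    axes[0, i].imshow(x_val[i].squeeze(), cmap="gray")
    axes[1, i].imshow(reconstructions[i].squeeze(), cmap="gray")
    axes[0, i].axis("off")
    axes[1, i].axis("off")
plt.show()
```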
The reason we look at _validation samples_ is to be sure we are not _overfitting_ the training set.

**_Bonus_**: _Here’s the training process animation_

Reconstruction of **training** (left) and **validation** (right) samples at each step

### t-SNE visualization

#### What’s t-SNE?

The first thing we want to do when working with a dataset is to **visualize** the data in a _meaningful_ way. In our case, the **image** _(or pixel)_ **space** has 784 dimensions (28x28x1), and we clearly _cannot_ plot that. The challenge is to squeeze all this dimensionality into something we can grasp, in _2D_ or _3D_.

Here comes [t-SNE](http://jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf), an algorithm that maps a **high-dimensional space** to a **2D or 3D space** while trying to **keep the distances** between the points **the same**. We will use this technique to plot embeddings of our dataset, _first_ directly from the **image space**, and _then_ from the **smaller latent space**.

**_Note:_** _t-SNE is better for visualization than its cousins_ [_PCA_](http://www.cs.cmu.edu/~elaw/papers/pca.pdf) _and_ [_ICA_](http://www2.hawaii.edu/~kyungim/papers/baek_cvprip02.pdf)_._

#### Projecting the pixel space

Let’s start by plotting the t-SNE embedding of our dataset (from image space) and see what it looks like.

t-SNE projection of **image space** representations from the validation set

We can already see that some numbers are _roughly_ **clustered** together. That’s because the dataset is really simple\*, and simple _heuristics_ on pixels are enough to classify many samples. Notice how there is no clear cluster for the digits **8, 5, 7 and 3**: they are all made of roughly the **same pixels**, and only minor changes differentiate them.

_\*On more complex data, such as_ [_RGB images_](https://www.cs.toronto.edu/~kriz/cifar.html)_, the_ **_only clusters_** _would be of images of the_ **_same general color_**_._

#### Projecting the latent space

We know that the _latent space_ contains **a simpler representation** of our images than the pixel space, so we can hope that t-SNE will give us an interesting **2D projection of the latent space**.

t-SNE projection of **latent space** representations from the validation set

Although _not perfect_, the projection shows **denser** clusters. This shows that in the latent space, the same digits are close to one another. The digits **8, 7, 5 and 3** are now easier to distinguish, and appear in _small_ clusters.
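Here is a minimal sketch of how both projections could be produced with scikit-learn, reusing the hypothetical `encoder`, `x_val` and `y_val` from the sketches above. The subset size and perplexity are arbitrary assumptions, not values taken from the original code.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

n = 5000                                     # t-SNE is slow; a subset of the validation set is enough
pixel_vectors = x_val[:n].reshape(n, -1)     # (n, 784) flattened images
latent_vectors = encoder.predict(x_val[:n])  # (n, 32) bottleneck activations

for name, vectors in [("pixel space", pixel_vectors), ("latent space", latent_vectors)]:
    # Project the high-dimensional vectors down to 2D and color the points by digit label
    embedding = TSNE(n_components=2, perplexity=30).fit_transform(vectors)
    plt.figure(figsize=(6, 6))
    plt.scatter(embedding[:, 0], embedding[:, 1], c=y_val[:n], cmap="tab10", s=3)
    plt.colorbar(label="digit")
    plt.title(f"t-SNE projection of the {name}")
    plt.show()
```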
### Interpolation

Now that we know what **level of detail** the model is capable of extracting, we can _probe_ the structure of the latent space. To do that, we will compare how **interpolation** looks in the _image space_ versus the _latent space_.

#### Linear interpolation in image space

We start off by taking **two images from the dataset** and linearly interpolating between them. Effectively, this _blends_ the images in a kind of **ghostly** way.

Interpolation in **pixel space**

The reason for this messy transition is the **structure of the pixel space itself**. It’s simply not possible to go smoothly from one image to another in the image space. This is why blending the image of an _empty glass_ and the image of a _full glass_ will not give the image of a _half-full glass_.

#### Linear interpolation in latent space

Now, let’s do the same in the latent space. We take the same start and end images and **feed them to the encoder** to obtain their _latent space representations_. We then interpolate between the two latent vectors and feed the results to the **decoder**.

Interpolation in **latent space**

The result is much **more convincing**. Instead of a _fading overlay_ of the two digits, we clearly see the shape slowly _transform_ from one to the other. This shows how well the latent space **understands the structure** of the images.

**_Bonus:_** here are a few animations of the interpolation in both spaces.

Linear interpolation in **image space** (left) and **latent space** (right)

### More techniques & examples

#### Interpolation examples

On **richer** datasets, and with **better** models, we can get _incredible_ visuals.

3-way **latent space** interpolation for **faces**

Interpolation of [**3D shapes**](http://3dgan.csail.mit.edu)

#### Latent space arithmetic

We can also do **arithmetic** in the latent space. This means that **instead of interpolating, we can add or subtract** latent space representations. _For example, with faces: man with glasses - man without glasses + woman without glasses = woman with glasses._ This technique gives mind-blowing results.

Arithmetic on [**3D shapes**](http://3dgan.csail.mit.edu)

**_Note:_** I’ve put a function for that in the code, but it looks terrible on MNIST.

### Conclusions

In this post, we have seen several techniques to visualize the **learned** features _embedded_ in the latent space of an autoencoder neural network. These visualizations help us understand _what_ the network is learning. From there, we can exploit the latent space for **_clustering_**, **_compression_**, and many other applications.

> **_If you like Artificial Intelligence, make sure to_** [**_subscribe to the newsletter_**](http://eepurl.com/cATXvT) **_to receive updates on articles and much more!_**

You can play with the code here: [despoisj/LatentSpaceVisualization](https://github.com/despoisj/LatentSpaceVisualization), visualization techniques for the latent space of a convolutional autoencoder in Keras.

Thanks for reading this post, stay tuned for more!