In this series we will discuss a truly exciting natural language processing topic: using **deep learning techniques to summarize text**. [The code for this series is open source](https://github.com/theamrzaki/text_summurization_abstractive_methods) and is provided in Jupyter notebook format, so it runs on Google Colab without the need for a powerful GPU. All the data is open source as well, and you don't have to download it, as you can connect Google Colab with Google Drive and put your data directly onto Google Drive. Read [this blog](https://hackernoon.com/begin-your-deep-learning-project-for-free-free-gpu-processing-free-storage-free-easy-upload-b4dba18abebc) to learn more about using Google Colab with Google Drive.

To summarize text you have 2 main approaches (I truly like how it is explained in [this blog](http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization#two-types-of-summarization)):

1. The **extractive method** chooses specific words from the input to generate the output. This tends to work, but it won't output correctly structured sentences, as it just selects words from the input and copies them to the output without actually understanding them. Think of it as a highlighter.
2. The **abstractive method** builds a neural network to truly work out the relation between the input and the output, not merely copying words. This series goes through this method. Think of it as a pen.

This series is made for whoever feels excited to learn the power of building a deep network that is capable of:

* analyzing sequences of input
* understanding text
* outputting sequences of output in the form of summaries

Hence the name **seq2seq**: a sequence of inputs to a sequence of outputs, which is the main algorithm used here. This series goes into the details of how to:

1. Build your deep learning network online, without the need for a powerful computer.
2. Access your datasets online, without the need to download them to your computer.
3. Build TensorFlow networks to address the task.

A lot of research has been done on this task over the last couple of years. I am currently studying these new approaches, and in this series we will go through some of them. The series implements its code on Google Colab, so there is no need for a powerful computer; I am also working on converting the most recent research to Google Colab notebooks, so that researchers can try it out without needing powerful GPUs. All the data can be used without downloading it, as we use Google Drive with Google Colab (read [this blog to learn more about how you can work on the Google ecosystem for deep learning](https://hackernoon.com/begin-your-deep-learning-project-for-free-free-gpu-processing-free-storage-free-easy-upload-b4dba18abebc)).

All the code is available in [this github repo](https://github.com/theamrzaki/text_summurization_abstractive_methods), which contains modifications of some open-source implementations of text summarization. These mainly include:

1. Implementations using a **seq2seq encoder (bidirectional LSTM) and decoder (with attention)**. This is a crucial implementation, as it is the cornerstone of all recent research; I have [collected different approaches](https://github.com/theamrzaki/text_summurization_abstractive_methods/tree/master/Implementation%20A%20%28seq2seq%20with%20attention%20and%20feature%20rich%20representation%29) that implement this concept.
2. Another implementation that I found truly interesting combines creating new sentences for the summary with copying from the source input. This method is called the **pointer generator**; here is [my modification](https://github.com/theamrzaki/text_summurization_abstractive_methods/tree/master/Implementation%20B%20%28Pointer%20Generator%20seq2seq%20network%29) in a Google Colab of the [original implementation](https://github.com/abisee/pointer-generator).
3. Other implementations that I am still researching, which use [**reinforcement learning** with deep learning](https://github.com/yaserkl/RLSeq2Seq).

This series is built to be easily understandable for any newbie like myself, as you might be the one who introduces the newest architecture to be used as the new standard for text summarization, so let's begin! The following is a quick overview of the series; I hope you enjoy it.

### 1 — Building your deep network online

We will use Google Colab for our work, which enables us to use their free GPU time to build our networks ([this](https://hackernoon.com/begin-your-deep-learning-project-for-free-free-gpu-processing-free-storage-free-easy-upload-b4dba18abebc) blog gives you even more insights on the free ecosystem for your deep learning project). You have 2 main options to set up your Google Colab:

1. Build a new empty Colab notebook.
2. Build from GitHub; you can use [this repo](https://github.com/theamrzaki/text_summurization_abstractive_methods), which is a collection of the different implementations.

You can find the details of how to do this in [this blog](https://hackernoon.com/begin-your-deep-learning-project-for-free-free-gpu-processing-free-storage-free-easy-upload-b4dba18abebc). Having your code on Google Colab enables you to:

1. Connect to Google Drive (and put your datasets onto Google Drive).
2. Use free GPU time.

You can find how to connect to Google Drive in [this blog](https://hackernoon.com/begin-your-deep-learning-project-for-free-free-gpu-processing-free-storage-free-easy-upload-b4dba18abebc).

### 2 — Let's represent words

Since our task is an NLP task, we need a way to represent words. There are 2 main approaches that we will discuss:

1. Providing the network with a representation for each word. This is called **word embedding**, which simply means representing a certain word by an array of numbers. Multiple pretrained word embeddings are available online; one of them is **GloVe vectors** (see the sketch after this list).
2. Letting the network learn the representations by itself.
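To make the first option concrete, here is a minimal sketch of loading pretrained GloVe vectors. This is my own illustration, assuming the standard `glove.6B.100d.txt` file from the GloVe project has already been downloaded:

```python
import numpy as np

def load_glove(path):
    """Load pretrained GloVe vectors from their plain-text format:
    each line is a word followed by its vector components."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

# e.g. the 100-dimensional vectors trained on 6B tokens
glove = load_glove("glove.6B.100d.txt")
print(glove["king"].shape)  # (100,)
```

Each word in our vocabulary can then be looked up in this dictionary to build the embedding matrix of the network.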
### 3 — The datasets used

For this task we will use datasets in the form of news articles and their headlines. The most popular choice is the CNN/Daily Mail dataset: the news body is used as the input to our model, while the headline is used as the target summary output. These datasets can easily be found online, and we will use 2 main approaches for working with them:

1. Using the raw data itself and manually applying the preprocessing.
2. Using a [preprocessed version](https://github.com/abisee/cnn-dailymail) of the data, which is currently used in the most recent research.

### 4 — Models used

Here I will briefly talk about the models that will be covered, if GOD wills, in the coming series. I hope you enjoy it.

[**A. Cornerstone model**](https://github.com/theamrzaki/text_summurization_abstractive_methods/tree/master/Implementation%20A%20%28seq2seq%20with%20attention%20and%20feature%20rich%20representation%29)

To implement this task, researchers use a deep learning model that consists of 2 parts: an **encoder**, which understands the input and represents it internally, and a **decoder**, which receives that representation and generates the summary. The main deep learning building block used for both parts is the **LSTM** (long short-term memory), a modification of the RNN. In the encoder we mainly use a multi-layer bidirectional LSTM, while in the decoder we use an attention mechanism; more on this later.
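To give a feel for this architecture, the following is a minimal TensorFlow/Keras sketch of a bidirectional-LSTM encoder feeding a decoder with dot-product attention. It is my own illustration of the general idea, not the exact network from the linked implementations, and all layer sizes are arbitrary assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, embed_dim, units = 50000, 128, 256  # arbitrary sizes

# Encoder: embed the article tokens and read them in both directions
enc_inputs = layers.Input(shape=(None,), name="article_tokens")
enc_emb = layers.Embedding(vocab_size, embed_dim)(enc_inputs)
enc_states = layers.Bidirectional(
    layers.LSTM(units, return_sequences=True))(enc_emb)

# Decoder: embed the summary-so-far (teacher forcing during training)
dec_inputs = layers.Input(shape=(None,), name="summary_tokens")
dec_emb = layers.Embedding(vocab_size, embed_dim)(dec_inputs)
dec_states = layers.LSTM(2 * units, return_sequences=True)(dec_emb)

# Attention: each decoder step queries all encoder states for context
context = layers.Attention()([dec_states, enc_states])
combined = layers.Concatenate()([dec_states, context])

# Distribution over the vocabulary at every decoder step
probs = layers.Dense(vocab_size, activation="softmax")(combined)

model = tf.keras.Model([enc_inputs, dec_inputs], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

Note that during training the decoder input is the reference summary shifted by one step (teacher forcing), while at test time the model must feed on its own predictions; this is exactly the mismatch behind the exposure problem discussed in model C below.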
[**B. Pointer Generator**](https://github.com/theamrzaki/text_summurization_abstractive_methods/tree/master/Implementation%20B%20%28Pointer%20Generator%20seq2seq%20network%29)

Researchers found 2 main problems with the above implementation, as discussed in the ACL 2017 paper [_Get To The Point: Summarization with Pointer-Generator Networks_](https://arxiv.org/pdf/1704.04368.pdf) (the authors also have a truly [amazing blog](http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html) that you need to see):

1. **The inability of the network to copy facts** (like names and match scores): since the network doesn't copy words but generates them, it is sometimes incapable of reproducing facts correctly.
2. **Repetition of words.**

This research builds on these 2 main problems and tries to fix them by combining generating new words from the vocabulary with copying words directly from the source input. I have modified their repo to work inside a Jupyter notebook on Google Colab (a sketch of the core idea follows below):

* [my modification](https://github.com/theamrzaki/text_summurization_abstractive_methods/tree/master/Implementation%20B%20%28Pointer%20Generator%20seq2seq%20network%29)
* [their repo](https://github.com/abisee/pointer-generator)
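To make the pointer idea concrete, here is a minimal NumPy sketch of the paper's final distribution, which mixes a generation distribution over the vocabulary with an attention-based copy distribution over the source words. The numbers are made up purely for illustration:

```python
import numpy as np

def final_distribution(p_gen, p_vocab, attention, src_ids, extended_size):
    """Pointer-generator mixture from See et al. (2017):
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on the
    source positions where w occurs)."""
    p_final = np.zeros(extended_size)
    p_final[:len(p_vocab)] = p_gen * p_vocab        # generation part
    for pos, word_id in enumerate(src_ids):         # copying part
        p_final[word_id] += (1.0 - p_gen) * attention[pos]
    return p_final

# Toy example: a 5-word vocabulary plus 1 out-of-vocabulary source word
p_vocab = np.array([0.1, 0.4, 0.2, 0.2, 0.1])  # decoder's generation dist.
attention = np.array([0.7, 0.2, 0.1])          # attention over 3 source words
src_ids = [5, 1, 2]  # id 5 is OOV, so it is only reachable by copying
p = final_distribution(0.8, p_vocab, attention, src_ids, extended_size=6)
print(p, p.sum())  # a valid distribution that sums to 1.0
```

Because the copy part spreads probability over the actual source tokens, rare names and numbers can be reproduced even when they are outside the vocabulary.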
**C. Using reinforcement learning with deep learning**

I am still researching this work, but it is truly interesting research about combining the two fields. It actually uses the pointer generator (like implementation B) and uses the same preprocessed version of the data. This [is the research](https://arxiv.org/pdf/1805.09461.pdf), and it uses [this repo](https://github.com/yaserkl/RLSeq2Seq) for its code. The authors are trying to fix 2 main problems with the cornerstone implementation:

1. During training, the decoder uses (1) the output from the encoder, (2) the actual reference summary, and (3) its own current output for the next step. During testing there is no ground truth, as that is exactly what needs to be generated, so the decoder only uses (1) the output from the encoder and (2) its own current output for the next step. This mismatch causes the **exposure problem**.
2. Training the network relies on a loss metric (cross-entropy) that differs from the metric used in testing, which (as discussed below) is a non-differentiable measure such as BLEU or ROUGE.

I am currently working on implementing this approach in a Jupyter notebook, so if GOD wills, you will see more updates on this in the near future.

### 5 — Summary evaluation

To evaluate a summary, we use non-differentiable measures such as BLEU and ROUGE. They simply count the words that the generated summary has in common with the reference summary: the more overlap, the better. Most of the approaches above score between 32 and 38 ROUGE points (a small sketch of this overlap idea is included at the end of this post).

I hope you enjoyed this quick overview of the series. My main focus in these blogs is to present the topic of text summarization in an easy and practical way, providing you with actual code that runs on any computer without the need for a powerful GPU, and to connect you to the latest research on this topic. Please show your support by clapping for this blog, and don't forget to check out the [code of these blogs](https://github.com/theamrzaki/text_summurization_abstractive_methods).

In the coming blogs, if GOD wills, I will go through the details of building the cornerstone implementation that all the modern research is actually based upon. We will use the word embedding approach, and we will use the raw data and manually apply the preprocessing.

In later blogs, if GOD wills, we will go through modern approaches, like how to create a pointer generator model to fix the problems mentioned above, and how to use reinforcement learning with deep learning.
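Finally, as a concrete illustration of the word-overlap idea behind ROUGE mentioned in the evaluation section, here is a tiny toy version of ROUGE-1 recall. It is a simplified sketch of my own, not the official scorer (which also handles stemming, ROUGE-2, ROUGE-L, and more):

```python
from collections import Counter

def rouge_1_recall(candidate, reference):
    """Toy ROUGE-1 recall: the fraction of reference unigrams that
    also appear in the candidate summary (with clipped counts)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / max(sum(ref.values()), 1)

reference = "germany beat argentina 1 0 in the world cup final"
candidate = "germany won 1 0 against argentina in the final"
print(rouge_1_recall(candidate, reference))  # 0.7
```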