In this series we will discuss a truly exciting natural language processing topic: using **deep learning techniques to summarize text**. [The code for this series is open source](https://github.com/theamrzaki/text_summurization_abstractive_methods) and is provided in jupyter notebook format, so it can run on google colab without the need for a powerful gpu. In addition, all the data is open source, and you don't have to download it locally: you can connect google colab with google drive and put your data directly onto google drive. Read [this blog](https://hackernoon.com/begin-your-deep-learning-project-for-free-free-gpu-processing-free-storage-free-easy-upload-b4dba18abebc) to learn more about using google colab with google drive.

To summarize text you have 2 main approaches (i truly like how it is explained in [this blog](http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization#two-types-of-summarization)):

1. **Extractive method**, which chooses specific main words from the input to generate the output. This model tends to work, but won't output correctly structured sentences, as it just selects words from the input and copies them to the output without actually understanding them. Think of it as a highlighter.

![](https://hackernoon.com/hn-images/1*X1B2i1ctBh2KYO9m-a8paA.jpeg)

**2\. 
Abstractive method**, which builds a neural network to truly work out the relation between the input and the output, not merely copying words. This series will go through this method. Think of it like a pen.

![](https://hackernoon.com/hn-images/1*k1nQKUG8r34Esyd78aDSlA.jpeg)

This series is made for whoever feels excited to learn the power of building a deep network that is capable of:

* analyzing sequences of input
* understanding text
* outputting sequences of output in the form of summaries

hence the name seq2seq (sequence of inputs to sequence of outputs), which is the main algorithm used here.

This series will go into the details of how to:

1. build your deep learning network online without the need for a powerful computer
2. access your datasets online, without the need to download them to your computer
3. build tensorflow networks to address the task

Multiple research efforts have been published over the last couple of years. I am currently studying these new approaches, and in this series we will go through some of them.

This series implements its code using google colab, so there is no need for a powerful computer to try these ideas. I am currently working on converting the most recent research into google colab notebooks, so researchers can try it out without needing powerful gpus. All the data can also be used without downloading it, as we will use google drive with google colab; read [this blog to learn more about how you can work on the google ecosystem for deep learning](https://hackernoon.com/begin-your-deep-learning-project-for-free-free-gpu-processing-free-storage-free-easy-upload-b4dba18abebc).

All the code is available in [this github repo](https://github.com/theamrzaki/text_summurization_abstractive_methods), which contains modifications of some open source implementations of text summarization.

These research directions mainly 
include:

![](https://hackernoon.com/hn-images/1*1BwMlWYa5ewAt96Z-gJ8Yg.png)

1. implementations using a **seq2seq encoder (bidirectional lstm) decoder (with attention)**

This is a crucial implementation, as it is the cornerstone of any recent research. For now I have [collected different approaches](https://github.com/theamrzaki/text_summurization_abstractive_methods/tree/master/Implementation%20A%20%28seq2seq%20with%20attention%20and%20feature%20rich%20representation%29) that implement this concept.

2\. another implementation that I found truly interesting combines generating new sentences for the summary with copying from the source input. This method is called the **pointer generator**; here is [my modification](https://github.com/theamrzaki/text_summurization_abstractive_methods/tree/master/Implementation%20B%20%28Pointer%20Generator%20seq2seq%20network%29) in google colab of the [original implementation](https://github.com/abisee/pointer-generator).

3\. another implementation that I am currently still researching is the usage of [**reinforcement learning** with deep learning](https://github.com/yaserkl/RLSeq2Seq).

This series is built to be easily understandable for any newbie like myself, as you might be the one that introduces the newest architecture to be used as the newest standard for text summarization, so let's begin!!

The following is a quick overview of the series. I hope you enjoy it.

### 1 - Building your deep learning work online

![](https://hackernoon.com/hn-images/1*96ZwRlUiGylIpbsgz0m5Wg.jpeg)

We will use google colab for our work, which enables us to use their free gpu time to build our network ([this](https://hackernoon.com/begin-your-deep-learning-project-for-free-free-gpu-processing-free-storage-free-easy-upload-b4dba18abebc) blog gives you even more insights into the free ecosystem for your deep learning project).

You have 2 main options to build your google colab notebook:

1. 
Build a new empty colab notebook
2. build from github; you can use this repo, which is a collection of different implementations

You can find the details of how to do this in [this blog](https://hackernoon.com/begin-your-deep-learning-project-for-free-free-gpu-processing-free-storage-free-easy-upload-b4dba18abebc).

Having your code on google colab enables you to:

1. connect to google drive (put your datasets onto google drive)
2. use free gpu time

You can find how to connect to google drive in [this blog](https://hackernoon.com/begin-your-deep-learning-project-for-free-free-gpu-processing-free-storage-free-easy-upload-b4dba18abebc).

### 2 - Let's represent words

Since our task is an nlp task, we need a way to represent words. There are 2 main approaches for this, which we will discuss:

1. either providing the network with a representation for each word; this is called word embedding, which simply represents a certain word by an array of numbers. There are multiple already-trained word embeddings available online, one of them being **Glove vectors**
2. or letting the network learn the representations by itself

### 3 - The used datasets

For this task we will use a dataset in the form of news articles and their headlines; the most popular is the CNN/Daily Mail dataset. The news body is used as the input for our model, while the headline is used as the target summary output.

These datasets can be found easily online. We will use 2 main approaches for working with them:

1. using the raw data itself, and manually applying processing to it
2. 
using a [preprocessed version](https://github.com/abisee/cnn-dailymail) of the data, which is currently used in the most recent research

### 4 - Models used

Here I will briefly talk about the models that, if GOD wills, will be included in the coming series. I hope you enjoy them.

[**A. Corner stone model**](https://github.com/theamrzaki/text_summurization_abstractive_methods/tree/master/Implementation%20A%20%28seq2seq%20with%20attention%20and%20feature%20rich%20representation%29)

To implement this task, researchers use a deep learning model that consists of 2 parts: an encoder, which understands the input and represents it in an internal representation, and feeds it to the other part of the network, the decoder.

The main deep learning network used for these 2 parts is the LSTM, which stands for long short-term memory, a modification of the rnn.

In the encoder we mainly use a multi-layer bidirectional LSTM, while in the decoder we use an attention mechanism; more on this later.

[**B. Pointer generator**](https://github.com/theamrzaki/text_summurization_abstractive_methods/tree/master/Implementation%20B%20%28Pointer%20Generator%20seq2seq%20network%29)

But researchers found 2 main problems with the above implementation, as discussed in the ACL 2017 paper [_Get To The Point: Summarization with Pointer-Generator Networks_](https://arxiv.org/pdf/1704.04368.pdf) (they have a truly [amazing blog](http://www.abigailsee.com/2017/04/16/taming-rnns-for-better-summarization.html) you need to see), which are:

1. **the inability of the network to copy facts** (like names and match scores): since it doesn't copy words but generates them, it is sometimes incapable of reproducing facts correctly
2. 
**Repetition of words**

This research builds on these 2 main problems and tries to fix them. I have modified their repo to work inside a jupyter notebook on google colab:

* [my modification](https://github.com/theamrzaki/text_summurization_abstractive_methods/tree/master/Implementation%20B%20%28Pointer%20Generator%20seq2seq%20network%29)
* [their repo](https://github.com/abisee/pointer-generator)

**C. Using reinforcement learning with deep learning**

I am still researching this work, but it is truly interesting research about combining the two fields together. It actually uses the pointer generator in its work (like in implementation B), and uses the same preprocessed version of the data.

This [is the research](https://arxiv.org/pdf/1805.09461.pdf); it uses [this repo](https://github.com/yaserkl/RLSeq2Seq) for its code.

They are actually trying to fix 2 main problems with the corner stone implementation, which are:

1. during training, the decoder uses (1) the output from the encoder, (2) the actual summary, and (3) its own current output for the next action, while in testing it doesn't have a ground truth summary, as that is what actually needs to be generated, so it only uses (1) the output from the encoder and (2) its own current output for the next action. This mismatch causes an **exposure problem**
2. 
the training of the network relies on a metric for measuring the loss that is different from the metric used in testing: the metric used in training is the cross entropy loss, while the metrics used in testing (as discussed below) are non-differentiable measures such as BLEU and ROUGE

I am currently working on implementing this approach in a jupyter notebook, so if GOD wills it, you will see more updates concerning this in the near future.

### 5 - Summary evaluation

To evaluate a summary, we use non-differentiable measures such as BLEU and ROUGE. They simply try to find the common words between the generated summary and the reference summary: the more overlap, the better. Most of the above approaches score from 32 to 38 in ROUGE score.

I hope you enjoyed this quick overview of the series. My main focus in these blogs is to present the topic of text summarization in an easy and practical way, providing you with actual code that is runnable on any computer without the need for a powerful GPU, and to connect you to the latest research on this topic. Please show your support by clapping for this blog, and don't forget to check out the [code for these blogs](https://github.com/theamrzaki/text_summurization_abstractive_methods).

In the coming blogs, if GOD wills it, I will go through the details of building the corner stone implementation that all the modern research is actually based upon. We will use the word embedding approach, and we will use the raw data and manually apply preprocessing.

In later blogs, if GOD wills it, we will go through modern approaches, like how to create a pointer generator model to fix the problems mentioned above, and how to use reinforcement learning with deep learning.
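As a quick footnote to the word-representation section above: pre-trained Glove vectors come as plain text files, one word per line followed by its vector components. The sketch below shows how such a file can be parsed into a word-to-vector lookup. It is a minimal illustration only; the `load_glove` function and the tiny two-word stand-in file are my own, not part of the series code, and real files (e.g. `glove.6B.100d.txt`) are loaded the same way from disk.

```python
import io

# a tiny in-memory stand-in for a real GloVe file, so the sketch runs
# without downloading anything; real files have the same line format
fake_glove = io.StringIO(
    "the 0.1 0.2 0.3\n"
    "cat 0.4 0.5 0.6\n"
)

def load_glove(fh):
    """Parse GloVe-format lines into a dict: word -> list of floats."""
    embeddings = {}
    for line in fh:
        parts = line.strip().split()
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings

vectors = load_glove(fake_glove)
print(vectors["cat"])  # → [0.4, 0.5, 0.6]
```

In a real notebook, the resulting dict is typically used to fill an embedding matrix indexed by the model's vocabulary, with unknown words given a random or zero vector.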
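And as a footnote to the evaluation section: the sketch below computes a simplified ROUGE-1-style F1 score, i.e. unigram overlap between a candidate summary and a reference. This is only a toy illustration of the idea (the `rouge_1_f` function is my own; reported scores in the literature come from the official ROUGE toolkit, which also handles stemming and other ROUGE variants such as ROUGE-2 and ROUGE-L).

```python
from collections import Counter

def rouge_1_f(candidate, reference):
    """Simplified ROUGE-1 F1: unigram overlap between a candidate
    summary and a reference summary (no stemming, no stopwords)."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    # multiset intersection: count each word only as many times
    # as it appears in both the candidate and the reference
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

reference = "police kill the gunman"
candidate = "the gunman was killed by police"
print(round(rouge_1_f(candidate, reference), 2))  # → 0.6
```

Note how "killed" earns no credit against "kill" here because there is no stemming; that is one of the details the full toolkit handles.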