Text summarizer using deep learning made easy

Written by theamrzaki | Published 2018/12/23
Tech Story Tags: machine-learning | text-summarization | artificial-intelligence | seq2seq | ai

In this series we will discuss a truly exciting natural language processing topic: using deep learning techniques to summarize text. The code for this series is open source and provided in Jupyter notebook format, so it can run on Google Colab without the need for a powerful GPU. In addition, all the data is open source, and you don't have to download it locally: you can connect Google Colab to Google Drive and put your data directly onto Google Drive. Read this blog to learn more about using Google Colab with Google Drive.

To summarize text you have two main approaches (I truly like how it is explained in this blog):

  1. The extractive method, which chooses specific main words from the input to generate the output. This model tends to work, but it won't output correctly structured sentences, as it just selects words from the input and copies them to the output without actually understanding them. Think of it as a highlighter.

  2. The abstractive method, which builds a neural network to truly work out the relation between the input and the output, not merely copying words. This series goes through this method. Think of it as a pen.

This series is made for whoever feels excited to learn the power of building a deep network that is capable of:

  • analyzing sequences of input
  • understanding text
  • outputting sequences of output in the form of summaries

Hence the name seq2seq: sequence of inputs to sequence of outputs, which is the main algorithm used here.

This series goes into detail on how to:

  1. Build your deep learning network online without the need for a powerful computer
  2. Access your datasets online, without the need to download them to your computer
  3. Build TensorFlow networks to address the task

Much research has been done over the last couple of years. I am currently studying these new approaches, and in this series we will go through some of them.

This series implements its code using Google Colab, so there is no need for a powerful computer to try these ideas. I am currently working on converting the most recent research into Google Colab notebooks, so researchers can try it out without needing powerful GPUs. Also, all the data can be used without downloading it, as we will use Google Drive with Google Colab. Read this blog to learn more about how you can work in the Google ecosystem for deep learning.

All the code is available in this GitHub repo, which contains modifications of some open source implementations of text summarization.

This research mainly includes:

  1. Implementations using a seq2seq encoder (bidirectional LSTM) and decoder (with attention)

This is a crucial implementation, as it is the cornerstone of any recent research. For now, I have collected different approaches that implement this concept.

  2. Another implementation that I have found truly interesting combines generating new sentences for the summary with copying from the source input. This method is called the pointer generator. Here is my modification of the original implementation in a Google Colab notebook.

  3. Other implementations that I am still researching use reinforcement learning together with deep learning.

This series is built to be easily understandable by any newbie like myself, as you might be the one who introduces the newest architecture to become the new standard for text summarization, so let's begin!

The following is a quick overview of the series. I hope you enjoy it.

1 - Building your deep network online

We will use Google Colab for our work, which enables us to use their free GPU time to build our network (this blog will give you even more insights on the free ecosystem for your deep learning project).

You have two main options to build your Google Colab notebook:

  1. Build a new empty Colab notebook
  2. Build from GitHub; you can use this repo, which is a collection of different implementations

You can find the details on how to do this in this blog.

Having your code on Google Colab enables you to:

  1. connect to Google Drive (put your datasets onto Google Drive)
  2. use free GPU time

You can find how to connect to Google Drive in this blog.
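
As a quick illustration, this is the standard snippet for mounting Google Drive inside a Colab notebook (depending on your Colab version, the Drive root folder may be named `My Drive` instead of `MyDrive`):

```python
# Mount Google Drive inside a Colab notebook so datasets can live on Drive
# instead of being downloaded locally. Colab will ask you to authorise access.
import os
from google.colab import drive

drive.mount('/content/drive')
print(os.listdir('/content/drive/MyDrive'))  # list the files in your Drive root
```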

2 - Let's represent words

Since our task is an NLP task, we need a way to represent words. There are two main approaches, which we will discuss:

  1. Either providing the network with a representation for each word. This is called word embedding: simply representing a certain word by an array of numbers. There are multiple pre-trained word embeddings available online; one of them is GloVe vectors (see the sketch after this list).
  2. Or letting the network learn the representations by itself.
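
To make the word embedding idea concrete, here is a minimal sketch of loading pre-trained GloVe vectors into a dictionary. The file name `glove.6B.100d.txt` is an assumption (it is one of the files distributed on the GloVe project page):

```python
# A minimal sketch: parse a GloVe text file into a {word: vector} dictionary.
# Each line of the file is a word followed by the numbers of its embedding.
import numpy as np

def load_glove(path):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

glove = load_glove("glove.6B.100d.txt")  # assumed local file name
print(glove["news"][:5])  # the first few numbers representing the word "news"
```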

3 - The datasets used

For this task we use a dataset in the form of news articles and their headlines; the most popular choice is the CNN/Daily Mail dataset. The news body is used as the input to our model, while the headline is used as the target summary output.

These datasets can be found easily online. We will use two main approaches for working with them:

  1. Using the raw data itself and manually applying preprocessing (a sketch of parsing the raw files follows this list)
  2. Using a preprocessed version of the data, which is used in the most recent research
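
As a sketch of what the manual preprocessing involves: the raw CNN/Daily Mail data comes as `.story` files, where the article text is followed by summary sentences marked with `@highlight` lines. The parser below is a minimal illustration, not the official preprocessing script:

```python
# Split a raw CNN/Daily Mail ".story" file into the article body and its
# highlight (summary) sentences; "@highlight" lines mark the summaries.
def read_story(path):
    article, highlights = [], []
    next_is_highlight = False
    with open(path, encoding="utf-8") as f:
        for line in (raw.strip() for raw in f):
            if not line:
                continue
            if line == "@highlight":
                next_is_highlight = True
            elif next_is_highlight:
                highlights.append(line)
                next_is_highlight = False
            else:
                article.append(line)
    return " ".join(article), " . ".join(highlights)

body, summary = read_story("sample.story")  # assumed local file name
```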

4 - Models used

Here I will briefly talk about the models that will be covered, if GOD wills, in the coming series. I hope you enjoy it.

A. Cornerstone model

To implement this task, researchers use a deep learning model that consists of two parts: an encoder, which understands the input and captures it in an internal representation, then feeds it to the other part of the network, the decoder.

The main deep learning network used for these two parts is an LSTM (long short-term memory), which is a modification of the RNN.

In the encoder we mainly use a multi-layer bidirectional LSTM, while in the decoder we use an attention mechanism; more on this later.
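
To give a feel for this architecture, here is a minimal sketch of the encoder-decoder in TensorFlow/Keras. All the sizes are illustrative assumptions, and the attention mechanism is left out for brevity (the real models covered later add it on top of this skeleton):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 50000  # assumed vocabulary size
EMB_DIM = 128       # assumed word embedding dimension
HID_DIM = 256       # assumed LSTM hidden size

# Encoder: embed the article tokens and read them with a bidirectional LSTM.
enc_in = layers.Input(shape=(None,), name="article_tokens")
enc_emb = layers.Embedding(VOCAB_SIZE, EMB_DIM)(enc_in)
enc_seq, fwd_h, fwd_c, bwd_h, bwd_c = layers.Bidirectional(
    layers.LSTM(HID_DIM, return_sequences=True, return_state=True))(enc_emb)
# Join the forward and backward final states to initialise the decoder.
state_h = layers.Concatenate()([fwd_h, bwd_h])
state_c = layers.Concatenate()([fwd_c, bwd_c])

# Decoder: an LSTM conditioned on the encoder state; at every step it predicts
# a probability distribution over the vocabulary for the next summary word.
dec_in = layers.Input(shape=(None,), name="summary_tokens")
dec_emb = layers.Embedding(VOCAB_SIZE, EMB_DIM)(dec_in)
dec_seq, _, _ = layers.LSTM(
    2 * HID_DIM, return_sequences=True, return_state=True
)(dec_emb, initial_state=[state_h, state_c])
preds = layers.Dense(VOCAB_SIZE, activation="softmax")(dec_seq)

model = Model([enc_in, dec_in], preds)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```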

B. Pointer Generator

But researchers found two main problems with the above implementation, as discussed in the ACL 2017 paper Get To The Point: Summarization with Pointer-Generator Networks (they have a truly amazing blog you need to see):

  1. The inability of the network to reproduce facts (like names and match scores): since it doesn't copy words but generates them, it is sometimes incapable of generating facts correctly
  2. Repetition of words

This research addresses these two main problems and tries to fix them. I have modified their repo to work inside a Jupyter notebook on Google Colab.
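
The core idea of the pointer generator can be sketched in a few lines: at each decoding step the network computes a generation probability p_gen and blends the usual vocabulary distribution with the attention distribution over the source words, so out-of-vocabulary names and numbers can be copied. The sketch below follows the formula in the paper, with illustrative NumPy shapes:

```python
import numpy as np

def final_distribution(p_gen, vocab_dist, attn_dist, src_ids, extended_size):
    """Blend generating and copying into one word distribution.

    p_gen:         scalar in [0, 1], probability of generating vs copying
    vocab_dist:    (vocab_size,) softmax over the fixed vocabulary
    attn_dist:     (src_len,) attention weights over the source tokens
    src_ids:       (src_len,) ids of source tokens in the extended vocabulary
    extended_size: vocab_size plus the number of source-only (OOV) words
    """
    final = np.zeros(extended_size)
    final[: len(vocab_dist)] = p_gen * vocab_dist
    # Copy mechanism: scatter-add attention mass onto the source word ids,
    # so a rare name in the article can still receive probability.
    np.add.at(final, src_ids, (1.0 - p_gen) * attn_dist)
    return final
```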

C. Using reinforcement learning with deep learning

I am still studying this work, but it is truly interesting research about combining two fields together. It actually uses the pointer generator (as in implementation B) and uses the same preprocessed version of the data.

This is the research; it uses this repo for its code.

They are actually trying to fix two main problems with the cornerstone implementation:

  1. During training, the decoder uses (1) the output from the encoder, (2) the actual summary, and (3) its own current output for the next step. During testing, however, it doesn't have a ground-truth summary, as that is exactly what needs to be generated, so it only uses (1) the output from the encoder and (2) its own current output for the next step. This mismatch causes an exposure problem.
  2. Training the network relies on a loss metric that is different from the metric used in testing: the training metric is the cross-entropy loss, while the testing metrics (as discussed below) are non-differentiable measures such as BLEU and ROUGE (see the sketch after this list).
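
As a rough sketch of how the two fields are combined, this line of research mixes the usual cross-entropy loss with a self-critical policy-gradient term, whose reward is the ROUGE score of a sampled summary compared against the greedy baseline. The function below only illustrates the blending; the variable names and the gamma value are assumptions, and in practice everything is computed inside the training graph:

```python
def mixed_loss(ml_loss, sample_logprob, sample_reward, baseline_reward,
               gamma=0.998):
    """Blend cross-entropy training with a self-critical RL term.

    ml_loss:         ordinary cross-entropy loss on the ground-truth summary
    sample_logprob:  summed log-probability of a sampled summary
    sample_reward:   ROUGE score of the sampled summary
    baseline_reward: ROUGE score of the greedy (baseline) summary
    """
    # Minimising this pushes up the probability of samples that beat the
    # greedy baseline, directly optimising the non-differentiable metric.
    rl_loss = (baseline_reward - sample_reward) * sample_logprob
    return gamma * rl_loss + (1.0 - gamma) * ml_loss
```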

I am currently working on implementing this approach in a Jupyter notebook, so if GOD wills it, you will see more updates concerning this in the near future.

5 - Summary Evaluation

To evaluate a summary, we use non-differentiable measures such as BLEU and ROUGE. They simply count the common words between the generated summary and the reference summary: the more overlap, the better. Most of the above approaches score ROUGE scores between 32 and 38.
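
As a toy illustration of the word-overlap idea behind ROUGE (real evaluations use the official ROUGE scripts or a package such as pyrouge, not this), here is a minimal ROUGE-1 recall:

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    """Fraction of reference words that also appear in the candidate summary."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / max(sum(ref.values()), 1)

print(rouge1_recall("the cat sat on the mat", "a cat sat on a mat"))  # ~0.67
```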

I hope you enjoyed this quick overview of the series. My main focus in these blogs is to present the topic of text summarization in an easy and practical way, providing you with actual code that is runnable on any computer, without the need for a powerful GPU, and connecting you to the latest research on this topic. Please show your support by clapping for this blog, and don't forget to check out the code for these blogs.

In the coming blogs, if GOD wills it, I will go through the details of building the cornerstone implementation that all the modern research is actually based upon. We will use the word embedding approach, and we will use the raw data and manually apply preprocessing.

In later blogs, if GOD wills it, we will go through modern approaches: how to build a pointer generator model to fix the problems mentioned above, and how to use reinforcement learning with deep learning.


Published by HackerNoon on 2018/12/23