The Art of Transformers: How AI Intuitively Summarizes Business Papers Using NLP

TL;DR: “I don’t want the full paper, just give me a concise summary of it.” Sound familiar? Who hasn't found themselves in this situation at least once? Well, I decided to do something about it. Having written a lengthy paper on how to measure and manage the digitization process, I wanted to try summarizing it with state-of-the-art Natural Language Processing (NLP) models: transformers. The result is interesting, to say the least.

People who know me know how passionate I am about NLP and how fascinated I am by the latest innovations in the field, particularly transformers. For those unfamiliar with the subject, transformers are deep learning models introduced in 2017 and used primarily in Natural Language Processing.

In particular, they can perform different tasks without a task-specific training dataset. They come pre-trained, and using them relies on transfer learning or even on a "zero-touch" approach (more commonly called zero-shot).

(A zero-touch approach means using the model without any additional task-specific training. This is a big step forward, because it avoids having to build large datasets to train the model.)

A big problem, not just with transformers but with deep learning models in general, is what I call "the business outcome". When a model is built and trained on academic datasets for purely academic purposes, things generally work well. When you bring the same model and methodology into the business world, things go a little differently. Applying them there often brings much bigger (and usually unpleasant) surprises than in the academic world.

I wanted to run a little experiment to understand how one of the new technologies in the NLP world can be applied, successfully or not, to business. I deliberately chose a very recent one. I wanted to see whether it could be put to work on a business problem quickly and easily, without the tremendous effort such projects usually require.

Last year, with my friend Gian Paolo Franzoni, I wrote a paper on digitization. In particular, it covers how to measure and manage the efforts of a digitization program so that it succeeds. It is quite a long paper, about 60 pages: interesting, but written in an academic style. You can download it for free here, or feel free to message me to get the academic version.

Today, I'm trying something different: instead of summarizing the paper myself, I'll let an AI model do it, using an out-of-the-box transformer. Summarization is one of the many tasks transformers can perform.

Before looking at how well it works, let's first understand what text summarization is. Here is a succinct Wikipedia definition to get us started:

"Automatic summarization is the process of shortening a set of data computationally, to create a subset (a summary) that represents the most important or relevant information within the original content."

There are broadly two different approaches that are used for text summarization:

Extractive Summarization

The name gives away what this approach does. The model identifies the most important sentences or phrases in the original text and extracts only those. The extracted sentences form the summary.
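To make the extractive idea concrete, here is a minimal sketch in Python. It is not the approach used in the experiment, which relies on an abstractive transformer. It is a simple frequency heuristic:

- Each sentence is scored by the average frequency of its non-stop-words.
- The top-scoring sentences are kept in their original order.

The scoring rule, the tiny stop-word list and the function names are all illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; real systems use much larger lists).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "are",
              "it", "that", "this", "for", "on", "with", "as", "by", "be"}


def split_sentences(text: str) -> list[str]:
    """Naively split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text: str) -> list[str]:
    """Lowercase the text and keep only words that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def extractive_summary(text: str, num_sentences: int = 2) -> list[str]:
    """Return the num_sentences highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average (not total) frequency, so long sentences are not favoured by default.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Keep the best sentences, then restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


doc = ("Digitization changes the customer experience. "
       "The weather was pleasant yesterday. "
       "Measuring digitization helps companies manage the customer experience.")
# Keeps the first and third sentences; the off-topic second sentence is dropped.
print(extractive_summary(doc, num_sentences=2))
```

Every sentence in the output appears verbatim in the input, which is exactly what distinguishes extractive from abstractive summarization.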

Abstractive Summarization

In this approach, the model generates new sentences from the original text. This is in contrast to the extractive approach we saw earlier.

The sentences generated through abstractive summarization might not be present in the original text.

Using the great potential offered by transformers, and in particular by the Hugging Face library (you folks at Hugging Face rock!), I experimented with different models to produce an abstractive summary of the long paper (about 6,000 words), condensing it as much as possible. It will be interesting to compare the model's syntactic choices with the original structure of the text.

I wrote a few lines of Python code to create a summary of our paper and immediately ran into a big problem. Until a few weeks ago, transformers could only process fairly short texts, because models accepted inputs of at most 512, 1,024 or 2,048 tokens, depending on the model. A token is roughly a word or a piece of a word, so a document usually contains more tokens than words.
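A common workaround for this limit is to split the document into chunks that fit the model, summarize each chunk, and stitch the partial summaries together. The sketch below is not the code used in this experiment. Several choices in it are assumptions:

- It counts whitespace-separated words as a rough proxy for tokens.
- The 700-word default is meant to stay safely under a 1,024-token limit.
- The function names are hypothetical.

The summarizer is passed in as a function, so the chunking logic can be tested without downloading a model.

```python
from typing import Callable


def chunk_words(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most max_words whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def summarize_long_text(text: str, summarize: Callable[[str], str], max_words: int = 700) -> str:
    """Summarize each chunk independently and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in chunk_words(text, max_words))


def first_three_words(chunk: str) -> str:
    """Stand-in for a real model: 'summarizes' a chunk by keeping its first three words."""
    return " ".join(chunk.split()[:3])


paper = " ".join(f"word{i}" for i in range(6000))  # a dummy 6,000-word document
print(len(chunk_words(paper, 700)))  # 9: eight chunks of 700 words and one of 400
summary = summarize_long_text(paper, first_three_words)
```

With the `transformers` library installed, `summarize` could wrap a real model. For example, `summarizer = pipeline("summarization")` followed by `lambda t: summarizer(t)[0]["summary_text"]`. The obvious drawback of chunking is that each piece is summarized without seeing the rest of the document. Ideas that span several chunks can therefore get lost, which is what long-input models aim to avoid.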

Then I came across an important paper that introduced the Longformer. This transformer goes beyond that token limit by replacing full self-attention with a combination of local (sliding-window) and global attention. As a result, its cost grows linearly rather than quadratically with the length of the text. That's where I started. What you see after this line, I didn't write; the model took care of it. Anyone interested can get a quick idea of the code here. You can also find a full list of relevant articles at the end of this post.
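Why windowed attention scales better can be illustrated by simply counting (query, key) pairs:

- In full self-attention, every token attends to every other token, so the work grows with n².
- With a sliding window, each token attends only to neighbours at most w positions away, so the work grows roughly with n·(2w + 1).

This toy sketch only counts pairs. It omits the global-attention tokens the Longformer also uses, and it is not the model's implementation. The function names and the values of n and w are illustrative assumptions, not the model's actual configuration.

```python
def sliding_window_mask(n: int, w: int) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j, i.e. |i - j| <= w."""
    return [[abs(i - j) <= w for j in range(n)] for i in range(n)]


def full_attention_pairs(n: int) -> int:
    """Full self-attention: every token attends to every token, n * n pairs."""
    return n * n


def sliding_window_pairs(n: int, w: int) -> int:
    """Count attended pairs in the sliding-window pattern without building the mask."""
    # Token i sees positions max(0, i - w) .. min(n - 1, i + w), clipped at the edges.
    return sum(min(n - 1, i + w) - max(0, i - w) + 1 for i in range(n))


n, w = 4096, 256  # illustrative sequence length and one-sided window size
print(full_attention_pairs(n))     # 16777216
print(sliding_window_pairs(n, w))  # 2035456
```

Doubling n roughly doubles the windowed count (apart from small edge effects) but quadruples the full count. That is why a windowed model can afford inputs several times longer than a standard transformer.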

BELOW THIS LINE IS THE SUMMARY PRODUCED BY THE MODEL:

The impact of digitalization on the customer experience has become a major issue for companies, as it represents a continuum between the physical and digital contexts.

Companies must therefore understand its components and typologies in order to create unique, engaging, and effective customer experiences.

The Phygital customer experience represents "the combination of the physical and digital worlds within the point of sale in order to offer the customer a richer and continuous experience through the different channels."

Phygitalization thus refers to approaches that aim to combine the best of the physical and digital worlds to deliver fluid sensory and emotional experiences across the different channels, or to physically manifest a digital experience.

Building on a concept developed by Bain & Co, we developed our own model for the digital management of the customer experience. This model considers the customer journey not as a monolithic entity but as a series of discrete points of contact between customers and businesses.

We call each of these points of contact an episode.

An episode is defined as a dual entity (customer side and company side ) with each side having a binary definition.

We have developed a simple mathematical model that describes the state of digital transformation with respect to the impact of the INDEPENDENT episodes (ie) and the AUTOMATED EPISODES (ae) on the total sum of all the episodes considered by the digital transformation process.

Our main objective has been to develop a measure that takes into account these three entities and how they relate to each other, with the clear awareness that the result must be expressed as a relevant KPI supporting the strategy and tactics that guide the broader digitization effort.

We recommend the proposed model as a system for measuring the impact of digitization on the customer experience.

*** END ***

I have to say that the model really summed up the paper in a nutshell, perhaps even in an extreme and radical way. It is interesting to note, though, how it identified and emphasized the key points and gave proper meaning and context to the synthesis. The model "wrote" the summary in its own style: certainly not mine, and probably not that of Gian Paolo, my co-author.

The first paragraph introduces the paper perfectly: we're talking about digitization and its impact on Customer Experience. In particular, it highlights the problem of offering seamless, frictionless experiences across the analog and digital worlds. In this regard, the model gives its definition of the phygital customer experience. What can I say? In a few seconds it gives the reader what Gian Paolo and I needed several pages to convey... frustrating for us :-)

The model then identified the concept of episodes and correctly credits Bain & Co., who introduced this concept into customer experience management.

It then defines the dual nature of the episode: each episode has a customer side and a company side. Perfect! Gian Paolo will be happy with this point, because he was the one who insisted on this approach, and rightly so.

The definition of INDEPENDENT and AUTOMATED EPISODES as the main objective of managing the digitization process is also perfect: just what we recommend in our paper. The model then summarizes the mathematical concept somewhat abruptly. Unfortunately, there is no sign of the formulas proposed in our paper, but maybe neural networks don't like mathematics...

What can I say? The outcome is not perfect, but it is nevertheless very interesting. In a nutshell, the model condensed about six thousand words into a handful of short paragraphs. Clearly, it cannot replace reading the original paper. What impresses me, however, is the model's ability to identify the important entities and summarize them in very intuitive language. I think that, in this case, the experiment produced a positive "business result".

If you are interested in transformers and NLP in general, comment on this article or contact me. I'm always happy to exchange new views and ideas.

References:

Previously published at https://www.linkedin.com/pulse/wasnt-me-ai-writing-digital-transformation-customer-federico-cesconi/