Cleuton Sampaio, October 2019

Every company has a customer service channel, right? It can be an email address, or a form in which the customer registers his message. It is very difficult to analyze these messages quantitatively: it is a tedious job that requires many hours of reading and interpretation. One way to make this work easier is to classify the highest-priority messages first, which can be done in two ways:

- Presence of certain terms in the message body;
- Sentiment analysis of the texts and prioritization.

This program reads the files contained in a folder and classifies the messages according to the sentiment of the text, using NLP techniques in the R language.

Sentiment analysis is one of the segments of NLP (Natural Language Processing), and it has great appeal. We can interpret the sentiment of the texts and, consequently, the sentiment of those who wrote them. There are several uses for this: in CRM, for example.

This demo uses the AFINN lexicon file to calculate the weight of a text, based on a [-5, +5] interval.

## Installation

It is a program made in R, therefore you need to install the R interpreter. If you want, you can install RStudio as well. I am using Linux (Ubuntu), so I will provide instructions for this operating system, but it can be installed and run on any platform.

In this example, I use some third-party libraries that need to be installed:

- readr: https://cran.r-project.org/web/packages/readr/index.html
- dplyr: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html
- stringr: https://cran.r-project.org/web/packages/stringr/vignettes/stringr.html
- tidytext: https://cran.r-project.org/web/packages/tidytext/vignettes/tidytext.html

Setting up an R environment is not very simple. I recommend that you install Anaconda and create an R environment, installing the packages on it.
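Before setting things up, it helps to see the core idea: look each word up in the AFINN lexicon and sum the weights. Here is a minimal sketch of that in R with dplyr and tidytext. This is an illustration under my own assumptions, not the repo's actual script, and the tiny inline lexicon is a hand-made stand-in for the real AFINN file:

```r
library(dplyr)
library(tidytext)

# Hand-made stand-in for the AFINN file: each word maps to a weight
# in the [-5, +5] range (not the full lexicon!).
lexicon <- tibble(
  word  = c("good", "great", "love", "nice", "garbage", "bad"),
  value = c(3, 3, 3, 3, -3, -3)
)

# Score a text: tokenize into lowercase words, keep only the words
# found in the lexicon, and sum their weights.
score_text <- function(text) {
  tibble(text = text) %>%
    unnest_tokens(word, text) %>%
    inner_join(lexicon, by = "word") %>%
    summarise(score = sum(value)) %>%
    pull(score)
}

score_text("It's a good product, and with some improvements can be a great product!")
# good (+3) + great (+3) = 6
```

Words missing from the lexicon simply contribute nothing to the sum, so the score depends only on the lexicon hits.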
To begin, we have to install a few things in the operating system:

```shell
sudo apt-get install -y r-cran-isocodes
sudo apt install libxml2-dev
conda install -c r r-xml2
sudo apt-get install libgdal-dev
```

Then, we can start R (or RStudio) and install the remaining packages right there:

```r
install.packages(c("dplyr", "readr", "stringr", "tidyRSS", "tidytext", "wordcloud"))
install.packages(c("mnormt", "psych", "SnowballC", "hunspell", "broom", "tokenizers", "janeaustenr"))
install.packages('stopwords')
```

Finally, we have to create two environment variables:

- SENTIMENT_HOME: the folder where the script and the lexicon file are;
- SENTIMENT_TEXT_ENGLISH: the folder where the files you want to analyze are located.

In this package, I've included some text samples to be analyzed, inside the english_samples folder.

Just clone this repo, then run the R script:

```shell
Rscript sentiment_english.R
```

## Results

This is the result using the samples:

```
Joining, by = "word"
[1] "The product is regular"
[1] 3
Joining, by = "word"
[1] "This product is garbage and should not be sold!"
[1] 1
Joining, by = "word"
[1] "You know... the product is not so bad..."
[1] 1
Joining, by = "word"
[1] "It's a good product, and with some improvements can be a great product!"
[1] 5
Joining, by = "word"
[1] "It's a good product."
[1] 5
Joining, by = "word"
[1] "It's easy to use and very nice. I love it!"
[1] 5
```

And this is the graph (shown as an image in the original post).

## Why naive?

Good question! Because this algorithm is less than perfect and can give you false negatives and positives! People often use sarcasm, double negatives, and other indirect expressions to refer to a product. For example, the file sample3 can mislead you:

> You know... the product is not so bad...

This algorithm removes English stop words using a stop words file, then it uses the lexicon to assign a weight to each word, summing the result. Negative words can cancel positive ones. It is a naive algorithm.
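The sample3 pitfall is easy to demonstrate. In a word-level scheme, the negation never reaches the negative word. Here is a sketch using a one-entry stand-in lexicon of my own ("bad" weighted -3, as in AFINN), not the repo's actual code:

```r
library(dplyr)
library(tidytext)

# One-entry stand-in lexicon: "bad" is weighted -3, as in AFINN.
lexicon <- tibble(word = "bad", value = -3)

result <- tibble(text = "You know... the product is not so bad...") %>%
  unnest_tokens(word, text) %>%        # "not", "so" and "bad" become separate tokens
  inner_join(lexicon, by = "word") %>% # only "bad" survives the join
  summarise(score = sum(value))

result$score
# Only "bad" (-3) matches; the "not" is invisible to the scoring,
# so a mildly positive sentence gets a negative score.
```

Handling this properly requires looking at word context (bigrams that flip the sign after a negation, or sequence models), which is exactly what the naive approach skips.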
Although naive, this algorithm can still give good results if used as a first filter, decreasing the number of messages that have to be analyzed in detail. A better solution would consider the influence of one word over another, like the deep learning solutions (LSTM, for instance). In fact, there are some good tutorials, like this one: https://towardsdatascience.com/sentiment-analysis-using-lstm-step-by-step-50d074f09948. But they all use complicated frameworks and require large CPUs or even GPUs to process. This is a simple solution that runs with low resources, and it is very fast.

Previously published at https://github.com/cleuton/datascience/blob/master/nlp/sentiment/english.md