Model deployment is one of the most important skills you should have if you're going to work with NLP models. Model deployment is the process of integrating your model into an existing production environment. The model will receive input and predict an output for decision-making for a specific use case.

“Only when a model is fully integrated with the business systems, we can extract real value from its predictions.” - Christopher Samiullah

There are different ways you can deploy your model into production: you can use Flask, Django, Bottle, etc. But in today's article, you will learn how to build and deploy your NLP model with FastAPI.

In this series of articles, you will learn:

- How to build an NLP model that classifies IMDB movie reviews into different sentiments.
- What FastAPI is and how to install it.
- How to deploy your model with FastAPI.
- How to use your deployed NLP model in any Python application.

In part 1, we will focus on building an NLP model that can classify movie reviews into different sentiments. So let’s get started!

How to Build the NLP Model

First, we need to build our NLP model. We are going to use the IMDB Movie dataset to build a simple model that can classify whether a movie review is positive or negative. Here are the steps you should follow to do that.

Import Important packages

First, we import the Python packages we need to load the data, clean the data, create a machine learning model (classifier), and save the model for deployment.

    # import important modules
    import numpy as np
    import pandas as pd

    # sklearn modules
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.naive_bayes import MultinomialNB  # classifier
    from sklearn.metrics import (
        accuracy_score,
        classification_report,
    )
    from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

    # text preprocessing modules
    from string import punctuation
    from nltk.tokenize import word_tokenize
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    import re  # regular expression

    # Download dependencies
    for dependency in (
        "brown",
        "names",
        "wordnet",
        "averaged_perceptron_tagger",
        "universal_tagset",
        "stopwords",  # required by stopwords.words("english") later on
    ):
        nltk.download(dependency)

    import warnings
    warnings.filterwarnings("ignore")

    # seeding
    np.random.seed(123)

Load the dataset from the data folder.

    data = pd.read_csv("../data/labeledTrainData.tsv", sep='\t')  # load data

Show a sample of the dataset.

    data.head()  # show top five rows of data

Our dataset has 3 columns.

- Id - the id of the review
- Sentiment - either positive (1) or negative (0)
- Review - the comment about the movie

Check the shape of the dataset.

    data.shape  # check the shape of the data

    (25000, 3)

The dataset has 25,000 reviews.

We need to check if the dataset has any missing values.

    data.isnull().sum()  # check missing values in data

    id           0
    sentiment    0
    review       0
    dtype: int64

The output shows that our dataset does not have any missing values.

How to Evaluate Class Distribution

We can use the value_counts() method from the pandas package to evaluate the class distribution of our dataset.

    data.sentiment.value_counts()  # evaluate sentiment distribution

    1    12500
    0    12500
    Name: sentiment, dtype: int64

In this dataset, we have an equal number of positive and negative reviews.

How to Process the Data

After analyzing the dataset, the next step is to preprocess it into the right format before creating our machine learning model.

The reviews in this dataset contain a lot of unnecessary words and characters that we don't need when creating a machine learning model.

We will clean the reviews by removing stopwords, numbers, and punctuation. Then we will convert each word into its base form by using the lemmatization process in the NLTK package.
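To make those cleaning steps concrete, here is a minimal sketch of number, punctuation, and stop-word removal on a single sentence. It uses only the standard library. The tiny stop-word set (TOY_STOP_WORDS) and the helper name toy_clean are made up for illustration and are not part of the article's code; NLTK's English stop-word list is much longer, and lemmatization is left out here.

```python
import re
from string import punctuation

# Hypothetical, hand-picked stop words for illustration only;
# NLTK's English stop-word list is much longer.
TOY_STOP_WORDS = {"the", "a", "was", "and", "i", "it", "of"}


def toy_clean(text):
    """Strip numbers, punctuation, and toy stop words from a sentence."""
    # Replace integers and decimals with a space
    text = re.sub(r"\d+(?:\.\d+)?", " ", text)
    # Drop every punctuation character
    text = "".join(ch for ch in text if ch not in punctuation)
    # Keep only words that are not in the toy stop-word set
    words = [w for w in text.split() if w.lower() not in TOY_STOP_WORDS]
    return " ".join(words)


print(toy_clean("The movie was great, and I watched it 2 times!"))
# prints: movie great watched times
```

Only the words that carry meaning for the classifier are left, which is the goal of the preprocessing step.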
The text_cleaning() function will handle all necessary steps to clean our dataset.

    stop_words = stopwords.words('english')


    def text_cleaning(text, remove_stop_words=True, lemmatize_words=True):
        # Clean the text, with the option to remove stop words and to lemmatize words

        # Clean the text
        text = re.sub(r"[^A-Za-z0-9]", " ", text)
        text = re.sub(r"\'s", " ", text)
        text = re.sub(r'http\S+', ' link ', text)
        text = re.sub(r'\b\d+(?:\.\d+)?\s+', '', text)  # remove numbers

        # Remove punctuation from text
        text = ''.join([c for c in text if c not in punctuation])

        # Optionally, remove stop words
        if remove_stop_words:
            text = text.split()
            text = [w for w in text if w not in stop_words]
            text = " ".join(text)

        # Optionally, reduce words to their base form (lemma)
        if lemmatize_words:
            text = text.split()
            lemmatizer = WordNetLemmatizer()
            lemmatized_words = [lemmatizer.lemmatize(word) for word in text]
            text = " ".join(lemmatized_words)

        # Return the cleaned text as a single string
        return text

Now we can clean our dataset by using the text_cleaning() function.

    data["cleaned_review"] = data["review"].apply(text_cleaning)  # clean the review

Then split the data into feature and target variables.

    X = data["cleaned_review"]
    y = data.sentiment.values  # split features and target from data

Our feature for training is the cleaned_review variable and the target is the sentiment variable.

We then split our dataset into train and validation data. The validation set is 15% of the entire dataset.

    X_train, X_valid, y_train, y_valid = train_test_split(
        X,
        y,
        test_size=0.15,
        random_state=42,
        shuffle=True,
        stratify=y,
    )  # split data into train and validate

How to Actually Create Our NLP Model

We will train the Multinomial Naive Bayes algorithm to classify whether a review is positive or negative. This is one of the most common algorithms used for text classification.

But before training the model, we need to transform our cleaned reviews into numerical values so that the model can understand the data. In this case, we will use the TfidfVectorizer class from scikit-learn. TfidfVectorizer converts a collection of text documents into a matrix of TF-IDF features.

To apply this series of steps (pre-processing and training), we will use the Pipeline class from scikit-learn, which sequentially applies a list of transforms and a final estimator.

    sentiment_classifier = Pipeline(steps=[
        ('pre_processing', TfidfVectorizer(lowercase=False)),
        ('naive_bayes', MultinomialNB()),
    ])  # create a classifier in a pipeline

Then we train our classifier.

    sentiment_classifier.fit(X_train, y_train)  # train the sentiment classifier

We then create predictions for the validation set.

    y_preds = sentiment_classifier.predict(X_valid)  # test model performance on valid data

The model's performance will be evaluated by using the accuracy_score evaluation metric. We use accuracy_score because the two classes in the sentiment variable are balanced.

    accuracy_score(y_valid, y_preds)

    0.8629333333333333

The accuracy of our model is around 86.29%, which is good performance for such a simple model.

Save Model Pipeline

The model pipeline will be saved in the models directory by using the joblib Python package.

    import joblib

    joblib.dump(sentiment_classifier, '../models/sentiment_model_pipeline.pkl')  # save model

Wrapping Up

Congratulations 👏👏, you have made it to the end of part 1. I hope you have learned something new about how to build an NLP model. In part 2 we will learn how to deploy our NLP model with FastAPI and run it in Python applications.

If you learned something new or enjoyed reading this article, please share it so that others can see it. Until then, see you in part 2!

You can also find me on Twitter @Davis_McDavid.
