Sentiment analysis is a Natural Language Processing (NLP) technique used to determine if data is positive, negative, or neutral.

Sentiment analysis is fundamental, as it helps to understand the emotional tones within language. This, in turn, helps to automatically sort the opinions behind reviews, social media discussions, etc., allowing you to make faster, more accurate decisions.

Although sentiment analysis has become extremely popular in recent times, work on it has been progressing since the early 2000s. Traditional machine learning methods such as Naive Bayes, Logistic Regression, and Support Vector Machines (SVMs) are widely used for large-scale sentiment analysis because they scale well. Deep learning (DL) techniques have now been shown to provide better accuracy for various NLP tasks, including sentiment analysis; however, they tend to be slower and more expensive to train and use.

In this story, I want to offer a little-known alternative that combines speed and quality. To assess the proposed method and draw conclusions about it, I need a baseline model. I chose the time-tested and popular BERT.

Getting the Data

Social media is a source that produces a massive amount of data on an unprecedented scale. The dataset I will be using for this story is Coronavirus tweets NLP.

As I can see, there is not so much data for the model, and at first glance, it seems that one cannot do without a pre-trained model.

Due to the small number of samples for training, I reduce the number of classes to 3 by combining them.

Baseline BERT Model

Let's use TensorFlow Hub. TensorFlow Hub is a repository of trained machine learning models ready for fine-tuning and deployable anywhere. You can use trained models like BERT and Faster R-CNN with just a few lines of code.

!pip install tensorflow_hub
!pip install tensorflow_text

Smaller BERT model. small_bert/bert_en_uncased_L-4_H-512_A-8: This is one of the smaller BERT models referenced in Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. The smaller BERT models are intended for environments with restricted computational resources. They can be fine-tuned in the same manner as the original BERT models. However, they are most effective in the context of knowledge distillation, where a larger and more accurate teacher produces the fine-tuning labels.

Text preprocessing for BERT. bert_en_uncased_preprocess: This model uses a vocabulary for English extracted from Wikipedia and BooksCorpus. Text inputs have been normalized the "uncased" way, meaning that the text has been lower-cased before tokenization into word pieces, and any accent markers have been stripped.

import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text  # registers the ops used by the preprocessing model
from tensorflow.keras.optimizers import Adam

tfhub_handle_encoder = \
    "https://tfhub.dev/tensorflow/small_bert/bert_en_uncased_L-4_H-512_A-8/1"
tfhub_handle_preprocess = \
    "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"

I will not tune the parameters or optimize the model, so as not to complicate the code. After all, this is the baseline model, not SOTA.

def build_classifier_model():
    # Raw tweet text goes in as a scalar string tensor.
    text_input = tf.keras.layers.Input(
        shape=(), dtype=tf.string, name='text')

    # Tokenize the text and pack it into the inputs the encoder expects.
    preprocessing_layer = hub.KerasLayer(
        tfhub_handle_preprocess, name='preprocessing')

    encoder_inputs = preprocessing_layer(text_input)
    encoder = hub.KerasLayer(
        tfhub_handle_encoder, trainable=True, name='BERT_encoder')

    # Classify from the pooled sentence representation.
    outputs = encoder(encoder_inputs)
    net = outputs['pooled_output']
    net = tf.keras.layers.Dropout(0.1)(net)
    net = tf.keras.layers.Dense(3, activation='softmax', name='classifier')(net)
    model = tf.keras.Model(text_input, net)

    # Note: the classifier layer above already applies softmax,
    # so from_logits=False would be the matching setting here.
    loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
    metric = tf.metrics.CategoricalAccuracy('accuracy')
    optimizer = Adam(
        learning_rate=5e-05, epsilon=1e-08, decay=0.01, clipnorm=1.0)
    model.compile(
        optimizer=optimizer, loss=loss, metrics=metric)
    model.summary()
    return model

classifier_model = build_classifier_model()

I have created a model with just under 30M parameters.

I allocated 30 percent of the data for model validation.

from sklearn.model_selection import train_test_split

train, valid = train_test_split(
    df_train,
    train_size=0.7,
    random_state=0,
    stratify=df_train['Sentiment'])
y_train, X_train = \
    train['Sentiment'], train.drop(['Sentiment'], axis=1)
y_valid, X_valid = \
    valid['Sentiment'], valid.drop(['Sentiment'], axis=1)
y_train_c = tf.keras.utils.to_categorical(
    y_train.astype('category').cat.codes.values, num_classes=3)
y_valid_c = tf.keras.utils.to_categorical(
    y_valid.astype('category').cat.codes.values, num_classes=3)

The number of epochs was chosen intuitively and did not require justification :)

history = classifier_model.fit(
    x=X_train['Tweet'].values,
    y=y_train_c,
    validation_data=(X_valid['Tweet'].values, y_valid_c),
    epochs=5)

BERT Accuracy: 0.833859920501709

[Figure: Confusion Matrix]

[Figure: Classification Report]

Here I have the baseline model. Obviously, I can improve this model further. But let's leave this task as your homework.

CatBoost Model

CatBoost is a high-performance, open-source library for gradient boosting on decision trees. From release 0.19.1, it supports text features for classification on GPU out-of-the-box.

The main advantage is that CatBoost can handle categorical features and text features in your data without additional preprocessing. For those who value inference speed: CatBoost predictions are 20 to 40 times faster than other open-source gradient boosting libraries, making CatBoost useful for latency-critical tasks.

!pip install catboost

I will not select the optimal parameters; let that be your other homework. Let's write a function to initialize and train the model.

from catboost import CatBoostClassifier, Pool

def fit_model(train_pool, test_pool, **kwargs):
    model = CatBoostClassifier(
        task_type='GPU',
        iterations=5000,
        eval_metric='Accuracy',
        # Overfitting detector: stop once the metric has not improved
        # for 500 iterations after its best value.
        od_type='Iter',
        od_wait=500,
        **kwargs
    )
    return model.fit(
        train_pool,
        eval_set=test_pool,
        verbose=100,
        plot=True,
        use_best_model=True)

When working with CatBoost, I recommend using a Pool. The Pool is a convenience wrapper combining features, labels, and further metadata like categorical and text features.

train_pool = Pool(
    data=X_train,
    label=y_train,
    text_features=['Tweet']
)
valid_pool = Pool(
    data=X_valid,
    label=y_valid,
    text_features=['Tweet']
)

text_features: A one-dimensional array of text column indices (specified as integers) or names (specified as strings). Use only if the data parameter is a two-dimensional feature matrix (has one of the following types: list, numpy.ndarray, pandas.DataFrame, pandas.Series). If any elements in this array are specified as names instead of indices, names for all columns must be provided. To do this, either use the feature_names parameter of this constructor to explicitly specify them or pass a pandas.DataFrame with column names specified in the data parameter.

Supported training parameters:

tokenizers: Tokenizers used to preprocess Text type feature columns before creating the dictionary.
dictionaries: Dictionaries used to preprocess Text type feature columns.
feature_calcers: Feature calcers used to calculate new features based on preprocessed Text type feature columns.

I set all the parameters intuitively; tuning them will be your homework again.

model = fit_model(
    train_pool, valid_pool,
    learning_rate=0.35,
    tokenizers=[
        {
            'tokenizer_id': 'Sense',
            'separator_type': 'BySense',
            'lowercasing': 'True',
            'token_types': ['Word', 'Number', 'SentenceBreak'],
            'sub_tokens_policy': 'SeveralTokens'
        }
    ],
    dictionaries=[
        {'dictionary_id': 'Word', 'max_dictionary_size': '50000'}
    ],
    feature_calcers=['BoW:top_tokens_count=10000']
)

[Figure: Accuracy]

[Figure: Loss]

CatBoost model accuracy: 0.8299104791995787

[Figure: Confusion Matrix]

[Figure: Classification Report]

The result is very close to what the baseline BERT model has shown. Because I have very little data for training, and the model was trained from scratch, the result is, in my opinion, impressive.

Bonus

I got two models with very similar results. Can this give us anything else useful? The two models have little in common at their core, which means that combining them should give a synergistic effect. The easiest way to test this conclusion is to average the predicted class probabilities of the two models and see what happens.

y_proba_avg = np.argmax((y_proba_cb + y_proba_bert) / 2, axis=1)

The gain is impressive.

Average accuracy: 0.855713533438652

[Figure: Confusion Matrix]

[Figure: Classification Report]

Summary

In this story, I:
created a baseline model using BERT;
created a model with CatBoost using built-in text capabilities;
looked at what happens if you average the results from both models.

In my opinion, complex and slow SOTAs can be avoided in most cases, especially if speed is a critical need.

CatBoost provides great sentiment analysis capabilities right out of the box. For lovers of competitions like Kaggle, DrivenData, etc., CatBoost can provide a good model both as a baseline solution and as a part of an ensemble of models.

The code from the story can be viewed here.
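The averaging step from the Bonus section can be sketched on toy data as follows. This is a minimal illustration, not the code from the story: the probability arrays are made-up stand-ins for y_proba_cb and y_proba_bert, and the helper name average_ensemble is hypothetical. It assumes both models output class probabilities for the same samples, with the classes in the same column order.

```python
import numpy as np


def average_ensemble(proba_a, proba_b):
    """Average two (n_samples, n_classes) probability arrays and pick the top class."""
    proba_a = np.asarray(proba_a, dtype=float)
    proba_b = np.asarray(proba_b, dtype=float)
    if proba_a.shape != proba_b.shape:
        raise ValueError("both models must score the same samples and classes")
    return np.argmax((proba_a + proba_b) / 2, axis=1)


def accuracy(pred, labels):
    """Share of predictions that match the true labels."""
    return float(np.mean(np.asarray(pred) == np.asarray(labels)))


# Made-up probabilities for 4 samples and 3 classes (illustration only).
toy_proba_cb = np.array([
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],   # this model is wrong here (true class is 2)
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
])
toy_proba_bert = np.array([
    [0.5, 0.4, 0.1],
    [0.1, 0.2, 0.7],
    [0.45, 0.4, 0.15],  # this model is wrong here (true class is 1)
    [0.2, 0.1, 0.7],
])
toy_labels = np.array([0, 2, 1, 2])

toy_pred = average_ensemble(toy_proba_cb, toy_proba_bert)
acc_cb = accuracy(np.argmax(toy_proba_cb, axis=1), toy_labels)
acc_bert = accuracy(np.argmax(toy_proba_bert, axis=1), toy_labels)
acc_avg = accuracy(toy_pred, toy_labels)
print(acc_cb, acc_bert, acc_avg)
```

In this toy case each model makes one mistake on a different sample, and the averaged probabilities recover both. On real data the gain depends on how differently the two models err, so it is worth checking on a held-out set.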

Unconventional Sentiment Analysis: BERT vs. Catboost
