NLP Tutorial: Topic Modeling in Python with BerTopic
by @davisdavid

August 24th 2021

Too Long; Didn't Read

BerTopic is a topic modeling technique that uses transformers (BERT embeddings) and class-based TF-IDF to create dense clusters. It also lets you easily interpret and visualize the generated topics. In this NLP tutorial, we will use Tokyo 2020 Olympics tweets, with the goal of creating a model that can automatically categorize the tweets by topic. The BerTopic algorithm contains 3 stages: (1) Embed the textual data (documents): the documents are embedded with BERT, although any other embedding technique can be used. (2) Cluster the documents: the algorithm uses UMAP to reduce the dimensionality of the embeddings and HDBSCAN to cluster the reduced embeddings. (3) Create a topic representation: class-based TF-IDF extracts the words that best describe each cluster.
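
To make the class-based TF-IDF idea mentioned above more concrete, here is a minimal sketch in plain Python. It assumes the documents have already been grouped into clusters (the embedding, UMAP, and HDBSCAN steps are skipped), and it follows the weighting as I understand it from the BERTopic documentation: the term frequency within a class, multiplied by log(1 + A / f_t), where A is the average number of words per class and f_t is the frequency of the word across all classes. The function names `c_tf_idf` and `top_words` and the toy clusters are hypothetical and only serve as illustration; this is not the library's own implementation.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class):
    """Score words per class with a class-based TF-IDF weighting.

    docs_per_class maps a cluster label to the list of documents in it.
    Returns {label: {word: score}}.
    """
    # Join each cluster into one "class document" and count its words.
    counts = {
        label: Counter(" ".join(docs).lower().split())
        for label, docs in docs_per_class.items()
    }
    # f_t: frequency of each word across all classes.
    total = Counter()
    for class_counts in counts.values():
        total.update(class_counts)
    # A: average number of words per class.
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for label, class_counts in counts.items():
        n_words = sum(class_counts.values())
        scores[label] = {
            # Normalized term frequency times the class-based IDF term.
            word: (freq / n_words) * math.log(1 + avg_words / total[word])
            for word, freq in class_counts.items()
        }
    return scores


def top_words(scores, n=3):
    """Return the n highest-scoring words per class (ties broken alphabetically)."""
    return {
        label: [w for w, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for label, s in scores.items()
    }


# Hypothetical, hand-labelled clusters standing in for the clustering output.
clusters = {
    0: ["tokyo swimming gold medal", "swimming final gold tokyo"],
    1: ["tokyo opening ceremony stadium", "ceremony fireworks stadium tokyo"],
}
scores = c_tf_idf(clusters)
topics = top_words(scores, n=2)
print(topics)  # {0: ['gold', 'swimming'], 1: ['ceremony', 'stadium']}
```

Note how "tokyo" appears just as often as "gold" inside cluster 0 but scores lower, because it is equally frequent in the other cluster. In practice the BERTopic library handles all three stages for you; this sketch only isolates the scoring step.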

Davis David
