Intro to NLP: Topic Modeling and Text Categorization

Editor’s note: Sanghamitra Deb is a speaker for ODSC East 2022. Be sure to check out her talk, “Intro to NLP: Text Categorization and Topic Modeling,” there!
Natural Language Processing (NLP) is a cornerstone of machine intelligence. At its core, NLP is the process of bringing structure to free-form, unstructured text.
I will start with topic modeling, an unsupervised learning technique for uncovering hidden structure in data. Latent Dirichlet Allocation (LDA), a common approach to topic modeling, is a generative probabilistic model. It assumes that each document is a mixture of topics and that each topic is a distribution over the underlying vocabulary.
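To make the generative assumption concrete, here is a minimal sketch of the story LDA tells about how a document comes to exist. It is an illustration only, not material from the talk: the two topics, their word probabilities, and the `alpha` values are hypothetical, and for simplicity the topics are fixed by hand, whereas full LDA also draws each topic's word distribution from a Dirichlet prior.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, n_words, rng):
    """Generate one document by following LDA's generative story."""
    # 1. Draw this document's mixture over topics.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(n_words):
        # 2. Pick a topic for this word position according to the mixture.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # 3. Pick a word from that topic's distribution over the vocabulary.
        vocab, probs = zip(*topics[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])
    return theta, words


# Two hand-written toy topics (hypothetical, for illustration only).
TOPICS = [
    {"flour": 0.4, "oven": 0.3, "sugar": 0.2, "butter": 0.1},  # "baking"
    {"grill": 0.4, "steak": 0.3, "salt": 0.2, "pepper": 0.1},  # "grilling"
]

rng = random.Random(0)
theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], n_words=8, rng=rng)
print("topic mixture:", [round(t, 2) for t in theta])
print("document:", " ".join(words))
```

Fitting LDA runs this story in reverse: given only the words, it infers the topic mixtures and the topics that most plausibly produced them.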
Topic modeling is a good way to get some insights into your data in the absence of training data.
Here is a sneak peek into the topic modeling of recipes. Recipes relevant to baking are under the same topic.
Next, I will talk about text categorization (classification), one of the most common NLP applications. Categories can be predefined or derived from topic modeling. Because this is supervised learning, it requires labeled training data. For text classification, I will go through simple techniques such as TF-IDF as well as some deep learning models.
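As a hedged illustration of the simple end of that spectrum, the sketch below pairs TF-IDF features with a logistic regression classifier using scikit-learn. The classifier choice, the example texts, and the labels are all assumptions made for this illustration, not the setup used in the talk.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up training set; real use needs far more labeled examples.
train_texts = [
    "preheat the oven and whisk flour with sugar",
    "bake the cake until golden and let it cool",
    "knead the dough and bake the bread",
    "grill the steak over high heat",
    "season the chicken and grill until charred",
    "barbecue ribs on the grill with smoky sauce",
]
train_labels = ["baking", "baking", "baking", "grilling", "grilling", "grilling"]

# TF-IDF turns each text into a weighted bag-of-words vector;
# logistic regression then learns a linear decision boundary on top of it.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

new_texts = ["bake cookies in the oven", "grill the burgers"]
predictions = model.predict(new_texts)
for text, label in zip(new_texts, predictions):
    print(f"{label}: {text}")
```

Wrapping both steps in one pipeline means the vocabulary and IDF weights are learned only from the training texts and then reused unchanged at prediction time.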
I will conclude my session by talking about performance metrics and how to interpret them. I will also talk about real-world use cases and discuss combining numerical, categorical, and text features in the same deep learning model. Here is an example of such an architecture.
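As one hypothetical illustration of this idea (it is not the architecture from the talk), the sketch below uses PyTorch to give each feature type its own branch and then concatenates the branches before a shared classification head. The framework, class name, layer sizes, and random inputs are all assumptions chosen to keep the example small.

```python
import torch
from torch import nn


class MixedInputClassifier(nn.Module):
    """Toy model that merges text, categorical, and numerical inputs."""

    def __init__(self, vocab_size, n_categories, n_numeric, n_classes, dim=16):
        super().__init__()
        # Text branch: average the embeddings of all tokens in each example.
        self.text_embedding = nn.EmbeddingBag(vocab_size, dim, mode="mean")
        # Categorical branch: one learned vector per category id.
        self.category_embedding = nn.Embedding(n_categories, dim)
        # Numerical branch: a dense layer over the raw numeric features.
        self.numeric_layer = nn.Linear(n_numeric, dim)
        # Head: classify from the concatenation of the three branches.
        self.head = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, n_classes)
        )

    def forward(self, token_ids, category_ids, numeric):
        text = self.text_embedding(token_ids)               # (batch, dim)
        category = self.category_embedding(category_ids)    # (batch, dim)
        numbers = torch.relu(self.numeric_layer(numeric))   # (batch, dim)
        return self.head(torch.cat([text, category, numbers], dim=1))


torch.manual_seed(0)
model = MixedInputClassifier(vocab_size=100, n_categories=5, n_numeric=3, n_classes=2)

# Random stand-in inputs for a batch of 4 examples.
token_ids = torch.randint(0, 100, (4, 10))  # 10 token ids per example
category_ids = torch.randint(0, 5, (4,))    # one category id per example
numeric = torch.randn(4, 3)                 # 3 numeric features per example

logits = model(token_ids, category_ids, numeric)
print(logits.shape)  # torch.Size([4, 2])
```

In practice the text branch would often be a richer encoder, and the numeric features would be scaled before entering the network.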
This type of modeling becomes important when we want to make personalized predictions and need to take user interactions into account.
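The performance metrics mentioned above can also be made concrete with a small example. The sketch below computes precision, recall, and F1 for one class in plain Python; the labels and predictions are made up, and the helper name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 for a single class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correct hits
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # misses
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Made-up labels and predictions for six documents.
y_true = ["baking", "baking", "baking", "baking", "grilling", "grilling"]
y_pred = ["baking", "baking", "grilling", "grilling", "grilling", "baking"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred, positive="baking")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.67 recall=0.50 f1=0.57
```

Here two of the three documents predicted as "baking" really are baking recipes (precision 2/3), but only two of the four true baking recipes were found (recall 1/2). F1 is the harmonic mean of the two, so it sits closer to the weaker number.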
The lecture material can be found here: https://github.com/sangha123/Intro-to-NLP-Topic-Modeling-and-Text-Categorization
About the author/ODSC East 2022 Speaker:





