Introduction to Text Classification
Subject: Natural language processing (VU-CSC 322)
Text classification is the process of teaching a computer to place text into categories automatically, that is, to predict a label for a piece of text.
At this stage, you already know how to clean text and convert it into numerical vectors using Bag-of-Words (BoW) and TF-IDF. Text classification comes after "Text processing" and "Text representation", and it puts those vectors to use in machine learning models that make predictions.
Examples:
- Detecting spam messages (Spam vs. Not Spam), as used in emails and WhatsApp messages.
- Predicting whether a review is positive or negative, for example movie reviews and tweets.
- Identifying fake news.
- Categorizing emails as important, social or adverts.
Recall that in the last class we learnt how BoW and TF-IDF turn text into vectors. Now those vectors become the features for machine learning models.
Machine Learning Workflow for NLP
A machine learning model is a program that learns patterns from data in order to make predictions.
Raw Text
↓
Text Cleaning
↓
BoW / TF-IDF Representation
↓
Machine Learning Model
↓
Prediction
Example: "I love this phone"
↓
TF-IDF Vector
↓
Classifier
↓
Positive Sentiment
“Win money now” → model → classified as spam.
“I love this movie” → model → classified as positive.
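The stages of the workflow above can be sketched in a few lines of Python. This is only an illustration: the vocabulary `VOCAB` and the weights `SPAM_WEIGHTS` are hypothetical values picked by hand, whereas a real model would learn its weights from labelled data, and the function names are invented for this example.

```python
import re

# Hypothetical vocabulary and weights, chosen by hand for illustration only.
# A real classifier would learn the weights from labelled examples.
VOCAB = ["free", "win", "money", "meeting", "love"]
SPAM_WEIGHTS = [1.0, 1.0, 1.0, -1.0, -1.0]


def clean_text(text):
    """Text Cleaning: lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bow_vector(tokens):
    """BoW Representation: count how often each vocabulary word occurs."""
    return [tokens.count(word) for word in VOCAB]


def predict(vector):
    """Model: weighted sum of the features, thresholded at zero."""
    score = sum(w * x for w, x in zip(SPAM_WEIGHTS, vector))
    return "Spam" if score > 0 else "Not Spam"


def classify(text):
    """Raw Text -> Text Cleaning -> BoW -> Model -> Prediction."""
    return predict(bow_vector(clean_text(text)))


print(classify("Win money now"))           # Spam
print(classify("Meeting starts at 9am."))  # Not Spam
```

Each function corresponds to one box in the diagram, so swapping BoW for TF-IDF, or the hand-set weights for a trained model, changes only one stage.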
Supervised Learning
Text classification is usually a supervised learning problem. This means that we give the model a set of examples, and each example already has a correct label. Together these examples form a labelled dataset, which could look like this:
Message (Label)
“Free recharge card!” (Spam)
“Meeting starts at 9am.” (Not Spam)
The model studies these examples and learns patterns.
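To make "learns patterns" concrete, here is a minimal sketch of one possible supervised classifier, a word-count Naive Bayes with add-one smoothing, written in plain Python. The notes do not name a specific algorithm at this point, so this choice is an assumption. The first two training rows come from the example dataset above; the other two are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Tiny labelled dataset. The first two rows are from the notes;
# the last two are invented for this illustration.
TRAIN = [
    ("Free recharge card!", "Spam"),
    ("Meeting starts at 9am.", "Not Spam"),
    ("Win free money now", "Spam"),
    ("Lunch meeting at noon", "Not Spam"),
]


def tokenize(text):
    """Lowercase the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Learn the 'patterns': word counts per label and examples per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-probability score."""
    word_counts, label_counts, vocab = model
    total_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_examples)
        n_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing avoids a zero probability for unseen pairs.
            score += math.log(
                (word_counts[label][word] + 1) / (n_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict("Free money", model))      # Spam
print(predict("Meeting at 9am", model))  # Not Spam
```

With only four training examples the model is a toy, but the idea is the same at scale: words seen mostly under one label pull new messages towards that label.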
Features in NLP
In machine learning, the numerical representation of text is called its features. These are the BoW and TF-IDF vectors produced by the Text Representation step we learnt in the last class.
These vectors become the input to the model.
For example, the sentence "I love rice" might be represented by a vector such as:
TF-IDF Vector: [0.0, 0.8, 0.5, 0.0]
The machine learning model works with these numbers.
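As a reminder of where such numbers come from, the sketch below computes a TF-IDF vector by hand for "I love rice". It assumes a hypothetical three-document corpus and one common variant of the formula (term frequency = count / document length, inverse document frequency = log(N / document frequency)). Libraries and other course materials may use a different variant, so the values will not match the illustrative vector above.

```python
import math

# Hypothetical corpus, already cleaned and tokenized.
CORPUS = [
    ["i", "love", "rice"],
    ["i", "love", "beans"],
    ["i", "eat", "rice"],
]

# One feature (vector position) per distinct word, in alphabetical order.
VOCAB = sorted({word for doc in CORPUS for word in doc})


def tf_idf_vector(doc, corpus, vocab):
    """Return the TF-IDF features of one document over the vocabulary."""
    n_docs = len(corpus)
    vector = []
    for word in vocab:
        tf = doc.count(word) / len(doc)           # term frequency
        df = sum(1 for d in corpus if word in d)  # document frequency
        idf = math.log(n_docs / df)               # inverse document frequency
        vector.append(round(tf * idf, 3))
    return vector


print(VOCAB)                                    # ['beans', 'eat', 'i', 'love', 'rice']
print(tf_idf_vector(CORPUS[0], CORPUS, VOCAB))  # [0.0, 0.0, 0.0, 0.135, 0.135]
```

The word "i" scores 0.0 because it appears in every document and so carries no information for telling documents apart, while "love" and "rice" get positive weights.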
Spam vs. Not Spam
By:
Vision University