Models in scikit-learn
Subject: Natural language processing (VU-CSC 322)
Logistic Regression
Logistic Regression is a classification algorithm used to predict categories.
Despite the name “regression,” it is commonly used for classification tasks.
It predicts the probability that an input belongs to each category.
Example:
• Spam probability = 0.95
• Positive sentiment = 0.87
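Probabilities like these come from passing a raw score through the logistic (sigmoid) function, which squeezes any number into the range 0 to 1. Here is a minimal sketch in plain Python. The two scores are made-up values chosen for illustration; a trained model would compute them from the text.

```python
import math

def sigmoid(score: float) -> float:
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical raw scores (assumed for illustration, not from a trained model).
spam_score = 2.94
sentiment_score = 1.9

print(f"Spam probability = {sigmoid(spam_score):.2f}")
print(f"Positive sentiment = {sigmoid(sentiment_score):.2f}")
```

A score of 0 maps to exactly 0.5, large positive scores approach 1, and large negative scores approach 0.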
How Logistic Regression Works
The model learns which words are important.
Example:
• Words like “win,” “free,” and “urgent” may indicate spam.
• Words like “excellent” and “amazing” may indicate positive sentiment.
Advantages
• Fast
• Simple
• Works well for text classification
• Good baseline model
Disadvantages
• May struggle with very complex language
• Not as powerful as deep learning models
Naive Bayes
Naive Bayes is a probabilistic machine learning algorithm based on Bayes' theorem.
It is one of the most popular algorithms for NLP tasks.
Why It Is Called “Naive”
It assumes that the words in a text are independent of one another.
For example:
• “good”
• “movie”
The model treats them separately.
This assumption is not always true, but the algorithm still works surprisingly well.
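Because the words are treated as independent, the score for a class is simply the class probability multiplied by one probability per word. A minimal sketch in plain Python follows. All the numbers are hypothetical values invented for illustration; a real model would estimate them from training data.

```python
import math

# Hypothetical per-word probabilities P(word | class), invented for illustration.
word_probs = {
    "positive": {"good": 0.20, "movie": 0.10},
    "negative": {"good": 0.02, "movie": 0.10},
}
# Hypothetical class priors P(class).
priors = {"positive": 0.5, "negative": 0.5}

def naive_score(words, label):
    """P(class) times the product of P(word | class), treating words as independent."""
    return priors[label] * math.prod(word_probs[label][w] for w in words)

scores = {label: naive_score(["good", "movie"], label) for label in priors}
total = sum(scores.values())
for label, score in scores.items():
    # Dividing by the total turns the scores into probabilities that sum to 1.
    print(label, round(score / total, 3))
```

Here “movie” is equally likely in both classes, so “good” alone decides the result in favour of the positive class.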
Advantages
• Very fast
• Works very well for text classification
• Performs well with small datasets
Disadvantages
• Assumes independence between words
• Sometimes less accurate than advanced models
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
# Example dataset
texts = [
    "Win money now!!!",
    "Limited offer, claim your prize",
    "Hi friend, how are you?",
    "Let's meet tomorrow for lunch",
]
labels = ["spam", "spam", "ham", "ham"]  # ham = not spam
# Step 1: Vectorize text
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
# Step 2: Train Logistic Regression
log_reg = LogisticRegression()
log_reg.fit(X, labels)
# Step 3: Train Naive Bayes
nb = MultinomialNB()
nb.fit(X, labels)
# Step 4: Test predictions
test = ["Free money waiting for you", "See you at lunch"]
print("Logistic Regression:", log_reg.predict(vectorizer.transform(test)))
print("Naive Bayes:", nb.predict(vectorizer.transform(test)))
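The example above prints only the predicted labels. Since both models predict probabilities, it can also be useful to look at those directly. The sketch below is one possible variation, not part of the original example: it wraps each model in a scikit-learn pipeline (an assumption made here for convenience) and calls predict_proba on the same four training texts.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "Win money now!!!",
    "Limited offer, claim your prize",
    "Hi friend, how are you?",
    "Let's meet tomorrow for lunch",
]
labels = ["spam", "spam", "ham", "ham"]  # ham = not spam

# Each pipeline vectorizes the text and then applies the classifier.
models = {
    "Logistic Regression": make_pipeline(TfidfVectorizer(), LogisticRegression()),
    "Naive Bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
}

test = ["Free money waiting for you", "See you at lunch"]
results = {}
for name, model in models.items():
    model.fit(texts, labels)
    # predict_proba returns one row per text; columns follow model.classes_.
    results[name] = model.predict_proba(test)
    print(name, list(model.classes_))
    print(results[name].round(2))
```

With only four training texts the probabilities will be close to 0.5 and should not be read as reliable; the point is only to show where the numbers come from.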
Comparing Logistic Regression and Naive Bayes
By:
Vision University