Models in scikit-learn

Subject: Natural language processing (VU-CSC 322)

Logistic Regression


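Logistic Regression is a classification algorithm: it predicts which category an input belongs to.
Despite the word “regression” in its name, it is used for classification rather than for predicting continuous values.
It outputs a probability for each class, and the class with the highest probability becomes the prediction.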

Example:
• An email with spam probability = 0.95 is classified as spam.
• A review with positive sentiment probability = 0.87 is classified as positive.
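
Probabilities like these can be read directly from a trained model. The following is a minimal sketch, assuming scikit-learn is installed; the four training messages and the test message are hypothetical toy data, so the exact numbers carry no real meaning.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy dataset (hypothetical, for illustration only)
texts = [
    "Win free money now",
    "Urgent: claim your free prize",
    "Are we still meeting for lunch?",
    "Thanks for the notes from class",
]
labels = ["spam", "spam", "ham", "ham"]

# Convert the text to TF-IDF features and train the classifier
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
model = LogisticRegression()
model.fit(X, labels)

# predict_proba returns one row per message and one column per class,
# with columns ordered as in model.classes_
message = ["Claim your free money"]
probs = model.predict_proba(vectorizer.transform(message))[0]
for cls, p in zip(model.classes_, probs):
    print(f"P({cls}) = {p:.2f}")
```

The two printed probabilities add up to 1. Because every known word in the test message appears only in the spam examples, the spam probability should come out higher than the ham probability.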

How Logistic Regression Works
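During training, the model learns a weight for each word. A positive weight pushes the prediction toward a class and a negative weight pushes it away. The weighted sum of the words in a text is passed through the sigmoid function to produce a probability.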
Example:
• Words like “win,” “free,” and “urgent” may indicate spam.
• Words like “excellent” and “amazing” may indicate positive sentiment.

Advantages
• Fast
• Simple
• Works well for text classification
• Good baseline model

Disadvantages
• May struggle when meaning depends on word order or context, such as negation or sarcasm
• Usually less accurate than deep learning models on large, complex datasets


Naive Bayes


Naive Bayes is a probabilistic classification algorithm based on Bayes' theorem.
It is one of the most popular algorithms for NLP tasks such as spam filtering and sentiment analysis.

Why It Is Called “Naive”
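It assumes that the words in a text are independent of each other, given the class.
For example, in the phrase “good movie”:
• “good”
• “movie”
The model treats each word as separate evidence and ignores the fact that they appear together.
This assumption is often not true in real language, but the algorithm still works surprisingly well in practice.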
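
Because of this assumption, the score for a class is simply the class prior multiplied by one probability per word. The sketch below works this out by hand in plain Python. The three training phrases are hypothetical, and add-one (Laplace) smoothing is an assumption added here so that unseen words do not produce a zero probability.

```python
import math
from collections import Counter

# Toy training data (hypothetical, for illustration only)
train = [
    ("good movie", "pos"),
    ("good acting", "pos"),
    ("bad movie", "neg"),
]

word_counts = {}          # class -> Counter of word occurrences
class_counts = Counter()  # class -> number of training documents
for text, label in train:
    class_counts[label] += 1
    word_counts.setdefault(label, Counter()).update(text.split())

vocab = {word for counts in word_counts.values() for word in counts}

def log_score(text, label):
    """Return log P(label) + sum of log P(word | label), with add-one smoothing."""
    total_docs = sum(class_counts.values())
    score = math.log(class_counts[label] / total_docs)
    total_words = sum(word_counts[label].values())
    for word in text.split():
        # Each word contributes on its own, independently of its neighbours
        p = (word_counts[label][word] + 1) / (total_words + len(vocab))
        score += math.log(p)
    return score

def predict(text):
    """Pick the class with the highest score."""
    return max(class_counts, key=lambda label: log_score(text, label))

print(predict("good movie"))  # pos
print(predict("bad movie"))   # neg
```

For “good movie”, the positive score is 2/3 × 3/8 × 2/8 = 0.0625 and the negative score is 1/3 × 1/6 × 2/6 ≈ 0.0185, so the positive class wins. Logarithms are used only to avoid multiplying many small numbers.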

Advantages
• Very fast
• Works very well for text classification
• Performs well with small datasets

Disadvantages
• Assumes independence between words, so it ignores word order and context
• Sometimes less accurate than advanced models



Example: Training Both Models in scikit-learn

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Example dataset
texts = [
    "Win money now!!!",
    "Limited offer, claim your prize",
    "Hi friend, how are you?",
    "Let's meet tomorrow for lunch",
]
labels = ["spam", "spam", "ham", "ham"]  # ham = not spam

# Step 1: Vectorize text (convert each message into TF-IDF word features)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# Step 2: Train Logistic Regression
log_reg = LogisticRegression()
log_reg.fit(X, labels)

# Step 3: Train Naive Bayes
nb = MultinomialNB()
nb.fit(X, labels)

# Step 4: Test predictions
# Reuse the fitted vectorizer; words not seen during training are ignored
test = ["Free money waiting for you", "See you at lunch"]
X_test = vectorizer.transform(test)
print("Logistic Regression:", log_reg.predict(X_test))
print("Naive Bayes:", nb.predict(X_test))


Comparing Logistic Regression and Naive Bayes
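
One concrete way to compare the two models is to look at the class probabilities each one assigns to the same messages. The following is a minimal sketch that reuses the toy dataset from the example above; the `models` dictionary and the loop are assumptions made for illustration, and with only four training messages the numbers show the mechanics rather than real performance.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

# Same toy dataset as in the training example
texts = [
    "Win money now!!!",
    "Limited offer, claim your prize",
    "Hi friend, how are you?",
    "Let's meet tomorrow for lunch",
]
labels = ["spam", "spam", "ham", "ham"]
test = ["Free money waiting for you", "See you at lunch"]

# Both models are trained on the same TF-IDF features
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(texts)
X_test = vectorizer.transform(test)

models = {
    "Logistic Regression": LogisticRegression(),
    "Naive Bayes": MultinomialNB(),
}

for name, model in models.items():
    model.fit(X_train, labels)
    probs = model.predict_proba(X_test)  # one row per test message
    print(name)
    for message, row in zip(test, probs):
        summary = ", ".join(
            f"P({cls}) = {p:.2f}" for cls, p in zip(model.classes_, row)
        )
        print(f"  {message!r}: {summary}")
```

On a realistic dataset, the same loop could report accuracy on a held-out test set instead of raw probabilities.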





By: Vision University
