POS Tagging and NER

Subject: Natural Language Processing (VU-CSC 322)

Part-of-Speech (POS) Tagging


Part-of-Speech tagging is the process of assigning each word in a sentence its grammatical category. Every word receives a tag that says whether it is a noun, verb, adjective, adverb, pronoun, conjunction, or another category. For example, in the sentence “Natural Language Processing is fun and powerful!”, the word “Natural” is tagged as an adjective, “Language” as a noun, and “is” as an auxiliary verb. POS tagging exposes the structure of a sentence and supports tasks such as syntactic parsing, text classification, and machine translation.
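
To make the idea of "one tag per word" concrete, here is a minimal sketch of a lookup-based tagger in plain Python. The names TOY_LEXICON and tag_tokens are hypothetical, and the lexicon is hand-written for this one sentence (the tags for "Processing", "fun", "and", and "powerful" are assumptions, since the text above does not state them). Real taggers, such as the one in spaCy, use statistical models that take the surrounding context into account.

```python
import re

# Hypothetical hand-written lexicon covering only the example sentence.
# Tags follow the Universal POS naming style (ADJ, NOUN, AUX, ...).
TOY_LEXICON = {
    "natural": "ADJ",
    "language": "NOUN",
    "processing": "NOUN",   # assumed tag
    "is": "AUX",
    "fun": "ADJ",           # assumed tag
    "and": "CCONJ",         # assumed tag
    "powerful": "ADJ",      # assumed tag
}


def tag_tokens(sentence):
    """Split a sentence into word and punctuation tokens and look up a tag for each."""
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    tagged = []
    for token in tokens:
        if re.fullmatch(r"[^\w\s]", token):
            tag = "PUNCT"
        else:
            # "X" marks words that are missing from the toy lexicon.
            tag = TOY_LEXICON.get(token.lower(), "X")
        tagged.append((token, tag))
    return tagged


tagged = tag_tokens("Natural Language Processing is fun and powerful!")
for token, tag in tagged:
    print(token, tag)
```

A lookup table cannot handle words whose category depends on context (for example "book" as a noun or a verb), which is why practical taggers are trained on annotated text instead.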

Named Entity Recognition (NER)


Named Entity Recognition is the process of detecting and classifying named real-world entities in text. These entities include people, organizations, locations, dates, monetary values, and more. For instance, in the sentence “Apple is looking at buying U.K. startup for $1 billion”, the spaCy library identifies “Apple” as an organization (ORG), “U.K.” as a geopolitical entity (GPE), and “$1 billion” as money (MONEY). NER is widely used in applications such as information extraction, search engines, chatbots, and document analysis because it turns unstructured text into structured data.
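
The step from unstructured text to structured data can be sketched without any library. The example below is a toy illustration only: TOY_GAZETTEER, MONEY_PATTERN, and find_entities are hypothetical names, the gazetteer is a hand-written lookup list, and the money pattern is a simple regular expression. This is not how spaCy works internally; spaCy uses a trained statistical model.

```python
import re

# Hypothetical gazetteer: known surface strings mapped to entity labels.
TOY_GAZETTEER = {
    "Apple": "ORG",
    "U.K.": "GPE",
}

# Simple pattern for amounts such as "$1 billion" or "$250".
MONEY_PATTERN = re.compile(
    r"\$\d+(?:\.\d+)?(?:\s(?:thousand|million|billion|trillion))?"
)


def find_entities(text):
    """Return (start_offset, entity_text, label) tuples sorted by position."""
    found = []
    for name, label in TOY_GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    for match in MONEY_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "MONEY"))
    return sorted(found)


entities = find_entities("Apple is looking at buying U.K. startup for $1 billion")
for start, span, label in entities:
    print(start, span, label)
```

Each result is a small record (position, text, label) that can be stored or queried, which is the sense in which NER produces structured data. A fixed list like this cannot tell the company "Apple" from the fruit, so real systems rely on context.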

Example

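# Requires spaCy and its small English model:
#   pip install spacy
#   python -m spacy download en_core_web_sm
import spacy

# Load the small English pipeline (tokenizer, tagger, parser, NER)
nlp = spacy.load("en_core_web_sm")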

# Sample text
doc = nlp("Benjamin Onuorah is a fine artist and a software developer. He lives in Lagos, Nigeria.")

# Tokenization
tokens = [token.text for token in doc]
print("spaCy Tokens:", tokens)

# Stopword removal
filtered_tokens = [token.text for token in doc if not token.is_stop]
print("Filtered Tokens:", filtered_tokens)

# POS tags
print([(token.text, token.pos_) for token in doc])

# Named Entities
print([(ent.text, ent.label_) for ent in doc.ents])


Output
spaCy Tokens: ['Benjamin', 'Onuorah', 'is', 'a', 'fine', 'artist', 'and', 'a', 'software', 'developer', '.', 'He', 'lives', 'in', 'Lagos', ',', 'Nigeria', '.']

Filtered Tokens: ['Benjamin', 'Onuorah', 'fine', 'artist', 'software', 'developer', '.', 'lives', 'Lagos', ',', 'Nigeria', '.']

[('Benjamin', 'PROPN'), ('Onuorah', 'PROPN'), ('is', 'AUX'), ('a', 'DET'), ('fine', 'ADJ'), ('artist', 'NOUN'), ('and', 'CCONJ'), ('a', 'DET'), ('software', 'NOUN'), ('developer', 'NOUN'), ('.', 'PUNCT'), ('He', 'PRON'), ('lives', 'VERB'), ('in', 'ADP'), ('Lagos', 'PROPN'), (',', 'PUNCT'), ('Nigeria', 'PROPN'), ('.', 'PUNCT')]

[('Benjamin Onuorah', 'PERSON'), ('Lagos', 'GPE'), ('Nigeria', 'GPE')]

POS tagging tells us how words function grammatically.
NER tells us which words or phrases represent meaningful entities in the real world.
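
As a rough illustration of how the two outputs relate, the sketch below reuses the (token, tag) pairs and the (text, label) pairs printed above and summarizes them with the standard library. The variable names are hypothetical, and the data is copied from the output shown earlier rather than recomputed with spaCy.

```python
from collections import Counter

# (token, tag) pairs copied from the POS output above.
pos_pairs = [
    ("Benjamin", "PROPN"), ("Onuorah", "PROPN"), ("is", "AUX"), ("a", "DET"),
    ("fine", "ADJ"), ("artist", "NOUN"), ("and", "CCONJ"), ("a", "DET"),
    ("software", "NOUN"), ("developer", "NOUN"), (".", "PUNCT"),
    ("He", "PRON"), ("lives", "VERB"), ("in", "ADP"), ("Lagos", "PROPN"),
    (",", "PUNCT"), ("Nigeria", "PROPN"), (".", "PUNCT"),
]

# (text, label) pairs copied from the named-entity output above.
entity_pairs = [("Benjamin Onuorah", "PERSON"), ("Lagos", "GPE"), ("Nigeria", "GPE")]

# How often each grammatical category occurs.
tag_counts = Counter(tag for _, tag in pos_pairs)

# Tokens tagged as proper nouns, and the words that make up the entities.
proper_nouns = [token for token, tag in pos_pairs if tag == "PROPN"]
entity_words = {word for text, _ in entity_pairs for word in text.split()}

print(tag_counts.most_common())
print(proper_nouns)
print(entity_words)
```

In this particular sentence every proper noun belongs to a named entity, but the two tasks answer different questions: the POS tag only says "proper noun", while the entity label says whether it is a person or a place and groups "Benjamin Onuorah" into a single unit.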







By: Vision University
