Natural Language Processing • Ben Lau

Techniques

Lowercasing: Convert text to lowercase.
Punctuation Removal: Remove punctuation marks from text.
Stemming: Strip suffixes from words to reduce them to a common stem, which need not be a dictionary word. For example, “changing”, “changed”, and “change” all become “chang”.
Lemmatization: Like stemming, but each word is reduced to its lemma, which is a valid dictionary form. For example, “changing”, “changed”, and “change” all become “change”.
Stop Words: Very common words like “the”, “is”, and “and” that are often removed from text because they carry little meaning on their own.
Tokenization: Split text into words or sentences.
Bag of Words: Represent text as an unordered collection of words with their counts, ignoring grammar and word order.
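TF-IDF: Term Frequency-Inverse Document Frequency. It weights a word by how often it appears in a document, discounted by how many documents in the collection contain it, so words that are frequent in one document but rare elsewhere score highest.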
Word Embeddings: Represent words as dense numeric vectors, so that words with similar meanings lie close together in the vector space.
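
Several of the techniques above can be chained into a small pipeline. The following is a minimal sketch in plain Python, not a definitive implementation: the stop-word list, the sample sentences, and the function names (preprocess, bag_of_words, tf_idf) are assumptions made for illustration. It uses whitespace tokenization and one common TF-IDF variant (tf = count / document length, idf = log(N / df)); libraries often use smoothed variants. Stemming and lemmatization are left out because in practice they usually rely on a library such as NLTK.

```python
import math
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "is", "and", "a", "on", "of"}


def preprocess(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(tokens):
    """Count how often each token occurs; word order is discarded."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """Return one {term: weight} dict per document.

    tf = count / document length, idf = log(N / df). This is one common
    variant among several, chosen here for simplicity.
    """
    n_docs = len(docs_tokens)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    weights = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample documents.
docs = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "The cat chased the dog!",
]
tokenized = [preprocess(d) for d in docs]
bows = [bag_of_words(t) for t in tokenized]
scores = tf_idf(tokenized)
```

In this sketch, “mat” appears in only one document while “cat” appears in two, so “mat” receives the higher TF-IDF weight in the first document.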

Named Entity Recognition: Identify named entities like people, organizations, and locations in text.
Sentiment Analysis: Determine the sentiment of text, such as positive, negative, or neutral.
Topic Modeling: Discover the underlying topics that recur across a collection of text documents, usually without labeled data.
Text Classification: Assign predefined categories or labels to text.
Machine Translation: Translate text from one language to another.
Text Summarization: Generate a concise summary of a text document.
Question Answering: Answer questions based on a given context or text.
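
As a toy illustration of sentiment analysis (itself a form of text classification), the sketch below labels text by counting words from two hand-written word lists. The lexicons, the sample sentences, and the function name sentiment are all hypothetical, invented for this example. Practical systems typically use trained classifiers, and this approach ignores negation, sarcasm, and context.

```python
import string

# Hypothetical mini-lexicons, invented for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}


def sentiment(text):
    """Label text by comparing counts of positive and negative words."""
    # Same preprocessing ideas as in the glossary: lowercase, strip punctuation, tokenize.
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    tokens = cleaned.split()
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical sample inputs.
labels = [sentiment(s) for s in [
    "I love this great phone!",
    "Terrible battery, awful screen.",
    "The package arrived on Tuesday.",
]]
```

Here labels holds one label per input sentence: positive for the first, negative for the second, and neutral for the third, which contains no lexicon words.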