Stemming Words with NLTK:

Stemming is the process of reducing the morphological variants of a word to a common root or base form. The programs that do this are referred to as stemming algorithms or stemmers. Example: ideally, the words chocolaty, chocolates, and choco are all converted to the root word chocolate.

Errors in stemming:

Stemming can go wrong in two ways, known as overstemming and understemming. Overstemming occurs when two words are reduced to the same root even though they have different stems (a false positive). Understemming occurs when two words that share a stem are not reduced to the same root (a false negative).
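
Both kinds of error can be observed with a real stemmer. The sketch below assumes NLTK's PorterStemmer; the word groups are illustrative choices rather than examples from this article, and other stemmers may behave differently.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Overstemming candidates: related in spelling, different in meaning.
over_words = ["universal", "university", "universe"]
over_stems = [stemmer.stem(w) for w in over_words]

# Understemming candidates: inflections of the same Latin-derived noun.
under_words = ["alumnus", "alumni", "alumnae"]
under_stems = [stemmer.stem(w) for w in under_words]

print("overstemming check: ", dict(zip(over_words, over_stems)))
print("understemming check:", dict(zip(under_words, under_stems)))

# One distinct stem for the first group means the words were conflated;
# several distinct stems for the second group means they were not merged.
print("distinct stems:", len(set(over_stems)), "vs", len(set(under_stems)))
```

If the first group collapses to a single stem while the second group does not, the stemmer is overstemming the former and understemming the latter.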

Applications of stemming:
  • We use stemming in information retrieval systems such as search engines.
  • We also use it for determining domain vocabularies in domain analysis.

Stemming reduces redundancy, since a derived word and its stem usually carry the same meaning.

The code below shows how to stem words using NLTK.

Code 1:
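
A minimal sketch of stemming a list of words, assuming NLTK's PorterStemmer. The choice of stemmer and the word list are assumptions made for illustration, not taken from the original listing.

```python
from nltk.stem import PorterStemmer

# PorterStemmer is rule-based and needs no extra NLTK data downloads.
stemmer = PorterStemmer()

# Illustrative word list (an assumption, chosen to share one root).
words = ["program", "programs", "programmer", "programming", "programmers"]

# Map every word to the stem the algorithm produces for it.
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

A stem is not guaranteed to be a dictionary word; it only has to be the same string for the variants that the algorithm groups together.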


Code 2: Stemming of words from sentences.
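
To stem a sentence, it first has to be split into tokens, and each token is then stemmed on its own. The sketch below assumes NLTK's PorterStemmer and uses the regex-based wordpunct_tokenize so that no tokenizer data has to be downloaded; nltk.word_tokenize is a common alternative that requires extra NLTK data. The sample sentence is an assumption.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

stemmer = PorterStemmer()

# Illustrative sentence (an assumption, not from the original listing).
sentence = "Programmers program with programming languages"

# Split the sentence into word and punctuation tokens.
tokens = wordpunct_tokenize(sentence)

# Stem each token independently; PorterStemmer lowercases by default.
stemmed = [stemmer.stem(token) for token in tokens]

for token, stem in zip(tokens, stemmed):
    print(f"{token} : {stem}")
```

Joining the stems with spaces gives a normalized version of the sentence that can be used for indexing or matching.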


