Stemming is the process of reducing the morphological variants of a word to its root or base form, and it is a common text preprocessing step in Python for data science. Programs that perform stemming are referred to as stemming algorithms or stemmers. Example: a stemmer aims to convert the words chocolaty, chocolates, and choco to the root word chocolate.
Errors in stemming:
There are two kinds of errors in stemming, known as overstemming and understemming. Overstemming occurs when two words are reduced to the same root even though they have different stems. Understemming occurs when two words that share the same stem are reduced to different roots.
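Both errors can be seen with a deliberately crude stemmer. The sketch below is a toy suffix stripper written only for illustration; it is not NLTK's algorithm, and the suffix list, the minimum stem length, and the name toy_stem are assumptions.

```python
# Hypothetical suffix list, ordered so longer suffixes are tried first.
SUFFIXES = ("ities", "ity", "al", "es", "e", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Overstemming: words with different meanings collapse to one root.
over = {w: toy_stem(w) for w in ("universal", "university", "universe")}
print(over)

# Understemming: related words are left with different roots.
under = {w: toy_stem(w) for w in ("data", "datum")}
print(under)
```

Here universal, university, and universe all end up with the same root, while data and datum stay apart even though they are forms of the same word.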
Applications of stemming:
- We use stemming in information retrieval systems such as search engines.
- We also use it for determining domain vocabularies in domain analysis.
Stemming reduces redundancy because a stem and the words derived from it often carry the same meaning.
The code below shows how to stem words using NLTK.
Code 1:
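A minimal sketch of stemming individual words, assuming NLTK is installed. The choice of PorterStemmer, the word list, and the helper name stem_words are assumptions made for illustration.

```python
from nltk.stem import PorterStemmer


def stem_words(words):
    """Return a dict mapping each input word to its Porter stem."""
    stemmer = PorterStemmer()
    return {word: stemmer.stem(word) for word in words}


# Example word list (an assumption, chosen to show related word forms).
words = ["program", "programs", "programmer", "programming", "programmers"]

for word, stem in stem_words(words).items():
    print(f"{word} : {stem}")
```

Note that a stem does not have to be a dictionary word; the stemmer only applies suffix-stripping rules.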
Code 2: Stemming of words from sentences.
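A minimal sketch of stemming every word in a sentence, again assuming NLTK's PorterStemmer. To avoid downloading extra tokenizer data, the sentence is split with a simple regular expression instead of an NLTK tokenizer; that choice, the sample sentence, and the helper name stem_sentence are assumptions.

```python
import re

from nltk.stem import PorterStemmer


def stem_sentence(sentence):
    """Split a sentence into alphabetic tokens and pair each with its stem."""
    stemmer = PorterStemmer()
    # Simple tokenizer: runs of ASCII letters only (punctuation is dropped).
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return [(token, stemmer.stem(token)) for token in tokens]


# Sample sentence (an assumption for illustration).
sentence = "Programmers program with programming languages"

for token, stem in stem_sentence(sentence):
    print(f"{token} : {stem}")
```

A tokenizer such as NLTK's word_tokenize could be used in place of the regular expression, but it needs additional tokenizer data to be downloaded first.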