Parts of Speech Tagging with Stop Words using NLTK:

NLTK (the Natural Language Toolkit) is a platform for building Python programs for text analysis in data science. One of the most powerful features of the NLTK module is part-of-speech tagging.

We must install NLTK before we can run the programs. The installation process is as follows.

  • Open a terminal and run pip install nltk.
  • Type python at the command prompt to start the Python interactive shell, where we will execute our code.
  • Next, type import nltk.
  • Then call nltk.download().

A GUI window then pops up. To get every package, choose “all” and click ‘download’. This provides all the chunkers, tokenizers, other algorithms, and all of the corpora. It takes some time because of the amount of content being installed.
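Downloading everything is not strictly required for the steps in this article. As a minimal sketch, the resources for tokenizing, tagging, and stop words can be fetched by name instead. The helper download_resources and the RESOURCES tuple are our own names, not part of NLTK. The identifiers in the tuple are assumptions that match many NLTK releases; newer releases have renamed some of them (for example punkt_tab and averaged_perceptron_tagger_eng), so adjust the tuple if a later call raises a LookupError.

```python
# Resource identifiers assumed for this article; the names differ between NLTK versions.
RESOURCES = ("punkt", "averaged_perceptron_tagger", "stopwords")


def download_resources(names=RESOURCES):
    """Download each named NLTK resource and return a {name: succeeded} dict."""
    import nltk  # imported here so this file loads even before NLTK is installed

    # quiet=True suppresses the progress output; download() returns True on success.
    return {name: bool(nltk.download(name, quiet=True)) for name in names}
```

Calling download_resources() from the interactive shell needs a network connection and returns a dictionary showing which downloads succeeded.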

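In corpus linguistics, part-of-speech tagging (POS/PoS/POST), also known as grammatical tagging or word category disambiguation, is the process of marking each word in a text with its part of speech, such as noun, verb, or adjective.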
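To make this concrete, here is a small sketch of tagging a sentence with NLTK. It uses nltk.word_tokenize and nltk.pos_tag, and it assumes the tokenizer and tagger data have already been downloaded. The function names tag_sentence and words_with_tag are our own, chosen only for illustration.

```python
def tag_sentence(sentence):
    """Split a sentence into tokens and return a list of (word, tag) pairs."""
    import nltk  # requires the tokenizer and tagger data to be downloaded first

    tokens = nltk.word_tokenize(sentence)
    return nltk.pos_tag(tokens)


def words_with_tag(tagged, prefix):
    """Keep the words whose tag starts with the given prefix, e.g. 'NN' for nouns."""
    return [word for word, tag in tagged if tag.startswith(prefix)]
```

For example, words_with_tag(tag_sentence("The dog chased the cats"), "NN") would collect the words the tagger labels as nouns. The exact tags depend on the tagger model that is installed.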

Parts of Speech Tagging with Stop Words NLTK in Python for Data Science

Below we have provided some tags, their meaning, and some examples.
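A few common tags can also be kept in a small lookup table in code. The sketch below is our own selection from the Penn Treebank tagset, which is the default tagset used by nltk.pos_tag; it is not a complete list, and the names TAG_MEANINGS and describe are hypothetical helpers for illustration.

```python
# A few common Penn Treebank tags: tag -> (meaning, example words).
TAG_MEANINGS = {
    "CC": ("coordinating conjunction", "and, but"),
    "DT": ("determiner", "the, a"),
    "IN": ("preposition or subordinating conjunction", "in, of"),
    "JJ": ("adjective", "big, green"),
    "NN": ("noun, singular or mass", "dog, water"),
    "NNS": ("noun, plural", "dogs"),
    "NNP": ("proper noun, singular", "London"),
    "PRP": ("personal pronoun", "I, she"),
    "RB": ("adverb", "quickly"),
    "VB": ("verb, base form", "take"),
    "VBD": ("verb, past tense", "took"),
    "VBG": ("verb, gerund or present participle", "taking"),
}


def describe(tag):
    """Return a readable description of a tag, or mark it as unknown."""
    meaning, examples = TAG_MEANINGS.get(tag, ("unknown tag", ""))
    return f"{tag}: {meaning}" + (f" (e.g. {examples})" if examples else "")
```

For the full tagset, NLTK also provides nltk.help.upenn_tagset(), which prints the documentation for each tag once the tagsets resource has been downloaded.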

[Image: table of POS tags, their meanings, and examples]

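Stop words are very common words, such as "the", "is", and "and", that usually carry little meaning on their own, and we can filter them out of the text before processing it. There is no universal list of stop words in NLP research, but the NLTK module ships with its own stop word lists for several languages.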
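The filtering step can be sketched as follows. The list comes from nltk.corpus.stopwords, which requires the stopwords corpus to be downloaded; the function names are our own. The comparison lowercases each token because the English list in NLTK is stored in lowercase.

```python
def load_english_stop_words():
    """Return NLTK's English stop words as a set for fast membership tests."""
    from nltk.corpus import stopwords  # needs the 'stopwords' corpus downloaded

    return set(stopwords.words("english"))


def remove_stop_words(tokens, stop_words):
    """Drop every token whose lowercase form appears in stop_words."""
    return [token for token in tokens if token.lower() not in stop_words]
```

A typical call would be remove_stop_words(tokens, load_english_stop_words()), where tokens is the output of a tokenizer such as nltk.word_tokenize.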

It is also possible to add our own stop words. To do so, we go to the NLTK download directory, open corpora -> stopwords, and update the stop word file for the language we are using. We will use English here.
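As an alternative to editing the file on disk, custom stop words can be added in code, which leaves the installed corpus untouched. This is a sketch of that swapped-in approach, not the file-based method described above. The names extend_stop_words and stopwords_location are hypothetical; nltk.data.find is used only to show where the corpus lives on a given machine.

```python
def extend_stop_words(base, extra):
    """Return a new set with lowercased custom words added; base is not modified."""
    return set(base) | {word.lower() for word in extra}


def stopwords_location():
    """Return the path of the installed stopwords corpus as a string."""
    import nltk  # raises LookupError if the corpus has not been downloaded

    return str(nltk.data.find("corpora/stopwords"))
```

For instance, extend_stop_words(english_words, ["NLTK", "etc"]) gives a set that also filters "nltk" and "etc", where english_words stands for whatever stop word collection we loaded earlier.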


