Fetching Text from a Wikipedia Infobox in Python:

An infobox is a template used to collect and present a subset of information about its subject. It is a structured block made up of attribute-value pairs, which makes it a convenient source of data for data science work in Python. On Wikipedia, an infobox summarizes the key facts about the subject of an article.
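The sketch below makes the idea of attribute-value pairs concrete. It parses a flat infobox template, written in Wikipedia's wikitext markup, into a Python dictionary. The sample template and the function name parse_infobox_wikitext are illustrative assumptions and are not taken from a real article. Real infoboxes often contain nested templates and piped links, and this simplified parser does not handle them.

```python
def parse_infobox_wikitext(wikitext):
    """Return (infobox_type, fields) for a flat {{Infobox ...}} template.

    Simplified sketch: assumes one "| key = value" field per line and
    no nested templates or piped links inside values.
    """
    infobox_type = None
    fields = {}
    for raw_line in wikitext.splitlines():
        line = raw_line.strip()
        if line.startswith("{{Infobox"):
            # The text after "{{Infobox" names the kind of infobox.
            infobox_type = line[len("{{Infobox"):].strip()
        elif line.startswith("|") and "=" in line:
            # Split only on the first "=" so values may themselves contain "=".
            key, value = line[1:].split("=", 1)
            fields[key.strip()] = value.strip()
    return infobox_type, fields


# Illustrative sample, not copied from a real article.
SAMPLE = """{{Infobox programming language
| name = Python
| designer = Guido van Rossum
| first_appeared = 1991
}}"""

if __name__ == "__main__":
    kind, attrs = parse_infobox_wikitext(SAMPLE)
    print(kind)   # programming language
    print(attrs)  # {'name': 'Python', 'designer': 'Guido van Rossum', 'first_appeared': '1991'}
```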

Web Scraping:

Web scraping is the process of extracting large amounts of data from websites and saving it to a file on our computer or to a database, typically in a tabular format.
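The "save in a tabular format" step can be sketched minimally with Python's standard csv module. The following code writes scraped attribute/value pairs to a two-column CSV table. The function name save_fields_as_csv and the sample data are hypothetical.

```python
import csv
import io


def save_fields_as_csv(fields, fileobj):
    """Write a dict of scraped attribute/value pairs as a two-column CSV table."""
    writer = csv.writer(fileobj)
    writer.writerow(["attribute", "value"])  # header row
    for attribute, value in fields.items():
        writer.writerow([attribute, value])


if __name__ == "__main__":
    # Hypothetical scraped data, used only to illustrate the output format.
    scraped = {"Designed by": "Guido van Rossum", "First appeared": "1991"}
    buffer = io.StringIO()
    save_fields_as_csv(scraped, buffer)
    print(buffer.getvalue())
```

You can also write to a file instead of an in-memory buffer. To do that, open the file with open("infobox.csv", "w", newline="", encoding="utf-8") and pass the file object in. The csv module documentation recommends the newline="" argument.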

Many different methods are available for extracting data, but using an API is usually the best option when one exists. Many large websites, such as YouTube, Wikipedia and Google, provide APIs so that we can access their data in a structured manner. Where an API is available, it is generally preferred over web scraping.
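Wikipedia exposes its content through the MediaWiki Action API. The following minimal sketch uses Python 3 and the standard library only. It requests an article's wikitext with action=parse and prop=wikitext from English Wikipedia's endpoint. The helper names and the User-Agent string are placeholder assumptions that you should replace. Wikimedia asks API clients to send a descriptive User-Agent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# English Wikipedia's MediaWiki Action API endpoint.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"
# Placeholder: replace with a descriptive agent and your own contact details.
USER_AGENT = "InfoboxDemo/0.1 (https://example.org/contact)"


def build_parse_url(title):
    """Build an action=parse request URL that returns the page's wikitext as JSON."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # format version 2 returns the wikitext as a plain string
    }
    return API_ENDPOINT + "?" + urlencode(params)


def fetch_wikitext(title):
    """Download the raw wikitext (including any infobox template) of an article."""
    request = Request(build_parse_url(title), headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        data = json.load(response)
    return data["parse"]["wikitext"]
```

For example, fetch_wikitext("Python (programming language)") should return the article's wikitext, starting with its templates. This call requires network access.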

In this article, we will look at how to extract the contents of a Wikipedia infobox.

We can scrape data using two Python modules:

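urllib2: It is used for fetching URLs. It provides a simple interface in the form of the urlopen function and can fetch URLs using a variety of protocols. Note that urllib2 exists only in Python 2; in Python 3 its functionality was split between urllib.request and urllib.error.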

BeautifulSoup:

It is used for extracting information from a web page once its HTML has been downloaded. It can pull out tables, lists and paragraphs, and it lets us apply filters (by tag name, class, id and so on) to select exactly the elements we need. Keep in mind that BeautifulSoup does not fetch the web page for us; that is the job of a module such as urllib2.
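The following is a minimal Python 3 sketch of this approach. Because urllib2 does not exist in Python 3, the code uses urllib.request.urlopen. It assumes the third-party beautifulsoup4 package is installed. It also assumes that the infobox is an HTML table whose class includes infobox, and that each data row pairs a th label with a td value. The names fetch_html and infobox_to_dict, and the User-Agent string, are hypothetical placeholders.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

USER_AGENT = "InfoboxDemo/0.1 (https://example.org/contact)"  # placeholder


def fetch_html(url):
    """Download a page with urlopen and decode it to text."""
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def infobox_to_dict(page_html):
    """Map each infobox row's header text to its value text."""
    soup = BeautifulSoup(page_html, "html.parser")
    # class_ matches any table whose class list contains "infobox".
    table = soup.find("table", class_="infobox")
    if table is None:
        return {}
    fields = {}
    for row in table.find_all("tr"):
        label, value = row.find("th"), row.find("td")
        # Rows without both a th and a td (e.g. title or image rows) are skipped.
        if label is not None and value is not None:
            fields[label.get_text(" ", strip=True)] = value.get_text(" ", strip=True)
    return fields
```

For example, infobox_to_dict(fetch_html("https://en.wikipedia.org/wiki/Python_(programming_language)")) should return the article's infobox labels and values as a dictionary. This call requires network access.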

There is another way to scrape data, which requires the following modules:

lxml: It is an easy-to-use, feature-rich library for processing XML and HTML in Python.

requests: It is an Apache 2-licensed HTTP library for Python that lets us send HTTP/1.1 requests. We can add headers, form data, multipart files and query parameters using simple Python dictionaries, and we can access the response data just as easily.
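Here is a comparable minimal sketch of the second approach. It assumes the requests and lxml packages are installed. requests downloads the page, and lxml parses it with an XPath query that selects the rows of any table whose class list contains infobox. The names fetch_page and infobox_rows, and the User-Agent value, are hypothetical placeholders. The row layout (a th label followed by a td value) is an assumption about the page's markup.

```python
import requests  # third-party: pip install requests
from lxml import html  # third-party: pip install lxml

HEADERS = {"User-Agent": "InfoboxDemo/0.1 (https://example.org/contact)"}  # placeholder

# Selects rows of any <table> whose class list contains the token "infobox".
INFOBOX_ROWS = (
    '//table[contains(concat(" ", normalize-space(@class), " "), " infobox ")]//tr'
)


def fetch_page(url):
    """Download a page with requests, raising an error on HTTP failures."""
    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()
    return response.text


def _clean(element):
    """Return an element's text with runs of whitespace collapsed to single spaces."""
    return " ".join(element.text_content().split())


def infobox_rows(page_html):
    """Return infobox label -> value pairs using lxml and XPath."""
    tree = html.fromstring(page_html)
    fields = {}
    for row in tree.xpath(INFOBOX_ROWS):
        labels, values = row.xpath("./th"), row.xpath("./td")
        # Only rows with both a header cell and a data cell are kept.
        if labels and values:
            fields[_clean(labels[0])] = _clean(values[0])
    return fields
```

The concat/normalize-space idiom matches infobox as a whole class token rather than as a substring. This keeps the query from accidentally matching unrelated tables.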

The code for this example is written in Python 2.7.

[Image: Fetching Text from Infobox of Wikipedia in Python for Data Science]

After running the program, we get the following output:

To learn more about Python for data science, you can also check this and this.
