Indexing and Selecting Data using Pandas:

Indexing in Pandas Python for Data Science:

Indexing, also known as subset selection, means selecting specific rows and columns of data from a data frame. We can select all of the rows and some of the columns, some of the rows and all of the columns, or some of the rows and some of the columns.

Selection of some rows and some columns:

The examples use a data frame of player records whose row labels are player names (such as Amir Johnson, Terry Rozier and John Holland) and whose columns include Age, Height, College and Salary.

Suppose we need the columns Age, College and Salary for only the rows labelled Amir Johnson and Terry Rozier. The result is a data frame with two rows and three columns.

Selection of some rows and all columns:

Now we will select the rows Amir Johnson, Terry Rozier, and John Holland along with all columns of the data frame. The result has three rows and every column of the original.

Selecting some columns and all rows:

Now we want to select the columns Age, Height, and Salary with all rows of the data frame. The result keeps every row but only those three columns.

Pandas Indexing using [ ], .loc[ ], .iloc[ ], .ix[ ]:

There are many ways to pull elements, rows, and columns out of a data frame. The indexing methods in Pandas look similar, but they behave differently. In Pandas, there are four types of multi-axes indexing:

  • [ ]: the indexing operator, used mainly to select columns by name.
  • .loc[ ]: an indexer that selects by label.
  • .iloc[ ]: an indexer that selects by integer position.
  • .ix[ ]: an older indexer that accepted both labels and integer positions. It was deprecated in pandas 0.20 and removed in pandas 1.0.

These are collectively called indexers and are the most common ways of indexing data.

Indexing a DataFrame using the indexing operator [ ]:

The indexing operator is the pair of square brackets that follows an object. The .loc and .iloc indexers also use square brackets to make selections, but in this section the indexing operator refers to brackets applied directly to the data frame, df[ ].

Selecting a single column:

To select a single column, put the column name between the brackets. The result is a Series.
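A minimal sketch of single-column selection. The data frame `df` and its values are invented placeholders for illustration, since the original data is not reproduced here:

```python
import pandas as pd

# Placeholder values, invented for illustration only.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# A single column name inside the brackets returns that column as a Series.
age = df["Age"]
print(age)
```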

Selecting multiple columns:

To select multiple columns, pass a list of column names to the indexing operator. The result is a data frame.
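A short sketch of multi-column selection, again assuming a hypothetical `df` with invented placeholder values. Note the double brackets: the outer pair is the indexing operator and the inner pair is the Python list.

```python
import pandas as pd

# Placeholder values, invented for illustration only.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# A list of column names returns a DataFrame with just those columns.
subset = df[["Age", "College", "Salary"]]
print(subset)
```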

Indexing DataFrame using .loc[ ]:

The .loc indexer selects data by the labels of rows and columns. It works differently from the indexing operator: it can select subsets of rows, subsets of columns, or both simultaneously.

Selecting a single row:

To select a single row with .loc[ ], put a single row label inside the brackets.
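A minimal sketch of selecting one row by label, assuming a hypothetical `df` indexed by player name with invented placeholder values:

```python
import pandas as pd

# Placeholder values, invented for illustration only.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# A single row label returns that row as a Series indexed by column name.
row = df.loc["Amir Johnson"]
print(row)
```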

Selection of multiple rows:

To select multiple rows, put the row labels in a list and pass the list to .loc[ ].
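A short sketch of selecting several rows by label, assuming the same hypothetical `df` with invented placeholder values:

```python
import pandas as pd

# Placeholder values, invented for illustration only.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# A list of row labels returns a DataFrame with those rows and all columns.
rows = df.loc[["Amir Johnson", "Terry Rozier"]]
print(rows)
```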

Selection of two rows and three columns:

To select two rows and three columns, pass a list of the row labels and a separate list of the column labels to .loc[ ], separated by a comma.
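A sketch of selecting rows and columns together, using the rows and columns named in the introduction. The `df` and its values are invented placeholders:

```python
import pandas as pd

# Placeholder values, invented for illustration only.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# First list: row labels. Second list: column labels.
block = df.loc[["Amir Johnson", "Terry Rozier"], ["Age", "College", "Salary"]]
print(block)
```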

Selecting all rows and some columns:

To select all rows and some columns, use a single colon (:) in the row position to select every row, followed by a list of the columns we want.
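A minimal sketch of the colon form with .loc[ ], assuming a hypothetical `df` with invented placeholder values:

```python
import pandas as pd

# Placeholder values, invented for illustration only.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# The bare colon means "every row"; the list picks the columns by label.
cols = df.loc[:, ["Age", "Height", "Salary"]]
print(cols)
```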

Indexing DataFrame using .iloc[ ]:

The .iloc indexer retrieves rows and columns by position. We specify the integer positions of the rows we want and, optionally, the integer positions of the columns. The .iloc and .loc indexers work in the same way, except that .iloc uses only integer locations (counted from 0) to make selections.

Selecting a single row:

To select a single row with .iloc[ ], pass a single integer position.
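A minimal sketch of selecting one row by position, assuming a hypothetical `df` with invented placeholder values. Positions start at 0, so position 1 is the second row:

```python
import pandas as pd

# Placeholder values, invented for illustration only.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# A single integer returns the row at that position as a Series.
row = df.iloc[1]
print(row)
```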

Selecting multiple rows:

To select multiple rows, pass a list of integer positions to .iloc[ ].
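A short sketch of selecting several rows by position, assuming the same hypothetical `df` with invented placeholder values:

```python
import pandas as pd

# Placeholder values, invented for illustration only.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# Positions 0 and 2 are the first and third rows.
rows = df.iloc[[0, 2]]
print(rows)
```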

Selecting two rows and two columns:

To select two rows and two columns, pass a list of two integer positions for the rows and a list of two integer positions for the columns to .iloc[ ], separated by a comma.
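A sketch of selecting rows and columns by position together. The `df`, its column order, and its values are invented placeholders:

```python
import pandas as pd

# Placeholder values, invented for illustration only.
# Column positions: 0 = Age, 1 = Height, 2 = College, 3 = Salary.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# First list: row positions. Second list: column positions.
block = df.iloc[[0, 2], [0, 3]]
print(block)
```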

Selecting all rows and some columns:

To select all rows and some columns, use a single colon (:) in the row position to select every row, followed by a list of the integer positions of the columns.
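A minimal sketch of the colon form with .iloc[ ], assuming a hypothetical `df` with invented placeholder values:

```python
import pandas as pd

# Placeholder values, invented for illustration only.
# Column positions: 0 = Age, 1 = Height, 2 = College, 3 = Salary.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# The bare colon means "every row"; the list picks columns by position.
cols = df.iloc[:, [0, 2]]
print(cols)
```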

Indexing a DataFrame using .ix[ ]:

Early versions of Pandas had another indexer, .ix, which could select by both label and integer position. It was versatile but caused a lot of confusion because it was not explicit: integers can themselves be labels for rows and columns, so some selections were ambiguous. In general, .ix was label based and behaved like the .loc indexer. It was deprecated in pandas 0.20 and removed in pandas 1.0, so current code should use .loc or .iloc instead.
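The ambiguity is easiest to see when the row labels are themselves integers. The sketch below uses a hypothetical frame with made-up values and the explicit indexers, where the same integer means two different things:

```python
import pandas as pd

# Hypothetical frame whose row labels are integers in non-sequential order.
df = pd.DataFrame({"value": ["a", "b", "c"]}, index=[2, 0, 1])

# .loc treats 0 as a label: the row whose index label is 0.
by_label = df.loc[0, "value"]

# .iloc treats 0 as a position: the first row, whatever its label.
by_position = df.iloc[0, 0]

print(by_label, by_position)
```

With a single indexer that accepts both kinds of input, a reader cannot tell from the call alone which of these two rows is meant.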

Selection of a single row using .ix[] as .loc[]:

To select a single row by label, put a single row label inside .ix[ ]. In that case .ix[ ] behaves like .loc[ ].

Selection of a single row using .ix[] as .iloc[]:

To select a single row by position, pass a single integer to .ix[ ]. In that case .ix[ ] behaves like .iloc[ ], provided the row labels are not themselves integers.
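Because .ix is no longer available in pandas 1.0 and later, the two selections described in this section can be written with the explicit indexers instead. A minimal sketch, assuming a hypothetical `df` with invented placeholder values:

```python
import pandas as pd

# Placeholder values, invented for illustration only.
df = pd.DataFrame(
    {
        "Age": [30, 25, 28],
        "Height": [200, 190, 195],
        "College": ["College A", "College B", "College C"],
        "Salary": [1000, 2000, 3000],
    },
    index=["Amir Johnson", "Terry Rozier", "John Holland"],
)

# Label-based selection: the modern replacement for df.ix["Terry Rozier"].
by_label = df.loc["Terry Rozier"]

# Position-based selection: the modern replacement for df.ix[1].
by_position = df.iloc[1]

# Both refer to the same row here, because "Terry Rozier" is at position 1.
print(by_label.equals(by_position))
```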
