Working with Missing Data in Pandas:

Missing data occurs when no information is provided for one or more items, or for a whole unit. In Pandas, missing data is also referred to as NA (Not Available) values. Many of the datasets we load into a DataFrame contain missing data.

Missing data is represented in Pandas by two values:

  • None: It is a Python singleton object which is used for missing data in Python.
  • NaN: It is the short form of Not A Number. It is a special floating point value which is recognized by all systems that use standard IEEE floating point representation.

In Pandas, None and NaN are used interchangeably for indicating missing values. To work with them, Pandas provides several functions for detecting, deleting, and replacing null values in a DataFrame:

  • isnull()
  • notnull()
  • dropna()
  • fillna()
  • replace()
  • interpolate()
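As a quick illustration of None and NaN being treated alike, here is a minimal sketch. The Series `s` and its values are hypothetical and chosen only for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical Series that mixes both missing-value markers.
s = pd.Series([1.0, None, np.nan, 4.0])

# In a numeric Series, None is stored as NaN, so the dtype stays float64.
print(s.dtype)

# Both the None entry and the NaN entry are reported as missing.
print(s.isnull())
```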

Checking missing values using isnull():

To check for missing values in a Pandas DataFrame we use the isnull() function. It returns a DataFrame of Boolean values that are True wherever the original value is NaN and False everywhere else.

Code 1:
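A minimal sketch of this check, assuming a small hypothetical DataFrame of test scores (the column names and values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical score table with a few missing entries.
data = {"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
        "Third Score": [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Boolean DataFrame of the same shape: True where a value is missing.
mask = df.isnull()
print(mask)
```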

Code 2:
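The same function can be used to filter rows. The sketch below assumes a hypothetical employees table with a Gender column; the CSV text is read from an in-memory string with `io.StringIO` as a stand-in for a real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for an employees CSV file; empty fields are read as NaN.
csv_text = """First Name,Gender,Salary
Douglas,Male,97308
Thomas,,61933
Maria,Female,130590
Angela,,54568
"""
employees = pd.read_csv(io.StringIO(csv_text))

# Boolean Series that is True where Gender is missing.
gender_missing = employees["Gender"].isnull()

# Keep only the rows where Gender is missing.
missing_rows = employees[gender_missing]
print(missing_rows)
```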

Only the rows in which Gender is null are displayed.

Checking missing values using notnull():

To check for values that are not missing in a Pandas DataFrame we use the notnull() function. It returns a DataFrame of Boolean values that are True for non-null values and False for NaN values.

Code 3:
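A minimal sketch of notnull(), again assuming a hypothetical DataFrame of test scores:

```python
import numpy as np
import pandas as pd

# Hypothetical score table with a few missing entries.
data = {"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
        "Third Score": [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Boolean DataFrame: True where a value is present, False where it is missing.
present = df.notnull()
print(present)
```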

Code 4:
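To keep only rows with a known Gender, notnull() can be used as a filter. This sketch assumes a hypothetical employees table read from an in-memory string in place of a real CSV file.

```python
import io
import pandas as pd

# Hypothetical stand-in for an employees CSV file; empty fields are read as NaN.
csv_text = """First Name,Gender,Salary
Douglas,Male,97308
Thomas,,61933
Maria,Female,130590
Angela,,54568
"""
employees = pd.read_csv(io.StringIO(csv_text))

# Boolean Series that is True where Gender is present.
gender_present = employees["Gender"].notnull()

# Keep only the rows where Gender is present.
known_rows = employees[gender_present]
print(known_rows)
```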

Only the rows in which Gender is not null are displayed.

Filling of missing values using fillna(), replace() and interpolate():

All these functions replace null values with some other value. With fillna() and replace() we specify the replacement ourselves, while interpolate() fills NA values using an interpolation technique instead of a hard-coded value.

Code 1: To fill null values with a single value.
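A minimal sketch, assuming a hypothetical DataFrame of test scores and 0 as the replacement value:

```python
import numpy as np
import pandas as pd

# Hypothetical score table with a few missing entries.
data = {"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
        "Third Score": [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Replace every missing value with 0; the original df is left unchanged.
filled = df.fillna(0)
print(filled)
```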

Code 2: To fill null values with the previous ones.
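A minimal sketch of forward filling, assuming a hypothetical DataFrame of test scores. It uses ffill(), which propagates the last valid value in each column downward; a missing value in the first row has nothing before it and stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical score table with a few missing entries.
data = {"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
        "Third Score": [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Fill each missing value with the previous valid value in the same column.
filled = df.ffill()
print(filled)
```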

Code 3: To fill null values with the succeeding ones.
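A minimal sketch of backward filling, assuming a hypothetical DataFrame of test scores. It uses bfill(), which propagates the next valid value in each column upward; a missing value in the last row has nothing after it and stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical score table with a few missing entries.
data = {"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
        "Third Score": [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Fill each missing value with the next valid value in the same column.
filled = df.bfill()
print(filled)
```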

Code 4: To fill null values in CSV files.

Now we will fill all null values in the Gender column with “No Gender”.
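A minimal sketch of this step, assuming a hypothetical employees table. The CSV text is read from an in-memory string with `io.StringIO` in place of a real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for an employees CSV file; empty fields are read as NaN.
csv_text = """First Name,Gender,Salary
Douglas,Male,97308
Thomas,,61933
Maria,Female,130590
Angela,,54568
"""
employees = pd.read_csv(io.StringIO(csv_text))

# Replace missing Gender values with the label "No Gender".
employees["Gender"] = employees["Gender"].fillna("No Gender")
print(employees)
```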

Code 5: To fill null values using the replace() method.

Now we will replace all NaN values in the data frame with the value -99.
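A minimal sketch of this replacement, assuming a hypothetical all-numeric DataFrame of test scores:

```python
import numpy as np
import pandas as pd

# Hypothetical score table with a few missing entries.
data = {"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, 45, 56, np.nan],
        "Third Score": [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Replace every NaN with the sentinel value -99.
replaced = df.replace(to_replace=np.nan, value=-99)
print(replaced)
```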

Code 6: Usage of interpolate() function for filling missing values using linear method.

Now we will interpolate the missing values using the linear method, which ignores the index and treats the values as equally spaced.
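A minimal sketch of linear interpolation, assuming a hypothetical numeric DataFrame. With limit_direction="forward", a missing value in the first row has no earlier point to interpolate from and stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric table with gaps in every column.
df = pd.DataFrame({"A": [12, 4, 5, np.nan, 1],
                   "B": [np.nan, 2, 54, 3, np.nan],
                   "C": [20, 16, np.nan, 3, 8],
                   "D": [14, 3, np.nan, np.nan, 6]})

# Fill gaps by drawing a straight line between the neighbouring valid values.
result = df.interpolate(method="linear", limit_direction="forward")
print(result)
```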

Dropping missing values using dropna():

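The dropna() function drops rows or columns that contain null values. Its parameters control whether rows or columns are dropped, and whether a single null value is enough or all values must be null.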

Code 1: Dropping rows with at least one null value.

Now we will drop rows with at least one NaN value.
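A minimal sketch, assuming a hypothetical DataFrame of test scores in which only the last row is complete:

```python
import numpy as np
import pandas as pd

# Hypothetical score table; only the last row has no missing values.
data = {"First Score": [100, 90, np.nan, 95],
        "Second Score": [30, np.nan, 45, 56],
        "Third Score": [52, 40, 80, 98],
        "Fourth Score": [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(data)

# Default behaviour: drop every row that contains at least one NaN.
cleaned = df.dropna()
print(cleaned)
```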

Code 2: Dropping of rows in case all values in that row are missing.

Now we will drop only the rows in which every value is missing. Rows that still contain at least one valid value are kept.
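A minimal sketch using how="all", assuming a hypothetical DataFrame of test scores in which the second row is entirely empty:

```python
import numpy as np
import pandas as pd

# Hypothetical score table; the row at index 1 is missing every value.
data = {"First Score": [100, np.nan, np.nan, 95],
        "Second Score": [30, np.nan, 45, 56],
        "Third Score": [52, np.nan, 80, 98],
        "Fourth Score": [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(data)

# Drop a row only if all of its values are NaN.
cleaned = df.dropna(how="all")
print(cleaned)
```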

Code 3: Dropping columns having at least one null value.

Now we will drop columns having at least one missing value.
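A minimal sketch using axis=1, assuming a hypothetical DataFrame of test scores in which two of the four columns contain missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical score table; the first two columns contain missing values.
data = {"First Score": [100, np.nan, np.nan, 95],
        "Second Score": [30, np.nan, 45, 56],
        "Third Score": [52, 40, 80, 98],
        "Fourth Score": [60, 67, 68, 65]}
df = pd.DataFrame(data)

# axis=1 drops columns instead of rows.
cleaned = df.dropna(axis=1)
print(cleaned)
```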

Code 4: Dropping rows with at least one null value in CSV file.

Now let us compare the sizes of the two data frames, so that we know how many rows had at least one null value.
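A minimal sketch of dropping the rows and comparing the sizes, assuming a hypothetical employees table. The CSV text is read from an in-memory string with `io.StringIO` in place of a real file.

```python
import io
import pandas as pd

# Hypothetical stand-in for an employees CSV file; empty fields are read as NaN.
csv_text = """First Name,Gender,Salary
Douglas,Male,97308
Thomas,,61933
Maria,Female,130590
Angela,Female,
"""
employees = pd.read_csv(io.StringIO(csv_text))

# Drop every row that has at least one missing value.
cleaned = employees.dropna(axis=0, how="any")

# The difference in length is the number of rows that had a null value.
print("Rows in original data frame:", len(employees))
print("Rows after dropping nulls:", len(cleaned))
print("Rows with at least one null value:", len(employees) - len(cleaned))
```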

