Missing data occurs when no information is provided for one or more items, or for a whole unit. In Pandas, missing data is commonly referred to as NA values. Many datasets that we load into a DataFrame arrive with missing data, so handling it is a routine part of data science work.
Missing data is represented in Pandas by two values:
- None: It is a Python singleton object which is used for missing data in Python.
- NaN: It is the short form of Not A Number. It is a special floating point value which is recognized by all systems that use standard IEEE floating point representation.
In Pandas, None and NaN are used interchangeably to indicate missing values. To support this convention, Pandas provides several functions for detecting, deleting, and replacing null values in a DataFrame: isnull(), notnull(), dropna(), fillna(), replace(), and interpolate().
Checking missing values using isnull():
To check for missing values in a Pandas DataFrame, we use the isnull() function. It returns a DataFrame of Boolean values that are True wherever the original value is missing (None or NaN) and False everywhere else.
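A minimal sketch of isnull(), assuming a small made-up DataFrame (the names, genders, and scores are hypothetical, chosen only for illustration). It shows both the element-wise mask and the use of a single column's Boolean Series to select rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; Gender has two missing entries, Score has one.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dina"],
    "Gender": ["Female", None, "Male", np.nan],
    "Score": [90.0, np.nan, 72.0, 85.0],
})

# Element-wise mask: True wherever a value is missing.
mask = df.isnull()
print(mask)

# Boolean Series for one column, used to keep only the rows
# in which Gender is missing.
missing_gender = df[df["Gender"].isnull()]
print(missing_gender)
```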
When the Boolean Series for a single column such as Gender is used as a filter, only the rows in which Gender is null are displayed.
Checking missing values using notnull():
To check for values that are present in a Pandas DataFrame, we use the notnull() function. It is the inverse of isnull(): it returns a DataFrame of Boolean values that are True for non-missing values and False for NaN values.
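The same idea can be sketched for notnull(), again assuming a small hypothetical DataFrame rather than any particular dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; Gender has two missing entries, Score has one.
df = pd.DataFrame({
    "Name": ["Asha", "Ben", "Chen", "Dina"],
    "Gender": ["Female", None, "Male", np.nan],
    "Score": [90.0, np.nan, 72.0, 85.0],
})

# Element-wise mask: True wherever a value is present.
present = df.notnull()
print(present)

# Keep only the rows in which Gender is present.
has_gender = df[df["Gender"].notnull()]
print(has_gender)
```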
When the Boolean Series for the Gender column is used as a filter, only the rows in which Gender is not null are displayed.
Filling of missing values using fillna(), replace() and interpolate():
All of these functions replace null values with other values. The interpolate() function differs from the other two in that it fills NA values using an interpolation technique rather than a hard-coded value.
Code 1: To fill null values with single values.
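A minimal sketch of filling with a single value, assuming a small hypothetical DataFrame of scores and 0 as the fill value.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one missing value per column.
df = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, np.nan],
    "Third": [np.nan, 40, 80, 98],
})

# fillna() returns a new DataFrame; df itself is left unchanged.
filled = df.fillna(0)
print(filled)
```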
Code 2: To fill null values with previous ones.
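Forward filling can be sketched as follows, assuming the same kind of hypothetical DataFrame. This sketch uses ffill(), which propagates the last valid value in each column downward; a missing value in the first row has nothing before it and therefore stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one missing value per column.
df = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, np.nan],
    "Third": [np.nan, 40, 80, 98],
})

# Each NaN takes the value from the row above it in the same column.
filled = df.ffill()
print(filled)
```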
Code 3: To fill null values with the succeeding ones.
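Backward filling is the mirror image. The sketch below assumes the same hypothetical DataFrame and uses bfill(), which takes each replacement from the next valid value in the column; a missing value in the last row has nothing after it and stays NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one missing value per column.
df = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, np.nan],
    "Third": [np.nan, 40, 80, 98],
})

# Each NaN takes the value from the row below it in the same column.
filled = df.bfill()
print(filled)
```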
Code 4: To fill null values in data loaded from a CSV file.
Now we will fill all null values in the Gender column with “No Gender”.
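A minimal sketch of this step. Since no CSV file is available here, the example reads a small inline CSV through io.StringIO; the column names and rows are hypothetical stand-ins for a real file, which would be passed to read_csv() by path instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
# Empty fields are read as NaN by read_csv().
csv_text = """Name,Gender,Salary
Asha,Female,52000
Ben,,61000
Chen,Male,
Dina,,47000
"""

data = pd.read_csv(io.StringIO(csv_text))

# Fill missing entries in the Gender column only.
data["Gender"] = data["Gender"].fillna("No Gender")
print(data)
```

Only the Gender column is touched, so the missing Salary value is still NaN afterwards.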
Code 5: To fill a null value using replace() method.
Now we will replace all NaN values in the data frame with the value -99.
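This could look like the sketch below, assuming a hypothetical all-numeric DataFrame so that the sentinel -99 fits every column.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data with one missing value per column.
df = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, np.nan],
    "Third": [np.nan, 40, 80, 98],
})

# replace() swaps every NaN for the sentinel value -99.
replaced = df.replace(to_replace=np.nan, value=-99)
print(replaced)
```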
Code 6: Usage of interpolate() function for filling missing values using linear method.
Now we will interpolate missing values using the linear method. Note that the linear method ignores the index and treats the values as equally spaced.
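A minimal sketch of linear interpolation, assuming a small hypothetical DataFrame. With limit_direction="forward", a leading NaN has no earlier value to interpolate from and is left as NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps inside and at the start of columns.
df = pd.DataFrame({
    "A": [12, 4, 5, np.nan, 1],
    "B": [np.nan, 2, 54, 3, 7],
    "C": [20, 16, np.nan, 3, 8],
    "D": [14, 3, np.nan, np.nan, 6],
})

# Interior gaps are filled on a straight line between their neighbours,
# e.g. the gap in A sits midway between 5 and 1, and the two gaps in D
# are spaced evenly between 3 and 6.
result = df.interpolate(method="linear", limit_direction="forward")
print(result)
```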
Dropping missing values using dropna():
The dropna() function drops rows or columns that contain null values; its parameters control which axis is dropped and how many null values are required.
Code 1: Dropping rows that have at least one null value.
Now we will drop rows with at least one NaN value.
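A minimal sketch, assuming a hypothetical DataFrame in which only the last row is complete. Called with no arguments, dropna() removes every row that contains at least one NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the row at index 3 has no missing values.
df = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, np.nan, 45, 56],
    "Third": [52, 40, 80, 98],
    "Fourth": [np.nan, np.nan, np.nan, 65],
})

# Default behaviour: axis=0 (rows), how="any".
cleaned = df.dropna()
print(cleaned)
```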
Code 2: Dropping of rows in case all values in that row are missing.
Now we will drop only those rows in which every value is missing. Rows that contain some null values alongside valid data are kept.
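The difference from the default behaviour can be sketched with how="all", assuming a hypothetical DataFrame that has one fully empty row and one partially empty row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is entirely missing, row 2 only partly.
df = pd.DataFrame({
    "First": [100, np.nan, np.nan, 95],
    "Second": [30, np.nan, 45, 56],
    "Third": [52, np.nan, 80, 98],
})

# how="all" drops a row only when every value in it is NaN.
cleaned = df.dropna(how="all")
print(cleaned)
```

Row 2 survives because it still holds valid values in Second and Third.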
Code 3: Dropping columns having at least one null value.
Now we will drop columns having at least one missing value.
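A minimal sketch of column-wise dropping, assuming a hypothetical DataFrame in which two of the four columns contain a NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data; First and Third each contain one missing value.
df = pd.DataFrame({
    "First": [100, 90, np.nan, 95],
    "Second": [30, 45, 56, 60],
    "Third": [52, 40, 80, np.nan],
    "Fourth": [60, 67, 68, 65],
})

# axis=1 makes dropna() operate on columns instead of rows.
cleaned = df.dropna(axis=1)
print(cleaned)
```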
Code 4: Dropping rows with at least one null value in data loaded from a CSV file.
Now let us compare the sizes of the two data frames, so that we know how many rows had at least one null value.
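Both steps can be sketched together. As no CSV file is available here, the example reads a small inline CSV through io.StringIO; the column names and rows are hypothetical, and a real file would be passed to read_csv() by path instead.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """Name,Gender,Salary
Asha,Female,52000
Ben,,61000
Chen,Male,
Dina,,47000
Emil,Male,58000
"""

data = pd.read_csv(io.StringIO(csv_text))

# Drop every row that has at least one missing value.
new_data = data.dropna(axis=0, how="any")

# The difference in length is the number of rows that had a null value.
rows_removed = len(data) - len(new_data)
print("Rows in original data frame:", len(data))
print("Rows in new data frame:", len(new_data))
print("Rows with at least one null value:", rows_removed)
```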