Standardization, or feature scaling, is a data pre-processing step applied to the independent variables (features) of a dataset. It normalizes the data within a particular range and can also speed up the computations of machine-learning algorithms.
Formula in the backend:
Standardization replaces each value with its Z score:
x' = (x - mean) / standard deviation
In Python, this is typically done with scikit-learn, whose fit method learns the scaling parameters (such as the mean and standard deviation) from the data, and whose transform method applies them.
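The Z-score replacement above can be sketched in plain Python without any external library; the sample values here are invented for illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    """Replace each value with its Z score: (x - mean) / std."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(x - mu) / sigma for x in values]

feature = [10.0, 20.0, 30.0, 40.0]
scaled = standardize(feature)
print(scaled)  # values are now centered around 0 with unit spread
```

After standardization the feature has mean 0 and standard deviation 1, which is exactly what scikit-learn's StandardScaler computes with fit and applies with transform.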
Where and why to apply feature scaling?
Real-world datasets have features that vary in magnitude, units, and range. Normalization is performed when the scale of a feature is irrelevant or misleading, and is not performed when the scale is meaningful.
Algorithms that use Euclidean distance are sensitive to magnitudes, so feature scaling helps them weigh all features equally. If one feature in the dataset is on a much bigger scale than the others, it dominates the Euclidean distance when it is measured. Normalization removes this dominance.
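The dominance effect can be seen with a small sketch; the two points and the feature ranges below are invented for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Feature 1 lies in [0, 1]; feature 2 lies in [0, 10000].
p = (0.1, 3000.0)
q = (0.9, 3100.0)

# The raw distance is driven almost entirely by the large-scale feature 2.
print(euclidean(p, q))  # ≈ 100

# After min-max normalizing feature 2 into [0, 1], both features contribute.
p_norm = (0.1, 3000.0 / 10000.0)
q_norm = (0.9, 3100.0 / 10000.0)
print(euclidean(p_norm, q_norm))  # ≈ 0.8
```

Before scaling, the difference of 0.8 in feature 1 is invisible next to the difference of 100 in feature 2; after scaling, both differences influence the distance.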
Examples of Algorithms where feature scaling matters:
- K-Means uses Euclidean distance measures, so feature scaling matters.
- K-Nearest Neighbors also relies on distances between points, so it requires feature scaling.
- Principal Component Analysis (PCA) tries to find the features with maximum variance, so feature scaling is necessary here too.
- Gradient descent: the theta calculation becomes faster after feature scaling, which increases the calculation speed.
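The gradient-descent point in the list above can be demonstrated with a minimal sketch: fitting a line y = w*x + b with a fixed stopping tolerance, once on a raw large-scale feature and once on its standardized version. The data, learning rates, and tolerance are all invented for illustration.

```python
from statistics import mean, pstdev

def gd_steps(xs, ys, lr=0.1, tol=1e-6, max_iter=100000):
    """Fit y = w*x + b by gradient descent; return iterations used."""
    w = b = 0.0
    n = len(xs)
    for i in range(max_iter):
        err = [w * x + b - y for x, y in zip(xs, ys)]
        gw = 2 / n * sum(e * x for e, x in zip(err, xs))
        gb = 2 / n * sum(err)
        w -= lr * gw
        b -= lr * gb
        if abs(gw) < tol and abs(gb) < tol:
            return i + 1
    return max_iter

xs = [100.0, 200.0, 300.0, 400.0]  # raw, large-scale feature
ys = [1.0, 2.0, 3.0, 4.0]

mu, sigma = mean(xs), pstdev(xs)
xs_scaled = [(x - mu) / sigma for x in xs]

# On the standardized feature, a moderate learning rate converges quickly.
print(gd_steps(xs_scaled, ys))
# On the raw feature, a much smaller learning rate is needed to stay stable,
# and convergence takes far more iterations.
print(gd_steps(xs, ys, lr=0.00001))
```

The raw feature forces a tiny learning rate (a larger one makes the updates blow up), and the mismatch between the steep w direction and the flat b direction makes convergence very slow; standardizing the feature evens out the curvature so the same tolerance is reached in far fewer steps.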