NumPy library in Python – Set 2:

Now let us discuss some of the more advanced NumPy features used in data science:

  1. Stacking:

    We can stack several arrays together along different axes.

  • vstack: Stacks arrays vertically (row-wise), along the first axis.
  • hstack: Stacks arrays horizontally (column-wise).
  • column_stack: Stacks 1-D arrays as columns into a 2-D array.
  • concatenate: Joins arrays along a specified existing axis.
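As a minimal sketch of the four functions listed above, the following uses small made-up arrays (the names a, b, m and n are only illustrative) and notes the shape each call should produce:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

v = np.vstack((a, b))         # rows stacked on top of each other, shape (2, 3)
h = np.hstack((a, b))         # joined end to end, shape (6,)
c = np.column_stack((a, b))   # each 1-D array becomes a column, shape (3, 2)

m = np.array([[1, 2], [3, 4]])
n = np.array([[5, 6]])

cat0 = np.concatenate((m, n), axis=0)     # n appended as a new row, shape (3, 2)
cat1 = np.concatenate((m, n.T), axis=1)   # n.T appended as a new column, shape (2, 3)
```

Note that concatenate needs the arrays to agree on every axis except the one being joined, which is why n is transposed before joining along axis 1.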

NumPy library in Python for Data Science – Set 2- PST Analytics

  1. Splitting:

    We use the following functions for splitting.

  • hsplit: Splits an array horizontally (column-wise) into sub-arrays.
  • vsplit: Splits an array vertically (row-wise) into sub-arrays.
  • array_split: Splits an array along a specified axis and, unlike split, allows the parts to be of unequal size.
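The splitting functions above can be sketched on a small 3 x 4 array. The array and variable names here are assumptions chosen for illustration:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)   # 3 rows, 4 columns, values 0..11

# Split the 4 columns into two equal halves, each of shape (3, 2).
left, right = np.hsplit(arr, 2)

# Split the 3 rows into three pieces, each of shape (1, 4).
top, middle, bottom = np.vsplit(arr, 3)

# 4 columns cannot be divided evenly into 3 parts; array_split still works
# and returns parts with 2, 1 and 1 columns.
parts = np.array_split(arr, 3, axis=1)
part_shapes = [p.shape for p in parts]
```

hsplit and vsplit raise an error when an equal division is not possible, so array_split is the safer choice when the sizes may not divide evenly.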

  1. Broadcasting:

    Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. The smaller array is broadcast across the larger array so that the two have compatible shapes.

Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data, which usually leads to efficient algorithm implementations. In some cases, however, broadcasting is a bad idea because it leads to inefficient use of memory, which slows computation.

NumPy operations are usually done element by element, which requires the two arrays to have exactly the same shape. This constraint is relaxed when the array shapes meet certain conditions: comparing the shapes from the trailing dimension backwards, each pair of dimensions must either be equal or one of them must be 1.
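To make the shape constraint concrete, here is a small sketch with made-up arrays (grid, per_column and per_row are illustrative names). It shows one combination that broadcasts, one that fails, and one common way to fix the failure:

```python
import numpy as np

grid = np.zeros((4, 3))
per_column = np.array([1.0, 2.0, 3.0])      # shape (3,)
per_row = np.array([1.0, 2.0, 3.0, 4.0])    # shape (4,)

# Shapes (4, 3) and (3,): the trailing dimensions are both 3, so this works.
ok = grid + per_column

# Shapes (4, 3) and (4,): the trailing dimensions are 3 and 4, so NumPy
# raises a ValueError instead of guessing.
try:
    grid + per_row
    failed = False
except ValueError:
    failed = True

# Reshaping to (4, 1) makes the trailing dimension 1, which is compatible.
fixed = grid + per_row.reshape(4, 1)
```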


The simplest case of broadcasting occurs when an array a and a scalar value b are combined in an operation.

We can think of the scalar b as being stretched during the arithmetic operation into an array with the same shape as a. The new elements in b are simply copies of the original scalar. Keep in mind that this stretching is only conceptual.

NumPy uses the original scalar value without actually making copies, which makes the operation efficient in both memory and computation.
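A minimal sketch of the scalar case, with values assumed for illustration. The explicitly stretched array is built here only to show that the result is the same; NumPy does not create it internally:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = 2.0

# The scalar b is broadcast across every element of a.
result = a * b

# Conceptually this matches multiplying by an array of copies of b.
stretched = np.array([2.0, 2.0, 2.0])
same = a * stretched
```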

We can make things clear from the below diagram.

Now we will look at an example where both arrays are stretched.
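One way this can be sketched is to add a column of shape (4, 1) to a row of shape (3,). The values below are assumptions, not taken from the original example:

```python
import numpy as np

col = np.array([[0], [10], [20], [30]])   # shape (4, 1)
row = np.array([1, 2, 3])                 # shape (3,)

# col is stretched across 3 columns and row is stretched down 4 rows,
# so the result has shape (4, 3).
total = col + row
```

Each element of total is the sum of one value from col and one value from row, so neither input had the final shape to begin with.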


  1. Working with datetime: NumPy has core array data types that natively support datetime functionality. The data type is called “datetime64”.
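A short sketch of what datetime64 allows. The dates are arbitrary, and the companion type timedelta64 is used here to represent differences between dates:

```python
import numpy as np

start = np.datetime64("2024-01-15")
end = np.datetime64("2024-03-01")

# Subtracting two dates gives a timedelta64 measured in days.
elapsed = end - start

# Adding a timedelta64 moves a date forward.
next_day = start + np.timedelta64(1, "D")

# arange can generate a run of consecutive dates (the end date is excluded).
week = np.arange("2024-01-01", "2024-01-08", dtype="datetime64[D]")
```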

  1. Linear algebra:

    NumPy lets us apply linear algebra operations to any NumPy array, mainly through its linalg module.

We can find:

  • The rank, trace, determinant, etc. of an array.
  • The eigenvalues of a matrix.
  • The product of matrices and vectors, as well as matrix exponentiation.
  • The solution of linear or tensor equations.
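The quantities listed above could be computed as in the following sketch. The diagonal matrix A and vector v are assumptions, chosen so the results are easy to check by hand:

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [0.0, 3.0]])
v = np.array([1.0, 2.0])

rank = np.linalg.matrix_rank(A)               # number of independent rows
trace = np.trace(A)                           # sum of the diagonal entries
det = np.linalg.det(A)                        # determinant
eigenvalues, eigenvectors = np.linalg.eig(A)  # eigenvalues and eigenvectors
product = A @ v                               # matrix-vector product
cubed = np.linalg.matrix_power(A, 3)          # A multiplied by itself 3 times
```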

Suppose we want to solve the following set of equations:

It can be solved using the linalg.solve function.
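As an illustration of linalg.solve, the sketch below uses a hypothetical system, 3x + y = 9 and x + 2y = 8, which is an assumption and not necessarily the system the original example used:

```python
import numpy as np

# Hypothetical system (assumed for illustration):
#   3x +  y = 9
#    x + 2y = 8
coefficients = np.array([[3.0, 1.0],
                         [1.0, 2.0]])
constants = np.array([9.0, 8.0])

solution = np.linalg.solve(coefficients, constants)
x, y = solution

# Substituting the solution back should reproduce the right-hand side.
check = np.allclose(coefficients @ solution, constants)
```

linalg.solve expects a square coefficient matrix of full rank; for over-determined systems a least squares approach is needed instead.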

Now we will see an example of linear regression using the least squares method.

A linear regression line has the form w1x + w2 = y. This line minimizes the sum of the squared distances from each data point to the line. So, if we have n pairs of data (xi, yi), the parameters we are looking for are the w1 and w2 that minimize this error.
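A minimal sketch of such a fit using np.linalg.lstsq. The four data points are made up for illustration, and the design matrix is built with a column of ones so that w2 acts as the intercept:

```python
import numpy as np

# Hypothetical data points (xi, yi), assumed for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([-1.0, 0.2, 0.9, 2.1])

# Each row is [xi, 1], so design @ [w1, w2] gives w1*xi + w2.
design = np.column_stack((x, np.ones_like(x)))

# lstsq returns (solution, residuals, rank, singular values); only the
# solution is needed here.
weights = np.linalg.lstsq(design, y, rcond=None)[0]
w1, w2 = weights

# Predicted y values on the fitted line.
fitted = w1 * x + w2
```

For this data the fit should give a slope w1 close to 1.0 and an intercept w2 close to -0.95.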



So, to learn more about numpy in python for data science, you can check this and this as well.
