Numpy With Python For Data Science

by Harun-Ur-Rashid (Shimanto), August 15th, 2018

Too Long; Didn't Read

In Part 1 (https://hackernoon.com/10-ways-to-make-python-a-dangerous-language-for-data-science-6b88566ac040) of the Data Science With Python series, we looked at the basic built-in functions for numerical computing in Python. In this part, we will take a look at the NumPy library.

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

a powerful N-dimensional array object
sophisticated (broadcasting) functions
tools for integrating C/C++ and Fortran code
useful linear algebra, Fourier transform, and random number capabilities
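
The "broadcasting" item in the list above refers to how NumPy combines arrays of different shapes without explicit loops. A minimal sketch, using made-up arrays named matrix and row that are not part of the original article:

```python
import numpy as np

# A (3, 3) matrix and a length-3 row vector (both invented for illustration).
matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])
row = np.array([10, 20, 30])

# Broadcasting stretches `row` across every row of `matrix`,
# so no explicit loop or manual tiling is needed.
shifted = matrix + row
print(shifted)

# A scalar is broadcast to every element.
doubled = matrix * 2
print(doubled)
```

Here row is added to each of the three rows of matrix, and the scalar 2 is applied to every element.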

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
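
To illustrate the point about user-defined data types, here is a minimal sketch of a structured dtype. The field names (name, age, score) and the sample records are invented for illustration and do not come from the original article.

```python
import numpy as np

# Hypothetical record layout: a short name, an integer age, a float score.
person = np.dtype([("name", "U10"), ("age", "i4"), ("score", "f8")])

people = np.array([("Ada", 36, 91.5), ("Linus", 28, 84.0)], dtype=person)

# Fields are accessed by name and behave like ordinary arrays.
print(people["name"])
print(people["age"].mean())  # 32.0
```

Each field can be sliced, filtered, and aggregated like a regular array, which is what makes this layout convenient for table-like data.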

Great, let’s see how to use the Numpy library for basic array manipulation.

The Numpy library

First, we need to import numpy in Python.

import numpy as np

Let’s create a numpy array.

np.array([4,5,6])

Output : array([4, 5, 6])

Now, let’s create a multi-dimensional array.

mul = np.array([[4,5,6],[7,8,9],[10,11,12]])
mul

Output : array([[ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]])

Check the shape (rows and columns of the array).

mul.shape

Output : (3, 3)

Create an evenly spaced array between 1 and 60 with a difference of 2.

dif = np.arange(1,60,2)
dif

Output : array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59])

Reshape the above array into a desired shape.

dif.reshape(10,3)

Output : array([[ 1, 3, 5], [ 7, 9, 11], [13, 15, 17], [19, 21, 23], [25, 27, 29], [31, 33, 35], [37, 39, 41], [43, 45, 47], [49, 51, 53], [55, 57, 59]])

Generate 40 evenly spaced values over the interval from 1 to 8, with both endpoints included. (Take a minute here to understand the difference between ‘linspace’ and ‘arange’.)

gen = np.linspace(1,8,40)
gen

Output: array([1. , 1.17948718, 1.35897436, 1.53846154, 1.71794872, 1.8974359 , 2.07692308, 2.25641026, 2.43589744, 2.61538462, 2.79487179, 2.97435897, 3.15384615, 3.33333333, 3.51282051, 3.69230769, 3.87179487, 4.05128205, 4.23076923, 4.41025641, 4.58974359, 4.76923077, 4.94871795, 5.12820513, 5.30769231, 5.48717949, 5.66666667, 5.84615385, 6.02564103, 6.20512821, 6.38461538, 6.56410256, 6.74358974, 6.92307692, 7.1025641 , 7.28205128, 7.46153846, 7.64102564, 7.82051282, 8. ])
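
The difference between the two functions can be sketched on a smaller interval. The variable names by_step and by_count are made up for this illustration:

```python
import numpy as np

# arange: you choose the step; the stop value is excluded.
by_step = np.arange(1, 8, 2)

# linspace: you choose how many points; the stop value is included by default.
by_count = np.linspace(1, 8, 4)

print(by_step)   # [1 3 5 7]
print(by_count)  # four floats running from 1.0 to 8.0
```

In short, arange is the natural choice when the spacing matters, and linspace when the number of points and the exact endpoints matter.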

Now, change the shape of the array in place (‘resize’ function changes the shape of the array in place, unlike ‘reshape’)

gen.resize(10,4)
gen

Output: array([[1. , 1.17948718, 1.35897436, 1.53846154], [1.71794872, 1.8974359 , 2.07692308, 2.25641026], [2.43589744, 2.61538462, 2.79487179, 2.97435897], [3.15384615, 3.33333333, 3.51282051, 3.69230769], [3.87179487, 4.05128205, 4.23076923, 4.41025641], [4.58974359, 4.76923077, 4.94871795, 5.12820513], [5.30769231, 5.48717949, 5.66666667, 5.84615385], [6.02564103, 6.20512821, 6.38461538, 6.56410256], [6.74358974, 6.92307692, 7.1025641 , 7.28205128], [7.46153846, 7.64102564, 7.82051282, 8. ]])
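
The contrast between the two methods can be checked directly. This is a small sketch with made-up arrays (values, owned), not code from the original article:

```python
import numpy as np

values = np.arange(6)

# reshape returns a new array object (usually a view of the same data);
# `values` keeps its original shape.
reshaped = values.reshape(2, 3)
print(values.shape, reshaped.shape)  # (6,) (2, 3)

# resize changes the array itself and returns None.
owned = np.arange(6)
result = owned.resize(3, 2)
print(owned.shape, result)  # (3, 2) None
```

Because resize returns None, writing something like x = x.resize(...) would discard the array, which is a common slip when switching between the two methods.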

Create an array with all elements as ones.

onarr = np.ones((4,4))
onarr

Output: array([[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.]])

Create an array filled with zeros.

zearr = np.zeros((4,4))
zearr

Output: array([[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]])

Create an identity matrix (ones on the diagonal, zeros everywhere else).

dm = np.eye(3)
dm

Output: array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])

Extract only diagonal values from an array.

np.diag(dm)

Output: array([1., 1., 1.])

Create an array from a repeated list.

relist = np.array([1,2,3]*7)
relist

Output: array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])

Now, repeat each element of an array n times using the repeat function.

np.repeat([1,2,3],3)

Output : array([1, 1, 1, 2, 2, 2, 3, 3, 3])
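
Repeating a whole sequence and repeating each element are easy to confuse, so here they are side by side using np.tile and np.repeat. A minimal sketch; the array name base is made up for illustration:

```python
import numpy as np

base = np.array([1, 2, 3])

# tile repeats the whole sequence, like [1, 2, 3] * 3 on a Python list.
tiled = np.tile(base, 3)

# repeat repeats each element in turn.
repeated = np.repeat(base, 3)

print(tiled)     # [1 2 3 1 2 3 1 2 3]
print(repeated)  # [1 1 1 2 2 2 3 3 3]
```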

Generate two arrays of desired shape filled with random values between 0 and 1.

relist = np.random.rand(2,3)
print(relist)
de = np.random.rand(2,3)
print(de)

Output :

[[0.55523672 0.46815197 0.67590369] [0.5331193 0.62780236 0.45044916]]

[[0.26215572 0.07380256 0.06592746] [0.89782279 0.95603968 0.82052478]]
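
The values above will differ on every run because no seed is set. If reproducible draws are needed, one option is NumPy's Generator API (np.random.default_rng, available in NumPy 1.17 and later) rather than the np.random.rand call used in this article. A minimal sketch, with 42 as an arbitrary seed:

```python
import numpy as np

# Seed 42 is an arbitrary choice for illustration.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.random((2, 3))
second = rng_b.random((2, 3))

# Two generators created with the same seed produce identical draws.
print(np.array_equal(first, second))  # True
```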

Stack the two arrays created above vertically.

st = np.vstack([de,relist])
st

Output :

array([[0.26215572, 0.07380256, 0.06592746], [0.89782279, 0.95603968, 0.82052478], [0.55523672, 0.46815197, 0.67590369], [0.5331193 , 0.62780236, 0.45044916]])

Now, let’s stack them horizontally.

sh = np.hstack([de,relist])
sh

Output :

array([[0.26215572, 0.07380256, 0.06592746, 0.55523672, 0.46815197, 0.67590369], [0.89782279, 0.95603968, 0.82052478, 0.5331193 , 0.62780236, 0.45044916]])
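
How the two kinds of stacking affect the shape can be sketched with small placeholder arrays (top and bottom are made-up names). For 2-D inputs, np.concatenate with an explicit axis gives the same results:

```python
import numpy as np

top = np.zeros((2, 3))
bottom = np.ones((2, 3))

# Stacking vertically adds rows; stacking horizontally adds columns.
print(np.vstack([top, bottom]).shape)  # (4, 3)
print(np.hstack([top, bottom]).shape)  # (2, 6)

# For 2-D inputs, concatenate along axis 0 or 1 matches vstack or hstack.
same_v = np.array_equal(np.vstack([top, bottom]),
                        np.concatenate([top, bottom], axis=0))
same_h = np.array_equal(np.hstack([top, bottom]),
                        np.concatenate([top, bottom], axis=1))
print(same_v, same_h)  # True True
```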

Great, now let’s perform some array operations. First let’s create two random arrays

r1 = np.random.rand(2,2)
r2 = np.random.rand(2,2)
print(r1)
print(r2)

Output :

[[ 0.02430146 0.14448542] [ 0.54428337 0.40332494]]

[[ 0.77574886 0.08747577] [ 0.51484157 0.92319888]]

Let’s do element wise addition.

r3 = r1 + r2
r3

Output (for the r1 and r2 printed above; the last digit may differ because the printed inputs are rounded) : array([[0.80005032, 0.23196119], [1.05912494, 1.32652382]])

Element wise subtraction.

r4 = r1 - r2
r4

Output : array([[-0.75144739, 0.05700965], [ 0.02944179, -0.51987394]])

Let’s raise each element to the power of 3.

r5 = r1**3
r5

Output (approximately, for the r1 printed above) : array([[1.435e-05, 3.016e-03], [1.612e-01, 6.561e-02]])

Now, instead of element wise operation, let’s perform a dot product of the two arrays r1 and r2.

r6 = r1.dot(r2)
r6

Output : array([[ 0.09323893, 0.13551456], [ 0.62987564, 0.41996073]])
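
Integer matrices make the difference between element-wise multiplication and the dot product easier to see. A small sketch with made-up matrices a and b:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

elementwise = a * b          # multiplies matching entries
matrix_product = a.dot(b)    # rows of a combined with columns of b

print(elementwise)     # [[ 5 12] [21 32]]
print(matrix_product)  # [[19 22] [43 50]]

# For 2-D arrays the @ operator gives the same result as dot.
print(np.array_equal(a @ b, matrix_product))  # True
```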

Let’s create a new array and transpose it.

sh = np.array([[1,2],[3,4]])
sh

Output :

array([[1, 2], [3, 4]])

sh.T

Output :

array([[1, 3], [2, 4]])

Now, check the datatype of elements in the array.

sh.dtype

Output : dtype('int32') (the default integer type depends on the platform; many systems report int64 instead)

Change the datatype of the array.

rs = sh.astype('f')
rs.dtype

Output : dtype('float32')
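
Two details about astype are worth checking: it returns a converted copy rather than changing the original, and converting floats to integers drops the fractional part. A minimal sketch with made-up arrays:

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = ints.astype("float32")

# astype returns a converted copy; the original keeps its integer dtype.
print(ints.dtype.kind, floats.dtype)  # i float32

# Converting floats to integers truncates toward zero (no rounding).
truncated = np.array([1.7, -1.7, 2.2]).astype("int64")
print(truncated)  # [ 1 -1  2]
```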

Now, let’s look at some mathematical functions in an array, starting with sum of an array.

c = np.array([1,2,3,4,5])
c.sum()

Output : 15

Maximum of the elements of an array.

c.max()

Output : 5

Mean of the elements of the array

c.mean()

Output : 3.0

Now, let’s retrieve the indices of the maximum and minimum values of the array.

c.argmax()

Output : 4

c.argmin()

Output : 0
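
The same functions also work on multi-dimensional arrays, where an optional axis argument selects the direction of the reduction. A small sketch with a made-up array named grid:

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.sum())           # 21, over all elements
print(grid.sum(axis=0))     # [5 7 9], one total per column
print(grid.sum(axis=1))     # [ 6 15], one total per row
print(grid.argmax(axis=1))  # [2 2], position of each row's maximum
```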

Create an array consisting of square of first ten whole numbers.

dim = np.arange(10)**2
dim

Output : array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81], dtype=int32)

Access values in the above array using index

dim[2]

Output : 4

dim[1:5]

Output : array([ 1, 4, 9, 16], dtype=int32)

Use a negative index to count from the end of the array; dim[-1:] returns the last element as a one-element array.

dim[-1:]

Output : array([81], dtype=int32)

Now, access certain elements of the array based on a step size.

dim[1:10:2] #dim[start:stop:stepsize]

Output : array([ 1, 9, 25, 49, 81], dtype=int32)

Create a multidimensional array

en = np.arange(36)
en.resize(6,6)
en

Output : array([[ 0, 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35]])

Access the second row and third column

en[1,2]

Output : 8

Access the 2nd row and columns 3 to 6. Note that row and column numbering starts at 0 and that the stop index of a slice is excluded, so 2:6 covers indices 2 through 5.

en[1, 2:6]

Output : array([ 8, 9, 10, 11])

Select the first two rows and all columns except the last one.

en[:2,:-1]

Output : array([[ 0, 1, 2, 3, 4], [ 6, 7, 8, 9, 10]])

Select values from array greater than 20.

en[en>20]

Output : array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])

Set every element that is greater than 20 to 20.

en[en>20] = 20
en

Output : array([[ 0, 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19, 20, 20, 20, 20], [20, 20, 20, 20, 20, 20], [20, 20, 20, 20, 20, 20]])
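
The boolean-mask assignment above modifies the array in place. When the original should stay untouched, np.minimum or np.where can produce a capped copy instead. A minimal sketch with a made-up array named values:

```python
import numpy as np

values = np.arange(18, 24)

# np.minimum caps each element at 20 and returns a new array.
capped = np.minimum(values, 20)

# np.where picks 20 where the condition holds and the original value elsewhere.
capped_where = np.where(values > 20, 20, values)

print(capped)  # [18 19 20 20 20 20]
print(values)  # [18 19 20 21 22 23], unchanged
```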

To copy an array into another variable, use the copy function; a plain assignment such as fun = en only creates a second name for the same array.

fun = en.copy()
fun

Output : array([[ 0, 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19, 20, 20, 20, 20], [20, 20, 20, 20, 20, 20], [20, 20, 20, 20, 20, 20]])
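
Why copy matters can be sketched as follows. The names original, alias, and independent are made up for this illustration:

```python
import numpy as np

original = np.arange(5)

alias = original               # a second name for the same array
independent = original.copy()  # a separate array with its own data

original[0] = 99

print(alias[0])        # 99, follows the change
print(independent[0])  # 0, keeps the old value

# Basic slices are views too: they share memory with the source array.
print(np.shares_memory(original, original[1:3]))  # True
```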

Create an array of random integers from 1 to 9 (randint includes the lower bound and excludes the upper bound, 10). Specify the shape as 4×4.

gom = np.random.randint(1,10,(4,4))
gom

Output : array([[9, 7, 1, 4], [1, 4, 3, 6], [2, 5, 5, 1], [2, 2, 9, 9]])
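
The half-open range of randint can be checked empirically by drawing a large sample. A minimal sketch; the sample size of 10,000 is an arbitrary choice:

```python
import numpy as np

# Draw many values so that every possible outcome is very likely to appear.
draws = np.random.randint(1, 10, size=10_000)

# The lower bound (1) is included, the upper bound (10) is not.
print(draws.min(), draws.max())
```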

Great, we have looked at creating, accessing and manipulating arrays in Numpy. In the next part of the series, we will be looking at a library which is built on the Numpy library — Pandas. Pandas is a library which makes data manipulation and analysis much easier in Python. It offers data structures and operations for numerical tables and time series.