Introduction to Numpy -1 : An absolute beginners guide to Machine Learning and Data science. by@rakshithvasudev

September 28th 2017

Let’s get started quickly. Numpy is a math library for Python. It enables us to do numerical computation efficiently. For number crunching it is a better fit than plain Python lists, because its arrays are faster and easier to manipulate.

In this article I’m just going to introduce you to the basics of what is most often required for machine learning and data science. I’m not going to cover everything that’s possible with the numpy library. This is part one of the numpy tutorial series.

The first thing I want to introduce you to is the way you import it.

import numpy as np

Okay, now we’re telling Python that “np” is the name we’ll use to refer to numpy from here on.

Let’s create a Python list (Python’s built-in “array”) and an np array.

# python list

a = [1,2,3,4,5,6,7,8,9]

# numpy array

A = np.array([1,2,3,4,5,6,7,8,9])

If I were to print them, I wouldn’t see much difference.

print(a)

print(A)

====================================================================

[1, 2, 3, 4, 5, 6, 7, 8, 9]

[1 2 3 4 5 6 7 8 9]

Okay, but why do I have to use an np array instead of a regular Python list?

The answer is that np arrays are better in terms of computation speed and ease of manipulation.
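To make the “ease of manipulation” part concrete, here is a minimal sketch (the variable names are my own, purely for illustration) of how a plain list and an np array react to the same multiplication:

```python
import numpy as np

py_list = [1, 2, 3]
np_arr = np.array([1, 2, 3])

# Multiplying a list repeats it; multiplying an np array scales every element.
list_times_two = py_list * 2
arr_times_two = np_arr * 2

# Getting the element-wise result from a list needs an explicit loop.
list_doubled = [x * 2 for x in py_list]

print(list_times_two)   # [1, 2, 3, 1, 2, 3]
print(arr_times_two)    # [2 4 6]
print(list_doubled)     # [2, 4, 6]
```

The np version states the math directly, and the work happens inside numpy instead of in a Python loop.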


Let’s move on to more cool stuff. Wait, we haven’t seen anything cool yet! Okay, here’s something:

np.arange(0,10,2)

====================================================================

array([0, 2, 4, 6, 8])

What arange([start], stop, [step]) does is generate numbers from start up to stop, in steps of step. Here is what it means for np.arange(0,10,2):

return an np array starting from 0 and going up to, but not including, 10, incrementing by 2 each time.

So, that’s how we get :

array([0, 2, 4, 6, 8])

The important thing to remember here is that the stopping number is not going to be included in the array.

Another example:

np.arange(2,29,5)

====================================================================

array([ 2, 7, 12, 17, 22, 27])
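If you want to convince yourself about the excluded stop value, here is a small sketch you can run (the variable names are just for illustration). It also shows that start and step are optional:

```python
import numpy as np

evens = np.arange(0, 10, 2)   # start=0, stop=10 (excluded), step=2
first_five = np.arange(5)     # only stop given: start defaults to 0, step to 1

print(evens)         # [0 2 4 6 8]
print(first_five)    # [0 1 2 3 4]
print(10 in evens)   # False, the stop value is never included
```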

Before I proceed further, I have to warn you that I use “array”, “matrix” and “vector” interchangeably (strictly speaking, a vector is a 1-dimensional array and a matrix is a 2-dimensional one). So don’t panic when I say, for example, “Matrix shape is 2 X 3”. All it means is that the array has 2 rows and 3 columns and looks something like this:

array([[ 2, 7, 12],

[17, 22, 27]])
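Here is a quick sketch of the difference, built by hand with np.array (the names vector and matrix are mine). It peeks at the shape attribute, which is explained right after this:

```python
import numpy as np

vector = np.array([2, 7, 12])                    # one pair of brackets: 1-D
matrix = np.array([[2, 7, 12], [17, 22, 27]])    # nested brackets: 2 rows x 3 cols

print(vector.shape)   # (3,)
print(matrix.shape)   # (2, 3)
print(matrix.ndim)    # 2
```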

Now, let’s talk about the shape of a default np array.

Shape is an attribute of an np array. When we ask a default array, say A, for its shape, here is how it looks.

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

A.shape

====================================================================

(9,)

This is a rank 1 array: it just has 9 elements along a single axis.

Ideally it should be a 1 X 9 matrix, right?

I agree with you, so that’s where reshape() comes into play. It is a method that returns the contents of your original matrix arranged into your desired dimensions. The original itself is left unchanged.

Let’s look at reshape in action. You can pass the new dimensions, either as separate integers or as a tuple, as long as the reshaped matrix and original matrix have the same number of elements.

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

A.reshape(1,9)

====================================================================

array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])

Notice that reshape returns a 2-dimensional matrix. The two square brackets at the beginning indicate that. [[1, 2, 3, 4, 5, 6, 7, 8, 9]] has shape (1, 9), as opposed to the 1-dimensional [1, 2, 3, 4, 5, 6, 7, 8, 9], which has shape (9,).
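The “same number of elements” rule is easy to test. This is a minimal sketch (the flag reshape_failed is just an illustrative name) showing that integers and a tuple behave the same, that A itself stays untouched, and that a mismatched size is rejected:

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

row = A.reshape(1, 9)          # dimensions as separate integers
same_row = A.reshape((1, 9))   # dimensions as a tuple, same result

print(row.shape)   # (1, 9)
print(A.shape)     # (9,)  A itself was not modified

# 9 elements cannot fill a 2 x 5 matrix, so numpy raises a ValueError.
try:
    A.reshape(2, 5)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(reshape_failed)   # True
```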

Another example:

B = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]).reshape(3,3)

B

====================================================================

array([[1, 2, 3],

[4, 5, 6],

[7, 8, 9]])

If I look at B’s shape, it’s going to be (3,3):

B.shape

====================================================================

(3, 3)

This time it’s your job to tell me what happens looking at this code:

np.zeros((4,3))

====================================================================

???????????

Good job if you thought it’s going to print a 4 X 3 matrix filled with zeros. Here’s the output:

np.zeros((4,3))

====================================================================

array([[ 0., 0., 0.],

[ 0., 0., 0.],

[ 0., 0., 0.],

[ 0., 0., 0.]])

np.zeros((n,m)) returns an n x m matrix that contains zeros. It’s as simple as that.
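Two details are worth a quick sketch: the shape goes in as a single tuple, and the zeros are floats by default (that’s why the output shows “0.”). The dtype argument used below is a standard np.zeros parameter; the variable names are my own:

```python
import numpy as np

Z = np.zeros((4, 3))   # the shape is passed as one tuple
print(Z.shape)         # (4, 3)
print(Z.dtype)         # float64

Z_int = np.zeros((2, 2), dtype=int)   # ask for integer zeros instead
print(Z_int)
```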

Now, can you guess what np.eye() does? Hint: eye() stands for Identity.

np.eye(5)

====================================================================

array([[ 1., 0., 0., 0., 0.],

[ 0., 1., 0., 0., 0.],

[ 0., 0., 1., 0., 0.],

[ 0., 0., 0., 1., 0.],

[ 0., 0., 0., 0., 1.]])

np.eye() returns an identity matrix with the specified dimensions.
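If you want to double-check what “identity” means here, this small sketch (I3 is just an illustrative name) confirms ones on the diagonal and zeros everywhere else:

```python
import numpy as np

I3 = np.eye(3)

print(I3.shape)      # (3, 3)
print(np.diag(I3))   # [1. 1. 1.]  the diagonal is all ones
print(I3.sum())      # 3.0, so every off-diagonal entry must be zero
```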

What if we need to multiply two matrices? No problem, we have np.dot().

np.dot() performs matrix multiplication, provided both the matrices are “multiply-able”. It just means that the number of columns of the first matrix must match the number of rows of the second matrix.

Ex: say A has shape (2,3) and B has shape (3,2). Here the number of cols in A = 3 and the number of rows in B = 3. Since they match, multiplication is possible, and the result has shape (2,2).
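Here is a minimal sketch of that rule, using made-up (2,3) and (3,2) matrices filled by np.arange. It also shows what happens when the shapes don’t line up (mismatch_raised is just an illustrative flag):

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # shape (2, 3)
B = np.arange(6).reshape(3, 2)   # shape (3, 2)

product = np.dot(A, B)           # cols of A (3) match rows of B (3)
print(product.shape)             # (2, 2)

# (2, 3) times (2, 3) does not line up: 3 columns against 2 rows.
try:
    np.dot(A, A)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(mismatch_raised)           # True
```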

Let’s illustrate multiplication via np code:

# generate an identity matrix of (3 x 3)

I = np.eye(3)

I

====================================================================

array([[ 1., 0., 0.],

[ 0., 1., 0.],

[ 0., 0., 1.]])

# generate another (3 x 3) matrix to be multiplied.

D = np.arange(1,10).reshape(3,3)

D

====================================================================

array([[1, 2, 3],

[4, 5, 6],

[7, 8, 9]])

We have now prepared both the matrices to be multiplied. Let’s see them in action.

# perform the actual dot product (matrix multiplication).

M = np.dot(D,I)

M

====================================================================

array([[ 1., 2., 3.],

[ 4., 5., 6.],

[ 7., 8., 9.]])

Great! Now you know how easy it is to multiply matrices! Also, notice that the entire array is now float type, because the identity matrix I holds floats.
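You can see where the floats come from by checking the dtype attribute of each array. This sketch rebuilds the same I, D and M; I print dtype.kind for D because the exact integer width can differ between platforms:

```python
import numpy as np

I = np.eye(3)                        # floats by default
D = np.arange(1, 10).reshape(3, 3)   # integers
M = np.dot(D, I)

print(D.dtype.kind)   # i (an integer type)
print(I.dtype)        # float64
print(M.dtype)        # float64, mixing int and float gives float
```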

# add all the elements of matrix.

sum_val = np.sum(M)

sum_val

====================================================================

45.0

np.sum() adds all the elements of the matrix.

However, there are 2 variants.

# sum of each row

np.sum(M,axis=1)

====================================================================

array([ 6., 15., 24.])

6 is the sum of 1st row (1, 2, 3).

15 is the sum of 2nd row (4, 5, 6).

24 is the sum of 3rd row (7, 8, 9).

# sum of each column

np.sum(M,axis=0)

====================================================================

array([ 12., 15., 18.])

12 is the sum of 1st col (1, 4, 7).

15 is the sum of 2nd col (2, 5, 8).

18 is the sum of 3rd col (3, 6, 9).
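With a square matrix it’s easy to mix up the two axes, so here is a sketch on a made-up 2 X 3 matrix, where the length of the result tells you which direction was summed:

```python
import numpy as np

M = np.arange(1, 7).reshape(2, 3)   # [[1, 2, 3], [4, 5, 6]]

total = np.sum(M)             # every element
row_sums = np.sum(M, axis=1)  # one total per row: 2 values
col_sums = np.sum(M, axis=0)  # one total per column: 3 values

print(total)      # 21
print(row_sums)   # [ 6 15]
print(col_sums)   # [5 7 9]
```

So axis=1 collapses the columns and leaves one number per row, while axis=0 collapses the rows and leaves one number per column.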

That’s it for now. Here is the follow-up tutorial: part 2.

Here’s a video tutorial explaining everything that I did, if you’d rather watch than read.

If you’re interested in learning pandas, I wrote a tutorial article here. It’s called Intro to Pandas: -1 : An absolute beginners guide to Machine Learning and Data science