Introduction to Numpy, Part 2: An absolute beginner’s guide to Machine Learning and Data Science.

This is part two of my numpy tutorial series. If you haven’t read my previous tutorial on numpy, I’d recommend doing so here. In this tutorial, I’m going to cover the parts of numpy that matter most for data science and machine learning, which means I’m not going to cover everything that’s possible with numpy.

Okay, so we’ve seen np.array(), np.arange(), np.eye(), np.dot(), np.shape, np.reshape() and np.sum() in the previous tutorial. Let’s start by recalling the way we import numpy, which you’ve already seen:

import numpy as np

We’re telling python that “np” is the alias we’ll use to refer to numpy from here on.

np.random.rand():

Now, let’s talk about random value generation using numpy. Suppose you want random values in matrix form, each drawn uniformly from the interval [0, 1). It’s as easy as the following statement:

# generate random values in a 2 x 3 matrix form
np.random.rand(2,3)
====================================================================
array([[ 0.2248368 , 0.49652272, 0.76189091],
[ 0.73520939, 0.48107188, 0.3883801 ]])

Okay, so can this be extended to bigger shapes? For sure, yes! You pass one argument per dimension, so a bigger matrix only needs bigger numbers.

# generate random values in a 12 x 13 matrix form
np.random.rand(12,13)
====================================================================
array([[ 0.43385691, 0.15503296, 0.19860119, 0.65346609, 0.16774261,0.56058978, 0.84974275, 0.05887681, 0.27276929, 0.88750259,0.25141674, 0.05663906, 0.54186252],
[ 0.2635477 , 0.88291404, 0.42043263, 0.83565607, 0.92982761,0.79879409, 0.91323242, 0.37954769, 0.60198588, 0.44773903,0.70699903, 0.3892703 , 0.94314732],
[ 0.12593268, 0.97838364, 0.81297353, 0.3368167 , 0.33501746,0.99619471, 0.22476839, 0.93321408, 0.41301684, 0.01808732,0.61321647, 0.22462791, 0.468457 ],
[ 0.63765001, 0.13884884, 0.67648642, 0.65589694, 0.80931411,0.46202022, 0.40819602, 0.03863341, 0.16494124, 0.69603883,0.96849077, 0.19150476, 0.8968954 ],
[ 0.25646945, 0.21928867, 0.70952192, 0.80569537, 0.84562245,0.54595757, 0.00684613, 0.19142737, 0.94387805, 0.80871064,0.73648968, 0.80105002, 0.16716087],
[ 0.3894393 , 0.61933361, 0.41088568, 0.88781578, 0.40932049,0.90947387, 0.71984125, 0.81259019, 0.69020009, 0.56480145,0.43041522, 0.02650665, 0.7738148 ],
[ 0.21326808, 0.2036178 , 0.30368209, 0.51081501, 0.64345557,0.99061654, 0.96805793, 0.19446453, 0.25974565, 0.74033622,0.37379014, 0.67444828, 0.82899251],
[ 0.47571066, 0.82012796, 0.50881338, 0.3900192 , 0.34356749,0.36440024, 0.58048805, 0.74650051, 0.24974157, 0.70129048,0.99920892, 0.29142188, 0.09263266],
[ 0.4140815 , 0.25578684, 0.5485647 , 0.07581615, 0.28539059,0.93805043, 0.56897052, 0.23606972, 0.78568646, 0.609795,0.70741831, 0.51003452, 0.53791667],
[ 0.53967367, 0.78513565, 0.94739241, 0.03891731, 0.15962705,0.45470422, 0.56172944, 0.49735169, 0.35216862, 0.87391629,0.43953245, 0.18160601, 0.78307107],
[ 0.1725005 , 0.89132449, 0.05287284, 0.2113003 , 0.69802999,0.12609322, 0.83490382, 0.34199806, 0.90740966, 0.33934554,0.02015816, 0.13498658, 0.06695927],
[ 0.14066135, 0.34828447, 0.0780561 , 0.00126867, 0.57958087,0.93641585, 0.70294758, 0.21712057, 0.24902555, 0.53284372,0.19795993, 0.69817631, 0.71156616]])

Here are 12 rows, each containing 13 values. That’s pretty cool, right? np is really a great library. We all know.
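As a quick check of the points above, here is a minimal sketch. The names `cube`, `first` and `second` are mine, and the seed value 42 is an arbitrary choice. It shows that np.random.rand() takes one argument per dimension, that the values stay in [0, 1), and that seeding the generator makes the values repeatable.

```python
import numpy as np

# one argument per dimension: this is a 2 x 3 x 4 array
cube = np.random.rand(2, 3, 4)
print(cube.shape)   # (2, 3, 4)

# every value is drawn uniformly from [0, 1)
print(cube.min() >= 0.0, cube.max() < 1.0)   # True True

# seeding makes the "random" values repeatable (42 is an arbitrary seed)
np.random.seed(42)
first = np.random.rand(2, 3)
np.random.seed(42)
second = np.random.rand(2, 3)
print(np.array_equal(first, second))   # True
```

Seeding is handy when you want someone else to get the same numbers you did.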

What if I want to add elements manually into the array? np.append() is here to answer your prayers!

# generate an array using np.arange()
A = np.arange(5,15,2)
A
====================================================================
array([ 5, 7, 9, 11, 13])

Suppose I want to add “19” to A. How do I do it?

That’s exactly why we have np.append().

A = np.append(A,19)
A
====================================================================
array([ 5, 7, 9, 11, 13, 19])

np.append() returns a new array with the elements added at the end. The array you pass in is not modified.

Another one (I’m not DJ khaled!):

A = np.append(A,[3,55,34,553])
A
====================================================================
array([ 5, 7, 9, 11, 13, 19, 3, 55, 34, 553])

This time it was a list of elements that needed to be added to A. So np.append(original array here, element(s) here) works the same way whether you pass in one element or a list of elements.

It’s important that you capture the returned value; otherwise those elements won’t actually be added to the old array. Here’s what that means:

# A is not updated because the returned value is not captured.
np.append(A,[3,55,34,553])
# A is updated because the returned value is captured.
A = np.append(A,[3,55,34,553])

Why am I telling you this? It seems obvious and simple! Here’s the thing: I’ve spent hours and hours trying to figure out what went wrong, and it finally boiled down to this trivial mistake of not capturing the returned value.
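To see the pitfall in isolation, here is a small sketch. The name `A_longer` is only for illustration. It calls np.append() once without capturing the result and once with.

```python
import numpy as np

A = np.arange(5, 15, 2)

# np.append() builds and returns a new array; A itself is untouched
np.append(A, 19)
print(A)          # [ 5  7  9 11 13]

# capture the return value to keep the result
A_longer = np.append(A, 19)
print(A_longer)   # [ 5  7  9 11 13 19]
```

This is different from a Python list, where list.append() changes the list in place.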

Great! How do I find the difference between adjacent elements?

A
====================================================================
array([ 5, 7, 9, 11, 13, 19, 3, 55, 34, 553])

My question is, in the array A, how do I find (7-5), (9-7), (11-9)? Or in other words, how do I find A[n+1]-A[n] for every n?

There’s a built-in function in np, called np.diff(), that addresses this question. It looks something like this:

B = np.diff(A,n=1)
B
====================================================================
array([ 2, 2, 2, 2, 6, -16, 52, -21, 519])

If you do some subtraction, you’ll notice that this array is exactly A[n+1]-A[n]. Here’s the deal though: length of this array is one less than the actual array A’s length.

If I were to continue doing the same np.diff() on B, I’d end up with the following:

B = np.diff(B,n=1)
B
===================================================================
array([ 0, 0, 0, 4, -22, 68, -73, 540])

Instead of doing it twice explicitly, I could tell np.diff() to do it twice for me and still end up with the same results. Here’s what I mean:

# parameter n indicates that this diff() must be run twice. 
np.diff(A,n=2)
===================================================================
array([ 0, 0, 0, 4, -22, 68, -73, 540])

So, simply put, that’s np.diff(). The parameter n defines how many times the difference is taken.
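If you want to convince yourself of what np.diff() does, the sketch below recomputes the first difference by hand and checks the n=2 shortcut. The name `by_hand` is mine, and the A[1:] and A[:-1] slices it uses mean “everything except the first element” and “everything except the last element”.

```python
import numpy as np

A = np.array([5, 7, 9, 11, 13, 19, 3, 55, 34, 553])

# first difference written by hand with slices: A[n+1] - A[n]
by_hand = A[1:] - A[:-1]
print(np.array_equal(by_hand, np.diff(A)))   # True

# n=2 is the same as applying np.diff() twice
print(np.array_equal(np.diff(A, n=2), np.diff(np.diff(A))))   # True

# each pass shortens the result by one element
print(len(A), len(np.diff(A)), len(np.diff(A, n=2)))   # 10 9 8
```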

Now what if I want to stack elements and form a Matrix / Vector? np.vstack(), np.column_stack()

# lets define 3 lists.
a = [1,2,3]
b = [4,5,6]
c = [7,8,9]

There are 2 variants to think of, when we think about stacking elements.

How do I generate a matrix directly by stacking 3 lists as they are?

# directly stack with lists passed in the same order.
np.vstack((a,b,c))
===================================================================
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
np.vstack((b,a,c))
===================================================================
array([[4, 5, 6],
[1, 2, 3],
[7, 8, 9]])

How do I generate a matrix by stacking 3 lists so that each list becomes a column?

# stack the lists as columns, so each row takes one element from each list.
np.column_stack((a,b,c))
===================================================================
array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]])

Let’s look at those lists again.

a = [1,2,3]
b = [4,5,6]
c = [7,8,9]

Here, reading down the columns gives the elements [1,4,7],[2,5,8],[3,6,9]. My question is how do I get these elements to be stacked as rows?

We have np.column_stack()

np.column_stack((a,b,c))
===================================================================
array([[1, 4, 7],
[2, 5, 8],
[3, 6, 9]])
np.column_stack((b,a,c))
===================================================================
array([[4, 1, 7],
[5, 2, 8],
[6, 3, 9]])
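One way to keep the two functions apart is to compare them side by side. In this sketch the names `rows` and `cols` are mine, and the extra length-2 lists are made up so that the two shapes differ.

```python
import numpy as np

a = [1, 2, 3]
b = [4, 5, 6]
c = [7, 8, 9]

rows = np.vstack((a, b, c))         # each list becomes a row
cols = np.column_stack((a, b, c))   # each list becomes a column

# for 1-D inputs like these, one result is the transpose of the other
print(np.array_equal(cols, rows.T))   # True

# with lists of length 2 the shapes make the difference visible
print(np.vstack(([1, 2], [3, 4], [5, 6])).shape)         # (3, 2)
print(np.column_stack(([1, 2], [3, 4], [5, 6])).shape)   # (2, 3)
```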

Perfect! All of that was fairly advanced. My basic question is: how do I select a few elements from an array? The answer is slicing.

Say I’m dealing with array A. I want to select elements 9 ,11 and 13 only. How do I do that?

A
===================================================================
array([ 5, 7, 9, 11, 13, 19, 3, 55, 34, 553])

This goes back to array slicing. If you haven’t seen it before, here’s how slicing works: you choose a starting index and an ending index. The catch is that the element at the ending index is not included.

Let’s see:

A[2:5]  
===================================================================
array([ 9, 11, 13])

9 is present at index 2, while 13 is present at index 4. When you do array slicing, to get the element at index 4, you must choose the ending index as 5. In other words, starting is as is, ending is always one larger than required.

From now on, I’m going to refer to the starting index as the lower bound and the ending index as the upper bound. Accordingly, slicing can be formulated in general as:

A[lowerbound(inclusive): upperbound(exclusive)]

Another example:

A[0:3]
===================================================================
array([5, 7, 9])
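A few more slices on the same array, as a sketch. Leaving out a bound and using a negative index are standard Python and numpy behaviour that the examples above don’t show.

```python
import numpy as np

A = np.array([5, 7, 9, 11, 13, 19, 3, 55, 34, 553])

# leaving out the lower bound starts from index 0
print(A[:3])    # [5 7 9]

# leaving out the upper bound runs to the end of the array
print(A[7:])    # [ 55  34 553]

# a negative lower bound counts from the end: the last two elements
print(A[-2:])   # [ 34 553]

# a slice holds upperbound - lowerbound elements
print(len(A[2:5]))   # 3
```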

Let’s talk about broadcasting.

Broadcasting is one of the best features of numpy. It is the ability of numpy to apply an operation between arrays of different shapes, for example between a single number and every element of an array.

What’s that supposed to mean?

A
===================================================================
array([ 5, 7, 9, 11, 13, 19, 3, 55, 34, 553])

What if I want to add 1 to all the elements of A? Our natural tendency is to do the following:

# create an empty integer array to hold the new values
K = np.array([], dtype=int)
# go through all the elements in A, add 1 to each
# and append the new value to the array.
for e in A:
    K = np.append(K, e+1)
# print the K array.
K
====================================================================
array([ 6, 8, 10, 12, 14, 20, 4, 56, 35, 554])

It seems very natural, and most likely everybody is going to agree with this approach. The problem is, all numpy enthusiasts and geeks are probably going to report this article as “time consuming & inefficient” for providing this solution.

REALLY!? WHY IS THIS INEFFICIENT?

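The loop handles one element at a time in python, and every np.append() call copies the whole array built so far. That’s where broadcasting comes into play. I can do the same in just one line, without any for loop, as follows: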

K = A+1
K
====================================================================
array([ 6, 8, 10, 12, 14, 20, 4, 56, 35, 554])
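Here is a small sketch that runs both approaches and confirms they agree. The names `K_loop` and `K_fast` are mine.

```python
import numpy as np

A = np.array([5, 7, 9, 11, 13, 19, 3, 55, 34, 553])

# loop version: every np.append() call copies the array built so far
K_loop = np.array([], dtype=int)
for e in A:
    K_loop = np.append(K_loop, e + 1)

# broadcast version: one expression, no python-level loop
K_fast = A + 1

print(np.array_equal(K_loop, K_fast))   # True

# A itself is left unchanged by A + 1
print(A[0])   # 5
```

If you are curious about the speed difference, you could time both versions on a much larger array with python’s timeit module.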

But how does this work?

numpy treats the single value 1 as if it were stretched to the same length as A, and then adds the two element by element.

# This is what A+1 is equivalent to, aka broadcasting
A+[1,1,1,1,1,1,1,1,1,1]

Another example:

A*-1
====================================================================
array([ -5, -7, -9, -11, -13, -19, -3, -55, -34, -553])

Notice that every element now carries a minus (-) sign. That is broadcasting at work.
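Broadcasting is not limited to a single number. The sketch below goes one step beyond the examples above: np.broadcast_to() makes the “stretching” of a scalar explicit, and a made-up 2 x 3 matrix `M` is combined with a 3-element `row`. The names `stretched`, `M` and `row` are mine.

```python
import numpy as np

A = np.array([5, 7, 9, 11, 13, 19, 3, 55, 34, 553])

# the scalar 1 behaves as if it were stretched to A's shape
stretched = np.broadcast_to(1, A.shape)
print(np.array_equal(A + 1, A + stretched))   # True

# a row of 3 values is matched against every row of a 2 x 3 matrix
M = np.arange(6).reshape(2, 3)
row = np.array([10, 20, 30])
print(M + row)
# [[10 21 32]
#  [13 24 35]]
```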

At this point you know, if you want to deal with all the elements in a numpy array, you probably don’t need a for loop and you also know broadcasting is really cool.

Here’s a video tutorial explaining everything that I did if you’re interested to consume via video.
