NumPy is a Python library that is mainly used to work with arrays. An array is a collection of items that are stored next to each other in memory. For now, just think of them as Python lists.
NumPy is written in Python and C. The calculations in NumPy are done by the parts written in C, which makes them extremely fast compared to normal Python code.
Make sure Python and pip are installed on your computer. Then open a command prompt or terminal and run the following command:
pip install numpy
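To confirm that the installation worked, you can import NumPy and print its version. This is only a minimal check; the version string you see depends on what pip installed.

```python
import numpy as np

# If this import succeeds, NumPy is available to the current Python interpreter
version = np.__version__
print("NumPy version:", version)
```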
You can create a NumPy array by using the numpy module's array() function as shown below:
import numpy as np
arr = np.array([3, 5, 7, 9])
print(type(arr))
The output will look like this:
<class 'numpy.ndarray'>
We just created a NumPy array from a Python list. The type of our arr variable is numpy.ndarray. Here ndarray stands for N-dimensional array.
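The earlier claim that array items are stored next to each other in memory can be checked with a few ndarray attributes. This sketch fixes the element type with dtype=np.int64 (an assumption made so the byte counts are the same on every platform); without it, NumPy picks a default integer type for your system.

```python
import numpy as np

# dtype=np.int64 is chosen here so that every item takes exactly 8 bytes
arr = np.array([3, 5, 7, 9], dtype=np.int64)

print(arr.dtype)                  # int64
print(arr.itemsize)               # 8 bytes per item
print(arr.nbytes)                 # 4 items * 8 bytes = 32
print(arr.flags["C_CONTIGUOUS"])  # True: the items sit in one contiguous block
```

A Python list, by contrast, stores references to separate Python objects, which is one reason NumPy can process its data so much faster.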
In NumPy, dimensions are called axes (plural of axis). I like to think of an axis as a line along which items can be stored.
A simple list or a 1-dimensional array can be visualized as a single row of values lying along one axis.
We will now look at scalars (0D), vectors (1D), matrices (2D), and 3D arrays.
A scalar is just a single value.
import numpy as np
s = np.array(21)
print("Number of axes:", s.ndim)
print("Shape:", s.shape)
Number of axes: 0
Shape: ()
Here we have used 2 properties of a NumPy array:
ndim: It returns the number of dimensions (or axes) in an array. It returns 0 here because a value in itself does not have any dimensions.
shape: It returns a tuple that contains the number of values along each axis of an array. Since a scalar has 0 axes, it returns an empty tuple.
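The two properties are related: ndim is always the length of the shape tuple. This sketch also uses size (not covered above), the total number of values, which equals the product of the shape entries. For a scalar that product is taken over an empty tuple, which is 1.

```python
import math
import numpy as np

s = np.array(21)
vec = np.array([-1, 2, 7, 9, 2])

for a in (s, vec):
    # ndim equals the number of entries in the shape tuple
    assert a.ndim == len(a.shape)
    # size is the product of the shape entries (math.prod(()) == 1)
    assert a.size == math.prod(a.shape)
    print(a.ndim, a.shape, a.size)
```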
A vector is a collection of values.
import numpy as np
vec = np.array([-1, 2, 7, 9, 2])
print("Number of axes:", vec.ndim)
print("Shape:", vec.shape)
Number of axes: 1
Shape: (5,)
vec.shape[0] gives us the number of values in our vector, which is 5 here.
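For a 1D array, vec.shape[0] gives the same number as Python's built-in len(), which NumPy arrays also support. In general, len() reports the size of the first axis.

```python
import numpy as np

vec = np.array([-1, 2, 7, 9, 2])

# For a 1D array, the first (and only) entry of shape is the number of values
n = vec.shape[0]
print(n)         # 5
print(len(vec))  # 5, since len() reports the length of the first axis
```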
A matrix is a collection of vectors.
import numpy as np
mat = np.array([
[1, 2, 3],
[5, 6, 7]
])
print("Number of axes:", mat.ndim)
print("Shape:", mat.shape)
Number of axes: 2
Shape: (2, 3)
Here we created a 2x3 matrix (2D array) using a list of lists. Since a matrix has 2 axes, the mat.shape tuple contains two values: the first is the number of rows and the second is the number of columns.
Each item (row) in a 2D array is a vector (1D array).
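In code, indexing a single row of mat returns a 1D array, as described above. Summing along each axis also shows which direction axis 0 and axis 1 run in; sum(axis=...) is not otherwise covered in this post and is included only to make the idea of an axis as a line more concrete.

```python
import numpy as np

mat = np.array([
    [1, 2, 3],
    [5, 6, 7]
])

# A single row of a 2D array is a 1D array
row = mat[0]
print(row, row.ndim, row.shape)  # [1 2 3] 1 (3,)

# Axis 0 runs down the rows, so summing along it gives one total per column
col_totals = mat.sum(axis=0)
print(col_totals)                # [ 6  8 10]

# Axis 1 runs across each row, so summing along it gives one total per row
row_totals = mat.sum(axis=1)
print(row_totals)                # [ 6 18]
```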
A 3D array is a collection of matrices.
import numpy as np
t = np.array([
[[1, 3, 9],
[7, -6, 2]],
[[2, 3, 5],
[0, -2, -2]],
[[9, 6, 2],
[-7, -3, -12]],
[[2, 4, 5],
[-1, 9, 8]]
])
print("Number of axes:", t.ndim)
print("Shape:", t.shape)
Number of axes: 3
Shape: (4, 2, 3)
Here we created a 3D array from a list of 4 lists, each of which contains 2 lists of 3 values, so its shape is (4, 2, 3).
Each item in a 3D array is a matrix (2D array). Note that the last matrix in the array is the front-most in the image.
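Each entry of the shape tuple is the length of one level of nesting, which you can confirm by calling len() on successively deeper items. The first item of t is itself a matrix.

```python
import numpy as np

t = np.array([
    [[1, 3, 9], [7, -6, 2]],
    [[2, 3, 5], [0, -2, -2]],
    [[9, 6, 2], [-7, -3, -12]],
    [[2, 4, 5], [-1, 9, 8]]
])

# One length per level of nesting: matrices, rows per matrix, values per row
nested_shape = (len(t), len(t[0]), len(t[0][0]))
print(nested_shape)             # (4, 2, 3)
print(t.shape == nested_shape)  # True

# The first item of the 3D array is a 2D array (a matrix)
first = t[0]
print(first.ndim, first.shape)  # 2 (2, 3)
```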
After looking at the above examples, we can see a pattern: an n-dimensional array is a collection of (n-1)-dimensional arrays, for n > 0. I hope you now have a better idea of how to visualize multidimensional arrays.
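The pattern can be checked for any number of dimensions. This sketch builds arrays of zeros with np.zeros (not otherwise used in this post) for shapes of increasing length, then confirms that the first item of each has one axis fewer. The shapes themselves are arbitrary.

```python
import numpy as np

# Arbitrary example shapes with 1 to 4 axes
shapes = [(5,), (2, 3), (4, 2, 3), (3, 4, 2, 3)]

results = []
for shape in shapes:
    a = np.zeros(shape)
    item = a[0]  # one item of the n-dimensional array
    results.append((a.ndim, item.ndim))
    print(f"{a.ndim}D array -> items are {item.ndim}D, shape {item.shape}")
```

For the 1D case the item is a single value, which NumPy reports as having 0 axes and an empty shape, matching the scalar example earlier.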
Just like Python lists, NumPy arrays are indexed starting from 0. Negative indexes count back from the end, so -1 refers to the last item along an axis.
import numpy as np
vec = np.array([-3, 4, 6, 9, 8, 3])
print("vec - 4th value:", vec[3])
vec[3] = 19
print("vec - 4th value (changed):", vec[3])
mat = np.array([
[2, 4, 6, 8],
[10, 12, 14, 16]
])
print("mat - 1st row:", mat[0])
print("mat - 2nd row's 1st value:", mat[1, 0])
print("mat - last row's last value:", mat[-1, -1])
vec - 4th value: 9
vec - 4th value (changed): 19
mat - 1st row: [2 4 6 8]
mat - 2nd row's 1st value: 10
mat - last row's last value: 16
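The comma form mat[1, 0] picks the same value as chained indexing mat[1][0], and a negative index counts back from the end of its own axis. A short check:

```python
import numpy as np

mat = np.array([
    [2, 4, 6, 8],
    [10, 12, 14, 16]
])

# One index per axis inside a single pair of brackets...
a = mat[1, 0]
# ...selects the same value as indexing one axis at a time
b = mat[1][0]
print(a, b)                      # 10 10

# -1 means "last" along each axis, so this is row 1, column 3
print(mat[-1, -1] == mat[1, 3])  # True
```

The single-bracket form is generally preferred, because it indexes in one step instead of first creating an intermediate row.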
NumPy arrays also support slicing:
# continuing the above code
print("vec - 2nd to 4th:", vec[1:4])
print("mat - 1st row's 1st to 3rd values:", mat[0, 0:3])
print("mat - 2nd column:", mat[:, 1])
vec - 2nd to 4th: [4 6 9]
mat - 1st row's 1st to 3rd values: [2 4 6]
mat - 2nd column: [ 4 12]
In the last example, [:, 1] says "get the 2nd value from all rows". Hence, we get the 2nd column of the matrix as the output.
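To see why [:, 1] returns the 2nd column, compare it with collecting the 2nd value of every row by hand. The slice does the same thing in one step:

```python
import numpy as np

mat = np.array([
    [2, 4, 6, 8],
    [10, 12, 14, 16]
])

# ':' on axis 0 means "every row"; 1 on axis 1 means "the 2nd value"
column = mat[:, 1]

# The same values gathered one row at a time
by_hand = [row[1] for row in mat]

print(column)                      # [ 4 12]
print(column.tolist() == by_hand)  # True
```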
Let's say we want to access the circled value. It is located in the 2nd 3D array's last matrix's 2nd row's 2nd column. It's a lot, so take your time.
Here's how to access it:
arr[1, -1, 1, 1]
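Since the array from the image is not reproduced here, this sketch uses a hypothetical 4D array built with np.arange and reshape; the shape (3, 3, 2, 3) is an arbitrary choice. It shows that one index per axis reaches the same value as walking down the levels one at a time.

```python
import numpy as np

# Hypothetical 4D array: 3 three-dimensional arrays, each holding 3 matrices of shape (2, 3)
arr = np.arange(54).reshape(3, 3, 2, 3)

# 2nd 3D array (index 1), last matrix (-1), 2nd row (1), 2nd column (1)
value = arr[1, -1, 1, 1]

# The same value, reached one level at a time
step_by_step = arr[1][-1][1][1]

print(value, step_by_step)  # 34 34
```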
At the beginning of the post, I said that calculations in NumPy are extremely fast compared to normal Python code. Let's see the difference.
We will create two lists with 10 million numbers from 0 to 9,999,999, add them element-wise, and measure the time it takes. Then we will convert both lists to NumPy arrays and do the same.
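import numpy as np
import time

l1 = list(range(10000000))
l2 = list(range(10000000))

# Element-wise addition with a plain Python loop
result = []
start = time.perf_counter()
for i in range(len(l1)):
    result.append(l1[i] + l2[i])
print(f"With just Python: {time.perf_counter() - start:.2f}s")

# The same addition with NumPy arrays (the list-to-array conversion is not timed)
arr1 = np.array(l1)
arr2 = np.array(l2)
start = time.perf_counter()
result = arr1 + arr2
print(f"With NumPy: {time.perf_counter() - start:.2f}s")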
With just Python: 2.30s
With NumPy: 0.14s
In this run, NumPy was about 16x faster than plain Python (2.30s / 0.14s ≈ 16). The exact timings will vary from machine to machine.
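A single time measurement can be noisy. Here is a sketch of a more careful comparison that uses the standard library's timeit module to take the best of several runs, after first checking that both approaches produce the same result. The list size (1 million) and repeat counts are arbitrary choices to keep the run short. The Python side uses a list comprehension with zip, which is typically somewhat faster than calling append in a loop, so the speedup you see may differ from the numbers above.

```python
import timeit
import numpy as np

n = 1_000_000  # smaller than above so that repeated runs finish quickly
l1 = list(range(n))
l2 = list(range(n))
arr1 = np.array(l1)
arr2 = np.array(l2)

def add_lists():
    # Element-wise addition in plain Python
    return [x + y for x, y in zip(l1, l2)]

def add_arrays():
    # Element-wise addition done by NumPy's compiled code
    return arr1 + arr2

# Both approaches must agree before their speed is compared
same = add_arrays().tolist() == add_lists()
print("Same result:", same)

# Best of 5 repeats, each calling the function 3 times, reported per call
py_time = min(timeit.repeat(add_lists, number=3, repeat=5)) / 3
np_time = min(timeit.repeat(add_arrays, number=3, repeat=5)) / 3
print(f"Python: {py_time:.4f}s  NumPy: {np_time:.4f}s  speedup: {py_time / np_time:.1f}x")
```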