NumPy is the fundamental package for scientific computing with Python.
----------------------------------------------------------------------

In [Part 1](https://hackernoon.com/10-ways-to-make-python-a-dangerous-language-for-data-science-6b88566ac040) of the Data Science with Python series, we looked at the basic built-in functions for numerical computing in Python. In this part, we will take a look at the NumPy library.

NumPy is the fundamental package for scientific computing with Python. It contains, among other things:

* a powerful N-dimensional array object
* sophisticated (broadcasting) functions
* tools for integrating C/C++ and Fortran code
* useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

Great, let’s see how to use the NumPy library for basic array manipulation.

### The NumPy library

First, we need to import NumPy in Python.

`import numpy as np`

Let’s create a NumPy array.

`np.array([4, 5, 6])`

**Output:** `array([4, 5, 6])`

Now, let’s create a multi-dimensional array.

`mul = np.array([[4, 5, 6], [7, 8, 9], [10, 11, 12]])`
`mul`

**Output:** `array([[ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]])`

Check the shape of the array (number of rows and columns).

`mul.shape`

**Output:** `(3, 3)`

Create an evenly spaced array starting at 1 and stopping before 60, with a step of 2.

`dif = np.arange(1, 60, 2)`
`dif`

**Output:** `array([ 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59])`

Reshape the above array into a desired shape.
`dif.reshape(10, 3)`

**Output:** `array([[ 1, 3, 5], [ 7, 9, 11], [13, 15, 17], [19, 21, 23], [25, 27, 29], [31, 33, 35], [37, 39, 41], [43, 45, 47], [49, 51, 53], [55, 57, 59]])`

Generate 40 evenly spaced values over the interval from 1 to 8. (Take a minute here to understand the difference between `linspace` and `arange`: `arange` takes a step size and excludes the stop value, while `linspace` takes the number of points and includes both endpoints by default.)

`gen = np.linspace(1, 8, 40)`
`gen`

**Output:** `array([1. , 1.17948718, 1.35897436, 1.53846154, 1.71794872, 1.8974359 , 2.07692308, 2.25641026, 2.43589744, 2.61538462, 2.79487179, 2.97435897, 3.15384615, 3.33333333, 3.51282051, 3.69230769, 3.87179487, 4.05128205, 4.23076923, 4.41025641, 4.58974359, 4.76923077, 4.94871795, 5.12820513, 5.30769231, 5.48717949, 5.66666667, 5.84615385, 6.02564103, 6.20512821, 6.38461538, 6.56410256, 6.74358974, 6.92307692, 7.1025641 , 7.28205128, 7.46153846, 7.64102564, 7.82051282, 8. ])`

Now, change the shape of the array in place. (`resize` changes the shape of the array itself, whereas `reshape` returns a reshaped array and leaves the original shape unchanged.)

`gen.resize(10, 4)`
`gen`

**Output:** `array([[1. , 1.17948718, 1.35897436, 1.53846154], [1.71794872, 1.8974359 , 2.07692308, 2.25641026], [2.43589744, 2.61538462, 2.79487179, 2.97435897], [3.15384615, 3.33333333, 3.51282051, 3.69230769], [3.87179487, 4.05128205, 4.23076923, 4.41025641], [4.58974359, 4.76923077, 4.94871795, 5.12820513], [5.30769231, 5.48717949, 5.66666667, 5.84615385], [6.02564103, 6.20512821, 6.38461538, 6.56410256], [6.74358974, 6.92307692, 7.1025641 , 7.28205128], [7.46153846, 7.64102564, 7.82051282, 8. ]])`

Create an array with all elements set to one.

`onarr = np.ones((4, 4))`
`onarr`

**Output:** `array([[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.]])`

Create an array filled with zeros.
`zearr = np.zeros((4, 4))`
`zearr`

**Output:** `array([[0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.], [0., 0., 0., 0.]])`

Create an identity matrix (ones on the diagonal, zeros elsewhere).

`dm = np.eye(3)`
`dm`

**Output:** `array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])`

Extract only the diagonal values from an array.

`np.diag(dm)`

**Output:** `array([1., 1., 1.])`

Create an array from a repeated list.

`relist = np.array([1, 2, 3] * 7)`
`relist`

**Output:** `array([1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3])`

Now, repeat each element of an array n times using the `repeat` function.

`np.repeat([1, 2, 3], 3)`

**Output:** `array([1, 1, 1, 2, 2, 2, 3, 3, 3])`

Generate two arrays of a desired shape filled with random values in the interval [0, 1). Since the values are random, your output will differ.

`relist = np.random.rand(2, 3)`
`print(relist)`
`de = np.random.rand(2, 3)`
`print(de)`

**Output:**

`[[0.55523672 0.46815197 0.67590369] [0.5331193 0.62780236 0.45044916]]`

`[[0.26215572 0.07380256 0.06592746] [0.89782279 0.95603968 0.82052478]]`

Stack the two arrays created above vertically.

`st = np.vstack([de, relist])`
`st`

**Output:** `array([[0.26215572, 0.07380256, 0.06592746], [0.89782279, 0.95603968, 0.82052478], [0.55523672, 0.46815197, 0.67590369], [0.5331193 , 0.62780236, 0.45044916]])`

Now, let’s stack them horizontally.

`sh = np.hstack([de, relist])`
`sh`

**Output:** `array([[0.26215572, 0.07380256, 0.06592746, 0.55523672, 0.46815197, 0.67590369], [0.89782279, 0.95603968, 0.82052478, 0.5331193 , 0.62780236, 0.45044916]])`

Great, now let’s perform some array operations. First, let’s create two random arrays.

`r1 = np.random.rand(2, 2)`
`r2 = np.random.rand(2, 2)`
`print(r1)`
`print(r2)`

**Output:**

`[[0.02430146 0.14448542] [0.54428337 0.40332494]]`

`[[0.77574886 0.08747577] [0.51484157 0.92319888]]`

Let’s do element-wise addition.

`r3 = r1 + r2`
`r3`

**Output:** `array([[0.80005032, 0.23196119], [1.05912494, 1.32652382]])`

Element-wise subtraction.
`r4 = r1 - r2`
`r4`

**Output:** `array([[-0.75144739, 0.05700965], [ 0.02944179, -0.51987394]])`

Let’s raise each element to the power of 3.

`r5 = r1**3`
`r5`

**Output (rounded to four significant digits):** `array([[1.435e-05, 3.016e-03], [1.612e-01, 6.561e-02]])`

Now, instead of an element-wise operation, let’s compute the matrix (dot) product of the two arrays r1 and r2.

`r6 = r1.dot(r2)`
`r6`

**Output:** `array([[0.09323893, 0.13551456], [0.62987564, 0.41996073]])`

Let’s create a new array and transpose it.

`sh = np.array([[1, 2], [3, 4]])`
`sh`

**Output:** `array([[1, 2], [3, 4]])`

`sh.T`

**Output:** `array([[1, 3], [2, 4]])`

Now, check the data type of the elements in the array. (The default integer type is platform dependent; on many systems this is `int64`.)

`sh.dtype`

**Output:** `dtype('int32')`

Change the data type of the array.

`rs = sh.astype('f')`
`rs.dtype`

**Output:** `dtype('float32')`

Now, let’s look at some mathematical functions on an array, starting with the sum of its elements.

`c = np.array([1, 2, 3, 4, 5])`
`c.sum()`

**Output:** `15`

Maximum of the elements of an array.

`c.max()`

**Output:** `5`

Mean of the elements of the array.

`c.mean()`

**Output:** `3.0`

Now, let’s retrieve the index of the maximum value of the array, and then the index of the minimum value.

`c.argmax()`

**Output:** `4`

`c.argmin()`

**Output:** `0`

Create an array consisting of the squares of the first ten whole numbers.

`dim = np.arange(10)**2`
`dim`

**Output:** `array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81], dtype=int32)`

Access values in the above array using an index or a slice.

`dim[2]`

**Output:** `4`

`dim[1:5]`

**Output:** `array([ 1, 4, 9, 16], dtype=int32)`

Use a negative index to count from the end of the array.

`dim[-1:]`

**Output:** `array([81], dtype=int32)`

Now, access certain elements of the array based on a step size.
`dim[1:10:2]  # dim[start:stop:stepsize]`

**Output:** `array([ 1, 9, 25, 49, 81], dtype=int32)`

Create a multidimensional array.

`en = np.arange(36)`
`en.resize(6, 6)`
`en`

**Output:** `array([[ 0, 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29], [30, 31, 32, 33, 34, 35]])`

Access the element in the second row and third column.

`en[1, 2]`

**Output:** `8`

Access the 2nd row and columns 3 to 6. Note that the numbering of rows and columns starts at 0, and that the stop index of a slice is excluded.

`en[1, 2:6]`

**Output:** `array([ 8, 9, 10, 11])`

Select the first two rows and all columns except the last one.

`en[:2, :-1]`

**Output:** `array([[ 0, 1, 2, 3, 4], [ 6, 7, 8, 9, 10]])`

Select the values from the array that are greater than 20.

`en[en > 20]`

**Output:** `array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])`

Set every element whose value is greater than 20 to 20.

`en[en > 20] = 20`
`en`

**Output:** `array([[ 0, 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19, 20, 20, 20, 20], [20, 20, 20, 20, 20, 20], [20, 20, 20, 20, 20, 20]])`

To copy an array into another variable, always use the `copy` function; plain assignment only creates another reference to the same array.

`fun = en.copy()`
`fun`

**Output:** `array([[ 0, 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10, 11], [12, 13, 14, 15, 16, 17], [18, 19, 20, 20, 20, 20], [20, 20, 20, 20, 20, 20], [20, 20, 20, 20, 20, 20]])`

Create a 4×4 array of random integers from 1 to 9 (the upper bound of 10 passed to `randint` is excluded).

`gom = np.random.randint(1, 10, (4, 4))`
`gom`

**Output:** `array([[9, 7, 1, 4], [1, 4, 3, 6], [2, 5, 5, 1], [2, 2, 9, 9]])`

Great, we have looked at creating, accessing and manipulating arrays in NumPy. In the next part of the series, we will look at Pandas, a library built on top of NumPy. Pandas makes data manipulation and analysis much easier in Python. It offers data structures and operations for numerical tables and time series.
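As a recap, a few of the behaviours described in this part can be checked in one short script: `arange` versus `linspace`, `reshape` versus `resize`, copying versus sharing data, and the bounds of `randint`. This is only a minimal sketch; the names `reshaped`, `alias` and `view` are introduced here for illustration and do not appear in the original notebook.

```python
import numpy as np

# arange takes a step size and excludes the stop value;
# linspace takes a number of points and includes both endpoints.
dif = np.arange(1, 60, 2)
gen = np.linspace(1, 8, 40)
print(dif[-1], dif.size)          # 59 30
print(gen[0], gen[-1], gen.size)  # 1.0 8.0 40

# reshape returns a reshaped array; the original keeps its shape.
reshaped = dif.reshape(10, 3)
print(dif.shape, reshaped.shape)  # (30,) (10, 3)

# resize changes the shape of the array itself.
en = np.arange(36)
en.resize(6, 6)
print(en.shape)                   # (6, 6)

# Plain assignment and basic slicing do not copy the data; copy() does.
alias = en           # illustrative name: a second reference to the same array
view = en[:2, :-1]   # illustrative name: a view onto en's data
fun = en.copy()      # independent copy, as in the article
en[en > 20] = 20     # modify the original in place
print(alias is en)                 # True
print(np.shares_memory(view, en))  # True
print(np.shares_memory(fun, en))   # False
print(fun.max(), en.max())         # 35 20

# randint excludes the upper bound, so these values lie in 1..9.
gom = np.random.randint(1, 10, (4, 4))
print(gom.min() >= 1, gom.max() <= 9)  # True True
```

Because `fun` was copied before the assignment `en[en > 20] = 20`, it still holds the original values up to 35, while `alias` and `view` reflect the change made to `en`.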
#### Resources

1. [NumPy documentation](http://numpy.org)
2. [Applied Data Science with Python Specialization](https://www.coursera.org/specializations/data-science-python)

Connect on [LinkedIn](https://www.linkedin.com/in/harun-ur-rashid6647/) and check out GitHub (below) for the complete notebook.

[**harunshimanto/Python-The-Dangerous-Tool-For-ML-Data-Science**: Learn data science and machine learning with Python.](https://github.com/harunshimanto/Python-The-Dangerous-Tool-For-ML-Data-Science/blob/master/Python%20for%20Data%20Science%20and%20ML-%20Part%202.ipynb)

You can [tell me](http://harunspeedy1995@gmail.com) what you think about this. If you enjoyed this article, click on the clap 👏 button. Thanks to everyone.