This article was excerpted from Machine Learning with TensorFlow.
Before jumping into machine learning algorithms, you should first familiarize yourself with how to use the tools. This article covers some essential advantages of TensorFlow, to convince you it’s the machine learning library of choice.
As a thought experiment, let’s imagine what happens when we write Python code without a handy computing library. It’ll be like using a new smartphone without installing any extra apps. The phone still works, but you’d be more productive if you had the right apps.
Consider the following… You’re a business owner tracking the flow of sales. You want to calculate your revenue from selling your products. Your inventory consists of 100 different products, and you represent each price in a vector called prices. Another vector of size 100, called amounts, represents the inventory count of each item. You can write the chunk of Python code shown in listing 1 to calculate the revenue of selling all products. Keep in mind that this code doesn’t import any libraries.
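Listing 1 itself is not reproduced above, so here is a minimal sketch of what such library-free code looks like. The variable names prices and amounts follow the text; the three example values (rather than 100) are made up to keep the sketch short.

```python
# Listing 1 (sketch): computing revenue with no libraries at all.
# The real inventory has 100 products; three made-up items keep this short.
prices = [10.0, 20.0, 30.0]   # price of each product
amounts = [5, 2, 1]           # inventory count of each product

revenue = 0.0
for price, amount in zip(prices, amounts):
    revenue += price * amount  # accumulate price * count, one product at a time

print(revenue)  # 120.0
```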
That’s a lot of code just to calculate the inner-product of two vectors (also known as dot product). Imagine how much code would be required for something more complicated, such as solving linear equations or computing the distance between two vectors.
By installing the TensorFlow library, you also install a well-known and robust Python library called NumPy, which facilitates mathematical manipulation in Python. Using Python without its libraries (e.g. NumPy and TensorFlow) is like using a camera without autofocus: you gain more flexibility, but you can easily make careless mistakes. It’s already pretty easy to make mistakes in machine learning, so let’s keep our camera on autofocus and use TensorFlow to help automate some tedious software development.
Listing 2 shows how to concisely write the same inner-product using NumPy.
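Listing 2 is not reproduced above; a minimal sketch, reusing the same hypothetical prices and amounts values from the listing 1 sketch, would be:

```python
import numpy as np

# Listing 2 (sketch): the same inner product in one line of NumPy.
prices = np.array([10.0, 20.0, 30.0])
amounts = np.array([5, 2, 1])

revenue = np.dot(prices, amounts)  # inner (dot) product of the two vectors
print(revenue)  # 120.0
```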
Python is a succinct language. Fortunately for you, that means you won’t see pages and pages of cryptic code. On the other hand, the brevity of the Python language implies that a lot is happening behind each line of code, and you should take the time to understand what that is as you work.
By the way… Detailed documentation about the various functions in the Python and C++ APIs for TensorFlow is available at https://www.tensorflow.org/api_docs/index.html.
This article is geared toward using TensorFlow for computations, because machine learning relies on mathematical formulations. After going through the examples and code listings, you’ll be able to use TensorFlow for arbitrary tasks, such as computing statistics on big data. The focus here is entirely on how to use TensorFlow, as opposed to machine learning in general.
Machine learning algorithms require a large amount of mathematical operations. Often, an algorithm boils down to a composition of simple functions iterated until convergence. Sure, you might use any standard programming language to perform these computations, but the secret to both manageable and performant code is the use of a well-written library.
That sounds like a gentle start, right? Without further ado, let’s write our first TensorFlow code!
First, we need to ensure that everything is working correctly. Check the oil level in your car, repair the blown fuse in your basement, and ensure that your credit balance is zero.
Just kidding, I’m talking about TensorFlow.
Go ahead and create a new file called test.py for our first piece of code. Import TensorFlow by running the following script:
import tensorflow as tf
This single import prepares TensorFlow for your bidding. If the Python interpreter doesn’t complain, then we’re ready to start using TensorFlow!
Having technical difficulty? A common error at this step occurs when you’ve installed the GPU version and the library fails to find the CUDA drivers. Remember, if you compiled the library with CUDA, you need to update your environment variables with the path to CUDA. Check the CUDA instructions on TensorFlow. (See https://www.tensorflow.org/versions/master/get_started/os_setup.html#optional-linux-enable-gpu-support for further information.)
The TensorFlow library is usually imported with the tf qualified name. Generally, qualifying TensorFlow with tf is a good idea to remain consistent with other developers and open-source TensorFlow projects. You may choose not to qualify it or change the qualification name, but then successfully reusing other people’s snippets of TensorFlow code in your own projects will be an involved process.
Now that we know how to import TensorFlow into a Python source file, let’s start using it! A convenient way to describe an object in the real world is by listing out its properties, or features. For example, you can describe a car by its color, model, engine type, and mileage. An ordered list of some features is called a feature vector, and that’s exactly what we’ll represent in TensorFlow code.
Feature vectors are one of the most useful devices in machine learning because of their simplicity (they’re lists of numbers). Each data item typically consists of a feature vector, and a good dataset has thousands, if not millions, of these feature vectors. No doubt, you’ll often be dealing with more than one vector at a time. A matrix concisely represents a list of vectors, where each column of a matrix is a feature vector.
The syntax to represent matrices in TensorFlow is a vector of vectors, each of the same length. Figure 1 is an example of a matrix with two rows and three columns, such as [[1, 2, 3], [4, 5, 6]]. Notice, this is a vector containing two elements, and each element corresponds to a row of the matrix.
Figure 1. The matrix in the lower half of the diagram is a visualization from its compact code notation in the upper half of the diagram. This form of notation is a common paradigm in most scientific computing libraries.
We access an element in a matrix by specifying its row and column indices. For example, the first row and first column indicate the first top-left element. Sometimes it’s convenient to use more than two indices, such as when referencing a pixel in a color image not only by its row and column, but also its red/green/blue channel. A tensor is a generalization of a matrix that specifies an element by an arbitrary number of indices.
Example of a tensor… Suppose an elementary school enforces assigned seating to its students. You’re the principal, and you’re terrible with names. Luckily, each classroom has a grid of seats, where you can easily nickname a student by his or her row and column index.
There are multiple classrooms, so you can’t say “Good morning 4,10! Keep up the good work.” You need to also specify the classroom, “Hi 4,10 from classroom 2.” Unlike a matrix, which needs only two indices to specify an element, the students in this school need three numbers. They’re all a part of a rank three tensor!
The syntax for tensors is even more nested vectors. For example, a 2-by-3-by-2 tensor is [[[1,2], [3,4], [5,6]], [[7,8], [9,10], [11,12]]], which can be thought of as two matrices, each of size 3-by-2. Consequently, we say this tensor has a rank of 3. In general, the rank of a tensor is the number of indices required to specify an element. Machine learning algorithms in TensorFlow act on Tensors, and it’s important to understand how to use them.
Figure 2. This tensor can be thought of as multiple matrices stacked on top of each other. To specify an element, you must indicate the row and column, as well as which matrix is being accessed. Therefore, the rank of this tensor is three.
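The three-index idea can be checked directly on the nested-list tensor from the text, using nothing but plain Python:

```python
# The rank-3 tensor from the text: two 3-by-2 matrices stacked together.
tensor = [[[1, 2], [3, 4], [5, 6]],
          [[7, 8], [9, 10], [11, 12]]]

# Three indices pick out one element: which matrix, then row, then column.
print(tensor[0][0][0])  # 1  (first matrix, first row, first column)
print(tensor[1][2][1])  # 12 (second matrix, third row, second column)
```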
It’s easy to get lost in the many ways to represent a tensor. Intuitively, each of the following three lines of code in Listing 3 is trying to represent the same 2-by-2 matrix. This matrix represents two feature vectors of two dimensions each. It could, for example, represent two people’s ratings of two movies. Each person, indexed by the row of the matrix, assigns a number to describe his or her review of the movie, indexed by the column. Run the code to see how to generate a matrix in TensorFlow.
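Listing 3 is not reproduced above; a sketch consistent with the description, using the 1.x-era API the rest of this article assumes and made-up rating values, would look like this:

```python
import tensorflow as tf
import numpy as np

# Listing 3 (sketch): three ways to write the same 2-by-2 matrix.
m1 = [[1.0, 2.0],
      [3.0, 4.0]]                               # a plain Python list
m2 = np.array([[1.0, 2.0],
               [3.0, 4.0]], dtype=np.float32)   # a NumPy ndarray
m3 = tf.constant([[1.0, 2.0],
                  [3.0, 4.0]])                  # TensorFlow's Tensor object

# Convert each one to a Tensor; all three now print the same class.
t1 = tf.convert_to_tensor(m1, dtype=tf.float32)
t2 = tf.convert_to_tensor(m2, dtype=tf.float32)
t3 = tf.convert_to_tensor(m3, dtype=tf.float32)

print(type(t1))
print(type(t2))
print(type(t3))
```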
The first variable (m1) is a list, the second variable (m2) is an ndarray from the NumPy library, and the last variable (m3) is TensorFlow’s Tensor object. All operators in TensorFlow, such as neg, are designed to operate on tensor objects. A convenient function we can sprinkle anywhere to make sure that we’re dealing with tensors, as opposed to the other types, is tf.convert_to_tensor( … ). Most functions in the TensorFlow library already perform this conversion for you, even if you forget to do it. Using tf.convert_to_tensor( … ) is optional, but I show it here because it helps demystify the implicit type system being handled across the library. Listing 3 produces the following output three times:
<class 'tensorflow.python.framework.ops.Tensor'>
Let’s take another look at defining tensors in code. After importing the TensorFlow library, we can use the constant operator as follows in Listing 4.
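Listing 4 is not reproduced above; a sketch consistent with the three output lines that follow (a 1-by-2 float matrix, a 2-by-1 integer matrix, and the 2-by-3-by-2 tensor from earlier), in the 1.x-era API, would be:

```python
import tensorflow as tf

# Listing 4 (sketch): defining constant tensors of different shapes.
matrix1 = tf.constant([[1., 2.]])       # 1-by-2 matrix of floats
matrix2 = tf.constant([[1],
                       [2]])            # 2-by-1 matrix of ints
myTensor = tf.constant([[[1, 2], [3, 4], [5, 6]],
                        [[7, 8], [9, 10], [11, 12]]])  # 2-by-3-by-2 tensor

print(matrix1)
print(matrix2)
print(myTensor)
```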
Running listing 4 produces the following output:
Tensor("Const:0", shape=TensorShape([Dimension(1), Dimension(2)]), dtype=float32)
Tensor("Const_1:0", shape=TensorShape([Dimension(2), Dimension(1)]), dtype=int32)
Tensor("Const_2:0", shape=TensorShape([Dimension(2), Dimension(3), Dimension(2)]), dtype=int32)
As you can see from the output, each tensor is represented by the aptly named Tensor object. Each Tensor object has a unique label (name), a dimension (shape) to define its structure, and a data type (dtype) to specify the kind of values we’ll manipulate. Because we didn’t explicitly provide a name, the library automatically generated them: "Const:0", "Const_1:0", and "Const_2:0".
Notice that each element of matrix1 ends with a decimal point. The decimal point tells Python that the data type of the elements isn’t an integer, but a float. We can also pass in explicit dtype values. Much like NumPy arrays, tensors take on a data type that specifies the kind of values we’ll manipulate in that tensor.
TensorFlow also comes with a few convenient constructors for some simple tensors. For example, tf.zeros(shape) creates a tensor with all values initialized at zero of a specific shape. Similarly, tf.ones(shape) creates a tensor of a specific shape with all values initialized at one. The shape argument is a one-dimensional (1D) tensor of type int32 (a list of integers) describing the dimensions of the tensor.
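tf.zeros and tf.ones follow the same shape convention as NumPy’s np.zeros and np.ones, so the shape argument can be seen at work with a quick NumPy sketch, no session required:

```python
import numpy as np

# np.zeros/np.ones take the same kind of shape argument as tf.zeros/tf.ones:
# a list of integers giving the size of each dimension.
a = np.zeros([1, 2])   # a 1-by-2 matrix of zeros
b = np.ones([2, 3])    # a 2-by-3 matrix of ones

print(a.shape)  # (1, 2)
print(b.shape)  # (2, 3)
```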
Now that we have a few starting tensors ready to use, we can apply more interesting operators, such as addition or multiplication. Consider each row of a matrix representing the transaction of money to (positive value) and from (negative value) another person. Negating the matrix is a way to represent the transaction history of the other person’s flow of money. Let’s start simple and run the negation op (short for operation) on our matrix1 tensor from listing 4. Negating a matrix turns the positive numbers into negative numbers of the same magnitude, and vice versa.
Negation is one of the simplest operations. As shown in listing 5, negation takes only one tensor as input and produces a tensor with every element negated. Now, try running the code yourself. If you master how to define negation, it’ll provide a stepping stone to generalize that skill to all other TensorFlow operations.
Aside… Defining an operation, such as negation, is different from running it.
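Listing 5 itself is not reproduced above; a minimal sketch, reusing matrix1 in the style of listing 4 and the 1.x-era tf.neg op the article names, would be:

```python
import tensorflow as tf

# Listing 5 (sketch): defining the negation op.
matrix1 = tf.constant([[1, 2]])
neg_matrix = tf.neg(matrix1)  # this defines the op; it does not run it yet

print(neg_matrix)
```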
Listing 5 generates the following output:
Tensor(“Neg:0”, shape=(1, 2), dtype=int32)
The official documentation carefully lays out all available math ops: https://www.tensorflow.org/api_docs/Python/math_ops.html.
Some specific examples of commonly used operators include:
tf.add(x, y): Add two tensors of the same type, x + y
tf.sub(x, y): Subtract tensors of the same type, x - y
tf.mul(x, y): Multiply two tensors element-wise
tf.pow(x, y): Take the element-wise power of x to y
tf.exp(x): Equivalent to pow(e, x), where e is Euler’s number (2.718…)
tf.sqrt(x): Equivalent to pow(x, 0.5)
tf.div(x, y): Take the element-wise division of x and y
tf.truediv(x, y): Same as tf.div, except casts the arguments as floats
tf.floordiv(x, y): Same as truediv, except rounds down the final answer into an integer
tf.mod(x, y): Take the element-wise remainder from division
Exercise… Use the TensorFlow operators we’ve learned to produce the Gaussian distribution (also known as the normal distribution). See Figure 3 for a hint. For reference, you can find the probability density of the normal distribution online: https://en.wikipedia.org/wiki/Normal_distribution.
Most mathematical operators, such as *, -, and +, are shortcuts for their TensorFlow equivalents, for the sake of brevity. The Gaussian function includes many operations, and it’s cleaner to use some shorthand notation as follows:
from math import pi
mean = 1.0
sigma = 1.0
(tf.exp(tf.neg(tf.pow(x - mean, 2.0) /
               (2.0 * tf.pow(sigma, 2.0)))) *
 (1.0 / (sigma * tf.sqrt(2.0 * pi))))
Figure 3. The graph represents the operations needed to produce a Gaussian distribution. The links between the nodes represent how data flows from one operation to the next. The operations themselves are simple, but complexity arises in how they intertwine.
As you can see, TensorFlow algorithms are easy to visualize: they can be described by flowcharts. The technical (and more correct) term for the flowchart is a graph. Every arrow in the flowchart is called an edge of the graph, and every state of the flowchart is called a node.
A session is an environment of a software system that describes how the lines of code should run. In TensorFlow, a session sets up how the hardware devices (such as CPU and GPU) talk to each other. That way, you can design your machine learning algorithm without worrying about micro-managing the hardware that it runs on. Of course, you can later configure the session to change its behavior without changing a line of the machine learning code.
To execute an operation and retrieve its calculated value, TensorFlow requires a session. Only a registered session may fill the values of a Tensor object. To do so, you must create a session class using tf.Session() and tell it to run an operator (listing 6). The result will be a value you can later use for further computations.
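Listing 6 is not reproduced above; a minimal sketch of that session workflow, in the 1.x-era API this article uses and reusing the negation op from the listing 5 sketch, would be:

```python
import tensorflow as tf

# Listing 6 (sketch): evaluating the negation op inside a session.
matrix1 = tf.constant([[1, 2]])
neg_matrix = tf.neg(matrix1)

with tf.Session() as sess:       # a session is required to actually run ops
    result = sess.run(neg_matrix)

print(result)  # [[-1 -2]]
```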
Congratulations! You have just written your first full TensorFlow code. Although all it does is negate a matrix to produce [[-1, -2]], the core overhead and framework are just the same as everything else in TensorFlow.
You can also pass options to tf.Session. For example, TensorFlow automatically determines the best way to assign a GPU or CPU device to an operation, depending on what’s available. We can pass an additional option, log_device_placement=True, when creating a session, as shown in listing 7.
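Listing 7 is not reproduced above; a minimal sketch in the 1.x-era API, where the option is passed through a session config, would be:

```python
import tensorflow as tf

# Listing 7 (sketch): logging which device runs each op.
matrix1 = tf.constant([[1, 2]])
neg_matrix = tf.neg(matrix1)

# log_device_placement=True makes the session print the device assigned
# to each operation (CPU or GPU) as the graph runs.
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    result = sess.run(neg_matrix)

print(result)
```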
This outputs info about which CPU/GPU devices are used in the session for each operation. For example, running listing 7 results in traces of output like the following, showing which device was used to run the negation op:
Neg: /job:localhost/replica:0/task:0/cpu:0
Sessions are essential in TensorFlow code. You need to call a session to actually “run” the math. Figure 4 maps out how the different components of TensorFlow interact with the machine learning pipeline. A session not only runs a graph operation, but can also take placeholders, variables, and constants as input. We’ve used constants so far, but in later sections we’ll start using variables and placeholders. Here’s a quick overview of these three types of values.
Figure 4. The session dictates how the hardware will be used to most efficiently process the graph. When the session starts, it assigns the CPU and GPU devices to each of the nodes. After processing, the session outputs data in a usable format, such as a NumPy array. A session optionally may be fed placeholders, variables, and constants.
That’s it for now. I hope you’ve successfully acquainted yourself with some of the basic workings of TensorFlow. If this article has left you ravenous for more delicious TensorFlow tidbits, please go download the first chapter of Machine Learning with TensorFlow and see this Slideshare presentation for more information (and a discount code).