Make sure to check out the other articles in this series.
In this installment we will go over the abstracted models that are currently available in TensorFlow, describe use cases for each model, and walk through simple sample code. Full sources of working examples are in the TensorFlow In a Nutshell repo.
Recurrent Neural Network
Use Cases: Language Modeling, Machine Translation, Word Embeddings, Text Processing
Since the advent of Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), Recurrent Neural Networks have made leaps and bounds above other models in natural language processing. They can be fed vectors representing characters and be trained to generate new sentences based on the training set. The merit of this model is that it keeps the context of the sentence and derives meaning: it understands that "cat sat on the mat" means the cat is on the mat. Since the creation of TensorFlow, writing these networks has become increasingly simple. There are even hidden features, covered by Denny Britz here, that make writing RNNs even simpler. Here's a quick example.
import tensorflow as tf
import numpy as np

# Create input data
X = np.random.randn(2, 10, 8)

# The second example is of length 6
X[1, 6:] = 0
X_lengths = [10, 6]

cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
cell = tf.nn.rnn_cell.DropoutWrapper(cell=cell, output_keep_prob=0.5)
cell = tf.nn.rnn_cell.MultiRNNCell(cells=[cell] * 4, state_is_tuple=True)

outputs, last_states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)

result = tf.contrib.learn.run_n(
    {"outputs": outputs, "last_states": last_states},
    n=1,
    feed_dict=None)
Convolutional Neural Network
Use Cases: Image Processing, Facial Recognition, Computer Vision
Convolutional Neural Networks are unique because they are designed with the assumption that the input will be an image. CNNs apply a sliding window function to a matrix. The window is called a kernel, and as it slides across the image it produces a convolved feature.
[Image: feature extraction using convolution, from http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution]
Creating a convolved feature allows for edge detection, which in turn allows a network to pick out objects in pictures.
[Image: edge detection example from the GIMP manual]
The kernel used to create this convolved feature looks like the matrix below:
[Image: convolution matrix from the GIMP manual]
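To make the sliding-window idea concrete, here is a minimal sketch (not taken from the GIMP example; the kernel values are just an illustrative vertical-edge detector) that applies a 3x3 kernel to a tiny single-channel image with tf.nn.conv2d:

import numpy as np
import tensorflow as tf

# A single-channel 8x8 "image" with a vertical edge down the middle.
image = np.zeros((1, 8, 8, 1), dtype=np.float32)
image[:, :, 4:, :] = 1.0

# A 3x3 edge-detection kernel (values are illustrative, not learned).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32).reshape(3, 3, 1, 1)

# Slide the kernel over the image to produce the convolved feature map.
convolved = tf.nn.conv2d(tf.constant(image), tf.constant(kernel),
                         strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    feature_map = sess.run(convolved)
    print(feature_map[0, :, :, 0])  # largest values sit along the edge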
Here’s a sample of code to identify handwritten digits from the MNIST dataset.
### Convolutional network
from tensorflow.contrib import learn
import tensorflow as tf

def max_pool_2x2(tensor_in):
    return tf.nn.max_pool(tensor_in, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

def conv_model(X, y):
    # reshape the flat input into a batch of 28x28 single-channel images
    X = tf.reshape(X, [-1, 28, 28, 1])
    # first conv layer computes 32 features for each 5x5 patch
    with tf.variable_scope('conv_layer1'):
        h_conv1 = learn.ops.conv2d(X, n_filters=32, filter_shape=[5, 5],
                                   bias=True, activation=tf.nn.relu)
        h_pool1 = max_pool_2x2(h_conv1)
    # second conv layer computes 64 features for each 5x5 patch
    with tf.variable_scope('conv_layer2'):
        h_conv2 = learn.ops.conv2d(h_pool1, n_filters=64, filter_shape=[5, 5],
                                   bias=True, activation=tf.nn.relu)
        h_pool2 = max_pool_2x2(h_conv2)
    # reshape tensor into a batch of vectors
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    # densely connected layer with 1024 neurons and dropout
    h_fc1 = learn.ops.dnn(h_pool2_flat, [1024], activation=tf.nn.relu,
                          dropout=0.5)
    return learn.models.logistic_regression(h_fc1, y)
Feed Forward Neural Network
Use Cases: Classification and Regression
These networks consist of layers of perceptrons that take inputs and pass information on to the next layer. The last layer in the network produces the output. Nodes within a given layer are not connected to each other. A layer that neither receives the original input nor produces the final output is called a hidden layer.
The goal of this network, like other supervised neural networks trained with backpropagation, is to map inputs to the desired trained outputs. These are some of the simplest effective neural networks for classification and regression problems. We will show how easy it is to create a feed forward network to classify handwritten digits:
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

def model(X, w_h, w_o):
    # this is a basic mlp, think 2 stacked logistic regressions
    h = tf.nn.sigmoid(tf.matmul(X, w_h))
    # note that we don't take the softmax at the end because our cost fn does that for us
    return tf.matmul(h, w_o)

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels

X = tf.placeholder("float", [None, 784])
Y = tf.placeholder("float", [None, 10])

w_h = init_weights([784, 625])  # create symbolic variables
w_o = init_weights([625, 10])

py_x = model(X, w_h, w_o)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))  # compute costs
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)  # construct an optimizer
predict_op = tf.argmax(py_x, 1)

# Launch the graph in a session
with tf.Session() as sess:
    # you need to initialize all variables
    tf.initialize_all_variables().run()

    for i in range(100):
        for start, end in zip(range(0, len(trX), 128), range(128, len(trX) + 1, 128)):
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})
        print(i, np.mean(np.argmax(teY, axis=1) ==
                         sess.run(predict_op, feed_dict={X: teX, Y: teY})))
Linear Model
Use Cases: Classification and Regression
Linear models take X values and produce a line of best fit used for classification and regression of Y values. For example, if you have a list of house sizes and their prices in a neighborhood, you can predict the price of a house given its size using a linear model.
One thing to note is that linear models can be used with multiple X features. In the housing example, we can fit a linear model on house size, number of rooms, number of bathrooms and price, then predict the price of a house given its size, number of rooms and number of bathrooms.
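In equation form, the multi-feature version is just a weighted sum (the feature names below simply mirror the housing example; the weights and the bias are learned from the data):

price = w_size * size + w_rooms * rooms + w_baths * bathrooms + b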
import numpy as np
import tensorflow as tf

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=1)
    return tf.Variable(initial)

# dataset
xx = np.random.randint(0, 1000, [1000, 3]) / 1000.
yy = xx[:, 0] * 2 + xx[:, 1] * 1.4 + xx[:, 2] * 3

# model
x = tf.placeholder(tf.float32, shape=[None, 3])
y_ = tf.placeholder(tf.float32, shape=[None])
W1 = weight_variable([3, 1])
y = tf.matmul(x, W1)

# training and cost function
cost_function = tf.reduce_mean(tf.square(tf.squeeze(y) - y_))
train_function = tf.train.AdamOptimizer(1e-2).minimize(cost_function)

# create a session
sess = tf.Session()

# train
sess.run(tf.initialize_all_variables())
for i in range(10000):
    sess.run(train_function, feed_dict={x: xx, y_: yy})
    if i % 1000 == 0:
        print(sess.run(cost_function, feed_dict={x: xx, y_: yy}))
Support Vector Machine
Use Cases: Currently only Binary Classification
The general idea behind an SVM is that there is an optimal hyperplane separating linearly separable patterns. For data that is not linearly separable, we can use a kernel function to transform the original data into a new space. SVMs maximize the margin around the separating hyperplane. They work extremely well in high-dimensional spaces and are still effective when the number of dimensions is greater than the number of samples.
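In equation form (this is the standard linear SVM formulation, not anything specific to the TensorFlow estimator below): the classifier is f(x) = sign(w·x + b), and training chooses w and b to maximize the margin 2/||w|| subject to y_i(w·x_i + b) >= 1 for every training example (x_i, y_i) with labels y_i in {-1, +1}.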
import tensorflow as tf

def input_fn():
  return {
      'example_id': tf.constant(['1', '2', '3']),
      'price': tf.constant([[0.6], [0.8], [0.3]]),
      'sq_footage': tf.constant([[900.0], [700.0], [600.0]]),
      'country': tf.SparseTensor(
          values=['IT', 'US', 'GB'],
          indices=[[0, 0], [1, 3], [2, 1]],
          shape=[3, 5]),
      'weights': tf.constant([[3.0], [1.0], [1.0]])
  }, tf.constant([[1], [0], [1]])

price = tf.contrib.layers.real_valued_column('price')
sq_footage_bucket = tf.contrib.layers.bucketized_column(
    tf.contrib.layers.real_valued_column('sq_footage'),
    boundaries=[650.0, 800.0])
country = tf.contrib.layers.sparse_column_with_hash_bucket(
    'country', hash_bucket_size=5)
sq_footage_country = tf.contrib.layers.crossed_column(
    [sq_footage_bucket, country], hash_bucket_size=10)

svm_classifier = tf.contrib.learn.SVM(
    feature_columns=[price, sq_footage_bucket, country, sq_footage_country],
    example_id_column='example_id',
    weight_column_name='weights',
    l1_regularization=0.1,
    l2_regularization=1.0)

svm_classifier.fit(input_fn=input_fn, steps=30)
accuracy = svm_classifier.evaluate(input_fn=input_fn, steps=1)['accuracy']
Deep and Wide Models
Use Cases: Recommendation Systems, Classification and Regression
Deep and Wide models were covered in greater detail in part two, so we won't get too heavy here. A Wide and Deep network combines a linear model with a feed forward neural net so that our predictions get both memorization and generalization. This type of model can be used for classification and regression problems, and it allows for less feature engineering while still giving relatively accurate predictions: the best of both worlds. Here's a code snippet from part two's GitHub repo.
def input_fn(df, train=False):
  """Input builder function."""
  # Creates a dictionary mapping each continuous feature column name (k)
  # to the values of that column stored in a constant Tensor.
  continuous_cols = {k: tf.constant(df[k].values) for k in CONTINUOUS_COLUMNS}
  # Creates a dictionary mapping each categorical feature column name (k)
  # to the values of that column stored in a tf.SparseTensor.
  categorical_cols = {
      k: tf.SparseTensor(
          indices=[[i, 0] for i in range(df[k].size)],
          values=df[k].values,
          shape=[df[k].size, 1])
      for k in CATEGORICAL_COLUMNS}
  # Merges the two dictionaries into one.
  feature_cols = dict(continuous_cols)
  feature_cols.update(categorical_cols)
  if train:
    label = tf.constant(df[SURVIVED_COLUMN].values)
    # Returns the feature columns and the label.
    return feature_cols, label
  else:
    # Returns the feature columns only.
    return feature_cols

m = build_estimator(model_dir)
m.fit(input_fn=lambda: input_fn(df_train, True), steps=200)
print(m.predict(input_fn=lambda: input_fn(df_test)))
results = m.evaluate(input_fn=lambda: input_fn(df_train, True), steps=1)
for key in sorted(results):
  print("%s: %s" % (key, results[key]))
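The snippet above assumes a build_estimator helper from part two's repo. Here is a minimal sketch of what such a helper might look like, using tf.contrib.learn.DNNLinearCombinedClassifier; the feature columns are illustrative, not the exact columns from part two:

def build_estimator(model_dir):
  # Wide side: sparse/categorical columns handled by the linear model
  # (column names here are illustrative).
  sex = tf.contrib.layers.sparse_column_with_keys(
      column_name="sex", keys=["female", "male"])
  # Deep side: continuous columns plus embeddings of the sparse columns.
  age = tf.contrib.layers.real_valued_column("age")
  wide_columns = [sex]
  deep_columns = [tf.contrib.layers.embedding_column(sex, dimension=8), age]
  return tf.contrib.learn.DNNLinearCombinedClassifier(
      model_dir=model_dir,
      linear_feature_columns=wide_columns,
      dnn_feature_columns=deep_columns,
      dnn_hidden_units=[100, 50])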
Random Forest
Use Cases: Classification and Regression
The Random Forest model trains many different classification trees; to classify a new example, each tree votes for a class and the forest chooses the classification with the most votes.
Random Forests are resistant to overfitting: you can run as many trees as you want and training stays relatively fast. Give it a try on the Iris data with the snippet below:
import numpy as np
import tensorflow as tf

hparams = tf.contrib.tensor_forest.python.tensor_forest.ForestHParams(
    num_trees=3, max_nodes=1000, num_classes=3, num_features=4)
classifier = tf.contrib.learn.TensorForestEstimator(hparams)

iris = tf.contrib.learn.datasets.load_iris()
data = iris.data.astype(np.float32)
target = iris.target.astype(np.float32)

monitors = [tf.contrib.learn.TensorForestLossMonitor(10, 10)]
classifier.fit(x=data, y=target, steps=100, monitors=monitors)
classifier.evaluate(x=data, y=target, steps=10)
Bayesian Reinforcement Learning
Use Cases: Classification and Regression
In the contrib folder of TensorFlow there is a library called BayesFlow. BayesFlow has no documentation except for an example of the REINFORCE algorithm, which was proposed in a paper by Ronald Williams.
REward Increment = Nonnegative Factor * Offset Reinforcement * Characteristic Eligibility
The network, which tries to solve an immediate reinforcement learning task, adjusts its weights after receiving the reinforcement value at each trial. At the end of each trial, every weight is incremented by a learning rate factor, multiplied by the reinforcement value minus a baseline, multiplied by the characteristic eligibility. Williams' paper also discusses the use of backpropagation to train REINFORCE networks.
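Written out as an update rule (a paraphrase of Williams' formulation, using his usual notation):

delta_w_ij = alpha_ij * (r - b_ij) * e_ij

where alpha_ij is the learning rate factor, r is the received reinforcement value, b_ij is the baseline, and e_ij is the characteristic eligibility of the weight.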
"""Build the Split-Apply-Merge Model.Route each value of input [-1, -1, 1, 1] through one of thefunctions, plus_1, minus_1. The decision for routing is made by4 Bernoulli R.V.s whose parameters are determined by a neural networkapplied to the input. REINFORCE is used to update the NN parameters.Returns:The 3-tuple (route_selection, routing_loss, final_loss), where:- route_selection is an int 4-vector- routing_loss is a float 4-vector- final_loss is a float scalar."""inputs = tf.constant([[-1.0], [-1.0], [1.0], [1.0]])targets = tf.constant([[0.0], [0.0], [0.0], [0.0]])paths = [plus_1, minus_1]weights = tf.get_variable("w", [1, 2])bias = tf.get_variable("b", [1, 1])logits = tf.matmul(inputs, weights) + bias
# REINFORCE forward steproute_selection = st.StochasticTensor(distributions.Categorical, logits=logits)
Linear Chain Conditional Random Fields (CRF)
Use Cases: Sequential Data
CRFs are conditional probability distributions that factorize according to an undirected model. They predict a label for a single sample while taking context from neighboring samples into account. CRFs are similar to Hidden Markov Models. They are often used for image segmentation and object recognition, as well as shallow parsing, named entity recognition and gene finding.
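The snippet below is only the training loop; the graph it assumes (per-token unary_scores produced by some network, integer tag labels y, the true sequence_lengths and a train_op) is not shown. A minimal sketch of how that part is typically wired up with tf.contrib.crf, under those assumptions:

# unary_scores: [batch, max_seq_len, num_tags] scores from any network (assumed).
# y: [batch, max_seq_len] integer tag labels; sequence_lengths: unpadded lengths.
# Both may be numpy arrays or constant tensors.
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    unary_scores, y, sequence_lengths)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)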
# Train for a fixed number of iterations.
session.run(tf.initialize_all_variables())
for i in range(1000):
    tf_unary_scores, tf_transition_params, _ = session.run(
        [unary_scores, transition_params, train_op])
    if i % 100 == 0:
        correct_labels = 0
        total_labels = 0
        for tf_unary_scores_, y_, sequence_length_ in zip(
                tf_unary_scores, y, sequence_lengths):
            # Remove padding from the scores and tag sequence.
            tf_unary_scores_ = tf_unary_scores_[:sequence_length_]
            y_ = y_[:sequence_length_]

            # Compute the highest scoring sequence.
            viterbi_sequence, _ = tf.contrib.crf.viterbi_decode(
                tf_unary_scores_, tf_transition_params)

            # Evaluate word-level accuracy.
            correct_labels += np.sum(np.equal(viterbi_sequence, y_))
            total_labels += sequence_length_
        accuracy = 100.0 * correct_labels / float(total_labels)
        print("Accuracy: %.2f%%" % accuracy)
Ever since TensorFlow was released, the community surrounding the project has been adding more packages, examples and use cases for this amazing library. Even at the time of writing this article, more models and sample code are being written. It is amazing to see how much TensorFlow has grown in these past few months. The ease of use and diversity of the package are increasing over time and don't seem to be slowing down anytime soon.
As always — Feel free to email me any questions or inquiries at [email protected]
Originally posted at Camron.xyz