Make sure to check out the other articles in this series.
In this installment we will go over the abstracted models that are currently available in TensorFlow, describe use cases for each model, and walk through simple sample code. Full sources of working examples are in the TensorFlow In a Nutshell repo.
Recurrent Neural Network
Use Cases: Language Modeling, Machine Translation, Word Embeddings, Text Processing
Since the advent of Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), Recurrent Neural Networks have made leaps and bounds above other models in natural language processing. They can be fed vectors representing characters and be trained to generate new sentences based on the training set. The merit of this model is that it keeps the context of the sentence and derives meaning: it understands that "cat sat on the mat" means the cat is on the mat. Since the creation of TensorFlow, writing these networks has become increasingly simple. There are even hidden features, covered by Denny Britz here, that make writing RNNs even simpler. Here's a quick example.
import tensorflow as tf
import numpy as np

# Create input data
X = np.random.randn(2, 10, 8)

# The second example is of length 6
X[1, 6:] = 0
X_lengths = [10, 6]

cell = tf.nn.rnn_cell.LSTMCell(num_units=64, state_is_tuple=True)
cell = tf.nn.rnn_cell.DropoutWrapper(cell=cell, output_keep_prob=0.5)
cell = tf.nn.rnn_cell.MultiRNNCell(cells=[cell] * 4, state_is_tuple=True)

outputs, last_states = tf.nn.dynamic_rnn(
    cell=cell,
    dtype=tf.float64,
    sequence_length=X_lengths,
    inputs=X)

result = tf.contrib.learn.run_n(
    {"outputs": outputs, "last_states": last_states},
    n=1,
    feed_dict=None)
Convolutional Neural Network
Use Cases: Image Processing, Facial Recognition, Computer Vision
Convolutional Neural Networks are unique because they are designed with the assumption that the input will be an image. CNNs apply a sliding window function to a matrix. The window is called a kernel, and as it slides across the image it produces a convolved feature.
[Image: feature extraction using convolution, from http://deeplearning.stanford.edu/wiki/index.php/Feature_extraction_using_convolution]
Creating a convolved feature allows for edge detection, which in turn allows a network to pick out objects in pictures.
[Image: edge detection example from the GIMP manual]
The kernel used to create this convolved feature looks like the matrix below:
[Image: convolution matrix from the GIMP manual]
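To make the sliding-window idea concrete, here is a minimal sketch (not taken from the GIMP example; the kernel values are just an illustrative vertical-edge detector) that applies a 3x3 kernel to a tiny single-channel image with tf.nn.conv2d:

import numpy as np
import tensorflow as tf

# A single-channel 8x8 "image" with a vertical edge down the middle.
image = np.zeros((1, 8, 8, 1), dtype=np.float32)
image[:, :, 4:, :] = 1.0

# A 3x3 edge-detection kernel (values are illustrative, not learned).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=np.float32).reshape(3, 3, 1, 1)

# Slide the kernel over the image to produce the convolved feature map.
convolved = tf.nn.conv2d(tf.constant(image), tf.constant(kernel),
                         strides=[1, 1, 1, 1], padding='SAME')

with tf.Session() as sess:
    feature_map = sess.run(convolved)
    print(feature_map[0, :, :, 0])  # largest values sit along the edge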
Here’s a sample of code to identify handwritten digits from the MNIST dataset.
### Convolutional network
from tensorflow.contrib import learn
import tensorflow as tf

def max_pool_2x2(tensor_in):
    return tf.nn.max_pool(tensor_in, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='SAME')

def conv_model(X, y):
    # reshape the flat input into a batch of 28x28 single-channel images
    X = tf.reshape(X, [-1, 28, 28, 1])
    # first conv layer computes 32 features for each 5x5 patch
    with tf.variable_scope('conv_layer1'):
        h_conv1 = learn.ops.conv2d(X, n_filters=32, filter_shape=[5, 5],
                                   bias=True, activation=tf.nn.relu)
        h_pool1 = max_pool_2x2(h_conv1)
    # second conv layer computes 64 features for each 5x5 patch
    with tf.variable_scope('conv_layer2'):
        h_conv2 = learn.ops.conv2d(h_pool1, n_filters=64, filter_shape=[5, 5],
                                   bias=True, activation=tf.nn.relu)
        h_pool2 = max_pool_2x2(h_conv2)
    # reshape tensor into a batch of vectors
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])
    # densely connected layer with 1024 neurons and dropout
    h_fc1 = learn.ops.dnn(h_pool2_flat, [1024], activation=tf.nn.relu,
                          dropout=0.5)
    return learn.models.logistic_regression(h_fc1, y)
Feed Forward Neural Network
Use Cases: Classification and Regression
These networks consist of layers of perceptrons that take inputs and pass information on to the next layer. The last layer in the network produces the output. Nodes within a given layer are not connected to each other. A layer that neither receives the original input nor produces the final output is called a hidden layer.
The goal of this network, like other supervised neural networks trained with backpropagation, is to map inputs to the desired trained outputs. These are some of the simplest effective neural networks for classification and regression problems. We will show how easy it is to create a feed forward network to classify handwritten digits:
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))

def model(X, w_h, w_o):
    # this is a basic mlp, think 2 stacked logistic regressions
    h = tf.nn.sigmoid(tf.matmul(X, w_h))
    # note that we don't take the softmax at the end because our cost fn does that for us
    return tf.matmul(h, w_o)

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, mnist.test.images, mnist.test.labels

X = tf.placeholder("float", [None, 784])
Y = tf.placeholder("float", [None, 10])

w_h = init_weights([784, 625])  # create symbolic variables
w_o = init_weights([625, 10])

py_x = model(X, w_h, w_o)

cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=py_x, labels=Y))  # compute costs
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(cost)  # construct an optimizer
predict_op = tf.argmax(py_x, 1)

# Launch the graph in a session
with tf.Session() as sess:
    # you need to initialize all variables
    tf.initialize_all_variables().run()

    for i in range(100):
        for start, end in zip(range(0, len(trX), 128), range(128, len(trX) + 1, 128)):
            sess.run(train_op, feed_dict={X: trX[start:end], Y: trY[start:end]})
        print(i, np.mean(np.argmax(teY, axis=1) ==
                         sess.run(predict_op, feed_dict={X: teX, Y: teY})))
Linear Model
Use Cases: Classification and Regression
Linear models take X values and produce a line of best fit used for classification and regression of Y values. For example, if you have a list of house sizes and their prices in a neighborhood, you can predict the price of a house given its size using a linear model.
One thing to note is that linear models can be used with multiple X features. In the housing example, we can fit a linear model on house size, number of rooms, number of bathrooms and price, then predict the price of a house given its size, number of rooms and number of bathrooms.
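In equation form, the multi-feature version is just a weighted sum (the feature names below simply mirror the housing example; the weights and the bias are learned from the data):

price = w_size * size + w_rooms * rooms + w_baths * bathrooms + b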
import numpy as np
import tensorflow as tf

def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=1)
    return tf.Variable(initial)

# dataset
xx = np.random.randint(0, 1000, [1000, 3]) / 1000.
yy = xx[:, 0] * 2 + xx[:, 1] * 1.4 + xx[:, 2] * 3

# model
x = tf.placeholder(tf.float32, shape=[None, 3])
y_ = tf.placeholder(tf.float32, shape=[None])
W1 = weight_variable([3, 1])
y = tf.matmul(x, W1)

# training and cost function
cost_function = tf.reduce_mean(tf.square(tf.squeeze(y) - y_))
train_function = tf.train.AdamOptimizer(1e-2).minimize(cost_function)

# create a session
sess = tf.Session()

# train
sess.run(tf.initialize_all_variables())
for i in range(10000):
    sess.run(train_function, feed_dict={x: xx, y_: yy})
    if i % 1000 == 0:
        print(sess.run(cost_function, feed_dict={x: xx, y_: yy}))
Support Vector Machine
Use Cases: Currently only Binary Classification
The general idea behind an SVM is that there is an optimal hyperplane separating linearly separable patterns. For data that is not linearly separable, we can use a kernel function to transform the original data into a new space. SVMs maximize the margin around the separating hyperplane. They work extremely well in high-dimensional spaces and are still effective when the number of dimensions is greater than the number of samples.
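In equation form (this is the standard linear SVM formulation, not anything specific to the TensorFlow estimator below): the classifier is f(x) = sign(w·x + b), and training chooses w and b to maximize the margin 2/||w|| subject to y_i(w·x_i + b) >= 1 for every training example (x_i, y_i) with labels y_i in {-1, +1}.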
import tensorflow as tf

def input_fn():
  return {
      'example_id': tf.constant(['1', '2', '3']),
      'price': tf.constant([[0.6], [0.8], [0.3]]),
      'sq_footage': tf.constant([[900.0], [700.0], [600.0]]),
      'country': tf.SparseTensor(
          values=['IT', 'US', 'GB'],
          indices=[[0, 0], [1, 3], [2, 1]],
          shape=[3, 5]),
      'weights': tf.constant([[3.0], [1.0], [1.0]])
  }, tf.constant([[1], [0], [1]])

price = tf.contrib.layers.real_valued_column('price')
sq_footage_bucket = tf.contrib.layers.bucketized_column(
    tf.contrib.layers.real_valued_column('sq_footage'),
    boundaries=[650.0, 800.0])
country = tf.contrib.layers.sparse_column_with_hash_bucket(
    'country', hash_bucket_size=5)
sq_footage_country = tf.contrib.layers.crossed_column(
    [sq_footage_bucket, country], hash_bucket_size=10)

svm_classifier = tf.contrib.learn.SVM(
    feature_columns=[price, sq_footage_bucket, country, sq_footage_country],
    example_id_column='example_id',
    weight_column_name='weights',
    l1_regularization=0.1,
    l2_regularization=1.0)

svm_classifier.fit(input_fn=input_fn, steps=30)
accuracy = svm_classifier.evaluate(input_fn=input_fn, steps=1)['accuracy']
Deep and Wide Models
Use Cases: Recommendation Systems, Classification and Regression
Deep and Wide models were covered in greater detail in part two, so we won't get too heavy here. A Wide and Deep network combines a linear model with a feed forward neural net so that our predictions get both memorization and generalization. This type of model can be used for classification and regression problems, and it allows for less feature engineering while still giving relatively accurate predictions: the best of both worlds. Here's a code snippet from part two's GitHub repo.
def input_fn(df, train=False):
  """Input builder function."""
  # Creates a dictionary mapping each continuous feature column name (k)
  # to the values of that column stored in a constant Tensor.
  continuous_cols = {k: tf.constant(df[k].values) for k in CONTINUOUS_COLUMNS}
  # Creates a dictionary mapping each categorical feature column name (k)
  # to the values of that column stored in a tf.SparseTensor.
  categorical_cols = {
      k: tf.SparseTensor(
          indices=[[i, 0] for i in range(df[k].size)],
          values=df[k].values,
          shape=[df[k].size, 1])
      for k in CATEGORICAL_COLUMNS}
  # Merges the two dictionaries into one.
  feature_cols = dict(continuous_cols)
  feature_cols.update(categorical_cols)
  if train:
    label = tf.constant(df[SURVIVED_COLUMN].values)
    # Returns the feature columns and the label.
    return feature_cols, label
  else:
    # Returns the feature columns only.
    return feature_cols

m = build_estimator(model_dir)
m.fit(input_fn=lambda: input_fn(df_train, True), steps=200)
print(m.predict(input_fn=lambda: input_fn(df_test)))
results = m.evaluate(input_fn=lambda: input_fn(df_train, True), steps=1)
for key in sorted(results):
  print("%s: %s" % (key, results[key]))
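The snippet above assumes a build_estimator helper from part two's repo. Here is a minimal sketch of what such a helper might look like, using tf.contrib.learn.DNNLinearCombinedClassifier; the feature columns are illustrative, not the exact columns from part two:

def build_estimator(model_dir):
  # Wide side: sparse/categorical columns handled by the linear model
  # (column names here are illustrative).
  sex = tf.contrib.layers.sparse_column_with_keys(
      column_name="sex", keys=["female", "male"])
  # Deep side: continuous columns plus embeddings of the sparse columns.
  age = tf.contrib.layers.real_valued_column("age")
  wide_columns = [sex]
  deep_columns = [tf.contrib.layers.embedding_column(sex, dimension=8), age]
  return tf.contrib.learn.DNNLinearCombinedClassifier(
      model_dir=model_dir,
      linear_feature_columns=wide_columns,
      dnn_feature_columns=deep_columns,
      dnn_hidden_units=[100, 50])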
Random Forest
Use Cases: Classification and Regression
The Random Forest model trains many different classification trees; to classify a new example, each tree votes for a class and the forest chooses the classification with the most votes.
Random Forests are resistant to overfitting: you can run as many trees as you want and training stays relatively fast. Give it a try on the Iris data with the snippet below:
import numpy as np
import tensorflow as tf

hparams = tf.contrib.tensor_forest.python.tensor_forest.ForestHParams(
    num_trees=3, max_nodes=1000, num_classes=3, num_features=4)
classifier = tf.contrib.learn.TensorForestEstimator(hparams)

iris = tf.contrib.learn.datasets.load_iris()
data = iris.data.astype(np.float32)
target = iris.target.astype(np.float32)

monitors = [tf.contrib.learn.TensorForestLossMonitor(10, 10)]
classifier.fit(x=data, y=target, steps=100, monitors=monitors)
classifier.evaluate(x=data, y=target, steps=10)
Bayesian Reinforcement Learning
Use Cases: Classification and Regression
In the contrib folder of TensorFlow there is a library called BayesFlow. BayesFlow has no documentation except for an example of the REINFORCE algorithm, which was proposed in a paper by Ronald Williams.
REward Increment = Nonnegative Factor * Offset Reinforcement * Characteristic Eligibility
The network, which tries to solve an immediate reinforcement learning task, adjusts its weights after receiving the reinforcement value at each trial. At the end of each trial, every weight is incremented by a learning rate factor, multiplied by the reinforcement value minus a baseline, multiplied by the characteristic eligibility. Williams' paper also discusses the use of backpropagation to train REINFORCE networks.
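Written out as an update rule (a paraphrase of Williams' formulation, using his usual notation):

delta_w_ij = alpha_ij * (r - b_ij) * e_ij

where alpha_ij is the learning rate factor, r is the received reinforcement value, b_ij is the baseline, and e_ij is the characteristic eligibility of the weight.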
"""Build the Split-Apply-Merge Model.Route each value of input [-1, -1, 1, 1] through one of thefunctions, plus_1, minus_1. The decision for routing is made by4 Bernoulli R.V.s whose parameters are determined by a neural networkapplied to the input. REINFORCE is used to update the NN parameters.Returns:The 3-tuple (route_selection, routing_loss, final_loss), where:- route_selection is an int 4-vector- routing_loss is a float 4-vector- final_loss is a float scalar."""inputs = tf.constant([[-1.0], [-1.0], [1.0], [1.0]])targets = tf.constant([[0.0], [0.0], [0.0], [0.0]])paths = [plus_1, minus_1]weights = tf.get_variable("w", [1, 2])bias = tf.get_variable("b", [1, 1])logits = tf.matmul(inputs, weights) + bias
# REINFORCE forward steproute_selection = st.StochasticTensor(distributions.Categorical, logits=logits)
Linear Chain Conditional Random Fields (CRF)
Use Cases: Sequential Data
CRFs are conditional probability distributions that factorize according to an undirected model. They predict a label for a single sample while taking context from neighboring samples into account. CRFs are similar to Hidden Markov Models. They are often used for image segmentation and object recognition, as well as shallow parsing, named entity recognition and gene finding.
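The snippet below is only the training loop; the graph it assumes (per-token unary_scores produced by some network, integer tag labels y, the true sequence_lengths and a train_op) is not shown. A minimal sketch of how that part is typically wired up with tf.contrib.crf, under those assumptions:

# unary_scores: [batch, max_seq_len, num_tags] scores from any network (assumed).
# y: [batch, max_seq_len] integer tag labels; sequence_lengths: unpadded lengths.
# Both may be numpy arrays or constant tensors.
log_likelihood, transition_params = tf.contrib.crf.crf_log_likelihood(
    unary_scores, y, sequence_lengths)
loss = tf.reduce_mean(-log_likelihood)
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)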
# Train for a fixed number of iterations.
session.run(tf.initialize_all_variables())
for i in range(1000):
    tf_unary_scores, tf_transition_params, _ = session.run(
        [unary_scores, transition_params, train_op])
    if i % 100 == 0:
        correct_labels = 0
        total_labels = 0
        for tf_unary_scores_, y_, sequence_length_ in zip(
                tf_unary_scores, y, sequence_lengths):
            # Remove padding from the scores and tag sequence.
            tf_unary_scores_ = tf_unary_scores_[:sequence_length_]
            y_ = y_[:sequence_length_]

            # Compute the highest scoring sequence.
            viterbi_sequence, _ = tf.contrib.crf.viterbi_decode(
                tf_unary_scores_, tf_transition_params)

            # Evaluate word-level accuracy.
            correct_labels += np.sum(np.equal(viterbi_sequence, y_))
            total_labels += sequence_length_
        accuracy = 100.0 * correct_labels / float(total_labels)
        print("Accuracy: %.2f%%" % accuracy)
Ever since TensorFlow was released, the community surrounding the project has been adding more packages, examples and use cases for this amazing library. Even at the time of writing this article, more models and sample code are being written. It is amazing to see how much TensorFlow has grown in these past few months. The ease of use and diversity of the package are increasing over time and don't seem to be slowing down anytime soon.
As always — Feel free to email me any questions or inquiries at [email protected]
Originally posted at Camron.xyz