[Tutorial] Build a Gender Classifier for Live Webcam Stream using Tensorflow and OpenCV

Training a Neural Network from scratch suffers from four main problems. First, a very large, classified input dataset is needed so that the Neural Network can learn the different features it needs for the classification. Training a model from scratch requires thousands of classified, high-quality training samples. Often, such data has to be classified by hand first, so that the Artificial Intelligence can learn from the handmade classification. Second, designing a Neural Network that fits the needs is hard and complicated, since the network acts as a black box. Highly specialized knowledge is needed in order to design such a network correctly. Third, training a Neural Network needs a lot of time and resources. Even on a modern GPU and with an efficient network design, training can easily take days if not weeks. Finally, in order to create a highly efficient and reliable network, one has to tweak the network parameters again and again, which always leads to a (complete) retraining of the network, implying yet more consumption of time and resources.

This repo of mine is a demonstration of how we can train a neural network to live-classify male and female faces using TensorFlow and only 50 input images, some unclassified datasets from the internet, 2 days of time and no time investment other than writing code and waiting for the training to finish, meaning no hand-classifying of images.

We are going to use Haar cascades and OpenCV to detect faces in a live webcam input stream. Then, we will retrain an Inception v3 Artificial Neural Network to classify male and female faces. As training data, we are going to scrape some images from Bing Images search. Afterwards, we will use this slow Inception v3 model to automatically classify a big dataset of about 15,000 face images, which we will then use to train a much faster Neural Network that improves the performance of the live classifier significantly. For this project, we will mainly rely on TensorFlow and OpenCV.
All relevant libraries and online resources are linked and credited in the appendix. Let’s get started!

1.) Camera input stream

First of all, we need to feed the input stream of our webcam to our Python script. After installing OpenCV via pip install opencv-python, connecting to a webcam and displaying the live image feed is done in just a few lines of code.

    import cv2

    # Initialize the camera (use bigger indices if you use multiple cameras)
    cap = cv2.VideoCapture(0)
    # Set the video resolution to half of the possible max resolution for better performance
    cap.set(3, 1920 / 2)
    cap.set(4, 1080 / 2)

    while True:
        # Read frame from camera stream and convert it to greyscale
        ret, img = cap.read()
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

        # Show image in cv2 window
        cv2.imshow("image", img)
        # Break if input key equals "ESC"
        k = cv2.waitKey(30) & 0xff
        if k == 27:
            break

    # Free the camera and close the window again
    cap.release()
    cv2.destroyAllWindows()

When we run our Python script, a little window will pop up containing the live video stream from our webcam. In the “while True” loop, we have access to the current frame, stored either in color in the “img” variable or as greyscale in the “gray” variable. We will use the greyscale version later on when we detect faces. cv2 frames are ordinary multidimensional NumPy arrays, so we can manipulate them directly by either applying mathematical functions or using one of the many built-in OpenCV methods.

2.) Cascade Face Detection

Now that we have our input stream set up, we want to do some face detection. While we could train an ANN for that purpose as well, we will rely on Haar cascades instead. This has two main reasons: First, I want to apply Machine Learning only to problems that are very hard to solve using classical algorithms. Face detection using Haar cascades is very efficient and quite accurate, so I will rely on existing algorithms here and save the Neural Network for the new problem.
Second, the goal of this repo is to not manually create and classify datasets, since this is quite boring and unchallenging work. Instead, we will use algorithms that do the work for us wherever possible, since I prefer coding an algorithm for 10 hours over manually cutting faces out of images for 5 hours just to train a Machine Learning model.

Haar cascade algorithms are a form of machine learning as well, but they do not use Neural Networks. The underlying principle of Haar cascades was proposed by Paul Viola and Michael Jones in their very interesting paper “Rapid Object Detection using a Boosted Cascade of Simple Features”. It is easy to understand yet ingeniously efficient and smart.

The Haar cascade algorithm tries to identify simple features in grayscale images. Such a simple feature can be an edge, a line or a rectangle. An edge is described as a sharp change in contrast from bright to dark in either horizontal or vertical direction. So whenever the Haar cascade algorithm sees a square of pixels which follows that property, it marks them as an edge feature. The whole subset of the image is scanned for edges, lines and rectangles.

Figure 1: Haar cascade features. Copyright: OpenCV.org

The Haar cascade algorithm chooses a first subset of the image with a certain size and identifies all these simple features. For a face-detecting algorithm, Haar cascade identifies over 200 features. One such feature may be a line between the eyebrow and the eye itself, another one a sharp contrast around the iris. Finally, it compares the features it identified with a model which contains such feature descriptions of faces. When the identified features match the model's description of the optimal features to a certain degree, the Haar cascade algorithm marks the subset it is currently scanning as a “face”.

Figure 2: Face detection.
Copyright: OpenCV.org

The subareas are chosen in different sizes and at all possible positions, to match faces at any scale. Note that Haar cascades are by no means limited to face detection, even though it might be the biggest field they are applied to. There are cascades for many different objects like eyes, number plates, chairs, animals and many more. All of them are based on the principle of identifying simple features and combining them to match high-level objects.

Demonstration of Haar cascade face detection: https://youtu.be/hPCTwxF0qf4 Copyright: Adam Harvey

The Haar cascade we will use for our purpose is the frontal face default cascade, which you can download directly from the OpenCV GitHub repository. Note that even though the cascades are free to use, they fall under the license of the Intel Corporation.

First, we write a small method that downloads the Haar cascade for face detection from the GitHub repository. The Python code for that is quite simple:

    import os
    import requests

    # Downloading haarcascade feature set from github
    def __downloadCascade():
        print("Downloading haarcascade for face detection")
        url = "https://github.com/opencv/opencv/raw/master/data/haarcascades/haarcascade_frontalface_default.xml"
        # Same folder that the detection code loads the cascade from
        folder = "./face_cascade/"
        local_filename = folder + url.split('/')[-1]
        # Create the target folder if it does not exist on the user's disk yet
        if not os.path.exists(folder):
            os.makedirs(folder)
        # Stream download dataset to local disk
        r = requests.get(url, stream=True)
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=1024):
                if chunk:
                    f.write(chunk)

We use os to create a folder called “face_cascade” in our working directory, then we use requests to write the file stream to our local disk. Next, we use this downloaded cascade classifier to detect a face in the webcam frames.
    def liveFaceDetection(self):
        # Initialize the cascade classifier for detecting faces
        face_cascade = cv2.CascadeClassifier("./face_cascade/haarcascade_frontalface_default.xml")

        # Initialize the camera (use bigger indices if you use multiple cameras)
        cap = cv2.VideoCapture(0)
        # Set the video resolution to half of the possible max resolution for better performance
        cap.set(3, 1920 / 2)
        cap.set(4, 1080 / 2)

        # Number of frames since a face was last detected (starts high: no face seen yet)
        exceptional_frames = 100
        while True:
            print(exceptional_frames)
            # Read frame from camera stream and convert it to greyscale
            ret, img = cap.read()
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

            # Detect faces using cascade face detection
            faces = face_cascade.detectMultiScale(gray, 1.3, 5)

            # Loop through detected faces and set new face rectangle positions
            for (x, y, w, h) in faces:
                color = (0, 255, 0)
                startpoint = (x, y)
                endpoint = (x + w, y + h)
                exceptional_frames = 0
                # Draw face rectangle on image frame
                cv2.rectangle(img, startpoint, endpoint, color, 2)

            # Show image in cv2 window
            cv2.imshow("image", img)
            # Break if input key equals "ESC"
            k = cv2.waitKey(30) & 0xff
            if k == 27:
                break
            exceptional_frames += 1

Figure 4: Demonstration of Haar cascade face detection. Copyright: SRF

We use OpenCV again to import the classification model. After converting the input image to grayscale, we detect all faces at multiple scales and get the coordinates (x and y coordinates, width and height) of each detected face, which we then draw on top of the frame. Each time we detect a face, we reset the timer exceptional_frames, which keeps track of how many frames have passed since we last identified a face. If we have not identified a face for over 15 frames, which corresponds to roughly half a second, we reset the rectangle we have drawn around the face.
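The frame counter described above can be looked at in isolation. The following is a minimal sketch, not the code used in this repo: the class name FaceTracker and its parameters are made up for illustration, and detections are plain (x, y, w, h) tuples so that it runs without a webcam.

```python
class FaceTracker:
    """Keeps the last face rectangle alive for a few frames to avoid flicker."""

    def __init__(self, max_missed_frames=15):
        self.max_missed_frames = max_missed_frames
        # Start above the limit: no face has been seen yet
        self.missed_frames = max_missed_frames + 1
        self.rectangle = None

    def update(self, detections):
        """Feed the detections of one frame, get back the rectangle to draw (or None)."""
        if detections:
            # Face found: remember it and reset the counter
            self.rectangle = detections[0]
            self.missed_frames = 0
        else:
            # Face missing: keep the old rectangle until the limit is exceeded
            self.missed_frames += 1
            if self.missed_frames > self.max_missed_frames:
                self.rectangle = None
        return self.rectangle


# One frame with a face, followed by three frames without one
tracker = FaceTracker(max_missed_frames=2)
frames = [[(10, 20, 50, 50)], [], [], []]
drawn = [tracker.update(detections) for detections in frames]
print(drawn)
```

With a limit of 2, the rectangle survives two empty frames and disappears on the third one.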
This delay is there to prevent flickering: a blink of the eyes is sometimes enough for the cascade classifier to lose the face. We make use of the fact that most of the time faces won’t just disappear, so even though we might have lost a face, we keep the rectangle in place for half a second. When we find a face again within this time, we adjust the rectangle position accordingly. If we do not find the face again, we remove the rectangle by setting its position to be just one pixel at the top left corner of the image.

3.) Searching and Downloading Images

The next step is to download images that show male and female faces. We will use these images later to train our Inception v3 model. Of course we could just perform a Google Images search manually and then download the first X images that seem to fulfill our needs, but why would we do anything manually that we could also code in Python? Let’s write a short script that performs a Bing Images search and downloads the first X images for us. (Yep, Bing. I could not manage to use Google Images: neither search engine actually wants us to scrape its search results, but Google is too smart for me. I could not figure out how to trick them, while it only took me 2 minutes for Bing.)

Since this task is quite decoupled from our actual ANN training, I decided to write a quick library for the Bing Image Search part and upload it to PyPI. You can find the repository here and the PyPI page here.

Let me very quickly explain how BingImages works. When you monitor your network activity while performing a Bing Images search, you may notice that it makes an asynchronous call to a REST API, which then responds with an HTML page containing a list of image links. By simply faking the REST call and matching the response pattern, I was able to extract the links and download the images to the local disk.
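To make the idea of matching the response pattern more concrete, here is a small sketch of pulling image links out of an HTML response with a regular expression. The HTML fragment, the pattern and the function name extract_image_links are assumptions for illustration; the real Bing response and the pattern used inside BingImages look different.

```python
import re

# Hypothetical HTML fragment standing in for a search response
SAMPLE_RESPONSE = '''
<div class="item"><a href="https://example.com/faces/one.jpg">one</a></div>
<div class="item"><a href="https://example.com/faces/two.jpeg">two</a></div>
<div class="item"><a href="https://example.com/about.html">about</a></div>
'''

# Assumed pattern: any double-quoted http(s) URL that ends in .jpg or .jpeg
IMAGE_LINK_PATTERN = re.compile(r'"(https?://[^"]+?\.jpe?g)"', re.IGNORECASE)


def extract_image_links(html, count):
    """Return up to `count` distinct image links in order of appearance."""
    links = []
    for link in IMAGE_LINK_PATTERN.findall(html):
        if len(links) >= count:
            break
        if link not in links:
            links.append(link)
    return links


print(extract_image_links(SAMPLE_RESPONSE, count=30))
```

The count parameter plays the same role as the count argument passed to BingImages: it caps how many of the matched links are kept.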
With BingImages, all this is done with one line of code:

    from BingImages import BingImages

    BingImages.download(BingImages("Male Face", count=30, person="portrait").get(), "./Males")

Figure 5: Demonstrating the background of BingImages.

Now that we can easily download images, we write a new method which downloads and renames the images for us.

    # Downloads resource images with the specified terms
    def __resourceDownloader(self, count=300):
        for person in self.__persons:
            print("--Downloading Resources for '{}'".format(person))
            folder = "./tmp/downloaded_images/" + person
            # Fetch links from Bing Images and download images
            BingImages.download(BingImages(person + " face", count=count, person="portrait").get(), folder)

            counter = 0
            # Rename the files accordingly
            for filename in os.listdir(folder):
                filepath = folder + "/" + filename
                _, file_extension = os.path.splitext(filepath)

                # Remove files that are of wrong type or too small
                if file_extension.lower() not in [".jpg", ".jpeg"] or os.path.getsize(filepath) < 1024 * 128:
                    # File is not a jpeg or is smaller than 128 KB
                    os.remove(filepath)
                    continue

                tries = 0
                # Rename all other files with fitting schema "image_X.jpg"
                while tries < 1000000:
                    try:
                        os.rename(filepath, folder + "/" + "image_" + str(counter) + ".jpg")
                        break
                    # Catch error that a file may exist already with a certain name
                    except FileExistsError:
                        tries += 1
                        counter += 1
                counter += 1

After we have downloaded the images, we filter out all images with the wrong file extension or too low quality (file size too small). Then, we rename the images to follow the naming pattern image_X.jpg.

4.) Prepare Images for Training

Now that we have some images for the classification, we need to make sure the Neural Network is trained on exactly the kind of data that it will later have to recognize. That is: the face of a human, and only the face. The images we downloaded mostly show the head, the hair and often even the clothing of a person, but we do not want that.
We only want it to train on the face, otherwise we run the risk of training the network to recognize long hair or colorful clothes. Since we later cut out the face in the webcam frame and classify it using our retrained model, we want to train our model with only the part of the face that the cascade classifier will see. Therefore, let’s write a method that takes our downloaded dataset and cuts out all the faces, which are then sorted into a new folder that we use to train our network.

First, we write a method that takes an input image and returns the images of all faces in that particular image.

    # Detects face in cv2 image via haarcascade
    def __faceDetector(self, img):
        face_cascade = cv2.CascadeClassifier("./face_cascade/haarcascade_frontalface_default.xml")
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.3, 5)
        faceCuts = []
        for (x, y, w, h) in faces:
            # Faces are recognized with x-y (top-left point) and width-height
            faceCuts.append(img[y:y + h, x:x + w])
        # Return images (numpy arrays) of detected faces
        return faceCuts

This is very similar to what we have already done in the webcam input stream to detect faces, with the only difference that we now add the faces to a face-cut list which we return at the end.
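The slicing expression img[y:y + h, x:x + w] does the actual cutting in this method. Here is a small sketch of it on a synthetic array instead of a real photo; the helper name crop_faces and the box values are made up for illustration.

```python
import numpy as np


def crop_faces(img, boxes):
    """Return one sub-image per (x, y, w, h) box; rows are indexed by y, columns by x."""
    return [img[y:y + h, x:x + w] for (x, y, w, h) in boxes]


# Synthetic "image": 6 pixels high, 8 pixels wide, 3 colour channels
image = np.arange(6 * 8 * 3, dtype=np.uint8).reshape(6, 8, 3)

# One box with top-left corner (x=2, y=1), width 4 and height 3
cuts = crop_faces(image, [(2, 1, 4, 3)])
print(cuts[0].shape)  # (3, 4, 3): height, width, channels
```

Note the order: the first index selects rows (y), the second one columns (x), which is the opposite of how the cascade classifier reports its rectangles.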
    # Cut faces out of the downloaded images
    def __cutFaces(self):
        print("--Cutting out faces")
        i = 0
        # Loop through folders, cut out face and save face into training_images directory
        for person in self.__persons:
            folder = "./tmp/downloaded_images/" + person
            facefolder = "./training_images/" + person
            if not os.path.exists(facefolder):
                os.makedirs(facefolder)
            for file in os.listdir(folder):
                image = cv2.imread(folder + "/" + file)
                # Detecting Faces
                faces = self.__faceDetector(image)
                for face in faces:
                    # Saving the image of the face to the disk
                    cv2.imwrite(facefolder + "/face_{}.jpg".format(i), face)
                    i += 1

Now we simply loop over the images, detect the faces via our previously written method and save them to disk in a new folder. The training data is not perfect, but let’s see whether it gets the job done.

Figure 6: The face training data for Males & Females. Copyright: Various, but certainly not me.

5.) Retraining Inception v3 Model

When we think about training a Neural Network, we have datasets with millions of samples in mind and models that we have to train from scratch to best fit our needs. The downside is that such training needs thousands of input samples, days of training and weeks of tweaking before it produces good-quality output. On the other hand, there are pretrained networks like Inception v3, trained on the ImageNet dataset: models built, tweaked and trained by research teams at Google with vast amounts of resources and training data. Such models are able to precisely classify images from thousands of classes, and they were tweaked by the best Artificial Intelligence experts to produce the best possible results.

Figure 7: Inception V3 Model by Google Brain. Copyright: Medium

What if we could take such an advanced model and retrain it to classify our classes for us? Well, this is exactly what transfer learning is all about. As you know, Neural Networks are structured into layers of neurons. While the first layers identify abstract information, later layers are capable of recognizing higher-level features. If you are not familiar with Neural Networks and layers, I highly recommend you check out my other repository, “Introduction to Deep Dreaming”.

When we retrain an existing model, we remove the last few layers, the ones capable of recognizing the highest-level features, and retrain those layers with our own input data. We can recycle the knowledge of most of the layers that recognize abstract features, which we would need in most cases anyway to train our model accurately. This allows us to train with very little training data, since the Neural Network only has to recalibrate the last few neurons, that is, learn how to identify a male or female face from the abstract features it has already learned.

There is an amazing paper written by Maxime Oquab et al. called “Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks”. It explains transfer learning very much on point, which is why I am going to cite the corresponding paragraph here:

“The CNN architecture contains more than 60 million parameters. Directly learning so many parameters from only a few thousand training images is problematic. The key idea of this work is that the internal layers of the CNN can act as a generic extractor of mid-level image representation, which can be pre-trained on one dataset (the source task, here InceptionV3) and then re-used on other target tasks (here Face classification), as illustrated in Figure 8. However, this is difficult as the labels and the distribution of images (type of objects, typical viewpoints, imaging conditions, etc.) in the source and target datasets can be very different.
To address these challenges we (i) design an architecture that explicitly remaps the class labels between the source and target tasks, and (ii) develop training and test procedures, inspired by sliding window detectors, that explicitly deal with different distributions of object sizes, locations and scene clutter in source and target tasks.”

Source: Maxime Oquab et al.: “Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks”

Figure 8: Retraining a Neural Network. Copyright: “Learning and Transferring Mid-Level Image Representations using Convolutional Neural Networks”

Let’s write a method that calls the retrain.py script on our training image folder to train a new TensorFlow model.

    os.system("retrain.py " +
              "--bottleneck_dir=tf/training_data/bottlenecks " +
              "--model_dir=tf/training_data/inception " +
              "--summaries_dir=tf/training_data/summaries/basic " +
              "--output_graph=tf/training_output/retrained_graph.pb " +
              "--output_labels=tf/training_output/retrained_labels.txt " +
              "--image_dir=training_images " +
              "--how_many_training_steps=4000")

The script can take thirty minutes or more to complete, depending on the speed of your machine. The first phase analyzes all the images on disk and calculates the bottleneck values for each of them. “Bottleneck” refers to the layer just before the final output layer that actually does the classification. Once the bottleneck calculations are complete, the actual training of the top layer of the network begins. You’ll see a series of step outputs, each one showing training accuracy, validation accuracy, and the cross entropy. The training accuracy shows what percentage of the images used in the current training batch were labeled with the correct class. The validation accuracy is the accuracy on a randomly selected group of images from a different set.
The key difference is that the training accuracy is based on images that the network has been able to learn from, so the network can overfit to the noise in the training data. A true measure of the performance of the network is its performance on a dataset not contained in the training data, and this is what the validation accuracy measures. If the training accuracy is high but the validation accuracy remains low, the network is overfitting and memorizing particular features in the training images that aren’t helpful more generally. Cross entropy is a loss function which gives a glimpse into how well the learning process is progressing. The training’s objective is to make the loss as small as possible, so you can tell whether the learning is working by keeping an eye on whether the loss keeps trending downwards, ignoring the short-term noise.

6.) Applying Face Classifier on Live Camera Feed

Once training has finished, our Neural Network is retrained. Now we need to write a method that makes use of this trained network to actually classify images for us. For this purpose, we adapt a script offered by TensorFlow itself to classify an input image.
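One step of that script is worth a quick look on its own: turning the raw output scores into a ranked list of (label, score) tuples. This is a minimal sketch in plain Python with made-up labels and scores; the function name top_predictions is hypothetical, and the script itself uses NumPy's argsort for the same purpose.

```python
def top_predictions(labels, scores, k=5):
    """Return up to k (label, score) tuples, highest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], scores[i]) for i in order[:k]]


# Made-up output of a two-class model
ranked = top_predictions(["female", "male"], [0.2, 0.8])
print(ranked[0])  # ('male', 0.8)
```

With this format, ranked[0][0] is the best label and ranked[0][1] its score, which is how the prediction is read in the rest of this tutorial.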
    import numpy as np
    import tensorflow as tf

    # load_graph, read_tensor_from_image_file and load_labels are the helper
    # functions from TensorFlow's label_image.py example script.

    # @Param filename: Path to the file that you want to classify
    # @Param graph: Path to the retrained inception v3 Graph
    # @Param label: Path to the labels.txt file from the retraining process
    def classify(filename, graph, label):
        model_file = graph
        file_name = filename
        label_file = label
        input_height = 299
        input_width = 299
        input_mean = 0
        input_std = 255
        # Name of the final output tensor layer
        output_layer = "final_result"
        input_layer = "Placeholder"

        # Load graph and tensors
        graph = load_graph(model_file)
        t = read_tensor_from_image_file(
            file_name,
            input_height=input_height,
            input_width=input_width,
            input_mean=input_mean,
            input_std=input_std)

        input_name = "import/" + input_layer
        output_name = "import/" + output_layer
        input_operation = graph.get_operation_by_name(input_name)
        output_operation = graph.get_operation_by_name(output_name)

        # Open up a new tensorflow session (TensorFlow 1.x API) and run it on the input
        with tf.Session(graph=graph) as sess:
            results = sess.run(output_operation.outputs[0], {
                input_operation.outputs[0]: t
            })
        results = np.squeeze(results)

        # Sort the output predictions by score, best first
        top_k = results.argsort()[-5:][::-1]
        labels = load_labels(label_file)

        result = []
        for i in top_k:
            result.append((labels[i], results[i]))

        # Return sorted result tuples
        return result

Classifying a face is now as simple as saving it to disk and then calling the “classify” function.

    prediction = classify(filename, "./tf/training_output/retrained_graph.pb", "./tf/training_output/retrained_labels.txt")

Now let’s extend the live face detection method to not just detect a face, but also classify it.
    import threading

    def liveDetect(self):
        filename = "./tmp/face.jpg"

        # Inner function for thread to parallel process image classification according to trained model
        def classifyFace():
            print("Classifying Face")
            prediction = classify(filename, "./tf/training_output/retrained_graph.pb", "./tf/training_output/retrained_labels.txt")
            nonlocal text
            text = prediction[0][0]
            print("Finished classifying with text: " + text)

        # Initialize the cascade classifier for detecting faces
        face_cascade = cv2.CascadeClassifier("./face_cascade/haarcascade_frontalface_default.xml")

        # Initialize the camera (use bigger indices if you use multiple cameras)
        cap = cv2.VideoCapture(0)
        # Set the video resolution to half of the possible max resolution for better performance
        cap.set(3, 1920 / 2)
        cap.set(4, 1080 / 2)

        # Standard text that is displayed above recognized face
        text = "unknown face"
        exceptional_frames = 100
        startpoint = (0, 0)
        endpoint = (0, 0)
        color = (0, 0, 255)  # Red
        while True:
            print(exceptional_frames)
            # Read frame from camera stream and convert it to greyscale
            ret, img = cap.read()
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

            # Detect faces using cascade face detection
            faces = face_cascade.detectMultiScale(gray, 1.3, 5)

            # Loop through detected faces and set new face rectangle positions
            for (x, y, w, h) in faces:
                color = (0, 0, 255)
                if not text == "unknown face":
                    color = (0, 255, 0)
                startpoint = (x, y)
                endpoint = (x + w, y + h)
                face = (img[y:y + h, x:x + w])
                # Only reclassify if face was lost for at least half a second (15 Frames at 30 FPS)
                if exceptional_frames > 15:
                    # Save detected face and start thread to classify it using tensorflow model
                    cv2.imwrite(filename, face)
                    threading.Thread(target=classifyFace, daemon=True).start()
                exceptional_frames = 0

            # Face lost for too long, reset properties
            if exceptional_frames == 15:
                print("Exceeded exceptional frames limit")
                text = "unknown face"
                startpoint = (0, 0)
                endpoint = (1, 1)

            # Draw face rectangle and text on image frame
            cv2.rectangle(img, startpoint, endpoint, color, 2)
            textpos = (startpoint[0], startpoint[1] - 7)
            cv2.putText(img, text, textpos, 1, 1.5, color, 2)

            # Show image in cv2 window
            cv2.imshow("image", img)
            # Break if input key equals "ESC"
            k = cv2.waitKey(30) & 0xff
            if k == 27:
                break
            exceptional_frames += 1

Most of the code is the same as before, but we added a text label above the face rectangle in which we write the prediction of our neural network classifier. Whenever a face is lost and detected again after more than half a second, we start a new thread in which we classify the face. Then, we read out the best prediction and set the text above the face rectangle accordingly.

7.) Speeding Up our Classification

The only problem we have with this solution is that the TensorFlow model we use, called Inception v3, is quite slow. It takes roughly 2 seconds to classify a face, which is still quite fast, but not fast enough for live image classification.

We therefore need to find a way to speed up our network, and the best way to do this is to train a new Neural Network called MobileNet, also created by Google but made to be as fast as possible, with the help of our other, slower network.

You may ask yourself why we did not retrain the MobileNet network in the first place. The biggest problem of the MobileNet network is the number of input images we need for a good classification, since it has a smaller overhead compared to the Inception v3 model. This means we retrain a larger set of layers, making the network smaller and faster, but creating the need for more input images for an accurate classification. With the Bing images downloader, we could never reach such a high quantity of training images, since the quality of the search results starts to drop significantly after the first few hundred (or with Bing even the first few dozen) search results. What we CAN find online are large datasets of images that we can download for free: images of persons, faces and many other things.
The problem with these sets is that they are not classified for our needs: when we download a large set of face images, images of males and females are all mixed up. We can then either dig through them by hand and split males from females, or we do it the smart way and use our slow but reliable Inception v3 model for this task!

8.) Fetching Big Datasets from the Internet

To train the MobileNet network, we need a pretty large dataset, preferably a few thousand images per category. To achieve this, we let our script automatically download a few big datasets from the internet, which we will then classify. One of the biggest open source datasets I could find is the Labeled Faces in the Wild dataset. It contains roughly 13,000 greyscale images of faces of different celebrities, including Roger Federer, Michelle Obama and even Osama Bin Laden.

Another good collection of images is offered by the University of Stirling and can be downloaded here. Now let’s write a script again to download, unpack and sort all these images, 15,000+ in total.
    import shutil
    import tarfile
    import zipfile

    def __downloadLFW(self):
        print("Downloading LFW Face Dataset for face detection")
        # Links to all dataset archives
        urls = ["http://pics.psych.stir.ac.uk/zips/Aberdeen.zip",
                "http://pics.psych.stir.ac.uk/zips/Iranian.zip",
                "http://pics.psych.stir.ac.uk/zips/pain.zip",
                "http://pics.psych.stir.ac.uk/zips/utrecht.zip",
                "https://www.openu.ac.il/home/hassner/data/lfwa/lfwa.tar.gz"]
        folder = "./lfw_dataset/tmp/"

        # Download all datasets
        for url in urls:
            print("Start downloading {}".format(url))
            local_filename = folder + url.split('/')[-1]
            file_extention = local_filename[local_filename.rfind(".") + 1:]
            # Skip if archive already exists
            if not os.path.exists(local_filename):
                print("File extention: {}".format(file_extention))
                if not os.path.exists(folder):
                    os.makedirs(folder)
                # Open up download stream to file
                r = requests.get(url, stream=True)
                with open(local_filename, 'wb') as f:
                    for chunk in r.iter_content(chunk_size=1024):
                        if chunk:
                            f.write(chunk)
                print("Download complete. Unpacking now")
                # Unpack zips
                if file_extention == "zip":
                    print("Unzipping file {}".format(local_filename))
                    zippedFile = zipfile.ZipFile(local_filename, 'r')
                    zippedFile.extractall(folder)
                    zippedFile.close()
                # Unpack tars
                else:
                    print("Untarring file {}".format(local_filename))
                    tarredFile = tarfile.open(local_filename, "r:gz")
                    tarredFile.extractall(folder)
                    tarredFile.close()
            else:
                print("Dataset already exists. Skipping it.")

        print("Finished all downloads. Reordering data")
        i = 0
        data_folder = "./lfw_dataset/images/"
        # Refresh folder if already existent
        try:
            shutil.rmtree(data_folder)
        except FileNotFoundError:
            pass
        os.makedirs(data_folder)

        # Copy all files to new location with correct naming schema
        for root, dirs, files in os.walk("./lfw_dataset/tmp/"):
            for file in files:
                filename = os.path.join(root, file)
                _, file_extension = os.path.splitext(filename)
                if file_extension.lower() == ".jpg" and not os.path.exists(data_folder + "img_{}.jpg".format(i)):
                    print("Copy file: {} to img_{}.jpg".format(filename, i))
                    shutil.copyfile(filename, data_folder + "img_{}.jpg".format(i))
                    i += 1
        print("Done setting up the dataset")

After we have downloaded the archives, we unpack them and copy all files to a new location following the naming scheme img_X.jpg.

Figure 9: Sample from the LFW Dataset. Copyright: University of Massachusetts

9.) Classify Datasets using Inception v3 Model

The next step is to cut the faces out of the images, classify them and move them to yet another location.

    # Train a faster mobile neural network with the intelligence of the inception v3 model
    def fastenNetwork(self, newDataset=True):
        print("Fastening Network")
        lfw_folder = "./lfw_dataset/images"
        # Download datasets
        self.__downloadLFW()

        bigdata_folder = "training_images_bigdata"
        # Creates big data folder for saving faces if not existent
        if not os.path.exists(bigdata_folder):
            os.makedirs(bigdata_folder)

        i = 0
        models = {}
        for person in self.__persons:
            models[person.lower()] = 0

        # Train a new dataset if requested, use the existing images otherwise
        if newDataset:
            print("Classifying new images from LFW Dataset")
            for folder in self.__persons:
                if not os.path.exists(bigdata_folder + "/" + folder):
                    os.makedirs(bigdata_folder + "/" + folder)

            # Loop over all the files
            for file in os.listdir(lfw_folder):
                print("Processing {}".format(lfw_folder + "/" + file))
                image = cv2.imread(lfw_folder + "/" + file)
                # Detect faces
                faces = self.__faceDetector(image)
                for face in faces:
                    # Save face temporarily for the classifier
                    cv2.imwrite(bigdata_folder + "/tmpface.jpg", face)
                    # Classify face
                    predictions = classify(bigdata_folder + "/tmpface.jpg", "./tf/training_output/retrained_graph.pb", "./tf/training_output/retrained_labels.txt")
                    # Save image to the classified class if certainty is above 60%, skip image otherwise
                    if predictions[0][1] > .6:
                        cv2.imwrite(bigdata_folder + "/" + predictions[0][0] + "/img_{}.jpg".format(i), face)
                        i += 1
                        models[predictions[0][0]] += 1
                        print("Current prediction status: ", models)

We loop through all the images, detect the faces in them using the Haar cascade, then classify each face. If the certainty of the prediction is high enough (above 60%), we save the face to the folder of the predicted class. This process takes roughly 30 hours.

Figure 10: Cut out faces from the LFW Dataset. Copyright: University of Massachusetts

10.) Retraining Faster MobileNet Model

Now we just call the retrain.py script again, but this time with different parameters. We overwrite the old model, since our new one is going to be more precise and faster anyway.

    print("--Training model")
    os.system("retrain.py --bottleneck_dir=tf/training_data/bottlenecks " +
              "--model_dir=tf/training_data/inception " +
              "--summaries_dir=tf/training_data/summaries/basic " +
              "--output_graph=tf/training_output/retrained_graph.pb " +
              "--output_labels=tf/training_output/retrained_labels.txt " +
              "--image_dir=training_images_bigdata " +
              "--how_many_training_steps=4000 " +
              "--tfhub_module https://tfhub.dev/google/imagenet/mobilenet_v1_100_224/feature_vector/1 " +  # Which architecture to use
              "--validation_batch_size=-1 " +
              "--learning_rate=0.0001 " +
              "--random_brightness=30")

After roughly two days, the model has finished its training and is ready to classify.

11.) Final Live Tests with Faster MobileNet Model

Now let’s test the new MobileNet model with our live classification. In theory, the classification process should now be extremely fast and take under half a second on a CPU, and well under 1/30 of a second on a GPU. And indeed, classifying a face only takes a very short amount of time, and after a few frames the face is already classified. There are some difficulties the trained model faced.
First, the training data was quite asymmetric, with about 25% women and 75% men, which makes the model predict men with much higher accuracy than women. Further, the median age of the people shown in the training set was around 30 to 40 years, which makes the Neural Network look for features like wrinkles or uneven skin to identify men. This is why young men are often mistakenly identified as women: their facial properties are closer to those of middle-aged women than to those of middle-aged men.

Figure 11: Live classification of faces. Copyright: SRF

12.) Final Thoughts

With the help of TensorFlow, we managed to build a fast and reliable live classifier without having to manually create a dataset of input images at all. Our code could easily be reused and trained on other facial features as well: wrinkles, age, long/short hair, glasses and many more, as long as the training features are found in one of the datasets we automatically download. The algorithm starts with just two words and manages to do all the work on its own: from initially getting a few reliable base training images, to training the Inception v3 model, to downloading a bigger dataset and preparing it as a reliable input dataset for the MobileNet network, to finally retraining a faster MobileNet network and quickly classifying faces from a live input stream.

To further improve the algorithm, we could try to fine-tune the training parameters (iterations, training steps) or make the initial set of images more reliable. Without double-checking the initial training set, we run the risk of training on images that do not directly describe our features. This is due to the Bing image search engine: when searching for a “Female Face”, we get drawings of female faces very high up in the result feed, which makes us train the network on the wrong data.
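The automatic labelling from section 9 is the piece that would be reused for any other feature, so here it is once more as a stripped-down sketch. It assumes predictions in the (label, score) format returned by classify; the function name filter_confident, the sample file names and the scores are invented for illustration.

```python
from collections import Counter


def filter_confident(predictions, threshold=0.6):
    """Keep (name, label) pairs whose best score is above threshold and count them per label."""
    kept = []
    counts = Counter()
    for name, ranked in predictions.items():
        label, score = ranked[0]  # best prediction comes first
        if score > threshold:
            kept.append((name, label))
            counts[label] += 1
    return kept, counts


# Invented classifier output for three face crops
sample = {
    "img_0.jpg": [("male", 0.93), ("female", 0.07)],
    "img_1.jpg": [("female", 0.55), ("male", 0.45)],  # too uncertain, skipped
    "img_2.jpg": [("female", 0.81), ("male", 0.19)],
}
kept, counts = filter_confident(sample)
print(kept)
print(dict(counts))
```

Keeping an eye on the counts per label, as the loop in section 9 does with its status output, is a cheap way to notice an imbalance like the 25% to 75% split early.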