How to Label Data — Create ML for Object Detection

The new Create ML app, just announced at WWDC 2019, is an incredibly easy way to train your own personalized machine learning models. All that's required is dragging a folder containing your training data into the tool, and Create ML does the rest of the heavy lifting.

So how do we prepare our data? For image or sound classification we just need to organize the data into folders, but if we want to do object detection the task becomes a bit more complicated. With object detection, we need to specify some additional information: in addition to our images, we need an annotations.json with the coordinates of where the objects are. The annotations need to match the following format:

[
  {
    "image": "image1.jpg",
    "annotations": [
      {
        "label": "carrots",
        "coordinates": { "x": 120, "y": 164, "width": 230, "height": 119 }
      },
      {
        "label": "orange",
        "coordinates": { "x": 230, "y": 321, "width": 50, "height": 50 }
      }
    ]
  },
  ...
]

The x and y coordinates are the centers of the bounding rectangles, and all coordinates are in pixels.

Note: How do we generate this JSON? (I definitely don't want to do it all by hand.) Apple tells us: "You can download tools from the web to help you build these [annotations]."

So what tool should we use? Cloud Annotations! Cloud Annotations is a tool that I built exactly for this purpose. It lets us quickly draw boxes on our images and gives us an annotations.json in the format required by Apple.

Creating an object storage instance

To use Cloud Annotations we need to create a cloud object storage instance. Creating a Cloud Object Storage instance gives us a reliable place to keep our training data. It also opens up the potential for data collection and collaboration, letting us collect user data and allowing a team of specialists to easily label it.

IBM Cloud offers a lite tier of object storage, which includes 25 GB of storage for free.
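A quick aside before we set up storage: since the annotations format above is plain JSON, you can also generate or post-process it yourself with a few lines of Python. This sketch (the image name is a placeholder, and the to_center helper is my own, not part of any tool) also shows how to convert a top-left-corner box into the center-based coordinates Create ML expects:

```python
import json

# Create ML wants x/y to be the CENTER of each box, in pixels.
# If your labels use a top-left corner instead, convert first:
def to_center(left, top, width, height):
    return {"x": left + width / 2, "y": top + height / 2,
            "width": width, "height": height}

# Placeholder image name; the box matches the "carrots" example above.
annotations = [
    {
        "image": "image1.jpg",
        "annotations": [
            {"label": "carrots",
             "coordinates": to_center(left=5, top=104.5, width=230, height=119)},
        ],
    }
]

with open("annotations.json", "w") as f:
    json.dump(annotations, f, indent=2)
```

A corner at (5, 104.5) with a 230x119 box works out to the same center (120, 164) as the example in the format above.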
This lite tier is what we will be using throughout the tutorial.

To create an instance we first need to log in or sign up for IBM Cloud.

Once logged in you should find your IBM Cloud dashboard. This is where we can create and manage IBM Cloud resources. We want to create a new Cloud Object Storage instance, so click the Create resource button.

Locate and choose the Object Storage option.

Choose a pricing plan and click Create, then Confirm on the following popup.

Credentials

Once we have an object storage instance, we need a way to access our data from outside of IBM Cloud. To be able to do this, we need to create a set of credentials for our resource. We can do this by navigating to the Service credentials tab and clicking the New credential button.

You can leave most of the options blank, but make sure the credential has the role of Writer and add the following inline configuration parameter:

{
  "HMAC": true
}

Once added, click the View credentials ▾ dropdown and take note of your apikey, access_key_id, secret_access_key, and resource_instance_id.

{
  "apikey": "...",
  "cos_hmac_keys": {
    "access_key_id": "...",
    "secret_access_key": "..."
  },
  "endpoints": "...",
  "iam_apikey_description": "...",
  "iam_apikey_name": "...",
  "iam_role_crn": "...",
  "iam_serviceid_crn": "...",
  "resource_instance_id": "..."
}

Cloud Annotations

To use the tool, just navigate to the Cloud Annotations Tool and add your object storage credentials.

We will be storing our files and annotations in something called a bucket; we can create one by clicking Create bucket.

After we create and name our bucket, it will prompt us to choose an annotation type. We need to choose Localization. This allows us to draw bounding box rectangles on our images.

Training data best practices

The model we will be training is optimized for photographs of objects in the real world. It is unlikely to work well for x-rays, hand drawings, scanned documents, receipts, etc.
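One aside on the credential from earlier: if you ever script against the bucket, the four values you noted are buried in a larger JSON, and the two HMAC keys are nested under cos_hmac_keys. A throwaway snippet like this (all values here are fake placeholders standing in for the redacted "..." fields) shows where each one lives:

```python
# A stand-in for the service credential JSON shown earlier
# (real values are redacted in the article; these are fakes).
credential = {
    "apikey": "fake-apikey",
    "cos_hmac_keys": {
        "access_key_id": "fake-access-key",
        "secret_access_key": "fake-secret-key",
    },
    "resource_instance_id": "fake-resource-crn",
}

# The four values we took note of; the HMAC keys only exist
# because we added {"HMAC": true} when creating the credential.
apikey = credential["apikey"]
access_key_id = credential["cos_hmac_keys"]["access_key_id"]
secret_access_key = credential["cos_hmac_keys"]["secret_access_key"]
resource_instance_id = credential["resource_instance_id"]
```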
The training data should be as close as possible to the data on which predictions are to be made. For example, if your use case involves blurry and low-resolution images (such as from a security camera), your training data should be composed of blurry, low-resolution images. In general, you should also consider providing multiple angles, resolutions, and backgrounds for your training images.

The model we will be training can't generally predict labels that humans can't assign. So, if a human can't be trained to assign labels by looking at the image for 1–2 seconds, the model likely can't be trained to do it either.

We recommend at least 50 training images per label for a usable model, but using 100s or 1000s would provide better results.

The model we will be training resizes each image to 300x300 pixels, so keep that in mind when training the model with images where one dimension is much longer than the other.

Labeling the data

To label images:
1. Upload a video or many images
2. Create the desired labels
3. Start drawing bounding boxes

Getting the annotations

After we have collected and labeled our first round of images, we are ready to start training our model!

Installation

To access our annotations we need to install the Cloud Annotations CLI:

npm install -g cloud-annotations

Note: You'll need to have Node 10.13.0 or later installed. You can use nvm (macOS/Linux) or nvm-windows to easily switch Node versions between different projects.

Download the annotations

To download the annotations, all we need to do is run the following command:

cacli --create-ml export

Once finished, there should be a folder named exported_buckets with your bucket inside. All you need to do is drag this folder into the Create ML app and you're good to go!

Thanks for reading! If you have any questions, feel free to reach out at bourdakos1@gmail.com, connect with me on LinkedIn, or follow me on Medium and Twitter.

If you found this article helpful, it would mean a lot if you gave it some applause 👏 and shared it to help others find it! And feel free to leave a comment below.