Object Recognition With SPOT from Boston Dynamics by @danielcrowe


Daniel Crowe

Chicago > NYC > London @ Vaticle

Earlier this month, we were joined by Joris Sijs, Lead Scientist at TNO on the second series of Grakn Orbit.

TNO is the national research association in the Netherlands, where Joris and his team look to combine robotics and artificial intelligence.

They first started working with Grakn about two years ago. Up to that point, the team had not found a database for robotics that could accurately represent the real world. This is essential for autonomous systems that need to perform tasks and make decisions within a real-world environment. Too often, robotics projects are run in curated environments; TNO wanted to get as close to real-world scenarios as possible.

Search and Rescue — Project SNOW

The project that TNO is working on is called SNOW, focusing on autonomous systems. The team defines autonomous systems as the combination of robotics with AI.

In practice, this means that the robot is given more autonomy, with less human intervention from a user point of view. This then enables the robot to operate in a more complex environment.

Project SNOW objective. Slides used with permission

Traditionally in robotics, you would have remotely controlled systems. However, with SNOW, TNO is trying to reduce human intervention as much as possible, especially when the complexity of the environment increases.

The team working on the SNOW project comprises 15 scientists, focused primarily on three challenges:

  1. Situational Awareness: does the robot understand where it is operating?
  2. Self Awareness: what can the robot do at this moment, given the circumstances?
  3. Situational Planning: based on what the robot knows, how should it complete a task?

The use case that TNO chose for the project is meant to be semi-realistic. Normally, their customers would ask for only a narrow Artificial Intelligence task to explore. However, by creating their own case, they’re able to test the full set of technical situations and integrate all technical solutions.

What is the environment and situation SPOT will be operating within?


TNO makes use of an on-premise villa, made up of four rooms, a hall, and a porch. There are four victims and a fireman on site. SPOT is sent in to get accurate information on the whereabouts of the victims, and any additional information on the situation useful for search and rescue.

The robot — SPOT from Boston Dynamics — is then sent into the villa to retrieve or localise the victims.

SPOT is then tasked with a set of objectives:

  • locate family members
  • medically assess them

For example, the robot might report that the daughter is in the hallway going to the kitchen, and that she is responsive. The action, then, would be to continue to search for other victims.

Some time later, the robot reports that it found the father in the living room, next to a chair, and that he was not responsive. In this instance, the action is to stay put and make noise for the rescue team.
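The two reports above boil down to a simple rule: a responsive victim means keep searching, an unresponsive one means stay and signal. Here is a minimal Python sketch of that rule. The `Sighting` class, its fields and the action names are hypothetical and only illustrate the idea; they are not taken from the SNOW code base.

```python
from dataclasses import dataclass


@dataclass
class Sighting:
    """One report from the robot about a located victim (hypothetical structure)."""
    who: str
    room: str
    responsive: bool


def next_action(sighting: Sighting) -> str:
    """Pick the follow-up action described in the article for a single sighting."""
    if sighting.responsive:
        # A responsive victim can wait: keep looking for the others.
        return "continue_search"
    # An unresponsive victim needs help: stay put and attract the rescue team.
    return "stay_and_make_noise"


daughter = Sighting(who="daughter", room="hallway", responsive=True)
father = Sighting(who="father", room="living room", responsive=False)

print(next_action(daughter))
print(next_action(father))
```

A real planner would weigh more variables, such as danger in the room or the condition of other victims, as the challenges listed next make clear.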

There are some challenges that SPOT faces when operating in a real world environment:

  • SPOT shouldn’t get confused by a toy dog: it should differentiate a real dog from a toy
  • SPOT should assess when the situation is too dangerous, based on conditions in the room and on how far its search and locate capabilities are diminished
  • SPOT should be able to make a trade-off of whom to rescue based on the victims’ condition or other variables such as location or proximity to danger

SNOW’s Robotics System

How does SPOT observe and collect data — what’s the hardware used in SNOW?

PTZ camera — pan tilt zoom camera mounted on top of SPOT

Speech array and speaker for speech interaction

These components are mounted on top of a mini PC. This mini PC has a lot of computing power, as most of the computations are done on the device itself to ensure as much autonomy for the robot as possible.

What Does a Typical Robotics System Look Like at TNO?


E.g. TNO robotics system. Slides used with permission

First, an image is captured by the PTZ camera. It is then passed to an image recognition module, whose output is used for room characterisation so that the robot can localise itself within the room.

The association module takes all the detections from the camera and associates them so that they can create better tracks instead of individual detections.

Grakn is used as the database, to orchestrate all the data and knowledge in their system.

Traditionally in robotics, ROS and Python are used to enable communication between the modules in the system. If you take one of your modules or software components and create it in Python, ROS then adds another layer to the Python code for communication with the rest of the system.

Because of this, they need to create a dedicated ROS client for Grakn as well, along with using the native Python client.

Joris notes that their database, while relatively small, is extremely dynamic in terms of handling administrative burden and adding new knowledge into the database.

Grakn ROS Client

What is ROS? What role does ROS play in the process?


ROS (Robot Operating System) provides a publish / subscribe mechanism. A setup consists of a camera, image recognition, an association layer, a planner and a controller. ROS facilitates the merging of that input data into your database. Let’s look at this example where a dog is identified:

  • An image is published by the PTZ camera
  • The Grakn client subscribes to the image in order to feed it back to the database
  • The image is then passed to the recognition module
  • A dog is detected
  • The dog is then published on the ROS bus
  • The Grakn client again subscribes to this output (dog) in order to send and write it back to the database
  • The same happens with the association module where the speed and position of the dog are written into the database
  • Then the planner might request some piece of information
  • The Grakn client will also subscribe to these requests coming in from the planner (object velocity) and vice-versa
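The flow in the list above can be sketched with a tiny in-memory publish / subscribe bus. This is a plain Python stand-in, not the ROS API and not TNO's client: the `Bus` class, the topic names, the message fields and the fixed position and speed values are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List, Tuple


class Bus:
    """Tiny in-memory stand-in for a publish/subscribe bus (not the ROS API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        # Deliver the message to every callback registered for this topic.
        for callback in list(self._subscribers[topic]):
            callback(message)


bus = Bus()
database: List[Tuple[str, Any]] = []  # stand-in for the knowledge base

# The database client subscribes to every data topic and records what it sees.
for topic in ("image", "detection", "track"):
    bus.subscribe(topic, lambda msg, t=topic: database.append((t, msg)))


def recognise(image: dict) -> None:
    """Recognition module: turns an image into a detection."""
    bus.publish("detection", {"label": image["contains"]})


def associate(detection: dict) -> None:
    """Association module: turns a detection into a track (values are made up)."""
    bus.publish("track", {"label": detection["label"], "position": (1.0, 2.0), "speed": 0.5})


def answer_request(request: dict) -> None:
    """Database client: answers a planner request from what has been stored."""
    tracks = [m for t, m in database if t == "track" and m["label"] == request["label"]]
    bus.publish("object_velocity/response", tracks[-1]["speed"] if tracks else None)


bus.subscribe("image", recognise)
bus.subscribe("detection", associate)
bus.subscribe("object_velocity/request", answer_request)

responses: List[Any] = []
bus.subscribe("object_velocity/response", responses.append)

bus.publish("image", {"contains": "dog"})               # camera publishes an image
bus.publish("object_velocity/request", {"label": "dog"})  # planner asks for the dog's speed

print([topic for topic, _ in database])
print(responses)
```

In a real system each module would be its own ROS node and the callbacks would write to Grakn rather than to a Python list.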

How Do You Go About Building the ROS Grakn Client?

The team, setting out to build this ROS client, divided it into two parts: the ROS wrapper and the Grakn client session.


Building a ROS client. Slides used with permission

The ROS wrapper handles the publish / subscribe mechanisms, while the Grakn client session is built on top of the Grakn client as an abstraction layer.

What are the requirements for each?

  1. The ROS wrapper initialises the client and the topics to publish or subscribe to.
  2. The Grakn client session should automatically start Grakn and delete any existing keyspace. Deleting the keyspace is necessary because in robotics we want to start with an empty keyspace: a fresh scenario for the dynamically changing environmental situations that the robot finds itself in.
  3. The Grakn client session should also load the schema and the instances that are already known back in.
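The start-up order described in these requirements (wipe the keyspace, then reload the schema, then reload known instances) can be sketched as follows. `InMemoryStore` is a hypothetical stand-in so the example runs on its own; it is not the Grakn client API, and the statements are treated as opaque strings.

```python
from typing import Dict, List


class InMemoryStore:
    """Hypothetical stand-in for the database server (not the Grakn client API)."""

    def __init__(self) -> None:
        self.keyspaces: Dict[str, List[str]] = {}

    def delete_keyspace(self, name: str) -> None:
        self.keyspaces.pop(name, None)

    def create_keyspace(self, name: str) -> None:
        self.keyspaces[name] = []

    def run(self, name: str, statement: str) -> None:
        self.keyspaces[name].append(statement)


class ClientSession:
    """Sketch of the session layer: always start from a clean, reloaded keyspace."""

    def __init__(self, store: InMemoryStore, keyspace: str,
                 schema: List[str], known_instances: List[str]) -> None:
        self.store = store
        self.keyspace = keyspace
        self.schema = schema
        self.known_instances = known_instances

    def start(self) -> None:
        # 1. Drop whatever a previous run left behind.
        self.store.delete_keyspace(self.keyspace)
        self.store.create_keyspace(self.keyspace)
        # 2. Load the schema first, so instances have types to refer to.
        for statement in self.schema:
            self.store.run(self.keyspace, statement)
        # 3. Load the instances that are known before the mission starts.
        for statement in self.known_instances:
            self.store.run(self.keyspace, statement)


store = InMemoryStore()
store.keyspaces["snow"] = ["stale data from an earlier run"]

session = ClientSession(
    store,
    keyspace="snow",
    schema=["define living-being sub entity;"],
    known_instances=["insert $x isa living-being;"],
)
session.start()
print(store.keyspaces["snow"])
```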

After the system has been running for a while, a new ROS message comes in (e.g. a request for object_velocity).

Here, they use Grakn utilities, which are built on the Grakn Python client.

Joris and his team are currently looking at one addition that may automate the query generation. This is noted as future work.


This slide shows us what a read function looks like under the hood. The function, request_all_humans, fetches all the humans that are known in the database. They’ve set this up to run the read query against the database using Grakn utilities.

The data is then retrieved, and put in a nice format that can be given to the ROS wrapper which then publishes it again as a ROS topic.
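Since the slide is not reproduced here, the following is only a rough Python sketch of what a read function of this kind might look like. Only the name request_all_humans comes from the talk. The query string assumes humans carry a `name` attribute, and `run_query` is a hypothetical callable standing in for the Grakn utilities layer, so none of this should be read as TNO's actual implementation.

```python
from typing import Callable, Dict, Iterable, List

# Assumption: humans have a `name` attribute; the schema snippet in the
# article does not show it, although the instances are referred to by name.
ALL_HUMANS_QUERY = "match $h isa human, has name $n; get $n;"

# Hypothetical signature: takes a query string, yields one dict per answer.
QueryRunner = Callable[[str], Iterable[Dict[str, str]]]


def request_all_humans(run_query: QueryRunner) -> List[Dict[str, str]]:
    """Fetch all known humans and shape them for publishing on a topic."""
    rows = run_query(ALL_HUMANS_QUERY)
    return [{"type": "human", "name": row["n"]} for row in rows]


def fake_run_query(query: str) -> List[Dict[str, str]]:
    """Canned answers so the sketch runs without a database."""
    return [{"n": "Sam"}, {"n": "George"}]


humans = request_all_humans(fake_run_query)
print(humans)
```

Passing the query runner in as an argument keeps the formatting logic testable without a running database.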

This is a fairly simple function to create. However, it gets a bit more complex when we look at adding additional variables for the robot to observe.

Before we get there, let’s look at their schema — Joris walked through a portion of it as a schema diagram in the slide below.


Section of schema diagram. Slides used with permission

Here we see that they have defined a living_being, which is the parent entity for a human or an animal. These two are then the parent entities for: adult, child ; and dog, cat.

SPOT will need to report on the state of the living_beings, both as a physical state and mental state. Specifically, SPOT is tasked with identifying the mental-responsiveness of an identified living_being.

Let’s look at how they modelled relations in this snippet of schema.

## relation 1: mental responsiveness of a discovered (by SPOT) living-being ##
well-being sub relation,
	relates health,
	relates being;
mental-responsiveness sub attribute,
	plays health;
living-being sub entity,
	plays being;
## relation 2: family of living-beings ##
family sub relation,
	relates father,
	relates mother,
	relates son,
	relates daughter,
	relates pet; 
human sub living-being;
adult sub human;
child sub human;
man sub adult,
	plays father;
woman sub adult,
	plays mother;
boy sub child,
	plays son;
girl sub child,
	plays daughter;
animal sub living-being,
	plays pet;

We can also see how an instance of this model looks, referring to the lower section of the slide above.

First, they instantiate the mental concept with either true or false. They have some people and a dog, fluffy. Notice that in this instance there are two adults, man(with name Sam) and man(with name George). Sam is the fireman, and George is the father of the family.

We can see that Sam is mentally responsive and George is not mentally responsive. How do we create the mental responsiveness in code?

Let’s say the mother is mentally responsive:


Slides used with permission

Here’s where you can see that the code grows quickly, and the reason for wanting to explore automated query generation.

Small but Dynamic

Remember when we talked about the database being small but dynamic? Here we get to see what happens as SPOT operates in a real world scenario.

Setting the scene in the real world, we have a family, a fire-brigade, and a house with a hall and a set of rooms.

Here is, more or less, the full schema diagram for the SNOW project.


Full schema diagram. Slides used with permission

You can see all the relations in green and all the attributes in dark blue. All the concepts, or entities, are in light blue and organised into a type hierarchy.

At first glance, it may not appear large or complex, but as Joris goes on to describe, once the robot is in a room the complexity grows quickly as the robot needs to orient itself.

They are able to model this by representing polygons in the database.

Looking at the slide below we see that the kitchen, kitchen door, and four walls are mapped as points and edges of a polygon.

A polygon is a mathematical concept that you can use to describe the borders of a room, like the kitchen. Written as a polygon, you have a set of points and edges or lines. These edges then correspond to either walls or the kitchen door.

A polygon is constructed of lines, and a line is defined by two points. If you were to model this polygon into Grakn, the polygon relates to a set of lines, each related to two points and a structural part.


Modelling a polygon for spatial awareness in Grakn. Slides used with permission.

Modelling this in Grakn we get the entities: mathematical-concept, polygon, lines, points, and two sub-types of the entity point: local-origin and global-origin.

These lines can also take some kind of physical form like a structural part: walls or doors.

Why is this useful information to know?

If SPOT is in the kitchen it should be able to localise itself within that room.

If it is in the kitchen and needs to exit the room via the kitchen door, it must know the position of the kitchen door. Using a lidar system, SPOT is able to measure the distances to the walls and thereby map the array to the polygon. Next, it should locate the door by retrieving the end-points of the door. Finally, to head towards the door, it finds a waypoint to exit.
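The door-finding step can be sketched in Python as below. It models a room as a polygon of lines, each tied to a structural part, and takes the midpoint of the door's two end-points as the exit waypoint. The coordinates, the class names and the midpoint choice are assumptions for illustration, and the lidar matching step is left out entirely.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Line:
    """One polygon edge, defined by two points and tied to a structural part."""
    start: Point
    end: Point
    part: str  # "wall" or "door"


@dataclass(frozen=True)
class Polygon:
    """A room outline as an ordered set of lines."""
    name: str
    lines: Tuple[Line, ...]


def door_waypoint(room: Polygon) -> Optional[Point]:
    """Return the midpoint of the first door edge, or None if the room has no door."""
    for line in room.lines:
        if line.part == "door":
            (x1, y1), (x2, y2) = line.start, line.end
            return ((x1 + x2) / 2, (y1 + y2) / 2)
    return None


# A kitchen with four walls and one door (made-up coordinates, in metres).
kitchen = Polygon("kitchen", (
    Line((0.0, 0.0), (4.0, 0.0), "wall"),
    Line((4.0, 0.0), (4.0, 3.0), "wall"),
    Line((4.0, 3.0), (1.0, 3.0), "wall"),
    Line((1.0, 3.0), (0.0, 3.0), "door"),
    Line((0.0, 3.0), (0.0, 0.0), "wall"),
))

print(door_waypoint(kitchen))
```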

In this way, polygons are handy when you are working with robots in real environments that follow a floor plan.

Reasoning How to Move From One Room to Another

We saw how Joris modeled a specific room in order for SPOT to move within it; but what about moving throughout the building? How should we model a building such that SPOT can move freely between rooms and halls?

First, Joris needed to model the composition of a building in the schema — we can see how his diagram might look in Graql:

building sub entity,
	plays composition;
office-building sub building;
res-house sub building;
space sub entity,
	plays partition;
open-space sub space;
closed-space sub space;
real-estate sub closed-space,
	plays composition;
room sub real-estate;
composing sub relation,
	relates composition,
	relates partition;
structural-part sub entity,
	plays partition;
window sub structural-part;
wall sub structural-part;
door sub structural-part;

This means that when we have an instance of a res-house like the villa in this case, it is composed of rooms: kitchen, hall, and living. These rooms are composed of structural-parts: kitchen-wall, kitchen-door, hall-wall, living-wall, living-door.


As Joris explains, we can make use of Grakn’s hyper-graph and rule-based reasoning to create room connections (relations) based on commonly associated structural-parts.

If we know that the kitchen-door is a role-player in a relation (in the slide above) with the room kitchen, as well as with the room hall, then we can infer a relation between the rooms kitchen and hall via the kitchen-door.

This gives SPOT the knowledge that it can move from the kitchen to the hall via the kitchen door. As you might infer yourself, we can then make a relation between the rooms: kitchen and the living, via the kitchen-door and living-door. This is an example of a transitive relation in Grakn.
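The inference described here can be mimicked in plain Python: two rooms are directly connected when they share a door, and a route between any two rooms follows by chaining those connections. In Grakn this is done with rules over the composing relation, so the sketch below only illustrates the logic. Which rooms share which door is an assumption based on the text.

```python
from collections import defaultdict, deque
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

# Structural parts per room; the door sharing is assumed from the article's example.
composing: Dict[str, Set[str]] = {
    "kitchen": {"kitchen-wall", "kitchen-door"},
    "hall": {"hall-wall", "kitchen-door", "living-door"},
    "living": {"living-wall", "living-door"},
}


def direct_connections(rooms: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Two rooms are directly connected when they share at least one door."""
    links: Dict[Tuple[str, str], Set[str]] = {}
    for a, b in combinations(sorted(rooms), 2):
        shared_doors = {part for part in rooms[a] & rooms[b] if part.endswith("-door")}
        if shared_doors:
            links[(a, b)] = shared_doors
    return links


def route(rooms: Dict[str, Set[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over direct connections (the transitive step)."""
    neighbours = defaultdict(set)
    for a, b in direct_connections(rooms):
        neighbours[a].add(b)
        neighbours[b].add(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in sorted(neighbours[path[-1]]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


print(direct_connections(composing))
print(route(composing, "kitchen", "living"))
```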


Traditionally, a robotics team might use SLAM or other navigation techniques to achieve this mobility through spaces. In Grakn this becomes fairly simple to do.

You can see what this looks like in Grakn Workbase — Grakn’s IDE:


Modeling Negation

How do we address the fact that SPOT doesn’t yet know where the living-beings are located when the robot enters a building?

How do we model this lack of knowledge in our database?


In Grakn, we use a locating relation with three sub-relations: possibly-locating, not-possibly-locating, and actually-locating. These allow us to address the negative: once a space has been searched, the database captures that a living_being is not located in that room. We don’t want SPOT re-checking rooms and wasting valuable time. This requires frequent updates to the database during the active search.
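The bookkeeping behind these three relations can be sketched in Python as sets of candidate rooms per living being. The `LocationKnowledge` class and its method names are hypothetical; the sketch only illustrates how ruling rooms out avoids re-checking them, and is not how the data is stored in Grakn.

```python
from typing import Dict, Iterable, Set


class LocationKnowledge:
    """Tracks where each being could still be, and where it actually is."""

    def __init__(self, rooms: Iterable[str], beings: Iterable[str]) -> None:
        all_rooms = set(rooms)
        # Before the search starts, everyone is possibly located in every room.
        self.possible: Dict[str, Set[str]] = {b: set(all_rooms) for b in beings}
        self.actual: Dict[str, str] = {}

    def room_searched(self, room: str, found: Iterable[str] = ()) -> None:
        """Record the outcome of searching one room."""
        found_here = set(found)
        for being in self.possible:
            if being in found_here:
                # actually-locating: the being was seen in this room.
                self.actual[being] = room
                self.possible[being] = {room}
            elif being not in self.actual:
                # not-possibly-locating: this room is ruled out for this being.
                self.possible[being].discard(room)

    def rooms_worth_searching(self) -> Set[str]:
        """Rooms that may still contain someone who has not been found."""
        rooms: Set[str] = set()
        for being, candidates in self.possible.items():
            if being not in self.actual:
                rooms |= candidates
        return rooms


knowledge = LocationKnowledge(rooms=["kitchen", "hall", "living"], beings=["father", "daughter"])
knowledge.room_searched("hall", found=["daughter"])
knowledge.room_searched("kitchen")

print(knowledge.actual)
print(knowledge.rooms_worth_searching())
```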

What About Adding New Knowledge to the Database?

In real-world scenarios, new facts are presented during an active search and rescue. Imagine a fire commander identifying that one of the bedrooms has a pinball machine, data that may not be currently known in the database.

Rather than adding this new knowledge to the database as an instance, you should update the underlying knowledge. You want to add this to the schema so that the knowledge can be reasoned over and used to help SPOT achieve its objectives.

Joris notes that this is something that would potentially be happening on a regular basis. Doing this in a suitable and automated way is essential. Grakn’s dynamic schema — able to be updated at any time without needing to do any migrations — makes this quite simple.


For me, robotics are extremely cool. Also, knowledge graphs are hot. So combining those, in my sense, gives me the fever. You can see some of my fevers here.

Just a few additional “fevers” that move Joris and his team.

Concluding his talk, Joris talks about his interest in using Machine Learning over a Knowledge Graph to localise yourself through object recognition. For example, if we recognise two objects: an oven and a sink, we should be able to know that we are in the kitchen.

Joris and his team are currently collaborating with James Fletcher, Principal Scientist at Grakn Labs — whose research on knowledge graph convolutional networks (KGCN) is utilised in this project.

Special thank you to Joris and his team at TNO for their enthusiasm, inspiration and thorough explanations.

You can find the full presentation on the Grakn Labs YouTube channel here.

Previously published at https://towardsdatascience.com/object-recognition-and-spacial-awareness-for-a-spot-robotics-system-2ba33152bf65

