Can a computer learn to distinguish the different “places” in a home?
In this series of articles, I will work through successively more complex machine learning models for image classification, eventually reaching 92% accuracy with a deep convolutional neural network.
Despite a longstanding trend to design more flexibility into our spaces, the final use of those spaces (especially in our homes) creates recognizable artifacts, both in the design and the end-product. No matter how modernist and sparse it may be, when you walk into a kitchen you know it’s a kitchen. Presumably, we can train a machine learning model to distinguish these artifacts as well.
This is interesting from an experimental design perspective: exploring the activations of a network that has learned this task can provide useful, maybe even inspiring, examples of what visually matters in our designs. Furthermore, when fed thousands of images of a specific type of place, the deeper layers of a network will begin to extract and encode the latent differences between examples, allowing insight into stylistic associations that may not be immediately apparent.
As designers, we make use of a wealth of imagery, creating a natural pool of data to pull from, and also an opportunity to improve our operations:
Opportunity: Precedent Imagery
When fleshing out a design concept, one of the first steps a designer undertakes is precedent research: searching Google, Pinterest, magazines, and so on for images relevant to the concept that speak to them. We want to fill the walls with these so that we’re immersed in evocative imagery to build off of. We also want to use them to communicate early design concepts to others.
Of course, for every image you find to throw on the wall, there are dozens or even hundreds that weren’t relevant or didn’t capture the style you were interested in. It can be an inefficient process.
To aid this, we maintain a database of our perennial favorites, categorized into descriptive groupings like “bedrooms”, “entry-ways”, and “lobbies”. This can substantially speed up the research process, because we have already narrowed down images that we believe fit in our style, and can quickly grab them from relevant categories.
However, if design is to evolve, this database cannot stay static. New imagery has to be added constantly, and the database already holds tens of thousands of images. This introduces two problems:
- Someone has to manually sort and tag the images, so that appropriate categories are maintained.
- As styles evolve, a designer may be left with a large pool of less relevant images to wade through, which is scarcely better than using Google image search.
Solving the first problem (which I’ll refer to as ‘categorical classification’) is the more straightforward of the two, and it is the one I’ll tackle in this post and those that follow, by training a deep neural network to recognize different categories of scenes and classify them on its own.
The second problem, ‘stylistic classification’, is a bit squishier and needs an approach that accommodates overlap between categories, but it can be attacked using some of the same techniques. In fact, with some modification, the classifier trained on categories may eventually be able to output soft “style” predictions that can start to respond to queries like “show me something more like this…”.
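To make the “show me something more like this…” idea a little more concrete, here is a minimal sketch of one way such a query could work: rank stored images by how similar their soft prediction vectors are to the query image's vector. The cosine similarity measure, the function names, and the three-number vectors are all illustrative assumptions; none of this comes from the trained model described in this series.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def more_like_this(query, library, top_k=2):
    """Return the names of the top_k library images most similar to the query."""
    ranked = sorted(
        library,
        key=lambda name: cosine_similarity(query, library[name]),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical soft "style" predictions for three stored images.
library = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.1, 0.8, 0.1],
    "img_c": [0.5, 0.4, 0.1],
}
query = [0.8, 0.2, 0.0]  # hypothetical prediction for the query image

print(more_like_this(query, library))  # ['img_a', 'img_c']
```

Because the predictions are soft rather than hard labels, an image can sit partway between categories, which is the overlap the stylistic problem calls for.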
The initial dataset consisted of about 2000 images of building interiors. This is definitely on the small end for deep learning methods, and it was later augmented to 5000 unique images plus random variations on them. The images were all standardized to 244px by 244px and then, for initial exploration, reduced further to 85px by 85px so that models would train on desktop hardware. Later in the process, full-size images were used on Amazon p2.xlarge instances (Tesla K80s).
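To see why the smaller size matters on desktop hardware, the sketch below counts the raw input values per image at each size and shows a simple downscale plus one random variation. It assumes three color channels (RGB). The nearest-neighbour resize and the left-right mirror are stand-ins chosen for illustration, not necessarily the operations used to build this dataset.

```python
import random

FULL_SIZE = 244   # standardized size, from the text above
SMALL_SIZE = 85   # exploration size, from the text above
CHANNELS = 3      # assumption: RGB images

def feature_count(side, channels=CHANNELS):
    """Number of raw input values in a square image."""
    return side * side * channels

def resize_nearest(image, new_side):
    """Nearest-neighbour resize of a square image stored as rows of pixels."""
    old_side = len(image)
    return [
        [image[(r * old_side) // new_side][(c * old_side) // new_side]
         for c in range(new_side)]
        for r in range(new_side)
    ]

def random_variation(image, rng):
    """One simple variation: mirror the image left to right half the time."""
    if rng.random() < 0.5:
        return [list(reversed(row)) for row in image]
    return [list(row) for row in image]

# A synthetic gradient image stands in for a real photo.
full = [[(r % 256, c % 256, 0) for c in range(FULL_SIZE)]
        for r in range(FULL_SIZE)]
small = resize_nearest(full, SMALL_SIZE)
variant = random_variation(small, random.Random(0))

print(feature_count(FULL_SIZE))   # 178608
print(feature_count(SMALL_SIZE))  # 21675
```

Under these assumptions, the reduced images carry roughly an eighth of the input values of the full-size ones.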
Because of the particulars of our dataset, I chose to first explore this as a binary classification problem, with the category “kitchens” as the positive example and everything else as negative (we have LOTS of images of kitchens… don’t ask why). Building the dataset went through several iterations and deserves its own notebook.
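The binary framing can be sketched as follows: every image tagged “kitchens” gets label 1 and everything else gets label 0. The category list here is made up for illustration, and the majority-class baseline is an addition of mine, included only because it gives a reference point when reading accuracy figures.

```python
from collections import Counter

POSITIVE_CATEGORY = "kitchens"

def to_binary_labels(categories, positive=POSITIVE_CATEGORY):
    """Map each category tag to 1 (positive class) or 0 (everything else)."""
    return [1 if category == positive else 0 for category in categories]

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)

# Hypothetical tags for six images.
categories = ["kitchens", "bedrooms", "kitchens",
              "lobbies", "entry-ways", "kitchens"]
labels = to_binary_labels(categories)

print(labels)                              # [1, 0, 1, 0, 0, 1]
print(majority_baseline_accuracy(labels))  # 0.5
```

On a dataset with lots of kitchens, a baseline like this shows how much of a model's accuracy could come from class balance alone.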
I’ll start by naively exploring simple models and gradually move towards more complex convolutional architectures that led to our final model achieving 92% accuracy with transfer learning.