Video Scene Location Recognition Using AI: ANN-Based Scene Classification

Authors:

(1) Lukáš Korel, Faculty of Information Technology, Czech Technical University, Prague, Czech Republic;

(2) Petr Pulc, Faculty of Information Technology, Czech Technical University, Prague, Czech Republic;

(3) Jiří Tumpach, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic;

(4) Martin Holeňa, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, Czech Republic.

2 ANN-Based Scene Classification

The problem of scene classification has been studied for many years. Many approaches are based on neural networks, in which an ANN trained on a large number of images learns to recognize the type of a given scene (for example, a kitchen or a bedroom). Several datasets are available for this purpose. One example is [11], but it does not specify particular locations, so this and similar datasets are not usable for our task.


However, our classification problem is different. We want to train an ANN that is able to recognize a particular location (for example, “Springfield-EverGreenTerrace-742-floor2-bathroom”), which can be recorded by a camera from many angles (typically, some objects are occluded by other objects from some angles).


One approach that uses ANNs to solve this task is described in [1], where convolutional networks were used. It differs from our approach on the one hand in the extraction and usage of video images, and on the other hand in the types of ANN layers.


Another approach is described in [4]. The authors propose a high-level image representation, called Object Bank, in which an image is represented as a scale-invariant response map of a large number of pre-trained generic object detectors. Using the Object Bank representation, good performance on high-level visual recognition tasks can be achieved with simple off-the-shelf classifiers such as logistic regression and linear SVM.
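To make the idea of a detector response representation more concrete, the following is a minimal sketch in plain Python, not the implementation from [4]. It rests on several assumptions: the two 2x2 edge templates are hypothetical stand-ins for pre-trained object detectors, scale handling is reduced to simple subsampling, each response map is pooled to a single maximum, and the classifier is a small hand-written logistic regression. All function names, images and parameters are illustrative only.

```python
import math

def downsample(img, factor):
    """Keep every `factor`-th pixel in both directions (a crude scale level)."""
    return [row[::factor] for row in img[::factor]]

def response_map(img, template):
    """Slide `template` over `img` and record the dot product at each position."""
    th, tw = len(template), len(template[0])
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y + dy][x + dx] * template[dy][dx]
                for dy in range(th) for dx in range(tw))
            for x in range(w - tw + 1)
        ]
        for y in range(h - th + 1)
    ]

def object_bank_features(img, detectors, scales=(1, 2)):
    """Concatenate the maximum response of every detector at every scale."""
    features = []
    for template in detectors:
        for s in scales:
            rmap = response_map(downsample(img, s), template)
            features.append(max(max(row) for row in rmap))
    return features

def train_logistic_regression(X, y, lr=0.1, epochs=500):
    """Plain stochastic gradient descent on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            g = 1.0 / (1.0 + math.exp(-z)) - yi  # gradient w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    """Return class 1 if the linear score is positive, otherwise class 0."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0

# Hypothetical "detectors": a vertical-edge and a horizontal-edge template.
detectors = [
    [[-1, 1], [-1, 1]],
    [[-1, -1], [1, 1]],
]

# Toy 8x8 images: a vertical step edge (class 1) and a horizontal one (class 0).
img_a = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]
img_b = [[0] * 8 for _ in range(4)] + [[1] * 8 for _ in range(4)]
# The same vertical edge as img_a, shifted one pixel to the left.
img_shifted = [[0, 0, 0, 1, 1, 1, 1, 1] for _ in range(8)]

X = [object_bank_features(img, detectors) for img in (img_a, img_b)]
w, b = train_logistic_regression(X, [1, 0])
print(X)
print(predict(w, b, object_bank_features(img_shifted, detectors)))
```

In this toy setting the shifted image yields the same feature vector as the unshifted one, because max pooling discards the position of the strongest response. This is only meant to illustrate why a simple linear classifier can suffice once the representation itself is informative.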


This paper is available on arXiv under the CC0 1.0 DEED license.