March 30, 2016
Tradeoffs with Image Recognition (Part 2)
By: Robert Dooley

In the last post, we discussed what image recognition can do without delving into what it requires. Since many compelling use cases for image recognition depend on a mobile device (either a smartphone or a head-mounted display), it’s worth examining what it takes to get image recognition running in a mobile environment.

General-purpose recognition engines have several components that affect their performance. Two of the most important are the data sets the engine is built around: a training set, which teaches the algorithm how to apply tags, and a target set, which encompasses all of the images the algorithm needs to recognize “out in the wild.” Increasing the size of the training set helps increase the accuracy of the engine, but increasing the size of the target set generally decreases it. This means that the more general-purpose you want your image recognition to be, the larger the training set and the more processing power you need to maintain accuracy.

The state-of-the-art, multipurpose approaches mentioned in the first post (complex neural networks and machine learning) have upwards of 60 million parameters and a training set of 150,000 manually tagged images, which require vast computing power and memory. That becomes a problem when you demand real-time performance from engines that must identify many different objects from different angles and under different lighting conditions. Tagging static images can be made viable by calling a Web-based service, but augmented reality or mixed reality applications may need results faster than a network round trip allows.
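To put the 60 million parameter figure in perspective, here is a rough back-of-the-envelope calculation of the memory needed just to hold the weights. It assumes each parameter is stored as a 32-bit (4-byte) float, which is a common default but an assumption on our part; the function name is only illustrative, and the estimate ignores activations, the image itself, and everything else the app needs.

```python
def parameter_memory_mb(num_parameters: int, bytes_per_parameter: int = 4) -> float:
    """Memory needed to store the weights alone, in megabytes (10**6 bytes).

    bytes_per_parameter=4 assumes 32-bit floats (an assumption, not a
    figure from any specific engine).
    """
    return num_parameters * bytes_per_parameter / 1_000_000


if __name__ == "__main__":
    print(parameter_memory_mb(60_000_000))     # 32-bit floats: 240.0 MB
    print(parameter_memory_mb(60_000_000, 2))  # 16-bit floats: 120.0 MB
```

Even before any computation happens, that is a sizable share of the memory a mobile app can realistically claim.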

The current solution is an evolution of the QR code concept. The technical goal is to reduce both the size of the target set and the complexity of the images in it. Instead of trying to recognize many types of objects, the computer only needs to search a small set of codes. As black and white, grid-based images, QR codes and AprilTags accomplish both objectives, making it possible to run image recognition on them on a mobile device. Mobile devices can even lower the resolution of the image and still get viable matches. Because the target object is two-dimensional rather than three-dimensional, it can only be viewed from certain angles, further simplifying the requirements of a recognition algorithm. Today’s augmented reality platforms use image targets to snap their digital content into place. Platforms like Wikitude and ODG both use image targets as the most reliable way of enabling augmented reality experiences. Some, like Blippar, offer the ability to upload your own images, such as a brand’s logo, and use them as image targets.
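To see why a small set of grid-based targets is so much cheaper to recognize, consider this minimal sketch. It assumes the camera frame has already been thresholded and sampled down to a tiny grid of black and white cells, and it simply compares that grid against every known pattern in all four rotations. The 4x4 patterns, the names "pump" and "valve", and the match_tag function are all hypothetical; real systems such as AprilTag or QR decoders also have to find the tag in the frame and use proper error-correcting codes.

```python
from typing import Dict, Optional, Tuple

Grid = Tuple[Tuple[int, ...], ...]

# Hypothetical 4x4 tag patterns (1 = black cell, 0 = white cell).
TARGETS: Dict[str, Grid] = {
    "pump": ((1, 0, 0, 1), (0, 1, 1, 0), (1, 1, 0, 0), (0, 0, 0, 1)),
    "valve": ((0, 1, 1, 1), (1, 0, 0, 0), (0, 0, 1, 1), (1, 1, 0, 1)),
}


def rotate(grid: Grid) -> Grid:
    """Rotate a square grid 90 degrees clockwise."""
    return tuple(zip(*grid[::-1]))


def hamming(a: Grid, b: Grid) -> int:
    """Count the cells in which two equally sized grids differ."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def match_tag(observed: Grid, targets: Dict[str, Grid], max_errors: int = 1) -> Optional[str]:
    """Return the closest target within max_errors cells, or None if nothing is close enough."""
    best_name, best_distance = None, max_errors + 1
    for name, pattern in targets.items():
        candidate = pattern
        for _ in range(4):  # try all four orientations of the tag
            distance = hamming(observed, candidate)
            if distance < best_distance:
                best_name, best_distance = name, distance
            candidate = rotate(candidate)
    return best_name


if __name__ == "__main__":
    # The "pump" tag seen rotated by 90 degrees, with one cell misread.
    noisy = ((0, 0, 0, 1), (0, 1, 1, 0), (0, 0, 1, 0), (1, 0, 0, 1))
    print(match_tag(noisy, TARGETS))  # prints "pump"
```

With two targets of 16 cells each, the whole search is 128 cell comparisons, which is trivial next to evaluating a network with tens of millions of parameters.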

But how do we recognize objects that do not have an image target? Technologies like Digimarc extend the concept by covering objects in invisible, ultraviolet (UV) barcodes. Such objects appear normal to the human eye, but make it easier for a barcode scanner to recognize the encoded target. Still, this approach only works in environments where both the camera hardware and the target objects are controlled. In many enterprise situations, that is not the case.

For Tech Labs’ augmented reality projects, we used two approaches to deal with this problem. In one case, we brought in extra data on the user’s location and heading. Our target set consisted of objects that had known, fixed locations in the physical world, so information about a user’s position helped narrow down what we could be looking at. In the other, while enabling a mixed reality experience, we looked only for flat surfaces, but used multiple types of cameras to do so. Information from depth sensors can help confirm what a regular RGB camera is seeing, a capability that mixed reality headsets use out of the box. Both approaches hint at what we expect the future of image recognition to incorporate: data from new sensors that can better inform the algorithm.
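The first approach, narrowing the target set with location and heading, can be sketched roughly as follows. This is an illustration under our own assumptions rather than the code from the project: positions are in metres on a flat local grid, heading is a compass bearing in degrees, and the Target class, the field-of-view and range defaults, and the example objects are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Target:
    name: str
    x: float  # metres east of a local origin (assumed coordinate system)
    y: float  # metres north of a local origin


def bearing_deg(from_x: float, from_y: float, to_x: float, to_y: float) -> float:
    """Compass bearing from one point to another (0 = north, 90 = east)."""
    return math.degrees(math.atan2(to_x - from_x, to_y - from_y)) % 360.0


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute angle between two bearings, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def candidates_in_view(
    targets: List[Target],
    user_x: float,
    user_y: float,
    heading_deg: float,
    fov_deg: float = 60.0,      # assumed horizontal field of view
    max_range_m: float = 50.0,  # assumed maximum useful distance
) -> List[Target]:
    """Keep only the targets the user could plausibly be looking at."""
    visible = []
    for target in targets:
        distance = math.hypot(target.x - user_x, target.y - user_y)
        if distance > max_range_m:
            continue  # too far away to matter
        if distance > 0:
            bearing = bearing_deg(user_x, user_y, target.x, target.y)
            if angular_difference(bearing, heading_deg) > fov_deg / 2:
                continue  # outside the camera's field of view
        visible.append(target)
    return visible


if __name__ == "__main__":
    site = [
        Target("north door", 0.0, 10.0),
        Target("east panel", 10.0, 0.0),
        Target("south gate", 0.0, -10.0),
        Target("far tower", 0.0, 100.0),
    ]
    # Standing at the origin and facing north leaves a single candidate.
    print([t.name for t in candidates_in_view(site, 0.0, 0.0, 0.0)])
```

The recognition engine then only has to tell the remaining candidates apart, which is the same trick image targets play: shrink the target set before the expensive matching starts.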


Olson, Edwin. “AprilTag: A robust and flexible visual fiducial system”. 2010.
Digimarc: The Barcode of Everything.
Simonite, Tom. “The Revolutionary Technique That Quietly Changed Machine Vision Forever”. MIT Technology Review. 9 Sept 2014.
