This is the first in a series of introductory-level posts centered on image recognition (IR). Accenture Technology Labs has recently completed several projects involving image recognition technology and related capabilities in augmented and mixed reality. These posts are intended to cover what we’ve learned about the state of the art in image recognition, as well as today’s limitations and future plans. We’ll start by looking at how image recognition is viewed today.
Image recognition is one of the oldest research areas in computing. Most people think of large-scale projects when they hear the term, such as engines that can identify broad categories of objects. The ability to distinguish a dog on a lawn from a cat on a couch, and then to correctly label both the animal and the setting, represents what image recognition can do today. Applying such labels to an image is called “image classification.” Algorithms developed at companies like Google and universities like Stanford have pushed image recognition to this point.
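At its core, image classification means mapping pixel data to a label. As a toy illustration only (real systems learn their representations with neural networks; the arrays and labels below are invented for this sketch), a classifier can assign each image the label of its nearest class prototype:

```python
import numpy as np

# Toy "images": 4x4 grayscale arrays. In a real system these would be
# photographs, and the class prototypes would be learned from data.
prototypes = {
    "bright": np.full((4, 4), 0.9),  # prototype for the "bright" class
    "dark": np.full((4, 4), 0.1),    # prototype for the "dark" class
}

def classify(image, prototypes):
    """Assign the label whose prototype is closest in pixel space."""
    distances = {label: np.linalg.norm(image - proto)
                 for label, proto in prototypes.items()}
    return min(distances, key=distances.get)

query = np.full((4, 4), 0.8)        # a mostly bright test image
print(classify(query, prototypes))  # -> bright
```

The point of the sketch is only the framing: classification reduces to measuring which known category an image most resembles; modern systems differ in how that resemblance is computed, not in the basic idea.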
The mathematical approaches that are now standard for image recognition, such as convolutional neural nets, are far more advanced than earlier approaches like Fourier transforms. They are also the result of large teams of engineers who spent years working on the problem. These techniques have achieved high degrees of accuracy, approaching a 94 percent accuracy rate. (Accuracy is usually measured against how humans classified the same data set.)
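The core operation in a convolutional neural net is the convolution itself: sliding a small kernel across an image and taking a product-sum at each position to produce a feature map. A minimal sketch with NumPy (the kernel and image here are illustrative, hand-picked values, not learned weights):

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as CNN libraries
    compute it): slide the kernel over the image and take the
    elementwise product-sum at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A vertical-edge detector: it responds where brightness changes from
# left to right. In a CNN, kernels like this are learned from data.
edge_kernel = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]])

# An image whose left half is dark and right half is bright.
image = np.concatenate([np.zeros((5, 3)), np.ones((5, 3))], axis=1)
feature_map = convolve2d(image, edge_kernel)  # peaks at the edge
```

A CNN stacks many such learned kernels in layers, so early layers respond to edges and textures while later layers respond to whole objects; that hierarchy is what earlier global transforms like the Fourier transform lacked.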
Many use cases need more powerful or more precise feedback from computers than simple classification. Companies use video analytics, for example, to identify the movement of human-sized objects, a relatively easy task when you know where the camera is positioned and what it is pointed at. Facebook has worked on image identification, distinguishing human faces from one another to identify who is in a photo and power its suggested-tagging feature.
Tech Labs has primarily looked at image recognition from the perspective of our clients, especially those that the group works with frequently. Retailers, for example, are often most interested in identifying their products. If a clothing retailer could do so, it would enable “snap and shop” applications. A person could take a picture of an outfit someone else is wearing, the retailer’s app would recognize it, and then the person could instantaneously shop for that item or similar items.
Augmented reality apps add an intriguing new dimension to image recognition. As just one example, industrial companies that operate heavy equipment can superimpose assembly or repair instructions over specific parts in an augmented reality presentation, making it easier for technicians to fix complex machinery. Unlike classification or identification tasks, which can be done on large server farms or backend systems, augmented reality often demands real-time feedback. Such a requirement can greatly affect where and how image recognition can be done. The next post will take a closer look at the tradeoffs built into the problem of image recognition, and how an application’s requirements come into play.
Markoff, John. “Researchers Announce Advance in Image-Recognition Software.” The New York Times, 17 Nov 2014. http://www.nytimes.com/2014/11/18/science/researchers-announce-breakthrough-in-content-recognition-software.html?partner=rss&emc=rss&_r=1
Vinyals, Oriol; Toshev, Alexander; Bengio, Samy; Erhan, Dumitru. “A Picture Is Worth a Thousand (Coherent) Words: Building a Natural Description of Images.” Google Research Blog, 17 Nov 2014. http://googleresearch.blogspot.com/2014/11/a-picture-is-worth-thousand-coherent.html
Mordvintsev, Alexander; Olah, Christopher; Tyka, Mike. “Inceptionism: Going Deeper into Neural Networks.” Google Research Blog, 17 June 2015. http://googleresearch.blogspot.com/2015/06/inceptionism-going-deeper-into-neural.html
Taigman, Yaniv; Yang, Ming; Ranzato, Marc’Aurelio; Wolf, Lior. “DeepFace: Closing the Gap to Human-Level Performance in Face Verification.” Research at Facebook, 24 June 2014. https://research.facebook.com/publications/deepface-closing-the-gap-to-human-level-performance-in-face-verification/
Simonite, Tom. “The Revolutionary Technique That Quietly Changed Machine Vision Forever.” MIT Technology Review, 9 Sept 2014. http://www.technologyreview.com/view/530561/the-revolutionary-technique-that-quietly-changed-machine-vision-forever/