Welcome to the May 2022 edition of Baseline, Accenture Federal Services’ machine learning newsletter, where we share thoughts on important advances in machine learning technologies likely to impact our customers. This month we cover the following topics:

  • Creating AI models without code using Hugging Face’s AutoTrain
  • Generating human-interpretable explanations of image classifier decisions
  • Using AI to help increase inclusivity in Wikipedia biographies
  • Reducing an advanced speech task model to be smaller and faster
  • Unifying single and multiple object tracking with a computer vision technique

Click here to subscribe to email updates: Receive Baseline every month in your inbox and stay up-to-date on the latest advancements in machine learning.


AutoTrain: Machine learning models without code

Traditionally, it’s been necessary to have dedicated machine learning (ML) engineers on staff to train and deploy state-of-the-art ML models. Researchers at Hugging Face have released AutoTrain, a service that automatically trains, evaluates, and deploys ML models for various tasks. It is currently available for text classification, text scoring, entity recognition, summarization, question answering, and translation in a variety of languages. Users upload their data in common formats such as CSV or JSON, and AutoTrain then trains several high-performing models and compares their performance. From there, it creates dedicated endpoints where users can run inference on their models. This is a great step toward lowering the domain knowledge needed to train and deploy ML models, making ML accessible to more users.

<<< Start >>>

A performance comparison of models automatically trained by AutoTrain for text classification

<<< End >>>
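The entry point to a workflow like this is simply well-formed tabular data. As a minimal sketch (the column names `text` and `label` and the example rows are our own illustration, not AutoTrain documentation), here is the shape of a text-classification CSV a user might upload:

```python
import csv
import io

# Hypothetical minimal dataset for a text-classification task: one text
# column and one label column, the general shape of tabular data that
# AutoML-style services accept. Rows and column names are illustrative.
rows = [
    {"text": "The shipment arrived two days late.", "label": "complaint"},
    {"text": "Great service, thank you!", "label": "praise"},
    {"text": "How do I reset my password?", "label": "question"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["text", "label"])
writer.writeheader()
writer.writerows(rows)
csv_payload = buf.getvalue()

print(csv_payload.splitlines()[0])  # header row: text,label
```

Once data in a format like this is uploaded, the service handles model selection, training, and endpoint creation without further code from the user.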


Google AI offers new visual explanations for classifiers

Treating neural network classifiers as a “black box” is problematic when trying to understand a model’s decision making. Not being able to precisely identify which parts of an input led to a classification can contribute to real-world consequences such as model bias. Although some progress has been made in this area, current approaches can only point generally to the portion of the input that was most significant to the classification; they cannot explain why it is the most important. Google AI released StylEx, a new method for explaining classifier decisions that disentangles the attributes leading to a classification. This allows users to isolate features and explore how manipulating each attribute separately contributes to the classification, enabling more human-interpretable analysis of model behavior and better support for downstream decisions. The gif below demonstrates how moving each knob manipulates only the corresponding attribute in the image, keeping the subject’s other attributes fixed. Attribute 4, which refers to a mouth being open or closed, was important in identifying the image as a “cat,” as dogs are more likely to have their mouths open (source).

<<< Start >>>

Explaining a Cat vs. Dog Classifier: StylEx provides the top-K discovered disentangled attributes which explain the classification.

<<< End >>>
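The core idea of turning one knob while holding the others fixed can be illustrated with a toy classifier (this is our own sketch, not Google’s code; the attribute names and weights are invented for illustration):

```python
import math

# Toy illustration of attribute-level explanation: hold all attributes of
# an input fixed, vary one "knob," and watch how the classifier's output
# probability shifts. The attributes and weights below are made up.
WEIGHTS = {"ear_shape": 1.2, "mouth_open": -2.0, "snout_length": -0.8}

def p_cat(attrs):
    """Probability the toy logistic classifier assigns to 'cat'."""
    z = sum(WEIGHTS[k] * v for k, v in attrs.items())
    return 1.0 / (1.0 + math.exp(-z))

base = {"ear_shape": 0.9, "mouth_open": 0.1, "snout_length": 0.2}

# Turn only the mouth_open knob; every other attribute stays fixed.
for value in (0.0, 0.5, 1.0):
    probe = dict(base, mouth_open=value)
    print(f"mouth_open={value:.1f} -> P(cat)={p_cat(probe):.3f}")
```

In this toy, opening the mouth monotonically lowers the “cat” probability, mirroring the Attribute 4 example above; StylEx performs the analogous manipulation in the latent space of a generative model over real images.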


Using AI to create more inclusive Wikipedia biographies

Wikipedia articles can be a great resource for quick biographical information. However, like most of the internet, the information is crowd-sourced, making it susceptible to bias. For example, according to the Wikimedia Foundation, only about twenty percent of English-language biographies on Wikipedia are about women. A multi-step model released by Meta AI researches important figures and creates first-draft biographies for them. It uses three modules to retrieve relevant information from the web, generate text, and cite the sources used. While not perfect – the articles produced for women sometimes highlight their personal lives more than their achievements – this method demonstrates how current generative technology can lower the bar for information curation and, in the case of Wikipedia, increase representation.

<<< Start >>>

This diagram illustrates how the model creates a biography (source).

<<< End >>>
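The three-module structure (retrieve, generate, cite) can be sketched as a simple pipeline. Everything below is a stand-in for illustration: the real Meta AI system uses learned retrieval and generation models, not the keyword matching and string stitching shown here.

```python
# Illustrative skeleton of a retrieve -> generate -> cite pipeline.
# All three functions are toy stand-ins for learned modules.

def retrieve(subject, corpus):
    """Return passages mentioning the subject (toy keyword retrieval)."""
    return [doc for doc in corpus if subject.lower() in doc["text"].lower()]

def generate(subject, passages):
    """Stand-in for a learned generator: stitch retrieved facts into a draft."""
    facts = " ".join(p["text"] for p in passages)
    return f"{subject}: {facts}"

def cite(passages):
    """Collect the sources backing the draft."""
    return sorted({p["source"] for p in passages})

corpus = [
    {"text": "Ada Lovelace wrote the first published algorithm.",
     "source": "encyclopedia.example"},
    {"text": "Ada Lovelace collaborated with Charles Babbage.",
     "source": "archive.example"},
    {"text": "Unrelated article about geology.", "source": "geo.example"},
]

passages = retrieve("Ada Lovelace", corpus)
draft = generate("Ada Lovelace", passages)
sources = cite(passages)
print(draft)
print("Sources:", sources)
```

Keeping citation as an explicit third stage is what makes the drafts auditable: every generated claim can be traced back to a retrieved source.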


TRILLsson: Compact speech models

Speech recognition, the ability of a program to convert human speech into text, is important in use cases such as AI-powered assistants and closed captioning. Google released a 600 million parameter model, CAP12, designed specifically to improve emotion and tone recognition, a challenging task for current speech recognition models. CAP12 achieved state-of-the-art performance on tasks such as speech emotion recognition, speaker identification, and the detection of certain voice-based medical diagnoses. Despite its success, such a large model can be impractical for large-scale deployment or public release. To address this, Google used a technique known as knowledge distillation to produce smaller models that maintain over 90% of CAP12’s performance while being as little as 1% of its size. These models, called TRILLsson models, are small enough to run on mobile devices, making the capabilities of CAP12 far more widely deployable. Given the latest trend of ever-larger models producing state-of-the-art results, techniques such as this one are important in ensuring these models can be used in practice.

<<< Start >>>

Rather than matching CAP12 embeddings to an entire audio clip (left), matching embeddings to smaller audio chunks aids in reducing overall model size.

<<< End >>>
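The essence of knowledge distillation is that the small student is trained to match the large teacher’s softened output distribution rather than hard labels. A minimal sketch of the standard distillation loss (our own illustration with made-up logits, not Google’s training code):

```python
import math

# Minimal sketch of knowledge distillation: soften both teacher and
# student logits with a temperature, then penalize the KL divergence
# between the two distributions. Logits below are invented examples.

def softmax(logits, temperature=1.0):
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the student's softened outputs to the teacher's."""
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [3.0, 1.0, 0.2]
aligned = [2.9, 1.1, 0.1]     # student close to the teacher: small loss
misaligned = [0.2, 1.0, 3.0]  # student far from the teacher: large loss

print(distillation_loss(teacher, aligned))
print(distillation_loss(teacher, misaligned))
```

Minimizing this loss over many inputs transfers the teacher’s behavior into a model a fraction of its size, which is how the TRILLsson models retain most of CAP12’s performance.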


Unified Transformer Tracker for single and multiple object tracking

Visual object tracking is a common task in computer vision and has a wide range of applications, including surveillance and security, traffic monitoring, and behavior analysis. Object tracking is categorized into two main subproblems, single object tracking (SOT) and multiple object tracking (MOT), depending on the number of objects being tracked. Generally, different approaches are used for SOT and MOT because of differences in the way training datasets are labeled for each task. Researchers at Meta AI and the University of Technology Sydney have introduced a framework for both SOT and MOT tasks called the Unified Transformer Tracker (UTT). The UTT is designed to use the correlation between the visual features of a target and the background of a video to predict the target’s next location. This works whether the target is specified up front or detected automatically, allowing it to be flexibly applied to both SOT and MOT. The researchers verified that UTT achieves performance comparable to state-of-the-art algorithms trained separately for SOT and MOT. This framework provides greater flexibility by allowing one general model to be used for both types of tracking, simplifying the code required to perform tracking in a variety of scenarios.

<<< Start >>>

The figure included in the research paper gives a visualization of the results for the SOT, MOT, and UTT tracking modes.

<<< End >>>
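The underlying idea of correlation-based tracking can be illustrated with a toy example (this is our own sketch, not the UTT architecture: UTT learns these correlations with transformer attention over deep features, whereas here we hand-code a cosine similarity over invented feature vectors):

```python
# Toy illustration of correlation-based tracking: compare the target's
# feature vector against features at candidate locations in the next
# frame, and pick the best-matching location. Features are invented.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return sum(x * x for x in a) ** 0.5

def cosine(a, b):
    """Cosine similarity: a simple stand-in for learned correlation."""
    return dot(a, b) / (norm(a) * norm(b))

def track(target_feat, candidates):
    """Pick the candidate whose features correlate best with the target."""
    return max(candidates, key=lambda c: cosine(target_feat, c["feat"]))

target = [0.9, 0.1, 0.4]  # feature template from the previous frame
candidates = [
    {"box": (10, 12), "feat": [0.1, 0.9, 0.2]},  # background clutter
    {"box": (42, 40), "feat": [0.8, 0.2, 0.5]},  # resembles the target
    {"box": (70, 15), "feat": [0.3, 0.3, 0.3]},
]

print(track(target, candidates)["box"])  # -> (42, 40)
```

Whether there is one target template (SOT) or many (MOT), the same match-against-candidates step applies, which is why a single correlation-driven model can serve both settings.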


Accenture Federal Services is a leader in artificial intelligence for the U.S. federal government. Our Machine Learning Center of Excellence, Discovery Lab, and Advanced Research Group continually assess, develop, and adapt the world’s most innovative techniques and emerging technologies for mission-critical applications.

Shauna Revay, Ph.D.

Senior Manager – Accenture Federal Services, Machine Learning

Xena Grant

Analyst – Accenture Federal Services, Software Engineering

Shean Scott Jr.

Specialist – Accenture Federal Services, Machine Learning

Subscription Center
Subscribe to Accenture's Federal Viewpoints Blog