Baseline: 2022’s most compelling trends in machine learning

December 2, 2022

Welcome to the December 2022 edition of Baseline, Accenture Federal Services’ monthly machine learning newsletter. In Baseline, we share insights on important advances in machine learning technologies likely to impact our federal customers.

This edition is a special year-end roundup - our chance to highlight some of the most interesting and impactful advances that occurred in the machine learning space this year. These include:

Breakthroughs in text-guided generation models
Advances in audio machine learning
The impact of large language models
The rise of reinforcement learning from human feedback

These developments have pushed the boundaries of what is possible with machine learning and will continue to have far-reaching ramifications next year and beyond. We’re excited to see what’s next.

1. Text-guided generation models created impressive results

In early 2021, OpenAI announced CLIP and DALL-E, two breakthroughs in using free text to relate textual and visual concepts. CLIP demonstrated impressive zero-shot image classification, and DALL-E generated realistic images from text. OpenAI took a cautious approach, initially releasing only the inference code for CLIP and providing waitlist access to DALL-E via an API. To minimize harmful uses, OpenAI included various controls, such as blocking the generation of images of public figures or obscenities.
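
To make zero-shot classification concrete, here is a minimal sketch of the underlying idea: an image and a set of candidate text labels are mapped into one shared embedding space, and the label whose embedding is most similar to the image wins. The three-dimensional vectors, the label prompts, and the `temperature` value below are hand-made assumptions for illustration only. A real system would obtain embeddings from trained image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_vec, label_vecs, temperature=0.1):
    """Score every candidate label against the image embedding.

    No label-specific training happens here: new labels can be added
    simply by embedding new text.
    """
    labels = list(label_vecs)
    scaled = [cosine(image_vec, label_vecs[name]) / temperature for name in labels]
    return dict(zip(labels, softmax(scaled)))


# Hypothetical embeddings (assumed values, not output from a real model).
TEXT_EMBEDDINGS = {
    "a photo of a dog": [0.9, 0.1, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
IMAGE_EMBEDDING = [0.8, 0.3, 0.1]

probs = zero_shot_classify(IMAGE_EMBEDDING, TEXT_EMBEDDINGS)
best_label = max(probs, key=probs.get)
print(best_label)
```

Because the candidate labels are just text, the same function can classify against a completely different label set without retraining anything, which is what makes the approach "zero-shot."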

In early 2022, DALL-E 2 was released with even more remarkable results in text-to-image generation. This model initially also had a waitlist for API access, but in September the API became available to all users on a paid basis. Throughout the year, a stream of models, such as Parti and Imagen, was released, demonstrating the progress being made in image generation.

The release of Stable Diffusion marked a paradigm shift as it was fully open sourced.

The flood of innovation spurred by the Stable Diffusion release demonstrated the potential for rapid acceleration in technology when code is released publicly.

Among the many spin-offs that appeared within days of the Stable Diffusion release were inpainting, generation of high-quality drawings from rudimentary sketches, and text-guided animations.

The jump from text-to-image models to text-to-video models was another area of growth in generative AI this year with the release of Imagen Video and Meta AI’s Make-A-Video. We expect advances in generative AI to continue, with more realistic outputs increasing the adoption of these models in tasks such as synthetic data generation or image and video editing.

2. Advances in audio enabled more robust understanding

OpenAI’s Whisper is a model for automatic speech recognition (ASR) that works across multiple languages and diverse settings without fine-tuning. In contrast to other ASR models, it was trained on a very large corpus of weakly supervised data in multiple languages, which allows it to perform well on a diverse set of tasks, from multilingual transcription to translation into English. Following the recent trend of openness in AI, OpenAI released Whisper as a pretrained model that can be downloaded and incorporated into any application.

With the goal of developing models that democratize digital information across the world, Meta AI has been working on a Universal Speech Translator (UST). Unlike most speech translation systems, which chain together transcription, translation, and speech synthesis, UST aims to develop direct speech-to-speech translation (S2ST) models. This is important because some of the world’s languages have no standard written form, so text-based translation is not possible for them.

Another challenge in developing S2ST is the lack of training data for low-resource languages. To address this challenge, Meta AI is building datasets scraped from the internet using LASER, an open-source toolkit for automatic dataset creation. LASER searches for parallel texts (or speech) by mapping them into a single multilingual representation and comparing the similarity of their embeddings. In October of this year, Meta AI announced the first S2ST system for Hokkien, a Chinese dialect without a standard written form that is spoken by millions of people across Asia.
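
The embedding-based search described above can be sketched in a few lines. This is a simplified illustration, not LASER’s actual implementation or API: the function name `mine_parallel_pairs`, the similarity threshold, and the three-dimensional sentence vectors are all assumptions made for the example. It pairs two sentences only when each is the other’s closest match in the shared space and their similarity clears the threshold.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(vec, candidates):
    """Return the candidate sentence whose embedding is closest to vec."""
    return max(candidates, key=lambda sent: cosine(vec, candidates[sent]))


def mine_parallel_pairs(source, target, threshold=0.9):
    """Keep (source, target, score) triples that are mutual best matches.

    source and target map sentences to embeddings that are assumed to
    live in one shared multilingual space.
    """
    pairs = []
    for src_sent, src_vec in source.items():
        tgt_sent = best_match(src_vec, target)
        # Require agreement in both directions to filter out weak matches.
        if best_match(target[tgt_sent], source) != src_sent:
            continue
        score = cosine(src_vec, target[tgt_sent])
        if score >= threshold:
            pairs.append((src_sent, tgt_sent, score))
    return pairs


# Hypothetical sentence embeddings (assumed values for illustration).
ENGLISH = {
    "Good morning": [0.90, 0.10, 0.10],
    "Where is the station?": [0.10, 0.90, 0.20],
}
GERMAN = {
    "Guten Morgen": [0.88, 0.12, 0.08],
    "Wo ist der Bahnhof?": [0.12, 0.85, 0.25],
    "Ich mag Kaffee": [0.10, 0.20, 0.95],
}

pairs = mine_parallel_pairs(ENGLISH, GERMAN)
for src, tgt, score in pairs:
    print(f"{src!r} <-> {tgt!r} (similarity {score:.3f})")
```

The sentence with no counterpart ("Ich mag Kaffee") is left unpaired, which is the behavior a mining tool needs when crawling noisy web data. Mining at web scale would also need approximate nearest-neighbor search rather than the exhaustive comparison used here.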

Because the goal of UST is seamless communication between speakers, research into the imitation of human speech will become more relevant. Generative Spoken Dialogue Language Modeling (GSDLM) leverages representation learning using only speech, without labels or text, and was trained on conversation audio. Although the generated dialogue can be nonsensical at times, the conversation flow and intonation resemble human speech, with turn-taking, laughter, and other non-lexical elements of speech.

With the ML community following a trend of openness and collaboration, we expect the audio ML landscape to continue to improve. The recent initiatives focused on low-resource languages and automatic speech capabilities open up the possibility of applying audio machine learning to all languages, rather than producing high-quality results only for high-resource languages.

3. Large language models followed a trend of being more open

The development of large language models (LLMs) has vastly extended our ability to build more intelligent machine learning systems with a stronger understanding of language. In mid-2022, BLOOM, a multilingual LLM with 176 billion parameters, was released to the community. Like GPT-3, it can perform several tasks with zero- and few-shot learning, such as text generation, question answering, summarization, and programming.

One of the main differentiators between GPT-3 and BLOOM is that BLOOM was trained on data from 46 natural languages and 13 programming languages, so it has multilingual capabilities, whereas GPT-3 was trained predominantly on English text. Additionally, it’s notable that GPT-3 is still only available through a paid API, and the model weights themselves are not accessible. In contrast, BLOOM was built as part of a collaborative effort: more than 1,000 scientists and the Hugging Face team worked together to release the model. Its openness is one of the reasons it is so transformative. Developers can use this extremely powerful language model for a variety of tasks and fine-tune it for their own use cases using the Hugging Face library.

4. Reinforcement Learning from Human Feedback showed promise

LLMs have demonstrated impressive performance on tasks such as question answering, summarization, and chat. However, they still often generate responses that a human would find obviously wrong or offensive. This deficiency stems from the methods used to train them, which are usually variations of next-word prediction on large amounts of training data drawn from sources that incorporate human biases, such as the internet. These tasks are also challenging because there is no single right answer, and two people may have different preferences.
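
A toy example helps show why next-word prediction mirrors its training data. The sketch below is an assumption-laden stand-in for an LLM: it is a simple bigram counter rather than a neural network, and the three-sentence corpus is invented for illustration. The point it demonstrates still carries over: the model’s predictions are whatever its training text made most frequent.

```python
from collections import Counter, defaultdict


def train_bigram_model(corpus):
    """Count, for every word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev_word, next_word in zip(tokens, tokens[1:]):
            counts[prev_word][next_word] += 1
    return counts


def next_word_probs(model, word):
    """Estimate P(next word | word) from the observed counts."""
    followers = model[word]
    total = sum(followers.values())
    return {nxt: count / total for nxt, count in followers.items()}


# Invented corpus in which one opinion is over-represented.
CORPUS = [
    "the movie was great",
    "the movie was great",
    "the movie was boring",
]

model = train_bigram_model(CORPUS)
probs = next_word_probs(model, "was")
print(probs)
```

The model has no notion of whether the movie is actually good. It repeats the majority view of its corpus, which is the same mechanism by which skewed or offensive source text shapes the output of much larger models.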

To generate responses that humans rate more highly, OpenAI and other organizations have started to incorporate human feedback to align responses with human preferences, through a process called Reinforcement Learning from Human Feedback (RLHF). Using RLHF, machine learning researchers start with a base LLM and then use reinforcement learning to tune the responses to align with human-generated examples and feedback. Using this approach, OpenAI trained InstructGPT, which achieves better alignment with human expectations while using a model 100x smaller than GPT-3. In another example, Anthropic used RLHF to align models to be more accurate and less harmful.
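
One ingredient of RLHF is a reward model learned from human comparisons, which then supplies the signal that the reinforcement learning step optimizes. The sketch below covers only that preference-learning piece, under strong simplifying assumptions: each response is reduced to two hypothetical hand-scored features (helpfulness and toxicity), the reward is a linear function of them, and the training pairs are invented. It fits the weights with a pairwise logistic loss so that the response a human preferred receives the higher reward.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def reward(weights, features):
    """Linear reward: a weighted sum of the response features."""
    return sum(w * f for w, f in zip(weights, features))


def train_reward_model(preferences, n_features, lr=0.5, epochs=200):
    """Fit weights so preferred responses score higher than rejected ones.

    preferences is a list of (chosen, rejected) feature vectors. Each
    update takes a gradient step on -log(sigmoid(r(chosen) - r(rejected))).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for chosen, rejected in preferences:
            margin = reward(weights, chosen) - reward(weights, rejected)
            # The step shrinks as the model grows confident in the right order.
            step = lr * (1.0 - sigmoid(margin))
            for i in range(n_features):
                weights[i] += step * (chosen[i] - rejected[i])
    return weights


# Invented human comparisons. Features are [helpfulness, toxicity],
# both assumed to be scored between 0 and 1.
PREFERENCES = [
    ([0.9, 0.1], [0.4, 0.8]),
    ([0.7, 0.0], [0.8, 0.9]),
    ([0.6, 0.2], [0.2, 0.3]),
]

weights = train_reward_model(PREFERENCES, n_features=2)
print("learned weights:", weights)
print("helpful and polite:", reward(weights, [0.9, 0.0]))
print("unhelpful and toxic:", reward(weights, [0.3, 0.9]))
```

In a full RLHF pipeline, the reward model would itself be a neural network scoring raw text, and a reinforcement learning algorithm would then tune the LLM to produce responses that this model rates highly.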

RLHF could be used to create models for specific groups within an organization, so that when given the same input they generate different outputs that align better with each group’s expectations. To realize this future, easy-to-use RLHF frameworks will need to be developed in the open-source community. As we have seen with the release of Stable Diffusion, enabling the open-source community to participate in experimentation accelerates development significantly beyond what a few organizations could do alone.

Happy holidays! We look forward to sharing more updates on the ever-changing, always-fascinating world of machine learning in 2023.

Accenture Federal Services is a leader in artificial intelligence for the U.S. federal government. Our Machine Learning Center of Excellence, Discovery Lab, and Advanced Research Group continually assess, develop, and adapt the world’s most innovative techniques and emerging technologies for mission-critical applications.

WRITTEN BY

Shauna Revay, Ph.D.

Senior Manager – Accenture Federal Services, Machine Learning