
Baseline: BLOOMZ, Fine-Tuning Whisper, Galactica and ChatGPT

5-MINUTE READ

January 3, 2023

Welcome to the January 2023 edition of Baseline, Accenture Federal Services’ machine learning newsletter – we hope you had a relaxing holiday season and an enjoyable start to the new year. In Baseline, we share insights on important advances in machine learning technologies likely to impact our federal clients.

This month we cover the following topics:

  • BLOOMZ, a fine-tuned text generation model with improved zero-shot performance across tasks
  • Fine-tuning Whisper for automatic speech recognition
  • Galactica, a scientific reasoning model that was taken down shortly after its release
  • ChatGPT, a groundbreaking conversational language model

Click here to subscribe to email updates: Receive Baseline every month in your inbox and stay up to date on the latest advancements in machine learning.

BLOOMZ: Generalizing text generation tasks through multitask fine-tuning

Large language models such as GPT-3 and BLOOM have pushed the state of the art in text generation and have been shown to work across tasks and languages. Zero-shot performance for these models is usually worse than that of fine-tuned versions, so task-specific or language-specific fine-tuning is typically performed to get the best results. Researchers at Hugging Face, in collaboration with other organizations, have now released BLOOMZ, a family of fine-tuned BLOOM models with improved zero-shot text generation across tasks in both English and other languages.

To achieve these results, the researchers curated a new dataset covering 46 languages and a wide variety of tasks. They found that models trained on this new dataset generalized better to unseen tasks in a zero-shot setting. They also found that performance on tasks with non-English prompts could be improved simply by machine-translating the English prompts used during training into other languages. BLOOMZ is thus a promising step toward performing well on tasks where fine-tuning is infeasible – whether because the training data is in a low-resource language, or because training data for the task is difficult to curate.
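Because BLOOMZ checkpoints are published on the Hugging Face Hub, they can be prompted zero-shot through the standard transformers API. The snippet below is a minimal sketch, assuming the smallest checkpoint (bigscience/bloomz-560m) and an arbitrary example prompt; larger BLOOMZ variants follow the same interface.

```python
# Minimal sketch: zero-shot prompting a BLOOMZ checkpoint with transformers.
# The checkpoint and prompt are illustrative, not from the newsletter itself.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigscience/bloomz-560m"  # smallest published BLOOMZ variant

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

# BLOOMZ is fine-tuned to follow natural-language task descriptions,
# so no task-specific fine-tuning is needed for a prompt like this.
inputs = tokenizer("Translate to English: Je t'aime.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```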

Fine-tuning Whisper for improved automatic speech recognition

When Whisper was released in September 2022, it stood out among automatic speech recognition (ASR) models because it was trained on a large quantity of labeled data, unlike many previous large audio models, which were trained on unlabeled data. This allowed Whisper to perform well as an ASR model across a set of languages without any fine-tuning.

Hugging Face has now released a guide for fine-tuning Whisper to improve performance on specific datasets or languages. Because Whisper was pretrained on such a large amount of labeled data, it can leverage multilingual knowledge, making it easier to fine-tune on low-resource languages. This gives users the flexibility to modify and improve Whisper for their own use cases, such as new languages or tasks with specialized jargon. Additionally, Hugging Face is sponsoring a fine-tuning event to help curate a collection of open-source, fine-tuned Whisper models in low-resource languages.
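The fine-tuning guide builds on Whisper's integration with the transformers library. As a minimal sketch of that integration, the snippet below transcribes a sample utterance with a pretrained checkpoint; the checkpoint size and the dummy dataset are illustrative choices, and a real fine-tuning run would start from this same processor/model pair with a labeled dataset for the target language or domain.

```python
# Minimal sketch: transcribing audio with a pretrained Whisper checkpoint
# via transformers. Fine-tuning starts from this same processor/model pair.
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from datasets import load_dataset

checkpoint = "openai/whisper-small"  # one of several released model sizes
processor = WhisperProcessor.from_pretrained(checkpoint)
model = WhisperForConditionalGeneration.from_pretrained(checkpoint)

# A tiny public test split, used here only for illustration.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
sample = ds[0]["audio"]

# Convert raw audio to log-Mel input features, then decode the prediction.
input_features = processor(
    sample["array"], sampling_rate=sample["sampling_rate"], return_tensors="pt"
).input_features
predicted_ids = model.generate(input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```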

Galactica: A failed experiment in scientific reasoning

In November, Meta AI released a demo of their new large language model, Galactica. Galactica was trained on scientific articles, websites, textbooks, lecture notes, and encyclopedias in order to “store, combine, and reason about scientific knowledge”. In practice, users testing out the demo found that it could generate realistic-sounding nonsense with misattributed, and sometimes altogether made-up, references and citations.

Due to concerns that the model could be used to spread misinformation, Meta AI took down the demo after three days.

In this case, feedback from the ML community proved to be a useful mechanism for screening the model for potential biases and avenues for misuse.

This case raises an interesting question: what are the implications of the push to open-source models, and what dangers arise if a model is found to be unethical or biased only after it has been released to the public?

ChatGPT: A language model for dialogue

The release of Stable Diffusion set off a wave of innovation and new technologies. Now, just months later, ChatGPT is once again pushing the field of machine learning forward, this time in natural language processing. OpenAI has released a demo of ChatGPT, a language model optimized for dialogue.

The model works similarly to OpenAI's previously released InstructGPT; both use reinforcement learning from human feedback (RLHF) to steer the model toward responses that humans judge preferable. One of the main differences is that InstructGPT is set up to answer a prompt with a single detailed response, whereas ChatGPT responds conversationally, with the potential for ongoing dialogue. Following a recent trend of crowdsourcing feedback from the greater ML community, OpenAI has released ChatGPT as a free research preview to gather feedback on the model's strengths and weaknesses.
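OpenAI has not released its training code, but the open-source trl library implements the same RLHF recipe. The sketch below shows a single PPO update step, assuming a small GPT-2 policy and a hard-coded stand-in reward in place of a learned human-preference reward model; exact trl signatures vary by library version.

```python
# Minimal sketch of one RLHF/PPO update with the open-source trl library.
# This illustrates the recipe, not OpenAI's actual implementation; the
# constant reward stands in for a learned human-preference reward model.
import torch
from transformers import AutoTokenizer
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead, create_reference_model
from trl.core import respond_to_batch

# Policy model with a value head, plus a frozen reference copy for the KL penalty.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
model_ref = create_reference_model(model)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1), model, model_ref, tokenizer)

# Sample a response to a prompt, score it, and take one PPO step.
query_tensor = tokenizer.encode("How do I bake bread?", return_tensors="pt")
response_tensor = respond_to_batch(model, query_tensor)
reward = [torch.tensor(1.0)]  # stand-in: a real setup scores with a reward model trained on human rankings
stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```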

In just the first couple of days after its release, users demonstrated impressive use cases. These include detecting code vulnerabilities, simulating Linux environments, generating code, writing poetry and other creative content, and acting as a search engine, to name a few.

ChatGPT is extremely impressive, but – like all generative NLP models – it has been shown to hallucinate incorrect information at times and present it as true, which could lead to problems when this type of technology is deployed for critical use cases. Overall, though, ChatGPT is a disruptive technology in NLP, and we expect many companies and practitioners to rethink how they can adopt this type of technology for their own use cases once it becomes available beyond the demo phase.

Accenture Federal Services is a leader in artificial intelligence for the U.S. federal government. Our Machine Learning Center of Excellence, Applied Intelligence Discovery Lab, and Advanced Research Group continually assess, develop, and adapt the world’s most innovative techniques and emerging technologies for mission-critical applications.

WRITTEN BY

Shauna Revay, Ph.D.

Senior Manager – Accenture Federal Services, Machine Learning