Welcome to the June 2022 Baseline, Accenture Federal Services’ machine learning newsletter, where we share thoughts on important advances in machine learning technologies likely to impact our customers. This month we cover the following topics:
- A single visual language model for multiple tasks
- A text generation model that improves performance on multiple languages
- A model to assess the viability of molecular drug designs
- Automated document image analysis
Flamingo: A single visual language model for multiple tasks
Deep learning models specializing in visual tasks generally require thousands of annotated images to learn a new task, which makes every new task resource-intensive. This challenge motivated DeepMind to create Flamingo, a general-purpose family of visual language models that sets a new standard in few-shot learning across a wide array of tasks. Few-shot learning is a machine learning method that uses prior knowledge to generalize from just a few examples. From visual question answering for the blind to hateful content classification, Flamingo performs better than previous few-shot learning approaches with as few as four examples per task. This reduces the amount of labeled data needed to adapt a model to a new task and paves the way for more flexible visual models in the future.
Given image-caption pairs of a chinchilla and a Shiba as examples, Flamingo accurately describes a new, previously unseen image.
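The few-shot setup described above can be pictured as a prompt that interleaves a handful of example images and captions with a new query image. The following is a minimal sketch of that idea only, not DeepMind's implementation: the `Example` class, the `build_few_shot_prompt` function, the `<image>` placeholder string, and the file names are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical marker for where an image sits in the text stream.
IMAGE_TOKEN = "<image>"


@dataclass
class Example:
    image_path: str  # illustrative file name, not a real dataset path
    text: str        # caption or answer paired with the image


def build_few_shot_prompt(
    support: List[Example], query_image: str, query_prefix: str = ""
) -> Tuple[List[str], str]:
    """Interleave support examples with the query image.

    Returns the ordered list of images and the text prompt in which each
    image is represented by IMAGE_TOKEN. The model would be asked to
    continue the text after the final placeholder.
    """
    images = [ex.image_path for ex in support] + [query_image]
    parts = [f"{IMAGE_TOKEN}{ex.text}" for ex in support]
    parts.append(f"{IMAGE_TOKEN}{query_prefix}")
    return images, " ".join(parts)


support = [
    Example("chinchilla.jpg", "This is a chinchilla."),
    Example("shiba.jpg", "This is a Shiba."),
]
images, prompt = build_few_shot_prompt(support, "new_animal.jpg", "This is")
print(prompt)
```

No model weights change in this setup: the examples travel with the request, which is why a few labeled pairs can stand in for a large annotated training set.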
An improved multilingual GPT
Generative language models such as GPT have proven effective for tasks involving text generation. However, because their training data is predominantly English, they perform better in English than in other languages. AI Forever recently released mGPT, a family of multilingual language models based on OpenAI's GPT. Built from the GPT-2 source code combined with the sparse attention mechanism used in GPT-3, the resulting models generate sequences from input prompts in 60 languages. These models match the performance of current state-of-the-art generation models while also covering low-resource Eastern European languages that most language models do not support. With mGPT, AI Forever takes a step toward expanding NLP capabilities to underrepresented languages, leading to high-quality language models that are diverse enough for wider adoption.
AI Forever’s mGPT models can generate text across a wide array of languages.
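To give a rough sense of what sparse attention means, the sketch below compares a dense causal attention mask, where each token attends to every earlier token, with a locally banded one, where each token attends only to a fixed window of recent tokens. This is an illustration under assumptions, not mGPT's actual configuration: the function names and the window size are hypothetical.

```python
from typing import List


def dense_causal_mask(seq_len: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def banded_causal_mask(seq_len: int, window: int) -> List[List[bool]]:
    """Causal mask restricted to the `window` most recent positions.

    Position i attends to j only when 0 <= i - j < window. The window size
    is an illustrative parameter, not a value taken from mGPT.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(mask: List[List[bool]]) -> int:
    """Count how many (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


dense = dense_causal_mask(4)
banded = banded_causal_mask(4, window=2)
print(attended_pairs(dense), attended_pairs(banded))
```

The banded mask allows fewer query-key pairs than the dense one, and the gap widens as sequences get longer, which is the source of the compute savings.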
Feasible molecule synthesis
AI tools are playing an increasingly large role in scientific discovery in biology and chemistry. For example, finding new drugs to treat diseases starts with identifying molecules that might bind to a cell receptor to unlock some therapeutic effect. Machine learning accelerates this search, but while models are good at proposing theoretical molecules for a target, those molecules may be impossible to create in the lab due to physical constraints.
SynNet by Gao et al. combines the tasks of designing an effective molecule and evaluating whether it is feasible to manufacture. The result is a single reinforcement learning framework of states (molecules) and actions (adding new molecules or running a known chemical reaction). Deep reinforcement learning approaches such as DeepMind’s DQN have been applied to molecular design before (e.g., MolDQN), but SynNet takes a more application-focused approach to the problem. The approach includes a more relevant set of initial conditions (commercially available reagents), simplified state representations (presence or absence of whole chemical structures instead of individual chemical bonds), actions (known reactions instead of atom/bond addition), and value functions. With this tailored approach, SynNet appears to perform better at finding the most promising synthesizable molecule. By finding molecules that are simpler to synthesize yet about as effective as those proposed by other top models, it is a promising step toward drug design accelerated by deep learning.
SynNet approach to molecular design.
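The state-and-action framing described above can be sketched with a toy example. This is not SynNet's code: the `State` class, the `add_molecule` and `run_reaction` functions, and the reagent and reaction names are hypothetical placeholders, and real systems work with chemical structures and reaction templates rather than strings.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# A toy reaction: (reactants that must be present, product that results).
Reaction = Tuple[FrozenSet[str], str]


@dataclass(frozen=True)
class State:
    """The molecules currently on hand, tracked as whole structures."""
    molecules: FrozenSet[str] = frozenset()


def add_molecule(state: State, reagent: str) -> State:
    """Action 1: add a purchasable reagent to the current state."""
    return State(state.molecules | {reagent})


def run_reaction(state: State, reaction: Reaction) -> State:
    """Action 2: apply a known reaction if all of its reactants are present."""
    reactants, product = reaction
    if not reactants <= state.molecules:
        raise ValueError("reactants not available in the current state")
    return State((state.molecules - reactants) | {product})


# Hypothetical reaction that combines two placeholder reagents.
coupling: Reaction = (frozenset({"reagent_A", "reagent_B"}), "product_AB")

state = State()
state = add_molecule(state, "reagent_A")
state = add_molecule(state, "reagent_B")
state = run_reaction(state, coupling)
print(sorted(state.molecules))
```

Because every step is either a purchasable reagent or a known reaction, any molecule reached this way comes with a recipe for making it, which is the feasibility guarantee the section describes.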
DiT: The next big step in document AI
Document image analysis is a set of computer vision tasks in which the objective is to identify and categorize the regions of a scanned document. Microsoft has released Document Image Transformers (DiT), a new model for a variety of document image analysis tasks. DiT leverages text and vision transformer architectures to create a new self-supervised pretrained model that achieves state-of-the-art performance in tasks such as document image classification, document layout analysis, table detection, and text detection for OCR. The model can also be fine-tuned for different use cases, which simplifies integrating it into existing pipelines. This has the potential to simplify and improve the processing of large image and text document corpora with less manual supervision.
Example output from DiT when applied to a scanned document.
Accenture Federal Services is a leader in artificial intelligence for the U.S. federal government. Our Machine Learning Center of Excellence, Discovery Lab, and Advanced Research Group continually assess, develop, and adapt the world’s most innovative techniques and emerging technologies for mission-critical applications.