Welcome to the August 2022 edition of BASELINE, Accenture Federal Services’ monthly newsletter where we share thoughts on important advances in machine learning technologies likely to impact our customers. This month’s issue was produced by our Summer Analysts (interns) serving as guest editors, providing insight into what incoming data scientists find most relevant. They selected:

  • Increasing model interpretability with OmniXAI
  • Faster high-resolution text-to-image generation with CogView2
  • A new, scaled-up text-to-image model called Parti
  • Multi-language machine translation using images

This edition of BASELINE is just one example of how our Summer Analysts have worked side-by-side with our professionals to explore how the cutting edge of AI, ML, and data management can be applied to the federal mission. And our internship program is just one way in which we work to foster the next generation of AI talent. Through programs and initiatives such as our partnership with AI4ALL, we work to expand and democratize access to AI education and training and to promote compelling career paths.

“Never tell a young person that anything cannot be done.” – G. M. Trevelyan

Click here to subscribe to email updates: Receive Baseline every month in your inbox and stay up-to-date on the latest advancements in machine learning.


OmniXAI: Increasing interpretability of machine learning models

Understanding why machine learning models make decisions can be difficult and time-consuming, but this understanding is often necessary in order to appropriately interpret results and assess models for quality and bias. To help make interpretability easier, researchers at Salesforce have created OmniXAI, an open-source Python library that offers explanation methods supporting various data types and models created using popular frameworks like Scikit-learn, PyTorch, and TensorFlow. The explainability options integrated in OmniXAI include methods like counterfactual analysis and SHAP for tabular data and Grad-CAM and LIME for image data. These methods are accessible with just a few lines of code, and explanations are displayed in a GUI, simplifying the user experience. The ease of building OmniXAI dashboards, combined with its applicability across many domains of machine learning, makes it a great tool for visualization and interpretation.
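To make the counterfactual idea concrete, here is a toy sketch in plain Python (not OmniXAI’s actual API): a counterfactual explanation answers “what is the smallest change to the input that would flip the model’s prediction?” The model and features below are entirely hypothetical.

```python
# Toy counterfactual explanation: find the smallest change to one feature
# that flips a simple model's decision. This illustrates the idea behind
# the counterfactual method OmniXAI exposes; the model is hypothetical.

def approve_loan(income, debt):
    """Hypothetical model: approve if income minus debt clears a threshold."""
    return income - debt >= 50_000

def counterfactual_income(income, debt, step=1_000):
    """Smallest income increase that turns a denial into an approval."""
    needed = income
    while not approve_loan(needed, debt):
        needed += step
    return needed - income

applicant = {"income": 60_000, "debt": 25_000}
print(approve_loan(**applicant))           # False: the applicant is denied
print(counterfactual_income(**applicant))  # 15000: the income gap to approval
```

Libraries like OmniXAI automate this kind of search over all features at once and present the result visually, which is what makes the dashboard useful for non-specialists.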

<<< Start >>>

Process flow showing: data loader, data exploration, feature engineering, model development, model evaluation, model deployment

This figure from the Salesforce GitHub repository outlines the capabilities of OmniXAI.

<<< End >>>


CogView2: Faster high-resolution text-to-image generation

Generating realistic synthetic images from text is a computationally intensive process, and the more complex and higher resolution the image is, the slower the generation process. CogView2 is a text-to-image generation model that is more computationally efficient than other state-of-the-art models like DALL-E 2, while maintaining comparable performance. CogView2 is built using a hierarchical design, where the system initially generates a batch of low-resolution images which are then transformed into high-resolution images via a super-resolution module. The hierarchical design enables the model to focus on local coherence at higher resolutions, making use of local rather than global attention and saving significant time during generation. By presenting a faster, more efficient generation method, this work may improve the adaptability and scalability of text-to-image generation models.
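A back-of-the-envelope calculation shows why local attention helps: global self-attention costs grow with the square of the token count, while local attention only compares each token to a small neighborhood. The grid sizes and window below are illustrative, not CogView2’s actual configuration.

```python
# Rough attention-cost comparison for an image represented as a token grid.
# Global attention compares every token pair; local attention restricts each
# token to a small window. Sizes here are made up for illustration.

def global_attention_pairs(side):
    n = side * side          # tokens in a side x side grid
    return n * n             # every token attends to every other token

def local_attention_pairs(side, window):
    n = side * side
    return n * window * window   # each token attends to a window x window patch

low_res, high_res, window = 20, 60, 9
print(global_attention_pairs(low_res))           # 160000 pairs at low resolution
print(global_attention_pairs(high_res))          # 12960000 pairs if global at high res
print(local_attention_pairs(high_res, window))   # 291600 pairs with local attention
```

The low-resolution stage stays cheap even with global attention, while the super-resolution stage avoids the quadratic blow-up by attending locally, which is the source of the speedup the paper reports.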

<<< Start >>>

Images showing earth in the eye, a sketch of a church, an oil paintings of an Akitia dog in front of Mount Fuji, and a tiger with angel wings

Example phrases and their resulting images generated from the CogView2 model.

<<< End >>>


Parti: A Pathways Autoregressive Text-to-Image model

Recent work on text-to-image generation has yielded impressive results, with research tending to fall into one of two camps. Models such as DALL-E and CogView2 are based on autoregressive transformers, which treat images as a sequence of tokens to predict. DALL-E 2, GLIDE, and Imagen instead use diffusion models to generate images from scratch. Amid the ongoing debate over which model type will yield higher performance, researchers from Google recently released Parti. As an autoregressive model, Parti frames image generation as a sequence-to-sequence problem (like machine translation) and uses the outputs of a large language model to predict image components. This framing allows Parti to benefit from advances in language models, such as gains from scaling up data and model size. The researchers demonstrated that scaling Parti to 20B parameters led to the highest quality images, comparable to the state-of-the-art performance achieved by the leading diffusion-based approaches. This work shows that autoregressive models are still competitive in the text-to-image generation space, and keeps the possibilities open for further development.
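The autoregressive framing can be sketched in miniature: an image becomes a flat sequence of discrete tokens, and the model predicts them one at a time given the text prompt. The “model” below is a hypothetical lookup table standing in for Parti’s transformer, and for simplicity it conditions only on the previous token; a real system would condition on the full history and then detokenize the sequence back into pixels.

```python
# Toy autoregressive image-token decoder. A hypothetical learned table maps
# (text prompt, previous image token) -> most likely next image token,
# standing in for a transformer. Real models condition on the whole prefix.

NEXT_TOKEN = {
    ("cat", "<start>"): "sky_patch",
    ("cat", "sky_patch"): "ear_patch",
    ("cat", "ear_patch"): "eye_patch",
    ("cat", "eye_patch"): "<end>",
}

def generate(text, max_len=10):
    """Greedily decode image tokens until <end> or the length limit."""
    tokens, last = [], "<start>"
    for _ in range(max_len):
        nxt = NEXT_TOKEN.get((text, last), "<end>")
        if nxt == "<end>":
            break
        tokens.append(nxt)
        last = nxt
    return tokens

print(generate("cat"))   # ['sky_patch', 'ear_patch', 'eye_patch']
```

Because generation is just next-token prediction, any technique that improves language models, such as better scaling recipes, transfers directly, which is the advantage the Parti authors emphasize.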

<<< Start >>>

An image of a teddy bear in a helmet and cape riding a motorcycle in front of the Golden Gate Bridge

An example image created using Parti from the caption below it.

<<< End >>>


Globetrotter: Connecting languages by connecting images

In recent years, deep learning models have achieved strong performance on the task of machine translation – automatically translating words or sentences from one language into another. Current state-of-the-art approaches often use supervised learning, which requires parallel corpora (sets of equivalent texts in each language). This data can be difficult or wholly infeasible to collect for low-resource languages. Unsupervised approaches have achieved strong performance but are difficult to scale to cover multiple languages. This is where Globetrotter comes in. Globetrotter uses image data to bridge the gap between languages. The key insight is that the images associated with a piece of text remain largely the same regardless of the language the text is written in.

<<< Start >>>

Translations of a bicycle with images – India, Japan, France, and England

The visual representation of a bicycle stays consistent across languages, allowing a model to infer the meaning of “bicycle” in other languages.

<<< End >>>

Globetrotter trains on image-captioning datasets spanning more than 50 languages. It learns visual similarity between images, and when two images are highly similar, it treats their captions (written in different languages) as similar as well. Notably, Globetrotter requires no image data at inference time. While Globetrotter does not perform as well as fully supervised models, it does not require parallel corpora, and it greatly outperforms comparable prior unsupervised methods. Because it does not require parallel corpora, Globetrotter is a promising method for machine translation in low-resource languages.
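The training signal can be sketched as: the similarity score a model assigns to a pair of captions should match how visually similar their paired images are. The embedding vectors below are made-up 2-D examples; a real model learns high-dimensional embeddings from data.

```python
# Toy version of the Globetrotter training signal: caption pairs whose images
# look alike should be scored as similar, even across languages. Image
# embeddings here are hypothetical 2-D vectors.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.hypot(*u) * math.hypot(*v)
    return dot / norms

# Hypothetical image embeddings: two bicycle photos, one dog photo.
images = {"bike_fr": (0.9, 0.1), "bike_ja": (0.85, 0.2), "dog_en": (0.1, 0.95)}

# Target similarity for each caption pair = similarity of the paired images.
pairs = [("vélo", "bike_fr", "自転車", "bike_ja"),   # two bicycle captions
         ("vélo", "bike_fr", "dog", "dog_en")]       # unrelated captions
for cap_a, img_a, cap_b, img_b in pairs:
    target = cosine(images[img_a], images[img_b])
    print(f"{cap_a} ~ {cap_b}: caption-similarity target {target:.2f}")
```

The French and Japanese bicycle captions get a high similarity target because their images nearly match, while the bicycle/dog pair gets a low one, and that supervision requires no parallel text at all.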


Accenture Federal Services is a leader in artificial intelligence for the U.S. federal government. Our Machine Learning Center of Excellence, Discovery Lab, and Advanced Research Group continually assess, develop, and adapt the world’s most innovative techniques and emerging technologies for mission-critical applications.

Benjamin Gilby

Summer Analyst – Accenture Federal Services, Software Engineering

Madden Moore

Summer Analyst – Accenture Federal Services, Software Engineering

Rachael Cooper

Summer Analyst – Accenture Federal Services, Software Engineering

Subscription Center
Subscribe to Accenture's Federal Viewpoints Blog