Graphs offer a universal language to represent complex systems. Whether it’s a social network, a bioinformatics dataset, or retail purchase data, modelling knowledge as a graph – a network of interconnected facts – lets organizations capture patterns that would otherwise be overlooked. Think about retail: customer-product networks help companies understand purchasing patterns. Or look at healthcare, where modelling interactions between proteins as a graph supports drug discovery.
As companies gather ever more data to populate these graphs, a growing number of organizations are in a position to mine them for discoveries. Uncovering the connections within this data, though, requires considerable investment in AI research and engineering. It calls for machine learning models designed specifically for graphs, and unfortunately the machine learning software ecosystem has not kept pace with the rapidly growing research in this domain. The result? Hidden knowledge is buried in underused graphs stored on company servers, waiting to be unearthed.
At Accenture Labs, we’ve developed a tool that fills this gap in the current machine learning toolbox: AmpliGraph is the first open source library to democratize graph representation learning, enabling brand-new knowledge discovery from existing graphs. This capability, previously limited to research labs, is now packaged in a way designed to lower entry barriers and bring machine learning on graphs to the mainstream.
AmpliGraph consists of a suite of recent neural machine learning models known as knowledge graph embeddings. These models encode concepts and relations of a graph into low-dimensional vectors, also known as "embeddings," and can be used to discover hidden knowledge.
AmpliGraph enables many machine learning tasks, beginning with predicting missing relationships between concepts. This type of link prediction can be used, for example, to discover drug side-effects from existing biomedical data. Link prediction also greatly improves the curation of graphs that are automatically generated from text, which are notoriously noisy and incomplete. Better curation in turn raises the quality of downstream applications that rely on those graphs, such as question-answering systems.
AmpliGraph’s machine learning models generate knowledge graph embeddings (vector representations of concepts in a metric space), which can then be combined with model-specific scoring functions to predict previously unseen links.
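To make the idea of a "model-specific function" concrete, here is a minimal sketch of link prediction with the scoring function of TransE, one well-known knowledge graph embedding model. It uses plain NumPy rather than AmpliGraph's own API. The entity and relation names are made up for illustration, and the embeddings are random stand-ins for vectors that a real model would learn from data. One fact is planted by hand so the ranking has something to find.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary (hypothetical names, for illustration only).
entities = ["aspirin", "ibuprofen", "headache", "nausea"]
relations = ["treats", "causes"]

k = 8  # embedding dimension (arbitrary choice for this sketch)

# In practice these vectors are learned; random vectors stand in for them here.
ent_emb = {e: rng.normal(size=k) for e in entities}
rel_emb = {r: rng.normal(size=k) for r in relations}

# Plant one fact by hand: place "headache" exactly at aspirin + treats,
# which is where TransE expects the object of a true triple to sit.
ent_emb["headache"] = ent_emb["aspirin"] + rel_emb["treats"]


def transe_score(s, p, o):
    """TransE score: negative L2 distance between (subject + relation) and object.

    Higher (closer to zero) means the triple is considered more plausible.
    """
    return -np.linalg.norm(ent_emb[s] + rel_emb[p] - ent_emb[o])


def rank_objects(s, p):
    """Rank every entity as a candidate object for (s, p, ?), best first."""
    return sorted(entities, key=lambda o: transe_score(s, p, o), reverse=True)


print(rank_objects("aspirin", "treats"))
```

The planted object comes out on top because its distance to "aspirin + treats" is zero. Other embedding models keep the same ranking recipe and swap in a different scoring function.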
AmpliGraph can also be used to predict the type of a specific node (“Is this movie good or bad?”), or to decide whether two nodes are duplicates of the same concept (“Do these two nodes represent the same actor?”). Knowledge graph embeddings shine in community detection, too, as in “How do we group similar users of a social network by examining their interactions and interests?” The range of possible applications is wide. Here at Accenture Labs we use AmpliGraph to predict mass transportation delays, generate novel tapas recipes (fudge and pumpkin canapés, anyone?), and customize employee upskilling programs.
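As a rough illustration of the duplicate-detection question above, one simple approach is to compare node embeddings with cosine similarity and flag pairs above a threshold. This is an assumption made for the sake of the example, not a description of how AmpliGraph does it. The node names, vectors, and the 0.95 threshold are all hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def likely_duplicates(emb, threshold=0.95):
    """Return pairs of node names whose embeddings are at least `threshold` similar."""
    names = sorted(emb)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine_similarity(emb[a], emb[b]) >= threshold:
                pairs.append((a, b))
    return pairs


# Hypothetical, hand-written embeddings; a trained model would supply these.
emb = {
    "actor:jane_doe": np.array([0.90, 0.10, 0.30]),
    "actor:j_doe": np.array([0.88, 0.12, 0.31]),
    "actor:sam_roe": np.array([-0.20, 0.80, 0.50]),
}

print(likely_duplicates(emb))
```

With real embeddings the threshold would need tuning on labelled examples, and flagged pairs are best treated as candidates for review rather than confirmed duplicates.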
We designed AmpliGraph and its documentation to make knowledge graph embeddings accessible to a large audience, fostering a community of practitioners who can benefit from open, easy-to-use APIs for machine learning on graphs. AmpliGraph aims to become a reference platform, where models proposed by the machine learning research community can be used, and extended, by data scientists, engineers, and machine learning practitioners. We’re eager to share our innovations with the world.
Visit the GitHub repository and the documentation page to use AmpliGraph in your ongoing projects or contribute to the codebase!
AmpliGraph was jointly developed at Accenture Labs at the Dock by Luca Costabello, Sumit Pai, Chan Le Van, Rory McGrath, and Nicholas McCarthy. For more information about AmpliGraph, contact Luca Costabello.