Blog

Responsible AI starts with data – and CDOs must lead

5-minute read

March 20, 2023

Much of the discussion around responsible AI – the practice of designing, developing, and deploying AI that is fair, transparent, and trustworthy – has in the past few years focused primarily on algorithms. How are automated systems creating or perpetuating bias, and how can we prevent that?

However, responsible AI discussions that focus primarily on algorithms are like dropping your keys somewhere on a street at night – but only searching for them under the streetlamps. The light may be better there, but the keys may be outside the well-lit areas. And that’s where data comes in.

Algorithms don’t create bias out of thin air. Bias can enter the process through the data that is used to train the algorithm.

As the federal government continues advancing responsible AI through efforts such as NIST’s Artificial Intelligence Risk Management Framework and the Blueprint for an AI Bill of Rights, CDOs can and should expand their involvement.

Three essentials of responsible AI for CDOs

But how should CDOs be managing data and data processes with responsible AI in mind?

Current federal CDO efforts focus on centralized governance and the role of data stewards in enforcing standards. While these top-down approaches provide necessary and important guidance, managing data for responsible AI requires an equal, if not greater, focus on bottom-up processes. How are end-users collecting and inputting data, and how can we ensure everyone involved – from beginning to end – is being responsible by design?

CDOs can combat bias and work toward more responsible AI by focusing on the consistency, correctness, and completeness of their data from the start (a minimal sketch of such checks appears after the list):

  1. Consistency: Is the data labeled in a reliably consistent manner?

  2. Correctness: Is the data accurate in relation to real-world, verifiable sources?

  3. Completeness: What amount of data is needed to meet the mission needs while delivering the greatest ROI?
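As a rough illustration, the sketch below shows what lightweight checks along these three dimensions might look like in Python. The records, label vocabulary, and minimum-count threshold are all hypothetical, not a prescribed standard.

```python
from collections import Counter

# Hypothetical annotation records: each has an item id, the label an
# annotator assigned, and a verified ground-truth label where available.
records = [
    {"item": 1, "label": "truck", "verified": "truck"},
    {"item": 2, "label": "Truck", "verified": "truck"},  # inconsistent casing
    {"item": 3, "label": "car",   "verified": "truck"},  # incorrect label
    {"item": 4, "label": "car",   "verified": "car"},
]

# Consistency: are labels drawn from one canonical vocabulary?
vocabulary = {"truck", "car"}
inconsistent = [r for r in records if r["label"] not in vocabulary]

# Correctness: how often does the label match a verifiable source?
verified = [r for r in records if r["verified"] is not None]
correct = sum(r["label"] == r["verified"] for r in verified)
accuracy = correct / len(verified)

# Completeness: do we have enough examples per class for the mission need?
MIN_PER_CLASS = 2  # hypothetical requirement
per_class = Counter(r["verified"] for r in verified)
underfilled = {c: n for c, n in per_class.items() if n < MIN_PER_CLASS}

print(f"Inconsistent labels: {len(inconsistent)}")
print(f"Label accuracy vs. verified source: {accuracy:.0%}")
print(f"Classes below the target sample size: {underfilled}")
```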

Too often, CDOs are directed to focus on the enterprise and big data – with the premise that analyzing larger volumes of data will create better AI. 

Volume affects recall (for example, did the AI find all the trucks among images of vehicles?), and more data over time can improve recall. But consistent and correct data directly improves precision (for example, when the AI says it’s a truck – is it actually a truck?). And if you don’t get consistency and correctness right early on, you can allow bias to seep in that is not easily removed.
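To make the distinction concrete, here is a minimal Python sketch that computes precision and recall for the truck example above; the labels are invented for illustration.

```python
# Illustrative truck-detection results: ground truth vs. model prediction.
truth       = ["truck", "car", "truck", "truck", "car", "truck"]
predictions = ["truck", "truck", "car", "truck", "car", "truck"]

true_pos  = sum(t == "truck" and p == "truck" for t, p in zip(truth, predictions))
false_pos = sum(t != "truck" and p == "truck" for t, p in zip(truth, predictions))
false_neg = sum(t == "truck" and p != "truck" for t, p in zip(truth, predictions))

# Precision: when the AI says "truck," is it actually a truck?
precision = true_pos / (true_pos + false_pos)
# Recall: did the AI find all the trucks?
recall = true_pos / (true_pos + false_neg)

print(f"Precision: {precision:.0%}, Recall: {recall:.0%}")
```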

Consistency and correctness are most heavily influenced by the users inputting data in the first place. Therefore, CDOs must ask: are data standards designed for users at the edge? Are these standards too lofty or unrealistic in the context of those users' day-to-day experiences? No amount of stewardship or governance will enable consistently correct data if the processes and tools don't align with the actual needs and experiences of those inputting the data.

Completeness is equally important because the cost in time, labor, maintenance, and compute grows steeply as data volumes increase. It's more important to home in on what you need to meet your requirements, and to ensure you get that data right, than to stretch yourself thin with all the data you could possibly want.

Therefore, it may be more effective to use a representative sample of data or a carefully stratified smaller data set. Weigh unrealistic volume goals against realistic requirements to make cost-effective decisions without compromising the mission.
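As one hedged sketch of this idea, the snippet below draws a stratified sample that preserves each class's share of the full dataset; the dataset, class labels, and 10% fraction are assumptions for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(items, key, fraction, seed=0):
    """Draw a random sample that preserves each class's share of the data."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for item in items:
        by_class[key(item)].append(item)
    sample = []
    for members in by_class.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical vehicle images labeled by class; sample 10% of each class.
dataset = [{"id": i, "label": "truck" if i % 5 == 0 else "car"} for i in range(1000)]
subset = stratified_sample(dataset, key=lambda d: d["label"], fraction=0.10)
print(len(subset))  # ~100 items, trucks and cars in their original ratio
```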

Leading change

Federal CDOs can take key actions now to support responsible AI and enable more consistent, correct, and complete data. These include:

Driving data quality initiatives. Annotations, which historically involve a “best-guess” approach, are ripe for bias. However, a quality effort employing inter-annotator agreement can help ensure the consistency, correctness, and completeness of the data going into an algorithm by flagging cases where annotators label the same object in different ways. With that information, agencies can retrain annotators to control data quality and create higher-precision models.
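Inter-annotator agreement can be quantified several ways; one common measure is Cohen's kappa, which corrects raw agreement for chance. The sketch below is a minimal illustration, with invented annotator labels.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both annotators labeled at random at their own rates.
    expected = sum(
        (counts_a[c] / n) * (counts_b[c] / n)
        for c in counts_a.keys() | counts_b.keys()
    )
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators on the same six images.
annotator_1 = ["truck", "truck", "car", "car", "truck", "car"]
annotator_2 = ["truck", "car",   "car", "car", "truck", "truck"]
print(f"Kappa: {cohens_kappa(annotator_1, annotator_2):.2f}")
```

A kappa near 1 signals highly consistent labeling; low or negative values flag label definitions, items, or annotators that need attention.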

Advocating for and determining a human baseline measurement. The goal is responsible AI, but responsible compared to what? We shouldn't judge AI against a standard of perfection, but against the inefficient, biased, manual processes of today.

In other words, CDOs must work across the agency to determine the baseline of where their data is today, and how they can improve upon that baseline. Only then can they have a realistic roadmap for truly determining what is responsible AI, and how to attain it.
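As a hedged sketch of what such baselining might look like in practice, the snippet below compares a model's error rate against the legacy manual process on the same adjudicated sample; every figure here is invented for illustration.

```python
# Hypothetical audit: the legacy manual process and a model, each scored
# against adjudicated ground truth on the same 200-case sample.
manual_errors = 34   # cases the manual process got wrong (invented figure)
model_errors  = 21   # cases the model got wrong (invented figure)
sample_size   = 200

human_baseline = manual_errors / sample_size
model_rate     = model_errors / sample_size

print(f"Human baseline error rate: {human_baseline:.1%}")
print(f"Model error rate:          {model_rate:.1%}")
print(f"Relative improvement:      {(human_baseline - model_rate) / human_baseline:.0%}")
```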

Data quality initiatives and human baseline measurements should be layered into CDOs’ broader ethical data governance efforts.

CDOs can lead responsible AI progress

CDOs have an opportunity to be a catalyst for AI change and to shape responsible, ethical policies through not only top-down governance and data standards, but also foundational approaches for optimizing the data that algorithms depend on.

When we recognize just how much human-created data influences responsible AI, we can also better recognize that AI as a technology is not inherently biased. Those same biases exist in manual, legacy processes, but automated systems offer the ability to perpetuate them at new speed and scale.

With careful attention to data, and a clear focus on responsible use, we can shift the paradigm – and AI can be seen as a powerful tool for enabling equity and transparency at new speed and scale, while reducing inefficiency and delivering more responsive services to the American people.

Additional authors:

  • Diana Min: Senior Manager – Accenture Federal Services, Technology Strategy & Advisory Lead
  • Mike Thieme: Managing Director – Accenture Federal Services, Generative AI & National Security Portfolio Technology Lead 

WRITTEN BY

Mimi Whitehouse

Senior Manager – Accenture Federal Services, Emerging Technology