How to identify and mitigate bias in federal AI
October 2, 2020
Artificial intelligence has already improved and streamlined decision-making for the federal government. But those decisions reveal that AI recommendations are not always as neutral and objective as some might expect. How can that be, when AI isn't subject to human emotions and whims?
It's because AI doesn't do anything on its own. Rather, its decisions and outputs are driven entirely by the data and processes that humans supply. And if those humans don't take steps to mitigate the potential effects of bias, that bias becomes encoded into the AI system they are designing.
Last month, the National Institute of Standards and Technology (NIST) held a workshop to discuss bias in AI. The purpose of NIST’s virtual event was “to develop a shared understanding of bias in AI that can guide and speed the innovation and adoption of trustworthy AI systems.” Participants included Dr. Teresa Tung of Accenture Labs as part of the larger group session, Foundational Juggernaut: Addressing data bias challenges in AI.
The NIST workshop emphasized the necessity of addressing unintended consequences, harm to individuals, and threats to privacy and security. The workshop included an interdisciplinary panel serving as a consultancy for developing standards to mitigate bias. Workshop participants came to an agreement that though all sorts of bias can come out of AI, it’s important to address those that cause the most harm to individuals first.
More than 8 in 10 federal executives acknowledge that collaboration between humans and machines through artificial intelligence will be critical to innovation in the future. As the federal government increasingly adopts AI strategies and technologies, understanding how to identify and mitigate potential bias becomes more critical.
Bias happens when human values and perceptions are inadvertently encoded in the algorithm or as a result of data inputs, such that the output is different from what might be expected if the model and data were representative of a desired state. The result can have different degrees of impact to individuals.
For instance, a retailer might use an AI-based recommendation system to suggest items that, according to the system's algorithm, "someone like you" might want to purchase, but it turns out that you aren't interested in those items at all, and you purchase nothing. This example of bias might cause some harm to the retailer (no sales) but not to you.
A different example: A judge receives from a recommendation system the recidivism risk for an individual. The recommendation is based on data from decisions that were affected by historical bias. This risk score is then factored in along with other criteria to determine jail terms; in other words, jail terms could be affected not solely by the crime the individual committed but also by a biased score of potential risk. This is an example of bias causing personal harm.
Bias can be a positive when it reflects relevant experience and the desired process of human decision-making. However, bias that induces an unintended and potentially harmful outcome should be mitigated as much as possible. But let's be clear: just as with human decisions, it is unrealistic to expect all bias to be removed.
The workshop discussion echoes actions we have already taken with clients. Drawing on Accenture's commercial and federal experience, I share below four important steps agencies can take next, along with more in-depth resources:
Setting priorities at the very beginning, with a diverse team from different backgrounds, is critical to building a North Star that can guide agencies through potentially bias-inducing decisions when designing an AI system.
Though regulations and policies are already in place, reviewing and communicating goals for the project is important. Accenture has taken this a step further by detailing the first step, which includes articulating innovation in one’s strategy as part of an overall well governed approach to unlocking value.
Whether making decisions during development or reviewing outputs from the system, the team can then look back at its original mission and ask: Is this giving us the outputs we intended? Is it disproportionately impacting or ignoring certain factors or populations?
Consider how an agency might build an AI system to identify business fraud. Its guiding mission could be to identify and address fraudulent business activities to prevent harm and loss to consumers. Suppose the team first focused on investigating businesses based on the number of fraud complaints received. However, it's possible that those who complain over-represent those with higher incomes.
In this hypothetical situation, by selecting complaints, the system would not serve those who do not complain. If those who complain have a higher income, then those in lower income brackets may not be served. By staying focused on their original mission, the team could identify this gap before it’s implemented and better assist consumers overall.
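One way to surface this kind of sampling gap is to compare each group's share of complaint-driven cases against its share of the overall consumer population. The sketch below illustrates the idea with entirely made-up figures; the group names and numbers are hypothetical, not from any federal dataset.

```python
# Hypothetical illustration: checking whether complaint-driven case selection
# over-represents some income brackets relative to the consumer population.
# All figures below are invented for demonstration purposes.

def representation_ratio(selected_counts, population_counts):
    """For each group, compare its share of selected cases to its share of
    the overall population. A ratio well below 1.0 suggests the group is
    under-served by complaint-driven selection."""
    total_selected = sum(selected_counts.values())
    total_population = sum(population_counts.values())
    return {
        group: (selected_counts[group] / total_selected)
               / (population_counts[group] / total_population)
        for group in population_counts
    }

# Made-up numbers: complaints skew toward higher-income consumers.
complaints = {"lower_income": 50, "middle_income": 150, "higher_income": 300}
population = {"lower_income": 400, "middle_income": 400, "higher_income": 200}

ratios = representation_ratio(complaints, population)
# lower_income ratio = (50/500) / (400/1000) = 0.25 -> under-represented
# higher_income ratio = (300/500) / (200/1000) = 3.0 -> over-represented
```

A review like this, run against the team's original mission, makes the gap visible before the system is deployed.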
In the NIST workshop, Teresa Tung explained how we can mitigate bias and its effects by using more "explainable" models, which are built so that we can see how an outcome was produced, and "causal" models, which identify the root cause of an outcome. This helps address where the bias comes from and what its consequences are. After all, if you address bias only in the dataset at hand, the bias could continue as new data comes in. NIST's call for comments on their draft Explainable AI Principles further underscores the importance of designing with explainability in mind.
As another example, reviewing for inter-annotator agreement (IAA) can address disparities in how different people label data. Consider: if one annotator tags a picture of a car as "car" and the other as the car's make or model (say, "Toyota"), this could lead to confusion and skewing by the AI system. IAA is "a method used to assess the quality of their labeling by measuring level of consensus between sources," according to a recent Accenture report. If the measurement detects low consensus between annotators, further review is needed to ensure data isn't skewed.
How Inter-Annotator Agreement Drives Confidence in Federal AI details how IAA is being used in the federal government today.
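One common IAA measure is Cohen's kappa, which corrects raw agreement for the agreement two annotators would reach by chance. The sketch below, with hypothetical labels, mirrors the "car" versus "Toyota" example above; it is one way to compute consensus, not the specific method the report describes.

```python
# Minimal sketch of inter-annotator agreement via Cohen's kappa.
# Kappa = (observed agreement - chance agreement) / (1 - chance agreement).
# Labels below are hypothetical.

from collections import Counter

def cohens_kappa(labels_a, labels_b):
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labeled identically.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: probability both pick the same label independently.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in set(labels_a) | set(labels_b)
    )
    return (observed - expected) / (1 - expected)

# Two annotators tag the same six images; one sometimes uses the make
# instead of the generic "car" label, lowering consensus.
a = ["car", "car", "truck", "car", "truck", "car"]
b = ["car", "Toyota", "truck", "car", "truck", "Toyota"]

kappa = cohens_kappa(a, b)
# kappa = (4/6 - 1/3) / (1 - 1/3) = 0.5 -- well below 1.0, so the
# labeling guidelines would warrant review before training on this data.
```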
Data changes over time as human behavior and societal norms evolve. This means that system outputs will reflect those changes as well. For example, algorithms that have been tuned for use by transportation planners to optimize schedules might need to be re-tuned to factor in vehicle and air traffic changes that happened after March 2020 due to COVID-19.
Just as agencies have resources and processes set up to review impacts related to changes in regulations, agencies should consistently put resources towards model and data governance.
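Such governance can include an automated drift check that compares recent data against the baseline the model was trained on. The sketch below uses the Population Stability Index (PSI), a common drift metric, with invented traffic-volume figures; the buckets, shares, and the 0.25 rule-of-thumb threshold are assumptions for illustration.

```python
# Hypothetical sketch: flagging data drift between a baseline window and
# recent data with the Population Stability Index (PSI). A common rule of
# thumb treats PSI > 0.25 as a significant shift worth investigating.

import math

def psi(baseline_shares, recent_shares):
    """Both arguments are per-bucket shares that each sum to 1.0.
    PSI = sum over buckets of (recent - baseline) * ln(recent / baseline)."""
    return sum(
        (r - b) * math.log(r / b)
        for b, r in zip(baseline_shares, recent_shares)
    )

# Invented traffic-volume distribution before and after March 2020.
baseline = [0.2, 0.5, 0.3]   # share of low / medium / high traffic days
recent   = [0.6, 0.3, 0.1]   # traffic collapses toward the low bucket

drift = psi(baseline, recent)
needs_retuning = drift > 0.25  # large shift -> schedule a model review
```

Running a check like this on a fixed cadence turns "re-tune the model when the world changes" into a concrete, reviewable governance step.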
Ultimately, systems that cause harm to individuals warrant the most attention. Developing ethical guidelines and creating an ethics committee can help combat unfair bias.
Federal agencies will move from principles to guidelines in FY21. The groundwork has been laid in many areas, such as the Department of Defense's ethical principles for the adoption of AI, published in February 2020. Furthermore, Accenture recently partnered with Northeastern University to "explore the development of effective and well-functioning data and AI ethics committees," including the questions any organization should ask when building such a committee.
Bias can never be fully “cured.” But by taking the right precautions and practicing steadfast governance, it can be mitigated.
I encourage you to read my earlier whitepaper, Responsible AI: A Framework for Building Trust in Your AI Solutions, for more on this important topic.