Author: Hola Adrakey (h.adrakey@catie.fr) - CATIE - ORCID: https://orcid.org/0000-0001-7918-6335
For years, the responsible AI community has fought a familiar battle against bias. We’ve built an arsenal of debiasing techniques – reweighting datasets, oversampling minority classes, adversarial training, post-hoc score calibration, and more – that have moved the needle on fairness. Yet, this often feels like a sophisticated game of whack-a-mole: fix one manifestation of bias, and another pops up elsewhere. In fact, research shows that satisfying one fairness metric can inherently degrade another, underscoring the inevitability of such trade-offs. These interventions, while necessary, tend to treat the symptoms of bias rather than its root causes. They adjust the data or tweak model outputs after the fact, without fundamentally changing how we model a world that is inherently messy, unjust, and uncertain.
What’s needed is a paradigm shift – not just technical but philosophical. We must move beyond simply “patching” biased models and start building systems that explicitly understand and reason about bias in the world itself. The framework most poised to enable this shift is Bayesian inference. Embracing a Bayesian approach to AI fairness means baking uncertainty, causality, and even ethical priors into our models from the ground up, rather than slapping on fixes after deployment. Below, we explore how Bayesian thinking can address bias at a deeper level and why it offers a more robust foundation for fair AI.
Embracing Uncertainty: The Power of "I Don’t Know"
Traditional machine learning models can be deceptively confident. Ask a standard classifier to predict loan default, and it might output a single probability – say, 95% chance of default – with no indication that this estimate could be wildly off for certain groups. This false aura of certainty is dangerous: it masks situations where the model has little data or is extrapolating from biased history, effectively presenting prejudice as indisputable fact. In contrast, a Bayesian model would respond with honest uncertainty. Instead of a point estimate, it produces a full probability distribution. For example: “My best estimate is 75%, but given how little data I’ve seen for applicants from this demographic, the probability could plausibly range from 60% to 90%.” That range – the model’s uncertainty – is everything. It directly signals when the model is on shaky ground. In practice, models are often less certain about predictions for underrepresented or marginalized groups due to sparse data. By quantifying this uncertainty, a Bayesian model provides a built-in flag for potential bias.
Expressing uncertainty isn’t just academic; it’s actionable. If an algorithm admits “I’m not sure,” we can design the broader system to respond appropriately. For instance, researchers have demonstrated that when a model is uncertain – a strong indicator that its training data may be insufficient or biased for that case – the decision can be deferred to a human expert. In other words, the AI is empowered to say "I don’t know" and hand the decision to a person. This kind of Bayesian “selective refusal” or learning-to-defer framework has been shown to greatly improve overall fairness and accuracy. High-stakes scenarios like healthcare or lending stand to benefit: an automated system might confidently handle routine cases, but when its uncertainty (and thus risk of bias) is high, it yields to human judgment. Such a hybrid approach would prevent countless discriminatory outcomes that occur when algorithms overstep their knowledge and make overconfident guesses based on thin data. In short, Bayesian models turn uncertainty into a virtue – a form of humility that can make AI-driven decisions more transparent and fair.
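A sketch of how such a deferral rule might be wired up, reusing the Beta posteriors from the snippet above; the interval-width threshold and decision cutoff are arbitrary illustrative values, not recommendations.

```python
from scipy import stats

def decide_or_defer(posterior, width_threshold=0.15, default_cutoff=0.20):
    """Automate a decision only when the 90% credible interval is narrow enough."""
    lo, hi = posterior.ppf([0.05, 0.95])
    if hi - lo > width_threshold:
        return "defer to human reviewer"   # the model says "I don't know"
    return "decline" if posterior.mean() > default_cutoff else "approve"

# The sparse-data posterior from the snippet above is wide, so it triggers a deferral;
# the data-rich posterior is narrow and well above the cutoff, so it is declined.
print(decide_or_defer(stats.beta(10, 4)))    # wide interval -> defer
print(decide_or_defer(stats.beta(191, 11)))  # narrow interval, high default risk -> decline
```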
Modeling the Disease, Not the Symptoms
A Bayesian framework doesn’t just quantify uncertainty; it encourages us to model why bias occurs. Rather than merely re-balancing datasets or equalizing error rates after training, Bayesian thinking pushes us upstream to represent the generative process that gave rise to the biased data. One powerful approach is using causal Bayesian networks (CBNs) to map out how different factors lead to decisions. In a CBN, we explicitly draw the connections between variables (e.g., an applicant’s qualifications, gender, and the loan approval decision) and identify the paths by which bias might seep in.
Imagine two paths in a hiring model’s training data. The “fair” path is from qualification to hiring outcome: applicants’ skills and experience influence their hiring decisions. The “unfair” path is from a sensitive attribute (say, gender or race) to the outcome: perhaps through a human recruiter’s implicit biases that affected past hiring. Traditional models learn from whatever correlations exist in the data, entangling the fair with the unfair. A Bayesian causal model, however, can capture both paths simultaneously – essentially separating the signal from the poison. By viewing unfairness as the presence of an unfair causal path in the data-generating process, we can literally point to where discrimination is creeping in. Researchers from DeepMind illustrated this using a college admissions example: gender had a direct influence on admissions decisions (an unfair path), as well as an indirect influence via department choice. Mapping these allowed them to reason about which pathways were just or unjust, rather than treating all disparities as equal.
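The toy structural model below (not the DeepMind model, and with invented coefficients) makes the two paths explicit: skill feeds the fair path, and a direct gender coefficient stands in for the unfair path.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy structural equations for past hiring data (all coefficients are invented).
gender = rng.integers(0, 2, n)      # sensitive attribute
skill = rng.normal(0.0, 1.0, n)     # legitimate qualification
# Fair path: skill -> outcome.  Unfair path: gender -> outcome (past recruiter bias).
logit = 1.5 * skill - 0.8 * gender - 0.5
hired = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))

# Any model trained on (skill, gender) -> hired inherits both paths;
# the causal graph is what lets us name the gender -> hired edge as the unfair one.
```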
The benefit of modeling such structure is that it enables principled “bias removal”. With a causal Bayesian network, one can calculate the extent to which the sensitive attribute influences outcomes along the unfair path and then subtract that influence out. In effect, we perform a form of counterfactual surgery: “What would the decision be for this person if we could disentangle or neutralize the effect of their race/gender?” This approach, related to techniques of counterfactual fairness, is akin to a surgeon removing a tumor rather than a cosmetic patch-up. It directly addresses the root cause of bias by modeling the causal mechanism of prejudice. Notably, this requires making our assumptions explicit (e.g. drawing a node for “recruiter’s bias”) and is transparent about what we consider fair vs. unfair influence. Such transparency is valuable for stakeholders and regulators: it moves the fairness discussion from opaque statistical adjustments to an open examination of how and why decisions are made.
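Continuing the toy model above, the “surgery” amounts to predicting with the unfair path switched off. In a real system the coefficients would be posterior estimates learned from data rather than known constants; this is only a sketch of the idea.

```python
import numpy as np

def hire_probability(skill, gender, w_skill=1.5, w_gender=-0.8, bias=-0.5,
                     neutralize_unfair_path=False):
    """Hiring probability under the toy model; optionally cut the gender -> outcome edge."""
    unfair_effect = 0.0 if neutralize_unfair_path else w_gender * gender
    return 1.0 / (1.0 + np.exp(-(w_skill * skill + unfair_effect + bias)))

# Counterfactual question: what would this applicant's score be
# if gender could not directly influence the decision?
factual = hire_probability(skill=0.3, gender=1)
counterfactual = hire_probability(skill=0.3, gender=1, neutralize_unfair_path=True)
print(f"factual: {factual:.2f}, with unfair path removed: {counterfactual:.2f}")
```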
Encoding Ethics as Priors
Perhaps the most novel advantage of the Bayesian mindset is the ability to inject our values and ethical goals directly into the model through prior distributions. In standard machine learning, fairness interventions are often bolted on after training (for example, adjusting a decision threshold to equalize loan approval rates between groups). In a Bayesian model, by contrast, we can bake high-level fairness principles into the model from the start. A prior represents our belief (before seeing data) about what model parameters or outputs should look like. By setting carefully chosen priors, we effectively give the model an ethical “nudge” in a transparent way.
For example, if our fairness goal is that two equally qualified individuals should have an equal chance of receiving a loan regardless of race, we can encode a fairness prior that penalizes large disparities in loan approval rates between racial groups. Concretely, imagine a parameter in our model that represents the difference in approval probability between a privileged group and a marginalized group. A fairness prior might be a distribution centered at zero for that difference – reflecting a belief that, a priori, we expect no difference absent evidence to the contrary. The Bayesian model will then weigh this prior against the observed data. If the historical data are biased (say, one group was unfairly denied loans more often), the prior will pull the model’s beliefs toward a fairer outcome, unless the data overwhelmingly contradict it. This approach is flexible: the prior is not a hard rule, but a soft constraint that can be overridden by strong evidence. In essence, the model’s final predictions become a principled compromise between the patterns in the (possibly biased) data and the fairness criteria we have encoded upfront.
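A minimal sketch of such a fairness prior, using a simple grid approximation rather than a full probabilistic-programming stack. The Normal(0, 0.05) prior on the approval-rate gap and the deliberately skewed counts are illustrative assumptions, not recommended settings.

```python
import numpy as np
from scipy import stats

# Hypothetical historical loan data, skewed against group B (invented numbers).
approved = {"A": 70, "B": 45}
applied  = {"A": 100, "B": 100}

base = np.linspace(0.01, 0.99, 99)    # approval rate for group A
gap  = np.linspace(-0.40, 0.40, 81)   # (group B rate) - (group A rate)
base_grid, gap_grid = np.meshgrid(base, gap, indexing="ij")
p_a = base_grid
p_b = np.clip(base_grid + gap_grid, 1e-6, 1 - 1e-6)

# Fairness prior: before seeing data, we believe the gap should be near zero.
log_prior = stats.norm(0.0, 0.05).logpdf(gap_grid)
log_lik = (stats.binom.logpmf(approved["A"], applied["A"], p_a)
           + stats.binom.logpmf(approved["B"], applied["B"], p_b))
posterior = np.exp(log_prior + log_lik)
posterior /= posterior.sum()

# The raw disparity in the data is -0.25; the prior pulls the posterior gap toward zero,
# but strong enough evidence of a real difference would still override it.
print("posterior mean gap:", round(float((posterior * gap_grid).sum()), 3))
```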
Academic work has started exploring this concept by directly integrating fairness into model objectives and constraints. One formulation explicitly adds a term for fairness into the model’s optimization goal, trading off accuracy and fairness with a tunable parameter. This is analogous to setting a prior, since we are telling the model how much weight to give fairness relative to raw performance. The beauty of the Bayesian formulation is that it makes these trade-offs transparent and adjustable. Our ethical assumptions become first-class parts of the model, open to scrutiny and debate. If a stakeholder disagrees with how strong the fairness prior is (maybe they argue for an even more aggressive fairness constraint, or a relaxed one), we can adjust it and retrain – much like turning a dial. This stands in stark contrast to many current practices where fairness adjustments are buried in obscure post-processing steps or ad-hoc parameter tweaks. By encoding ethics as priors, we elevate the discussion of fairness to the model-design level, where society can directly interrogate and shape the values driving AI decisions. Such an approach aligns with calls from policymakers for AI systems that have fairness by design rather than as an afterthought.
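That trade-off can be written down directly. The sketch below adds a demographic-parity penalty to a standard logistic loss, with `lam` playing the role of the tunable weight; the penalty choice and the function name are illustrative assumptions rather than the formulation used in any particular paper.

```python
import numpy as np

def fairness_regularized_loss(w, X, y, group, lam=1.0):
    """Logistic loss plus a penalty on the gap in average predicted score between groups.

    `group` is an array of 0/1 labels marking the two demographic groups.
    """
    scores = 1.0 / (1.0 + np.exp(-X @ w))
    log_loss = -np.mean(y * np.log(scores + 1e-9) + (1 - y) * np.log(1 - scores + 1e-9))
    parity_gap = abs(scores[group == 0].mean() - scores[group == 1].mean())
    return log_loss + lam * parity_gap  # lam is the dial between accuracy and fairness
```

Minimizing this objective with any standard optimizer, then re-running with a different `lam`, is exactly the "turning a dial" described above.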
The Path Forward
Adopting a Bayesian approach to fairness is not without challenges. Bayesian models can be computationally intensive and may require more expert knowledge to develop and interpret. Implementing causal models or setting justifiable priors demands care, interdisciplinary insight, and sometimes difficult choices about values and assumptions. These are engineering and governance challenges, but not insurmountable ones. The cost in computation or complexity is small compared to the cost society pays for biased, untrustworthy AI. Importantly, Bayesian methods inherently provide more interpretable outputs (like uncertainty estimates and causal attributions) that can actually assist in governance and oversight of AI systems. In fact, the trend in regulation and standards (such as the NIST AI Risk Management Framework and the EU’s draft AI Act) is toward requiring transparency about uncertainty and bias in AI decisions – precisely the qualities Bayesian models offer by construction.
For too long, AI fairness has been approached with a reactive mentality – patching systems after they err. The Bayesian framework invites us to be proactive: to build uncertainty, causal understanding, and ethical reasoning into the foundations of our models. It represents a shift from a worldview of feigned certainty to one of honest ignorance when appropriate, from chasing biases after they appear to anticipating and modeling them from the start. This is a more robust and responsible foundation for AI. As David Sumpter quipped, the quest for algorithmic fairness often resembles whack-a-mole – but we don’t have to keep playing that game. By putting down the mallets and embracing Bayesian thinking, we equip ourselves to address bias at its source, leading to AI systems that are not only fairer, but also more transparent and trustworthy by design.