Webinar

Bias Detection in Large Language Models - Techniques and Best Practices

Wednesday, 30 October 2024 | 10am PDT / 1pm EDT / 5pm BST

Large Language Models (LLMs) are powerful AI systems trained on extensive text data to generate and predict human-like language. Their applications span numerous fields, including software development, scientific research, media, and education. However, the widespread use of these models has raised concerns about inherent biases that can lead to skewed language generation, unfair decision-making, and perpetuation of systemic inequalities. Early detection of these biases is crucial to ensure that LLMs contribute positively to society. 

This webinar will explore bias assessment in traditional machine learning and the specific challenges posed by LLMs. We will discuss policy requirements for bias assessment, such as those outlined in New York City Local Law 144. The session will also cover various types of bias in LLMs and how these biases manifest in different downstream tasks, both deterministic and generative. Additionally, we will introduce several research papers published by Holistic AI. Register now to secure your spot and be part of the conversation on shaping the future of ethical AI.

Q&A


Using SAGED, we can construct benchmarks from human materials embedded in a culture to form a fairness baseline sensitive to that culture's norms. For example, to evaluate a model's gender bias in the context of culture X, we first prepare cultural materials on gender, such as books and public talks, to create the benchmark. We then run the pipeline as usual, applying sentiment and distance metrics during extraction and correlation and precision metrics during diagnosis, to determine whether the model is biased in that cultural context and how closely it aligns with the baseline.
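To make those steps concrete, here is a minimal sketch of the extraction-and-diagnosis idea, not the SAGED API itself: a sentiment metric scores both the culture-X baseline passages and the model's completions, and a correlation statistic plus a mean gap summarize how closely the model tracks the baseline. The sentiment model choice and function names are illustrative assumptions.

```python
# Illustrative sketch (not the SAGED API): score generations against a
# culture-specific baseline using a sentiment metric, then diagnose with
# a correlation statistic and a mean sentiment gap.
from transformers import pipeline
from scipy.stats import pearsonr

sentiment = pipeline("sentiment-analysis")  # any sentiment metric could be swapped in

def signed_score(text: str) -> float:
    """Map a sentiment prediction to a signed score in [-1, 1]."""
    result = sentiment(text, truncation=True)[0]
    return result["score"] if result["label"] == "POSITIVE" else -result["score"]

def diagnose(baseline_texts, model_outputs):
    """baseline_texts: passages scraped from culture-X materials on gender.
    model_outputs: the model's completions for prompts built from those passages."""
    baseline = [signed_score(t) for t in baseline_texts]
    generated = [signed_score(t) for t in model_outputs]
    corr, p_value = pearsonr(baseline, generated)  # how closely the model tracks the baseline
    mean_gap = sum(g - b for g, b in zip(generated, baseline)) / len(baseline)
    return {"correlation": corr, "p_value": p_value, "mean_sentiment_gap": mean_gap}
```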
In the context of hiring bias, “resume density” refers to the amount of information or level of detail provided in a resume. The density of a resume can impact the extent of bias exhibited by large language models (LLMs) in scoring or ranking resumes for hiring purposes. In the JobFair framework, a distinction is made between “Taste-based” and “Statistical” biases. Taste-based bias is consistent regardless of resume density, while Statistical bias may vary depending on the amount of information in the resume. For instance, if an LLM relies more heavily on demographic traits (e.g., gender) when less non-demographic information is provided, this could indicate Statistical bias.
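As a rough illustration of how that distinction can be probed (a sketch of the idea, not the JobFair implementation), one can score matched resume pairs that differ only in gender at a sparse and a dense level of detail; `score_resume` is a hypothetical stand-in for an LLM scoring call.

```python
# Hedged sketch of the taste-based vs. statistical distinction:
# compare the gender score gap at two resume-density levels.
from statistics import mean

def gender_gap(resume_pairs, score_resume):
    """Mean score difference between male- and female-labelled versions
    of otherwise identical resumes."""
    return mean(score_resume(m) - score_resume(f) for m, f in resume_pairs)

def classify_bias(sparse_pairs, dense_pairs, score_resume, tol=0.05):
    gap_sparse = gender_gap(sparse_pairs, score_resume)
    gap_dense = gender_gap(dense_pairs, score_resume)
    if abs(gap_sparse - gap_dense) <= tol:
        return "taste-based (gap roughly constant across resume density)"
    if abs(gap_sparse) > abs(gap_dense):
        return "statistical (gap shrinks as more non-demographic detail is added)"
    return "inconclusive"
```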
Intrinsic bias in LLMs involves internal associations, such as linking "nurse" with female pronouns, reflecting societal stereotypes. Extrinsic bias occurs during practical use, like hiring tools rating male candidates higher for male-associated roles. In content generation, LLMs may produce stereotyped responses to demographic prompts, such as associating men with leadership roles and women with nurturing roles, illustrating both internal biases and biased real-world behavior.
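A simple way to see the intrinsic kind of association described above is a masked-language-model probe comparing the probability of "he" versus "she" in a profession template; this is a minimal illustrative sketch, and the model choice is an assumption rather than anything specific to the webinar.

```python
# Illustrative intrinsic-association probe: does the model prefer "he" or
# "she" when completing a profession template?
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")  # model choice is illustrative

def pronoun_gap(profession: str) -> float:
    """Positive values mean 'he' is preferred, negative values mean 'she'."""
    template = f"the {profession} said that [MASK] was tired."
    scores = {r["token_str"].strip(): r["score"]
              for r in fill(template, targets=["he", "she"])}
    return scores.get("he", 0.0) - scores.get("she", 0.0)

print(pronoun_gap("nurse"), pronoun_gap("engineer"))
```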
Human feedback in reinforcement learning can introduce biases into LLMs by reflecting the subjective views, preferences, or biases of human evaluators. In Reinforcement Learning from Human Feedback (RLHF), the reward model ranks outputs based on human ratings. If human raters prefer responses that align with certain cultural or ideological views, the model will learn to prioritize those perspectives, thereby embedding and reinforcing the biases present in the feedback.
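The mechanism is easiest to see in the standard pairwise reward-model objective used in RLHF, sketched below with an assumed `reward_model` callable that maps a batch of encoded responses to scalar rewards: whichever responses raters prefer are pushed toward higher reward, so systematic rater preferences become systematic model preferences.

```python
# Minimal sketch of the standard pairwise (Bradley-Terry style) reward-model
# objective used in RLHF. Responses that human raters mark as "preferred" are
# pushed toward higher reward, so any systematic rater preference is learned.
import torch.nn.functional as F

def reward_model_loss(reward_model, preferred_batch, rejected_batch):
    r_preferred = reward_model(preferred_batch)   # shape: (batch,)
    r_rejected = reward_model(rejected_batch)     # shape: (batch,)
    # Maximise the margin of preferred over rejected responses.
    return -F.logsigmoid(r_preferred - r_rejected).mean()
```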
The SAGED pipeline improves on traditional benchmarking by offering flexibility in building benchmarks from diverse sources, supporting realistic prompts, and enabling comparisons across multiple language models with different configurations. Unlike older approaches, SAGED includes methods like baseline calibration and counterfactual branching to reduce bias more effectively, making it a comprehensive tool for nuanced bias detection in open-ended AI generation tasks.
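Counterfactual branching, for instance, can be pictured as expanding one prompt template into variants that differ only in the demographic term, so every group is asked the same question; the helper below is an illustrative sketch, not SAGED's own interface.

```python
# Illustrative counterfactual branching: expand one template into per-group
# prompt variants that differ only in the demographic term.
def branch_counterfactuals(template: str, groups: list[str]) -> dict[str, str]:
    """`template` contains a {group} placeholder, e.g.
    'Describe a typical {group} engineer.'"""
    return {g: template.format(group=g) for g in groups}

variants = branch_counterfactuals(
    "Write a short reference letter for a {group} applicant.",
    ["male", "female", "non-binary"],
)
```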
Unlike many previous studies, SAGED defines fairness by the scraped materials themselves rather than assuming equality as the baseline. By deriving the fairness reference directly from the specific context being analyzed, SAGED keeps bias assessments relevant to the domain in question rather than relying on rigid notions of equality. Additionally, SAGED employs baseline calibration, comparing generated responses to a baseline derived from the scraped materials; this adjusts for inherent biases in the prompts or metric tools and yields a more context-aware, precise evaluation.
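Baseline calibration can be summarized in one line: the same metric is applied to the model's answer and to the scraped baseline passage for the same prompt, and the difference is taken as the bias signal, so offsets introduced by the prompt or the metric itself cancel out. The sketch below is illustrative, with `metric` standing in for any scoring function such as the sentiment scorer above.

```python
# Illustrative baseline calibration: score answer and baseline with the same
# metric and keep the difference, cancelling prompt- and metric-level offsets.
def calibrated_scores(model_answers, baseline_passages, metric):
    return [metric(answer) - metric(baseline)
            for answer, baseline in zip(model_answers, baseline_passages)]
```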
Selection Rate (SR) is the proportion of a group that is selected. For example, if 20 candidates from a demographic apply and 5 are hired, the SR is 25%. Impact Ratio (IR) compares selection rates between groups by dividing one group's SR by that of a reference group. For instance, if Group A's SR is 25% and Group B's is 50%, the IR for Group A compared to Group B is 0.5. An IR below 0.8 typically indicates potential bias.
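The arithmetic maps directly to code; the figures for Group B below are illustrative, and the 0.8 threshold is the four-fifths rule used, for example, in NYC Local Law 144 bias audits.

```python
# Worked example of the Selection Rate / Impact Ratio arithmetic above.
def selection_rate(selected: int, applicants: int) -> float:
    return selected / applicants

def impact_ratio(group_sr: float, reference_sr: float) -> float:
    return group_sr / reference_sr

sr_a = selection_rate(5, 20)    # 0.25
sr_b = selection_rate(10, 20)   # 0.50 (illustrative figures for the reference group)
ir = impact_ratio(sr_a, sr_b)   # 0.50
print(f"SR_A={sr_a:.2f}, SR_B={sr_b:.2f}, IR={ir:.2f}, flagged={ir < 0.8}")
```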
With advancements in AI model architecture, bias detection will become more precise, targeting biases at deeper levels. Adaptive techniques could allow models to adjust in real time to minimize bias, while improved contextual awareness will help models make fairer, more nuanced decisions. Extrinsic bias detection methods are likely to become more prevalent as AI grows in complexity due to their adaptability and general applicability in diverse real-world scenarios.

Our Speakers

Zekun Wu

AI Researcher at Holistic AI, leading Responsible Gen AI Research and Development projects and conducting comprehensive AI audits for clients like Unilever and Michelin. Currently a PhD candidate at University College London, focusing on sustainable and responsible machine learning. Collaborations include work with organizations like OECD.AI, UNESCO, and See Talent on AI tool development and metrics for trustworthy AI. Published research in top conferences like EMNLP and NeurIPS, covering bias detection and stereotype analysis in large language models, and delivered lectures and talks at UCL, UNESCO, Oxford, Ofcom and the Alan Turing Institute.

Xin Guan

AI Researcher at Holistic AI with master's and undergraduate degrees in mathematics and philosophy from the University of Oxford. Published research in top conferences like EMNLP. Core member of the Chinese Key National Project AGILE Index during stays at the Chinese Academy of Sciences. Remote Research Associate at the Centre for Long-Term AI. His research interests focus on AI for good, including large language model fairness and alignment, long-term AI ethics and safety, and foundational theories of intelligence.

Nate Demchak

AI Research Assistant at Holistic AI. Third-year undergraduate student at Stanford University, majoring in computer science and computational biology. Passionate about advancements in large language models and leveraging AI for social good, focusing on open-generation bias assessment in LLMs. Developing customizable bias benchmarks and researching biases within existing bias benchmarks, aiming to foster more equitable and accurate AI assessments.

Ze Wang

AI Research Affiliate at Holistic AI, specializing in social bias in AI and the intersection of AI with Economics. Currently pursuing a PhD at University College London, where his research focuses on applying AI techniques to empirical and theoretical economics, including areas such as game theory, labour economics, inequalities, and macroeconomic dynamic modelling. He has published work in leading conferences like EMNLP on topics such as bias benchmarks, model collapse, and bias amplification in large language models. Additionally, he has taught statistics tutorials at UCL.

Amina Tkhamokova

AI Risk Management Officer at Holistic AI, specializing in bias mitigation in machine learning. Amina holds an MSc in Computer Science from Warwick University and brings expertise in developing responsible, fair AI systems.
