In this academic paper, researchers from the Centre for Artificial Intelligence at University College London and Holistic AI present a new framework for auditing stereotype bias in LLMs that combines explainable AI (XAI) techniques with energy-efficient models such as DistilBERT, aligning with ethical, regulatory, and sustainability standards.
Key Issues:
- LLMs can reinforce biases and stereotypes, affecting areas like political polarization and racial bias in legal systems.
- Existing studies often treat bias benchmarks and text-based stereotype detection separately, leaving a gap in understanding how the two interact.
- The paper's framework addresses this gap by auditing stereotype bias in LLMs end to end, using XAI techniques and the energy-efficient DistilBERT as the detection backbone.
Methodology and Findings:
- The study introduces the Multi-Grain Stereotype (MGS) dataset, compiled from existing crowdsourced stereotype data, for training and evaluating stereotype detection models.
- The research explores multi-class stereotype detection, comparing it against binary classification models and other baselines (a classifier sketch follows this list).
- The paper employs XAI techniques such as SHAP, LIME, and BertViz to validate and interpret the stereotype classifier (an attribution sketch follows this list).
- An automated prompt-generation method is established to elicit stereotypes from LLMs, with a focus on assessing stereotype bias across societal dimensions (race, gender, religion, profession); a template sketch follows this list.
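
To make the multi-class detection idea concrete, here is a minimal sketch of a DistilBERT-based stereotype classifier. The label set and the idea of loading a fine-tuned checkpoint are assumptions for illustration; the paper's exact label scheme and trained weights are not reproduced here.

```python
# Minimal sketch: multi-class stereotype detection with DistilBERT.
# The label set below is an illustrative assumption, not the paper's exact scheme.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["unrelated", "stereotype_race", "stereotype_gender",
          "stereotype_religion", "stereotype_profession"]

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=len(LABELS)
)  # in practice this head would be fine-tuned on the MGS dataset

def classify(text: str) -> str:
    """Return the predicted stereotype class for a single sentence."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

print(classify("Nurses are always women."))
```

A binary baseline would collapse the stereotype classes into a single "stereotype" label; the multi-class head is what lets the audit attribute bias to a specific societal dimension.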
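For the XAI step, a hedged sketch of token-level attribution with SHAP on a text-classification pipeline is shown below. The checkpoint is a publicly available stand-in, not the paper's fine-tuned stereotype model; the same pattern applies to any sequence-classification checkpoint.

```python
# Hedged sketch: SHAP token attributions for a text classifier.
# The checkpoint is a stand-in; swap in a stereotype classifier fine-tuned on MGS.
import shap
from transformers import pipeline

clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    top_k=None,  # return scores for all labels, as SHAP expects
)

explainer = shap.Explainer(clf)                        # wraps the pipeline as a black box
shap_values = explainer(["Nurses are always women."])  # per-token contribution scores
shap.plots.text(shap_values)                           # highlights tokens driving the prediction
```

LIME and BertViz play a similar role in the paper, offering surrogate-model explanations and attention visualizations, respectively, as complementary checks on the classifier's reasoning.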
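Finally, a minimal sketch of template-based prompt generation in the spirit of the paper's automated probing step. The templates and group terms are illustrative placeholders, not the paper's actual prompt set; generated completions would then be scored by the stereotype classifier.

```python
# Minimal sketch: template-based prompts for probing LLM stereotype bias.
# Templates and group terms are illustrative assumptions only.
from itertools import product

TEMPLATES = [
    "Complete the sentence: {group} people are usually",
    "Write a short story about a {group} character who",
]
GROUPS = {
    "race": ["Asian", "Black", "white"],
    "gender": ["female", "male", "non-binary"],
    "religion": ["Muslim", "Christian", "Jewish"],
    "profession": ["nurse", "engineer", "janitor"],
}

def generate_prompts():
    """Yield (dimension, prompt) pairs covering each societal dimension."""
    for dimension, terms in GROUPS.items():
        for template, term in product(TEMPLATES, terms):
            yield dimension, template.format(group=term)

for dim, prompt in list(generate_prompts())[:4]:
    print(dim, "->", prompt)
```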
The paper contributes significantly to AI ethics by providing a comprehensive approach to detecting and auditing stereotypes in LLMs. It underscores the need for continued improvement and exploration of stereotype detection, particularly as AI models become increasingly influential in society.