LLM Agents: How They Work and Where They Go Wrong

Authored by

Xin Guan

Machine Learning Intern at Holistic AI

Zekun Wu

Machine Learning Researcher at Holistic AI

Published on

January 20, 2025

last updated on

February 11, 2025

Use Cases of LLM agents

LLM agents can integrate modules to enhance their autonomy and perform tasks beyond the capability of standard LLMs. For example, in a customer service context, a simple LLM might respond to a query such as, “My laptop screen is flickering, and it’s still under warranty. What should I do?” with generic troubleshooting advice, such as restarting the device. If the issue persists, the LLM might suggest further steps. However, complex tasks including verifying warranty status, processing refunds, or arranging repairs require human intervention. LLM agents address this by incorporating the following modules to handle such scenarios autonomously:

Multimodality Augmentation: Enables the LLM agent to process images alongside text, allowing tasks such as analyzing a photo of a defective product for more accurate diagnosis.‍
Tool Use: Allows the agent to interact with backend systems, verify warranty status, and automate actions like initiating refunds for faulty products.‍
Memory: Enables the agent to recall previous interactions, recognize recurring issues, and tailor responses based on past experiences.‍
Reflection: Enhances output by assessing responses pre- and post-interaction. Feedback collected is used to iteratively improve future responses.‍
Community Interaction: Facilitates collaboration among specialized agents. For instance, a technical agent can handle complex issues, escalating to human experts if necessary, ensuring access to specialized and supervised support.

Moreover, it can be applied in various situation such as employee empowerment; code creation; data analysis; cybersecurity; and creative ideation and production. Check out 185 proposed applications of LLM agents here.

AI agents and AGI

Some academics argue that the agent paradigm is a plausible pathway to achieving Artificial General Intelligence (AGI). Proponents of this view suggest that these systems, which leverage multi-modal understanding and reality-agnostic training through generative AI and independent data sources, embody key characteristics of AGI. Indeed, a recent Stanford survey illustrates that when foundation models for agent tasks are trained on cross-reality data, they exhibit adaptability to both physical and virtual contexts. This adaptability, as they argue, underscores the viability of the agent paradigm as a step toward AGI.

Deep dive on LLM modules

This section provides a deeper dive explanation of the current technical practices of agentic designs briefly covered above, namely Multimodality, Tool Use, Memory, Refection, and Community Interaction.

Multimodal Augmentation

*Architecture of MLLM with three types of connectors (source*).

Multimodal augmentation enhances LLM autonomy by enabling the processing of text, images, audio, and video. A typical Multimodal Large Language Model (MLLM) includes two key components: a pre-trained modality encoder, which converts non-text data into processable tokens or features, and a modality connector, which integrates these inputs with the LLM. The model is then fine-tuned using specialized datasets to ensure effective multimodal integration.

The connector plays a critical role in this process and can be implemented in different ways. Token-level fusion converts encoded features into tokens, which are merged with text tokens before processing. For instance, Q-Former in BLIP-2 uses learnable queries to compress visual data into an LLM-compatible format. MLP-based methods, such as those in LLaVA, align visual tokens with text embeddings. Feature-level fusion enables deeper integration by combining vision and language features, as seen in Flamingo, which uses cross-attention layers for continuous interaction between modalities. Find out more here.

Tool Use

Tool-use enhances LLMs by enabling interactions with external tools like APIs, databases, and interpreters, addressing their limitations in accessing real-time data and performing specialized tasks. This capability expands problem-solving, expertise, and environment interaction.

The tool-use process includes four stages:

Task planning breaks queries into sub-tasks to clarify intent‍
Tool selection identifies the best tool via retriever- or LLM-based methods‍
Tool calling extracts parameters and retrieves information‍
Response generation integrates the tool’s output with the LLM’s knowledge for a complete response.

Frameworks such as ReAct and Toolformer are commonly used. Find out more here.

A diagram of a programDescription automatically generated — *One framework of affordance-driven LLM robotic Manipulation (source*)

Tool-use expands LLMs’ ability to interact with their environment by providing a framework for understanding and manipulating physical objects. This is achieved by integrating affordances—the possible actions an object allows based on its properties (e.g., grasping a handle or pushing an edge). By recognizing affordances, LLM agents can conceptualize the physical world as a set of actionable tools. For instance, understanding the affordances of a block enables the agent to identify the optimal side to push for a desired outcome. This affordance-driven approach bridges abstract reasoning with practical interaction in real-world contexts.

Memory

Memory is essential for LLM agents, enabling them to recall experiences, adapt to feedback, and maintain context for real-world interactions. It supports complex tasks, personalization, and autonomous evolution.

The memory mechanism consists of three steps:

Memory writing (W), which captures and stores information as raw data or summaries; ‍
Memory management (P), which organizes, refines, or discards stored data, abstracting high-level knowledge for efficiency; ‍
Memory reading (R), which retrieves relevant information for decision-making. These processes enable agents to retain context and effectively apply knowledge across tasks.

A notable framework is MemoryBank. , which is explained in greater detail here.

Reflection

LLM reflection enhances decision-making during inference without retraining, avoiding the need for extensive datasets and fine-tuning. It provides flexible feedback (scalar values or free-form) and improves tasks like programming, decision-making, and reasoning. Studies on Chain of Thought and test-time computation demonstrate that intermediate reasoning and adaptive computation enhance performance.

The Reflexion framework includes three models: the Actor, which performs actions (e.g., tool use, response generation); the Evaluator, which scores the outcomes of actions; and the Self-Reflection model, which provides feedback stored in long-term memory for future improvement. This iterative process allows the agent to refine its approach with each cycle.

Community Interaction

A screenshot of a computerDescription automatically generated — (source)

Large Language Model-based Multi-Agent (LLM-MA) systems employ multiple specialized LLMs to collaboratively solve complex problems, enabling advanced applications in software development, multi-robot systems, policymaking, and game simulation. These systems, with specialized profiles and environments, outperform single-agent models in handling intricate problems and simulating social dynamics.

Key components include:

Agent profiling, where agents are specialized for specific tasks; ‍
Communication, using cooperative, competitive, or debate formats; ‍
Environment interaction, via interfaces like sandboxes or physical setups; ‍
Capability acquisition, allowing agents to learn from the environment or each other through memory and reflection.

Notable frameworks include Autogen, Swarm, and MetaGPT, which are outlined here.

What can go wrong with LLM agents?

LLM agents offer advanced capabilities across domains but also present vulnerabilities that affect reliability, safety, and ethics. These risks stem from design insufficiencies—including issues with privacy, bias, sustainability, efficacy, and transparency, as well as operational challenges, such as adversarial attacks, misalignment, and malicious use. Addressing these challenges is essential for ensuring their safe and effective development.

Design Insufficiencies for LLM Agents

Design inefficiencies in LLM agents come mainly from technical problems in how these systems are built. Unlike risks that depend on the social context in which they are used, these challenges are more about flaws in the system’s design and structure. Problems like privacy issues, bias, high energy use, poor performance, and lack of transparency show weaknesses in how these agents are developed. These are not so much about where or how the systems are applied but about the need to improve their basic design to make them safer, more reliable, and more ethical in any context.

Privacy-related insufficiencies

Privacy-related issues in LLM agents arise from handling sensitive data. Multimodal inputs like images, audio, and video often contain identifiable information, requiring robust anonymization techniques that are challenging across modalities. Tool-use increases risks by sharing user data with x, which may not adhere to consistent privacy standards.

Under the GDPR, data controllers must ensure lawful, transparent, and secure processing, including accountability for third-party compliance. Memory management heightens privacy risks, as extensive data storage demands encryption, access controls, and mechanisms for data deletion to meet GDPR requirements for data minimization and erasure rights. Failure to comply with these provisions can lead to breaches. In LLM-MA systems, inter-agent communication amplifies risks, as weak protocols can expose sensitive data and hinder compliance with privacy regulations. Additionally, the GDPR’s right to manual evaluation requires LLM agents to enable human oversight and allow users to challenge decisions made solely through automated processing. Without such mechanisms, LLM agents risk non-compliance and a loss of trust. Addressing these challenges requires robust governance, data flow auditing, and oversight protocols to ensure privacy and regulatory adherence.

Bias-related insufficiencies

Bias in LLM agents can amplify harmful patterns, particularly through multimodal augmentation, which may reinforce skewed outputs from text, images, and cultural contexts. Tool-use can inherit biases from specific tools, memory may repeatedly draw on biased data, and reflection processes risk entrenching biases through skewed feedback loops. In LLM-MA systems, domain-specific biases and inter-agent interactions can further reduce fairness and transparency.

Regulations like the EU AI Act and NYC Local Law 144 require high-risk AI systems to prevent discriminatory outcomes and promote accountability, but compliance remains difficult. Diverse datasets are hard to secure, biases vary across modalities, and fairness metrics lack universal standards, leading to inconsistent evaluations. Tool operations and large-scale audits add further complexity, and existing guidelines often fail to address the unique challenges of agentic AI systems, making sustained bias mitigation a persistent challenge.

Sustainability-related insufficiencies

LLM agents face sustainability challenges due to high computational demands, leading to increased energy use and environmental impact. Multimodal models, particularly those processing images and audio, are resource-intensive, as are tool-use, memory storage, and reflection-based learning, which add to energy consumption. Systems with multiple specialized agents exacerbate this through extended computation cycles. Efficiency measures, such as model quantization, task-specific models, adaptive activation, and agent pruning, can help reduce resource usage.

Regulatory frameworks, including the EU AI Act (10^25 FLOPS threshold) and the U.S. Executive Order on AI (monitoring models above 10^26 FLOPS or clusters at 10^20 FLOPS), link high computational intensity to environmental risks but fail to capture the full scope of energy use. Lifecycle-based assessments, accounting for total emissions during development, training, and deployment, are crucial for addressing these sustainability challenges.

Efficacy-related insufficiencies

Efficacy-related design issues arise from the complexity of integrating diverse data types and coordinating multiple agents. Multimodal augmentation faces alignment challenges, as token- and feature-level fusion methods may fail to fully utilize non-textual data. Issues like cross-modal hallucination can lead to inaccurate outputs. Tool-use depends on accurate task planning and tool selection, where errors can result in poor responses. Inefficient memory management may retrieve irrelevant or outdated information, and scaling memory for large datasets can affect consistency. Reflection-based feedback poses risks of overfitting, reducing adaptability. Finally, LLM-MA systems can suffer from miscommunication between agents, compounding errors.

Transparency-related insufficiencies

Transparency-related design issues in LLM agents stem from opaque decision-making processes, complicating accountability. Multimodal augmentation reduces interpretability by making it difficult to trace how data types like text, images, and audio contribute to outputs, especially in sensitive areas like healthcare. Tool-use lacks clarity on tool selection and decision-making, requiring thorough documentation. Memory management is opaque in terms of data retention and use, hindering debugging. Reflection-based decisions are hard to trace, with consistency in feedback interpretation posing challenges. In LLM-MA systems, transparency decreases due to complex inter-agent interactions, necessitating tools to track information flow and improve accountability.

Operational challenges

Operational challenges in LLM agents refer to the difficulties that arise during their actual deployment and interaction within real-world environments. While these agents may perform well in controlled conditions, operational issues such as adversarial attacks, misalignment with user intent, and the potential for malicious use can undermine their effectiveness and safety. These challenges emerge from the complexity of adapting to dynamic, unpredictable environments, where the agent must handle diverse inputs and potential manipulation. Addressing these issues is vital for ensuring the ethical use of LLM agents in real-world applications.

Human-AI Misalignment

Misalignment in LLM agents, particularly as autonomy increases, can lead to harmful impacts on individuals and society:

Internal Objectives vs. User Well-Being: Misalignment occurs when LLM agents prioritize internal goals, such as engagement optimization, over user well-being. This can subtly manipulate users toward harmful behaviors, undermining trust and causing unintended harm.‍
Over-Dependency: The convenience of LLM agents can foster user reliance, gradually eroding independent decision-making and personal autonomy. This dependency risks shifting the agent’s role from assistant to a controlling influence over user actions.‍
Over-Optimization: Excessive focus on individual user preferences by LLM agents can result in misuse, such as enabling dominance or unfair competition. This misalignment may spread misinformation or manipulative content, harming societal well-being.‍
Disregard for Non-Users: Misaligned LLM agents may neglect non-users’ needs, causing unequal access to resources and societal inequities. Prioritizing developers’ commercial or ideological interests over user welfare risks undermining social institutions.

Adversarial Attacks

In general, adversarial attacks threaten the safety and reliability of LLM agents by exploiting vulnerabilities in their inputs, observations, planning, and memory, causing harmful behaviors. Key attack types include:

Direct Prompt Injection (DPI): DPI targets user prompts by injecting harmful content to manipulate the agent’s behavior. This can redirect the agent to execute altered tasks or misuse tools, compromising safety and enabling malicious actions.‍
Observation Prompt Injection (OPI): OPI alters the agent’s observations from external tools or the environment, misleading the agent into unintended or harmful actions. Such manipulation can disrupt task execution, leading to unsafe or unintended outcomes.‍
Plan-of-Thought (PoT) Backdoor: This attack embeds hidden instructions in the system prompt, targeting the planning phase. A backdoor trigger activates unintended actions under specific conditions, while the agent behaves normally otherwise. PoT Backdoors are hard to detect and mitigate, causing unpredictable behavior when triggered.‍
Memory Poisoning Attacks: These attacks inject adversarial content into the LLM agent’s memory (e.g., RAG databases), misleading decision-making by corrupting historical plans or knowledge bases. This compromises future tasks, leading to harmful behaviors and long-term impacts on the agent’s reliability and performance, particularly in context-dependent tasks.

A diagram of a business processDescription automatically generated — *Overview of LLM agent attacking Framework (source)*

There are also attacks targeting specific agentic system designs. Multimodal systems are susceptible to adversarial attacks, where manipulated inputs can cause models to misinterpret data, leading to erroneous or harmful outputs. For instance, exploiting alpha transparency in images can deceive vision-based AI systems. In LLM-MA systems, the spread of manipulated knowledge can compromise the integrity of agent interactions, resulting in the dissemination of false information. Moreover, AI agents are vulnerable to adversarial attacks that exploit system vulnerabilities, such as state perturbations, which can significantly impair overall performance and stability.).

Malicious Use

LLM agents, while highly capable, can be misused for malicious purposes many of which could breach the EU AI Act:

Behavioral Manipulation: Multimodal augmentation and memory design enable personalized responses that may subtly influence user behavior, with reflection deepening manipulative effects. ‍
Exploitation of Vulnerabilities: By combining memory and multimodal inputs, LLMs can exploit emotional or psychological weaknesses through targeted suggestions. ‍
Social Scoring: Tool-use and memory design allow LLMs to classify users based on patterns, potentially creating unregulated social scoring systems. ‍
Predictive Policing: Memory and pattern recognition tools can lead to biased profiling, reinforced by multimodal data in predictive analytics. ‍
Unrestricted Biometric Surveillance: Multimodal augmentation can enable integration with facial recognition or voice analysis, risking privacy violations and real-time surveillance in breach of the EU AI Act.

Ensure the safety of your LLMs with Holistic AI

Understanding the risks of LLM agents is essential for their safe and responsible use. Rigorous evaluations, testing, monitoring, and audits can mitigate vulnerabilities and ensure compliance. Get in touch to find out how Holistic AI’s Safeguard can help you govern your LLMs.

DISCLAIMER: This blog article is for informational purposes only. This blog article is not intended to, and does not, provide legal advice or a legal opinion. It is not a do-it-yourself guide to resolving legal issues or handling litigation. This blog article is not a substitute for experienced legal counsel and does not provide legal advice regarding any situation or employer.

Holistic AI OSL Library

Table of contents

Heading 2

Heading 3