Building AI Agents Using LLMs
- Brikesh Kumar
- May 20, 2024
- 5 min read
Updated: May 25, 2024
In our previous blog post on demystifying machine learning, we discussed the fundamentals of Large Language Models (LLMs) and how they work. If you missed it, you can catch up here. Building on that foundation, we now turn our attention to a more specialized application of LLMs: designing AI agents.
Andrew Ng recently highlighted a crucial trend in AI development, stating, "Multi-agent collaboration has emerged as a key AI agentic design pattern." This perspective underscores the evolving landscape of AI architecture, where AI agents can create more intelligent, responsive systems.
In this blog post, we'll explore how LLMs can be used to design sophisticated AI agents, leveraging their advanced capabilities to enhance interaction, decision-making, and collaboration. By integrating LLMs into AI agent design, we can unlock new potentials and address complex challenges across various domains.
What is an AI Agent?
An AI agent can be described as a system that uses an LLM to reason through a problem, create a plan to solve it, and execute that plan with the help of a set of tools. The key components of an AI agent are:
| Component | Description |
| --- | --- |
| Agent Core | The central module responsible for coordination and decision-making within the agent. It manages the agent's goals and the tools it can access; it is the key decision module. |
| Memory Module | Stores interactions and internal logs. It includes short-term memory for tracking the agent's immediate thought process and long-term memory for storing interactions over extended periods. |
| Tools | Specific functionalities or APIs that the agent uses to perform tasks such as retrieving information, performing calculations, or interfacing with other software. |
| Planning Module | Decomposes complex tasks into manageable actions and plans the agent's activities so it can efficiently solve problems or answer questions. |
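To make these pieces concrete, here is a minimal structural sketch of how the four components might be wired together. The class and field names are illustrative assumptions, not a prescribed interface:

from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class MemoryModule:
    short_term: List[str] = field(default_factory=list)  # reasoning trace for the current question
    long_term: List[str] = field(default_factory=list)   # interaction history across sessions

@dataclass
class Agent:
    goals: str                              # general goals / persona of the agent
    tools: Dict[str, Callable[[str], str]]  # tool name -> callable (search, math, internal API, ...)
    planner: Callable[[str], List[str]]     # planning module: question -> list of sub-tasks
    memory: MemoryModule = field(default_factory=MemoryModule)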
Here is a sequence diagram describing the user interaction and how an AI agent coordinates with these components to execute a task and answer user questions.
Let's look at each of these components in detail and examine their functions.
Agent Core
The agent core serves as the central coordination unit for an AI agent, managing its core logic and behavior. Think of it as the agent's "key decision-making module." It encompasses:
General Goals: Defines the objectives and aims of the agent.
Tools: A concise guide to the tools the agent can utilize.
Planning Modules: Instructions on when and how to use various planning modules.
Memory: Dynamically filled with past interactions based on the user's current queries.
Agent Persona: A description that can influence the agent's tool preferences and response style to imbue it with specific characteristics.
Here is what the agent core's prompt to the LLM could look like:
template = """GENERAL INSTRUCTIONS
Your task is to answer questions. If you cannot answer the question, request a helper or use a tool. Fill with Nil where no tool or helper is required.
AVAILABLE TOOLS
- Search Tool
- Math Tool
- Internal APIs
AVAILABLE HELPERS
- Decomposition: Breaks Complex Questions down into simpler subparts
CONTEXTUAL INFORMATION
<No previous questions asked>
QUESTION
How much did the revenue grow between Q1 of 2024 and Q2 of 2024?
ANSWER FORMAT
{"Tool_Request": "<Fill>", "Helper_Request "<Fill>"}"""
Memory Module
Memory modules are vital for AI agents, storing the agent's internal logs and user interactions.
Short-term memory: Tracks the agent’s immediate thoughts and actions while answering a single question.
Long-term memory: Maintains a history of interactions between the user and agent over weeks or months.
Retrieval: Uses a composite score based on semantic similarity, importance, recency, and other metrics to retrieve specific information.
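One way such a composite retrieval score could be computed is sketched below; the weights, decay rate, and memory fields (embedding, importance, last_accessed) are assumptions for illustration:

import math
import time

def retrieval_score(item, query_embedding, now=None,
                    w_sim=0.6, w_imp=0.2, w_rec=0.2):
    # Composite score: semantic similarity + importance + recency
    now = now or time.time()
    # Cosine similarity between the query and the stored memory
    dot = sum(a * b for a, b in zip(item.embedding, query_embedding))
    norm = math.sqrt(sum(a * a for a in item.embedding)) * \
           math.sqrt(sum(b * b for b in query_embedding))
    similarity = dot / norm if norm else 0.0
    # Exponential decay so memories touched long ago score lower
    hours_idle = (now - item.last_accessed) / 3600
    recency = math.exp(-0.1 * hours_idle)
    return w_sim * similarity + w_imp * item.importance + w_rec * recency

def retrieve(memories, query_embedding, k=5):
    # Return the top-k memories by composite score
    ranked = sorted(memories, key=lambda m: retrieval_score(m, query_embedding), reverse=True)
    return ranked[:k]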
Tools
Tools are defined workflows or APIs that agents use to perform tasks. Examples include:
RAG pipeline: Performs context-aware semantic search over internal data sources.
Code interpreter: Executes code to perform the computations a task requires.
Information search APIs: Access internet data or other internal data sources.
Utility APIs: Weather services, messaging services, or internal APIs within an organization.
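In practice, tools are often registered as plain callables that the agent core looks up by name when the LLM requests one. A minimal sketch, with hypothetical tool implementations:

def search_tool(query: str) -> str:
    # Hypothetical wrapper around a search API or RAG pipeline
    return f"search results for: {query}"

def math_tool(expression: str) -> str:
    # Evaluate simple arithmetic; a production agent would use a safer interpreter
    return str(eval(expression, {"__builtins__": {}}, {}))

# Registry the agent core consults when the LLM names a tool
TOOLS = {
    "Search Tool": search_tool,
    "Math Tool": math_tool,
}

def run_tool(name: str, argument: str) -> str:
    return TOOLS[name](argument)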
Planning Module
The planning module is a key component of an AI agent, and its complexity can vary with the use case. For complex problems, an LLM-powered agent can use:
Question Decomposition: Breaks down complex questions into manageable parts. For example, "What are the top three factors affecting our Q2 sales performance?" could be decomposed into tasks like retrieving sales data from internal systems, calculating sales growth rates and trends, and searching for industry-wide factors affecting sales.
Here is a prompt template for the decomposition module:
decomp_template = """GENERAL INSTRUCTIONS
You are a domain expert. Your task is to break down a complex question into simpler sub-parts.
USER QUESTION
{{user_question}}
ANSWER FORMAT
{"sub-questions":["<FILL>"]}""
Reflection or Critic Techniques: Methods like ReAct, Reflexion, Chain of Thought, and Graph of Thought improve reasoning and refine the agent's execution plans.
The agent doesn't simply execute its plan blindly. Instead, it performs an internal reflection process using techniques such as these:
ReAct (Reasoning and Acting): This technique interleaves reasoning with action. The agent records a short thought about what to do next, takes an action (such as calling a tool), observes the result, and reasons again, repeating until it reaches an answer. This keeps the plan grounded in actual observations rather than assumptions.
Reflexion: This technique has the agent critique its own previous attempts. Feedback such as errors, failed tool calls, or an unsatisfying answer is turned into a short verbal reflection that is stored in memory and used to improve the next attempt.
Chain of Thought (CoT): This technique involves the agent explicitly recording the thought process that led to its plan. This "chain" of reasoning steps can be reviewed later, helping the agent identify potential biases or flaws in its logic and improve future decision-making.
Graph of Thought (GoT): This technique builds upon CoT by creating a more elaborate internal representation of the agent's thought process. This "graph" might connect different concepts, assumptions, and evidence used to reach a conclusion, allowing for a more comprehensive review and potential refinement of the plan.
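As a concrete illustration, a bare-bones ReAct-style loop alternates between a Thought, an Action (a tool call), and an Observation until the model commits to an answer. The sketch below reuses the hypothetical call_llm and run_tool helpers from earlier and assumes the model replies in the requested JSON format:

import json

def react_loop(question: str, max_steps: int = 5) -> str:
    scratchpad = ""
    for _ in range(max_steps):
        # Ask the LLM for the next thought and action given the trace so far
        step = json.loads(call_llm(
            f"Question: {question}\n{scratchpad}\n"
            'Reply as {"thought": "...", "action": "...", "action_input": "...", "final_answer": ""}'
        ))
        if step.get("final_answer"):
            # The model is confident it can answer without another tool call
            return step["final_answer"]
        # Execute the chosen tool and append the observation to the scratchpad
        observation = run_tool(step["action"], step["action_input"])
        scratchpad += (f"Thought: {step['thought']}\n"
                       f"Action: {step['action']}[{step['action_input']}]\n"
                       f"Observation: {observation}\n")
    return "No answer reached within the step budget."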
Once the basic components are ready, we need a mechanism by which the agent executes the plan. There are usually three choices:
Linear Solver: The simplest option: the agent does a single pass with one level of planning and executes based on the LLM's recommendation. However, any moderately complex question requires multiple passes and interactions with tools to arrive at an answer.
Single-thread recursive solver: This is akin to breaking a question into simpler sub-questions and solving them recursively until all of them are answered, then generating the final answer. Here is pseudo-code that illustrates the flow:
def Agent_Core(Question, Context):
    # Determine the next action based on the current context and question
    Action = LLM(Context + Question)
    if Action == "Decomposition":
        # Break down the question into smaller sub-questions
        Sub_Questions = LLM(Question)
        # Recursively solve each sub-question and fold the answers into the context
        for Sub_Question in Sub_Questions:
            Context += Agent_Core(Sub_Question, Context)
        # Continue solving the main question with the enriched context
        return Agent_Core(Question, Context)
    elif Action == "Search Tool":
        # Use a search (RAG) pipeline to find supporting information
        Answer = RAG_Pipeline(Question)
        # Update the context with the new information
        Context += Answer
        # Continue solving the main question
        return Agent_Core(Question, Context)
    elif Action == "Gen Final Answer":
        # Generate the final answer based on the accumulated context
        return LLM(Context)
    elif Action == "<Another Tool>":
        # Execute another specified tool
        return Execute_Another_Tool()
Multi-thread recursive solver: The multi-thread solver does the same thing but solves the sub-questions in parallel and merges the results.
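A sketch of the parallel variant, building on the pseudo-code above; decompose is the hypothetical helper from the planning section, and ThreadPoolExecutor fans the sub-questions out concurrently:

from concurrent.futures import ThreadPoolExecutor

def solve_in_parallel(question: str, context: str) -> str:
    # Decompose once, then solve each sub-question concurrently
    sub_questions = decompose(question)
    with ThreadPoolExecutor() as pool:
        partial_answers = list(pool.map(lambda q: Agent_Core(q, context), sub_questions))
    # Merge the partial answers into the context and generate the final answer
    merged_context = context + "\n".join(partial_answers)
    return LLM(merged_context)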
Developer Ecosystem
Several developer tools and libraries provide the abstractions needed to create AI agents, and many ship with a set of tools out of the box. Frameworks such as LangChain, LlamaIndex, and Microsoft's AutoGen, for example, offer agent, tool, and memory abstractions along with ready-made integrations.
Conclusion
AI agents represent a powerful way in which large language models can be made to perform a wide range of tasks, from answering questions and retrieving information to generating content and automating workflows. For more in-depth discussions and updates on the latest AI and ML trends, don't forget to follow us on LinkedIn and X (Twitter). Stay tuned for more informational blogs and innovations in the world of AI!