Introduction
In our previous blog post (Building AI Agent Design Using LLM), we discussed the basics of building AI agents using Large Language Models (LLMs). Today, we'll take it a step further and explore how to use the LlamaIndex library to create an AI agent. This continuation takes a practical, hands-on approach, complete with code snippets you can run in Google Colab.
Introduction to LlamaIndex
LlamaIndex is a powerful framework designed for building context-augmented applications using Large Language Models (LLMs). This approach enhances LLMs with your private or domain-specific data, enabling use cases such as:
Question-answering chatbots (RAG systems)
Document understanding and extraction
Autonomous agents for research and action
LlamaIndex equips you with the tools and rich abstractions to develop these applications from prototype to production, offering capabilities for data ingestion, processing, and implementing complex query workflows with LLM prompts. It is available in both Python and TypeScript.
Key Components in LlamaIndex Agent Libraries
In our previous blog post, "Building AI Agent Design Using LLM," we explored the general design of an AI agent, highlighting its key components: the memory module, tools, and planning module, and their interactions. LlamaIndex simplifies these interactions with its AI Agents framework, allowing you to select and configure elements such as tools, the LLM, memory, and more, providing a flexible foundation for developing your AI applications. Here is a brief synopsis of the components involved.
Tools
This is a broad term encompassing the various functionalities within LlamaIndex that help you construct your AI agent. Tools might include modules for data management, retrieval, response generation, and interaction with LLMs. You can inject your own custom tools or select from community-created tools on LlamaHub (https://llamahub.ai/?tab=tools).
Retriever
A specific tool within LlamaIndex that retrieves relevant information from different data sources based on the user's query. It interacts with data sources such as document search engines or knowledge bases to find the most pertinent information for the LLM to process. This is useful if you're creating an agent that must choose different tools for different kinds of documents.
What if I want to call my own Python function?
Python functions are Python objects, not text, so to store them in an index and later retrieve them as tools we need to serialize them. LlamaIndex provides ObjectIndex, which handles this for us. Here is how we can do it.
# Define an "object" index and retriever over these tools.
# all_tools is assumed to be a list of tool objects you have already created.
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex
obj_index = ObjectIndex.from_objects(
    # Specify the tools
    all_tools,
    index_cls=VectorStoreIndex,
)
# Create a retriever that returns the 3 most similar tools
obj_retriever = obj_index.as_retriever(similarity_top_k=3)
# Get the tools (Python functions) best suited to the calculation
tools = obj_retriever.retrieve(
    "Calculate the net profit in Amazon's 2023 report"
)
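To make the retrieval step more concrete, here is a minimal, library-free sketch of what "retrieve the most relevant tools for a query" means. It is not how ObjectIndex works internally: it swaps embedding similarity for a simple word-overlap score, and the `Tool` tuple, the three example functions, and the `retrieve` helper are all hypothetical names made up for illustration.

```python
from typing import Callable, NamedTuple


class Tool(NamedTuple):
    """Hypothetical stand-in for a tool: a function plus a text description."""
    name: str
    description: str
    fn: Callable[..., float]


def net_profit(revenue: float, expenses: float) -> float:
    return revenue - expenses


def profit_margin(revenue: float, expenses: float) -> float:
    return (revenue - expenses) / revenue


def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


all_tools = [
    Tool("net_profit", "calculate the net profit from revenue and expenses", net_profit),
    Tool("profit_margin", "calculate the profit margin as a fraction of revenue", profit_margin),
    Tool("celsius_to_fahrenheit", "convert a temperature from celsius to fahrenheit", celsius_to_fahrenheit),
]


def word_overlap(query: str, description: str) -> float:
    """Jaccard similarity between word sets (a crude stand-in for embeddings)."""
    q, d = set(query.lower().split()), set(description.lower().split())
    return len(q & d) / len(q | d)


def retrieve(query: str, tools: list[Tool], top_k: int = 2) -> list[Tool]:
    """Return the top_k tools whose descriptions best match the query."""
    ranked = sorted(tools, key=lambda t: word_overlap(query, t.description), reverse=True)
    return ranked[:top_k]


selected = retrieve("Calculate the net profit in Amazon's 2023 report", all_tools)
print([tool.name for tool in selected])
```

The point to notice is that only the descriptions are compared against the query; the functions themselves are carried along and handed back to the agent once their descriptions match.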
Router
Another specific tool that acts as a decision-making component for your agent. It analyzes the user's query and potential outputs from retrievers (if used) using the LLM. Based on this analysis, the router makes informed decisions about how to proceed. This might involve selecting the most relevant data source, choosing a response generation strategy, or combining information from multiple sources.
Memory
The memory module stores past conversations. It can be customized; by default, it is a flat list of messages kept in a rolling buffer whose size depends on the LLM's context window.
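As a rough illustration of the rolling-buffer idea, the sketch below keeps every message in a flat list but only returns the most recent ones that fit within a token budget. It is an assumption-laden toy, not LlamaIndex's memory class: `RollingBufferMemory` is a hypothetical name, and tokens are approximated by whitespace-separated words rather than a real tokenizer.

```python
class RollingBufferMemory:
    """Toy chat memory: a flat list with a token-limited read window."""

    def __init__(self, token_limit: int) -> None:
        self.token_limit = token_limit
        self.messages: list = []  # flat list of (role, content) pairs

    @staticmethod
    def count_tokens(text: str) -> int:
        # Assumption: word count stands in for a real tokenizer.
        return len(text.split())

    def put(self, role: str, content: str) -> None:
        self.messages.append((role, content))

    def get(self) -> list:
        """Return the newest messages that fit in the token limit, oldest first."""
        kept = []
        used = 0
        for role, content in reversed(self.messages):
            cost = self.count_tokens(content)
            if used + cost > self.token_limit:
                break
            kept.append((role, content))
            used += cost
        return list(reversed(kept))


memory = RollingBufferMemory(token_limit=8)
memory.put("user", "What were Amazon's key risks in 2023?")
memory.put("assistant", "Competition, expansion, and market risks.")
memory.put("user", "Compare operating expenses.")
window = memory.get()
print(window)
```

With a budget of 8 "tokens", the oldest message no longer fits, so only the last two messages would be sent to the LLM on the next turn.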
Agent
LlamaIndex agents are composed of two components: AgentRunner and AgentWorker. The design is inspired by the Agent Protocol, which tries to standardize the architecture of an AI agent.
AgentRunner (Orchestrator)
o This is like the head chef.
o It keeps track of orders (user requests), remembers past orders (conversation history), and assigns tasks to different cooks (workers).
o It also provides the menu (user interface) for customers to place their orders (interact with the agent).
AgentWorker (Cook)
o These are the cooks who follow the recipe (task) step by step.
o They receive instructions from the head chef (runner) and use any available ingredients (state information) to complete each step.
o They don't remember the entire recipe themselves but focus on the current step.
o The head chef (runner) collects the finished dish (result) from each cook (worker).
In short:
AgentRunner - Manages tasks, remembers stuff, and handles user interaction. It provides two key functions for the user to interact with, Query and Chat.
AgentWorker - Follows instructions step-by-step to complete tasks.
We can create a custom agent using AgentWorker and AgentRunner, or we can just use a prebuilt agent (OpenAIAgent or ReActAgent) that comes with the LlamaIndex library.
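The division of labor between runner and worker can be sketched in a few lines of plain Python. This is only a conceptual toy under stated assumptions, not the LlamaIndex classes: `ToyRunner`, `ToyWorker`, and `Task` are hypothetical names, and the "planning" is a naive split of the message on the word "and".

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """State for one user request, owned by the runner."""
    user_input: str
    pending_steps: list
    results: list = field(default_factory=list)


class ToyWorker:
    """Executes a single step; keeps no memory of its own."""

    def run_step(self, step: str, task: Task) -> str:
        return f"done: {step}"


class ToyRunner:
    """Creates tasks, drives the worker step by step, and keeps the history."""

    def __init__(self, worker: ToyWorker) -> None:
        self.worker = worker
        self.history: list = []

    def chat(self, message: str) -> str:
        # Hypothetical planning: treat each " and "-separated clause as a step.
        task = Task(user_input=message, pending_steps=message.split(" and "))
        while task.pending_steps:
            step = task.pending_steps.pop(0)
            task.results.append(self.worker.run_step(step, task))
        response = "; ".join(task.results)
        self.history.append(("user", message))
        self.history.append(("assistant", response))
        return response


runner = ToyRunner(ToyWorker())
reply = runner.chat("summarize risks and compare expenses")
print(reply)
```

The runner owns the task state and the conversation history, while the worker only ever sees the current step, which mirrors the head chef and cook analogy.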
Putting it all together
Code snippets
Here is the link to the code snippet (Intro to LLM agent) on GitHub.
We’ll go through the steps and understand the output.
This example works with Amazon's 2023 annual report:
Download Amazon's 2023 annual report, a 92-page document
Save the report in a local directory
Setup: import the LlamaIndex and OpenAI libraries and set the API key
Specify the LLM to use in the LlamaIndex Settings object
Create and add tools. This example creates a query tool backed by a VectorStoreIndex, which stores embeddings of the document chunks
Create an agent. Note that we can create a new agent using the AgentRunner and AgentWorker classes. In this example, I've used the StructuredPlannerAgent class, which wraps any agent worker (ReAct, Function Calling, Chain-of-Abstraction, etc.) and decomposes an initial input into several sub-tasks. Each sub-task is represented by an input, an expected outcome, and any dependent sub-tasks that should be completed first.
Give a complex question to the agent:
Summarize the key risks for Amazon in their 2023 report. And compare the operating expense between 2022 and 2023
Install the libraries
%pip install llama-index-agent-openai
%pip install llama-index-llms-openai
%pip install llama-index
Download the report and save it in a local directory
!mkdir -p 'sample_data/'
!wget 'https://s2.q4cdn.com/299287126/files/doc_financials/2024/ar/Amazon-com-Inc-2023-Annual-Report.pdf' -O 'sample_data/amazon_2023.pdf'
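Set up the OpenAI key
import os
from getpass import getpass
# Prompt for the key instead of hard-coding it in the notebook
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")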
Specify the LLM to use and import the LlamaIndex classes
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Use OpenAI's gpt-4o as the LLM, with a low temperature for more consistent answers
Settings.llm = OpenAI(
    model="gpt-4o",
    temperature=0.1,
)
# Use an OpenAI embedding model to embed the document chunks
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Create a tool. Here we can use tools available on LlamaHub or create our own (for example, a tool that calls a specific API), and add them to the list of tools the agent can use
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.tools import QueryEngineTool
# Load the report and build a vector index over it
amazon_documents = SimpleDirectoryReader(
    input_files=["./sample_data/amazon_2023.pdf"]
).load_data()
amazon_index = VectorStoreIndex.from_documents(amazon_documents)
# Wrap the index's query engine as a tool the agent can call
amazon_tool = QueryEngineTool.from_defaults(
    amazon_index.as_query_engine(),
    name="amazon_2023",
    description="Useful for asking questions about Amazon's 2023 report filing.",
)
Create the worker and the agent (in this case we use the built-in StructuredPlannerAgent) and specify the tools
from llama_index.core.agent import (
    StructuredPlannerAgent,
    FunctionCallingAgentWorker,
    ReActAgentWorker,  # alternative worker, not used below
)
# Create the function calling worker for reasoning
worker = FunctionCallingAgentWorker.from_tools(
    [amazon_tool], verbose=True
)
# Wrap the worker in the top-level planner
agent = StructuredPlannerAgent(
    worker, tools=[amazon_tool], verbose=True
)
Add this code snippet to allow nested event loops, which the LlamaIndex library needs when it runs inside a notebook
import nest_asyncio
nest_asyncio.apply()
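Notebook environments such as Google Colab already run an asyncio event loop, and Python refuses to start a second one from inside it, which is why the nest_asyncio patch is applied. The standard-library sketch below reproduces that restriction outside a notebook; the coroutine names `inner` and `outer` are purely illustrative.

```python
import asyncio


async def inner() -> str:
    return "done"


async def outer() -> str:
    """Try to start a second event loop while one is already running."""
    coro = inner()
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "no error"


message = asyncio.run(outer())
print(message)
```

Without the patch, a library call that internally uses `asyncio.run` would hit this same error when invoked from a notebook cell.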
Ask a complex question
response = agent.chat(
    "Summarize the key risks for Amazon in their 2023 report. And compare the operating expense between 2022 and 2023"
)
We set verbose=True so we can inspect the interaction for debugging. Let's peek at the output.
Here is the plan it came up with. It broke the question down into the following sub-tasks:
List the key risks mentioned in the 2023 report (no dependencies).
Get Amazon's operating expense in 2022 (no dependencies).
Get Amazon's operating expense in 2023 (no dependencies).
Compare the operating expenses, which depends on steps 2 and 3.
Create the final summary with the risks and the expense comparison, which depends on steps 1 and 4.
=== Initial plan ===
Identify Key Risks:
What are the key risks mentioned in Amazon's 2023 report? -> A list of key risks mentioned in Amazon's 2023 report.
deps: []
Get Operating Expense 2022:
What was Amazon's operating expense in 2022? -> Amazon's operating expense in 2022.
deps: []
Get Operating Expense 2023:
What was Amazon's operating expense in 2023? -> Amazon's operating expense in 2023.
deps: []
Compare Operating Expenses:
Compare Amazon's operating expense between 2022 and 2023. -> A comparison of Amazon's operating expense between 2022 and 2023.
deps: ['Get Operating Expense 2022', 'Get Operating Expense 2023']
Summarize Key Risks and Compare Operating Expenses:
Summarize the key risks for Amazon in their 2023 report and compare the operating expense between 2022 and 2023. -> A summary of the key risks for Amazon in their 2023 report and a comparison of the operating expense between 2022 and 2023.
deps: ['Identify Key Risks', 'Compare Operating Expenses']
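The `deps` lists in this plan define a dependency graph, and any valid execution order has to respect it. As a small sketch of what those dependencies imply (not a claim about how StructuredPlannerAgent schedules work internally), the plan can be copied into a dictionary and ordered with Python's standard `graphlib` module:

```python
from graphlib import TopologicalSorter

# Sub-task name -> names of sub-tasks it depends on, copied from the plan above.
plan = {
    "Identify Key Risks": [],
    "Get Operating Expense 2022": [],
    "Get Operating Expense 2023": [],
    "Compare Operating Expenses": [
        "Get Operating Expense 2022",
        "Get Operating Expense 2023",
    ],
    "Summarize Key Risks and Compare Operating Expenses": [
        "Identify Key Risks",
        "Compare Operating Expenses",
    ],
}

sorter = TopologicalSorter(plan)
sorter.prepare()

# Group sub-tasks into batches whose dependencies are already satisfied.
batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    batches.append(ready)
    sorter.done(*ready)

for number, batch in enumerate(batches, start=1):
    print(number, batch)
```

The three lookups have no dependencies and can run first, the comparison has to wait for both expense figures, and the final summary comes last.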
At each step, it added the interaction to memory.
Finally, it analyzed the results and produced the final output:
> Running step aa9b6e4b-baf5-4b7d-a453-0be654a5e676. Step input: Compare Amazon's operating expense between 2022 and 2023.
Added user message to memory: Compare Amazon's operating expense between 2022 and 2023.
=== LLM Response ===
### Comparison of Amazon's Operating Expense Between 2022 and 2023
- **Operating Expense in 2022**: $501.735 billion
- **Operating Expense in 2023**: $537.933 billion
### Analysis
- **Increase in Operating Expense**: $537.933 billion (2023) - $501.735 billion (2022) = $36.198 billion
- **Percentage Increase**: \(\left(\frac{36.198 \text{ billion}}{501.735 \text{ billion}}\right) \times 100 \approx 7.22\%\)
### Summary
Amazon's operating expense increased by approximately $36.198 billion from 2022 to 2023, representing a percentage increase of about 7.22%. This indicates a significant rise in the company's operating costs over the year.
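Since LLMs can slip on arithmetic, it is worth recomputing the figures in the response. The short check below uses only the two operating expense values quoted in the output above; the variable names are illustrative.

```python
# Operating expense figures (in billions of dollars) as quoted in the agent's output.
opex_2022 = 501.735
opex_2023 = 537.933

increase = opex_2023 - opex_2022
pct_increase = increase / opex_2022 * 100

print(f"Increase: ${increase:.3f} billion")
print(f"Percentage increase: {pct_increase:.2f}%")
```

The dollar increase matches the agent's $36.198 billion, while the recomputed percentage comes out at about 7.21%, marginally lower than the 7.22% the model reported.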
Here is the summary of key risks it produced:
### Summary of Key Risks for Amazon in 2023
1. **Intense Competition**
- Amazon faces significant competition across various sectors and regions, including e-commerce, physical retail, and web services.
- Competitors may have more resources, stronger brand recognition, and may use aggressive pricing and marketing strategies.
2. **Expansion Risks**
- Entering new markets, products, services, and technologies involves additional risks.
- Challenges include limited experience, potential service disruptions, and the risk that new ventures may not achieve expected profitability or recover investments.
3. **Economic and Geopolitical Conditions**
- Global economic and geopolitical uncertainties, along with unforeseen events, can impact Amazon's operations and financial performance.
4. **Customer Impact**
- Risks affecting Amazon's customers, including third-party sellers, can indirectly harm Amazon's business.
5. **Market Risks**
- **Interest Rate Risk**: Exposure to changes in interest rates that could affect financial performance. Investments in marketable debt securities with fixed interest rates may lose value if interest rates rise.
Conclusion
AI agents demonstrate the extensive capabilities of large language models, performing tasks from question answering and information retrieval to content generation and workflow automation. To stay updated on AI and ML advancements, follow us on LinkedIn and X (Twitter).