Thanks to its early emergence, LangChain has established itself as the framework of choice for developing LLM-based AI agents.
In this article, I will explain why you might be better off using Instructor instead.
But first, what is an AI agent?
When we talk about AI agents in the context of LLMs, we refer to building a system that has an LLM at its core but can leverage tools (functions, API calls, etc.) to provide more accurate answers to users. We also often associate AI agents with some level of autonomous decision-making. In practice, there is a spectrum, ranging from basic access to tools (searching the internet, performing math calculations) to supplement answers, all the way to full autonomous decision-making (which can be frail).
When you think about it, the core aspect of building agents is the ability to couple LLMs with other tools. The key to doing so is what we often call function calling: the ability to generate structured outputs. Let me explain.
Building an AI agent with access to the internet
Let’s say you want to build an AI agent that has access to the internet to provide up-to-date answers.
When the agent receives a request that needs an internet search before providing an answer, you want it to communicate properly with an API. To do so, the LLM must be able to generate a request body that matches the specifications of the API. In other words, the LLM must reliably generate a structured output (JSON or otherwise) that can be easily parsed and sent in the body of the request to the API.
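To make this concrete, here is a minimal sketch of what "a structured output that matches the API's specification" means in practice. The field names below are illustrative, not those of any specific search API: we describe the expected request body as a Pydantic model, so that an LLM's JSON output can be parsed and validated before it ever reaches the API.

```python
from pydantic import BaseModel, ValidationError

# A hypothetical search API request body, expressed as a Pydantic model.
# The field names are illustrative, not those of any real API.
class SearchRequest(BaseModel):
    q: str                # the search query
    num_results: int = 3  # how many results to return

# If the LLM emits JSON that matches the model, parsing succeeds...
body = SearchRequest.model_validate_json('{"q": "US president 2024", "num_results": 3}')
print(body.q)  # US president 2024

# ...and malformed output is caught immediately instead of
# silently producing a broken API request.
try:
    SearchRequest.model_validate_json('{"query": "oops"}')
except ValidationError:
    print("invalid request body")
```

This is exactly the contract we want the LLM to respect: any output that fails validation can be rejected (or retried) before it is sent over the wire.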
Back in April 2023, I was already writing about the problem of unreliable structured output generation with LLMs. Back then, we didn’t have function calling or JSON mode.
Navigating GPT-3's Output Unpredictability: A Developer's Dilemma
And that’s where Instructor shines.
What is Instructor?
It is a Python package that allows you to generate structured outputs that adhere to a predefined Pydantic model. Isn’t that cool?
For the basics, you can check out my previous article on Instructor:
Drop LangChain, Instructor Is All You Need For Your LLM-Based Applications
Here, I want to show you how to use Instructor to actually build an AI agent that has access to the internet.
Here is the code:
import inspect
import os
from difflib import SequenceMatcher
from typing import Callable, Type

import instructor
import requests
from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel, Field, create_model

load_dotenv()

# What makes an agent? An LLM as its brain + some form of memory + tools to interact with the environment.
# Let's create a simple agent that can remember a few things and interact with the environment.

# Define the agent class
class Agent:
    def __init__(self, name: str):
        self.name = name
        self.brain = instructor.from_openai(OpenAI())
        self.memory = []
        self.tools = []

    def remember(self, memory: str):
        self.memory.append(memory)

    # Retrieve the closest memory using textual similarity from the Python standard library
    def retrieve(self, query: str):
        return max(self.memory, key=lambda x: SequenceMatcher(None, x, query).ratio())

    # Add a tool; a tool should be a callable
    def add_tool(self, tool: Callable):
        self.tools.append(tool)

    # The names of the available tools as a comma-separated string
    def tools_list(self):
        return ",".join([tool.__name__ for tool in self.tools])

    def get_tool(self, tool_name: str):
        for tool in self.tools:
            if tool.__name__ == tool_name:
                return tool
        raise ValueError(f"Tool {tool_name} not found")

    class BasicResponse(BaseModel):
        response: str
        rationale: str

    def use_brain(self, prompt: str, response_model: Type[BaseModel] = BasicResponse):
        return self.brain.chat.completions.create(
            model="gpt-3.5-turbo",
            response_model=response_model,
            messages=[
                {"role": "system", "content": f"You are a helpful assistant named {self.name}. You are here to help the user with their queries. Your internal knowledge base might be outdated, so you should rely on the internet for the most recent information. Your knowledge cutoff is 2022, so for anything beyond that you have to search the internet. You can use the following tools: {self.tools_list()}"},
                {"role": "user", "content": prompt},
            ],
        )

    # Use a tool by name
    def use_tool(self, tool_name: str, *args, **kwargs):
        for tool in self.tools:
            if tool.__name__ == tool_name:
                return tool(*args, **kwargs)
        raise ValueError(f"Tool {tool_name} not found")

    def get_tool_signature(self, tool: Callable) -> dict:
        """
        Extracts the parameters from a tool's signature and prepares them for a Pydantic model.
        """
        signature = inspect.signature(tool)
        fields = {
            param.name: (param.annotation, ...)
            for param in signature.parameters.values()
            if param.name != "self" and param.kind in [param.POSITIONAL_OR_KEYWORD, param.KEYWORD_ONLY]
        }
        return fields

    def generate_pydantic_model_for_tool(self, tool: Callable, model_name: str = "DynamicToolModel") -> Type[BaseModel]:
        """
        Generates a Pydantic model dynamically based on the tool's signature.
        """
        fields = self.get_tool_signature(tool)
        dynamic_model = create_model(model_name, **fields)
        return dynamic_model

    # Run the agent with a directive and let it use tools and memory to achieve it
    def run(self, directive: str):
        print(f"{self.name} is running with the directive: {directive}")

        # Use the LLM to generate a plan
        class Step(BaseModel):
            step: str
            tool: str = Field(default="", description=f"The tool to use; can be empty if no tool is needed. The tool should be one of {self.tools_list()}")
            rationale: str

        class Plan(BaseModel):
            steps: list[Step]

        plan = self.use_brain(directive, Plan)
        for step in plan.steps:
            print(f"Step: {step.step}")
            print(f"Rationale: {step.rationale}")
            # Check if there is a tool to use
            if step.tool:
                print(f"Tool: {step.tool}")
                # Build a Pydantic model that matches the tool's signature dynamically,
                # then use the brain to generate arguments matching that signature,
                # based on the step name and rationale
                tool_model = self.generate_pydantic_model_for_tool(self.get_tool(step.tool))
                arguments = self.use_brain(f"Generate arguments for {step.tool} based on {step.step} and {step.rationale} to achieve {directive}", tool_model)
                # Use the tool
                tool_output = self.use_tool(step.tool, **arguments.model_dump())
                print(f"Tool output: {tool_output}")
                # Remember the step, the rationale, and the result all at once
                self.remember(f"{step.step} - {step.rationale} - {tool_output}")
            else:
                output = self.use_brain(f"Do {step.step} based on {step.rationale} to achieve {directive}")
                self.remember(f"{step.step} - {step.rationale} - {output}")

        # Gather the memory of all the most recent steps and use it to generate a conclusion
        recent_memory = " ".join(self.memory[-len(plan.steps):])

        class Conclusion(BaseModel):
            conclusion: str
            continue_agent: bool

        conclusion = self.use_brain(f"Generate a conclusion based on directive {directive} and responses gathered after taking some steps {recent_memory}. If the conclusion is satisfactory, stop the agent.", Conclusion)
        print(f"Conclusion: {conclusion.conclusion}")
        if conclusion.continue_agent:
            self.run(conclusion.conclusion)


# Create an agent
agent = Agent("Agent 1")

# Define a tool to search the internet using SerpAPI
def search_on_internet(query: str) -> dict:
    search_url = "https://serpapi.com/search"
    params = {
        "q": query,
        "api_key": os.getenv("SERPAPI_API_KEY"),
    }
    response = requests.get(search_url, params=params)
    # Return only the top 3 results and the first snippet
    results = response.json()["organic_results"]
    return {
        "results": results[:3],
        "text": results[0]["snippet"],
    }

# Add the tool to the agent
agent.add_tool(search_on_internet)

# Run the agent
agent.run("Who is the current president of the United States in July 2024?")
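The dynamic-model trick at the heart of `generate_pydantic_model_for_tool` is worth isolating, because it is what lets the agent accept any tool without hand-writing an argument schema for each one. Here is a self-contained sketch (the helper and tool names are illustrative) showing how `inspect.signature` and `pydantic.create_model` combine to turn a function signature into a validation model:

```python
import inspect
from typing import Callable, Type

from pydantic import BaseModel, create_model

def model_from_signature(tool: Callable, name: str = "ToolArgs") -> Type[BaseModel]:
    # One Pydantic field per parameter: (annotation, ...) marks it as required
    fields = {
        p.name: (p.annotation, ...)
        for p in inspect.signature(tool).parameters.values()
        if p.name != "self"
    }
    return create_model(name, **fields)

def search_on_internet(query: str) -> dict:  # stand-in for the agent's tool
    ...

ArgsModel = model_from_signature(search_on_internet)
args = ArgsModel(query="current US president")
print(args.model_dump())  # {'query': 'current US president'}
```

Because the generated model is an ordinary Pydantic model, it can be passed directly to Instructor as a `response_model`, which is exactly what `Agent.run` does before calling the tool with `**arguments.model_dump()`.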
If you want to better understand that code or find it difficult to implement, check out my Instructor tutorial on Lycee AI. It is still in the making, and more chapters are dropping by the end of this week: