Thanks to its early emergence, LangChain has established itself as the framework of choice for developing LLM-based AI agents.
In this article, I will explain why you might be better off using Instructor instead.
But first, what is an AI agent?
When we talk about AI agents in the context of LLMs, we refer to building a system that has an LLM at its core but can leverage tools (functions, API calls, etc.) to provide more accurate answers to users. We also often associate AI agents with some level of autonomous decision-making. In practice, there is a spectrum, ranging from basic access to tools (searching the internet, performing math calculations) to supplement answers, all the way to full autonomous decision-making (which can be frail).
When you think about it, the core aspect of building agents is the ability to couple LLMs with other tools. The key to doing so is what we often call function calling: the ability to generate structured outputs. Let me explain.
Building an AI agent with access to the internet
Let’s say you want to build an AI agent that has access to the internet to provide up-to-date answers.
When the agent receives a request that needs an internet search before providing an answer, you want it to communicate properly with an API. To do so, the LLM must be able to generate a request body that matches the specifications of the API. In other words, the LLM must reliably generate a structured output (JSON or otherwise) that can be easily parsed and sent in the body of the request to the API.
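To make this concrete, here is a minimal sketch of what "a structured output that matches the API's specification" means in practice. The field names below are illustrative, not those of any specific search API: we describe the expected request body as a Pydantic model, so that an LLM's JSON output can be parsed and validated before it ever reaches the API.

```python
from pydantic import BaseModel, ValidationError

# A hypothetical search API request body, expressed as a Pydantic model.
# The field names are illustrative, not those of any real API.
class SearchRequest(BaseModel):
    q: str                # the search query
    num_results: int = 3  # how many results to return

# If the LLM emits JSON that matches the model, parsing succeeds...
body = SearchRequest.model_validate_json('{"q": "US president 2024", "num_results": 3}')
print(body.q)  # US president 2024

# ...and malformed output is caught immediately instead of
# silently producing a broken API request.
try:
    SearchRequest.model_validate_json('{"query": "oops"}')
except ValidationError:
    print("invalid request body")
```

This is exactly the contract we want the LLM to respect: any output that fails validation can be rejected (or retried) before it is sent over the wire.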
Back in April 2023, I was already writing about the problem of unreliable structured output generation with LLMs. Back then, we didn’t have function calling or JSON mode.
Navigating GPT-3's Output Unpredictability: A Developer's Dilemma
And that’s where Instructor shines.
What is Instructor?
It is a Python package that allows you to generate structured outputs that adhere to a predefined Pydantic model. Isn’t that cool?
For the basics, you can check out my previous article on Instructor:
Drop LangChain, Instructor Is All You Need For Your LLM-Based Applications
Here, I want to show you how to use Instructor to actually build an AI agent that has access to the internet.
Here is the code:
import inspect
import os
from difflib import SequenceMatcher
from typing import Callable, Type

import instructor
import requests
from dotenv import load_dotenv
from openai import OpenAI
from pydantic import BaseModel, Field, create_model

load_dotenv()

# What makes an agent? An LLM as its brain + some form of memory + tools to interact with the environment.
# Let's create a simple agent that can remember a few things and interact with the environment.

# Define the agent class
class Agent:
    def __init__(self, name: str):
        self.name = name
        self.brain = instructor.from_openai(OpenAI())
        self.memory = []
        self.tools = []

    def remember(self, memory: str):
        self.memory.append(memory)

    # Retrieve the closest memory using textual similarity from the Python standard library
    def retrieve(self, query: str):
        return max(self.memory, key=lambda x: SequenceMatcher(None, x, query).ratio())

    # Add a tool; a tool should be a callable
    def add_tool(self, tool: Callable):
        self.tools.append(tool)

    # The names of the available tools as a comma-separated string
    def tools_list(self):
        return ",".join([tool.__name__ for tool in self.tools])

    def get_tool(self, tool_name: str):
        for tool in self.tools:
            if tool.__name__ == tool_name:
                return tool
        raise ValueError(f"Tool {tool_name} not found")

    class BasicResponse(BaseModel):
        response: str
        rationale: str

    def use_brain(self, prompt: str, response_model: Type[BaseModel] = BasicResponse):
        return self.brain.chat.completions.create(
            model="gpt-3.5-turbo",
            response_model=response_model,
            messages=[
                {"role": "system", "content": f"You are a helpful assistant named {self.name}. You are here to help the user with their queries. Your internal knowledge base might be outdated, so you should rely on the internet for the most recent information. Your knowledge cutoff is 2022, so for anything beyond that you have to search the internet. You can use the following tools: {self.tools_list()}"},
                {"role": "user", "content": prompt},
            ],
        )

    # Use a tool by name
    def use_tool(self, tool_name: str, *args, **kwargs):
        for tool in self.tools:
            if tool.__name__ == tool_name:
                return tool(*args, **kwargs)
        raise ValueError(f"Tool {tool_name} not found")

    def get_tool_signature(self, tool: Callable) -> dict:
        """
        Extracts the parameters from a tool's signature and prepares them for a Pydantic model.
        """
        signature = inspect.signature(tool)
        fields = {
            param.name: (param.annotation, ...)
            for param in signature.parameters.values()
            if param.name != "self" and param.kind in [param.POSITIONAL_OR_KEYWORD, param.KEYWORD_ONLY]
        }
        return fields

    def generate_pydantic_model_for_tool(self, tool: Callable, model_name: str = "DynamicToolModel") -> Type[BaseModel]:
        """
        Generates a Pydantic model dynamically based on the tool's signature.
        """
        fields = self.get_tool_signature(tool)
        dynamic_model = create_model(model_name, **fields)
        return dynamic_model

    # Run the agent with a directive and let it use tools and memory to achieve it
    def run(self, directive: str):
        print(f"{self.name} is running with the directive: {directive}")

        # Use the LLM to generate a plan
        class Step(BaseModel):
            step: str
            tool: str = Field(default="", description=f"The tool to use; can be empty if no tool is needed. The tool should be one of {self.tools_list()}")
            rationale: str

        class Plan(BaseModel):
            steps: list[Step]

        plan = self.use_brain(directive, Plan)
        for step in plan.steps:
            print(f"Step: {step.step}")
            print(f"Rationale: {step.rationale}")
            # Check if there is a tool to use
            if step.tool:
                print(f"Tool: {step.tool}")
                # Build a Pydantic model that matches the tool's signature dynamically,
                # then use the brain to generate arguments matching that signature,
                # based on the step name and rationale
                tool_model = self.generate_pydantic_model_for_tool(self.get_tool(step.tool))
                arguments = self.use_brain(f"Generate arguments for {step.tool} based on {step.step} and {step.rationale} to achieve {directive}", tool_model)
                # Use the tool
                tool_output = self.use_tool(step.tool, **arguments.model_dump())
                print(f"Tool output: {tool_output}")
                # Remember the step, the rationale, and the result all at once
                self.remember(f"{step.step} - {step.rationale} - {tool_output}")
            else:
                output = self.use_brain(f"Do {step.step} based on {step.rationale} to achieve {directive}")
                self.remember(f"{step.step} - {step.rationale} - {output}")

        # Gather the memory of all the most recent steps and use it to generate a conclusion
        recent_memory = " ".join(self.memory[-len(plan.steps):])

        class Conclusion(BaseModel):
            conclusion: str
            continue_agent: bool

        conclusion = self.use_brain(f"Generate a conclusion based on directive {directive} and responses gathered after taking some steps {recent_memory}. If the conclusion is satisfactory, stop the agent.", Conclusion)
        print(f"Conclusion: {conclusion.conclusion}")
        if conclusion.continue_agent:
            self.run(conclusion.conclusion)


# Create an agent
agent = Agent("Agent 1")

# Define a tool to search the internet using SerpAPI
def search_on_internet(query: str) -> dict:
    search_url = "https://serpapi.com/search"
    params = {
        "q": query,
        "api_key": os.getenv("SERPAPI_API_KEY"),
    }
    response = requests.get(search_url, params=params)
    # Return only the top 3 results and the first snippet
    results = response.json()["organic_results"]
    return {
        "results": results[:3],
        "text": results[0]["snippet"],
    }

# Add the tool to the agent
agent.add_tool(search_on_internet)

# Run the agent
agent.run("Who is the current president of the United States in July 2024?")
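The dynamic-model trick at the heart of `generate_pydantic_model_for_tool` is worth isolating, because it is what lets the agent accept any tool without hand-writing an argument schema for each one. Here is a self-contained sketch (the helper and tool names are illustrative) showing how `inspect.signature` and `pydantic.create_model` combine to turn a function signature into a validation model:

```python
import inspect
from typing import Callable, Type

from pydantic import BaseModel, create_model

def model_from_signature(tool: Callable, name: str = "ToolArgs") -> Type[BaseModel]:
    # One Pydantic field per parameter: (annotation, ...) marks it as required
    fields = {
        p.name: (p.annotation, ...)
        for p in inspect.signature(tool).parameters.values()
        if p.name != "self"
    }
    return create_model(name, **fields)

def search_on_internet(query: str) -> dict:  # stand-in for the agent's tool
    ...

ArgsModel = model_from_signature(search_on_internet)
args = ArgsModel(query="current US president")
print(args.model_dump())  # {'query': 'current US president'}
```

Because the generated model is an ordinary Pydantic model, it can be passed directly to Instructor as a `response_model`, which is exactly what `Agent.run` does before calling the tool with `**arguments.model_dump()`.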
If you want to better understand that code or find it difficult to implement, check out my Instructor tutorial on Lycee AI. It is still in the making, and more chapters are dropping by the end of this week: