Diving into LlamaIndex AgentWorkflow: A Nearly Perfect Multi-Agent Orchestration Solution

And how to fix the issue where the agent can't continue with past requests

Diving into LlamaIndex AgentWorkflow: A Nearly Perfect Multi-Agent Orchestration Solution. Image by DALL-E-3

This article introduces you to the latest AgentWorkflow multi-agent orchestration framework by LlamaIndex, demonstrating its application through a project, highlighting its drawbacks, and explaining how I solved them.

By reading this, you'll learn how to simplify multi-agent orchestration and boost development efficiency using LlamaIndex AgentWorkflow.

The project source code discussed here is available at the end of the article; feel free to review and modify it without asking my permission.


Introduction

Recently, I had to review LlamaIndex's official documentation for work and was surprised by the drastic changes: LlamaIndex has rebranded itself from a RAG framework to a multi-agent framework integrating data and workflows. The entire documentation is now built around AgentWorkflow.

Multi-agent orchestration is not new.

For enterprise-level applications, we don’t use a standalone agent to perform a series of tasks. Instead, we prefer a framework that can orchestrate multiple agents to collaborate on completing complex business scenarios.

When it comes to multi-agent orchestration frameworks, you've probably heard of LangGraph, CrewAI, and AutoGen. However, LlamaIndex, a framework once as popular as LangChain, has seemed silent in the multi-agent space over the past six months.

Considering LlamaIndex’s high maturity and community involvement, the release of LlamaIndex AgentWorkflow caught our attention. So, my team and I studied it for a month and found that for practical applications, AgentWorkflow is a nearly perfect multi-agent orchestration solution.

You might ask: since LlamaIndex Workflow has been out for half a year, what's the difference between Workflow and AgentWorkflow? To answer this, we must first look at how multi-agent setups are done with LlamaIndex Workflow.



What Is Workflow?

I previously wrote an article detailing what LlamaIndex Workflow is and how to use it:

Deep Dive into LlamaIndex Workflow: Event-driven LLM architecture
What I think about the progress and shortcomings after practice

In simple terms, Workflow is an event-driven framework using Python asyncio for concurrent API calls to large language models and various tools.
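To make this concrete, here is a minimal sketch of a single-step Workflow. This is my own illustration, not code from the earlier article; I'm assuming the llama_index.core.workflow API, where keyword arguments passed to run() become attributes of the StartEvent:

import asyncio

from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

class EchoWorkflow(Workflow):
    """A single-step workflow: consume a StartEvent, emit a StopEvent."""

    @step
    async def echo(self, ev: StartEvent) -> StopEvent:
        # Keyword arguments passed to run() become StartEvent attributes.
        return StopEvent(result=f"You said: {ev.message}")

async def main():
    result = await EchoWorkflow().run(message="hello")
    print(result)  # You said: hello

asyncio.run(main())

Because every step is an async method, concurrent calls to LLM APIs and tools come almost for free.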

I also wrote about implementing multi-agent orchestration similar to OpenAI Swarm's agent handoff using Workflow:

Using LlamaIndex Workflow to Implement an Agent Handoff Feature Like OpenAI Swarm
Example: a customer service chatbot project

However, Workflow is a relatively low-level framework and quite disconnected from other LlamaIndex modules, necessitating frequent learning and calls to LlamaIndex's underlying API when implementing complex multi-agent logic.

If you’ve read my article, you'll notice I heavily rely on LlamaIndex’s low-level API across Workflow’s step methods for function calls and process control, leading to tight coupling between the workflow and agent-specific code. This isn’t ideal for those of us who want to finish work early and enjoy dinner at home.

Perhaps LlamaIndex heard developers’ appeals, leading to the birth of AgentWorkflow.


How Does AgentWorkflow Work?

AgentWorkflow consists of an AgentWorkflow module and an Agent module. Unlike existing LlamaIndex modules, both are tailored specifically to the new multi-agent goals. Let's first discuss the Agent module:

Agent module

The Agent module primarily consists of two classes: FunctionAgent and ReActAgent, both inheriting from BaseWorkflowAgent, hence incompatible with previous Agent classes.

Use FunctionAgent if your language model supports function calls; if not, use ReActAgent. In this article, we use function calls to complete specific tasks, so we’ll focus on FunctionAgent:

FunctionAgent mainly has three methods: take_step, handle_tool_call_results, and finalize.

Illustrations of various methods in FunctionAgent. Image by Author

The take_step method receives the current chat history (llm_input) and the tools available to the agent. It uses astream_chat_with_tools and get_tool_calls_from_response to determine the next tools to execute, storing the tool call parameters in the Context.

In addition, take_step streams the current round's agent parameters and results, which makes it easy to debug and view the agent's intermediate execution results step by step.

The handle_tool_call_results method doesn't directly execute tools – tools are invoked concurrently by AgentWorkflow itself. It merely saves the tool execution results in the Context.

The finalize method accepts an AgentOutput parameter but doesn’t alter it. Instead, it extracts tool call stacks from the Context, saving them as chat history in ChatMemory.

You can inherit and override FunctionAgent methods to implement your business logic, which I’ll demonstrate in the upcoming project practice.
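As a taste of what that looks like, here is a hedged sketch of a FunctionAgent subclass that overrides take_step. I'm assuming take_step's signature matches BaseWorkflowAgent in llama_index.core.agent.workflow, and the logging is purely illustrative:

from llama_index.core.agent.workflow import AgentOutput, FunctionAgent

class LoggingFunctionAgent(FunctionAgent):
    """A FunctionAgent that logs what it is about to do (illustrative only)."""

    async def take_step(self, ctx, llm_input, tools, memory) -> AgentOutput:
        # Hook point: inspect or modify the chat history and tools
        # before the default LLM call happens.
        print(f"[{self.name}] calling the LLM with {len(tools)} tools")
        output = await super().take_step(ctx, llm_input, tools, memory)
        # Hook point: inspect the tool calls the LLM decided to make.
        print(f"[{self.name}] planned tool calls: {output.tool_calls}")
        return output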

AgentWorkflow module

Having covered the Agent module, let’s delve into the AgentWorkflow module.

In previous projects, I implemented an orchestration process based on Workflow. This was the flowchart at that time:

The flowchart of the workflow implemented in the previous article. Image by Author

Since my code referenced LlamaIndex's official examples, AgentWorkflow closely resembles my implementation, but it is simpler because it abstracts away the handoff and function call logic. Here's AgentWorkflow's architecture:

The architecture diagram of AgentWorkflow. Image by Author

The entry point is the init_run method, which initializes Context and ChatMemory.

Next, setup_agent identifies the on-duty agent, extracts its system_prompt, and merges it with the current ChatHistory.

Then, run_agent_step calls the agent's take_step to obtain the tools that need to be invoked, while writing the large language model's call results to the output stream. In the upcoming project practice, I'll override take_step to implement project-specific logic.

Notably, handoff is itself implemented as a tool and is added to the agent's executable tools within run_agent_step. If the on-duty agent decides to transfer control to another agent, the handoff tool records next_agent in the Context and uses DEFAULT_HANDOFF_OUTPUT_PROMPT to tell the succeeding agent to continue handling the user request.

If an agent finds that it can't handle the user's request, it will use the handoff method to transfer control. Image by Author
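To see how handoff is wired up, here is a minimal two-agent sketch. The agent names, prompts, and model choice are my own placeholders, and I'm assuming the constructor arguments (can_handoff_to, root_agent) match the official API:

from llama_index.core.agent.workflow import AgentWorkflow, FunctionAgent
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o-mini")  # placeholder model choice

front_desk = FunctionAgent(
    name="FrontDesk",
    description="Answers general questions and routes other requests.",
    system_prompt="Greet the user and hand off research questions.",
    llm=llm,
    tools=[],
    can_handoff_to=["Researcher"],  # handoff targets, by agent name
)

researcher = FunctionAgent(
    name="Researcher",
    description="Handles in-depth research questions.",
    system_prompt="Answer research questions thoroughly.",
    llm=llm,
    tools=[],
)

workflow = AgentWorkflow(
    agents=[front_desk, researcher],
    root_agent=front_desk.name,  # the agent on duty when the workflow starts
)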

parse_agent_output checks the agent's output for pending tool calls; if there are none, the workflow returns the final result. Otherwise, it launches the tool calls concurrently.

call_tool finds and executes the specific tool’s code, writing results into ToolCallResult and throwing a copy into the output stream.

aggregate_tool_results consolidates the tool call results and checks whether handoff was executed: if so, control switches to the next on-duty agent and the process restarts. If there was no handoff and no tool with return_direct=True was called, the loop also restarts so the agent can keep working. Otherwise, the Workflow ends, calling the agent's handle_tool_call_results and finalize methods, which let you adjust the language model's output.
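For illustration, here is how a tool could be flagged with return_direct so its output ends the loop and is returned as-is. get_order_status is a hypothetical helper, and I'm assuming FunctionTool.from_defaults accepts return_direct as in the current llama_index API:

from llama_index.core.tools import FunctionTool

def get_order_status(order_id: str) -> str:
    """Look up an order's status (hypothetical helper)."""
    return f"Order {order_id} has shipped."

status_tool = FunctionTool.from_defaults(
    fn=get_order_status,
    return_direct=True,  # end the loop and return this output as the answer
)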

Apart from the standard Workflow step methods, AgentWorkflow includes a from_tools_or_functions method whose name speaks for itself: it lets you use AgentWorkflow as a standalone agent by wrapping a list of tools or functions in a single FunctionAgent or ReActAgent and running it. Here's an example:

from tavily import AsyncTavilyClient

from llama_index.core.agent.workflow import AgentWorkflow

async def search_web(query: str) -> str:
    """Useful for using the web to answer questions"""
    client = AsyncTavilyClient()  # assumes a Tavily API key is configured
    return str(await client.search(query))

workflow = AgentWorkflow.from_tools_or_functions(
    [search_web],
    system_prompt="You are a helpful assistant that can search the web for information."
)
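To try it out, you would await the workflow with a user message, roughly like this (assuming the required Tavily and LLM API keys are configured in your environment):

import asyncio

async def main():
    response = await workflow.run(user_msg="What is the weather in Paris today?")
    print(str(response))

asyncio.run(main())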

Useful Events in the Event Stream

You may have noticed that after adopting a multi-agent orchestration framework, one of the biggest hurdles is the long wait while the workflow runs all the agents, with little visibility into what is happening during execution.

The handoff mechanism of AgentWorkflow handles the first problem well: when an agent gains control, it keeps responding to user requests without the workflow having to be re-executed each time. As for visibility, AgentWorkflow solves this by throwing events into the stream output pipeline in real time.

Similar to LlamaIndex Workflow, after calling the workflow's run method, we can use the handler.stream_events() method to get all the events in the pipeline, then filter them with isinstance checks:

import chainlit as cl  # the UI framework used in this project's chat interface
from llama_index.core.agent.workflow import AgentInput, AgentOutput, AgentStream

handler = workflow.run(
    user_msg=message.content,
    ctx=context
)
stream_msg = cl.Message(content="")
async for event in handler.stream_events():
    if isinstance(event, AgentInput):
        print(f"========{event.current_agent_name}:=========>")
        print(event.input)
        print("=================<")
    elif isinstance(event, AgentOutput) and event.response.content:
        print("<================>")
        print(f"{event.current_agent_name}: {event.response.content}")
        print("<================>")
    elif isinstance(event, AgentStream):
        await stream_msg.stream_token(event.delta)
await stream_msg.send()

Specifically, in the order of calls, AgentWorkflow throws five events: AgentInput, AgentStream, AgentOutput, ToolCall, and ToolCallResult, as shown in the diagram below:

The yellow oval represents the events in stream_events. Image by Author

AgentInput is thrown in the take_step method of FunctionAgent, mainly containing the current chat history and agent name. Since the chat history is quite long, we only use this event for debugging and do not display it on the interface.

For me, AgentStream is the most useful event because it outputs the intermediate results of the current agent call as a message stream. If you want to understand what the large language model is thinking during workflow execution, focus on this event. It also emits many intermediate results you may not need, so whether to display them is up to you.

The effect of streaming intermediate processes in AgentStream. Image by Author

AgentOutput is thrown by AgentWorkflow after the take_step method of FunctionAgent is completed. The main difference between this event and AgentStream is that it is a synchronous event. If you need to get all the messages of the current round at once, you can focus on this event.

ToolCall and ToolCallResult carry the parameters of a tool call and the results returned by the tool, respectively. Like AgentInput, the messages in these two events are quite long, so we only use them for debugging rather than displaying them on the interface.
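If you do want them while debugging, you could extend the earlier event loop along these lines. I'm assuming the attribute names tool_name, tool_kwargs, and tool_output match the event classes in llama_index.core.agent.workflow:

from llama_index.core.agent.workflow import ToolCall, ToolCallResult

async for event in handler.stream_events():  # same handler as in the earlier loop
    if isinstance(event, ToolCall):
        print(f"-> {event.tool_name}({event.tool_kwargs})")
    elif isinstance(event, ToolCallResult):
        print(f"<- {event.tool_name}: {event.tool_output}")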

Having covered AgentWorkflow's basics, we'll now move on to project practice. To offer a direct comparison, this project reuses the customer service example from my previous articles, showing how simple development with AgentWorkflow can be.


Customer Service Project Practice Based on AgentWorkflow
