AI Agent Parallel Processing Workflow
My thoughts on the article "Multi-AI Agent: Parallel processing and automatic summarization using multiple LLMs."
Link
Multi-AI Agent: Parallel processing and automatic summarization using multiple LLMs
My thoughts
The Medium article explores a challenge when using AI Agents: efficiently working with and comparing responses from multiple LLMs.
The article sets up an example where the LLM calls run sequentially and then shows how to run them in parallel using LangGraph.
LangGraph describes itself as:
LangGraph is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. Compared to other LLM frameworks, it offers these core benefits: cycles, controllability, and persistence. LangGraph allows you to define flows that involve cycles, essential for most agentic architectures, differentiating it from DAG-based solutions
Specifically for this blog post, they use LangGraph’s capability for creating branches for parallel execution, which you can read more about in the LangGraph documentation. In short, they describe it as follows:
Parallel execution of nodes is essential to speed up overall graph operation. LangGraph offers native support for parallel execution of nodes, which can significantly enhance the performance of graph-based workflows. This parallelization is achieved through fan-out and fan-in mechanisms, utilizing both standard edges and conditional_edges.
In the example, they create the following structure (sketched in code after the list):
Supervisor to control the process and assign tasks to each AI Agent
AI Agent to complete the assigned task and send the result to a summarizer
Summarizer to get the results from all the AI Agents and generate a final output.
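To make the fan-out/fan-in idea concrete, here is a minimal sketch of that structure with LangGraph (my own illustration, not the article’s code; the node names, state shape, and mocked agent responses are placeholders):
import operator
from typing import Annotated, TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    prompt: str
    # The operator.add reducer lets parallel branches append results without overwriting each other
    results: Annotated[list[str], operator.add]
    summary: str


def supervisor(state: State) -> dict:
    # The article's supervisor assigns the task to each agent; here it simply passes the prompt through
    return {"prompt": state["prompt"]}


def make_agent(model_name: str):
    def agent(state: State) -> dict:
        # Replace with a real call to the model named model_name
        return {"results": [f"{model_name}: response to {state['prompt']}"]}
    return agent


def summarizer(state: State) -> dict:
    # Replace with a call to a summarization LLM over the collected results
    return {"summary": "\n".join(state["results"])}


builder = StateGraph(State)
builder.add_node("supervisor", supervisor)
builder.add_node("summarizer", summarizer)
for name in ["gemini", "gpt", "claude"]:
    builder.add_node(name, make_agent(name))
    builder.add_edge("supervisor", name)  # fan-out: the supervisor branches to every agent
    builder.add_edge(name, "summarizer")  # fan-in: the summarizer runs once all agents finish

builder.add_edge(START, "supervisor")
builder.add_edge("summarizer", END)

graph = builder.compile()
print(graph.invoke({"prompt": "Explain parallelism", "results": []})["summary"])
The key detail is the reducer on results: because the agent nodes run in the same step, their writes are merged rather than overwriting one another.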
Note that in this case, memory is not written to or read back at a later point.
Creating a memory of the transactions can be helpful if you want to replay the interaction later, or if one of the AI Agents becomes unresponsive or times out.
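If you do want that kind of memory, LangGraph supports it through checkpointers. Continuing the hypothetical sketch above (MemorySaver is LangGraph’s in-memory checkpointer; the thread_id value is arbitrary):
from langgraph.checkpoint.memory import MemorySaver

checkpointed = builder.compile(checkpointer=MemorySaver())

# The thread_id ties all checkpoints of one run together so it can be inspected or replayed later
config = {"configurable": {"thread_id": "run-1"}}
checkpointed.invoke({"prompt": "Explain parallelism", "results": []}, config=config)

# Later: fetch the persisted state for that run, e.g. for replay, debugging, or recovering from a timeout
print(checkpointed.get_state(config))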
The Core Challenge when working with multiple LLMs
When building AI applications, we often want to:
Compare responses from different LLMs
Process multiple LLM tasks simultaneously
Summarize or aggregate the results
Making these LLM calls sequentially is slow and inefficient.
For example, if each LLM call takes 2 seconds, three models would take 6+ seconds to complete sequentially.
But if we run them in parallel, the whole batch takes roughly 2 seconds.
Their Solution Using LangGraph
The article implements a graph-based solution with three components:
A supervisor to manage tasks
Multiple agents (one per LLM) running in parallel
A summarizer to combine results
In this case, rather than having multiple agents use the same LLM on different inputs, they use different LLMs (gemini-1.5-flash-latest, gpt-4o-mini, claude-3-haiku-20240307) on the same input.
Simple DIY Implementation
While LangGraph provides a nice framework, we could also implement this pattern ourselves using Python’s async/await.
Oftentimes, it’s simpler to build a DIY implementation before reaching for a library like LangGraph or others.
It lets you figure out what you actually want to build before adding another layer of abstraction to your project.
Here’s a mocked-up version of the same solution, using Python’s async/await with mocked LLM responses.
# Note
# - using Python 3.9+
# - using Python type hinting conventions
import asyncio  # library for writing concurrent code with coroutines;
                # lets I/O-bound tasks like these LLM calls run concurrently, overlapping their wait time
from dataclasses import dataclass  # decorator that auto-generates __init__() for classes that mainly hold data values


@dataclass
class LLMResponse:
    model: str
    response: str


async def query_llm(model: str, prompt: str) -> LLMResponse:
    # In practice, replace with actual async LLM calls
    await asyncio.sleep(2)  # Simulate LLM processing
    return LLMResponse(model, f"Response from {model}")


async def parallel_llm_query(prompt: str, models: list[str]) -> list[LLMResponse]:
    tasks = [query_llm(model, prompt) for model in models]
    return await asyncio.gather(*tasks)


async def summarize_responses(responses: list[LLMResponse]) -> str:
    # In practice, send to a summarization LLM
    return "\n".join(f"{r.model}: {r.response}" for r in responses)
This mocked-up Python code achieves the same goal as the LangGraph version using only the standard library, and gives you more direct control over the implementation.
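For completeness, here is how the mocked pipeline could be run and timed (a sketch of my own; the model names and prompt are placeholders). It also demonstrates the sequential-versus-parallel difference from above: three mocked 2-second calls finish in about 2 seconds rather than 6.
import time


async def main() -> None:
    models = ["gemini-1.5-flash-latest", "gpt-4o-mini", "claude-3-haiku-20240307"]
    prompt = "Explain parallelism in one sentence."

    start = time.perf_counter()
    responses = await parallel_llm_query(prompt, models)
    summary = await summarize_responses(responses)
    elapsed = time.perf_counter() - start

    print(summary)
    print(f"Elapsed: {elapsed:.1f}s")  # ~2s with the mocked calls, since they run concurrently


if __name__ == "__main__":
    asyncio.run(main())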
Practical Implications
This type of pattern (parallel LLM calls instead of sequential ones) can be valuable for:
A/B testing different LLMs
Building more robust systems by comparing multiple models
Creating ensemble systems that combine multiple models’ strengths
Key Takeaways for AI Agent Development
This approach reminds me of Amdahl’s law or Amdahl’s argument for computer architecture: “The overall performance improvement gained by optimizing a single part of a system is limited by the fraction of time that the improved part is actually used”
In simpler terms, from Wikipedia:
“Amdahl’s law is a formula that shows how much faster a task can be completed when you add more resources to the system”
In practice, this means asking myself: if I had more resources to throw at the problem, how much faster could I get it to work?
Generally, for problems that admit a fully or partially parallel solution, the more resources we have, the faster we can complete the task.
For this blog’s example, if we have 3 LLM calls that each take 2 seconds (worked through with Amdahl’s formula after the list):
Sequential: 6 seconds total
Parallel: ~2 seconds total (limited by the slowest LLM call)
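Plugging this into Amdahl’s formula, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction of the work and n is the number of parallel workers. A quick sketch of my own (not from the article), using the numbers above plus an assumed second of serial work:
def amdahl_speedup(p: float, n: int) -> float:
    # p: fraction of the work that can be parallelized, n: number of parallel workers
    return 1.0 / ((1.0 - p) + p / n)


# If the three 2-second LLM calls are the entire workload (p = 1), three workers give a 3x speedup: 6s -> ~2s
print(amdahl_speedup(p=1.0, n=3))  # 3.0

# With, say, 1 second of unavoidable serial work (prompt prep + summarization), p = 6/7 and the ceiling drops
print(amdahl_speedup(p=6 / 7, n=3))  # ~2.33x, i.e. ~3s total instead of 7s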
This principle extends naturally to the broader domain of AI Agent architecture.
Just as we can parallelize individual LLM calls, we can apply the same thinking to entire agent workflows.
Instead of having one agent work while all the other agents wait, we should have every agent working simultaneously whenever their tasks are independent of each other.
Overall, for AI application developers, this means:
Start simple - don't reach for libraries like LangGraph until you have tried a standard Python (plus common libraries) solution
Consider what you can make parallel in your architecture
Consider whether you need to add memory to your system for replay and record keeping.
Thinking through the problem and your options beforehand should save time later, when you evolve the architecture.