June 30, 2026
Neuro San Now Supports Middleware for AI Agents
How to add logging, PII redaction, rate limiting, summarization, and more to any agent in your network, configured entirely in HOCON.
Building a production-ready multi-agent system requires more than well-designed agents. It requires visibility into what those agents are doing at every step of their reasoning cycle, controls on what data flows through them and what the model is allowed to see, and the resilience to handle failures gracefully when they happen. These concerns cut across every agent in your network, so they do not belong inside any single agent's instructions. Implementing them also should not require you to modify the runtime or scatter handling logic across your client layer.
This is exactly the kind of problem that middleware was built for. Middleware is code that manages cross-cutting concerns between systems: logging, security, data transformation, error handling. In the context of agent networks, that same idea applies to the reasoning loop itself, giving you a structured place to add observability, security controls, history management, and resilience logic without ever touching your agent graph.
We are excited to share that Neuro San now supports middleware natively. You can attach middleware to any agent in your network directly from your HOCON config, declaring it alongside your agents the same way you already define tools and instructions. Neuro San handles instantiation, injection, and execution automatically.
This post walks through how it works, what ships out of the box, and how to write your own.
What Is Middleware?
Middleware provides a powerful way to intercept, modify, and enhance agent interactions at every stage of execution. Think of it as the connective layer that sits between your agent graph and the runtime, giving you precise control over what happens inside the agent loop without touching your core agent logic. You can inspect messages, modify context, override results, or terminate execution early, all from a single, reusable layer that plugs in through config.
This matters because production multi-agent systems have needs that go well beyond what any individual agent should be responsible for. You need visibility into what each agent is doing at every step of its reasoning cycle. You need to ensure sensitive data never reaches the model. You need conversation history to stay within context limits. You need the system to recover gracefully from LLM failures. These are cross-cutting concerns, and without a dedicated place to put them, they end up scattered across agent instructions, client code, or custom runtime modifications that are brittle and hard to maintain.
Middleware gives all of that a home. You write the behavior once, attach it declaratively in HOCON, and it runs wherever you need it across your network.
In practice, middleware handles things like:
Tracking agent behavior with logging, analytics, and debugging
Transforming prompts, tool selection, and output formatting
Adding retries, fallbacks, and early termination logic
Applying rate limits, guardrails, and PII detection
How It Works
Neuro San has always been data-driven. A subject-matter expert can describe an entire multi-agent system in a HOCON file without writing orchestration code. Middleware follows the same philosophy. Rather than building a custom abstraction from scratch, Neuro San builds on top of LangChain's AgentMiddleware and exposes it through the HOCON you already write. Each agent can declare an ordered list of middleware, and Neuro San instantiates and attaches them automatically when it builds the agent, injecting Neuro San-specific context including chat history, sly_data, journals, and reservations where needed.
Under the hood, your middleware list is passed directly into LangChain's create_agent(...):
# neuro_san/internals/run_context/langchain/core/langchain_run_context.py
return create_agent(
    model=llm,
    tools=self.tools,
    middleware=middleware,  # <-- your HOCON middleware, in order
    checkpointer=checkpointer,
    system_prompt=instructions,
)

Each middleware is a Python class that overrides one or more lifecycle hooks. Neuro San runs an async server, so the async variants are preferred:
abefore_agent() fires once before the agent starts. Use it to load resources, open sessions, or prime caches.
awrap_model_call() wraps each LLM call. Use it to modify the request before it hits the model, or inspect and transform the response.
abefore_model() and aafter_model() run before and after each LLM call. Use them to trim or summarize chat history, or post-process output.
awrap_tool_call() wraps each tool call. Use it to intercept, rewrite, or short-circuit tool execution.
aafter_agent() fires once after the agent finishes. Use it to clean up sessions or emit final state.
Hook | When it fires | Typical use |
abefore_agent() | Once, before the agent starts | Load resources, open sessions, prime caches |
awrap_model_call() | Around each LLM call | Modify the request (e.g. inject into the system prompt), inspect/transform the response |
abefore_model() / aafter_model() | Before / after each LLM call | Trim or summarize history, post-process output |
awrap_tool_call() | Around each tool call | Intercept, short-circuit, or rewrite tool execution |
aafter_agent() | Once, after the agent finishes | Clean up sessions, emit final state |
Because middleware can read and rewrite both the request and the response, it is strictly more powerful than a system prompt tweak. It sees the live message stream and can change what the model receives and what the rest of the network sees.
One important note: Neuro San supports class-based middleware only, not the decorator or annotation style.
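The order in which the hooks fire can be illustrated with a toy driver. This is a sketch only: TracingMiddleware, toy_agent_loop, fake_model, and fake_tool are hypothetical names, the class does not subclass LangChain's AgentMiddleware so that the snippet runs on its own, and the real agent loop decides for itself how many model and tool calls to make.

```python
import asyncio


class TracingMiddleware:
    """Stand-in middleware that records which hook fired (hypothetical)."""

    def __init__(self):
        self.events = []

    async def abefore_agent(self, state, runtime):
        self.events.append("before_agent")

    async def awrap_model_call(self, request, handler):
        self.events.append("model:start")
        response = await handler(request)  # continue to the (fake) model
        self.events.append("model:end")
        return response

    async def awrap_tool_call(self, request, handler):
        self.events.append("tool:start")
        result = await handler(request)  # continue to the (fake) tool
        self.events.append("tool:end")
        return result

    async def aafter_agent(self, state, runtime):
        self.events.append("after_agent")


async def fake_model(request: str) -> str:
    return f"model({request})"


async def fake_tool(request: str) -> str:
    return f"tool({request})"


async def toy_agent_loop(mw: TracingMiddleware, prompt: str) -> str:
    """Toy loop: model call, tool call, model call. Not the real runtime."""
    state = {"messages": [prompt]}
    await mw.abefore_agent(state, None)
    plan = await mw.awrap_model_call(prompt, fake_model)
    observation = await mw.awrap_tool_call(plan, fake_tool)
    answer = await mw.awrap_model_call(observation, fake_model)
    await mw.aafter_agent(state, None)
    return answer


if __name__ == "__main__":
    tracer = TracingMiddleware()
    print(asyncio.run(toy_agent_loop(tracer, "hi")))
    print(tracer.events)
```

The agent-level hooks bracket the whole run once, while the wrap hooks fire once per model or tool call.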
Tutorial 1: Attach Built-In Middleware in HOCON
The simplest way to get started is with one of the middleware classes Neuro San ships out of the box. NeuroSanSummarizationMiddleware adapts LangChain's summarization to Neuro San's per-agent chat history and automatically condenses the conversation once it exceeds a threshold.
Add a middleware list to any agent:
{
    "name": "MusicNerd",
    "function": { "description": "I can help with music-related inquiries." },
    "instructions": "You're Music Nerd, the go-to brain for all things rock and pop...",
    "tools": [],
    "middleware": [
        {
            "class": "neuro_san.middleware.neuro_san_summarization_middleware.NeuroSanSummarizationMiddleware",
            "args": {
                # The model used to write the summary
                "model": "gpt-4.1",
                # Summarize once there are 3+ messages
                # (HOCON has no tuples, so use arrays; Neuro San coerces them)
                "trigger": [["messages", 3]],
                # Keep the most recent 1 message after summarizing
                "keep": ["messages", 1],
                # Keep the generated summary in the agent's chat history
                "keep_summary_in_context": true,
                # Neuro San-injected arg (see below)
                "chat_history": true
            }
        }
    ]
}

Two things worth noting.
class is the fully-qualified class name. Neuro San resolves it from your PYTHONPATH, so this works equally well for built-in middleware and your own.
args are keyword arguments passed to the constructor. Most are plain values, but a few are special; those are covered later in this post.
Full runnable example: neuro_san/registries/music_nerd_summarize.hocon.
How Middleware Accesses Neuro San Internals
A summarizer needs Neuro San's chat history. A redactor might need sly_data, Neuro San's private data channel that allows agents to securely exchange sensitive state without exposing it to the LLM. A network-copy middleware needs a reservationist. You do not construct these objects yourself. You request them by name.
If an arg name appears in both your args block and the middleware's constructor signature, Neuro San replaces your placeholder value with the real, framework-provided object at build time. Under the hood, this is handled by MiddlewareFactory._prepare_args(...).
So in the HOCON above, "chat_history": true is just a flag that says “please inject the real chat history here.” The true is a throwaway placeholder; Neuro San swaps in the actual list of messages. The supported keys are documented in agent_hocon_reference.md#args.
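The substitution rule can be illustrated with a short sketch. This is not the code of MiddlewareFactory._prepare_args: prepare_args, HistoryAwareMiddleware, and the injectables dictionary are hypothetical names. The only behavior assumed is the rule stated above, that a key is swapped only when it appears both in args and in the constructor signature.

```python
import inspect
from typing import Any


class HistoryAwareMiddleware:
    """Hypothetical middleware whose constructor asks for chat_history."""

    def __init__(self, model: str, chat_history: Any = None):
        self.model = model
        self.chat_history = chat_history


def prepare_args(cls: type, hocon_args: dict, injectables: dict) -> dict:
    """Replace placeholder values with framework objects (illustrative only).

    A key is replaced only if it is present in the HOCON args AND is a
    parameter of the class constructor; everything else passes through.
    """
    constructor_params = inspect.signature(cls.__init__).parameters
    prepared = dict(hocon_args)
    for name, real_object in injectables.items():
        if name in hocon_args and name in constructor_params:
            prepared[name] = real_object
    return prepared


if __name__ == "__main__":
    # Stand-ins for objects the framework would own.
    framework_objects = {
        "chat_history": ["earlier message"],
        "sly_data": {"user_id": "abc"},
    }
    args_from_hocon = {"model": "gpt-4.1", "chat_history": True}
    kwargs = prepare_args(HistoryAwareMiddleware, args_from_hocon, framework_objects)
    middleware = HistoryAwareMiddleware(**kwargs)
    print(middleware.chat_history)
```

In this sketch sly_data is not passed to the constructor because the config never asked for it, which mirrors the "request them by name" idea.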
This is the bridge that makes middleware first-class in Neuro San: it can participate in history management, private-data channels, journaling/progress reporting, and even spin up temporary networks – all while staying a plain LangChain middleware.
Reservations and Checkpointers
Two additional knobs live alongside class and args.
allow: { reservations: true } grants the middleware a reservationist, used to procure temporary agent networks for a bounded lifetime.
checkpointer is a sibling config that builds a LangGraph checkpointer for the agent. If several middleware request one, the first wins (Neuro San warns about the rest).
Tutorial 2: Drop In a LangChain Middleware Directly
You are not limited to Neuro San-authored classes. Any class-based LangChain AgentMiddleware works. One example is PII redaction: the built-in PIIMiddleware can be attached to scrub phone numbers from agent output.
Ask this agent to leave a voicemail at 867-5309 and the number comes back redacted. No Neuro San-specific wiring is needed. PIIMiddleware does not request any of the injectable args, so Neuro San constructs it with exactly the values you provided.
Full example: neuro_san/registries/pii_middleware.hocon.
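To make the effect concrete, here is a rough sketch of what regex-based redaction does to a string. It is not PIIMiddleware's implementation or configuration: redact_phone_numbers, the pattern, and the placeholder text are all assumptions chosen for illustration.

```python
import re

# Assumed pattern: an optional 3-digit area code, then a 3-4 digit pair,
# separated by a dash, dot, or whitespace (e.g. 867-5309 or 555-867-5309).
PHONE_PATTERN = re.compile(r"\b(?:\d{3}[-.\s])?\d{3}[-.\s]\d{4}\b")

# Assumed placeholder text; a real middleware may format this differently.
PLACEHOLDER = "[REDACTED_PHONE]"


def redact_phone_numbers(text: str) -> str:
    """Replace anything that looks like a phone number with a placeholder."""
    return PHONE_PATTERN.sub(PLACEHOLDER, text)


if __name__ == "__main__":
    print(redact_phone_numbers("Leave a voicemail at 867-5309."))
```

A middleware applies this kind of transformation inside the loop, so neither the agent's instructions nor the client need to know about it.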
What Ships Today
Neuro San includes the following middleware classes ready to use immediately:
Neuro San Summarization Middleware
This middleware adapts LangChain's summarizer to Neuro San's per-agent chat history model. It condenses older messages once a trigger threshold is exceeded, either by message count or token count, while keeping recent turns intact. The keep_summary_in_context flag controls whether the generated summary replaces the raw history in the agent context.
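The trigger-and-keep behavior, by message count, can be sketched as follows. This is a simplified illustration rather than the middleware's code: condense_history and count_summary are hypothetical, messages are plain strings, and a trivial function stands in for the LLM that would write the real summary.

```python
from typing import Callable, List


def condense_history(
    messages: List[str],
    trigger_count: int,
    keep_count: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Summarize older messages once the history reaches trigger_count.

    The most recent keep_count messages are kept verbatim; everything
    older is replaced by a single summary entry.
    """
    if len(messages) < trigger_count:
        return list(messages)  # below the threshold: leave history alone
    split = max(len(messages) - keep_count, 0)
    older, recent = messages[:split], messages[split:]
    if not older:
        return list(messages)  # nothing old enough to summarize
    return [f"[summary] {summarize(older)}"] + recent


def count_summary(older: List[str]) -> str:
    """Stand-in for an LLM-written summary."""
    return f"{len(older)} earlier messages"


if __name__ == "__main__":
    history = ["Who wrote Yesterday?", "The Beatles.", "When was it released?"]
    # Mirrors the Tutorial 1 settings: trigger at 3 messages, keep 1.
    print(condense_history(history, trigger_count=3, keep_count=1, summarize=count_summary))
```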
Llm Config Tool Selector Middleware
This is an LLM-driven tool selector that understands Neuro San LLM Configs. It exposes only the most relevant tools to the model, which cuts tokens and latency for agents with large tool sets, though this comes at some cost to federation flexibility for deep agent trees.
Network Copy Middleware
This is a proof-of-concept that uses the reservations infrastructure to clone an existing agent network into a temporary, time-limited deployment, with the reservation handle returned through sly_data.
Any class-based LangChain middleware also works out of the box, including PIIMiddleware shown above.
Tutorial 3: Write Your Own
The interface is straightforward. Subclass AgentMiddleware, override the hooks you care about, and declare any injectable args in your constructor signature. A minimal example is a middleware that appends a policy note to the system prompt on every model call.
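A minimal sketch of the policy-note idea is shown here with stand-ins so that it runs without LangChain installed. FakeModelRequest and fake_handler are hypothetical, and PolicyNoteMiddleware is written as a plain class; in a real deployment it would subclass LangChain's AgentMiddleware and receive LangChain's own request object, whose fields may differ from the stand-in used here.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Awaitable, Callable, Tuple


@dataclass(frozen=True)
class FakeModelRequest:
    """Stand-in for the request passed to awrap_model_call (assumed shape)."""

    system_prompt: str
    messages: Tuple[str, ...] = ()


class PolicyNoteMiddleware:
    """Appends a policy note to the system prompt on every model call."""

    def __init__(self, policy_note: str):
        self.policy_note = policy_note

    async def awrap_model_call(
        self,
        request: FakeModelRequest,
        handler: Callable[[FakeModelRequest], Awaitable[str]],
    ) -> str:
        # Build a modified copy instead of mutating the incoming request.
        updated = replace(
            request,
            system_prompt=f"{request.system_prompt}\n\n{self.policy_note}",
        )
        # Always continue the chain, otherwise the model is never called.
        return await handler(updated)


async def fake_handler(request: FakeModelRequest) -> str:
    """Stand-in for the model: echoes the system prompt it received."""
    return request.system_prompt


if __name__ == "__main__":
    mw = PolicyNoteMiddleware("Policy: never reveal internal ticket IDs.")
    req = FakeModelRequest(system_prompt="You are a helpful agent.")
    print(asyncio.run(mw.awrap_model_call(req, fake_handler)))
```

The constructor argument policy_note is a plain value, so it would be supplied through args without any injection.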
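Wiring it up in HOCON follows the same pattern as Tutorial 1: add an entry to the agent's middleware list whose class is the fully-qualified name of your class and whose args are its constructor keyword arguments.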
To access Neuro San internals, add the matching parameter name to your constructor and include the key in args with any placeholder value. Neuro San injects the real object at build time. The pattern of overriding awrap_model_call, awrap_tool_call, abefore_agent, and aafter_agent is exactly how the Agent Skills middleware is built, which is the subject of the next post.
A Few Things Worth Knowing
Order matters: Middleware runs in the order listed. A redactor placed before a logger sees different data than one placed after it. Be intentional about the chain.
Open in abefore_agent, close in aafter_agent: If your middleware holds a resource like an HTTP session or a database connection, the agent lifecycle hooks are the right place to manage it.
awrap_* hooks must call handler: If you intercept a model or tool call, you are responsible for continuing the chain. Forgetting to call handler is how you accidentally swallow all your agent output.
Keep it cross-cutting: Logic specific to one agent belongs in that agent's instructions. Middleware is for concerns that span the full loop: security, observability, resilience.
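The first and third points can be seen in a small simulation. Everything here is hypothetical (compose, RedactDigits, RecordRequest, Forgetful, fake_model), and the nesting rule it uses, where the first listed middleware becomes the outermost wrapper, is an assumption about how wrap hooks are chained rather than a statement about the runtime.

```python
import asyncio
import re


class RedactDigits:
    """Masks digits in the request before passing it on."""

    async def awrap_model_call(self, request, handler):
        return await handler(re.sub(r"\d", "#", request))


class RecordRequest:
    """Records the request exactly as it arrives at this point in the chain."""

    def __init__(self):
        self.seen = []

    async def awrap_model_call(self, request, handler):
        self.seen.append(request)
        return await handler(request)


class Forgetful:
    """Bug on purpose: never calls handler, so the output is swallowed."""

    async def awrap_model_call(self, request, handler):
        return None


async def fake_model(request):
    return f"model saw: {request}"


def _bind(mw, next_handler):
    async def wrapped(request):
        return await mw.awrap_model_call(request, next_handler)
    return wrapped


def compose(middleware, model):
    """Nest wrap hooks so the first listed middleware is outermost (assumed)."""
    handler = model
    for mw in reversed(middleware):
        handler = _bind(mw, handler)
    return handler


if __name__ == "__main__":
    after, before = RecordRequest(), RecordRequest()
    asyncio.run(compose([RedactDigits(), after], fake_model)("call 867-5309"))
    asyncio.run(compose([before, RedactDigits()], fake_model)("call 867-5309"))
    print(after.seen, before.seen)
    print(asyncio.run(compose([Forgetful()], fake_model)("hello")))
```

In both orderings the model receives the redacted text, but only the recorder placed after the redactor is shielded from the raw number.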
Get Started with Middleware in Neuro San
Middleware fills a gap that every serious multi-agent system eventually hits. You can define a well-structured agent graph in HOCON, but there are behaviors that do not belong inside any agent: security controls, observability hooks, rate limiting, history management. Those need a place to live that is not your agent instructions and not a fork of the runtime.
Middleware gives you that. It is Python you already know how to write, attached via config you already know how to use, running exactly where you need it in the agent loop.
To explore the full reference, see docs/agent_hocon_reference.md#middleware.
Working examples are available at neuro_san/registries/music_nerd_summarize.hocon, pii_middleware.hocon, and copy_cat_middleware.hocon in the repo.
LangChain middleware overview: https://docs.langchain.com/oss/python/langchain/middleware/overview
Neuro San is open source and available on GitHub. In the next post, we will show how this same middleware primitive powers Agent Skills: portable, progressively-disclosed packets of expertise built entirely on the mechanism described here.
Noravee specializes in machine learning, NLP, and analytical modeling, with a background in condensed matter physics