

June 30, 2026

Neuro San Now Supports Middleware for AI Agents

How to add logging, PII redaction, rate limiting, summarization, and more to any agent in your network, configured entirely in HOCON.


Building a production-ready multi-agent system requires more than well-designed agents. It requires visibility into what those agents are doing at every step of their reasoning cycle, controls on what data flows through them and what the model is allowed to see, and the resilience to handle failures gracefully when they happen. These concerns cut across every agent in your network, so they do not belong inside any single agent's instructions, and implementing them should not require modifying the runtime or scattering handling logic across your client layer.

This is exactly the kind of problem that middleware was built for. Middleware is code that manages cross-cutting concerns between systems: logging, security, data transformation, error handling. In the context of agent networks, that same idea applies to the reasoning loop itself, giving you a structured place to add observability, security controls, history management, and resilience logic without ever touching your agent graph.

We are excited to share that Neuro San now supports middleware natively. You can attach middleware to any agent in your network directly from your HOCON config, declaring it alongside your agents the same way you already define tools and instructions. Neuro San handles instantiation, injection, and execution automatically.

This post walks through how it works, what ships out of the box, and how to write your own.

What Is Middleware?

Middleware provides a powerful way to intercept, modify, and enhance agent interactions at every stage of execution. Think of it as the connective layer that sits between your agent graph and the runtime, giving you precise control over what happens inside the agent loop without touching your core agent logic. You can inspect messages, modify context, override results, or terminate execution early, all from a single, reusable layer that plugs in through config.

This matters because production multi-agent systems have needs that go well beyond what any individual agent should be responsible for. You need visibility into what each agent is doing at every step of its reasoning cycle. You need to ensure sensitive data never reaches the model. You need conversation history to stay within context limits. You need the system to recover gracefully from LLM failures. These are cross-cutting concerns, and without a dedicated place to put them, they end up scattered across agent instructions, client code, or custom runtime modifications that are brittle and hard to maintain.

Middleware gives all of that a home. You write the behavior once, attach it declaratively in HOCON, and it runs wherever you need it across your network.

In practice, middleware handles things like:

  • Tracking agent behavior with logging, analytics, and debugging

  • Transforming prompts, tool selection, and output formatting

  • Adding retries, fallbacks, and early termination logic

  • Applying rate limits, guardrails, and PII detection

[Figure: diagram of the agent loop]

How It Works

Neuro San has always been data-driven. A subject-matter expert can describe an entire multi-agent system in a HOCON file without writing orchestration code. Middleware follows the same philosophy. Rather than building a custom abstraction from scratch, Neuro San builds on top of LangChain's AgentMiddleware and exposes it through the HOCON you already write. Each agent can declare an ordered list of middleware, and Neuro San instantiates and attaches them automatically when it builds the agent, injecting Neuro San-specific context including chat history, sly_data, journals, and reservations where needed.

Under the hood, your middleware list is passed directly into LangChain's create_agent(...):

# neuro_san/internals/run_context/langchain/core/langchain_run_context.py
return create_agent(
    model=llm,
    tools=self.tools,
    middleware=middleware,      # <-- your HOCON middleware, in order
    checkpointer=checkpointer,
    system_prompt=instructions,
)

Each middleware is a Python class that overrides one or more lifecycle hooks. Neuro San runs an async server, so the async variants are preferred:

  • abefore_agent() fires once before the agent starts. Use it to load resources, open sessions, or prime caches.

  • awrap_model_call() wraps each LLM call. Use it to modify the request before it hits the model, or inspect and transform the response.

  • abefore_model() and aafter_model() run before and after each LLM call. Use them to trim or summarize chat history, or post-process output.

  • awrap_tool_call() wraps each tool call. Use it to intercept, rewrite, or short-circuit tool execution.

  • aafter_agent() fires once after the agent finishes. Use it to clean up sessions or emit final state.

| Hook | When it fires | Typical use |
| --- | --- | --- |
| abefore_agent() | Once, before the agent starts | Load resources, open sessions, prime caches |
| awrap_model_call() | Around each LLM call | Modify the request (e.g. inject into the system prompt), inspect/transform the response |
| abefore_model() / aafter_model() | Before / after each LLM call | Trim or summarize history, post-process output |
| awrap_tool_call() | Around each tool call | Intercept, short-circuit, or rewrite tool execution |
| aafter_agent() | Once, after the agent finishes | Clean up sessions, emit final state |

Because middleware can read and rewrite both the request and the response, it is strictly more powerful than a system prompt tweak. It sees the live message stream and can change what the model receives and what the rest of the network sees.
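To make the firing order concrete, here is a toy loop in plain Python that calls the hooks in the sequence described above. It is only a sketch: TraceMiddleware, fake_model, and run_toy_agent are hypothetical names, the hook signatures are simplified, and this is not how create_agent is implemented internally.

```python
import asyncio


class TraceMiddleware:
    """Hypothetical middleware that records which hook fired."""

    def __init__(self):
        self.events = []

    async def abefore_agent(self):
        self.events.append("before_agent")

    async def abefore_model(self):
        self.events.append("before_model")

    async def awrap_model_call(self, request, handler):
        self.events.append("wrap_model_call:enter")
        response = await handler(request)
        self.events.append("wrap_model_call:exit")
        return response

    async def aafter_model(self):
        self.events.append("after_model")

    async def aafter_agent(self):
        self.events.append("after_agent")


async def fake_model(request):
    # Stand-in for the real LLM call.
    return f"echo: {request}"


async def run_toy_agent(mw, request, turns=2):
    """Simplified loop: agent hooks fire once, model hooks fire per turn."""
    await mw.abefore_agent()
    response = None
    for _ in range(turns):
        await mw.abefore_model()
        response = await mw.awrap_model_call(request, fake_model)
        await mw.aafter_model()
    await mw.aafter_agent()
    return response


mw = TraceMiddleware()
result = asyncio.run(run_toy_agent(mw, "hello"))
print(result)
print(mw.events)
```

The recorded events show the agent-level hooks bracketing the whole run, while the model-level hooks repeat on every turn of the loop.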

One important note: Neuro San supports class-based middleware only, not the decorator or annotation style.

Tutorial 1: Attach Built-In Middleware in HOCON

The simplest way to get started is with one of the middleware classes Neuro San ships out of the box. NeuroSanSummarizationMiddleware adapts LangChain’s summarization to Neuro San’s per-agent chat history and automatically condenses the conversation once it exceeds a threshold.

Add a middleware list to any agent:

{
    "name": "MusicNerd",
    "function": { "description": "I can help with music-related inquiries." },
    "instructions": "You're Music Nerd, the go-to brain for all things rock and pop...",
    "tools": [],

    "middleware": [
        {
            "class": "neuro_san.middleware.neuro_san_summarization_middleware.NeuroSanSummarizationMiddleware",
            "args": {
                # The model used to write the summary
                "model": "gpt-4.1",

                # Summarize once there are 3+ messages
                # (HOCON has no tuples, so use arrays; Neuro San coerces them)
                "trigger": [["messages", 3]],

                # Keep the most recent 1 message after summarizing
                "keep": ["messages", 1],

                # Keep the generated summary in the agent's chat history
                "keep_summary_in_context": true,

                # Arg injected by Neuro San (see below)
                "chat_history": true
            }
        }
    ]
}

Two things worth noting. 

  1. class is the fully-qualified class name. Neuro San resolves it from your PYTHONPATH, so this works equally well for built-in middleware and your own.

  2. args are keyword arguments passed to the constructor. Most are plain values, but a few are special; those are covered later in this post.

Full runnable example: neuro_san/registries/music_nerd_summarize.hocon.
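As a rough illustration of what resolving a fully-qualified class name and passing args as keyword arguments involves, here is a generic sketch using importlib. The helpers resolve_class and build_from_config are hypothetical and are not Neuro San's actual resolver; a standard-library class stands in for a middleware class so the snippet runs anywhere.

```python
import importlib


def resolve_class(qualified_name):
    """Split 'package.module.ClassName' and import the class."""
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_from_config(entry):
    """Instantiate an entry shaped like {"class": ..., "args": {...}}."""
    cls = resolve_class(entry["class"])
    return cls(**entry.get("args", {}))


# A standard-library class stands in for a middleware class here.
entry = {
    "class": "fractions.Fraction",
    "args": {"numerator": 3, "denominator": 4},
}
obj = build_from_config(entry)
print(type(obj).__name__, obj)
```

Because the lookup goes through the normal import system, any class importable from PYTHONPATH can be named in config the same way.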

How Middleware Accesses Neuro San Internals

A summarizer needs Neuro San's chat history. A redactor might need sly_data, Neuro San's private data channel that allows agents to securely exchange sensitive state without exposing it to the LLM. A network-copy middleware needs a reservationist. You do not construct these objects yourself. You request them by name.

If an arg name appears in both your args block and the middleware's constructor signature, Neuro San replaces your placeholder value with the real, framework-provided object at build time. Under the hood, this is MiddlewareFactory._prepare_args(...):

special_args = {
    "origin":            middleware_origin,        # where this middleware sits in the network
    "origin_str":        middleware_origin_str,    # string form of the above
    "reservationist":    reservationist,           # create temporary agent networks
    "sly_data":          sly_data,                 # private data kept out of the chat stream
    "chat_history":      self.chat_history,        # this agent's full message history
    "activation_capsule": self.activation_capsule, # build models / call agents
}

# Plus "journal" and "progress_reporter", which are created lazily on demand.

So in the HOCON above, "chat_history": true is just a flag that says “please inject the real chat history here.” The true is a throwaway placeholder; Neuro San swaps in the actual list of messages. The supported keys are documented in agent_hocon_reference.md#args.
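The placeholder-swap rule can be sketched in a few lines with inspect.signature. This is an illustration under assumptions, not the real MiddlewareFactory code: prepare_args and ToySummarizer are hypothetical, and the special values are plain stand-ins for the framework objects.

```python
import inspect


def prepare_args(cls, user_args, special_args):
    """Swap placeholders for framework objects where names overlap.

    A special arg is injected only if it appears in BOTH the user's
    args block and the constructor signature.
    """
    params = inspect.signature(cls.__init__).parameters
    prepared = dict(user_args)
    for name, real_value in special_args.items():
        if name in user_args and name in params:
            prepared[name] = real_value
    return prepared


class ToySummarizer:
    """Hypothetical middleware that asks for chat_history by name."""

    def __init__(self, model, chat_history):
        self.model = model
        self.chat_history = chat_history


# Stand-ins for the objects the framework would provide.
special = {"chat_history": ["hi", "hello"], "sly_data": {"token": "secret"}}
# What the config author wrote: true is only a placeholder.
user = {"model": "gpt-4.1", "chat_history": True}

prepared = prepare_args(ToySummarizer, user, special)
mw = ToySummarizer(**prepared)
print(mw.chat_history)
```

Note that sly_data is not passed, because the config never requested it and the constructor does not accept it. Requiring the name in both places keeps injection opt-in.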

This is the bridge that makes middleware first-class in Neuro San: it can participate in history management, private-data channels, journaling/progress reporting, and even spin up temporary networks – all while staying a plain LangChain middleware.

Reservations and Checkpointers

Two additional knobs live alongside class and args. 

  • allow: { reservations: true } grants the middleware a reservationist, used to procure temporary agent networks for a bounded lifetime.

  • checkpointer is a sibling config that builds a LangGraph checkpointer for the agent. If several middleware request one, the first wins (Neuro San warns about the rest).

Tutorial 2: Drop In a LangChain Middleware Directly

You are not limited to Neuro San-authored classes. Any class-based LangChain AgentMiddleware works. Here is PII redaction using LangChain's built-in PIIMiddleware to scrub phone numbers from agent output:

"middleware": [
    {
        "class": "langchain.agents.middleware.PIIMiddleware",
        "args": {
            "pii_type": "phone_number",
            "detector": "[a-zA-Z0-9]{3}-[a-zA-Z0-9]{4}",
            "strategy": "redact",
            "apply_to_output": true
        }
    }
]

Ask this agent to leave a voicemail at 867-5309 and the number comes back redacted. No Neuro San-specific wiring is needed. PIIMiddleware does not request any of the injectable args, so Neuro San constructs it with exactly the values you provided.
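To see what the detector pattern itself matches, here is a standalone sketch using Python's re module. It does not call PIIMiddleware; the redact helper and the placeholder format are assumptions made for illustration, and the stricter pattern is a suggested alternative rather than something from the example config.

```python
import re

# The detector pattern from the HOCON example above.
DETECTOR = re.compile(r"[a-zA-Z0-9]{3}-[a-zA-Z0-9]{4}")

# A stricter, digits-only alternative (an assumption, not from the config).
STRICT_DETECTOR = re.compile(r"\b\d{3}-\d{4}\b")


def redact(text, pattern=DETECTOR, pii_type="phone_number"):
    """Replace every match with a placeholder (format assumed here)."""
    placeholder = f"[REDACTED_{pii_type.upper()}]"
    return pattern.sub(placeholder, text)


voicemail = redact("Please leave a voicemail at 867-5309.")
loose = redact("This is a well-known fact.")
strict = redact("This is a well-known fact.", pattern=STRICT_DETECTOR)

print(voicemail)
print(loose)
print(strict)
```

Because the example pattern accepts letters as well as digits, it also matches hyphenated words such as "well-known". If that is not what you want, a digits-only pattern with word boundaries is one option.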

Full example: neuro_san/registries/pii_middleware.hocon.

What Ships Today

Neuro San includes the following middleware classes ready to use immediately:

Neuro San Summarization Middleware

This middleware adapts LangChain's summarizer to Neuro San's per-agent chat history model. It condenses older messages once a trigger threshold is exceeded, measured either by message count or by token count, while keeping recent turns intact. The keep_summary_in_context flag controls whether the generated summary replaces the raw history in the agent context.
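The message-count variant of that trigger-and-keep behavior can be sketched as follows. This is a simplified model, not the middleware's actual code: maybe_summarize and fake_summarize are hypothetical, messages are plain strings, and the summary would normally come from an LLM call.

```python
def maybe_summarize(messages, trigger_count, keep_count, summarize):
    """Condense older messages once the history reaches trigger_count.

    The most recent keep_count messages are preserved verbatim and
    everything before them is replaced by a single summary entry.
    """
    if len(messages) < trigger_count:
        return list(messages)
    split = len(messages) - keep_count
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


def fake_summarize(older):
    # Stand-in for the LLM call that would write the summary.
    return f"[summary of {len(older)} messages]"


short_history = ["user: hi", "ai: hello"]
long_history = ["user: hi", "ai: hello", "user: who wrote Jolene?"]

unchanged = maybe_summarize(short_history, 3, 1, fake_summarize)
condensed = maybe_summarize(long_history, 3, 1, fake_summarize)
print(unchanged)
print(condensed)
```

With a trigger of 3 and a keep of 1, mirroring the Tutorial 1 config, the two-message history passes through untouched and the three-message history collapses to a summary plus the latest message.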

Llm Config Tool Selector Middleware

This is an LLM-driven tool selector that understands Neuro San LLM Configs. It exposes only the most relevant tools to the model, which cuts tokens and latency for agents with large tool sets, though this comes at some cost to federation flexibility for deep agent trees.

Network Copy Middleware

This is a proof-of-concept that uses the reservations infrastructure to clone an existing agent network into a temporary, time-limited deployment, with the reservation handle returned through sly_data.

Any class-based LangChain middleware also works out of the box, including PIIMiddleware shown above.

Tutorial 3: Write Your Own

The interface is straightforward. Subclass AgentMiddleware, override the hooks you care about, and declare any injectable args in your constructor signature. Here is a minimal example that appends a policy note to the system prompt on every model call:

from typing import Awaitable, Callable, override  # override requires Python 3.12+

from langchain.agents.middleware.types import (
    AgentMiddleware, ContextT, ModelRequest, ModelResponse, ResponseT,
)
from langchain_core.messages import SystemMessage


class HousePolicyMiddleware(AgentMiddleware):
    """Appends a fixed policy note to the system prompt on every model call."""

    def __init__(self, policy: str) -> None:
        super().__init__()
        self.policy = policy

    @override
    async def awrap_model_call(
        self,
        request: ModelRequest[ContextT],
        handler: Callable[[ModelRequest[ContextT]], Awaitable[ModelResponse[ResponseT]]],
    ) -> ModelResponse[ResponseT]:
        system = request.system_message
        original = system.content if system is not None else ""
        new_system = SystemMessage(content=f"{original}\n\n{self.policy}")
        # Hand the modified request to the rest of the chain
        return await handler(request.override(system_message=new_system))

Wire it up in HOCON:

"middleware": [
    {
        "class": "my_package.house_policy_middleware.HousePolicyMiddleware",
        "args": { "policy": "Never reveal internal employee IDs." }
    }
]

To access Neuro San internals, add the matching parameter name to your constructor and include the key in args with any placeholder value. Neuro San injects the real object at build time. The pattern of overriding awrap_model_call, awrap_tool_call, abefore_agent, and aafter_agent is exactly how the Agent Skills middleware is built, which is the subject of the next post.

A Few Things Worth Knowing

  • Order matters: Middleware runs in the order listed. A redactor placed before a logger sees different data than one placed after it. Be intentional about the chain.

  • Open in abefore_agent, close in aafter_agent: If your middleware holds a resource like an HTTP session or a database connection, the agent lifecycle hooks are the right place to manage it.

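  • awrap_* hooks must call handler unless you are short-circuiting on purpose: If you intercept a model or tool call, you are responsible for continuing the chain. Skip the call only when you intend to return your own result instead. Forgetting to call handler is how you accidentally swallow all your agent output.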

  • Keep it cross-cutting: Logic specific to one agent belongs in that agent's instructions. Middleware is for concerns that span the full loop: security, observability, resilience.
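The first and third points can be demonstrated with a toy handler chain. This sketch assumes wrap-style hooks nest so that the first middleware listed is the outermost layer, which is my understanding of LangChain's behavior and worth confirming against its documentation. The compose helper and the plain functions standing in for middleware are hypothetical.

```python
import asyncio


def compose(middlewares, final_handler):
    """Nest wrap-style callables so the first one listed is outermost."""
    def wrap(mw, nxt):
        async def wrapped(request):
            return await mw(request, nxt)
        return wrapped

    handler = final_handler
    for mw in reversed(middlewares):
        handler = wrap(mw, handler)
    return handler


seen_by_logger = []


async def logger(request, handler):
    seen_by_logger.append(request)
    return await handler(request)


async def redactor(request, handler):
    return await handler(request.replace("867-5309", "[REDACTED]"))


async def blocker(request, handler):
    # Never calls handler, so nothing downstream runs.
    return "blocked"


async def model(request):
    return f"model saw: {request}"


async def main():
    raw = "call 867-5309"
    await compose([logger, redactor], model)(raw)  # logger sees raw text
    await compose([redactor, logger], model)(raw)  # logger sees redacted text
    return await compose([blocker, logger], model)(raw)


blocked_result = asyncio.run(main())
print(seen_by_logger)
print(blocked_result)
```

The logger records the raw number when it sits before the redactor and the redacted text when it sits after it. In the third chain, neither the logger nor the model runs, because blocker returns without calling handler.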

Get Started with Middleware in Neuro San

Middleware fills a gap that every serious multi-agent system eventually hits. You can define a well-structured agent graph in HOCON, but there are behaviors that do not belong inside any agent: security controls, observability hooks, rate limiting, history management. Those need a place to live that is not your agent instructions and not a fork of the runtime.

Middleware gives you that. It is Python you already know how to write, attached via config you already know how to use, running exactly where you need it in the agent loop.

Neuro San is open source and available on GitHub. In the next post, we will show how this same middleware primitive powers Agent Skills: portable, progressively-disclosed packets of expertise built entirely on the mechanism described here.



Noravee Kanchanavatee

Senior Data Scientist, Cognizant AI Lab

Noravee specializes in machine learning, NLP, and analytical modeling, with a background in condensed matter physics.


