Cognizant Blog
I spend most of my time building AI systems for organisations that run critical infrastructure: energy networks, transport systems, telecoms, logistics hubs, cloud platforms. A year ago, everyone wanted a copilot. Now they want to know: can the system act on its own at 3am, safely, without waking anyone up?

The answer is yes — with the right architecture. Not a bigger dashboard. Not another chatbot. A genuinely autonomous operational layer: agents that detect, decide and act across systems, teams and customer channels so outcomes improve while people sleep.

But it requires drawing some sharp lines. Lines that, I've learned the hard way, most teams don't draw until something goes wrong.

Two problems at 3am, not one

It's 3am, April 2027. A critical asset drifts out of tolerance. A core service pathway shows instability. In most organisations today, that's a cascade of alarms and a bridge call. I've been on those calls. They're not fun.

In the architecture we're building, the system maps impact, triggers workflows, reallocates capacity, notifies stakeholders, and logs everything into an audit trail that feeds back into a continuously improving playbook. No engineer woken. When the morning shift signs on, they're reviewing what the system did, not scrambling to figure out what happened.

But that scenario contains two separate problems that most teams conflate.

Problem one: understanding what broke and fixing it. Root cause analysis, impact assessment, remediation. This is genuinely complex reasoning over messy data — and this is where agentic AI earns its keep.

Problem two: executing the fix through a defined process. Versioning, testing, promotion, deployment. This should be rigid, deterministic, binary. Did the tests pass? Does the version match the changelog? These aren't judgment calls. They're yes/no gates.

Stripe's internal coding agents — Minions — produce over 1,300 merged pull requests per week containing no human-written code [1]. They've built precisely this distinction into their architecture: agentic nodes for genuine reasoning, deterministic nodes for everything that should just run. In their words, "putting LLMs into contained boxes compounds into system-wide reliability."
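The split between the two problems can be made concrete in code. A minimal sketch, with hypothetical names (`ReleaseCandidate`, `deterministic_gate`) that are illustrative, not Stripe's actual implementation: the diagnosis step is where a model call belongs; the release gate is a plain boolean check with no model in the loop.

```python
from dataclasses import dataclass

@dataclass
class ReleaseCandidate:
    version: str
    changelog_version: str
    tests_passed: bool

def deterministic_gate(rc: ReleaseCandidate) -> bool:
    """Problem two: rigid yes/no checks. No LLM, no judgment call."""
    return rc.tests_passed and rc.version == rc.changelog_version

# Problem one (root cause analysis over messy data) is where an
# agentic node earns its keep; this gate never is. It's binary.
rc = ReleaseCandidate(version="2.4.1", changelog_version="2.4.1", tests_passed=True)
print(deterministic_gate(rc))
```

The point of keeping the gate this dumb is that it can never be argued with, drift, or hallucinate a pass.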

Agents do have a genuine role in the confidence pipeline feeding into release decisions — classifying blast radius, flagging regressions, presenting structured evidence rather than a raw diff. The human still approves. But they approve with data. As confidence grows and rollback mechanisms mature, the approval boundary shifts progressively: human sign-off for major changes, automated gates for the rest. A graduated trust model grounded in evidence, not a leap of faith.
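A graduated trust model can be expressed as a small policy function. The categories and threshold below are placeholders to be tuned per organisation, not recommended values:

```python
def approval_gate(blast_radius: str, confidence: float, rollback_ready: bool) -> str:
    """Decide who approves a change. Thresholds here are illustrative only."""
    # Major changes, or anything without a tested rollback, always
    # go to a human regardless of how confident the pipeline is.
    if blast_radius == "major" or not rollback_ready:
        return "human_signoff"
    # Low-risk changes with strong structured evidence pass the
    # automated gate; everything else still gets a human.
    if confidence >= 0.95:
        return "auto_approve"
    return "human_signoff"
```

Shifting the boundary over time then means editing one reviewable function, with the change itself subject to the same sign-off discipline.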

The coordination problem

Operational environments have long been structured as separate domains — control rooms, operations centres, field teams, customer teams — each with its own systems and escalation procedures. Every handoff creates delay and information loss. A minor asset degradation that could have been managed in minutes becomes a customer complaint, a regulatory report and a costly truck roll — because the right information didn't reach the right decision point fast enough.

The pattern is familiar to anyone who has tried to connect more than a handful of enterprise systems: bespoke point-to-point integrations that multiply with every new tool added. The answer — in integration architecture and in agentic operations alike — is standardised protocols over custom wiring. Emerging standards like the Model Context Protocol (MCP) for tool access and Agent-to-Agent (A2A) for inter-agent coordination [2,3] are making this practical at scale, giving agents a common language without requiring every underlying system to change. Agents that speak a common language can hand off work, share context, and escalate across domains without needing a new custom connection every time. One agent monitors asset health. Another maps events to customer and SLA impact. Others handle communications, remediation, or field dispatch. When full autonomy isn't appropriate, they escalate — with complete context, not a 3am phone call that starts from scratch.
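The "escalate with complete context" idea is worth pinning down. The sketch below is not the MCP or A2A wire format; it is a hypothetical structured handoff payload showing the principle that the receiving agent or human should never start from scratch:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class EscalationContext:
    incident_id: str
    asset: str
    impact_summary: str
    actions_taken: list   # what the agent already did
    evidence: dict        # the data behind its decisions

def escalate(ctx: EscalationContext) -> str:
    # Serialise the full decision context so the next domain
    # receives everything, not a 3am phone call from zero.
    return json.dumps(asdict(ctx))
```

Whatever the transport, the contract is the same: state, history, and evidence travel with the handoff.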

Memory: the operational secret sauce

Here's an implementation reality that most teams discover too late: an agent that forgets what it learned three incidents ago is barely better than a fresh grad on their first night shift.

The value of autonomous operations isn't just what the system does tonight — it's what it learns tonight and applies next time. Every incident should feed back into a shared knowledge base: specific facts with provenance, not vague summaries. That memory needs to be searchable in two ways — by concept ("incidents involving thermal drift on this asset class") and by exact identifier (a specific firmware version, a specific ticket number). Concept-based search alone will miss the precise detail that matters most at 3am.
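The two retrieval paths can be combined in one lookup. A toy sketch, assuming memory records carry both exact identifiers and concept tags (in production the concept side would be a vector index, not tag overlap):

```python
def search_memory(records, query_terms=None, exact_id=None):
    """Hybrid lookup: exact-identifier match first, then concept overlap."""
    # Exact path: a specific firmware version or ticket number must
    # hit precisely, never be approximated away by semantic search.
    if exact_id:
        hits = [r for r in records if exact_id in r["identifiers"]]
        if hits:
            return hits
    # Concept path: rank by overlap with the query's concept tags.
    query_terms = set(query_terms or [])
    return sorted(records, key=lambda r: -len(query_terms & set(r["tags"])))[:3]
```

The ordering matters: falling through to fuzzy search only when the exact lookup misses is what keeps "firmware 4.2.1" from matching "firmware 4.2.7" at 3am.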

Critically, that memory must be human-correctable. When the morning shift reviews what the agent did, they need to read what it knew at decision time and fix it if it was wrong. A black-box memory store you can't audit is a liability. Transparent memory is how you build the feedback loop that compounds operational knowledge — the difference between an organisation that gets smarter with every incident and one that keeps making the same mistakes with better tools.
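Human-correctable, auditable memory implies corrections are appended, never silently overwritten. A minimal sketch of that discipline, with hypothetical field names:

```python
def correct_memory(record, field_name, new_value, editor):
    """Append-only correction: keep the wrong value, who fixed it, and why it's visible."""
    record.setdefault("corrections", []).append({
        "field": field_name,
        "old": record[field_name],   # what the agent knew at decision time
        "new": new_value,
        "by": editor,
    })
    record[field_name] = new_value
    return record
```

The morning shift can then see both what the agent believed when it acted and what the corrected fact is, which is the whole feedback loop in miniature.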

The security conversation nobody wants to have

Everything described above — autonomous agents with operational access, persistent memory stores, orchestration layers that can reconfigure assets — is an extraordinarily valuable target. And almost nobody building these systems is treating their agentic infrastructure as the crown jewel asset it actually is.

Earlier this year, an autonomous offensive agent found full read and write access to McKinsey's internal AI platform with no credentials [5]. Within two hours: 46.5 million chat messages, 57,000 user accounts, 728,000 files. The vulnerability was SQL injection — one of the oldest bug classes in existence. McKinsey's own scanners missed it.

But here's the part that should terrify anyone building autonomous operational systems: the system prompts controlling AI behaviour were stored in the same database. A single update could have rewritten how the AI responded to 43,000 consultants. No deployment, no code change, no log trail.

Map that to your architecture. If an attacker subtly modifies your agent's remediation playbooks, protection thresholds, or escalation criteria, your autonomous system at 3am isn't protecting your operations — it's an attacker's proxy inside them. A compromised server leaves a process anomaly. A modified prompt leaves nothing.

AI prompts, agent memory and orchestration policies are the new crown jewel assets. Version-control them. Monitor their integrity. Build immutable audit trails capturing not just what the agent did, but what instructions it was operating under. And test adversarially — not annual pen tests, but continuous probing of the prompt layer and memory stores.
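Integrity monitoring of the prompt layer can start very simply: hash the live prompt store and compare it against a version-controlled baseline on a schedule. A sketch of the idea (the store shape is assumed, not prescribed):

```python
import hashlib
import json

def fingerprint(prompts: dict) -> str:
    """Deterministic hash of a prompt store, for comparison with a baseline in git."""
    blob = json.dumps(prompts, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()

baseline = fingerprint({"remediation": "restart only if drift exceeds 2%"})
live = fingerprint({"remediation": "restart only if drift exceeds 2%"})
# Any silent edit to a playbook changes the hash, so the modification
# that "leaves nothing" in process terms still leaves a detectable trace.
print(baseline == live)
```

It is not a substitute for access control, but it turns an invisible prompt modification into an alertable event.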

Start in 90 days

This doesn't start with a moonshot. Start with two or three high-volume, low-risk runbooks.

Weeks 1–3: Connect telemetry and service topology, stand up the orchestration layer, run agents in shadow mode — they recommend, a human approves. Instrument memory from day one. For most organisations, this means building a coordination layer above your existing operational stack so agents can collaborate without requiring your underlying platforms to change. That's the approach we've taken at Cognizant with clients running 15-year-old infrastructure alongside modern cloud platforms — the agents adapt to what's there, not the other way around.
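Shadow mode is a one-rule policy: the agent proposes, nothing executes without a named approver, and every recommendation is logged either way. A minimal sketch with hypothetical names:

```python
def shadow_run(agent_recommendation, approved_by=None):
    """Shadow mode: record every recommendation; execute only on named approval."""
    record = {"recommendation": agent_recommendation, "executed": False}
    if approved_by:
        record["executed"] = True
        record["approved_by"] = approved_by
    # Logged either way, so weeks of shadow records become the
    # evidence base for widening autonomy later.
    return record
```

The unexecuted records are the point: they let you measure how often the agent would have been right before you let it act.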

Weeks 4–7: Enable pre-approved actions for low-risk scenarios with rollback plans. Run comparisons to prove impact. Start the security assessment of your agentic infrastructure in parallel. If you're doing security "later," you won't.
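Pre-approved actions reduce to an allowlist plus a hard rollback requirement. The action names below are placeholders for whatever your runbooks define:

```python
# Hypothetical allowlist of low-risk runbook actions.
PRE_APPROVED = {"restart_service", "clear_cache"}

def dispatch(action, rollback):
    """Only allowlisted actions run autonomously, and never without a rollback plan."""
    if action not in PRE_APPROVED or rollback is None:
        return "escalate"
    return "run"
```

Anything outside the list, or anything arriving without a rollback, escalates to a human by default.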

Weeks 8–10: Expand to cross-domain flows. Conduct a post-mortem. Publish a scale-out plan.

The human role doesn't disappear — it elevates to strategy, oversight, validation. Fewer 3am bridge calls. Faster recovery. A calmer morning stand-up.

Some organisations will sleep through operational incidents in 2027, confident their systems have already acted. Others will still be waking engineers at 3am.

The architecture to make that choice is already here.

If you want help identifying the right first runbooks, establishing guardrails, or standing up an agentic operations fabric — let's talk about what your first 90 days could look like.

References

[1] Stripe Engineering, "Minions: Stripe's one-shot, end-to-end coding agents," 2025

[2] Lui, A., "MCP is giving traditional RAG a flicking," LinkedIn, May 2025

[3] Google, "Agent-to-Agent (A2A) Protocol," 2025

[4] OpenClaw, "Stop Losing Context: A Guide to OpenClaw Memory," 2025

[5] CodeWall, "How We Hacked McKinsey's AI Platform," 2026


Anthony Lui

AI Value Engineer, UK&I, Cognizant
