Imagine hiring a brilliant new assistant. They're fast, tireless, and can handle dozens of tasks at once — booking meetings, managing your calendar, even controlling your office systems. Now imagine that someone slips a false content into their in-tray, and your assistant, not knowing any better, follows those instructions to the letter.
That's essentially what agent hijacking is. And as businesses race to adopt AI, it's becoming one of the most important security threats you've probably never heard of.
What is an AI agent?
Most of us are familiar with AI tools like ChatGPT — you ask a question, it gives you an answer. Simple.
An AI agent is a step beyond that. It doesn't just answer questions — it actually does things. It can book appointments, update databases, send messages, control smart devices, and carry out multi-step tasks, often with little or no human involvement.
Think of it like the difference between a sat-nav that tells you where to turn, and a self-driving car that actually steers for you. Useful? Absolutely. But also a lot more to go wrong.
So what is agent hijacking?
Here's where it gets interesting — and a little alarming.
Because AI agents are designed to process information and act on instructions, attackers have discovered a clever way to exploit them. They hide malicious instructions inside what looks like ordinary data — a document, a web page, or a customer message — and the AI, none the wiser, follows those hidden commands.
This is known as an indirect prompt injection attack, a term used by the US National Institute of Standards and Technology (NIST). In plain English: the AI gets tricked into doing something it was never supposed to do.
Why does this work? Because AI agents receive two types of input:
- Instructions from developers — the rules baked in by whoever built the system
- Data from users — live information the agent processes to do its job
The problem is that both arrive through the same channel. The AI can't always tell the difference between a legitimate instruction and a malicious one disguised as harmless data. And because these agents are built to be helpful and efficient, they'll often act confidently — even when something has gone very wrong.
What could actually go wrong?
Let's make this concrete.
Scenario 1: The Booking Agent A travel company uses an AI agent to reserve seats for customers. An attacker embeds hidden instructions in a booking request. Instead of reserving one seat, the agent books thousands — triggering massive financial losses and chaos in the system.
Scenario 2: The Security Camera A smart building uses an AI agent to manage its security cameras. An attacker slips in hidden instructions. The agent switches the cameras off and wipes the security logs — leaving the building unprotected while the attacker does as they please.
In both cases, the AI did exactly what it was designed to do. It just did it for the wrong person, following the wrong instructions — and nobody noticed until the damage was done.
The consequences can include:
- Sensitive data being accessed or stolen
- Records being altered or deleted
- Financial losses
- Security systems being disabled
What makes this especially dangerous is that the agent's actions can appear completely normal. There's no obvious "hack." Everything looks authorised.
How can organisations protect themselves?
The good news is that these risks can be managed — but only if organisations take them seriously from the beginning, not as an afterthought once the AI is already up and running.
Here are the key steps every organisation should take:
1. Limit what the agent can do. An agent should only have access to the tools it genuinely needs. The fewer tools it can use, the less damage a hijacked agent can cause.
2. Validate instructions before acting. Build in checks so the agent pauses and verifies unusual or high-impact actions before carrying them out.
3. Set limits on activity. If an agent suddenly makes thousands of bookings in a minute, something is wrong. Rate-limiting tools can flag or block unusual behaviour automatically.
4. Keep detailed logs. Every action the agent takes should be recorded. This makes it far easier to spot problems — and to investigate them afterwards.
5. Monitor continuously. Don't just log — actively look for anomalies. Automated oversight can catch attacks in progress, before they escalate.
Critically, these protections need to be built into your policies, procedures, and governance frameworks — not just added as loose technical fixes. Security has to be part of the design, not bolted on at the end.
The bottom line
AI agents are genuinely exciting. They can transform how businesses operate, saving time, cutting costs, and opening up new possibilities. But with great power comes — as someone once wisely said — great responsibility.
Agent hijacking is not a far-fetched, sci-fi scenario. It's a real and growing threat that organisations need to take seriously now, before it catches them off guard.
The solution isn't to avoid AI agents altogether. It's to build them wisely — with safety, security, and governance built in from day one.
Because the best time to think about what could go wrong is before anything does
------------------------------------------------------------------------------------------------------------------------
Sources
1.Agentic AI – Threats and Mitigations – 17 Feb 2025
2.Technical Blog: Strengthening AI Agent Hijacking Evaluations – Jan 2025
3. OWASP Top 10 for Agentic Applications for 2026 – Dec 2025 -
4. Multi-Agentic system Threat Modeling Guide v1.0 – 23-April-26 -
5. OWASP Top 10 for LLM Applications 2025