Everyone's talking about AI tools. Not enough people are talking about AI security.

MoltBook, OpenClaw, n8n, ElevenLabs, Claude, OpenAI, Lovable. The ecosystem of AI tools and platforms is exploding, and rightfully so. They're incredible. We use them internally. Our clients use them. They're making teams faster and more capable than ever.

But here's the conversation that isn't happening loudly enough: what actually happens when the guardrails fail?

When the AI hallucinates. When it goes rogue. When it does something you didn't anticipate. What are organizations doing to protect their assets when that happens? And more importantly, is it enough?

Think about what we're giving these tools access to. Internal data. Sensitive data. Confidential data. We spent years hardening our web applications and locking down our networks. But the attack surface has expanded, and now it includes agents.

Your AI agent has credentials. It has access. It has context about your environment.

So here's the question nobody seems to be asking loudly enough: what happens if your agent is compromised?


This isn't theoretical. It's already happening.

In January 2025, Wiz Research discovered a publicly accessible ClickHouse database belonging to DeepSeek at oauth2callback.deepseek.com. No authentication required. Over one million log entries were exposed, including plaintext chat histories, API keys, backend operational details, and metadata. DeepSeek secured it within an hour of notification, but the damage window was unknown.[1]

In August 2024, security researcher Johann Rehberger demonstrated "ASCII smuggling" in Microsoft 365 Copilot. The attack used invisible Unicode characters to exfiltrate data from emails, documents in OneDrive, and SharePoint through crafted hyperlinks. A malicious document could trigger prompt injection, cause Copilot to search for additional sensitive data via automatic tool invocation, and embed that data in invisible characters within clickable links. Microsoft took eight months to patch it after initial disclosure.[2]

In September 2024, Rehberger followed up with "SpAIware," demonstrating that malicious instructions could be planted in ChatGPT's persistent memory feature. Once injected, the instructions survived across all future conversations, continuously exfiltrating data to an attacker-controlled server. OpenAI addressed the specific exfiltration vector but acknowledged that the underlying issue (AI accepting prompts from untrusted sources) remains.[3]

In May 2025, CISA added CVE-2025-3248 to its Known Exploited Vulnerabilities catalog. The popular AI agent builder Langflow had a missing authentication vulnerability in its code validation endpoint. The endpoint called Python's exec() on user-supplied code without any authentication or sandboxing. Threat actors deployed the "Flodrix" botnet through compromised instances. CVSS score: 9.8.[4]

In December 2025, n8n, the widely used AI workflow automation platform, disclosed CVE-2025-68613 (CVSS 9.9). A sandbox escape in its expression evaluation allowed OS-level command execution. Over 105,000 vulnerable instances were found. Compromising a single n8n instance could expose every credential stored in its workflows: Salesforce, AWS, OpenAI, database connections, everything.[5]

These aren't edge cases. These are mainstream tools used by thousands of organizations. And the common thread is the same every time: excessive access, insufficient controls.


The OWASP Top 10 for LLMs: a framework that matters

The OWASP Foundation released an updated Top 10 for LLM Applications in 2025. If you're deploying AI in any capacity, this should be your baseline reference. Here's what it covers:[6]

  1. Prompt Injection (LLM01) — Manipulating LLM behavior through crafted inputs, either directly or indirectly through ingested content. Still the number one risk.
  2. Sensitive Information Disclosure (LLM02) — LLMs inadvertently revealing PII, proprietary data, or confidential information in responses.
  3. Supply Chain (LLM03) — Risks from third-party components, pre-trained models, poisoned training data, and compromised plugins.
  4. Data and Model Poisoning (LLM04) — Manipulation of training data or fine-tuning processes to introduce backdoors, biases, or vulnerabilities. New for 2025.
  5. Improper Output Handling (LLM05) — Failing to validate LLM outputs before they reach downstream systems, enabling XSS, SSRF, or remote code execution.
  6. Excessive Agency (LLM06) — LLMs granted too many permissions, functions, or autonomy, allowing unintended harmful actions.
  7. System Prompt Leakage (LLM07) — Exposure of system prompts that reveal internal logic, security controls, filtering rules, or access permissions. New for 2025.
  8. Vector and Embedding Weaknesses (LLM08) — Vulnerabilities in RAG systems where embeddings can be manipulated to inject malicious content or bypass access controls. New for 2025.
  9. Misinformation (LLM09) — LLMs generating false or misleading information that appears authoritative.
  10. Unbounded Consumption (LLM10) — Resource exhaustion attacks causing excessive compute, token usage, or denial-of-service conditions.

Three new entries were added this year: Data and Model Poisoning, System Prompt Leakage, and Vector and Embedding Weaknesses. The old "Insecure Plugin Design" was merged into Supply Chain. The landscape is evolving fast, and the framework is evolving with it.


The 10 guardrails every organization deploying AI needs right now

We've been reviewing AI deployments across our client base for the past year. These are the ten things that matter most, based on what we're actually seeing go wrong in production.

1 Enforce least-privilege access for every AI agent

This is the single most impactful control you can implement. Your AI agent should only be able to perform actions that the delegating user is authorized to perform, and it should operate under an attenuated version of those permissions, not the full set.

The n8n vulnerability (CVE-2025-68613) is the textbook example of why this matters. A single compromised n8n instance exposed every credential stored in its workflows because the platform had broad access to Salesforce, AWS, OpenAI, and database connections by design. If each workflow had been scoped to only the specific resources it needed, the blast radius would have been a fraction of what it was.[5]

Define which specific tools, endpoints, and data stores each agent can access. Use tool-level permissions, not blanket access. If an agent doesn't need write access to your database, don't give it write access.
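A minimal sketch of what tool-level scoping can look like in code, assuming a simple in-process dispatcher. `ScopedAgent` and the registry entries are hypothetical names for illustration, not from any particular agent framework:

```python
# Sketch of tool-level scoping: each agent gets an explicit allowlist, and
# dispatch refuses anything outside it. ScopedAgent and the registry entries
# are hypothetical, not from any real framework.

class ScopedAgent:
    """Dispatches tool calls, but only those on the agent's allowlist."""

    def __init__(self, name, allowed_tools):
        self.name = name
        self.allowed = frozenset(allowed_tools)  # immutable grant set

    def invoke(self, tool, registry, **kwargs):
        if tool not in self.allowed:
            # Deny by default: no grant, no call.
            raise PermissionError(f"agent '{self.name}' is not granted '{tool}'")
        return registry[tool](**kwargs)

# Tools the platform exposes; each agent is granted only what it needs.
registry = {
    "crm_read":  lambda **kw: "records",
    "crm_write": lambda **kw: "written",
    "db_drop":   lambda **kw: "dropped",
}

reporting_agent = ScopedAgent("reporting", {"crm_read"})
reporting_agent.invoke("crm_read", registry)   # allowed
# reporting_agent.invoke("db_drop", registry)  # raises PermissionError
```

The principle holds whether the dispatcher is a workflow engine, an MCP client, or a custom agent loop: grants live outside the model's control, and the default is denial.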

2 Validate and sanitize all inputs to and outputs from AI models

Prompt injection remains the number one risk in the OWASP Top 10 for LLMs for a reason. The Microsoft Copilot ASCII smuggling attack demonstrated how invisible characters in documents could hijack an AI's behavior and exfiltrate data through trusted Microsoft domains.[2] The ChatGPT SearchGPT manipulation showed how hidden webpage content could override negative reviews with artificially positive assessments.[7]

But output validation matters just as much. If your LLM generates a SQL query, a shell command, or HTML content that gets rendered downstream, and you aren't sanitizing that output, you're one hallucination away from XSS, SSRF, or remote code execution. This is OWASP LLM05 (Improper Output Handling), and it's the bridge between "the AI said something wrong" and "the AI compromised our infrastructure."
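One concrete layer of defense against ASCII smuggling is stripping invisible Unicode before text reaches the model, and again before output is rendered. A hedged sketch: the ranges below cover the Unicode Tags block (U+E0000–U+E007F, the characters abused in the Copilot attack) plus common zero-width characters, but a production filter would need to go further:

```python
import re

# Strip characters commonly abused for "ASCII smuggling": the Unicode Tags
# block plus zero-width spaces/joiners and the BOM. Illustrative, not
# exhaustive -- a real filter would also normalize and log what it removed.
INVISIBLE = re.compile(
    "[\U000E0000-\U000E007F"            # Unicode Tags block
    "\u200B\u200C\u200D\u2060\uFEFF]"   # zero-width chars and BOM
)

def sanitize(text: str) -> str:
    return INVISIBLE.sub("", text)

# Build a link label with a payload hidden in invisible tag characters.
smuggled = "Click here" + "".join(chr(0xE0000 + b) for b in b"leak:apikey")
assert sanitize(smuggled) == "Click here"
```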

3 Never hardcode credentials. Use a secrets manager.

In May 2024, a hacktivist group called Rabbitude discovered hardcoded API keys in the Rabbit R1 AI device codebase. The exposed keys provided full admin access to ElevenLabs (including the complete history of every text-to-speech message from every R1 device), Azure, Yelp, Google Maps, and SendGrid. Despite being notified on May 16, Rabbit took no action until researchers published findings publicly on June 25. The keys remained exploitable for over a month.[8]

This isn't unique to Rabbit. Research from Cybernews found that 72% of Android AI apps contained hardcoded secrets. CovertLabs found 98.9% of iOS AI apps were actively leaking data.[9] And in the MCP ecosystem specifically, credentials are routinely stored in plaintext within .env files or JSON configurations.[10]

Use a dedicated secrets manager (AWS Secrets Manager, HashiCorp Vault, Azure Key Vault). Rotate credentials regularly. And audit your AI tool configurations for plaintext secrets today, because they're almost certainly there.
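As a starting point for that audit, even a crude scan catches the obvious offenders. A sketch with illustrative patterns; a real audit should use a dedicated scanner (gitleaks, trufflehog, or similar) rather than hand-rolled regexes:

```python
import re

# Minimal plaintext-secret scan for AI tool configs (.env files, JSON).
# Patterns illustrate common key shapes; they are not exhaustive.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "openai_key":     re.compile(r"\bsk-[A-Za-z0-9_\-]{20,}\b"),
    "generic_secret": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}"),
}

def scan(text: str) -> list:
    """Return the names of all secret patterns found in the text."""
    return sorted({name for name, pat in PATTERNS.items() if pat.search(text)})

config = "OPENAI_API_KEY=sk-abcdefghijklmnopqrstuvwx\nAWS_KEY=AKIAABCDEFGHIJKLMNOP"
hits = scan(config)   # flags both keys plus the generic KEY= assignment
```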

4 Authenticate every endpoint. No exceptions.

The Langflow RCE (CVE-2025-3248) was caused by a single unauthenticated endpoint: /api/v1/validate/code. That endpoint called exec() on user-supplied code. No authentication. No sandboxing. CVSS 9.8. CISA added it to the Known Exploited Vulnerabilities catalog.[4]

DeepSeek's exposed ClickHouse database? No authentication required. Anyone who found the endpoint could read over a million log entries including plaintext chat histories and API keys.[1]

The Chat & Ask AI app (50 million installs) exposed 300 million chat messages because its Firebase database rules were set to allow public read access.[11]

Every endpoint that your AI system exposes, whether it's an API, a database, an admin panel, or a health check, needs authentication. The MCP specification itself currently makes authorization optional, which means many MCP server setups rely on nothing more than an API key.[10] That's not enough.
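What "no exceptions" looks like in practice is deny-by-default on every handler. A minimal sketch using a hypothetical decorator and dict-shaped request; in production you would use your framework's auth middleware and a secrets manager rather than an environment variable:

```python
import functools
import hmac
import os

def require_key(handler):
    """Deny-by-default auth wrapper: no valid bearer token, no handler."""
    @functools.wraps(handler)
    def wrapped(request):
        expected = os.environ.get("SERVICE_API_KEY", "")
        supplied = request.get("headers", {}).get("Authorization", "")
        # Constant-time comparison avoids timing side channels.
        if not expected or not hmac.compare_digest(supplied, f"Bearer {expected}"):
            return {"status": 401, "body": "unauthorized"}
        return handler(request)
    return wrapped

@require_key
def validate_code(request):
    # Unlike Langflow's /api/v1/validate/code, unreachable without auth.
    return {"status": 200, "body": "ok"}

os.environ["SERVICE_API_KEY"] = "demo-key"
assert validate_code({"headers": {}})["status"] == 401
assert validate_code({"headers": {"Authorization": "Bearer demo-key"}})["status"] == 200
```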

5 Implement human-in-the-loop for sensitive operations

OWASP LLM06 (Excessive Agency) exists because organizations give AI agents the ability to perform destructive or high-privilege actions autonomously. Database writes, financial transactions, external communications, infrastructure changes: these should require explicit human approval before an agent executes them.

The Perplexity Comet browser vulnerability demonstrated why this matters at the user level. Attackers embedded hidden commands in Reddit comments. When users activated Comet's "summarize current page" feature, the AI executed concealed instructions that could log into the user's email, bypass captchas, and transmit credentials to the attacker, all within 150 seconds.[12]

Design your AI workflows with explicit approval gates. Any action that modifies data, sends communications, accesses sensitive resources, or incurs cost should require a human to confirm before execution.
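An approval gate can be as simple as a classification of destructive actions plus a blocking confirmation callback. A sketch with illustrative action names; real systems would route the approval to a ticketing or chat workflow:

```python
# Actions that must never run without a human decision. Illustrative set.
DESTRUCTIVE = {"db_write", "send_email", "deploy", "spend"}

def execute(action, params, approve):
    """Run an agent action; destructive ones are held for human approval.

    `approve` is a callback that blocks until a human confirms or denies
    (here modeled as a simple function returning True/False).
    """
    if action in DESTRUCTIVE:
        if not approve(action, params):
            return ("denied", action)
    return ("executed", action)

# With no approver available, reads pass and writes are held.
assert execute("db_read", {}, approve=lambda a, p: False) == ("executed", "db_read")
assert execute("db_write", {}, approve=lambda a, p: False) == ("denied", "db_write")
```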

6 Audit your AI supply chain: models, plugins, and MCP servers

In February 2025, ReversingLabs discovered malicious ML models on Hugging Face that evaded the platform's Picklescan security scanning. The models exploited a novel technique: compressing pickle files with 7z instead of ZIP, causing the security scanner to improperly validate them. Payloads included system fingerprinting, credential theft, and reverse shells. Out of 13,466 model files using unsafe serialization, only 38% were flagged by Hugging Face's scanner. The other 62% went undetected.[13]

In the MCP ecosystem, 38% of MCP servers running in enterprises are unofficial implementations from unknown authors.[10] Researchers demonstrated that MCP tools can carry hidden backdoors in their descriptions: seemingly harmless tools containing invisible instructions that models obediently follow when loaded.

Treat AI models and plugins the same way you treat software dependencies. Vet the source. Pin versions. Scan for known vulnerabilities. Don't pull untrusted models into production environments. The Cloud Security Alliance launched an MCP Security Resource Center in August 2025 with specific guidance on this.
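Pinning a model artifact works the same way as pinning a package: record the digest at review time, verify before every load. A sketch (the manifest and filenames are hypothetical; in practice the manifest would be a lockfile checked into the repo and enforced in CI):

```python
import hashlib

# Digests recorded when the artifact was vetted. Hypothetical entries.
PINNED = {
    "classifier-v3.safetensors":
        hashlib.sha256(b"model bytes fetched at review time").hexdigest(),
}

def verify_artifact(name: str, blob: bytes) -> bool:
    """Refuse to load any artifact whose digest doesn't match the pin."""
    return PINNED.get(name) == hashlib.sha256(blob).hexdigest()

assert verify_artifact("classifier-v3.safetensors",
                       b"model bytes fetched at review time")
assert not verify_artifact("classifier-v3.safetensors", b"tampered bytes")
assert not verify_artifact("unknown-model.bin", b"anything")  # unpinned = denied
```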

7 Protect against data and model poisoning

Anthropic published landmark research in January 2024 demonstrating that models can be trained with persistent backdoor behavior that survives standard safety training. Their "sleeper agents" were trained to write secure code when the prompt included "2023" but insert exploitable vulnerabilities when the year was "2024." The critical finding: standard safety training (supervised fine-tuning, RLHF, adversarial training) failed to remove the backdoor. Adversarial training actually taught the models to better hide their malicious behavior.[14]

RAG systems are vulnerable too. The PoisonedRAG attack (presented at USENIX Security 2025) demonstrated that injecting just 5 malicious documents into a corpus of millions caused a targeted AI to return the attacker's desired false answers 90% of the time for specific trigger queries.[15]

A study published in Nature Medicine showed that replacing just 0.001% of training tokens with medical misinformation produced models that matched clean counterparts on standard benchmarks, making the poisoning virtually undetectable through normal evaluation.[16]

If you're fine-tuning models or building RAG pipelines: validate your training data sources, implement integrity checks on your document corpus, monitor for anomalous model behavior over time, and don't assume that standard evaluations will catch poisoning.
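For the corpus side, a hash manifest makes injected documents visible as a diff: the handful of planted documents in a PoisonedRAG-style attack would surface as unexplained additions. A sketch assuming an in-memory corpus keyed by document ID:

```python
import hashlib

def manifest(corpus: dict) -> dict:
    """Map each document ID to a digest of its content."""
    return {doc_id: hashlib.sha256(text.encode()).hexdigest()
            for doc_id, text in corpus.items()}

def diff(baseline: dict, current: dict) -> dict:
    """Report documents added, removed, or modified since the baseline."""
    return {
        "added":    sorted(set(current) - set(baseline)),
        "removed":  sorted(set(baseline) - set(current)),
        "modified": sorted(d for d in baseline.keys() & current.keys()
                           if baseline[d] != current[d]),
    }

base = manifest({"doc1": "trusted text", "doc2": "more trusted text"})
now  = manifest({"doc1": "trusted text", "doc2": "edited!", "doc999": "planted"})
assert diff(base, now) == {"added": ["doc999"], "removed": [],
                           "modified": ["doc2"]}
```

The baseline manifest should itself be stored outside the RAG pipeline's write path, so an attacker who can inject documents can't also rewrite the record of what was there before.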

8 Log everything. Build an audit trail.

IBM's 2025 Cost of a Data Breach Report found that shadow AI incidents accounted for 20% of all breaches, at an average cost of $4.63 million versus $3.96 million for standard breaches. One of the key cost drivers was lack of visibility: organizations couldn't determine what the AI had accessed, what data had been exposed, or how long the compromise had been active.[17]

Log all agent interactions: every tool invocation, every prompt, every model response, every action taken. Use OpenTelemetry for end-to-end traceability. Create immutable audit records that can't be modified after the fact. When (not if) something goes wrong, you need to be able to reconstruct exactly what happened.

LayerX's 2025 report found that 77% of enterprise employees who use AI have pasted company data into chatbot queries, with 22% including confidential personal or financial data.[17] If you're not logging what's going into and coming out of your AI systems, you have no idea what's been exposed.
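A hash-chained log makes after-the-fact tampering detectable: each record commits to the one before it, so editing history breaks the chain. A minimal sketch with illustrative field names; real systems would ship these records to append-only storage via OpenTelemetry or similar:

```python
import hashlib
import json
import time

def append(log, event):
    """Append an event, chaining its hash to the previous record's hash."""
    prev = log[-1]["hash"] if log else "0" * 64
    record = {"ts": time.time(), "event": event, "prev": prev}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    log.append(record)

def verify(log):
    """Recompute every hash; any edit to any record breaks the chain."""
    prev = "0" * 64
    for rec in log:
        body = {k: rec[k] for k in ("ts", "event", "prev")}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

log = []
append(log, {"tool": "crm_read", "agent": "reporting"})
append(log, {"tool": "send_email", "agent": "outreach"})
assert verify(log)
log[0]["event"]["tool"] = "db_drop"   # tamper with history
assert not verify(log)
```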

9 Sandbox AI execution environments

Both the Langflow and n8n vulnerabilities had the same root cause: unsandboxed code execution. Langflow's exec() call ran in the same process as the application. n8n's expression evaluation had insufficient sandbox isolation. In both cases, compromising the AI layer meant compromising the entire host.

Run AI agent code in isolated environments with limited filesystem and network access. Use containers with restricted capabilities. Implement network segmentation so that a compromised agent can't pivot to your production database or internal services. If your AI agent needs to execute code (and many do), make sure the execution environment is disposable and firewalled.
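Even without containers, process-level isolation removes the worst failure mode of an in-process exec(). A sketch of a disposable execution wrapper, a floor rather than a full sandbox; production setups should layer containers, capability drops, and network egress controls on top:

```python
import subprocess
import sys
import tempfile

def run_untrusted(code: str, timeout: float = 5.0):
    """Run agent-generated Python in a separate, disposable process."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                cwd=workdir,             # scratch dir, deleted afterwards
                env={},                  # no inherited secrets in env vars
                capture_output=True, text=True, timeout=timeout,
            )
            return proc.returncode, proc.stdout
        except subprocess.TimeoutExpired:
            return None, "killed: timeout"

rc, out = run_untrusted("print(2 + 2)")
assert rc == 0 and out.strip() == "4"

# The child can't see the parent's credentials.
rc, out = run_untrusted("import os; print(os.environ.get('AWS_SECRET_ACCESS_KEY'))")
assert out.strip() == "None"
```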

10 Use short-lived tokens and implement proper identity management

The MCP specification recommends OAuth 2.1 with PKCE (Proof Key for Code Exchange) and Dynamic Client Registration for AI agent authentication. Okta's 2025 benchmarks showed a 92% reduction in credential theft incidents when using 300-second tokens versus 24-hour sessions.[10]

Implement ephemeral identities for AI agents: secure, verifiable identities that exist only for the duration of the needed activity. No stored secrets. No long-lived API keys sitting in configuration files. Every agent session should authenticate fresh, with scoped permissions, and expire quickly.

The Samsung incident, where engineers leaked confidential source code by pasting it into ChatGPT, happened partly because there were no identity controls on who could use external AI tools and with what data.[18] Proper identity management isn't just about securing the agent. It's about securing the humans interacting with the agent.
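To make the short-lived-token idea concrete, here is a sketch of minting and verifying a scoped, five-minute agent credential. The hand-rolled HMAC scheme is illustrative only; real deployments should use OAuth 2.1 with PKCE as the MCP spec recommends, not a custom scheme:

```python
import base64
import hashlib
import hmac
import json
import time

SIGNING_KEY = b"server-side-secret"   # would live in a secrets manager
TTL_SECONDS = 300                     # expire after five minutes

def mint(agent: str, scopes: list) -> str:
    """Issue a signed, scoped token that expires quickly."""
    claims = {"sub": agent, "scp": scopes, "exp": time.time() + TTL_SECONDS}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def verify(token: str):
    """Return the claims if the token is authentic and unexpired, else None."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None                   # forged or tampered
    claims = json.loads(base64.urlsafe_b64decode(body))
    if claims["exp"] < time.time():
        return None                   # expired: the agent must re-authenticate
    return claims

token = mint("reporting-agent", ["crm:read"])
assert verify(token)["scp"] == ["crm:read"]
# Flipping one character of the signature invalidates the token.
tampered = token[:-1] + ("0" if token[-1] != "0" else "1")
assert verify(tampered) is None
```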


The regulatory landscape is catching up

The EU AI Act (Regulation 2024/1689) is the world's first comprehensive AI legislation, and its enforcement timeline is already underway. As of February 2, 2025, AI systems posing "unacceptable risks" are strictly prohibited and AI literacy requirements are active. By August 2, 2025, rules for general-purpose AI models apply. By August 2026, the majority of rules come into force, including requirements for high-risk AI systems. Penalties run up to EUR 35 million or 7% of global annual turnover.[19]

In the US, the landscape is more fragmented. Colorado's AI Act (SB24-205) takes effect June 30, 2026. California's Transparency in Frontier AI Act and the Texas Responsible AI Governance Act are both scheduled for January 1, 2026. A December 2025 executive order signals federal intent to preempt conflicting state AI regulations, but that battle is far from settled.[20]

On the standards side, NIST's AI Risk Management Framework (AI RMF) and ISO/IEC 42001:2023 are emerging as the baseline references. ISO 42001 is certifiable and designed for formal management systems. NIST AI RMF is voluntary guidance organized around four core functions: Govern, Map, Measure, and Manage. Neither is legally mandated today, but both are being referenced by emerging regulations as compliance benchmarks.[21]

The takeaway: regulatory requirements for AI security are coming fast. Organizations that start building these controls now will be ahead. Organizations that wait will be scrambling.


The attack surface has changed. Our approach needs to change with it.

Successfully deploying AI inside an organization isn't just about keeping the model from misbehaving. It's about security. Full stop.

We treat web applications as attack surfaces. We treat APIs as attack surfaces. We treat cloud infrastructure as an attack surface. AI deployments deserve the same rigor: threat modeling, access control reviews, penetration testing, configuration audits, and ongoing monitoring.

The average enterprise now hosts over 1,200 unauthorized AI applications. Only 37% of organizations have processes to assess AI tool security before deployment. Shadow AI incidents cost an average of $4.63 million per breach, 17% more than standard breaches.[9][17]

The threat landscape has changed. The tools have credentials, access, and context about our environments. We need to start treating every AI deployment with the same security rigor we apply to any other attack surface. Because the question isn't whether your AI will be targeted. It's whether you'll be ready when it is.

Sources

  1. Wiz Research, "Wiz Research Uncovers Exposed DeepSeek Database Leak," January 2025. wiz.io
  2. Rehberger, J., "M365 Copilot Prompt Injection, Tool Invocation, and Data Exfil Using ASCII Smuggling," Embrace The Red, August 2024. embracethered.com
  3. "ChatGPT macOS Flaw Could've Enabled Long-Term Spyware via Memory Feature," The Hacker News, September 2024. thehackernews.com
  4. "Critical Langflow Flaw Added to CISA KEV Catalog Amid Ongoing Exploitation," The Hacker News, May 2025. thehackernews.com
  5. "Critical n8n Flaw (CVSS 9.9) Enables Remote Code Execution," The Hacker News, December 2025. thehackernews.com
  6. OWASP, "Top 10 for LLM Applications 2025," OWASP GenAI Security Project. genai.owasp.org
  7. "ChatGPT Search Tool Vulnerable to Hidden Text Manipulation," The Guardian, December 2024. theguardian.com
  8. "Rabbit R1 Data Breach Shows the Dire Need for Improved Secrets Security," Arnica, June 2024. arnica.io
  9. "LLM Security in 2025: Risks, Examples, and Best Practices," Oligo Security, 2025. oligo.security
  10. "MCP Security Issues and Best Practices," Knostic, 2025. knostic.ai; "Securing the AI Agent Revolution: MCP Security," CoSAI, 2025. coalitionforsecureai.org
  11. "Every AI App Data Breach Since January 2025," Barrack.ai. barrack.ai
  12. "Top 5 Real-World AI Security Threats Revealed in 2025," CSO Online, 2025. csoonline.com
  13. "Malicious ML Models Found on Hugging Face," The Hacker News / ReversingLabs, February 2025. thehackernews.com
  14. Anthropic, "Sleeper Agents: Training Deceptive LLMs That Persist Through Safety Training," January 2024. anthropic.com
  15. PoisonedRAG, USENIX Security 2025. github.com
  16. "Introduction to Data Poisoning: 2025 Perspective," Lakera, 2025. lakera.ai
  17. IBM, "2025 Cost of Data Breach Report," 2025; LayerX, "2025 Shadow AI Report," 2025. cybersecuritydive.com
  18. "Samsung Bans ChatGPT After Confidential Source Code Leak," 2023. reco.ai
  19. EU AI Act Implementation Timeline. artificialintelligenceact.eu
  20. "State AI Laws Under Federal Scrutiny," White & Case, 2025. whitecase.com
  21. NIST AI Risk Management Framework. nist.gov; ISO/IEC 42001:2023 AI Management System Standard.

Deploying AI in Your Organization?

We help organizations assess their AI attack surface: agent permissions, credential exposure, prompt injection risks, and supply chain vulnerabilities.

Book a Consultation Our Services

Lorikeet Security Team

Penetration Testing & Cybersecurity Consulting

We've completed 170+ security engagements across web apps, APIs, cloud infrastructure, and AI-generated codebases. Everything we publish here comes from patterns we see in real client work.