
AI Code Review Tools Compared: How Good Are They at Finding Security Vulnerabilities?

Lorikeet Security Team February 26, 2026 11 min read

AI code review tools promise to catch bugs and security issues automatically, right inside your pull request workflow. GitHub Copilot now does code review. Amazon CodeGuru analyzes code for defects and security issues. Dozens of startups, from Korbit AI and CodeRabbit to Qodo and Sourcery, offer AI-powered review bots that comment on every PR with suggestions, warnings, and fixes.

The pitch is compelling: automated security review on every commit, at machine speed, for a fraction of the cost of a human reviewer. But how good are these tools at finding security vulnerabilities specifically? Not style issues. Not refactoring opportunities. Not documentation gaps. Actual exploitable vulnerabilities, the kind that lead to data breaches, account takeovers, and compliance failures.

We tested the major AI code review tools against real vulnerability patterns from our penetration testing and secure code review engagements. We fed them code containing IDOR flaws, broken access controls, injection vectors, JWT implementation mistakes, race conditions, and business logic bypasses. The results were instructive and, for anyone relying solely on these tools for security, concerning.


What AI Code Review Tools Actually Do

Before evaluating individual tools, it helps to understand what AI code review tools are doing under the hood. Most fall into one of two categories: pattern-matching tools that use trained models to identify known-bad code patterns, and generative AI tools that use large language models to "understand" code and provide natural-language feedback.

The pattern-matching tools are essentially evolved static analysis. They look for specific code constructs that correlate with vulnerabilities: string concatenation in SQL queries, missing output encoding, hardcoded credential strings, and known-insecure function calls. They are good at what traditional SAST has always been good at, just with lower false positive rates thanks to machine learning.

The generative AI tools are doing something different. They read your code the way an LLM reads text, generating a contextual understanding of what the code does and then producing review comments based on that understanding. This gives them the ability to comment on code quality, suggest refactors, and explain logic, but it does not give them the ability to reason about security in the way a human security engineer does.

Where AI code review excels: surface-level, pattern-based issues such as string-concatenated SQL queries, hardcoded credentials, known-insecure function calls, and dependencies with published CVEs.

Where AI code review struggles: context-dependent issues such as broken authorization, business logic flaws, race conditions, and anything that requires knowing what the application is supposed to do rather than what the code says.

This distinction matters. The vulnerabilities that AI tools catch reliably are the same ones that traditional SAST tools and even linters have been catching for years. The vulnerabilities that AI tools miss are the ones that actually get companies breached.


GitHub Copilot Code Review

GitHub Copilot's code review capability, integrated directly into pull requests, is the most widely accessible AI review tool. It leverages the same underlying models that power Copilot's code generation, now applied to reviewing diffs and suggesting changes.

What Copilot catches

Copilot's review is reasonably effective at flagging basic injection patterns where user input flows directly into SQL queries or shell commands through obvious string concatenation. It catches obvious credential exposure, including API keys, passwords, and tokens that appear as string literals in source code. It also identifies some dependency issues when combined with GitHub's Dependabot integration, flagging PRs that introduce packages with known CVEs.

In our testing, Copilot consistently flagged hardcoded JWT secrets, basic SQL injection via template literals, and a few instances of missing input sanitization on user-facing endpoints. These are real issues worth catching, and catching them on every PR before merge has genuine value.
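To make those flagged patterns concrete, here is a minimal sketch of the two issues Copilot reliably caught; the function names and the `users` schema are ours, purely for illustration:

```python
import sqlite3

# Both snippets below are deliberately insecure examples of the surface-level
# patterns described above; JWT_SECRET and get_user_* are hypothetical names.

JWT_SECRET = "super-secret-key-123"  # hardcoded secret: flagged by most tools

def get_user_unsafe(conn, username):
    # User input interpolated directly into SQL: the classic injection
    # pattern that AI review tools flag on sight.
    return conn.execute(
        f"SELECT id FROM users WHERE name = '{username}'"
    ).fetchone()

def get_user_safe(conn, username):
    # Parameterized query: the fix these tools typically suggest.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (username,)
    ).fetchone()
```

The unsafe variant lets an input like `' OR '1'='1` rewrite the query; the parameterized variant treats it as an opaque string.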

What Copilot misses

Copilot struggles significantly with vulnerabilities that require understanding application context. It did not flag BOLA/IDOR patterns where an endpoint accepted a resource ID parameter and returned the resource without verifying the requesting user had access to it. It missed business logic authorization flaws where a role check existed but was insufficient for the specific operation. It did not identify context-dependent SSRF where a URL parameter was validated against a blocklist but the blocklist was incomplete. And it failed to catch JWT implementation flaws beyond hardcoded secrets, missing issues like algorithm confusion, missing expiration validation, and key reuse across environments.
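The BOLA/IDOR miss is easiest to see in code. Below is a framework-free sketch of the pattern: the vulnerable handler validates that the resource exists but never checks ownership, and it looks perfectly clean in isolation. `DOCUMENTS` and both handler names are hypothetical:

```python
# In-memory stand-in for a database of user-owned resources.
DOCUMENTS = {
    101: {"owner": "alice", "body": "alice's tax records"},
    102: {"owner": "bob", "body": "bob's tax records"},
}

def get_document_vulnerable(current_user, doc_id):
    # Validates the ID exists and returns the resource. Nothing here
    # pattern-matches as "insecure", which is why tools pass over it:
    # the flaw is the check that is absent, not the code that is present.
    doc = DOCUMENTS.get(doc_id)
    if doc is None:
        raise KeyError("not found")
    return doc["body"]

def get_document_fixed(current_user, doc_id):
    doc = DOCUMENTS.get(doc_id)
    if doc is None or doc["owner"] != current_user:
        # Deny both missing and foreign resources with the same error,
        # so the response does not leak whether the ID exists.
        raise PermissionError("forbidden")
    return doc["body"]
```

The fix requires knowing the application's ownership model, which is exactly the context a diff-level reviewer does not have.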

Pricing: Included with GitHub Copilot Enterprise ($39/user/month). Copilot Business ($19/user/month) includes limited review features.

Best for: Catching low-hanging fruit in pull requests. Effective as a first pass that reduces the burden on human reviewers by filtering out basic issues before they reach the review queue.


Amazon CodeGuru

Amazon CodeGuru takes a different approach. It is trained on Amazon's internal codebase, which gives it strong coverage of infrastructure-level issues and AWS-specific patterns, but makes it narrower in scope than general-purpose AI review tools.

What CodeGuru catches

CodeGuru is particularly effective at identifying resource leaks such as unclosed database connections, file handles, and HTTP clients. It catches concurrency issues including thread safety violations and synchronization problems, reflecting Amazon's internal emphasis on highly concurrent systems. It also flags some security anti-patterns specific to AWS services, like overly permissive IAM policies referenced in code and insecure S3 bucket configurations.
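As a sketch of the resource-leak class CodeGuru targets, here is a connection that is never closed next to the fixed version; the function names and `items` table are ours:

```python
import sqlite3
from contextlib import closing

def count_rows_leaky(db_path):
    # The connection is never closed. In a long-running worker, or when an
    # exception fires before cleanup, this leaks file handles over time,
    # the kind of defect resource-leak analysis is built to catch.
    conn = sqlite3.connect(db_path)
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]

def count_rows_fixed(db_path):
    # contextlib.closing guarantees the connection is released even if
    # the query raises.
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```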

What CodeGuru misses

CodeGuru's security coverage outside of AWS-specific patterns is limited. In our testing, it missed most web application security patterns including XSS, CSRF, and insecure deserialization. It did not flag API authorization issues, even straightforward ones where endpoints lacked any permission checks. Its coverage of OWASP Top 10 items was inconsistent; it caught some injection patterns in Java but missed equivalent patterns in Python. The tool's strength is code quality and AWS-specific security, not general application security.

Pricing: Pay-per-line-of-code scanned. Approximately $0.50 per 100 lines for Reviewer, $0.002 per sampling hour for Profiler. Costs can add up quickly on large codebases.

Best for: Java and Python codebases deployed on AWS where resource management and AWS-specific security patterns are primary concerns. Not a substitute for application security testing.


Korbit AI

Korbit AI positions itself as a security-focused AI code review tool, which sets higher expectations than general-purpose alternatives. It integrates with GitHub and GitLab to provide automated review comments on pull requests with an emphasis on security findings.

What Korbit catches

Korbit performs well on OWASP-pattern detection, flagging common injection vectors, missing output encoding, and insecure cryptographic function usage. It catches basic input validation issues where user-controlled data flows into sensitive operations without sanitization. It is also effective at detecting hardcoded secrets and credentials, including patterns that other tools miss like base64-encoded keys and secrets in configuration objects.

What Korbit misses

Despite its security positioning, Korbit shares the same fundamental limitations as other AI review tools. It did not identify complex injection chains where the injection point and the execution point were in different files or different services. It missed business logic vulnerabilities where the security flaw was not in what the code did, but in what it failed to check. And it did not catch authorization patterns that spanned multiple files, such as middleware that checked roles but a specific route handler that bypassed the middleware.

Best for: Teams that want a security-oriented layer on top of their existing PR workflow. More security-aware than general-purpose tools, but should not be the only security review mechanism for critical applications.


CodeRabbit

CodeRabbit is a popular AI-powered PR review bot that provides comprehensive code review comments including summaries, suggestions, and issue detection. It uses large language models to generate contextual feedback on code changes.

What CodeRabbit catches

CodeRabbit is strong on code quality issues including complexity, readability, and maintainability concerns that indirectly affect security by making code harder to audit. It catches basic security patterns similar to what Copilot detects: obvious injection points, hardcoded credentials, and missing error handling. It also provides useful dependency analysis, flagging newly introduced packages that have known vulnerabilities or are unmaintained.

What CodeRabbit misses

CodeRabbit's security analysis is shallow compared to dedicated security tools. It operates on individual pull request diffs, which means it lacks the full codebase context needed to evaluate whether a change introduces a vulnerability in the context of the broader application. It missed deep security issues that required understanding the application's authentication flow, data model, or trust boundaries. A new endpoint that looked perfectly fine in isolation was actually accessible to unauthenticated users because of how the routing middleware was configured, and CodeRabbit had no way to know that from the diff alone.
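The middleware-bypass scenario above can be sketched in a few lines. Everything here is hypothetical (a toy router, not any real framework): the new handler in the diff looks fine, but it was registered under a path prefix the auth middleware does not guard:

```python
# Toy routing layer: the auth middleware only guards these path prefixes.
PROTECTED_PREFIXES = ["/api/"]

ROUTES = {}

def route(path):
    def register(fn):
        ROUTES[path] = fn
        return fn
    return register

def dispatch(path, user=None):
    # The "middleware": require a logged-in user only on protected prefixes.
    if any(path.startswith(p) for p in PROTECTED_PREFIXES) and user is None:
        raise PermissionError("login required")
    return ROUTES[path]()

@route("/api/reports")
def reports():
    return "reports"

@route("/internal/reports")  # the new endpoint from the PR: wrong prefix,
def internal_reports():      # so it silently skips the auth check
    return "sensitive"
```

Nothing in the `internal_reports` diff hunk is wrong on its own; the vulnerability lives in the interaction with configuration defined elsewhere.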

Pricing: Free for open source. Pro plan starts at $15/user/month.

Best for: General code quality improvement across all PRs. Good developer experience and useful summaries. Not a security tool, and should not be evaluated as one.


SAST Tools with AI: Semgrep and Snyk Code

Semgrep and Snyk Code occupy a different category. They are static application security testing (SAST) tools that have added AI capabilities, rather than AI tools that attempt security analysis. This distinction matters because their foundation is security-first.

Semgrep with AI

Semgrep's core strength is its custom rule engine. You can write precise, pattern-based rules that match exactly the vulnerability patterns relevant to your codebase and your tech stack. The AI layer adds assisted triage, helping developers understand whether a finding is a true positive and providing remediation guidance in natural language.

Semgrep's CI/CD integration is mature, and its rule ecosystem covers a broad range of languages and frameworks. In our testing, Semgrep with well-configured rules caught the highest percentage of pattern-based vulnerabilities of any tool we evaluated. Its false positive rate was the lowest, largely because the rule language allows you to specify precise conditions rather than relying on probabilistic model output.

The limitation is the same as any SAST tool: it matches patterns, not behavior. It can tell you that a query is not parameterized, but it cannot tell you whether the authorization logic on that endpoint is correct for your application's permission model.

Snyk Code

Snyk Code provides cross-file data flow analysis powered by machine learning, which gives it an advantage over single-file pattern matchers. It can trace a user input from an API endpoint through several function calls and transformations to a database query or system command, flagging the chain even when no single file contains a complete vulnerability.

This flow analysis makes Snyk Code more effective at catching indirect injection vulnerabilities where the taint source and the sink are in different files. It also provides real-time scanning in the IDE, catching issues before they even make it to a pull request.

In our testing, Snyk Code caught several injection chains that other tools missed. It also correctly identified insecure deserialization in a Java application where the deserialization call was three function calls removed from the user input. However, like all tools in this comparison, it did not catch authorization logic errors or business logic flaws.
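A multi-hop taint flow of the kind cross-file analysis catches looks like this in miniature. The function names are hypothetical, and in a real codebase each step would live in a different module:

```python
import sqlite3

def parse_request(raw):
    # Step 1: the taint source, e.g. a raw query parameter.
    return raw.strip()

def normalize(term):
    # Step 2: a transformation that preserves the taint.
    return term.lower()

def search_products(conn, raw):
    # Step 3: the sink, two calls removed from the source. No single
    # function above looks dangerous in isolation; only tracing the data
    # flow end-to-end reveals the injection.
    term = normalize(parse_request(raw))
    return conn.execute(
        f"SELECT name FROM products WHERE name = '{term}'"
    ).fetchall()
```

A single-file pattern matcher sees three innocuous functions; a data flow engine sees one tainted path from input to query.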

Pricing: Semgrep offers a free tier for individual developers; Team plans start at custom pricing. Snyk Code is included in Snyk plans starting at a free tier with limited scans, with Team plans from $25/user/month.

Best for: Teams serious about application security who want tools purpose-built for security analysis rather than general-purpose code review tools that happen to flag some security issues.


AI Code Review Tool Comparison

The following table summarizes how each tool performed across key security dimensions in our testing.

| Capability | GitHub Copilot | Amazon CodeGuru | Korbit AI | CodeRabbit | Semgrep (AI) | Snyk Code |
|---|---|---|---|---|---|---|
| OWASP Top 10 | Partial | Limited | Good | Basic | Strong | Strong |
| Business logic | None | None | Minimal | None | None | None |
| Authorization flaws | Minimal | None | Basic checks | None | Rule-dependent | Basic flow analysis |
| Dependency analysis | Via Dependabot | Limited | Basic | Good | Via Supply Chain | Strong (Snyk SCA) |
| False positive rate | Moderate | Low-moderate | Moderate | Low | Low | Moderate |
| Language support | Broad | Java, Python | Major languages | Broad | 30+ languages | 10+ languages |
| Pricing tier | $19-39/user/mo | Pay-per-scan | Per-seat | Free-$15/user/mo | Free-custom | Free-$25/user/mo |

The pattern is clear across every tool: pattern-based detection is strong, logic-based detection is absent. No tool in this comparison reliably identified business logic vulnerabilities or complex authorization flaws. The tools that came closest, Semgrep and Snyk Code, did so through precisely configured rules and data flow analysis, not through AI "understanding" of the code's intent.


What AI Code Review Still Cannot Do

The limitations of AI code review tools are not bugs that will be fixed in the next release. They are fundamental constraints of how these tools work, and understanding them is essential for building a security program that does not have blind spots.

Cannot understand business context

An AI tool can determine that an endpoint returns user data. It cannot determine whether User A should be allowed to see User B's data. That distinction, the difference between what IS authorized and what SHOULD be authorized, requires understanding the business rules of the application. No amount of code analysis can derive business intent. This is why business logic vulnerabilities consistently rank among the most impactful findings in our manual secure code reviews.

Cannot test runtime behavior

AI code review tools analyze source code, not running applications. They cannot detect vulnerabilities that only manifest at runtime: race conditions that depend on specific timing, memory corruption that depends on allocation patterns, configuration issues that depend on the deployment environment, or authentication bypasses that depend on how the web server handles edge cases in HTTP parsing.

Cannot follow complex data flows across microservices

Modern applications are distributed across multiple services, each with its own codebase. A user input enters through an API gateway, gets processed by Service A, queued to Service B, and eventually written to a database by Service C. An injection vulnerability in this chain is invisible to any tool that analyzes one repository at a time. Even Snyk Code's cross-file analysis stops at the service boundary.

Cannot identify race conditions in distributed systems

Race conditions like time-of-check-to-time-of-use (TOCTOU) vulnerabilities require reasoning about concurrent execution. Can two requests hit this endpoint simultaneously and both pass the balance check before either deducts the funds? AI tools do not model concurrency. They see sequential code and analyze it sequentially.
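The balance-check race above can be sketched as a check-then-act window. This is an illustrative single-process model (the `Account` class is ours); in a distributed system the equivalent fix is an atomic database update, not an in-process lock:

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw_racy(self, amount):
        if self.balance >= amount:    # time-of-check
            # A second concurrent request can pass the same check here,
            # before either deduction happens: the TOCTOU window.
            self.balance -= amount    # time-of-use
            return True
        return False

    def withdraw_safe(self, amount):
        # Holding the lock across check and act closes the window for a
        # single process. Across services, use an atomic conditional
        # update (e.g. UPDATE ... WHERE balance >= amount) instead.
        with self._lock:
            if self.balance >= amount:
                self.balance -= amount
                return True
            return False
```

Read sequentially, `withdraw_racy` is correct, which is precisely why a tool that analyzes sequential code cannot flag it.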

Cannot evaluate cryptographic implementations in context

AI tools can flag the use of MD5 or SHA1. They cannot determine that your AES-256-GCM implementation reuses nonces under specific conditions, that your key derivation function uses insufficient iterations for the threat model, or that your HMAC comparison is vulnerable to timing attacks. Cryptographic security depends on implementation details that require specialized knowledge to evaluate.
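The timing-attack case is small enough to show directly. A sketch, using Python's standard `hmac` module (the `verify_*` names are ours):

```python
import hmac
import hashlib

def sign(key: bytes, message: bytes) -> str:
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify_timing_unsafe(key, message, tag):
    # '==' short-circuits at the first differing character, so response
    # time leaks how much of the attacker's guess is correct.
    return sign(key, message) == tag

def verify_timing_safe(key, message, tag):
    # compare_digest takes time independent of where the inputs differ.
    return hmac.compare_digest(sign(key, message), tag)
```

Both functions return identical results for identical inputs, so no output-based test or pattern match distinguishes them; the flaw exists only in execution time.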

Cannot assess whether authentication flows are architecturally sound

An authentication system may use all the right primitives (bcrypt, secure sessions, CSRF tokens) and still be architecturally flawed. Maybe the password reset flow does not invalidate existing sessions. Maybe the OAuth implementation does not validate the state parameter. Maybe the MFA enrollment process can be bypassed by directly calling an enrollment API endpoint. These are authentication bypass techniques that require architectural reasoning, not pattern matching.
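For the OAuth state example, the correct check is simple but architectural: the value must be generated server-side, bound to the session, and verified on callback. A minimal sketch, where the plain dict stands in for real session storage (hypothetical structure):

```python
import secrets

def start_oauth_flow(session):
    # Bind an unguessable value to this user's session before redirecting
    # to the identity provider.
    session["oauth_state"] = secrets.token_urlsafe(32)
    return session["oauth_state"]

def handle_callback(session, returned_state):
    # pop() makes the state single-use, blocking replay as well as CSRF.
    expected = session.pop("oauth_state", None)
    if expected is None or not secrets.compare_digest(expected, returned_state):
        raise PermissionError("state mismatch")
    return "ok"
```

An implementation that skips this check still "uses OAuth" and raises no pattern-level alarms, which is why the flaw requires architectural review to find.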


When to Use AI Review vs. Manual Security Review

This is not an either-or decision. AI code review and manual security review serve different functions and belong at different stages of your development process. Here is a practical decision framework.

Use AI code review tools for every pull request

AI review tools should run on every PR, every day. Their cost is low, their speed is instant, and they catch the kind of surface-level issues that waste human reviewers' time. A well-configured Semgrep ruleset or a Snyk Code scan running in CI catches hardcoded secrets, basic injection patterns, and known-vulnerable dependencies before a human ever looks at the code. This is the security equivalent of automated linting: it does not replace human judgment, but it raises the floor.

Use manual security review for security-critical moments

Bring in human security engineers when the stakes justify the cost: changes to authentication, authorization, payment, or data-handling code; new features where business rules determine who can do what; and before major releases. These are exactly the areas where every tool in this comparison came up empty.

Combine both for a layered approach

The strongest security posture comes from layering automated and manual review. AI tools filter out noise and catch common patterns on every commit. Human reviewers focus their limited time on the complex, context-dependent issues that AI cannot evaluate. This is the same principle behind DevSecOps pipeline design: automate what you can, and reserve human expertise for what you cannot automate.

AI code review tools are excellent at catching the easy stuff. But in our experience, the vulnerabilities that lead to actual breaches (authorization bypasses, business logic flaws, race conditions) require a human who understands what the application is supposed to do, not just what the code says. The hard truth is that the most dangerous vulnerabilities are the ones no tool is equipped to find automatically.


Building an Effective Code Review Security Strategy

Based on our experience reviewing code across hundreds of engagements, here is the approach that works.

Layer 1: Automated AI scanning in CI/CD. Run Semgrep or Snyk Code on every pull request. Configure custom rules for your tech stack. Block merges on high-severity findings. This catches 60-80% of pattern-based vulnerabilities with zero ongoing human effort.

Layer 2: AI-assisted PR review. Use Copilot review or CodeRabbit to provide contextual feedback to developers. This catches code quality issues and some security patterns that the SAST rules miss. It also educates developers by explaining why certain patterns are problematic.

Layer 3: Periodic manual security review. Quarterly or before major releases, have security engineers perform a manual review of security-critical code. Focus on authentication, authorization, data handling, and any areas where business logic determines security. This is where you catch the vulnerabilities that automated tools fundamentally cannot find.

Layer 4: Penetration testing. Annually or after major infrastructure changes, test the running application from an attacker's perspective. This validates that the security measures identified in code review actually work in the deployed environment and catches configuration, infrastructure, and runtime issues that code review does not cover. See our guide on choosing between code review and pentesting for more detail.

Each layer catches what the layer above misses. No single layer is sufficient on its own. The teams we see with the strongest security posture are the ones that invest in all four, with the right balance of automated and manual effort for their stage and risk profile.

Need a Security-Focused Code Review?

AI tools catch the patterns. Our security engineers catch the logic flaws, authorization bypasses, and architectural weaknesses that automated tools miss. Lorikeet Security's manual secure code review goes beyond what any AI tool can deliver.


Lorikeet Security Team

Penetration Testing & Cybersecurity Consulting

We've completed 170+ security engagements across web apps, APIs, cloud infrastructure, and AI-generated codebases. Everything we publish here comes from patterns we see in real client work.
