
Secrets Sprawl: How Hardcoded Credentials and Leaked API Keys Become Breaches

Lorikeet Security Team June 25, 2026 10 min read

TL;DR: GitHub reported detecting more than 39 million leaked secrets across its platform in 2024 alone. API keys, database passwords, OAuth tokens, private keys, and service account credentials are among the most common initial access vectors in cloud breaches, not because attackers are sophisticated, but because developers regularly commit secrets to version control and forget them. One leaked key with the right permissions can mean total cloud account compromise. The window between exposure and exploitation is often measured in minutes, not days.

Why Secrets End Up in Code

The path from a working credential to a committed secret almost never involves malice — it is almost always convenience colliding with habit. Understanding how it happens is the first step to stopping it.

The most common scenario: a developer needs to test an integration quickly, pastes an API key directly into a config file, gets it working, and commits the whole directory. The key was supposed to be temporary. It was not. .env files are the most frequent vector — a developer clones a repository, copies .env.example to .env, fills in real credentials, and then accidentally stages the file. If .env was never added to .gitignore, one git add . is all it takes.

Copy-paste from an existing working codebase into a new repository is another persistent source of sprawl. The original repo had the key in an environment variable. The new repo has it hardcoded. Nobody noticed during the port.
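A minimal sketch of the environment-variable pattern the original repository relied on, so that a port to a new repository has nothing to hardcode. The helper name require_secret and the variable name PAYMENTS_API_KEY are hypothetical, chosen only for illustration.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def require_secret(name: str) -> str:
    """Return the value of environment variable `name`, or fail loudly.

    There is deliberately no default value: a fallback literal is exactly
    how a "temporary" key ends up committed alongside the code.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# Hypothetical usage; PAYMENTS_API_KEY is an illustrative name:
# api_key = require_secret("PAYMENTS_API_KEY")
```

Failing at startup when the variable is missing keeps the error visible on the developer's machine instead of inviting a quick hardcoded workaround.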

CI/CD pipelines introduce their own categories of exposure. Environment variables used in build steps appear in plaintext in build logs when debug modes are enabled — GitHub Actions' ACTIONS_STEP_DEBUG=true flag, Jenkins console output, CircleCI build logs. A developer enabling verbose logging to debug a failing build and then forgetting to disable it can expose every secret injected into that pipeline to anyone with log access.

Docker images are a particularly underappreciated source of credential exposure. Secrets passed as build arguments (ARG API_KEY) are recorded in the image's build history even if the final image never references them directly, so the value can be recovered from an image built with --build-arg by anyone who can pull it. Images pushed to public registries with embedded secrets are a reliable find in any external assessment.

Beyond code, secrets sprawl into communication channels: database connection strings pasted into Teams messages for a colleague to use, AWS access keys shared in a support ticket, API credentials in a screenshot attached to a Jira comment. Every channel that stores message history is a potential secrets repository.

Stack traces and verbose error messages complete the picture. A misconfigured application that exposes a full exception trace in a browser or in logs can leak database connection strings, including credentials, in plain sight. This is not hypothetical — it is a routine finding in web application penetration tests.
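One common mitigation is to scrub URL-style connection strings before an error message is logged or rendered. The sketch below is illustrative only: the function name is hypothetical, and the pattern covers only the scheme://user:password@host form, not key=value style connection strings.

```python
import re

# Matches the credentials part of URL-style connection strings,
# for example postgresql://user:password@host:5432/db
_DSN_CREDENTIALS = re.compile(
    r"(?P<scheme>[a-zA-Z][a-zA-Z0-9+.-]*://)"
    r"(?P<user>[^:/@\s]+):(?P<password>[^@/\s]+)@"
)


def redact_connection_strings(message: str) -> str:
    """Replace the password in any URL-style connection string with ***."""
    return _DSN_CREDENTIALS.sub(r"\g<scheme>\g<user>:***@", message)
```

Applied to an exception message before it reaches a log handler or an error page, this keeps the host and user visible for debugging while dropping the password.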


Where Secrets Hide: A Tour of the Attack Surface

When a penetration tester or attacker performs a secrets-focused review, they work through a predictable set of locations. Each has distinct characteristics that affect both how secrets end up there and how they are found.

Git History

This is the most important location to understand, because it defies the intuition that deleting something removes it. Git stores every version of every file ever committed. When a developer commits a secret, realizes the mistake, and immediately commits a deletion, the secret is still present in the repository's object store at the original commit hash. Cloning the repository gives an attacker full access to the entire history. Tools like TruffleHog scan every commit in a repository's history — not just the current HEAD — specifically to find this class of exposure.

The practical implication: if a secret was ever committed to a repository, assume it has been seen. Rotation is mandatory regardless of how quickly the deletion commit followed.
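To make the point concrete, here is a rough sketch of a history scan. It is not how TruffleHog works internally; it simply walks the output of git log -p and reports added lines that match the AWS access key ID format. The function names are hypothetical, and a real scan would use many more patterns.

```python
import re
import subprocess

# AWS access key IDs: AKIA (long-lived) or ASIA (temporary) plus 16 characters.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def read_history(repo_path: str) -> str:
    """Return every patch on every branch of the repository at repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--all", "-p", "--no-color",
         "--format=commit %H"],
        capture_output=True, text=True, errors="replace", check=True,
    )
    return result.stdout


def scan_history(log_text: str) -> list[tuple[str, str]]:
    """Return (commit_hash, added_line) pairs that contain a key-like string."""
    findings = []
    commit = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            if AWS_ACCESS_KEY_ID.search(line):
                findings.append((commit, line[1:].strip()))
    return findings
```

Calling scan_history(read_history(".")) on a repository where a key was committed and then deleted still reports the original commit, which is why the deletion commit changes nothing about the need to rotate.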

.env Files and Configuration

The .env file pattern is so common it has become its own attack category. Beyond .env, secrets accumulate in config.yml, application.properties, appsettings.json, database.yml, secrets.yml, and framework-specific configuration files. These are frequently committed with real credentials when developers work against production systems directly, or when staging environments use production keys for convenience.

CI/CD Pipeline Logs and Environment Variables

Build logs from GitHub Actions, Jenkins, CircleCI, and GitLab CI are a consistent source of credential exposure. When a build step echoes its environment for debugging (env in a shell script, printenv in a Makefile), or when automatic secret masking fails on multi-line values, secrets appear in the log output. Build logs are typically readable by everyone with read access to the repository, and on a public repository that audience extends far beyond the project's contributors.
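The multi-line masking failure is easy to reproduce: a masker that only looks for the whole secret misses a private key printed one line at a time. The sketch below shows one way to handle that in a Python build script using the standard logging module. The class name and the 8-character minimum are assumptions for illustration, not the behavior of any CI provider's masker.

```python
import logging


class SecretMaskingFilter(logging.Filter):
    """Replace known secret values, and their individual lines, with ***."""

    def __init__(self, secrets):
        super().__init__()
        fragments = set()
        for secret in secrets:
            fragments.add(secret)
            # Also mask each line of a multi-line secret (for example a PEM
            # key), skipping very short lines that would over-match.
            fragments.update(
                line for line in secret.splitlines() if len(line) >= 8
            )
        # Longest first, so a whole secret is replaced before its parts.
        self._fragments = sorted(
            (f for f in fragments if f), key=len, reverse=True
        )

    def filter(self, record):
        message = record.getMessage()
        for fragment in self._fragments:
            message = message.replace(fragment, "***")
        record.msg, record.args = message, ()
        return True
```

Attaching the filter to a handler with handler.addFilter(SecretMaskingFilter(values)) masks every record that passes through it. It only covers output routed through logging, so a raw print of the environment would still leak.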

Cloud function environment variables present a similar risk: AWS Lambda, Google Cloud Functions, and Azure Functions display environment variables in their console interfaces. Anyone whose permissions allow reading a function's configuration, whether through the console or the API, can read those values, so overly broad read access to functions effectively becomes read access to every secret stored in them.

Docker Image Layers

Docker build arguments passed via --build-arg are stored in image layer metadata and recoverable with docker history --no-trunc. Secrets embedded in RUN commands during the build — even if the file containing the secret is removed in a subsequent layer — remain in the intermediate layer. Multi-stage builds mitigate this, but only when used correctly. Images pushed to Docker Hub, GitHub Container Registry, or other public registries with embedded secrets are scanned by automated tooling continuously.
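A quick review aid is to flag ARG and ENV instructions whose names look like credentials before an image is ever built. The sketch below is a simplified, hypothetical check based on names alone; it does not parse the full Dockerfile grammar and will miss secrets with innocuous names. BuildKit secret mounts (RUN --mount=type=secret) are the usual alternative for values that are needed only during the build.

```python
import re

_SECRET_NAME = re.compile(
    r"(SECRET|TOKEN|PASSWORD|PASSWD|API_?KEY|PRIVATE_?KEY|CREDENTIAL)",
    re.IGNORECASE,
)
_ARG_OR_ENV = re.compile(r"^\s*(ARG|ENV)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)


def find_secret_build_inputs(dockerfile_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, instruction, name) for secret-looking ARG/ENV names."""
    findings = []
    for number, line in enumerate(dockerfile_text.splitlines(), start=1):
        match = _ARG_OR_ENV.match(line)
        if match and _SECRET_NAME.search(match.group(2)):
            findings.append((number, match.group(1).upper(), match.group(2)))
    return findings
```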

NPM Packages and PyPI

Published packages are a frequently overlooked exposure path. Developers who accidentally include their .env file, personal .npmrc with auth tokens, or local configuration in an npm or PyPI package have published secrets to the entire public registry. Automated scanners monitor new package publications for high-entropy strings and known secret patterns. This is not a theoretical attack — it is a documented, repeated occurrence.
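A pre-publish check can catch this by listing the archive that is about to be uploaded, since both npm pack and Python sdist builds produce gzip-compressed tarballs. The sketch below is a minimal illustration; the file-name list is an assumption and should be extended for a real project.

```python
import tarfile
from pathlib import PurePosixPath

# Illustrative list of file names that should never ship in a package.
SENSITIVE_NAMES = {
    ".env", ".env.local", ".env.production",
    ".npmrc", ".pypirc", "id_rsa", "credentials.json",
}


def sensitive_members(archive_path: str) -> list[str]:
    """Return the archive members whose file name is on the sensitive list."""
    with tarfile.open(archive_path) as archive:
        names = archive.getnames()
    return sorted(
        name for name in names if PurePosixPath(name).name in SENSITIVE_NAMES
    )
```

Running this against the built archive, rather than the source tree, checks what will actually be published after ignore files and manifest rules have been applied.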


Real Incidents: What Secrets Sprawl Actually Costs

The consequences of credential exposure are not abstract. These incidents illustrate what happens when secrets reach public repositories or unauthorized parties.

Toyota (2022): Five Years of Exposure

In October 2022, Toyota disclosed that an access key for a data server behind its T-Connect telematics service had been sitting in a public GitHub repository for nearly five years, from December 2017 to September 2022, after part of the service's source code was uploaded there. The exposure potentially affected the email addresses and customer management numbers of 296,019 customers. The credentials were not discovered by Toyota; they were identified by external researchers. Five years of public exposure represents the worst case of the "we'll fix it later" mindset applied to a committed secret.

Samsung (2022): Internal Source Code and AWS Keys

Samsung experienced a significant data breach when internal source code, including code for Samsung's Galaxy devices, was leaked publicly. The exposure included AWS credentials embedded in the source code. Beyond the immediate impact of the leaked credentials, the source code exposure provided a roadmap for identifying other vulnerabilities — internal tools, proprietary algorithms, and security mechanisms that were never intended to be public.

Uber (2022): Hardcoded Credentials on a Network Share

The 2022 Uber breach began not with a sophisticated exploit but with social engineering. An attacker obtained an Uber contractor's credentials and got past multi-factor authentication by flooding the contractor with approval requests until one was accepted. Once inside, according to the attacker's own account, they found a script on an internal network share that contained hardcoded administrator credentials for a privileged access management system. Those credentials opened up Uber's internal systems, including an admin panel for HackerOne, where the attacker accessed confidential vulnerability reports submitted by security researchers. The breach demonstrated that secrets sprawl is not limited to version control: internal file shares, automation scripts, and other places where credentials are stored for convenience represent the same risk.

Twitch (2021): Misconfiguration to Source Code Exposure

A misconfigured server at Twitch led to the exposure of approximately 125GB of internal data, including the platform's source code, internal security tools, creator payout information, and proprietary SDKs. Source code exposure is categorically different from data exposure: it reveals the architecture, authentication mechanisms, and internal APIs of an application, providing a detailed attack map to anyone who obtains it. When that source code contains hardcoded credentials — as is common in internal tooling — the exposure compounds.


What Attackers Do with Leaked Secrets

The exploitation of leaked credentials follows a predictable pattern that has been well-documented across incident reports. Understanding it clarifies why rotation speed matters so much.

The four-minute window: GitGuardian's research found that leaked secrets are often exploited within 4 minutes of a public commit. Automated scanners continuously monitor GitHub's public event stream, npm publish events, and PyPI releases for high-entropy strings and patterns matching known API key formats. The scan, test, and exploitation cycle is fully automated — human involvement happens after a valid key is confirmed, not before.

The specific impact depends entirely on what the key can access.


Detection: Finding Secrets Before Attackers Do

Detection operates at two points: prevention before secrets enter version control, and discovery of secrets that have already been committed.

Pre-Commit Hooks

Pre-commit hooks run before a commit is finalized, scanning staged changes for patterns that match known secret formats. Commonly used tools at this stage include gitleaks and detect-secrets.

The limitation of pre-commit hooks is that they are developer-side controls — they can be skipped with git commit --no-verify. They should be treated as a convenience layer, not a security boundary. CI/CD pipeline scanning provides the enforcement layer.

GitHub Secret Scanning

GitHub's native secret scanning automatically identifies patterns matching known API key formats across repository contents and commit history. When a match is found, GitHub alerts repository administrators and, for supported providers (AWS, Stripe, GitHub, and many others), directly notifies the provider to revoke or flag the exposed credential. Enabling secret scanning on all repositories — including private ones, where it requires a license — is a baseline control that costs nothing beyond the license and provides significant coverage.

TruffleHog

TruffleHog is the most thorough tool for historical secret scanning. It works by scanning every commit in a repository's history, not just the current state, using both pattern matching and entropy analysis to identify high-entropy strings that may be secrets even without matching a known format. Running TruffleHog against a repository before open-sourcing it, before a security review, or as part of an acquisition due diligence process provides the most complete picture of historical exposure.
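For intuition about what entropy analysis measures, the sketch below computes the Shannon entropy of a string in bits per character and applies a simple threshold. It is not TruffleHog's implementation; the function names, the 20-character minimum, and the 4.0 threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def looks_high_entropy(token: str, min_length: int = 20, threshold: float = 4.0) -> bool:
    """Flag long tokens whose characters are close to uniformly distributed."""
    return len(token) >= min_length and shannon_entropy(token) >= threshold
```

Randomly generated keys spread their characters almost evenly and score high, while identifiers and English words repeat letters and score lower. Thresholds like this produce false positives on hashes and encoded data, which is why entropy checks are usually paired with pattern matching.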

TruffleHog can also scan S3 buckets, Confluence, Jira, Slack exports, GitHub Actions logs, and Docker images — making it useful beyond pure git history scanning.

SAST and Container Scanning

Static analysis tools like Semgrep catch hardcoded credential patterns in source code during code review and CI/CD pipelines. Rules targeting common patterns — password = "...", inline connection strings, base64-encoded credentials — provide a layer of coverage that complements entropy-based tools.
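The password = "..." pattern can be illustrated with Python's own ast module. This is a simplified stand-in for what a SAST rule does, not Semgrep's engine or rule syntax, and the name list is an assumption.

```python
import ast
import re

_SUSPICIOUS_NAME = re.compile(r"(password|passwd|secret|token|api_?key)", re.IGNORECASE)


def find_hardcoded_credentials(source: str) -> list[tuple[int, str]]:
    """Return (line, variable) for credential-like names assigned a string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        value = node.value
        # Only non-empty string literals count; lookups such as
        # os.environ["API_KEY"] are not constants and are ignored.
        if not (isinstance(value, ast.Constant)
                and isinstance(value.value, str) and value.value):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and _SUSPICIOUS_NAME.search(target.id):
                findings.append((node.lineno, target.id))
    return sorted(findings)
```

Working on the syntax tree rather than raw text means comments and values read from the environment are not flagged, which is the main advantage of this layer over plain regular expressions.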

Container image scanning tools including Trivy and Grype analyze Docker image layers for embedded secrets as part of the image build and publish pipeline. Integrating these into a registry scanning policy — rejecting images that contain known secret patterns before they reach production or public registries — closes the Docker-specific exposure path.

Tool | Primary Use | Integration Point | Covers History
gitleaks | Secret pattern scanning | Pre-commit hook, CI/CD | Yes
detect-secrets | Baseline + new-secret detection | Pre-commit hook, CI/CD | No (current state)
TruffleHog | Full history + entropy scanning | CI/CD, standalone audit | Yes (all commits)
GitHub Secret Scanning | Known provider key patterns | GitHub native (push events) | Yes (on enable)
Semgrep | SAST pattern matching | CI/CD, IDE | No (current state)
Trivy / Grype | Container image layer scanning | Registry, CI/CD | All image layers

Secrets Management: The Correct Architecture

Detection tools address the symptom. The root cause is storing secrets anywhere other than a dedicated secrets manager. The correct architecture removes secrets from code, configuration files, and environment variables entirely — secrets are retrieved at runtime by the application using short-lived tokens scoped to the specific permissions needed.
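As a sketch of runtime retrieval, the class below fetches a secret on demand and caches it briefly in memory. It assumes a client object that exposes get_secret_value(SecretId=...) and returns a mapping with a SecretString entry, which is the shape of the AWS Secrets Manager client in boto3; the class name, the secret name in the usage comment, and the 5-minute cache lifetime are hypothetical.

```python
import time


class CachedSecretFetcher:
    """Fetch secrets at runtime and keep them in memory for a short time."""

    def __init__(self, client, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._client = client
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # secret_id -> (fetched_at, value)

    def get(self, secret_id: str) -> str:
        now = self._clock()
        cached = self._cache.get(secret_id)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        # Assumed client interface, matching boto3's Secrets Manager client.
        response = self._client.get_secret_value(SecretId=secret_id)
        value = response["SecretString"]
        self._cache[secret_id] = (now, value)
        return value


# Hypothetical usage, assuming boto3 is installed and credentials come from
# the workload's role rather than from a stored key:
# import boto3
# secrets = CachedSecretFetcher(boto3.client("secretsmanager"))
# db_password = secrets.get("prod/orders/db-password")
```

The short cache lifetime is a trade-off: it limits calls to the secrets manager while ensuring a rotated value is picked up within minutes without a redeploy.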

Dedicated Secrets Managers

The major cloud providers and HashiCorp all offer secrets management solutions with similar core capabilities, including AWS Secrets Manager, Google Cloud Secret Manager, Azure Key Vault, and HashiCorp Vault.

Rotation Policies

Secrets managers are most effective when combined with aggressive rotation policies. Long-lived credentials that are never rotated have an indefinite window of exploitation if compromised. Automatic rotation — where the secrets manager generates a new credential and updates the stored value on a schedule — reduces this window without requiring manual intervention. For credentials that cannot be automatically rotated, manual rotation cadences (quarterly at minimum) combined with monitoring for anomalous usage provide a compensating control.

Least Privilege Service Accounts

The impact of a leaked credential is directly proportional to its permissions. A developer's personal AWS access key with administrator permissions represents a catastrophic breach if exposed. A service account key scoped to read-only access on a single S3 bucket represents a much more contained incident. Applying least privilege to every service account, IAM role, and API key — granting exactly the permissions required for the specific function and nothing more — limits the blast radius of any single credential exposure.
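The difference between those two cases is visible in the policy documents themselves. The sketch below flags Allow statements in an AWS-style IAM policy that use a wildcard action or resource. It is a rough lint under simplifying assumptions (it ignores conditions, NotAction, and NotResource), and the function names and bucket name are hypothetical.

```python
def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def overly_broad_statements(policy: dict) -> list[int]:
    """Return indexes of Allow statements with wildcard actions or resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    findings = []
    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        wildcard_action = any(a == "*" or a.endswith(":*") for a in actions)
        wildcard_resource = "*" in resources
        if wildcard_action or wildcard_resource:
            findings.append(index)
    return findings


# A narrowly scoped policy for comparison (bucket name is illustrative).
READ_ONLY_ONE_BUCKET = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::example-reports-bucket/*",
    }],
}
```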


Incident Response for Leaked Secrets

When a secret is confirmed or suspected to have been exposed, the response sequence matters as much as the response itself. Speed on the critical actions is more important than a complete investigation before acting.

Critical sequence: Rotate immediately — then investigate. Do not audit access logs, assess scope, or convene a meeting before rotating. The key may be actively exploited during any delay. Rotation is the only action that stops an ongoing compromise. Everything else happens afterward.

  1. Rotate immediately. Generate a new credential and invalidate the exposed one. For AWS access keys, this means deactivating the key in IAM. For API keys, use the provider's revocation mechanism. Do this before any other step.
  2. Audit access logs. Check cloud provider audit logs — AWS CloudTrail, GCP Audit Logs, Azure Activity Log — for any usage of the leaked credential during the exposure window. Look for API calls, resource accesses, and any IAM modifications that may have created persistent access (new users, roles, or access keys created using the compromised credential).
  3. Assume breach if exposed publicly for more than a few minutes. Given the four-minute exploitation window, any secret that was publicly visible should be treated as compromised. Auditing will confirm or contradict this, but the working assumption drives a more appropriate response urgency.
  4. Purge from git history. Deleting a file does not remove it from git history. Use git-filter-repo (the current recommended tool) or BFG Repo-Cleaner to rewrite history and remove the secret from all commits. Force-push all affected branches and tags. Notify collaborators that they must re-clone or fetch the rewritten history, because any local clone or fork retains the old history until it is updated.
  5. Notify affected services. If the leaked credential belongs to a third-party service, notify the provider. For credentials that affect customer data, assess notification obligations under applicable regulations (GDPR, state breach notification laws).
  6. Review for persistence. If audit logs show the credential was accessed, investigate whether the attacker created any persistent access mechanisms before the credential was rotated — additional IAM users, roles, Lambda functions, EC2 instances, OAuth applications, or webhooks that may remain active after the original credential is rotated.
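Steps 2 and 6 above can be partly scripted once audit logs are exported. The sketch below filters CloudTrail-style records for a specific access key and highlights IAM calls that commonly indicate persistence. It assumes records already loaded as dictionaries with the standard eventName and userIdentity.accessKeyId fields; the function names and the event list are illustrative and not exhaustive.

```python
# IAM API calls that can create lasting access for an attacker.
PERSISTENCE_EVENTS = {
    "CreateUser", "CreateAccessKey", "CreateLoginProfile", "CreateRole",
    "AttachUserPolicy", "AttachRolePolicy", "PutUserPolicy", "PutRolePolicy",
}


def events_for_key(records: list[dict], access_key_id: str) -> list[dict]:
    """Return every record in which the given access key made the call."""
    return [
        record for record in records
        if record.get("userIdentity", {}).get("accessKeyId") == access_key_id
    ]


def persistence_candidates(records: list[dict], access_key_id: str) -> list[dict]:
    """Return the key's calls that may have created persistent access."""
    return [
        record for record in events_for_key(records, access_key_id)
        if record.get("eventName") in PERSISTENCE_EVENTS
    ]
```

An empty result from events_for_key over the whole exposure window supports, but does not prove, that the key went unused; logging gaps and services that do not record to the trail still need to be ruled out.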

How Code Review Engagements Find Secrets

Penetration testing and secure code review engagements that include secrets detection go beyond running automated tools against the current codebase. The approach mirrors what a motivated attacker would do, and it surfaces exposure that automated scanning frequently misses.

A thorough secrets-focused assessment scans the full git history using TruffleHog with entropy analysis enabled, not just the current HEAD. It reviews CI/CD pipeline configurations — GitHub Actions workflows, Jenkinsfiles, .circleci/config.yml — for secrets injected as environment variables and for patterns that may cause secrets to appear in build logs. It examines Docker-related files (Dockerfile, docker-compose.yml, .dockerignore) for build arguments that embed secrets in image layers. It reviews configuration files across the codebase for connection strings, inline credentials, and default passwords that may have been overlooked. It checks NPM and Python package configurations to verify that published packages do not include files containing secrets.

Beyond the codebase, code review engagements examine cloud configuration: IAM policies for over-privileged service accounts, Secrets Manager usage (or lack of it), CloudTrail enablement, and whether rotation is configured for credentials that support it. The combination of automated scanning and manual review catches what neither approach catches alone — the obfuscated credential, the base64-encoded connection string, the environment variable echoed in a startup script.

If your organization is preparing for a security review, compliance audit, or simply wants to understand its secrets exposure before an attacker does, a source code review engagement provides a structured, comprehensive assessment of where credentials live across your codebase and infrastructure.

Find Your Secrets Before Attackers Do

Lorikeet Security's source code review and penetration testing engagements include systematic secrets detection — finding hardcoded credentials, leaked API keys, and misconfigured CI/CD pipelines before they become breach entry points. Book a consultation to discuss your exposure.


Lorikeet Security Team

Penetration Testing & Cybersecurity Consulting

Lorikeet Security helps modern engineering teams ship safer software. Our work spans web applications, APIs, cloud infrastructure, and AI-generated codebases — and everything we publish here comes from patterns we see in real client engagements.
