Open a terminal in any modern software project and count the dependencies. A typical Node.js application pulls in between 500 and 1,500 packages. A Python project with a handful of direct dependencies can resolve into hundreds of transitive ones. A Java application with 20 declared dependencies in its POM file might load 200 JARs at runtime. Every one of those packages is code you did not write and did not review, yet it runs with the same privileges as your own application.
That is the software supply chain problem. You are not just trusting the libraries you chose. You are trusting every library those libraries chose, and every library those libraries chose, all the way down. A single compromised package anywhere in that tree can exfiltrate secrets, install backdoors, or destroy data. And attackers know this.
The software supply chain problem
Software has always relied on shared components. What has changed is the scale, the speed, and the trust model. In the 1990s, a developer might include a handful of third-party libraries, each one carefully evaluated and downloaded from a vendor's website. Today, a single npm install or pip install command can pull hundreds of packages from a public registry in seconds, with no human review at any step.
Transitive dependencies and the trust tree
The most dangerous aspect of modern dependency management is transitivity. You explicitly choose to depend on Package A. Package A depends on Packages B, C, and D. Package B depends on Packages E and F. Package E depends on Packages G, H, and I. You evaluated Package A. You probably did not evaluate Packages B through I. But all of them are running in your application with full access to your environment variables, filesystem, and network.
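The fan-out described above is easy to make concrete. The sketch below walks the hypothetical A-through-I dependency graph from this paragraph and counts everything you end up implicitly trusting; the package names and edges are illustrative, not a real registry lookup.

```python
from collections import deque

# Hypothetical dependency graph from the text: package -> direct dependencies.
DEPS = {
    "A": ["B", "C", "D"],
    "B": ["E", "F"],
    "E": ["G", "H", "I"],
    "C": [], "D": [], "F": [], "G": [], "H": [], "I": [],
}

def transitive_closure(direct):
    """Breadth-first walk of the dependency tree: everything you implicitly trust."""
    seen, queue = set(), deque(direct)
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue
        seen.add(pkg)
        queue.extend(DEPS.get(pkg, []))
    return seen

if __name__ == "__main__":
    trusted = transitive_closure(["A"])
    # One declared dependency expands to nine trusted packages.
    print(f"Declared: 1 package; actually trusted: {len(trusted)}")
```

In a real ecosystem the same walk over a package-lock.json or pip dependency graph routinely turns a dozen declared dependencies into hundreds of trusted ones.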
The event-stream incident in 2018 demonstrated this perfectly. A widely-used npm package called event-stream had been handed off to a new maintainer who injected malicious code, not into event-stream itself, but into a new transitive dependency called flatmap-stream. The malicious code specifically targeted the Copay Bitcoin wallet application, stealing cryptocurrency private keys. Millions of downstream applications pulled in the compromised code through their dependency trees without knowing it existed.
Trust chains and package registries
Public package registries like npm, PyPI, Maven Central, and RubyGems operate on a fundamentally trust-based model. Anyone can create an account and publish a package. There is no mandatory code review, no security audit, and in most cases no identity verification beyond an email address. The registry's job is to host packages and resolve dependencies, not to verify that those packages are safe.
This means that when you run npm install express, you are trusting that the npm account that published the express package has not been compromised. You are trusting that every maintainer of every transitive dependency of express has not been compromised. You are trusting that the registry itself has not been compromised. And you are trusting that no one has inserted a package with a confusingly similar name into the registry to trick your build system into pulling the wrong thing.
Each of those trust assumptions has been violated in real-world attacks.
Notable supply chain attacks
Supply chain attacks are not theoretical. They have affected some of the largest and most security-conscious organizations in the world. Understanding how these attacks worked is essential for building defenses against them.
SolarWinds (2020)
The SolarWinds attack remains the most consequential software supply chain compromise to date. Attackers, attributed to Russian intelligence services, compromised the build system for SolarWinds' Orion IT monitoring platform. They inserted a backdoor into a routine software update that was then digitally signed by SolarWinds and distributed to approximately 18,000 customers, including the U.S. Treasury, the Department of Homeland Security, and major corporations.
The backdoor, known as SUNBURST, was sophisticated. It lay dormant for two weeks after installation before activating. It communicated with command-and-control servers using DNS queries disguised as legitimate Orion traffic. It checked for the presence of security tools before executing. And it was delivered through a legitimate, signed software update, meaning every security control designed to verify software authenticity treated it as trusted.
The lesson: Compromising the build pipeline means compromising every customer downstream. Digital signatures prove provenance, not safety. If the build system is compromised, the signature is legitimate but the code is not.
Codecov (2021)
Codecov, a code coverage tool used in CI/CD pipelines, had its Bash Uploader script modified by attackers. The modified script exfiltrated environment variables, including CI/CD secrets, API tokens, and credentials, from every build pipeline that used it. Because Codecov was integrated into CI/CD workflows with access to source code and deployment credentials, the blast radius was enormous. Hundreds of companies were affected, including Twitch, HashiCorp, and other technology firms.
The lesson: CI/CD pipelines are high-value targets because they have access to production secrets. Third-party tools integrated into build pipelines need the same security scrutiny as production dependencies.
ua-parser-js (2021)
The ua-parser-js npm package, downloaded over 7 million times per week, was compromised when an attacker gained access to the maintainer's npm account. Three malicious versions were published that installed a cryptocurrency miner and a password stealer on Linux and Windows systems. The attack lasted only a few hours before being detected, but the package's download volume meant millions of builds were potentially affected.
The lesson: A single compromised maintainer account can affect millions of downstream users. The most popular packages are the highest-value targets.
event-stream (2018)
A social engineering attack in slow motion. The original maintainer of the event-stream npm package, burned out and no longer interested in maintaining it, handed off ownership to a new contributor who had been helpfully submitting pull requests. The new maintainer added a dependency called flatmap-stream containing obfuscated malicious code that targeted the Copay Bitcoin wallet. The attack was discovered only when another developer noticed the obfuscated code and investigated.
The lesson: Maintainer succession is an attack vector. Social engineering can target not just users but open source maintainers, who are often overworked volunteers with no security training.
Log4Shell (2021)
CVE-2021-44228, known as Log4Shell, was not a supply chain attack in the traditional sense. It was a critical remote code execution vulnerability in Apache Log4j, a Java logging library used in virtually every Java application. The vulnerability allowed attackers to execute arbitrary code by simply causing the application to log a specially crafted string. Because Log4j was so ubiquitous, the vulnerability affected hundreds of millions of devices and applications worldwide.
The lesson: A vulnerability in a single deeply-embedded dependency can create a global security crisis. Most organizations could not even determine whether they were affected because they did not have an inventory of their transitive dependencies.
Attack vectors: how supply chains get compromised
Understanding the attack surface requires knowing the specific techniques attackers use to inject malicious code into supply chains. These are not theoretical; each one has been used in real attacks.
Typosquatting
Attackers publish packages with names that are slight misspellings of popular packages: coffe-script instead of coffee-script, crossenv instead of cross-env, lodahs instead of lodash. A developer who makes a typo in their package.json or requirements.txt installs the malicious package instead of the legitimate one. The malicious package typically includes the legitimate package's functionality (by depending on it) plus a payload that exfiltrates environment variables or installs a backdoor.
Typosquatting is alarmingly effective. Research has repeatedly shown that typosquatted packages on npm and PyPI receive thousands of downloads before being detected and removed. Some attackers register dozens or hundreds of typosquat variants for popular packages, casting a wide net.
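The detection side of this problem reduces to string distance. The sketch below flags requested package names that sit within a small edit distance of a popular package; the POPULAR set is a tiny illustrative sample, and real tooling would compare against the full registry's most-downloaded names.

```python
# Illustrative sample of popular names; a real check uses registry download data.
POPULAR = {"lodash", "express", "cross-env", "coffee-script", "react"}

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def typosquat_suspects(name: str, max_distance: int = 2):
    """Popular packages this name is suspiciously close to (but not equal to)."""
    return sorted(p for p in POPULAR
                  if 0 < levenshtein(name, p) <= max_distance)
```

For example, typosquat_suspects("crossenv") returns ["cross-env"], while the legitimate name "lodash" itself returns an empty list.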
Dependency confusion
Dependency confusion, also known as namespace confusion, exploits the way package managers resolve packages when both public and private registries are configured. Many organizations use internal package names for their private libraries. If an attacker publishes a package with the same name on the public registry with a higher version number, the package manager may prefer the public version over the private one.
Security researcher Alex Birsan demonstrated this in 2021, successfully injecting code into the build systems of Apple, Microsoft, and dozens of other companies by publishing public npm, PyPI, and RubyGems packages that matched the names of their internal packages. The package managers resolved the public packages because they had higher version numbers.
Compromised maintainer accounts
Many high-impact npm, PyPI, and RubyGems packages are maintained by individual developers using personal accounts with weak or reused passwords and no two-factor authentication. Compromising a single account gives the attacker the ability to publish new versions of every package that account controls. The ua-parser-js, coa, and rc incidents all involved compromised maintainer accounts.
The scale of this risk is staggering. Research has found that compromising the accounts behind npm's top 20 most-depended-upon packages would, through transitive dependencies, give an attacker reach into over 50% of the entire npm ecosystem.
Malicious updates to legitimate packages
Even without compromising an account, attackers can gain publish access through social engineering, as in the event-stream case. They contribute helpful patches to build trust, then request and receive maintainer access. Alternatively, they may purchase or otherwise acquire abandoned packages that still have significant download numbers and inject malicious code into a new version.
This attack vector is particularly difficult to defend against because the malicious update comes from a legitimate source. The package name is correct, the publisher account is authorized, and the version number is higher than the previous release. Every automated system treats it as a routine update.
Registry-specific risks
Each package ecosystem has its own characteristics, security controls, and vulnerability patterns. Understanding the specific risks of the registries your organization depends on is important for building effective defenses.
| Registry | Ecosystem | Key Risks | Notable Controls |
|---|---|---|---|
| npm | JavaScript / Node.js | Massive dependency trees, install scripts execute arbitrary code, typosquatting rampant | 2FA for maintainers, npm audit, provenance attestations |
| PyPI | Python | No namespace protection, setup.py executes during install, limited maintainer verification | Trusted publishers, Sigstore-based attestations, mandatory 2FA for critical projects |
| Maven Central | Java / JVM | Deep transitive dependency chains, Log4Shell demonstrated ubiquity risk, GAV coordinate spoofing | Namespace verification, PGP signatures required, Sonatype oversight |
| RubyGems | Ruby | Gem install hooks, maintainer account compromises, dependency confusion | 2FA support, WebAuthn, gem signing |
| crates.io | Rust | Build scripts (build.rs) execute arbitrary code, proc macros run at compile time | Immutable publishes, GitHub-linked accounts, cargo-audit |
| Go Modules | Go | Module proxy caching, vanity import paths, init() functions execute on import | Checksum database (sum.golang.org), module proxy transparency |
npm is the most targeted registry by volume because of the sheer size of the ecosystem and the aggressive use of transitive dependencies. A typical npm project's node_modules directory contains far more code than the application itself. Install scripts (preinstall, postinstall) execute arbitrary shell commands during npm install, giving malicious packages immediate code execution without the developer ever importing or running the package.
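Because lifecycle scripts execute at install time, auditing them is a practical first defense. The sketch below walks a node_modules tree and reports every package declaring a preinstall, install, or postinstall hook; it reads the standard package.json "scripts" field, though the traversal details are a simplification of what real scanners do.

```python
import json
from pathlib import Path

# npm lifecycle hooks that run shell commands during `npm install`.
LIFECYCLE = ("preinstall", "install", "postinstall")

def find_install_scripts(node_modules: Path):
    """Yield (package name, hook, command) for every lifecycle install script."""
    for manifest in node_modules.rglob("package.json"):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, OSError):
            continue  # skip malformed or unreadable manifests
        scripts = data.get("scripts") or {}
        for hook in LIFECYCLE:
            if hook in scripts:
                yield data.get("name", str(manifest.parent)), hook, scripts[hook]
```

Run against a real project, the output is often surprising: a handful of packages legitimately need install scripts (native addons), and everything else on the list deserves a second look.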
PyPI has historically had the weakest publisher verification of the major registries. Until recently, anyone could publish any package name without namespace protection. The setup.py execution model means that simply installing a package can execute arbitrary Python code. The Trusted Publishers initiative and Sigstore attestations are improving the situation, but adoption remains partial.
Maven Central has stronger publisher verification through namespace ownership, but the depth of transitive dependency chains in the Java ecosystem means that a vulnerability in a deeply-nested library can be extraordinarily difficult to identify and remediate, as Log4Shell demonstrated.
SBOM: knowing what you are running
A Software Bill of Materials (SBOM) is a formal, machine-readable inventory of every component in your software: every direct dependency, every transitive dependency, every version number, every license. It is the foundational document for supply chain security because you cannot secure what you cannot see.
When Log4Shell was disclosed, organizations with SBOMs could search their inventory and determine within hours whether they were affected. Organizations without SBOMs spent days or weeks manually auditing applications, Docker images, and embedded systems to find instances of Log4j. Some never found them all.
SBOM formats
Two formats dominate the SBOM landscape: SPDX (Software Package Data Exchange), maintained by the Linux Foundation, and CycloneDX, maintained by OWASP. Both are capable of representing dependency trees with component names, versions, licenses, and vulnerability information. CycloneDX has gained significant traction in the security community due to its purpose-built focus on security use cases, including vulnerability tracking and exploit reachability analysis.
Generating and maintaining SBOMs
SBOMs should be generated automatically as part of your CI/CD pipeline, not manually maintained. Tools like Syft, Trivy, and CycloneDX generators can produce SBOMs from source code, container images, and compiled binaries. The key is to generate SBOMs at build time and store them alongside the artifacts they describe, so you always know exactly what went into a specific release.
An SBOM is not a one-time artifact. It must be regenerated with every build and continuously monitored against vulnerability databases. A dependency that was safe when you built your application may have a critical vulnerability disclosed tomorrow. Without a current SBOM and continuous monitoring, you will not know until an attacker tells you.
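The "are we affected?" query that SBOMs enable is, at its core, a lookup over the component inventory. The sketch below searches a minimal CycloneDX-style document for a vulnerable name/version pair; the SBOM shown is a hand-written subset of the real format, and a production check would compare version ranges rather than an explicit set.

```python
def affected_components(sbom: dict, name: str, bad_versions: set):
    """Components in the SBOM matching a vulnerable name/version pair."""
    return [(c["name"], c["version"])
            for c in sbom.get("components", [])
            if c.get("name") == name and c.get("version") in bad_versions]

# Minimal illustrative subset of a CycloneDX SBOM.
SBOM = {
    "bomFormat": "CycloneDX",
    "components": [
        {"name": "log4j-core", "version": "2.14.1"},
        {"name": "jackson-databind", "version": "2.13.0"},
    ],
}

if __name__ == "__main__":
    # Log4Shell (CVE-2021-44228) affected releases below 2.15.0; a real
    # check would evaluate the version range, not enumerate versions.
    print(affected_components(SBOM, "log4j-core", {"2.14.0", "2.14.1"}))
```

With SBOMs stored per release, this query runs across an entire portfolio in seconds, which is exactly the difference between the hours-versus-weeks outcomes described above.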
U.S. Executive Order 14028 (May 2021) requires SBOM delivery for all software sold to the federal government. The EU Cyber Resilience Act imposes similar requirements for products sold in Europe. SBOM generation is no longer optional for organizations selling to regulated industries or government customers.
Dependency scanning tools and their limitations
Dependency scanning tools, such as Snyk, Dependabot, Renovate, Grype, and OWASP Dependency-Check, monitor your dependencies for known vulnerabilities by comparing package versions against databases like the National Vulnerability Database (NVD) and the GitHub Advisory Database. They are essential, but they have significant limitations that teams must understand.
What they catch
Dependency scanners excel at detecting known vulnerabilities in known packages. When a CVE is published for a package version you are using, the scanner flags it. This is valuable. Many organizations run dependencies with years-old critical vulnerabilities simply because no one is tracking them.
What they miss
- Zero-day vulnerabilities. Scanners check against known vulnerability databases. A new vulnerability that has not yet been assigned a CVE will not be detected.
- Intentionally malicious packages. A typosquatted package or a package with an intentional backdoor is not a vulnerability in the CVE sense. It is malware. Most dependency scanners do not detect it.
- Compromised legitimate packages. When a trusted package publishes a malicious update, scanners have no basis for flagging it. The package name and publisher are legitimate. The version number is valid. The malicious code is intentional, not a bug.
- Reachability analysis. A scanner may flag a critical vulnerability in a transitive dependency, but that vulnerability might be in a code path your application never calls. Without reachability analysis, teams waste time remediating vulnerabilities that cannot actually be exploited in their specific context.
- License compliance risks. Some scanners include license detection, but many do not. A dependency with a copyleft license buried in your transitive tree can create legal obligations that are just as impactful as a security vulnerability.
Dependency scanning is a necessary layer of defense, not a sufficient one. It catches the low-hanging fruit and must be supplemented with the practices described in the following sections.
Lockfile integrity and reproducible builds
A lockfile (package-lock.json, yarn.lock, Pipfile.lock, Gemfile.lock, go.sum) pins every dependency in your tree to a specific version and, in most cases, a specific content hash. Lockfiles are the single most important supply chain security control that most teams already have but do not properly enforce.
Why lockfiles matter
Without a lockfile, your build system resolves dependencies at install time, potentially pulling newer versions that include malicious code. The event-stream attack was effective precisely because applications that ran npm install without a lockfile (or that did not pin to a specific version) automatically pulled the compromised version.
Content hashes are the critical feature. A lockfile that records only version numbers protects against version changes but not against a compromised registry serving different content for the same version number. Content hashes (integrity fields in package-lock.json, hashes in Pipfile.lock) ensure that the exact bytes you reviewed are the exact bytes that get installed. If the content changes, the hash will not match, and the install will fail.
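The integrity check itself is straightforward to reproduce. npm's lockfile records Subresource Integrity strings of the form "sha512-<base64 digest>"; the sketch below computes and verifies that format for arbitrary bytes, mirroring the check the package manager performs before unpacking a tarball.

```python
import base64
import hashlib

def make_integrity(content: bytes, algo: str = "sha512") -> str:
    """Produce an SRI-style integrity string ('algo-base64digest') for content."""
    digest = hashlib.new(algo, content).digest()
    return f"{algo}-{base64.b64encode(digest).decode()}"

def verify_integrity(content: bytes, integrity: str) -> bool:
    """True only if content hashes to exactly the pinned integrity string."""
    algo, _, expected = integrity.partition("-")
    digest = hashlib.new(algo, content).digest()
    return base64.b64encode(digest).decode() == expected
```

If a registry serves different bytes for the same version number, verify_integrity returns False and the install fails, which is precisely the property that version-only pinning lacks.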
Enforcing lockfile integrity
- Always commit lockfiles to version control. This is non-negotiable. Lockfiles must be tracked in Git alongside the code that depends on them.
- Use deterministic install commands in CI/CD. Use npm ci instead of npm install, pip install --require-hashes instead of bare pip install, and equivalent commands in other ecosystems. These commands fail if the lockfile does not match, rather than silently updating it.
- Review lockfile changes in pull requests. When a lockfile changes, that change should receive the same code review scrutiny as any other code change. A lockfile diff that adds unexpected packages or changes content hashes warrants investigation.
- Detect lockfile manipulation. Attackers who gain access to a repository may modify the lockfile to point to malicious package versions. CI/CD pipelines should verify that lockfile changes correspond to intentional dependency updates, not unauthorized modifications.
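A reviewer-friendly way to apply these points is to summarize what a lockfile diff actually changed. The sketch below compares two simplified dependency maps (name to version and integrity hash, not the full package-lock.json schema) and specifically flags the most suspicious pattern: an unchanged version number with a changed content hash.

```python
def lockfile_delta(old: dict, new: dict) -> dict:
    """Summarize additions, removals, and changes between two lockfile snapshots."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(name for name in set(old) & set(new)
                     if old[name] != new[name])
    # Same version, different bytes: the registry (or an attacker) is
    # serving different content under an existing version number.
    suspicious = [name for name in changed
                  if old[name].get("version") == new[name].get("version")
                  and old[name].get("integrity") != new[name].get("integrity")]
    return {"added": added, "removed": removed,
            "changed": changed, "suspicious": suspicious}
```

Posting this summary as a CI comment on every pull request that touches the lockfile turns an unreadable thousand-line diff into a reviewable list.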
Reproducible builds
A reproducible build is one that produces byte-for-byte identical output when given the same source code and build environment. Reproducibility proves that the binary you are running was built from the source code you reviewed, with no modifications injected during the build process. This directly addresses the SolarWinds-style attack where the build pipeline itself was compromised.
Achieving full reproducibility is challenging, as build tools often embed timestamps, file ordering, or environment-specific paths into output. But even partial reproducibility, where builds produce consistent dependency resolution and deterministic compilation, significantly raises the bar for build pipeline attacks.
Vendoring vs. pinning vs. floating dependencies
Teams have three fundamental strategies for managing how dependencies are resolved, each with different trade-offs between security, maintainability, and operational overhead.
Floating dependencies
Floating dependencies use version ranges (^1.2.3, ~1.2.3, >=1.0.0) that allow the package manager to pull newer versions within the specified range. This is the default behavior for most ecosystems and the least secure option. A compromised patch release will be automatically pulled into your next build. The only defense is the lockfile, and if the lockfile is regenerated (as happens during routine dependency updates), the malicious version will be locked in.
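The mechanics of why a caret range floats are worth seeing explicitly. The toy matcher below implements only the simplest case of ^ semantics (same major version, at or above the base, for majors greater than zero); real resolvers such as node-semver handle far more syntax, including the special rules for 0.x versions.

```python
def parse(version: str) -> tuple:
    """Split 'major.minor.patch' into a comparable integer tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

def satisfies_caret(version: str, spec: str) -> bool:
    """Does version satisfy ^spec? (Same major, >= spec; majors > 0 only.)"""
    base, candidate = parse(spec), parse(version)
    return candidate[0] == base[0] and candidate >= base

# Why ^1.2.3 is risky: any later 1.x release is pulled in automatically.
assert satisfies_caret("1.2.4", "1.2.3")       # routine patch: pulled in
assert satisfies_caret("1.9.0", "1.2.3")       # later minor: pulled in
assert not satisfies_caret("2.0.0", "1.2.3")   # major bump: excluded
```

A compromised 1.2.4 or 1.9.0 release satisfies the range just as well as a legitimate one, which is the entire floating-dependency risk in one line.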
Pinning dependencies
Pinning specifies exact version numbers (1.2.3 with no range operators) for every direct dependency. Combined with a lockfile, this prevents your build from pulling unexpected versions. However, pinning creates maintenance overhead: you must explicitly update every dependency version, which means security patches are not applied automatically. Teams that pin without a disciplined update process often end up running outdated dependencies with known vulnerabilities.
Vendoring dependencies
Vendoring copies the full source code of every dependency into your repository. Your build uses the vendored copy, never fetching from the public registry. This provides the strongest supply chain security because your build is entirely self-contained: a compromised registry, a deleted package, or a malicious update cannot affect you. The Go ecosystem has native vendoring support (go mod vendor), and tools exist for other ecosystems.
The trade-off is repository size and update complexity. A vendored node_modules directory can be hundreds of megabytes. Updating a vendored dependency requires re-vendoring and reviewing the changes. But for high-security applications, the isolation vendoring provides is worth the overhead.
Our recommendation: For most teams, the right approach is pinned direct dependencies, a committed and enforced lockfile with content hashes, and automated dependency update tools (Dependabot, Renovate) that create pull requests for review. Vendoring is appropriate for critical infrastructure, air-gapped environments, or applications where build reproducibility is a regulatory requirement.
Private registry security
Organizations that publish internal packages must secure their private registries with the same rigor they apply to production infrastructure. A compromised private registry is a direct path to compromising every application that depends on it.
Preventing dependency confusion
The dependency confusion attack works because package managers check public registries for packages that should only exist in private registries. Defenses include:
- Scoped packages. Use organization scopes (@yourcompany/package-name on npm) for all internal packages. Register the scope on the public registry even if you do not publish public packages under it, to prevent attackers from claiming it.
- Registry configuration. Configure package managers to use your private registry exclusively for your organization's namespace. On npm, use .npmrc to route scoped packages to your private registry and everything else to the public registry.
- Reserved names. Publish placeholder packages on the public registry for every internal package name so that attackers cannot claim those names. Use a low version number for the placeholder and ensure your registry configuration never resolves internal names publicly, so your own builds cannot accidentally pull it.
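The failure mode these defenses close can be reduced to a few lines. The sketch below contrasts a naive resolver that prefers the highest version across all registries (the vulnerable behavior Birsan exploited) with a scope-aware resolver that routes internal names to the private registry only; the package names, versions, and @acme/ scope are all invented for illustration.

```python
def naive_resolve(name: str, private_reg: dict, public_reg: dict):
    """Vulnerable: pick the highest version found in either registry."""
    candidates = []
    if name in private_reg:
        candidates.append(("private", private_reg[name]))
    if name in public_reg:
        candidates.append(("public", public_reg[name]))
    # "Highest version wins" hands the decision to whoever publishes
    # the biggest number -- including an attacker on the public registry.
    return max(candidates, key=lambda c: tuple(int(x) for x in c[1].split(".")))

def scoped_resolve(name: str, private_reg: dict, public_reg: dict,
                   internal_prefix: str = "@acme/"):
    """Safe: internal names are only ever resolved against the private registry."""
    if name.startswith(internal_prefix):
        return ("private", private_reg[name])
    return ("public", public_reg[name])
```

With an attacker-published "billing" at version 99.0.0 on the public registry, naive_resolve picks the attacker's package, while scoped_resolve never consults the public registry for @acme/billing at all.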
Access control and audit logging
Private registries should enforce strict access control: who can publish packages, who can read packages, and who can administer the registry itself. Publish access should require multi-factor authentication and be limited to CI/CD service accounts or designated maintainers. Every publish event, authentication attempt, and configuration change should be logged and monitored.
Open source maintainer risk assessment
Before adding a dependency, evaluate not just the code but the people and processes behind it. A technically excellent library maintained by a single, anonymous individual with no security practices is a higher risk than a less elegant library maintained by a well-resourced team with established security processes.
Factors to assess
- Maintainer count and activity. A single maintainer is a single point of failure and a social engineering target. Look for projects with multiple active maintainers and recent commit history.
- Security practices. Does the project have a security policy (SECURITY.md)? Do maintainers use 2FA? Are releases signed? Is there a vulnerability disclosure process?
- Funding and sustainability. Projects that are critical infrastructure but maintained by unpaid volunteers are at risk of maintainer burnout and social engineering attacks. Check whether the project has financial backing through GitHub Sponsors, Open Collective, or corporate sponsorship.
- Dependency depth. A package that pulls in 50 transitive dependencies increases your exposure. Prefer packages with minimal dependency trees.
- OpenSSF Scorecard. The Open Source Security Foundation Scorecard project automatically evaluates open source projects for security practices like branch protection, code review, CI testing, signed releases, and vulnerability reporting. A low Scorecard score is a red flag.
This assessment does not need to be exhaustive for every dependency. Focus your evaluation effort on direct dependencies, dependencies that handle sensitive data (cryptography, authentication, data serialization), and dependencies that are deeply embedded in your application's critical paths.
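The factors above can be folded into a rough screening score to decide which dependencies deserve a deeper look. The weights and thresholds in this sketch are entirely illustrative, not an established standard; treat the output as a triage signal, never as a verdict.

```python
def risk_score(maintainers: int, has_security_policy: bool,
               scorecard: float, transitive_deps: int) -> float:
    """Toy dependency risk score: 0 (low risk) to 10 (high risk).

    Weights are illustrative. `scorecard` is an OpenSSF Scorecard result
    on its 0-10 scale.
    """
    score = 0.0
    if maintainers <= 1:
        score += 3.0                               # single point of failure
    if not has_security_policy:
        score += 2.0                               # no disclosure process
    score += max(0.0, 6.0 - scorecard) * 0.5       # penalty below Scorecard 6
    score += min(3.0, transitive_deps / 25)        # dependency-tree bloat
    return round(min(score, 10.0), 1)
```

A single anonymous maintainer with no security policy, a Scorecard of 3.2, and 80 transitive dependencies scores far higher than a six-maintainer project with a policy, a Scorecard of 8.5, and four dependencies, which matches the intuition in the paragraph above.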
Sigstore, SLSA, and software provenance
The next generation of supply chain security is built around the concept of provenance: cryptographic proof of where software came from, how it was built, and who was involved at each step. Two initiatives are driving this forward.
Sigstore
Sigstore is an open source project that provides free code signing for open source software. It solves a long-standing problem: traditional code signing requires developers to manage private keys, which are frequently compromised or lost. Sigstore uses short-lived certificates tied to developer identity (through OIDC providers like GitHub or Google), so there are no long-lived keys to protect.
Sigstore's components include Cosign for signing container images and blobs, Fulcio as a certificate authority that issues short-lived certificates, and Rekor as a transparency log that records all signing events. Together, they create an auditable chain of custody for software artifacts.
npm and PyPI have both integrated Sigstore-based provenance attestations, allowing package publishers to cryptographically prove that a specific package version was built from a specific source commit using a specific CI/CD workflow. This makes it significantly harder for an attacker who compromises a maintainer's account to publish a malicious version without detection: the provenance attestation would not match.
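Stripped of the cryptography, the verification step is a policy comparison: does the attestation's claimed origin match what you expect for this package? Real npm and PyPI attestations are signed in-toto statements verified against Sigstore's transparency log; the field names and values below are invented to illustrate only the final matching step.

```python
def provenance_matches(attestation: dict, expected: dict) -> bool:
    """True only if every expected field (repo, workflow, ...) matches exactly."""
    return all(attestation.get(key) == value for key, value in expected.items())

# Hypothetical attestation for a legitimate release, built by the
# project's own CI workflow:
good = {"sourceRepo": "github.com/example/pkg",
        "workflow": ".github/workflows/release.yml",
        "commit": "abc123"}

# Consumer policy: only accept builds from this repo and workflow.
policy = {"sourceRepo": "github.com/example/pkg",
          "workflow": ".github/workflows/release.yml"}
```

An attacker publishing from a compromised maintainer laptop has no CI-generated attestation matching the policy to present, which is exactly the detection opportunity the paragraph above describes.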
SLSA (Supply chain Levels for Software Artifacts)
SLSA (pronounced "salsa") is a security framework that defines four levels of supply chain integrity, from basic (Level 1) to comprehensive (Level 4). Each level adds requirements for build process documentation, hermetic builds, and provenance verification.
| SLSA Level | Requirements | What It Prevents |
|---|---|---|
| Level 1 | Build process is documented and produces provenance metadata | Unknown build processes, missing audit trail |
| Level 2 | Build service generates and signs provenance automatically, source is version-controlled | Tampered provenance, unversioned source modifications |
| Level 3 | Build platform is hardened, provenance is non-forgeable, builds are isolated | Compromised build environments, cross-build contamination |
| Level 4 | Hermetic builds, all dependencies declared, two-person review of source | SolarWinds-style build pipeline compromises, insider threats |
SLSA Level 3 is the practical target for most organizations. It requires that builds run on a hardened, hosted build platform (like GitHub Actions or Google Cloud Build), that provenance is generated automatically by the platform (not by the developer), and that the build process is isolated so that one build cannot influence another. This directly addresses the class of attacks where the build pipeline itself is the target.
SLSA Level 4 adds hermetic builds (no network access during build) and two-person review requirements, which are appropriate for critical infrastructure but operationally demanding for most teams.
Supply chain security checklist for engineering teams
This is a practical, prioritized checklist for engineering teams who want to improve their supply chain security posture. Items are ordered roughly by impact relative to implementation effort.
Foundation (implement immediately):
- Commit lockfiles to version control and enforce them in CI/CD with deterministic install commands (npm ci, pip install --require-hashes).
- Enable automated dependency scanning (Dependabot, Snyk, or Renovate) and triage alerts within defined SLAs.
- Require multi-factor authentication on all package registry accounts, especially those with publish access.
- Pin direct dependencies to exact versions. Use lockfile content hashes for integrity verification.
- Review lockfile changes in every pull request with the same scrutiny as code changes.
Intermediate (implement within a quarter):
- Generate SBOMs automatically in your CI/CD pipeline and store them with release artifacts.
- Implement dependency confusion protections: scoped packages, registry configuration, reserved public names.
- Evaluate direct dependencies using OpenSSF Scorecard and maintainer health metrics before adoption.
- Audit install scripts (preinstall, postinstall) in npm packages. Consider using --ignore-scripts by default and explicitly allowing scripts for trusted packages.
- Establish a process for responding to supply chain incidents: who gets notified, how affected builds are identified, and what the rollback procedure is.
Advanced (implement as your program matures):
- Verify Sigstore provenance attestations for critical dependencies.
- Work toward SLSA Level 3 for your own build pipelines: hardened build service, automatic provenance generation, isolated builds.
- Implement a private registry or caching proxy (Artifactory, Nexus, Verdaccio) to control what packages enter your build environment.
- Vendor critical dependencies for air-gapped or high-security environments.
- Conduct periodic manual security reviews of your most critical and least-maintained dependencies.
The organizational dimension
Supply chain security is not purely a technical problem. It is an organizational one. Someone needs to own the dependency inventory. Someone needs to triage vulnerability alerts. Someone needs to make the call on whether a new dependency is worth the risk it introduces.
In most organizations, nobody owns this. Dependency management falls into the gap between development teams (who add dependencies to ship features), security teams (who worry about vulnerabilities but do not control the dependency tree), and platform teams (who manage build infrastructure but do not evaluate package security). The result is that nobody is accountable for supply chain risk until an incident forces the question.
Effective supply chain security requires clear ownership: a team or role responsible for dependency governance, tooling, and incident response. This does not need to be a large team. It needs to be a defined responsibility with authority to set policy (for example, requiring OpenSSF Scorecard evaluation for new dependencies) and the tooling to enforce it.
The investment is small compared to the alternative. A single supply chain compromise can require emergency remediation across every application in your portfolio, customer notifications, regulatory reporting, and months of forensic investigation. Building the controls described in this article is a fraction of that cost.
Secure Your Software Supply Chain
We help engineering teams identify supply chain risks through dependency audits, secure code reviews, and CI/CD pipeline assessments. Find the vulnerabilities before attackers find them for you.