Dependency Confusion: How I Hacked Into Apple, Microsoft and Other Companies (medium.com)

1107 points by Robadob 5 years ago | 402 comments

ryukafalz 5 years ago |

I see a lot of people saying things like "this is why package signing is important" and "we need to know who the developers are" and "we need to audit everything." Some of that is true to some degree, but let me ask you this: why do we consider it acceptable that code you install through a package manager implicitly gets to do anything to your system that you can do? That seems silly! Surely we can do better than that?

This article from Agoric is extremely relevant here, from a previous such incident (re: the event-stream package on npm): https://medium.com/agoric/pola-would-have-prevented-the-even...

Put simply: in many cases, the dependencies you install don't need nearly as much authority as we give them right now. Maybe some of these packages need network access (I see a few named "logger" which might be shipping logs remotely) but do they need unrestricted filesystem access? Probably not! (They don't necessarily even need unrestricted network access either; what they're communicating with is likely pretty well-known.)

eecc 5 years ago | |

Uh, well the original developers of the Sun JVM didn’t do such a bad job after all when designing it: https://docs.oracle.com/javase/7/docs/technotes/guides/secur...

WatchDog 5 years ago | | |

Java's security manager system is usually not in effect for the majority of use cases. While Maven/Gradle dependencies can't run code on installation, once the application is run or tested it will generally be with full user permissions, not under a security manager.

The security manager is an additional layer of security that most languages don't have, however Java applets have shown it to be full of holes and generally unsuitable for running untrusted code.

The applet security posture has contributed a great deal to negative opinion of the language; it probably would have been better off had applets never existed.

szc 5 years ago | | |

The Sun JVM, as originally implemented, can express operations that are not valid for Java objects. There are parts of the JVM that attempt to constrain opcode sequences to only be from "valid java compilers operating on java objects".

In 1996, Java was being overwhelmed by exploits because the mapping of the language to the VM was not well matched. There was a Java summit with lots of interesting people. This summit was also when Sun got confirmation that Microsoft had quite a few engineers working on an independently implemented runtime. To Sun's credit, they did get rather more serious about Java security -- but they had already created a rocky foundation.

It is my opinion that the business model Sun had "in mind" for Java was a free runtime for everyone that they were in control of, but to make money from selling an "official" Java compiler suite.

I do not believe that the Sun Java JVM was created with security in mind.

easton 5 years ago | |

I believe that Deno (the "successor" to Node being written by Ryan Dahl) is supposed to fix this for server-side JavaScript/TypeScript. It doesn't grant any permissions to anything unless you specifically give them out (so you can say that only a specific module gets access to the filesystem, for instance, and on top of that it can only access /srv and not /etc).

https://deno.land/manual@v1.7.2/getting_started/permissions

ryukafalz 5 years ago | | |

This looks like it's... getting there, but still too coarse-grained. It looks like those permissions are granted to the whole Deno process? So if your program needed both access to sensitive data on the filesystem and network access, and it used a malicious dependency, that dependency could take advantage of those permissions and exfiltrate that data.

I could be wrong, but I don't see any mention of permissions on imported code: https://deno.land/manual@v1.7.2/examples/import_export

nogbit 5 years ago | | |

I think this is critical. The actual runtime of any code needs to do way more than what it’s doing now.

Simply relying on package signing and the like permits trusted but malicious actors. With Deno packages configured well it can really lock down and limit a ton of attack vectors.

bluetech 5 years ago | |

This is so obviously what needs to happen, it's really surprising it's not a feature in all major languages by now. I bet in 10 years' time, giving dependencies complete control will seem crazy.

Here is an interesting proposal on how to possibly get there in JS with import maps: https://guybedford.com/secure-modular-runtimes

Deno uses ambient permissions for the entire process and unfortunately missed the opportunity to do it right.

detaro 5 years ago | |

Indeed, being able to apply capabilities on a package level would be great, but I don't know many languages/environments that implement this as a first-class feature.

newhouseb 5 years ago | | |

The WASM ecosystem is exploring this through the use of what they call "nanoprocesses" wherein libraries are wrapped into modules and provided access to nothing by default [1]. This seems to be more of a pattern and consequence of how WASM works than a specific feature.

1. https://hacks.mozilla.org/2019/11/announcing-the-bytecode-al... (ignore the title, it's irrelevant to the excellent explanation that constitutes 70% of the post)

ryukafalz 5 years ago | | |

Yeah. JavaScript is probably the closest to being there (with things like SES[0], LavaMoat[1], etc.) but we're not quite there yet. It's just shocking that this sort of thing is as seemingly obscure as it is; it's like the whole industry has collectively thrown up their hands and said code execution is unavoidably radioactively dangerous. (While simultaneously using package managers that... well.) But it doesn't have to be!

[0] https://github.com/Agoric/ses-shim

[1] https://github.com/LavaMoat/LavaMoat

eecc 5 years ago | | |

Java does. Of course it’s never been used systematically and it has received precious little attention to DevOps ergonomics, but the infrastructure is there

dwohnitmok 5 years ago | | |

Safe Haskell is one in this vein (it's lower level and you would apply a capability layer on top), although like other past efforts on this front it's mostly languished in obscurity even among the Haskell community and is used by very few people.

The main hope at the moment seems to be JS.

bryanrasmussen 5 years ago | |

>why do we consider it acceptable that code you install through a package manager implicitly gets to do anything to your system that you can do?

I thought it was because operating systems still use access-based instead of capability-based security?

ptx 5 years ago | |

Could you solve this in Java using the SecurityManager stuff that was used to sandbox applets, or is all that considered broken these days? (I'm not sure if you can have different SecurityManagers for different parts of the app though.)

jjav 5 years ago | | |

Yes, with Java you can.

That's how the web/application server containers worked (probably still do, but I've been disconnected). The server classes have different permissions from the application code classes (loaded from the .war/etc files). If an application code method calls into a system class, the permissions which apply are those of the application since that method is in the calling stack frame.

I wrote this support into several Java web container and J2EE application server products back in the day. AFAIK, all that still works great today in Java.
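To make the stack-based rule described above concrete, here is a minimal sketch in Python (not Java). It assumes the simplified model that a permission applies only if every code source on the call stack has been granted it, and it ignores privileged blocks and everything else the real Java access controller does. The names `effective_permissions`, `server`, `webapp`, and the permission strings are hypothetical.

```python
def effective_permissions(call_stack, granted):
    """Intersect the permissions of every code source on the call stack.

    call_stack: list of code-source names, innermost frame last.
    granted: mapping from code-source name to a frozenset of permissions.
    A code source with no entry is treated as having no permissions.
    """
    result = None
    for code_source in call_stack:
        perms = granted.get(code_source, frozenset())
        result = perms if result is None else result & perms
    return result if result is not None else frozenset()


# Hypothetical grants: the server is trusted more than the deployed webapp.
granted = {
    "server": frozenset({"file.read", "file.write", "net.connect"}),
    "webapp": frozenset({"net.connect"}),
}

# Application code calling into a server class: the webapp frame limits the result.
print(sorted(effective_permissions(["webapp", "server"], granted)))

# Server code running on its own keeps its full set.
print(sorted(effective_permissions(["server"], granted)))
```

Under these assumptions the first call yields only `net.connect`, because the less-trusted frame is on the stack even though the code currently executing belongs to the server.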

ryukafalz 5 years ago | | |

I'm not familiar enough with Java to have a strong opinion on this, but this HN comment from the linked article mentions that you can only have one SecurityManager per app, so sounds like that's still too coarse-grained: https://news.ycombinator.com/item?id=18599365

thw0rted 5 years ago | |

I'm reading through all these responses and it sounds like nobody read the article. Everybody keeps bringing up JVM SecurityManager, or how granular Deno's permission system is, or a syntax for granting runtime permissions to modules (like your Agoric link). That's not what happened here. The actual attack in the article was a post-install script run by the package manager. That means whatever kind of limits you might place on runtime capabilities of the library wouldn't have mattered. You need a system that lets the package installer request granular permissions from the package manager, where the package manager runs the scripts in a sandbox and only explicitly-provided privileges are granted. I don't know of any package managers that support this feature today.

safog 5 years ago | |

This is a really nice idea but considering we haven't even solved the relatively simple case of users giving permissions to apps and expecting them to behave responsibly, I'm not optimistic that we can solve the much more challenging case of importing library code.

e.g., If someone gives an app the ability to upload photos, it can silently read all photo metadata, upload all photos to a private server instead of uploading just the single photo that the user picked. This can be solved with OS level standard photo pickers but it hasn't been yet.

Same with package code. Maybe a package needs network access for stuff it genuinely needs to do. However it can (and probably will) at some point go above and beyond in the amount of data it collects. FB Mobile SDK outage is a good example of this. https://www.bugsnag.com/blog/sdks-should-not-crash-apps

nahuel0x 5 years ago | |

Giving fine-grained permissions to sandboxed libraries is the way forward, and probably a really good use case for WASM.

edejong 5 years ago | |

It sounds like you are interested in a (distributed) capability-based security model. [1]

[1] https://enacademic.com/dic.nsf/enwiki/295618

ryukafalz 5 years ago | | |

Yes, I didn't use the term explicitly in my comment but you're precisely right. ;)

Agoric (the company whose blog post I linked to) and the people behind it have done a ton of object capability work over the years.

jjn2009 5 years ago | |

It's unfortunate that the proposed Realms feature is still just a proposal. Even so, I've heard many arguments that since the method of isolation lives inside JS it cannot be expected to be entirely secure and you would be much better off relying on OS level security primitives, a point that the comments I've read so far completely gloss over. I'd love for someone to prove me wrong and show that this is airtight so we can champion Realms at my work.

POLA is good to live by regardless of whether it can be implemented.

zomgwat 5 years ago | |

Not a complete answer (by any means) but keeping tight control over egress network access helps (I wish it was easier to limit egress access over port 443).

Systemd has some capability to restrict access to system resources. I haven't experimented with the capabilities yet so not sure what's all there.

linuxftw 5 years ago | |

I'm late to this conversation, but the thing that can restrict "doing anything" on your machine is SELinux.

Communitivity 5 years ago |

This doesn't surprise me. Horrify.. yes.

I've noticed more dev teams succumbing to the temptation of easiness that many modern package managers provide (NPM, Cargo, Ivy, etc.) - especially as someone who has to work with offline systems on a regular basis.

Because of that ease there are fewer tools and tutorials out there to support offline package management. There are more for using caches, though these are often along the lines of either 'the package manager will do this for you and it just works (but in case it doesn't, delete node_modules or cargo clean and re-try)', or stand up a dependency server on your own machine with these proxy settings (which has its own security issues and is frequently disallowed by IT cybersecurity policies).

As an example, many blog articles I found a while back suggest using yumdownloader from the yum-utils package. This is unfortunately not reliable, as there are some packages that get skipped.

I have found I need to script reading a list of dependencies from a file; then for each dependency: create a directory for it, use repotrack to download its RPM and its transitive dependency RPMs in the dependency's directory; then the script aggregates all the RPMs into one directory, removes the OS installed RPMs, uses createrepo to turn that directory into an RPM repository, and then makes an USF ISO image out of the directory for transfer onto the offline system and installation.

kasperni 5 years ago |

I'm surprised the reverse fully-qualified domain name (FQDN) model used by Java isn't more widely adopted. If you want to upload artifacts to the main repository (Maven Central) you first need to show ownership of a particular domain, for example via a DNS TXT record (example [1]). That would make these kinds of attacks a lot more difficult.

[1] https://issues.sonatype.org/browse/OSSRH-61509
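As a rough illustration of the reverse-FQDN idea, the sketch below checks whether a groupId falls under a domain whose ownership has already been verified. It is an assumption-laden toy, not Sonatype's actual procedure: the DNS TXT verification step is taken as already done, and `reverse_domain`, `group_id_allowed`, and the example domains are hypothetical.

```python
def reverse_domain(domain):
    """Turn 'example.com' into the tuple ('com', 'example')."""
    return tuple(reversed(domain.lower().split(".")))


def group_id_allowed(group_id, verified_domains):
    """Return True if group_id sits under the reversed form of a verified domain.

    Whole dot-separated components are compared, so 'com.examplex' is not
    treated as belonging to 'example.com'.
    """
    parts = tuple(group_id.lower().split("."))
    for domain in verified_domains:
        prefix = reverse_domain(domain)
        if parts[:len(prefix)] == prefix:
            return True
    return False


# Hypothetical publisher who has proven control of example.com.
verified = {"example.com"}

print(group_id_allowed("com.example.tools", verified))   # under the verified domain
print(group_id_allowed("com.examplex.tools", verified))  # look-alike, different domain
print(group_id_allowed("org.example", verified))         # different top-level domain
```

The point of the comparison by components is that an attacker cannot claim a name inside someone else's namespace simply by picking a similar string.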

rachelbythebay 5 years ago |

I’m cackling at how great this is. This is what happens when you trust the internet forever and just scarf down any old thing at build time. Of course it’ll get exploited! That’s what evil people do.

nstart 5 years ago |

This post seems like a good time to note that by default, there's no direct way to verify that what you are downloading from dockerhub is the exact same thing that exists on dockerhub [1].

Discovered after seeing a comment on HN about a bill of materials for software, i.e., a list of "approved hashes" to ensure one can audit exactly what software is being installed, which in turn led me to this issue.

[1] - https://github.com/docker/hub-feedback/issues/1925
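The "approved hashes" idea mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions: the artifact name, its contents, and the `approved_hashes` mapping are made up, and a real bill of materials would come from a trusted, separately distributed source rather than being computed next to the artifact.

```python
import hashlib


def sha256_hex(data):
    """Return the hex-encoded SHA-256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def verify_artifact(name, data, approved):
    """Accept an artifact only if its name is listed and its digest matches."""
    expected = approved.get(name)
    return expected is not None and sha256_hex(data) == expected


# Hypothetical artifact and allowlist, built here only to keep the example
# self-contained.
artifact = b"example artifact contents"
approved_hashes = {"example-image.tar": sha256_hex(artifact)}

print(verify_artifact("example-image.tar", artifact, approved_hashes))                 # True
print(verify_artifact("example-image.tar", artifact + b"tampered", approved_hashes))   # False
print(verify_artifact("unknown.tar", artifact, approved_hashes))                       # False
```

Anything not on the list is rejected, which is what makes the audit trail meaningful: an unexpected artifact fails closed instead of being installed silently.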

guax 5 years ago | |

I remember when we used to sign binaries and packages and nobody checked the pgp files anyways. We could have something similar but better today; it just needs to be automated enough.

beermonster 5 years ago | |

I think image signing support is (or at least was) not as good as it could be. It would be nice if more images were signed by publishers and verification performed by default.

Even then, that only gives you a stronger indication that the image hasn't been altered at any point since it was signed by the image author. However, it is not a guarantee that the source produced the binary content. It's also not a guarantee that the image author knew what they were signing - though this is a different issue.

Debian has a reproducible builds initiative[1] so people can compile packages themselves and then match byte for byte what Debian built. Not sure how far they've got with that.

[1] https://wiki.debian.org/ReproducibleBuilds

iam-TJ 5 years ago | | |

Approximately 25,000 of just over 30,000 source packages are now reproducible builds - generating over 80,000 binary packages. See the graphic on the page you linked to:

https://tests.reproducible-builds.org/debian/unstable/amd64/...

noisenotsignal 5 years ago | | |

You can enable client enforcement of Docker Content Trust [1] so that all images pulled via tag must be signed. Whether people are actually signing their images is a different question that I don't know the answer to.

[1] - https://docs.docker.com/engine/security/trust/#client-enforc...

DecoPerson 5 years ago |

Imagine we navigated the web using a command line tool called “goto” which works exactly like a package manager. If I want to open my bank’s site, I type “goto mybank”.

I could easily find myself in trouble, because:

- There’s no autocomplete or bookmarks, so typos are easy.

- If “mybank” is a name provided by my company’s name server, I could find myself redirected to the public “mybank” entry because Mr. Not-A-Hacker says his name entry is more up to date (or because I forgot to tell ‘goto’ to check the company name server.)

- There’s no “green padlock” to check while I’m actively using the destination site. (Though at this point it’s too late because a few moments after I hit enter the destination site had the same access to my machine & network that I do from my current terminal.)

- A trusted site may later become malicious, which is bad due to the level of unrestricted and unmonitored access to my PC the site can have.

- Using scripting tricks, regular sandboxed browser websites can manipulate my clipboard so I paste something into ‘goto’ that I didn’t realize would be in my clipboard, making me navigate to some malicious site and giving it full access to my machine (if ‘sudo’ is added to the front).

These are just a few cases off the top of my head. If ‘goto’ was a real thing, we’d laugh it into being replaced by something more trustable.

How have current package managers not had these vulnerabilities fixed yet? I don’t understand.

mavhc 5 years ago |

https://security.googleblog.com/2021/02/know-prevent-fix-fra...

At Google, we have those resources and go to extraordinary lengths to manage the open source packages we use—including keeping a private repo of all open source packages we use internally

ex_amazon_sde 5 years ago |

Ex-Amazon SDE here.

> a unique design flaw of the open-source ecosystems

This is a big generalization.

Inside Amazon, as well as in various Linux distributions, you cannot do network traffic at build time and you can only use dependencies from OS packages.

Each library has its own package and the code and licensing is reviewed. The only open source distribution that I know to have similar strict requirements is Debian.

[I'm referring to the internal build system, not Amazon Linux]

[Disclaimer: things might have changed after I left the company]

trishankkarthik 5 years ago | |

I work in this area. This is not a supply chain attack. This is a typosquatting "attack" people keep rediscovering every year or two.

I know, because I wrote an as yet unpublished paper on safely pulling packages from private and public repos.

acdha 5 years ago | | |

I think you’re getting downvoted because your point is obscured by the confrontational tone. Argument by authority is especially unconvincing when you aren’t using common terms correctly. In normal usage, “typosquatting” refers to someone registering common misspellings in a shared namespace. As clearly described in the post this is not that but rather exploiting non-obvious differences in the order in which different namespaces are checked.

Using terms correctly is especially important in security: someone who read your comment might incorrectly believe that this did not affect them because they are using the correct names for all of their dependencies.

ex_amazon_sde 5 years ago | | |

This is not correct.

Installing packages only from a trusted (and signed) source protects against typosquatting, misread or confusing package names and many other risks.

pgo 5 years ago | | |

There are no typos here; the root issue is the package manager preferring public packages of the same name over private ones.

psanford 5 years ago | | |

There's no typos in this attack.

itake 5 years ago | | |

Can you share a link to the paper? My email is in my HN bio.

donaldihunter 5 years ago |

This was inevitable from the moment we let build systems and runtime systems fetch things automatically and unsupervised from public repos. This is the simplest and most blatant approach yet, but taking ownership of existing projects and adding malicious code is an ongoing problem. Even deleting a public project can have the effect of a DOS attack.

When I first used maven, I was appalled by how hard it was to prevent it from accessing maven central. And horrified to see karaf trying to resolve jars from maven central at run time. What a horrible set of defaults. This behaviour should be opt-in, disabled by default, not opt-out through hard to discover and harder to verify configuration settings.

brabel 5 years ago | |

Funny that you mention Maven, because Maven is not really vulnerable to this kind of attack simply because it requires a groupId in all dependencies, and to publish under a certain groupId you must prove control of the domain it refers to, which makes this attack nearly impossible (it's only possible if you use an internal groupId which is not controlled by you on Maven Central, AND an attacker could claim that groupId successfully with Sonatype, AND you configure Maven to first look at Maven Central, and only then at your internal repos which would be stupid to do as you normally do the exact opposite - and most enterprise setups won't even proxy to Maven Central at all).

Also, Maven uses pinned versions, normally, and won't just download whatever newer minor version happens to be published when it builds, which again makes this attack quite hard to pull off.

donaldihunter 5 years ago | | |

Back then it would have been maven 2 which supported version ranges in a similar way to OSGi manifests. But I really only mentioned maven as the first build tool I used which reached out to public repos uninvited and could break my builds as a consequence of that.

keyle 5 years ago |

I'm flabbergasted by how silly this is. Bump the version and the package manager chooses yours online vs. the private one. Amazing. How silly and how expensive is this going to be as this blatant security issue is going to ripple on for months to come.

prepend 5 years ago | |

This is why explicit pins are a good idea. Whenever you finish a project you should set the explicit versions in the lock and then tag it. The problem is with dependencies of your dependencies, but if they are public, then by their nature they won’t be using private packages that can be hijacked.

frakkingcylons 5 years ago | | |

Even public packages have been hijacked. Pin all your dependencies (transitive included) and then use automation (e.g. dependabot) to update the pinned versions as needed.
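A small check along these lines can catch unpinned entries before they reach a build. The sketch below assumes a simplified pip-style requirements format (one `name==version` per line, `#` comments) and deliberately flags anything else, including ranges, wildcards, and lines with environment markers. The function name and the sample lines are hypothetical.

```python
import re

# Exact pin: a package name, '==', and a version with no wildcard or range.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9][A-Za-z0-9._+!-]*$")


def unpinned_requirements(lines):
    """Return the requirement lines that are not pinned to one exact version."""
    problems = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems


sample = [
    "requests==2.25.1",
    "flask>=1.0",          # range: could float to a newer release
    "# a comment line",
    "",
    "internal-lib",        # no version at all
    "numpy==1.20.*",       # wildcard: not an exact pin
]

print(unpinned_requirements(sample))
```

Running something like this in CI makes "pin everything" enforceable rather than a convention, and leaves version bumps to an explicit, reviewable change.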

thw0rted 5 years ago | |

I don't think it actually works this way for NPM specifically, if you're using scoped packages correctly. I believe you can associate a scope with one (private) repo and it will not fall back on the public repo, or choose newer / higher-numbered versions on the public repo over a version from the private one.

beermonster 5 years ago | |

Debian has apt pinning for this kind of thing.

ehnto 5 years ago |

Pulling packages down at build time seems ludicrous to me. I can understand it in a development environment, but I don't understand how "Pull packages from the public internet and put them into our production codebase" passed any kind of robustness scrutiny.

I guess it's a case of the ease of use proving too great, so convenient in fact that we just kind of swept the implications under the rug.

jtsiskin 5 years ago |

It is insane that any company allowed this to happen.

""That said, we consider the root cause of this issue to be a design flaw (rather than a bug) in package managers that can be addressed only through reconfiguration," a Microsoft spokesperson said in the email."

No, npm has scopes for a reason, why would that not fix this issue?

fillest 5 years ago | |

Probably, it's more fun to play with syscall filtering in containers or with fuzzers than to review side-channels or educate coworkers. Therefore, security theater.

asiachick 5 years ago | |

Isn't it considered best practice to be secure by default? Wasn't that the big fiasco with MongoDB? Why should PyPI, RubyGems, or npm be any different? I'm sure there is some reason but I'd expect them to all pull private repos before public.

Maybe the bug wasn't explained correctly but if it prefers public over private that seems like a bug.

OTOH, it certainly is an issue that if you forget and happen to test some code without being configured to have the private package server as your default then you'd get public repos.

Maybe instead of named packages companies should be using private URLs for packages. That way you always get what you ask for?

joepie91_ 5 years ago | | |

npm does not have any 'private package' functionality at all; instead you point it at a different registry server (using e.g. Verdaccio or Artifactory) which then serves local packages and proxies public packages if they don't exist locally - or at least that's what they're supposed to do.

Artifactory apparently didn't, and served up whichever was the highest version of public vs. private. Which is stupid.

But the bottom line is that when using npm, the exact package selection policy is determined by whatever registry implementation you're talking to, and so it's the registry implementation which should prioritize private packages by default.
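The difference between the two selection policies can be shown with a toy resolver. This is a sketch under assumptions, not how Artifactory, Verdaccio, or npm are implemented: the indexes are plain dictionaries, versions are simple dotted integers, and the package name `internal-utils` and all version numbers are hypothetical.

```python
def parse_version(text):
    """Parse a dotted numeric version such as '1.3.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def resolve_highest_version(name, private_index, public_index):
    """Vulnerable policy: merge both sources and take the highest version."""
    candidates = []
    for source, index in (("private", private_index), ("public", public_index)):
        for version in index.get(name, []):
            candidates.append((parse_version(version), source, version))
    if not candidates:
        return None
    _, source, version = max(candidates, key=lambda c: c[0])
    return source, version


def resolve_private_first(name, private_index, public_index):
    """Safer policy: only consult the public index if the name is not private."""
    for source, index in (("private", private_index), ("public", public_index)):
        versions = index.get(name, [])
        if versions:
            return source, max(versions, key=parse_version)
    return None


private_index = {"internal-utils": ["1.2.0", "1.3.0"]}
# An attacker publishes the same name publicly with a very high version.
public_index = {"internal-utils": ["99.0.0"], "left-pad": ["1.3.0"]}

print(resolve_highest_version("internal-utils", private_index, public_index))  # ('public', '99.0.0')
print(resolve_private_first("internal-utils", private_index, public_index))    # ('private', '1.3.0')
print(resolve_private_first("left-pad", private_index, public_index))          # ('public', '1.3.0')
```

With the private-first policy, a name that exists internally can never be shadowed by a public upload, while genuinely public packages are still proxied as before.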

Cthulhu_ 5 years ago | |

Scopes were only introduced in NPM 2, and iirc it's still an optional feature. Companies that used NPM early on may have opted to never use those.

But that's just NPM, it's an issue in all of the mentioned package managers.

high_byte 5 years ago | | |

true. simplest solution is just always prioritize internal over external.

Triv888 5 years ago | |

Automatic updates from poor sources are probably a bad idea anyway... whether they prioritize local packages or not. (e.g. Play Store, PyPI, etc.)

jbverschoor 5 years ago |

It won't be just companies. It'll be developers, sysops, etc. who npm install a bazillion packages, because the core language and libraries are not enough. Those people have keys, credentials and access to the internal networks.

randomsearch 5 years ago | |

Yep a big part of npm’s problems are actually just flaws with JS. Comparing with Java, it’s insane how many dependencies you have to manage, and that batteries are not included (esp weird when you consider front end apps involve downloading the code!).

nightpool 5 years ago |

The article mentions that RubyGems is vulnerable to this, and that Shopify in particular downloaded and ran a gem named "shopify-cloud", but I'm curious as to how this is possible given a "normal" bundler pure-lockfile setup, or more generally the source-block directives I've seen in most Gemfiles.

That is, given a Gemfile.lock like, e.g.

    GIT
      remote: https://github.com/thoughtbot/appraisal
      revision: 5675d17a95cfe904cc4b19dfd3f1f4c6d54d3502
      specs:
        appraisal (2.1.0)
          bundler
          rake
          thor (>= 0.14.0)

How would Bundler ever try and download the `appraisal` gem from RubyGems?

The Gemfile section is more explicable. While newer Gemfiles look like this:

    source "http://our.own.gem.repo.com/the/path/to/it" do
      gem 'gemfromourrepo'
    end
    # or
    gem 'gemfromourrepo', source: "http://our.own.gem.repo.com/the/path/to/it"

Older Gemfiles apparently looked like the following:

    source 'https://rubygems.org'
    source 'http://our.own.gem.repo.com/the/path/to/it'

    gem 'gemfromrubygems1'  
    gem 'gemfromrubygems2'
    gem 'gemfromourrepo'

Which seems obviously vulnerable to the dependency confusion issue mentioned.

So is the understanding that Shopify's CI systems were running `bundle update` or another non-lockfile operation? (possibly as a greenkeeper-like cron job?) Or is `--pure-lockfile` itself more subtly vulnerable?

siraben 5 years ago |

This attack demonstrates one of the problems outlined in the Nix thesis[0], that is the problem of nominal dependencies. That is, dependencies of the dependencies, build flags and so on are not taken into account, and in particular, the source of a package.

Nix makes it possible to query the entire build time and runtime dependency graph of a package, and because network access during build time is disabled, such a substitution attack would be harder to pull off.

How the source is downloaded is specified declaratively and can be pinned to a specific commit of a specific Git repository, for instance.

[0] https://edolstra.github.io/pubs/phd-thesis.pdf

andrejserafim 5 years ago |

Why would you want your CI to depend on an external source? Say a legit upgrade happened, but it has a breaking change. Now your build is broken.

Fixed versions for as many things as you can (including OS images, apt packages, Docker images, etc) lead to changes in your CI under your control.

Sure, you have to upgrade manually or by a script. But isn't plain build stability worth it? Not even talking about security.

HereBeBeasties 5 years ago | |

It probably doesn't. But are you saying devs never update their dependencies?

andrejserafim 5 years ago | | |

When one updates internal dependency versions one usually has to find them. At least that was the story with my gigs. So there's a listing somewhere.

So you wouldn't get a random version even considered.

Version shadowing and overriding is a totally different concern of course.

koolba 5 years ago |

> The packages had preinstall scripts that automatically launched a script to exfiltrate identifying information from the machine as soon as the build process pulled the packages in.

Pre and post install scripts in NPM packages are such a terrible idea. Even when it’s not malware, it's usually just a nagging donation request with a deliberate “sleep 5” to slow down your build and keep the text displayed.

heipei 5 years ago |

Previous discussion: https://news.ycombinator.com/item?id=26081149

database_lost 5 years ago | |

and the link to the original author's post: https://medium.com/@alex.birsan/dependency-confusion-4a5d60f...

jl6 5 years ago |

The real solution is to design and build software components that can be finished, so they can be ruthlessly vetted - rather than the endless churn of updates.

Daho0n 5 years ago | |

Gamers often complain how they become free QA testers if they buy a game in the first few months after release as most games are full of bugs (hi Bethesda!) but it is way worse in things like JavaScript libraries etc. It's as if "finished" has become a foreign word to most developers. Look at the recent story about Linux stable kernels that have had more than 255 minor releases and think how much of a shit show it would have been if they added features too like most developers do. The excellent small stable tools of Unix should have taught us something.

mschuster91 5 years ago | | |

> Look at the recent story about Linux stable kernels that have had more than 255 minor releases and think how much of a shit show it would have been if they added features too like most developers do.

At least some distribution kernels do new feature backports - mostly to support new hardware on LTS versions (like e.g. Ubuntu did for the Raspberry Pi, see https://github.com/raspberrypi/linux/issues/3464).

CuriousNinja 5 years ago | |

Not sure why parent is being down-voted as I believe this is an important point. In my opinion this would be applying the unix philosophy of having small tools that each do one thing and do it well to code libraries.

hobofan 5 years ago | | |

Because as long as the underlying hardware and technology overall keeps progressing there isn't much practicality in "finishing" software.

Sure you could just "finish" Linux at 5.0 and then introduce e.g. io_uring via Linux-with-io_uring 1.0 instead of adding it to Linux 5.1. Same goes for all the libraries that add support for io_uring.

Yes, you could "finish" some software on the feature level, but you would still need to maintain it if you want to add support for new platforms, etc., or it will become obsolete sooner or later. In the case of still maintaining libraries, this would solve nothing in the context of this attack vector.

HereBeBeasties 5 years ago | | |

But this is exactly the philosophy in the NPM ecosystem where things like left-pad are rife. And NPM is generally considered a dumpster fire precisely because you need 8,000 deps for relatively "simple" projects like a basic create-react-app project.

forty 5 years ago |

I don't understand why there is this issue. We publish our internal npm packages in the @company namespace and we own this namespace on the public npm registry. Problem solved, isn't it?
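One way to audit for the situation described above is to look for internal packages that are still referenced by an unscoped name. The sketch below assumes you already keep a list of internal package names; the manifest contents, the `@company` scope, and the function name are hypothetical.

```python
import json


def unscoped_internal_dependencies(package_json_text, internal_names):
    """List internal dependencies that are referenced without an npm scope.

    Unscoped internal names are the ones an outsider could register on the
    public registry; names starting with '@' belong to a scope.
    """
    manifest = json.loads(package_json_text)
    found = []
    for section in ("dependencies", "devDependencies"):
        for name in manifest.get(section, {}):
            if name in internal_names and not name.startswith("@"):
                found.append(name)
    return sorted(found)


# Hypothetical manifest mixing scoped internal, unscoped internal, and public packages.
manifest_text = json.dumps({
    "name": "example-app",
    "dependencies": {
        "@company/auth": "1.4.0",
        "billing-utils": "2.0.1",
        "lodash": "4.17.20",
    },
    "devDependencies": {
        "build-helpers": "0.3.0",
    },
})

internal = {"@company/auth", "billing-utils", "build-helpers"}

print(unscoped_internal_dependencies(manifest_text, internal))
```

Here `billing-utils` and `build-helpers` would be reported, since both are internal but live in the shared, unscoped namespace.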

andrethegiant 5 years ago | |

Yes, I'm confused by this too. Scoped packages on npm solve this problem, yet they aren't mentioned in the article at all.

walrus01 5 years ago |

npm in particular has been problematic for a long time:

https://naildrivin5.com/blog/2019/07/10/the-frightening-stat...

https://techbeacon.com/security/check-your-dependencies-gith...

https://thenewstack.io/npm-password-resets-show-developers-n...

joepie91_ 5 years ago | |

Except this wasn't a problem with npm but rather with private registry implementations, and a setup with npm + Verdaccio is apparently actually one of the few configurations that isn't vulnerable to this problem.

Not that I didn't expect someone to immediately take the opportunity to complain about npm, of course, despite it having nothing to do with the problem at hand... as has become tradition in tech circles.

ipsum2 5 years ago |

Package management isn't what I initially think of when I hear "supply chain". Neat hack! It's like left-pad but malicious.

numbsafari 5 years ago | |

It should be. If you are a developer, your package manager, OS distributions, and any commercial software you use are all part of your supply chain.

Your code is what it depends on.

nine_k 5 years ago |

randompwd 5 years ago | |

Doesn't cover the same attack vector, and that link is a piece of fiction.

nine_k 5 years ago | | |

It's a piece of fiction, of course, but it explains a very similar attack vector in colorful detail.

Using DNS for exfiltration is a nice trick but does not change much in the general approach.

umvi 5 years ago |

> I have been fascinated by the level of trust we put in a simple command like this one

sigh... am I the only one that likes environments where you can run simple commands to install stuff and you can generally trust your package managers? All the security folks love to act dumbfounded when people trust things, but post-trust environments have terrible UX in my experience. I hate 2FA, for example, because now I have to tote my phone around at all times in order to be able to access any of my accounts. If I lose my phone or my phone is stolen while travelling, I'm hosed until I can figure out how to get back in.

> So can this blind trust be exploited by malicious actors?

Yes, it can. Trust can always be exploited by malicious actors, and no amount of software can change that. And it creates a world that sucks over time. Show me a post-trust, highly secure environment that isn't a major PITA to use. And not just for computers. I'm sure you could use social engineering to abuse the trust of customer service reps (or just people in general) and do bad things, and the end result will be a world where people are afraid to do any favors for other people because of the risk of getting burned by a "malicious actor".

vlovich123 5 years ago |

Does this work with AOT compiled languages? Surely the fake packages that get uploaded don't know enough about the structure of the internal libraries, so for something like Cargo this would just result in your build suddenly failing mysteriously, which is easy to spot. A build.rs could probably do some damage to your build systems temporarily for the 1 or 2 days (if not hours) it takes for engineers to track down what's happening.

alkonaut 5 years ago |

What I don’t get from the article is the reasoning behind the design that the central repository “wins” over the local/override repository.

How was that design chosen, not just once but in all 3 of those large package ecosystems? Did pypi/gems/node borrow their design from each other, given their similarity in other aspects?

Are there any situations where this behavior is desired?

Do any of the other ecosystems have flaws like this (nuget, cargo..)?

carols10cents 5 years ago | |

Cargo will not ever look on crates.io for a library if you specify a `registry` attribute for a dependency. https://doc.rust-lang.org/cargo/reference/registries.html

xurukefi 5 years ago |

I never understood why these package repositories don't include some (opt-in?) integrity checking option using digital signatures. If I download code that executes on my machine there should be at least the option to establish some level of trust. We have been doing that with linux distro package managers for decades. Seems like common sense to me.

BillinghamJ 5 years ago | |

They largely do in various forms. Both npm and yarn, by default, record hashes of the dependencies you're using and check them when redownloading.
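As a rough illustration of what such a recorded hash check amounts to, here is a minimal Python sketch. It assumes the SRI-style `sha512-<base64>` format that lock files store in their `integrity` fields; the function names and sample bytes are made up for the example, and this is not npm's or yarn's actual code.

```python
import base64
import hashlib


def sri_digest(data: bytes, algorithm: str = "sha512") -> str:
    """Return an SRI-style string such as 'sha512-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(data: bytes, expected: str) -> bool:
    """Compare downloaded bytes against the integrity string from a lock file."""
    # The algorithm name is everything before the first dash.
    algorithm, _, _ = expected.partition("-")
    return sri_digest(data, algorithm) == expected


# Hypothetical tarball contents, standing in for a real download.
tarball = b"pretend this is a package tarball"
recorded = sri_digest(tarball)  # what would be written to the lock file

print(verify_integrity(tarball, recorded))            # True
print(verify_integrity(tarball + b"!", recorded))     # False
```

Note that this only detects a changed artifact for a version that is already locked. It says nothing about which registry a newly resolved version should come from.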

I think the issue tends to be more that there's just so many packages (often nested 10+ deep) and it's best practice to keep them as up to date as possible.

When it's fairly typical for a JS project to have thousands of dependencies, there isn't really any practical way to both stay up to date and carefully review everything you pull in.

I think the only viable solution for companies taking this issue really seriously is to keep their numbers of dependencies down and avoid having significant deep/indirect dependencies.

Edit: as an example, in my company's Node stack (for 10 services) - there's >900 dependencies. In our React stack (for 2 sites), more than 1600.

Contrary to what you might think, these are actually pretty small, lightweight systems. So really whatever you might have thought was the worst-case scenario on numbers of deps, the reality is more like 10x that in the modern JS ecosystem.

In many ways, the vast number of tiny dependencies is one of the strongest points of the JS ecosystem. But it doesn't come without caveats.

sp332 5 years ago | |

The package integrity would be fine in this case. The packages downloaded from PyPI would be legitimately signed by PyPI, and the internal packages would be signed by the local package server. The issue is not knowing which source to use for each package, and you'd have the same issue with not knowing which certificate to use to check them.
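To make the "which source" problem concrete, here is a toy resolver in Python. Everything in it is hypothetical (the index layout, the `acme-auth` name, the version numbers), and it is not how pip, npm, or any real tool is implemented. It just contrasts "take the highest version from any index" with "each name is bound to one index".

```python
from typing import Dict, List, Tuple

Index = Dict[str, List[str]]  # package name -> available versions


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.2.0' into (1, 2, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve_highest(name: str, indexes: Dict[str, Index]) -> Tuple[str, str]:
    """Naive strategy: merge every index and take the highest version."""
    candidates = [
        (parse_version(version), version, source)
        for source, index in indexes.items()
        for version in index.get(name, [])
    ]
    if not candidates:
        raise LookupError(name)
    _, version, source = max(candidates)
    return version, source


def resolve_pinned(
    name: str, indexes: Dict[str, Index], source_for: Dict[str, str]
) -> Tuple[str, str]:
    """Safer strategy: look only in the one index this name is bound to."""
    source = source_for[name]
    versions = indexes[source].get(name, [])
    if not versions:
        raise LookupError(f"{name} not found in {source}")
    return max(versions, key=parse_version), source


# Hypothetical data: an attacker publishes a high version under an internal name.
indexes = {
    "internal": {"acme-auth": ["1.2.0", "1.3.0"]},
    "public": {"acme-auth": ["99.0.0"], "requests": ["2.25.1"]},
}
source_for = {"acme-auth": "internal", "requests": "public"}

print(resolve_highest("acme-auth", indexes))             # ('99.0.0', 'public')
print(resolve_pinned("acme-auth", indexes, source_for))  # ('1.3.0', 'internal')
```

Both artifacts could carry perfectly valid signatures from their respective servers; the only thing that differs is the rule for choosing between them.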

magicalhippo 5 years ago | |

I was thinking the same thing. Surely PayPal's packages should be signed by a certificate only PayPal has, and they would want to verify that before using their packages?

angry_octet 5 years ago | | |

If the signature requirement is attached to the package metadata, the new package just removes it. If it's part of their custom build system, what signs third party packages? Would it just sign the new one anyway, because how does it know which ones should have PayPal internal signing? Or are you proposing manual controls? shudder

Channels and priorities embedded in the package tools are a better approach, combined with something like Artifactory. Some channels might require that packages be signed, and possibly monotonically versioned.

robertlagrant 5 years ago |

I've often wondered about this, even in the accidental case of someone registering a package you use internally.

And I know it's not perfect, but in Python, using Poetry means you get a poetry.lock file with package hashes built in, so that's something.

p0d 5 years ago |

I teach and one of my students, with little IT experience, asked me last week about the security of package management. I found myself using the many eyeballs argument. It only takes one set of bad eyeballs.

It seems to me that down through the years ease of deployment trumps security. npm, mongodb, redis, k8s.

Or maybe sysadmin has just become outdated? Maybe front of house still needs a grumpy caretaker rather than your friendly devops with a foot in both camps.

We can now even outsource our security to some impersonal third-party so they can 'not' monitor our logs.

EOG # end of grump

snarfy 5 years ago |

Sadly I've had to fix this at more than one company.

It's a bit of cognitive dissonance having to explain why downloading random shit from the internet during the build is a bad idea, yet here we are.

wyc 5 years ago |

Here's the application called deptrust I submitted to the Mozilla Builders program (didn't get in :P) to address this problem space before I had to focus more on my current job. Please let me know if there are any collaborators who would like to work on this together someday!

https://docs.google.com/document/d/1EW6uSZB0_D0qZuDSGuxujuVE...

Jefro118 5 years ago | |

Hey there, any chance you want to turn this doc into a dedicated webpage? Here's a demo: https://demos.writary.app/deptrust

CGamesPlay 5 years ago |

I know that node has `package-lock.json` and `yarn.lock`, which include integrity checks. Are these checks decorative only? How could npm have been affected by this issue?

nstart 5 years ago | |

IIRC you need to use npm ci to ensure that package-lock.json is used. That said, when developing locally you are going to use npm install or npm update and update the package.json and package-lock.json files accordingly. I could be entirely off target here since I'm writing purely from memory. But there seem to be a few different ways one could trigger a pull from the malicious repo and end up with it inside the package-lock.json file.

K0nserv 5 years ago | | |

No, that's not correct; in fact this comment and the two sibling comments are all wrong.

Quoting from npm's documentation[0] for npm install:

> This command installs a package, and any packages that it depends on. If the package has a package-lock or shrinkwrap file, the installation of dependencies will be driven by that, with an npm-shrinkwrap.json taking precedence if both files exist. See package-lock.json and npm

Consider an example where in package.json you have `"react": "^16.11"` and this has been resolved and fixed in package-lock.json as 16.12 at a previous point in time. Running npm install will not cause NPM to install 16.14 even though it matches the pattern specified in package.json, instead 16.12 will be installed because that's what package-lock.json says.

What npm install does do, is detect if you've made changes in package.json and only then does it re-resolve the dependency. In the above example, if you changed `"react": "^16.11"` to `"react": "^16.14"` in package.json and then ran npm install the package-lock.json would be changed and version 16.14 would be installed.
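The behaviour described above can be sketched in a few lines of Python. This is a simplified model, not npm's resolver: the caret matching here only handles plain `major.minor[.patch]` versions with major >= 1 (no 0.x rules, no prereleases), and `choose_version` is a made-up name.

```python
from typing import List, Optional, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Parse '16.11' or '16.11.0' into a 3-tuple, padding with zeros."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def satisfies_caret(version: str, caret_range: str) -> bool:
    """True if version is >= the base and shares its major (major >= 1 only)."""
    base = parse(caret_range.lstrip("^"))
    candidate = parse(version)
    return candidate[0] == base[0] and candidate >= base


def choose_version(
    caret_range: str, locked: Optional[str], available: List[str]
) -> str:
    """Prefer the locked version while it still satisfies the declared range."""
    if locked is not None and satisfies_caret(locked, caret_range):
        return locked
    # The range changed (or there is no lock entry): re-resolve to the newest match.
    matching = [v for v in available if satisfies_caret(v, caret_range)]
    if not matching:
        raise LookupError(f"nothing satisfies {caret_range}")
    return max(matching, key=parse)


available = ["16.11.0", "16.12.0", "16.14.0"]

print(choose_version("^16.11", "16.12.0", available))  # 16.12.0 (lock wins)
print(choose_version("^16.14", "16.12.0", available))  # 16.14.0 (range changed)
print(choose_version("^16.11", None, available))       # 16.14.0 (no lock yet)
```

The third call is the interesting one for this thread: with no lock entry, resolution goes back to the registry, which is where a hijacked name could be picked up.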

Bundler and CocoaPods also work this way.

nickkell 5 years ago | | |

I feel stupid for using npm for so long without knowing about this command. In hindsight it's obvious that install was updating the lock file each time, so why should it have been any different on the ci server?

furstenheim 5 years ago |

Npm ecosystem already has the solution. Use namespaces @yelp/infra-js where @yelp is the npm user.

It's not possible for an attacker to publish on that name in the public npm

unilynx 5 years ago |

These install hooks... Why are they needed at all, and why can't package (de)installation be without side effects?

I'm sure the hooks are needed for things NPM can't do by itself, but they shouldn't run by default. That puts pressure on developers to avoid them, and puts pressure on NPM to add whatever functionality is missing from package.json in a safe way.

(and have npmjs.com search rank packages without scripts above those that do)

bagacrap 5 years ago | |

What would happen if the install hooks weren't there? You'd still have client code calling into the compromised package. Would it be possible to handle those calls without knowing the symbol names used by the internal package?

jwr 5 years ago |

I have to build some CSS libraries that sadly use npm for building. The way I approach this is through rubber gloves: I create custom docker containers with npm and a specific set of dependencies, frozen in time. This way I can at least get reproducible and reliable builds.

This doesn't mean I'm not vulnerable to dependency attacks, but it at least limits the window, because I update these dependencies very, very rarely.

pabs3 5 years ago |

For apt repositories you can do pinning by origin, which should prevent this issue.

ex3ndr 5 years ago | |

Does apt use tls by default today?

beermonster 5 years ago | | |

Apt supports TLS via apt-transport-https (as you are probably already aware), but I don't think it's the default in either Debian or (X)Ubuntu derivatives. I'd like to know why TBH.

The packages themselves are signed though, so I guess the risk is now on server authenticity as opposed to package integrity.

mbag 5 years ago |

To mitigate this kind of supply chain attack for Python, we have created the following tool [1], which checks the Python packages on the Artifactory instance you specify and creates packages with the same names on PyPI.

[1] https://github.com/pan-net-security/artifactory-pypi-scanner
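For the detection half of that idea only, a minimal Python sketch might look like the following. It is not taken from the linked tool. It assumes PyPI's JSON endpoint (`https://pypi.org/pypi/<name>/json`) answers 404 for names nobody has published, and the internal package names are invented. Listing the names from Artifactory and actually uploading placeholders are left out.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def exists_on_pypi(name: str) -> bool:
    """Ask PyPI's JSON API whether a project with this name is published."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        raise  # anything else (rate limit, outage) should not be read as "free"


def unclaimed_names(
    internal_names: Iterable[str],
    exists: Callable[[str], bool] = exists_on_pypi,
) -> List[str]:
    """Return internal package names that nobody has registered publicly."""
    return [name for name in internal_names if not exists(name)]


# Offline demo with a stub lookup; the names are hypothetical.
published = {"requests"}
print(unclaimed_names(["requests", "acme-billing-core"],
                      exists=lambda name: name in published))
# ['acme-billing-core']
```

Passing the lookup in as a parameter keeps the check testable without network access; calling `unclaimed_names(names)` with the default would query PyPI for real.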

joshlk 5 years ago | |

Uploading dummy packages to PyPI isn't the solution. It just pollutes PyPI and is a nuisance to others.

You have always been able to specify the `index-url` when installing packages using pip. This can be added to `requirements.txt` files as well.

mbag 5 years ago | | |

Yes, if you have packages on Artifactory, `index-url` is always the way to go. However, if you forget to specify `no-index`, you might not get what you wanted; see [1] for how packages are found. And it's easy to make such a mistake when using local resources (you forget to set the proxy or internal DNS, a new developer is not familiar with the setup and does a plain `pip install`, the internal server is temporarily unreachable).

> It just pollutes PyPI and is a nuisance to others.

I agree, but so do the packages that are no longer maintained. You also reserve the package name in case you decide to open source it. Furthermore, by creating the package you are leaking metadata about your organization, i.e. some functionality can be inferred from package names.

And sure, you can train people and try to enforce security awareness, but your people need to be right 100% of the time, while attackers need them to make only one mistake. The same goes for name squatting of popular packages.

[1] https://pip.pypa.io/en/stable/reference/pip_install/#finding...

seg_lol 5 years ago | |

The thing that just happened is like a catastrophic chain-reaction collision in space. Now we will have to use guids for everything. Nothing has meaning.

tantalor 5 years ago |

Like some other commenters, I too initially balked at the apparent misuse of "supply chain attack" but the linked paper provides a good definition,

A software supply chain attack is characterized by the injection of malicious code into a software package in order to compromise dependent systems further down the chain.

Backstabber’s Knife Collection: A Review of Open Source Software Supply Chain Attacks

https://link.springer.com/chapter/10.1007%2F978-3-030-52683-...

To be clear, just calling this a "supply chain attack" and omitting "software" is going to cause confusion with traditional supply chains.

The analogy is not quite apt: in a software build system you have complete visibility into the dependency tree, so this attack is less useful, whereas with hardware suppliers you are relying on the security of your vendor.

Terretta 5 years ago | |

> The analogy is not quite apt: in a software build system you have complete visibility into the dependency tree, so this attack is less useful, whereas with hardware suppliers you are relying on the security of your vendor.

Not necessarily: plenty of software still ships with the third party supply chain bits incorporated as binaries, including commercial software. The user is relying on the security of one or more in a chain of upstream vendors.

See Cyberpunk 2077 DLLs for instance.

https://twitter.com/CDPRED_Support/status/135660404767189811...

Cyberpunk “builds” their game with a software build system, but not all of it is them building it.

tantalor 5 years ago | | |

Yes, good point of course. Although I would be very suspicious of binary deps served through npm.

fareesh 5 years ago |

This seems to be tending towards the generic problem of permissions that we have seen previously elsewhere.

For example in the case of Facebook, it used to be that users would accept permissions without considering them, and in-turn, various apps would access their data in bad faith.

Likewise for mobile apps.

Eventually Facebook removed many of the overtly powerful permissions entirely, likewise with the mobile operating systems.

In the case of mobile, the concept of "runtime permissions" was also introduced that required explicit approval to be granted at the time of authorization.

On Android, location access now prompts the user in the notification area informing the user of an app that accessed their location.

Can some of these ideas be borrowed for the package/dependency management world? "The package you are about to install requires access to your hard drive including the following folders: x/y/z/all"?

smilliken 5 years ago |

This is both a security bug and a reproducibility bug. If anyone outside your network can break your build, your build is broken! It's mission critical to have a working build.

The way Nix handles this is that every external resource is cached and hashed, and every reference to an external resource must have a hash integrity check. If someone swaps out a package on a web server somewhere, rebuilds keep working because they don't need to re-fetch (because the hash wasn't changed by an operator), and fresh builds fail with an error indicating the hash is invalid, which should trigger an investigation (in practice, this is exceedingly rare, and IMO always deserves attention).
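The idea (not Nix's actual implementation) can be sketched in Python as a fetcher whose cache is keyed by the expected hash. The `fetch_pinned` and `HashMismatch` names, the URL, and the injected `download` callable are all assumptions made for the example.

```python
import hashlib
from typing import Callable, Dict


class HashMismatch(Exception):
    """Raised when fetched content does not match the pinned hash."""


def fetch_pinned(
    url: str,
    expected_sha256: str,
    cache: Dict[str, bytes],
    download: Callable[[str], bytes],
) -> bytes:
    """Return the content for url, cached and verified by its pinned SHA-256."""
    if expected_sha256 in cache:
        return cache[expected_sha256]  # rebuild: no network access needed
    data = download(url)
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise HashMismatch(f"{url}: expected {expected_sha256}, got {actual}")
    cache[expected_sha256] = data
    return data


url = "https://example.invalid/lib-1.0.tar.gz"  # placeholder URL
original = b"library v1.0"
pinned = hashlib.sha256(original).hexdigest()

cache: Dict[str, bytes] = {}
fetch_pinned(url, pinned, cache, lambda _url: original)  # first build fills the cache


def swapped(_url: str) -> bytes:
    # Someone replaced the file on the server.
    return b"something else entirely"


# A rebuild with a warm cache never touches the network, so it keeps working.
print(fetch_pinned(url, pinned, cache, swapped) == original)  # True

# A fresh build has to download, and the mismatch stops it.
try:
    fetch_pinned(url, pinned, {}, swapped)
except HashMismatch:
    print("fresh build failed: hash mismatch")
```

Because the pinned hash lives in the build description under version control, changing what gets fetched requires a visible change by an operator.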

I dream of the day when build reproducibility is considered table stakes, like version control.

adolph 5 years ago |

I think JFrog and Azure won the prize for product placement on this one. When the article listed “Azure Artifactory” I wondered if Azure was “sherlocking” JFrog, but no, they have a partnership. Given the SolarWinds vector I expect more investment in tooling security.

SideburnsOfDoom 5 years ago |

The upstream article was posted yesterday, here

"Dependency Confusion: RCE via internal package name squatting " https://news.ycombinator.com/item?id=26081149

"Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies, The Story of a Novel Supply Chain Attack, Alex Birsan" https://medium.com/@alex.birsan/dependency-confusion-4a5d60f...

rectang 5 years ago |

PGP signing of packages should be table stakes for publishing to a public repository. If unsigned packages are accepted by a public repository to reduce friction for newbies, such packages should be hidden by default.

Then, build tools should be configurable such that they only pull in dependencies signed by PGP keys drawn from a whitelist.

Finally, companies need to maintain private repositories of vetted dependencies and avoid pulling from public repositories by default — and this requirement needs to be configurable from the project's build spec and captured in version control.

angry_octet 5 years ago | |

If you've seen the PGP/GPG code you'll know what a trash fire it is, and if you follow its development you'll see how unfriendly the maintainers are when bugs are pointed out.

Adding dependencies on PGP just makes everything worse.

X.509 PKI for code signing is also terrible and very very complicated and error prone.

Also consider the community nature of development. You need to handle all sorts of painful crypto issues now.

konaraddi 5 years ago |

For npm enterprise, it looks like setting the scope (e.g. @acmecorp/internal-pkg) would mitigate the public and private confusion. Verdaccio, an open source lightweight npm registry, first checks if a private package is available before searching the public npm registry (however, their best practices say to use a prefix for private packages: https://verdaccio.org/docs/en/best).

fortran77 5 years ago |

I don't use npm much, but once I'm out of the initial development phase with any package manager and am "feature complete" we generally lock versions down so at least we're always pulling a specific version in.

And, of course, on production build machines, all packages are local.

This isn't just for "security" -- it's to ensure we can always build the same bits we shipped, and to avoid any surprises when something has a legitimate update that breaks something else.

0xbadcafebee 5 years ago |

My favorite supply chain attack is still the chip vendors. Even if you come up with a hardware security module in your chip to verify the code that's running on it, that can be (and has been) hacked too. Sleeping dragons could be lying in wait in billions of devices and nobody would know unless they went out of their way to do a low-level analysis.

angry_octet 5 years ago |

I've been wishing npm/pypi/apt etc would improve for ages, but it seems like infrastructure improves one disaster at a time, software one hack at a time. I'm only annoyed I didn't do it myself.

The pypi maintainer is being ridiculous, it is much better to have this guy poke MSFT than have the Russians do it, he's doing them a favour.

PhineasRex 5 years ago |

The only really shocking part of this is that Artifactory is vulnerable to this. I expect developers to be lazy about build security because I've seen it over and over again at multiple companies, but Artifactory's whole purpose is to provide secure build dependency management.

I'll be rethinking using Artifactory in my infrastructure.

HereBeBeasties 5 years ago | |

A good look at their (public) bug tracker might change your mind about how surprising this is.

NicoJuicy 5 years ago |

I used this version trick in NuGet, but the other way around: to update existing unmaintained public packages, mostly because they were on .NET Framework and a lot has moved to .NET Core.

In visual studio you can set the priority of where packages have to be checked. My own package repo has a higher priority.

I never thought about using it as an attack vector though.

wozer 5 years ago | |

I believe you can work around this attack vector in .NET by referencing strong-named assemblies.

wepple 5 years ago |

Does NPM offer cryptographic hash pinning of packages the way that PyPI does?* Why is this not more widely used?

* https://flawed.net.nz/2021/02/02/PyPI-Security-State/

milo_im 5 years ago |

Diffend allows you to manage the risks that come with using open-source third party dependencies by providing malware detecting security scanning and a risk management platform for your Ruby dependencies.

https://diffend.io/

soheil 5 years ago |

This brings a whole new level of awareness to package files where a simple typo can mean your machine can be rooted. From now on I'll always be terrified whenever changing any of my package.json, Gemfile or requirements.txt files.

homakov 5 years ago |

Why didn't npmjs/rubygems just check failed lookup requests for "shopify-cloud" etc., block those names for a while to prevent damage, and notify the companies (doing their best)? Seems like a low-hanging solution.

ddtaylor 5 years ago |

Does anyone else skip reading articles on Medium because of their login policies?

3np 5 years ago |

It surprises me a bit the way they refer to in-house dependencies purely by version number. When we have internal dependencies in e.g. package.json, it's always referred to by an explicit url and git ref.

peteretep 5 years ago |

> After spending an hour on taking down these packages, Ingram stressed that uploading illicit packages on PyPI puts an undue burden on the volunteers who maintain PyPI.

I dunno, feels like fair game to me

pinacarlos90 5 years ago |

Question for you guys here:

Is this kind of attack possible using Nuget-Package manager?

maximcus 5 years ago | |

Fresh [PDF from MSFT mentioning nuget](https://azure.microsoft.com/mediahandler/files/resourcefiles...)

forecast10 5 years ago | |

Yes, it's possible. I've tried it myself.

progre 5 years ago | |

Seems like it's a possibility at least. I'd be very very interested to know as well.

stephenr 5 years ago |

And still people won’t vendor their dependencies, so changes to dependencies are never reviewed.

To paraphrase family guy: you’re making this harder than it needs to be.

technics256 5 years ago |

Can't there be a "package signature" of some sort that is specified and checked against in a package-lock.json or yarn.lock?

dininski 5 years ago | |

I'll try to answer this from a JS-specific perspective. As someone previously mentioned - you do get hash checks if you're using `npm ci` in your CI/CD setup. You get the resolution path as well. Which is all you need to reproducibly resolve dependencies, *if* you have set up npm correctly in your pipeline. It would be unlikely to be exposed to this particular attack, at least not automatically in your deployment pipelines.

However, this is still very, very dangerous because of day-to-day engineering, really. Any engineer doing a simple `npm install` can inadvertently bring in and execute malicious code on their machine. From there on out it would be somewhat trivial to gain further access to the same network the code was run from.

fernandotakai 5 years ago | |

pip has hashing signatures and i don't know why people don't use it. it's quite easy too.

https://pip.pypa.io/en/stable/reference/pip_hash/

https://pip.pypa.io/en/stable/reference/pip_install/#hash-ch...

jcims 5 years ago |

Can anyone point me at a resource where I can download the full set of packages that are in the npm registry?

jcims 5 years ago | |

using https://replicate.npmjs.com/_all_docs

claw_howitzer 5 years ago |

Did this guy just make hundreds of thousands of dollars off this single bug (type)?

cooervo 5 years ago |

this shouldn't be a problem with golang right? because it uses an id when go mod is used. I'm rusty on go since I haven't used it in over 2 years but I believe this shouldn't affect it?

malinens 5 years ago |

Luckily I reserved my company's namespace on Packagist a few months ago. Each package manager works differently, and it is hard to know the inner workings of all of them.

tppiotrowski 5 years ago |

Dev 1: “npm -g package_name” doesn’t work.

Dev 2: try “sudo npm -g package_name”.

megous 5 years ago |

But will anything change in the minds and hearts of developers?

sillysaurusx 5 years ago | |

Probably not. :)

omega3 5 years ago |

Less than $4k average bounty for this.

Bannednad 5 years ago |

Bullshit you would be next to julian

seniorivn 5 years ago |

solution: use nix as your package manager

ben_w 5 years ago | |

Helps in this specific case, but will not eliminate the broader issue. The broader issue is how can you trust 3rd party code to not do anything harmful, and it’s not like we can even perfectly trust our own fingers in that regard.

croon 5 years ago | | |

But the broad issue will never be resolved. The entire foundation of society is built around some trust. What has proven to work is that we trust actors who are identifiable and have something to lose. This goes with everything from your bank, to government, to name brands, to your local restaurant.

Nix package managers/repositories have a level of scrutiny to get into, and highly dedicated people in charge of them. Random GitHub repos (or npm packages) are extremely low effort/risk to set up.

Of course the former can be abused, but the incentives are at least in its favor to likely be more trustworthy. And we have to make assumptions of trust every time we sit down in our chair or turn on our computer, or plug in a space heater. We will never get around trust, but there are differences in levels of trust and trustworthiness.

medecau 5 years ago | |

lol

as if the nix peeps are reading the code nix is wget'ing

baybal2 5 years ago |

This is why end-developer signing is essential.

This is not as amenable to CI, but that's the point.

HereBeBeasties 5 years ago | |

How does it work in practice, though? For example, create-react-app in NPM has a bajillion deps. Do I trust 8,000 keys? Which ones are OK?

I get that you could in principle namespace things (at least for package managers that support this) and insist on a small set of company-internal signing keys for those namespaces. But managing all that isn't easy and what about for package ecosystems that don't really have namespaces (e.g. PyPI, NuGet)?

baybal2 5 years ago | | |

> Do I trust 8,000 keys? Which ones are OK?

You can at least trust 8000 developers whose keys are centrally signed more than 8000 packages thrown into signing CI tooling by who knows whom.

rmoriz 5 years ago | |

As you see with chrome extensions and android barcode apps, this is not a solution. Developers change or for whatever reasons can change their mind and ship bad things.

baybal2 5 years ago | | |

Not a solution to that, but certainly a big impediment for platform attacks.

trishankkarthik 5 years ago |

This is NOT a supply chain attack. SolarWinds was a supply chain attack. This is a typosquatting demonstration of the kind that happens every one or two years.

Abimelex 5 years ago | |

it's not and the article explains why

sitkack 5 years ago |

We need a blockchain for source. It is obvious and we just haven't come to terms with it yet. Then anyone can run anything provided they have the right key.