I dunno, feels like fair game to me
Is this kind of attack possible using Nuget-Package manager?
To paraphrase family guy: you’re making this harder than it needs to be.
However this is still very, very dangerous, because of day-to-day engineering, really. Any engineer doing a simple `npm install` can inadvertently bring in and execute malicious code from their machine. From there on out it would be somewhat trivial to gain further access to the same network the code war run from.
https://pip.pypa.io/en/stable/reference/pip_hash/
https://pip.pypa.io/en/stable/reference/pip_install/#hash-ch...
Dev 2: try “sudo npm -g package_name”.
Nix package managers/repositories have a level of scrutiny to get into, and highly dedicated people in charge of. Random github repos (or npm packages) are extremely low effort/risk to set up.
Of course the former can be abused, but the incentives are at least in its favor to likely be more trustworthy. And we have to make assumptions of trust everytime we sit down in our chair or turn on our computer, or plug in a space heater. We will never get around trust, but there are differences in levels of trust and trustworthiness.
has if the nix peeps are reading the code nix is wget'ing
This is not as amendable to CI, but that's the point.
I get that you could in principle namespace things (at least for package managers that support this) and insist on a small set of company-internal signing keys for those namespaces. But managing all that isn't easy and what about for package ecosystems that don't really have namespaces (e.g. PyPI, NuGet)?
You can at least trust more 8000 developers whose keys are centrally signed, than 8000 packages thrown into signing CI tooling by who knows.
That's not really true though. Nix doesn't support signed sources, there are no signatures in the package repository and in theory "John Doe" with no information can add packages and send pull-requests.
In practise nixpks is just a well moderated user repository and the level of scrutiny is less then the enterprise distros can offer.
How was that design chosen, not just once but in all 3 of those large package ecosystems. Did pypi/gems/node borrow their design from each other given their similarity in other aspects?
Are there any situations where this behavior is desired?
Does any of the other ecosystems have flaws like this (nuget, cargo..)?
I think the issue tends to be more that there's just so many packages (often nested 10+ deep) and it's best practice to keep them as up to date as possible.
When it's fairly typical for a JS project to have thousands of dependencies, there isn't really any practical way to both stay up to date and carefully review everything you pull in.
I think the only viable solution for companies taking this issue really seriously is to keep their numbers of dependencies down and avoid having significant deep/indirect dependencies.
Edit: as an example, in my company's Node stack (for 10 services) - there's >900 dependencies. In our React stack (for 2 sites), more than 1600.
Contrary to what you might think, these are actually pretty small, lightweight systems. So really whatever you might have thought was the worst-case scenario on numbers of deps, the reality is more like 10x that in the modern JS ecosystem.
In many ways, the vast number of tiny dependencies are one of the strongest points of the JS ecosystem. But it doesn't come without caveats.
Channels and priorities embedded in the package tools are a better approach, combined with something like Artifactory. Some channels might require packages are signed, and possibly monotonically versioned.
And I know it's not perfect, but in Python if you use Poetry means you get a poetry.lock file with package hashes built in, so that's something.
It seems to me that down through the years ease of deployment trumps security. npm, mongodb, redis, k8s.
Or maybe sysadmin has just become outdated? Maybe front of house still needs a grumpy caretaker rather than your friendly devops with a foot in both camps.
We can now even outsource our security to some impersonal third-party so they can 'not' monitor our logs.
EOG # end of grump
It's a bit of cognitive dissonance having to explain why downloading random shit from the internet during the build is a bad idea, yet here we are.
https://docs.google.com/document/d/1EW6uSZB0_D0qZuDSGuxujuVE...
Quouting from NPMs documentation[0] for npm install
> This command installs a package, and any packages that it depends on. If the package has a package-lock or shrinkwrap file, the installation of dependencies will be driven by that, with an npm-shrinkwrap.json taking precedence if both files exist. See package-lock.json and npm
Consider an example where in package.json you have `"react": "^16.11"` and this has been resolved and fixed in package-lock.json as 16.12 at a previous point in time. Running npm install will not cause NPM to install 16.14 even though it matches the pattern specified in package.json, instead 16.12 will be installed because that's what package-lock.json says.
What npm install does do, is detect if you've made changes in package.json and only then does it re-resolve the dependency. In the above example, if you changed `"react": "^16.11"` to `"react": "^16.14"` in package.json and then ran npm install the package-lock.json would be changed and version 16.14 would be installed.
Bundler and CocoaPods also work this way.
It's not possible for an attacker to publish on that name in the public npm
I'm sure the hooks are needed for things NPM can't do by itself, but they shouldn't run by default. That puts pressure on developers to avoid them, and puts pressure on NPM to add whatever functionality is missing from package.json in a safe way.
(and have npmjs.com search rank packages without scripts above those that do)
This doesn't mean I'm not vulnerable to dependency attacks, but it at least limits the window, because I update these dependencies very, very rarely.
The packages themselves are signed though, so I guess the risk is now on server authenticity as opposed to package integrity.
[1] https://github.com/pan-net-security/artifactory-pypi-scanner
You have always been able to specify the `index-url` when installing packages using pip. This can also be added to `requirements.txt` files as well.
>It just pollutes PyPi and a nuisance to others. I agree, but so are the packages that are no longer maintained. You also reserve pakcage name if you decide to opensource it. Furthermore, by creating package you are leaking metadata about your organization, i.e. some functionality can be inferred from package names.
And sure you can train and try to enforce security awareness, but your people need to be right 100% of the time, while attackers need them to make only one mistake. Similar with namesquatting of the popular packages.
https://pip.pypa.io/en/stable/reference/pip_install/#finding...
A software supply chain attack is characterized by the injection of malicious code into a software package in order to compromise dependent systems further down the chain.
Backstabber’s Knife Collection: A Review of Open Source Software Supply Chain Attacks
https://link.springer.com/chapter/10.1007%2F978-3-030-52683-...
To be clear, just calling this a "supply chain attack" and omitting "software" is going to cause confusion with traditional supply chains.
The analogy is not quite apt: in a software build system you have complete visibility into the dependency tree, so this attack is less useful, whereas with hardware suppliers you are relying on the security of your vendor.
Not necessarily — plenty software still ships with the third party supply chain bits incorporated as binaries, including commercial software. User is relying on security of one or more in a chain of upstream vendors.
See Cyberpunk 2077 DLLs for instance.
https://twitter.com/CDPRED_Support/status/135660404767189811...
Cyberpunk “builds” their game with a software build system, but not all of it is them building it.
For example in the case of Facebook, it used to be that users would accept permissions without considering them, and in-turn, various apps would access their data in bad faith.
Likewise for mobile apps.
Eventually Facebook removed many of the overtly powerful permissions entirely, likewise with the mobile operating systems.
In the case of mobile, the concept of "runtime permissions" was also introduced that required explicit approval to be granted at the time of authorization.
On Android, location access now prompts the user in the notification area informing the user of an app that accessed their location.
Can some of these ideas be borrowed to the package/dependency management world? "The package you are about to install requires access to your hard drive including the following folders: x/y/z/all?
The way Nix handles this is that every external resource is cached and hashed, and every reference to an external resource must have a hash integrity check. If someone swaps out a package on a web server somewhere, rebuilds keep working because they don't need to re-fetch (because the hash wasn't changed by an operator), and fresh builds fail with an error indicating the hash is invalid, which should trigger an investigation (in practice, this is exceedingly rare, and IMO always deserves attention).
I dream for when build reproducibility is considered table stakes like version control.
"Dependency Confusion: RCE via internal package name squatting " https://news.ycombinator.com/item?id=26081149
"Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies, The Story of a Novel Supply Chain Attack, Alex Birsan" https://medium.com/@alex.birsan/dependency-confusion-4a5d60f...
Then, build tools should be configurable such that they only pull in dependencies signed by PGP keys drawn from a whitelist.
Finally, companies need to maintain private repositories of vetted dependencies and avoid pulling from public repositories by default — and this requirement needs to be configurable from the project's build spec and captured in version control.
Adding dependencies on PGP just makes everything worse.
X.509 PKI for code signing is also terrible and very very complicated and error prone.
Also consider the community nature of development. You need to handle all sorts of painful crypto issues now.
And, of course, on production build machines, all packages are local.
This isn't just for "security" -- it's to ensure we can always build the same bits we shipped, and to avoid any surprises when something has a legitimate update that breaks something else.
The pypi maintainer is being ridiculous, it is much better to have this guy poke MSFT than have the Russians do it, he's doing them a favour.
I'll be rethinking using Artifactory in my infrastructure.
To update existing non-maintained public packages, mostly because they were on. Net framework and a lot moved to .net core.
In visual studio you can set the priority of where packages have to be checked. My own package repo has a higher priority.
I never thought about using it as an attack vector though.
So this probably wouldn't show up on the final build distributed and deployed somewhere. But it did manage to run arbitrary code on developers' machines of those companies.
That's what we do. Dependencies are committed to our repository, so changes show up as diffs which are looked through then.
Then again we don't use libraries for padding strings and such...
Diff inspection will catch some obviously bad things, but it will rarely catch anything clever. So it would be down to luck, if you had merged in this patch before it was spotted/announced. Unless you have something to separate the namespaces? Check for conflicts? I guess CI might work, hoping your CI machines are sandboxed.
Here is an explanation of the process under 'The current scheme for package signature checks' at the following url
https://www.debian.org/doc/manuals/securing-debian-manual/de....
This article from Agoric is extremely relevant here, from a previous such incident (re: the event-stream package on npm): https://medium.com/agoric/pola-would-have-prevented-the-even...
Put simply: in many cases, the dependencies you install don't need nearly as much authority as we give them right now. Maybe some of these packages need network access (I see a few named "logger" which might be shipping logs remotely) but do they need unrestricted filesystem access? Probably not! (They don't necessarily even need unrestricted network access either; what they're communicating with is likely pretty well-known.)
The security manager is an additional layer of security that most languages don't have, however Java applets have shown it to be full of holes and generally unsuitable for running untrusted code.
The applet security posture has contributed a great deal towards negative opinion towards the language, probably would have been better off never having existed.
In 1996, Java was being overwhelmed by exploits because the mapping of the language to the VM was not well matched. There was a Java summit with lots of interesting people. This summit was also when Sun got confirmation that MicroSoft had quite a few engineers working on an independently implemented runtime. To Sun's credit, they did get rather more serious about Java security -- but they had already created a rocky foundation.
It is my opinion, that the business model Sun had "in mind" for Java was a free runtime for everyone that they were in control of, but to make money from selling an "official" Java compiler suite.
I do not believe that the Sun Java JVM was created with security in mind.
I could be wrong, but I don't see any mention of permissions on imported code: https://deno.land/manual@v1.7.2/examples/import_export
Simply relying on package signing and the like permits trusted but malicious actors. With Deno packages configured well it can really lock down and limit a ton of attack vectors.
Here is an interesting proposal on how to possibly get there in JS with import maps: https://guybedford.com/secure-modular-runtimes
Deno uses ambient permissions for the entire process and unfortunately missed the opportunity to do it right.
1. https://hacks.mozilla.org/2019/11/announcing-the-bytecode-al... (ignore the title, it's irrelevant to the excellent explanation that constitutes 70% of the post)
The main hope at the moment seems to be JS.
I thought it was because operating systems still use access based instead of capabilities based security?
That's how the web/application server containers worked (probably still do, but I've been disconnected). The server classes have different permissions from the application code classes (loaded from the .war/etc files). If an application code method calls into a system class, the permissions which apply are those or the application since that method is in the calling stack frame.
I wrote this support into several Java web container and J2EE application server products back in the day. AFAIK, all that still works great today in Java.
e.g., If someone gives an app the ability to upload photos, it can silently read all photo metadata, upload all photos to a private server instead of uploading just the single photo that the user picked. This can be solved with OS level standard photo pickers but it hasn't been yet.
Same with package code. Maybe a package needs network access for stuff it genuinely needs to do. However it can (and probably will) at some point go above and beyond in the amount of data it collects. FB Mobile SDK outage is a good example of this. https://www.bugsnag.com/blog/sdks-should-not-crash-apps
Agoric (the company whose blog post I linked to) and the people behind it have done a ton of object capability work over the years.
POLA is good to live by regardless if it can be implemented.
Systemd has some capability to restrict access to system resources. I haven't experimented with the capabilities yet so not sure what's all there.
I've noticed more dev teams succumbing to the temptation of easiness that many modern package managers provide (NPM, Cargo, Ivy, etc.) - especially as someone who has to work with offline systems on a regular basis.
Because of that ease there are fewer tools and tutorials out there to support offline package management. There are more for using caches, though these are often along the lines of either 'the package manager will do this for you and it just works (but in case it doesn't, delete node_modules or cargo clean and re-try)', or stand up a dependency server on your own machine with these proxy settings (which has it's own security issues and is frequently disallowed by IT cybersecurity policies).
As an example, many blog articles I found a while back suggest using yumdownloader from the yum-utils package. This is unfortunately not reliable, as there are some packages that get skipped.
I have found I need to script reading a list of dependencies from a file; then for each dependency: create a directory for it, use repotrack to download its RPM and it's transitive dependency RPMs in the dependency's directory; then the script aggregates all the RPMs into one directory, removes the OS installed RPMs, uses createrepo to turn that directory into a RPM repository, and then makes an USF ISO image out of the directory for transfer onto offline system and installation.
Discovered after seeing a comment on HN about a bill of materials for software, i.e., a list of "approved hashes" to ensure one can audit exactly what software is being installed, which in turn led me to this issue.
Even then, that only gives you a stronger indication that the image hasn't been altered since it was signed by the image author at any point after it being signed. However it is not a guarantee that the source produced the binary content. It's also not a guarantee that the image author knew what they were signing - though this is a different issue.
Debian has a reproducible builds initiative[1] so people can compile packages themselves and them match byte for byte what Debian built. Not sure how far they've got with that.
https://tests.reproducible-builds.org/debian/unstable/amd64/...
[1] - https://docs.docker.com/engine/security/trust/#client-enforc...
I could easily find myself in trouble, because:
- There’s no autocomplete or bookmarks, so typos are easy.
- If “mybank” is a name provided by my company’s name server, I could find myself redirected to the public “mybank” entry because Mr. Not-A-Hacker says his name entry is more up to date (or because I forgot to tell ‘goto’ to check the company name server.)
- There’s no “green padlock” to check while I’m actively using the destination site. (Though at this point it’s too late because a few moments after I hit enter the destination site had the same access to my machine & network that I do from my current terminal.)
- A trusted site may later become malicious, which is bad due to the level of unrestricted and unmonitored access to my PC the site can have.
- Using scripting tricks, regular sandboxed browser websites can manipulate my clipboard so I paste something into ‘goto’ that I didn’t realize would be in my clipboard, making me navigate to some malicious site and giving it full access to my machine (if ‘sudo’ as added to the front).
This is just a few cases off the top of my head. If ‘goto’ was a real thing, we’d laugh it into being replaced by something more trustable.
How have current package managers not had these vulnerabilities fixed yet? I don’t understand.
At Google, we have those resources and go to extraordinary lengths to manage the open source packages we use—including keeping a private repo of all open source packages we use internally
> a unique design flaw of the open-source ecosystems
This is a big generalization.
Inside Amazon, as well as in various Linux distributions, you cannot do network traffic at build time and you can only use dependencies from OS packages.
Each library has its own package and the code and licensing is reviewed. The only open source distribution that I know to have similar strict requirements is Debian.
[I'm referring to the internal build system, not Amazon Linux]
[Disclaimer: things might have changed after I left the company]
I know, because I wrote an as yet unpublished paper on safely pulling packages from private and public repos.
Using terms correctly is especially important in security: someone who read your comment might incorrectly believe that this did not affect them because they are using the correct names for all of their dependencies.
Installing packages only from a trusted (and signed) source protects against typosquatting, misread or confusing package names and many other risks.
When I first used maven, I was appalled by how hard it was to prevent it from accessing maven central. And horrified to see karaf trying to resolve jars from maven central at run time. What a horrible set of defaults. This behaviour should be opt-in, disabled by default, not opt-out through hard to discover and harder to verify configuration settings.
Also, Maven uses pinned versions, normally, and won't just download whatever newer minor version happens to be published when it builds, which again makes this attack quite hard to pull off.
I guess it's a case of the ease of use proving too great, so convenient in fact that we just kind of swept the implications under the rug.
""That said, we consider the root cause of this issue to be a design flaw (rather than a bug) in package managers that can be addressed only through reconfiguration," a Microsoft spokesperson said in the email."
No, npm has scopes for a reason, why would that not fix this issue?
Maybe the bug wasn't explained correctly but if it prefers public over private that seems like a bug.
OTOH, it certainly is an issue that if you forget and happen to test some code without being configured to have the private package server as your default then you'd get public repos.
Maybe instead of named packages companies should be using private URLs for packages. That way you always get what you ask for?
Artifactory apparently didn't, and served up whichever was the highest version of public vs. private. Which is stupid.
But the bottom line is that when using npm, the exact package selection policy is determined by whatever registry implementation you're talking to, and so it's the registry implementation which should prioritize private packages by default.
But that's just NPM, it's an issue in all of the mentioned package managers.
That is, given a Gemfile.lock like, e.g.
GIT
remote: https://github.com/thoughtbot/appraisal
revision: 5675d17a95cfe904cc4b19dfd3f1f4c6d54d3502
specs:
appraisal (2.1.0)
bundler
rake
thor (>= 0.14.0)
How would Bundler ever try and download the `appraisal` gem from RubyGems?The Gemfile section is more explicable. While newer Gemfiles look like this:
source "http://our.own.gem.repo.com/the/path/to/it" do
gem 'gemfromourrepo'
end
# or
gem 'gemfromourrepo', source: "http://our.own.gem.repo.com/the/path/to/it"
Older Gemfiles apparently looked like the following: source 'https://rubygems.org'
source 'http://our.own.gem.repo.com/the/path/to/it'
gem 'gemfromrubygems1'
gem 'gemfromrubygems2'
gem 'gemfromourrepo'
Which seems obviously vulnerable to the dependency confusion issue mentioned.So is the understanding that Shopify's CI systems were running `bundle upgrade` or another non-lockfile operation? (possibly as a greenkeeper-like cron job?) Or is `--pure-lockfile` itself more subtly vulerable?
Nix makes it possible to query the entire build time and runtime dependency graph of a package, and because network access during build time is disabled, such a substitution attack would be harder to pull off.
The declarations for how the source is downloaded is specified declaratively and can be pinned to a specific commit of a specific Git repository, for instance.
Fixed versions for as many things as you can (including OS images, apt packages, Docker images, etc) lead to changes in your CI under your control.
Sure, you have to upgrade manually or by a script. But isn't plain build stability worth it? Not even talking about security.
So you wouldn't get a random version even considered.
Version shadowing and overriding is a totally different concern of course.
Pre and post install scripts in NPM packages are such a terrible idea. Even when it’s not malware, it usually just a nagging donation request with a deliberate “sleep 5” to slow down your build and keep the text displayed.
At least some distribution kernels do new feature backports - mostly to support new hardware on LTS versions (like e.g. Ubuntu did for the Raspberry Pi, see https://github.com/raspberrypi/linux/issues/3464).
Sure you could just "finish" Linux at 5.0 and then introduce e.g. io_uring via Linux-with-io_uring 1.0 instead of adding it to Linux 5.1. Same goes for all the libraries that add support for io_uring.
Yes, you could "finish" some software on the feature level, but you would still need to maintain it if you want to add support for new platforms, etc., or it will become obsolete sooner or later. In the case of still maintaining libraries, this would solve nothing in the context of this attack vector.
https://naildrivin5.com/blog/2019/07/10/the-frightening-stat...
https://techbeacon.com/security/check-your-dependencies-gith...
https://thenewstack.io/npm-password-resets-show-developers-n...
Not that I didn't expect someone to immediately take the opportunity to complain about npm, of course, despite it having nothing to do with the problem at hand... as has become tradition in tech circles.
Your code is what it depends on.
Using DNS for exfiltration is a nice trick but does not change much in the general approach.
sigh... am I the only one that likes environments where you can run simple commands to install stuff and you can generally trust your package managers? All the security folks love to act dumbfounded when people trust things, but post-trust environments have terrible UX in my experience. I hate 2FA, for example, because now I have to tote my phone around at all times in order to be able to access any of my accounts. If I lose my phone or my phone is stolen while travelling, I'm hosed until I can figure out how to get back in.
> So can this blind trust be exploited by malicious actors?
Yes, it can. Trust can always be exploited by malicious actors, and no amount of software can change that. And it creates a world that sucks over time. Show me a post-trust, highly secure environment that isn't a major PITA to use. And not just for computers. I'm sure you could use social engineering to abuse trust of customer service reps (or just people in general) and do bad things, and the end result will be a world where people are afraid do any favors for other people because of the risk of getting burned by a "malicious actor".
The fact that pip/npm/gem etc. look for packages in a fallback location if not found in the private repository is a terrible design flaw. One which not all package managers have.
For example, when you add a cargo dependency from a private registry, you have to specify the registry that the dependency comes from, so cargo will never go looking in some other place for that crate. I'm sure many other package managers also have designs that are not vulnerable in this way.
Similarly, many package managers do not support pinning of transitive dependencies (with hashes), or pinning does not happen by default, so that many people are still using floating dependencies.
Proof: https://www.theregister.com/2016/03/23/npm_left_pad_chaos/
Sudden unplanned loss of availability is a catastrophic security problem, the A in the security CIA[1]. Worse is that the dependency that caused that problem was something that should never have been a dependency in the first place.
Proper dependency management requires a degree of trust and integrity validation which are completely counter to automation. Most developers are eager to accept any resulting consequences because they don't own the consequences and because they are fearful of writing original code.
[1] https://en.wikipedia.org/wiki/Information_security#Key_conce...
Look at what happened when the "left-pad" function disappeared from npm a few years ago. IIRC, it broke react. The downside of package managers like this is that many people have no idea what they are using.
Anyone who uses this must have already understood and just overlooked this vulnerability when they realize their private package must have a unique name that doesn't match a public package
In which case, would you not get the same issue, if you do the same attack, but with a transitive dependency which you haven't specified?
Right, that's why we see this kind of attack all the time on Maven Central, but never on npm... oh, wait?! NO! The kind of simple attacks you see routinely on npm (typo squatting, ownership transfers to malicious authors, now this) just doesn't happen on Maven Central at all.
You have a username/password to Maven Central and you also have a private key to it.
But in order to be granted a groupID (think of it as an account), you need to prove at the time of account creation that you own the domain that matches the groupID (think account name).
So if you try to register com.foo on Maven Central, at that time you need to own foo.com, otherwise you'll be rejected.
If you do own it at that time, well your account is approved and now you have a username/password to it and a private key you need to use to sign artifacts when you publish them.
If your domain expires and is later bought by someone else, that doesn't make them the new owner of your Maven Central groupID.
I find it hard to believe any high profile organization would allow their domains to expire, or else they would also lose e-mail and websites, right?
How long did it take npm to have scoped packages. Sure, let me create a "paypal" project, they only need one js project no?
If Java suffers from excessive bureaucracy, the newer package developers/repos suffer from too much eagerness to ship something without thinking
Not to mention dependency and version craziness. If you want your software to be repeatable you need to be specific with the versions and code you're taking.
The very existence of package-lock grinds my gears and that's before it starts flip flopping because someone mistook URLs for URIs. Of course that only exists because ranged dependencies are a terrible idea, and that's before anybody even mentions things like namespaces or classifiers.
No maven wasn't perfection, and it could be (and has been) improved on - but npm doesn't even get into spitting distance.
What I want as a developer is to establish my trust relationship to developers of libraries I depend on.
`npm install <somepackage>` should first check a record of signing keys in my source code repo, then check a user-level record of signing keys I've trusted before, and then - and only then - add a tentative trust relationship if this is brand new.
`npm release` or whatever (npm is just an example - every system could benefit from this) - would then actually give me the list of new trust relationships needed, so I can go and do some validation that these are the packages I think they are.
The key thing with Go is that all dependencies have a checksum (go.sum file) and that should be committed to the repo.
So even if the domain gets hijacked and a malicious package is served up, then the checksum will fail and it will refuse to build.
People should be using internal module proxies anyway for Go. You can just store the module files in a directory, a git repo or a web service and serve up an internal cache.
Packages are typically considered immutable once published. If I have a particular package e.g. "FooLib.Bar v 1.2.3" then this zip file should _always_ contain the same bits. If I need to change those bits, e.g. to fix a bug then I need to ship e.g. "FooLib.Bar v 1.2.4"
Also packages aren't always small. So it makes sense to cache a copy locally. On dev machine "package cache" and in an org's "internal feed" and only check upstream if it's not there.
So I shouldn't need to go to the source url to get it. Ideally, I just ask "who has "FooLib.Bar v 1.2.3" for me?"
It also means that tampering can be detected with a hash.
But the "check upstream" model is now vulnerable to fake new versions.
Free software / open source propels engineering as you can share and leverage the results of collective efforts. However, at no point did the concept come with inherent guarantees about concerns such as security.
esr defined 19 points for "good" open source software development in his seminal essay "The Cathedral and the Bazaar". I feel some of those are sometimes easily thrown out of the window for the sake of "efficiency" or "cost-effectiveness".
This issue resonates with bullet point 17 in particular:
> A security system is only as secure as its secret. Beware of pseudo-secrets.
I think this issue has less to do with package managers, and a lot with companies rushing into the convenience of public code platforms such as Github without properly vetting whether or not they might be inadvertently leaking internal information through packaging manifests.
That is not true at all, the industry both Development and even more so in Operations has been outsourcing responsibility for a long time, they is why we have support contracts, SLA's and other very expensive services we pay many many times more than the cost of hardware for...
To outsource responsibility... Network down -- Call Cisco... Storage Down Call EMC or Nimble... etc
It seems dumb that they don’t have per repo tokens. I think the issue is with their licensing as if they made proper tokens users could abuse it by giving tokens to their friends. But this should be detectable in a friendly (please don’t do that) way.
I want to be able to give read-only access to private repos.
I want to be able to give fine grained function level and repo level access.
If I’m an admin on multiple repos, I want to be able to issue a token for just a single repo so I can give that to a CI job without worrying if every single repo I admin is at risk.
They allow ssh keys with some similar functionality, but ssh keys can’t be used as much as tokens.
I’ve been waiting for a story about how some third party app granted access to my whole org gets taken over and wreaks havoc. Eventually this will probably be the attack that alters real packages instead of these name overloading packages.
https://docs.github.com/en/developers/apps/scopes-for-oauth-...
https://docs.github.com/en/rest/reference/permissions-requir...
Anyway. I’m sticking with Pop / Linux. But it does make me nervous!
I'd guess distros are generally better off in that respect, but kernel space & user space aren't that different nowadays, when caring about your own security
I'm very happy to finally have a real world example to motivate all the folks that eye-rolled me every time I've raised it in the past. It just resonates better, especially with less technical leadership folks.
what could possibly go wrong?
Random libraries, possibly pulled in by a dependency of a dependency of a dependency... not so much.
curl -sSL https://dot.net/v1/dotnet-install.sh | bash /dev/stdin <additional install-script args>
See https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-in...
There are other examples I've seen from time to time.
If I'm not mistaken insider knowledge wasn't necessary.
Remember how much was temporarily broken in the leftPad event? Imagine if all that had been silently back-doored instead?
I wonder how they built this culture and if it is even realistic for smaller companies to aim for it.
I think it typically comes down to a few key leaders having the political capital/will to enforce policies like this. Google's `third_party` policies[0] were created relatively early on and were, as far as I understand, supported by high level technical leaders.
The ROI of policies like these is not always immediately evident, so you need the faith of key leaders in order to make room for them. Those leaders don't necessarily need to be high in the org chart — they just need to be respected by folks high in the org chart.
As a counterfactual, establishing Google's strong testing culture seems to have been a mostly bottoms-up affair. Good article on the history of that at Mike Bland's blog[1].
0. https://opensource.google/docs/thirdparty/ 1. https://mike-bland.com/2011/09/27/testing-grouplet.html
Fortunately there was a hard legal requirement to vet every dependency license, otherwise I am not sure I would have been able to keep this workflow. As other posts say you do need a very strong commitment at the management level for this to work, besides security (where it feels that often it matters only until it costs money or until it’s even slightly inconvenient) it might be helpful to make a legal case (what if we ship something with a nested dependency on AGPL ) to get some help to establish these procedures.
I have been writing and architecting security related software for pretty much all my career and I find it quite scary how these days so much software delegates so much control to unvetted external dependencies.
We would pay to access their ”distribution”, a limited set of packages vetted by them. Distribution vendor would screen changes from upstream and incorporate into their versions.
Of course this is more limited world. It’s like using a paid Linux distribution with certain amount of software covered by the vendors support policies.
It's also useful for your organization to rebuild all of the source code from scratch (for reproducible packages anyway) and compare the new ones to the old ones, looking for things like compiler or hardware injection attacks. Secure build systems are definitely non-trivial.
https://cloud.google.com/security/binary-authorization-for-b...
I can't. It's incredibly wasteful time and resource-wise, and ties your development process to third-party providers (and your ISP), which fall over often enough in practice.
It's a good practice to have a local cache of all the third-party dependencies you use, available to both developers and CI infrastructure.
We have that, it's called Java and .NET, but apparently solved problems aren't interesting anymore.
For a distributed company with developers from all over the globe the "local" here doesn't really make much sense. But from my experience with NPM, you download packages on your developer machine once you set up a project, and then only when something really messes up node_modules, which happens once in three months, on average.
You do re-download packages for every build in CI pipeline as you build a docker image from scratch though, and that's when NPM mirror is usually set up.
Pulling packages from the internet is fine and that's how all Linux distros work but the more important thing is signature verification, imo
At my last gig (Java), developers reviewed all third-party libraries + dependencies and manually uploaded them to a private Ivy server. I don't think that could work in the Node ecosystem, where every module seems to have 100+ dependencies.
EDIT:
There's a real security vs accessibility trade-off here. You can't be a productive web developer, according to modern standards, and review every single transitive dependency that gets pulled into your application. And it's very inefficient to have individual developers at different orgs separately reviewing the same libraries over and over again.
One would naturally turn to repository administrators to enforce stricter security standards. Maybe RubyGems could review all source code for every new version of a package and build it themselves instead of accepting uploads of pre-built artifacts. But these repositories are run by smallish groups of volunteers, and they don't have the resources to conduct those kinds of reviews. And no open-source developer wants to have to go through an App Store-like review process to upload their silly McWidget library.
I try not to think about it too much and have faith in the powers that be
The web really is held together by duct tape and bubble gum.
(I also don't think it's true that the attacker has a "small window of time"—as soon as they get a single RCE, it's over, if they're running on a normal dev machine then they can daemonize into the background, add persistence, and snoop events over time. CI systems are obviously less vulnerable to this by nature.)
However, this section is concerning:
> The presence of a source block in a Gemfile also makes that source available as a possible global source for any other gems which do not specify explicit sources. Thus, when defining source blocks, it is recommended that you also ensure all other gems in the Gemfile are using explicit sources, either via source blocks or :source directives on individual gems.
Yikes! This is yet another easy footgun for people to reintroduce this issue
pkg managers that do have that: cargo (build.rs), pip (setup.py), npm (install scripts), apt/rpm/pacman (postinstall hooks)
Maybe the only exceptions are Go and Java package managers?
The reason is simple because without it you can't properly bind to system libraries.
And even without, the supply chain attack still works against at least developers as packages are not just build but also run, often without any additional sandbox. (E.g. you run tests in the library you build which pulled in a corrupted package).
The main problem here are not build scripts (they still are a problem, just not the main) but that some of the build tools like npm haven't been build with security but convenience as priority and security was just an afterthought. For example npm did (still does?, idk) not validate if the packag freezing file and the project dependencies match so you could try to sneak in bad dependency sources.
Also for things which are classical system package managers (i.e. not build tools) like apt/rpm/pacman it build scripts really does not matter at all. The reason is that what you produce will be placed and run in your system without sand-boxing anyway, so it's a bit different then a build tool which is often used to build binaries (installers, etc.) at one place and then distribute them to many other places.
Edit: Another attack vector is to bring in a corrupted package which then "accesses" the code and data of another package, this could use speculative pointer accesses or similar but in languages like Java,Python, JavaScript you often can use reflections or overriding standard functions to archive this much more reliable.
Such 'nagging donation requests' were banned by npm pretty much days after they first appeared, IIRC, and npm itself is literally a tool for installing code to execute later, so there's no security issue here. If someone wanted to embed malware into a package, they wouldn't need postinstall scripts for it.
This is really a complete nothingburger.
What does "banned by npm" mean? Here's an example from the source of the latest version of nodemailer (with 1.4M weekly downloads) sleeping for 4,100 ms on every install so that it can show a "Sponsor us to remove this lag" message: https://github.com/nodemailer/nodemailer/blob/a455716a22d22f...
> and npm itself is literally a tool for installing code to execute later, so there's no security issue here. If someone wanted to embed malware into a package, they wouldn't need postinstall scripts for it.
It's fine to have a standard mechanism for postinstall steps. It should be opt-in by the end user rather than opt-out. That way people know that they're running additional code and ideally selectively pick which packages are allowed to do so. The vast majority of packages do not need it anyway as they do not have C++ bindings or need to generate data.
The defaults for NPM are such that you have to know quite a bit of how NPM works to download a package and inspect the contents without executing random code.
> This is really a complete nothingburger.
It's defensive in depth. With the default being to execute remote code, a single typo could be installing a package that immediately runs malware.
And yes, you want to sandbox the install too anyway, but it at least needs permissions enough to do its job, i.e. interact with the network somehow. (Although I’m working on a tool to make that fully deterministic so it can never exfiltrate anything.)
There’s also the possibility that there’s no “execute” step at all, like installing a dependency tree just to inspect source, or in theory being able to skip auditing unused code paths.
Sandboxing code is hard, regardless of what language, runtime or operating system approach you use.
Very useful for your own images that you publish in your registry though.
I have scoped, private packages in `@myscope`. I set up my `npmrc` with `@myscope:registry=url/to/my/private/repo`. I just checked, and if I try to install `@myscope/commmon-library`, when it's not found on our repo, it will fail, because `npm` has associated `@myscope` with one and only one (private) registry.
The only hiccup is that if I'm a new developer, and I haven't made this entry in my user-level `npmrc`, and I'm not using an existing project with a `.npmrc` in the project root, then it will try to hit `@myscope` on the public repo. But if I don't have the registry for `@myscope` configured at all, my actual dependencies won't work, so I should notice that right away.
If nothing else, I suppose the takeaway is to grab the group scope for whatever private scope you choose on the public repo before somebody else squats on it. Still, at least for NPM, this seems like a solved problem, you just have to implement the existing solution correctly.
[0]: https://lists.nongnu.org/archive/html/gpsd-dev/2021-01/msg00...
Scroll down to contribution activity. Forked a repo, created a pull request, etc, on Feb 1st 2021.
They offer PGP signatures too!
If you know your computer was compromised. Shut down and reinstall from a backup, you don’t try and clean it.
Edit: I’m assuming the attacker would be replacing a dependency with an empty repo since they don’t know the actual source code. If they know the interface the dependency is supposed to provide, it could spread across the entire organization before anyone noticed.
> I would hope most SSH keys are password-encrypted
TBH, these are probably weak passwords for convenience.
Applications I encounter use a ridiculous amount of outside tooling to do relatively simple work, and that's where I see dependencies explode most often.
When I work on package managed, env managed projects, every other day is a new environment issue, configuration issue or version mismatch. It's all a colossal waste of time. Had the project just chosen a handful of comprehensive tools and committed them to the repo, there would be no dark rituals of configuration and package management to perform. The code is in the repo, the code works with itself, life is good.
The gist is:
Rule #1: All instance variables must be private
Rule #2: No mutable static state or statically accessible authority
Rule #3: No mutable state accessible across thread boundaries
Off the top of my head I remember both Microsoft (hotmail.co.uk) and Foursquare forgetting to renew their domains.
URL references also contain the version number, typically an immutable Git tag reference. They also benefit from just needing to download the source code that's referenced and not the entire package. With Deno you can also optionally host versioned 3rd Party packages on their deno.land/x CDN.
URL references are also cached locally on first use and can be used offline thereafter.
They do this to address the shortfalls of modern conventions like small packages, continuous release cycles and dependencies nested several layers deep.
So if you were using the internal package FooLib.Bar v 1.2.3 and an attacker posts FooLib.Bar v 1.2.4 to a global repository, anyone using auto-updating will update to it.
Of course, mitigation is needed. Supply chain attacks are a hot topic after SolarWinds.
But identifying a package version solely by a url doesn't seem like the right abstraction to me. IMHO, the metadata is more structured: Name (text), version (SemVer) and also maybe now fields to verify and mitigate these attacks: content hash, source feed, etc.
Even if I run an internal feed that transparently proxies and caches the public one, as well as hosting my company's internal artefacts, the rules now might need to be different between packages?
for e.g. between Newtonsoft.Json (new versions always originate on the public feed, never locally) and "SDCo.GeneralUtils" (new versions always originate on the local feed, never upstream)
The only one I can remember was against Microsoft for forcing an upgrade to Windows 10, which wasn't (IIRC) a warranty claim but a bait-and-switch issue.
For distribution, js and ruby use rubygems and npm to host packages. If a developer wants to verify that the package hosted on npm is the same code being displayed and worked on by contributors on github, they need to pull down both sets of code and then either run a checksum or compare line by line to verify the code matches up. Malware or a nefarious package owner could slip in unexpected code into the package before shipping it to the package host, leaving the github version without the changes. No typo-squatting needed.
Just because some form of the source code is published to Github, doesn't mean its the same code that is hosted on npm or ruby gems.
Yet, reviewing hundreds of thousands SLOCs (across different languages) and also checking legal compliance requires significant skills, time and efforts.
As an individual, you cannot justify reviewing the entire dependency tree across all your projects.
Thankfully you can rely on the packages reviewed and built internally by your colleagues - or use a Linux distribution that does thorough vetting.
I think there are supply chain attack vectors in those resources
Maybe I'm a dinosaur, but I was taught a long time ago to use a dependency only if there was no other feasible alternative, and to thoroughly know every dependency you bring in as if it were your own code, and treat it as such. Because at the end of the day, you're shipping that code to the customer and stamping your name or your company's name on it. It's your reputation on the line, not the person behind the StudMaster69 GitHub account.
Even if it is a small library that has only a single maintainer the chances of you replicating it in a day seem slim to me unless it is truly trivial, or the library was also written in a day.
More likely you get a day in and realize that the problem has a whole bunch of gotchas that you didn't anticipate and the library maintainer already found and dealt with.
Again, this is only if the problem isn't truly trivial
Some dependencies are too large to rewrite yourself - most statistical suites would fall under this definition - and while accepting their direct code might be acceptable, it's not usually feasible to fork their code and rewrite the parts that aren't. Lots of smaller parts you could write yourself quickly add up.
What looks elegant as a concept "we just have a graph of dependencies and automatically pull that in" quickly becomes an unmaintainable nightmare and consequently into a huge attack vector.
I...think you might have bigger problems going on there. You're tryingto throw a tech solution at a problem that is fundamentally human in nature.
That tends to leave nobody satisfied.
Unfortunately, package signing does nothing to protect against the threat vector presented here. The authentication system in npm is working fine. The problem is we put too much trust in software from the internet.
Alice has a thing. Bob had a thing that Alice figured would make her life easier so integrates it without looking too hard at it. Alice didn't reallize that by adding Bob's thing, something Alice wanted private was no longer the case even if her primary use case was solved.
The technical solution is making Alice's thing include a really onerous to configure permissions framework that takes the work of getting a thing set up and increases the task list from program thing to program and configure permissions for thing.
The human solution is to realize you don't know Bob from Adam, or his motivations, and to observe what Bob's thing actually does. Then depending on criticality, remake something similar, or actually take the time to get to know Bob and see if he can make what you want for you under some sort of agreement that facilitates good business and trust all around. You can't be sampling for malicious changes in real-time, so it's all about risk management. The issue in our case, is a lot of these projects are essentially gifts with no active attention paid to them after a certain point. It's a variant of cargo cults. You want this thing? Go here, get that, presto. Businesses, developers, (and their exploiters) like that. The price though is that once a project is abandoned, and the rights transferred to someone you don't know, you have to rerun your risk management calculation again.
The thing people should be worried about is all the PHB's (pointy-haired bosses) who just got ammo for their NMIH (Not-Made-In-House) cannons now that supply chain attacks are becoming increasingly visible vectors for attack.
Just use name spaces. foo.com/yaml instead of yaml. NPM way of doing things is/was just insane, with no regard for trust or security. No wonder NPM corp then went into the business of selling namespaces, AKA selling trust...
https://central.sonatype.org/pages/requirements.html#sign-fi...
If only people creating new package managers would bother to spend an hour or two learning prior art.
With npm, you can only add "trust" to npm itself LOL: https://docs.npmjs.com/about-pgp-signatures-for-packages-in-...
What a joke.
No, no... We should move fast and break things. We can implement this in a week because the old dinosaurs are too close minded to implement these things.
Today, the hardware is cheap and network is reliable. No need for any safeguards or security features.
/s
That said, it's interesting to me that several people are trying to get the project to drop the Web of Trust, and focus on code reviews. I'm the exact opposite - the code reviews are an interesting, experimental approach, but I'm interested in the project because of the cryptographic Web of Trust. Any use of dev-originated code signing in a package ecosystem is great. For this reason, I'd love for this to get major pickup from Rust, and beyond.
Finally, I am a bit wary because the project is starting to look moribund. It's important for projects like these to know that the maintainer is in it for the long haul, even if there's initially very little adoption. When the project founder writes that they're in a "fight for survival", it makes me think they may abandon the project if it doesn't get significant adoption.
Technically you can create one new GitHub account per repo and generate a token for that... But that is highly annoying :)
They need to support IAM / RBAC style policies and tie every authn+z method to those policies, but my guess is they have different auth methods strung all throughout their codebase so implementing it will take a few years. Then of course they have to make it "user friendly" as we all know how painful IAM can be...
GitHub is terrible.
A support contract allows you to hold a sub-contractor accountable. But that's the extent of what an SLA does. What it doesn't do is diminish your responsibility towards anyone who relies on the services you provide yourself. These are distinct things.
Put more succinctly, if the network's down: that's still very much your problem. Especially if you over-promised 100% availability to the end users of your services. Your end users do not care about Cisco, EMC or Nimble. They don't have contracts with any of those. They have a contract with you and they can and will hold you accountable if you don't deliver on what you sold them.
for a sysadmin the customer is "the employer" and they do not really have a contract with the sysadmin, rather the employer has contract with Cisco, or Nimble, etc. the sysadmin has "outsourced" his/her responsibility in that context.
For example instead of rolling your own storage device using linux, or freenas or something else, you buy an expensive 3rd party solution with expensive support contracts to outsource their responsibility. If it goes down "I have a support ticket open with vendor" instead of "I am attempting to merge the lastest kernel patch I have downloaded from github"
That is the source of the phrase "No one ever got fired for buying Cisco" or insert name of other large vendor. They do not get fired for it because they have outsourced their responsibility
An employee/employer relationship is defined by a few key properties. As an employee, you sell your time and your expertise to your employer, and you agree to submit to the authority of your employer in exchange for a salary. The extent of your responsibility - and this is absolutely key - is captured in your contract.
It also means that many things simply aren't your responsibility to begin with, even though you deal with them on a day-to-day basis.
As a systems administrator you, quite likely, won't get fired for failing Cisco gear or services because you're not the one who ultimately signs off on the contract with Cisco on behalf of your employer. Responsibility has always resided with the executives who cover and sanction your actions and decisions.
An executive, though, usually won't get fired over failing Cisco gear/services itself, but they will get fired over any ensuing loss of revenue, damage to brand/image, litigation over exposed liabilities,...
A great example of this is President Harry S. Truman who famously had a sign on his desk stating "The buck stops here".
https://en.wikipedia.org/wiki/Buck_passing
As for the systems administrator, your role is to actively engage in day-to-day operations. You're basically hired "to keep the lights on". Whether the proverbial "light" was procured from Cisco or handcrafted in-house is inconsequential to your employer as far as your individual role as an employee is concerned.
Of course JS isn't a great language for this. A malicious program could spider the object graph looking for something valuable. You would have to be very careful to keep these objects hidden. And a container library would have huge amounts of access with it needs none. (For example if you want to store a hashmap of open sockets)
A stronger typed language like Rust or Haskell could do better, as you container library can be prevented from casting T to File. However even that is not enough as you can just manually cast a pointer if you somehow know what type it is. (And there is a small amount of reflection that can do this even in safe code).
Deno can provide extra syntax or annotations for imports to allow the dev to explicitly allow permission per-import. These can be in the source code, or in a config file.
Let's say I'm using a `left-pad` function that someone else wrote, and I'm using a system in which modules aren't granted any authority except what you give them. If I then call
left-pad('foo', 5')
...I don't really have to worry that it'll go rummaging around my filesystem for my private keys and exfiltrate them somewhere. After all, I didn't give it access to my filesystem or the network! (Side-channel attacks notwithstanding, things get thorny real quick at that point.)Now, you still have to worry about the function not doing what being correct - it might return an empty string, it might go into an infinite loop, etc - but you've tremendously reduced the scope of what this module could do if the developer were malicious.
the only good differentiator right now that could definitely be implemented in nodejs directly is the the flag `--allow-net=<domain>`. it could prevent data exfiltration but requires the whole stack to require this flag.
> FUNDAMENTALS-5: Minimise the number of permission checks Java is primarily an object-capability language. SecurityManager checks should be considered a last resort.
(Note: quite a lot of Java’s standard library is not designed along object-capability lines so you should take this advice with a pinch of salt).
[1]: https://www.oracle.com/java/technologies/javase/seccodeguide...
It's probably worth posting a quick refresher. The system is old but people don't use it much these days, and the documentation isn't that good. At one point I wrote a small JavaFX PDF viewer that sandboxed the PDF rendering code, to learn the system. I lost the source code apparently, but the hard part wasn't coding it (only a small bit of code was required), it was learning how to configure and use it. I tested the sandbox by opening a PDF that contained an exploit for an old, patched security bug and by using an old, vulnerable version of the PDFbox library. The sandbox successfully stopped the exploit.
Fortunately the Java team still maintain the sandbox and via new technology like the module system and GraalVM, are reinforcing it. In fact, GraalVM introduces a new sandboxing technology as well that's simpler to use than the SecurityManager, however, it's also probably less appropriate for the case of blocking supply chain attacks.
Java's internal security is based on two key ideas:
1. Code that can protect its private state. When the SecurityManager is enabled and a module is sandboxed, it isn't allowed to use reflection to override field or method visibility.
2. Stack walks.
Let's tackle these backwards. Imagine it's time to do something privileged, like open a file. The module containing the file API will be highly privileged as it must be able to access native code. It will have a method called read() or something like that. Inside that method the code will create a new permission object that represents the permission to open files under a certain path. Then it will use AccessController, like this:
FilePermission perm = new FilePermission("/temp/testFile", "read");
AccessController.checkPermission(perm);
The checkPermission call will then do a stack walk to identify the defining module of every method on the stack. Each module has its own set of granted permissions, the access controller will intersect them to determine what permissions the calling code should have. Note: intersection. That means if any unprivileged code is on the stack at all the access check fails and checkPermission will throw an exception. For example, if an unprivileged module registers a callback from a highly privileged module, that doesn't work: the low privileged module will be on the stack and so the privilege is dropped.Access control contexts are themselves reified as objects, so instead of doing a permission check immediately you can 'snapshot' the permissions available at a certain point and use it later from somewhere else. And, starting a thread copies the permissions available at that point into the new thread context. So you cannot, in the simple case, elevate privilege.
Stack walking and permission intersection is slow. It was optimised a lot in Java 9 and 10 so the performance impact of enabling sandboxing is much less than it once was, but it's clearly not zero overhead. Therefore the JVM provides other techniques. One is the notion of a capability, known from many other systems. Instead of doing a permission check on every single file read (slow), do it once and then create a File object. The File object allows reading of the underlying native file via its private fields. Whoever has a pointer to the File object can thus read from it. Because pointers cannot be forged in Java, this is secure as long as you don't accidentally lose your pointer or pass it to code that shouldn't have it.
Sometimes you need to wrap a privileged operation to "dilute" it somehow. For example, imagine you have a module that allows arbitrary socket access. You also have an HTTP client. You would like the HTTP client to have network access, but for it to be usable by other modules that should only be able to contact specific hosts. Given what I've described so far that wouldn't work: the highly privileged code that can do native calls would do a stack walk, discover the unprivileged module on the stack and throw an exception. But there's a fix: AccessController.doPrivileged. This is kind of like sudo. It takes a lambda and truncates the stack that's examined for access checks at the point of use. Therefore it allows a module to use its own assigned permissions regardless of who is calling it. Of course, that is powerful and must be used carefully. In this case the HTTP client would itself check a different, HTTP specific permission. If that permission passed, then it would assert its own power to make arbitrary network connections and go ahead and use the lower level API.
There are a few more pieces but they aren't core. One is the class called SecurityManager. This is the most famous part of the API but in fact, it's no longer really needed. SecurityManager simply delegates to AccessController now. Its API is slightly more convenient for the set of built in permissions. For the purposes of understanding the design you can effectively ignore it. The SecurityManager needs to be activated using a system property as otherwise, for performance reasons, permission checks are skipped entirely at the check sites. Beyond that it can be left alone, or alternatively, customised to implement some unusual security policy. Another piece is the policy language. Permissions are not intrinsic properties of a module in the JVM but rather assigned via an external file. The final piece is the module system. This isn't relevant to the sandbox directly, but it makes it easier to write secure code by adding another layer of protection around code to stop it being accessed by stuff that shouldn't have access to it. After a careful review of the old JVM sandbox escapes from the applet days, the Java team concluded that the module system would have blocked around half of them.
So as you can see the design is very flexible. There's really nothing else like it out there, except maybe .NET CAS but I believe they got rid of that.
Unfortunately there are some pieces missing, if we want to re-awaken this kraken.
The first is that modules have no way to advertise what permissions they need to operate. That has to be specified in an external, per-JVM file, and there are no conventions for exposing this, therefore build tools can't show you permissions or integrate the granting of them.
The second is that some code isn't sandbox compatible. The most common reason for this is that it wants to reflectively break into JVM internals, for example to get better performance. Of course that's not allowed inside a sandbox.
A third is that some code isn't secure when sandboxed because it will, for example, create a File object for your entire home directory and then put it into a global public static field i.e. it doesn't treat its capabilities with care. The module system can help with this because it can make global variables less global, but it's still not ideal.
The final piece is some sort of community consensus that sandboxing matters. Bug reports about sandboxing will today mostly be ignored or closed, because developers don't understand how to use it and don't see the benefit. It's fixable with some better tutorials, better APIs, better tooling etc. But first people have to decide that supply chain attacks are a new thing that matters and can't be ignored any longer.
Not sure if this is something Java enables, but in principle you could do this in a capability-style way as well. Let's say you have an HTTP client module that you want to allow another module to use, but only to make requests to a specific host. You could write a wrapper with a subset of the HTTP client's functionality, only including (for example) a send() method that would send an HTTP request to the specified host. You'd then pass that to the module that you want to be able to make HTTP connections (rather than the raw HTTP client), and provided your wrapper object doesn't expose functionality from the underlying module that would let a client specify arbitrary hosts, you're in a pretty good spot.
The problem with Java’s stack walking is that it is too complex and too easy to find privileged code that can be coaxed into performing unintended operations. There are plenty of old write ups of Java sandbox bypass bugs due to this, eg http://benmmurphy.github.io/blog/2015/10/21/zdi-13-075-2013-...
No one gets fired for using npm, you might get fired for insisting you build your own dependency management system because npm is insecure rather than working on your team's domain problem.
Most developers are eager to accept any resulting consequences because they don't own the consequences and because they are fearful of writing original code.
This can't change bottom up. Even if you went the professional licensing route you'd need top-down regulation to force companies to only higher vetted and licensed professionals, and to actually do verification of projects to make sure all your best practices were being followed and following up on penalizing developers who weren't.
If you yank a package with cargo, it doesn't break people's builds who already depend on that package, it just makes it harder to add new dependencies on the package.
Again, developers don’t care because they don’t own the consequences of such monumental failures, which is why they will happily and frequently repeat this deliberate mistake until they are terminated.
So yeah, there might be a "trusted security reviews with payments" shaped technical solution. I'd love to see someone flesh that out - that sounds like a potential solution to this problem (unlike developer-signed packages).
By bringing licenses into it, you push for a business relationship first, but discourage further toolmaking. Programs are math. Rederivation and application should really be the norm, but can't be if we're drawing boxes around arrangements of symbols and saying "Do not cross."
It's the weird contradiction at the core of what we do as software people that still keeps me scratching my head. We all run to make a hydrant to mark, then try to make rent extracting business around it instead of maximizing the number of variants of hopefully practical and efficient ways to allow everyone else to solve their own problems.
I'm not against people being able to make a living doing what they love, but the incentive structure seems all out of jibe with what I understood to be the overall goal.
Or something. Still wrapping my head around it I guess.
For example what if you have a callback library that calls a function that does IO? What if you pass an IO function directly as a callback? (For example File.close) If it is the file where the call is textually written how do you handle dynamic calls? (or are they forbidden).
I think the capability model is probably the right one here.
I don’t know how many GitHub orgs the Linux Foundation has, but... hundreds? Having one bit account with wide permissions is a non starter
Also, having the domain doesn't make it available on Maven Central. You need to apply to have your domain become a registered groupID on it. This is a manual review process. They validate your domain through TXT verification to make sure the requester to create the group is the domain owner. Then they look to make sure the library is packaged to it. And finally there's a check that the groupID isn't too similar to any existing ones in name, especially to popular ones.
This generally takes 3 to 7 days to get approved.
Once you have a groupID you can release many libraries under it, you don't have to go through that process again.
Now from the user side, things are simpler too, because every lib has a groupID ownership and the lib name. Similar to how on GitHub you have owner/repo.
So it's much easier for me as a user not to confuse org.apache/Log4J with org.malware/Log4J
And like I said, even if someone owned the domain apoche.org they most likely wouldn't get approved to register org.apoche on Maven, because the name is too similar.
It still isn't fool proof admittedly. But it seems much harder to manipulate. And especially if you're a careful user, much easier to trust the source. As long as you got the groupID correct, it's signed and validated. And you can be sure that what you found on apache.org is going to be org.apache on Maven.
Finally, even if the domain changes hands, it doesn't matter. You won't be given access to the Maven repo. Access is given to the Maven user account who registered the group. All you need is to own the domain when you create your groupID. Now if someone transfers their Maven user/pass to a malicious users or become malicious themselves you're still at risk.
Also, I believe there is an appeal, again manually reviewed, like in case you believe your account was stolen, where if you can prove that you own the source repo and/or domain and all they might reinstate you.
But also artifacts are signed, so if your account gets stolen, the thief would need to steal your signature too so it can publish malicious artifacts to the Maven repo.
Oh yes, the differences is necessarily explained by Maven design being better, and absolutely not because there are two orders of magnitude difference in usage between these two systems…
There are supply chain attacks in Maven Central too[1], but it's not gonna make the front page of HN…
[1] a quick ddg search, just last month https://securityboulevard.com/2021/01/sonatype-stops-softwar...
To be noted, it makes this kind of attack more difficult, but not impossibile.
Especially the mix public/private artifacts. I guess it will force a lot of companies to at least lock their groupId on maven central, if they never bothered to do so.
Also, it's cute how you think maven is used orders of magnitude less.
Sure, the build systems probably won't CONSTANTLY be redownloading all the modules like NPM does, instead they keep a cache, but come on.
npm design was so bad that you could at the beginning upload over an existing version of your package name and break dependencies retroactively even to people that pinned versions.
if you want to try some good old whataboutism, at least try to be in the same ballpark.
Maven was (yes, I'm using the past on purpose) not a panacea that later system failed to equal: it has the usability of an IRS form and never gained as much popularity in the Java world than npm in the JavaScript one for that reason. In 2014, last time I did Java for work, the main security feature against supply chain attack was: “we are getting .jar files individually and not using maven because it's a fucking mess”
[1]: not implementation, which is what's make npm arguably a pile of shit
So let's imagine a REST API library that needs disk and network access to do it's work. How do I know it doesn't abuse my permission?
And what about its transient dependencies?
Sure, something needs to have permission to use the higher level of privilege. On your typical POSIX OS, your program is probably born with the ability to create arbitrary TCP/UDP sockets by default; on a capability OS, maybe you've explicitly provided it with access to your network stack. Regardless, at the entry point to your program you presumably have modules providing arbitrary network access in scope somehow.
If I'm understanding correctly, the case you described is that you have an HTTP client module that you'd like to have direct access to the network, but you'd like to restrict the consumers of the HTTP client to only querying certain hosts. From the start of your program, you'd instantiate an HTTP client (passing it a capability to use the network interface) then instantiate one of those HTTP client proxy objects that only allows communication with one host (passing it a capability to use the HTTP client). From there, you pass the capability to that proxy object to the unprivileged consumer of the module.
This seems to work without any kind of stack walking authentication logic, just normal variable scope, provided the language is capability-based. Am I missing something?
Indeed, if you look at the history of Java sandbox escapes they are largely confused deputy attacks: some privileged code source can be tricked into doing something it shouldn’t do.
You're imagining a system where there's no specific authentication system for code. Instead in order to use a library, you need to explicitly and manually obtain all the capabilities it needs then pass them in, and in main() you get a kind of god object that can do everything that then needs to be progressively wrapped. If a library needs access to a remote service, you have to open the socket yourself and pass that in, and the library then needs to plumb it through the whole stack manually to the point where it's needed. If the library develops a need for a new permission then the API must change and again, the whole thing has to be manually plumbed through. This is unworkable when you don't control all the code in question and thus can't change your APIs, and as sandboxing is often used for plugins, well, that's a common problem.
There's no obvious way to modularise or abstract away that code. It can't come from the library itself because that's what you're trying to sandbox. So you have to wire up the library to the capabilities yourself. In some cases this would be extremely painful. What if the library in question is actually a networking library like Netty? There could be dozens or hundreds of entry points that eventually want to open a network connection of some sort.
What does this god object look like? It would need to hold basically the entire operating system interface via a single access point. That's not ideal. In particular, loading native code would need to also be a capability, which means any library that optimised by introducing a C version of something would need to change its entire API, potentially in many places. This sort of design pattern would also encourage/force every library to have a similar "demi-god" object approach, to reduce the pain of repeatedly passing in or creating capabilities. Sometimes that would work OK, other times it wouldn't.
The stack walking approach is a bit like SELinux. It allows for a conventional OO class library, without the need for some sort of master or god object, and all the permissions things need can be centralised in one place. Changes to permissions are just one or two extra lines in the security config file rather than a potentially large set of code diffs.
Now all that said, reasonable people can totally disagree about all of this. The JVM has been introducing more capability objects with time. For example the newer MethodHandle reflection object is a capability. FileChannel is a capability (I think!). You could build a pure capability language that runs on the JVM and maybe someone should. Perhaps the usability issues are not as big a deal as they seem. It would require libraries to be wrapped and their APIs changed, including the Java standard library, but the existing functionality could all be reused. The new libraries would just be a thin set of wrappers and forwarders over pre-existing functionality, but there'd be no way for anything except the god object to reach code that'd do a stack walk. Then the security manager can be disabled, and no checks will occur. It'd be a pure object capability approach.
It's going to keep getting worse until a) developers and project managers realize doing inherently unsafe things is bad and b) they have the resources to give the additional ongoing levels of scrutiny. I'm not hopeful that this will happen at large in the industry, though I know it _is_ happening within individual companies and projects.
I'm sure we'll mitigate the damage to some extent by making package managers smarter and implementing finer-grained permissions. That will improve the situation over time, but it also takes us in the wrong direction by allowing us to forget that when we're shipping dependencies, we ultimately own their behavior.
And I'm not really arguing against vetting your dependencies or improving dependency management. I'm just saying in the real world, that if I made this particular imperfection in software development practices my hill to die on at work, there's a 99% chance it is not good for me or my career. So my options are, swim with the tide knowing we're doing things imperfectly, or fight an uphill battle for a more perfect world knowing that unless we avoid some major vulnerability every other Javascript developer falls victim to, there will be many eyes in my office staring over at me wondering if my extra caution is really worth the company's investment. If I keep my job at all.
I want to write great software, but to do that, I need to actually have a job writing software. And until I get a job at Google or Facebook or Amazon (none of those being places I've ever actually applied to) I am generally working in conditions without the resources to do the kind of dependency vetting we're talking about in this thread.
You could also treat supply chain attacks on software dependencies like another IT security risk your company is exposed to (just like virus infection, ransomware attacks, phishing, etc) and go through the same thinking (and if appropriate other) processes to manage them. The company can then make a conscious decision on whether it's worth investing in mitigating, eliminating or accepting the risk.
There's lots of information out there on dealing with cyber security risks, e.g. https://www.ncsc.gov.uk/collection/risk-management-collectio....
(Apologies if this is all obvious, I'm just trying to highlight an alternative approach which might help you deal with the dilemma and not have to "solve" it all by yourself)
Do you actually get to do this wherever you work? Honestly it would be great to have the luxury of that kind of patience and time to invest in my work. But it's universally unrealistic in my experience.
This is not at all a question of "what would be the ideal or perfect scenario." This is a question of what's pragmatic and politically accomplishable in most work environments.
You're right that java.io.File isn't a capability. It just represents a file path with a few utility methods to list files, and therefore does a stack walk when you try to access the filesystem. A FileChannel is a file capability in the sense I meant above, because it represents an opened file, not a path. There's an access check once, when it's opened, and then the rest of the time there aren't any stack walks.
It's a pity that Ben Murphy didn't write up all his bugs. There are only two listed there. A few patterns cropped up repeatedly in the old sandbox escapes:
1. Accessing internal code you weren't meant to have access to. Often, some sorts of privileged pseudo-reflection API. Fixing this is the goal of Jigsaw.
2. Serialization acting as a back door, allowing the internal private state of classes to be tampered with. Serialization security has been improved with time and they're now working on a feature that will make it harder to screw this up, by allowing serialization to use normal constructors instead of this ad-hoc form of reflection.
3. Overly general frameworks that allowed attackers to construct nearly arbitrary programs out of privileged objects by chaining them together (this crops up in gadget attacks too). There's probably no platform level fix for this, people just have to be aware of the risks when working in sandboxed context.
I don't think a pure capability language is workable, to be honest. At least not at a large scale. In the purest sense you need a god object passed into the start of your program which vends all possible capabilities, and every library would require the user to construct and pass in all resources it needs externally, including things it might possibly need. And that code isn't easily modularised because you can't just use helpers provided by the library itself: that's the very same code you're trying to sandbox. There's no good way to make that usable or extensible. The combination of stack walking and capabilities lets you find the right balance in terms of API design between simplicity of use and sandbox simplicity.
* The (now defunct) E programming language: http://erights.org and Joule: https://en.wikipedia.org/wiki/Joule_(programming_language) * Pony: https://www.ponylang.io * Monte: http://www.monte-language.org * The Midori project at Microsoft: http://joeduffyblog.com/2015/11/10/objects-as-secure-capabil...
It's actually not at all unworkable to use object-capability for large programs. In fact, one of the main benefits of ocaps is how well it aligns with well-established good software design principles such as dependency injection, avoiding singletons, avoiding global mutable state, and so on.
I'd like to see a more modern attempt that wasn't as totally obscure as those other languages. However, nobody is doing that.
You don't need to do this. There are a variety of ways to handle this, just as you would any other kind of dependency injection:
1. Design libraries to actually be modular so that dependencies (including capabilities) can be injected just where they are needed.
2. Pass in a factory object that lets the library construct sockets as and when it needs them. You can then enforce any arbitrary checks at the point of creating the socket. (This is much more flexible than a Java policy file).
3. Use a powerbox pattern [1] to allow the user to be directly asked each time the library attempts to open a socket. This is not always good UX, but sometimes it is the right solution.
> If the library develops a need for a new permission then the API must change and again, the whole thing has to be manually plumbed through.
Capturing permission requirements in the API is a good thing! With the stack walking/policy based approach I won't know the library needs a new permission until some library call suddenly fails at runtime.
OK, so in a pure capability language how would you implement this: program A depends on dynamically loaded/installed plugin B written by some third party, that in turn depends on library C. One day library C gets a native implementation of some algorithm to speed it up. To load that native library requires a capability, as native code can break the sandbox. However:
1. You can't change the API of C because plugin B depends on it and would break.
2. You can't pass in a "load native library" capability to plugin B because you don't know in advance that B wants to use C, and if you did, B could just grab the capability before it gets passed to C and abuse it. So you need to pass the capability directly from A to C. But now A has to have a direct dependency on C and initialise it even if it's not otherwise being used by A or B.
Stack walking solves both these problems. You can increase the set of permissions required by library C without changing its callers, and you don't have the problem of needing to short-circuit everything and create a screwed up dependency graph.
With the stack walking/policy based approach I won't know the library needs a new permission until some library call suddenly fails at runtime
You often wouldn't need to. What permissions a module has is dependent on its implementation. It's legitimate for a library to be upgraded such that it needs newer permissions but that fact is encapsulated and abstracted away - just like if it needed a newer Java or a newer transitive dependency.
Now, I'm a little outside my area of expertise due to not having worked with capability systems very much yet. (There aren't that many of them and they're still often obscure, so even just trying to gain experience with them is difficult at this point.)
But that said... in an ideal capability system, isn't the idea that native code could just break the sandbox also wrong? I would imagine that in such a system, depending on another module that's running native code would be just fine, and the capability system's constraints would still apply. Maybe that could be supported by the OS itself on a capability OS; maybe the closest thing we'll get to native code for that on our existing POSIX systems is something like WASI[0].
> You often wouldn't need to. What permissions a module has is dependent on its implementation. It's legitimate for a library to be upgraded such that it needs newer permissions but that fact is encapsulated and abstracted away - just like if it needed a newer Java or a newer transitive dependency.
If our goal is to know that the dependencies we're using don't have more authority than they need, isn't it a problem if a module's permissions may increase without explicit input from the module's user (transitive or otherwise)?
[0] https://hacks.mozilla.org/2019/03/standardizing-wasi-a-webas...
> isn’t it a problem if a module’s permissions may increase without explicit input from the module’s user (transitive or otherwise)?
Exactly right.
[1]: https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/
The modules permissions can't increase without explicit input e.g. changes to the policy file. But the person who cares about the sandbox integrity is the user of the overall software or computing system. The plugin developer doesn't really care how the API is implemented or what permissions it needs. They just want it to work. The person who cares is the person who owns the resource or data an attacker may be trying to breach.
I thought IRS forms were hard?
I have more than a decade of experience collected from work with CDs (Delphi), downloaded libraries (Delphi, Java, PHP), Ant (Java), Maven (Java), PEAR (PHP), Composer (PHP), Nuget (.Net), NPM/Yarn (Javascript/TypeScript) and Gradle (Java/Kotlin).
Two of these have been somewhat easy to work with for as long as I used them: Maven and Yarn. I hear NPM us usable now, but it absolutely wasn't good early on.
> “we are getting .jar files individually and not using maven because it's a [...] mess”
It seems obvious from your writing that you either worked in a place that was really serious about security or had no clue. Both could result in this conclusion, but based on your writing my bet is on the latter, i.e. they were clueless.
Edit:
Reading your comment here: https://news.ycombinator.com/item?id=26091480
I have concluded that it was a mix of those two: the people you worked for were trying really hard to be really serious about security and failing to automate it.
The proper way to audit your dependencies is to run an in-house Maven repository server. Just like you would for npm, or any package repository really.
So you just spin up Sonatype Nexus, proxy the repositories you trust and disallow releases from being overwritten. That way you're certain the jar you're using today, is the exact same as the one you'll be using years from now.
Alternatively, if you have a vendor who ships their jars manually, you can just vet them, then upload them to your in-house Maven repository and have your developers pull them in as usual.
Can you explain what exactly you mean here, because the way I read it realistic time estimates for the projects grew by weeks pr day which probably means
- I misread
- you wrote something you didn't mean
Anyways it sounds like something was way off and I have worked on some projects with Maven and other systems.
> “we are getting .jar files individually and not using maven because it's a fucking mess”
That seems odd and a bizarre edge case. Nobody worked like that with Java projects, and I bet nobody does today either.
Many of us still prefer Maven.
Having to deal with other people's artistic Gradle files works wonders when it comes to understanding the goodness of standards.
Another example: SBT was supposed to be the "savior" in the Scala world and... I still have nightmares about SBT from a couple of years ago. Maybe it's finally become usable and intelligible in recent years.
Gradle for example still uses Maven for its dependency artifact repository. So Maven is still the standard for Java.
It was a common development practice in these entities (I was working as a contractor, for different customers), most of them have been using computer programs at there core long before the internet went mainstream.
Another thing that strikes me as odd in your comparison: those customers you worked with wouldn't have used javascript+NPM either, since it has all of the problems of Maven and external deps and then some! So what exactly are we comparing then?
It sounds like the processes in use to do this may have been pretty crappy in the organizations you've been with, but it also sounds like it would take less time than implementing a dependency's functionality from scratch in most cases you'd want to pull in a dep.
In any security conscious organization the only way to pull dependencies is from a local trusted repository. And the only way they get placed there is through a review process.
The Java approach is nice because it avoids any need for DI. DI is not a widely accepted pattern. There are no DI engines that would have any support for this kind of policy-driven injection. And whilst popular in some areas of software like Java web servers, it hardly features in most other languages and areas, there are no programming languages with built in support for it and that includes modern languages like Kotlin. DI engines meanwhile have changed how they work pretty radically over time - compare something like the original Spring XML DI to Guice to Dagger3. Plus, DI is awkward when the dependency you need isn't a singleton. How would I express for example "I need a capability injected that gives me access to the $HOME/.local/cache/app-name directory"? Annotation based DI struggles with this, but with the AccessController it's natural: the component just requests what it needs, and that's checked against a policy, which can be dynamically loaded from a file, or built by code.
You are confusing DI frameworks (often terrible) with the general concept of dependency injection, which is in fact extremely widely used.
We (small shop) don't vet the code of all of our dependencies, since we simply don't have the manpower. But we do run nexus to have the guarantee of version pinning. So something that is fine today will be fine in 5 years.
Because otherwise the workload should shrink fast after one has vetted the dependecies for a couple of packages?
beside, companies that care usually also have a database of previously cleared packages, so one can reduce one own work/delays by picking from the approved deps list.