Post-mortem of this weekend's NPM incident (blog.npmjs.org)
I wonder why they are only preventing republishing for 24 hours. Is there a good reason to allow a package namespace to be recycled in less than, say, a week? Is it based on the assumption that the only case where it comes up is during an incident, and 24 hours is enough time to assume an incident will be resolved? I'm curious what went into that number.
If npm packages used a Github-style "author/package" format, name collision would never be an issue again.
(That is to say, trusting that any given named package that `npm install` downloads is what you think it is is really no different than trusting `wget https://example.com/thecode.tgz`. Even if you verify that the domain hasn't switched hands, you have no guarantee that the author's pipeline wasn't compromised, or that the author didn't add malware themselves. There's a reason Debian, Red Hat et al. put a lot of effort into ensuring integrity of their repositories.)
They have that, and many are finally starting to take advantage of it (with babel being the most prominent with their latest version)
But this doesn't completely "fix" the problem, since the exact same conflicts can still happen with the "author" name (if someone takes "google" there are going to be some very upset Californians)
Assuming no actual users are depending on packages which are literally just spam, I don't really see an issue with reusing the names of those packages.
I don't know how you gain any kind of critical mass trying to compete against a well-established registry like npmjs, though.
Although you won't get updates without asking for them - I'm not sure that's a bad thing - you can be assured that you'll either get the package you were expecting or no package at all.
> Our first action, which began immediately after the incident concluded, was to implement a 24-hour cooldown on republication of any deleted package name.
Seems like npm might want to review the policy that allows stuff like that to happen.
Even if a user violates the spam policy (which, to be clear, it seems the affected user in this case did NOT do), that hardly seems to be appropriate grounds for deleting everything the user has ever published on npm.
That is a policy that is just begging for griefing.
That's one of the things the post mentions as what they are doing.
https://en.wikipedia.org/wiki/Joe_job
> A joe job is a spamming technique that sends out unsolicited e-mails using spoofed sender data. Early joe jobs aimed at tarnishing the reputation of the apparent sender or inducing the recipients to take action against them [...]
`In the course of reviewing and acting on spam reports, an npm staffer acted on this flag without further investigating the user and removed the user and all of their packages from the registry.`
Specifically, a policy that allows removing "all of [a user's] packages" based on something related to the user rather than on the packages themselves.
Feels like there should be a disconnect between decisions made about a 'user' and those made about a 'package'.
Once the package is published, there should be an understanding that the package belongs to npm and npm's users, even if the original publisher retains some authority over it.
And if there is cause to ban a user, it should not automatically mean that packages published by the user are affected (aside from removing whatever authority the user had).
Uh... Add in the fact that staff are now trigger happy, since a single button can do a lot of damage.
Why not infinity hours? I don't get it.
I don't understand this. Why hard delete packages at all? Soft deleting feels like it would be easier and would stop people republishing with the same name.
They could also bake their warning process for dependent libraries (i.e. "this package is gone!") into the soft delete process.
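To make the soft-delete idea concrete, here is a minimal sketch of what I mean. Everything in it (the `Registry` class, the method names, the tombstone layout, the `example-pkg` name) is made up for illustration; it says nothing about how npm actually stores packages. The point is only that a removed name leaves a tombstone behind, so it can't be republished by someone else, and a mistaken removal can be reversed.

```python
from dataclasses import dataclass, field


class NameUnavailable(Exception):
    """Raised when a name is tombstoned or owned by someone else."""


@dataclass
class Registry:
    # Hypothetical in-memory registry; not npm's real data model.
    live: dict = field(default_factory=dict)        # name -> owner
    tombstones: dict = field(default_factory=dict)  # name -> (owner, reason)

    def publish(self, name: str, owner: str) -> None:
        # A tombstoned name can never be claimed by a new publish.
        if name in self.tombstones:
            raise NameUnavailable(f"{name} was removed and cannot be reused")
        if self.live.get(name, owner) != owner:
            raise NameUnavailable(f"{name} belongs to another publisher")
        self.live[name] = owner

    def soft_delete(self, name: str, reason: str) -> None:
        # Keep a record of who owned the name and why it was removed.
        owner = self.live.pop(name)
        self.tombstones[name] = (owner, reason)

    def restore(self, name: str) -> None:
        # Undo a mistaken removal: the original owner gets the name back.
        owner, _reason = self.tombstones.pop(name)
        self.live[name] = owner

    def resolve(self, name: str) -> str:
        # Dependents get an explicit "gone" answer instead of a stranger's code.
        if name in self.tombstones:
            _owner, reason = self.tombstones[name]
            return f"gone: {name} ({reason})"
        if name in self.live:
            return f"ok: {name} by {self.live[name]}"
        return f"unknown: {name}"
```

With this shape, the "this package is gone!" warning falls out of `resolve` for free, and restoring a wrongly removed user is a matter of popping tombstones rather than racing whoever grabbed the names in the meantime.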
I see this acknowledgement, but I cannot find where they will remedy this by putting placeholders in place of spam removals. As a concession, maybe only placeholders for spam removals of packages that are older than X days or depended on (explicitly or transitively) by X packages. Did I miss where the remedy for this spam-removed-package-reuse was in the blog post?
Explained somewhere near the bottom of the post, basic rationale is that it gives them time to notice fuckups and fix them.
Maybe they've had situations where a spammer has created a very large number (millions, billions?) of packages. It's possible that the majority of user submissions are automated spam from botnets. I would assume npm has some mitigations in place to prevent this abuse in the first place, such as rate limiting and captchas, though maybe that's not enough to stem the tide.
Though, given that they say they have humans doing the package deletion, that makes me think that the number of spam packages created can't be that high. Certainly not high enough to outweigh the risks of package name reuse. Increase your prices a few pennies a month so you can afford to store the placeholders forever.
Better to (a) ship your code with explicit version and content dependencies (e.g. a SHA-256 of the package you expect to exist), or (b) delegate trust in upstream packages to fewer, more well-vetted, authorities (i.e. distro-style).
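For option (a), a rough sketch of what pinning by content could look like. The `pins` mapping, the `name@version` key format, and the function names are my own assumptions for illustration, not any existing npm feature; only `hashlib` and `hmac` from the Python standard library are real.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(name: str, version: str, tarball: bytes, pins: dict) -> bool:
    """Check a downloaded tarball against the digest recorded for name@version.

    `pins` is a hypothetical mapping of "name@version" -> expected hex digest
    that you would commit alongside your own code.
    """
    key = f"{name}@{version}"
    if key not in pins:
        # Refuse anything that was never pinned rather than trusting it.
        raise KeyError(f"no pinned digest for {key}")
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(sha256_hex(tarball), pins[key])
```

If the name gets republished by someone else, the bytes change, the digest no longer matches, and the install fails instead of silently pulling in different code.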
As for package signing, I can't tell if it's a good idea or not. It would require you to accept a new key in this case, but that is going to happen super frequently with NPM, inducing alert fatigue, and it doesn't do anything to prevent the authors' publishing credentials getting stolen.
e.g. philipwhiuk/gnome moves to gnome/core and there's a version published to philipwhiuk/gnome that pulls in gnome/core
1. only packages with > nK downloads
2. only packages with tests
3. only packages with publicly available code that is the same as what I am downloading
4. only packages with > nnn stars, forks, issues, pull requests
All are just proxies for being a valid node in a dependency graph.
And I'd like to know if any transitive dep violates my inclusion rules.
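Something like the following is what I have in mind, as a sketch only. The graph and metadata shapes, the rule names, and the thresholds are all invented for the example; a real tool would have to get this data from the registry and from the lockfile.

```python
def find_violations(root, graph, meta, rules):
    """Walk every transitive dependency of `root` and list rule failures.

    graph: package name -> list of direct dependency names (hypothetical shape)
    meta:  package name -> dict of facts about that package (hypothetical shape)
    rules: rule name -> predicate taking a package's meta dict
    Returns a sorted list of (package, [names of failed rules]).
    """
    seen = set()
    stack = [root]
    violations = []
    while stack:
        pkg = stack.pop()
        if pkg in seen:
            continue  # already checked; also guards against dependency cycles
        seen.add(pkg)
        if pkg != root:
            facts = meta.get(pkg, {})
            failed = [name for name, rule in rules.items() if not rule(facts)]
            if failed:
                violations.append((pkg, failed))
        stack.extend(graph.get(pkg, []))
    return sorted(violations)


# Example inclusion rules; the thresholds are placeholders, not recommendations.
RULES = {
    "min_downloads": lambda facts: facts.get("downloads", 0) >= 1000,
    "has_tests": lambda facts: facts.get("has_tests", False),
}
```

Run against the full dependency graph, this reports every package, direct or transitive, that fails at least one rule, which is the list I would want to see before installing anything.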
For example, if Google published "Google/polymer", dependents would be unaffected if a new package was published from "badguy/polymer". I think you're saying it's still not fixed because someone could register the Google author name before the real Google gets it, but that is a well-understood problem across every website.
Not a problem at all. They will just get the name by force. It already happened before, google "npm kik".
The problem is a package URI being reused without me realizing, and that problem would be solved.
The idea is that the author name maps to an actual repo and should serve as a second layer of authentication outside of NPM's control. The namespace should belong to the repository owner.
No one should be able to "take" google/ since the actual Google already owns github.com/google.
Since NPM uses a centralized registry, though, some limitations would have to be imposed, and allowing multiple sources would mean allowing for conflicts between github:google/package and bitbucket:google/package. Ideally, namespaces should be unique and canonical, and the simplest way to accomplish that is just to allow publishing to a single source.