70% of new NPM packages in last 6 months were spam (blog.phylum.io)

225 points by louislang 1 year ago | 111 comments

NikxDa 1 year ago |

> Contrary to what npm states, this package actually depends on one of our aforementioned spam packages. This is a by-product of how npm handles and displays dependencies to users on its website.

For me personally, this is the biggest surprise and takeaway here. By simply having a key inside package.json's dependencies reference an existing NPM package, the NPM website links it up and counts it as a dependency, regardless of the actual value that the package references (which can be a URL to an entirely different package!). I think this puts an additional strain on an already fragile dependency ecosystem, and is quite avoidable with some checks and a little bit of UI work on NPM's side.

louislang 1 year ago | |

(Full disclosure: I'm one of the co-founders @ Phylum)

We could do a full write-up on npm's quirks and how one could take advantage of them to hide intent.

Consider the following from the post's package.json:

    "axios": "https://registry.npmjs.org/@putrifransiska/kwonthol36/-/kwonthol36-1.1.4.tgz"

Here it's clear that the package links to something in a weird, non-standard way. A manual review would tell you that this is not axios.

The package.json lets you link to things that aren't even on npm [1]. You could update this to something like:

    "axios": "git://cdnnpmjs.com/axios"

And it becomes less clear that this is not the thing you were intending. But at least in this case, it's clear that you're hitting a git repository somewhere. What about if we update it to the following?

    "axios": "axiosjs/latest"

This would pull the package from GitHub, from the org named "axiosjs" and the project named "latest". This is much less clear and is part of the package.json spec [2]. Couple this with the fact that the npm website tells you the project depends on Axios, and I doubt many people would ever notice.

[1] https://docs.npmjs.com/cli/v10/configuring-npm/package-json#...

[2] https://docs.npmjs.com/cli/v10/configuring-npm/package-json#...
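
To make the distinction between these forms concrete, here is a minimal Python sketch that sorts a dependency value into coarse categories. It is not npm's resolver; the function name `classify_spec`, the category labels, and the shorthand regex are assumptions made for illustration, and the prefix lists only cover the forms discussed in this comment plus a few documented for package.json.

```python
import re

# Coarse, illustrative categories for the value side of a package.json
# dependency entry. This is a simplification, not npm's real parsing logic.
URL_SCHEMES = ("http://", "https://")
GIT_PREFIXES = ("git://", "git+ssh://", "git+http://", "git+https://", "git+file://")
LOCAL_PREFIXES = ("file:", "./", "../", "/", "~/")
# "user/repo" with an optional "#ref" suffix (assumed pattern for the GitHub shorthand).
GITHUB_SHORTHAND = re.compile(r"^[\w.-]+/[\w.-]+(#.+)?$")


def classify_spec(spec: str) -> str:
    """Return a coarse label for where a dependency spec points."""
    if spec.startswith(URL_SCHEMES):
        return "tarball-url"
    if spec.startswith(GIT_PREFIXES):
        return "git-url"
    if spec.startswith(LOCAL_PREFIXES):
        return "local-path"
    if spec.startswith("npm:"):
        return "alias"
    if GITHUB_SHORTHAND.match(spec):
        return "github-shorthand"
    # Everything else is treated as a version range or dist-tag on the registry.
    return "registry-range-or-tag"


if __name__ == "__main__":
    for value in (
        "^1.6.0",
        "https://registry.npmjs.org/@putrifransiska/kwonthol36/-/kwonthol36-1.1.4.tgz",
        "git://cdnnpmjs.com/axios",
        "axiosjs/latest",
    ):
        print(f"{value!r:90} {classify_spec(value)}")
```

Anything that does not come back as `registry-range-or-tag` is a candidate for manual review, since the key name on its own says nothing about what will be installed.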

aragilar 1 year ago | |

This feels like the more important takeaway (and feels like an actual security bug). I'm surprised this is so buried in the article...

OptionOfT 1 year ago | |

And worse, it shows axios, and links to the actual axios package.

https://www.npmjs.com/package/sournoise?activeTab=dependenci...

If it showed axios and linked to the package provided in package.json, that would at least be better.

But here they actually link to the wrong package.

3np 1 year ago | |

You should think of the package metadata as originating from the publisher, not from the registry. Aside from the name, version, and (generated) dist and maintainers fields, I don't think any of it is even supposed to be validated by the registry?

Agreed the website UX is confusing and could be better but in general package metadata is just whatever the publisher put there and it's up to you to verify if you care about veracity.

pas 1 year ago | | |

the fucking website processes it and after some mighty compute somehow shits out the wrong link. it's actively making things worse by trying to be helpful.

confusing is one thing, but there's a screaming security chasm around that innocent little UX problem.

MS bought npmjs and now it's LARPing as some serious ecosystem (by showing how many unresolved security notices installed packages have) while they cannot be arsed to correctly show what's actually in the metadata?

brynb 1 year ago | | |

this is a little too stoic a take with respect to a tool that very unserious people building things for serious but non-technical people use on a daily basis. i think we should strive for more. npm can continue to exist in its very libertarian form, but perhaps there's room for something that cares a bit more about caution

mkl 1 year ago |

How about removing the incentive? Take down every package with tea.yaml in it, after say 1 month's warning, so legitimate packages trying to use it don't leave their users in the lurch. The tea protocol is clearly not going to accomplish what it set out to (see below), and is instead incentivising malicious behaviour and damaging the system it set out to support.

From https://docs.tea.xyz/tea/i-want-to.../faqs: "tea is a decentralized protocol secured by reputation and incentives. tea enhances the sustainability and integrity of the software supply chain by allowing open-source developers to capture the value they create in a trustless manner."

n_ary 1 year ago |

Why are these spam accounts not perma banned and removed?

For example, this[1] account mentioned in the article has 1781 packages of gibberish.

Also, the whole reporting process is onerous: there is a large form to fill in. Of course, some gatekeeping on reporting is good, but there should be a way to report a package publisher's entire profile.

[1] https://www.npmjs.com/~eleanorecrockets

nixnixers 1 year ago | |

Isn't it better to leave accounts that correlate spam than to force spammers to obscure the connection by creating a new account for each piece of spam?

esprehn 1 year ago | | |

That primarily works if you can shadow ban the account. Otherwise the spam is still negatively impacting the community (e.g. by polluting search results).

meiraleal 1 year ago | | |

That's not how spammers work. There is this profile with thousands of packages, and there are still hundreds of spam profiles with just a handful of packages so far. If you let them grow unchecked, they grow exponentially. The broken windows theory fits well here.

marcus_holmes 1 year ago |

> Next, because the AI hype train is at full steam, we must point out the obvious. AI models that are trained on these packages will almost certainly skew the outputs in unintended directions. These packages are ultimately garbage, and the mantra of “garbage in, garbage out” holds true.

hmm, inspiring thoughts. An answer to "AI is going to replace software developers in the next 10 years" is to create 23487623856285628346 spam packages that contain pure garbage code. Humans will avoid them; LLMs will hallucinate wildly.

forcha 1 year ago |

The Tea protocol's flawed incentive model is a disaster, effectively encouraging developers to pollute npm with spam. It's a prime example of what happens when protocols prioritize quantity over quality, compromising the entire ecosystem.

daotoad 1 year ago |

TLDR:

1. a cryptocurrency scheme for funding OSS development[1] is incentivizing spammers to try and monetize NPM spam

2. it's easy to spoof your dependencies with package.json[2]

  "dependencies": {
    "axios": "https://registry.npmjs.org/@putrifransiska/kwonthol36/-/kwonthol36-1.1.4.tgz"
  }

[1]: https://tea.xyz/blog/the-tea-protocol-tokenomics

[2]: https://www.npmjs.com/package/sournoise?activeTab=code
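
A small Python sketch of how one might flag this kind of spoof in a parsed package.json. It assumes registry tarball URLs follow the `/<name>/-/<file>.tgz` layout seen in the example above. The function names are made up for illustration, and the check ignores git and GitHub-shorthand values entirely.

```python
from typing import Dict, Optional
from urllib.parse import urlparse


def tarball_package_name(url: str) -> Optional[str]:
    """Extract the package name from a registry-style tarball URL.

    Assumes the path layout "/<name>/-/<file>.tgz"; returns None otherwise.
    """
    path = urlparse(url).path
    if "/-/" not in path:
        return None
    return path.split("/-/", 1)[0].lstrip("/")


def spoofed_dependencies(manifest: dict) -> Dict[str, str]:
    """Map each dependency key to the different package its URL names."""
    mismatches = {}
    for key, spec in manifest.get("dependencies", {}).items():
        # Only URL values are inspected; version ranges are skipped.
        if not spec.startswith(("http://", "https://")):
            continue
        actual = tarball_package_name(spec)
        if actual is not None and actual != key:
            mismatches[key] = actual
    return mismatches


if __name__ == "__main__":
    manifest = {
        "dependencies": {
            "axios": "https://registry.npmjs.org/@putrifransiska/kwonthol36/-/kwonthol36-1.1.4.tgz",
            "lodash": "^4.17.21",  # illustrative entry, not from the thread
        }
    }
    print(spoofed_dependencies(manifest))
```

The point is only that the key and the value are independent pieces of data, so any tool that trusts the key alone can be misled.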

johnmw 1 year ago |

I was sad to read this and thought "this is why we can't have nice things."

But following the links was fun and educational:

"The end goal here [of the Tea protocol] is the creation of a robust economy around open source software that accurately and proportionately rewards developers based on the value of their work through complex web3 mechanisms, programmable incentives, and decentralized governance."

Which led to:

"The term cobra effect was coined by economist Horst Siebert based on an anecdotal occurrence in India during British rule. The British government, concerned about the number of venomous cobras in Delhi, offered a bounty for every dead cobra. Initially, this was a successful strategy; large numbers of snakes were killed for the reward. Eventually, however, people began to breed cobras for the income. When the government became aware of this, the reward program was scrapped. When cobra breeders set their snakes free, the wild cobra population further increased."

Which led to:

"Goodhart's law is an adage often stated as, 'When a measure becomes a target, it ceases to be a good measure.'"

patwolf 1 year ago |

I recently stumbled upon a bunch of repos which were clearly copied from popular projects but then renamed with a random Latin name and published to npm.

I reported some of them as spam, but there were hundreds of them. I couldn't figure out why somebody would waste the time to do that, but now it makes sense.

Fatnino 1 year ago |

There was a similar thing to tea a while back. I think I saw the project posted on here. Went to their GitHub and found a typo in their README. Opened a PR with a correction, and then they started sending me about a dollar in BTC every month till they ran out of money and the project imploded.

renegat0x0 1 year ago |

I really wonder whether this matters.

Package managers often come with a rating system. npmjs has weekly downloads, pull requests, and other popularity scores.

I am a layman in AI, but why would anyone think that this would affect anything like AI? Why would anyone train on a no-name package that no one uses?

Spam packages can have non-zero stats, but that also makes them vulnerable to a sweeping removal of all potential spam packages, since they are connected, etc.

No credible company will use a no-name spam package; they will verify its contents. That is at least what happened at all the companies I have worked for.

EVa5I7bHFq9mnYK 1 year ago |

Spam is the least of the worries.

minkles 1 year ago | |

Yeah, this. When I see one of our pipelines pull in 300 npm packages, I wonder how much we really know about what our systems do.

pixl97 1 year ago | | |

Heh, I work in a sector that works with some very large companies we all know the names of. I've seen applications that seemingly contain very little code written by them, just hundreds or thousands of packages/modules glued together. It is quite common for the tooling they use to catch 'low reputation' packages where they actually put the wrong package name in, then, when it didn't work, added the package they needed but never removed the misnamed one.

Completely terrifying to me.

vb-8448 1 year ago |

I wonder what the long-term plan is.

Maybe the next step is to sell the control of all these packages to a rogue entity to be used for a supply chain attack?

joeyh 1 year ago |

Tea is absolutely NOT "taking steps to remediate this problem". They are grifters and part of their grift is claiming to take steps when called out.

mikl 1 year ago |

A pox on Tea and the cryptobros that thought it was a good idea.

yas_hmaheshwari 1 year ago | |

Haha. Who needs actual useful code when you can have a million variations on a 'memecoin' generator :-)

danaris 1 year ago |

I'm fairly proficient in JavaScript, but mismanagement of the ecosystem like this is a major reason why any time I see that something requires Node.js, I just turn and run in the other direction. It's just not worth the headaches.

fennecbutt 1 year ago |

I mean realistically it's representative of the Internet as a whole. Makes me wonder where all the porn packages are.

The pulling in of unexpected dependent packages is a real issue though. How do other ecosystems deal with it? NPM is really missing some level of trust beyond just using "brand name" packages.

My general judgement is usually based on how often it's worked on and how many downloads it has, but gut feel isn't really enough, is it?