twitter/the-algorithm (github.com)

374 points by johns 4 years ago | 380 comments

johns 4 years ago |

Will Norris who works on OSS at Twitter posted this[0]: "watch this space https://github.com/twitter/the-algorithm"

[0]: https://twitter.com/willnorris/status/1518694675909013504

bsimpson 4 years ago | |

Oh hey - I didn't know he'd left Google. He was a big part of open source there too.

willnorris 4 years ago | | |

same team. different company.

lupire 4 years ago | |

did the repo ever exist or was he joking, or teasing?

FR10 4 years ago | | |

I just checked and it's gone. But when it did exist, it was just an empty repo, without even a README.

shrimpx 4 years ago |

I don't understand the concept of open-sourcing "the algorithm".

First of all, "the algorithm" is probably hundreds of thousands of lines of code, including all the tedious boilerplate like cache policies and multi-AZ logic.

And second of all, doesn't the algorithm include machine learning components, which are trained on terabytes of data? That data will likely be impossible to open source. And open sourcing the neural nets without the training data is mostly meaningless from a transparency perspective?

owaislone 4 years ago | |

Open sourcing in this case is not about the implementation or the CS algorithm. It is more about transparency. I think the idea is that the public should know how tweets are ranked, why tweets show up in timelines and in which timelines, what makes tweets popular, etc. Imagine Google publishing a document detailing how their system ranks pages, i.e. officially publishing its internal SEO rules. I don't know if it is a good idea or not. People with enough resources might be able to game the system (if they don't do it already).

Traster 4 years ago | | |

The only change is that people with resources would stop guessing how to game the system, and start employing people to ensure they systematically game the system.

rootusrootus 4 years ago | | |

> if they don't do it already

That's an interesting point. A practical description of the algorithm from the perspective of someone trying to game it may be more useful than anything Twitter or Google would release.

WORMS_EAT_WORMS 4 years ago | |

Probably right but I think trying to be transparent is better than not trying at all.

Gesture carries weight to the users too.

Not sure any big company has tried this before. I could be wrong, but either way looking forward to it / FWIW hope it catches on.

ProAm 4 years ago | |

Or is it six cats in a basement with a laser pointer and a mouse?

The point of releasing it is to let people know exactly why they see the tweets they do in the order they do. I hope Elon just goes back to time-based ordering of tweets.

LordDragonfang 4 years ago | | |

I personally know a FAANG employee whose full time job is building tools to try and help understand why the company's recommendation algorithm picks the things it does (and more specifically, predict how changes to the algo will affect that).

Even the people who build these systems barely know what the algorithm is going to do, much less why. It will be a herculean task to try and convey that to an average user.

nthot 4 years ago | | |

Disclaimer: I'm pretty new to twitter, so I may be misunderstanding something. On my Twitter Home Screen there are three little stars on the upper right of the central container which allow you to toggle between "top tweets" (ie. 'The Algorithm') and "latest tweets" which is the time based ordering.

buu700 4 years ago | | |

Maybe it's a room full of lava lamps.

jonas21 4 years ago | |

This is pretty standard in the machine learning world. You'll open-source the code and weights trained on a public data set (these are often licensed specifically for non-commercial use). But in production, you'll be using different weights trained on a proprietary data set.

vijaybritto 4 years ago | |

The whole point of the repo seems to be just to show that there's nothing called "the algorithm". It's probably something like 100s of 1000s of algorithms doing different things.

threeseed 4 years ago | |

You can open source the model code.

And developers will be able to train a model using it on a subset of Twitter data. Just that the quality of the outcome won't be the same as having the full set of Twitter data.

thorgutierrez 4 years ago | |

If it's too complicated, there is a good chance Elon will ask to simplify it until it can be open sourced.

mrkramer 4 years ago | |

Ever heard of pseudocode, which happens to be human-readable? If they are really going to do it, they will release source code for programmers and computer scientists in general to analyse, and on top of that they will release pseudocode which non-technical people can somewhat understand.

mrintellectual 4 years ago |

This seems to be a practical joke by a Twitter engineer as opposed to an actual release.

somishere 4 years ago | |

Could you take it any other way? I mean obviously you could, because most here appear to be taking 'the-algorithm' very seriously .. but, seriously? It's funny. No joke.

johntb86 4 years ago | |

It seems more like a symbolic gesture.

hunterb123 4 years ago | |

Or a commit is being prepared? It's only 20 minutes old.

mrintellectual 4 years ago | | |

It just seems unlikely that the algorithm would be open-sourced right after a deal for Twitter is agreed upon (but before it actually goes through). I've never seen a buyout of this scale done by an individual, but I imagine the SEC and several other parties will need to be involved.

At the minimum, I would make a private Github repo first, add all relevant commits, and then make it public once there's actually content.

kaladin-jasnah 4 years ago | | |

https://twitter.com/elonmusk/status/1518677066325053441 states that Elon Musk is looking to "make the algorithms open source."

transitivebs 4 years ago |

Is this supposed to be a joke? It's clearly an empty repo.

Either this is a mistake, or this is a really, really misguided attempt at a joke from Twitter.

koboll 4 years ago | |

The tweet announcing it was captioned "watch this space":

https://twitter.com/willnorris/status/1518694675909013504

Which seems like a promise they intend to actually open source something there.

seaman1921 4 years ago | | |

Who is that guy even? Is he even associated with Twitter? Calling it an announcement seems like a stretch.

transitivebs 4 years ago | |

For anyone wanting a non-empty version, check out my article on how Twitter's algorithmic feed works from last week https://transitivebullsh.it/oss-twitter-algorithm-part-1

aeyes 4 years ago | | |

How you _think_ it works from 1000 miles above.

evandale 4 years ago | |

It wouldn't surprise me. It wouldn't be the first joke about him and his buyout.

https://twitter.com/TwitterComms/status/1511456430024364037

dmonitor 4 years ago | |

It was created 20 minutes ago. Maybe a WIP?

transitivebs 4 years ago | | |

For a company of Twitter's scale and resources, any public repos are supposed to go through legal to clear (this is how things work at every FAANG I've worked at).

So if it was a WIP, it'd be a private repo until it's ready to release publicly.

Irishsteve 4 years ago |

Many people have commented that it is empty. However what they do not realize is that there has never actually been an algorithm and that is why it is empty.

axg11 4 years ago |

I've worked on very large scale recommendation systems at a FAANG. If Twitter's system is anything like ours, the concept of publishing or open sourcing "the algorithm" doesn't make sense.

Even if we were to open source all associated code and publish all related documents it would be very difficult to make sense of the entire system. That is precisely why companies such as Twitter A/B test the hell out of everything. What most people think of as "the algorithm" is a complex system that receives many inputs (maybe hundreds) and has dependencies on many other internal Twitter services. Tweets likely pass through multiple filtering steps as well as scoring before you ever see them. Each of these steps is highly contextual, depending on: location, past tweets, verification status, etc. You can attempt to predict the effect of a certain change, but you never know the actual outcome until you test it.
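To make the "filtering steps plus scoring" idea concrete, here is a minimal sketch of what such a pipeline could look like. Everything in it is an assumption for illustration: the `Tweet` and `Viewer` fields, the two filters, and the weights in `score` are made up and say nothing about how Twitter's real system works.

```python
from dataclasses import dataclass, field


@dataclass
class Tweet:
    author: str
    text: str
    likes: int
    author_verified: bool


@dataclass
class Viewer:
    # Hypothetical per-viewer context; a real system would have far more.
    blocked_authors: set = field(default_factory=set)
    follows: set = field(default_factory=set)


def not_blocked(tweet, viewer):
    """Filter step: drop tweets from authors the viewer has blocked."""
    return tweet.author not in viewer.blocked_authors


def not_empty(tweet, viewer):
    """Filter step: drop tweets with no visible text."""
    return bool(tweet.text.strip())


def score(tweet, viewer):
    """Scoring step with made-up weights that depend on viewer context."""
    s = float(tweet.likes)
    if tweet.author_verified:
        s *= 1.5
    if tweet.author in viewer.follows:
        s += 5.0
    return s


def rank(candidates, viewer, filters=(not_blocked, not_empty)):
    """Run every filter, then sort the survivors by descending score."""
    survivors = [t for t in candidates if all(f(t, viewer) for f in filters)]
    return sorted(survivors, key=lambda t: score(t, viewer), reverse=True)
```

Even in a toy like this, the output depends on the viewer as much as on the tweets, which is why the effect of changing one weight is hard to predict without testing it.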

I think what will ultimately happen is that _some_ details will be published. Elon will parade that around as a victory for free speech as Twitter is now more "open". In reality, nothing of value will be gained as "the algorithm" isn't a simple function.

mrkramer 4 years ago |

Imagine having something like this for Google's and YouTube's algorithms; $100bn+ SEO industry would go bankrupt or at least they would pivot to some sort of advising but there wouldn't be the mayhem that we have today.

bigfudge 4 years ago | |

Scientific knowledge is written down and freely available but I still don’t understand most of it. I think a public algorithm would increase SEO business if anything because it would get more effective once the bullshit was debunked.

sjtindell 4 years ago | |

Results would also become an absolute cesspool. Say what you want about how they are now, but if the people gaming it could see the exact rules, it would become completely useless.

CincinnatiMan 4 years ago |

Makes me wonder how Twitter employees internally are handling the news. If they are celebrating or commiserating?

mostlysimilar 4 years ago | |

Twitter Locks Down Product Changes After Agreeing to Musk Bid - https://finance.yahoo.com/news/twitter-locks-down-product-ch...

brink 4 years ago | | |

Looks like a wise move to keep potential destructive protests at bay.

memish 4 years ago | | |

> Twitter imposed the temporary ban to keep employees who may be miffed about the deal from “going rogue,” according to one of the people.

What could a rogue employee do?

thejackgoode 4 years ago | | |

Are there precedents for this? It looks like there is not much trust internally. I doubt this was Elon’s first decision.

temp8964 4 years ago | | |

Thanks! Good to know.

I was actually wondering whether some people may want to remove traces of what they have been doing.

I wish someday we could see the internal communications that led to the Hunter Biden laptop story ban.

philjohn 4 years ago | |

Wonder if their RSU grants have a clause that converts them to cash in the event of a takeover.

And having been in a company that was taken over, it's a mixture of emotions - is my job safe, will this be the same culture I joined for etc. etc.

bogomipz 4 years ago | | |

>"Wonder if their RSU grants have a clause that converts them to cash in the event of a takeover"

This is an interesting question, since RSUs are a big part of total comp, but how are unvested RSUs dealt with when the stock is retired? Are those put on a future cash comp schedule? And if so, at what conversion rate?

syshum 4 years ago | |

I am sure there is a small set of vocal employees blasting internal communications with end-of-the-world messages, a majority of employees just wanting to keep working and get their salary, and a different small set of employees hoping Elon purges the first set, making the workplace better.

coffeeblack 4 years ago | |

There are probably some secret celebrations.

paulpauper 4 years ago | |

I am sure those with in-the-money options are happy.

datalopers 4 years ago | |

Sell RSUs and get out.

standyro 4 years ago |

I tried to make a pull request already, haha.

error forking repo: HTTP 403: The repository exists, but it contains no Git content. Empty repositories cannot be forked. (https://api.github.com/repos/twitter/the-algorithm/forks)

My thoughts:

- Explicit rules for temporary and permanent bans

- Edit button

- More fun and thoughtful conversations like HN

- Less thought bubble Brooklyn based reporters, less VC and side grind hustle snake oil, maybe more comedians and memes?

semitones 4 years ago | |

Care to explain "thought bubble Brooklyn based reporters"? Did you choose Brooklyn for the alliteration? Or is there something about my home I should know about?

JohnWhigham 4 years ago | | |

Large amount of blue checkmark journalists live there, simple as that.

standyro 4 years ago | | |

It was just a joke. I have many friends in Brooklyn. I just really dislike the culture of Twitter addicted journalists who see the world through a myopic lens of whatever is trending on Twitter is important to cover, and there’s quite a lot of them in New York.

jdrc 4 years ago |

I know the algorithm I use: it ends with ORDER BY date DESC.

edouard-harris 4 years ago |

Assuming Twitter is serious about publishing their feed algorithm [1], it's possible they're merely anticipating the EU's upcoming Digital Services Act which was finalized over the weekend. Among other things, the Act will compel large platforms to "make the working of their recommender algorithms (used for sorting content on the News Feed or suggesting TV shows on Netflix) transparent to users." [2]

Twitter's EU user base is probably [3] above the 45 million threshold that triggers the strictest transparency requirements under the Act. So perhaps they figure if they're going to be forced to disclose anyway, they might as well do it proactively.

[1] If it's even coherent to talk about their feed ranking system as a single algorithm — see the other comments in this thread.

[2] https://www.theverge.com/2022/4/23/23036976/eu-digital-servi...

[3] https://www.statista.com/statistics/242606/number-of-active-...

nighthawk454 4 years ago |

Seems weird to start as a non-private repo until there's some content. Also bit of an unusual name. Can't tell if this is internal trolling or the future

pavon 4 years ago | |

The Twitter board has unanimously approved Musk's purchase, pending approval by stockholders, and Musk has stated that once he owns the company they will open source the "algorithms" for transparency.

So not a troll, but yes it is odd to put up an empty repo, and announce the repo before there is anything in it.

madeofpalk 4 years ago | | |

Seems like a perfect troll/joke, especially given everything.

extheat 4 years ago | |

They probably have their own internal Git hosting service. Pushing to the public Github repo could be done at a later point, but the Git repo hasn't actually been created yet here, just the project on Github.

nickysielicki 4 years ago |

Surely you guys don’t think that Twitter’s sorting algorithm is already factored out into its own repo. Of course it’s empty.

That doesn’t mean it’s a joke, I see it as a show of goodwill — that there are a handful of people inside Twitter that are excited for transparency and for a revenue model that isn’t entirely based on ads, that are excited to get to work on this right away.

pddpro 4 years ago |

Does it remind anyone of Po and the Dragon scroll from Kung Fu Panda?

rickreynoldssf 4 years ago |

Wait until Musk finds out it's a bunch of gnarly PHP 5.4 code, much of which is a black box everyone is afraid to touch.

Sirened 4 years ago | |

The Timeline is actually just a SQL expression with 500 sub-queries

paxys 4 years ago |

I'm going to guess some engineers at Twitter with Github org permissions are having fun with the "release the algorithm" discussion.

Sirened 4 years ago | |

haha that's what I figured, someone decided to quit if Musk acquired Twitter and figured they'd just leave one last practical joke

Barrin92 4 years ago |

whatever will show up in this repo, I hope people realize that depending on what data you put into some algorithm you can get whatever output you want, and twitter is never going to (and neither can or should they) publish everyone's personal information and interaction on the site.

So I'm not sure what the ultimate point of this exercise is other than producing faux-transparency.

bigfudge 4 years ago | |

I still think there could be lots of interest, even without the data. In fact some of the most interesting parts would be the algorithm in the broadest sense — how tech interacts with company policy and SOP. For one small example, what aspects of moderation/banning happen automatically without any further human intervention?

xena 4 years ago |

I can't believe they missed the chance to make it a rick roll. Such a wasted opportunity.

unethical_ban 4 years ago |

There are elements of their algo that I think should be openly defined, and perhaps there should be some regulatory branch that reports to Congress that has full access. However, obfuscation is often necessary to countering bad actors.

LegitShady 4 years ago | |

>perhaps there should be some regulatory branch that reports to Congress

I think only if you offer Twitter users the level of First Amendment protection they'd expect with a government body. Otherwise reporting to Congress would be a bald-faced circumvention of the First Amendment. Twitter is a privately held company with no need to report to Congress.

readbeard 4 years ago | |

On the other hand, wouldn't open sourcing the algorithm help accelerate the identification of possible exploits?

unethical_ban 4 years ago | | |

"algorithm" here isn't some fancy, hard to debug code. It is the business logic of weighting tweets and how recommendations are made.

There is great opportunity to abuse this by Twitter, yes. There is also a lot of money to be made. But in defense of some of that being secret, is the fact that any publicly known ruleset (with no hidden exceptions) _will_ be exploited by bad actors. Imagine if search engines told spam sites exactly why their site dropped in page rankings.

newbamboo 4 years ago |

The government, at federal, state and local levels, all rely on Twitter to conduct official taxpayer funded work. Taxpayer funded work should not happen on proprietary systems that operate with zero oversight or public transparency.

Elon polled Twitter users about this and the response was overwhelmingly in favor of open source and transparency. Everyone on Twitter got a vote.

If you oppose transparency, as many now are, you lose your credibility. So it’s another one of Elon’s people hacks, and look at all the morons falling for it.

EMIRELADERO 4 years ago |

Kind of unrealistic but I hope Twitter now open-sources not only the algorithm but also the Rails monolith itself. Would be kind of interesting to see how everything is done

influx 4 years ago | |

The rails monolith is long gone.

EMIRELADERO 4 years ago | | |

In that case the same applies for the microservices

kaladin-jasnah 4 years ago | | |

That gives even more reason to open-source it, right? If Twitter isn't using that codebase anymore.

qgin 4 years ago |

I have literally no idea how a "twitter algorithm" could be published on github. Maybe I've been doing recommender systems wrong.

solenoidalslide 4 years ago | |

You can publish data flows and models to github in addition to source code components.

bpodgursky 4 years ago |

I'm very technical and I think it would still be valuable to have a list of all the things that are weighed in the timeline view, even without the models or underlying data.

Like, there's no public admission right now of whether "shadow banning" or "ghost banning" is even officially a thing!

Some transparency seems unquestionably more powerful than none, and we can work from there.

yabones 4 years ago |

There is something vaguely threatening about this.

rvz 4 years ago |

Perhaps Twitter will be the new Mozilla if it decides to open-source 'everything' then.

Maybe that is where it is going.

holtkam2 4 years ago |

I don't get it ¯\_(ツ)_/¯

Traster 4 years ago |

At the time of posting, Will Norris (the open source lead at twitter, admin of their github account presumably) posted this. It has 44 retweets, 193 likes, 17 quote tweets, on github it has 1.6k stars.

That seems... bizarre to me?

______-_-______ 4 years ago | |

That's more stars than I'll ever have on any of my repos. Maybe I made a mistake not joining a faang

enw 4 years ago | |

Nope. People are just excited that the Twitter cesspool might finally improve.

sakopov 4 years ago |

I agree that there is no such thing as "the algorithm." It is Twitter in its entirety. And with that I have a wild question. Can Musk make Twitter fully open-source on GitHub?

g105b 4 years ago |

Can someone explain this to me? All I can see from this link is an empty GitHub repository. Not sure what I'm missing here.

tux1968 4 years ago | |

It's just a declaration of intent at this point.

topspin 4 years ago | | |

It's 404 today.

threeseed 4 years ago |

Anyone who actually uses Twitter already knows the algorithm:

* Chronological - reverse sort by date

* Home - for all of the followed topics, recommended topics, retweets and tweets in the past day determine the estimated level of engagement, include the highest and reverse sort by date. This is likely to be a fairly basic ML model.
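As a minimal sketch of the two modes listed above, assuming each tweet already carries a `predicted_engagement` number from some model: the field name, the `keep` cutoff, and the 24-hour window are all hypothetical, chosen only to mirror the description.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    id: int
    created_at: datetime
    predicted_engagement: float  # assumed output of some model, not a real field


def chronological(tweets):
    """Latest-first timeline: reverse sort by creation date."""
    return sorted(tweets, key=lambda t: t.created_at, reverse=True)


def home(tweets, now, keep=2, window_hours=24):
    """Keep the most engaging tweets from the window, then show them newest first."""
    recent = [
        t for t in tweets
        if (now - t.created_at).total_seconds() <= window_hours * 3600
    ]
    top = sorted(recent, key=lambda t: t.predicted_engagement, reverse=True)[:keep]
    return chronological(top)
```

The hard part in practice is whatever produces `predicted_engagement`; the sorting around it is the trivial bit.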

It will be uncontroversial, technically unsophisticated and of no practical use to anyone - users, developers or researchers.

This is not going to be PageRank where some genuine new insight was discovered.

whimsicalism 4 years ago | |

What are you basing this on? I wouldn't assume response prediction is using a "fairly basic ML model," it can be a lot more than that.

threeseed 4 years ago | | |

If it's a simple, constrained problem where the number of available features is low then inherently the complexity can never be high.

I've built hundreds of models and run a ML company and I don't believe it's technically possible for this rule not to be the case.

tomcam 4 years ago | |

You forgot the gigantic amount of human intervention required to unperson people who tweet against Twitter‘s political interests

root_axis 4 years ago | | |

What does "unperson" mean?

Synaesthesia 4 years ago |

So nobody is being shadowbanned or suppressed?

dimgl 4 years ago | |

Last I heard, Twitter has internal tooling that allows moderators to shadowban or suppress.

hazb 4 years ago |

"The algorithm" could mean a lot of things. Whatever it means, it probably spans hundreds or even thousands of services. That doesn't mean it cannot be made open-source.

I imagine they'd probably start with documentation and white-papers that communicate "here's how we intend for it to work".

It's seriously unlikely anyone at Twitter actually knows how any non-trivial algorithm in the company works. To figure THAT out, they could decide to do a company-wide documentation and instrumentation push like they probably would've had to do for GDPR anyway, which is painful and boring and going to take a very long time.

Failing that, they could just say 'the algorithm as it stands is no longer fit for purpose, given part of its core requirement has become that it needs to be transparent and publishable, and presumably legible. We need to make a new one. Publish the core algorithm. We probably won't deploy it in that exact state, it's going to span multi-services and so on, you obviously don't get the data we used to train the models, but we will work backwards from it and here's an open mechanism to measure how true-to-form it actually is'

tmaly 4 years ago |

I could see GPT-3 being added in the empty space.

minroot 4 years ago |

Why do we want to know the "algorithm"?

bee_rider 4 years ago | |

Clearly it will contain a straightforward bias against our pet interest, confirming a grand conspiracy and validating our paranoia.

zelon88 4 years ago |

if ($has_blue_checkmark) show_post_to($everyone);

rzarate 4 years ago | |

Ugh, php...

qudat 4 years ago |

I’ve spent the better part of a decade writing open source projects for few to see. An empty repo gets hundreds of stars immediately. It’s all a popularity contest.

drnonsense42 4 years ago |

Apples are red. The sky is blue. Twitter shadowbans and tinkers with who sees who. I wonder what the old guard will do with the codebase over the next few months.

LugarOS 4 years ago |

It's empty.

heffer 4 years ago | |

It's a performance.

enahs-sf 4 years ago | |

Did it get taken down already? I think open-sourcing the algo would materially change the value of the deal.

lhoff 4 years ago | | |

Twitter's main value is its user base (especially including most journalists, "stars", politicians, companies,...)

I wouldn't say the algo itself is worthless, but my estimate would be nowhere near even $1B.

baisq 4 years ago | | |

Taken down by whom? It was posted on the official github account of twitter.

a-dub 4 years ago |

it's probably just a ripoff of pagerank with a separate spam filtering and banning system along with an army of contractors manually fixing it up.

if twitter is a game, sinking $43bn into it is kinda like winning or losing the grand final boss level. (unclear which)

wish elon would get back to facilitating the building of useful things. we still don't have a great clean energy generation story.

TrapLord_Rhodo 4 years ago |

Musk's first order of business?

LegitShady 4 years ago | |

clean house

oxplot 4 years ago |

Musk has repeatedly talked about "open sourcing" Twitter's algorithm. Given Musk is (understandably) super impatient, this repo may be his first move. I expect this to start with a bunch of READMEs and other high-level docs and evolve into details and eventually code.

12ian34 4 years ago | |

Seems like the most reasonable take. This move feels in-character for Musk.

asd88 4 years ago |

#drama?

4e530344963049 4 years ago |

Nice, making it much easier to game!

arthurcolle 4 years ago |

is this performance art?

ArtWomb 4 years ago |

It was all in your head ;)

NaturalPhallacy 4 years ago |

Not "the algorithm", but you can check if twitter is silently suppressing your account here: https://taishin-miyamoto.com/ShadowBan/

u1tron 4 years ago |

It's already gone.