Who could possibly have predicted that happening?
Predictably, everyone started talking in Slack like their jobs depended on it. Everyone was responding to everything. Instead of writing out a complete message and pressing enter, they'd send each fragment of the sentence as a new line.
The Slack leaderboard was never shown again. Unfortunately the habit remained because people were afraid they were going to be secretly judged by how much Slack activity they generated.
I expect the same thing is going to happen at companies that had token leaderboards. Once you've instilled that fear in people, they internalize the expectation.
Insanity
No amount of "this isn't used for anything" will change that. It's inherent in human nature in the 21st century to believe any and all metrics will be used against them, and therefore must be gamed.
It's why you also have to set UNBELIEVABLY clear goals and have incentives tied to those goals. Incentives meaning money. If you want to measure things, measure them. But have clear, consistent, and meaningful goals tied to bonuses or something if you want a thing done correctly.
> Oh wow! If I paid for this myself I would have spent a lot of money! Are other people spending as much as me? I’m going to create a leaderboard!
> Oh no, my misinformed manager is using the leaderboard as a sleight of hand for work. I need to game this now.
Then the leaderboard is banned… I can’t see how this ever really goes up the chain beyond director.
Charles Goodhart :-)
He got drastically more serious about AI in 2022 and 2023, and he has nothing to show for it.
Heck, he could have rented GPUs the way Elon did at this point and either slowed the bleeding or stopped it. I'm not sure how many he has, but it beats losing this badly.
If he doesn't wake up and learn how to business, I suspect he will lose the empire he's built up for himself.
"Meta building cloud business to sell excess AI capacity, Bloomberg News reports"
https://www.reuters.com/business/meta-sell-excess-ai-computi...
Everyone except the executives who get paid millions to predict exactly that.
It's a hard job, someone has to not pay consequences for bad decisions.
People who make it to manager tend to have bozo tendencies and are yes-men.
Before, it was lines of code and Jira tickets closed. Now it's tokens spent.
The subscriptions are for personal use, not enterprise.
i.e. [1] "This article is about paid Max plans for individual consumers. If you're part of an organization looking to use Claude with your team, refer to Team and Enterprise Plans."
[1]: https://support.claude.com/en/articles/11049741-what-is-the-...
I could believe it, but I'd want to see something a little more concrete.
Just wonder what happens when more and more companies introduce similar restrictions. Will that lead to devaluations of the LLM companies?
It wants to see faster R&D, higher revenues from existing assets, greater operating margins, higher sales to invested capital ratio and so on…
The best way to measure that for a software firm is uptime of services, usage, and project completion time.
This is also not easy. In particular, proactively preventing bugs is not rewarded.
The main way I think you can proactively prevent bugs in a meaningful way is by crafting and propagating better architecture.
Better (or worse) architecture and adoption of it can be measured through a mix of quantitative and qualitative means so those metrics could be used to evaluate the impact of the engineer driving that architecture.
When shit just works for months or years no one is going to come and praise you for stuff you did a while back.
You are better off breaking stuff and then fixing it to show how useful you are.
Just a pristine comment section yap.
The times I’ve been asked to evaluate a prospective candidate and I see that product on their résumé, it’s been an instant veto, in the same category as working at Palantir.
It's not that difficult to say it confidently if you use any of their services and applications, because exactly nothing has changed.
For reference, labor productivity increases over the last 50 years have mostly amounted to about 2% per year. If a hypothetical FB engineer had doubled their productivity with their gazillion tokens, that would be roughly 35 years of productivity gains in one year. I'd wager it would be quite evident if you opened any of their apps.
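The back-of-the-envelope comparison above can be checked in a few lines. This is a minimal sketch that takes the ~2% annual figure from the comment as given and asks how many years of such growth a one-time doubling corresponds to, with and without compounding. The function names are made up for illustration.

```python
import math

def years_to_multiply(factor: float, annual_growth: float) -> float:
    """Years of compounding growth needed to multiply output by `factor`."""
    return math.log(factor) / math.log(1.0 + annual_growth)

def years_to_multiply_simple(factor: float, annual_growth: float) -> float:
    """Same question without compounding: each year adds a fixed share of the base."""
    return (factor - 1.0) / annual_growth

# Assumption from the comment: about 2% per year; "doubling" means factor = 2.
compounded = years_to_multiply(2.0, 0.02)
simple = years_to_multiply_simple(2.0, 0.02)
print(f"compounded: {compounded:.1f} years, simple: {simple:.1f} years")
```

Depending on whether you compound, a doubling works out to somewhere around 35 to 50 years of typical gains, so the order of magnitude of the argument holds either way.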
Meta sounds like a cluster-F of a place to work. Massive reorgs around wild ideas like the metaverse and everything AI all the time. Employees terrified of being fired. Incentivizing token spending and then cutting it off. While the overall company may be fine, the dev department sounds rudderless and absolutely miserable.
I'd argue most of the AI value is related to how 'Dead' the internet is.
Ultimately the spend on tokens has to benefit the firm financially or it won’t continue spending on it.
IMO Claude, ChatGPT/Codex, etc. should be able to make the PDF use case extremely token-efficient, since it's such an obvious one. But when I start explaining to my wife or friends why it burns through so much quota, I find myself thinking, "why should they have to understand this aspect of it?" To me, the fact that the details of PDF parsing and extraction matter to users at all (instead of being solved so you don't have to pay attention to them) shows that these tools are not nearly as "ready" as they are made out to be. I may be preaching to the choir on this one, but that's just my 2c.
Gemma 4 works perfectly well offline on limited hardware (I have an 8GB video card) and can handle extracting text from image-based PDFs just fine.
Take a PDF -> run it through MarkItDown [1], using the OCR plugin if you need it (point it at Gemma 4) -> now you can ask Gemma 4 questions about the (Markdown) document.
I am sure Gemma 4 could even create a GUI to make this process very simple for a non-technical user.
This workflow is highly optimized.
This discussion was about measures, goals and incentives. Follow the incentives.
You can rack up token consumption extremely quickly when you embed LLMs into automated processes or products.
I'd be very surprised if these numbers are just typical coding usage with no scripting/pipeline/automation stuff.
Having a speed limit does not imply the utility of driving is zero.
But yeah, it's like they've never actually met human beings...
Source: my last job working with accessibility and that nightmare.
If so, your metric cannot distinguish between a bad engineer and a good one.
If not, you have the same problem you started with: measuring contributions to “uptime”.
A metric that moves in the same direction and by the same amount for everyone because of an external event isn’t a problem. The great engineer’s delta will still outweigh the poor engineer’s, since the movement due to external circumstances is the same for both and therefore cancels out.
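A toy illustration of that cancellation argument, with made-up numbers: each engineer's observed score is a hypothetical individual contribution plus one external shift that hits everyone equally. The names and values are invented for the sketch and do not come from any real metric.

```python
# Hypothetical individual contributions to an uptime-style metric (made-up numbers).
contributions = {"great_engineer": 8.0, "poor_engineer": 2.0}

def observed_scores(contributions: dict[str, float], external_shift: float) -> dict[str, float]:
    """Observed metric = own contribution + a shift shared by everyone (e.g. an outage)."""
    return {name: value + external_shift for name, value in contributions.items()}

def gap(scores: dict[str, float]) -> float:
    """Difference between the two engineers' observed scores."""
    return scores["great_engineer"] - scores["poor_engineer"]

calm_quarter = observed_scores(contributions, external_shift=0.0)
rough_quarter = observed_scores(contributions, external_shift=-5.0)

# The shared shift moves both scores, but the gap between them is unchanged.
print(gap(calm_quarter), gap(rough_quarter))
```

The cancellation depends on the external effect being additive and equal for everyone, which is the assumption the comment makes.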
The answer is simpler on the surface: focus.
Generally the problem is the larger the firm’s operations, the harder it is to focus.
Apple is the only firm that has done well on this consistently and doesn’t have a huge graveyard of failures to show for it.