I was curious about how much code on GitHub is generated with Claude Code, and this is my attempt at finding that answer. Spoiler alert: it's a lot - around 19M commits by my count.

In a nutshell, it is a dashboard that presents some basic, and hopefully interesting, stats about commits signed by Claude Code in public repos on GitHub. Not all commits are signed (via the author field or a commit "trailer"), and many repos are private, which means Claude's reach is probably wider than what you see here. But I think it's enough to see the spread and learn a bit about how it's used.

Technology-wise, it's a pretty basic Next.js app with Recharts for graphing and PostgreSQL for the DB. I started off using BigQuery because I estimated I would need the analytical scale, but I eventually pivoted to Postgres because the small writes and the frequent reads for deduplication became too expensive.

The ingestion/backfill job is the more interesting part, since I went from severely under-engineering it (start smol and all that) to ending up with a bare-bones, but capable, ETL pipeline. The main challenge in reading the data was GitHub's rate limits, both on their search API and on their GraphQL API. On search it's 30 req/min, on GraphQL it's 5000 req/hour - per access token. Because of this difference, and the difference in response times, I split the work:
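
For illustration only (this is not the code behind the dashboard): a minimal TypeScript sketch of pacing requests against the two budgets quoted above. The `Pacer` class and `paced` helper are hypothetical names, and the sketch assumes the simplest strategy, evenly spaced calls per token, whereas GitHub's real accounting is more nuanced than a flat request count.

```typescript
// Hypothetical helper: spaces calls evenly so that at most `limit`
// of them start within any `windowMs` window.
class Pacer {
  private nextSlot = 0;
  readonly intervalMs: number;

  constructor(limit: number, windowMs: number) {
    this.intervalMs = windowMs / limit;
  }

  // Reserves the next free slot and returns how long (in ms) the
  // caller should wait before sending, given the current time.
  reserve(nowMs: number): number {
    const start = Math.max(nowMs, this.nextSlot);
    this.nextSlot = start + this.intervalMs;
    return start - nowMs;
  }
}

// Budgets taken from the figures quoted in the post (per access token).
const searchPacer = new Pacer(30, 60000);        // 30 req/min   -> one call every 2000 ms
const graphqlPacer = new Pacer(5000, 3600000);   // 5000 req/hour -> one call every 720 ms

// Wraps any async call so it waits for its reserved slot first.
async function paced<T>(pacer: Pacer, call: () => Promise<T>): Promise<T> {
  const waitMs = pacer.reserve(Date.now());
  if (waitMs > 0) {
    await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
  }
  return call();
}
```

With one pacer per API, a slow search loop and a faster GraphQL loop can run side by side without either one eating into the other's budget. Even spacing gives up the ability to burst, which is a deliberate simplification here.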
There is currently a bit of lag in reading these commits, and it is still pulling historical commits, which is why the most recent dates are a bit low on commits and why some repos don't have a language set yet.

I wouldn't say it's 100% done - I still want to improve the ingestion, and I think there is more I can extract from the data - but I have definitely enjoyed looking at what I have so far. Let me know if you have an idea for what I can add to the dashboard, or can think of something else I should also be reading.

For some more info on my methodology and the evolution of the backfill job, head to the About page. :-)