Leaving GitHub for Forgejo (jorijn.com)
also: releases, packages, actions... it's all very convenient
I suppose forgejo is meant to be a lightweight, low maintenance, feature-rich, self-hostable otherwise GitHub-like free software.
But I suppose you could open issues to propose your ideas.
Auth using OpenID Connect working well, would recommend good security hardening.
Runners are in EC2 auto scaling groups. Don't have to be though, there is also a K8s based solution out there I believe.
Given the trending-to-zero cost of writing software, a local, self-contained source code management tool would be a natural fit for delivering software. That is, software becomes an agent with specialized code as its harness (in addition to its default ones), and source control would be one default tool for the agentic software to manage its own evolution.
Most of git is append only immutable blobs - just sharing these between devices just works for me. "users" and authentication is handled by syncthing.
I have pre and post hooks to make sure no device tries to change HEAD of a branch owned by another device, just to be safe, but it hasn't been activated once yet.
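A minimal sketch of the kind of ownership check such a hook could run. The comment does not say how branch ownership is recorded or which hooks are used, so the `<device>/<topic>` naming convention and the function names here are assumptions for illustration only.

```python
from typing import Optional

# Hypothetical convention (not from the comment): each device owns the
# branches named "<device>/<topic>", e.g. "laptop/feature-x".
HEADS_PREFIX = "refs/heads/"


def branch_owner(ref: str) -> Optional[str]:
    """Return the owning device for a ref such as 'refs/heads/laptop/x'.

    Refs outside refs/heads/, or branches without a device prefix,
    are treated as unowned and yield None.
    """
    if not ref.startswith(HEADS_PREFIX):
        return None
    name = ref[len(HEADS_PREFIX):]
    owner, separator, _topic = name.partition("/")
    return owner if separator else None


def may_update(ref: str, this_device: str) -> bool:
    """Allow an update only to unowned refs or refs owned by this device."""
    owner = branch_owner(ref)
    return owner is None or owner == this_device
```

A hook script could call `may_update` for the ref it is about to move and exit non-zero when it returns `False`; how that is wired into a particular hook is left open here.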
Definitely the easiest way to approach this, and the most standard way too. If you already have ssh, which I'm guessing you do if you managed to set up syncthing on it in the first place, then you can literally just point git to host+path and it'll use whatever ssh authentication you already have in place.
Can hardly get simpler :)
Federating to self-hosted means you are depending on google for discovery, which will never rank.
If you need a backup remote, just backup to Google Drive / iCloud. Private team collab? run a server.
While I understand the frustration with github, the prescription to self host seems like cutting off your nose.
It’s easy to make the case that Forgejo is good enough for public hosting. Gitea probably is, too, but there’s less direct evidence.
I do kind of worry though: there's a broad trend of countries trying to become less reliant on each other, and in my mind the long period of peace we've had in much of the west this past half century has partly been because we're all in business together.
At this point I just need to finish cleaning up my GH profile by deleting the few repos I still have there.
GH was fun while it lasted…
Frankly, the modern internet as a whole is scary. Google has so much power, Github, Meta, etc., they all control such fundamental parts of society now and get to run free since they’re private companies. Not saying they should be government owned, that would be drastically worse, but some more detailed oversight would be nice.
GitLab is getting too heavy for my needs
Also syncing is automatic and immediate, I don't have to initiate push and pull.
In contrast, Forgejo hosts its code on Codeberg, and Codeberg is running the Forgejo software.
Git is decentralized, GitHub is just another place you can host your code in, but you can push your code to multiple remote servers.
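As a rough sketch of what "multiple remotes" can look like in practice, the helper below builds the git commands that make one remote push to several servers. The remote name and URLs are hypothetical placeholders; the only git features relied on are `git remote add` and `git remote set-url --add --push`.

```python
from typing import List


def multi_push_commands(remote: str, urls: List[str]) -> List[List[str]]:
    """Build git commands so that pushing to `remote` pushes to every URL.

    The first URL doubles as the fetch URL. Every URL, including the
    first, is then added explicitly as a push URL, because once any push
    URL is configured git no longer pushes to the fetch URL implicitly.
    """
    if not urls:
        raise ValueError("at least one URL is required")
    commands = [["git", "remote", "add", remote, urls[0]]]
    for url in urls:
        commands.append(
            ["git", "remote", "set-url", "--add", "--push", remote, url]
        )
    return commands


# Hypothetical servers, for illustration only.
EXAMPLE = multi_push_commands(
    "origin",
    [
        "git@forge.example.org:me/project.git",
        "git@github.com:me/project.git",
    ],
)
```

Each entry can be handed to `subprocess.run(cmd, check=True)` inside a repository; after that a single `git push origin` goes to both servers.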
So, no thanks. I'll not be committing any personal code there anymore.
And no, I don't care for the social aspects either. Discoverability, stars, and AI bot powered issue bombardment.
I'm fine like this.
Also, remember, "Open Source is not about You".
And, even if you move your repository somewhere else, can you really prevent anyone from uploading it to Github? To do so, you may have to create your own open source license.
Did they train autocomplete? I mean the code is open source so anyone can scrape it and train it too. I'm kind of glad they did train it because otherwise we'd still be stuck with Apple level AI models right now.
The whole reason we have so many models, including open weight models, that are all competitive with each other is because the data is free and anyone can be training off it. If the goal was to monetize the source code I guess the authors shouldn't make it open source.
This is a silly opinion to hold, isn't it? I mean, you release projects under a license with the express purpose of freely distributing your code among anyone in the world that may have any interest whatsoever, and even allow them to share it with anyone they see fit. But you are somehow outraged if people actually use said code?
Please make it make sense.
Leaving is still the right move. But this applies to all centralized large services: Our use of Google and Google Drive, any Microsoft products, Adobe products, etc.
As long as your code is out in the open anywhere, the slop factories can train their models on it, regardless of whether it's on GitHub or your private server. So you might as well keep using GitHub at least as a gratis mirror. My reasoning is that if I'm going to be milked anyway I might as well freeload off GitHub.
> As of v9.0 in August 2024 the project relicensed from MIT to GPLv3+, with the explicit goal of staying copyleft and resisting future commercial capture of the codebase.
The value of copyleft for decentralization is too often overlooked.
About an hour of work, small and frankly trivial diff: https://peoplesgrocers.com/code/forks/forgejo/pulls/1
I didn't have to fight the architecture at all, the seams were right where I needed them. Added a migration that adds a boolean column to the repo config table, a few tweaks in the permission middleware, and voilà, it just worked. Really excellent decoupling in the Forgejo codebase [1]
You can't do anything like this with GitHub. That's the actual freedom! Separate from the where-do-I-host-my-git question. There is a big difference between software that "sure technically I can change it since I have access to the source" vs software that's been constructed specifically to be customized and changed.
[1] Permission checks live in obvious places, the template system let me modify UI without touching unrelated code. Someone (many someones) clearly cared a lot about keeping this codebase modifiable by outsiders, and it shows. That's hard to do and should be more celebrated.
Issues, releases, CI, docs, security advisories, search and discoverability all tend to get coupled to GitHub over time.
For open-source projects, I like the idea of self-hosted as the source of truth, but still keeping a read-only GitHub mirror so people can actually find it.
We let Microsoft parasitize our brains with this. The software community has long had alternate forums. GitHub isn't even a particularly good one, and it's recently just become a swamp of generated content, fake stars, and mining your content.
In the last couple months at least once a week I get some LLM generated phishing spam from some bot that "found your projects on GitHub and want to collaborate" etc.
And it's well documented now how you can just go out and "buy" GitHub stars.
Please. Cut the umbilical.
I think trying to re-host git itself might be more trouble than it's worth. My kingdom for someone to build this so I don't have to use ADO boards anymore.
Does the code really need to be hosted in a central location like this? (Clearly not, which is why people are leaving GitHub in the first place)
But the one part GitHub provides that's genuinely valuable is the social aspect, and when you get a PR from a user named torvalds you can trust that this is in fact Linus. This isn't the case with more distributed systems.
That's why I'd really like to see some entity handle just the auth/identity providing. Forgejo/Gitea/GitLab instances can then choose to use that. Then, for example, if you want to take on another contributor and they have their own Forgejo instance, you can invite them through this provider; when they fork your repo it ends up in their own Forgejo, and they can easily create PRs into your repo.
Agree, I feel like a true alternative should focus on this missing piece to bridge the gap.
GitHub is still really, really nice in that it’s five seconds to throw up a repo that’s accessible worldwide (98% of the time lol) and everyone’s on there. Whatever replaces it (just like whatever replaces twitter) may be better in many ways, but it will be “worse” in others, even if just in splintering.
But bzr lost the battle, Canonical was slow to adopt Git, lack of investment in the platform, so it was another lunch that got taken from them.
Yes, I understand that people are upset about the Copilot issues and maybe even the "frequent" outages (which usually only affect fringe parts of the site not everyone uses daily)
It's good that there are other solutions (forge, sourcehut, whatever) but most projects are still alive and very well on GitHub and my guess is that this will stay for a while.
Also, personally I have no issues with GitHub training AI on my (badly-coded and bug-ridden) code if they really want to :)
[1] https://www.reddit.com/r/github/comments/1snqyj3/is_there_an...
That's precisely what I have been doing for years, I still get to keep one foot in the GitHub ecosystem. I still get most contributions on GitHub, and that's fine by me. When a PR comes in, I check it out locally, review it, and when it's done I merge locally to master, push to origin, and then origin pushes to GitHub which then automatically closes the PR and gives the author attribution. I never actually have to interact with GitHub after the initial pull.
> Everyone seems to be leaving GitHub
A small minority is leaving Github; this group is more likely to write articles about the choice than those who still use Github.
Also "just use multiple remotes" doesn't solve the problem. If you don't trust GitHub you shouldn't be pushing code there to begin with. So the ideal platform for hosting a GitHub equivalent needs to be a trusted one as well.
1) MSFT is using our free labour to train its AIs.
2) Being Canadian, I do not want any dependence on a US-based company, seeing as the US can no longer be considered a trustworthy ally.
And here lies your misconception: services such as GitHub are really not about git. That's a red herring. It's not about tooling either. People use services such as GitHub because of things like issue management, access control, release management, project pages, and CICD integration. You click on a button and you create a repository that's automatically added to your organization, with all access controls sorted out. You click on a button and you grant read access to someone. You click on a button and you onboard a whole team.
Underneath it all, it's completely irrelevant if you are even using Git. Some people even use github's CLI interface instead. Does it matter if it's git or not? Do you even care?
I have personal projects hosted and mirrored across GitHub, Gitlab, and BitBucket. That works, but only as far as backups are concerned. Even in projects that onboarded onto a third party CICD system, git is really not the reason for picking one service over another.
Because git is a kind of filesystem.
How a TEAM operates IS NOT.
And that is the point of Github.
There is no escape to the coordination problem!
(And if you say mails, patches, and other asynchronous ways: same thing, more complex)
Another good contender is Radicle[4][5], which is completely decentralized on top of Git.
[1] https://codeberg.org/forgejo-contrib/federation/src/branch/m...
[2] https://liberapay.com/forgejo
It's a shame that all these companies that benefited from open source have poisoned the industry like this
If the projects I am interested in are elsewhere I’ll meet them where they are.
Has anyone tried this?
https://tangled.org/h14h.com/knot
Overall, I think the platform looks really promising. The AtProto separation between Personal Data Servers, Relays, and AppViews seems like the appropriate set of trade-offs, IMO.
Being able to host my git repos as a headless, data-only server is about as painless as self-hosting can get. Compared with ActivityPub solutions (like Forgejo), it's great that I get to side-step the tedium of hosting and scaling an entire webapp when all I really care about is controlling my data.
Since the initial setup, the only ops maintenance I've had to do is bump the knot-server version and redeploy (tangled.org displays a banner warning notifying me when it's outdated).
Excited to spend some more time with Tangled on other projects and test out their other features! I'm particularly interested in their native support for jj and stacked PRs.
I hope they don't start charging for regular use of GitHub, but when I see how some of the vibe coders make thousands of commits a day, I'm becoming more and more skeptical. Would be a real shame if we can't share and cooperate on code for free.
This is what Anthropic is already doing with CC, and tbh GitHub and GitLab are probably doing the same. The cost is some hate from devs on Twitter and random small subreddits ofc, but I bet that's well worth it!
OTOH, it does kinda blow my mind how often I see people (on /r/vibecoding and elsewhere) paying for a $200/mo subscription to produce what amount to hobby projects and toy sites. I've been known to make some silly money decisions when I can afford it, but this feels different.
I guess it's a $2400 annual subscription to a service providing Meaning and Purpose? If you're around 40 and realizing that you'll never be rich or famous, this might actually be affordable compared to the alternatives!
People spend more than that on their (non-vibecoding) hobbies. Think of folks who do woodworking, 3D printing, sailing, car racing, etc. So $200/mo is not excessive if they get their enjoyment out of it.
When/if you need to change hosting providers, you get to lose zero data in Fossil because of it.
Network effects. I just cannot bring my team to use Fossil. They have to share code with others. Other departments. And everyone (99%+) uses git. It just feels like a disservice to force them to use Fossil. It is a catch-22.
It is similar to so many other things in the tech space. Trying to get fellow developers to use functional style idioms. Trying to enforce immutability. It is like something big (like a facebook or google project) has to force the community to get on board.
I’m thinking about making a public instance and using it over HTTPS while minimizing the attack surface. Any recommendations, especially about Gitea/Forgejo?
All I want is hosting and a read only web interface, plus access control in case I have collaborators.
All the offerings are enterprise priced because they offer "minutes of CI", "AI assistants" and other icing on the cake.
The outages might be due to AI load, but that’s not relevant because your leaving isn’t due to outages. Even though the article is primarily about outages.
If you have a problem with your code being scanned for AI training, then write that article.
But this article is about outages.
Illegal tariffs, threatening tariffs, NATO sabotage, threatening invasion, abandoning Ukraine, supporting Russia. It's not just one guy doing this, he has a whole party behind him that could reel him in any second.
I can imagine an alternate history in which Tim Berners-Lee founds an internet giant in Europe that dominates the 21st century. Imagine a European Google with a search monopoly, a smartphone (from Nokia), and a social network (Xing? Mastodon?). But of course, Sir Timothy never wanted that.
Alas, we're on a different timeline. I wish I were a young man so I could live to see how it all turns out.
I’ll admit I got tired of reading about outages and didn’t absorb the remainder of the article well.
Just skim the comments on the internet and you see the same flavor of dumb, false narratives like "everyone is leaving GitHub" and "too bad GitHub is in decline".
Meanwhile the vast, vast majority of people using GitHub for "getting shit done", while maybe impacted and unhappy with the outages, are not clamoring to move to Web3.0 forge-bro tech.
Can you imagine /usr/bin/git rewritten into a set of composable, monetized services with agent-first APIs?
>>>5a/ Machine-scale infrastructure.
>>> Others are bolting AI onto platforms built for human-rate commits. Git itself wasn't designed for agent-rate work.
>>> We're rebuilding the underlying infrastructure for machine scale: a generational rebuild of git, the monolith giving way to composable, monetized services with agent-first APIs.
/bin/ls wasn't designed for agent-rate work either.
We know that Microsoft at least acknowledges that GitHub being down is something they are ashamed of and aims to fix. At the same time, I feel like having an EU government-run service down over a weekend because there is no one to fix it outside business hours is something that should be expected, not frowned upon.
It is actually down right now as I am writing this comment.
Will EU government service steal my data? Absolutely not, here the trust is at the highest.
Will EU government screw up a commercial service? Absolutely yes. I fully trust EU governments to regulate themselves to a full halt.
And since Gitlab seems to have looked over at what is happening at Github and decided, we want some of that, that was probably the right choice.
As a long term GitHub customer, I see many practical and personal reasons to move away from the platform. I've seen a handful of similar posts lately. A few years ago this would have felt totally fringe, but now all of a sudden it really doesn't. For now, for me and many others GitHub still works great, and is very convenient. But the alternatives are getting even easier to self implement all the time.
GitHub pulled it off for 5ish years before that began to change, and it was only last year when they stopped having their own "CEO".
The hardest parts of switching to forgejo: 1) coming up with a comfortable way to pronounce "forgejo" in my head, and 2) adapting to not having the same GitHub v3 API and needing to switch to a different CLI for PR creation, repo creation, etc.
The pronunciation thing is probably the more difficult of the two.
With my American accent, I don't quite say it exactly like the recording, but pretty close: "for-JAY-oh"
It would actually be easier for me if it was in Cyrillic or Greek letters; Форджеьо or Форджеджо or Φορτζέγιο don't have the blockers on pronunciation that Forgejo does.
You piqued my curiosity :)
edit: Ah, I was beaten to the punch :(
I don't see an alternative, and it's a bit of a stretch to expect people to follow you unless you are famous or already have that audience reach, which many of us do not.
Vanity metrics.
GitHub initially tried to shy away from this, I remember conversations with early GitHub engineers trying to make sure "Stars" and "Followers" numbers were going into the direction of being just for vanity and popularity.
Then eventually the profile READMEs appeared, which people now use for showing even more vanity metrics and bragging about how much code they can produce in how few days.
Since employers also ask you for a GitHub profile, it ends up being needed for new developers to make an entry into the industry, without it companies will basically ignore you. Unless you're really, really good, which to be honest, most of us aren't.
Sometimes you need to go where many customers (ahem, enterprise) are.
As a developer of an engineering application, Windows is the way to go first. It would have been easier to adopt one more application in the daily driver environment. Till then, I am on GitHub only.
Once you start paying for it, GitHub Actions runners are very expensive. I've used both Jenkins and GitLab before to self-host CI/CD, and you save so much using on-demand (or at higher scale, reserved) cloud instances. I do freelance DevOps work and I've helped clients with these sorts of challenges.
12 jobs per PR for up to 30 minutes running Linux, macOS, and Windows jobs on LLVM, GCC, and MSVC in static and shared builds with also some sanitiser configurations.
And consider across projects we might send dozens of PRs per week.
Right now it's somehow all fully free on GitHub Actions. I wonder what the same would cost on, e.g., CircleCI.
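For a sense of scale, here is a back-of-the-envelope sketch of the numbers above. The 12 jobs and 30 minutes come from the comment; the PR count and the per-minute price are placeholders to be filled in, not real figures from any CI provider.

```python
JOBS_PER_PR = 12          # from the comment
MAX_MINUTES_PER_JOB = 30  # "up to 30 minutes", so this is an upper bound


def weekly_job_minutes(prs_per_week: int) -> int:
    """Upper bound on CI job-minutes consumed per week."""
    return prs_per_week * JOBS_PER_PR * MAX_MINUTES_PER_JOB


def weekly_cost(prs_per_week: int, price_per_minute: float) -> float:
    """Upper-bound weekly cost at a flat, hypothetical per-minute price."""
    return weekly_job_minutes(prs_per_week) * price_per_minute


# One PR is at most 12 * 30 = 360 job-minutes.
PER_PR_MINUTES = weekly_job_minutes(1)

# Assuming "dozens" means two dozen PRs per week: 24 * 360 = 8640.
TWO_DOZEN_MINUTES = weekly_job_minutes(24)
```

Plugging a provider's actual rate into `weekly_cost` gives a rough ceiling. Real pricing usually differs by operating system and machine size, so a flat rate is a simplification.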
- Repositories seem to mirror fine for a few weeks and stop. Pretty useless. I have a PAT token for it that does not expire, and yet it seems to claim otherwise, despite the token working elsewhere when I test it.
- Sometimes there is nothing in the logs, sometimes it's the database being locked for some reason. The only thing that uses the database is Forgejo.
- So far I haven't been able to tell if this is Forgejo, crappy SD IO on the Pi causing database locks, or Forgejo sucking at being a mirror.
+ they don’t want to pay the bandwidth costs
+ they don’t want to help train a model that might ultimately put them out of work.
I don’t personally agree that AI is taking our jobs, but I do think it’s still a reasonable concern others have so I would sympathise if that were the rationale.
I may also have a philosophical opposition to generative AI at the same time - there are plenty of environmental, societal, and intellectual-property costs that some may find unconscionable.
Then came the model trainers, ignoring the entire discourse, reasoning: "if I can download it, it's mine to use". And then basically selling the resulting tech back to the community.
Not unlike big tech extracting money from open source, but at least the latter usually (somewhat maliciously) complied with the license.
1. Many teachers don't publish, and those that do publish often still reserve their best for their students.
2. OS development sometimes operates like esoteric societies: you publish enough that people with the desire and insight become interested and engaged - both a filter and an invitation. So you can tailor the community you like.
Both depend on people really valuing these mutually-constitutive relationships.
My observation is that the generations raised on social media and gaming are happy enough with those substitutes, and view publishing their best work as a kind of self-promotion and participation in a larger, diffuse community (without a real role in governance). And they're right: expecting more personal communities now is a severely limiting factor, and AI removes most of the incentives to participate in someone else's project.
I don't want my code to be used to build proprietary software. I want code built on top of mine to respect its users. I choose the AGPL for this reason.
I also don't mind the attribution.
The LLMs don't care about all that, and do that by hogging resources, creating a lot of waste and pollution, and disrupting society for unclear benefits. No thanks.
Open Source and the OSI are an industry plant. Look at who sponsors it.
The monopoly hyperscaler conglomerates get free labor and use it to build the world we despise: tracking panopticons, phones we can't install things on, device attestation, browser monoculture with no adblock, etc. etc.
Google made people fall in love with BSD/MIT, and look what it did.
Just a few of the classic plays:
"That Belongs to Us Now" - (1) vendors build stuff like Elasticsearch and Redis, (2) the hyperscalers yoink it into their proprietary offerings and take all the profits, (3) original authors and their companies starve.
"Embrace, Extend, Extinguish" - (1) vendors take an open source project like KHTML or Linux and build their version, (2) they flood the market with their offering, pushing out the competitors, (3) they use anti-competitive means to get their thing in front of all eyeballs, (4) once they have marketshare, they do evil things like add tracking and remove freedoms.
Open Source needs to be replaced with "freedom for the people, companies must pay". Source available shareware with anti-hyperscaler teeth.
Even Richard Stallman's licenses are not strong enough. CC BY-NC-SA is better.
"Pure" Open Source is corporate welfare. It was a mistake. It enabled giants to hang us with our own rope.
This is ignorant of the history of Open Source software. Software was open long before it was subsidized by large corporations.
"Computer software was created in the early half of the 20th century.[2][3][4] In the 1950s and into the 1960s, almost all softwares were produced by academics and corporate researchers working in collaboration,[5] often shared as public-domain software." https://en.wikipedia.org/wiki/History_of_free_and_open-sourc...
Rough times out there for transparent organizations.
I'm all for open source, most of what I do is released as MIT, almost never "Free Software", still doing the same thing since LLMs appeared, regardless of everything else.
I'm a real person, have nothing to do with OSI but willing to explain my position, as long as you take it as real opinions held by a real person, instead of going into conspiracy theory land. Ask me anything, I'll give you my honest perspective.
I like using radicle.xyz instead.
>I moved my code from GitHub to a self-hosted Forgejo
It was more about the symbolism. If the goal is decentralization, “I moved to a personal forge I control” is the post's core idea. But framing it as “leaving GitHub for Forgejo” inevitably creates a new flag to gather around.
That may be useful and even necessary, but it also shows that decentralization movements often produce new centers, names, and identities.
There is no difference between what you say people cry for and what you say they actually want.
Others recognize there's some kind of more comfortable middle ground where decentralization means the same as a town/city/state type of social good that is independent and capable of working without larger centralized structures. Having to work towards it, pay money into it, etc, are expected but because the work that goes into maintaining the infrastructure has a clear line of derivation (taxes clearly go to X, Y, Z) would be a benefit.
It's typically the first class tho that dominates all conversations regarding decentralization, and that class includes the Epstein billionaires who just don't want laws to apply anywhere they want to do something unethical, immoral or whatever. I.e., money is the only law.
I try to keep things simple in the homelab, and I'm thinking of using only fail2ban and a Caddy reverse proxy and exposing it.
Package registry isn’t private by default and accessible with PAT. Or am I mistaken?
What keeps you using Gitea? I'm wondering if I should try it over Forgejo now.
Disclaimer: I was also previously an elected member of Codeberg's board (presidium), where before the company that was founded to support Gitea's community maintainers was created, I had asked for assistance with multiple matters to aid in project development and was denied.
Honestly, stay where you are
I've done this too in the past. I'm still running the internal/LAN Forgejo instance, but not any public instance at the moment. In the past, though, I set up a public read-only instance that mirrored my internal one, with one reverse-proxy connection from the internal to the public instance, which the public one used for getting the git data. Then it mostly just kept on working by itself: whenever I changed anything in the internal Forgejo, the public one got updated, yet I could keep all issues, CI and more completely private and on the LAN.
I'd pay a bit so I don't have to back it up. But not 20 per friend who might make a commit next year.
It's too trivial for anyone to be selling that. And I don't think there's a large market for $5-$10 barebones setup when GH is free and you can self host.
Something similar from them or digital ocean or linode or Hetzner would be a win.
Pika does offer Forgejo and Gitea.
(I use that VM as my primary public nameserver now and I don't really need a web front end for git so I'll be keeping my current setup. But if it had been available back then, I'd probably have gone for it.)
On the positive side, HN has gone through multiple periods of enthusiasm for new code forges. There was even excitement for GitHub at one point. :) It’s good because all the forges generally add each other’s features if one takes off.
The people "hating on github" and leaving it today, regardless of their (lack of) fame do not mind not being followed to the new forge, which may be a dealbreaker for you (an educated guess, from your phrasing). Conversely, you use GitHub "just fine" under terms the leavers consider dealbreakers.
What you are baffled by are differences in value judgements.
Just as the COVID supply chain problems reconfigured trust and assumptions, the actions of Microsoft and other large US software companies have become erratic and untrustworthy. In particular the sanctions against the ICC prosecutor in Europe have made it imperative to have zero dependence on Microsoft with easy exit paths at any given moment.
Similarly, one cannot trust Google as they can disable your account one day, with zero recourse, zero ability to appeal, zero accountability to Google for disabling your email and SSO. Except maybe posting on HN and causing enough bad PR to get Google to act to suppress the bad PR.
The US is no longer a nation of laws, it's a nation of whatever the president wants, and even impartial courts are likely to be overturned by an extremely partisan Supreme Court that has said that the president can not break the law while doing "official acts", whatever the heck that means.
Audience is not a factor for me at all, if GitHub is a social network for you then it definitely makes sense to use it still.
But I don't know anybody that needs the social aspects of GitHub. That's just the people I know and work with, however.
For the past two weeks I haven't really been able to browse repositories; some parts will not get downloaded. Issues not appearing after creating. Pull requests missing. All kinds of issues.
One issue is that issues tend to be monotonically increasing numbers, and references to old issues vs. new issues get confusing over time.
I was able to pull the graph and collect patches and bugfixes from forks, like from a fablab somewhere using the same hardware, pushing to their fork with all their other location-specific stuff. In one case I discovered like four different forks with a different fix for the same problem.
Now you could argue the "social" part wasn't working if people fixed the same issue multiple times without knowing about each other. But at least github made it possible for me to collect everything there is, review and merge it into my own repo, and then drop a comment at the original issue tracker. (Which would have gone 404 had it been self-hosted, or with registrations closed due to spam.) In both cases I eventually got feedback from other people who found my branch and used it.
Just like "log in with Google", you should be able to do "log in with OAuth", you type your email or domain (or your browser fills it), and it triggers a redirect flow for login. Then people can use GitHub or Google or Apple or their own provider, just like email. Every email provider could also be an OAuth provider.
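A minimal sketch of the first step of such a flow: turning what the user typed into the place where a provider's configuration would live. It assumes the email domain itself acts as an OpenID Connect issuer, which many real domains do not; the `/.well-known/openid-configuration` path is the standard OpenID Connect Discovery location, and the function names are made up for illustration.

```python
def domain_from_identifier(identifier: str) -> str:
    """Extract a domain from 'user@example.org' or a bare 'example.org'."""
    cleaned = identifier.strip().lower()
    if "@" in cleaned:
        # Keep only the part after the last '@'.
        cleaned = cleaned.rsplit("@", 1)[1]
    if not cleaned or "/" in cleaned or " " in cleaned:
        raise ValueError(f"not an email address or domain: {identifier!r}")
    return cleaned


def discovery_url(identifier: str) -> str:
    """Build the OpenID Connect discovery URL for the identifier's domain.

    Assumption: the domain is itself the issuer. A real client would also
    need a fallback for domains that delegate login to another provider.
    """
    domain = domain_from_identifier(identifier)
    return f"https://{domain}/.well-known/openid-configuration"
```

The client would fetch that document, read the authorization endpoint from it, and start the redirect from there, much as it does today for a hard-coded provider.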
Same thing book writers did when the printing press brought the possibility for others to copy their books for free.
There are also other options that have been used historically but they tend to be a lot more bloody.
Software then was also rather different from software now. It's not a government-funded research project these days.
For me, I only use a forge's viewer/navigation for cursory glances or sharing links to others. If I need to spend any real time digging, I'll clone the repo.
What do you think, what is the business for GitHub in providing limitless private and public repo hosting?
I run my own public instance of forgejo. Is this software I run on my own that syndicates other users' commits? GitHub *was* good for discovery; does GitSocial offer something similar? Are there ways I can push more of my contributions into GitSocial, or does that happen automatically when I start using it?
I think the GitSocial website would benefit from a "features and benefits" section rather than just a timeline view and demo, and I advise you to emphasize the benefits. I can see a TUI and a timeline of commits, but it seems like GitSocial is MUCH more exciting than just that.
To me, GitSocial offers freedom from corporate control and surveillance of my open source work, and that's really intriguing.
Laws should make it a double-edged sword, make distillation explicitly legal.
Not much else they can do.
Knowledge-distillation is already legal. Current case law says none of outputs of any model are protected by copyright, so one could use model outputs for whatever they want - including distillation. That is why AI companies resort to ToS clauses to block distillation and/or training competing models.
It's already proven based on previous specification laws.
Hilarious as it sounds, it's true. We can make “any” code public domain with LLMs.
Unless you can actually prove what your model has ingested & if you have the license for it in court.
https://azure.microsoft.com/en-us/products/github/copilot#fa...
OpenAI did train the model on GitHub repos. The next question is whether this was enabled by Microsoft's investment in / partnership with OpenAI. I suspect yes, but I haven't gone searching for this yet.
it's interesting because the more paid services these guys bring on board the more complex the security shit gets for them. the head of our IT is a fucking lunatic though and he is steering shit towards utter disaster, he's obsessed with being the guy who picks the next cloud service that "makes things so much better".
my small team is actually considering just getting some mac minis and making a cluster of servers. we decided we don't need infinite uptime for hosting m-f office tools and we can just ... not interface with our infra/devops guys who have lost their damn minds and say no to everything now. they're supposed to be the compute tower under the tragedy known as TBM and they haven't approved a single VM in like 2 years.
It doesn't matter what happens 6m-2y down the road, your odds of being laid off or job hopping are high in the current regime so this all makes sense. You pay some amount of your budget to make your life "easier" in the now.
The trouble comes 2-5y down the line when the service is bought out by <insert MEGACORP here>, and you have to scramble to replace it or hold your nose and pay up.
(tbh, migration is not that hard, but the org will act like it is)
The matrix of authentications, compliances, and intranets will only go up as your company grows, and these are often enforced by people who do not suffer them daily.
I mean, if you're going that far, a couple of refurbished servers give you far more compute, far more capacity, and much better maintainability.
I think services like Cloudflare could play a role if they were able to provide some kind of forward auth and preferential treatment of core users during overload. My self hosted systems would have to be the source of truth and Cloudflare would have to be replaceable for me to consider using it.
Think along the lines of automated pre-auth that coordinates with the origin based on some standard.
That said: SourceHut has private repos and access control.
... but not on the sr.ht link you gave. That's a list of projects like your average open source hosting landing page.
You have to click on the tiny sourcehut link at the top to see the real site.
For anybody else who needs it, the page with info about the service is here: https://sourcehut.org/
If the Forgejo core team announces tomorrow that their 'philosophy has changed' and overhauls the architecture, tens of thousands of 'independent' home servers will grumble, but they'll inevitably run git pull to update anyway. And eventually, they might migrate again. That cycle is inevitable, and I have no intention of denying it.
The issue in the context of this post is that it rejects dependency on GitHub while presenting Forgejo as the escape. If your defense is 'why reinvent the wheel,' you are essentially proving my point: we are just choosing a new center to depend on.
Beyond that, we might just be talking past each other at this point. I don't think you are wrong, and I have no intention of twisting your words. Your underlying point is likely this: GitHub's service quality has degraded, and with their aggressive AI strategies driving users away, it is strange to view the act of leaving negatively. I fully respect our difference of opinion there.
I honestly just got a bit annoyed earlier because you framed your reply as if I hadn't even read the article. Anyway, let's leave it at that. There is no reason to misrepresent your views, and no reason for us to argue further.
This is a hilarious - and empirically wrong - thing to say considering Forgejo's raison d'être[1] and lineage.
If users just grumbled and ran git pull anyway, Forgejo would not exist, and neither would its fork-parent project (Gitea); everyone would be using Gogs under duress. Forgejo's past shows that people care about the philosophy, which is why this particular fork of a fork is thriving.
People may converge on a new node like forgejo, so from that point of view things are not decentralizing, the dependency graph stays roughly the same shape, but it’s a step in the right direction. It’s marginally better than github hosting everything.
That’s what I should have said originally instead of being an asshole.
https://archive.ph/2024.08.23-032320/https://vitalik.eth.lim...
It doesn't necessarily mean we should immediately suspect them of raising a flag for its own sake.
https://archive.ph/2024.08.23-032320/https://vitalik.eth.lim...
Epstein files remind me of the first line from Romance of the Three Kingdoms (《삼국지연의》) or Pan-Asianists who raise the Universalism flag.
>The Himalayas divide, only to accentuate, two mighty civilisations, the Chinese with its communism of Confucius, and the Indian with its individualism of the Vedas. But not even the snowy barriers can interrupt for one moment that broad expanse of love for the Ultimate and Universal, which is the common thought-inheritance of every Asiatic race, enabling them to produce all the great religions of the world, and distinguishing them from those maritime peoples of the Mediterranean and the Baltic, who love to dwell on the Particular, and to search out the means, not the end, of life.
I think Radicle is interesting. It doesn't solve the CI bit, at least not yet, but I suppose it's possible to hook up some local runner for it.
There's also a bug tracker, which I believe was called bug but I can't find it ;), that tries to bridge different issue trackers and provide an offline mode for working with them.
People of course also love free CI capacity where they can run even untrusted code, so in that sense Microsoft resources might be difficult to compete against.
[1] https://radicle.network/nodes/seed.radicle.dev/rad:z3qg5TKmN...
Which is why we should always champion FOSS for dev tooling as it's the only way a community can have a say in an industry dominated by unregulated tech behemoths.
Will they? Has Gitlab doubled down on "Agentic AI" and thus require 30x capacity to support current users, while being kneecapped by Azure?
There's just so much confusion around this. In this thread alone:
* Distillation is legal under copyright; the violations would come as ToS violations, which is contract law, not copyright law.
* Training is legal as well, so long as the original material was obtained legally.
* Moving code off of GitHub doesn't change any of this: AI companies are free to download your git repo no matter where it is hosted, just like they can any other content on a publicly accessible website.
* Liability comes into the picture when the models are used to infringe copyright in their output. We'll have to see the outcome of the NYT case here, but that is proceeding at a glacial pace.
I am not a lawyer; I'm an interested amateur that's been following the saga for years. I wish the discussion here on HN were more nuanced.
If anyone has legal updates that render any of the above incorrect, I'd love a pointer to the decisions. One area I'm particularly weak is the legal status in countries that are not the US: I don't follow those laws nearly as carefully, nor the court cases brought.
C'mon, I'm not even a part of the movement to move away from GitHub, but that's not really a valid argument. Sure, they CAN download the source code, but it's not nearly as automatic. They don't get to download it all, en masse, by copying hard drives/databases they already own. They have to go over the internet. They don't get automatic notifications when new code gets pushed. And finally, if one wanted, one could make it harder for bots.
I certainly believe that these companies do get away with a lot more than the average Joe - see: Facebook downloading Anna's Archive, every pirated eBook - but that doesn't mean you have to hand it to them on a silver platter.
Plus, even if your code is private on GitHub, you can't guarantee that they won't train their models on it anyway, unlike if you host it yourself or somewhere else.
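As one example of "making it harder for bots" on a self-hosted forge, here is a small sketch using Python's standard `urllib.robotparser` to check what a robots.txt would tell a given crawler. The robots.txt content and the forge URL are assumptions for illustration. robots.txt is only advisory, so this stops well-behaved crawlers and nothing else.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for a self-hosted forge; the crawler names are examples.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    repo = "https://git.example.org/alice/project"  # hypothetical forge URL
    for agent in ("GPTBot", "CCBot", "Mozilla/5.0"):
        print(agent, is_allowed(agent, repo))
```

Anything stronger (rate limits, login walls, proof-of-work challenges) has to be enforced by the server, since a scraper can simply ignore this file.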
Does anyone else find it ironic when closed-source GitHub claims it's some super hero for open source?
In short:
- GPL code requires attribution and sharing of code. Models strip the license, so the GPL is effectively violated.
- Source-available licenses are "for your eyes only", so training on source-available code also violates said code's licenses.
- MIT requires attribution, but forgetting it has no consequences, so it's more of a gray area.
About moving from GitHub:
- Some public repositories provide visible and invisible anti-scraping protections. So it's not always that easy.
- GPL says I need to share code with the people who download the application itself, so I can move to a cathedral model.
Moreover:
- The US Government's stance is "If we need to ask permission for everything, the AI industry will die". Hence, as an outsider, the court rulings have no weight in my eyes. They are taking a stance to enable and not hinder the industry. If one reads the Fair Use doctrine, it's very possible to rule otherwise. OpenAI's whole non-profit research arm was an instrument to circumvent the Fair Use doctrine's "earn money from copyrighted works" clause and to support the "we only do research, pinky promise" requirement of the said doctrine.
When courts said "go ahead, we're not looking", people started to torrent e-books (ahem Meta ahem) to train models or buy/cut/scan/ocr books to train their models (Anthropic).
So the situation is left murky to allow Silicon Valley to thrive. Not to protect people's blood, sweat and tears. These works are provided by peasants anyway, so why bother.
Addendum: Courts said models' outputs can't be copyrighted. So, copyrighted code gets in, non-copyrightable code gets out. It's effectively license-washing.
I wrote a nearby comment giving some resources on the current state of Fair Use for training, but in short: it depends.
https://news.ycombinator.com/item?id=48125071
> Hence, as an outsider, the court rulings have no weight in my eyes.
My only focus is on legality, so this doesn't track for me. If we're not talking about what courts are ruling, then there's nothing to talk about legally, since the copyright office is waiting on courts to rule here.
This was true before AI, and the ToS now explicitly includes AI training to avoid confusion.
In short: it has never been a good idea to put anything with a copyleft or strong license up on GitHub if you wanted them to abide by it.
FOSS and open source licenses never entitled contributors to have their proposals reviewed or merged by maintainers. Nor did they ever entitle users to ask for free support.
FOSS is about giving people access to source code so they can do with it whatever they want, and maintainers/authors should have always had the ability to "publish and forget" the source code, without having to deal with those "entitlements".
You're confusing things. The "social component" refers to people interacting with each other. Such as two developers working on a bug or a feature. Or a tester reporting a bug.
This is a big part of actual professional software development work.
I write a library. I put it up online, for other people to use, if they like. At what point did I assume any responsibility to play nice with others? Or work with others at all?
There's FOSS licenses, and then there's the social expectations around collaborative online development, and the latter is nowhere implied in the former.
FOSS was not historically weighed down with all these social expectations. Forking was not seen as some community failure, but the basic purpose of FOSS. Sites like GitHub were a major part of this shift.
Or maybe they mean there aren't good collaboration platforms in general, not sure.
I really really am not trying to start a political argument, but just as food for thought: this is exactly why I have faith in socialism (read: 'prosocial institutions and norms'). And whether socialism is eu- or dys-topian, it certainly cannot happen in the first place without a "social component"!
At a push you could have the same thing massively more spread out using GitHub purely as auth.
[†] I occasionally have AI write one-off bullshit too, so I'm not casting stones. It's just overwhelmed the discovery signal with noise.
GitHub is also somewhat entangled with junior hiring in compsci, meaning there's a tangible economic incentive in building a hireable profile that a recruiter might like. (Or at least that is a widespread belief.)
None of this has much to do with git, the VCS tool, nor is GH-as-a-social-network objectively very important, but it's natural to place outsized importance on a community in which one is active, and there's a significant overlap in the HN commentariat and GH users. So you'll probably hear more about the claimed social utility of GH here than in the real world.
Hope this helps clarify what people are talking about. :)
Same problem, capitalism, but different constraints.
It actually would be hard to impossible if done properly - meaning no lost information and no dead links.
Sorry if this sounds really grim / cynical, I've simply seen enough of these kinds of migrations to know that they are fundamentally opposed to my idea of engineering philosophy. It often becomes more a question of business than of correctness. (Can we simply fire the smaller customers? -> yes.)
You're making things up: the outrage is not that people used it, it's that the licence requires attribution at least, and opening the derivative product at worst. Token providers that trained on open source did neither.
> Please make it make sense.
I am skeptical that you didn't know the reason for the outrage because it's been repeated in every single thread where this was discussed.
I myself repeated it multiple times each time this feigned confusion you display appears.
Like I am doing now, yet again.
So? Just because a piece of output data is encrypted or compressed and does not resemble the input, does not mean that the process did not take the input.
We have decades of law that regards zipped files as infringement, lossy compression (MP3s) as infringement, etc.
> guess another way to put it is show your code in the output of an llm that isn't being attributed correctly.
Well, a better way of putting it is answering the question "Would that model have existed had none of the code used as input existed?"
IOW, can that model be generated or created without first having all that copyrighted code used as input?
National Law Review covered some of those nuances last year: https://natlawreview.com/article/federal-courts-issue-first-...
US Copyright Office has a substantial document discussing each of the four factors, and making it clear this is an unanswered question, and details of the particular case will decide which way courts go. It is a prepublication version, and it's over 100 pages, but it covers the issues well, citing arguments on all sides.
https://www.copyright.gov/ai/Copyright-and-Artificial-Intell...
What are you talking about? There is no distribution, only read access.
That said, FOSS licenses are non-exclusive. Regarding the original upthread topic of GitHub's copilot training, iirc GitHub's terms and conditions involve granting them a license in order to host your code. Depending what else is in those terms, they may have had the ability to use all hosted code for LLM training through that license, instead of the FOSS licensing on any given Open Source repo. But that would only apply to GitHub/Microsoft, not third party scrapers.
And global search but I don’t feel like that even really works.
I spent about two hours reading your writings. Your arguments, such as those regarding the Taiwanese digital democracy movement and how to break the endless cycle of centralization and decentralization, are quite profound. If I were to summarize what you are trying to tell me, it seems to boil down to: "Your previous point was too generalized, and the attempt itself [at building alternatives] should not be dismissed."
Also, your Romance of the Three Kingdoms analogy confirms we are definitely both East Asians: "The empire, long divided, must unite; long united, must divide."
Overall, if I have understood your work correctly, your core message is this: while historical attempts at universalism or a 'new order' may eventually turn into new centralized powers, we shouldn't look at them only through that cynical lens. Instead, we must simply remain constantly vigilant.
I hope I understood you correctly.
No need to learn nuance, you seem to be doing better than most of the natives here just by being yourself (with the help of modern tools)
I just wanted to get more people to read the vitalik.eth articles along with me. Hope you got something out of 2 hours! I had been nibbling at them and steeping in their diagrams for weeks; your centralisation comments provoked me (in a good way)
Vitalik Buterin and Balaji Srinivasan are both based in SE Asia, Audrey Tang is in Taiwan, Glen Weyl from California is the only Westerner, but UC Berkeley is like 80% East Asians? Surely they also know about the lore.
I don't know if "staying vigilant" is the mood I was going for. Maybe "being aware of emerging centralising entities and be prepared to engage fruitfully with them"?
These days, "emerging centralising entities" might even be old programming languages like APL that have become popular again..
In the context of the links above, such entities need to provide an "exit option" in order to gain wider traction. For example, APL is clearly an escape from C-like syntax, but there might be new directions that such "array languages" can take from Python or even natural language:
Is it a danger to anyone, or damaging in any way? I think not.
Can? Sure. Should? Very questionable.
I'd call your statement more "extreme" than any of Stallman's statements on software.
But our 25 year lax regulatory environment has created a world where the largest players abuse consumers and the competitive ecosystem.
Open source is one of the many strategies these companies have abused to create grave harm to our society. It's let them get further with our support and with less expenditure. It's given them an ethical smoke screen.
- Social media algorithms are the tobacco products of our century. Kids are growing up with a distorted sense of self worth, people are getting angrier and more polarized, and all of it is highly addictive - all to fuel corporate profits.
- The most popular and important computer form factor is controlled by a duopoly and we can't even own / repair / install / have rights to our devices.
- All hardware is becoming locked to device attestation, meanwhile companies are lobbying for "age verification" (read: full-on identity tracking).
- Distribution is being locked to monopolies. 92% of "URL bars" are owned by one company, and typing something into a computer goes through a bidding war protection racket.
I can go on and on about it. I shouldn't even have to. You know this.
A lot of this is because of a lack of proper competition. Since the DOJ / FTC / EU / ASEAN are being toothless (the latter are slowly waking up), the next best thing we can do is take away their open source abuse. Stop letting them use our work against us and the rest of the population.
Also, open source is one more justification on why we need to increase taxes on the very rich. At this point all of them have built their fortunes on it. Just like they do on the rest of public infrastructure.
We need more socialists in power...
This is not how copyright law works, or any other law for that matter. The issue is first and foremost between the copyright holder and GitHub. The ToS may or may not allow GitHub to sue the uploader for damages, because a ToS does not magically give GitHub rights that the uploader isn't legally able to grant.
Look, you are the one that opened with "What on earth is the social component of GitHub?". What's the semantic function of that specific construction if not being completely ironic, like you decided I was wrong before engaging?
> if you don’t want to engage don’t
I am engaging, you just don't like that I'm not spelling it out. This is perfectly within the community guidelines.
> Totally different thing
It's just the same thing, you both are ignoring how important convenience is.
All you've pointed to is devs working together and people filing bugs. All that requires is auth, no? Is that it? Is an SSO button really this enormous hurdle?
> It's just the same thing,
Saying dropbox is irrelevant because all end users could just "build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem." is not the same as saying "what is the social aspect of github?".
Closer is what I've argued elsewhere, which is that multiple different hosts running (something like) gitea selling cloud based storage as a service would be extremely close to github for end users. And it would be identical for what you've talked about wouldn't it?
> you both are ignoring how important convenience is.
The convenience of what, specifically? Not having to click an SSO button on a new website?
Collaboration is a form of socialization. GitHub made a social network on top of a VCS to create a forge where people can interact with each other: creating issues and pull requests, reviewing and commenting on them, engaging in discussions, forking and improving upon each other's work... These are not "just auth" and it's not something git solves by itself. I really don't know how to make this clearer either, sorry. Maybe this is the kind of thing where you see it and you get it or you don't, I don't know. For me, and apparently for lots of people in this thread, that makes sense.
All these other services have what you're talking about, and the only cross-repo/org work I can see here is:
* Aligned accounts (person X on one service is person Y on github), if you want that continuity across services
* Forking
And the whole forking/branching/merging side is handled by raw git.
That's why I've been asking what GitHub's huge centralisation gets us. It has a UI and features that are useful, great, those exist in other projects too, so what is the stickiness?
Edit: I've written UAC in both my posts by accident, but I actually meant UGC: User generated content.
I find repos because of search engines, links from project pages, links from package managers, hn, all external sources. If it’s from linked issues that’s user generated and can easily link off site (and often do).
Are you following users and finding repos that way, by seeing what they commit to? Having GitHub recommend repos?
> What is the social component bringing people there?
We've answered that again and again, pull requests, discussions, comments, following people. These bring people value. There isn't much more to it. And these aren't "just auth over cross repos". You reached the ocean here and you are still thinking it's just water, expecting it to be something greater. This is just it. That's the social component and the value it brings to the community.