Do we want knowledge communities like Stack Overflow or Reddit to continue to exist? Should big AI providers that train on their data share some of the value back to the community? Is there an ethical way for web communities to license data to AI providers?
I hope the answer is yes and that there is a path to a productive partnership, one that allows public communities where knowledge is shared freely to thrive, while also bringing more grounded and vetted content to AI systems that are often closed and require a subscription to access.
How would they do that? So far, the LLMs can't be trusted to produce accurate answers. The AI companies can pay money to the data sources, but they can't really offer back anything useful (yet, imho).
[1]: https://arstechnica.com/ai/2024/02/reddit-has-already-booked...
They offered a convenience by burning money, and the mismanagement and pre-IPO shenanigans certainly are not helping.
They don't own the content, the communities, or the user base, which can move on as it did from IRC to AOL to Discord. What did they learn from the dead communities of the past? Do they sell traction and convenience that they own, or are they claiming to sell content which they don't own? Users curated the content, and the most effective mods have left. The content in large parts of their sites had become stale or degraded long before OpenAI existed. Graveyard communities can neither curate nor pay for server costs.
AI is convenient curation and people are paying for the convenience. AI sites are also losing money per click, with server costs that are out of the galaxy compared to the cached-HTML and Elasticsearch serving crowd.
We have seen the ridiculousness of the AI sites' attempts to introduce management features, short of putting penguins in the desert for animal diversity. But the great teachers of bad management features were Reddit and Stack Overflow, who also actively killed community-developed management modules.
They are failing because they lack a basic understanding of the teachings of centuries of civil society, and they make up what is right, wrong or politically correct ad hoc based on marketing. Just trying to avoid bad publicity that could scare off the potential IPO crowd only introduces community debt and grievance. That is what has been killing them.
Wikipedia has not been crying foul, but has been curating the highest-quality content for AI on a low-cost setup for its size. I just think it's better to donate content and money there.
No, we want better knowledge communities to exist and for Stack Overflow and Reddit to cease to exist.
There are often clues in the comments that are more helpful than the “answer”, and often outdated answers have the most votes.
All this to ask: how on earth is something like an LLM expected to reconcile those issues?
Though I suppose that is a short-sighted concern in itself, given that the way we work will begin to evolve quickly as AI becomes more powerful.
In the end, I guess they end up being a positive press story for Google / OpenAI / etc.?
Isn't it the knowledge of its users?
Expect a lot of enriched answers like these coming soon from Gemini/Bard :-p
These are literally questions I've given to project managers to help create better requirements, but ultimately as a dev you have to come up with "something" regardless and redo the work once the customer complains. Stupid GPTs cutting the line!
Overall this all feels so unimaginative. With all the resources these companies have the only solution they can come up with for the search problem is "just throw AI at it." I could come up with that. It's not clever.
I've basically completely replaced Google in my day-to-day unless I need to look up a specific location of something in the physical world or something that recently happened.
That's... not good.
GPTx gets a lot of surface topics right, but when you delve into gritty specific details it will just start rambling like a straitjacketed lunatic with the confidence of a used car salesman. The rubber meets the road when I try to compile code that uses libraries or functions that don't exist, or when it leads me to hallucinated, imaginary GitHub repos. I worry that this use of GPTx would be like getting water from lead pipes: it would seem fine on the day-to-day while my mind is slowly poisoned with nonsense and insanity.
Google has certainly taken a nosedive in result quality over the last few years, but Kagi has been amazing for me lately.
Unlike Google, I can click the second tab every time and it goes to image search. Wait, actually they put "copilot" there and image search is the third tab now. Either way point stands: no shuffling of tabs.
Image search is actually better than Google. I can search for exact image sizes. Google used to offer this! I can just type my screen width and height and find the perfect wallpaper. Wait, it says "at least" here, not "exactly," so I guess it just stores the total number of pixels of an image and then multiplies the width and height you entered...
Can you believe this? It's 2024 and I can't even find an image by size on the Internet. I can't even trust the second tab is going to be the images tab. And some people think AI is going to fix software. It's ridiculous. It's just laughable. And so depressing.
Also, there's a chance these LLMs have access to other tech forums in addition to Stack Overflow and could possibly provide a solution. For example, GitHub has actually been the better source for me when debugging issues. Usually you can go to the repo, search the issues, and read comments with solutions or workarounds.
But aside from that, I agree with you that these bots will struggle to provide new, non-regurgitated answers and could potentially cause more harm than good.
None of the things you are describing happen to me, especially if you do basic trust+verify which you should be doing for Google anyways.
And, of course, you wouldn't know that your mind is being poisoned with hallucinated half-truths. Maybe you can pick some out because of prior knowledge, but what about the ones you can't? What about the little things you learn that you don't deem important enough to verify, but then remember later without remembering that they snuck in through an untrusted source? That's precisely the danger: you can't accurately tell truth from fiction, and the stuff you already know isn't the stuff you're asking about (otherwise you wouldn't be asking).
The difference is that the models are completely different? I don't really find that GPT-4 hallucinates all that frequently (only rarely, and in very nitty-gritty details).
> And, of course, you wouldn't know that your mind is being poisoned with hallucinated half-truths
Okay, so it appears you have some non-falsifiable theory of mind that somehow renders Google better because my mind is being poisoned. Not sure what sort of appeal to objectivity I could use to demonstrate otherwise.
> the stuff you already know isn't the stuff you're asking about (otherwise you wouldn't be asking)
True for Google, less true for GPT-4, which I ask for practice problems and worked solutions on various things I already know about, so I can practice.