Google brings Stack Overflow's knowledge base to Gemini for Google Cloud (techcrunch.com)

53 points by onatm 2 years ago | 37 comments

benpopper1 2 years ago |

One comment to add here - regardless of where you stand on this particular LLM provider:

Do we want knowledge communities like Stack Overflow or Reddit to continue to exist? Should big AI providers that train on their data share some of the value back to the community? Is there an ethical way for web communities to license data to AI providers?

I hope the answer is yes and that there is a path to a productive partnership, one that allows public communities where knowledge is shared freely to thrive, while also bringing more grounded and vetted content to AI systems that are often closed and require a subscription to access.

ForHackernews 2 years ago | |

> Should big AI providers that train on their data share some of the value back to the community?

How would they do that? So far, the LLMs can't be trusted to produce accurate answers. The AI companies can pay money to the data sources, but they can't really offer back anything useful (yet, imho).

Crye 2 years ago | | |

Integrate the answers back into Stack Overflow for free. There are tons of questions that never get answered. This also provides a public forum where the response can be corrected and given feedback. Symbiosis.

SamuelAdams 2 years ago | |

Reddit literally licenses its data to AI training [1]. If doing so kills its own product that would be hilarious.

[1]: https://arstechnica.com/ai/2024/02/reddit-has-already-booked...

antman 2 years ago | |

Reddit and Stack Overflow had heavily degraded over the years, long before ChatGPT existed; we should remember that.

They offered a convenience by burning money, and the mismanagement and pre-IPO shenanigans certainly are not helping.

They don't own the content and the communities or the user base that can move from irc, aol, to discord. What did they learn from dead communities of the past? Do they sell traction and convenience that they own or are they claiming they sell content which they don't own? Users curated the content, and most effective mods have left. The content in large parts of their sites has become stale or degraded long before OpenAI existed. Graveyard communities cannot curate nor pay for server costs.

AI is convenient curation and people are paying for the convenience. AI sites are also losing money per click with server costs that are out of the galaxy compared to the cached HTML and Elasticsearch serving crowd.

We have seen the ridiculousness of the AI sites' attempts to introduce management features, short of putting penguins in the desert for animal diversity. But the great teachers of bad management features were reddit and stackoverflow who also actively killed community developed management modules.

They are failing because they lack basic understanding of the teachings of centuries of civil society and they make up what is right, wrong or politically correct ad hoc based on marketing. Just trying to avoid bad publicity that could scare off potential IPO crowd only introduces community debt and grievance. That is what has been killing them.

Wikipedia has not been crying foul, yet it has been curating the highest-quality content for AI on a low-cost setup for its size. I just think it's better to donate content and money there.

rufus_foreman 2 years ago | |

>> Do we want knowledge communities like Stack Overflow or Reddit to continue to exist?

No, we want better knowledge communities to exist and for Stack Overflow and Reddit to cease to exist.

StimDeck 2 years ago |

My experience of stack overflow is that the question in the title is too often not answered directly. The specific issue is tangentially related to the title and the answer can amount to a typo or a bad assumption.

There are often clues in the comments that are more helpful than the “answer” and often outdated answers have the most votes.

All this to ask: how on earth is something like an LLM expected to reconcile those issues?

johnny_canuck 2 years ago |

I'm curious about the longevity of these sorts of collaborations - in the future, who is contributing to these knowledge bases? I imagine the communities that surround these places will fade away if new users are unaware of them and content creation begins to halt.

Though I suppose that is a short-sighted concern in itself, given that the way we work will begin to evolve quickly as AI becomes more powerful.

In the end, I guess they end up being a positive press story for Google / Open AI / etc.?

polytely 2 years ago | |

It really feels analogous to overfishing or deforestation, where it's very profitable at the moment for whoever is doing the harvesting but then ruins the resource for anyone in the future.

kemotep 2 years ago | |

Yeah, does the AI then go and upvote or downvote the sources of its responses based on end-user feedback?

BandButcher 2 years ago | | |

I mean we already have that with social media posts getting phantom likes and bot comments to give the appearance of "engagement". You know the same playbook will unfold when the CEOs want the numbers or perceived value to go up.

croes 2 years ago |

Stackoverflow's knowledge?

Isn't it the knowledge of its users?

YesThatTom2 2 years ago | |

Don’t anthropomorphize computers—they hate it.

ChrisArchitect 2 years ago |

Official Stack Overflow post: https://stackoverflow.blog/2024/02/29/defining-socially-resp...

anotherhue 2 years ago |

Do we think it's going to meaningfully enrich the data?

emmanueloga_ 2 years ago | |

This question is likely to be answered with opinions rather than facts and citations. This question has already been asked, and answered. As currently written, the question lacks enough detail or clarity to be answered. Your question is too broad or has multiple parts and needs to be distilled into one. I'm voting to close this question because it has no effort to solve the problem.

Expect a lot of enriched answers like these coming soon from Gemini/bard :-p

BandButcher 2 years ago | | |

How come they get to talk to the customer that way and not me lol

These are literally questions I've given to project managers to help create better requirements, but ultimately as a dev you have to come up with "something" regardless and redo the work once the customer complains. Stupid GPTs cutting the line!

artninja1988 2 years ago | |

The only SO data not yet incorporated into these models is what was created after the models were trained. It appears more like a "licensing" deal to give something back for scraping all "their" data like everyone else.

tracerbulletx 2 years ago | | |

I think it's probably also about continuing to have access to it for updating information.

htrp 2 years ago |

So much for deep and thoughtful adoption of AI. We're just gonna go full-speed ahead and damn the consequences.

jgalt212 2 years ago | |

If it doesn't work, then Stack Overflow got some free money. If it does work, new knowledge production / acquisition will suffer. Or will Gemini pay for answers to questions it does not know? Then some script kiddie using OpenAI will answer them, and then our internet hive mind will take the shape of a Habsburg Jaw.

archsurface 2 years ago |

I don't understand these instances of using ML to return search results. DDG returns results; ML returns results, possibly with hallucination. Even without the hallucination, what's the point? I find the results I need from a search engine. Solution looking for a problem?

AlienRobot 2 years ago | |

I don't know what Google is going to do with it, but Bing is absolutely ruining their search engine with AI. You search for something, AI starts typing out an answer in a typewriter effect. Since AI, I can't trust anymore that the snippets shown at the top are verbatim quotes from a human-written article or something the AI came up with (not considering they could be quoting an AI-generated article!). At the bottom of the search page, where the "next page" would normally be the rightmost button, it's now the "chat" GPT button. I misclick it every time I want to see the next page. I bet that is driving up some metrics and making some engineers really confused about why people keep clicking the chat button and then not chatting.

Overall this all feels so unimaginative. With all the resources these companies have the only solution they can come up with for the search problem is "just throw AI at it." I could come up with that. It's not clever.

nolongerthere 2 years ago | | |

Curious why you ever use bing to begin with?

whimsicalism 2 years ago | |

GPT4 is much faster for searching through results than I am typically and can pull out exactly what I need.

I've basically completely replaced Google in my day-to-day unless I need to look up a specific location of something in the physical world or something that recently happened.

nicetryguy 2 years ago | | |

> I've basically completely replaced Google in my day-to-day (for GPT4)

That's ...not good.

GPTx gets a lot of surface topics right, but when you delve into gritty specific details it will just start rambling like a straitjacketed lunatic with the confidence of a used car salesman. The rubber meets the road when I try to compile code that uses libraries or functions that don't exist or it leads me to hallucinated imaginary GitHub repos. I worry that this use of GPTx would be like getting water from lead pipes: it would seem fine on the day-to-day while my mind is slowly poisoned with nonsense and insanity.

Google has certainly taken a nosedive in result quality over the last few years, but Kagi has been amazing for me lately.

egberts1 2 years ago |

... kinda already tainted by Google Gemini importing the likes of Reddit, DemocratUnderground or Daily Kos, no?