Microsoft's AI shopping announcement contains hallucinations in the demo (perfectrec.com)

90 points by craigts 2 years ago | 106 comments

cryptozeus 2 years ago |

Is it just me, or does everyone trust AI opinions less and less? Every time I ask it to find the top 5 of something, I go and double-check myself and almost always find it to be wrong. For example, try searching for the top 5 restaurants around you in Bard. Some of them don't even exist lol, and some are just random if you cross-verify with actual popularity from Yelp etc.

cubefox 2 years ago | |

Using language models for location- or time-based things is not recommended, as these usually require non-textual data. They are better used for general knowledge questions, programming help, translation, or writing. Asking them to do any complex calculations (especially ones that also require non-text raw data, like inflation over a given time period) is also futile.

CSSer 2 years ago | | |

> general knowledge questions, programming help, translation, or writing.

They get all of these wrong too. It's like some AI-specific variant of the Gell-Mann amnesia effect: it's usually right in the first sentence, but if you really know the answer, it's often either very debatable or completely wrong by the halfway mark of the paragraph. Meanwhile, the brand authority attached to these answers makes that a real problem.

sorokod 2 years ago | | |

I get a different answer from llama2 every time I ask "what is the third element in the periodic table".

I'll hold off actually using them for now.

dkjaudyeqooe 2 years ago | |

It's just reality sinking in.

rvz 2 years ago | |

Well, it doesn't surprise me, since I have been saying for a while that these LLMs hallucinate nonsense to the point where you end up triple-checking whatever they output.

LLMs thrive in creative, non-serious applications, mostly fantasy or creative writing. Anyone relying on them for high-risk use cases, outside of summarization, is going to be very disappointed.

nickpeterson 2 years ago | | |

Perhaps the outcome is we get better at actually checking things, not a terrible result.

geoduck14 2 years ago | | |

I recommend that LLM users leverage the RAG (retrieval-augmented generation) technique.
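A minimal sketch of the retrieval half of RAG, assuming a toy keyword-overlap retriever (real systems usually rank by embedding similarity) and a hypothetical prompt template. The function names `retrieve` and `build_prompt` are illustrative, not from any library. The idea is to paste relevant source text into the prompt so the model answers from it instead of from memory.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercase word set: a crude stand-in for a real tokenizer or embedding.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by how many words they share with the query (toy score).
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    # Hypothetical template: ground the model in the retrieved text and ask it
    # to say so, rather than guess, when the sources don't cover the question.
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the sources below. "
        "If they do not contain the answer, say so.\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )
```

For example, given the made-up documents "Lithium is the third element of the periodic table.", "The Surface Headphones 2 have active noise cancellation." and "Bard is a chatbot made by Google.", `retrieve("What is the third element of the periodic table?", docs, k=1)` picks the lithium sentence, because it shares the most words with the question. RAG narrows what the model has to guess at, but the model can still misread or ignore the sources.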

rblatz 2 years ago | |

I'm glad that expectations are shifting. At the extremes, it's either a fancy parlor trick or a hyper-intelligent god, and a lot of the original hype has skewed much closer to the hyper-intelligent god end of the spectrum. It's definitely not a fancy parlor trick, but it's likely closer to that end than to the one it's being hyped as.

hotpotamus 2 years ago | |

I think the most amusing comment I've read here in the last few weeks called it "demented Clippy".

TheCaptain4815 2 years ago | |

My trust factor for online opinion is ranked:

1) Online forums (adding 'reddit' or 'hacker news' to a search query)
2) GPT-4
3) Google search

phyzome 2 years ago |

There is information here in the observation that all these "AI" demos contain blatant inaccuracies, apparently without any fact-checking having taken place. It's clear that these companies (Microsoft, Google, OpenAI) do not care about accuracy, correctness, or the truth. It is not part of their business model.

There is no respect for your time, your safety, your reputation. Your role as a customer is to be conned into using the products for long enough that a return on investment can be made; the companies will pivot to a new product as soon as the untrustworthiness of the old one becomes common knowledge.

Short-term thinking. Desperation.

imchillyb 2 years ago |

A hallucination is an unexpected emergence.

The 'making up' facts, because it cannot determine a fact from fiction, is entirely expected behavior.

There is no 'hallucination', as the behavior is anticipated, expected, and entirely within normal operation.

The bullshit comes from there being no model of trust these AIs subscribe to. I'd love-love-love to see these AI producers held responsible for verifying truth and ethics.

These companies/universities/groups that allow their applications to tell bold-faced lies (misrepresent data with authority) to citizens should be a top priority for legislators around the world to bash in the face.

pmontra 2 years ago |

Bing works for Microsoft and basically that's an ad. Wouldn't any human paid by Microsoft say in an ad that Surface Headphones 2 are the best ANC headphones?

aeirjtaweraew 2 years ago |

Pretty soon some LLM owner is going to use the argument "Everyone is allowed to have their own opinions, and LLMs are too, their responses don't have to line up with someone else's preferences."

jarofghosts 2 years ago | |

Alternative Intelligence

siva7 2 years ago |

Opinion pieces like shopping recommendations are quite hard for current LLMs. Hard facts or pure creative work: that's where AI shines. Anything in between, and things get tricky.

2bitencryption 2 years ago | |

This is one of those areas where the poor quality of the data influences the output, I think.

There are so many garbage, lazily written product reviews on websites that exist only to get you to click an affiliate link and make a purchase. So it is not in their interest to say "You shouldn't buy this."

Rather, they make a list of the "top X Foobars": they start with a really expensive one, then follow with a more reasonably priced one and give it a very positive review. That leads to clicks and purchases.

Given this, it's not surprising to me that even the best LLMs carry pieces of this with them. Ask it to predict text describing some tech product on a sales page, and of course parts of that low-quality data will bleed through.

cubefox 2 years ago | | |

There is an argument to be made for automatically downweighting (be it training epochs or pagerank rating) anything with affiliate links. But I guess it would be trivial to hide them behind a redirect.

That being said, I recently asked the Bing chatbot about the difference between two similar-sounding printer models, and it gave a good explanation that I previously couldn't quickly find via Google. In the case of Bing, it is sometimes not completely clear to what degree its answer depends on the web search, if it performed one, and to what degree it is just answering from its background knowledge (which could be prone to hallucination, but is less "gullible", so to speak). It provides sources, but not everything it says is necessarily present in them. I'm actually surprised how quickly Bing is able to search (load and read) multiple websites, given that loading times are not always trivial. It turns out these models are much faster at reading than at typing. Indeed, each forward pass reads the entire context window, and there is one forward pass for every generated token!
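A toy count of the reading-versus-typing asymmetry, assuming a naive decoder with no key/value cache: each step runs one forward pass over every position currently in the context, then appends exactly one new token. Production inference engines usually cache each layer's keys and values, so earlier positions aren't recomputed every step (the new token still attends over all of them), but the one-pass-per-generated-token structure is the same. `count_positions_processed` is a hypothetical helper, not part of any library.

```python
def count_positions_processed(prompt_len: int, new_tokens: int) -> tuple[int, int]:
    """Return (forward_passes, total_positions) for naive autoregressive decoding.

    Assumes no KV cache: every forward pass processes all positions currently
    in the context, then exactly one generated token is appended.
    """
    passes = 0
    positions = 0
    context_len = prompt_len
    for _ in range(new_tokens):
        positions += context_len  # the whole context is "read" in this pass
        passes += 1
        context_len += 1          # one new token is "typed"
    return passes, positions
```

With a 1,000-token prompt and 3 generated tokens, this gives 3 passes covering 3,003 positions. In closed form the total is `n * p + n * (n - 1) / 2` for prompt length `p` and `n` new tokens, so the prompt is consumed in bulk while the output still arrives one token per pass.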

nowooski 2 years ago | | |

For sure. The garbage in, garbage out problem is quite real for ecommerce applications.

sporadicallyjoe 2 years ago |

Is anyone shipping AI products that DO NOT contain hallucinations? I thought that was pretty much a given.

thewataccount 2 years ago | |

Well, there isn't a human who never "hallucinates" in the sense we use for LLMs, i.e. confidently gives "incorrect answers".

Human brains use lots of heuristics. We don't "think step by step" through everything; instead, we rapidly construct an answer for almost everything.

What we call "hallucinations" in AI, we call "misspeaking, misremembering, off-by-one math/counting, misidentifying someone, using the wrong variable/method when programming, etc." in humans.

jarofghosts 2 years ago | |

Hallucinating is roughly how they work; we just label it as such when the output is obviously weird.

thewataccount 2 years ago | | |

This is something I'm not sure people understand.

LLMs only make a "best guess" for each next token. That's it. When the guess is wrong we call it a "hallucination", but really the entire thing was a "hallucination" to begin with.
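A toy bigram model makes the "best guess" point concrete. It always appends the single most frequent word seen after the previous one, with no notion of whether the resulting sentence is true. This is greedy decoding over word counts, a drastic simplification of a real LLM (which uses a neural network over subword tokens and usually samples rather than always taking the top choice). The corpus and function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    # Count which word follows which in the training text.
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows


def generate(follows: dict[str, Counter], start: str, max_words: int = 5) -> str:
    # Greedy decoding: repeatedly append the most frequent follower.
    out = [start]
    for _ in range(max_words):
        candidates = follows.get(out[-1])
        if not candidates:
            break  # no continuation was ever seen after this word
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)


corpus = (
    "surface headphones have good battery . "
    "sony headphones have the best anc . "
    "sony earbuds have the best anc"
)
print(generate(train_bigrams(corpus), "surface"))
```

This prints "surface headphones have the best anc", a fluent sentence that appears nowhere in the corpus (which only says the Surface headphones have good battery). Every step was a reasonable local guess; the "hallucination" is simply what those guesses add up to.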

This is also analogous to humans, who also "hallucinate" incorrect answers and usually "hallucinate" them less when they "think through this step by step before giving an answer", etc.

yonatron 2 years ago |

Yeah. These "lies" are just artifacts of the way LLMs work. They're meant to predict likely text given a prompt, and they do. If tasked with "write some marketing or a buying guide for product X", they will simulate likely marketing blurbs that have nothing to do with truth; that's not their wheelhouse. Prediction is a very different function, algorithm, and problem set than something like "accurately summarize existing reviews". This is a feature, not a bug. If you use something off-label, you'll get off-label results. MSFT should know better.

predictabl3 2 years ago |

I'm sorry, but watching people talk about the vast majority of the AI landscape is like watching people talk about FSD (Full Self-Driving). Have fun on the hype treadmill.

fizwhiz 2 years ago |

Why hasn't their stock plummeted like Google's?

barbariangrunge 2 years ago |

Stop calling them hallucinations. If we're going to anthropomorphize AIs, let's just call it bullshitting and lies. If we're not going to anthropomorphize AIs, then we need a different term.