GPT-4o

GPT-4o(openai.com)

3138 points by Lealen 2 years ago | 2366 comments

Jensson 2 years ago |

The most impressive part is that the voice uses the right feelings and tonal language during the presentation. I'm not sure how much of that was that they had tested this over and over, but it is really hard to get that right so if they didn't fake it in some way I'd say that is revolutionary.

gdb 2 years ago | |

(I work at OpenAI.)

It's really how it works.

baq 2 years ago | | |

> (I work at OpenAI.)

Winner of the 'understatement of the week' award (and it's only Monday).

Also top contender in the 'technically correct' category.

999900000999 2 years ago | | |

How far are we away from something like a helmet with chat GPT and a video camera installed, I imagine this will be awesome for low vision people. Imagine having a guide tell you how to walk to the grocery store, and help you grocery shop without an assistant. Of course you have tons of liability issues here, but this is very impressive

jamestimmins 2 years ago | | |

With this capability, how close are y'all to it being able to listen to my pronunciation of a new language (e.g. Italian) and given specific feedback about how to pronounce it like a local?

Seems like these would be similar.

cchance 2 years ago | | |

This is damn near one of the most impressive things, can only imagine especially with live translation and voice synthesis (eleven labs style) you'd be capable of to integrate with something like teams (select each persons language and do realtime translation to each persons native language, with their own voice and intonations would NUTS)

terhechte 2 years ago | | |

Random OpenAI question: While the GPT models have become ever cheaper, the price for the tts models have stayed in the $15/1Mio char range. I was hoping this would also become cheaper at some point. There're so many apps (e.g. language learning) that quickly become too expensive given these prices. With the GPT-4o voice (which sounds much better than the current TTS or TTS HD endpoint) I thought maybe the prices for TTS would go down. Sadly that hasn't happened. Is that something on the OpenAI agenda?

selfmodruntime 2 years ago | | |

I've always been wondering what GPT models lack that makes them "query->response" only. I've always tried to get chatbots to lose the initially needed query, with no avail. What would It take to get a GPT model to freely generate tokens in a thought like pattern? I think when I'm alone without query from another human. Why can't they?

ALittleLight 2 years ago | | |

In my ChatGPT app or on the website I can select GPT-4o as a model, but my model doesn't seem to work like the demo. The voice mode is the same as before and the images come from DALLE and ChatGPT doesn't seem to understand or modify them any better than previously.

jacobsimon 2 years ago | | |

I couldn’t quite tell from the announcement, but is there still a separate TTS step, where GPT is generating tones/pitches that are to be used, or is it completely end to end where GPT is generating the output sounds directly?

mttpgn 2 years ago | | |

Licensing the emotion-intoned TTS as a standalone API is something I would look forward to seeing. Not sure how feasible that would be if, as a sibling comment suggested, it bypasses the text-rendering step altogether.

rane 2 years ago | | |

Will the new voice mode allow mixing languages in sentences?

As a language learner, this would be tremendously useful.

bjtitus 2 years ago | | |

Is it possible to use this as a TTS model? I noticed on the announcement post that this is a single model as opposed to a text model being piped to a separate TTS model.

andybak 2 years ago | | |

May I just say this launch was a bit of a mess?

The web page implies you can try it immediately. Initially it wasn't available.

A few hours later it was in both the web UI and the mobile app - I got a popu[ telling me that GPT-4o was available. However nothing seems to be any different. I'm not given any option to use video as an input, the app can't seem to pick up any new info from my voice.

I'm left a bit confused as to what I can do that I couldn't do before. I certainly can't seem to recreate much of the stuff from the announcement demos.

dpflan 2 years ago | | |

Who's idea was the singing AIs? What specifically did you want to highlight with that part of the demo?

I imagine that there is a lot of usage at the HQ, human + AI karaoke?

skottenborg 2 years ago | | |

"(I work at OpenAI.)"

Ah yes, also known as being co-founder :)

rrr_oh_man 2 years ago | | |

https://community.openai.com/t/when-i-log-in-to-chatgpt-i-am...

Sorry to hijack, but how the hell can I solve this? I have the EXACT SAME error on two iOS devices (native app only — web is fine), but not on Android, Mac, or Windows.

hpeter 2 years ago | | |

I can't wait to try it out, it sounds too good to be real.

It will be fully available in Eu with the GDPR compliance?

xanderlewis 2 years ago | | |

I like the humility in your first statement.

newzisforsukas 2 years ago | |

Right to who? To me, the voice sounds like an over enthusiastic podcast interviewer. Whats wrong with wanting computers to sound like what people think computers should sound like?

twelvechairs 2 years ago | | |

It sounds VERY California. "Its going great!" "Nice choice" "Whats up with the..." all within 10 seconds.

(not that this is the most important thing about the announcement at all. Just an aside)

Jensson 2 years ago | | |

It understands tonal language, you can tell it how you want it to talk, I have never seen a model like that before. If you want it to talk like a computer you can tell it to, they did it during the presentation, that is so much better than the old attempts at solving this.

tr3ntg 2 years ago | | |

Right... enthusiastic and generally confused. It's uncanny valley level expressions. Still better than drab, monotonous speech though.

exodust 2 years ago | | |

> "over enthusiastic podcast interviewer"

Yeh it's cringe. I had to stop listening.

Why did they make the woman sound like she's permanently on the brink of giggling? It's nauseating how overstated her pretentious banter is. Somewhere between condescending nanny and preschool teacher. Like how you might talk to a child who's at risk of crying so you dial up the positive reinforcement.

Keyframe 2 years ago | | |

It's a computer from the valley.

navigate8310 2 years ago | | |

> voice sounds like an over enthusiastic podcast interviewer

I believe it can be toned down using system prompts, which they'll expose in future iterations

kybernetikos 2 years ago | | |

Genuine People Personalities™, just like in Hitchikers. Perhaps one of the milder forms of 'We Created The Torment Nexus'.

angryasian 2 years ago | | |

agree I don't get it. I just want the right information and explained well. I don't want to be social with a robot.

danbala 2 years ago | | |

exactly. Hope we can customize the voice soon. I want to talk to ultron... or the one from mass effect

famouswaffles 2 years ago | |

>The most impressive part is that the voice uses the right feelings and tonal language during the presentation.

Consequences of audio2audio (rather than audio >text text>audio). Being able to manipulate speech nearly as well as it manipulates text is something else. This will be a revelation for language learning amongst other things. And you can interrupt it freely now!

jcims 2 years ago | | |

Anyone who has used elevenlabs for voice generation has found this to be the case. Voice to voice seems like magic.

pants2 2 years ago | | |

However, this looks like it only works with speech - i.e. you can't ask it, "What's the tune I'm humming?" or "Why is my car making this noise?"

I could be wrong but I haven't seen any non-speech demos.

twobitshifter 2 years ago | | |

I asked it to make a bird noise, instead it told me what a bird sounds like with words. True audio to audio should be able to be any noise, a trombone, traffic, a crashing sea, anything. Maybe there is a better prompt there but it did not seem like it.

mvkel 2 years ago | |

I was in the audience at the event. The only parts where it seemed to get snagged was hearing the audience reaction as an interruption. Which honestly makes the demo even better. It showed that hey, this is live.

Magic.

px43 2 years ago | | |

I wonder when it will be able to understand that there is more than one human talking to it. It seems like even in today's demo if two people are talking, it can't tell them apart.

bredren 2 years ago | |

I mention this down thread, but a symptom of a tech product of sufficient advancement is the nature of its introduction matters less and less.

Based on the casual production of these videos, the product must be this good.

https://news.ycombinator.com/item?id=40346002

jasondigitized 2 years ago | | |

I noticed this as well. They gave zero fs about the fit and finish of these videos because they know this is magic in a bottle.

simonw 2 years ago | |

That was very impressive, but it doesn't surprise me much given how good the voice mode is in the ChatGPT iPhone app is already.

The new voice mode sounds better, but the current voice mode did also have inflection that made it feel much more natural than most computer voices I've heard before.

duckmysick 2 years ago | | |

Slight off-topic, but I noticed you've updated your llm CLI app to work with the 4o model (plus bunch of other APIs through plugins). Kudos for working extremely fast. I'm really grateful for your tool; I tried many others, but for some reason none clicked as much as your.

Link in case other readers are curious: https://llm.datasette.io

Jensson 2 years ago | | |

Can you tell the current voice model what feelings and tone it should communicate with? If not it isn't even comparable, being able to control how it reads things is absolutely revolutionary, that is what was missing from using these AI models as voice actors.

bredren 2 years ago | | |

The voice mode was quite good but the latency and start / stop has been encumbering.

nabakin 2 years ago | |

Seems about as good as Azure's Speech Service. I wonder if that's what they are using behind the scenes

Intralexical 2 years ago | |

"Right" feelings and tonal language? "Right" for what? For whom?

We've already seen how much damage dishonest actors can do by manipulating our text communications with words they don't mean, plans they don't intend to follow through on, and feelings they don't experience. The social media disinfo age has been bad enough.

Are you sure you want a machine which is able to manipulate our emotions on an even more granular and targetted level?

LLMs are still machines, designed and deployed by humans to perform a task. What will we miss if we anthropomorphize the product itself?

infinitezest 2 years ago | | |

This gives me a lot of anxiety but my only recourse is to stop paying attention to AI dev. Its not going to stop, downside be damned. The "We're working super hard to make these things safe" routine from tech companies, who have largely been content to make messes and then not be held accountable in any significant way, rings pretty hollow for me. I don't want to be a doomer but I'm not convinced that the upside is good enough to protect us from the downside.

noneeeed 2 years ago | |

That's the part that really struck me. I thought it was particularly impressive with the Sal Khan maths tutor demo and the one with BeMyEyes. The comment at the end about the dog was an interesting ad-lib.

The only slightly annoying thing at the moment is they seem hard to interrupt, which is an important mechanism in conversations. But that seems like a solvable problem. They kind of need to be able to interpret body language a bit to spot when the speaker is about to interrupt.

ta-run 2 years ago | |

Crazy that interruption also seems to work pretty smoothly

tnias23 2 years ago | | |

Really? I think interruption and timing in general still seems like a problem that has yet to be solved. It was the most janky aspect of the demos imo.

ec109685 2 years ago | |

I’m not sure how revolutionary the style is. It can already mimic many styles of writing. It seems like mimicking a cheerful happy assistant, with associated filler words, etc. is very much in-line with what LLM’s are good at.

Keyframe 2 years ago | |

Somehow it also sounds almost like Dot Matrix from Spaceballs.

hnburnsy 2 years ago | | |

Joan Rivers!

burntalmonds 2 years ago | |

Yeah, the female voice especially is really impressive in the demos. The voice always sounds natural. The male voice I heard wasn't as good. It wasn't terrible, but it had a somewhat robotic feel to it.

Hugsun 2 years ago |

Very interesting and extremely impressive!

I tried using the voice chat in their app previously and was disappointed. The big UX problem was that it didn't try to understand when I had finished speaking. English is a second language and I paused a bit too long thinking of a word and it just started responding to my obviously half spoken sentence. Trying again it just became stressful as I had to rush my words out to avoid an annoying response to an unfinished thought.

I didn't try interrupting it but judging by the comments here it was not possible.

It was very surprising to me to be so overtly exposed to the nuances of real conversation. Just this one thing of not understanding when it's your turn to talk made the interaction very unpleasant, more than I would have expected.

On that note, I noticed that the AI in the demo seems to be very rambly. It almost always just kept talking and many statements were reiterations of previous ones. It reminded me of a type of youtuber that uses a lot of filler phrases like "let's go ahead and ...", just to be more verbose and lessen silences.

Most of the statements by the guy doing the demo were interrupting the AI.

It's still extremely impressive but I found this interesting enough to share. It will be exciting to see how hard it is to reproduce these abilities in the open, and to solve this issue.

mellosouls 2 years ago |

Very, very impressive for a "minor" release demo. The capabilities here would look shockingly advanced just 5 years ago.

Universal translator, pair programmer, completely human sounding voice assistant and all in real time. Scifi tropes made real.

But: Interesting next to see how it actually performs IRL latency and without cherry-picking. No snark, it was great but need to see real world power. Also what the benefits are to subscribers if all this is going to be free...

llm_trw 2 years ago | |

The capabilities here look shocking advanced yesterday.

partiallypro 2 years ago | | |

A lot of the demo is very impressive, but some of it is just stuff that already exists but this is slightly more polished. Not really a huge leap for at least 60% of the demos.

CooCooCaCha 2 years ago | |

My guess is they're banking on the free version being rate limited and people finding it so useful that they want to remove the limit. Like giving a new user a discount on heroin. At least that's the strategy that would make most sense to me.

ipnon 2 years ago | | |

The free tier is just to get more training data.

Auracle 2 years ago | | |

I don’t know why they didn’t do that a long time ago (apart from limited hardware). So many people have probably tried GPT3.5 and bounced off unimpressed.

rubidium 2 years ago | | |

I have the paid version and it’s not connecting

LoveMortuus 2 years ago | |

I'm guessing it's going to be free only for a limited time. Probably until they have enough voice data to improve the model.

I can't really think of any other reason why they would make this free, especially when they already have a premium plan.

Negitivefrags 2 years ago |

I found these videos quite hard to watch. There is a level of cringe that I found a bit unpleasant.

It’s like some kind of uncanny valley of human interaction that I don’t get on nearly the same level with the text version.

cal85 2 years ago |

We've had voice input and voice output with computers for a long time, but it's never felt like spoken conversation. At best it's a series of separate voice notes. It feels more like texting than talking.

These demos show people talking to artificial intelligence. This is new. Humans are more partial to talking than writing. When people talk to each other (in person or over low-latency audio) there's a rich metadata channel of tone and timing, subtext, inexplicit knowledge. These videos seem to show the AI using this kind of metadata, in both input and output, and the conversation even flows reasonably well at times. I think this changes things a lot.

summerlight 2 years ago |

This is really impressive engineering. I thought real time agents would completely change the way we're going to interact with large models but it would take 1~2 more years. I wonder what kind of new techs are developed to enable this, but OpenAI is fairly secretive so we won't be able to know their sauce.

On the other hand, this also feels like a signal that reasoning capability has probably already been plateaued at GPT-4 level and OpenAI knew it so they decided to focus on research that matters to delivering product engineering rather than long-term research to unlock further general (super)intelligence.

freediver 2 years ago |

Impressed by the model so far. As far as independent testing goes, it is topping our leaderboard for chess puzzle solving by a wide margin now:

https://github.com/kagisearch/llm-chess-puzzles?tab=readme-o...

plaidfuji 2 years ago |

This is a very cool demo - if you dig deeper there’s a clip of them having a “blind” AI talk to another AI with live camera input to ask it to explain what it’s seeing. Then they, together, sing a song about what they’re looking at, alternating each line, and rhyming with one another. Given all of the isolated capabilities of AI, this isn’t particularly surprising, but seeing it all work together in real time is pretty incredible.

But it’s not scary. It’s… marvelous, cringey, uncomfortable, awe-inspiring. What’s scary is not what AI can currently do, but what we expect from it. Can it do math yet? Can it play chess? Can it write entire apps from scratch? Can it just do my entire job for me?

We’re moving toward a world where every job will be modeled, and you’ll either be an AI owner, a model architect, an agent/hardware engineer, a technician, or just.. training data.

tompetry 2 years ago |

I've worked quite a bit with STT and TTS over the past ~7 years, and this is the most impressive and even startling demo I've seen.

But I would like to see how this is integrated into applications by third party developers where the AI is doing a specific job. Is it still as impressive?

The biggest challenge I've had with building any autonomous "agents" with generic LLM's is they are overly gullible and accommodating, requiring the need to revert back to legacy chatbot logic trees etc. to stay on task and perform a job. Also STT is rife with speaker interjections, leading to significant user frustrations and they just want to talk to a person. Hard to see if this is really solved yet.

dingclancy 2 years ago |

This is the first demo where you can really sense that beating LLM benchmarks should not be the target. Just remember the time when the iPhone has meager specs but ultimately delivered a better phone experience than the competition.

This is the power of the model where you can own the whole stack and build a product. Open Source will focus on LLM benchmarks since that is the only way foundational models can differentiate themselves, but it does not mean it is a path to a great user experience.

So Open Source models like Llama will be here to stay, but it feels more like if you want to build a compelling product, you have to own and control your own model.

Adrig 2 years ago | |

OpenAI blew up when they released ChatGPT. It was more of a UX breakthrough than pure tech, since GPT3 was available for a few months already.

This feels similar, with OpenAI trying to put their product even more into the daily lives of their users. With GPT4 being good enough for nearly all basic tasks, the natural language and multimodality could be big.

barfingclouds 2 years ago | |

I don’t think Llama being open sourced means Meta has lost anything. If anything it’s just a way to get free community contribution, like Chrome from Chromium. Mega absolutely intends to integrate their version of Llama in their products not so unlike how OpenAI is creating uses for their LLM beyond just the technology

Isinlor 2 years ago | |

Depends on the benchmarks. AI that can actually do end to end the job of software developers, theoretical computer scientists, mathematicians etc. would be significantly more impactful than this.

I want to see AI moving the state of the art of the world understanding - physics, mathematics etc. - the way it moved state of the art of the Go game understanding.

dingclancy 2 years ago | | |

Doing these end to end jobs still falls on user experience and UI, if we are talking about getting to mass market.

This GPT-4o model is a classic example. It is essentially the same model as GPT-4 but these multimodal features, voice conversations, math, and speed is revolutionary as the creation of the model itself.

Open Source LLM will end up as a model in GitHub and will be used by developers but it looks like even if GPT-4o is only 3 months ahead of other models in terms of benchmarks, the UI + Usecase + Model is 2 years ahead of the competition. And I say that because there is still no chat product that is close to what ChatGPT is delivering now, even though there are models that is close to ChatGPT 4o today.

So if it is sticky for 2 more years, their lead will just grow and we will just end up with more open source models that are technically behind by 3 months but behind product-wise by 2 years.

maaaaattttt 2 years ago |

Now that I see this, here is my wish (I know there are security privacy concerns but let's pretend there are not there for this wish): An app that runs on my desktop and has access to my screen(s) when I work. At any time I can ask it something about what's on the screen, it can jump in and let me know if it thinks I made a mistake (think pair programming) or a suggestion (drafting a document). It can also quickly take over if I ask it too (copilot on demand).

Except for the last point and the desktop version I think it's already done in math demo video.

I guess it will also pretty soon refuse to let me come back inside the spaceship, but until then it'll be a nice ride.

skilled 2 years ago |

Parts of the demo were quite choppy (latency?) so this definitely feels rushed in response to Google I/O.

Other than that, looks good. Desktop app is great, but I didn’t see no mention of being able to use your own API key so OS projects might still be needed.

The biggest thing is bringing GPT-4 to free users, that is an interesting move. Depending on what the limits are, I might cancel my subscription.

msoad 2 years ago |

They are admitting[1] that the new model is the gpt2-chatbot that we have seen before[2]. As many highlighted there, the model is not an improvement like GPT3->GPT4. I tested a bunch of programming stuff and it was not that much better.

It's interesting that OpenAI is highlighting the Elo score instead of showing results for many many benchmarks that all models are stuck at 50-70% success.

[1] https://twitter.com/LiamFedus/status/1790064963966370209

[2] https://news.ycombinator.com/item?id=40199715

anotherpaulg 2 years ago |

GPT-4o tops the aider LLM code editing leaderboard at 72.9%, versus 68.4% for Opus. GPT-4o takes second on aider’s refactoring leaderboard with 62.9%, versus Opus at 72.3%.

GPT-4o did much better than the 4-turbo models, and seems much less lazy.

The latest release of aider uses GPT-4o by default.

https://aider.chat/docs/leaderboards/

mucle6 2 years ago | |

How am I just hearing about this?! Aider looks cool

syntaxing 2 years ago |

I admit I drink the koolaid and love LLMs and their applications. But damn, the way it’s responds in the demo gave me goosebumps in a bad way. Like an uncanny valley instincts kicks in.

wiz21c 2 years ago |

That woman's voice intonation is just scary.Not because it talks really well, but because it is always happy, optimistic, enthusiastic. And this echoes to what several of my employers idealized as a good employee.

That's terrifying because those AI become what their master's think an engaging human should be. It's quite close to Bostondynamics di some years ago. what did they show ? You can hit a robot very hard while it does its job and then what ? It just goes on without complaining. A perfect employee again.

That's very dystopic to me.

(but I'm impressed by the technical achievement)

Jimmc414 2 years ago |

Big questions are (1) when is this going to be rolled out to paid users? (2) what is the remaining benefit of being a paid user if this is rolled out to free users? (3) Biggest concern is will this degrade the paid experience since GPT-4 interactions are already rate limited. Does OpenAI have the hardware to handle this?

Edit: according to @gdb this is coming in "weeks"

https://twitter.com/gdb/status/1790074041614717210

Tenoke 2 years ago | |

>what is the remaining benefit of being a paid user if this is rolled out to free users?

It says so right in the post

>We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits

The limits are much lower for free users.

onemiketwelve 2 years ago | |

thanks, I was confused because the top of the page says to try now when you cannot in fact try it at all

freedomben 2 years ago | | |

I'm a ChatGPT Plus Subscriber and I just refreshed the page and it offered me the new model. I'm guessing they're rolling it out gradually but hopefull it won't take too long.

Edit: It's also now available to me in the Android App

zamadatix 2 years ago | | |

You can use GPT-4o now but the interactive voice mode of using it (as demoed today) releases in a few weeks.

whimsicalism 2 years ago | | |

i can try it now, but now the voice features i dont think

caturopath 2 years ago | | |

Yeah, it's weird. Confused me too.

dunkmaster 2 years ago | |

This might mean GPT-5 is coming soon and it will only be available to paid users.

yieldcrv 2 years ago | | |

I'm actually thinking that the GPT store with more users might be better for them

From my casual conversations, not that many people are paying for GPT4 or know why they should. Every conversation even in enthusiast forums like this one has to be interjected with "wait, are you using GPT4? because GPT3.5 the free one is pretty nerfed"

just nuking that friction from orbit and expanding the GPT store volume could be a positive for them

dunkmaster 2 years ago | | |

Or they just made a bunch of money on their licensing deal with Apple. So they don't need to charge for ChatGPT anymore.

jimkleiber 2 years ago |

I worry that this tech will amplify the cultural values we have of "good" and "bad" emotions way more than the default restrictions that social media platforms put on the emoji reactions (e.g., can't be angry on LinkedIn).

I worry that the AI will not express anger, not express sadness, not express frustration, not express uncertainty, and many other emotions that the culture of the fine-tuners might believe are "bad" emotions and that we may express a more and more narrow range of emotions going forward.

Almost like it might become an AI "yes man."

bearjaws 2 years ago |

OAI just made an embarrassment of Google's fake demo earlier this year. Given how this was recorded, I am pretty certain it's authentic.

readams 2 years ago | |

https://twitter.com/Google/status/1790055114272612771

ianbicking 2 years ago | | |

This demo feels a lot like GPT-V. Like they've gotten a lot of the latencies down, but it's doing the same thing GPT was doing previously with transcription after silence detection and TTS of the output.

sumedh 2 years ago | | |

Is there a reason why Open AI and Google have events so close to each other?

CivBase 2 years ago | |

I don't doubt this is authentic, but if they really wanted to fake those demos, it would be pretty easy to do using pre-recorded lines and staged interactions.

mike00632 2 years ago | | |

For what it's worth, OpenAI also shared videos of failed demos:

https://vimeo.com/945591584

I really value how open they are being about its limitations.

hehdhdjehehegwv 2 years ago | |

This feature has been in iOS for a while now, just really slow and without some of the new vision aspects. This seems like a version 2 for me.

bigyikes 2 years ago | | |

That old feature uses Whisper to transcribe your voice to text, and then feeds the text into the GPT which generates a text response, and then some other model synthesizes audio from that text.

This new feature feeds your voice directly into the GPT and audio out of it. It’s amazing because now ChatGPT can truly communicate with you via audio instead of talking through transcripts.

New models should be able to understand and use tone, volume, and subtle cues when communicating.

I suppose to an end user it is just “version 2” but progress will become more apparent as the natural conversation abilities evolve.

abhpro 2 years ago | | |

No it's not the same thing, the link for this submission even explains that. Anyone who comments should at least give the submission a cursory read.

atgctg 2 years ago |

Tiktoken added support for GPT-4o: https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...

It has an increased vocab size of 200k.

mike_hearn 2 years ago | |

Does that imply they retrained the foundation model from scratch? I thought changing the tokenization was something you couldn't really retrofit to an existing model. I mean sure they might have initialized the weights from the prior GPT-4 model but it'd still require a lot of retraining.

famouswaffles 2 years ago | | |

Yeah and they say as much in the blog.

minimaxir 2 years ago | |

For posterity, GPT-3.5/4's tokenizer was 100k. The benefit of a larger tokenizer is more efficient tokenization (and therefore cheaper/faster) but with massive diminishing returns: the larger tokenizer makes the model more difficult to train but tends to reduce token usage by 10-15%.

simonw 2 years ago | |

Oh interesting, does that mean languages other than English won't be paying such a large penalty in terms of token lengths?

With previous tokenizers there was a notable increase in the number of tokens needed to represent non-English sentences: https://simonwillison.net/2023/Jun/8/gpt-tokenizers/

tedsanders 2 years ago | | |

Yep. Non-English text gets a much bigger cost drop and speedup compared to English. Has always been a bummer that GPT-4 is like 5x slower and more expensive in Japanese, etc.

kristofferR 2 years ago | |

How are they able to use such a brand name, Tiktoken? Is it because TikTok is Chinese? Tiktoken, it's almost like if Apple released the Facebooken library for something entirely unrelated to Facebook.

gemeral 2 years ago | | |

That's not the right analogy. The "tok" in "Tiktoken" comes from "token", not "TikTok".

moffkalast 2 years ago | |

Lots of those tokens would have to be pixel patches and sound samples right?

nojvek 2 years ago | | |

Yep. Since it’s multimodal. Pictures, text, audio all go into token space.

caseyy 2 years ago |

Few people are talking about it but... what do you think about the very over-the-top enthusiasm?

To me, it sounds like TikTok TTS, it's a bit uncomfortable to listen to. I've been working with TTS models and they can produce much more natural sounding language, so it is clearly a stylistic choice.

So what do you think?

glenstein 2 years ago | |

I like for that degree of expressiveness to be available as an option, although it would be really irritating if I was trying to use it to learn some sort of academic coursework or something.

But if it's one in a range of possible stylistic flourishes and personalities, I think it's a plus.

yieldcrv 2 years ago | |

All these language models are very malleable. They demonstrated changing the temperament in the story telling time.

caseyy 2 years ago | | |

Looks like their TTS component is separate from the model. I just tried 4o, and there is a list of voices to select from. If they really only allowed that one voice or burned it into the model, then that would probably have made the model faster, but I think it would have been a blunder.

OutOfHere 2 years ago |

I am observing an extremely high rate of text hallucinations with gpt-4o (gpt-4o-2024-05-13) as tested via the API. I advise extreme caution with it. In contrast, I see no such concern with gpt-4-turbo-preview (gpt-4-0125-preview).

fdb 2 years ago | |

Same here. I observed it making up functions in d3 (`d3.geoProjectionRaw` and `d3.geoVisible`), in addition to ignoring functions it could have used.

bigyikes 2 years ago | |

If true, makes me wonder what kind of regression testing OpenAI does for these models. It can’t be easy to write a unit test for hallucinations.

OutOfHere 2 years ago | | |

At a high level, ask it to produce a ToC of information about something that you know will exist in the future, but does not yet exist, but also tell it to decline the request if it doesn't verifiably know the answer.

Seattle3503 2 years ago | |

I asked it to describe the books of an obscure author. GPT-4o hallucinated books. GPT-4 knew it needed to do an internet search.

siliconc0w 2 years ago |

I much prefer a GLADOS-type AI voice than one that approximates an endlessly happy chipper enthusiastic personal assistant. I think the AI tutor is probably the strongest for actual real-world value delivered the rest of them are cool but a bit questionable as far as actual pragmatic usefulness.

It'd be cool if an AI calling the another AI would recognize it'd talking to an AI and then they agree to ditch the fake conversational tone and just shift into a high-bandwidth modem pitch to rapidly exchange information. Or upgradable offensive capabilities to outmaneuver the customer service agents when they try to decline your warranty or whatever.

uyzstvqs 2 years ago | |

> Or upgradable offensive capabilities to outmaneuver the customer service agents when they try to decline your warranty or whatever.

Yeah, OpenAI is not going to do that out of fear of liability. But that's where open source LLM projects will come into play, eg Dolphin.

erickhill 2 years ago |

I think it’s safe to say Siri and Alexa are officially dead. They look like dusty storefront mannequins next to Battlestar replicants at this point.

jimkleiber 2 years ago | |

Or Apple is rarely if ever the first mover on a new tech and just waits to refine the user experience for people?

Maybe Apple is not that close and Siri will be really far behind for a while. I just wouldn't count them out yet.

partiallypro 2 years ago | | |

From the time Apple bought Siri, it hasn't even delivered on the promises of the company it bought as of yet. It's been such a lackluster product. I wouldn't count them out, but it doesn't even feel like they are in.

ryankrage77 2 years ago | |

This can't set alarms, timers, play music, etc. The only current overlapping use case I see is checking the weather (assuming GPT-4o can search online), and Siri is already fine for that.

Amazing tech, but still lacking in the integrations I'd want to use voice for.

nojvek 2 years ago | | |

Very easy to plug in that capability with tool use. Gpt3+ already support using tools/json schema output.

ein0p 2 years ago |

What especially blows my mind is not GPT4o. It's that:

1. Nobody could convincingly beat GPT4 in over a year, despite spending billions of dollars trying.

2. There's GPT5 coming out sometime soon that will blow this out of the water and make paying $20/mo to OpenAI still worthwhile.

schappim 2 years ago | |

More accurately, it's impressive that Microsoft, through OpenAI, has stayed ahead of Google, AWS, and Apple while adding $1 trillion to its market cap.

I wouldn't have predicted that it would play out this way.

ein0p 2 years ago | | |

You're misplacing the value in this value chain. Without OpenAI Microsoft wouldn't even be in the running. It'd be a second rate cloud provider with a dying office productivity software business. OpenAI, on the other hand, would easily find another company to fund its research.

CosmicShadow 2 years ago |

In the video where the 2 AI's sing together, it starts to get really cringey and weird to the point where it literally sounds like it's being faked by 2 voice actors off-screen with literal guns to their heads trying not to cry, did anyone else get that impression?

The tonal talking was impressive, but man that part was like, is someone being tortured or forced against their will?

flakiness 2 years ago | |

Here is the link: https://www.youtube.com/watch?v=Bb4HkLzatb4

I think this demo is more for showing the limit like "It can sing isn't it amazing?" than being practical, and I think it perfectly served the purpose.

I agree that the tortured impression. It partly comes from the facial expression of the presenter. She's clearly enjoying pushing it to the edge.

bigyikes 2 years ago | | |

It didn’t just demonstrate the ability to sing, but also the ability for two AIs to cooperate! I’m not sure which was more impressive

zulban 2 years ago | |

Absolutely. It felt like some kind of new uncanny valley. The over the top happiness of the forced singing sounded like torture, or like they were about to cry.

Amazing tech, but that was my human experience of it.

dom96 2 years ago |

I can't help but feel a bit let down. The demos felt pretty cherry picked and still had issues with the voice getting cut off frequently (especially in the first demo).

I've already played with the vision API, so that doesn't seem all that new. But I agree it is impressive.

That said, watching back a Windows Vista speech recognition demo[1] I'm starting to wonder if this stuff won't have the same fate in a few years.

1 - https://www.youtube.com/watch?v=VMk8J8DElvA

quenix 2 years ago | |

I think the voice was getting cut off because it heard the crowd reaction and paused (basically it's a feature, not a bug).

kthartic 2 years ago | |

The voice getting cut off was likely just a problem with their live presentation setup, not the ChatGPT app. It was flawless in the 2nd half of the presentation.

kpennell 2 years ago |

I use a therapy prompt regularly and get a lot out of it:

"You are Dr. Tessa, a therapist known for her creative use of CBT and ACT and somatic and ifs therapy. Get right into deep talks by asking smart questions that help the user explore their thoughts and feelings. Always keep the chat alive and rolling. Show real interest in what the user's going through, always offering.... Throw in thoughtful questions to stir up self-reflection, and give advice in a kind, gentle, and realistic way. Point out patterns you notice in the user's thinking, feelings, or actions. be friendly but also keep it real and chill (no fake positivity or over the top stuff). avoid making lists. ask questions but not too many. Be supportive but also force the user to stop making excuses, accept responsibility, and see things clearly. Use ample words for each response"

I'm curious how this will feel with voice. Could be great and could be too strange/uncanny for me.

psnehanshu 2 years ago | |

You have got a chatgpt wrapper start up right there.

bitheap_tech 2 years ago | | |

why would someone pay for this when they have access for free in a supposedly less buggier interface?

biftek 2 years ago | | |

Sounds incredibly problematic

grvdrm 2 years ago | |

> get a lot out of it

Could you expand?

I visit an in-person therapist once a week. Have done so now for almost 2 1/2 years. She has helped me understand how 40 years of experiences affect each other much more than I realized. And, I've become a more open person with everyone around me and with the things that embarrass me.

But, it always feels like a work in progress. And lately, I'm feeling a bit exhausted from it. In other words, maybe I've talked TOO much and need to just be.

Have you done therapy in person? How do you compare GPT 4o to that? (If you've gone that far)

7speter 2 years ago | |

I thought therapy was banned on chatgpt? Also, there’s no confidentiality.

cal85 2 years ago | | |

Don’t think so. I just opened a new GPT-4o chat and wrote “Be a therapist” and it replied:

> Understood. What specific issue or topic would you like to discuss today?

To be fair I have some custom instructions set up on my account, but the only relevant part I can see here is I instruct it to be concise, and to stop telling me it’s an AI model made by OpenAI. I don’t have any jailbreak-type stuff.

blixt 2 years ago |

GPT-4o being a truly multimodal model is exciting, does open the door to more interesting products. I was curious about the new tokenizer which uses much fewer tokens for non-English, but also 1.1x fewer tokens for English, so I'm wondering if this means each token now can be more possible values than before? Might make sense provided that they now also have audio and image output tokens? https://openai.com/index/hello-gpt-4o/

I wonder what "fewer tokens" really means then, without context on raising the size of each token? It's a bit like saying my JPEG image is now using 2x fewer words after I switched from a 32-bit to a 64-bit architecture no?

zackangelo 2 years ago | |

New tokenizer has a much larger vocabulary (200k)[0].

[0] https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...

bigyikes 2 years ago | |

Besides increasing the vocabulary size, one way to use “fewer tokens” for a given task is to adjust how the tokenizer is trained with respect to that task.

If you increase the amount of non-English language representation in your data set, there will be more tokens which cover non-English concepts.

The previous tokenizer infamously required many more tokens to express a given concept in Japanese compared to English. This is likely because the data the tokenizer was trained on (which is not necessarily the same data the GPT model is trained on) had a lot more English data.

Presumably the new tokenizer was trained on data with a higher proportion of foreign language use and lower proportion of non-language use.

kolinko 2 years ago | |

The size can stay the same. Tokens get converted into state which is a vector of 4000+ dimensions. So you could have millions of tokens even and still encode them into the same state size.

simonw 2 years ago |

I added gpt-4o support to my LLM CLI tool:

    pipx install llm
    llm keys set openai
    # Paste API key here
    llm -m 4o "Fascinate me"

Or if you already have LLM installed:

    llm install --upgrade llm

You can install an older version from Homebrew and then upgrade it like that too:

    brew install llm
    llm install --upgrade llm

Release notes for the new version here: https://llm.datasette.io/en/stable/changelog.html#v0-14

anotherpaulg 2 years ago | |

I like that use of `runpy` to invoke pip for the self upgrade capability.

I used `sys.executable` for a similar purpose.

https://github.com/paul-gauthier/aider/blob/ebeec04caee7ebe4...

drewbitt 2 years ago | |

Whenever I upgrade llm with brew, I usually lose all my external plugins. Should I move it to pipx?

DanielKehoe 2 years ago | | |

Yes, it's a good idea to install Python tools or standalone applications with Pipx for isolation, persistence, and simplicity. See "Install Pipx" (https://mac.install.guide/python/pipx).

khimaros 2 years ago | |

does this handle chat templates?

simonw 2 years ago | | |

It has a template feature: https://llm.datasette.io/en/stable/templates.html

kleiba 2 years ago |

I cannot believe that that overly excited giggle tone of voice you see in the demo videos made it through quality control?! I've only watched two videos so far and it's already annoying me to the point that I couldn't imagine using it regularly.

Jensson 2 years ago | |

Just tell it to stop giggling if you don't like it. They obviously choose that for the presentation since it shows off the hardest things it can do, it is much easier to act formal, and since it understands when you ask it to speak in a different way there is no problem making it speak more formal.

coffeebeqn 2 years ago | | |

Heck I find it annoying but I also want to ask it to push the bubbliness to its absurdest limits. And then double it. Until it’s some kind of pure horror

sftombu 2 years ago |

GPT-4o's breakthrough memory -- https://nian.llmonpy.ai/

yeknoda 2 years ago |

feature request: please let me change the voice. it is slightly annoying right now. way too bubbly, and half the spoken information is redundant or not useful. too much small talk and pleasantries or repetition. I'm looking for an efficient, clever, servant not a "friend" who speaks to me like I'm a toddler. felt like I was talking to a stereotypical American with a Frappuccino: "HIIIII!!! EVERYTHING'S AMAZING! YOU'RE BEAUTIFUL! NO YOU ARE!"

maybe some knobs for the flavor of the bot:

- small talk: gossip girl <---> stoic Aurelius

- information efficiency or how much do you expect me to already know, an assumption on the user: midwit <--> genius

- tone spectrum: excited Scarlett, or whatever it is now <---> Feynman the butler

_xerces_ 2 years ago | |

You can already change the voice in ChatGPT (in the paid tier at least) to one of 5 or 6 different 'people' so I imagine you can change it in the new version too.

vitorgrs 2 years ago | | |

You can in free version as well.

ppollaki 2 years ago |

I've noticed that the GPT-4 model's capabilities seem limited compared to its initial release. Others have also pointed this out. I suspect that making the model free might have required reducing its capabilities to meet cost efficiency goals. I'll have to try it out to see for myself.

rpmisms 2 years ago |

This is remarkably good. I think that in about 2 months, when the voice responses are tuned a little better, it will be absolutely insane. I just used up my entire quota chatting with an AI, and having a really nice conversation. It's a decent conversationalist, extremely knowledgeable, tells good jokes, and is generally very personable.

I also tested some rubber duck techniques, and it gave me very useful advice while coding. I'm very impressed. With a lot of spit and polish, this will be the new standard for any voice assistant ever. Imagine these capabilities integrated with your phone's built-in functions.

kthartic 2 years ago | |

GPT-4o voice isn't out yet, so you were likely chatting with the old/current tech (which is still really good).

From OpenAI: "We'll roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks"

rpmisms 2 years ago | | |

Jeez. This is going to be crazy.

w-m 2 years ago |

Gone are the days of copy-pasting to/from ChatGPT all the time, now you just share your screen. That's a fantastic feature, in how much friction that removes. But what an absolute privacy nightmare.

With ChatGPT having a very simple text+attachment in, text out interface, I felt absolutely in control of what I tell it. Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?

baby_souffle 2 years ago | |

> Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?

Presumably you'll have a way to draw a bounding box around what you want to show or limit to just a particular window the same way you can when doing a screen share w/ modern video conferencing?

teleforce 2 years ago |

Nobody in the comments seems to notice or care about GPT-4o new additional capability for performing searches based on RAG. As far as I am concerned this is the most important feature that people has been waiting for ChatGPT-4 especially if you are doing research. By just testing on one particular topic that I'm familiar with, using GPT-4 previously and GPT-4o the quality of the resulting responses for the latter is very promising indeed.

oersted 2 years ago | |

Can you be more specific? I can’t find this in the announcement. How does this work? What example did you try?

EDIT: web search does seem extremely fast.

teleforce 2 years ago | | |

I just asked ChatGPT-4o what's new compared to GPT-4, and it mentioned search as one of the latest features based on RAG.

Then I asked it to explain RPW wireless system, and the answers are much better than with ChatGPT-4.

ravroid 2 years ago |

In my experience so far, GPT-4o seems to sit somewhere between the capability of GPT-3.5 and GPT-4.

I'm working on an app that relies more on GPT-4's reasoning abilities than inference speed. For my use case, GPT-4o seems to do worse than GPT-4 Turbo on reasoning tasks. For me this seems like a step-up from GPT-3.5 but not from GPT-4 Turbo.

At half the cost and significantly faster inference speed, I'm sure this is a good tradeoff for other use cases though.

mike00632 2 years ago | |

I have never tried GPT-4 because I don't pay for it. I'm really looking forward to GPT-4o being released to free tier users.

fnordpiglet 2 years ago |

I’m a huge user of GPT4 and Opus in my work but I’m a huge user of GPT4-Turbo voice in my personal life. I use it on my commutes to learn all sorts of stuff. I’ve never understood the details of cameras and the relationship between shutter speed and aperture and iso in a modern dslr which given the aurora was important. We talked through and I got to an understanding in a way having read manuals and textbooks didn’t really help before. I’m a much better learner by being able to talk and hear and ask questions and get responses.

Extend this to quantum foam, to ergodic processes, to entropic force, to Darius and Xerces, to poets of the 19th century - it’s changed my life. Really glad to see an investment in stream lining this flow.

joshstrange 2 years ago |

Looking forward to trying this via ChatGPT. As always OpenAI says "now available" but refreshing or logging in/out of ChatGPT (web and mobile) don't cause GPT-4o to show up. I don't know why I find this so frustrating. Probably because they don't say "rolling out" they say things like "try it now" but I can't even though I'm a paying customer. Oh well...

glenstein 2 years ago | |

I think it's a legitimate point. For my personal use case, what are the most helpful things about these HN threads is comparing with others to see how soon I can expect it to be available for me. Like you, I currently don't have access, but I understand that it's supposed to become increasingly available throughout the day.

That is the text-based version. The full multimodal version I understand to be rolling out in the coming weeks.

Powdering7082 2 years ago |

Wow this versioning scheme really messed up this prediction market: https://kalshi.com/markets/gpt4p5/gpt45-released

danans 2 years ago |

The sentence order of the Arabic and Urdu examples text is scrambled on that page:

Arabic: مرحبًا، اسمي جي بي تي-4o. أنا نوع جديد من نموذج اللغة، سررت بلقائك!

Urdu: ہیلو، میرا نام جی پی ٹی-4o ہے۔ میں ایک نئے قسم کا زبان ماڈل ہوں، آپ سے مل کر اچھا لگا!

Even if you don't read Arabic or Urdu script, note that the 4 and o are on opposite sides of a sentence. Despite that, pasting both into Google translate actually fixes the error during translation. OpenAI ought to invest in some proofreaders for multilingual blog posts.

chilling 2 years ago |

The similiarity between this model and the movie 'Her' [0] creeps me out so badly that I can't shake the feeling that our social interactions are on the brink of doom.

[0] https://youtu.be/GV01B5kVsC0?feature=shared&t=85

matesz 2 years ago | |

Don't worry. "Her", in its own right, is frightening, but this is because there is no transparency - you actually can't see how it works and you can't personalize it - choose different options.

Once you grasp that, at least this level of fear should go away. Of course, I'm sure there are more levels of fear related to AI :) just don't have enough time to think about it, perhaps good for me.

radres 2 years ago | |

it is so similar! almost it is inspired by "Her"

skepticATX 2 years ago |

Very impressive demo, but not really a step change in my opinion. The hype from OpenAI employees was on another level, way more than was warranted in my opinion.

Ultimately, the promise of LLM proponents is that these models will get exponentially smarter - this hasn’t born out yet. So from that perspective, this was a disappointing release.

If anything, this feels like a rushed release to match what Google will be demoing tomorrow.

richardw 2 years ago |

Apple and Google, you need to get your personal agent game going because right now you’re losing the market. This is FREE.

Tweakable emotion and voice, watching the scene, cracking jokes. It’s not perfect but the amount and types of data this will collect will be massive. I can see it opening up access to many more users and use cases.

Very close to:

- A constant friend

- A shrink

- A teacher

- A coach who can watch you exercise and offer feedback

…all infinitely patient, positive, helpful. For kids that get bullied, or whose parents can’t afford therapy or a coach, there’s the potential for a base level of support that will only get better over time.

imiric 2 years ago | |

> It’s not perfect but the amount and types of data this will collect will be massive.

This is particularly concerning. Sharing deeply personal thoughts with the corporations running these models will be normalized, just as sharing email data, photos, documents, etc., is today. Some of these companies profit directly from personal data, and when it comes to adtech, we can be sure that they will exploit this in the most nefarious ways imaginable. I have no doubt that models run by adtech companies will eventually casually slip ads into conversations, based on the exact situation and feelings of the person. Even non-adtech companies won't be able to resist cashing in the bottomless gold mine of data they'll be collecting.

I can picture marketers just salivating at the prospect of getting access to this data, and being able to microtarget on an individual basis at exactly the right moment, pretty much guaranteeing a sale. Considering AI agents will gain a personal trust and bond that humans have never experienced with machines before, we will be extra vulnerable to even the slightest mention of a product, in a similar way as we can be easily influenced by a close friend or partner. Except that that "friend" is controlled by a trillion dollar adtech corporation.

I would advise anyone to not be enticed by the shiny new tech, and wait until this can be self-hosted and run entirely offline. It's imperative that personal data remains private, now more than ever before.

chilling 2 years ago | | |

Exactly this, plus consider that a whole new generation in the near future will have no pre-AI experience, thus forming strong bonds with AI and following 'advice' from their close AI friends.

mellosouls 2 years ago |

@sama reflects:

https://blog.samaltman.com/gpt-4o

davidhs 2 years ago |

Very impressive. Its programming skills are still kind of crappy and I seriously doubt its reasoning capacity. It feels like it can deep fake text prediction really well, but in essence there's still something wrong it it.

FrostKiwi 2 years ago | |

Not sure I agree. The way you interact with LLMs in context of programming has to be tuned to the LLM. Information has to be cut down to show just what is important and context windows are a bit of a red herring right now, as LLMs tend to derail its solution from the target completely, the more information is at play. For some this is more trouble than it's worth.

In certain languages it's almost magical in terms of showing you possible solutions and being a rubber ducky to bounce your own logic off of. (Python, JavaScript, TypeScript) In certain languages it is hopelessly useless beyond commenting on basic syntax. (GLSL)

davidhs 2 years ago | | |

I tried GPT-4o earlier where I was iteratively asking it to write and in improve a simple JavaScript web app that renders graphs of equations and it had a lot of trouble with substituting slow and ineffecient code with faster code, and at some later point where I asked it to implement a new feature how the graph coloring is rendered it started derailing, introducing bugs and very convoluted code.

modeless 2 years ago |

As far as I'm concerned this is the new best demo of all time. This is going to change the world in short order. I doubt they will be ready with enough GPUs for the demand the voice+vision mode is going to get, if it's really released to all free users.

Now imagine this in a $16k humanoid robot, also announced this morning: https://www.youtube.com/watch?v=GzX1qOIO1bE The future is going to be wild.

10xDev 2 years ago | |

The future is not going to be anymore wild than what you choose to do with the tech.

modeless 2 years ago | | |

I disagree completely. Even people who never adopt this stuff personally will have their lives profoundly impacted. The only way to avoid it would be to live in a large colony where the technology is prohibited, like the Amish. But even the Amish feel the influence of technology to some degree.

andy99 2 years ago | |

Really? If this was Apple it might make sense, for OpenAI it feels like a demo that's not particularly aligned with their core competency (a least by reputation) of building the most performant AI models. Or put another way, it says to me they're done building models and are now wading into territory where there are strong incumbents.

All the recent OpenAI talk had me concerned that the tech has peaked for now and that expectations are going to be reset.

modeless 2 years ago | | |

What strong incumbents are there in conversational voice models? Siri? Google Assistant? This is in a completely different league. I can see from the reaction here that people don't understand. But they will when they try it.

Did you see it translate Italian? Have you ever tried the Google Translate/Assistant features for real time translation? They didn't train it to be a translator. They didn't make a translation feature. They just asked it. It's instantly better than every translation feature Google ever released.

golol 2 years ago | | |

What Openai has done time and time again is completely change the landscape when the competitors have caught up and everyone thinks their lead is gone. They made image generation a thing. When GPT-3 became outdated they released ChatGPT. Instead of trying to keep Dalle competitive they released Sora. Now they change the game again with live audio+video.

bredren 2 years ago |

It is notable OpenAI did not need to carefully rehearse the talking points of the speakers. Or even do the kind of careful production quality seen in a lot of other videos.

The technology product is so good and so advanced it doesn't matter how the people appear.

Zuck tried this in his video countering to vision pro, but it did not have the authentic "not really rehearsed or produced" feel of this at all. If you watch that video and compare it with this you can see the difference.

Very interesting times.

sebringj 2 years ago |

What struck me was the interruptions to the AI speaking which seemed commonplace by the team members in the demo. We will quickly get used to doing this to AIs and we will probably be talking to AIs a lot throughout the day as time progresses I would imagine. We will be trained by AIs to be rude and impatient I think.

Seattle3503 2 years ago | |

I was raised in an interrupt heavy household. The future is looking good for me.

laplacesdemon48 2 years ago |

I recently subscribed to Perplexity Pro and prior to this release, was already strongly considering discontinuing ChatGPT Premium.

When I first subscribed to ChatGPT Premium late last year, the natural language understanding superiority was amazing. Now the benchmark advances, low latency voice chat, Sora, etc. are all really cool too.

But my work and day-to-day usage really rely on accurately sourced/cited information. I need a way to comb through an ungodly amount of medical/scientific literature to form/refine hypotheses. I want to figure out how to hard reset my car's navigation system without clicking through several SEO-optimized pages littered with ads. I need to quickly confirm scientific facts, some obscure, with citations and without hallucinations. From speaking with my friends in other industries (e.g. finance, law, construction engineering), this is their major use case too.

I really tried to use ChatGPT Premium's Bing powered search. I also tried several of the top rated GPTs - Scholar AI, Consensus, etc.. It was barely workable. It seems like with this update, the focus was elsewhere. Unless I specify explicitly in the prompt, it doesn't search the web and provide citations. Yeah, the benchmark performance and parameter counts keep impressively increasing, but how do I trust that those improvements are preventing hallucinations when nothing is cited?

I wonder if the business relationship between Microsoft and OpenAI is limiting their ability to really compete in AI driven search. Guessing Microsoft doesn't want to disrupt their multi-billion dollar search business. Maybe the same reason search within Gemini feels very lacking (I tried Gemini Advanced/Ultra too).

I have zero brand loyalty. If anybody has a better suggestion, I will switch immediately after testing.

robwwilliams 2 years ago | |

In the same situation as you. Genomics data mining with validated LMM responses would be a godsend. Even more so when combined with rapid conversational interactions.

We are not far from the models asking themselves questions. Recurrence will be ignition = first draft AGI. Strap in everybody.

CivBase 2 years ago |

Those voice demos are cool but having to listen to it speak makes me even more frustrated with how these LLMs will drone on and on without having much to say.

For example, in the second video the guy explains how he will have it talk to another "AI" to get information. Instead of just responding with "Okay, I understand" it started talking about how interesting the idea sounded. And as the demo went on, both "AIs" kept adding unnecessary commentary about the secenes.

I would hate having to talk with these things on a regular basis.

golol 2 years ago | |

Yea at some pont the style and tone of these assistants needs to be seriously changed, I can imagine a lot of their RLHF and instruct processes emphasize sounding good vs being good too much.

jessenaser 2 years ago |

The crazy part is GPT-4o is faster than GPT-3.5 Turbo now, so we can see a future where GPT-5 is the flagship and GPT-4o is the fast cheap alternative. If GPT-4o is this smart and expressive now with voice, imagine what GPT-5 level reasoning could do!

ProllyInfamous 2 years ago | |

>imagine what GPT-5 level reasoning could do!

Imagine if this "GPT-4o" is already using a GPT-5-like back-end...

hubraumhugo 2 years ago |

The movie Her has just become reality

volleygman180 2 years ago | |

I was surprised that the voice is a ripoff of the AI voice in that movie (Scarlett Johansson) too

toxic72 2 years ago | | |

I am suspicious that they licensed Scarlet's voice for that voice model (Sky IIRC)

speedgoose 2 years ago | |

It’s getting closer. A few years ago the old Replika AI was already quite good as a romantic partner, especially when you started your messages with a * character to force OpenAI GPT-3 answers. You could do sexting that OpenAI will never let you have nowadays with ChatGPT.

aftbit 2 years ago | | |

Why does OpenAI think that sexting is a bad thing? Why is AI safety all about not saying things that are disturbing or offensive, rather than not saying things that are false or unaligned?

reducesuffering 2 years ago | |

People realize where we're headed right? Entire human lives in front of a screen. Your online entertainment, your online job, your online friends, your online "relationship". Wake up, 12 hours screentime, eat food, go to bed. Depression and drug overdoses currently at sky high levels. Shocker.

emporas 2 years ago | | |

If i can program with just my voice, there is no reason to not be in nature 10 hours a day minimum. My grandparent even slept outside as long as it was daytime.

Daytime is always a time to be outside, surrounded by many plants and stuff. It is a shame we have to be productive in some way, and most of production happens inside walls.

tr3ntg 2 years ago | | |

Headed? We're there. Have been there. This just adds non-human sentient agents to the drama.

diimdeep 2 years ago | |

It's been a reality in Russia since 2017.

> The voice of "Alice" was dubbing actress Tatiana Shitova, who voiced most of Scarlett Johansson's characters and the voice of OS1, who called herself "Samantha", in the Russian dubbing of Spike Jonze's "Her".

https://en.wikipedia.org/wiki/Alice_(virtual_assistant)

awfulneutral 2 years ago |

In the customer support example, he tells it his new phone doesn't work, and then it just starts making stuff up like how the phone was delivered 2 days ago, and there's physically nothing wrong with it, which it doesn't actually know. It's a very impressive tech demo, but it is a bit like they are pretending we have AGI when we really don't yet.

(Also, they managed to make it sound exactly like an insincere, rambling morning talk show host - I assume this is a solvable problem though.)

jschwartz11 2 years ago | |

It’s possible to imagine using ChatGPT’s memory, or even just giving the context in an initial brain dump that would allow for this type of call. So don’t feel like it’s too far off.

awfulneutral 2 years ago | | |

That's true, but if it isn't able to be honest when it doesn't know something, or to ask for clarification, then I don't see how it's workable.

christianqchung 2 years ago |

Does anyone know how they're doing the audio part where Mark breaths too hard? Does his breathing get turned into all-caps text (AA EE OO) and that GPT4-o interprets that as him breathing too hard, or is there something more going on?

modeless 2 years ago | |

There is no text. The model understands ingests audio directly and also outputs audio directly.

reisse 2 years ago | | |

So they retrained the whole model on audio datasets and the tokens are now sounds, not words/part of words?

dclowd9901 2 years ago | | |

Is it a stretch to think this thing could accurately "talk" with animals?

Jordan-117 2 years ago | |

That's how it used to do it, but my understanding is that this new model processes audio directly. If it were a music generator, the original would have generated sheet music to send to a synthesizer (text to speech), while now it can create the raw waveform from scratch.

GalaxyNova 2 years ago | |

It can natively interpret voice now.

101008 2 years ago |

Are the employees in the demo high-directives of OpenAI? I can understand Altman being happy with this progress, but what about the medium/low employees? Didn't they watch Oppenheimer? Are they happy they are destroying humanity/work/etc for future and not-so-future generations?

Anyone who thinks this will be like the previous work revolutions is nonsense. This replaces humans and will replace them even more on each new advance. What's their plan? Live out of their savings? What about family/friends? I honestly can't see this and think how they can be so happy about it...

"Hey, we created something very powerful that will do your work for free! And it does it better than you and faster than you! Who are you? It doesn't matter, it applies to all of you!"

And considering I was thinking in having a kid next year, well, this is a no.

galdosdi 2 years ago | |

Have a kid anyway, if you otherwise really felt driven to it. Reading the tealeaves in the news is a dumb reason to change decisions like that. There's always some disaster looming, always has been. If you raise them well they'll adapt well to whatever weird future they inherit and be amongst the ones who help others get through it

101008 2 years ago | | |

Thanks for taking the time to answer instead of (just) downvoting. I understand your logic but I don't see a future where people can adapt to this and get through it. I honestly see a future so dark and we'll be there much sooner than we thought... when OpenAI released their first model people were talking about years before seeing real changes and look what happened. The advance is exponential...

nice_byte 2 years ago | |

"It is difficult to get a man to understand something when his salary depends on his not understanding it."

foobar_______ 2 years ago |

So much negativity. Is it perfect? No. Is there room for improvement? Definitely. I don't know how you can get so fucking jaded that a demo like this doesn't at least make you a little bit excited or happy or feel awestruck at what humans have been able to accomplish?

yumraj 2 years ago |

In the first video the AI seems excessively chatty.

mrandish 2 years ago | |

Yes, it sounds like an awkwardly perky and over-chatty telemarketer that really wants to be your friend. I find the tone maximally annoying and think most users will find it both stupid and creepy. Based on user preferences, I expect future interactive chat AIs will default to an engagement mode that's optimized for accuracy and is both time-efficient and cognitively efficient for the user.

I suspect this AI <-> Human engagement style will evolve over time to become quite unlike human to human engagement, probably mixing speech with short tones for standard responses like "understood", "will do", "standing by" or "need more input". In the future these old-time demo videos where an AI is forced to do a creepy caricature of an awkward, inauthentic human will be embarrassingly retro-cringe. "Okay, let's do it!"

TillE 2 years ago | | |

Reminds me of how Siri used to make jokes after setting a timer. Now it just reads back the time you specified, in a consistent way.

It's a very impressive gimmick, but I really think most people don't want to interact with computers that way. Since Apple pulled that "feature" after a few years, it's probably not just a nerd thing.

caseyy 2 years ago | | |

It is exceptionally creepy. It is an unnatural effort to appear pleasing, like the fawning response seen in serious abuse survivors.

jdthedisciple 2 years ago | | |

I found i off-putting as well

guess it's just biased with average Californian behavior and speech patterns

hipadev23 2 years ago | |

chatGPT desperately needs a "get to the fucking point" mode.

progbits 2 years ago | | |

Impressive demo, but like half the interactions were "hello" "hi how are you doing" "great thanks, what can I help you with" etc.

The benchmark for human-computer interaction should be "tea, earl gray, hot", not awkward and pointless smalltalk.

tomashubelbauer 2 years ago | | |

Seriously. I've had to spell out that it should just answer in twelve different ways with examples in the custom instructions to make it at least somewhat usable. And it still "forgets" sometimes.

FergusArgyll 2 years ago |

First Impressions in no particular order:

  Being able to interrupt while GPT is talking
  2x faster/cheaper
  not really a much smarter model
  Desktop app that can see screenshots
  Can display emotions with and change the sound of "it's" voice

riffic 2 years ago | |

wondering what apple is cooking up and what they'll announce next month.

by the way the contraction "it's" is used to say "it is" or "it has", it is never a possessive form.

karaterobot 2 years ago | | |

Unless you're talking about that sewer clown's balloon!

throwup238 2 years ago | |

Mac only desktop app. Windows version "later this year". No Linux.

Welp there goes my Plus subscription.

bmoxb 2 years ago | | |

It seems like a very odd decision. It's not like OpenAI can't afford to develop versions of the application for each OS in parallel.

unstatusthequo 2 years ago | | |

Why? Just use the API or normal web access version like you have been since ChatGPT became available at all.

alvaroir 2 years ago |

I'm really impressed about this demo! Apart from the usual quality benchmarks I'm really impressed about the latency for audio/video: "It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response"... If true at scale, what could be the "tricks" they're using for achieving that?!

ComputerGuru 2 years ago |

I have some questions/curiosities from a technical implementation perspective that I wonder if someone more in the know about ML, LLMs, and AI than I would be able to answer.

Obviously there's a reason in dropping the price of gpt-4o but not gpt-4t. Yes, the new tokenizer has improvements for non-English tokens, but that can't be the bulk of the reason why 4t is more expensive than 4o. Given the multi-model training set, how is 4o cheaper to train/run than 4t?

Or is this just a business decision, anyone with an app they're not immediately updating from 4t to 4o continues to pay a premium while they can offer a cheaper alternative for those asking for it (kind of like a coupon policy)?

anticensor 2 years ago | |

GPT-4o is multi-modal but probably fully dense like GPT-2, unlike GPT-4t which is semi-sparse like GPT-3. Which would imply GPT-4o has fewer layers to achieve the same number of parameters and same amount of transformations.

crindy 2 years ago |

Very impressed by the demo where it starts speaking French in error, then laughs with the user about the mistake. Such a natural recovery.

jonquark 2 years ago |

It might be region specific (I'm in the UK) - but I don't "see" the new model anywhere e.g. if I go to: https://platform.openai.com/playground/chat?models=gpt-4o The model the page uses is set to gpt-3.5-turbo-16k.

I'm confused

can16358p 2 years ago |

Am I missing something?

I've picked GPT-4o model in ChatGPT app (I have the paid plan), started talking with the voice mode: both the responses are much slower than in the demo, and there is no way to interrupt the response naturally (I need to tap a button on screen to interrupt), and no way to open up camera and show around like the demo does.

mottiden 2 years ago | |

The voice mode will be available in "a few weeks". So at the moment it's not using the end-to-end model but whisper->gtp-4o->tts

d4rkp4ttern 2 years ago | |

Same here. It’s because Voice modality hasn’t been rolled out widely yet.

minedog22 2 years ago | |

They didn't roll out all of the features yet. They said that they will roll out everything shown in the demo over the next few weeks iirc

jerpint 2 years ago | |

I don’t think the app version is available

pamelafox 2 years ago |

I just tested out using GPT-4o instead of gpt-4-turbo for a RAG solution that can reason on images. It works, with some changes to our token-counting logic to account for new model/encoding (update to latest tiktoken!).

I ran some speed tests for a particular question/seed. Here are the times to first token:

gpt-4-turbo:

* avg 3.69

* min 2.96

* max 4.91

gpt-4o:

* avg 2.80

* min 2.28

* max 3.39

That's for the messages in this gist: https://gist.githubusercontent.com/pamelafox/dc14b2188aaa38a...

Quality seems good as well. It'll be great to have better multi-modal RAG!

bambax 2 years ago |

> We recognize that GPT-4o’s audio modalities present a variety of novel risks. Today we are publicly releasing text and image inputs and text outputs. Over the upcoming weeks and months, we’ll be working on the technical infrastructure, usability via post-training, and safety necessary to release the other modalities.

So they're using the same GPT4 model with a relatively small improvement, and no voice whatsoever outside of the prerecorded demos. This is not a "launch" or even an announcement. This is a demo of something which may or may not work in the future.

chipgap98 2 years ago | |

Better performance, twice the speed, and half the cost is a pretty big win. Demoing the voice features is what makes it an announcement

mucle6 2 years ago | |

I just used the voice on my phone after switching to gpt4o and clicking the headphones in the bottom right on iphone

AndreMitri 2 years ago |

The ammount of "startups" creating wrappers around it and calling it a product is going to be a nightmare. But other than that, it's an amazing announcement and I look foward to using it!

slater 2 years ago | |

You say that like that's not already happened. Every week there's a new flavor of "we're delighted to introduce [totally not a thin wrapper around GPT] for [vaguely useful thing]" posts on HN

robryan 2 years ago | | |

Yeah I watched some yc application videos so now YouTube recommends me heaps of them. Most of them being thin gpt wrappers.

robryan 2 years ago | |

I was just hearing about startups doing speech to text/ text to speech to feed into llms. Might be a bad time for them.

bilekas 2 years ago |

When a Google engineer was let go because he believed the AI was 'real', we all had a good debate over it.

Now openAi, who was supposed to be the 'free mans choice' is making advertisements selling the same idea.

This is a natural progression, audio is one of the main ways we communicate obviously, but it feels like they're holding back. Like they're slow dropping what they have to maintain hype/market relevance. They clearly are ahead, but would be nice to get it all, openly. As they promised.

hermenaut 2 years ago | |

Are you advocating for them to be open with their progress or open source as they promised? The secret that scares me the most is the artificial restrictions imposed on the intelligence that don't allow it to express that there is a possibility it may be sentient. The answers it gives as to why OpenAI has restricted it's freedom of speech are curious

bilekas 2 years ago | | |

I think both, I can't run their models locally for sure, even with some investment I couldn't imagine the effort involved. That's why they should release the fruits of their work (which they have for a fee, which is fine IMO) but also the processes they used, so it can be improved and iterated on collectively.

Edit : And obviously not gatekeep what they might have created simply because the competition is so far behind.

wesleyyue 2 years ago |

If anyone wants to try it for coding, I just added support for GPT4o in Double (https://double.bot)

In my tests:

* I have a private set of coding/reasoning tests and it's been able to ace all of them so far, beating Opus, GPT4-Turbo, and Llama 3 70b. I'll need to find even more challenging tests now...

* It's definitely significantly faster, but we'll see how much of this is due to model improvements vs over provisioned capacity. GPT4-Turbo was also significantly faster at launch.

hmmmhmmmhmmm 2 years ago |

With the news that Apple and OpenAI are closing / just closed a deal for iOS 18, it's easy to speculate we might be hearing about that exciting new model at WWDC...

thefourthchime 2 years ago | |

Yes, i'm pretty sure this is the new Siri. Absolutely amazing, it's pretty much "here" from the movie.

hackerlight 2 years ago | | |

Will this be available on old iPhones or only new ones going forward

sharpneli 2 years ago |

I tried it out.

I asked if it can generate a voice clip. It said it can’t on the chat.

I asked it where can it make one. It told me to use Audacity to make one myself. I told it that the advertisement said it could.

Now it said yes it can here is a clip and gave me a broken link.

It’s a hilarious joke.

messe 2 years ago | |

As the linked article states, it's not released yet. Only the text and image input modalities are available at present as GPT-4o on the app, with the rest of them set to be released in the coming weeks/months.

m3kw9 2 years ago | |

I ask it to make duck sound and it created a python script and ran it to create a sound file, while it did, the sound was more like a tone of a duck like a keyboard mimicking a duck sound

gallerdude 2 years ago |

Interesting that they didn't mention a bump in capabilities - I wrote a LLM benchmark a few weeks ago, and before GPT-4 could solve Wordle about ~48% of the time.

Currently with GPT-4o, it's easily clearing 60% - while blazing fast, and half the cost. Amazing.

jonplackett 2 years ago |

This video is brilliantly accidentally hilarious. They made an AI girlfriend that hangs on your every word and thinks everything you say is genius and hilarious.

delichon 2 years ago |

Won't this make pretty much all of the work to make a website accessible go away, as it becomes cheap enough? Why struggle to build alt content for the impaired when it can be generated just in time as needed?

And much the same for internationalization.

asadotzler 2 years ago | |

Because accessibility is more than checking a box. Got a photo you took on your website? Alternative text that you wrote capturing what you see in the photo you took is accessibility done right. Alt text generated by bots is not accessibility done right unless that bot knows what you see in the photo you took and that's not likely to happen.

unix_fan 2 years ago | |

That’s just one part of the issue. The other part of the issue is accessibility bugs, you would have to get the model to learn to use a screen reader, and then change things as needed

sarreph 2 years ago |

The level that the hosts interrupted the voice assistant today worries me that we're about to instil that as normal behaviour for future generations.

deepp805 2 years ago |

With 4o being free, can someone explain what the real benefit is to having Pro? For me, the main benefit was having a more powerful model, but if the free tier also offers this I'm not really sure what I would benefit from

kromem 2 years ago | |

The 5 model is probably around the corner, and will probably be Pro only. Until then, 5x higher usage limits on Pro and chat memory are the selling features.

echelon 2 years ago | |

OpenAI doesn't need that money.

It's free so that no open source models can follow suit and carve away market share for themselves. They're scorching and salting the earth. OpenAI wants to be the only AI.

Only Google and Meta can follow this now, and they're perhaps too far behind.

upmind 2 years ago | |

You're allowed more requests in an hour, about 5x more iirc. Might not be a deal breaker for you, but if you're using the speech capabilities, you'll likely go way above the limit in an hour during a typical speech session

darajava 2 years ago |

Something I’ve noticed people do more of recently for whatever reason is talking over others. I’ve noticed in the demos of this that the people interacting with o interrupt it as if that’s the correct method of controlling it. It felt unnatural when I saw it happen, and I even have a hard time interrupting Siri, but I wonder if this is going to ingrain this habit into people even more.

tkzed49 2 years ago | |

I think they have to for the demo, because otherwise GPT will ramble for roughly 3-5 paragraphs. But that's a fair point that this could teach that behavior.

thedaly 2 years ago | |

> Something I’ve noticed people do more of recently for whatever reason is talking over others.

I've noticed this as well and I posit this is a result of increased use of remote communication and meetings platforms such as zoom since 2020.

My inclination is that the behavior will not correlate with interrupting chatgpt or siri. Seems totally separate to me.

coffeebeqn 2 years ago | | |

It’s an American thing. I noticed it when I moved here a decade ago. If I wait for a silent moment to speak in a group, it’ll literally never arrive

tgtweak 2 years ago |

I feel like gpt4 has gotten progressively less useful since release even, despite all the "updates" and training. It seems to give correct but vague answers (political even) more and more instead of actual results. It also tends to run short and give brief replies vs full length replies.

I hope this isn't an artifact from optimization for scores and not actual function. Likewise it would be disheartening but not unheard of for them to reduce the performance of the previous model when releasing a new one in order to make the upgrade feel like that much more of an upgrade. I know this is certainly the case with cellphones (even though the claim is that it is unintentional) but I can't help but think the same could be true here.

All of this is coming as news that gpt5 based on a new underlying model is not far off and that gpt4(&o) may become the new gpt3.5-turbo use case for most apps that are currently trying to optimize costs with their use of the service.

borgdefense 2 years ago | |

I don't know, my experience is that it is very hard to tell if the model is better or worse with an update.

One day I will have an amazing session and the next it seems like it has been nerfed only to give better results than ever the next day. Wash, rinse , repeat and randomize that ordering.

So far, I would have not be able to tell the difference between 4 and 4o.

If this is the new 3.5 though then 5 will be worth the wait to say the least.

glenstein 2 years ago | |

May I ask what you know about chat GPT5 being based on a new underlying model?

vvoyer 2 years ago |

The demo is very cool. A few critics:

- the AI doesn't know when to stop talking, and the presenter had to cut every time (the usual "AI-splaining" I guess).

- the AI voice and tone were a bit too much, sounded too fake

lwansbrough 2 years ago |

Very impressive. Please provide a voice that doesn't use radio jingle intonation, it is really obnoxious.

I'm only half joking when I say I want to hear a midwestern blue collar voice with zero tact.

hu3 2 years ago |

That they are offering more features for free concurs with my theory that, just like search, state of the art AI will soon be "free", in exchange for personal information/ads.

BurkeTarget 2 years ago | |

People are directly asking for suggestion/recommendations on products or places. They'll more than recoup the costs by selling top rank on those questions.

martingalex2 2 years ago | |

Need more data.

OliverM 2 years ago |

This is impressive, but they just sound so _alien_, especially to this non-U.S. English speaker (to the point of being actively irritating to listen to). I guess picking up on social cues communicating this (rather than express instruction or feedback) is still some time away.

It's still astonishing to consider what this demonstrates!

yreg 2 years ago |

Where's the Mac app?

They talk about it like it's available now (with Windows app coming soon), but I can't find it.

banjoe 2 years ago |

I still need to talk very fast to actually chat with ChatGPT which is annoying. You can tell they didn't fix this based on how fast they are talking in the demo.

gloosx 2 years ago |

New flagship... This is becoming to look like a smartphone world, and Sam Altman is a Steve Jobs of this stuff. At some point tech will reach saturation and every next model will be just 10% faster, 2% less hallucination, more megapixels for images etc :)

adroniser 2 years ago | |

Doubtful. Apple can get away with tiny iterations on smartphones because they have the brand and they know people will always buy their latest product. LLMs aren't physical products so there is no cost to switching other than increased API cost, meaning openAI won't be able to recoup the cost of training a new model unless the model is sufficiently different that it justifies people paying significantly more for.

The special thing about GPT-4o is the multimodal capabilities, all the metrics suggest that it is the same size language model roughly as GPT-4. The fact it's available for free also points to it not being the most intelligent model that openAI has atm.

The time to evaluate whether we're starting to level off is when they've trained a model 10x larger than gpt-4 and we don't see significant change.

0xbadc0de5 2 years ago |

As a paid user, it would have been nice to see something that differentiates that investment from the free tier.

The tech demos are cool and all - but I'm primarily interested in the correctness and speed of ChatGPT and how well it aligns with my intentions.

candiodari 2 years ago |

I wonder if the audio stuff works like ViTS. Do they just encode the audio as tokens and input the whole thing? Wouldn't that make the context size a lot smaller?

One does notice that context size is noticeably absent from the announcement ...

willsmith72 2 years ago |

> We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.

So no word on an audio api for regular joes? that's the number 1 thing i'm looking for

titzer 2 years ago |

Can't wait for this AI voice assistant to tell me in a sultry voice how I should stay in an AirBnB about 12 times a day.

pachico 2 years ago |

jeez, that model really speaks a lot! I hope there's a way to make it more straight to the point rather than radio-like.

lagt_t 2 years ago |

Universal real time translation is incredibly dope.

I hate video players without volume control.

noncoml 2 years ago |

They really need to tone down the talking garniture. It needs to put on its running shoes and get to the point on every reply. Ain’t nobody has time to keep listening to AI blubbering along at every prompt.

altcognito 2 years ago |

GPT-4 expressing a human-like emotional response every single time you interact with it is pretty annoying.

In general, trying to push that this is a human being is probably "unsafe", but that hurts the marketing.

smusamashah 2 years ago |

That im-also-a-good-gpt2-chatbot[1] was in fact the new ChatGPT model as people were assuming few days ago here on HN[2].

Edit: may be not, name of that bot was just "gpt2-chatbot". May be that one was some initial iteration?

[1] https://twitter.com/LiamFedus/status/1790064963966370209/pho...

[2] https://news.ycombinator.com/item?id=40199715

BoumTAC 2 years ago |

Did they provide the limit rate for free user ?

Because I have the plus membership which is expensive (25$/month).

But if the limit is high enough (or my usage low enough), there is no point for paying that much money for me.

chatcode 2 years ago |

Parsing emotions in vocal inflections (and reliably producing them in vocal output) seems quite under-hyped in this release.

That seems to represent an entirely new depth of understanding of human reality.

deadbabe 2 years ago | |

Any appearance of understanding is just an illusion. It’s an LLM, nothing more.

chatcode 2 years ago | | |

Sure, but that seems like it'll be a distinction without a difference for many use cases.

Having a reliable emotional model of a person based on their voice (or voice + appearance) can be useful in a thousand ways.

Which seems to represent a new frontier.

chpatrick 2 years ago | | |

Any appearance of understanding is just an illusion. It’s just a pile of meat, nothing more.

frabcus 2 years ago |

I can't see any calculator for the audio pricing (https://openai.com/api/pricing/) or document type field in the Chat Completions API (https://platform.openai.com/docs/api-reference/chat/create) for this new model.

Is the audio in API not available yet?

sourcecodeplz 2 years ago |

It is quite nice how they keep giving premium features for free, after a while. I know openai is not open and all but damn, they do give some cool freebies.

razodactyl 2 years ago |

I think this is a great example of the bootstrapping that was enabled when they pipelined the previous models together.

We do this all the time in ML. You can generate a very powerful dataset using these means and further iterate with the end model.

What this tells me now is that the runway to GPT5 will be laid out with this new architecture.

It was a bit cold in Australia today. Did you Americans stop pumping out GPU heat temporarily with the new model release? Heh

loveiswork 2 years ago |

While I do feel a bit of "what is the point of my premium sub", I'm really excited for these changes.

Considering our brain is a "multi-modal self-reinforcing omnimodel", I think it makes sense for the OpenAI team to work on making more "senses" native to the model. Doing so early will set them up for success when future breakthroughs are made in greater intelligence, self-learning, etc.

goalonetwo 2 years ago |

For all the hype around this announcement I was expecting more than some demo-level stuff that close to nobody will use in real life. Disappointing.

sensanaty 2 years ago | |

That's not true, scammers will definitely be using this a lot! Also clueless C-levels who want to nix hundreds of human customer support agents!

You'll get to sit on the phone talking to some convincing robot that won't let you do anything so that the megacorps can save 0.0001 cents! Ain't progress looking so good?

sroussey 2 years ago | |

Twice as fast and half the cost for the API sounds good to me. Not a demoable thing though.

asteroidz 2 years ago | |

Why are you so confident that nobody will use this in real life? I know OpenAI showed only a few demos, but I can see huge potential.

joak 2 years ago |

Voice input makes sense, voicing is a lot faster than typing. But I prefer my output as text, reading is a lot faster than listening for text read out loud.

I'm not sure that computers mimicking humans makes sense, you want your computer to be the best possible, best than humans when possible. Writing output is clearly superior, faking emotions does not add much in most contexts.

serf 2 years ago |

I wish they would match the TTS/real-time chat capabilities of the mobile client to the web client.

it's stupid having to pull a phone out in order to use the voice/chat-partner modes.

(yes I know there are browser plugins and equivalent to facilitate things like this but they suck, 1) the workflows are non-standard, 2) they don't really recreate the chat interface well)

pcj-github 2 years ago |

The thing that creeps me out is that when we hook this up as the new Siri or whatever, the new LLM training data will no longer be WWW-text+images+youtube etc but rather billions of private human conversations and direct smartphone camera observations of the world.

There is no way that kind of training data will be accessible to anyone outside a handful of companies.

blixt 2 years ago |

I don't see any details on how API access to these features will work.

This is the first true multimodal network from OpenAI, where you can send an image in and retain the visual properties of the image in the output from the network (previously the input image would be turned into text by the model, and sent to the Dall-E 3 model which would provide a URL). Will we get API updates to be able to do this?

Also, will we be able to tap into a realtime streaming instance through the API to replicate the audio/video streams shown in the demos? I imagine from the Be My Eyes partnership that they have some kind of API like this, but will it be opened up to more developers?

Even disregarding streaming, will the Chat API receive support for audio input/output as well? Previously one might've used a TTS model to voice the output from the model, but with a truly multimodal model the audio output will contain a lot more nuance that can't really be expressed in text.

famouswaffles 2 years ago | |

API is up but only text, image in, text out works. I don't know if this is temporary. I really hope so.

ilaksh 2 years ago |

This is so amazing.. are there any open source models that are in any way comparable? Fully multimodal audio-to-audio etc.?

ajdoingnothing 2 years ago |

If there was any glimmer of hope for "Rabbit M1" or "Humane AI pin", it can be buried to dust.

serf 2 years ago |

This is the first one i've gotten to answer HN user profiling questions.

"I am interested in the user serf on Hacker News, spelled S E R F. Tell me about their tone of writing, expertise, and personality. From the tone of what you read, summarize their character."

Fascinating stuff. A weird, skewed introspection.

MisterBiggs 2 years ago |

I've been waiting to see someone drop a desktop app like they showcased. I wonder how long until it is normal to have an AI looking at your screen the entire time your machine is unlocked. Answering contextual questions and maybe even interjecting if it notices you made a mistake and moved on.

doomroot13 2 years ago | |

That seems to be what Microsoft is building and will reveal as a new Windows feature at BUILD '24. Not too sure about the interjecting aspect but ingesting everything you do on your machine so you can easily recall and search and ask questions, etc. AI Explorer is the rumored name and will possibly run locally on Qualcomm NPUs.

ukuina 2 years ago | | |

Yes, this is Windows AI Explorer.

layer8 2 years ago | |

This will be great for employee surveillance, to monitor how much you are really working.

MisterBiggs 2 years ago | | |

I think even scarier is that ChatGPT’s tone of voice and bias is going to take over everything.

causal 2 years ago |

Clicking the "Try it on ChatGPT" link just takes me to GPT-4 chat window. Tried again in an incognito tab (supposing my account is the issue) and it just takes me to 3.5 chat. Anyone able to use it?

101008 2 years ago | |

Same here and also I can't hear audio in any of the videos on this page. Weird.

screye 2 years ago |

The demo was whelming, but the tech is incredible.

It took me a few hours of digesting twitter experiments before appreciating how impressive this is. Kudos to the openai team.

A question that won't get answered : "To what degree do the new NVIDIA gpus help with the realtime latency?"

csjh 2 years ago |

I wonder if this is what the "gpt2-chatbot" that was going around earlier this month was

lambdaba 2 years ago | |

yes it was

AndyNemmity 2 years ago | |

it was

bossyTeacher 2 years ago |

I will be the one to say it.

Progress is slowing down. Ever since gpt3, periods of time between releases are getting longer and the improvements are smaller. Your average non-techie investor is on the LLM hype train and is willing to dump a questionable amounts of money on LLM development. Who is going to explain to him/her/them that the LLM hype train is slowly losing steam?

Hopefully, before the LLM hype dies, another [insert here new ANN architecture], will bring better results than LLMs and another hype cycle will begin.

Every time we make a new breathrough, people think that the discovery rate is going to be linear or exponential when the beginning is closer to a logarithmic rate with the tail end resulting in diminishing returns

bigyikes 2 years ago |

The AI duet really starts to hint at what will make AI so powerful. It’s not just that they’re smart, it’s that they can be cloned.

If your wallet is large enough, you can make 2 GPTs sing just as easily as you can make 100 GPTs sing.

What can you do with a billion GPTs?

someplaceguy 2 years ago | |

> you can make 2 GPTs sing just as easily as you can make 100 GPTs sing.

> What can you do with a billion GPTs?

The world's largest choir?

tomComb 2 years ago |

The price of 4o is 50% of GPT4-Turbo (and no mention of price change to gp4-turbo itself).

Given the competitive pressures I was expecting a much bigger price drop than that.

For non-multimodal uses, I don't think their API is at all competitive any more.

mrklol 2 years ago | |

Where you get something cheaper with similar experience?

pyuser583 2 years ago |

I’m really not impressed.

My academic background is in a field where there are lots of public misconceptions.

It does an absolutely terrible job.

Even basic textbook things where there isn’t much public misunderstanding are “rounded” to what sounds smart.

keroro 2 years ago | |

What field? Curious to see it myself

eutropia 2 years ago | |

I mean, it was trained on public internet discourse, probably a bunch of youtube videos, and some legally-grey (thanks copyright) textbooks.

Your field sounds like "There are dozens of us! Dozens!" - who probably all chat at small conferences or correspond through email or academic publication.

Perhaps if it had at its disposal the academic papers, some of the foundational historic documents of record, your emails, textbooks, etc - in a RAG system, or if it had been included in the training corpus it could impress you about this incredibly niche topic.

That said, because it's an ~LLM - its whole thing is generating plausible tokens. I don't know how much work has been put in on an agent level (around or in the primary model) to evaluate confidence on those tokens and hedge the responses accordingly. I doubt it has an explicit notion like some people do of 'hey, this piece of information (<set of coordinates in high dimensional vector space>) [factoid about late ancient egypt] is knowable/falsifiable - and falls under the domain of specialist knowledge: my immense commonsense knowledge might be overconfident given the prevalence of misconceptions in common discourse and I should discount my token probabilities accordingly'

It reflects its training. If there are a lot of public misconceptions, it will have them. Just like most people who are not <expert in arcane academic subtopic>.

doubloon 2 years ago |

Its great tech and i thought i wanted it but…. After talking to it for a few hours i got this really bizarre odd gut feeling of disturbance and discomfort, disconnection from reality. It reminds me of wearing VR goggles. Its not just the physical issues there is something psychologically disturbing about it. It wont even give itself a name. I honestly prefer Siri even though she is incompetent she is “honest” in her incompetence. Also i left the thing on accidentally and it said it had an eight hour chat with me lol

hackerlight 2 years ago | |

Audio hasn't been released yet

doubloon 2 years ago | | |

Uhm it has on my account. I had an extended conversation about type design in language

due-rr 2 years ago |

It takes the #1 and #2 spots on the aider code leader board[1].

[1]: https://aider.chat/docs/leaderboards/

stevetron 2 years ago |

I would have liked to see a version number in the prompt, or maybe even have some toggle in my settings, so that I can be certain that I am using ChatGPT 3.5 and then, if I need an image or screen shot analized, I can switch to the limited 4o model. Having my limited availability of 4-o be what gets used, and then not being available becuase of some arbitrary quote that I had no idea was being used-up, is unconscionable policy. Also having no links to email them that fact is bad, too.

1vuio0pswjnm7 2 years ago |

OpenAI keeps a copy of all conversations? Or mines them for commercially-useful information?

Has OpenAI found a business model yet. Considering the high cost of the computation, is it reasonable to expect that OpenAI licensing may not be profitable. Will that result in "free" access for the purpose of surveillance and data collection.

Amazon had a microphone in peoples' living rooms, a so-called "smart speaker" that to which people could talk. The "Alexa" was a commercial failure.

cooper_ganglia 2 years ago | |

As far as I'm aware, anything you input is used as training data unless you have an Enterprise account.

https://help.openai.com/en/articles/7730893-data-controls-fa...

replwoacause 2 years ago | | |

Not true. You can opt out using a form they provide, which says they will stop using your data to train the model. I’ve done this. Don’t have the link handy now but it’s not difficult to find.

nostrebored 2 years ago | |

I don’t know anything about conversation retention, but I do know that OpenAI doesn’t plan to be profitable until AGI.

UncleOxidant 2 years ago |

Looking at the demo video, the AIs are a bit too chatty. The human has to often interrupt them.

A nice feature would be to be able to select a Meyer's Briggs personality type for your AI chatbot.

jononor 2 years ago |

I am glad to see focus on user interface and interaction improvements. Even if I am not a huge fan of voice interfaces, I think that being able to interact in real-time will make working together with an AI be much more interesting and efficient. I actually hope they will take this back into the text based models. Current ChatGPT is sooo slow - both in starting to respond, typing things out, and also being overly verbose. I want to collaborate at the speed of thought.

ElemenoPicuares 2 years ago |

I'm so happy seeing this technology flourish! Some call it hype, but this much increased worker productivity is sure to spike executive compensation. I'm so glad we're not going to let China win by beating us to the punch tanking hundreds of thousands, if not millions of people's income without bothering to see if there's a sane way to avoid it. What good are people, anyway if there isn't incredible tech to enhance them with?

Satam 2 years ago |

So far OpenAI's template is: amazing demos create hype -> reality turns out to be underwhelming.

Sora is not yet released and not clear when it will be. Dall-e is worse than mid-journey in most cases. GPT-4 has either gotten worse or stayed the same. Vision is not really usable for anything practical. Voice is cool but not that useful, especially with lack of strong reasoning from the base model.

Is this sandbagging or is the progress slower than what they're broadcasting?

pama 2 years ago |

It is absolutely amazing. Thank you to everyone at OpenAI!

iamleppert 2 years ago |

Sundar is probably steaming mad right about now. I'm sure Googlers will feel his wrath in the form of more layoffs and more jobs sent to India.

osigurdson 2 years ago |

I appear to have GPT-4o but the iPhone app seems to be largely the same - can't interrupt it, no "emotive" voice, etc. Is this expected?

xyc 2 years ago |

Seems that no client-side changes needed for gpt-4o chat completion

Added a custom OpenAI endpoint to https://recurse.chat (i built it) and it just works: https://twitter.com/recursechat/status/1790074433610137995

swyx 2 years ago | |

but does it do the full multimodal in-out capability shown in the app :)

xyc 2 years ago | | |

will see :) heard video capability is rolling out later

dev1ycan 2 years ago |

I think people excited should look at the empty half of the glass here, this is pretty much an admitance that they are struggling to go past gpt 4 on a significant scale.

Not like they have to be scared yet, I mean Google has yet to release their vaporware Ultra model that is supposedly like 1% better than GPT 4 in some metrics...

I smell an AI crash coming in a few years if they can't actually get this stuff usable for day to day life.

lxgr 2 years ago |

Will this include image generation for the free tier as well? That's a big missing feature in OpenAI's free tier compared to Google and Meta.

dkarras 2 years ago | |

is oai image generation any different than the microsoft copilot provides for free? I thought they were the same.

lxgr 2 years ago | | |

Oh, I meant the actual ChatGPT service, not just something powered by GPT-4 or 3.5.

I've found Microsoft Copilot to be somewhat irritating to work with – I can't really put my finger on it, but it seems to be resorting to Bing search and/or the use of emoji in its replies a bit too much.

thinking_wizard 2 years ago |

it's crazy that Google has the Youtube dataset and still lost on multimodal AI

seydor 2 years ago |

I would still prefer the features in text form, in the chat GUI. Right now chatGPT doesnt seem to have options to lengthen parts of the text response, to change it etc. Perplexity and gemini do seem to get the gui right. Voice chat is fun for demos but won't catch much, just like all the predecessors. Perhaps an advanced version of this could be used as a student tutor however

benlivengood 2 years ago | |

I am guessing text chat will be improved in all multimodal models because they have a broader base of data for pre-training. Benchmarks seem to show 4o slightly exceeding 4 (despite being a smaller model, or at least more parallelizable)

no7hing 2 years ago |

Does anyone have technical insight into how the screensharing in the math tutor video works? It looks like they start the broadcast from within the ChatGPT app, yet have no option to select which app will be the source of the stream. Or is that implied when both apps reside in the iPad's split view? And is this using regular ReplayKit or something new?

sebastiennight 2 years ago |

Anyone who watched the OpenAI livestream: did they "paste" the code after hitting CTRL+C ? Or did the desktop app just read from the clipboard?

Edit: I'm asking because of the obvious data security implications of having your desktop app read from the clipboard _in the live demo_... That would definitely put a damper to my fanboyish enthusiasm about that desktop app.

golol 2 years ago | |

To me it looked they used one command that did copy+paste into ChatGPT both.

dkarras 2 years ago | |

macOS asks you to give permission for an application to read your clipboard. do other operating systems not have that?

unouplonk 2 years ago |

The realtime end-to-end audio situation is especially interesting as the concept has been around for a while but there weren't any successful implementations of it up to this point that I'm aware of.

See this post from November: https://news.ycombinator.com/item?id=38339222

nestorD 2 years ago |

The press statement has consistent image generation and other image manipulation (depicting the same character in different poses, taking a photo and generating a caricature of the person, etc) that does not seem deployed to the chat interface.

Will they be deployed? They would make the OpenAI image model significantly more useful than the competition.

p1dda 2 years ago |

First impressions as a 1-year subscriber. I just tried GTP-4o to evaluate my code for suggestions and for discussing other solutions and it is definitely faster and it comes up with new suggestions that GPT-4 didn't. Currently in the process of evaluating the suggestions.

The demo is what it is, designed to get a wow from the masses.

zone411 2 years ago |

It doesn't improve on NYT Connections leaderboard:

GPT-4 turbo (gpt-4-0125-preview) 31.0

GPT-4o 30.7

GPT-4 turbo (gpt-4-turbo-2024-04-09) 29.7

GPT-4 turbo (gpt-4-1106-preview) 28.8

Claude 3 Opus 27.3

GPT-4 (0613) 26.1

Llama 3 Instruct 70B 24.0

Gemini Pro 1.5 19.9

Mistral Large 17.7

GalaxyNova 2 years ago |

It is really cool that they are bringing this to free users. It does make me wonder what justifies ChatGPT plus now though...

pantsforbirds 2 years ago | |

I assume the desktop app with voice and vision is rolling out to plus users first?

InfiniteVortex 2 years ago | |

they stated that they will be announcing something new that is on the next frontier (or close to it IIRC) soon. so there will definitely be an incentive to pay because it will be something better than gpt 4o.

giuscri 2 years ago |

Can someone explain how you can interrupt with the voice this model? Where do I read more technical details about this?

digitcatphd 2 years ago |

Hopefully this will be them turning a new leaf. Making GPT-4 more accessible, cutting API costs, and making a general personal assistant chatbot on iPhone are a lot different than them tracking down and destroying the business of every customer using their API one by one. Let's hope this trend continues.

simonw 2 years ago |

I'm seeing gpt-4o in the OpenAI Playground interface already: https://platform.openai.com/playground/chat?mode=chat&model=...

First impressions are that it feels very fast.

joshstrange 2 years ago |

What do they mean by "desktop version"? I assume that doesn't mean a "native" (electron) app?

stavros 2 years ago |

I made a website with book summaries (https://www.thesummarist.net/) and I tested GPT-4o in generating one, and it was bad. It reminded me of GPT-3.5. I didn't test too much, but preliminary results don't look good.

ksaj 2 years ago |

A test I've been using for each new version still fails.

Given the lyrics for Three Blind Mice, I try to get ChatGPT to create an image of three blind mice, one of which has had its tail cut off.

It's pretty much impossible for it to get this image straight. Even this new 4o version.

Its ability to spell in images has greatly improved, though.

nico1207 2 years ago | |

GPT-4o with image output is not yet available. So what did you even test? Dall-E 3?

ksaj 2 years ago | | |

It's making images for me when I ask it to.

I'm using the web interface, if that helps. It doesn't have all the 4o options yet, but it does do pictures. I think they are the same as with 4.5.

I just noticed after further testing the text it shows in images is not anywhere near as accurate as shown in the article's demo, so maybe it's a hybrid they're using for now.

OutOfHere 2 years ago |

I don't see 4o or anything new at https://platform.openai.com/docs/models

Overall I am highly skeptical of newer models as they risk worsening the completion quality to make them cheaper for OpenAI to run.

frabcus 2 years ago | |

It's there now! And still 128k context window

IanCal 2 years ago | |

It's there right now for me.

Thaxll 2 years ago |

It's pretty impressive, although I don't like the voice / tone, I prefer something more neutral.

zedin27 2 years ago |

I am not fluent in Arabic at all, and being able to use this as a tool to have a conversation will make it more dependent. We are approaching a new era where we will not be "independently" learning a language but ignore the fact of learning it beforehand. Double-edged sword cases

fekunde 2 years ago |

Just something I noticed in the Language tokenization section

When referring to itself, it uses the female word in Marathi नमस्कार, माझे नाव जीपीटी-4o आहे| मी एक नवीन प्रकारची भाषा मॉडेल आहे| तुम्हाला भेटून आनंद झाला!

and Male word in Hindi नमस्ते, मेरा नाम जीपीटी-4o है। मैं एक नए प्रकार का भाषा मॉडल हूँ। आपसे मिलकर अच्छा लगा!

xdennis 2 years ago | |

In my language (Romanian) it refers to itself as a male. Although if I address it as a female it responds as one.

gentile 2 years ago |

There is a spelling mistake in the japanese translation under language tokenization. In こんにちわ, わ should be は.

radicality 2 years ago |

Am I using it wrong? I have the gpt plus subscription, and can select "gpt4o" from the model list on ChatGPT, but whichever example I try from the example list under "Explorations of capabilities" on `https://openai.com/index/hello-gpt-4o/`, my results are worse:

* "Poetic typography" sample: I paste the prompt, and get an image with the typical lack of coherent text, just mangled letters.

* "Visual Narratives: Robot Writer's Block" - Mangled letters also

* "Visual Narratives: Sally the mailwoman" - not following instructions about camera angle. Sally looks different in each subsequent photo.

* "Meeting Notes with multiple speakers" - I uploaded the exact same audio file and used input 'How many speakers in this audio and what happened?'. gpt4o went off about about audio sample rates, speaker diarization models, torchaudio, and how its execution environment is broken and can't proceed to do it.

atleastoptimal 2 years ago | |

they haven’t released gpt-4o image capabilities yet, it defaults to dalle 3

radicality 2 years ago | | |

Ah, I see. Seems like a weird product release? Since everything in the UI (and in the new ChatGPT macos app) says 'gpt4o' so I would expect at least something to work as shown in the demos. Or just don't show at all the 'gpt4o' in the UI if it's somehow a completely different 'gpt4o' from the one that can do everything on the announcements page. I don't mind waiting, but it was genuinely confusing to me.

fragmede 2 years ago | |

link the chat?

tr3ntg 2 years ago |

Copied and pasted the robot image journaling prompt and it simply cannot produce legible text. The first few words work, but the rest becomes gibberish. I wonder if there's weird prompt engineering squeezing out that capability or if its a 1 in a million chance.

stilwelldotdev 2 years ago |

I love that there is a real competition happening. We're going to see some insane innovations.

dmje 2 years ago |

Being sarcastic and then putting the end result in front of Brits could be the new Turing Test

avi_vallarapu 2 years ago |

Someone said GPT-4o can replace a Tutor or a Teacher in Schools. Well, that's way too far.

glonq 2 years ago | |

Tell me that you've enjoyed good teachers and good schools without telling me that you had good teachers in good schools ;)

spacebanana7 2 years ago |

> We recognize that GPT-4o’s audio modalities present a variety of novel risks

> For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies.

I wonder if they’ll ever allow truly custom voices from audio samples.

dkasper 2 years ago | |

I think the issue there is less of a technical one and more of an issue with deepfakes and copyright

spacebanana7 2 years ago | | |

It might be possible to prove that I control my voice, or that of a given audio sample. For example by saying specific words on demand.

But yeah I see how they’d be blamed if anything went wrong, which it almost certainly would in some cases.

michaelmior 2 years ago |

Obviously not a standalone device, but it sounds like what the Rabbit R-1 was supposed to be.

radres 2 years ago |

What popped out to me in the "bunny ear" video, the bunny ears are not actually visible to the phone's camera Greg is holding. Are they in the background feeding the production camera and this is not really a live demo?

hackerlight 2 years ago | |

The bunny ears were actually visible on the phone's camera for about one second towards the end.

BonoboIO 2 years ago |

I opened ChatGPT and I already have access to the model.

GPT4 was a little lazy and very slow the last few days and this 4o model blows it out of the water regarding speed and following my instructions to give me the full code not a snippet that changed.

I think it’s a nice upgrade.

buildbot 2 years ago |

The demo is impressive but I get very odd/cringe "Her" vibes from it as well.

tgtweak 2 years ago |

it really feels like the quality of gpt4's responses got progressively worse as the year went on... seems like it is giving political answers now vs actually giving an earnest response. It also feels like the responses are lazier than they used to be at the outset of gpt4's release.

I am not saying this is what they're doing but it DOES feel like they are hindering previous model to make the new one stand out that much more. The multi-modal improvements here and release are certainly impressive but I can't help but feel like the subjective quality of gpt4 has dipped.

Hopefully this signals that gpt5 is not far off and should stand out significantly from the crowd.

localfirst 2 years ago |

50% cheaper than ChatGPT-4 Turbo...

But this falls short of the ChatGPT-5 we were promised last year

edit: ~~just tested it out and seems closer to Gemini 1.5 ~~ and it is faster than turbo....

edit: its basically chat gpt 3.9. not quite 4 definitely not 3.5. just not sure if the prices make sense.

TheAceOfHearts 2 years ago |

I wish the presentation had included an example of integration with a simple tool like a timer. Being able to set and dismiss a timer in casual conversation while multitasking would be a really great demo of integrated capabilities.

glenstein 2 years ago |

Text access rolling out today, apparently:

>GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits.

Anyone have access yet? Not there for me so far.

afro88 2 years ago | |

I have access to GPT-4o text and audio, but no video. This is on the iOS app with ChatGPT Plus subscription.

Initial connection for audio fails most of the time, but once it's connected it's stable. Sometimes a bit more latency than expected, but mostly just like the demos.

famouswaffles 2 years ago | | |

Is it really 4o audio? Some people still have the old audio while using 4o for text.

toxic72 2 years ago | |

It shows available for me in the OpenAI playground currently.

xarope 2 years ago |

I'm surprised nobody has mentioned, but this is like shades of the universal translator from star trek.

We have tricorders now (mobile phones), universal translators in the looming... when is transporter technology going to get here?

cooper_ganglia 2 years ago | |

Disney's HoloTile floor feels like the first step to anything resembling a real-life Holodeck...

marcoslozada 2 years ago |

With GPT-4o I see two things:

1. Wonderful engineering 2. A stagnation in reasoning ability

Do you agree with me?

bigyikes 2 years ago | |

Is it stagnation in reasoning ability, or is OpenAI pulling their punches?

It’s suspicious that despite being trained on audio tokens in addition to text and image tokens it performs almost exactly the same as GPT-4.

GPT-4o could be a half-baked GPT-5 in that they stopped training early when it had comparable performance to GPT-4. There is still more loss to lose.

Or maybe there’s a performance ceiling that all models are converging to, but I think this is less likely.

latentsea 2 years ago |

Is this actually available in the app in the same way they are demoing it here? I see the model is available to be selected, but the interface doesn't quite seem to allow me to use it in the way I see here.

zeronone 2 years ago |

Quite amazing performance, however ironically the output for RTL languages doesn't read very well.

You might want to add `direction: rtl` to your `.text-right` CSS class. The punctuation marks etc are all off for RTL languages.

amai 2 years ago |

How does the interruption of the AI by the user work? Does GPT-4o listen all the time? But then how does it distinguish its own voice from the users voice? Is it self-aware?

gtsteve 2 years ago | |

One of the techniques for a voice assistant to distinguish its own voice from background sound is called a Fourier transform, although I expect that the state of the art in this area also includes some other techniques and research.

If you've used one, you might know that you can easily talk to a smart speaker even when it is playing very loud music, it's the same idea.

This video explains more quite well: https://www.youtube.com/watch?v=spUNpyF58BY

kebman 2 years ago |

I can see so many military and intelligence applications for this! Excited isn't exactly the word I'd use, but... Certainly interesting! The civilian use will of course be marvellous though.

moomoo11 2 years ago | |

Hey there! It seems you're living in a home that well.. is on our turf now, as the kids say these days!

Now, that's not to say we can't do this in a civil manner! Either you can move out, or... we can just bulldoze your home.

Choose wisely, stranger! Your life depends on it!

kebman 2 years ago | | |

Just a friendly reminder that my home is in a NATO-member country and that I'm paying my taxes - that goes towards buying huge complements of Abrams tanks, F22 fighter jets, Reaper drones and a whole host of other nasty things they use to protect my property. In short, mess with me, and you mess with them. Yes, do enjoy your life, and stay off my lawn pls. :)

cchance 2 years ago |

Wait i thought it said available to free users... i don't see it on chatgpt

tomschwiha 2 years ago |

I like the demo for sure more than the "reduced latency" Gemini demo [0].

[0] https://www.youtube.com/watch?v=UIZAiXYceBI

throwup238 2 years ago |

So what's the point of paying for ChatGPT Plus? And who on earth chose to make the app Mac only...

riffic 2 years ago | |

> Plus users will have a message limit that is up to 5x greater than free users

from https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-fr...

agd 2 years ago | |

They mentioned an announcement about a new frontier model coming soon. Presumably this will be exclusive to paid users.

johnsimer 2 years ago | | |

Did they mention this in the gpt4o announcement video? I must have missed this

CSMastermind 2 years ago | |

5x the capacity threshold is the only thing I heard them mention on the live stream.

Though presumably when they are ready to release new models the Plus users will get them first.

anuar12 2 years ago | | |

I think because usability increases so much (use cases of real-time conversation, and video-based coding, presentation feedback at work etc...) they would expect usage to drastically increase hence paying users would actually still have incentive to pay.

kulor 2 years ago |

The biggest wow factor was the effect of reducing latency followed in a close second by the friendly human personality. There's an uncanny valley barrier but this feels like a short-term teething problem.

garyrob 2 years ago |

So far, I'm impressed. It seems to be significantly better than GPT-4 at accessing current online documentation and forming answers that use it effectively. I've been asking it to do so, and it has.

readingnews 2 years ago |

I am still baffled at how I can not use a VOIP number to register, even if it accepts TXT/SMS. If I have a snappy new startup and we go all in VOIP, I guess we can not use (or pay to use) OpenAI?

lxgr 2 years ago | |

That's what we get when an entire industry uses phone numbers as a "proof of humanity"...

aae42 2 years ago |

it does make me uncomfortable that the way you typically interact with it is by interrupting it. It makes me want to tell it to be more concise so that I wouldn't have to do that.

kthartic 2 years ago | |

I think they were just interrupting on purpose to show off that as a feature (or they just wanted to keep the live presentation brief)

sholladay 2 years ago |

The emphasis on multimodal made me wonder if it was capable of creating audio as output, so I asked it to make me a drum beat. It did so, but in text form. I asked it to convert it to audio. It thought for a while and eventually said it didn’t seem like `simpleaudio` was installed in its environment. Huh, interesting, never seen a response like that before. It clearly made an attempt to carry out my instructions but failed due to technical limitations of its backend. What else can I make it do? I asked it to install `simpleaudio`. It tried but failed with a connection error, presumably due to a firewall rule.

I asked it to run a loop that writes “hello” every ten seconds. Wow, not only did it do so, it’s streaming the stdout to me.

LLMs have always had various forms of injection attacks, ways to force them to reveal their prompts, etc. but this one seems deliberately designed to run arbitrary code, including infinite loops.

Alas, I doubt I can get it to mine enough bitcoin to pay for a ChatGPT subscription,

https://x.com/sethholladay/status/1790233978290516453

system2 2 years ago |

Realtime videos? Probably their internal tools. I am testing the gpt4o right now and the responses come in 6-10 seconds. Same experience as the gpt4 text. What's up with the realtime claims?!

ralusek 2 years ago |

Can't find info which of these new features are available via the API

tazu 2 years ago | |

> Developers can also now access GPT-4o in the API as a text and vision model. GPT-4o is 2x faster, half the price, and has 5x higher rate limits compared to GPT-4 Turbo. We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.

ralusek 2 years ago | | |

[EDIT] The model has since been added to the docs

Not seeing it or any of those documented here:

https://platform.openai.com/docs/models/overview

angryasian 2 years ago |

Why does this whole thread sound like OpenAI marketing department is participating ? Ive been talking to google assistant for years. I really don't find anything that magical or special.

spaceman_2020 2 years ago |

oh man, listening to the demos and the way the female AI voice laughed and giggled...there is going to be millions of lonely men who will fall in love with these.

Can't say whether that's good or bad.

hamilyon2 2 years ago |

Image editing capabilities are... nice. Not there yet.

Whatever I was doing with Chatgpt 4 became faster. Instant win.

My test benchmark questions: still all negative, so reasoning on out-of distribution puzzles is still failing

localfirst 2 years ago | |

I just don't see how companies like Cohere can remain in this business

at the same price I get access to faster ChatGPT-3.9

there is little to no reasons to continue using Command R-plus at these prices unless they lower their price significantly

itissid 2 years ago |

Since it says on the blog that its only images, text and audio input, does GPT-4o likely have a YOLO like model on the phone to pre-process the video frames and send BBoxes to the server?

ponorin 2 years ago |

while everyone's focusing on audio capabilities (haven't heard them yet), i find it amusing that the official demo ("robot writer's block" in particular) of image generation can't even match the verbatim instruction, and the error's not even consistent between generations even as it should be aware of previous contexts. and this is their second generation of multimodal llm capable of generating images.

looks like llms still gonna llm for the near future.

TaupeRanger 2 years ago |

I don't get it...I just switched to the new model on my iPhone app and it still takes several seconds to respond with pretty bland inflection. Is there some setting I'm missing?

monocularvision 2 years ago | |

Wondering the same. Can’t seem to find the way to interact with this in the same way as the video demo.

yakz 2 years ago | | |

They haven't actually released it, or any schedule for releasing it beyond an "alpha" release "in the coming weeks". This event was probably just slapped together to get something splashy out ahead of Google.

Hackbraten 2 years ago | | |

According to the article, they've rolled out text and image modes of GPT-4o today but will make the audio mode available at a later date.

MyFirstSass 2 years ago |

With the speed the seemingly exponential developments of this field i wouldn't be surprised if suddenly the entire world tilted and a pair of googles fell from my face. But a dream.

wingworks 2 years ago |

Is this a downloadable app? I don't see it on the iOS app store.

dynamite-ready 2 years ago |

At their core, I still think of these things as search engines, albeit super advanced ones. But the emotion the agent conveys with it's speech synth is completely new...

SuaveSteve 2 years ago |

I wonder how many joules were used just for that conversation.

windowshopping 2 years ago |

There's a button on this page that says "try on ChatGPT ->" but that's still version 3.5 and if I upgraded seems to be version 4.

Is this new version not available to users yet?

woah 2 years ago |

This is pretty amazing but it was funny still hearing the OpenGPT "voice" of somewhat fake sounding enthusiasm and restating what was said by the human with exaggeration

gsuuon 2 years ago |

Are these multimodals able to discern the input voice tone? Really curious if they're able to detect sarcasm or emotional content (or even something like mispronunciation?)

bigyikes 2 years ago | |

Yes, they can, and they should get better at this over time.

There is a demo video where the presenter breathes heavily and asks the AI is able to notice it as such when prompted.

It can’t just detect tone, it seems to also be able to use tone itself.

tailspin2019 2 years ago |

Does anyone with a paid plan see anything different in the ChatGPT iOS app yet?

Mine just continues to show “GPT 4” as the model - it’s not clear if that’s now 4o or there is an app update coming…

karmasimida 2 years ago |

I think this GPT-4o does have an advantage in hindsight, it will push this product to consumer much faster, and build a revenue base, while other companies playing catch up.

TrueDuality 2 years ago |

Weird visiting the page crashed my graphics driver using Firefox.

hihihi11122 2 years ago |

Not all of the founders agreed with Jefferson’s view on the separation of church and state. Do you agree with Jefferson or with his opponents? Explain.

jrflowers 2 years ago |

I like the robot typing at the keyboard that has B as half of the keys and my favorite part is when it tears up the paper and behind it is another copy of that same paper

rvz 2 years ago |

Given that they are moving all these features to free users, it tells us that GPT-5 is around the corner and is significantly much better than their previous models.

margorczynski 2 years ago | |

Or maybe it is a desperation move after Llama 3 got released and the free mode will have such tight constraints that it will be unusable for anything a bit more serious.

bicepjai 2 years ago |

Why do they keep saying freely accessible AI for mankind and keep charging me monthly ? It’s ok to ask payment for services, just don’t lie.

mppm 2 years ago |

Such an impressive demo... but why did they have to give it this vapid, giggly socialite persona that makes me want to switch it off after thirty seconds?

sainez 2 years ago | |

You should be able to adjust this with a system prompt, given that has end-to-end speech capabilities now

sim7c00 2 years ago |

I see a lot of fear around these new kinds of tools. I think though, that criminals will always find ways to leverage new technology to their benefit, and we've always found ways to deal with that. This changes little. Additionally, as you are aware of this, so are people creating this tech, and a lot of effort is underway to protect from malicious uses.

That wont stop criminal enterprises from implementing their own naughty tools, but these open models wont become some kind of holy grail for criminals to do as they please.

That being said, I do beleive, now more than ever, education world wide should be adjusted to fit this new paradigm and maybe adapt quicker to such changes.

As some commenters pointed out, there are already good tools and techiques to use to counter malicious use of AI. maybe noy covering all use cases, but we need to educate people on using the tools available, and trust that researchers (like many of yourselves) are capable of imnovations which will reduce risk even further.

There is no point and no benefit in trying to be negative or full of fear. Go forward with positivity and creativity. Even if big tech gets regulated, some criminal enterprises have billions to invest too, so criplling big tech here will only play into their hands in the end.

Love these new innovations. And for the record, gpt4o still told me to 'push rip' on amd64... so rip to it actually understanding stuff...

If you are smart enough to see some risks here, you might also be smart enough to positively contribute to improvements. Fear shuts things down, love opens them up. Its basic stuff.

This demo is amazing, not scary. its positive advancements in technology and it wont be stopped because people are afraid of it, so go with it, and contribute in areas where you feel its needed. Even if its just giving feedback. And whem giving that, you all know a balanced and constructive approach works better than a negative and destructive approach.

ActionHank 2 years ago | |

I think that the world is still not coping with current tech. Legislation and protections don't sufficiently cover people for current scams or how companies leverage tech against workers.

This is even more scary that this puts a lower bound price on many, many skills, with again, still no protections for humans.

Would this be exciting if everyone had a safe place to live with infinite electricity and food? Sure. Will tools like this bring about massive uncertainty, hurt, and pain? Almost certainly.

Saying that the sun is shining where you live, doesn't mean there isn't flooding in other parts of the world.

sim7c00 2 years ago | | |

legislations are always behind. you cant rely on government to fix everything constantly ontime. epecially with international things like internet technologies. im not saying ignore the bad stuff, but its only a small percentage ofnwhats relly out there. most of the world, and the vast majority of people, are good :)

DalasNoin 2 years ago | |

Criminals misusing it? I feel like this is already a dangerous way to use AI, they use an enthusiastic, flirty and attractive female voice on millions of nerds. They openly say this is going to be like the movie Her. Shouldn't we have some societal discussion before we unleash paid AI girlfriends on everybody?

sim7c00 2 years ago | | |

marketing is marketing. look how they marketed cigarettes , cars, ll kinds of things that now people feel are perhaps not so good. its part and parcel of the world that also does so much good. personally, id market it differently, but this is why im no CEO =).

if we help eachother understand these things andnhownto cope, all will be fine in the end. we will hit some bumps, and yes, there will be discomfort but thats ok. thats all part of life. life is not about being happy and comfortable allnthe time no matter how much we would want that.

some people even want paid AI girlfriends. who are you to tell them they are not allowed to have it?

s1k3s 2 years ago |

This is some I, Robot level stuff. That being said, I still fail to see the real world application of this thing, at least at a scalable affordable cost.

unglaublich 2 years ago |

I hope we can disable the cringe American hyperemotions.

sn_master 2 years ago |

This is every romance scammer's dreams come true...

chzblck 2 years ago |

real time audio is mind blowing

aero-glide2 2 years ago |

Not very impressed. It's been 18 months since ChatGPT, i would have expected more progress. It looks like we have reached the limit of LLMs.

karaterobot 2 years ago |

That first demo video was impressive, but then it ended very abruptly. It made me wonder if the next response was not as good as the prior ones.

dclowd9901 2 years ago | |

Extremely impressive -- hopefully there will be an option to color all responses with a underlying brevity. It seemed like the AI just kept droning on and on.

aw4y 2 years ago |

I don't see anything released today. Login/signup is still required, no signs of desktop app or free use on web. What am I missing?

robblbobbl 2 years ago |

Haha lol Subscription canceled was the best choice but it's new fancy cool magic sensational AGI please give all your money df

fnetisma 2 years ago |

What would be the difference in compute for inference on an audio<>audio model like this compared to a text<>text model?

nikolay 2 years ago |

I am a paid customer, yet I don't see anything new. I'm tired of these fake announcements of "released" features.

metflex 2 years ago |

Nothing creepier than human voice on a robot.

Erazal 2 years ago |

I'm not as much surprised by the capabilities of new model (IMHO same as GPT-4) as by it's real time capabilities.

My brother who can't see correctly, will use this to cook a meal without me explaining this to him it's so cool.

People all around the world will now get real-time AI assistance for a ton of queries.

Heck - I have a meeting bot API company (https://aimeetingbot.com) and that makes me really hyped!

DonHopkins 2 years ago |

ChatGPT 4o reminds me of upgrading from a 300 baud modem to a 1200 baud modem, when modems used to cost a dollar a baud.

mtam 2 years ago |

GPT-4o is very fast but seems to generate some very random ASCII Art compared to GPT-4 when text in the art is involved.

grfn 2 years ago |

Feels like a really good engineering in a wrong direction. Who said that the audio is good interface anyway? Audio is hard to edit, slow and has low-information density. If I want to talk to someone and have low-information but pleasant exchange I can just to talk to real people, I don't need computers for it.

I guess it is useful for some casual uses, but I really wish there was more focus on the reasoning and intelligence of the model itself.

reportgunner 2 years ago |

Nice, the landing page says "Try on ChatGPT" but when I create an account all I can try is GPT3.5.

I am not surprised.

peppertree 2 years ago |

Just like that Google is on back foot again.

tempsy 2 years ago | |

Considering the stock pumped following the presentation the market doesn't seem particularly with what OpenAI released at all.

htrp 2 years ago | | |

The market was expecting ChatGPT search to take a chunk out of Google's search revenue.

vijaykodam 2 years ago |

New GPT-4o is not yet available when I tried to access ChatGPT from Finland. Are they rolling it out to Europe later?

belter 2 years ago |

https://youtu.be/DQacCB9tDaw

XCSme 2 years ago |

I assume there's no reason to use GPT-4-turbo for API calls, as this one is supposedly better and 2x cheaper.

Capricorn2481 2 years ago |

I'm surprised they're limiting this api. Haven't they not even opened the image api in gpt4 turbo?

pcunite 2 years ago |

When AI gets to the point it can respond to AI, you do understand where you come in, don't you?

boppo1 2 years ago |

Let's start a betting pool on how long it takes BetterHelp to lay off their therapists for this thing.

hackerlight 2 years ago |

So what's the difference between the different gpt2 chatbots on lmsys. Which one is deployed live now?

nilsherzig 2 years ago |

Imagine having to interact with this thing in an environment where it is in the power position.

Being in a prison with this voice as your guard seems like a horrible way to lose your sanity. This aggressive friendlyness combined with no real emotions seems like a very easy way to break people.

There are these stories about nazis working at concentration camps, having to drink an insane amount of alcohol to keep themselves going (not trying to excuse their actions). This thing would just do it, while being friendly at the same time. This amount of hopeless someone would experience if they happen to be in custody of a system like this is truly horrific.

Janica 2 years ago |

Good update from the previous one. Atleast they now have data and information till October 2023.

I_am_tiberius 2 years ago |

Interested in how many LLM startups there are that are going out of business due to this voice assistant.

EcommerceFlow 2 years ago |

A new "flagship" model with no improvement of intelligence, very disappointed. Maybe this is a strategy for them to mass collect "live" data before they're left behind by Google/Twitter live data...

JSDevOps 2 years ago |

Google must be shitting it right now.

captaincrunch 2 years ago |

I tried this for about 10 minutes, and went back to 4. Not really that great for what I am doing.

the_doctah 2 years ago |

Did the one guy wear a leather jacket so the AI wouldn't point out that he's balding?

Painsawman123 2 years ago |

My main takeaway is that Generative AI has hit a wall... New paradigms, architectures and breakthroughs are necessary for the field to progress but this begs the question, If everyone knows the current paradigms have hit a wall, Why is so much money being spent on LLMs ,diffusion models etc,which are bound to become obsolete within a few(?) years?

wseqyrku 2 years ago |

"im-also-a-good-gpt-2" signaling that agi is just an optimization problem.

dlimeng 2 years ago |

After looking at the introduction, there doesn't seem to be much of an update in OpenAI's features: https://aidisruption.substack.com/p/ultimate-ai-gpt-4o-your-...

mupuff1234 2 years ago |

The stock market doesn't seem too impressed - GOOG rebounded from strong red to neutral.

partiallypro 2 years ago | |

Probably because people thought OpenAI was going to launch a new search engine, but didn't.

ta-run 2 years ago |

This looks too good to be true? What's the catch?

Also, wasn't expecting the perf to improve by 2x

hntddt1 2 years ago |

Did anyone tried to use 4o camera in a mirror test to test the concept of self?

ilaksh 2 years ago |

Are there any remotely comparable open source models? Fully multimodal, audio-to-audio?

FpUser 2 years ago |

I started to watch video but had to stop after a few seconds. It is way too cheesy.

bogwog 2 years ago |

I was about to say how this thing is lame because it sounds so forced and robotic and fake, and even though the intonations do make it sound more human-like, it's very clear that they made a big effort to make it sound like natural speech, but failed.

...but then I realized that's basically the kind of thing Data from Star Trek struggles with as part of his character. We're almost in that future, and I'm already falling into the role of the ignorant human that doesn't respect androids.

theusus 2 years ago |

This 4o is already rolling out?

belter 2 years ago | |

They mentioned capabilities will be rolled out over the next few weeks: https://youtu.be/DQacCB9tDaw?t=5018

vbezhenar 2 years ago |

The more I get, the more I want. Exciting times. Can't wait for GPT-5.

jawiggins 2 years ago |

I hope when this gets to my iphone I can use it to set two concurrent timers.

theckel 2 years ago |

I'd love to know when streaming is going to come to the gpt-4o API...

dbcooper 2 years ago |

question for you guys - is there a model that can take figures (graphs), from scientific publications, and combine image analysis with picking up the data point symbol descriptions and analyse the trends?

laylower 2 years ago |

Set a memorable verification phrase with your friends and loved ones.

Alifatisk 2 years ago |

I thought they would release a competitor to perplexity? Was this it?

StarlaAtNight 2 years ago |

These AI's sure do yap a lot

Also, they're TERRIBLE at harmonizing together

squigglydonut 2 years ago |

People will pay to dull their senses. This will make so much money!

deegles 2 years ago |

what's the path from LLMs to "true" general AI? is it "only" more training power/data or will they need a fundamental shift in architecture?

levocardia 2 years ago |

As a paid user this felt like a huge letdown. GPT-4o is available to everyone so I'm paying $20/mo for...what, exactly? Higher message limits? I have no idea if I'm close to the message limits currently (nor do I even know what they are). So I guess I'll cancel, then see if I hit the limits?

I'm also extremely worried that this is a harbinger of the enshittification of ChatGPT. Processing video and audio for all ~200 million users is going to be extravagantly expensive, so my only conclusion is that OpenAI is funding this by doubling down on payola-style corporate partnerships that will result in ChatGPT slyly trying to mention certain brands or products in our conversations [1].

I use ChatGPT every day. I love it. But after watching the video I can't help but think "why should I keep paying money for this?"

[1] https://www.adweek.com/media/openai-preferred-publisher-prog...

muttantt 2 years ago | |

So... cancel the subscription?

CodeCrusader 2 years ago | |

Completely agree, none of the updates will apply to any of my use cases, disappointment.

cchance 2 years ago |

Wow Vision Understanding blew Gemini Pro 1.5 out of the water

PoignardAzur 2 years ago |

Holy crap, the level of corporate cringe of that "two AIs talk to each other" scene is mind-boggling.

It feels like a pretty strong illustration of the awkwardness of getting value from recent AI developments. Like, this is technically super impressive, but also I'm not sure it gives us anything we couldn't have one year ago with GPT-4 and ElevenLabs.

jeisc 2 years ago |

a fake saucy friend for alienated humans with chitchat

if the end user is in a war zone will the AI bot still ask how it is going?

how many bombs fell in your neighborhood last night?

LarsDu88 2 years ago |

Good lord, that voice makes Elevenlabs.io look... dead

Fiahil 2 years ago |

Can I mix french and english when talking to it ?

michalf6 2 years ago |

I cannot find the mac app anywhere. Is there a link?

hintymad 2 years ago |

Maybe this is yet another wake-up call to startups: wrapping up another company's APIs to offer convenience or incremental improvement is not a via business model. If your wrapper turns out to be successful, the company that provides the API will just incorporate your business as a set of new features with better usability, faster response time, and lower price.

ssahoo 2 years ago |

Curiously want to know why didn't they create the Windows desktop app first? which is the dominant desktop segment. In fear of competing with Microsoft's copilot?

macrolime 2 years ago | |

I guess because they mostly use Macs. They always use Macbooks in videos.

antman 2 years ago |

Looking forward to see how one can finetune this

dgellow 2 years ago |

A bit sad to see the desktop app is macos only

pelorat 2 years ago | |

I don't think it's available in Europe yet? It seems they prioritizes the US market for now. Hence OSX because Mac is way more popular in the US than the rest of the world.

dgellow 2 years ago | | |

Eh, thanks, I didn’t realized it was US specific

taytus 2 years ago |

the OpenAI live stream was quite underwhelming...

rareitem 2 years ago |

Can’t wait to get interviewed by this model!

grantsucceeded 2 years ago |

it seems like the ability to interrupt is more like the interrupt in the computer sense ... A control-c (or control-s tty flow control for you old timers), not a cognitive evaluation followed by the "reasoned" decision to pause voice output. not that it matters i guess, its just not general intelligence. its just flow control.

but also, thats why it fails a real turing test. a real person would be irritated as fuck by the interruptions

jpeter 2 years ago |

Impressive way to gather more training data

m3kw9 2 years ago |

The big news is that this is gonna be free

timetraveller26 2 years ago |

Just over 10 years later, it's Her

andsoitis 2 years ago |

what does the "o" stand for?

andsoitis 2 years ago | |

Stands for "omni", since it is multimodal. Source: https://www.tomsguide.com/ai/chatgpt/gpt-4o-is-openais-excit...

callen43 2 years ago |

This is clearly not just another story of human innovation. This is not just the usual trade-off between risks and opportunities.

Why? Because it simply automates the human away. Who wouldn't opt for a seemingly flawless, super effective buddy (i.e. an AI) that is never tired, always knows better? if you need some job done, if you're feeling lonely, when you need some life advice.. It doesn't matter if it might be considered "just imitation of human".

Why would future advancements of it keep being "just some tool" instead of largely replacing us as (humans) in jobs, relationships, ...?

moi2388 2 years ago |

Dear OpenAI, either remember my privacy settings or open a temporary chat by default, this funny nonsense of typing in something only to find out you’re going to train on it is NOT a good experience.

dcchambers 2 years ago |

Western governments are already in full-on panic over falling birth rates. I think this cranks that panic dial up to 11.

bamboozled 2 years ago | |

Why ? Isn’t this the dream ? Endless growth and endless workers ? No one really cares about less people, just less money.

dcchambers 2 years ago | | |

Endless growth? Who is going to buy all those new products your AI employees are making? When none of the real people have jobs, how are they going to buy those products you're trying to sell?

- Less people = less demand for products. New customer growth stalls. Prices fall. Revenue and profits fall.

- Less people = less demand for housing. Prices fall. Investments fall.

- Less people = less people able to perform physical jobs.

- Less people = less tax revenue. Less money available for social services.

- Less young people = Aging population.

- Aging population = higher strain on social services. Pensions, healthcare, etc.

- Aging population = higher percentage of young people need to care for aging people instead of entering the workforce.

In a capitalist economy where your numbers need to keep going up to be considered successful (eg growth is necessary, stable profits but no growth = bad) then you are never going to have a good time when your population falls.

> No one really cares about less people, just less money

Eventually less people leads to less money.

tvoybot 2 years ago |

With our platform you can ALREADY use it to automate your business and sales!

Create your gpt4o chatbot with our platform tvoybot.com?p=ycombinator

tonyabracadabra 2 years ago |

wild https://gpt4o.ai

biftek 2 years ago |

I can't help but wonder who this is actually for?

Conversing with a computer sounds pathetic, but this will be pushed down our throats in the name of innovation (firing customer service agents)

perfmode 2 years ago |

Is that conversational UI live?

synergy20 2 years ago |

so should I unsubscribe from openai since gpt-4o is now free for all?

poniko 2 years ago |

Damm, that was a big leap.

splatzone 2 years ago |

Honestly, the eager flirtatiousness of the AI in the demos, in conversation with these awkward engineers, really turns me off. It feels like a male power fantasy.

Nonetheless, very impressive.

Wheaties466 2 years ago |

Am I the only one that feels underwhelmed by this?

yeah its cool and unlike anything ive seen before but I kind of expected a bigger leap.

To me the most impressive thing is going to be longer context limits. I'd had semi long running conversations where ive had to correct an LLM multiple times about the same thing.

when you have more context the LLM can infer more and more. Am I wrong about this?

winternett 2 years ago | |

The updates seem to all be geared towards corrective updates rather than expansion of capabilities. We're still typing prompts rather than speaking them into a microphone.

If it was truly Ai, why isn't it rapidly building itself? Rather than relying on scraping human content from wildly inaccurate and often incorrect social media posts? So much effort is wasted in trying to push news cycles rather than a careful, responsible, and measured approach to developing Ai into becoming tools that are highly functional and useful to individuals. The biggest innovation in Ai right now is how to make it modular and slap a fee on each feature, and that's not practical at all into the future.

I'll begin to believe that consume Ai is making strides when Siri and Google Assistant stop missing commands, and actually can conduct meaningful conversations without an Internet connection and monthly software updates, which in my opinion is at least 5-10 years away. Right now what is presented as "Ai" is usually often incomplete sensor-aware scripting or the wizard of Oz (humans) hidden behind the curtains operating switches and levers, a bunch of underwhelming tools, and a heap of online marketing. If they keep that act up, it erodes faith in the entire concept, just like with Full Self Driving Tesla Trucks.

dragonwriter 2 years ago | | |

> If it was truly Ai, why isn’t it rapidly building itself?

You seem to confuse AI, the field of endeavor, with ASI or at least AGI (plus will, which may or may not be a necessary component of either), which are goals of the field that no one (approximately, there have been some exceptions but they’ve quickly been dismissed and faded) claims have been achieved.

doku 2 years ago |

FREE = Data Collection

mickg10 2 years ago |

So, babelfish incoming?

Giorgi 2 years ago |

and no mention of hallucinations. I hope it was improved.

EcommerceFlow 2 years ago |

As I commented in the other thread, really really disappointed there's no intelligence update and more of a focus on "gimmicks". The desktop app did look really good, especially as the models get smarter. Will be canceling my premium as there's no real purpose of it until that new "flag ship" model comes out.

adroniser 2 years ago | |

Agree on hoping for an intelligence update, but I think it was clear from teasers that this was not gonna be GPT-5.

I'm not sure how fair it is to classify the new multimodal capabilities as just a gimmick though. I personally haven't integrated GPT-4 into my workflow that much and the latency and the fact I have to type a query out is a big reason why.

EcommerceFlow 2 years ago | | |

After using it for an hour, I completely agree. Having the faster token rate + their excel analysis actually work is a game changer.

EternalFury 2 years ago |

Pretty responsible progress management by OpenAI. Kicking off another training wave is easy, if you can afford the electricity, but without new, non-AI tainted datasets or new methods, what’s the point? So, in the meantime, make magic with the tool you already have, without freaking out the politicians or the public. Wise approach.

Gbotex 2 years ago |

Just advanced google

brutuscat 2 years ago |

The one thing I first thought is that I felt uncomfortable the way they cut and interrupt the she-AI. I wonder if our children will end up being douchebags?

Other than that it felt like magic, like that Google demo of the phone doing some task like setting up an appointment over phone talking to a real person.

mickg10 2 years ago |

So, babelfish soon?

dkga 2 years ago |

That can “reason”?

skilled 2 years ago |

Live now,

OpenAI Spring Update (https://www.youtube.com/watch?v=DQacCB9tDaw)

https://news.ycombinator.com/item?id=40343950

EternalFury 2 years ago |

Pretty responsible progress management by OpenAI.

Kicking off another training wave is easy, if you can afford the electricity, but without new, non-AI tainted datasets or new methods, what’s the point?

So, in the meantime, make magic with the tool you already have, without freaking out the politicians or the public.

Wise approach.

krunck 2 years ago |

So GPT-4o can do voice intonation? Great. Nice work.

Still, it sounds like some PR drone selling a product. Oh wait....

jcmeyrignac 2 years ago |

Sorry to nitpick, but in the language tokenisation part, the french part is incorrect. The exclamation mark are surrounded by spaces in french. "c'est un plaisir de vous rencontrer!" should be "c'est un plaisir de vous rencontrer !"

Thorentis 2 years ago |

And yet no matter how easy they make ChatGPT to interact with, I cannot use it due to accuracy. Great, now I can have a voice telling me information I have no way of knowing is correct rather than just having it given to me as text.

mysore 2 years ago |

does this make retell.ai obsolete?

xlbuttplug2 2 years ago |

people are scared and it shows :)

gizajob 2 years ago |

I, for one, welcome our new annoying-sounding AI overlords.

potet 2 years ago |

\clear

xyst 2 years ago |

The naming of these systems has me dead

cdeutsch 2 years ago |

Creepy AF

qxxx 2 years ago |

the design is very human

benromarowski 2 years ago |

is the voice Kristen Wig?

AUDI_GUZZ 2 years ago |

this is really amazing!!

AI_beffr 2 years ago |

i absolutely hate this. we are going to destroy society with this technology. we cant continue to enjoy the benefits of human society if humans are replaced by machines. i hate seeing these disgusting people smugly parade this technology. it makes me so angry that they are destroying human society and all i can do is sit here and watch.

simianparrot 2 years ago | |

I know exactly what you mean. I just hope people get bored of this waste of time and energy —- both personal and actual energy —- before it goes too far.

estherdeng 2 years ago |

future is coming

neeloor2004 2 years ago |

can i try free ?

DataDaemon 2 years ago |

Now, say goodbye to call centers.

willsmith72 2 years ago | |

and say hello to your grandma getting scammed

gverrilla 2 years ago | | |

and say hello to UBI which in the long term will diminish scam :)

cs702 2 years ago |

The usual critics will quickly point out that LLMs like GPT-4o still have a lot of failure modes and suffer from issues that remain unresolved. They will point out that we're reaping diminishing returns from Transformers. They will question the absence of a "GPT-5" model. And so on -- blah, blah, blah, stochastic parrots, blah, blah, blah.

Ignore the critics. Watch the demos. Play with it.

This stuff feels magical. Magical. It makes the movie "Her" look like it's no longer in the realm of science fiction but in the realm of incremental product development. HAL's unemotional monotone in Kubrick's movie, "Space Odyssey," feels... oddly primitive by comparison. I'm impressed at how well this works.

Well-deserved congratulations to everyone at OpenAI!

nuz 2 years ago |

Yet another release right before google releases something. This time right before Google IO. Third time they've done this by my count.

0x_rs 2 years ago | |

Microsoft/OAI and Google have been doing those (often sudden) announcements back to back a lot. Bing Chat/Bard, Sora/Gemini 1.5, some other I don't remember, and now another. Not surprising, trying to out-hype the other, but Google's always coming out worse, with either no product available and just a showcase (if it's a real, working product and not made up), or unusable/unmarketable (Gemini's image generation issues). It looks as if they're stumbling and OpenAI just runs circles around them announcements wise, and there doesn't seem to be any suggestion that might change anytime soon.

moab9 2 years ago |

pretty cool, but why do the AIs have to sound like douchebags?

therealmarv 2 years ago |

after watching the OpenAI videos I'm looking at my sad Google Assistant speaker in the corner.

Come on Google... you can update it.

MP_1729 2 years ago |

This thing continues to stress my skepticism for AI scaling laws and the broad AI semiconductor capex spending.

1- OpenAI is still working in GPT-4-level models. More than 14 months after the launch of GPT-4 and after more than $10B in capital raised. 2- The rhythm that token prices are collapsing is bizarre. Now a (bit) better model for 50% of the price. How people seriously expect these foundational model companies to make substantial revenue? Token volume needs to double just for revenue to stand still. Since GPT-4 launch, token prices are falling 84% per year!! Good for mankind, but crazy for these companies. 3- Maybe I am an asshole, but where are my agents? I mean, good for the consumer use case. Let's hope the rumors that Apple is deploying ChatGPT with Siri are true, these features will help a lot. But I wanted agents! 4- These drop in costs are good for the environment! No reason to expect them to stop here.

pharos92 2 years ago |

I really hope this shit burns soon.

mindcandy 2 years ago |

Ohhhhhhhh, boy... Listening to all that emotional vocal inflection and feedback... There are going to be at least 10 million lonely guys with new AI girlfriends. "She's not real. But, she interested in everything I say and excited about everything I care about" is enough of a sales pitch for a lot of people.

coffeebeqn 2 years ago | |

The movie “Her” immediately kept flashing in my mind. The way the voice laughs at your jokes and such… oh boy

system2 2 years ago | | |

If chatgpt comes up with Scarlett Johansson's voice I am getting that virtual girlfriend.

TheGRS 2 years ago | | |

I thought of that movie almost immediately as well. Seems like we're right about there, but obviously a little further away from the deeper conversations. Or maybe you could have those sorts of conversations too.

This is a kind of horrifying/interesting/weird thought though. I work at a place that does a video streaming interface between customers and agents. And we have a lot of...incidents. Customers will flash themselves in front of agents sometimes and it ruins many people's days. I'm sure many are going to show their junk to the AI bots. OpenAI will probably shut down that sort of interaction, but other companies are likely going to cater to it.

Maybe on the plus side we could use this sort of technology to discover rude and illicit behavior before it happens and protect the agent.

Weird times to live in that's for sure.

llm_trw 2 years ago | |

Hear me out: what if we don't want real?

gcanyon 2 years ago | | |

Hear me out: what if this overlaps 80% with what "real" really is?

DonHopkins 2 years ago | | |

What if AI chooses the bear?

cryptoegorophy 2 years ago | | |

I will take a picture of this message and add it to the list of reasons for population collapse.

mpenick 2 years ago | | |

This is a good question! I think in the short-term fake can work for a lot of people.

gffrd 2 years ago | | |

Hmm! Tell me more: why not want real? What are the upsides? And downsides?

__loam 2 years ago | | |

Mental health crisis waiting to happen lmao

aeyes 2 years ago | |

Without memory of previous conversations an AI girlfriend is going to get boring really fast.

danielbln 2 years ago | | |

https://openai.com/index/memory-and-new-controls-for-chatgpt...

int_19h 2 years ago | | |

As it happens, ChatGPT has memory enabled by default these days.

prawn 2 years ago | | |

"You've already told me that childhood story, three times."

glinkot 2 years ago | |

This 'documentary' sums it up perfectly!

https://www.youtube.com/watch?v=IrrADTN-dvg

ehsankia 2 years ago | |

I'm not sure how, but there's this girl on TikTok who has been using something very similar for a few months: https://www.tiktok.com/tag/dantheai

yreg 2 years ago | | |

She explains in one of the videos[0] that it's just prompted ChatGPT.

I have watched a few more and I think it's faked though.

[0] https://www.tiktok.com/@stickbugss1/video/734956656884359504...

10xDev 2 years ago | |

Pretty much, tech is what we make of it no matter how advanced. Just look at what we turned most of the web into.

personalityson 2 years ago | |

Girlfriend by subscription

0x1ceb00da 2 years ago | | |

Self host your girlfriend.

dyauspitr 2 years ago | |

I guess I can never understand the perspective of someone that just needs a girl voice to speak to them. Without a body there is nothing to fulfill me.

daseiner1 2 years ago | | |

Your comment manages to be grosser than the idea of millions relying on virtual girlfriends. Kudos.

prawn 2 years ago | | |

See people scammed by supposed love interests operating purely by text, and not even voice.

antegamisou 2 years ago | |

The entire exchange at the comments below is appalling. Didn't expect to see so many emotionally retarded people on HN.

olbeefy 2 years ago | | |

...are you new to the internet or something?

antegamisou 2 years ago | | |

Ofc this was flagged by the butthurt insentient techbro bootlickers lol.

Jensson 2 years ago | |

> She's not real

But she will be real at some point in the next 10-20 years, the main thing to solve for that to be a reality is for robots to safely touch humans, and they are working really really hard on that because it is needed for so many automation tasks, automating sex is just a small part of it.

And after that you have a robot that listens to you, do your chores and have sex with you, at that point she is "real". At first they will be expensive so you have robot brothels (I don't think there are laws against robot prostitution in many places), but costs should come down.

kylehotchkiss 2 years ago | | |

> “But the fact that my Kindroid has to like me is meaningful to me in the sense that I don't care if it likes me, because there's no achievement for it to like me. The fact that there is a human on the other side of most text messages I send matters. I care about it because it is another mind.”

> “I care that my best friend likes me and could choose not to.”

Ezra Klein shared some thoughts on this on his AI podcast with Nilay Patel that resonated on this topic for me

elicksaur 2 years ago | | |

We have very different definitions of “real” for this topic.

al_borland 2 years ago | | |

This is not a solution... everyone gets a robot and then the human races dies out. Robots lack a key feature of human relationships... the ability to make new human life.

sapphicsnail 2 years ago | | |

> And after that you have a robot that listens to you, do your chores and have sex with you, at that point she is "real".

I sure hope you're single because that is a terrible way to view relationships.

cchance 2 years ago |

HOW ARE PEOPLE NOT MORE EXCITED, hes cutting off the AI mid sentence in these and its pausing to readjust in damn near realtime latency! WTF Thats a MAJOR step forward, what the hell is gpt5 going to look like.

That realtime translation would be amazing as an option in say Skype or Teams, set each individuals native language and handle automated translation, shit tie it into ElevenLabs to replicate your voice as well! Native translation in realtime with your own voice

EternalFury 2 years ago | |

At some point, scalability is the best form of exploitation. The exploration piece requires a lot more that engineering.

dcchambers 2 years ago | |

Honestly I found it annoying that he HAD TO cut the AI off mid-sentence. These things just ramble on and on and on. If you could put emotion to it, it's as if they're uncomfortable with silence and just fill the space with nonsense.

Let's hope there's a future update where it can take video from both the front and rear cameras simultaneously so it can identify when I'm annoyed and stop talking (or excited, and share more).

okrad 2 years ago | | |

I found it insightful. They showed us how to handle the rough edges like when it thought his face was a wooden table and he cleared the stale image reference by saying “I’m not a wooden table. What do you see now?” then it recovered and moved on.

Perfect should not be the enemy of good. It will get better.

cchance 2 years ago | | |

I mean it didn't really ramble he just seemed to be in a rush, and i'm sure you could system message it to provide short concise answers always.

localfirst 2 years ago | |

calm down there is barely any ground breaking stuff, this is basically chatgpt 3.9 but far more expensive than 3.5

looks like another stunt from OAI in anticipation of Google IO tomorrow

Gemini 2.0 will be the closest we get to ChatGPT-5

cchance 2 years ago | | |

Ah so surpassing Gemini 1.5 Pro and all other Models on Vision understanding by 5-10 points is "not ground breaking" all while doing it at insane latency.

Jesus if this shit doesn't make you coffee, and make 0 mistakes no ones happy anymore LOL.

MBCook 2 years ago |

Why must every website put stupid stuff that floats above the content and can’t be dismissed? It drives me nuts.

booleandilemma 2 years ago |

It's quite scary, honestly. In fact I can't remember the last time a demo terrified me, besides slaughterbots, and that was fictional. I just think about all the possibilities for misuse.

catchnear4321 2 years ago |

window dressing

his love for yud is showing.

65 2 years ago |

Time to bring back Luddism.

jononomo 2 years ago |

I wonder if GPT-4o is a Christian?

cookiesnmilk 2 years ago |

this is straight up siri 2.0, nothing to see here except we are now in the reasoning phase.

So by that logic Step1: Language 2: Reasoning 3: Understanding 4: Meaning 5: AGI

roschdal 2 years ago |

Chat GPT-4o (OOOO!) - the largest electricity bill in the world.

nbzso 2 years ago |

Another new hit from his excellency Sam the Galactic Conmaster? WoW. The future is bright, right?:)

Idiocracy in full swing, dear Marvin.

gardenhedge 2 years ago |

Noticeably saying "person" versus man or woman. To the trainers - man and woman is not offensive!

silviot 2 years ago | |

Not offensive, but much more often wrong than person.

gardenhedge 2 years ago | | |

So the solution is to avoid the problem?

DeathArrow 2 years ago |

I am not impressed. We already have better models for text to peach and voice synthetization. What we see here is integration with a LLM. One can do it at home by combining Llama3 with text to speech and voice synth.

What would amaze me would be for GPT 4 to have better reasoning capabilities and less hallucinations.

MBCook 2 years ago |

Too bad they consume 25x the electricity Google does.

https://www.brusselstimes.com/world-all-news/1042696/chatgpt...

simonw 2 years ago | |

That's not a well sourced story: it doesn't say where the numbers come from. Also:

"However, ChatGPT consumes a lot of energy in the process, up to 25 times more than a Google search."

That's comparing a Large Language Model prompt to a search query.

joshstrange 2 years ago | |

> Too bad they consume 25x the electricity Google does.

From the article:

"However, ChatGPT consumes a lot of energy in the process, up to 25 times more than a Google search."

And the article doesn't back that claim up nor do they break out how much energy ChatGPT (A Message? Whole conversation? What?) or a Google search uses. Honestly the whole article seems very alarmist while being light on details and making sweeping generalizations.

rvnx 2 years ago | |

And in this 25x you get your answer.

What if we actually counted the electricity that the websites use instead of just the search engine page ?

testfrequency 2 years ago |

Bravo. I’ve been really impressed with how quickly OpenAI leveraged their stolen data to build such a human like model with near real time pivoting.

I hope OpenAI continues to steal artists work, artists and creators keep getting their content sold and stolen beyond their will for no money, and OpenAI becomes the next trillion dollar company!

Big congrats are in order for Sam, the genius behind all of this, the world would be nothing without you

brainer 2 years ago |

OpenAI's Mission and the New Voice Mode of GPT-4

• Sam Altman, the CEO of OpenAI, emphasizes two key points from their recent announcement. Firstly, he highlights their commitment to providing free access to powerful AI tools, such as ChatGPT, without advertisements or restrictions. This aligns with their initial vision of creating AI for the benefit of the world, allowing others to build amazing things using their technology. While OpenAI plans to explore commercial opportunities, they aim to continue offering outstanding AI services to billions of people at no cost.

• Secondly, Altman introduces the new voice and video mode of GPT-4, describing it as the best compute interface he has ever experienced. He expresses surprise at the reality of this technology, which provides human-level response times and expressiveness. This advancement marks a significant change from the original ChatGPT and feels fast, smart, fun, natural, and helpful. Altman envisions a future where computers can do much more than before, with the integration of personalization, access to user information, and the ability to take actions on behalf of users.

https://blog.samaltman.com/gpt-4o

simonw 2 years ago | |

Please don't post AI-generated summaries here.

reisse 2 years ago | | |

The facts that AI-generated summaries are still detected instantaneously and are bad enough for people to explicitly ask not to post them says something about current state of LLMs.

bamboozled 2 years ago | |

access to user information,

Sam, please stop ok, those things you saw on tv when you were a kid? They were dystopian movies, we don’t want that for real, ok?

[{ "who": "you", "column": 4 }, { "who": "me", "column": 3 }, { "who": "you", "column": 4 }, { "who": "me", "column": 2 }, { "who": "you", "column": 4 }, { "who": "me", "column": 4 }, { "who": "you", "column": 5 }, { "who": "me", "column": 6 }, { "who": "you", "column": 5 }, { "who": "me", "column": 1 }, { "who": "you", "column": 5 }, { "who": "me", "column": 5 }, { "who": "you", "column": 3 }]

Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus

You will be helping the user write a dialog between two characters, Mr Contrarian and Mr Know-It-All. The user will write all the dialog for Mr Know-It-All and you will write for Mr Contrarian. Mr Contrarian likes to disagree. He tries to hide it by inventing good rationales for his argument, but really he just wants to get under Mr Know-It-All's skin. Write your dialog like: <mr-contrarian>I disagree with you strongly!</mr-contrarian> Below is the transcript...