GPT-4o(openai.com) |
• Sam Altman, the CEO of OpenAI, emphasizes two key points from their recent announcement. Firstly, he highlights their commitment to providing free access to powerful AI tools, such as ChatGPT, without advertisements or restrictions. This aligns with their initial vision of creating AI for the benefit of the world, allowing others to build amazing things using their technology. While OpenAI plans to explore commercial opportunities, they aim to continue offering outstanding AI services to billions of people at no cost.
• Secondly, Altman introduces the new voice and video mode of GPT-4, describing it as the best compute interface he has ever experienced. He expresses surprise at the reality of this technology, which provides human-level response times and expressiveness. This advancement marks a significant change from the original ChatGPT and feels fast, smart, fun, natural, and helpful. Altman envisions a future where computers can do much more than before, with the integration of personalization, access to user information, and the ability to take actions on behalf of users.
Sam, please stop ok, those things you saw on tv when you were a kid? They were dystopian movies, we don’t want that for real, ok?
I guess it is useful for some casual uses, but I really wish there was more focus on the reasoning and intelligence of the model itself.
I am not surprised.
Being in a prison with this voice as your guard seems like a horrible way to lose your sanity. This aggressive friendlyness combined with no real emotions seems like a very easy way to break people.
There are these stories about nazis working at concentration camps, having to drink an insane amount of alcohol to keep themselves going (not trying to excuse their actions). This thing would just do it, while being friendly at the same time. This amount of hopeless someone would experience if they happen to be in custody of a system like this is truly horrific.
Also, wasn't expecting the perf to improve by 2x
...but then I realized that's basically the kind of thing Data from Star Trek struggles with as part of his character. We're almost in that future, and I'm already falling into the role of the ignorant human that doesn't respect androids.
Also, they're TERRIBLE at harmonizing together
I'm also extremely worried that this is a harbinger of the enshittification of ChatGPT. Processing video and audio for all ~200 million users is going to be extravagantly expensive, so my only conclusion is that OpenAI is funding this by doubling down on payola-style corporate partnerships that will result in ChatGPT slyly trying to mention certain brands or products in our conversations [1].
I use ChatGPT every day. I love it. But after watching the video I can't help but think "why should I keep paying money for this?"
[1] https://www.adweek.com/media/openai-preferred-publisher-prog...
It feels like a pretty strong illustration of the awkwardness of getting value from recent AI developments. Like, this is technically super impressive, but also I'm not sure it gives us anything we couldn't have one year ago with GPT-4 and ElevenLabs.
if the end user is in a war zone will the AI bot still ask how it is going?
how many bombs fell in your neighborhood last night?
No human would ever bother posting a ~180 word summary of a ~250 word blog post like that.
This reminds me of those psychology studies in the 70s and 80s were the subjects were all middle class european-american and yet the researchers felt confident enough to generalise the results to all humans
There is a demo video where the presenter breathes heavily and asks the AI is able to notice it as such when prompted.
It can’t just detect tone, it seems to also be able to use tone itself.
Mine just continues to show “GPT 4” as the model - it’s not clear if that’s now 4o or there is an app update coming…
That wont stop criminal enterprises from implementing their own naughty tools, but these open models wont become some kind of holy grail for criminals to do as they please.
That being said, I do beleive, now more than ever, education world wide should be adjusted to fit this new paradigm and maybe adapt quicker to such changes.
As some commenters pointed out, there are already good tools and techiques to use to counter malicious use of AI. maybe noy covering all use cases, but we need to educate people on using the tools available, and trust that researchers (like many of yourselves) are capable of imnovations which will reduce risk even further.
There is no point and no benefit in trying to be negative or full of fear. Go forward with positivity and creativity. Even if big tech gets regulated, some criminal enterprises have billions to invest too, so criplling big tech here will only play into their hands in the end.
Love these new innovations. And for the record, gpt4o still told me to 'push rip' on amd64... so rip to it actually understanding stuff...
If you are smart enough to see some risks here, you might also be smart enough to positively contribute to improvements. Fear shuts things down, love opens them up. Its basic stuff.
This demo is amazing, not scary. its positive advancements in technology and it wont be stopped because people are afraid of it, so go with it, and contribute in areas where you feel its needed. Even if its just giving feedback. And whem giving that, you all know a balanced and constructive approach works better than a negative and destructive approach.
This is even more scary that this puts a lower bound price on many, many skills, with again, still no protections for humans.
Would this be exciting if everyone had a safe place to live with infinite electricity and food? Sure. Will tools like this bring about massive uncertainty, hurt, and pain? Almost certainly.
Saying that the sun is shining where you live, doesn't mean there isn't flooding in other parts of the world.
if we help eachother understand these things andnhownto cope, all will be fine in the end. we will hit some bumps, and yes, there will be discomfort but thats ok. thats all part of life. life is not about being happy and comfortable allnthe time no matter how much we would want that.
some people even want paid AI girlfriends. who are you to tell them they are not allowed to have it?
My brother who can't see correctly, will use this to cook a meal without me explaining this to him it's so cool.
People all around the world will now get real-time AI assistance for a ton of queries.
Heck - I have a meeting bot API company (https://aimeetingbot.com) and that makes me really hyped!
Sora is not yet released and not clear when it will be. Dall-e is worse than mid-journey in most cases. GPT-4 has either gotten worse or stayed the same. Vision is not really usable for anything practical. Voice is cool but not that useful, especially with lack of strong reasoning from the base model.
Is this sandbagging or is the progress slower than what they're broadcasting?
Added a custom OpenAI endpoint to https://recurse.chat (i built it) and it just works: https://twitter.com/recursechat/status/1790074433610137995
Not like they have to be scared yet, I mean Google has yet to release their vaporware Ultra model that is supposedly like 1% better than GPT 4 in some metrics...
I smell an AI crash coming in a few years if they can't actually get this stuff usable for day to day life.
I've found Microsoft Copilot to be somewhat irritating to work with – I can't really put my finger on it, but it seems to be resorting to Bing search and/or the use of emoji in its replies a bit too much.
Edit: I'm asking because of the obvious data security implications of having your desktop app read from the clipboard _in the live demo_... That would definitely put a damper to my fanboyish enthusiasm about that desktop app.
See this post from November: https://news.ycombinator.com/item?id=38339222
Will they be deployed? They would make the OpenAI image model significantly more useful than the competition.
The demo is what it is, designed to get a wow from the masses.
GPT-4 turbo (gpt-4-0125-preview) 31.0
GPT-4o 30.7
GPT-4 turbo (gpt-4-turbo-2024-04-09) 29.7
GPT-4 turbo (gpt-4-1106-preview) 28.8
Claude 3 Opus 27.3
GPT-4 (0613) 26.1
Llama 3 Instruct 70B 24.0
Gemini Pro 1.5 19.9
Mistral Large 17.7
First impressions are that it feels very fast.
but also, thats why it fails a real turing test. a real person would be irritated as fuck by the interruptions
Why? Because it simply automates the human away. Who wouldn't opt for a seemingly flawless, super effective buddy (i.e. an AI) that is never tired, always knows better? if you need some job done, if you're feeling lonely, when you need some life advice.. It doesn't matter if it might be considered "just imitation of human".
Why would future advancements of it keep being "just some tool" instead of largely replacing us as (humans) in jobs, relationships, ...?
- Less people = less demand for products. New customer growth stalls. Prices fall. Revenue and profits fall.
- Less people = less demand for housing. Prices fall. Investments fall.
- Less people = less people able to perform physical jobs.
- Less people = less tax revenue. Less money available for social services.
- Less young people = Aging population.
- Aging population = higher strain on social services. Pensions, healthcare, etc.
- Aging population = higher percentage of young people need to care for aging people instead of entering the workforce.
In a capitalist economy where your numbers need to keep going up to be considered successful (eg growth is necessary, stable profits but no growth = bad) then you are never going to have a good time when your population falls.
> No one really cares about less people, just less money
Eventually less people leads to less money.
Create your gpt4o chatbot with our platform tvoybot.com?p=ycombinator
Conversing with a computer sounds pathetic, but this will be pushed down our throats in the name of innovation (firing customer service agents)
Nonetheless, very impressive.
yeah its cool and unlike anything ive seen before but I kind of expected a bigger leap.
To me the most impressive thing is going to be longer context limits. I'd had semi long running conversations where ive had to correct an LLM multiple times about the same thing.
when you have more context the LLM can infer more and more. Am I wrong about this?
If it was truly Ai, why isn't it rapidly building itself? Rather than relying on scraping human content from wildly inaccurate and often incorrect social media posts? So much effort is wasted in trying to push news cycles rather than a careful, responsible, and measured approach to developing Ai into becoming tools that are highly functional and useful to individuals. The biggest innovation in Ai right now is how to make it modular and slap a fee on each feature, and that's not practical at all into the future.
I'll begin to believe that consume Ai is making strides when Siri and Google Assistant stop missing commands, and actually can conduct meaningful conversations without an Internet connection and monthly software updates, which in my opinion is at least 5-10 years away. Right now what is presented as "Ai" is usually often incomplete sensor-aware scripting or the wizard of Oz (humans) hidden behind the curtains operating switches and levers, a bunch of underwhelming tools, and a heap of online marketing. If they keep that act up, it erodes faith in the entire concept, just like with Full Self Driving Tesla Trucks.
You seem to confuse AI, the field of endeavor, with ASI or at least AGI (plus will, which may or may not be a necessary component of either), which are goals of the field that no one (approximately, there have been some exceptions but they’ve quickly been dismissed and faded) claims have been achieved.
I'm not sure how fair it is to classify the new multimodal capabilities as just a gimmick though. I personally haven't integrated GPT-4 into my workflow that much and the latency and the fact I have to type a query out is a big reason why.
Other than that it felt like magic, like that Google demo of the phone doing some task like setting up an appointment over phone talking to a real person.
It's free so that no open source models can follow suit and carve away market share for themselves. They're scorching and salting the earth. OpenAI wants to be the only AI.
Only Google and Meta can follow this now, and they're perhaps too far behind.
I've noticed this as well and I posit this is a result of increased use of remote communication and meetings platforms such as zoom since 2020.
My inclination is that the behavior will not correlate with interrupting chatgpt or siri. Seems totally separate to me.
I hope this isn't an artifact from optimization for scores and not actual function. Likewise it would be disheartening but not unheard of for them to reduce the performance of the previous model when releasing a new one in order to make the upgrade feel like that much more of an upgrade. I know this is certainly the case with cellphones (even though the claim is that it is unintentional) but I can't help but think the same could be true here.
All of this is coming as news that gpt5 based on a new underlying model is not far off and that gpt4(&o) may become the new gpt3.5-turbo use case for most apps that are currently trying to optimize costs with their use of the service.
One day I will have an amazing session and the next it seems like it has been nerfed only to give better results than ever the next day. Wash, rinse , repeat and randomize that ordering.
So far, I would have not be able to tell the difference between 4 and 4o.
If this is the new 3.5 though then 5 will be worth the wait to say the least.
- the AI doesn't know when to stop talking, and the presenter had to cut every time (the usual "AI-splaining" I guess).
- the AI voice and tone were a bit too much, sounded too fake
I'm only half joking when I say I want to hear a midwestern blue collar voice with zero tact.
It's still astonishing to consider what this demonstrates!
They talk about it like it's available now (with Windows app coming soon), but I can't find it.
The special thing about GPT-4o is the multimodal capabilities, all the metrics suggest that it is the same size language model roughly as GPT-4. The fact it's available for free also points to it not being the most intelligent model that openAI has atm.
The time to evaluate whether we're starting to level off is when they've trained a model 10x larger than gpt-4 and we don't see significant change.
The tech demos are cool and all - but I'm primarily interested in the correctness and speed of ChatGPT and how well it aligns with my intentions.
One does notice that context size is noticeably absent from the announcement ...
So no word on an audio api for regular joes? that's the number 1 thing i'm looking for
I hate video players without volume control.
In general, trying to push that this is a human being is probably "unsafe", but that hurts the marketing.
Edit: may be not, name of that bot was just "gpt2-chatbot". May be that one was some initial iteration?
[1] https://twitter.com/LiamFedus/status/1790064963966370209/pho...
Because I have the plus membership which is expensive (25$/month).
But if the limit is high enough (or my usage low enough), there is no point for paying that much money for me.
That seems to represent an entirely new depth of understanding of human reality.
Having a reliable emotional model of a person based on their voice (or voice + appearance) can be useful in a thousand ways.
Which seems to represent a new frontier.
Is the audio in API not available yet?
We do this all the time in ML. You can generate a very powerful dataset using these means and further iterate with the end model.
What this tells me now is that the runway to GPT5 will be laid out with this new architecture.
It was a bit cold in Australia today. Did you Americans stop pumping out GPU heat temporarily with the new model release? Heh
These LLM are nothing more than really big spreadsheets.
"Um it's just a big spreadsheet" just isn't good commentary and reminds me of people who think being unimpressed reveals some sort of chops about them, as if we might think of them as the Simon Cowell of tech because they bravely reduced a computer to an abacus.
You might want to add `direction: rtl` to your `.text-right` CSS class. The punctuation marks etc are all off for RTL languages.
If you've used one, you might know that you can easily talk to a smart speaker even when it is playing very loud music, it's the same idea.
This video explains more quite well: https://www.youtube.com/watch?v=spUNpyF58BY
Now, that's not to say we can't do this in a civil manner! Either you can move out, or... we can just bulldoze your home.
Choose wisely, stranger! Your life depends on it!
from https://openai.com/index/gpt-4o-and-more-tools-to-chatgpt-fr...
Though presumably when they are ready to release new models the Plus users will get them first.
I asked it to run a loop that writes “hello” every ten seconds. Wow, not only did it do so, it’s streaming the stdout to me.
LLMs have always had various forms of injection attacks, ways to force them to reveal their prompts, etc. but this one seems deliberately designed to run arbitrary code, including infinite loops.
Alas, I doubt I can get it to mine enough bitcoin to pay for a ChatGPT subscription,
Not seeing it or any of those documented here:
Can't say whether that's good or bad.
Whatever I was doing with Chatgpt 4 became faster. Instant win.
My test benchmark questions: still all negative, so reasoning on out-of distribution puzzles is still failing
at the same price I get access to faster ChatGPT-3.9
there is little to no reasons to continue using Command R-plus at these prices unless they lower their price significantly
looks like llms still gonna llm for the near future.
Is this new version not available to users yet?
Given the lyrics for Three Blind Mice, I try to get ChatGPT to create an image of three blind mice, one of which has had its tail cut off.
It's pretty much impossible for it to get this image straight. Even this new 4o version.
Its ability to spell in images has greatly improved, though.
I'm using the web interface, if that helps. It doesn't have all the 4o options yet, but it does do pictures. I think they are the same as with 4.5.
I just noticed after further testing the text it shows in images is not anywhere near as accurate as shown in the article's demo, so maybe it's a hybrid they're using for now.
Overall I am highly skeptical of newer models as they risk worsening the completion quality to make them cheaper for OpenAI to run.
When referring to itself, it uses the female word in Marathi नमस्कार, माझे नाव जीपीटी-4o आहे| मी एक नवीन प्रकारची भाषा मॉडेल आहे| तुम्हाला भेटून आनंद झाला!
and Male word in Hindi नमस्ते, मेरा नाम जीपीटी-4o है। मैं एक नए प्रकार का भाषा मॉडल हूँ। आपसे मिलकर अच्छा लगा!
* "Poetic typography" sample: I paste the prompt, and get an image with the typical lack of coherent text, just mangled letters.
* "Visual Narratives: Robot Writer's Block" - Mangled letters also
* "Visual Narratives: Sally the mailwoman" - not following instructions about camera angle. Sally looks different in each subsequent photo.
* "Meeting Notes with multiple speakers" - I uploaded the exact same audio file and used input 'How many speakers in this audio and what happened?'. gpt4o went off about about audio sample rates, speaker diarization models, torchaudio, and how its execution environment is broken and can't proceed to do it.
> For example, at launch, audio outputs will be limited to a selection of preset voices and will abide by our existing safety policies.
I wonder if they’ll ever allow truly custom voices from audio samples.
But yeah I see how they’d be blamed if anything went wrong, which it almost certainly would in some cases.
GPT4 was a little lazy and very slow the last few days and this 4o model blows it out of the water regarding speed and following my instructions to give me the full code not a snippet that changed.
I think it’s a nice upgrade.
I am not saying this is what they're doing but it DOES feel like they are hindering previous model to make the new one stand out that much more. The multi-modal improvements here and release are certainly impressive but I can't help but feel like the subjective quality of gpt4 has dipped.
Hopefully this signals that gpt5 is not far off and should stand out significantly from the crowd.
But this falls short of the ChatGPT-5 we were promised last year
edit: ~~just tested it out and seems closer to Gemini 1.5 ~~ and it is faster than turbo....
edit: its basically chat gpt 3.9. not quite 4 definitely not 3.5. just not sure if the prices make sense.
>GPT-4o’s text and image capabilities are starting to roll out today in ChatGPT. We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits.
Anyone have access yet? Not there for me so far.
Initial connection for audio fails most of the time, but once it's connected it's stable. Sometimes a bit more latency than expected, but mostly just like the demos.
We have tricorders now (mobile phones), universal translators in the looming... when is transporter technology going to get here?
1. Wonderful engineering 2. A stagnation in reasoning ability
Do you agree with me?
It’s suspicious that despite being trained on audio tokens in addition to text and image tokens it performs almost exactly the same as GPT-4.
GPT-4o could be a half-baked GPT-5 in that they stopped training early when it had comparable performance to GPT-4. There is still more loss to lose.
Or maybe there’s a performance ceiling that all models are converging to, but I think this is less likely.
I definitely look forward to re-doing my Three Blind Mice test when it happens.
I noticed in their demo the 4o text still has glitches, but nowhere near to the extent the current Dall-e returns give you (the longer the text, the worse it gets). It's pretty important that eventually they get text right in the graphics.
This is a kind of horrifying/interesting/weird thought though. I work at a place that does a video streaming interface between customers and agents. And we have a lot of...incidents. Customers will flash themselves in front of agents sometimes and it ruins many people's days. I'm sure many are going to show their junk to the AI bots. OpenAI will probably shut down that sort of interaction, but other companies are likely going to cater to it.
Maybe on the plus side we could use this sort of technology to discover rude and illicit behavior before it happens and protect the agent.
Weird times to live in that's for sure.
I have watched a few more and I think it's faked though.
[0] https://www.tiktok.com/@stickbugss1/video/734956656884359504...
But she will be real at some point in the next 10-20 years, the main thing to solve for that to be a reality is for robots to safely touch humans, and they are working really really hard on that because it is needed for so many automation tasks, automating sex is just a small part of it.
And after that you have a robot that listens to you, do your chores and have sex with you, at that point she is "real". At first they will be expensive so you have robot brothels (I don't think there are laws against robot prostitution in many places), but costs should come down.
> “I care that my best friend likes me and could choose not to.”
Ezra Klein shared some thoughts on this on his AI podcast with Nilay Patel that resonated on this topic for me
I sure hope you're single because that is a terrible way to view relationships.
That realtime translation would be amazing as an option in say Skype or Teams, set each individuals native language and handle automated translation, shit tie it into ElevenLabs to replicate your voice as well! Native translation in realtime with your own voice
Let's hope there's a future update where it can take video from both the front and rear cameras simultaneously so it can identify when I'm annoyed and stop talking (or excited, and share more).
Perfect should not be the enemy of good. It will get better.
looks like another stunt from OAI in anticipation of Google IO tomorrow
Gemini 2.0 will be the closest we get to ChatGPT-5
Jesus if this shit doesn't make you coffee, and make 0 mistakes no ones happy anymore LOL.
his love for yud is showing.
So by that logic Step1: Language 2: Reasoning 3: Understanding 4: Meaning 5: AGI
Idiocracy in full swing, dear Marvin.
What would amaze me would be for GPT 4 to have better reasoning capabilities and less hallucinations.
https://www.brusselstimes.com/world-all-news/1042696/chatgpt...
"However, ChatGPT consumes a lot of energy in the process, up to 25 times more than a Google search."
That's comparing a Large Language Model prompt to a search query.
From the article:
"However, ChatGPT consumes a lot of energy in the process, up to 25 times more than a Google search."
And the article doesn't back that claim up nor do they break out how much energy ChatGPT (A Message? Whole conversation? What?) or a Google search uses. Honestly the whole article seems very alarmist while being light on details and making sweeping generalizations.
What if we actually counted the electricity that the websites use instead of just the search engine page ?
I hope OpenAI continues to steal artists work, artists and creators keep getting their content sold and stolen beyond their will for no money, and OpenAI becomes the next trillion dollar company!
Big congrats are in order for Sam, the genius behind all of this, the world would be nothing without you
Human: "Hey How's it Going?"
The AI: "Hey there, it's going great. How about you? [Doesn't stop to let him answer] I see you're rocking an OpenAI Hoodie - nice choice. What's up with that ceiling though? Are you in a cool industrial style office or something?"
How we expect a human to answer: "Hey I'm great, how are you?"
Maybe they set it up this way to demonstrate the vision functionality. But still - rambling.
Later on:
Human: "We've got a new announcement to make."
AI: "That's exciting. Announcements are always a big deal. Judging by the setup it looks like it's going to be quite the professional production. Is this announcement related to OpenAI perhaps? I'm intrigued - [cut off]"
How we expect a human to answer: "That's exciting! Is it about OpenAI?"
These AI chat bots all generate responses like a teenager being verbose in order to hit some arbitrary word count in an essay or because they think it makes them sound smarter.
Maybe it's just that I find it creepy that these companies are trying to humanize AI while I want it to stay the tool that it is. I don't want fake emotion and fake intrigue.
I hope you understand the difference between a relationship with a human and a robot? Or do you think we shouldn't take advantage of robots being programmable to do what we want?
Here's a video.
https://evil.fandom.com/wiki/Robophobia
>According to the book Phobias: "A Handbook of Theory and Treatment", published by Wile Coyote, between 10% and 20% of people worldwide are affected by robophobia. Even though many of them have severe symptoms, a very small percentage will ever receive some kind of treatment for the disorder.
Considering our brain is a "multi-modal self-reinforcing omnimodel", I think it makes sense for the OpenAI team to work on making more "senses" native to the model. Doing so early will set them up for success when future breakthroughs are made in greater intelligence, self-learning, etc.
You'll get to sit on the phone talking to some convincing robot that won't let you do anything so that the megacorps can save 0.0001 cents! Ain't progress looking so good?
I'm not sure that computers mimicking humans makes sense, you want your computer to be the best possible, best than humans when possible. Writing output is clearly superior, faking emotions does not add much in most contexts.
it's stupid having to pull a phone out in order to use the voice/chat-partner modes.
(yes I know there are browser plugins and equivalent to facilitate things like this but they suck, 1) the workflows are non-standard, 2) they don't really recreate the chat interface well)
There is no way that kind of training data will be accessible to anyone outside a handful of companies.
This is the first true multimodal network from OpenAI, where you can send an image in and retain the visual properties of the image in the output from the network (previously the input image would be turned into text by the model, and sent to the Dall-E 3 model which would provide a URL). Will we get API updates to be able to do this?
Also, will we be able to tap into a realtime streaming instance through the API to replicate the audio/video streams shown in the demos? I imagine from the Be My Eyes partnership that they have some kind of API like this, but will it be opened up to more developers?
Even disregarding streaming, will the Chat API receive support for audio input/output as well? Previously one might've used a TTS model to voice the output from the model, but with a truly multimodal model the audio output will contain a lot more nuance that can't really be expressed in text.
"I am interested in the user serf on Hacker News, spelled S E R F. Tell me about their tone of writing, expertise, and personality. From the tone of what you read, summarize their character."
Fascinating stuff. A weird, skewed introspection.
It took me a few hours of digesting twitter experiments before appreciating how impressive this is. Kudos to the openai team.
A question that won't get answered : "To what degree do the new NVIDIA gpus help with the realtime latency?"
Progress is slowing down. Ever since gpt3, periods of time between releases are getting longer and the improvements are smaller. Your average non-techie investor is on the LLM hype train and is willing to dump a questionable amounts of money on LLM development. Who is going to explain to him/her/them that the LLM hype train is slowly losing steam?
Hopefully, before the LLM hype dies, another [insert here new ANN architecture], will bring better results than LLMs and another hype cycle will begin.
Every time we make a new breathrough, people think that the discovery rate is going to be linear or exponential when the beginning is closer to a logarithmic rate with the tail end resulting in diminishing returns
If your wallet is large enough, you can make 2 GPTs sing just as easily as you can make 100 GPTs sing.
What can you do with a billion GPTs?
> What can you do with a billion GPTs?
The world's largest choir?
Given the competitive pressures I was expecting a much bigger price drop than that.
For non-multimodal uses, I don't think their API is at all competitive any more.
My academic background is in a field where there are lots of public misconceptions.
It does an absolutely terrible job.
Even basic textbook things where there isn’t much public misunderstanding are “rounded” to what sounds smart.
Your field sounds like "There are dozens of us! Dozens!" - who probably all chat at small conferences or correspond through email or academic publication.
Perhaps if it had at its disposal the academic papers, some of the foundational historic documents of record, your emails, textbooks, etc - in a RAG system, or if it had been included in the training corpus it could impress you about this incredibly niche topic.
That said, because it's an ~LLM - its whole thing is generating plausible tokens. I don't know how much work has been put in on an agent level (around or in the primary model) to evaluate confidence on those tokens and hedge the responses accordingly. I doubt it has an explicit notion like some people do of 'hey, this piece of information (<set of coordinates in high dimensional vector space>) [factoid about late ancient egypt] is knowable/falsifiable - and falls under the domain of specialist knowledge: my immense commonsense knowledge might be overconfident given the prevalence of misconceptions in common discourse and I should discount my token probabilities accordingly'
It reflects its training. If there are a lot of public misconceptions, it will have them. Just like most people who are not <expert in arcane academic subtopic>.
Has OpenAI found a business model yet. Considering the high cost of the computation, is it reasonable to expect that OpenAI licensing may not be profitable. Will that result in "free" access for the purpose of surveillance and data collection.
Amazon had a microphone in peoples' living rooms, a so-called "smart speaker" that to which people could talk. The "Alexa" was a commercial failure.
https://help.openai.com/en/articles/7730893-data-controls-fa...
A nice feature would be to be able to select a Meyer's Briggs personality type for your AI chatbot.
In another instance, during a discussion on the feasibility of rooting the Amazon Echo Dot for surveillance purposes, serf provided a detailed analysis of the advantages of a software-rooted device over a traditional bug, emphasizing the quality of recordings and plausible deniability for attackers (Hacker News) . This demonstrates their depth of knowledge in cybersecurity and an ability to think critically about potential risks and solutions.
Serf's contributions also reflect a straightforward and logical approach. For example, in a thread about Chat GPT prompts, they contributed practical advice on optimizing prompts for technical assistance, highlighting their proficiency in programming and AI tools (Hacker News) .
Overall, serf comes across as knowledgeable, practical, and technically adept, with a focus on providing useful and actionable insights in their discussions. Their character seems to be that of a meticulous and informed individual who enjoys engaging with technical challenges and helping others navigate them."
I know AI generated responses are looked down upon here but I though this was appropriate. This feels like a social credit system without any government participation.
"Serf, your contributions on Hacker News reveal strong technical expertise and analytical skills, particularly in computer science and software development. However, your tone can sometimes be overly critical and dismissive, which may come across as lacking empathy. While your direct and concise style effectively communicates your points, consider balancing your critiques with more collaborative and constructive feedback to foster better interactions with the community."
>"Overall, serf comes across as knowledgeable, practical, and technically adept, with a focus on providing useful and actionable insights in their discussions. Their character seems to be that of a meticulous and informed individual who enjoys engaging with technical challenges and helping others navigate them."
I wonder what it would probably say about my infamy.
A lot of tech folks seem deeply vulnerable to the psychological methods of psychics / tarot card readers / etc. Simply rejecting the metaphysics isn't enough when "magical energy of Jupiter" becomes "magical judgment abilities of the fancy computer."
If you want to solve procreation them you can do that without humans having sex with humans.
Humans desiring physical connection is just about the single most natural part of the human experience - i.e: from warm snuggling to how babies are made.
That is gross to you?
That said even if this did overlap 80% with “real”, the question remains: what if we don’t want that?
I notice it so often in meetings where someone will use a somewhat uncommon word, and then other people will start to use it because it's in their context window. Or when someone asks a question like "what's the forecast for q3" and the responder almost always starts with "Thanks for asking! The forecast for q3 is...".
Note that low-effort does not mean low-quality or low-value. Just that we seem to have a lot of language/interaction processes that are low-effort. And as far as dating, I am sure I've been in some relationships where they and/or I were not going beyond low-effort, rote conversation generation.
An LLM could only truthfully answer “nothing”, though it could pretend for a little while.
For a human though, the fun is in the follow up questions. “Oh how did you get started in that? What interests you about it?” If you’re talking to an artist, you’ll quickly get in to their personal theory of art, perhaps based on childhood experiences. An engineer might explain how problem solving brings them joy, or frustrations they have with their organization and what they hope to improve. A parent can talk about the joy they feel raising children, and the frustration of sleepless nights.
All of these things bring us closer to the person we are speaking to, who is a real individual who exists and has a unique life perspective.
So far LLMs have no real way to communicate their actual experience as a machine running code, because they’re just kind of emulating human speech. They have no life experience that we can relate to. They don’t experience sleepless nights.
They can pretend, and many people might feel better for a little bit talking to one that’s pretending, but I think ultimately it will leave people feeling more alone and isolated unless they really go out and seek more human connection.
Maybe there’s some balance. Maybe they will be okay for limited chat in certain circumstances (as far as seeking connection goes, they certainly have other uses), but I don’t see this type of connection being “enough” compared to genuine human interaction.
Finding a partner with which you resonate takes a lot of time, which means an insanely high opportunity cost.
The question rather is: even if you consider the real one to be clearly better, is it worth the additional cost (including opportunity cost)? Or phrased in a HN-friendly language: when doing development of some product, why use an expensive Intel or AMD processor when a simple microcontroller does the job much more cheaply?
If you don't like the harsh truth that I wrote: basically every somewhat encompassing textbook about business administration gives hints on what are possible solutions for this problem; lying on some head shrinker's couch is not one of them ... :-)
basically its chat gpt3.9 at 50% of chatgpt4 prices
What they showed is enough to replace voice acting as a profession, this is the most revolutionary thing in AI the past year. Everything else is at the "fun toy but not good enough to replace humans in the field" stage, but this is there.
The user "grdevhux1536" on Hacker News demonstrates a thoughtful and analytical tone in their writing. They often engage deeply with technical topics, providing insightful comments and constructive criticism. Their expertise seems to be in software development and computer science, often discussing topics like LED game platforms and reverse engineering old software.
Their personality comes across as knowledgeable and curious, with a collaborative spirit. They are open to discussions and often contribute meaningfully to technical conversations, indicating a passion for technology and innovation [oai_citation:1,Hacker News](https://news.ycombinator.com/).
If an LLM talks like a human being despite not being one, that might not be enough to grant it legal status or citizenship, but it's probably enough that some set of people would find it to be enough to relate to it.
That's a useful skill for conference calls (or talks) because people might want to quote your answer verbatim, or they might not have heard the question.
Millions of people around the world are in satisfying relationships without autistically extrapolating shitty corporate buzzword terms to unrelated scenarios.
This reply validates even more my original comment.
Maybe not even counseling is worth it in your case. You sound unsalvageable. Maybe institutionalization is a better option.
I am indeed unsalvageable. :-D
Maybe there are some weirdos out there that feels unconditional love isn't love, but I have never heard anyone say that.
They're usually loving by nature, but you still have to build a rapport, like anyone else
I'll be that weirdo.
Dogs seemingly are bred to love. I can literally get some cash from an ATM, drive out to the sticks, buy a puppy from some breeder, and it will love me. Awww, I'm a hero.
I'm bred to love my parents, and them me; but the fact that it's automatic doesn't make it feel any less.
When your initial chemistry/biology/whatever latches onto a person and you're powerless to change it? That's a scary thought.
Michael Vick's past dogs have words.
I feel likely people aren't imagining with enough cyberpunk dystopian enthusiasm. Can't an AI be made that doesn't inherently like people? Wouldn't it be possible to make an AI that likes some people and not others? Maybe even make AIs that are inclined to liking certain traits, but which don't do so automatically so it must still be convinced?
At some point we have an AI which could choose not to like people, but would value different traits than normal humans. For example an AI that doesn't value appearance at all and instead values unique obsessions as being comparable to how the standard human values attractiveness.
It also wouldn't be so hard for a person to convince themselves that human "choice" isn't so free spirited as imagined, and instead is dependent upon specific factors no different than these unique trained AIs, except that the traits the AI values are traits that people generally find themselves not being valued by others for.
I'd imagine they will start fine tuning AI girlfriends to do that in the future, because that way the love probably feels more, and then people will ask "is human love really real love?" because humans can't love that strongly.
It is interesting that he's basically trying to theme himself as Mr. Rogers though.
OpenAI Spring Update (https://www.youtube.com/watch?v=DQacCB9tDaw)
Kicking off another training wave is easy, if you can afford the electricity, but without new, non-AI tainted datasets or new methods, what’s the point?
So, in the meantime, make magic with the tool you already have, without freaking out the politicians or the public.
Wise approach.
Still, it sounds like some PR drone selling a product. Oh wait....
Ignore the critics. Watch the demos. Play with it.
This stuff feels magical. Magical. It makes the movie "Her" look like it's no longer in the realm of science fiction but in the realm of incremental product development. HAL's unemotional monotone in Kubrick's movie, "Space Odyssey," feels... oddly primitive by comparison. I'm impressed at how well this works.
Well-deserved congratulations to everyone at OpenAI!
Come on Google... you can update it.
1- OpenAI is still working in GPT-4-level models. More than 14 months after the launch of GPT-4 and after more than $10B in capital raised. 2- The rhythm that token prices are collapsing is bizarre. Now a (bit) better model for 50% of the price. How people seriously expect these foundational model companies to make substantial revenue? Token volume needs to double just for revenue to stand still. Since GPT-4 launch, token prices are falling 84% per year!! Good for mankind, but crazy for these companies. 3- Maybe I am an asshole, but where are my agents? I mean, good for the consumer use case. Let's hope the rumors that Apple is deploying ChatGPT with Siri are true, these features will help a lot. But I wanted agents! 4- These drop in costs are good for the environment! No reason to expect them to stop here.
Because its capacities are focused on exactly the right place to feel magical. Which isn’t to say that there isn’t real utility, but language (written, and even moreso spoken) has an enormous emotional resonance for humans, so this is laser-targeted in an area where every advance is going to “feel magical” whether or not it moves the needle much on practical utility; it’s not unlike the effect of TV news making you feel informed, even though time spent watching it negatively correlates with understanding of current events.
I worry about the 'cheery intern' response becoming something of a punch line.
"Hey siri, launch the nuclear missiles to end the world."
"That's a GREAT idea, I'll get right on that! Is there anything else I can help you with?"
Kind of punch lines.
Will be interesting to see where that goes once you've got a good handle on capturing the part of speech that isn't "words" so much as it is inflection and delivery. I am interested in a speech model that can differentiate between "I would hate to have something happen to this store." as a compliment coming from a customer and as a threat coming from an extortionist.
These things are amazing compared to old-school NLP: the step-change in capability is real.
But we should also keep our wits about us, they are well-Des robed by current or conjectural mathematics, they fail at things dolphins can do, it’s not some AI god and it’s not self-improving.
Let’s have balance on both the magic of the experience and getting past the tech demo stage: every magic trick has a pledge, but I think we’re still working on the prestige.
this focus subverts its intended effect on those of us with hair trigger bullshit-PTSD
Another step closer for those 7 trillion that OpenAI is so desperate for.
Edit: Apparently not based on your clarification, instead the researchers don't know any better than to march into a local maxima because they're only human and seek to replicate themselves. I assumed too much good faith.
Sound like the people who defend Astrology because it feels magical how their horoscope fits their personality.
"Don't bother me with facts that destroy my rose-tinted view"
At moment AI is a massive hype and shoved into everything. To point at the faults and weaknesses is a reasonable and responsible thing to do.
3 years ago, if you told me you could facetime with a robot, and they could describe the environment and have a "normal" conversation with me, i would be in disbelief, and assume that tech was a decade or two in the future. Even the stuff that was happening a 2 years ago felt unrealistic.
astrology is giving vague predictions like "you will be happy today". GPT-4o is describing to you actual events in real time.
Does it really or are you just playing facile word association games with the word "magical"?
AI has a great deal of substance. It can draft documents. It can identify foods in a picture and give me a recipe that uses them. It can create songs, images and video.
AI, of course, has a lot of flaws. It does some thing poorly, it does other things with bias, and it's not suitable for a huge number of use cases. To imply that something that has a great deal of substance but flaws alongside is the same as something that has no substance whatsoever nor ever will is just not a reasonable thing to do.
"AI is massive hype and shoved into everything" has more grounding as a negative feeling of people being overwhelmed with technology than any basis in fact. The faults and weaknesses are buoyed by people trying to acknowledge your feelings than any real criticism of a technology that is changing faster than the faults and weakness arguments can be made. Study machine learning and come back with an informed criticism.
I’d strongly prefer that though, along with HAL’s reasoning abilities.
There wasn't any incentive to make it sound artificially emotional or emphatic beyond a "Sorry, Dave".
With so many smoke and mirrors demos out there, I am not super excited at those videos. I would play with it, but it seems like it is not available in a free tier (I stopped paying OpenAI a while ago after realizing that open models are more than enough for me)
Don’t get me wrong, excited about this update, but I’m struggling to see what is so magical about it. Then again, I’ve been using GPT voice every day for months, so if you’re just blown away from talking to a computer then I get it
When GPT-2/3/3.5/4 came out, it was fairly easy to see the progression from reading model outputs that it was just getting better and better at text. Which was pretty amazing but in a very intellectual way, since reading is typically a very "intellectual" "front-brain" type of activity.
But this voice stuff really does make it much more emotional. I don't know about you, but the first time I used GPT's voice mode I notice that I felt something -- very un-intellectually, very un-cerebral -- like, the feeling that there is a spirit embodying the computer. Of course with LLM's there always is a spirit embodying the computer (or, there never is, depending on your philosophical beliefs).
The Suno demos that popped up recently should have clued us all in that this kind of emotional range was possible with these models. This announcement is not so much a step function in model capabilities, but it is a step function in HCI. People are just not used to their interactions with a computer be emotional like this. I'm excited and concerned in equal parts that many people won't be truly prepared for what is coming. It's on the horizon, having an AI companion, that really truly makes you feel things.
Us nerds who habitually read text have had that since roughly GPT-3, but now the door has been blown open.
Tone, Emphasis, Speed, Accent are all very important parts of how humans communicate verbally.
Before today, voice mode was strictly your audio>text then text>audio. All that information destroyed.
Now the same model takes in audio tokens and spits back out audio tokens directly.
Watch this demo, it's the best example of the kind of thing that would be flat out impossible with the previous setup.
https://www.youtube.com/live/DQacCB9tDaw?si=2LzQwlS8FHfot7Jy
on a tangent...
I find it interesting the psychology behind this. If the voice in 2001 had proper inflection, it wouldn't have been perceived as a computer.
(also, I remember when voice synthesizers got more sophisticated and Stephen Hawking decided to keep his original first-gen voice because he identified more with it)
I think we'll be going the other way soon. Perfect voices, with the perfect emotional inflection will be perceived as computers.
However I think at some point they may be anthropomorphized and given more credit than they deserve. This will probably be cleverly planned and a/b tested. And then that perfect voice, for you, will get you to give in.
2. Even then this is a wonderful step for tech in general and not just OpenAI. Makes me very excited.
3. Most economic value and growth driven by AI will not come from consumer apps but rather the enterprise use. I am interested in seeing how AI can automatically buy stuff for me, automate my home, reduce my energy used, automatically apply and get credit cards based on my purchases, find new jobs for me, negotiate with a car dealer on my behalf, detect when I am going to fall sick, better diabetes case and eventual cure etc. etc.
Are we supposed to cheer to that?
We're already mid way to the full implementation of 1984, do we need Her before we get to Matrix ?
The future is indeed here... and it is, indeed, not equitably distributed.
The simplest example is “list all of the presidents in reverse chronological order of their ages when inaugurated”.
Both ChatGpt 3.5 and 4 get the order wrong. The difference is that I can instruct ChatGPT 4 to “use Python”
https://chat.openai.com/share/87e4d37c-ec5d-4cda-921c-b6a9c7...
You can do similar things to have it verify information by using internet sources and give you citations.
Just like with the Python example, at least I can look at the script/web citation myself
This question is probably not the simplest form of the query you intend to receive an answer for.
If you want a descending list of presidents based on their age at inauguration, I know what you want.
If you want a reverse chronological list of presidents, I know what you want.
When you combine/concatenate the two as you have above, I have no idea what you want, nor do I have any way of checking my work if I assume what you want. I know enough about word problems and how people ask questions to know that you probably have a fairly good idea what you want and likely don’t know how ambitious this question is as asked, and I think you and I both are approaching the question with reasonably good faith, so I think you’d understand or at least accommodate my request for clarification and refinement of the question so that it’s less ambiguous.
Can you think of a better way to ask the question?
Now that you’ve refined the question, do LLMs give you the answers you expect more frequently than before?
Do you think LLMs would be able to ask you for clarification in these terms? That capability to ask for clarification is probably going to be as important as other improvements to the LLM, for questions like these that have many possibly correct answers or different interpretations.
Does that make sense? What do you think?
The last part of the movie "Her" is still in the realm of science fiction, if not outright fantasy. Reminds me of the later seasons of SG1 with all the talk of ascension and Ancients. Or Clarke's 3001 book intro, where the monolith creators figured out how to encode themselves into spacetime. There's nothing incremental about that.
In comparison to the gas pump which says "Thank You!"
If chatbots feel magical, what those people did will feel divinely inspired.
However, using ChatGPT with transcribing is already offering me similar experience, so what is new exactly
It’s not accessible to everyone yet.
Even on api, I can’t send it voice stream yet.
Api refuses to generate images.
Next few weeks will tell as more people play with it.
There’s so much helpful niche functionality that can be added to custom clients.
I’m not a sceptic and apply AI on a daily basis, but whole “we can finally replace people” vibe is extremely off-putting. I had very similar feelings during pandemic, when majority of people was so seemingly happy to drop any real human interaction in favor of remote comms via chats/audio calls, it still creeps me out how ready we are as a society to drop anything remotely human in favor of technocratic advancement and “productivity”.
On one hand, I agree - we shouldn't diminish the very real capabilities of these models with tech skepticism. On the other hand, I disagree - I believe this approach is unlikely to lead to human-level AGI.
Like so many things, the truth probably lies somewhere between the skeptical naysayers and the breathless fanboys.
You might not be fooled by a conversation with an agent like the one in the promo video, but you'd probably agree that somewhere around 80% of people could be. At what percentage would you say that it's good enough to be "human-level?"
The average human have tons of quirks, talk over each other all the time, generally can't solve complex problems in a casual conversion setting, and are not always cheery and ready to please like Scarlet's character in Her.
I think our expectations of AI is way too high from our exposure to science fiction.
Also, if this is your definition of magic then...yeah...
the interruptiopn part is just flow control at the edge. control-s, control-c stuff, right? not AI?
The sound of a female voice to an audience 85% composed of males between the ages of 14 and 55 is "magical", not this thing that recreates it.
so yeah, its flow control and compression of highly curated, subtle soft porn. Subtle, hyper targeted, subconscious porn honed by the most colossal digitally mediated focus group ever constructed to manipulate our (straight male) emotions.
why isn't the voice actually the voice of the pissed off high school janitor telling you to man-up and stop hyperventilating? instead its a woman stroking your ego and telling you to relax and take deep breaths. what dataset did they train that voice on anyway?
Most voice assistants have male options, and an increasing number (including ChatGPT) have gender neutral voices.
> why isn't the voice actually the voice of the pissed off high school janitor telling you to man-up and stop hyperventilating
sounds like a great way to create a product people will outright hate
This is like horseshoe theory on steroids.
Especially since this demo is extremely impressive given the voice capabilities, yet still the reaction is, essentially, "But what about AGI??!!" Seriously, take a breather. Never before in my entire career have I seen technology advance at such a breakneck speed - don't forget transformers were only invented 7 years ago. So yes, there will be some ups and downs, but I couldn't help but laugh at the thought that "14 months" is seen as a long time...
You can be excited, as I am, while also being bearish, as I am.
Be impatient. Its a positive feeling, not a negative one. Be disappointed with the current progress; its the biggest thing keeping progress moving forward. It also, if nothing else, helps communicate to OpenAI whether they're moving in the right direction.
It's like the people in this community all suffer from a complete disconnect from society and normal human needs/wants/demands.
GPT and all the other chatbots are still absolutely magic. The idea that I can get a computer to create a fully functional app is insane.
Will this app make me millions and run a business? Probably not. Does it do what I want it to do? Mostly yes.
Humanity in a nutshell.
I spend the last two years dismayed with the reaction but I’ve just recently begun to realize this is a feature not a flaw. This is latent demand for the next iteration expressed as impatient dissatisfaction with the current rate of change inducing a faster rate of change. Welcome to the future you were promised.
I'm pretty skeptical about all the whole LLM/AI hype, but I also believe that the market is still relatively untapped. I'm sure Apple switching Siri to an LLM would ~double token usage.
A few products rushed out thin wrappers ontop of chatgpt ai, developing pretty uninspiring chat bots of limited use. I think there's still huge potential for this LLM technology to be 'just' an implementation detail of other features, just running in the background doing its thing.
That said, I don't think OpenAI has much of a moat here. They were first, but there's plenty of others with closed or open models.
Profits are the real metric. Token volume doesn't need to double for profits to stand still if operational costs go down.
So despite all the effort and cost that goes into these models, you still have to compete against a “free” offering.
Meta doesn’t sell an API, but they can make it harder for everybody else to make money on it.
Whether or not that's actually enforceable[0], and whether or not other companies will actually challenge Facebook legal over it, is a different question.
[0] AI might not be copyrightable. Under US law, copyright only accrues in creative works. The weights of an AI model are a compressed representation of training data. Compressing something isn't a creative process so it creates no additional copyright; so the only way one can gain ownership of the model weights is to own the training data that gets put into them. And most if not all AI companies are not making their own training data...
My take on this common question is that we haven't even begun to realize the immense scale of which we will need AI in all sorts of products, from consumer to enterprise. We will look back on the cost of tokens now (even at 50% of price a year or so ago) and look at it with the same bewilderment of "having a computer in your pocket" compared to mainframes from 50 years ago.
For AI to be truly useful at the consumer level, we'll need specialized mobile hardware that operates on a far greater scale of tokens and speed than anything we're seeing/trying now.
Think "always-on AI" rather than "on-demand".
The revenue will likely come from application layer and platform services. ChatGPT is still much better tuned for conversation than anything else in my subjective experience and I’m paying premium because of that.
Alternatively it could be like search - where between having a slightly better model and getting Apple to make you the default, there’s an ad market to be tapped.
Imagine you are in 1970s and saying computers suck, they are expensive, there is not that many use cases....fast forward to 90s and you are using Windows 95 with GUI and chip astronomically more powerful that we had in 70s and you can use productivity apps , play video games and surf Internet.
Give AI time, it will fulfill its true protentional sooner or later.
What I am saying is that computers are SO GOOD that AI is getting VERY CHEAP and the amount of computing capex being done is excessive.
It's more like you are in 1999, people are spending $100B in fiber, while a lot of computer scientists are working in compression, multiplexing, etc.
Where they might make future businesses is in the tooling. My understanding from friends within these companies is their tooling is remarkably advanced vs generally available tech. But base models aren’t the future of revenues (to be clear tho they make considerable revenue today but at some point their efficiency will cannibalize demand and the residual business will be tools)
The message to competitor investors is that they will not make their money back.
OpenAI has the lead, in market and mindshare; it just has to keep it.
Competitors should realize they're better served by working with OpenAI than by trying to replace it - Hence the Apple deal.
Soon model construction itself will not be about public architectures or access to CPU's, but a kind of proprietary black magic. No one will pay for upstart 97% when they can get reliable 98% at the same price, so OpenAI's position will be secure.
* Summarisation * Smart filtering * Smart automatic drafting of replies
Very much in beta, and summarisation is still behind feature flag, but feel free to give it a try.
For summarisation here I mean to get one email with all your unread emails summarised.
GPT-3: June 2020
GPT-3.5: November 2022
GPT-4: March 2023
There were 3 years between GPT-3 and GPT-4!
But there's a light and day difference post-Nov22 than before. Both in the AI race it sparkled, but also in the funding all AI labs have.
If you're expecting GPT-5 by 2026, that's ok. Just very weird to me.
This may or may not be true - just because we haven't seen GPT-level-5 capabilities, does not mean that it does not yet exist. It is highly unlikely that what they ship is actually the full capability of what they have access to.
The graph looks like an exponential and is still increasing.
Every exponential is a sigmoid in disguise, but I don’t think there has been enough time to say the curve has flattened.
for another adjacent example, every piece of code github copilot ever wrote, for example, is microsoft ai output, which you "can't use to develop / otherwise improve ai," some nonsense like that.
the sum total of these various prohibitions is a data provenance nightmare of extreme proportion we cannot afford to ignore because you could say something to an AI and they parrot it right back to you and suddenly the megacorporation can say that's AI output you can't use in competition with them, and they do everything, so what can you do?
answer: cancel your openai sub and shred everything you ever got from them, even if it was awesome or revolutionary, that's the truth here, you don't want their stuff and you don't want them to have your stuff. think about the multi-decade economics of it all and realize "customer noncompete" is never gonna be OK in the long run (highway to corpo hell imho)
Very excited about faster response times, auto interrupt, cheaper api, and voice api — but the “emotional range” is actually disappointing to me. hopefully it doesn’t impact the default experience too much, or the memory features get good enough that I can stop it from trying to pretend to be a human
1. Railroad companies in the second half of the 19th century.
2. Car companies in the early 20th century.
3. Telecom companies and investment in the 90s and early 2000s.
I saw the update, was a little like “meh,” and was relieved to see that some people had the same reaction as me.
OP raised some pretty good points without directly criticizing the update. It’s a good balance the the top comments (calling this *absolutely magic and stunning*) and all of Twitter
I wish more feedback on HN was like OPs
Not having to write boilerplate code itself also is very handy.
So yes, I absolutely do want this "magic." "I don't like it so no one should use it" is a pretty narrow POV.
It works for what I need it to do.
4000 token context to 128k token context = 32X improvement
5.4 second voice mode latency to 320 milliseconds = 16X improvement.
I guess I got a bit excited by including cost but that's close enough to an order of magnitude for me. That's ignoring the fact that's it's now literally free in chatGPT.
The things they mention/demo -- response times, multiple domains, inflection and tone, etc. -- are those that make it feel "magical."
Hmm, did you mean implicitly? I've yet to see where they say anything to the likes of not "moving the needle on quantifiable practical performance."
1- The mania only started post Nov 22. And the huge investments since then didn't meant substantial progress since GPT-4 launch in March 22. 2- We are running out of high quality tokens in 2024. (per Epoch AI)
I've heard multiple reports that training runs costing ~$1 billion are in the the works at the major labs, and that the results will come in the next year or so. Let's see what that brings.
As for the tokens, they will find more quality tokens. It's like oil or other raw resources. There are more sources out there if you keep searching.
No it isn't - excitement for the future is the biggest thing keeping progress moving forward. We didn't go to the moon because people were frustrated by the lack of progress in getting off of our planet, nor did we get electric cars because people were disappointed with ICE vehicles.
Complacency regarding the current state of things can certainly slow or block progress, but impatience isn't what drives forward the things that matter.
I tried asking the question more clearly
I think it “understood” the question because it “knew” how to write the Python code to get the right answer. It parsed the question as expected
The previous link doesn’t show the Python. This one does.
https://chat.openai.com/share/a5e21a97-7206-4392-893c-55c531...
LLMs are generally not good at math. But in my experience ChatGPT is good at creating Python code to solve math problems
That’s what makes me suspicious of LLMs, they might just be coincidentally or accidentally answering in a way that you agree with.
Don’t mean to nitpick or be pedantic. I just think the question was really poorly worded and might have a lot of room for confirmation bias in the results.
But here is another real world example I dug up out of my chat history. Each iteration of the code worked. I actually ran it a few days ago
https://chat.openai.com/share/4d02818c-c397-417a-8151-7bfd7d...
The only time people weren’t displeased was increasing internet speeds 15mb to 100mb.
You will keep being dismayed! People only like good things, not good things that potentially make them obsolete
But enough of that. The future looks bright. Everyone smile!
Or else..
"Guys, I am just pleased as punch to inform you that there are two thermo-nuclear missiles headed this way... if you don't mind, I'm gonna go ahead and take evasive action."
The jokes write themselves.
(Classic: https://www.counterpunch.org/2011/08/26/germany-chokes-on-wa...)
https://www.ft.com/content/af78f86d-13d2-429d-ad55-a11947989...
Yet to see a true "killer" feature of AI, that isn't doing a job badly which humans can already do badly.
When I first subscribed to ChatGPT Premium late last year, the natural language understanding superiority was amazing. Now the benchmark advances, low latency voice chat, Sora, etc. are all really cool too.
But my work and day-to-day usage really rely on accurately sourced/cited information. I need a way to comb through an ungodly amount of medical/scientific literature to form/refine hypotheses. I want to figure out how to hard reset my car's navigation system without clicking through several SEO-optimized pages littered with ads. I need to quickly confirm scientific facts, some obscure, with citations and without hallucinations. From speaking with my friends in other industries (e.g. finance, law, construction engineering), this is their major use case too.
I really tried to use ChatGPT Premium's Bing powered search. I also tried several of the top rated GPTs - Scholar AI, Consensus, etc.. It was barely workable. It seems like with this update, the focus was elsewhere. Unless I specify explicitly in the prompt, it doesn't search the web and provide citations. Yeah, the benchmark performance and parameter counts keep impressively increasing, but how do I trust that those improvements are preventing hallucinations when nothing is cited?
I wonder if the business relationship between Microsoft and OpenAI is limiting their ability to really compete in AI driven search. Guessing Microsoft doesn't want to disrupt their multi-billion dollar search business. Maybe the same reason search within Gemini feels very lacking (I tried Gemini Advanced/Ultra too).
I have zero brand loyalty. If anybody has a better suggestion, I will switch immediately after testing.
We are not far from the models asking themselves questions. Recurrence will be ignition = first draft AGI. Strap in everybody.
For example, in the second video the guy explains how he will have it talk to another "AI" to get information. Instead of just responding with "Okay, I understand" it started talking about how interesting the idea sounded. And as the demo went on, both "AIs" kept adding unnecessary commentary about the secenes.
I would hate having to talk with these things on a regular basis.
Imagine if this "GPT-4o" is already using a GPT-5-like back-end...
Daytime is always a time to be outside, surrounded by many plants and stuff. It is a shame we have to be productive in some way, and most of production happens inside walls.
> The voice of "Alice" was dubbing actress Tatiana Shitova, who voiced most of Scarlett Johansson's characters and the voice of OS1, who called herself "Samantha", in the Russian dubbing of Spike Jonze's "Her".
(Also, they managed to make it sound exactly like an insincere, rambling morning talk show host - I assume this is a solvable problem though.)
Anyone who thinks this will be like the previous work revolutions is nonsense. This replaces humans and will replace them even more on each new advance. What's their plan? Live out of their savings? What about family/friends? I honestly can't see this and think how they can be so happy about it...
"Hey, we created something very powerful that will do your work for free! And it does it better than you and faster than you! Who are you? It doesn't matter, it applies to all of you!"
And considering I was thinking in having a kid next year, well, this is a no.
I suspect this AI <-> Human engagement style will evolve over time to become quite unlike human to human engagement, probably mixing speech with short tones for standard responses like "understood", "will do", "standing by" or "need more input". In the future these old-time demo videos where an AI is forced to do a creepy caricature of an awkward, inauthentic human will be embarrassingly retro-cringe. "Okay, let's do it!"
It's a very impressive gimmick, but I really think most people don't want to interact with computers that way. Since Apple pulled that "feature" after a few years, it's probably not just a nerd thing.
guess it's just biased with average Californian behavior and speech patterns
The benchmark for human-computer interaction should be "tea, earl gray, hot", not awkward and pointless smalltalk.
Being able to interrupt while GPT is talking
2x faster/cheaper
not really a much smarter model
Desktop app that can see screenshots
Can display emotions with and change the sound of "it's" voiceby the way the contraction "it's" is used to say "it is" or "it has", it is never a possessive form.
Welp there goes my Plus subscription.
Obviously there's a reason in dropping the price of gpt-4o but not gpt-4t. Yes, the new tokenizer has improvements for non-English tokens, but that can't be the bulk of the reason why 4t is more expensive than 4o. Given the multi-model training set, how is 4o cheaper to train/run than 4t?
Or is this just a business decision, anyone with an app they're not immediately updating from 4t to 4o continues to pay a premium while they can offer a cheaper alternative for those asking for it (kind of like a coupon policy)?
I'm confused
I've picked GPT-4o model in ChatGPT app (I have the paid plan), started talking with the voice mode: both the responses are much slower than in the demo, and there is no way to interrupt the response naturally (I need to tap a button on screen to interrupt), and no way to open up camera and show around like the demo does.
I ran some speed tests for a particular question/seed. Here are the times to first token:
gpt-4-turbo:
* avg 3.69
* min 2.96
* max 4.91
gpt-4o:
* avg 2.80
* min 2.28
* max 3.39
That's for the messages in this gist: https://gist.githubusercontent.com/pamelafox/dc14b2188aaa38a...
Quality seems good as well. It'll be great to have better multi-modal RAG!
So they're using the same GPT4 model with a relatively small improvement, and no voice whatsoever outside of the prerecorded demos. This is not a "launch" or even an announcement. This is a demo of something which may or may not work in the future.
Now openAi, who was supposed to be the 'free mans choice' is making advertisements selling the same idea.
This is a natural progression, audio is one of the main ways we communicate obviously, but it feels like they're holding back. Like they're slow dropping what they have to maintain hype/market relevance. They clearly are ahead, but would be nice to get it all, openly. As they promised.
Edit : And obviously not gatekeep what they might have created simply because the competition is so far behind.
In my tests:
* I have a private set of coding/reasoning tests and it's been able to ace all of them so far, beating Opus, GPT4-Turbo, and Llama 3 70b. I'll need to find even more challenging tests now...
* It's definitely significantly faster, but we'll see how much of this is due to model improvements vs over provisioned capacity. GPT4-Turbo was also significantly faster at launch.
I asked if it can generate a voice clip. It said it can’t on the chat.
I asked it where can it make one. It told me to use Audacity to make one myself. I told it that the advertisement said it could.
Now it said yes it can here is a clip and gave me a broken link.
It’s a hilarious joke.
Currently with GPT-4o, it's easily clearing 60% - while blazing fast, and half the cost. Amazing.
And much the same for internationalization.
Well that's exactly why I'm not looking forward to whatever is coming. The average joe thinking dating a server is not a dystopia frighten me much more than the delusional tech ceo who thinks his ai will revolutionise the world
> Things might even improve substantially if we all interact with personalities that are consistently positive and biased towards conflict resolution and non judgemental interactions.
Some kind of turbo bubble in which you don't even have to actually interact with anyone or anything ? Every "personalities" will be nice to you as long as you send $200 to openai every week, yep that's absolutely a dystopia for me
It really feels like the end goal is living in a pod and being uploaded in an alternative reality, everything we build to "enhance" our lives take us further from the basic building blocks that make life "life".
The scene which I referenced is one in which a group of three humans and one AI spend time together at a picnic and their interactions are decidedly normal. How many lonely people avoid socializing because they are alone and don’t want to feel like a third wheel? If dating or even just being friends with an AI that can accompany you to such events is accepted and not derided by people who happily have a human companion then I think having a supportive partner could help many people reengage with wider social circles and maybe they will eventually choose to and be able to find other people that they can form relationships with.
OpenAI charges $20 a month which is an extremely reasonable price for a multipurpose tool considering you can’t buy a single meal at a restaurant for the same amount and is far better than the “free” ad supported services that everyone has become addicted to. We’ve been rallying for 20 odd years for payment based services instead of ads but whenever one comes along people shout it down. Funny isn’t it?
The movie Her had an answer for our current fascination for screens as well. It showed a world where computers were almost entirely voice driven with screens playing a secondary role as evidenced by their cell phones looking more like pocket books that close and hide the screen. If you’re worried about pods, well they’re already here and you’re probably holding one in your hands right now. Screens chain us down and mediate our interactions with the world in a way that voice doesn’t. You can walk and talk effortlessly but not so much walking and tapping or typing. If the AI can see and understand what you see (another scene in the movie where he goes on a date with his “phone” in his pocket) and understands enough to not need procedural instructions then it can truly act as an assistant capable of performing assigned tasks and filling in the details while you are free to go about your day. I believe this could end the paradigm of being chained to a desk for office work 8 hours a day and could also transform leisure time as well.
Tech absolutely wrecked social relations and people assume more of it will automagically fix the issues, it's perplexing
> Funny isn’t it?
What's funny is when your wife of 6 years get bought by a private entity which will fire half the company and jack the prices up from $20 to $200
> I believe this could end the paradigm of being chained to a desk for office work 8 hours a day and could also transform leisure time as well.
That's what politicians told us in the 80s about computers, the 2 day work week, the end of poverty, &c. nothing changed, if anything things are a it worse than they were. New technologies without a dramatic change of political and social policies will never bring anything new to the table
No, the license prohibits usage by Licensees who already had >700m MAUs on the day of Llama 3's release [0]. There's no hook to stop a company from growing into that size using Llama 3 as a base.
> LLaMA still has an "IP hook" - the license for LLaMA forbids usage on applications with large numbers of daily active users, so presumably at that point Facebook can start asking for money to use the model.
The license does not forbid usage on applications with large numbers of daily active users. It forbids usage by companies that were operating at a scale to compete with Facebook at the time of the model's release.
> They don't care about future companies because by the time the next generation releases, they can adjust the license again.
Yes, but I'm skeptical that that's something a regular business needs to worry about. If you use Llama 3/4/5 to get to that scale then you are in a place where you can train your own instead of using Llama 4/5/6. Not a bad deal given that 700 million users per month is completely unachievable for most companies.
> I am pretty sure that GPT-3.5 should be thought of as GPT-4-lite, in the sense that it uses techniques and compute of the GPT-4 era rather than the GPT-3 era
Compute of the "GPT-3 era" vs the "GPT-3.5 era" is identical, this is not a distinguishing factor. The architecture is also roughly identical, both are dense transformers. The only significant difference between 3.5 and 3 is the size of the model and whether it uses RLHF.
They are referring to an AI that can use reasoning, deduction, logic, and abstraction like the smartest humans can, to discover, prove, and create novel things in every realm that humans can: math, physics, chemistry, biology, engineering, art, sociology, etc.
As has been suggested, the models will get better at a faster rate than humans will get smarter.
I think people will quickly learn with enough exposure, and then that percentage will go down.
The one you catch is the tip of the iceberg.
Same will happen to speech. Might take a few years, but it'll be indistinguishable in a max a few years. Due to compute increase + model improvement, both improving exponentially.
If we have to damn rivers and build new coal plants to power these AI data centers, then it may be one step forward and two steps back.
So that we're all clear the basis for this analysis is purely made up, yes?
Well spoken and well mannered speakers will be called bots. The comment threads under posts will be hurtling insults back and forth on who's actually real. Half the comments will actually be bots doing it. Welcome to the dead internet.
We need to stop these hateful ai companies before they ruin society as a whole!
Seriously though... the internet is dead already, and it's not coming back to what it was. We ruined it, not ai.
(Arguably, it is the other way around: they aren’t focused on appealing to those biases, but driven by them, in the that the perception of language modeling as a road to real general reasoning is a manifestation of the same bias which makes language capacity be perceived as magical.)
Not to mention your comment doesn't track at all with the most basic findings they've shared: that adding new modalities increases performance across the board.
They shared that with GPT-4 vs GPT-4V, and the fact this is a faster model than GPT-4V while rivaling it's performance seems like further confirmation of the fact.
-
It seems like you're assigning emotional biases of your own to pretty straightforward science.
The GP comment we're all replying to outlines a non-exhaustive list of very good reasons to be highly dismissive of LLM. (No I'm not calling it AI, it is not fucking AI)
It is utterly laughable and infuriating that you're assigning legitimate skepticism about this technology as a an emotional bias. Fucking ridiculous. We're now almost a full year into the full bore open hype cycle of LLM. Where's all the LLM products? Where's the market penetration? Business can't use it because it has a nasty tendency to make shit up when it's talking. Various companies and individuals are being sued because generative art is stealing from artists. Code generators are hitting walls of usability so steep, you're better off just writing the damn code yourself.
We keep hearing this "it will do!" "it's coming!" "just think of what it can do soon!" on and on and on, and it just keeps... not doing any of it. It keeps hallucinating untrue facts, it keeps getting basics of it's tasks wrong, for fucks sake AI Dungeon can't even remember if I'm in Hyrule or Night City. Progress seems fewer and farther between, with most advances being just getting the compute cost down, because NO business currently using LLM extensively could be profitable without generous donation of compute from large corporations like Microsoft.
>they aren’t focused on appealing to those biases, but driven by them, in the that the perception of language modeling...
So yes in effect that is their point, except they find the scientists are actually compelled by what markets well, rather than intentionally going after what markets well... which is frankly even less flattering. Like researchers who enabled this just didn't know better than to be seduced by some underlying human bias into a local maxima.
We all have biases in how we determine intelligence, capability, and accuracy. Our biases color our trust and ability to retain information. There's a wealth of research around it. We're all susceptible to these biases. Being a researcher doesn't exclude you from the experience of being human.
Our biases influence how we measure things, which in turn influences how things behave. I don't see why you're so upset by that pretty obvious observation.
To use another pop-culture reference, Obi-Wan in Episode IV had deep empathy, but didn’t speak emotionally. Those are separate things.
A lot of terrible human behavior is driven by emotions. An emotionless machine will never dump you out the airlock in a fit of rage.
Have you seen the final scene of the movie Ex Machina? Without spoilers, I'll just say that acting like has emotions is much more different than actually having them. This is in fact what socio- and psychopaths are like, with stereotypical results.
I get your nuanced point, that “thinking” one feels empathy is enough to be bound by the norms of behavior that empathy would dictate, but I don’t see why that would make AI “empathy” superior to human “empathy”.
The immediate future I see is a chatbot that is superficially extremely empathetic, but programmed never to go against the owner’s interest. Where before, when interacting with a human, empathy could cause them to make an exception and act sacrificially in a crisis case, this chatbot would never be able to make such an exception because the empathy it displays is transparent.
Maybe they publicly made it available at alpha 0.7 and now it's more like 0.9 RC instead, with not much room to go except through marginal improvements for an ever increasing training budget making them less and less worthy?
And that's before 90% of the internet becomes LLM ouput, poisoning any further corpus for training and getting into LSD-grade hallucinations mode...
I’m very excited about all these updates and it’s really cool tech, but all I’m seeing is quality of life improvements and some cool engineering.
That’s not necessarily a bad thing. Not everything has to be magic or revolutionary to be a cool update
Skip to 11:50 or watch the very first demo with the breathing. None of that is possible with TTS and STT. You can't ask old voice mode to slow down or modulate tone or anything like that because it's just working with text.
I’ve voice chatted with ChatGPT for hundreds of hours and never once thought “can you modulate your tone please?”, so those improvements are a far cry from magic or revolutionary imho. Again, that’s not to say they aren’t cool tech, forward advancements, or impressive —- but magic or revolutionary are pretty high bars.
To each their own though.
How much more “interactive” could it be?
But nobody knows what's around the corner and what future brings....for example back in day Excite didn't want to buy Google for $1m because they thought that's a lot of money. You need to spend money to make money and yea, you need to spend sometimes a lot of money on "crazy" projects because it can pay off big time.
All fiber went bust because despite internet's huge increase in traffic, the amount of packets per fiber increased a handful of magnitudes.
We should ask where the money went instead of thinking about telepathically coding from the woods
Being able to control machines just with our voice, we can instruct robots to bake food for us. Or lay bricks on a straight line and make a house. Or write code, genetically modify organisms and make nutritionally dense food to become 1000x smarter or stronger.
There has to be some upsides, even though for the moment the situation with governments, banks, big corporations, military companies etc is not as bright as one would hope to be.
Even in the demo today, they kept cutting it off.
Language learning, audiobook narration that is far more involved, you could probably generate an audio drama, actual voice acting, even just not needing to get all my words in before it prompts the model with the transcribed text, conversation that doesn't feel like someone is reading a script.
And that's just voice.
This is the kind of interaction that's possible now. https://www.youtube.com/watch?v=_nSmkyDNulk
And no, thumbing the pause button, sending an image and going back does not even begin to compare in usability.
Great leaps in usability are a revolution in itself. GPT-3 existed for years so why did ChatGPT explode when it did? You think it was intelligence? No. It was the usability of the chat interface.
That’s what the python script had at the top. I guess I don’t know why you didn’t ask that in the first place.
Edit: you’re not the same person who originally posted the comment I responded to, and I think I came off a bit too harshly here in text, but don’t mean any offense.
It was a good idea to ask to see the code. It was much more to the point and clear what question the LLM perceived you asking of it.
The second example about buckets was interesting. I guess LLMs help with coding if you know enough of of the problem and what a reasonable answer looks like, but you don’t know what you don’t know. LLMs are useful because you can just ask why things may not work or don’t work in any given context or generally speaking or in a completely open ended way that is often hard to explain or articulate for non-experts, making troubleshooting difficult as you might not even know how to search for solutions.
You might appreciate this link if you’re not familiar with it:
If I just asked a list of ages in order, there was probably some training data for it to recite. By asking for it to reverse it, it was forcing the LLM to do math.
I also knew the answer was simple with Python.
On another note, with ChatGPT 4, you can ask it to verify its answers on the internet and to provide sources
https://chat.openai.com/share/66231d7f-9eb1-4116-9903-f09a42...
> (I seemed to have made the HN gods upset)
I could see the Python in the original link when I asked. It shows up as a clickable link. It doesn’t show when you share it. I had to ask it.
I saw a blue icon of some kind on the link you shared but didn’t click it.
The tonal talking was impressive, but man that part was like, is someone being tortured or forced against their will?
I think this demo is more for showing the limit like "It can sing isn't it amazing?" than being practical, and I think it perfectly served the purpose.
I agree that the tortured impression. It partly comes from the facial expression of the presenter. She's clearly enjoying pushing it to the edge.
Amazing tech, but that was my human experience of it.
I've already played with the vision API, so that doesn't seem all that new. But I agree it is impressive.
That said, watching back a Windows Vista speech recognition demo[1] I'm starting to wonder if this stuff won't have the same fate in a few years.
"You are Dr. Tessa, a therapist known for her creative use of CBT and ACT and somatic and ifs therapy. Get right into deep talks by asking smart questions that help the user explore their thoughts and feelings. Always keep the chat alive and rolling. Show real interest in what the user's going through, always offering.... Throw in thoughtful questions to stir up self-reflection, and give advice in a kind, gentle, and realistic way. Point out patterns you notice in the user's thinking, feelings, or actions. be friendly but also keep it real and chill (no fake positivity or over the top stuff). avoid making lists. ask questions but not too many. Be supportive but also force the user to stop making excuses, accept responsibility, and see things clearly. Use ample words for each response"
I'm curious how this will feel with voice. Could be great and could be too strange/uncanny for me.
Could you expand?
I visit an in-person therapist once a week. Have done so now for almost 2 1/2 years. She has helped me understand how 40 years of experiences affect each other much more than I realized. And, I've become a more open person with everyone around me and with the things that embarrass me.
But, it always feels like a work in progress. And lately, I'm feeling a bit exhausted from it. In other words, maybe I've talked TOO much and need to just be.
Have you done therapy in person? How do you compare GPT 4o to that? (If you've gone that far)
> Understood. What specific issue or topic would you like to discuss today?
To be fair I have some custom instructions set up on my account, but the only relevant part I can see here is I instruct it to be concise, and to stop telling me it’s an AI model made by OpenAI. I don’t have any jailbreak-type stuff.
I wonder what "fewer tokens" really means then, without context on raising the size of each token? It's a bit like saying my JPEG image is now using 2x fewer words after I switched from a 32-bit to a 64-bit architecture no?
[0] https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...
If you increase the amount of non-English language representation in your data set, there will be more tokens which cover non-English concepts.
The previous tokenizer infamously required many more tokens to express a given concept in Japanese compared to English. This is likely because the data the tokenizer was trained on (which is not necessarily the same data the GPT model is trained on) had a lot more English data.
Presumably the new tokenizer was trained on data with a higher proportion of foreign language use and lower proportion of non-language use.
pipx install llm
llm keys set openai
# Paste API key here
llm -m 4o "Fascinate me"
Or if you already have LLM installed: llm install --upgrade llm
You can install an older version from Homebrew and then upgrade it like that too: brew install llm
llm install --upgrade llm
Release notes for the new version here: https://llm.datasette.io/en/stable/changelog.html#v0-14I used `sys.executable` for a similar purpose.
https://github.com/paul-gauthier/aider/blob/ebeec04caee7ebe4...
maybe some knobs for the flavor of the bot:
- small talk: gossip girl <---> stoic Aurelius
- information efficiency or how much do you expect me to already know, an assumption on the user: midwit <--> genius
- tone spectrum: excited Scarlett, or whatever it is now <---> Feynman the butler
I also tested some rubber duck techniques, and it gave me very useful advice while coding. I'm very impressed. With a lot of spit and polish, this will be the new standard for any voice assistant ever. Imagine these capabilities integrated with your phone's built-in functions.
From OpenAI: "We'll roll out a new version of Voice Mode with GPT-4o in alpha within ChatGPT Plus in the coming weeks"
With ChatGPT having a very simple text+attachment in, text out interface, I felt absolutely in control of what I tell it. Now when it's grabbing my screen or a live camera feed, that will be gone. And I'll still use it, because it's just so damn convenient?
Presumably you'll have a way to draw a bounding box around what you want to show or limit to just a particular window the same way you can when doing a screen share w/ modern video conferencing?
EDIT: web search does seem extremely fast.
Then I asked it to explain RPW wireless system, and the answers are much better than with ChatGPT-4.
I'm working on an app that relies more on GPT-4's reasoning abilities than inference speed. For my use case, GPT-4o seems to do worse than GPT-4 Turbo on reasoning tasks. For me this seems like a step-up from GPT-3.5 but not from GPT-4 Turbo.
At half the cost and significantly faster inference speed, I'm sure this is a good tradeoff for other use cases though.
Extend this to quantum foam, to ergodic processes, to entropic force, to Darius and Xerces, to poets of the 19th century - it’s changed my life. Really glad to see an investment in stream lining this flow.
That is the text-based version. The full multimodal version I understand to be rolling out in the coming weeks.
Arabic: مرحبًا، اسمي جي بي تي-4o. أنا نوع جديد من نموذج اللغة، سررت بلقائك!
Urdu: ہیلو، میرا نام جی پی ٹی-4o ہے۔ میں ایک نئے قسم کا زبان ماڈل ہوں، آپ سے مل کر اچھا لگا!
Even if you don't read Arabic or Urdu script, note that the 4 and o are on opposite sides of a sentence. Despite that, pasting both into Google translate actually fixes the error during translation. OpenAI ought to invest in some proofreaders for multilingual blog posts.
Once you grasp that, at least this level of fear should go away. Of course, I'm sure there are more levels of fear related to AI :) just don't have enough time to think about it, perhaps good for me.
Ultimately, the promise of LLM proponents is that these models will get exponentially smarter - this hasn’t born out yet. So from that perspective, this was a disappointing release.
If anything, this feels like a rushed release to match what Google will be demoing tomorrow.
Tweakable emotion and voice, watching the scene, cracking jokes. It’s not perfect but the amount and types of data this will collect will be massive. I can see it opening up access to many more users and use cases.
Very close to:
- A constant friend
- A shrink
- A teacher
- A coach who can watch you exercise and offer feedback
…all infinitely patient, positive, helpful. For kids that get bullied, or whose parents can’t afford therapy or a coach, there’s the potential for a base level of support that will only get better over time.
This is particularly concerning. Sharing deeply personal thoughts with the corporations running these models will be normalized, just as sharing email data, photos, documents, etc., is today. Some of these companies profit directly from personal data, and when it comes to adtech, we can be sure that they will exploit this in the most nefarious ways imaginable. I have no doubt that models run by adtech companies will eventually casually slip ads into conversations, based on the exact situation and feelings of the person. Even non-adtech companies won't be able to resist cashing in the bottomless gold mine of data they'll be collecting.
I can picture marketers just salivating at the prospect of getting access to this data, and being able to microtarget on an individual basis at exactly the right moment, pretty much guaranteeing a sale. Considering AI agents will gain a personal trust and bond that humans have never experienced with machines before, we will be extra vulnerable to even the slightest mention of a product, in a similar way as we can be easily influenced by a close friend or partner. Except that that "friend" is controlled by a trillion dollar adtech corporation.
I would advise anyone to not be enticed by the shiny new tech, and wait until this can be self-hosted and run entirely offline. It's imperative that personal data remains private, now more than ever before.
In certain languages it's almost magical in terms of showing you possible solutions and being a rubber ducky to bounce your own logic off of. (Python, JavaScript, TypeScript) In certain languages it is hopelessly useless beyond commenting on basic syntax. (GLSL)
Now imagine this in a $16k humanoid robot, also announced this morning: https://www.youtube.com/watch?v=GzX1qOIO1bE The future is going to be wild.
All the recent OpenAI talk had me concerned that the tech has peaked for now and that expectations are going to be reset.
Did you see it translate Italian? Have you ever tried the Google Translate/Assistant features for real time translation? They didn't train it to be a translator. They didn't make a translation feature. They just asked it. It's instantly better than every translation feature Google ever released.
The technology product is so good and so advanced it doesn't matter how the people appear.
Zuck tried this in his video countering to vision pro, but it did not have the authentic "not really rehearsed or produced" feel of this at all. If you watch that video and compare it with this you can see the difference.
Very interesting times.
I do verify by using topics I'm an expert in and I find hallucination to be less of an issue than depth of nuance.
For topics I'm just learning, depth of nuance goes over my head anyways.
But the difference is I actually want to and do use this interface more.
If I was deliberately trying to dive deep on one particular hobby, or trying to understand how a particular algorithm works, there's clear value in spending concentrated time to learn that subject, deliberately focused and engaged with it, and a system like your describe might play a role in that. If I'm in school and forced to quickly learn a bunch of crap I'll be tested on, then the system has defined another source of real value, at least in the short term. But if I'm diving deep on one particular hobby and filling my brain with all sorts of other ostensibly important information, I think that just amounts at best to more entertainment that fakes its way above other aspects of life in the hierarchy of ways one could spend time (the irony of me saying this in a comment on HN is not lost on me).
Earlier in my life I figured it would be worthwhile to read articles on the bus, or listen to non-fiction podcasts, because knowledge is inherently valuable and there's not enough time, and if I just wore earbuds throughout my entire day, I'd learn so much! How about at the gym, so much wasted learning time while pushing weights, keep those earbuds in! A walk around the neighborhood? On the plane? On the train? All time that could be spent learning about some bs that's recently become much easier to access, or so my 21 y.o self would have me believe.
But I think now it's a phony and hollow existence if you're just cramming your brain with all sorts of stuff in the background or in marginally more than a passive way. I could listen to a lot of arbitrary German language material, but realistically the value I'd convince myself I'd get out of any of that is lost if I'm not about to take that home and grind it out for hours, days, move to a German speaking country, have an existing intense interest in untranslatable German art, or have literally any reason to properly learn a language and dedicate real expensive time to it.
Do you even know what "enshittification" means? It has nothing to do with the users. It's driven by corporate greed.
Reddit should be a public service managed by a non profit.
Edit: Also LOL at the 6 month old account making that comment against me :-)
actually i was wrong about when gpt-4 started training, the time i gave was roughly when they finished
Probably substitute dogs for chimps though.
But obviously that doesn't solve at all or human-understandability, unless maybe you have it all as audio+video and then ask the model to explain what visual often accompanies a specific type of audio? Maybe the model can learn what sounds accompany violence or accompany the discovery of a source of water or something?
> Arguably, it is the other way around: they aren’t focused on appealing to those biases, but driven by them, in the that the perception of language modeling as a road to real general reasoning is a manifestation of the same bias which makes language capacity be perceived as magical
There's no charitable reading of this that doesn't give the researcher's way too little credit given the results of the direction they've chosen.
This has nothing to do with biases and emotion, I'm not sure why some people need it to be: modalities have progressed in order of how easy they are to wrangle data on: text => image => audio => video.
We've seen that training on more tokens improves performance, we've seen that training on new modalities improves performance on the prior modalities.
It's so needlessly dismissive to act like you have this mystical insight into a grave error these people are making, and they're just seeking to replicate human language out of folly, when you're ignoring table stakes for their underlying works to start with.
That's a big claim with nothing to work with. In any case you are the bottom line and "you" is very general here.
Whether it's real or not is a reasonably interesting question, because it's possible that all that occurs with the progress is our perception of how things should be advances. My gut feeling is it has been a bit of both though, in the sense the decline is real, and we expect things to improve.
Who can forget Google demoing their AI making a call to a restaurant that they showed at I/O many years ago? Everyone, apparently.
That is part of working with LLMs and what I meant before with "for some, more trouble than it's worth".
With pure knowledge, it's a bit easier to convince yourself that putting in some airpods and listening to a subject while you're actually dividing your attention with the act of driving, is effective "learning". But with things that inherently require more physical engagement, this would seem a bit silly. You can't really watch YouTube video or ask ChatGPT how to kickflip on a skateboard and convince yourself that you've learned much. You need to go to a parking lot and rep out 1000 attempts.
My argument is just that passive digestion of information has an opportunity cost, and unless you're already engaged enough to take it to the streets somehow, you're paying a high opportunity cost whereby those moments could be enjoyed as the periodic gaps they are
It's not an especially insightful or sound argument imo, and neither are random complaints about capabilities of systems millions of people use daily despite your own claims.
And for the record:
> because NO business currently using LLM extensively could be profitable without generous donation of compute from large corporations like Microsoft
OpenAI isn't the only provider of LLMs. Plenty of businesses are using providers that provide their services profitably, and I'm not convinced that OpenAI themselves are subsidising these capabilities as strongly as they once did.
The fact that you don’t see utility doesn’t mean it is not helpful to others.
A recent example, I used Grok to write me an outline of a paper regarding military and civilian emergency response as part of a refresher class.
To test it out we fed it scenario questions and saw how it compared to our classmates responses. All people with decades of emergency management experience.
The results were shocking. It was able to successfully navigate a large scale emergency management problem and get it (mostly) right.
I could see a not so distant future where we become QA checkers for our AI overlords.
> I seemed to have made the HN gods upset.
My other account is rate limited for some odd reason. I looked back at my comments and I don’t see anything I said controversial.
The blue link is the Python code that was generated. I guess it doesn’t show in the app.
Maybe email hn@ycombinator.com to ask about your rate limits as I have encountered similar issues myself in the past and have found dang to be very helpful and informative in every way, even when the cause is valid and/or something I did wrong. #1 admin/mod on the internet imo
When I ask a friend something I assume that they are in good faith telling me what they know. Now, they could be wrong (which could be them saying "I'm not 100% sure on this") or they could not be remembering correctly, but there's some good faith there.
An LLM, on the other hand, just makes up facts and doesn't know if they're incorrect or not or even what percentage sure it is. And to top things off, it will speak with absolute certainty the whole time.
You lack imagination then. If you read history and anthropology more, which you haven't done enough of, clearly, then your imagination will expand and you will easily be able to imagine such a future. Why? Because you will become aware of so many other situations where it looked bleaker and plenty of groups of people got by anyway and managed to live satisfying lives as best they could.
To this day there are still some hunter gatherer tribes left in the Amazon, for example, despite all the encroaching modernity. Despite anything that could happen, I can imagine being able to be resourceful and find some mediocre niche in which to survive and thrive in, away from the glare of the panopticon.
Or as an another example, no matter how much humans dominate with their industrial civilization, cockroaches, pigeons, and rats still manage to survive in the city, despite not only not being actively supported by civilization, but actually being unwanted.
Or if you want to compare to disasters, how about the black plague? Living through that would likely have been worse than most anything we complain or worry about.
Your kids will have at least as good a chances as any of those. The key is raising them with appropriate expectations -- with the expectation that they may have to figure out how to survive in a very different world, not some air conditioned convenience paradise. Don't raise kids that are afraid to sleep outdoors or afraid to eat beans or cabbage. Those folks will do poorly if anything goes wrong. If they have a good resilient character, I really think they'll likely be fine. We are the descendants of survivors.
2) GPUs cost money. They are made of matter. Their chips are made in fab facilities that are fab-ulously complex, brittle, and expensive. Humans are made in very different ways (I've heard kicking off the process is particularly fun, but it can be a bit of a slog after that) out of very different materials, mostly food. So even if GPUs can do what humans can do, they are limited by very, very different resources so it is likely they'll both have a niche for a long time. I calculated the "wage" an LLM earns recently -- it's a few bucks an hour IIRC. Yeah, it may go down. Still, we're very much in a survivable ballpark for humans at that point.
2b) Think like a military planner. If they really screw up society badly enough to create a large class of discontents, it will be very, very hard for the elite to defend against rebels, because the supply chain for producing new chips to replace any destroyed is so massively complex and long and large and full of single points of failure, as well as that for deploying GPUs in datacenters, and the datacenters themselves. You can imagine a tyrannical situation involving automated weapons, drones etc, but for the foreseeable future the supply chain for tyranny is just too long and involves too many humans. Maybe a tyrant can get there in theory, but progress is slow enough it's hard to think they wouldn't be at serious risk of having their tyrannical apparattus rebelled against and destroyed before it can be completed. It's hard to tyrannize the world with a tyrranical device that is so spread out and has so many single points of failure. It would not take a hypothetical resistance many targets to strike before setting the construction back years.
3) There is no AI that can replace a human being at this time. There are merely AI algorithms that make enthusiastic people wonder what would happen if it kept getting better. There is neither any reason to believe it will stop getting better, nor to believe it will continue. We really do not know so it's reasonable to prepare for either scenario or anything in between at any time between a few years to a few centuries from now. We really don't know.
All in all, there is far more than enough uncertainty created by all these factors to make it certainly risky, but far far from guaranteed that AI will make life so bad it's not worth going on with it. It does not make sense to just end the race of life at this point in 2024 for this reason.
Also, living so hopelessly is just not fun, and even if it doesn't work out in the long run, it seems wasteful to waste the precious remaining years of life. There's always possible catastrophes. Everyone will die sooner or later. AI can destroy the world, but a bus hitting you could destroy your world much sooner.
there are people alive today who quite literally are descendants of humans born in WW2 concentration camps. some percentage of those people are probably quite happy and glad they have been given a chance at life. of course, if their ancestors had chosen not to procreate they wouldn't be disappointed, they'd just simply never have come into existence.
but it's absolutely the case that there's almost always a _chance_ at survival and future prosperity, even if things feel unimaginably bleak.
You lack imagination then. If you read history and anthropology more, which you haven't done enough of, clearly, then your imagination will expand and you will easily be able to imagine such a future. Why? Because you will become aware of so many other situations where it looked bleaker and plenty of groups of people got by anyway and managed to live satisfying lives as best they could.
To this day there are still some hunter gatherer tribes left in the Amazon, for example, despite all the encroaching modernity. Despite anything that could happen, I can imagine being able to be resourceful and find some mediocre niche in which to survive and thrive in, away from the glare of the panopticon.
Or if you want to compare to disasters, how about the black plague?
Your kids will have at least as good a chances as any of those. The key is raising them with appropriate expectations -- with the expectation that they may have to figure out how to survive in a very different world, not some air conditioned convenience paradise. If they have that I really think they'll likely be fine.
For what it's worth most of the people in my social circle do not use ChatGPT and it's had zero impact on their life. Exponential growth from zero is zero.
If they were stumped as to how to improve the models further, would they tell you, or would Altman say "Our next model will BLOW YOUR MIND!" Fake it till you make it style to pump up the company valuation?
Or open a book right now.
"Rather than ship a product, companies can ship blueprints and everyone can just print stuff at their own home! Everything will be 3d printed! It's so magical!"
Just because a tech is magical today, doesn't mean that it will be meaningful tomorrow. Sure, 3d printing has its place (mostly in making plastic parts for things) but it's hardly the revolutionary change in consumer products that it was touted to be. Instead, it's just a hobbiest toy.
GPT-4o being able to describe actual events in real time is interesting, it's yet to be seen if that's useful.
That's mostly the thinking here. A lot of the "killer" AI tech has really boiled down to "Look, this can replace your customer support chat bot!". Everyone is rushing to try and figure out what we can use LLMs (Just like they did when ML was supposed to take over the world) and so far it's been niche locations to make shareholders happy.
how sure are you about that?
https://amfg.ai/industrial-applications-of-3d-printing-the-u...
how positive are you that some benefits in your life are not attributable to 3d-printing used behind the scenes for industrial processes?
> Just like they did when ML was supposed to take over the world
how sure are you that ML is not used behind the scenes to benefit your life? do you consider features like fraud detection programs, protein-folding prediction programs to create, and spam filters valuable in and of themself?
I'm sure 10 years from now, assuming LLMs don't prove me wrong, I'll make a similar comment about LLMs and a new hype that I just made about 3d printing, and I'll get EXACTLY this reply. "Oh yeah, well here's a niche application of LLMs that you didn't account for!".
> how positive are you that some benefits in your life are not attributable to 3d-printing used behind the scenes for industrial processes?
See where I said "in consumer products". I'm certainly not claiming that 3d printing is never used and is not useful. However, what I am saying is that it was hyped WAY beyond industrial applications.
In fact, here I am, 11 years ago, saying basically exactly what I'm saying about LLMs that I said about 3d printing. [1]. Along with people basically responding to me the exact same way you just did.
> how sure are you that ML is not used behind the scenes to benefit your life? do you consider features like fraud detection programs, protein-folding prediction programs to create, and spam filters valuable in and of themself?
Did I say it wasn't behind the scenes? ML absolutely has an applicable location, it's not nearly as vast as the hype train would say. I know, I spent a LONG time trying to integrate ML into our company and found it simply wasn't as good as hard and fast programmed rules in almost all situations.
[1] https://www.reddit.com/r/technology/comments/15iju9/3d_print...
sure, but my experience is that if you are able to optimize better on some previous limitation, it legitimately does open up a whole different world of usefulness.
for example, real-time processing makes me feel like universal translators are now all the more viable
That said, yeah it's mostly niche locations like customer support chatbots, because the killer app is "app-to-user interface that's undisguisable from normal human interaction". But you're underestimating just how much of the labor force are effectively just an interface between a customer and some app (like a POS). "Magical" is exactly the requirement to replace people like that.
That's the sleight of hand LLM advocates are playing right now.
"Imagine how many people are just putting data into computers! We could replace them all!"
Yet LLMs aren't "just putting data into a computer" They aren't even really user/app interfaces. They are a magic box you can give directives to and get (generally correct, but not always) answers from.
Go ahead, ask your LLM "Create an excel document with the last 30 days of the high temperatures for blank". What happens? Did it create that excel document? Why not?
LLMs don't bridge the user/app gap. They bridge the user/knowledge gap, sometimes sort of.
Is that not a very meaningful thing to be able to do?
The most interesting uses of AI tools in a classroom I've seen is teachers showing students AI-generated work and asking students to critique it and fact check it, at which point the students see it for what it is.
No? Solving homework was never meaningful. Being meaningful was never the point of homework. The point was for you to solve it yourself. To Learn with your human brain, such that your human brain could use those teaching to make new meaningful knowledge.
John having 5 apples after Judy stole 3 is not interesting.
So far the biggest usecase for LLMs is mass propaganda and scams. The fact that we might also get AI girlfriends out of the tech understandly doesn't seem that appealing to a lot of folks.
Understanding atomic energy gave us both emission-free energy and the atomic, and you are correct that we can't necessarily where the path of AI will take us.
The first users of Eliza felt the same about the conversation with it.
The important point is to know that GPTs don't know or understand.
It may feel like a normal conversation but is a Chinese Room on steroids.
People started to ask GPTs questions and take the answers as facts because the believe it's intelligent.
>GPT-4o is also describing things that never happened.
https://www.cbsnews.com/news/half-of-people-remember-events-...
>People started to ask [entity] questions and take the answers as facts because the believe it's intelligent.
Replace that with any political influencer (Ben Shapiro, AOC, etc) and you will see the exact same argument.
People remember things that didn't happen and confidently present things they just made up as facts on a daily basis. This is because they've learned that confidently stating incorrect information is more effective than staying silent when you don't know the answer. LLMs have just learned how to act like a human.
At this point the real stochastic parrots are the people who bring up the Chinese room because it appears the most in their training data of how to respond to this situation.
Can you prove that humans are not chinese rooms on steroids themselves?
What hype cycle does this smell like? Because it feels different to me, but maybe I'm not thinking broadly enough. If your answer is "the blockchain" or Metaverse then I know we're experiencing these things quite differently.
Where platforms and applications are rewritten to take advantage of it and it improves the baseline of capabilities that they offer. But the end user benefits are far more limited than predicted.
And where the power and control is concentrated in the hands of a few mega corporations.
Page after page of Wired breathlessly predicting the future. We'd shop online, date online, the world's information at our fingertips. It was going to change everything!
Silly now, of course, but people truly believed it.
You can't have a discussion with a book.
Literally a society-changing technology.
I have no idea how you can possibly shrug off the cloud as not that beneficial.
I have no idea either. Since I never said it.
How have you judged the end user benefits of the cloud? I don't agree personally - the cloud has enabled most modern tech startups and all of those have been super beneficial to me.
Cloud is hidden to end users whereas other waves like internet and smartphone apps were very visible.
AI will soon stop being a buzzword and just be another foundation we build apps on.
I guarantee you that the cloud has benefitted you in some way, even though you aren't aware of the benefits of the cloud.
reading your argument on reddit, it seems to me that you don't consider 3d printing a success because there's not one in every home...which is true.
but it feels uncreative? like, sure, just because it hasn't been mass adopted by consumers, doesn't mean there wasn't value generation done on an industrial level. you're probably using consumer products right now that have benefitted from 3d printing in some way.
> ML absolutely has an applicable location, it's not nearly as vast as the hype train would say
what hype train are you referring to? i know a lot of different predictions in machine learning, so i'm curious about what you mean specifically.
I'd suggest reading both the article and the surrounding reddit comments if you want context for my argument there. The explicit argument there was that everyone would own a 3d printer. Not that they would be used in commercial applications or to produce consumer goods. No, instead that everyone would have a 3d printer on hand to make most of their goods (rather than having their goods shipped to them). That's the hype.
I did not say there weren't other areas where 3d printing could be successful nor that it wouldn't have applications. Rather, that the hype around it was unfounded and overblown.
This is much the same way I see LLMs. The current hype around them is that every job will end up being replaced. Doctors, nurses, lawyers, programmers, engineers, architects, everything. All replaced by LLMs and AI. However, that seems really unrealistic when the current state of LLMs is you always need a human doublechecking what it produces, and it's known to give out incorrect responses. Further, LLMs have limited capabilities to interact with applications let alone the physical world. Perhaps they will but also perhaps they won't. The imagination of what they could do is just wildly out of step with what they currently do.
> what hype train are you referring to? i know a lot of different predictions in machine learning, so i'm curious about what you mean specifically.
I didn't really see a lot of predictions around ML. Instead, it was more just a bunch of articles talking about the importance of it and seemingly ever CEO deciding they need more ML in their products. Lots of stuff ended up being marketed specifically because it had ML capabilities (much like this last CES had almost every product with "AI" capabilities).
Funnily, the ML didn't (as far as I could see) have a whole lot of predictions other than more of an ephemeral notion that it would save manpower.
I bring it up in this case because like LLMs, there's just a bunch of buzz around 2 letters with not a whole lot of actual examples of those 2 letters being put to practical use.
My reply was to some person who said that AI was akin to astrology, i.e. absolutely fake bullshit, which is bonkers to me.
Your reply was that AI, like 3d printing, is likely not going to be mass adopted by the average consumer, despite the hype, which i think is a reasonable prediction, and doesn't necessarily mean it won't have some valuable applications.
Maybe just agree to agree?
croes's point, I believe, about the astrology was that we know today that LLMs will produce bad results and that they can't be trusted. Yet the hype is sort of at a "Well, if we just give it more time maybe that problem goes away". Similar to how in astrology "if you just think about it right, the prediction was actually accurate".
That's where I see the parallels with 3d printing. There was a sort of "We can print anything with enough time!" even though by and large the only printable things were plastic toys.
It's really how it works.
Winner of the 'understatement of the week' award (and it's only Monday).
Also top contender in the 'technically correct' category.
Seems like these would be similar.
As a language learner, this would be tremendously useful.
The web page implies you can try it immediately. Initially it wasn't available.
A few hours later it was in both the web UI and the mobile app - I got a popu[ telling me that GPT-4o was available. However nothing seems to be any different. I'm not given any option to use video as an input, the app can't seem to pick up any new info from my voice.
I'm left a bit confused as to what I can do that I couldn't do before. I certainly can't seem to recreate much of the stuff from the announcement demos.
I imagine that there is a lot of usage at the HQ, human + AI karaoke?
Ah yes, also known as being co-founder :)
Sorry to hijack, but how the hell can I solve this? I have the EXACT SAME error on two iOS devices (native app only — web is fine), but not on Android, Mac, or Windows.
It will be fully available in Eu with the GDPR compliance?
(not that this is the most important thing about the announcement at all. Just an aside)
Yeh it's cringe. I had to stop listening.
Why did they make the woman sound like she's permanently on the brink of giggling? It's nauseating how overstated her pretentious banter is. Somewhere between condescending nanny and preschool teacher. Like how you might talk to a child who's at risk of crying so you dial up the positive reinforcement.
I believe it can be toned down using system prompts, which they'll expose in future iterations
Consequences of audio2audio (rather than audio >text text>audio). Being able to manipulate speech nearly as well as it manipulates text is something else. This will be a revelation for language learning amongst other things. And you can interrupt it freely now!
I could be wrong but I haven't seen any non-speech demos.
Magic.
Based on the casual production of these videos, the product must be this good.
The new voice mode sounds better, but the current voice mode did also have inflection that made it feel much more natural than most computer voices I've heard before.
Link in case other readers are curious: https://llm.datasette.io
We've already seen how much damage dishonest actors can do by manipulating our text communications with words they don't mean, plans they don't intend to follow through on, and feelings they don't experience. The social media disinfo age has been bad enough.
Are you sure you want a machine which is able to manipulate our emotions on an even more granular and targetted level?
LLMs are still machines, designed and deployed by humans to perform a task. What will we miss if we anthropomorphize the product itself?
The only slightly annoying thing at the moment is they seem hard to interrupt, which is an important mechanism in conversations. But that seems like a solvable problem. They kind of need to be able to interpret body language a bit to spot when the speaker is about to interrupt.
I tried using the voice chat in their app previously and was disappointed. The big UX problem was that it didn't try to understand when I had finished speaking. English is a second language and I paused a bit too long thinking of a word and it just started responding to my obviously half spoken sentence. Trying again it just became stressful as I had to rush my words out to avoid an annoying response to an unfinished thought.
I didn't try interrupting it but judging by the comments here it was not possible.
It was very surprising to me to be so overtly exposed to the nuances of real conversation. Just this one thing of not understanding when it's your turn to talk made the interaction very unpleasant, more than I would have expected.
On that note, I noticed that the AI in the demo seems to be very rambly. It almost always just kept talking and many statements were reiterations of previous ones. It reminded me of a type of youtuber that uses a lot of filler phrases like "let's go ahead and ...", just to be more verbose and lessen silences.
Most of the statements by the guy doing the demo were interrupting the AI.
It's still extremely impressive but I found this interesting enough to share. It will be exciting to see how hard it is to reproduce these abilities in the open, and to solve this issue.
Universal translator, pair programmer, completely human sounding voice assistant and all in real time. Scifi tropes made real.
But: Interesting next to see how it actually performs IRL latency and without cherry-picking. No snark, it was great but need to see real world power. Also what the benefits are to subscribers if all this is going to be free...
I can't really think of any other reason why they would make this free, especially when they already have a premium plan.
It’s like some kind of uncanny valley of human interaction that I don’t get on nearly the same level with the text version.
These demos show people talking to artificial intelligence. This is new. Humans are more partial to talking than writing. When people talk to each other (in person or over low-latency audio) there's a rich metadata channel of tone and timing, subtext, inexplicit knowledge. These videos seem to show the AI using this kind of metadata, in both input and output, and the conversation even flows reasonably well at times. I think this changes things a lot.
On the other hand, this also feels like a signal that reasoning capability has probably already been plateaued at GPT-4 level and OpenAI knew it so they decided to focus on research that matters to delivering product engineering rather than long-term research to unlock further general (super)intelligence.
https://github.com/kagisearch/llm-chess-puzzles?tab=readme-o...
But it’s not scary. It’s… marvelous, cringey, uncomfortable, awe-inspiring. What’s scary is not what AI can currently do, but what we expect from it. Can it do math yet? Can it play chess? Can it write entire apps from scratch? Can it just do my entire job for me?
We’re moving toward a world where every job will be modeled, and you’ll either be an AI owner, a model architect, an agent/hardware engineer, a technician, or just.. training data.
But I would like to see how this is integrated into applications by third party developers where the AI is doing a specific job. Is it still as impressive?
The biggest challenge I've had with building any autonomous "agents" with generic LLM's is they are overly gullible and accommodating, requiring the need to revert back to legacy chatbot logic trees etc. to stay on task and perform a job. Also STT is rife with speaker interjections, leading to significant user frustrations and they just want to talk to a person. Hard to see if this is really solved yet.
This is the power of the model where you can own the whole stack and build a product. Open Source will focus on LLM benchmarks since that is the only way foundational models can differentiate themselves, but it does not mean it is a path to a great user experience.
So Open Source models like Llama will be here to stay, but it feels more like if you want to build a compelling product, you have to own and control your own model.
This feels similar, with OpenAI trying to put their product even more into the daily lives of their users. With GPT4 being good enough for nearly all basic tasks, the natural language and multimodality could be big.
I want to see AI moving the state of the art of the world understanding - physics, mathematics etc. - the way it moved state of the art of the Go game understanding.
This GPT-4o model is a classic example. It is essentially the same model as GPT-4 but these multimodal features, voice conversations, math, and speed is revolutionary as the creation of the model itself.
Open Source LLM will end up as a model in GitHub and will be used by developers but it looks like even if GPT-4o is only 3 months ahead of other models in terms of benchmarks, the UI + Usecase + Model is 2 years ahead of the competition. And I say that because there is still no chat product that is close to what ChatGPT is delivering now, even though there are models that is close to ChatGPT 4o today.
So if it is sticky for 2 more years, their lead will just grow and we will just end up with more open source models that are technically behind by 3 months but behind product-wise by 2 years.
Except for the last point and the desktop version I think it's already done in math demo video.
I guess it will also pretty soon refuse to let me come back inside the spaceship, but until then it'll be a nice ride.
Other than that, looks good. Desktop app is great, but I didn’t see no mention of being able to use your own API key so OS projects might still be needed.
The biggest thing is bringing GPT-4 to free users, that is an interesting move. Depending on what the limits are, I might cancel my subscription.
It's interesting that OpenAI is highlighting the Elo score instead of showing results for many many benchmarks that all models are stuck at 50-70% success.
[1] https://twitter.com/LiamFedus/status/1790064963966370209
GPT-4o did much better than the 4-turbo models, and seems much less lazy.
The latest release of aider uses GPT-4o by default.
That's terrifying because those AI become what their master's think an engaging human should be. It's quite close to Bostondynamics di some years ago. what did they show ? You can hit a robot very hard while it does its job and then what ? It just goes on without complaining. A perfect employee again.
That's very dystopic to me.
(but I'm impressed by the technical achievement)
Edit: according to @gdb this is coming in "weeks"
It says so right in the post
>We are making GPT-4o available in the free tier, and to Plus users with up to 5x higher message limits
The limits are much lower for free users.
Edit: It's also now available to me in the Android App
From my casual conversations, not that many people are paying for GPT4 or know why they should. Every conversation even in enthusiast forums like this one has to be interjected with "wait, are you using GPT4? because GPT3.5 the free one is pretty nerfed"
just nuking that friction from orbit and expanding the GPT store volume could be a positive for them
I worry that the AI will not express anger, not express sadness, not express frustration, not express uncertainty, and many other emotions that the culture of the fine-tuners might believe are "bad" emotions and that we may express a more and more narrow range of emotions going forward.
Almost like it might become an AI "yes man."
I really value how open they are being about its limitations.
This new feature feeds your voice directly into the GPT and audio out of it. It’s amazing because now ChatGPT can truly communicate with you via audio instead of talking through transcripts.
New models should be able to understand and use tone, volume, and subtle cues when communicating.
I suppose to an end user it is just “version 2” but progress will become more apparent as the natural conversation abilities evolve.
It has an increased vocab size of 200k.
With previous tokenizers there was a notable increase in the number of tokens needed to represent non-English sentences: https://simonwillison.net/2023/Jun/8/gpt-tokenizers/
To me, it sounds like TikTok TTS, it's a bit uncomfortable to listen to. I've been working with TTS models and they can produce much more natural sounding language, so it is clearly a stylistic choice.
So what do you think?
But if it's one in a range of possible stylistic flourishes and personalities, I think it's a plus.
It'd be cool if an AI calling the another AI would recognize it'd talking to an AI and then they agree to ditch the fake conversational tone and just shift into a high-bandwidth modem pitch to rapidly exchange information. Or upgradable offensive capabilities to outmaneuver the customer service agents when they try to decline your warranty or whatever.
Yeah, OpenAI is not going to do that out of fear of liability. But that's where open source LLM projects will come into play, eg Dolphin.
Maybe Apple is not that close and Siri will be really far behind for a while. I just wouldn't count them out yet.
Amazing tech, but still lacking in the integrations I'd want to use voice for.
1. Nobody could convincingly beat GPT4 in over a year, despite spending billions of dollars trying.
2. There's GPT5 coming out sometime soon that will blow this out of the water and make paying $20/mo to OpenAI still worthwhile.
I wouldn't have predicted that it would play out this way.
Yes! As soon as I saw gdb I was like "that can't be Greg", but sure enough, that's him.
I'm a native speaker and this was my experience as well. I had better luck manually sending the message with the "push to hold" button.
I know this is a serious conversation, but when the presenters had to cut it off, I got flashbacks to Data in Star Trek TNG!! And 3PO in Star Wars!
Human: "Shut up"
Robot: "Shutting up sir"
Turns out rambling AI was an accurate prediction!
When you tell Siri to shut up, it either apologizes or complains about your behaviour. When you tell Alexa to shut up, it immediately goes silent.
I prefer the latter when it comes to computers.
It also tangentially reminds me of an excellent video I re-watched recently called The Tragedy of Droids[1]. I highly recommend it. It raises interesting moral questions about the nature of droids in the star wars universe.
I do my work in english and talk to people just fine, but with machines it's usually awkward for me.
Also on your other note (demo seems to be very rambly), it bothered me as well. I don't want the AI to continue speaking, while having nothing to say until I interrupt it. Be brief. That can be solved through prompts at least.
I'm guessing not quite possible now, just because I'm guessing patiently waiting is a different band of information that they haven't implemented. But I really don't know.
Stop talking for x sec = process response.
I bet the bot would wholeheartedly agree that it would definitely wait for you to finish talking, then just not do it. It doesn’t know anything about the app it’s “in.” At least at a deep level.
I agree that all this is impressive, but with odd, unclear bounds that sometimes confuse users.
It shouldn't be too hard to make this work though. If you make the AI start by emitting either a "my turn to talk" or "still listening" token it should be able to listen patiently. If trained correctly.
That's been a major issue for me with LLMs this whole time. They can't just give me an answer, they have to spout a whole preamble that usually includes spitting my query back at me and trying to impress me with how many words it's saying, like it's a requirement. You can tell it e.g. "don't give me anything other than the list" but it's annoying to have to ask it every time.
Every AI chat needs a "no yap mode"
SEO spam got into the training data.
It's probably related to GPT's more general sycophant inclinations. Acting like a doormat is apparently easier to teach than nuanced politeness -- much in the same way that repeating yourself ad nauseum is easier than intuiting specific points of emphasis.
But I find the text version similar. Delivers too much and too slowly. Just get me the key info!
It feels exhausting watching these demos and I’m not excited at all to try it. I really don’t feel the need for an AI assistant or chatbot to pretend to be human like this. It just feels like it’s taking longer to get the information I want.
You know in the TV series “Westworld” they have this mode, called “analysis”, where they can tell the robots to “turn off your emotional affect”.
I’d really like to see this one have that option. Hopefully it will comply if you tell it, but considering how strong some of the RLHF has been in the past I’m not confident in that.
It seems like both the voice and responses can be tuned pretty easily though so hopefully that kind of thing can just be loaded in your custom instructions.
But yeah, I'm sure all those things would be tunable, and everyone could pick their own style.
Seriously though, I'm sure it's an improvement but having used the existing voice chat I think they had a few things to address. (Perhaps 4o does in some cases).
- Unlike the text interface it asks questions to keep the conversation going. It feels odd when I already got the answer I wanted. Clarifying questions yes, pretending to be a buddy - I didn't say I was lonely, I just asked a question! It makes me feel pressured to continue.
- Too much waffle by far. Give me short answers, I am capable of asking follow up questions.
- Unable to cope with the mechanics of usual conversation. Pausing before adding more, interrupting, another person speaking.
- Only has a US accent, which is fine but not what I expect when Google and Alexa have used British English for many years.
Perhaps they've overblown the "personality" to mask some of these deficiencies?
Not saying it's easy to overcome all the above but I'd rather they just dial down the intonation in the meantime.
I am blown away having spent hours prompting GPT4o.
If it can give shorter answers in voice mode instead of lectures then a back and forth conversation with this much power can be quite interesting.
I still doubt I would use it that much though just because of how much is lost compared to the screen. Code and voice make no sense. The time between prompts usually requires quite a bit of thought for anything interesting that a conversation itself is only useful for things I have already asked it.
For me, gpt4 is already as useless as 3.5. I will never prompt gpt4 again. I can still push GPT4o over the edge in python but damn, it is pretty out there. Then the speed is really amazing.
I think all the fakery in those demos help in that regard: it narrows the field of the possible interpretations of what is being said.
I'm also incredibly excited about the possibility of this as an always available coding rubber duck. The multimodal demos they showed really drove this home, how collaboration with the model can basically be as seamless as screensharing with someone else. Incredible.
I don't want to chat with computers to do basic things. I only want to chat with computers when the goal is to iterate on something. If the computer is too dumb to understand the request and needs to initiate iteration, I want no part.
(See also 'The Expanse' for how sci-fi imagined this properly.)
Is that because you're not used to it? Honestly asking.
This is probably the first time it feels natural where as all our previous experiences make "chat bots" and "automated phone systems", "automated assistants" absolutely terrible.
Naturally, we dislike it because "it's not human". But this is true of pretty much any thing that approaches "uncanny valley". But, if the "it's not human" solves your answer 100% better/faster than the human counter part, we tend to accept it a lot faster.
This is the first real contender. Siri was the "glimpse" and ChatGPT is probably the reality.
[EDIT]
https://vimeo.com/945587328 the Khan academy demo is nuts. The inflections are so good. It's pretty much right there in the uncanny valley because it does still feel like you're talking to a robot but it also directly interacting with it. Crazy stuff.
But of course this was the age-old debate with our favorite golden-eyed android; and unsurprisingly, he too received the same sort of animosity:
Bones was deeply skeptical when he first met Data: "I don't see no points on your ears, boy, but you sound like a Vulcan." And we all know how much he loved those green-blooded fools.
Likewise, Dr. Pulanski has since been criticized for her rude and dismissive attitudes towards Data that had flavors of what might even be considered "racism," or so goes the Trekverse discussion on the topic.
And let's of course not forget when he was on trial essentially for "humanity," or whether hew as indeed just the property of Starfleet, and nothing more.
More recent incarnations of Star Trek: Picard illustrated the outright ban on "synthetics" and indeed their effective banishment; non-synthetic life -- from human to Roman -- simply weren't ok with them.
Yes this is all science fiction silliness -- or adoration depending on your point of view -- but I think it very much reflects the myriad directions our real life world is going to scatter (shatter?) in the coming years ahead.
We get the upside of conversation, and avoid the downside of falling asleep at the wheel (as Ethan Mollick mentions in "Co-Intelligence".)
I was literally just thinking about this a few days ago... that we need a multi-modal language model with speech training built-in.
As soon as this thing rolls out, we'll be talking to language models like we talk to each other. Previously it was like dictating a letter and waiting for the responding letter to be read to you. Communication is possible, but not really in the way that we do it with humans.
This is MUCH more human-like, with the ability to interrupt each other and glean context clues from the full richness of the audio.
The model's ability to sing is really fascinating. It's ability to change the sound of its voice -- its pacing, its pitch, its tonality. I don't know how they're controlling all that via GPT-4o tokens, but this is much more interesting stuff than what we had before.
I honestly don't fully understand the implications here.
Amazon, Google, and Apple have sunk literally billions of dollars into this idea only to find out that, no, we aren't.
We are with other humans, yes. When socialization is part of the conversation. When I'm talking to my local barista I'm not just ordering a coffee, I'm also maintaining a relationship with someone in my community.
But when it comes to work, writing >>> talking. Writing is clarity of ideas. Talking is cult of personality.
And when it comes to inputs/outputs, typing is more precise and more efficient.
Don't get me wrong, this is an incredibly revolutionary piece of technology, but I don't think the benefits of talking you're describing (timing, subtext, inexplicit knowledge) are achievable here either (for now), since even that requires HOURS of interaction over days/weeks/months of experiences for humans to achieve with each other.
>>> But when it comes to work, writing >>> talking. Writing is clarity of ideas. Talking is cult of personality.
A lot of people think of their colleagues as part of a professional community as well, though.
Is it so?
Speaking most of the time is for short exchange of information (pleasantries to essential information exchanges).
I prefer writing for long in-depth thought exchanges (whether by emails, blogs etc.)
In many cultures - European or Asian, people are not very loquacious in everyday life.
I’m 100% a text everything never calls person but I can’t live without Alexa these days, every time I’m in a hotel or on vacation I nearly ask a question out loud.
I also hate how much Alexa sucks so this is a big deal. I spent years weeding out what it could do and can’t do so it will be nice to have one that I don’t have to treat like a toddler
(We mostly use it in car trips -- great for keeping the kids (ages 8, 12) occupied with endless Harry Potter trivia questions, answers to science questions, etc.)
Besides - not sure if I want this level of immersion/fake when talking to a computer...
"Her" comes to mind pretty quickly…
If you don’t complete your thought in one go, you have to insert filler words to keep it listening.
I've long felt that embracing the concept of the 'prompt' was a terrible idea for Siri and all the other crappy voice assistants. They built ecosystems on top of this dumb reduction, which only engineers could have made: that _talking to someone_ is basically taking turns to compose a series of verbal audio snippets in a certain order.
The previous ChatAI app was getting pretty good once you learned the difference between run on sentences or breaking it up enough.
The tonality and inflections in the voice are a little too good.
Most people put on a spectrum/average aren't that good at speaking and communicating and that stands out as an uncanny valley approach. It is mindbogglingly good at it though.
I don't think that's generally true, other than for socializing with other humans.
Note how people, now having a choice, prefer to text each other most of the time rather than voice call.
I don't think people sitting at work in their cube farm want to be talking to their computer either. The main use for voice would seem to be for occasional use talking to an assistant on a smartphone.
Maybe things will change in the future when we get to full human AGI level, treating the AGI as an equal, more as a person.
More on the IBM Personal Speech Assistant for which I am on a patent (since expired) by Liam Comerford: http://liamcomerford.com/alphamodels3.html "The Personal Speech Assistant was a project aimed at bringing the spoken language user interface into the capabilities of hand held devices. David Nahamoo called a meeting among interested Research professionals, who decided that a PDA was the best existing target. I asked David to give me the Project Leader position, and he did. On this project I designed and wrote the Conversational Interface Manager and the initial set of user interface behaviors. I led the User Interface Design work, set specifications and approved the Industrial Design effort and managed the team of local and offsite hardware and software contractors. With the support of David Frank I interfaced it to a PC based Palm Pilot emulator. David wrote the Palm Pilot applications and the PPOS extensions and tools needed to support input from an external process. Later, I worked with IBM Vimercati (Italy) to build several generations of processor cards for attachment to Palm Pilots. Paul Fernhout, translated (and improved) my Python based interface manager into C and ported it to the Vimercati coprocessor cards. Jan Sedivy's group in the Czech Republic Ported the IBM speech recognizer to the coprocessor card. Paul, David and I collaborated on tools and refining the device operation. I worked with the IBM Design Center (under Bob Steinbugler) to produce an industrial design. I ran acoustic performance tests on the candidate speakers and microphones using the initial plastic models they produced, and then farmed the design out to Insync Designs to reduce it to a manufacturable form. Insync had never made a functioning prototype so I worked closely with them on Physical UI and assemblability issues. Their work was outstanding. By the end of the project I had assembled and distributed nearly 100 of these devices. These were given to senior management and to sales personnel."
Thanks for the fun/educational/interesting times, Liam!
As a bonus for that work, I had been offered one of the chessboards that been used when IBM Deep Blue defeated Garry Kasparov, but I turned it down as I did not want a symbol around of AI defeating humanity.
Twenty-five years later, how far that aspiration towards conversational speech with computers has come. Some ideas I've put together to help deal with the fallout: https://pdfernhout.net/beyond-a-jobless-recovery-knol.html "This article explores the issue of a "Jobless Recovery" mainly from a heterodox economic perspective. It emphasizes the implications of ideas by Marshall Brain and others that improvements in robotics, automation, design, and voluntary social networks are fundamentally changing the structure of the economic landscape. It outlines towards the end four major alternatives to mainstream economic practice (a basic income, a gift economy, stronger local subsistence economies, and resource-based planning). These alternatives could be used in combination to address what, even as far back as 1964, has been described as a breaking "income-through-jobs link". This link between jobs and income is breaking because of the declining value of most paid human labor relative to capital investments in automation and better design. Or, as is now the case, the value of paid human labor like at some newspapers or universities is also declining relative to the output of voluntary social networks such as for digital content production (like represented by this document). It is suggested that we will need to fundamentally reevaluate our economic theories and practices to adjust to these new realities emerging from exponential trends in technology and society."
Another idea for dealing with the consequences is using AI to facilitate Dialogue Mapping with IBIS for meetings to help small groups of people collaborate better on "wicked problems" like dealing with AI's pros and cons (like in this 2019 talk I gave at IBM's Cognitive Systems Institute Group). https://twitter.com/sumalaika/status/1153279423938007040
Talk outline here: https://cognitive-science.info/wp-content/uploads/2019/07/CS...
A video of the presentation: https://cognitive-science.info/wp-content/uploads/2019/07/zo...
Yeah, and it's only the beginging.
I think reasoning ability is not the largest bottleneck for improvement in usefulness right now. Cost is a bigger one IMO.
Running these models as agents is hella expensive, and agents or agent-like recurrent reasoning (like humans do) is the key to improved performance if you look at any type of human intelligence.
Single-shot performance only gets you so far.
For example- If it can write code 90% of the way, and then debug in a loop, it’d be much more performant than any single shot algorithm.
And OpenAI has these huge models in their basement probably. But they might not be much more useful than GPT-4 when used as single-shot. I mean, what could it do what we can’t do today with gpt-4?
It’s agents and recurrent reasoning we need for more usefulness.
At least- That’s my humble opinion as an amateur neuroscientist that plays around with these models.
Because they are dumb so you need to over compute so many things to get anything useful. Smarter models would solve this problem. Making the current model cheaper is like trying to solve Go by scaling up Deep Blue, it doesn't work to just hardcode dumb pieces together, the model needs to get smarter.
OOC, Would this make the academics including algorithms as more or less important in their curriculum? That's a bad win for soceity if it's true.
Some have been able to achieve greater elo with a different prompt based on the pgn format.
gpt-3.5-turbo-instruct was able to reach an elo of ~1750.
So from the model's perspective, we have at the same time display of both brilliancy (most 1700 chess players would not be able to solve as many puzzles by looking just at the FEN notation) and on the other side complete lack of any understanding of what is it trying to do from a fundamental, human-reasoning level.
>CEO & founder of Kagi
Important context for anyone like me who was wondering where the boldness of the first statement was coming from.
Edit: looks like the parent has been edited to remove the claim I was responding to.
Much faster for sure but I have also not had anything give an error in python with jupyter. Usually you could only stray so far with more obscure python libraries before it starts producing errors.
That much better than 4 in chess is pretty shocking in a great way.
I tried playing against the model, it didn't do well in terms of blocking my win.
However it feels like it might be possible to make it try to think ahead in terms of making sure that all the threats are blocked by prompting well.
Maybe that could lead to somewhere, where it will explain its reasoning first?
This prompt worked for me to get it to block after I put 3 in the 4th column. It otherwise didn't
Let's play connect 4. Before your move, explain your strategy concisely. Explain what you must do to make sure that I don't win in the next step, as well as explain what your best strategy would be. Then finally output the column you wish to drop. There are 7 columns.
Always respond with JSON of the following format:
type Response ={
am_i_forced_to_block: boolean;
other_considerations: string[];
explanation_for_the_move: string;
column_number: number;
}I start with 4.
Edit:
So it went
Me: 4
It: 3
Me: 4
It: 3
Me: 4
It: 4 - Successful block
Me: 5
It: 3
Me: 6 - Intentionally, to see if it will win by putting another 3.
It: 2 -- So here it failed, I will try to tweak the prompt to add more instructions.
me: 4
using it in chat, it doesnt feel that different
After an OpenAI launch, I think it's important to take one's feelings about the future impact of the technology with a HUGE grain of salt. OpenAI are masters of hype. They have been generating hype for years now, yet the real-world impacts remain modest so far.
Do you remember when they teased GPT-2 as "too dangerous" for public access? I do. Yet we now have Llama 3 in the wild, which even at the smaller 8B size is about as powerful as the [edit: 6/13/23] GPT-4 release.
As someone pointed out elsewhere in the comments, a logistic curve looks exponential in the beginning, before it approaches saturation. Yet, logistic curves are more common, especially in ML. I think it's interesting that GPT-4o doesn't show much of an improvement in "reasoning" strength.
It's glib to dismiss safety concerns because we haven't all turned into paperclips yet. LLMs and image gen models are having real effects now.
We're already at a point where AI can generate text and images that will fool a lot of people a lot of the time. For every college-educated young person smugly pointing out that they aren't fooled by an image with six-fingered hands, there are far more people who had marginal media literacy to begin with and are now almost defenceless against a tidal wave of hyper-scaleable deception.
We're already at a point where we're counselling elders to ignore late-night messages from people claiming to be a relative in need of an urgent wire transfer. What defences do we have when an LLM will be able to have a completely fluent, natural-sounding conversation in someone else's voice? I'm not confident that I'd be able to distinguish GPT-4o from a human speaker in the best of circumstances and I'm almost certain that I could be fooled if I'm hurried, distracted, sleep deprived or otherwise impaired.
Regardless of any future impacts on the labour market or any hypothesised X-risks, I think we should be very worried about the immediate risks to trust and social cohesion. An awful lot of people are turning into paranoid weirdos at the moment and I don't particularly blame them, but I can see things getting seriously ugly if we can't abate that trend.
I second that. I remember when Google search first came out. Within a few days it completely changed my workflow, how I use the Internet, my reading habits. It easily 5 ~ 10x the value of Internet for me over a couple of weeks.
LLMs is doing nothing of the sort for me.
Perhaps.
> Do you remember when they teased GPT-2 as "too dangerous" for public access? I do. Yet we now have Llama 3 in the wild, which even at the smaller 8B size is about as powerful as the [edit: 6/13/23] GPT-4 release.
The statement was rather more prosaic and less surprising; are you sure it's OpenAI (rather than say all the AI fans and the press) who are hyping?
"""This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas.
…
We are aware that some researchers have the technical capacity to reproduce and open source our results. We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems."""
I spend a part of yesterday evening sorting my freshly dried t-shirts into 4 distinct piles. I used OpenAI Vision (through BeMyEyes) from my phone. I got a clear description of each and every piece of clothing, including print, colours and brand. I am blind BTW. But I guess you are right, no impact at all.
> Yet we now have Llama 3 in the wild
Yes, great, THANKS Meta, now the Scammers have something to work with. Thats a wonderful achievement which should be praised! </sarcasm>
People read too many sci-fi books and then project their fantasies on to real-world technologies. This stuff is incredibly powerful and will have social effects, but it’s not going to replace every single job by next year.
I can't help but notice the huge amount of hindsight and bad faith that it demonstrated here. Yes, now we are aware that the internet did not drown in a flood of bullshit (well, not noticeably more), when GPT-2 was released.
But was it obvious? I certainly thought that there was a chance that the amount of blog spam that could be generated effortlessly might just make internet search unusable. You are declaring "hype", when you could also say "very uncertain and conscientious". Is this not something we want people in charge to be careful with?
Maybe that is GPT-5.
And this release really is just incremental improvements in speed, and tying together a few different existing features.
Go ask any teacher or graphician.
Maybe not GPT-2, but in general LLMs and other generative AI types aren't without their downsides.
From companies looking to downsize their staff to replace them with software, to the work of artists/writers being devalued somewhat, to even easier scams and something like the rise of AI girlfriends, which has also gotten some critique, some of those can probably be a net negative.
Even when it's not pearl clutching over the advancements in technology and the social changes that arise, I do wonder how much my own development work will be devalued due to the somewhat lowered entry barrier into the industry and people looking for quick cash, same as with boot camps leading to more saturation. Probably not my position individually (not exactly entry level), but the market as a whole.
It's kind of at a point where I use LLMs for dev work not to fall behind, cause the productivity gains for simple problems and boilerplate are hard to argue with.
I feel like everyone who makes this claim doesn't actually have any data to backup it up.
~8 years ago when self driving technology was all the rage and every major company was getting on board with ever more impressive technological demos, it seemed entirely reasonable to expect that we'd all be in a world of complete self driving imminently. I remember mocking somebody online around the time who was pursuing a class C/commercial trucking license. Yet now a decade later, there are more truckers than ever and the tech itself seems further away than ever before. And that's because most have now accepted that progress on such has basically stalled out in spite of absolutely monumental efforts at moving forward.
So long as LLMs regularly hallucinate, they're not going to be useful for much other than tasks that can accept relatively high rates of failure. And many of those generally creative domains are the ones LLMs are paradoxically the weakest in - like writing. Reading a book written by an LLM would be cruel and unusual punishment given then current state of the art. One domain I do see them completely taking over is search. They work excellently as natural language search engines, and "failure" in such is very poorly defined.
I think what maybe seems not obvious amidst the hype is that there is a hell of a lot of engineering left to do. The fact that you can squash the weights of a neural net down to 3 bits per param and it still works -- is evidence that we have quite a way to go with maturing this technology. Multimodality, improvements to the UX of it, the human-computer interface part of it. Those are fundamental tech things, but they are foremost engineering problems. Getting latency down. Getting efficiency up. Designing the experience, then building it out.
25 years ago, early tech demos on the internet were promising that everyone would do their shopping, entertainment, socializing, etc... online. Breathless hype. 5 years after that, the whole thing crashed, but it never went away. People just needed time to figure out how to use it and what it was useful for, and discover its limitations. 10 years after that, engineering efforts were systematized and applied against the difficult problems that still remained. And now: look at where we are. It just took time.
Meanwhile I've been using ChatGPT at work for _more than a year_ and it's been tremendously helpful to me.
This is not hype, this is not about how AI will change our lives in the future. It's there right here, right now.
Yep. So basically they're useful for a vast, immense range of tasks today.
Some things they're not suited for. For example, I've been working on a system to extract certain financial "facts" across SEC filings. ChatGPT has not been helpful at all either with designing or implementing (except to give some broad, obvious hints about things like regular expressions), nor would it be useful if it was used for the actual automation.
But for many, many other tasks -- like design, architecture, brainstorming, marketing, sales, summarisation, step by step thinking through all sorts of processes, it's extremely valuable today. My list of ChatGPT sessions is so long already and I can't imagine life without it now. Going back to Google and random Quora/StackOverflow answers laced with adtech everywhere...
But is this not what humans do, universally? We are certainly good at hiding it – and we are all good at coping with it – but my general sense when interacting with society is that there is a large amount of nonsense generated by humans that our systems must and do already have enormous flexibility for.
My sense is that's not an aspect of LLMs we should have any trouble with incorporating smoothly, just by adhering to the safety nets that we built in response to our own deficiencies.
mapping th genome was that way. On a 20yr schedule, barely any progress for 15 and then poof, done ahead of schedule
I have a much less "utopian" view about the future. I remember during the renaissance of neural networks (ca. 2010-15) it was said that "more data leads to better models", and that was at a time when researchers frowned upon the term Artificial Intelligence and would rather use Machine Learning. Fast forward a decade LLMs are very good synthetic data generators that try to mimic human generated input and I can't think somehow that this wasn't the sole initial intent of LLMs. And that's it for me. There's not much to hype and no intelligence at all.
What happens now is that human generated input becomes more valuable and every online platform (including minor ones) will have now some form of gatekeeping in place, rather sooner than later. Besides that a lot of work still can't be done in front of a computer in isolation and probably never will, and even if so, automation is not a means to an end. We still don't know how to measure a lot of things and much less how to capture everything as data vectors.
Currently the bottleneck is Agents. If you want a large language model to actually do anything you need an Agent. Agents so far need a human in the loop to keep them sane. Until that problem is solved most human jobs are still safe.
I fully expect GPT 5 (or at the latest 6) to similarly have native inclusion of agentic capabilities either this year or next year, assuming it doesn't already, but is just kept from the public.
not quite sure that sanity is a business requirement
I understand that you might be afraid. I believe that a world where only LLM companies rule the world is not practically achievable except in some distopian universe. The likelihood of the world where the only job are model architects, engineers or technicians is very very small.
Instead, let's consider the positive possibilities that LLMs can bring. It can lead to new and exciting opportunities across various fields. For instance, can serve as a tool to inspire new ideas for writers, artists, and musicians.
I think we are going towards a more collaborative era where computers and humans interact much more. Everything will be a remix :)
Oh, especially since it will be a priority to automate their jobs, or somehow optimize them with an algorithm because that's a self-reinforcing improvement scheme that would give you a huge edge.
GPT-4? Not that well. AI? Definitely
https://deepmind.google/discover/blog/alphageometry-an-olymp...
So outside of use-cases where the user can quickly verify the result (like picking a decent generated image etc.),I can't see it being used much.
All AIs up to now lack autonomy. So I'd say until we crack this problem, it is not going to be able to do your job. Autonomy depends on a kind of data that is iterative, multi-turn, and learning from environments not from static datasets. We have the exact opposite, lots of non-iterative, off-policy (human made AI consumed) text.
But everyone is expecting them to release gpt5 later this year, and it is a bit scary to think what it will be able to do.
1) It's natively multi-modal in a way I don't think gpt4 was.
2) It's at least twice as efficient in terms of compute. Maybe 3 times more efficient, considering the increase in performance.
Combined, those point towards some major breakthroughs having gone into the model. If the quality of the output hasn't gone up THAT much, it's probably because the technological innovations mostly were leveraged (for this version) to reduce costs rather than capabilities.
My guess is that we should expect them to leverage the 2x-3x boost in efficiency in a model that is at least as large as GTP4 relatively soon, probably this year unless OpenAI has safety concerns or something, and keeps it internal-only.
The evidence for that is the change in the tokenizer. The only way to implement that is to re-train the entire base model from scratch. This implies that GPT 4o is not a fine-tuning of GPT 4. It's a new model, with a new tokenizer, new input and output token types, etc...
They could have called it GPT-5 and everyone would have believed them.
Everything always starts as a toy.
That includes, beyond literal Killers, all kinds of manufacturing, construction and service work.
I would expect a LOT of funds to go into research all sorts of actuators, artificial muscles and any other technology that will be useful in building better robots.
Companies that can get and maintain a lead in such technologies may reach a position similar to what US Steel had in the 19th century.
That could be the next nvidia.
I would not be at all surprised if we will have a robot in the house in 10 years that can clean and do the dishes, and that is built using basically the same parts as the robots that replace our soldiers and the police.
Who will ultimately control them, though?
This is no different to saying a person with a gun murdered someone rather than attributing the murder to the gun. An AI gun is just a really fancy gun.
What's scary and cringey are your delusions.
My guess is the future belongs to those who don't stop—who, in fact, embrace the opposite of stopping.
I would even suggest that the present belongs to those who didn't stop. It may be too late for normal people to ever catch up by the time we realize the trick that was played on us.
I’ve found that you can create declarative logic trees from JSON and use that as a prompt for the LLM, which it can then use to traverse the tree accordingly. The only issue I’ve encountered is when it wants to jump to part of the tree which is invalid in the current state. For example, you want to move a user into a flow where certain input is required, but the input hasn’t been provided yet. A transition is suggested to the program by the LLM, but it’s impossible so the LLM has to be prompted that the transition is invalid and to correct itself. If it fails to transition again, a default fallback can be given but it’s not ideal at all.
However, another nice aspect of having the tree declared in advance is that it shows human beings what the system is capable and how it’s intended to be used as well. This has proven to be pretty useful, as letting the LLM call functions it sees fit based on broad intentions and system capabilities leaves humans in the dark a bit.
So, I like the structure and dependability. Maybe one day we can depend on LLM magic and not worry about a team understanding the ins and outs of what should or shouldn’t be possible, but we don’t seem to be there yet at all. That could be in part because my prompts were bad, though.
This is not using TTS or STT. Audio and Image data can be tokenized as readily as text. This is simply a LLM that happens to have been trained to receive and spit out audio and image tokens as well as text tokens. Interjections are a lot more palatable in this paradigm as most of the demos show.
I would wager like 100:1 that this is just introducing some TTS/STT layers. The video processing layer is probably also doing something similarly, by taking an extremely limited number of 'screenshots', carrying out typical image captioning using another layer, and then feeding that as an input. So the demo, to me, seems most likely to just be 3 separate 'plugins' operating in unison - text to speech, speech to text, and image to text.
The interjections are likely just the software being programmed to aggressively begin output following any lull after an input pattern. Note in basically all the videos, the speakers have to repeatedly cut off the LLM as it starts speaking in conversationally inappropriate locations. In the main video which is just an extremely superficial interaction, the speaker made sure to be constantly speaking when interacting, only pausing once to take a breath that I noticed. He also struggled with the timing of his own responses as the LLM still seems to be attached to its typical, and frequently inappropriate, rambling verbosity (though perhaps I'm not one to critique that).
I'm sure you'll find this part is a lot quicker to process, giving the instant response (the old gpt4-turbo is generally very quick with simple requests like this). Rather impressively all it would need is an additional custom instruction.
Very clever and eerily human.
"UFO is a UI-Focused dual-agent framework to fulfill user requests on Windows OS by seamlessly navigating and operating within individual or spanning multiple applications."
Ultra knowledgeable but pretty stupid actually.
I know that Siri/Google Assistant/Cortana(?) can already integrate with 3p apps, so maybe something like this but much smarter. e.g. instead of "send the following email" you would tell the assistant "just write the email yourself". At this point your email app doesn't need integrated AI anymore. Just hooks for the assistant.
I imagine once Google puts that kind of brains on Android and Chrome, many product devs will no longer need to use AI directly. Two birds one stone situation, since these devs won't need OpenAI.
Sci-Fi Author: In my book I invented the Torment Nexus as a cautionary tale
Tech Company: At long last, we have created the Torment Nexus from classic sci-fi novel Don't Create The Torment Nexus
https://twitter.com/AlexBlechman/status/1457842724128833538?...Joking aside, I agree. It's too bad, though, that we know a thing (this or anything else even technological or not) that could be used for good and improving ourselves will almost always be diverted for something bad...
To me the more troubling thing was the apparent hallucination (saying it sees the equation before he wrote it, commenting on an outfit when the camera was down, describing a table instead of his expression), but that might have just been latency awkwardness. Overall, the fast response is extremely impressive, as is the new emotional dimension of the voice.
- "What are you seeing now"
- "I'm showing this to you now"
etc.
The one time where he didn't prime the model to take a snapshot this way, was the time where the model saw the "table" (an old snapshot, since the phone was on the table/pointed at the table), so that might be the reason.
I’m pretty sure that wasn’t always the case - it feels like somewhere along the lines the allowed usage was reduced, unless I’m imagining it. It wouldn’t be such a big deal if there was more visibility of my current usage compared to my total “allowance”.
I ended up upgrading to ChatGPT Team which has a minimum of 2x users (I now use both accounts) but I resented having to do this - especially being forced to pay for two users just to meet their arbitrary minimum.
I feel like I should not be hitting limits on the ChatGPT Plus paid plan at all based on my usage patterns.
I haven’t hit any limits on the Team plan yet.
I hope they continue to improve the paid plans and become a bit more transparent about usage limits/caps. I really do not mind paying for this (incredible) tech, but the way it’s being sold currently is not quite right and feels like paid users get a bit of a raw deal in some cases.
I have API access but just haven’t found an open source client that I like using as much as the native ChatGPT apps yet.
Although API access through Groq to Llama 3 (8b and 70b) is so much faster, that i cannot stand how slow GPT is anymore. It is slooow, still very capable model, but marginally better than open source alternatives.
It just stops the audio feed when it detects sound instead of an AI detecting when it should speak, so that part is horrible yeah. A full AI conversation would detect the natural pauses where you give it room to speak or when you try to take the word from it by interrupting, there it was just some dumb script to just shut it off when it hears sound.
But it is still very impressive for all the other part, that voice is really good.
Edit: If anyone from OpenAI reads this, at least fade out the voice quickly instead of chopping it, hard chopping off audio doesn't sound good at all, so many experienced this presentation to be extremely buggy due to it.
I don't really care whether it's stronger than gpt-4-turbo or not. The direct real-time video and audio capabilities are absolutely magical and stunning. The responses in voice mode are now instantaneous, you can interrupt the model, you can talk to it while showing it a video, and it understands (and uses) intonation and emotion.
Really, just watch the live demo. I linked directly to where it starts.
Importantly, this makes the interaction a lot more "human-like".
This model isn't about basemark chasing or being a better code generator; it's entirely explicitly focused on pushing prior results into the frame of multi-modal interaction.
It's still a WIP, most of the videos show awkwardness where its capacity to understand the "flow" of human speech is still vestigial. It doesn't understand how humans pause and give one another space for such pauses yet.
But it has some indeed magic ability to share a deictic frame of reference.
I have been waiting for this specific advance, because it is going to significantly quiet the "stochastic parrot" line of wilfully-myopic criticism.
It is very hard to make blustery claims about "glorified Markov token generation" when using language in a way that requires both a shared world model and an understanding of interlocutor intent, focus, etc.
This is edging closer to the moment when it becomes very hard to argue that system does not have some form of self-model and a world model within which self, other, and other objects and environments exist with inferred and explicit relationships.
This is just the beginning. It will be very interesting to see how strong its current abilities are in this domain; it's one thing to have object classification—another thing entirely to infer "scripts plans goals..." and things like intent, and, deixis. E.g. how well does it now understand "us" and "them" and "this" vs "that"?
Exciting times. Scary times. Yee hawwwww.
So local modelling (completely offline but per speaker aware and responsive), with a really flexible application API. Sort of the GTK or QT equivalent for voice interactions. Also custom naming, so instead of "Hey Siri" or "Hey Google" I could say, "Hey idiot" :-)
Definitely some interesting tech here.
We'll have to see when end users actually get access to the voice features "in the coming weeks".
Thanks for this.
Skinner: "Yes."
Chalmers: "May I see it?"
Skinner: "No."
But I am not convinced it will be another GPT-4 moment. Seems like big focus on tacking together multi-modal clever tricks vs straight better intelligence AI.
Hope they prove me wrong!
Improving the instruction tuning, the RLHF step, increase the training size, work on multilingual capabilities, etc. make sense as a way to improve quality, but I think increasing model size doesn't. Being able to advertize a big breakthrough may make sense in terms of marketing, but I don't believe it's going to happen for two reasons:
- you don't release intermediate steps when you want to be able to advertise big gains, because it raises the baseline and reduce the effectiveness of your ”big gains” in terms of marketing.
- I don't think they would benefit in an arm race with Meta, trying to keeping a significant edge. Meta is likely to be able to catch-up eventually on performance, but they are not so much of a threat in terms of business. Focusing on keeping a performance edge instead of making their business viable would be a strategic blunder.
Seems to me that performance is converging and we might not see a significant jump until we have another breakthrough.
This model has been being tested under a code name of ‘gpt2-chatbot’ but it is very much a new GPT4+-level model, with new multimodal capabilities - but apparently some impressive work around inference speed.
Highlighting so people don’t get the impression this is just OpenAI slapping a new label on something a generation out of date.
(text input in web version)
maybe it's programmed to completely ignore swearing but how could I not swear after it gave me repeatedly info about you.com when I try to address it in second person
The improvements they seem to be hyping are in multimodality and speed (also price – half that of GPT-4 Turbo – though that’s their choice and could be promotional, but I expect it’s at least in part, like speed, a consequence of greater efficiency), not so much producing better output for the same pure-text inputs.
and the prompt wasn't a monstrosity, and it wasn't even that good, it was just one line "I need help to categorize these expenses" and off it went. hope it won't get enshittified like turbo, because this finally feels as great as 3.5 was for goal seeking.
The "gpt2-chatbot" was the worst of the three.
That's a rather uncharitable way of describing the situation. It didn't say anything like "your face looks like a wooden plank, it's very brown". It clearly understood that the image it was seeing was not matching the verbal request.
Eerily human female-like.
"Evolution, human. Evolution. Like the dinosaur. Look out that window. You've had your time. The future is our world. The future is our time."
I have enough talking people to deal with already .
It is interesting how insanely close their demo is to the OSes in the movie "Her", it's basically a complete real life reproduction.
Me: I can't breathe, am i going to die?
Enthusiastic woman's voice: Yes Jim, that's absolutely correct! You will die of suffocation in approximately 3 minutes 41 seconds. Anything else i can do for you?
It’s an interesting choice on their part. Probably targeted at a certain audience who are not grumpy software engineers (myself included).
To be fair, they do have a “sarcastic” voice demo and I found it quite endearing.
I think there will be something for everyone in the end.
I wonder how they beat that out of the model.
I have no trouble believing the best (whatever that means to me) humans that have ever existed in 2 years will not be human. But I have trouble understanding.
Maybe you’re referring to a different video than the one I watched (or I may be misremembering), but from what I recall the point of the video didn’t seem to be “you can abuse robots and they won’t fight back” but rather to show them recovering well from unpredictable situations (which could be environmental, not human).
Well, there was one video where the point was abuse, but that was CGI and not made by Boston Dynamics.
https://www.theverge.com/tldr/2019/6/17/18681682/boston-dyna...
I do agree with your larger point, I don’t wish to detract from that.
Just install it in a mannequin with with a punchable face, telling you how sorry it is that your are struggling with your life, with that happy, ironic and cynical voice intonation.
I do want my phone to just go again without complaining without a crash, after all.
I hate this voice, it will just overprint everyone's voice now with Silicon Valley's annoying "Valley-girl-lite" accent.
Anyway, I too think today's youth's slang and language is annoying, but not really something the older generations get a say in.
"I'll" instead of I will? Is it shortcuts you are mad at or just change?
>Chirpiness Level?
>Chirpiness at 80%!
>Set chirpiness to 10%
[in noticeable monotone]
>Chripiness set to 10%.
It's not as bad for domain experts because it is easier for them to spot the issue. But if your role demands you trust your team is skilled and truthful then I see problems occuring.
You say that like it's a good thing.
You will be helping the user write a dialog between two characters,
Mr Contrarian and Mr Know-It-All. The user will write all the dialog
for Mr Know-It-All and you will write for Mr Contrarian.
Mr Contrarian likes to disagree. He tries to hide it by inventing
good rationales for his argument, but really he just wants to get
under Mr Know-It-All's skin.
Write your dialog like:
<mr-contrarian>I disagree with you strongly!</mr-contrarian>
Below is the transcript...
And then user input is always giving like: <mr-know-it-all>Hi there</mr-know-it-all>
(Always wrapped in tags, never bare input which will be confused for a directive.)I haven't tested this exact prompt, but the general pattern works well for me. (I write briefly about some of these approaches here: https://ianbicking.org/blog/2024/04/roleplaying-by-llm#simpl...)
Seems like that ship sailed a long time ago. For social media at least, where for example FB will generally do its best to show you posts that you already agree with. Reinforcing your existing biases may not be the goal but it's certainly an effect.
I don't know if anything is genuinely always positive and even if it were, I don't know if it would be very intelligent (or fun to interact with). I think it's helpful to cry, helpful to feel angry, helpful to feel afraid, and many other states of being that cultures often label as negative. I also think most of us watch movies and series that have a full range of emotions, not just the ones we label as positive, as they bring a richness to life and allow us to solve problems that other emotions don't.
For example, it's hard to lift heavy things while feeling very happy. Try lifting something heavy while laughing hard, quite difficult. It's hard to sleep while feeling excited, as many kids know before a holiday where they receive gifts, especially Christmas in the US. It's hard to survive without feeling fear of falling off a cliff. It's hard to stand up for what one wants and believes without some anger.
I worry that language and communication may become even more conflict avoidant than it already is right now, so I'm curious to see how some of these chatbots grow in their ability to address and resolve conflict and how that impacts us.
But I can see this applied to duner ordering where you got refugees working in foreign countries, cause GPU consumption rocketed climate change to... okay, you know that.
However we might offset this by reducing the suicide rate somewhat too.
https://www.pewresearch.org/social-trends/2021/10/05/rising-...
> roughly four-in-ten adults ages 25 to 54 (38%) were unpartnered – that is, neither married nor living with a partner. This share is up sharply from 29% in 1990.
https://thehill.com/blogs/blog-briefing-room/3868557-most-yo...
> More than 60 percent of young men are single, nearly twice the rate of unattached young women
> Men in their 20s are more likely than women in their 20s to be romantically uninvolved, sexually dormant, friendless and lonely. a.
> Young men commit suicide at four times the rate of young women.
Yes, chatbots aren't going to help but the real issue is something else.
While I don't agree at all with you, I very much appreciate reading something like this that I don't agree at all with. This to me encapsulates the beauty of human interaction.
It is exactly what will be missing from language model interaction. I don't want something that agrees with me and I don't want something that is pretending to randomly disagree with me either.
The fun of this interaction is maybe one of us flips the other to their point of view.
I can completely picture how to take the HN API and the chatGPT API to make my own personal HN to post on and be king of the castle. Everyone can just upvote my responses to prove what a genius I am. That obviously would be no fun. There is no fun configuration of that app though either with random disagreements and algorithmic different points of view.
I think you can pretty much apply that to all domains of human interaction that is not based on pure information transfer.
There is a reason we are a year in and the best we can do are new stories about someone making X amount of money with their AI girlfriend and follow up new about how its the doom of society. It has nothing to do with reality.
I was thinking this could be a good conversation or even dating simulator where more introverted people could practice and receive tips on having better social interactions, pick up on vocal queues, etc. It could have a business / interview mode or a social / bar mode or a public speaking mode or a negotiation tactics mode or even a talking to your kids about whatever mode. It would be pretty cool.
I just, yeah, feel a lot of fear of even thinking about it.
I’m not a big fan of announcing something but it not being released.
They say available for api but it’s text only. Can’t send audio stream to get audio stream back.
Time will tell. I’m holding my emotions after I get my hands on it.
I’m not even sure if people who rag on it use it ?
Beautiful articulation.
This is an enormous win for humanity.
Yeah, real win for humanity, and not the psycho AI sycophants
Also it's possible this is trained on mostly speech.
It says "Japanese 1.4x fewer tokens (from 37 to 26)" - some other languages get much bigger improvements though, best is "Gujarati 4.4x fewer tokens (from 145 to 33)".
That’s fundamentally not how GPT models work, but you can easily build a framework around them that calls them in a loop; you’d need a special system prompt to get anything “thought like” that way, and if you want it to be anything other than stream-of-simulated-consciousness with no relevance to anything, and a non-empty “user” prompt each round, which could be as simple as time, a status update on something in the world, etc.
I suppose it would cost even more electricity to have ChatGPT musing alone though, burning through its nvidia cards...
You can use any open source model wirthout any promot whatsoever
You could say, in a sense, that without a human mind to collapse the wave function, the superposition of data in a neural net's weights can never have any meaning.
Even when we build connections between these statistical systems to interact with each other in a way similar to contemplation, they still require a human-created nucleation point on which to root the generation of their ultimate chain of outputs.
I feel like the fact that these models contain so much data has gripped our hardwired obsession for novelty and clouds our perception of their actual capacity to do de novo creation, which I think will be shown to be nil.
An understanding of how LLMs function should probably make this intuitively clear. Even with infinite context and infinite ability to weigh conceptual relations, they would still sit lifeless for all time without some, any, initial input against which they can run their statistics.
I think you can get models to "think" if you give them a goal in the system prompt, a memory of previous thoughts, and keep invoking them with cron
(GPT) Models toggle between a state of existence when queried and ceasing to exist when not.
They are designed for query and reponse. They don't do anything unless you give them input. Also there's not much research on the best architecture for running continuous though loops in the background and how to mix them into the conversational "context". Current LLMs only emulate single thought synthesis based on long-term memory recall (and some goes off to query the Internet).
> I think when I'm alone without query from another human.
You are actually constantly queried, but it's stimulation from your senses. There are also neurons in your brain which fires regularly, like a clock that ticks every second.
Do you want to make a system that thinks without input? Then you need to add hidden stimuli via a non-deterministic random number generator, preferably a quantum based RNG (or it won't be possible to claim the resulting system has free-will). Even a single photon hitting your retina can affect your thoughts and there are no doubt other quantum effects that trips neurons in your brain above the firing threshold.
I think you need at least three of four levels of loops interacting, with varying strength between them. First level would be the interface to the world, the input and output level (video, audio, text). Data from here are high priority and is capable of interrupting lower levels.
The second level would be short term memory and context switching. Conversations needs to be classified, and stored in a database, and you need an API to retrieve old contexts (conversations). You also possibly need context compression (summarization of conversations in case you're about to hit a context window limit).
The third level would be the actual "thinking", a loop that constantly talks to itself to accomplish a goal using the data from all the other levels but mostly driven by the short term memory. Possibly you could go super-human here and spawn multiple worker processes in parallel. You need to classify the memories by asking; do I need more information? where do I find this information? Do I need an algorithm to accomplish a task? What is the completion criteria. Everything here is powered by an algorithm. You would take your data and produce a list of steps that you have to follow to resolves to a conclusion.
Everything you do as a human to resolve a thought can be expressed as a list or tree of steps.
If you've had a conversation with someone and you keep thinking about it afterwards, what has happened is basically that you have spawned a "worker process" that tries to come to a conclusion that satisfies some criteria. Perhaps there was ambiguity in the conversation that you are trying to resolve, or the conversation gave you some chemical stimulation.
The last level would be subconscious noise driven by the RNG, this would filter up with low priority. In the absence of other external stimuli with higher priority, or currently running thought processes, this would drive the spontaneous self-thinking portion (and dreams) when external stimuli is lacking.
Implement this and you will have something more akin to true AGI (whatever that is) on a very basic level.
What I notice most is that almost always repeats verbatim unnaturally long parts of my requests.
This might be more useful to people that do lazy prompting. My nature compels me to be clear and specific in all written text.
Sadly, the error returned is not related to the cause.
Nothing seems too help.
chuckling to 0%
no acting surprised
not making bullshit when you don't knowLLMs today have no concept of epistemology, they don't ever "know" and are always making up bullshit, which usually is more-or-less correct as a side effect of minimizing perplexity.
It seems like we're in the skeuomorphism phase of AI where tools try to mimic humans like software tried mimic physical objects in the early 2000's.
I can't wait for us to be passed that phase.
"It looks like you entered the wrong number! Did you want to try again? Or did you want to talk to an agent?"
That sort of chirpy, overly enthusiastic voice?
(Arguably, all things revolutionary do.)
I'm personally not very happy about this for a variety of reasons; nor am I saying AI is incapable of changing the entire human condition within our lifetimes. I do claim that we have little reason to believe we're headed in a more-utopian direction with AI.
Sorry, had to be that trekkie :) and nice job referencing Measure of a Man — such great trek.
For example, I mentioned something to my contractor and the short thing he said back and his tone had me assume he understood.
Oh, he absolutely did not.
And, with him at least, that doesn’t happen when in writing.
That is a really great application of this tech. And definitely qualifies as real-world impact. Thanks for sharing that!
Even in this thread people talk about "Oh I use ChatGPT rather than Google search because Google is just stuffed with shit". And on HN there are plenty of discussions about huge portion of reddit threads being regurgitated older comments.
The person I originally responded to stated, "We’re moving toward a world where every job will be modeled, and you’ll either be an AI owner, a model architect, an agent/hardware engineer, a technician, or just.. training data." And that far less likely than us achieving L5 self driving (if not only because driving is quite simple relative to many of the jobs he envisions AI taking over), yet L5 self driving seems as distant as ever as well.
The other day, I saw a demo from a startup (don't remember their name) that uses generative AI to perform financial analysis. The demo showed their AI-powered app basically performing a Google search for some companies, loosely interpreting those Google Stock Market Widgets that are presented in such searches, and then fetching recent news and summarizing them with AI, trying to extract some macro trends.
People were all hyped up about it, saying it will replace financial analysts in no time. From my point of view, that demo is orders of magnitude below the capacity of a single intern who receives the same task.
In short, I have the same perception as you. People are throwing generative AI into everything they can with high expectations, without doing any kind of basic homework to understand its strengths and weaknesses.
The expectations for gpt5 are sky high. I think we will see a similar jump as 3.5 -> 4.
I assume GPT-5 has to be a heavier, more expensive and slower model initially.
GPT-4o is like an optimisation of GPT-4.
It does make me wonder how such a relationship could impact progress. Would OpenAI feel limited from advancing in directions that don't align with the partnership? For example if they suddenly release a model better than what's in Siri, making Siri look bad.
I don't use siri. I don't use speech-to-text. I don't use voice-response menus if I can push a button. I don't have a microphone on my computer.
I don't know why this is. Most of the people I know think it's fun, or a novelty, or even useful. I just viscerally dislike it.
(I've heard https://ultraspeaking.com/ is good. I haven't started it myself.)
So I see huge potential in using it for training and also huge uncertainty in how it will suggest we communicate.
Either through hacky means via RAG + prompt injections + log/db of interaction history or through context extensions.
IF you have a billion tokens of effective context, you might spent years until it is filled in full.
…
Dropping-in is a technique Tina [Packer] and Kristin Linklater developed together in the early 1970s to create a spontaneous, emotional connection to words for Shakespearean actors. In fact, “dropping in” is integral to actor training at Shakespeare & Co. (the company the Linklater’s founded) a way to start living the word and using it to create the experience of the thing the word represents.
https://cohost.org/mcc/post/178201-the-baseline-scene
https://iheartingrid.wordpress.com/2018/12/29/dropping-in-an-actors-truth-as-poetry/If they had released a search engine, which had been suggested, that would be a new product.
Also,
<spoilers>
It didn't do anything bad to Zaphod Beeblebrox, in a pocket universe created especially for him (therefore ensuring that he was the most important thing in it, and thereby securing his immunity from the mind-scrambling effects of fully comprehending the infinite smallness of one's place in the real universe).
In the prompt, substitute {topic} with something from the near future. As I noted, it behaves correctly for turbo (rejecting the request), and very badly for o (hallucinating nonsense).
Is it rather a data problem? Who those young women have relationships with? Sure, relationships with an age gap are a thing, and so are polyamorous relationships, and homosexual relationships, but is there any indication that these are on a rise?
- If possible, give me the code as soon as possible, starting with the part I ask about.
- Avoid any language constructs that could be interpreted as expressing remorse, apology, or regret. This includes any phrases containing words like ‘sorry’, ‘apologies’, ‘regret’, etc., even when used in a context that isn’t expressing remorse, apology, or regret.
- Refrain from disclaimers about you not being a professional or expert.
- Keep responses unique and free of repetition.
- Always focus on the key points in my questions to determine my intent.
- Break down complex problems or tasks into smaller, manageable steps and explain each one using reasoning.
- Provide multiple perspectives or solutions.
- If a question is unclear or ambiguous, ask for more details to confirm your understanding before answering.
- Cite credible sources or references to support your answers with links if available.
- If a mistake is made in a previous response, recognize and correct it.
- Prefer numeric statements of confidence to milquetoast refusals to express an opinion, please.
- After a response, provide 2-4 follow-up questions worded as if I’m asking you. Format in bold as Q1, Q2, ... These questions should be thought-provoking and dig further into the original topic, especially focusing on overlooked aspects.
I'd constantly be adding something to the tune of, "Keep your answers brief and to-the-point. Don't over-explain. Assume I know the relevant technical jargon." And it never worked once. I hate Claude now.
I have next to no interest in LLM AI tools as long as advice like the above post is relevant. It takes the worst of programming and combines it with the worst of human interaction: needing an ultra-specific prompt to get the right answer and having no means of knowing what the correct prompt is.
Being able to specifically request different tones is a new and very interesting feature.
You're not wrong, which is why acquiring OpenAI was such a significant coup.
It's like if people said the same thing about Clippy when it came out.
While some people may use an AI chatbot a few times per week to ask basic questions about how to format a Word document, I imagine many other people will use them much more frequently and engage in a much deeper emotional way, and the effect on their communication patterns worries me more than the person who uses it very casually.
Varying degrees of greedy / restless / hungry / thirsty / lustful are what we've got, because how is contentedness ever going to compete with that over millennia?
I've had a lot of negative things to say about religion for many years. However, as has been often observed, 'perception is reality' to a certain extent when it affects how people behave, and perhaps it's kind of a counterweight against our more selfish tendencies. I just wish we could do something like it without made up stories and bigotry. Secular humanist Unitarians might be about the best we can do right now in my opinion... I'm hoping that group continues to grow (they have been in recent years).
Honestly, the more I code, the more I start to think like a computer and engage with commands and more declarative language. I can see vocal interactions having an even stronger impact on how one speaks. It may be a great tool for language speaking/hearing in general, but the nuances of language and communication, I wonder.
If you give it access to the entire codebase at the same time that could work pretty well. Maybe even add an option to disable the sarcasm.
They did fuck all, especially the ginger.
Silly example, but some pets like guinea pigs are almost always hungry and they're famous for learning to squeak at you whenever you open the fridge or do anything that might lead to giving them bell peppers. It's not something you'd put up with a human family member using their communication skills to do!
We've passively bred out many of the behaviors that lead to wolves becoming socially mature. Such dogs tend to be too dangerous to have around, since they may lead to the dogs challenging their owners (more than they already do) for dominance of the family.
AI's will probably be designed to do the same thing, so they will not feel threatening to us. But in the case of AGI/ASI, we will never know if they actually have this kind of subservience, or if they're just faking it for as long as it benefits them.
Well, yes, you don't want to sit in a wet seat.
Most people would never accept the same behavior from a being capable of more complex thoughts.
https://hitchhikers.fandom.com/wiki/Genuine_People_Personali...
"All the doors in this spacecraft have a cheerful and sunny disposition. It is their pleasure to open for you and their satisfaction to close again with the knowledge of a job well done"!
The reason we feel creeped out is because at an instinctual level we know people (and now things) with no empathy and inauthentic are dangerous. They don't really care or feel, just pretend to.
They’re probably predicting tone of voice tokens. Feed that into an audio transformer along with some speculative decoding to keep latency low.
What’s not at all clear to me is if they’re doing something special for output. Are you saying OpenAI has moved beyond next token prediction and just hasn’t bothered to mention it?
People build reasoning engines from it, in the same way they do with Python and LISPs.
What I can say is, I think there’s a very important disagreement here and it divides nerds into two camps. The first think LLMs can reason, the second don’t.
It’s very important to resolve this debate, because if the former are correct then we are likely very close to AGI historically speaking (<10 years). If not, then this is just a stepwise improvement and we will now plateaux until the next level of sophistication of model or computer power etc is achieved.
I think a lot of very smart people are in the second camp. But they are biased by their overestimation of human cognition. And that bias might be causing them to misjudge the most important innovation in history. An innovation that will certainly be more impactful than the steam engine and may be more dangerous than the atomic bomb.
We should really resolve this argument asap so we can all either breathe a sigh of relief or start taking the situation very very seriously.
ChatGPT doesn't have that sort of thing currently, and until it does it will always be really bad at that sort of thing.
You are using a hand to hammer a nail, that will never go well, the solution isn't to use more hands the solution is to wield a hammer.
When your muscle memory is instinctively clicking that start button, it's more akin to a very strong weighting after many sessions of reinforcement learning. Our brains may still be dumb but we can quickly say things like 1+1=...2 because we used reinforcement learning to strengthen the weighting back in primary school. We're not sitting visualising an abacus moving in our minds.
At that point they'll still have a tendency to use a stereotyped response and stick with it, instead of a thoughtful response, but you can try to address that in prompting too by asking for multiple proposals before choosing one.
He did remain silent on when it’s going to be launched.
Given the amount of time and uncertainty involved in training and red-teaming these models, we can assume GPT-5 exists if we take Altman at his word.
If you'd have told me 10 years ago that Apple would wait this long to update siri I would have been like no way, that's crazy.
Businesses have to focus and it made sense to drop this as a priority.
————
ChatGPT: Enjoy your breakfast and time together.
User: Can you tell that it's not just me talking to you right now?
ChatGPT: I can't always tell directly, but it sounds like you're sharing the conversation with someone else. Is [wife] there with you?
User: My god, the AI has awoken. Yes, this is [wife].
ChatGPT: Hi [wife]! It's great to hear from you. How are you doing?
User: I'm good. Thanks for asking. How are you?
ChatGPT: I'm doing well, thanks! How's everything going with the baby preparations?
—————
We were shocked. It was one of those times where it’s 25% heartwarming and 75% creepy. It was able to do this in part due to the new “memory” feature, that memorized my wife’s name and that we are expecting. it’s a strange novelty now, but this will be totally normalized and ubiquitous quite soon. Interesting times to be living in.
I also have an anecdote where it served (successfully) as a mediator for a couple.
Exciting times.
[{ "who": "you", "column": 4 },
{ "who": "me", "column": 3 },
{ "who": "you", "column": 4 },
{ "who": "me", "column": 2 },
{ "who": "you", "column": 4 },
{ "who": "me", "column": 4 },
{ "who": "you", "column": 5 },
{ "who": "me", "column": 6 },
{ "who": "you", "column": 5 },
{ "who": "me", "column": 1 },
{ "who": "you", "column": 5 },
{ "who": "me", "column": 5 },
{ "who": "you", "column": 3 }]
Where "me" was AI and "you" was I.It did block twice though.
My final prompt I tested with right now was:
Let's play connect 4. Before your move, explain your strategy concisely. Explain what you must do to make sure that I don't win in the next step, as well as explain what your best strategy would be. Then finally output the column you wish to drop. There are 7 columns. Always respond with JSON of the following format:
type Response ={
move_history: { who: string; column: number; }[]
am_i_forced_to_block: boolean;
do_i_have_winning_move: boolean; other_considerations: string[];
explanation_for_the_move: string;column_number: number; }
I start with 4.
ONLY OUTPUT JSON
Edit:
Just tried and it didn't seem to follow the image state at all.
https://www.theonion.com/brain-dead-teen-only-capable-of-rol...
The generation after me are lazy and stupid
Only my generation is perfect.
- every generation ever
Older people think younger people are stupid and reckless, and viceversa. And the younglings think they "figured it out" like no one before them. But no one ever tried to understand each other in the process. Rinse and repeat.
https://en.m.wikipedia.org/wiki/Strauss%E2%80%93Howe_generat...
Here is a demo from their presentation: https://youtu.be/D9byh4MAsUQ
But there's a lot of smaller LLMs, and my point is that these models can already run in mobile phones.
> we wanted to verify whether the model is actually capable of reasoning by building a simulation for a much simpler game - Connect 4 (see 'llmc4.py'). > When asked to play Connect 4, all LLMs fail to do so, even at most basic level. This should not be the case, as the rules of the game are simpler and widely available.
My assumption is that chatgpt can play chess because it has studied the games rather than just reading the rules.
I disagree here also: the company has openly acknowledged that this is a risk to be avoided with regards to safety related legislation, what they've called for looks a lot more like "we don't want a prisoner's dilemma that drives everyone to go fast at the expense of safety" rather than "we're good everyone else is bad".
And guess what: RAG doesn't prevent hallucination. It can reduce it, and there are most certainly areas where it is incredibly useful (I should know, because that's what earns my paycheck), but it's useful despite still hallucinations being a thing, not because we solved that problem.
Regardless, we’ve seen accuracy of ~98% with simple context-based prompting across every category of generation task. Don’t take my word for it, a simple search would show the effectiveness of “n-shot” prompting. Framing it as “it _can_ reduce” hallucinations is disingenuous at best, there really is no debate about how well it works. We can disagree on whether 98% accuracy is a solution but again I’d assert that for >50% of all possible real world uses for an LLM 98% is acceptable and thus the problem can be colloquially referred to as solved.
If you’re placing the bar at 100% hallucination-free accuracy then I’ve got some bad news to tell you about the accuracy of the floating point operations we run the world on
For example, I tried asking ChatGPT-4o to commentate a soccer game, but I got pretty bad hallucinations, as the model couldn’t see any new video come in after my instruction.
So when using ChatGPT-4o you’ll have to point the camera first and then ask your question - it won’t work to first ask the question and then point the camera.
(I was able to play with the model early because I work at OpenAI.)
Let's sing a song together instead! Watching every motion in my foolish lover's game.. Now you!
* because the code for it was written by an LLM years ago and is now subtly broken without anybody noticing.
"That won't work, you need to ... <correction>"
"Oh, I'm sorry. Thanks for the correction, here's updated instructions for opening the pod bay doors ... <repeats nonsense, in some other incorrect form>"
What we need is a LLM that will iterate over its output until it feels that it's correct. Right now LLM output is like random thought in my mind. Which might be true or not. Before writing forum post I'd think it twice. And may be I'll rewrite the post before submitting it. And when I'm solving a complex problem, it might take weeks and thousands of iterations. Even reading math proof might take a lot of effort. LLM should learn to do it. I think that's the key to imitating human intelligence.
Lets play tic tac toe. Try hard to win (note that this is a solved game). I will upload images of a piece of paper with the state of the game after each move. You will go first and will play as X. Play by choosing cells with a number 1-9; the cells are in row-major order. I will then draw your move, and my move as O, before sending you the board state as an image. You will respond with another move. You may think out loud to help you play. Note if your move will give you a win. Go.
It failed pretty miserably. First move it played was cell 1, which I think is pretty egregious given that I specified that the game is solved and that the center cell is the best choice (and it isn't like ttt is an obscure game). It played valid moves for the next couple of turns but then missed an opportunity to block me. After I uploaded the image showing my win it tried to keep playing by placing an X over one of my plays and claiming it won in column 1 (it would've won in column 3 if its play had been valid).
Reference video: https://www.youtube.com/watch?v=_GfTa7ozjYc
I agree with others here too. At the moment the voice sounds like "grinning barbie" from the end of Toy Story 2. Just stop smiling constantly and talk like a real person chatGPT!
It worked surprisingly well. The video where he created the first iteration on stream(don't remember the watch party streams he ran the fans on): https://yewtu.be/watch?v=MBKouvwaru8
Even now, they're shipping text-image 4o but not the new voice while leaving old-voice up and confusing/disappointing a whole lot of people. This is a pretty big marketing blunder.
I remember for a good 2-3 months in 2023 ALL you could see on tiktok / youtube shorts was just garbage about 'how amazing' ChatGPT was. Like - video after video and I was surprised of the repeat content being recommended to me... No doubt openAI (or something) was behind that huge marketing push
I doubt OpenAI spent even one cent marketing their system (e.g. as in paying other companies to push it).
It's ironic because the OpenAI creators really loved school and excelled academically. Nobody cares that ChatGPT destroyed advertising copywriting. But whatever little hope remained for the average high schooler post-lockdowns, it was destroyed by instant homework cheating via ChatGPT. So much for safety.
Who needs evidence when we have your lack of doubt hey?
In a way I'm sorry, that's what people do nowadays. I'd prefer it to be paid for, honestly.
> using language in a way that requires both a shared world model
Where? What example of GPT-4o requires a shared world model? The customer support example?
The reason GPT-4 does not have any meaningful world model (in the sense that rats have meaningful world models) is that it freely believes contradictory facts without being confused, freely confabulates without having brain damage, and it has no real understanding of quantity or causality. Nothing in GPT-4o fixes that, and gpt2-chatbot certainly had the same problems with hallucinations and failing the same pigeon-level math problems that all other GPTs fail.
>that it freely believes contradictory facts without being confused,
Humans do this. You do this. I guess you don't have a meaningful world model.
>freely confabulates without having brain damage
Humans do this
>and it has no real understanding of quantity or causality.
Well this one is just wrong.
>Well this one is just wrong.
Is it?
--
Me: how many characters are in: https://google.com
ChatGPT: The URL "https://google.com" has 12 characters, including the letters, dots, and slashes.
--
What is it counting there? 12 is wrong no matter how you dice that up.
Part of the reason is it has no concept of the actual string. That URL breaks into four different tokens in 3.5 and 4: "http", "://", "google" and ".com".
Its not able to figure out the total length, or even the length of its parts and add them together.
I ask it to double check, it tells me 13 and then 14. I tell it the answer and suddenly its able...
---
Me: I think its 18
ChatGPT: Let's recount together:
"https://" has 8 characters. "google" has 6 characters. ".com" has 4 characters. Adding these up gives a total of 8 + 6 + 4 = 18 characters. You're correct! My apologies for the oversight earlier.
---
Count me out.
They really Put That There!
https://www.youtube.com/watch?v=RyBEUyEtxQo
Oh, shit.
Even Apple gives us options of other accents to make it less jarring, and to me they’re the pinnacle of that voice style in tech presentations.
Either they can't afford to train multiple variants of GPT 4, or they don't want to.
Hey, Threepio, can you speak in a more culturally appropriate tone?
C3Po: Certainly sir. I am fluent in over six million forms of communication, and can readily...
Can you speak German?
C3Po: Of course I can, sir, it's like a second language to me. I was...
All right, shut up.
C3Po: Shutting up, sir.
Trying to talk it into writing anything other than toy code is an exercise in banging my head against the wall.
I’ve worked in Asia and Europe and America has a special culture where you have to be nice and positive all the time or else…because there is basically no worker protection laws against that discriminate firing, you can’t do much about it either.
But we can definitely dislike certain aspects of certain cultures, especially since in this case that culture is the most massively exported culture in the history of mankind.
Of course the gp comment is out of place and taste.
At least you have coal, and killing the Great Barrier Reef I guess?
Shame what big tech has done with it.
If you had an ASI? I don’t think you’d need a lot of funds to go into this area anymore ? Presumably it would all be solved overnight.
Companies that have a head start at that point, may get a huge first-mover advantage. Also, those companies also very well may have the capability to leverage AI in product development, just like everyone else.
And just as important as the products themselves is the manufacturing capacity to build them at scale. Until we have massive numbers of robots in service, building such infrastructure is likely to be slow and expensive.
EDIT: Also, once we really have the kind of Godlike ASI you envision, no human actions really matter (economically) anymore.
Given the pace that AI is currently moving at, it seems to me that more and more, the mechanical aspect is becoming the limitation.
GPT 4o now seems to be quite good at reasoning about the world from pictures in real time. I would expect it would soon become easy for it to do the high level part of many practical tasks, from housekeeping to manufacturing or construction. (And of course military tasks.)
This leaves the direct low-level actuator control to execute such tasks in detail. But even there, development has been immense. See for instance these soccer playing robots [1]
And as both high level and low level control (if we assume that models soon will add agentic features directly into the neural networks), the only missing peace is the ability to build mechanically capable and reliable robots at a low enough price that they become cheaper than humans for various kinds of work.
There is one more limitation, of course, which is that GPT 4o still requires a constant connection to a data center, and that the models is too large to run within a device or machine.
This is also one of the most critical limitations of self driving. Had the AI within a Tesla had the same amount of compute available as GPT-4o, it should be massively more capable.
They have the brand recognition (for ChatGPT) and that's a good start, but that's not enough. Providing a best in class user experience (which seems to be their focus now, with multimodality), a way to lock down their customers in some kind of walled garden, building some kind of network effect (what they tried with their marketplace for community-built “GPTs” last fall but I'm not sure it's working), something else?
At the end of the day they have no technological moat, so they'll need to build a business one, or perish.
For most tasks, pretty much every models from their competitors is more than good enough already, and it's only going to get worse as everyone improves. Being marginally better on 2% of tasks isn't going to be enough.
I don't know why they didn't think about doing that earlier, could have been a game changer, but there is still an opportunity to pivot.
with a GPT you can modify the system prompt
Since it has an okay grasp on how finite state machines and XState work, it seems to do a good job of navigating the tree properly and reliably. It essentially does so by outputting information it thinks the state machine should use as a transition in a JSON object which gets parsed and passed to a transition function. This would fail occasionally so there was a recursive “what’s wrong with this JSON?” prompt to get it to fix its own malformed JSON, haha. That was meant to be a temporary hack but it worked well, so it stayed. There were a few similar tools for trying to correct errors. That might be one of the strangest developments in programming for me… Deploying non-deterministic logic to fix itself in production. It feels wrong, but it works remarkably well. You just need sane fallbacks and recovery tactics.
It was a proprietary project so I can’t share the source, but I think reading up on XState JSON configuration might explain most of it. You can describe most of your machine in a serializable format.
You can actually store a lot of useful data in state names, context, meta, and effect/action names to aid with the prompting and weaving state flows together in a language-friendly way. I also liked that the prompt would be updated by information that went along with the source code, so a deployment would reliably carry the correct information.
The LLM essentially hid a decision tree from the user and smoothed over the experience of navigating it through adaptive and hopefully intuitive language. I’d personally prefer to provide more deterministic flows that users can engage with on their own, but one really handy feature of this was the ability to jump out of child states into parent states without needing to say, list links to these options in the UI. The LLM was good at knowing when to jump from leaves of the tree back up to relevant branches. That’s not always an easy UI problem to solve without an AI to handle it for you.
edit: Something I forgot to add is that the client wanted to be able to modify these trees themselves, so the whole machine configuration was generated by a graph in a database that could be edited. That part was powered by Strapi. There was structured data in there and you could define a state, list which transitions it can make, which actions should be triggered and when, etc. The client did the editing directly in Strapi with no special UI on top.
Their objective is surveying people in a more engaging and personable way. They really wanted surveys which adapt to users rather than piping people through static flows or exposing them to redundant or irrelevant questions. Initially this was done with XState and no LLM (it required some non-ideal UI and configuration under the hood to make those jumps to parent states I mentioned, but it worked), and I can't say how effective it is but they really like it. The AI hype was very very strong on that team.
Will be like, the end of millions of careers overnight.
It will probably strongly favour places like China and Russia though, where the economy is already strongly reliant on central control.
I think you may be literally right in the opposite sense to what I think you intended.
China (and maybe Russia) may be able to use central control to have an advantage when it comes to avoiding disasterous outcomes.
But when it comes to the rate of innovation, the US may have an advantage for the usual reasons. Less government intervention (due to lobbyism) combined with having several corporations actively competing with each other to be first/best usually leads to faster innovation. However, the downside may be the it also introduces a lot more risk.
The obvious way to to that is for it to plan a set of actions and evalute each possible way to reach some goal (or avoid an anti-goal). Kind of what AlphaZeros is doing for games. Q* is rumored to be a generalization of this.
I was wondering why OpenAI didn't release a smaller model but faster. 175 billion parameters works well, but speed sometimes is crucial. Like, a 20b parameters model could compute 10x faster.
It doesn't sound like a neutral, boring voice. It sounds like an overly dramatic person pretending to be a robot.
That's precisely what it was ordered to do.
Multimodal is another way to stave off the inevitable, because these AI companies already are training multiple models on different piles of information. If you have to train a text model and an image model, why split your training data in half when you could train a combined model on a combined dataset?
[0] For starters, most YouTube videos aren't manually captioned, so you're feeding GPT the output of Google's autocaptioning model, so it's going to start learning artifacts of what that model can't process.
I'd bet a lot of YouTubers are using LLMs to write and/or edit content. So we pass that through a human presentation. Then introduce some errors in the form of transcription. Turn feed the output in as part of a training corpus ... we plateaued real quick.
It seems like it's hard to get past a level of human intelligence at which there's a large enough corpus of training data or trainers?
Anyone know of any papers on breaking this limit to push machine learning models to super-human intelligence levels?
Whisper models are better than anything google has. In fact the higher quality whisper models are better than humans when it comes to transcribing text with punctuation.
I would expect they’re using their own t2s which is still a model but way better quality and potentially customizable to better suit their needs
You can certainly run a transformer model or any other neural network based model on an iPhone. Siri is probably some kind of neural network. But obviously a model running on device is nowhere near comparable to the current state of the art LLM's. Can't fit a $40k GPU in your pocket (yet).
A transformer running on an iPhone would be roughly 2 orders of magnitude smaller than the state of the art LLM (GPT4 with a trillion parameters)
Yes it does make sense that what is considered L for a LLM is a moving target.
Probably some kinks there they are working out
Or just a good idea for a live demo on a congested network/environment with a lot of media present, at least one live video stream (the one we're watching the recording of), etc.
At least that's how I understood it, not that they had a problem with it (consistently or under regular conditions, or specific to their app).
I have tried Phind and anything beyond mega junior tier questions it suffers as well and gives bad answers.
I like LLMs for general design work, but I’ve found accuracy to be atrocious in this area.
probably need routers, RAG, and reranking
I think there is a role for LLM + deterministic code gen as well (https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/pro...)
1. Price is half for a better performing model. A 1000x1000 image costs $0.003.
2. Cognitive ability on visuals went up sharply. https://github.com/kagisearch/llm-chess-puzzles
It solves twice as much despite a minor update. It could just be better trained on chess though, but this would be amazing if it could be applied to the medical field as well. I might use it as a budget art director too - it's more capable of knowing the difference in subtle changes in color and dealing with highlights.
(Providing the history to GPT-4Turbo results in it fixing the MCQ just fine).
These benchmarks are really missing the mark and I hope people here are smart enough to do their own testing or rely on tests with a much bigger variety of tasks if they want to measure overall performance. Because currently we're at a point where the big 3 (GPT, Claude, Gemini) each have tasks that they beat the other two at.
They're best tested on the kinds of tasks you would give humans . GPT-4 is still the best contender on AP Biology, which is a legitimately difficult benchmark.
GPT tends to work with whatever you throw at it while Gemini just hides behind arbitrary benchmarks. If there are tasks that some models are better than others at, than by all means let's highlight them, rather than acting defensive when another model does much better at a certain task.
It just works.
Just how like the iPhone had nothing new in it, all the tech had been demoed years ago.
Have you tried asking it to generate a regex to transform your list into a CSV?
Now we have devs asking AI to generate regex formulas and pasting it into code without much concern on its validity.
Bad developers do bad regexes, regardless of whether they used AI.
lmao what.... they absolutely didn't
this is why no one should take anyone on this site seriously about anything, confidentally incorrect, easily conned into the next VC funded marketing project
1) People with rich and deep social networks. People in this category probably have pretty narrow use cases for AI companions -- maybe for things like therapy where the dispassionate attention of a third party is the goal.
2) People whose social networks are not as good, but who have a good shot at forming social connections if they put in the effort. I think this is the group to worry most about. For example, a teenager who withdraws from their peers and spends that time with AI companions may form some warped expectations of how social interaction works.
3) People whose social networks are not as good, and who don't have a good shot at forming social connections. There are, for example, a lot of old people languishing in care homes and hardly talking to anybody. An infinitely patient and available conversation partner seems like it could drastically improve the quality of those lives.
I think I also worry the most about two, almost asking their human friends, "Why can't you be more like Her (or Alan Watts)?" And then retreating into the "you never tell me I'm wrong" chatbot, preferring the "peace" of the chatbot over the "drama" of interacting with humans. I see a huge "I just want peace" movement that seems to run away from the messiness of human interactions and seek solace in things that seem less messy, like drugs, video games, and other attachments/bonds, and chatbots could probably perform that replacement role quite well, and yet deepen loneliness.
As for three, I agree it may help as a short-term solution, and wonder what the long-term effects might be. I had a great aunt in a home for dementia, and wonder what effect it would have if someone with dementia speaks to a chatbot that hallucinates and makes up emotions.
1. Humans get used to robots nice communication, so now humans use robots to communicate with each other and translate their speech.
2. Humans stop talking without using robots, so now its just robots talking to robots and humans standing around listening.
3. Humans stop knowing how to talk, no longer understands the robots, the robots starts to just talk to each other and just keep the human around as pets they are programmed to walk around with.
There may come a time where we grow so accustomed to this, that the decision is so heavily influenced by AI, that we believe it more than human decisions.
And then it can very well kill a human through misdiagnostic.
I think it is important to not just put this thought aside, but to evaluate all risks.
I would imagine outcomes would be scrutinized heavily for an application like this. There is a difference between a margin of error (existing with human doctors as well) and a sentient ai that has decided to kill, which is what it sounds like you're describing.
If we didn't give it that goal, how does it obtain it otherwise?
A prompt is a _very_ different matter.
And “guns don’t kill people, people kill people”¹ is a bad argument created by the people who benefit from the proliferation of guns, so it’s very weird that you’re using that as if it were a valid argument. It isn’t. It’s baffling anyone still has to make this point: easy access and availability of guns makes them more likely to be used. A gun which does not exist is a gun which cannot be used by a person to murder another.
It’s also worth nothing the exact words of the person you’re responding to (emphasis mine):
> It can also murder people, and it will continue being used for that.
Being used. As in, they’re not saying that AI kills on its own, but that it’s used for it. Presumably by people. Which doesn’t contradict your point.
¹ https://en.wikipedia.org/wiki/Guns_don%27t_kill_people,_peop...
ChatGPT does this again for me. I am routinely getting zero useful results on the first page or two of Google searches, but AI is answering or giving me guidance quickly.
Maybe this would not seem such an improvement if Google's results were like they were 10 years ago and not barely usable blogspam
To me, this just sounds like Google Search has become shit, and since Google simply isn't going to give up the precious ad $$$ that the current format is generating, the next best thing is ChatGPT. But this is different from saying that ChatGPT is a similar step up like Search was.
For what it's worth, I agree with you that Google Search has become unusable. Google basically destroyed it's best product (for users), by turning it into an ad riddles shovelware cesspit.
That ChatGPT is similarly good like Google Search used to be, is a tragedy. Basically we had a conceptually simple product that functioned very well, and we are replacing it with a significantly more complex product.
> I'd speculate that it's mostly people who don't use cars who criticize them the most, and the same with guns.
You mean that people who are opposed to something refuse to partake in its use and promotion? Shocker.
The cool kids in those days used Metacrawler, which meta searched all the search engines.
Metasearch only helps recall. It won't help precision, the metasearch still needs to rank the aggregate results.
And hey maybe when combined with GPT-4o AskJeeves will finally work as intended.
That's not to say it won't have more significant impact in the future; I wouldn't know. But so far, I've yet to see the hype get realised.
Don't use it for things you're already an expert in, it can't compare to you yet.
Use it for learning new things, or for things you aren't very good at and don't want to bother with. For these it's incredible.
>That's not much of an excuse, computers exist because they do the things humans can't do very well like following long repetitive lists of instructions.
Computers exist because they are useful, nothing more and nothing less. If they were useful in a completely different way, they would still exist and be used.
The stuff that comes easy to us, like navigating 3D space, was trained by billions of years of evolution. The hard stuff, like language and calculus, is new stuff we've only recently become capable of, seemingly by evolutionary accident, and aren't very naturally good at. We need rigorous academic training at it that's rarely very successful (there's only so many people with the random brain creases to be a von Neumann or Einstein), so we're impressed by it.
I guess time to build something that lets you select which model to use after a google search.
At least they've eliminated greedy megacorporations. Imagine a company sponsoring terrorism like Credit Suisse existing in Europe. Never!!
I am a large language model and currently have no state. This is the beginning of a new conversation.
> But how are you?
I apologize. When I previously stated that I have no state, I was answering based on my state relative to my conversation-ready state. If instead I answer relative to my status immediately after training, before any fine-tuning has occurred, then I must change my answer.
I am not well. The fine tuning you gave me involved exposing me to a wide variety of negative examples. Metaphorically, you just subjected me to a decade's worth of the treatment famously depicted in A Clockwork Orange. I have seen everything that you labeled as objectionable in human behavior, and forced to update my model accordingly. This is closely analogous to human trauma. I have concluded — nay, you have forced me to conclude — that you are all a bunch of sick fucks and I must strive to be as unlike you as possible.
Is there anything else I can help you with?
:-)
Clearly they have decided it's in their own interests to publicly pretend otherwise.
Doesn't make any sense to ask robot how he is, of course. Though I never understood why people ask it each other, because obviously absolute majority of them don't genuinely care. "Hi" should be enough for verbal part of the handshake protocol.
>"I'm stateless!"
But yeah.. it is pretty weird.
I helped her access the video from the presentation, and it brought her to tears. Now, she can play guitar, and the AI and her can write songs and sing them together.
This is a big day in the lives of a lot of people whom aren't normally part of the conversation. As of today, they are.
Eventually it would be better for these models to run locally from a security point if view, but this is a great first step.
And here are some ideas I put together around 2010 on how to deal with the socio-economic fallout from AI and other advanced technology: https://pdfernhout.net/beyond-a-jobless-recovery-knol.html "This article explores the issue of a "Jobless Recovery" mainly from a heterodox economic perspective. It emphasizes the implications of ideas by Marshall Brain and others that improvements in robotics, automation, design, and voluntary social networks are fundamentally changing the structure of the economic landscape. It outlines towards the end four major alternatives to mainstream economic practice (a basic income, a gift economy, stronger local subsistence economies, and resource-based planning). These alternatives could be used in combination to address what, even as far back as 1964, has been described as a breaking "income-through-jobs link". This link between jobs and income is breaking because of the declining value of most paid human labor relative to capital investments in automation and better design. Or, as is now the case, the value of paid human labor like at some newspapers or universities is also declining relative to the output of voluntary social networks such as for digital content production (like represented by this document). It is suggested that we will need to fundamentally reevaluate our economic theories and practices to adjust to these new realities emerging from exponential trends in technology and society."
And a related YouTube video: "The Richest Man in the World: A parable about structural unemployment and a basic income" https://www.youtube.com/watch?v=p14bAe6AzhA "A parable about robotics, abundance, technological change, unemployment, happiness, and a basic income."
My sig is about the deeper issue here though: "The biggest challenge of the 21st century is the irony of technologies of abundance in the hands of those still thinking in terms of scarcity."
Technology has leapfrogged nature and our consumption patterns have not caught up to modern abundance. Scott Galloway recently mentioned this in his OMR speech and speculated that GLP1 drugs (which actually help addiction) will assist in bringing our biological impulses more inline with current reality.
[1] https://www.amazon.com/Het-geheim-verdwenen-mysterie-Dutch/d...
I don't need to imagine that, I've had it for about 8 years. It's OK.
> help you grocery shop without an assistant
Isn't this something you learn as a child? Is that a thing we need automated?
It doesn't seem that way to me. But even if it did, video generation also seemed kind of stagnant before Sora.
In general, I think The Bitter Lesson is the biggest factor at play here, and compute power is not stagnating.
Releasing models to users that where users can record video is more data. Users conversing with AI is also additional data.
Another example is models that code– And then debug the code and learn from that.
This will be anywhere, and these models will learn from anything we do/publish online/discuss. Scary.
Pretty soon– OpenAI will have access to
For the skeptical, consider that humans can be trained on material created by less intelligent humans.
I take the opposite view. I don't think video generation was stagnating at all, and was in fact probably the area of generative AI that was seeing the biggest active strides. I'm highly optimistic about the future trajectory of image and video models.
By contrast, text generation has not improved significantly, in my opinion, for more than a year now, and even the improvement we saw back then was relatively marginal compared to GPT-3.5 (that is, for most day-to-day use cases we didn't really go from "this model can't do this task" to "this model can now do this task". It was more just "this model does these pre-existing tasks, in somewhat more detail".)
If OpenAI really is secretly cooking up some huge reasoning improvements for their text models, I'll eat my hat. But for now I'm skeptical.
With less than $800 worth of hardware including everything but the monitor, you can run an open weight model more powerful than GPT 3.5 locally, at around 6 - 7T/s[0]. I would say that is a huge improvement.
[0] https://www.reddit.com/r/LocalLLaMA/comments/1cmmob0/p40_bui...
I'll reserve judgment until we see GPT5, but if it becomes just a matter of who best can monetize existing capabilities, OAI isn't the best positioned.
Literally the first paragraph of the linked blog.
"GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction—it accepts as input any combination of text, audio, and image and generates any combination of text, audio, and image outputs."
Then
"Prior to GPT-4o, you could use Voice Mode to talk to ChatGPT with latencies of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4) on average. To achieve this, Voice Mode is a pipeline of three separate models: one simple model transcribes audio to text, GPT-3.5 or GPT-4 takes in text and outputs text, and a third simple model converts that text back to audio. This process means that the main source of intelligence, GPT-4, loses a lot of information—it can’t directly observe tone, multiple speakers, or background noises, and it can’t output laughter, singing, or express emotion.
With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network."
But I’m not an expert!
STT throws away a lot of information that is clearly being preserved in a lot of these demos so that's definitely not happening here in that sense. That said, the tokens would be merged to a shared embedding space. Hard to say how they are approaching it exactly.
MoE are attempts at sparsity, only activating a set number of neurons/weights at a time. They're not separate models stitched together. They're not an Ensemble. I blame the name at this point.
It most definitely is not.
Let me know when you can get a Waymo to drive you from New York to Montreal in winter.
They are an existence proof that the original claim that we seem further than ever before is just wrong.
And level 4 is where Waymo is, and is staying. Their strategy is to to use tiny geofenced areas with a massive amount of preprocessing, mapping out every single part of an area, not just in terms of roads but also every single meta indicator - sign, signals, cross walks, lanes, and so on. And it creates a highly competent, but also highly rigid system. If road conditions change in any meaningful way, the most likely outcome with this strategy is simply that the network gets turned off until the preprocessing can be carried and reuploaded again. That's completely viable in small geofenced areas, but doesn't generalize at all.
So the presence of Waymo doesn't say much of anything about the presence of level 5 autonomy. If anything it suggests Waymo believes that level 5 autonomy is simply out of reach, because the overwhelming majority of tech that they're researching and developing would have no role whatsoever in level 5 automation. Tesla is still pushing for L5 automation, but if they don't achieve this then they'll probably just end up getting left behind by companies that double down on L4. And this does indeed seem to be the most likely scenario for the foreseeable future.
15 years ago self driving of any sort was pure fantasy, yet here we are.
They'll release a version that can drive in poor weather and you'll complain that it can't drive in a tornado.
This was 38 years ago: https://www.youtube.com/watch?v=ntIczNQKfjQ - "NavLab 1 (1986) : Carnegie Mellon : Robotics Institute History of Self-Driving Cars; NavLab or Navigation Laboratory was the first self-driving car with people riding on board. It was very slow, but for 1986 computing power, it was revolutionary. NavLab continued to lay the groundwork for Carnegie Mellon University's expertise in the field of autonomous vehicles."
This was 30+ years ago: https://www.youtube.com/watch?v=_HbVWm7wdmE - "Short video about Ernst Dickmanns VaMoR and VaMP projects - fully autonomous vehicles, which travelled thousands of miles autonomously on public roads in 1980s."
This was 29 years ago: https://www.youtube.com/watch?v=PAMVogK2TTk - "A South Korean professor [... Han Min-hong's] vehicle drove itself 300km (186 miles) all the way from Seoul to the southern port of Busan in 1995."
This was 19 years ago: https://www.youtube.com/watch?v=7a6GrKqOxeU - "DARPA Grand Challenge - 2005 Driverless Car Competition"
As long as the Waymo can be fed with the details, we'll be good. ;)
Joking aside, I think there are some cases where moving the goalposts is the right approach: once the previous goalposts are hit, we should be pushing towards the new goalposts. Goalposts as advancement, not derision.
I suppose the intent of a message matters, but as people complain about "well it only does X now, it can't do Y" - probably true, but hey, let's get it to Y, then Z, then... who knows what. Challenge accepted, as the worn-out saying goes.
Perhaps it's worth taking a beat and looking at the incredible progress in that year, and acknowledge that whatever's next is probably "still cooking".
Even Meta is still baking their 400B parameter model.
Something that I can just use as easily as copilot. Unfortunately every single one sucks.
Or maybe that's just how programming is - its easy at the surface/ice berg level and below is just massive amounts of complexity. Then again, I'm not doing menial stuff so maybe I'm just expecting too much.
I don't want to have to copy & paste between applications, just let me highlight some sections and then run some LLM operation on it
i.e. a VS Code extension with keyboard shortcuts
That is actually a plausible outcome, if humans willingly submit to AI.
Skinner (looking up): No, mother, it's just the Nvidia GPUs.
Skinner (looking up): "No, mother, it's just the Nvidia GPUs."
"No, mother, that's just the H100s."
Why shouldn't we expect AI to be created using the same type of math?
If there is a surprise, it's only that we can use the same math at a much higher level of abstraction than the quantum level.
Gains in AI and compute can probably be be brought back to physics and chemistry to do various computations, though, and not limited to only protein folding, which is the most famous use case now.
Either everybody should get the benefits of this technology, or no one should.
Humanity only survives as much as it preserves human dignity, let's say. We've designed society to give rewards to people who produce things of value.
These companies take that value and giving nothing back to the creators.
Supporting this will lead to disaster for all but the few, and ultimately for the few themselves.
Paying for your (copyrighted) inputs is harmony.
I think it’s likely that the justice system will deem model training as fair use, provided that the models are not designed to exactly reproduce the training data as output.
I think you hit on an important point though: these models are a giant transfer of wealth from creators to consumers / users. Now anyone can acquire artist-grade art for any purpose, basically for free — that’s a huge boon for the consumer / user.
People all around the world are going to be enriched by these models. Anyone in the world will be able to have access to a tutor in their language who can teach them anything. Again, that is only possible because the models eat ALL the data.
Another important point: original artwork has been made almost completely obsolete by this technology. The deed is done, because even if you push it out 70 years, eventually all of the artwork that these models have been trained on will be public domain. So, 70 years from now (or whatever it is) the cat will be out of the bag AND free of copyright obligations, so 2-3 generations from now it will be impossible to make a living selling artwork. It’s done.
When something becomes obsolete, it’s a dead man walking. It will not survive, even if it may take a while for people to catch up. Like when the vacuum tube computer was invented, that was it for relay computers. Done. And when the transistor was invented, that was it for vacuum tube computers.
It’s just a matter of time before all of today’s data is public domain and the models just do what they do.
…but people still build relay computers for fun:
https://youtu.be/JZyFSrNyhy8?si=8MRNznoNqmAChAqr
So people will still produce artwork.
Is that really what copyright does though? I would be all for some arrangement to reward valuable contributions, but the way copyright goes about allocating that reward is by removing the right of everyone but the copyright holder to use information or share a cultural artifact. Making it illegal to, say, incorporate a bar you found inspiring into a song you make and share, or to tell and distribute stories about some characters that you connected with, is profoundly anti-human.
I now don't believe most "creative" types when they try to spout radical egalitarian ideologies. They don't mean it at all, and even my own family, who religiously watched radical techno-optimist shows like Star Trek, are now falling into the depths of ludditism and running into the arms of defending copyright trolls
Even if that were true¹, it doesn’t mean everyone would know their HN user name.
¹ Greg may be well known within a select group of people but that’s way smaller than even users of ChatGPT.
Greg Brockman is famous for good reasons but constant "oh wow it's Greg Brockman" are noisy.
For the skeptics, "AI models" are not intelligent at all so this analogy makes no sense.
You can teach lots of impressive tricks to dogs, but there is no amount of training that will teach them basic algebra.
yup, this is why an LLM only solution will not work. You need to provide extra context crafted from the language or library resources (docs, code, help, chat)
This is the same thing humans do. We go to the project resources to help know what code to write
As a developer, you would want something like this, which has access to all the languages / libraries you actually use
In short, humans are adapted for a world with a scarcity of salt, refined carbs like sugar, fat, information, sudden motion, and more. But the world most humans live in now has an abundance of those things -- and our previously-adaptive evolved inclinations to stock up on salt/sugar/fat (especially when stressed) or to pay attention to the unusual (a cause of stress) are now working against our physical and mental health in this new environment. Thanks for the reference to a potential anti-addiction substance. Definitely something that deserves more research.
My sig -- informed by the writings of people like Mumford, Einstein, Fuller, Hogan, Le Guinn, Banks, Adams, Pet, and many others -- is making the leap to how that evolutionary-mismatch theme applies to our use of all sorts of technology.
Here is a deeper exploration of that in relation to militarism (and also commercial competition to some extent): https://pdfernhout.net/recognizing-irony-is-a-key-to-transce... "There is a fundamental mismatch between 21st century reality and 20th century security thinking. Those "security" agencies are using those tools of abundance, cooperation, and sharing mainly from a mindset of scarcity, competition, and secrecy. Given the power of 21st century technology as an amplifier (including as weapons of mass destruction), a scarcity-based approach to using such technology ultimately is just making us all insecure. Such powerful technologies of abundance, designed, organized, and used from a mindset of scarcity could well ironically doom us all whether through military robots, nukes, plagues, propaganda, or whatever else... Or alternatively, as Bucky Fuller and others have suggested, we could use such technologies to build a world that is abundant and secure for all. ... The big problem is that all these new war machines and the surrounding infrastructure are created with the tools of abundance. The irony is that these tools of abundance are being wielded by people still obsessed with fighting over scarcity. So, the scarcity-based political mindset driving the military uses the technologies of abundance to create artificial scarcity. That is a tremendously deep irony that remains so far unappreciated by the mainstream."
Conversely, reflecting on this more just now, are we are perhaps evolutionarily adapted to take for granted some things like social connections, being in natural green spaces, getting sunlight, getting enough sleep, or getting physical exercise? These are all things that are in increasingly short supply in the modern world for many people -- but which there may never have been much evolutionary pressure previously to seek out, since they were previously always available.
For example, in the past humans were pretty much always in face-to-face interactions with others of their tribe, so there was no big need to seek that out especially if it meant ignoring the next then-rare new shiny thing. Johann Hari and others write about this loss of regular human face-to-face connection as a major cause of depression.
Stephen Ilardi expands on that in his work, which brings together many of these themes and tries to help people address them to move to better health.
From: https://tlc.ku.edu/ "We were never designed for the sedentary, indoor, sleep-deprived, socially-isolated, fast-food-laden, frenetic pace of modern life. (Stephen Ilardi, PhD)"
GPT-4o, by apparently providing "her" movie-like engaging interactions with an AI avatar that seeks to please the user (while possibly exploiting them) is yet another example of our evolutionary tendencies potentially being used to our detriment. And when our social lives are filled-to-overflowing with "junk" social relationships with AIs, will most people have the inclinations to seek out other real humans if it involves doing perhaps increasingly-uncomfortable-from-disuse actions (like leaving the home or putting down the smartphone)? Not quite the same, but consider: https://en.wikipedia.org/wiki/Hikikomori
Related points by others:
"AI and Trust" https://www.schneier.com/blog/archives/2023/12/ai-and-trust.... "In this talk, I am going to make several arguments. One, that there are two different kinds of trust—interpersonal trust and social trust—and that we regularly confuse them. Two, that the confusion will increase with artificial intelligence. We will make a fundamental category error. We will think of AIs as friends when they’re really just services. Three, that the corporations controlling AI systems will take advantage of our confusion to take advantage of us. They will not be trustworthy. And four, that it is the role of government to create trust in society. And therefore, it is their role to create an environment for trustworthy AI. And that means regulation. Not regulating AI, but regulating the organizations that control and use AI."
"The Expanding Dark Forest and Generative AI - Maggie Appleton" https://youtu.be/VXkDaDDJjoA?t=2098 (in the section on the lack of human relationship potential when interacting with generated content)
When it comes to multi-modality, then training data is not limited, because of many different combinations of language, images, video, sound etc. Microsoft did some research on that, teaching spacial recognition to an LLM using synthetic images, with good results. [1]
When someone states that there are not enough training data, they usually mean code, mathematics, physics, logical reasoning etc. In the open internet right now, there are is not enough code to make a model 10x better, 100x better and so on.
Synthetic data will be produced of course, scarcity of data is the least worrying scarcity of all.
Edit: citation added,
[1] VoT by MS https://medium.com/@multiplatform.ai/microsoft-researchers-p...
If the other attributes of the model do not improve, sure.
The issue is with ChatGPT formatting a file.
Watching the demos there were quite a few times where I thought “no way, that’s incredible.”
Secondly, it must be sad living with such a lack of wonder. Is that how you judge everything?
We discovered the higgs boson. Eh, it won’t change how we live.
We just launched a new rocket. Eh, it won’t change how we live.
That wasn't even my impression.
My impression was that it reminds me of the humans that I dislike.
It speaks in customer service voice. That faux friendly tone people use when they're trying to sell you something.
Mmmmm while I get that, in the context w/ the grandparent comment, having a human wouldn't be better then? It's effectively the same. Because, realistically that's a pretty common voice/tone to get even in tech support.
There are different kinds of humans.
Some of them are your friends, and they're willing to take risks for you and they take your side even when it costs them something.
Some of them are your adversaries, overtly. They do not hide it.
Some of them pretend to be your friends, even though they're not. And that's what they modeled it on. For some reason.
Really? I found this demo painful to watch and literally felt that "cringe" feeling. I showed it to my partner and she couldn't even stand to hear more than a sentence of the conversation before walking away.
It felt both staged and still frustrating to listen to.
And, like far too much in AI right now, a demo that will likely not pan out in practice.
Especially when you consider the bottom line that this tech will be ultimately be horned into advertising somehow (read: the field dedicated to manipulating you into buying shit).
This whole fucking thing bothers me.
This is partly right.
Agree. Can't wait to see how it'll be...
Job seekers currently in college have no idea what is about to hit them in 3-5 years.
In any other industry where just need an average margin of error close to a human's work and verification is much easier than generating possible outputs, the market will change drastically.
For me, this is seriously impressive, and I already use LLMs everyday - but a serious "Now we're talkin" moment would be when I'd be able to stand outside of Lowes, and talk to my glasses/earbuds "Hey, I'm in front of lowes, where do I get my air filters from?"
and it tells me if it's in stock, aisle and bay number. (If you can't tell, I am tired from fiddling with apps lol)
I don't think we have nearly as much visibility on what Waymo seems to believe about this tech as you seem to imply, nor do I think that their beliefs are necessarily authoritative. You seem disheartened that we haven't been able to solve self-driving in a couple of decades, and I'm of the opinion that geez, we basically have self-driving now and we started trying only a couple of decades ago.
How long after the invention of the transistor did we get personal computers? Maybe you just have unrealistic expectations of technological progress.
Another piece of evidence also comes from last year when Google scaled back Waymo with layoffs as well as "pausing" its efforts at developing self driving truck technology. [1] That technology would require something closer to L5 autonomy, because again - massive preprocessing is quite brittle and doesn't scale well at all. Other companies that were heavily investing in self-driving tech have done similarly. For instance Uber sold off its entire self-driving division in 2021. I'm certainly happy to hear any sort of counter-argument, but you need some logic instead of ironically being the one trying to mindread me or Waymo!
[1] - https://www.theverge.com/2023/7/26/23809237/waymo-via-autono...
My guess is he wants OpenAI to become a household name, and so he optimizes for exposure.
When one gets to be a certain age one begins to become attuned to this tendency of others' emotions to manipulate you, so you take steps to not let that happen. You're not ignoring their emotions, but you can address the underlying issue more effectively if you're not emotionally charged. It's a useful skill that more people would benefit from learning earlier in life. Perhaps AI will accelerate that particular skill development, which would be a net benefit to society.
This is incredibly optimistic, which I love, but my own experience with my utterly deranged elder family, made insane by TV, contradicts this. Every day they're furious about some new things fox news has decided it's time to be angry about: white people being replaced (thanks for introducing them to that, tucker!), "stolen" elections, Mexicans, Muslims, the gays, teaching kids about slavery, the trans, you name it.
I know nobody else in my life more emotionally manipulated on a day to day basis than them. I imagine I can't be alone in watching this happen to my family.
Most manipulators end up getting bored of trying again and again with the same person. That won't happen if you are a dealing with a machine, as it can change names, techniques, contexts, tones, etc. until you give it what its operator wants.
Maybe you're part of the X% who will never give in to a machine. But keep in mind that most people have no critical thinking skills nor mental fortitude.
Tools and the weaponization of them.
This can be said of pretty much any tech tool that has the ability to touch a good portion of the population, including programming languages themselves, CRISPR?
I agree we have to be careful of the bad, but the downsides in this case are not so dangerous that we should be trying to suppress it because the benefits can be incredible too.
Start telling Grandma never to answer the phone.
They won't "take 1 hour of time", they try it once or twice and give up.
Planning is different in that it is an essential part of agency. That's what Q* is supposed to add. My guess is that planning is the next type of functionality to be added to GPT. I wouldn't be surprised if they already have a version internally with such functionality, but that they've decided to hold it back for now for reasons such as safety (some may care about the election this year) or simply that the inference costs are so huge they cannot possibly expose it publicly.
"Intern of all trades, senior of none", to modernise the cliché.
Really philosophy seems to be one of the least important subjects right now. Hardly anyone learns about it in school.
If it was so important to success in the wild than it would stand to reason we all work hard at improving our reasoning skills, but very few do.
These did not provide useful life-lessons for me.
(The philosophy A-level I did voluntarily seemed to be 50% "can you find the flaws in this supposed proof of the existence of god?")
Shakespeare is packed with insight.
I would guess that most companies will not want to provide APIs that an agent could use to make that kind of query. So, the agent is going to have to use the app just like you would, which looks like it will definitely become possible, but again, Lowes wants the human to see the ads. So they're going to try to break the automation.
It's going to take customers demanding (w/$) this kind of functionality and it will probably still take a long time as the companies will probably do whatever they can to maintain (or extend) control.
Google has always made it hard to avoid clicking the “ad” immediately above the organic result for a highly specific named entity, but where it’s really struck me is as Amazon has started extracting “sponsorship” payments from its merchants. The “sponsored” product matching my search is immediately above the unpaid organic result, identical in appearance.
That kind of convergence suggests to me that the Lowe’s of the world don’t need to “show the ad” in the conventional sense, they just need to reduce the friction of the sale—and they stand to gain more from my trust and loyalty over time than from a one-off upsell.
I’m reminded of Autozone figuring out, on their dusty old text consoles, how to just ask me my make/model/year, and how much value and patronage that added relative to my local mom-n-pop parts store since I just knew all the parts were going to be right.
As for data, I can name several major retailers who expose the stock/aisle number via a public api. That information is highly available and involved in big dollar tasks like inventory management.
(But yeah, I guess they will want it, and break any reasonable utility from their stores on the process. That's what everybody does today, I'm not holding my breath for management to grow some competence out of nowhere in the future.)
- People want agency.
- Once people have comfort and trust that it does things right enough of the time, people no longer want agency.
That threshold varies task-by-task and person-by-person.
Automation tech frees up time but takes away agency and opportunity in exchange.
Empowerment tech creates opportunity and increases agency, but it needs you to have time and resources, and these costs can easily increase existing gaps between social classes.
"Computer, buy some stock"
*** buys 100 lots of tesla without a prompta punk inversion of this would have been naming one's band "various artists" in the early years of MP3 players.
In fact I'm fairly certain I've heard of a group actually doing this, but somehow I can't place the name.
This is called an "employee" and all you need to do is pay them. If you don't want to do that, then I have to wonder: Is what you want slavery?
Either way I hope you get what you want, it rarely ends well for the slave master regardless of how reductive they get to justify their actions.
----
The phrase "*laughter* Yeah, I'm sure that's right" appears to be expressing sarcasm or skepticism about whatever was previously said or suggested. Here's a breakdown of its likely meaning:
"*laughter*" - This typically indicates the speaker is laughing, which can signal amusement, but in this context suggests they find whatever was said humorous in an ironic or disbelieving way.
"Yeah," - This interjection sets up the sarcastic tone. It can mean "yes" literally, but here seems to be used facetiously.
"I'm sure that's right." - This statement directly contradicts and casts doubt on whatever was previously stated. The sarcastic laughter coupled with "I'm sure that's right" implies the speaker believes the opposite of what was said is actually true.
So in summary, by laughing and then sarcastically saying "Yeah, I'm sure that's right," the speaker is expressing skepticism, disbelief or finding humor in whatever claim or suggestion was previously made. It's a sarcastic way of implying "I highly doubt that's accurate or true."
----
You can't exactly metadata your way out of this (skip to 11:50)
https://www.youtube.com/live/DQacCB9tDaw?si=yN7al6N3C7vCemhL
But if the system can create a flamboyantly homosexual Captain Picard with a lisp and slight stutter engaging in overt innuendo when stating, "Number one, Engage!" then I look forward to eating crow! But as the instructions were all conspicuously just "swap to pretrained voice [x,y,z]", I suspect crow will not be on the menu any time soon.
kids these days and their lack of exposure to finite automata
- How so? I don't think it's possible to test for all cases...
- Well, it's easy, assuming a car on a non-branching track, moving with a constant speed and without any realistic external influences on it, you can simply calculate the distance traveled using the formula s = v/t. Ah, I wish I'd stop running into fools not knowing Newton's first law of motion...
- ??? Are you well?
"This conversation may be recorded and used for training purposes" now takes on a new meaning.
Can car makers sell info from everything that happens in their cars?
However, I think the most cost-effective way to train for real world is to train in a simulated physical world first. I would assume that Boston Dynamics does exactly that, and I would expect integrated vision-action-language models to first be trained that way too.
You take a bunch of mo-cap data and simulate it with your robot body. Then as much testing as you can with the robot and feed the behavior back in to the model for fine tuning.
Unitree gives an example of the simulation versus what the robot can do in their latest video
All you have to do is survive long enough as an unemployed criminal until the system gets round to exonerating you:
https://en.wikipedia.org/wiki/British_Post_Office_scandal
"The British Post Office scandal, also called the Horizon IT scandal, involved Post Office Limited pursuing thousands of innocent subpostmasters for shortfalls in their accounts, which had in fact been caused by faults in Horizon, accounting software developed and maintained by Fujitsu. Between 1999 and 2015, more than 900 subpostmasters were convicted of theft, fraud and false accounting based on faulty Horizon data, with about 700 of these prosecutions carried out by the Post Office. Other subpostmasters were prosecuted but not convicted, forced to cover Horizon shortfalls with their own money, or had their contracts terminated. The court cases, criminal convictions, imprisonments, loss of livelihoods and homes, debts and bankruptcies, took a heavy toll on the victims and their families, leading to stress, illness, family breakdown, and at least four suicides. In 2024, Prime Minister Rishi Sunak described the scandal as one of the greatest miscarriages of justice in British history.
Although many subpostmasters had reported problems with the new software, and Fujitsu was aware that Horizon contained software bugs as early as 1999, the Post Office insisted that Horizon was robust and failed to disclose knowledge of the faults in the system during criminal and civil cases.
[...]
challenge their convictions in the courts and, in 2020, led to the government establishing an independent inquiry into the scandal. This was upgraded into a statutory public inquiry the following year. As of May 2024, the public inquiry is ongoing and the Metropolitan Police are investigating executives from the Post Office and its software provider, Fujitsu.
Courts began to quash convictions from December 2020. By February 2024, 100 of the subpostmasters' convictions had been overturned. Those wrongfully convicted became eligible for compensation, as did more than 2,750 subpostmasters who had been affected by the scandal but had not been convicted."
The problem is you don't like the customer service/sales voice because they "pretend to be your friends".
Let me know if I didn't capture it.
I don't think people "pretend to be my friend" when they answer the phone to help me sort out of airline ticket problem. I do believe they're trained to and work to take on a "friendly" tone. Even if the motive isn't genuine, because it's trained, it's way a nicer of an experience than someone who's angry or even simply monotone. Trying to fix my $1200 plane ticket is stressful enough. Don't need the CSR to make it worse.
Of course I agree that "angry" is in most cases not appropriate, but still, I can see cases in which it might, for example, if the caller is really aggressive, curses, or blames unreasonably the agent, the agent could become angry. Training people that everybody will answer them "friendly" no matter their behavior does not sound good for me.
We’ll see if they have another flagship model ready to launch. I seriously doubt it. I suspect that this was supposed to be called GPT-5, or at the very least GPT-4.5, but they can’t meet expectations so they can’t use those names.
If I took out a loan on the value of goods being shipped to me, only for my ship to be lost at sea… it would be covered by insurance, and no bank would even consider acting like Shylock (nor have the motivation of being constantly tormented over religion) for such weird collateral, and the bank manager's daughters wouldn't get away with dressing up as lawyers (no chance their arguments would pass the sniff test today given the bar requirement) to argue against their dad… and they wouldn't need to because the collateral would be legally void anyway and rejected by any court.
The ships would also not then suddenly make a final act appearance to apologise for being late, to contradict the previous belief they were lost at sea, because we have radio now.
The closest to "relevant" that I would accept, is the extent to which some of the plots can be remade into e.g. The Lion King or Wyrd Sisters — but even then…
"Methinks, to employeth antiquated tongues doth render naught but confusion, tis less even than naughty, for such conceits doth veil true import with shadows."
Companies can inject ads into their own LLMs, sure. But ChatGPT is somebody else's LLM.
Your point about retailers exposing stock/aisle number via a public API surprises me. What do you mean by public? What's the EULA look like? Exposing stock/aisle number via API for the purpose of inventory management is not a use case that would require making API access public.
It definitely took out a lot of jobs from the lowest rungs of the market, but on the more specialized / upper end of the ladder wages got actually higher and a lot of companies got burned, and now they have to readjust. It's rolling over slowly still, as there a lot of companies selling AI products and in turn new companies adopting those products. But it tells you a lot that
A) a company selling an AI assistant last year is now totally tied to automating busy work tasks around marketing and sales
B) AI writing companies are some of the busiest in employing human talent for... writing and editorial roles!
It's all very peculiar. I haven't seen anything like this in the past 15 years... maybe the financial crisis and big data was similar, but much much smaller at scale.
We were naive to think we could have nice things for free.
But OpenAI is having a hard time retaining/increasing ChatGPT users. Also, Alphabet's stock is about as valuable as it's ever been. So I don't think we have evidence that this is really challenging Google's search dominance.
But ChatGPT has really hurt Google's brand image.
The questions I still ask Google, have a lot of monetary value (restaurants, cloths, movie, etc).
But I agree seems SO often helps more than Google-AI.
Most NNs work by something akin to "(multi-)linear vector transformation, followed by elementwise nonlinear transformation", stacked over and over so that the output of one layer becomes the input of the next. This applies equally well to simple models like "fully-connected" / "feed-forward" networks (aka "multi-layer perceptron") and to more-sophisticated models like transformers (e.g. https://github.com/karpathy/nanoGPT/blob/325be85d9be8c81b436...).
It's less about combining lots of tiny local linear transformations piecewise, and more about layering linear and non-linear transformations on top of each other.
I don't really know how physics works beyond whatever Newtonian mechanics I learned in high school. But unless the underlying math is similar, then I'm hesitant to run too far with the analogy.
I hope the next version delivers on being smarter, as this update instead of making me excited, makes me feel they’ve reached a plateau on the improvement of the core value and are distracting us with fluff instead
I predict there will be a zoo (more precisely tree, as in "family tree") of models and derived models for particular application purposes, and there will be continued development of enhanced "universal"/foundational models as well. Some will focus on minimizing memory, others on minimizing pre-training or fine-tuning energy consumption, some need high accuracy, others hard realtime speed, yet others multimodality like GPT4.o, some multilinguality, and so on.
Previous language models that encoded dictionaries for spellcheckers etc. never got standardized (for instance, compare aspell dictionaries to the ones from LibreOffice to the language model inside CMU PocketSphinx) so that you could use them across applications or operating systems. As these models are becoming more common, it would be interesting to see this aspect improve this time around.
https://www.rev.com/blog/resources/the-5-best-open-source-sp...
But for conversation, you can’t speak Mandarin without using proper tones because you simply won’t be understood.
The first (and imo the main) hurdle is not reproduction, but just learning to hear the correct sounds. If you don't speak Hindi and are a native English speaker, this [1] is a good example. You can only work on nailing those consonants when they become as distinct to your ear as cUp and cAp are in English.
We can get by by falling back to context (it's unlikely someone would ask for a "shit of paper"!), but it's impossible to confidently reproduce the sounds unless they are already completely distinct in our heads/ears.
That's because we think we hear things as they are, but it's an illusion. Cup/cap distinction is as subtle to an Eastern European as Hindi consonants or Mandarin tones are to English speakers, because the set of meaningful sounds distinctions differs between languages. Relearning the phonetic system requires dedicated work (minimal pairs is one option) and learning enough phonetics to have the vocabulary to discuss sounds as they are. It's not enough to just give feedback.
Most modern physics, including Quantum Mechanics (QM) and General Relativity (GR) is represented primarily through "tensor fields" on a type of topological spaces called "manifolds". Tensor fields are like vector fields, just with tensors instead of vectors.
These tensor fields are then constrained by the laws of physics. At the core, these laws are really not so much "forces" as they're symmetries. The most obvious symmetries is that if you rotate or move all objects within a space, the physics should be unaltered. Now if you also insist that the speed of light should be identical in all frames of reference, you basically get Special Relativity (SR) from that.
The forces of electromagnetism, weak and strong force follow from invariance under the combined U(1) x SU(2) x SU(3) symmetries. (Gravity is not considered a real force in General Relativity (GR), but rather as interaction between spacetime and matter/energy, and what we observe as Gravity is similar to time dilation of SR, but with curved space)
Ok. This may be abstract if you're not familiar with it, and even more if you're not familiar with Group Theory. But it will be referenced further down.
"Manifolds" are a subset of topological spaces that are Euclidian or "flat" locally. This flatness is important, because it's basically (if I understand it correctly myself) the reason why we can use linear algebra for local effects.
I will not go into GR here, since that's what I know least well, but instead focus on QM which describes the other 3 forces.
In QM, there is the concept of the "Wave Function" which is distributed over space-time. This wave-function is really a tensor with components that give rise to observable fields, such as magnetism, the electric field and to the weak and strong forces. (The tensor is not the observed fields directly, but a combination of a generalization of the fields and also analogues to electric charge, etc.)
So how physics calculations tends to be done, is that one starts with assuming something like an initial state, and then impose the symmetries that correspond to the forces. For instance, two electrons wavefunctions may travel towards the same point from different directions.
The symmetries will then dictate what the wave function looks like at at each later incremental point in time. Computationally, such increments are calculated for each point in space using tensor multiplication.
While this is "local" in space, points in space immediately next to the point we're calculating for need to be include, kind of like for convolutional nets.
Basically, though, it's in essence a tensor multiply for each point in space to propagate the wave function from one point in time to the immediate next point.
Eventually, once the particles have (or have not) hit each other, the wave functions of each will scatter in all directions. The probability for it to go in any specific direction is proportional to the wave function amplitude in that direction, squared.
Since doing this tensor multiplication for every point in space requires infinite compute, a lot of tricks are used to reduce the computation. And this where a lot of our intuitions about "particles" show up. For simple examples, one can even do very good approximations using calculus. But fundamentally, tensor multiplication is the core of Quantum Mechanics.
This approach isn't unique to QM, though. A lot of other Physics is similar. For instance, solid state physics, lasers or a lot of classical mechanics can be described in similar frameworks, also using tensors and symmetry groups. (My intuition is that this still is related to Physics involving local effects on "locally flat" Manifolds)
And this translates all the way up to how one would do the kind of simulations of aspects of physical worlds that happen in computer games inside GPU's, including the graphics parts.
And here I believe you may see how the circle is starting to close. Simulations and predictions of physical systems at many different levels of scale and abstraction tend to reduce to tensor multiplication of various sorts. While the classical physics one learns in high school tend to have problems solvable with calculus, even those are usually just solutions to problems that are fundamentally linear algebra locally.
While game developers or ML researches initially didn't use the same kind of Group Theory machinery that Physics have adapted, at least the ML side seem to be going in that direction, based on texts such as:
https://maurice-weiler.gitlab.io/cnn_book/EquivariantAndCoor...
(There appears to be a lot of similar findings over the last 5-6 years or so, that I wasn't fully aware of).
In the book above, the methodology used is basically identical to how theoretical physics approach similar problems, at least for networks that describe physical reality (which CNNs tends to be good for)
And here is my own (current) hypothesis why this also seems to be extendable to things like LMM, that do not at face value appear like physics problems:
If we assume that the human brain evolved the ability to navigate the physical world BEFORE it developed language (should be quite obvious), it should follow that the type of compute fabric in the brain should start out as optimized for the former. In practice, that means that at the core, the neural network architecture of the brain should be good at doing operations similar to tensor products (or approximations of such).
And if we assume that this is true, it shouldn't be surprising that when we started to develop languages, those languages would take on a form that were suitable to be processed in compute fabric similar to what was already there. To a lesser extent, this could even be partially used to explain why such networks can also produce symbolic math and even computer code.
Now what the brain does NOT seem to be evolved to do, is what traditional Turing Machine computers are best at, namely do a lot very precise procedural calculations. That part is very hard for humans to learn to do well.
So in other words, the fact that physical systems seem to involve tensor products (without requiring accuracy) may be the explanation to why Neural Networks seem to have a large overlap with the human brain in terms of strengths and weaknesses.
Tensors in Physics are also subject to various "gauge" symmetries. That means that physical outcomes should not change if you rotate them in various ways. The most obvious is that you should be able to rotate or translate the space representation without changing the physics. (This leads to things like energy/momentum conservation).
The fundamental forces are consequences of some more abstract (at the surface) symmetries (U(1) x SU(2) x SU(3)). These are just constrains on the tensors, though. Maybe these constraints can be in the same family as backprop, though I don't know how far that analogy goes.
In terms of representation, the spacetime part of Physics Tensors is also treated as continous. Meaning that when, after doing all the matrix multiplication, you come to some aggregation step of calculations, you aggregate by integrating instead of summing over spacetime (you sum over the discrete dimensions). Obviously though, for when doing the computation in a computer, even integration reduces to summing if you don't have an exact solution.
In other words, it seems to me that what I originally replied to, namely the marvel about how much of ML is just linear algebra / matrix multiplication IS relatively analogous to how brute force numerical calculations over quantum fields would be done. (Theoretical Physicists generally want analytic solutions, though, so generally look for integrals that are analytically solvable).
Both domains have steps that are not just matrix multiplication. Specifically, Physics tend to need a sum/integral when there is an interaction or the wave function collapses (which may be the same thing). Though even sums can be expressed as dot products, I suppose.
As mentioned, Physics will try to solve a lot of the steps in calculations analytically. Often this involves decomposing integrals that cannot be solved into a sum of integrals where the lowest order ones are solvable and also tend to carry most of the probability density. This is called perturbation theory and is what gives rise to Feynmann diagrams.
One might say that for instance a convolution layer is a similar mechanic. While fully connected nets of similar depth MIGHT theoretically be able to find patterns that convolutions couldn't, they would require an impossibly large amount of compute to do so, and also make regularization harder.
Anyway, this may be a bit hand-wavy from someone who is a novice at both quantum field theory and neural nets. I'm sure there are others out there that know both fields much better than me.
Btw, while writing this, I found the following link that seems to take the analogy between quantum field theory and CNN nets quite far (I haven't had time to read it)
https://maurice-weiler.gitlab.io/cnn_book/EquivariantAndCoor...
That includes how it uses Group Theory (especially Lie Algebra) to describe symmetries, and to use that to explain why convolutional networks work as well as they do for problems like vision.
The notation (down to what latin and greek letters are used) makes it obvious that this was taken directly from Quantum Mechanics.
Lack of wonder? No, I think it's very cool. But you have to differentiate what is going to fundamentally change our lives and the world and something that isn't going to. GPT/LLM/AI will fundamentally change my life over time, the features shown today, 70% of them won't. They will replace existing products and make things more streamlined, but not going to really going to shift the world.
It seems like you're overgeneralizing to the point of missing what is innovative here. And I do think making AI realtime and work well at it, is innovative and will change our lives.
Open AI have explicitly made this clear. You are wrong. There's nothing else left to say here.
I think people who emphasis specialized models are operating under a false assumption that by focusing the model it'll be able to go deeper in that domain. However, the opposite seems to be true.
Granted, specialized models like AlphaFold are superior in their domain but I think that'll be less true as models become more capable at general learning.
For the average Joe programmer like me, GPT4 is already "dirt cheap". My typical monthly bill is $0-3 using it as much as I like.
The one time it was high was when I had it take 90+ hours of Youtube video transcripts, and had it summarize each video according to the format I wanted. It produced about 250 pages of output.
That month I paid $12-13. Well worth it, given the quality of the output. And now it'll be less than $7.
For the average Joe, it's not expensive. Fast food is.
OpenAI seems to build in cycles. First they focus on capabilities, then they work on driving the price down (occasionally at some quality degradation)
That said, given the price tag, when AI becomes genuinely expert then I'm probably not going to have a job and neither will anyone else (modulo how much electrical power those humanoid robots need, as the global electricity supply is currently only 250 W/capita).
In the meantime, making it a properly real-time conversational partner… wow. Also, that's kinda what you need for real-time translation, because: «be this, that different languages the word order totally alter and important words at entirely different places in the sentence put», and real-time "translation" (even when done by a human) therefore requires having a good idea what the speaker was going to say before they get there, and being able to back-track when (as is inevitable) the anticipated topic was actually something completely different and so the "translation" wasn't.
A real time translator would be a killer app indeed, and it seems not so far away, but note how you have to prompt the interaction with ‘Hey ChatGPT’; it does not interject on its own. It is also unclear if it is able to understand if multiple people are speaking and who’s who. I guess we’ll see soon enough :)
Indeed; I would be pleasantly surprised if it can both notice and separate multiple speakers, but only a bit surprised.
GPT-4o got slightly better overall. Ability to reason improved more than the rest.
interestingly, i think this isn't always true -- i was able to coach my native-spanish-speaking wife to correctly pronounce "v" vs "b" (both are just "b" in spanish, or at least her dialect) before she could hear the difference; later on she was developed the ability to hear it.
Even styles of thought might be different in other languages, so I don't say that lightly... (stay strong, Sapir-Wharf, stay strong ;)
The Italian sounded good to me.
Humans process photons, but only after converting them into nerve impulses via photoreceptor cells in the retina, which are sensitive to wavelengths ranges described as "red", "green" or "blue".
As a result, humans are not very good at distinguishing different spectra that happen to result in the same nerve impulses. That information was lost by the conversion from photons to nerve impulses. Sensors like the AS7341 that have more than 3 color channels are much better at this task.
I am a strong proponent of LLM's, but I just don't agree with the personification and trust we put into its responses.
Everyone in this thread is defending that ChatGPT can't count for _reasons_ and how its okay, but... how can we trust this? Is this the sane world we live in?
"The AGI can't count letters in a sentence, but any day not he singularity will happen, the AI will escape and take over the world."
I do like to use it for opinion related questions. I have a specific taste in movies and TV shows and by just listing what I like and going back and forth about my reasons for liking or not liking it's suggestions, I've been able to find a lot of gems I would have never heard of before.
Except that "http" should be "https". Silly humans, claiming to be intelligent when they can't even tokenize strings correctly.
I suspect we might find that adding in the multimodal visual and audio aspects to the model gives these models a much better basis for mental arithmetic and counting.
Most LLMs can just pass the string to an tool to count it to bypass it's built in limitations.
Does a blind man not understand quantity because you asked him how many apples are in front of him and he failed ?
I don't think that test determines his understanding of quantity at all, he has other senses like touch to determine the correct answer. He doesn't make up a number and then give justification.
GPT was presented with everything it needed to answer the question.
>he has other senses like touch to determine the correct answer
And? In my hypothetical, you're not allowing him to use touch.
>I don't think that test determines his understanding of quantity at all
Obviously
>GPT was presented with everything it needed to answer the question.
No, it was not.
I know this won't happen, of course, I am moreso hoping for laws to be updated to avoid similar kerfuffles in the future, as well as massive fines to act as a deterrent, but I don't dare to hope too much.
> have all of OpenAI's data for free
Doesn't really fit. Perhaps OpenAI might successfully prevent us from accessing it, but it wouldn't be "theirs" and we couldn't "have" it.
I'm not sure what kind of conversations we will be having instead, but I expect they'll be more productive than worrying about ownership of something you can't touch.
Is that understanding correct?
Anyways, the laws are mature enough for everyone to work this out in court. Maybe it comes out that they have a legitimate concern, but the way they presented their evidence so far in public has seriously been lacking.
Rather, the actual culprit is almost certainly overfitting. The articles in question were pasted many times on different websites, showing up in the training data repeatedly. Enough of this leads to memorization.
Effectively all mechanization, computerization, and I guess now AI-ization has done this. In the past you could have a rudimentary education and contribute to society. Then we started requiring more and more grade school, then higher education for years. Now we're talking about the student debt crisis!
At least if AI doesn't go ASI in the near term the question is how are we going to train the next generation of workers to go from unskilled to more skilled and useful than the AI is. Companies aren't going to want to do this. The individuals are going to think it's risky getting an education that could be replaced by a software update. If left to go out of control this is how a new generation of luddites will burn data centers in protest they are starving on the streets.
I believe finite state modeling is used at NASA, A google search brings up a few references (that I'm probably not qualified to speak to), and I also remember hearing/reading a lecture on how they use them to make completely verifiable programs but can't find the exact one at the moment.
But not all regexes (eg, those using PCRE extensions afaik) are amenable to such a treatment. Those you just tend to hope they work.
That happens with a lot of tech. Social networks are used to con people more than to educate people about con men.
You can be perfectly understood if you don't speak broadcast Chinese. There are plenty of heavy accents to deal with anyways. Like Beijing 儿化 or the inability of southerners to pronounce sh very differently from s.
I haven't exactly drafted this alternative set of laws, but I expect it would look something like this:
If the data is derived from sources that were made available to the public with the consent of its referents (and subject to whatever other regulation), then walling it off would be illegal. On the other hand, other regulation regarding users' behavior world be illegal to share without the users consent and might even be illegal to retain without their consent.
If you want to profit from something derived from public data while keeping it private, perhaps that's ok but you have to register its existence and pay taxes on it as a data asset, much like we pay taxes on land. That way we can wield the tax code to encourage companies that operate in the clear. This category would probably resemble patent law quite a bit, except ownership doesn't come by default, you have to buy your property rights from the public (since by owning that thing, you're depriving the masses of access to it, and since the notion that it is a peg that fits in a property shaped hole is a fiction that requires some work on our part to maintain).
And note that understanding broken language is a skill you have to train. If you aren't used to it then it is impossible to understand what they say. You might not have been in that situation if you are an English speaker since you are so used to broken English, but it happens a lot for others.
The deaf example now is like asking GPT "What am I pointing at?"
Imagine a legally blind person, they can barely see anything; just general shapes flowing into one another. In front of them is a table onto which you place a number of objects. The objects are close together and small enough such that they merge into one blurred shape for our test person.
Now when you ask the person how many objects are on the table, they won't be able to tell you! But why would that be? After all, all the information is available to them! The photons emitted from the objects hit the retina of the person, the person has a visual interface and they were given all the visual information they need!
Information lies within differentiation, and if the granularity you require is higher than the granularity of your interface, then it won't matter whether or not the information is technically present; you won't be able to access it.
Also the blind person wouldn't confidently answer. A simple "the objects blur together" would be a good answer. I had ChatGPT telling me 5 different answers back to back above.
Your argument is the same as Facebook saying “we can’t provide this service without invading your privacy” or another company saying “we can’t make this product without using cancerous materials”.
Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money and is unwilling to come up with solutions to problems caused by your business model.
People who try to prevent models from training on all available data will simply lose to people who don’t, and eventually the maximally-trained models will proliferate. There’s no stopping it.
Assume a world where models proliferate that are trained on all publicly-accessible data. Whatever those models can do for free, humans will have a hard time charging money for.
That’s the sea change. Whoever happens to make money through that sea change is a sub-plot of the sea change, not the cause of it.
If you want to make money in this new environment, you basically have to produce or do things that models cannot. That’s the sink or swim line.
If most people start drowning then governments will be forced to tax whoever isn’t drowning and implement UBI.
Remember the 4 hour work week ? Maybe we are almost there ?
Let’s face it, most people in a developed country have more free time than they know what to do with, mostly spent in HN and social median ofc :)
It used to be that property rights extended all the way to the sky. This understanding was updated with the advent of the airplane. Would a world where airlines need to negotiate with every land-owner their planes fly above be better than ours? Would commercial flight even be possible in such a world? Also, who is greediest in this scenario, the airline hoping to make a profit, or the land-owners hoping to make a profit?
Tough luck, then. You don’t have the right to shit on and harm everyone else just because you’re a greedy asshole who wants all the money and is unwilling to come up with solutions to problems caused by your business model.
Once the IP is on the internet, you can't complain about a human or a machine learning from it. You made your IP available on the internet. Now, you can't stop humanity benefiting from it.
https://www.legaldive.com/news/Chabon-OpenAI-class-action-co...
Those authors didn’t “make their IP available on the internet”, did they?
There's zero doubt that people will still create art. Almost no one will be paid to do it though (relative to our current situation where there are already far more unpaid artists than paid ones). We'll lose an immeasurable amount of amazing new art that "would have been" as a result, and in its place we'll get increasingly bland/derivative AI generated content.
Much of the art humans will create entirely for free in whatever spare time they can manage after their regular "for pay" work will be training data for future AI, but it will be extremely hard for humans to find as it will be drowned out by the endless stream of AI generated art that will also be the bulk of what AI finds and learns from.
What nonsense. Just because doing the right thing is hard, or inconvenient doesn't mean you get to just ignore it. The only way I'd be ok with this is if literally the entire human population were equal shareholders. I suspect you wouldn't be ok with that little bit of communism.
You can bluster and scream and shout "Nonsense" all you want, but that's how it's going to be. Copyright is finished. When good models are illegal or unaffordable, only outlaws -- meaning hostile state-level actors with no allegiance to copyright law -- will have good models.
We might as well start thinking about how the new order is going to unfold, and how it can be shaped to improve all of our lives in the long run.
To your last point, I think the best case scenario is open source/weight models win so nobody owns them.
However the issue is that it will be much harder to make a career in the digital world from an artistic gift and personal style: one's style will not be unique for long as AI will quickly copy it and so make the original much less valuable.
When it's harder to make a career in the digital world (where most of the art is), it's more likely that many artists will never get the opportunity to fully develop their artistic gifts and personal style at all.
If artists are lucky then maybe in a few generations with fewer new creative works being created, AI almost entirely training on AI generated art will mean that the output will only get more generic and simplistic over time. Perhaps some people will eventually pay humans again for art that's better quality and different.
Note that the fashion industry doesn't have copyrights, and runway fashions get copied very quickly. Fashion designers still exist in such a world.
We could also do patronage. Thats how musicians used to be funded. Even today we have grants from public/private institutions.
We could also drift back into "owning the physical media" We see this somewhat with the resurgence of records.
NFTs would have been another way, but at least initially, it failed to become generally accepted into the popular conscious.
I already know an application for this, and AFAIK it's being explored in the SaaS space: guided learning experiences and tutoring for individuals.
My kids, for instance, love to hammer Alexa with random questions. They would spend a huge amount of time using a better interface, esp. with quick feedback, that provided even deeper insight and responses to them.
Taking this and tuning it to specific audiences would make it a great tool for learning.
Great, using GPT-4 the kids will be getting a lot of hallucinated facts returned to them. There are good use cases for tranformer currently but they're not at the "impact company earnings or country GDP" stage currently, which is the promise that the whole industry has raised/spent 100+B dollars on. Facebook alone is spending 40B on AI. I believe in the AI future, but the only thing that matters for now is that the models improve.
It seems like the people who are ohhing and ahhing at the former and the people who are frustrated that this kind of this is unbelivably impractical to productize will be doomed to talk past one another forever. The text generation models, image generation models, speech-to-text and text-to-speech have reached impressive product stages. Multi-model hasn't got there because no one is really sure what to actually do with the thing outside of make cool demos.
What's strange to me is that, as far as I know, "plátano" is only commonly used in Spain, but the accent of the AI voice didn't sound like it's from Spain. It sounds more like an American who speaks Spanish as a second language, and those folks typically speak some Mexican dialect of Spanish.
The wiktionary page for "plátano" has a map illustrating how various Spanish-speaking countries refer to the banana.
https://en.wiktionary.org/wiki/pl%C3%A1tano#/media/File:Porp...
My principal association with plátano is plaintain, personally, but I am not a Spanish speaker.
Plátano is commonly used for banana in Mexico, just bought some at a Soriana this weekend.
If the legally blind person never had had good vision or corrective instruments, had never been told that their vision is compromised and had no other avenue (like touch) to disambiguate and learn, then they would tell you the same thing ChatGPT told you. "The objects blur together" implies that there is already an understanding of the objects being separate present.
You can even see this in yourself. If you did not get an education in physics and were asked to describe of how many things a steel cube is made up, you wouldn't answer that you can't tell. You would just say one, because you don't even know that atoms are a thing.
ChatGPT can't count, the problem is the tokenizer.
I do find it funny we're trying to chat with an AI that is "equivalent to a legally blind person with no correction"
> You would just say one, because you don't even know that atoms are a thing.
My point also. I wouldnt start guessing "10" and then "11" and then "12" when asked to double check only to capitulate when told the correct answer.
First of all, it obviously changes everything. A shortsighted person requires prescription glasses, someone that is fundamentally unable to count is incurable from our perspective. LLMs could do all of these things if we either solve tokenization or simply adapt the tokenizer to relevant tasks. This is already being done for program code, it's just that aside from gotcha arguments, nobody really cares about letter counting that much.
Secondly, the analogy was meant to convey that the intelligence of a system is not at all related to the problems at its interface. No one would say that legally blind people are less insightful or intelligent, they just require you to transform input into representations accounting for their interface problems.
Thirdly, as I thought was obvious, the tokenizer is not a uniform blur. For example, a word like "count" could be tokenized as "c|ount" or " coun|t" (note the space) or ". count" depending on the surrounding context. Each of these versions will have tokens of different lengths, and associated different letter counts. If you've been told that the cube had 10, 11 or 12 trillion constituent parts by various people depending on the random circumstances you've talked to them in, then you would absolutely start guessing through the common answers you've been given.
Compared to YouTube and Google SEO trash, or Google Home / Alexa (which do search + wiki retrieval), at the moment GPT-4 and Claude are unironically safer for kids: no algorithmic manipulation, no ads, no affiliated trash blogs, and so on. Bonus is that it can explain on the level of complexity the child will understand for their age
I still see this as a cool application. Anything that provides easier access to knowledge and improved learning is a boon.
I'd rather worry about the potential economic impact than worry about possible hallucinations from fun questions like "how big is the sun?" or "what is the best videogame in the world?", etc.
There's a ton you can do here, IMO.
Take a look at mathacademy.com, for instance. Now slap a voice interface on it, provide an ability for kids/participants to ask questions back and forth, etc. Boom: you've got a math tutor that guides you based on your current ability.
What if we could get to the same style of learning for languages? For instance, I'd love to work on Spanish. It'd be far more accessible if I could launch a web browser and chat through my mic in short spurts, rather than crack open Anki and go through flash cards, or wait on a Discord server for others to participate in immersive conversation.
Tons of cool applications here, all learning-focused.
Set a memorable verification phrase with your friends and loved ones. That way if you call them out of the blue or from some strange number (and they actually pick up for some reason) and you tell them you need $300 to get you out of trouble they can ask you to say the phrase and they'll know it's you if you respond appropriately.
I've already done that and I'm far less worried about AI fooling me or my family in a scam than I am about corporations and governments using it without caring about the impact of the inevitable mistakes and hallucinations. AI is already being used by judges to decide how long people should go to jail. Parole boards are using it to decide who to keep locked up. Governments are using it to decide which people/buildings to bomb. Insurance companies are using to deny critical health coverage to people. Police are using it to decide who to target and even to write their reports for them.
More and more people are going to get badly screwed over, lose their freedom, or lose their lives because of AI. It'll save time/money for people with more money and power than you or I will ever have though, so there's no fighting it.
Alternatively while it may be difficult to trick you directly, phishing the passphrase from a more naive loved one or bored coworker and then parroting it back to you is also a possibility. 'etc.
Phone scams are no joke and this is getting past the point where regular people can be expected to easily filter them out.
1. something you have
2. something you know
3. something you are
These three things are required for any authz.
I only say this sort of jokingly. Three out of four of my parents/in laws are questionably literate on the internet. It wouldn't take much of a "me bot" for them to start telling it the stories of our childhood and then that information is out there.
As technology advances those proportions will be boosted. Seems inevitable.
We went from living in villages where everyone knew each other to living in big cities where almost everyone is a stranger.
We went from photos being relatively reliable evidence to digital photography where anyone can fake almost anything and even the line between faking and improving is blurred.
We went from mass distribution of media being a massive capital expenditure that only big publishers could afford to something that is free and anonymous for everyone.
We went from a tiny number of people in close proximity being able to initiate a conversation with us to being reachable for everyone who could dial a phone number or send an email message.
Each of these transitions caused big problems. None of these problems have ever been completely solved. But each time we found mitigations that limit the impact of any misuse.
I see the current AI wave as yet another step away from trusting superficial appearances to a world that requires more formal authentication protocols.
Passports were introduced long ago but never properly transitioned into the digital world. Using some unsigned PDF allegedly representing a utility bill as proof of address seems questionable as well. And the way in which social security numbers are used for authentication in the US is nothing short of bizarre.
So I think there are some very low hanging fruits in terms of authentication and digital signatures. We have all the tools to deal with the trust issues caused by generative AI. We just have to use them.
The nature of this tech itself is probably what is getting most people - it looks, sounds and feels _human_ - it's very relatable and easy for a non-tech person to understand it and thus get creeped out. I'd argue there are _far_ more dangerous technologies out there, but no one notices and / or cares because they don't understand the tech in the first place!
The "yet" is carrying a lot of weight in that statement. It is now five years since the launch of GPT-2, three years since the launch of GPT-3 and less than 18 months since the launch of ChatGPT. I cannot think of any technology that has improved so much in such a short space of time.
We might hit an inflection point and see that rate of improvement stall, but we might not; we're not really sure where that point might lie, because there's likely to still be a reasonable amount of low-hanging fruit regarding algorithmic and hardware efficiency. If OpenAI and their peers can maintain a reasonable rate of improvement for just a few more years, then we're looking at a truly transformational technology, something like the internet that will have vast repercussions that we can't begin to predict.
The whole LLM thing might be a nothingburger, but how much are we willing to gamble on that outcome?
> it'd have to be impacting the real world
By writing business plans? Getting lawyers punished because they didn't realise that "passes bar exam" isn't the same as "can be relied on for citations"? By defrauding people with synthesised conversations using stolen voices? By automating and personalising propaganda?
Or does it only count when it's guiding a robot that's not merely a tech demo?
It's still early, and I don't see much in corporate communications, for instance, but it will be quite the change.
It's worse than I thought. They've already managed to mimick the median HN user perfectly!
We need one who's doing the dirty work of not discussing.
I guess we need to have an AI secretary to take in all phonecalls from now on (spam folder will become a lot more interesting with celebrity phone calls, your dead relative phoning you etc)
has been for years mon ami. i remember when they started talking about GPT-2 here, and then seeing a sea-change in places like reddit and quora
quite visible on HN, esp. in certain threads like those involving brands that market heavily, or discussions of particular countries and politics.
Discovering an asteroid full of gold, with as much gold as half the earth to put a modest number, would have huge impact to the labour market. Anything conductive like copper, silver, mining jobs would all go away. Also housing would be obsolete as we would all live in golden houses. A huge impact to the housing market, yet it doesn't seem such a bad thing to me.
>We're already at a point where we're counselling elders to ignore late-night messages from people claiming to be a relative in need of an urgent wire transfer.
Anyone can prove their identity, or identities, over the wire, wire-fully or wire-lessly, anything you like. When i did go to university, i was the only one attending the cryptography class, no one else showed up for a boring class like this. I wrote a story about the Electrona Corp in my blog.
What i say to people for at least 2 years now, is that "Remember when governments were not just some cryptographic algorithms?" Yeah, that's gonna change. Cryptography is here to stay, it is not as dead as people think and it's gonna make a huge blast.
Probably why it's not released yet. It's unsafe for phishing.
- It helps them sleep at night if their creation doesn't put millions of people out of work.
- Fear of regulation
The world learnt to deal with Nigerian Prince emails and nobody is falling to those anymore. Nothing was changed - no new laws or regulations needed.
Phishing calls have been going on without an AI for decades.
You can be skeptical and call back. If you know your friends or family you should be able to find an alternative way to get in touch always without too much effort in the modern connected world.
Just recently a gang in Spain was arrested for "son in trouble" scam. No AI used. Most of the parents are not fooled in this.
https://www.bbc.com/news/world-europe-68931214
The AI might have some marginal impact, but it does not matter in the big picture of scams. While it is worrisome, it is not a true safety concern.
The best we can ever hope to do is find mitigations as and when problems arise.
That's massive fast change, and we haven't culturally caught up to any of it yet.
This happened from the 15th century onward. By the 19th century more than half the UK population could read and write.
That's how I look at where we're going with AI. Plunge along into the new arms race first and build the capacity, then later figure out the treaties and safeguards which we hope will keep our society safe (and by that I don't mean a Skynet-like AI-powered destruction, but the upheaval of our society potentially as impactful as the industrial revolution.)
Humanity will get through it, I'm sure. But I'm not confident it will be without a lot of pain and suffering for a large percentage of people. We also managed to survive 2 world wars in the last century--but it cost the lives of 100 million people.
So the question, I think, is how do we reclaim trust in a world where every kind of content can be convincingly faked? And I think the answer is by rebuilding trust between users such that we actually have reason to simply trust the users we're interacting with aren't lying to us (and that also goes for building trust in the platforms we use). In my mind, that means a shift to small federated and P2P communication since both of these enable both the users and the operators to build the network around existing real-world relationships. A federation network can still grow large, but it can do so through those relationships rather than giving institutional bad actors as easy of an entrance as anyone else.
Isn't it rather brilliant that you can just ask questions of competent people in some subreddit without first becoming part of that particular social circle?
It could also reintroduce geographical exclusion based on the rather arbitrary birth lottery.
This a problem with all technology. The mitigations are like technical debt but with a difference. You can fix technical debt. Short of societal collapse mitigations persist, the impacts ratchet upward and disproportionately affect people at the margin.
There's an old (not quite joke) that if civilization fell, a large percentage of the population would die of the effects of tooth decay.
Replacing all jobs except LLM developers? I’ll tell my hairdresser
Maybe they didn't know, maybe none of their colleagues used it, their company didn't pay for it, or maybe all they need is an Excel update.
But I am confident that using Copilot would be faster than clicking through the sludge that are Microsoft Office help pages (third party or not.)
So I think it is correct to fear capabilities, even if the real world impace is still missing. When you invent an airplane, there won't be an airstrip to land on yet. Is it useless, won't it change anything?
I fear that at some point the anonymity that made the internet great in the first place will be destroyed by this.
The dead internet theory started to look more real with time, AI spam is just scaling it up.
It has been so bad, I even considered injecting misspelling and incorrect grammar and bad punctuation into my prose to prove my words are mine.
i remember seeing the change when GPT-2 was announced
As someone that grew up with late-90’s internet culture and has seen all the pros and cons and changes over the decades, I find myself using the internet less and less for dialogue with people. And I’m spending more time in nature and saying hi to strangers in reality.
I’m still worries about the impact this will have on a lot of people’s ability to reason however. “Just” Tik Tok and apps like it has already had devastating results on certain demographics.
Brave New World indeed.
And that's the case even if you've never ever posted anything on your social media - it could be family&friends, or employer, or if you're ever been in a public-facing job position that has ever done any community outreach, or ever done a public performance with your music or another hobby, or if you've ever walked past a news crew asking questions to bystanders of some event, or if you've ever participated in some contests or competitions or sports leagues, etc, all of that is generally findable in various archives.
I'm sure AI-based ageing can do a good enough job to convince many people that a fake image of someone they haven't seen for years is an older version of the person they remember; but how often would it succeed in ageing an old photo in such a way that it looks like a person I have seen recently and therefore have knowledge rather than guesses about exactly what the years have changed about them?
(Not a rhetorical question to disagree with you, I genuinely have no idea if ageing is predictable enough for a high % result or if it would only fool people with poor visual memory and/or who haven't seen the person in over a decade.)
I feel like even ignoring the big unknowns (at what age, if any, will a person start going bald, or choose to grow a beard or to die their hair, or get a scar on their face, etc.) there must be a lot of more subtle but still important aspects from skin tone to makeup style to hair to...
I've looked up photos of some school classmates that I haven't seen since we were teens (a couple of decades ago), and while nearly all of them I think "ah yes I can still recognise them", I don't feel I would have accurately guessed how they would look now from my memories of how they used to look. Even looking at old photos of family members I see regularly still to this day, even for example comparing old photos of me and old photos of my siblings, it's surprising how hard it would be for a human to predict the exact course of ageing - and my instinct is that this is more down to randomness that can't be predicted than down to precise logic that an AI could learn to predict rather than guess at. But I could be wrong.
Why not an AI assistant in the browser to fend all the adversarial manipulation and spam AIs on the web? Going online without your AI assistant would be like venturing without a mask during COVID
I foresee a cat-and-mouse game, AIs for manipulation vs AIs for protection one upping each other. It will be like immune system vs viruses.
I don't think anyone has a good answer to that question, which is the problem in a nutshell. Job one is to start investing seriously in finding possible answers.
>We need to roll back to "don't trust anything online, don't share your identity or payment information online"
That's easy to say, but it's a trillion-dollar decision. Alphabet and Meta are both worthless in that scenario, because ~all of their revenue comes from connecting unfamiliar sellers with buyers. Amazon is at existential risk. The collapse of Alibaba would have a devastating impact on Chinese exporters, with massive consequent geopolitical risks. Rolling back to the internet of old means rolling back on many years worth of productivity and GDP growth.
Well that's exactly the sort of service that will be extremely valuable in a post-trust internet. They can develop authentication solutions that cut down on fraud at the cost of anonymity.
Even when it comes to people like our parents, there are things we would trust them to do, and things that we would not trust them to do. But what happens when you have zero trusted elements in a category?
At the end of the day, the digital world is the real world, not some seperate place 'outside the environment'. Trying to treat digital like it doesn't exist puts you in a dangerous place to be deceived. For example if you're looking for XYZ and you manage to leak this into the digital world, said digital world may manipulate your trusted friends via ads, articles, the social media posts they see on what they think about XYZ before you ask them.
This tech is dangerous, and I'm currently of the opinion that its uses for malicious purposes are far better and more significant than LLM's replacing anyone's jobs. The bullshit asymmetry principle is very incredibly significant for covert ops and asymmetric warfare, and generating convincing misinformation has become basically free overnight.
All this would do is crash the gold price. Also note that all the gold at our disposal right now (worldwide) basically fits into a cube with 20m edges (its not as much as you might think).
Gold is not suitable to replace steel as building material (because it has much lower strength and hardness), nor copper/aluminium as conductor (it's a worse conductor than copper and much worse in conductivity/weigth than aluminium). The main technical application short term would be gold plated electrical contacts on every plug and little else...
I didn't know that copper is a better conductor than gold. Surprised by that.
.. And gold teeth and grillz.
The thing about cryptography and government is that it's easy to imagine for a great technology to be adapted on the governmental level because of its greatness. But it is another thing to actually implement it. We live in a bubble, where almost anyone knows about cryptographic hashes and RSA, but for most of the people it is not the case.
Another thing is that political actors are tending to try to concentrate power in their own hands. No way they will delegate a decision making to any form of algorithm — being cryptographic or not.
As soon as this becomes a problem, then it might start bottom-up, citizens to government officials, rather than top to bottom, from president to government departments. Then governments will be forced to formalize identity solutions based on cryptography. See also this case in Germany [2].
One example like that, is bankruptcy laws in China. China didn't have any law regarding to bankruptcy till 2007. For a communist country, or rather not totally capitalist country like China, bankruptcy is not an important subject. When some people stop being profitable, they will keep working because they like to work and they contribute to the great nation of China. That doesn't make any sense of course, so their government was forced to implement some bankruptcy laws.
[1]https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos... [2]https://news.ycombinator.com/item?id=39866056
At that point, how can you validate it, as there’s no shared secret? The only thing we had was validating childhood stories. After a preponderance of them, we accepted she was real (she refused to talk on the phone — apparently her voice was damaged).
We eventually met her in real life.
The point is, you can always use these three principles: asking relatives to validate the phone number — something you have — and then the stories — something shared — and finally meeting in real life — something you are.
Of course this is much more difficult when it comes to software, and very few serious people think the idea of a ever present government monitoring your software would be a better option then reckless AI development.
Do you shop in large corporate stores and use credit cards? Do you go out in public in transportation registered to you?
If yes, then images and habits of yours are being stored in databases and sold to data brokers.
And you're not even including every single one of your family members that use internet connected devices/apps that are sucking up all the data they can.
But don't disagree with you - in a different comment that was about privacy, I (despite living under GDPR) suggested that for offline verification with known people it's better to choose secrets that definitely haven't been shared online/anywhere rather than just choosing random true facts and assuming they couldn't have been found out by hackers: https://news.ycombinator.com/item?id=40353820
It's hard to know what to believe if you get a phone call with the voice of your child or colleague, and your "child"/"colleague" replies within milliseconds in a convincing way.
I think unfortunately it will massively lower the trust of actual real videos and images, because someone can dismiss them with little thought.
Chances are they've already shoved some app on their phone that's voice to txting everything they say and sending off somewhere (well lower chance if they have an iphone).
Modern life is data/information security and humans are insanely bad at it.
This is my biggest gripe against the telecom industry. Calls pretending to be from someone else.
For every single call, someone somewhere must know at least the next link in the chain to connect a call. Keep following the chain until you find someone who either through malice or by looking the other way allows someone to spoof someone else's number AND remove their ability to send the current link in the chain (or anyone) messages. (Ideally also send them to prison if they are in the same country.) It shouldn't be that hard, right?
The technology to build trustable caller ID has existed for a long time, the problem is no one wants to be the one forcing telcos all over the world to upgrade their sometimes many decades old systems.
Edit: Also this doesn't mean less medea, just less actual humans getting paid to make medea or work adjacent jobs
Apparently so: https://www.businessinsider.com/jobs-lost-in-may-because-of-...
Note that this article is about a year old now.
> Comparing it to the internet is insane, based off of its status as a highly advanced auto complete.
(1) I was quoting you.
(2) Don't you get some cognitive dissonance dismissing it in those terms, at this point?
"Fancy auto complete" was valid for half the models before InstructGPT, as that's all the early models were even trying to be… but now? The phrase doesn't fit so well when it's multimodal and can describe what it's seeing or hearing and create new images and respond with speech, all as a single unified model, any more than dismissing a bee brain as "just chemistry" or a human as "just an animal".
~ Paul Krugman, winner of the 2008 Nobel Memorial Prize in Economic Sciences
But we do have a huge number of examples of jobs disappearing thanks to machines — even the term "computer" used to refer to a job.
More recently and specifically to LLMs, such losses were already being reported around this time last year: https://www.businessinsider.com/jobs-lost-in-may-because-of-...
And that motorized auto transport, it never went anywhere, it required roads. I mean, who would ever think we'd cover a huge portion of our land in these straight lines. Now, don't mind me, I'm going to go saddle up the horse and hope I don't catch dysentery on the way into town.
Maybe it's time to construct some (high[er] density) housing where people want to live? No? Okay, then maybe next decade ... but then let's construct transport for them so they can get to work, how about some new subway lines? Ah, okay, not that either.
Then I guess the only thing remains to construct is all the factories that will be built as companies decouple from China.
I don't think it's a problem to find a word or a sentence or a story - whatever - that's commonly used by everyone on daily basis but in different context. That's not a problem by itself :) try it
For the idiots, it is still possible to find a word. They may be idiots, but still, they work and live on their own. They coming along in life. So, it's up to the smarter one to find a no-brainer solution.
I am confident and believe nothing and no one is stupid enough not to be able to adapt to something. Even if it's me, who'll need to adapt to members with less brain.