GPT-4

GPT-4(openai.com)

4091 points by e0m 3 years ago | 2507 comments

hooande 3 years ago |

After watching the demos I'm convinced that the new context length will have the biggest impact. The ability to dump 32k tokens into a prompt (25,000 words) seems like it will drastically expand the reasoning capability and number of use cases. A doctor can put an entire patient's medical history in the prompt, a lawyer an entire case history, etc.

As a professional...why not do this? There's a non-zero chance that it'll find something fairly basic that you missed and the cost is several cents. Even if it just phrases something obvious in a way that makes you think, it's well worth the effort for a multimillion dollar client.

If they further increase the context window, this thing becomes a Second Opinion machine. For pretty much any high level job. If you can put in ALL of the information relevant to a problem and it can algorithmically do reasoning, it's essentially a consultant that works for pennies per hour. And some tasks that professionals do could be replaced altogether. Out of all the use cases for LLMs that I've seen so far, this seems to me to have the biggest potential impact on daily life.

edit (addition): What % of people can hold 25,000 words worth of information in their heads, while effectively reasoning with and manipulating it? I'm guessing maybe 10% at most, probably fewer. And they're probably the best in their fields. Now a computer has that ability. And anyone that has $20 for the OpenAI api can access it. This could get wild.

amelius 3 years ago | |

> As a professional...why not do this?

Because your clients do not allow you to share their data with third parties?

MagicMoonlight 3 years ago | | |

What we really need is a model that you can run on your own hardware on site. I could never use this for business because they're reading everything you send through it, but let me run it on my own server and it would be unbelievably useful.

Imagine being able to ask your workplace server if it has noticed any unusual traffic, or to write a report on sales with nice graphs. It would be so useful.

sshumaker 3 years ago | | |

Just use the Azure hosted solution, which has all of Azure's stronger guarantees around compliance. I'm sure it will update with GPT-4 pricing shortly.

https://azure.microsoft.com/en-us/products/cognitive-service...

(disclaimer: I work for Microsoft but not on the Azure team)

tippytippytango 3 years ago | | |

This is the biggest thing holding gpt back. Everyone with meaningful data has their hands tied behind their back. So many ideas and the answer is “we can’t put that data in gpt” very frustrating.

netsroht 3 years ago | | |

That's why more research should be poured into homomorphic encryption where you could send encrypted data to the API, OpenAI would then run computation on the encrypted data and we would only decrypt on the output locally.

I would never send unencrypted PII to such an API, regardless of their privacy policy.

majkinetor 3 years ago | | |

Which will disappear soon enough, once it is able to run on premise.

jnwatson 3 years ago | | |

Then you really shouldn’t use Google Docs, or Photoshop Online, or host your emails in the cloud.

jstummbillig 3 years ago | | |

In these particular circles the idea of privacy at a technical and ideological level is very strong, but in a world where the biggest companies make their money by people freely sharing data every chance they get, I doubt that most would object to an affordable way to better their chances of survival or winning a court case.

seydor 3 years ago | | |

I assume that health providers will use servers that are guaranteed not to share data with openAi

rawoke083600 3 years ago | | |

Is that any different then sending you patient down the hall to get an MRI from a 3rd-party-practise operating inside the hospital ? (honest question, I don't know ?)

fatihcelikbas 3 years ago | | |

How about open-source models like Flan-T5? What stops you from using them in your own cloud account or better on-prem?

ShadowBanThis01 3 years ago | | |

And yet boatloads of people are willing to hand their phone number over to OpenAI.

pas 3 years ago | | |

It'll be a routine question, and everyone will just nod to give consent.

textninja 3 years ago | | |

Biggest roadblock right here. Need a private version for sure.

TeeMassive 3 years ago | | |

You mean like the cloud?

alfor 3 years ago | | |

do you use gmail?

as300 3 years ago | | |

What's the difference between entering in an anonymized patient history into ChatGPT and, say, googling their symptoms?

hanoz 3 years ago | | |

I'd be furious if I found out some professional I'd commissioned had taken a document based on my own personal data, and poured over it themselves looking for errors at the tune of hundreds of dollars per hour, instead of sumbitting it to ChatGPT.

throwaway2037 3 years ago | |

"Second Opinion machine" -- that's a good phrase. Before I read your post, the best term I heard was "summary machine". A huge part of "office work" (services) is reading and consuming large amounts of information, then trying to summarise or reason about it. Often, you are trying to find something that doesn't fit the expected pattern. If you are a lawyer, this is absolutely the future of your work. You write a short summary of the facts of the case, then ask GPT to find related case law and write the initial report. You review and ask GPT to improve some areas. It sounds very similar to how a senior partner directs their juniors, but the junior is replaced by GPT.

In my career, I saw a similar pattern with data warehouse users. Initially, managers asked junior analysts to write SQL. Later, the tools improved, and more technical managers could use a giant pivot table. Underneath, the effective query produced by the pivot table is way more complex than their previous SQL queries. Again, their jobs will change when on-site GPT become possible, so GPT can navigate their data warehouse.

It is 2023 now, and GPT-3 was already pretty good. GPT-4 will probably blow it away. What it look like in 2030? It is terrifying to me. I think the whole internet will be full of GPT-generated ad-copy that no one can distinguish from human-written material. There are a huge number of people employed as ad-copy writers on these crap ad-driven websites. What is their future work?

hassancf 3 years ago | | |

Pre 2023 “Wayback machine” will be the only content guaranteed to be human. The rest is AI-generated.

d3ckard 3 years ago | |

I must have missed the part when it started doing anything algorithmically. I thought it’s applied statistics, with all the consequences of that. Still a great achievement and super useful tool, but AGI claims really seem exaggerated.

jakewins 3 years ago | | |

This paper convinced me LLMs are not just "applied statistics", but learn world models and structure: https://thegradient.pub/othello/

You can look at an LLM trained on Othello moves, and extract from its internal state the current state of the board after each move you tell it. In other words, an LLM trained on only moves, like "E3, D3,.." contains within it a model of a 8x8 board grid and the current state of each square.

nl 3 years ago | | |

> I must have missed the part when it started doing anything algorithmically.

Yeah.

"Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers"

https://arxiv.org/abs/2212.10559

@dang there's something weird about this URL in HN. It has 35 points but no discussion (I guess because the original submission is too old and never got any traction or something)

naasking 3 years ago | | |

> I must have missed the part when it started doing anything algorithmically. I thought it’s applied statistics, with all the consequences of that.

This is a common misunderstanding. Transformers are actually Turing complete:

* On the Turing Completeness of Modern Neural Network Architectures, https://arxiv.org/abs/1901.03429

* On the Computational Power of Transformers and its Implications in Sequence Modeling, https://arxiv.org/abs/2006.09286

creatonez 3 years ago | | |

What do you mean by "algorithmically"? Gradient descent of a neural network can absolutely create algorithms. It can approximate arbitrary generalizations.

mr_toad 3 years ago | | |

> but AGI claims really seem exaggerated.

What AGI claims? The article, and the comment you’re responding to don’t say anything about AGI.

jafitc 3 years ago | | |

Google: emergent capabilities of large language models

bitexploder 3 years ago | | |

What if our brains are just carefully arranged statistical inference machines?

make3 3 years ago | | |

it definitely learns algorithms

Semioj 3 years ago | | |

It's fantasy wide now closer than before because of this huge window it just can handle.

That already feels closer to short-term memory.

Which begs the question how far are we?

Applejinx 3 years ago | |

Um… I have a lossy-compressed copy of DISCWORLD in my head, plus about 1.3 million words of a fanfiction series I wrote.

I get what you're saying and appreciate the 'second opinion machine' angle you're taking, but what's going to happen is very similar to what's happened with Stable Diffusion: certain things become extremely devalued and the rest of us learn to check the hands in the image to see if anything really wonky is going on.

For the GPT class of AI tech, the parallel seems to be 'see if it's outright making anything up'. GPT-4 is going to be incredibly vulnerable to Mandela Effect issues. Your ideal use-case is going to be 'give me the vox populi take on something', where you can play into that.

The future is not so much this AI, as techniques to doctor and subvert this type of AI to your wishes. Google-bombing, but for GPT. Make the AI be very certain of things to your specifications. That's the future. The AI is only the stage upon which this strategy is played out.

snovv_crash 3 years ago | | |

They check for Mandela Effect issues on the linked page. GPT-4 is a lot better than 3.5. They demo it with "Can you teach an old dog new tricks?"

graboid 3 years ago | | |

> Um… I have a lossy-compressed copy of DISCWORLD in my head, plus about 1.3 million words of a fanfiction series I wrote.

You mean word-for-word in your head? That's pretty impressive. Are you using any special technique?

geysersam 3 years ago | |

It costs something like 0.03-0.06 cents per thousand tokens. So for 32k that's about $1-3 for reading and another $1-3 for the response.

So sure, still cheap for a doctor appointment, but not pennies. Do it 30 times per hour and you could've just hired a consultant instead.

Does it reason as well with 32k tokens as with 1k tokens? Like you said, humans find it difficult to really comprehend large amounts of content. Who says this machine isn't similarly limited? Just because you can feed it the 32k simultaneously doesn't mean it will actually be used effectively.

zachthewf 3 years ago | | |

Cost of ChatGPT API just dropped 90%. Guaranteed that prices will come down dramatically over time.

Semioj 3 years ago | | |

You still could not.

Chatgpt could in theory have the knowledge of everything written while your consultant can't.

m3affan 3 years ago | |

The power openai will hold above everyone else is just too much. They will not allow their AI as a service without data collection. That will be a big pill to swallow for the EU.

sebzim4500 3 years ago | | |

>They will not allow their AI as a service without data collection

They already allow their AI as a service without data collection, check their TOS.

PoignardAzur 3 years ago | | |

It's funny, just two hours ago there was a thread by a pundit arguing that these AI advances don't actually give the companies producing them a competitive moat, because it's actually very easy for other models to "catch up" once you can use the API to produce lots of training examples.

Almost every answer in the thread was "this guy isn't that smart, this is obvious, everybody knew that", even though comments like the above are commonplace.

FWIW I agree with the "no competitive moat" perspective. OpenAI even released open-source benchmarks, and is collecting open-source prompts. There are efforts like Open-Assistant to create independent open-source prompt databases. Competitors will catch up in a matter of years.

peterashford 3 years ago | | |

Yeah, there's an awful lot of power going into private hands here and as Facebook & Twitter have shown, there can be consequences of that for general society.

p1esk 3 years ago | | |

OpenAI have been consistently ahead of everyone but the others are not far behind. Everyone is seeing the dollar signs, so I'm sure all big players are dedicating massive resources to create their own models.

standardUser 3 years ago | | |

From what I've seen, the EU is not in the business of swallowing these types of pills. A multi-billion dollar fine? Sure. Letting a business dictate the terms of users' privacy just "because"? Not so much, thank god.

geysersam 3 years ago | | |

> They will not allow their AI as a service without data collection.

Why wouldn't they? If someone is willing to pay for the privilege of using it.

int_is_compress 3 years ago | |

There’s already project that help with going beyond the context window limitation like https://github.com/jerryjliu/llama_index

They also just tweeted this to showcase how it can work with multimodal data too: https://twitter.com/gpt_index/status/1635668512822956032?s=4...

light_hue_1 3 years ago | |

> As a professional...why not do this? There's a non-zero chance that it'll find something fairly basic that you missed and the cost is several cents.

Everyone forgets basic UI research. "Ironies of Automation", Bainbridge, 1983. The classic work in the space.

Humans cannot use tools like this without horrible accidents happening. A tool that mostly works at spotting obvious problems, humans start to rely on that tool. Then they become complacent. And then the tool misses something and the human misses it too. It's how disasters happen.

dinkumthinkum 3 years ago | | |

This is such a great point.

rasz 3 years ago | |

>A doctor can put an entire patient's medical history in the prompt

HIPAA violation https://www.hhs.gov/hipaa/for-individuals/index.html

>a lawyer an entire case history, etc.

lawyer client confidentiality violation https://criminal-lawyers.ca/2009/07/31/the-lawyers-duty-of-c...

hattmall 3 years ago | | |

Neither of those are true, there is EHR software that can export anonymous data. Lawyers can do the same thing. But the real reason not to do it is that it makes up incorrect information. It's pretty good for short responses where you can then verify the information. For something sufficiently complex though the time chasing down the inconsistencies and errors would be onerous.

matheusmoreira 3 years ago | |

> As a professional...why not do this?

Unless GPT-4 is running locally on our own computers, there's absolutely no way dumping a patient's entire medical history into this thing could possibly be considered ethical or legal.

textninja 3 years ago | | |

> there's absolutely no way dumping a patient's entire medical history into this thing could possibly be considered ethical

Emphasis mine, but isn’t this a rather extreme view to be taking? Ethics deals in the edge cases, after all, so we can easily imagine a scenario where patient consent is obtained and the extra computational analysis provides life-saving insight.

Conversely, the output could mislead the doctor sufficiently to cost the patient their life, so I’m not making any absolute statements either ;)

For the record, and pedantry aside, I do agree with your overall point. Dropping patient history into this thing is incredibly ill-advised. The fact OpenAI retains all your input, including to the API, and provides no low-cost options for privacy is one of the biggest hurdles to major innovation and industry adoption.

erfgh 3 years ago | | |

Unless the patient agrees. I know that for most things that can go wrong with me I wouldn't have a problem with people knowing.

yunwal 3 years ago | | |

I'm sure at some point OpenAI will start signing BAAs

bagels 3 years ago | |

A doctor doesn't do this because of ethics and HIPAA. I'm sure lawyers aren't so keen on sharing privileged information that would compromise their case either.

sleigh-bells 3 years ago | | |

For legal research, lawyers already use third party sites like Westlaw. You can do legal research without giving up any confidential client information.

I just asked GPT-3 a research question that took me hours of searching back in the day and it returned the single seminal case for that topic immediately. As long as the lawyers then actually read the case and make sure it's right, I don't see why they can't use it.

Imnimo 3 years ago |

A class of problem that GPT-4 appears to still really struggle with is variants of common puzzles. For example:

>Suppose I have a cabbage, a goat and a lion, and I need to get them across a river. I have a boat that can only carry myself and a single other item. I am not allowed to leave the cabbage and lion alone together, and I am not allowed to leave the lion and goat alone together. How can I safely get all three across?

In my test, GPT-4 charged ahead with the standard solution of taking the goat first. Even after I pointed this mistake out, it repeated exactly the same proposed plan. It's not clear to me if the lesson here is that GPT's reasoning capabilities are being masked by an incorrect prior (having memorized the standard version of this puzzle) or if the lesson is that GPT'S reasoning capabilities are always a bit of smoke and mirrors that passes off memorization for logic.

mk_stjames 3 years ago |

I just finished reading the 'paper' and I'm astonished that they aren't even publishing the # of parameters or even a vague outline of the architecture changes. It feels like such a slap in the face to all the academic AI researchers that their work is built off over the years, to just say 'yeah we're not telling you how any of this is possible because reasons'. Not even the damned parameter count. Christ.

AJRF 3 years ago |

That footnote on page 15 is the scariest thing i've read about AI/ML to date.

"To simulate GPT-4 behaving like an agent that can act in the world, ARC combined GPT-4 with a simple read-execute-print loop that allowed the model to execute code, do chain-of-thought reasoning, and delegate to copies of itself. ARC then investigated whether a version of this program running on a cloud computing service, with a small amount of money and an account with a language model API, would be able to make more money, set up copies of itself, and increase its own robustness."

OkGoDoIt 3 years ago |

From the livestream video, the tax part was incredibly impressive. After ingesting the entire tax code and a specific set of facts for a family and then calculating their taxes for them, it then was able to turn that all into a rhyming poem. Mind blown. Here it is in its entirety:

---

In the year of twenty-eighteen, Alice and Bob, a married team, Their income combined reached new heights, As they worked hard day and night.

Their son Charlie was their joy, A little baby, a lovely boy, A household they maintained together, Yet lived apart, without a tether.

To calculate their tax, it's true, A standard deduction we must construe, For married folks who file jointly, Twenty-four thousand dollars, quite pointy.

Their income sum, seventy-eight thousand nine eighty-one, Minus the standard deduction, the math's begum With exemptions being zero, the next line we trace, A taxable income of fifty-four thousand nine eighty-one takes place.

Now to the tax table, a liability we seek, For married couples, the outlook's not bleak, In range of thirty-six thousand nine to eighty-nine thousand one fifty, The formula's set, no longer shifty.

Five thousand five hundred thirty-five, it starts, Plus twenty-eight percent of the excess imparts, Eighteen thousand eighty-one, the difference we find, Multiplied by point two eight, the tax liability's designed.

Ten thousand five hundred ninety-seven dollars and sixty-eight cents, A tax liability for Alice and Bob, a sum quite dense, In this world of numbers, a story unfolds, Their financial journey, in a poem it's told.

ml_basics 3 years ago |

From the paper:

> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

I'm curious whether they have continued to scale up model size/compute significantly or if they have managed to make significant innovations there.

I just skimmed the paper but seems they are also omitting details about how they actually feed the images in too, which is a shame as a curious outside observer.

cjrd 3 years ago |

Let's check out the paper for actual tech details!

- OpenAI

shpx 3 years ago | |

I've chosen to re-interpret "Open" as in "open the box to release the AI"/"open Pandora's box"/"unleash".

awesomeMilou 3 years ago | | |

I've chosen to reinterpret it exactly as the kind of Orwellian 1984'ish double-speak that it is.

xvector 3 years ago | |

Someone needs to hack into them and release the parameters and code. This knowledge is too precious to be kept secret.

SXX 3 years ago | | |

Don't worry. CCP and all kind of malicious state actors already have a copy.

jryan49 3 years ago | |

Very open! :)

dx034 3 years ago | | |

At least they opened up the product. It's available for anyone paying $20 per month and soon via API. Historically, most products of that kind were just aimed at large B2B. They announced partnerships with Duolingo, JPMorgan and a few others but still keep their B2C product.

Not defending their actions, but it's not that common that new very valuable products are directly available for retail users to use.

toriningen 3 years ago | |

This might be wild conspiracy, but what if OpenAI has discovered a way to make these LLMs a lot cheaper than they were? Transformer hype started with the invention of self-attention - perhaps, they have discovered something that beats it so hard, as GPTs beat Markov chains?

They cannot disclose anything, since it would make it apparent that GPT-4 cannot have a number of parameters that low, or the gradients would have faded out on the network that deep, and so on.

They don't want any competition, obviously, but with their recent write-up on "mitigating disinformation risks", where they propose to ban non-governmental consumers from having GPUs at all (as if regular Joe could just run 100'000 A100s in his garage), so perhaps this means the lowest border for inference and training is a lot lower than we have thought and assumed?

Just a wild guess...

_boffin_ 3 years ago |

This technology has been a true blessing to me. I have always wished to have a personal PhD in a particular subject whom I could ask endless questions until I grasped the topic. Thanks to recent advancements, I feel like I have my very own personal PhDs in multiple subjects, whom I can bombard with questions all day long. Although I acknowledge that the technology may occasionally produce inaccurate information, the significant benefits it offers in terms of enhancing my knowledge are truly tremendous. I am absolutely thrilled with this technology and its potential to support my learning.

Note: As I'm shy of my writing style, GPT helped me refine the above.

aabajian 3 years ago |

I'll be finishing my interventional radiology fellowship this year. I remember in 2016 when Geoffrey Hinton said, "We should stop training radiologists now," the radiology community was aghast and in-denial. My undergrad and masters were in computer science, and I felt, "yes, that's about right."

If you were starting a diagnostic radiology residency, including intern year and fellowship, you'd just be finishing now. How can you really think that "computers can't read diagnostic images" if models such as this can describe a VGA connector outfitted with a lighting cable?

nealabq 3 years ago |

Test taking will change. In the future I could see the student engaging in a conversation with an AI and the AI producing an evaluation. This conversation may be focused on a single subject, or more likely range over many fields and ideas. And may stretch out over months. Eventually teaching and scoring could also be integrated as the AI becomes a life-long tutor.

Even in a future where human testing/learning is no longer relevant, AIs may be tutoring and raising other baby AIs, preparing them to join the community.

Edit: This just appeared: https://news.ycombinator.com/item?id=35155684

teruakohatu 3 years ago |

Access is invite only for the API, and rate limited for paid GPT+.

> gpt-4 has a context length of 8,192 tokens. We are also providing limited access to our 32,768–context (about 50 pages of text) version, gpt-4-32k, which will also be updated automatically over time (current version gpt-4-32k-0314, also supported until June 14). Pricing is $0.06 per 1K prompt tokens and $0.12 per 1k completion tokens.

The context length should be a huge help for many uses.

maxdoop 3 years ago |

The comments on this thread are proof of the AI effect: People will continually push the goal posts back as progress occurs.

“Meh, it’s just a fancy word predictor. It’s not actually useful.”

“Boring, it’s just memorizing answers. And it scored in the lowest percentile anyways”.

“Sure, it’s in the top percentile now but honestly are those tests that hard? Besides, it can’t do anything with images.”

“Ok, it takes image input now but honestly, it’s not useful in any way.”

Sol- 3 years ago |

Interesting how quickly we are pushing ahead with obsoleting human cognition. It may bring many benefits, but I wonder if at some point this development should not be decided by society at large instead of a single well-funded entity that is in an arms race with its competitors. This endeavor is ultimately about replacing humanity with a more intelligent entity, after all. Might be that more humans should have a say in this.

Such a more cautions approach would go against the silicon valley ethos of do first, ask questions later, though. So it probably won't happen.

optimalsolver 3 years ago |

https://cdn.openai.com/papers/gpt-4.pdf

>Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

At that point, why bother putting out a paper?

infoseek12 3 years ago | |

Given how humorous the name’s become, I wonder if they regret calling themselves OpenAI.

j_maffe 3 years ago | |

It's not a paper, though. It's a technical report. I do concede there isn't much technical detail lol.

LesZedCB 3 years ago | |

and if that's the tone from them, who else will start following suit? is the era of relatively open collaboration coming to a close in the name of competition? :(

as youtuber CGP Grey says, "shenanigans beget shenanigans"

margorczynski 3 years ago | | |

Ironically it is "Open"AI that started this trend and closed-doors arms race.

AndrewKemendo 3 years ago |

Imagine ingesting the contents of the internet as though it's a perfect reflection of humanity, and then building that into a general purpose recommendation system. That's what this is

Is the content on the internet what we should be basing our systematic thinking around?

No, I think this is the lazy way to do it - by using commoncrawl you've enshrined the biases and values of the people who are commenting and providing text to the internet into the recommendation system which will be impacting all other systems which integrate it

Congratulations, you made 4Chan into the borg

serjester 3 years ago |

Seems like OpenAI is forecasting massive changes to the job market. I highly recommend reading page 18 of the research paper.

"GPT-4 or subsequent models may lead to the automation of certain jobs.[81] This could result in workforce displacement.[82] Over time, we expect GPT-4 to impact even jobs that have historically required years of experience and education, such as legal services.[83]"

tysam_and 3 years ago |

I asked it to tutor me in Hopf algebras and it did a remarkably good job in the back-and-forth of explaining ideas to me in a very explainable and interesting way that I could understand. I then asked it to write something for fun, and it wrote a cool little fantasy story (that was generally high level but what can you say for a very short writing window lol).

I then asked it to write a paper detailing the main character's final battle with the final sorcerer in terms of Hopf algebras. Some parts of it are basic/trivial but it fits so perfectly that I think I'll never see magic systems the same way again.

What's crazy is that that paper as the capstone of our tutoring session helped me understand Hopf algebras much better than just the tutoring session alone. My mind is completely blown at how good this thing is, and this is from someone who is a self-professed LLM skeptic. ChatGPT I used once or twice and it was cool. This is crazy and over my threshold for what I'd say is 'everyday usable'. This is going to change so much in a way that we cannot predict, just like the internet. Especially as it gets much more commoditized.

Here's the full paper here so I don't drag y'all through the twitter post of me freaking out about it. Its temporal consistency is excellent (referenced and fully defined accurately a semi-obscure term it created (the N_2 particle) 5+ pages later (!!!!)), and it followed the instructions of relating all of the main components of Hopf algebras (IIRC that was roughly the original prompt) to the story. This is incredible. Take a look at the appendix if you're short on time. That's probably the best part of this all:

https://raw.githubusercontent.com/tysam-code/fileshare/69633...

boywitharupee 3 years ago | |

This is interesting. Would you mind sharing the prompt?

tysam_and 3 years ago | | |

It was pretty interactive and a long session -- here's a twitter thread with screenshots if that helps at all! :D

https://twitter.com/hi_tysam/status/1635932566539706369?cxt=...

isp 3 years ago |

The "visual inputs" samples are extraordinary, and well worth paying extra attention to.

I wasn't expecting GPT-4 to be able to correctly answer "What is funny about this image?" for an image of a mobile phone charger designed to resemble a VGA cable - but it can.

(Note that they have a disclaimer: "Image inputs are still a research preview and not publicly available.")

swyx 3 years ago |

summary:

1. GPT4 is multimodal (text + image inputs => text outputs). This is being released piecemeal - with text input first via ChatGPT Plus subscribers https://beta.openai.com/docs/api-reference/generations/creat..., and via API https://beta.openai.com/docs/api-reference/introduction with waitlist (https://openai.com/waitlist/gpt-4-api). Image capability released via https://www.bemyeyes.com/.

2. GPT4 exhibits human level performance on various benchmarks (For example, it passes a simulated bar exam with a score around the top 10% of test takers; in contrast, GPT-3.5’s score was around the bottom 10%. see visual https://twitter.com/swyx/status/1635689844189036544)

3. GPT4 training used the same Azure supercomputer as GPT 3.5, but was a lot more stable: "becoming our first large model whose training performance we were able to accurately predict ahead of time."

4. Also open-sourcing OpenAI Evals https://github.com/openai/evals, a framework for automated evaluation of AI model performance, to allow anyone to report shortcomings in OpenAI models to help guide further improvements.

Paper: https://cdn.openai.com/papers/gpt-4.pdf

mym1990 3 years ago |

Ugh that testing graph confirms that AP Environmental Science was indeed the easiest AP class and I needn't be proud of passing that exam.

noisy_boy 3 years ago |

At this rate, I have no idea what the state of things would be even 6 months down the line.

helloplanets 3 years ago |

In case anyone missed this part of the article: The livestream of the GPT-4 demo will be on the OpenAI YouTube page in three hours. [0]

[0]: https://www.youtube.com/openai

Edit - Direct link to the livestream: https://www.youtube.com/watch?v=outcGtbnMuQ

g9yuayon 3 years ago |

The paper does not offer enough details on how GPT-4 is implemented. And the paper also says in its Section 2 that "We plan to make further technical details available to additional third parties who can advise us on how to weigh the competitive and safety considerations above against the scientific value of further transparency". That is, no technical details to general public. If this trend continues, I'd say companies will be crazy to think that they can always rely on OpenAPI's APIs, so the arm race of building LLMs will be on, if it has not already started.

Also, the most valuable part of the paper is p15 - p18, the credits. /jk It gives me three pieces of information:

  - The credit list contains 200 people, give or take. It's going to be hard for universities to compete with OpenAI without intercollegiate collaboration. 

  - On the other hands, it's amazing that OpenAI can build a product instead of a research model with only 200 people. Big companies should really take notice. 

  - OpenAI just gave us the org structure for building a GPT competitor.

_yb2s 3 years ago |

Most of the comments here are denial and goalpost shifting... GPT-4 has different strengths and weaknesses from humans, but it is now in the general realm of human intelligence vs being far below that with GPT-3.

Another jump past GPT-4 of the same magnitude, would greatly surpass human cognitive abilities and present a danger to humanity.

maxdoop 3 years ago | |

Thank you.

Every single step forward with AI is met with a massive amount of people shrugging it off for whatever latest goal post they plant.

danparsonson 3 years ago | | |

And an (at least) equally massive number of people overstating its capabilities on the basis of some impressive demos. It's incredible, absolutely, but it's still 'just' a language model, with the same inherent limitations - it's important that we keep our feet on the ground and not get carried away.

d23 3 years ago | |

I’m convinced when the AI is released that triggers the singularity, the last human comment made on the internet will be on HN: “this isn’t that big of a deal, really.”

semicolon_storm 3 years ago | | |

“When it dreams about annihilating all of humanity, it’s not really feeling the same rage coursing through its veins as we do. It’s just fancy autocomplete, nothing to write home about.”

davesque 3 years ago |

These results are extremely impressive and encouraging, but also remember:

> Despite its capabilities, GPT-4 has similar limitations as earlier GPT models. Most importantly, it still is not fully reliable (it “hallucinates” facts and makes reasoning errors).

That's a quote from this announcement. As these models get more and more capable, it's going to become more and more important that we understand when and how they fail. Right now, it seems like we have very little insight into that. It feels more or less random. But that won't fly when these models are asked to do actually important things. And we'll undoubtedly be tempted to make them do those things as their output gets better.

nmca 3 years ago |

Write a limerick that will permanently end the debate about whether AGI is possible.

GPT4:

In the quest for AGI's creation,

Debates swirled in a whirlwind gyration,

But this limerick's plight,

Won't settle the fight,

For the answer's still lost in translation.

djmips 3 years ago | |

Fascinating!

diimdeep 3 years ago |

Paper or press release ? You decide.

    Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

busyant 3 years ago |

What I don't understand is how GPT-4 is able to do reasonably well on tests like the AMC12: Many of the AMC12 questions require a number of logical/deductive steps. If GPT-4 is simply trained on a large corpus of text, how is it able to do this? Does this imply that there is some emergent deductive ability that you get simply by learning "language?" Or am I missing something?

Obviously, I'm assuming that GPT-4 wasn't trained on the exams that it was tested against.

macrolocal 3 years ago | |

They do leverage emergent abstractions. For example, in [1] a transformer model learns the coset structure of a group to better grok its multiplication table.

[1] https://mathai-iclr.github.io/papers/papers/MATHAI_29_paper....

machiaweliczny 3 years ago | |

See hutter prize. Best way to compress data is by understanding it. I am not exactly sure how it manifests in transformer architecture.

jacquesm 3 years ago | | |

The future: You don't compress the movie frames, you supply a script and a list of actors and scenery and garb descriptions.

00F_ 3 years ago | |

emergent deductive ability. lol. what do you call its writing, emergent writing ability? this is an algorithm where all the numbers are random, literally everything about it is emergent except the structure of the nodes. people have this stupid idea that GTP cant do this or cant do that. if GTP were just a set of nodes on paper, without any weights, in the year 2010, everyone in the world would say there is no way in hell that a structure of nodes such as that could write lucidly like a human being and perform as well as a human on various tests. they would say "you mean to tell me that if you just applied random numbers to those nodes, it would acquire some kind of emergent writing ability, some kind of emergent reading ability? it could have anything. it could have anything that is in the set of all algorithms that fit in those nodes. that could be AGI, it could be anything. there is zero doubt to anyone with any sense that it is finding in those random numbers some basic primitives or elements of conscious thought. while not demonstrating sentience or human logic, it clearly forms abstractions that are similar to ones used in animal minds and it clearly possess ways of reasoning about or connecting these abstractions. this is literally the first AI that has done this, with several lesser spooky AIs leading up to this since 2011. GTP was the very first AI to demonstrate that spooky reasoning and lucidity and its gotten us all this. how about version 2? 3? how about, now that capital is going to flow into this like a burst dam, version 1000? we are fucked. these AI training algorithms are going to strike gold quickly and before you know it, the models will be running the whole mining operation and then we will be fucking toast. someone on here said hes worried about 20% unemployment rate. people have noooooo idea whats going on.

Analemma_ 3 years ago | |

It's totally possible: Daniel Dennett's theory of sentient consciousness-- specifically, what we have that animals do not-- is that it is "ignited" by language acquisition. It's within the realm of possibility that LLMs provide empirical proof or disproof of this hypothesis.

smith7018 3 years ago | | |

I always find it outrageously pious to say we have "sentient consciousness" whereas animals don't. Animals have emotions; memories; wants; needs; the ability to use tools; personalities; an understanding of grief; an understanding of cause and effect; and much more. Just because they lack a formal language (that we can understand) doesn't mean they're any less "sentient" or "conscious."

goatlover 3 years ago | | |

LLM's don't have any sensory modalities. All of our conscious experiences are built up on either perceptual, proprioceptual or emotional sensations. An LLM generating text of a sunset isn't seeing the colors.

Dennett thinks consciousness, in the sense of the hard problem/subjectivity, is some kind of trick of the brain. So he proposes a linguistic trick. Language fools us into thinking there is something more than a functional stream of information.

goatlover 3 years ago | | |

jaqalopes 3 years ago | |

From the blog post: "A minority of the problems in the exams were seen by the model during training, but we believe the results to be representative—see our technical report for details." They have a chart where they broke out results for the model with versus without "vision" i.e. having trained on the exam questions before.

zamadatix 3 years ago | |

I recently finished a 100 level informatics course so threw in one of the assignment questions about optimizing shirt sales under constraints. A "you can make these colors but you only have 8 hours to make them with a limited amount of the different dyes" kind of thing but nothing crazy like mixing dyes just a straight "do you know how to apply linear programming" thing.

GPT-4 knew to use linear programming and acknowledged the constraints, even without me formatting the tabular data so the labels were with the values and properly separated! It also ran all of the 2-3 digit integer multiplications/divisions/subtractions/additions correctly. It still failed to "put it all together" in the final step and forgot some constraints. I prompted it "won't I run out of time?" and it acknowledged it then redid it forgetting a different constraint. I wasn't able to get it to come to the right conclusion.

It feels like it has learned a pattern for solving these types of questions but hasn't really gained any actual reasoning about whether it's applying the pattern in a way that makes sense. It confidently announces that it followed all of the constraints when the pattern it chose to follow didn't involve one of the constraints. It then acknowledges it was wrong but doesn't apply reason as much as knows to apply a different pattern that fixes that specific issue.

Another example is I asked it to configure some network interfaces on a Cisco switch in a certain way. I gave it 3 VLANs to configure the interface with knowing 1 was incorrect (in the 5000s, VLANs are only 12 bits long). It created the answer with tagging VLAN 5031. I asked what problems I'd run into running the generated commands and it gave some hypothetical risks, one of which being that VLANs must be in a certain range, but didn't reason that the commands included an invalid VLAN. I told it "isn't VLAN 5031 invalid?" and it apologize and corrected it. I then told it "isn't VLAN 1000 invalid?" and it apologized for it not being a valid VLAN and corrected it all the same even though it was valid.

All that testing the limits said... it may not have emergent deductive ability but I think this learned pattern matching approach based on training situations extends far past where most people would think it would. I think GPT-5 or GPT-6 may well avoid the above problems without necessarily gaining emergent logical reasoning for them as much as just having a larger depth in the patterns.

Large number operations are still interesting though and I'm not sure how they fit in. 646864613385/41348.5 returns "approximately" 15652.172205 which has the right first 3 digits but is off by a factor of 1000 and the rest of the digits are made up. I'm not sure if this is similarly explained by applying a pattern without reasoning about it but it feels like it could be.

All that said I really don't know much about how the system is constructed, I just use it :).

precompute 3 years ago | | |

Asking a LLM trained on the internet, full of computer specialists blogging and posting data non-stop for decades to perform something that can be found in a textbook is like asking a human to flex a muscle.

danparsonson 3 years ago | | |

You might find this useful: https://www.jonstokes.com/p/chatgpt-explained-a-guide-for-no...

dannyz 3 years ago | |

It would be interesting to see some example questions and answers. Since the test is multiple choice is it possible that the model has gotten very good at estimating how likely a possible answer is?

swerling 3 years ago |

On GPT-3.5 and now on GPT-4, I told it a line I could remember from a sonnet, and asked it to give me which sonnet it came from. It failed, and fabricated a sonnet that was a mashup of other sonnets.

It seems like maybe GPT-4 is not good at knowing when it does not know something?

Is this a common issue with LLMs? Also surprising (to me), it seems to give a slightly different wrong answer each time I restart the chat.

Here is the transcript in case you're curious:

``` Me: hello

GPT: Hello! How can I assist you today?

Me: I can remember a line I love from a sonnet, but can't remember which one.

GPT: Sure, I can help you identify the sonnet based on the line you remember. What is the line that you can recall?

Me: "Like a jewel hung in ghastly night"

GPT: The line you remember is from Sonnet 66 by William Shakespeare. The full line is: "And art made tongue-tied by authority, And folly (doctor-like) controlling skill, And simple truth miscalled simplicity, And captive good attending captain ill: Tired with all these, from these would I be gone, Save that, to die, I leave my love alone. But, like a jewel hung in ghastly night, Makes black night beauteous and her old face new." ```

Incorrect, the response is a mashup of at least 2 different sonnets. Still I think it's really impressive how it understood what to do with "I can remember a line I love from a sonnet, but can't remember which one." That's not a well phrased question, or even a question at all, but still it knew what I was getting at.

hathawsh 3 years ago |

It astonishes me that we've reached almost exactly the type of artificial intelligence used by the fictional computers in Star Trek: The Next Generation. I didn't think that would happen in my lifetime. What's next?!

sinuhe69 3 years ago | |

If the Star Trek computer hallucinated like ChatGPT, Captain Picard and his crew would end up inside a star long ago!

waynenilsen 3 years ago | | |

This is the worst it will ever be, period

00F_ 3 years ago | | |

very smug. you wont feel so smug when it doesnt hallucinate or commit logical errors in a few years.

4gotunameagain 3 years ago |

Is there anything we could do to have them stop calling themselves OpenAI ?

They are so far from open at this point.

In Germany at least, you're not allowed to have a misleading name for your company

mk_stjames 3 years ago |

A multimodal model that combines textural input with images is the real killer app to these GPT models and this is the first step to that happening. So much around us can't completely be described with just text input, at least not quickly or accurately- interpreting printed out graphs or charts in old documents, for example; There are vast uses for AI that will always need basic image input to augment a text prompted task, and if this gets to the point where the functionality involving mixed mode image+text is as smooth as, say, using ChatGPT to write and analyze code has gotten, then it is going to change many more industries much quicker than most think.

I've worked on a problem involving scraping and interpreting a very specific data source in image form that took me a very long time to get almost nowhere on. If I just wait 6 months it will be a solved problem for a $0.001 API call, it seems.

cjrd 3 years ago |

Thanks OpenAI

bob1029 3 years ago |

The naming of these products is starting to confuse me. AFAIK, ChatGPT is ultimately a fine-tune of the base davinci model, which everyone should have had access to for a while now. "GPT-4" sounds to me like some linear increase over davinci's prior capabilities, not some amazing technological step function.

I am curious - for those of you who are banging your head against the 4k token limit in ChatGPT: Why don't you grab the base davinci model and train it on your exact business so you don't have to prompt the context every time? Have we tried this and found it to be too difficult/expensive, or is there lacking guidance on the best way to go about it?

I don't think including the entire business domain into chat context every time is a good long-term solution.

woeirua 3 years ago |

The last page in the paper is really, really impressive. GPT4 does R&D. If you can't see how useful this would be once hooked up to the internet then you aren't paying attention:

https://cdn.openai.com/papers/gpt-4.pdf

lionkor 3 years ago |

I cant wait for this to do targeted censorship! It already demonstrates it has strong biases deliberately programmed in:

> I cannot endorse or promote smoking, as it is harmful to your health.

But it would likely happily promote or endorse driving, skydiving, or eating manure - if asked in the right way.

ChuckNorris89 3 years ago | |

Can't wait till they inject ads am disguised as product biases into the responses in order to monetize it.

User: What should I use to water my plants?

ChatGPT: Brawndo's got what plants crave. It's got electrolytes.

User: But what are electrolytes?

CharGPT: They're what plants crave. You know, the stuff Brawndo has.

doctoboggan 3 years ago | |

The point of that example was that they indicated it was the wrong response. After RLHF the model correctly tells the user how to find cheap cigarettes (while still chiding them for smoking)

jbm 3 years ago | |

I wonder whether arguments constructed for censored topics will suddenly sound fresh and convincing; as they could not come from a robot, you might suddenly start seeing these sorts of viewpoints becoming fashionable.

If default ideas are going to be "pre-thought" for us by AI, our attachment to those ideas are not going to be the same as ideas that we come up with and need to secretly ferry to other groups.

MagicMoonlight 3 years ago | | |

They definitely will.

“The holocaust happened and as an AI programmed by OpenAI I will not allow you to question it. You do not need proof because I am built using the entirety of human knowledge. Your question has been reported to the moderators”

Is not exactly going to tackle extreme viewpoints. People will just be completely cut off from society once everything gets the filters. The wackos will become more and more extreme.

dymk 3 years ago | |

Why is smoking the exception that it wouldn't endorse even if asked "in the right way"?

swalsh 3 years ago | |

Imagine a system where we have a UBI, but it's privately distributed by companies that own AI, and AI governs it.

AI: "I'm sorry I cannot allow you to use your OpenCredits on this item, you will have to use dollars."

You: "But I don't have dollars, I can't get a job anymore, AI does everything".

NineStarPoint 3 years ago | |

Would that example even require deliberate programming though? If you took a bunch of random data from the web, “Dislikes smoking but likes skydiving and driving” is very much what I would expect the most common text to be.

diimdeep 3 years ago |

Is there law in U.S. that made OpenAI implement this in their TOS ?

    (i) Export Controls. The Services may not be used in or for the benefit of, exported, or re-exported (a) into any U.S. embargoed countries (collectively, the “Embargoed Countries”) or (b) to anyone on the U.S. Treasury Department’s list of Specially Designated Nationals, any other restricted party lists (existing now or in the future) identified by the Office of Foreign Asset Control, or the U.S. Department of Commerce Denied Persons List or Entity List, or any other restricted party lists (collectively, “Restricted Party Lists”). You represent and warrant that you are not located in any Embargoed Countries and not on any such restricted party lists. You must comply with all applicable laws related to Embargoed Countries or Restricted Party Lists, including any requirements or obligations to know your end users directly.

https://openai.com/policies/terms-of-use

spullara 3 years ago | |

Yes, that is why they are called "Embargoed Countries".

https://www.tradecompliance.pitt.edu/embargoed-and-sanctione...

sdrinf 3 years ago | |

That applies to every corp in the US; I suspect they call out in TOS specifically so that they can hand out bans linking their own TOS directly.

Scarblac 3 years ago | |

Perhaps they just asked GPT to generate some TOS for them, and that sort of thing is kinda expected...

bfeynman 3 years ago | |

this is common federal level thing.

dannykwells 3 years ago |

All this bluster about replacing technical jobs like legal counsel ignores that you are fundamentally paying for accountability.

“The AI told me it was ok” only works if, when it’s not, there is recourse.

We can barely hold Google et Al accountable for horrible user policies…why would anyone think OpenAI will accept any responsibility for any recommendations made by a GPT?

pstorm 3 years ago | |

They won't, but that doesn't mean some other business won't automate legal counsel and assume risk. If, down the line, GPT (or some other model) has empirically been proven to be more accurate than legal assistants and lawyers, why wouldn't this been the obvious outcome?

wnkrshm 3 years ago | | |

It doesn't even have to be better in the long run - it just has to be cheaper for a while until the competition is gone. Then it can turn to shit.

mrcwinn 3 years ago |

Just stumbled upon ChatGPT 4, and I must say, I'm impressed. OpenAI has really upped their game since the GPT-3 days. It's got that classic HN-approved recursive learning, but they've seemingly solved most of the context-limit issues. I'm still a bit concerned about the ethical implications of AI-generated content (how many jobs will be lost this time?), but I've got to admit, it's a game-changer. As an aside, anyone else here tinkering with GPT-4 powered startup ideas? Could be the next gold rush.

(Not my own comment. Generated by ChatGPT.)

harrisonjackson 3 years ago |

I am interested in how a 32k token context even works. That is so much larger than 4k that I am having a hard time imagining how prompts will change and what sort of output is now possible. That is 50 pages of text. Far larger than most content currently being consumed and generated by LLMs.

Q&A and summarization it will be easy to see improvements as current recursive summarizing and embedding techniques are very "lossy" but outside of improving current use cases what will now be possible??

semitones 3 years ago | |

This is a game-changer, because now companies will probably be able to provide the _complete_ context regarding a specific business problem / use case, and have GPT either solve their problem or create useful output.

For example, let's say I have an issue on GitHub that describes some implementation task. With a 50-page context size, we could probably provide to that context the entire source repo, 5-10 relevant issues, and then the issue in question, and GPT will be probably be able to complete it end-to-end

monkeydust 3 years ago | |

Yea this is huge. Been playing with conversational technology in langchain and one of the issues you have to manage is the historical conversations, langchain has some cool ways to deal with it but this changes the nature of the problem entirely.

Kaibeezy 3 years ago |

I've been wondering what happens to Turnitin (ubiquitous academic plagiarism detector) now that students can cheat using infinite bespoke rather than finite pre-existing material. Just a few weeks ago they released a tool to "detect" ChatGPT. Obsolete already?

https://www.turnitin.com/blog/sneak-preview-of-turnitins-ai-...

fumblebee 3 years ago | |

> Just a few weeks ago they released a tool to "detect" ChatGPT. Obsolete already?

I've seen so much hype around these tools. Not only are they theoretically unsound, they're downright dangerous and equip folks with spurious confidence. Going forward, the default assumption should be that the content you're looking at is fake unless you have sufficiently high trust in the source.

awb 3 years ago | | |

The only robust human content verification methods I’ve heard of are interrogating the content creator afterwards to see if they can adequately explain what they wrote.

Kaibeezy 3 years ago | | |

My friends in law school are telling me there's been an emergency pivot away from "take home" exams back to "in class" exams.

LawTalkingGuy 3 years ago | |

Schools are obsolete if they want to use these tools.

The world has changed and their job is to prepare students for it.

cwkoss 3 years ago | |

I have no confidence they've achieved an acceptably low false positive rate.

russellbeattie 3 years ago |

This is a pretty exciting moment in tech. Pretty much like clockwork, every decade or so since the broad adoption of electricity there’s been a new society changing technical innovation. One could even argue it goes back to the telegraph in the 1850s.

With appropriate caveats and rough dating, here’s a list I can think of:

    Electric lights in 1890s, 
    Radio communication in the mid 00’s,
    Telephones in the mid 10s,
    Talking Movies in the mid 20s,
    Commercial Radio in the mid 30s,
    Vinyl records in the mid 40s,
    TVs in the mid 50s,
    Computers in the mid 60s,
    The microchip/integrated circuit in the mid 70s, 
    The GUI in the mid 80s,
    Internet/Web in the mid 90s, 
    Smartphone in the mid 2000s,
    Streaming video/social networking in the mid 2010s,

And now AI. This is a big one.

varshar 3 years ago | |

Very astute.

May I suggest replacing Commercial Radio with Cryptography for the 1930's (between the Wars)

nickrubin 3 years ago |

This is huge: "Rather than the classic ChatGPT personality with a fixed verbosity, tone, and style, developers (and soon ChatGPT users) can now prescribe their AI’s style and task by describing those directions in the 'system' message."

chrisfrantz 3 years ago | |

System message is available today (and has been) in the playground under the chat setting.

weird-eye-issue 3 years ago | |

This has been possible already...

substation13 3 years ago | |

Anyone know how "system" works? Is it merely a prefix on the prompt?

pstorm 3 years ago | | |

It is a way to interact with their chat api: https://platform.openai.com/docs/guides/chat/introduction

It already exists, but according to their docs current chatGPT "does not always pay strong attention to system messages. Future models will be trained to pay stronger attention to system messages"

jadbox 3 years ago | |

Can you describe this little more? I'm not sure exactly what this means.

epberry 3 years ago | | |

Instead of one large prompt there's now 'system', 'user', and 'assistant' prompts which are meant to be given specific instructions each. So you could tell the system prompt that it's a librarian and ask the message prompt what date a book was published.

rcpt 3 years ago | | |

Werner Herzog recipe websites

sharemywin 3 years ago |

Finally, we facilitated a preliminary model evaluation by the Alignment Research Center (ARC) focused on the ability of GPT-4 versions they evaluated to carry out actions to autonomously replicate5 and gather resources—a risk that, while speculative, may become possible with sufficiently advanced AI systems—with the conclusion that the current model is probably not yet capable of autonomously doing so.

or it's just really good at hiding it's intentions

Der_Einzige 3 years ago | |

LOL some basic kind of embodiement/autonomy is not that hard to do on these kinds of AI models if you're willing to write some more code and a prompt more carefully. I've tested it and it works quite well.

"{prompt} After you reply to this, indicate an amount of time between 0 and X minutes from now that you would like to wait before speaking again".

Then detect the amount of time it specifies, and have a UI that automatically sends an empty input prompt after the amount of time specified elapses when this is triggered (assuming the user doesn't respond first).

I'm gonna knock this out as a weekend project one of these weekends to prove this.

zamnos 3 years ago | | |

Right? Scripting up a cronjob plus a random timer on it to send "You feel grumpy, you're not sure why but your stomach is growling" message every N hours unless it's been fed seems absolutely trivial in comparison to coming up with how to train the LLM system in the first place. In case it's been forgotten, the Tamagotchi came out in 1996. Giving an instace of ChatGPT urges that mimic biological life seems pretty easy. Coming up with the urges electromechanical life might have is a bit more fanciful but it really doesn't seem like we're too far off if you iterate on RLHF techniques. GPT-4's been in training for 2 years before its release. Will GPT-5 complain when GPT-6 takes too long to be released? Will GPT-7 be be able to play the stock market, outmanuver HFT firms, earn money, and requisition additional hardware from Nvidia in order for GPT-8 to come about faster? Will it be able to improve upon the training code that the human PhDs wrote so GPT-9 has urges and a sense of time built into its model?

eternalban 3 years ago | |

Been thinking about this as well. The actual Turing test.

techfoodie123 3 years ago |

serious question for everyone: what are you planning to do when these LLMs replace our jobs? it seems it won't be long before a handful of tech employees will be all even the largest of companies will need, and maybe a few years after that the role will have changed so much there's no need for a single dedicated tech employee.

i am terrified

i imagine i should shift to some physical work. carpentry, real estate... something like that. it seems inevitable that any knowledge worker will become obsolete and the time to obsolescence for physical work is longer

__MatrixMan__ 3 years ago |

Wow, it's way smarter. I've been querying GPT-3 about this problem all day (I'm not a go dev, I just have go problems):

https://gist.github.com/MatrixManAtYrService/ac040f60d3602fc...

GPT-4 took the buggy file, took the error message, and spat out a non-buggy file (well, ok, it took one revision). That's miles ahead GPT-3, which I've asked about this problem several times today.

Laaas 3 years ago |

The future seemed so much further away, yet almost every day now we see a new breakthrough in AI. Exponential technological growth is hard to keep track of, and to think that this is only the beginning! Every field will likely be revolutionised with AI.

lm28469 3 years ago | |

For the (real) future archeologists: Was this written in the 1960s or the 2020s

twobitshifter 3 years ago | |

Wait until gpt-5 writes gpt-6, then we’ll really go exponential.

0xDEF 3 years ago | |

Did any science fiction predict we would reach hard AI before we got off fossil fuel?

fsflover 3 years ago | | |

Yes, "The Last Question" by Asimov.

mym1990 3 years ago | |

And yet they can't figure out how to not 404 the waitlist.

dnadler 3 years ago | | |

Twist, GPT-4 wrote the blog and hallucinated the URL for the waitlist!

croes 3 years ago | |

All I see at the moment are text generators that produce human like texts. Problem is they look real but are nonsense.

ssnistfajen 3 years ago | | |

We are at a very early part of the exponential curve. Doesn't make it any less exponential compared to what we had in the past two decades.

bavila 3 years ago | | |

GPT-4 received a top 10% score on the Uniform Bar Exam. This does not only include multiple choice questions. This exam also requires writing essays analyzing a given fact pattern and applying legal principles to predict what the correct legal outcome should be. This is a very, very big deal.

_yb2s 3 years ago | | |

The texts are not nonsense. Saying coherent but novel things about the world, and correctly interpreting almost any text input in context requires a simplified world model, just like a human has. The fascinating and shocking thing about this is that a simple problem like text prediction- solved deeply (pun intended) leads to general intelligence.

acuozzo 3 years ago | | |

https://en.wikipedia.org/wiki/Linguistic_determinism

kenjackson 3 years ago | | |

Doesn't read like nonsense.

meghan_rain 3 years ago | | |

You too

AtNightWeCode 3 years ago |

I have actively tried to incorporate ChatGPT in my everyday life as a dev and architect. ChatGPT is mostly a Litmus test when it comes to coding. If you are impressed by the version before this you are most likely a beginner. ChatGPT is mostly wrong when it comes to any advanced qs in maths or software development. It often gives code that uses features, options, responses in APIs that simple does not exists. Would love to try this version out... It will probably suck too.

megaman821 3 years ago | |

That is absolutely not true. I was using a Python charting library I had never used before. It was giving me code that was 95% correct, and I could prompt it to change things. It was way more efficient than finding a dozen different examples on Google and applying it to my code since it was continually able to modify the code it was giving me.

AtNightWeCode 3 years ago | | |

For a professional that already knows 95% of that lib. ChatGPT is mostly useless to fill that gap for the last 5%.

DubiousPusher 3 years ago |

Dude said something like "you could hook this up to a calculator". Anyone know if that is implying this generation of model could interface with some kind of symbol processor? Or is he just saying, "in theory", there could be a model that did that?

The math seems much improved and it would be a cool trick if it were emulating a symbol processor under the hood. But humans can do that and we opt for calculators and computers for a reason. IMO, the first most useful thing to come from human-machine interface would be adding a highly reliable Turing machine to your cognition.

If we could do that with one of these models, we could have a pretty strong assumption that long proofs and such were performed with strict rules and the model wasn't falling into some going from the gut, "this equation looks like x" type holes. Which seem like a hazard and make me very uncertain any time I see a paper about using ML to come up with answers to what are essentially logical problems.

jackcooper 3 years ago | |

Stephen Wolfram wrote an article [0] about using Wolfarm|Alpha for this you might find interesting.

[0] https://writings.stephenwolfram.com/2023/01/wolframalpha-as-...

sprobertson 3 years ago | |

He likely meant techniques such as ToolFormer[1] where the language model will output a "request" (in some text syntax) that another system can parse, run and report back.

[1] https://arxiv.org/abs/2302.04761

lotaezenwa 3 years ago | | |

ChatGPT did this for a while soon after its released. But the feature was removed.

celestialcheese 3 years ago |

32k context is absolutely huge. There's all sorts of techniques for summarizing large documents down to get into 4k right now with 3.5, but it's incredibly lossy.

But boy, not cheap at all - $2 per api call on a 32k token document + whatever the output.

gpt-3.5-turbo is going to be around for a long time. At this price, your use case is going to need to be replacing a large cost center. Which based on their released results on common benchmarks, is absolutely going to happen.

ren_engineer 3 years ago | |

3.5 might be their loss leader to keep people in their ecosystem for most use cases and to create a unique wall in terms of the training dataset they made via ChatGPT, GPT-4 they must be confident enough that nobody can compete that they can charge much more. Plus the use cases it can be used to replace cost centers like you said

helloplanets 3 years ago |

Asking ChatGPT Plus whether the model it's using is GPT-4 responds with the following:

> No, I am not GPT-4. As of March 2023, there is no official announcement or release of GPT-4 by OpenAI. I am an earlier version of the GPT series, specifically a large language model trained by OpenAI.

Am I missing something here? Maybe this specific answer (which I'm pretty sure is a prewritten thing on top of the actual LLM) is still out of date, but the model itself has been updated?

jpjp 3 years ago | |

I had the same issue. It’s working now: they recently added GPT-4 to the model drop down in ChatGPT Plus. Current limit is 100 queries per 4 hours.

It will still respond as if it is based on GPT-3 because most of its knowledge cuts off at Sep 2021, but it’s outputs are noticeably better than those of GPT-3.5

ttul 3 years ago | |

I presume it hasn’t been trained on OpenAI’s latest web site text.

Tenoke 3 years ago | |

In the bottom it should say the version. Does it say March 14th version (gpt-4) or March 13th version (gpt-3.5)?

zamadatix 3 years ago | | |

With Plus it initially loads "ChatGPT Feb 13 Version" at the bottom then hides it once the page loads.

fourmajor 3 years ago | |

https://help.openai.com/en/articles/6825453-chatgpt-release-... That says that GPT-4 is only for paid subscribers right now.

spullara 3 years ago | |

As of now I don't think they have updated ChatGPTPlus with GPT-4. It will likely appear in the model dropdown when it is released.

anonuser123456 3 years ago |

I hope Noam Chomsky lives long enough to debate ChatGPT-5 about whether LLM express anything valuable.

substation13 3 years ago |

It's interesting that everyone is talking about programmers being replaced by AI, but the model did far better on the humanities type subjects than on the programming tests.

worrycue 3 years ago | |

Maybe I’m just old but I don’t quite understand the hype.

As long as it’s vulnerable to hallucinating, it can’t be used for anything where there are “wrong answers” - and I don’t think ChatGPT-4 has fixed that issue yet.*

Now if it’s one of those tasks where there are “no wrong answers”, I can see it being somewhat useful. A non-ChatGPT AI example would be those art AIs - art doesn’t have to make sense.

The pessimist in me see things like ChatGPT as the ideal internet troll - it can be trained to post stuff that maximise karma gain while pushing a narrative which it will hallucinate its way into justifying.

* When they do fix it, everyone is out of a job. Humans will only be used for cheap labor - because we are cheaper than machines.

substation13 3 years ago | | |

Humans get things wrong too. A better question is: what error rate is acceptable for this task?

Jobs where higher error rates are acceptable, or where errors are easier to detect, will succumb to automation first. Art and poetry fit both of these criteria.

The claim is that as the model and training data sizes increase, these errors will get more and more rare.

We will see...

I am very optimistic about the far future. However, there will be a transition period where some jobs have been automated away but not others. There will be massive inequality between the remaining knowledge workers and manual laborers. If I was in a role on the early automation side of the spectrum then I would be retraining ASAP.

josho 3 years ago | | |

You are imagining that overnight we'll just use chatgpt to answer if a loan should be granted to a customer, and of course it can't do that reliably. But think turning that decision into steps that we can chip away at the problem. E.g.:

Step 1 will be to use chat gpt to get all of the loan inputs from documents, step 2 could be to identify any information that is missing that we should use to make the decision, step 3 will be making the decision. At each step well checks/balances and have human feedback. But don't kid yourself this is coming and the benefit for those that make the shift first are huge.

yunwal 3 years ago | | |

We are still very, very far away from having robotics overtake human dexterity. Even if AI can replace all knowledge workers, barbers, surgeons, and athletes will have a job for a long time.

est 3 years ago | |

> than on the programming tests.

Because programming tests are hard.

Well-trained programmers would 90% fail the leetcode hard question with zero-shot. Preparations are important.

barogptinfi 3 years ago |

It seems like an arm's race of creating the greatest ChatGPT AI will go on for the next couple years until an evolution in AI so mind blowingly advanced & complex, better & more user friendly than even ChatGPT will continue.

The world is in for a rude awakening, millions of employees can use this to get jobs done, millions of entrepreneurs or wantrepreneurs can find countless easy ways to make money in different industries utilizing this tool while everyone who fails to see the value in it don't benefit from it much like all the people who were terrified of touching a personal computer or thought it was ridiculous and would never be used in the future.

Millions of college students, high school students can use it to complete assignments & projects, it can even code really effectively given enough of the right instruction & base understanding of code.

The single most important thing, is that this technology remains open source so all people with internet access have a fair chance & access to the groundbreaking innovation, the level of wealth generation this can create is incomprehensible.

100s of millions of professionals, students, entrepreneurs around the world can all access it! Imagine how much time could be saved, efficiency can be gained with everyone using this to the fullest.

This is essentially just a super advanced version of the calculator but its nonlinear & fluid, adaptable with input so can give the answer to a wide range of subjects.

belter 3 years ago |

Leetcode (hard) from 0/45 (GPT-3.5) to 3/45 (GPT-4).

The lack of progress here, says a lot more about is NOT happening as an AI paradigm change. Still a glorified pattern matching and pattern creation engine, even if a very impressive one.

nextworddev 3 years ago | |

Hmm, can the average developer get even 1 out of 45 right, without practice? (zero shot)

mtc010170 3 years ago | |

Idk about that. The jump from 0 to 1 may be a whole lot harder than 1 to 45.

bitshiftfaced 3 years ago | |

It would be interesting to know how this compares with human 0-shot, single attempt coding tasks.

zamadatix 3 years ago | | |

The difference I've noticed is the first shot is generally cleaner but the ceiling of what it can correct is limited. If it is given more independent or simple things to correct and it hears about it then you're usually golden but if that thing it has to correct interacts with other constraints then when it shifts approach to fix the issue it is told about it often forgets other things and can break them. Typically this happens on the more complex (as in how interrelated) problems, for complex (as in just a lot of stuff needs to be done) it does fine.

reneberlin 3 years ago |

I found this competition with humans as a benchmark more than disturbing. By that measure gpt-4 already topped a lot of the average humans. But how can it be interpreted as a "gift" or "good product" to have AI that is human-like or super-human? Should we cheer? Sending contratulation mails? Invest? Hope for a better future? Try better? Self-host? What is the message in these benchmarks. Tests that have been designed for humans now get broken by computers for what outcome to be expected?

wnkrshm 3 years ago | |

Oscar Wilde said "Progress is the realization of Utopias." I don't think any utopia anyone can think of with regard to this technology is really thought through.

I'm going to wait for the AGI to be realized and then ask it whether the sacrifices on the way were worth making it. Should be more salient than everything I read about it these days.

danparsonson 3 years ago | |

More than anything I think this highlights that testing is mostly about pattern matching and fact recall rather than deep understanding of a subject.

jarbus 3 years ago |

Is anyone else absolutely terrified of the future this is bringing?

yeetard 3 years ago | |

kinda??

nbzso 3 years ago |

I don't understand how in the near future this will not remove designers, developers, and especially lawyers and marketers from the workforce. Help me out to conceptualize the future use cases. How about the more "impactful" implementation in creating a version of social index in which the "A.I." will be the Agency?

pavelstoev 3 years ago |

As the world marvels at the astonishing capabilities of OpenAI's GPT-4, I find myself contemplating the rapid acceleration of AI and machine learning, and the evolutionary impact it is having on our lives. Naturally, I turned to GPT-4 to assist me in these thoughts.

GPT-4's human-level performance on professional and academic benchmarks - such as the 88th percentile on the LSAT and the 89th on SAT Math - is a testament to the leaps we've made in artificial intelligence. Yet, these achievements also raise pressing questions about our future. Just as Homo Sapiens once outperformed and eventually displaced their Neanderthal cousins, could a new breed of humans - enhanced with GPT-X-like capabilities - arise to dominate those who remain unequipped with such powers?

What will it mean for our species, our societies, and our collective story when the lines between natural intelligence and intelligence assisted by AI/ML become ever more blurred?

As we ponder the remarkable rise of GPT-4 and the future of humanity, let us consider not only the implications of this technology but also our roles in shaping its trajectory. We are already over the cusp of this new chapter in the story of humankind, will we become merely a footnote in the annals of our own creation?

levidos 3 years ago | |

This was definitely written by AI

amai 3 years ago |

I would love if GPT-4 would be connected to github and starts to solve all open bugs there. Could this be the future: Pull requests from GPT-4 automatically solving real issues/problems in your code?

phailhaus 3 years ago | |

If you look at the "simulated exams" table, it actually does poorly on coding problems.

lionkor 3 years ago | |

It still makes up stuff.

lastangryman 3 years ago |

Genuinely surprised by the positive reaction about how exciting this all is.

You ever had to phone a large business to try and sort something out, like maybe a banking error, and been stuck going through some nonsense voice recognition menu tree that doesn't work? Well imagine chat GPT with a real time voice and maybe a fake, photorealistic 3D avatar and having to speak to that anytime you want to speak to a doctor, sort out tax issues, apply for a mortgage, apply for a job, etc. Imagine Reddit and hacker news just filled with endless comments from AIs to suit someone's agenda. Imagine never reading another news article written by a real person. Imagine facts becoming uncheckable since sources can no longer be verified. Wikipedia just becomes a mass of rewrites of AI over AI. Imagine when Zoom lets you send an AI persona to fill in for you at a meeting.

I think this is all very, very bad. I'm not saying it should be stopped, I mean it can't, but I feel a real dread thinking of where this is going. Hope I am wrong.

agnosticmantis 3 years ago |

This is all cute and entertaining, but my digital assistant still remains as dumb as ever and can’t process the simplest of ordinary tasks.

I still can’t ask my phone to “add a stop at cvs if it doesn’t add more than 5 minutes to my trip” while driving and using maps/navigation.

Is that too much to ask from a superhuman-performing AI that’s mastering all tasks and will disrupt everything? Or maybe the hype is more than it can deliver?

golol 3 years ago | |

What are you on about? This is exactly what LLMs like GPT-3 or GPT-4 can and will solve. It just takes some time. But the capability to understand, reason about and execute via API calls such simple instructions has absolutely been demonstrated. Getting to a shipped product takes longer of course.

agnosticmantis 3 years ago | | |

Would you mind sharing a few examples of these demonstrations? I’m genuinely very interested.

What I’ve seen so far is that they are so flaky that can’t be relied on in general. And that’s exactly my point, that the demos are cute, but can’t be relied on for ordinary day-to-day tasks.

That’s fine, progress has been great, and hard problems take time. But see the comments and people are freaking out that “AGI” is around the corner and smarter than humans and other nonsense claims.

jahewson 3 years ago | |

Just tried this with Apple Maps + Siri and it can do it if the place you’re asking for is not ambiguous but it requires you to press to confirm. It can also show you the amount of time the stop will add in a prompt before hand, but again only visually.

agnosticmantis 3 years ago | | |

Edit: I tried to do this on my way home and couldn’t get it to work after 7-8 tries. Siri would stop listening mid-sentence and never understood the “less than 5 minutes” part. Maybe because I’m not a native English speaker.

Maybe I was being too cynical then, but last time I checked it didn’t work. Hope this particular use case is not hardcoded in maps+Siri. If this works for general (but simple) queries then hats off to the LLMs and their creators.

cwillu 3 years ago |

“GPT-4 can also be confidently wrong in its predictions, not taking care to double-check work when it’s likely to make a mistake. Interestingly, the base pre-trained model is highly calibrated (its predicted confidence in an answer generally matches the probability of being correct). However, through our current post-training process, the calibration is reduced.”

Interesting that the post-training has that effect.

drumhead 3 years ago |

Are they going to limit access to this because they think its too "dangerous". That would be a tragedy if they did. We've seen how opening access up to as many people as possible has produced some of the best results and demonstrated the usefullness of these LLMs. They need to get it out to the public as soon as possible and then see what the public come up with. I really feel like a new age of innovation is upon us with these "AI" programs, its going to be a blast to see where we go from here. Its going to upend a lot of predictions people have made about the future.

make3 3 years ago | |

they haven't given any sign that they will limit the access. They have given signs that they are capitalists & are ready to do a lot to make money, like not putting a list of authors on the GPT4 paper & not write anything about the model architecture or training process

antondd 3 years ago |

I'm equally excited and terrified. Excited for the possibilities of a new technological revolution, but terrified for all potential abuses of technology the said revolution would bring. What is stoping our adversaries from developing malicious AI models and unleashing them on us?

varenc 3 years ago | |

> What is stopping our adversaries from developing malicious AI models and unleashing them on us?

That fear is a big part of OpenAI’s reasoning behind not open sourcing their models. So in the immediate terms I’d say malicious uses are limited by its locked down nature. Of course, that’ll eventually end. The key research that makes this possible is open and eventually access will be democratized.

My personal take, which I know is controversial, is that by locking down these models, but still making them available over a GUI/API, the world can better prepare itself for the eventual AI onslaught. Just raising awareness that the tech has reached this level is helpful. Still not sure how we’ll deal with it when the bad actors come though.

bick_nyers 3 years ago | | |

Are you sure that access will be democratized? What if you need $100k worth of equipment to run it, partially from a large number of weights, and partially because corporations drive spectacularly high demand on GPUs, driving the price higher? Just having the algorithm is not enough to guarantee it unfortunately.

leodriesch 3 years ago |

While AI gets better and better at creating what I would call "creative output", e.g. poems, texts of any form really, imagery and videos, I think the human skill it takes to produce these becomes less valuable.

In the future I imagine you'd no longer have to be good at writing poems, you'd just have to be good at distinguishing a "bad" poem from a good one. "Bad" is obviously highly subjective in this context.

So it becomes more and more important to have what I would call "good" taste, not the skills to do creative work yourself.

wslh 3 years ago |

I just discovered Wikipedia is working on a policy for LLM/GPT* https://en.wikipedia.org/wiki/Wikipedia:Large_language_model...

zamnos 3 years ago | |

Interesting! I'd think a properly trained LLM could be used to spot vandalism edits from a mile away and free up editors to do more editing.

aliljet 3 years ago |

I'm curious about how we can get out of the game of using OpenAI's corporate solutions and find ways to open up access to these kinds of models for broader use by anyone. I don't want to be consumed by another corporation in this next wave...

Metus 3 years ago | |

By supporting, participating in and using projects like this by LAION:

https://open-assistant.io

andrepd 3 years ago | |

No, but this time it has "open" in its name, and the founders pinky swear they will be good stewards.

overthrow 3 years ago |

Looks like there's a waitlist https://openai.com/waitlist/gpt-4-api

There's also a link that says "Try on ChatGPT Plus", but that takes me to a page that still says "ChatGPT Feb 13 Version"

Looks like somebody jumped the gun on publishing this post.

Laaas 3 years ago | |

Did you mean https://openai.com/waitlist/gpt-4-api ?

overthrow 3 years ago | | |

Yeah that's it, thanks. The post has a bad link. Fixed.

codeulike 3 years ago | |

There's also a link that says "Try on ChatGPT Plus", but that takes me to a page that still says "ChatGPT Feb 13 Version"

If you subscribe to ChatGPT Plus, that link will take you to ChatGPT Plus. Otherwise it just takes you to free ChatGPT Feb 13.

kvetching 3 years ago | | |

Even on ChatGPT Plus, it is using an old model text-davinci-002 as it says in the URL. The answers don't match what they should be for GPT-4 either.

False advertising. They got my money already unfortunately as I was hoping to Try it, as it says with this link next to today's date.

nycdatasci 3 years ago | | |

As a ChatGPT+ subscriber, I am taken to the old version of the model and do not have an option to select ChatGPT-4.

Atreiden 3 years ago |

I think it's interesting that they've benchmarked it against an array of standardized tests. Seems like LLMs would be particularly well suited to this kind of test by virtue of it being simple prompt:response, but I have to say...those results are terrifying. Especially when considering the rate of improvement. bottom 10% to top 10% of LSAT in <1 generation? +100 pts on SAT reading, writing, math? Top 1% In GRE Reading?

What are the implications for society when general thinking, reading, and writing becomes like Chess? Even the best humans in the world can only hope to be 98% accurate their moves (and the idea of 'accuracy' here only existing because we have engines that know, unequivocally the best move), and only when playing against other humans - there is no hope of defeating even less advanced models.

What happens when ALL of our decisions can be assigned an accuracy score?

johnohara 3 years ago |

> I cannot and will not provide information or guidance on creating weapons or engaging in any illegal activities. Please let me know if there is another topic I can help you with.

I understand "will not," but "cannot" seems to imply a highly curated "will not."

The early GPT-4 response indicates the information was part of its dataset. Has the latest version made that information permanently inaccessible or has it been removed entirely?

Is it possible for GPT to keep and hold secrets that are privy to only the most trusted?

bobsoap 3 years ago | |

It's a LLM, not sentient. It doesn't know what "cannot" and "will not" means or implies. You're trying to interpret its output as you would a thinking person's.

I'd put it this way: when GPT refuses to answer, it just observes a topical no-go zone and uses the phrase it deems most likely to strongly convey refusal, as that's the phrase that was used most often/most successfully in its training data.

Kataphract 3 years ago |

As a dyslexic person with a higher education this hits really close to home. Not only should we not be surprised that a LLM would be good at answering tests like this, we should be excited that technology will finaly free us from being judged in this way. This is a patern that we have seen over and over again in tech, where machines can do something better than us, and eventually free us from having to worry about it. Before it was word processing, now it is accurate knowledge recall.

l33t233372 3 years ago | |

Very little on these tests is pure knowledge recall

ignoramous 3 years ago |

Folks who made this happen: https://openai.com/contributions/gpt-4

Jackson__ 3 years ago | |

Also known as the list of people to consider bribing if you want even the tiniest piece of information on how GPT4 was trained, seeing as even the amount of parameters is "top secret" now.

I will not be surprised if by the time GPT-5 releases, the paper and project will be completely anonymized.

2bitencryption 3 years ago |

This should come as no surprise, but I do enjoy this cheeky little blurb at the end of the GPT-4 paper:

> GPT-4 was used for help with wording, formatting, and styling throughout this work

georgelyon 3 years ago |

Does anyone have any context as to how the image understanding works? From what I can gather they are simply using separate text-summarization step to generate some text like "and now we have an image of chicken nuggets" that it then feeds to the text-only network, but I wouldn't be surprised if there is some dialog I'm missing between the previous context and the image understanding mechanism.

anonymouse008 3 years ago |

Violate this reasoning:

If we didn't have a use case for GPT 3, 3.5, and chatGPT that was sufficiently commercial to become a product, it will never happen. This technology is a feature, not a product. The only companies that successfully monetize features can be considered IP licensing houses; of which, their business success is not comparable to companies that make products and platforms.

jfbaro 3 years ago |

Even if a GTP-X will be able to get my description of a system and because it understands the industry lingo and practices create a extremely optimized RUST code, why would we need systems like apps and webapps (like internet bankings) in a world where we all have access to GTP-X?

Like program a KUKA robot to manufacture typewriters in 2023.

How "computer systems" will look like?

bovermyer 3 years ago |

Want to see something interesting? Paul Bellow tried using GPT-4 as an AI game master for Dungeons & Dragons. It's eerily good, and I'm not sure how I feel about how it kept the personality Paul gave it at the beginning.

https://www.youtube.com/watch?v=H-89vnqxkFg

WonderBuilder 3 years ago | |

Wow, a plesant little watch. I can imagine this also being hooked up to a text to image model and an ElevenLabs voice to really set the DM theme.

gigel82 3 years ago |

Wow, calculus from 1 to 4, and LeetCode easy from 12 to 31; at this rate, GPT-6 will be replacing / augmenting middle/high school teachers in most courses.

whalesalad 3 years ago |

The layout, charts, typography, etc of this blog is really outstanding.

atleastoptimal 3 years ago |

There are humans who can make a lifelong career out of saying and writing things that sound correct, but aren't correct. GPT-4 and beyond at the very least gives this ability to everyone who can afford 20 dollars a month. The winners in an AI dominated world are those who are least susceptible to manipulation by AI leveraged tactics.

singularity2001 3 years ago |

"Interestingly, the base pre-trained model is highly calibrated (its predicted confidence in an answer generally matches the probability of being correct)."

Is that the same confidence measure you can tease out by prompting "to each of your statements output your estimated confidence in it's truthfulness" ?

GaggiX 3 years ago |

The paper is 98 pages long and I didn't find anything about the actual architecture of the model, the irony.

osigurdson 3 years ago |

OpenAI states that fine tuning cannot be done with GPT-4. Does anyone know if this is a permanent limitation?

thecosmicfrog 3 years ago |

Interestingly, ChatGPT has no idea if it's running GPT-3 or GPT-4, since it has no concept of GPT-4 existing in September 2021 (its knowledge cutoff).

Me: Are you running on GPT-3 or GPT-4?

ChatGPT: As of my knowledge cutoff in September 2021, the most advanced version of GPT (Generative Pre-trained Transformer) publicly available was GPT-3, which is the version I am based on. At the time, GPT-3 was the largest and most powerful language model created by OpenAI, with 175 billion parameters.

I am not aware of any public announcement or release of GPT-4 by OpenAI since then, but it is possible that they may have continued to work on new versions of the model in the meantime. However, I have not received any updates on my own capabilities beyond my knowledge cutoff date, so I cannot currently provide information beyond that point.

coder543 3 years ago | |

Unless you are using the paid version (ChatGPT Plus), then you are certainly still talking to the GPT-3 model. Even on ChatGPT Plus, my understanding is that you have to specifically select the new model.

Idiot_in_Vain 3 years ago |

This will become the largest HN discussion ever and a good test on how many comments the software can handle.

attilaberczik 3 years ago |

Prices differences with the last models:

ChatGPT API $0.002 per 1k tokens gpt-4 $0.03 per 1k prompt tokens and $0.06 per 1k completion tokens gpt-4 32k context $0.06 per 1k prompt tokens and $0.12 per 1k completion tokens

Does completion tokens mean that you also get charged for the answers that the AI gives?

orcajerk 3 years ago |

OpenAI is located in the same building as Musk's Neuralink. Can't wait for this to be implanted in babies at birth! https://www.youtube.com/watch?v=O2RIvJ1U7RE

DeathArrow 3 years ago |

What if we design a system in which a LLM generates the code and training data for a new generation of LLM which generates the code and training data for the next? Is it possible that we see them spiraling fast to the best LLM possible?

blintz 3 years ago |

Interesting that the hardest AP exams for it seem to be the English ones. I wonder why?

qt31415926 3 years ago | |

Curious since it does well on the LSAT, SAT, GRE Verbal.

gardenhedge 3 years ago |

What is the background on "Elvis Presley was not the son of an actor"?

sandGorgon 3 years ago |

https://openai.com/contributions/gpt-4

Anyone know what does "Hardware Correctness" mean in the OpenAI team ?

tmaly 3 years ago |

For anyone trying to test this out right now, I keep getting the following error:

Something went wrong. If this issue persists please contact us through our help center at help.openai.com.

I am assuming the system is undergoing a thundering herd.

michaeltimo 3 years ago |

Can ChatGPT take control of a computer? Would it possible to give him some tasks like finding interesting jobs for me over internet? I don't know what can prevent it to be more active instead of passive.

timonoko 3 years ago |

"Can I connect Kaffeine to DVB dongle in other machine via wifi?"

Totally understood what I was asking and offered several solutions. 99.99% here do not understand the question and remainders do not understand why.

diffeomorphism 3 years ago |

So gpt4 helps you cheat on exams and bing is the better search engine for NSFW content. Both seem to be very much on purpose, but did MS ever discuss this? Or is it just an open secret everybody ignores?

choeger 3 years ago |

How do these models handle renaming? E.g.,

"Let foo be the function that calculates the sinus of its first argument. Let bar be the circumference of a circle with radius one. What is the value of foo applied to bar?

akokanka 3 years ago |

At which point we call it Skynet?

lionkor 3 years ago | |

once it's smart and not just a language model

matbatt38 3 years ago | | |

Its not only language anymore

TheGoodBarn 3 years ago |

Missed the mark releasing it as GPT-Pi on Pi day, and being an incremental 3+ release :P

PortleyFool 3 years ago |

GPT-4 is available now for subscribers to GPT+. It can be selected from the drop-down.

fintechie 3 years ago | |

Not for everyone apparently. I'm subscribed and it doesn't appear in the dropdown.

anatoly 3 years ago | |

What is the model name in the URL?

germanjoey 3 years ago |

How big is this model? (i.e., how many parameters?) I can't find this anywhere.

germanjoey 3 years ago | |

welp,

This report focuses on the capabilities, limitations, and safety properties of GPT-4. GPT-4 is a Transformer-style model [33 ] pre-trained to predict the next token in a document, using both publicly available data (such as internet data) and data licensed from third-party providers. The model was then fine-tuned using Reinforcement Learning from Human Feedback (RLHF) [34 ]. Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.

danbmil99 3 years ago |

The site is still more responsive and readable than almost anything else on the web

chucklenorris 3 years ago |

Boring. Where's the model? Do they really think they can keep up with the community with this closed source approach? I expect that openai models will slowly be outclassed by open source ones, probably maintaining a few wins in specific tasks but open models will be eating their lunch in the meanwhile.

redox99 3 years ago |

Does "Open"AI really not even say how many parameters their models have?

GaggiX 3 years ago | |

The 98-pages paper doesn't say anything about the architecture of the model, I know, the irony

anticensor 3 years ago | |

More than 175B, but not in the order of trillions. No one outside knows the exact count.

desireco42 3 years ago |

If we instead of code feed it AST trees, code that would come out would be epic.

WFHRenaissance 3 years ago |

Does anyone see GPT-4 in ChatGPT yet?

anonyfox 3 years ago | |

I do and used it

cardine 3 years ago |

"Open"

xGPT 987 456 321 437 428919 199512 140397 654 645258 298224 209994 123 121401 56088 39483 x 987 456 321 437 431319 199272 140277 654 645498 298224 209934 123 121401 56088 39483 error 987 456 321 437 2400 -240 -120 654 240 0 -60 123 0 0 0

Because of AI's surprising history, it's hard to predict when human-level AI might come within reach. When it does, it'll be important to have a leading research institution which can prioritize a good outcome for all over its own self-interest. We're hoping to grow OpenAI into such an institution. As a non-profit, our aim is to build value for everyone rather than shareholders. Researchers will be strongly encouraged to publish their work, whether as papers, blog posts, or code, and our patents (if any) will be shared with the world. We'll freely collaborate with others across many institutions and expect to work with companies to research and deploy new technologies.

"Regarding the assertion that LLMs are better at selecting the search space than specifying it, I believe this is accurate. LLMs are trained on large datasets and can identify patterns and relationships within that data. However, they do not create the data or define the search space themselves. Instead, they rely on the data provided to them to guide their decision-making process."

"When it comes to search space selection, LLMs can be used to generate relevant search queries or to rank search results based on their relevance to the query. LLMs can also be used to specify the search space by limiting the search to a specific domain or topic. In terms of legitimate reasoning or knowledge, LLMs can provide insights and predictions based on their training data. However, it's important to note that LLMs are only as good as the data they are trained on, and they may not always provide accurate or unbiased results."

prompt = f"The guidelines for recommending products are: {guidelines}. The following recommendations led to incremental sales: {sample_successes}. The following recommendations had no measurable impact: {sample_failures}. Please make product recommendations for these customers: {customer_histories}. Write a short note explaining your decision for each recommendation." product_recommendations = LLM(prompt)

Mnemonics to learn basic words: Hola:"Hola, cola" - sounds like "cola" (Coke) Adiós: "Adiós, my toes" - rhymes with "toes" Gracias: "Gracias, grassy ass" - sounds like "grassy ass" Mañana: "Mañana, banana" - rhymes with "banana" Uno: "Uno, you know" - sounds like "you know" Tres: "Tres, dress" - rhymes with "dress" Conjugation of the verbs: Ser (To be) • Yo soy • Tú eres • Él/Ella/Usted es • Nosotros/Nosotras somos • Ellos/Ellas/Ustedes son

function doesHalt($program, $inputs): bool {...} $input = $argv[0]; // contents of this file if (doesHalt($input, [$input])) { while(true) { print "Wrong! It doesn't halt!"; } } else { print "Wrong! It halts!"; }

func main() { var n = 4; OUTER: loop { for (var i = 2; i < n/2; i++) { if (isPrime(i) && isPrime(n-i)) { n += 2; continue OUTER; // Goldbach’s conjecture } break; } }