New models and developer products

New models and developer products(openai.com)

1146 points by kevin_hu 2 years ago | 545 comments

dang 2 years ago |

Related ongoing threads:

GPTs: Custom versions of ChatGPT - https://news.ycombinator.com/item?id=38166431

OpenAI releases Whisper v3, new generation open source ASR model - https://news.ycombinator.com/item?id=38166965

OpenAI DevDay, Opening Keynote Livestream [video] - https://news.ycombinator.com/item?id=38165090

minimaxir 2 years ago |

Most of the products announced (and the price cuts) appear to be more about increasing lock-in to the OpenAI API platform, which is not surprising given increased competition in the space. The GPTs/GPT Agents and Assistants demos in particular showed that they are a black box within a black box within a black box that you can't port anywhere else.

I'm mixed on the presentation and will need to read the fine print on the API docs on all of these things, which have been updated just now: https://platform.openai.com/docs/api-reference

The pricing page has now updated as well: https://openai.com/pricing

Notably, the DALL-E 3 API is $0.04 per image which is an order of magnitude above everyone else in the space.

EDIT: One interesting observation with the new OpenAI pricing structure not mentioned during the keynote: finetuned ChatGPT 3.5 is now 3x of the cost of the base ChatGPT 3.5, down from 8x the cost. That makes finetuning a more compelling option.

faeriechangling 2 years ago | |

It's a good strategy. For me, avoiding the moat means either a big drop in quality and just ending up in somebody elses moat, or a big drop in quality and a lot more money spent. I've looked into it and maybe the most practical end-to-end system for owning my own LLM is to run a couple of 3090s on a consumer motherboard at substantial running cost to keep them up 24/7 and that's not powerful enough to cut it and rather expensive simultaniously. For a bit more expense, you can get more quality and lower running costs and much slower processing from buying a 128gb/192gb apple silicon setup and that's much much much slower than the "Turbo" services that OpenAI offers.

I think the biggest thing pushing me away from OpenAI was they were subsidizing the chat experience much more than the API and this seems to reconcile that quite a bit. Quite simply OpenAI is sweetening the pot here too much for me to really ignore, this is a massively subsdizised service. I honestly don't feel the switching costs in the future will outweigh the benefits I'm getting now.

swatcoder 2 years ago | | |

Everybody's got their own calculus about how competitive their space is and what this tech can do for them, but some might be best off dancing around lock-in by being careful about what they use from OpenAI and how tightly they integrate with it.

This is very early in the maturity cycle for this tech. The options that will be available for private inference and fine tuning, for cloud-gpu/timeshare inference and fine tuning, and for competing hosted solutions are going to vastly different as months go by. What looks like squeezing value out of OpenAI today might look a lot like technical debt and frustrating lock-in a year from now.

That's what they're hoping you chase after, and if your product is defined by this technology, maybe that's what you have to do. But if you're just thinking about feature opportunities for a more robust product, judiciousness could pay off better than rushing. For now.

jpalomaki 2 years ago | | |

ChatGPT only costs a few dollars, but I'm also "paying" for the service by contributing training data to OpenAI.

Getting access to this type of interaction data with (mostly) humans must be quite valuable asset.

muttled 2 years ago | | |

For me personally, being able to fine-tune the local LLM's at a much higher rank and training more layers is very useful for (somewhat unreliably) embedding information. AFAIK the OpenAI fine-tuning is more geared towards formatting the output.

s3p 2 years ago | | |

This isn't subsidized. OpenAI makes money on their API and ChatGPT pricing.

smcleod 2 years ago | | |

A good strategy for who? Society? Customers? The future? Or just for making money for the owners?

mirekrusin 2 years ago | | |

Obviously it is good strategy, surely created from GPT.

ebiester 2 years ago | |

I don't understand the lock-in argument here. Yes, if a competitor comes in there will be switching cost as everything is re-learned. However, from a code perspective, it is a function of the key and a relatively small API. New regulations outstanding, what is stoping someone from moving from OpenAI to Anthropic (for example) other than the cost of learning how to effectively utilize Anthropic for your use case?

OpenAI doesn't have some sort of egress feed for your database.

minimaxir 2 years ago | | |

> OpenAI doesn't have some sort of egress feed for your database.

That's what they're trying to incentivize, especically with being able to upload files for their own implementation of RAG. You're not getting the vector representation of those files back, and switching to another provider will require rebuilding and testing that infrastructure.

lolinder 2 years ago | | |

> However, from a code perspective, it is a function of the key and a relatively small API.

You're thinking of traditional apps and APIs.

In an AI application, most of the work is in prompt engineering, not wiring up the API to your app. Prompts that work well for one model will fail horribly for another. People spend months refining their prompts before they're safe to share with users, and switching platforms will require doing most of that refinement over again.

mvkel 2 years ago | |

> Assistants demos in particular showed that they are a black box within a black box within a black box that you can't port anywhere else.

I'd argue the opposite. The new "Threads" interface in the OpenAI admin section lets you see exactly how it's interpreting input/output specifically to address the black box effect.

Source: https://platform.openai.com/docs/api-reference/runs/listRunS... tells you exactly how it's stepping through the chain. Even more visibility than there used to be.

DerJacques 2 years ago | | |

I agree that some parts of the process now seem more like “open”, but there is definitely a lot more magic in the new processing. Namely, threads can have an arbitrary length, and OpenAI automatically handles context window management for you. Their API now also handles retrieval of information from raw files, so you don’t need to worry about embeddings.

Lastly, you don’t even need any sort of database to keep track of threads and messages. The API is now stateful!

I think that most of these changes are exciting and make it a lot easier for people to get started. There is no doubt in my mind though that the API is now an even bigger blackbox, and lock-in is slightly increased depending on how you integrate with it.

visarga 2 years ago | |

Mistral + 2 weeks of work from the community. Not as good, but private and free. It will trail OpenAI by 6-12 months in capabilities.

coder543 2 years ago | | |

OpenAI offering 128k context is very appealing, however.

I tried some Mistral variants with larger context windows, and had very poor results… the model would often offer either an empty completion or a nonsensical completion, even though the content fit comfortably within the context window, and I was placing a direct question either at the beginning or end, and either with or without an explanation of the task and the content. Large contexts just felt broken. There are so many ways that we are more than “two weeks” from the open source solutions matching what OpenAI offers.

And that’s to say nothing of how far behind these smaller models are in terms of accuracy or instruction following.

For now, 6-12 months behind also isn’t good enough. In the uncertain case that this stays true, then a year from now the open models could be perfectly adequate for many use cases… but it’s very hard to predict the progression of these technologies.

asaddhamani 2 years ago | |

Their products are incredible though. I’ve tried the alternatives and even Claude is not nearly as good as even ChatGPT. Claude gives an ethics lecture with every second reply, which costs me money each time and makes their product very difficult to (want to) embed.

whywhywhywhy 2 years ago | | |

Honestly the companies that completely ignore ethics are the only ones who are going to scoop up any market share outside of OpenAI.

Getting a chiding lecture every time you ask an AI to do something does absolutely nothing for the end user other than waste their time. "AI Safety" academics are memeing themselves out of the future of this tech and leaving the gate wide open for "unsafe" AI to flourish with this farcical behavior.

ChatGTP 2 years ago | | |

What are you using it for? I want to know what people actually use these things for damn it !

davidbarker 2 years ago | |

Also, DALL·E 3 "HD" is double the price at $0.08. I'm curious to play around with it once the API changes go live later today.

The docs say:

> By default, images are generated at standard quality, but when using DALL·E 3 you can set quality: "hd" for enhanced detail. Square, standard quality images are the fastest to generate.

https://platform.openai.com/docs/guides/images/usage

maranathapemje 2 years ago | | |

This is great, but Dall.E still has a long way to go before reaching midjourney standards and I'm curious to see how they can pull that off.

Terretta 2 years ago | |

> most of the products announced (and the price cuts) appear to be more about increasing lock-in to the OpenAI API platform

OpenAI is currently refusing far more enterprises than these products could "lock-in" even with 100% stickiness.

Makes it unlikely this is about lock-in or fighting churn when arguably, the best advertisement for GPT-4 is comparing its raw results to any other LLM.

If you said their goal was fomenting FOMO, I'd buy it. Curious, though, when they'll let the FOMO fulfillment rate go up by accepting revenue for servicing that demand.

vsareto 2 years ago | |

>The GPTs/GPT Agents and Assistants demos in particular showed that they are a black box within a black box within a black box that you can't port anywhere else.

This just rings hollow to me. We lost the fights for database portability, cloud portability, payments/billing portability, and other individual SaaS lock-in. I don't see why it'll be different this time around.

kortilla 2 years ago | | |

> We lost the fights for database portability, cloud portability, payments/billing portability, and other individual SaaS lock-in.

No we didn’t. There are viable on-prem alternatives or cross cloud alternatives for everything popular on the cloud.

Many companies did choose to hand their destiny over to cloud providers but lots didn’t.

activescott 2 years ago | |

I think it's more about finding places to add value than "lock in" per se. It seems they're adding value with improved developer experience and cost/performance rather than on the models themselves. Not necessarily nefarious attempts to lock in customers, but it may have the same outcome :)

stavros 2 years ago | |

What do you mean "orders of magnitude above" for DALL-E? As far as I can see, Midjourney is $0.05 per image, and that's if you don't forget you have a subscription. I've ended up paying $10 per image.

spankalee 2 years ago | |

A friend of mine is building Zep (https://www.getzep.com/), which seems to offer a lot of the Assistant + Retrieval functionality in a self-hostable and model-agnostic way. That type of project may the way around lock-in.

Vipsy 2 years ago | |

Anything open about OpenAI starts and ends with the name

openquery 2 years ago |

If I had no contact with society from the 29th of November 2022 (the day before ChatGPT was released according to Wikipedia) and came back today to see the OpenAI keynote I would have lost my mind.

The progress and usefulness of these products is absolutely incredible.

modeless 2 years ago |

Whisper V3 is released! https://github.com/openai/whisper/commit/c5d42560760a05584c1...

Looks like it's just a new checkpoint for the large model. It would be nice to have updates for the smaller models too. But it'll be easy to integrate with anything using Whisper V2. I'm excited to add it to my local voice AI (https://www.microsoft.com/store/apps/9NC624PBFGB7)

I assume ChatGPT voice has been using Whisper V3 and I've noticed that it still has the classic Whisper hallucinations ("Thank you for watching!"), so I guess it's an incremental improvement but not revolutionary.

topicseed 2 years ago |

128,000 token context, Assistants API, JSON mode, April 2023 knowledge cutoff, GPT 4 Turbo, lower pricing, custom GPTs, a good bunch of announcements all-round!

https://openai.com/pricing

Alifatisk 2 years ago | |

I thought GPT-4 had access to internet now?

jeppebemad 2 years ago | | |

The “browse with bing” feature allows it to fetch a single webpage into the context, but the new cutoff allows _everything crawled_ to be context (up to the new date, that is)

qup 2 years ago | | |

Per the announcement, the "GPTs" do, natively.

I think everyone else had been hacking it on via "functions"

saliagato 2 years ago |

You can now [1] pay from $2 to $3 million to pretrain custom gpt-n model. This has gone unnoticed but seems really neat. Provided that a start-up has enough money spend on that, it would certainly give competitive advantage.

[1] https://openai.com/form/custom-models

Edit: forgot to put the link

MagicMoonlight 2 years ago | |

Well it won’t because they’ll use the model you paid for and take your customers.

govg 2 years ago | | |

Is it the same as avoiding AWS because they will take your software and run it themselves to steal your clients?

somsak2 2 years ago | | |

How do you square this with OpenAI's assertion that they never use data from enterprise customers for their own training? Are you suggesting they're lying?

drubio 2 years ago | | |

I had multiple déjà vus, from the API golden days "build anything you want", social media telling you "build your communities with us", to app stores "we take care of the distribution"...it all works, until it doesn't...if they decide to change terms, pricing or make your product/service redundant because its 'strategic'.

It's their platform, their business, their rules.

infecto 2 years ago | | |

Do you have proof or are you just throwing baseless accusations out there?

teaearlgraycold 2 years ago | | |

No. The downside is you spent a lot of money on a model and don’t own it.

Balgair 2 years ago | |

Wow, this is directly going to affect my company in the near term. We had been trying to do it all internally but have found little success. Even at ~$3M it's going to be an attractive choice.

drubio 2 years ago | |

If your an OpenAI end customer, you'll be fine pre-training gpt-n models for your business; if you're an OpenAI middleman pre-training gpt-n models for other customers, what makes you think OpenAI won't eventually bypass you ? Lookup startups built around APIs and platforms, for every success, there's a graveyard filled due to APIs and platforms changing the rules.

jack_riminton 2 years ago |

For all the naysayers in the comments, the elephant in the room that no one quite wants to admit, is that GPT4 is still far better than everything else out there

nmfisher 2 years ago | |

I cancelled my GPT4 subscription because I found Claude more useful for code and Qwen for Chinese language tasks.

It might be better on average but I don’t think it’s better for every task.

All the others are only going to get better too.

qingcharles 2 years ago | | |

Claude is superior for me on writing summaries of large documents.

jack_riminton 2 years ago | | |

"All the others are only going to get better too."

Yes, including OpenAI, who were already miles ahead :)

unshavedyak 2 years ago | | |

Can you go into depth? I’ve used ChatGPT Pro and Phind extensively, didn’t know about Claude and code. Curious to give it a try

icelancer 2 years ago | |

Phind is the only thing that I've supplemented GPT-4 with, which is still pretty impressive.

kossTKR 2 years ago | |

Is there anything promising out there?

Is crowd sourced training still unfeasible?

I remember how fast the diffusion world moved in the first year but it seems it's stalled somewhat compared to first midjourney then Dall-e 3. Is it the same with text models?

bugglebeetle 2 years ago | | |

GPT-4 is the best general model and specifically very good at coding, if correctly promoted. Lots of open source stuff is good at various tasks (e.g. NLP stuff), but nothing is near to the same overall level of performance.

mezeek 2 years ago | |

Grok? Just kidding

Zaheer 2 years ago |

The playbook OpenAI is following is similar to AWS. Start with the primitives (Text generation, Image generation, etc / EC2, S3, RDS, etc) and build value add services on top of it (Assistants API / all other AWS services). They're miles ahead of AWS and other competitors in this regard.

gumballindie 2 years ago | |

And just like amazon they will compete with their own customers. They are miles ahead in this regard as well since they basically take everyone’s digital property and resell it.

dave1010uk 2 years ago | | |

This is essentially the "Innovate - Leverage - Commoditise" strategy, which Simon Wardley (as in Wardley Mapping) explains:

https://blog.gardeviance.org/2014/03/understanding-ecosystem...

sharemywin 2 years ago | | |

don't hate the player hate the game.

somsak2 2 years ago | |

I don't know if I'd say "miles ahead." AWS had 7 years of basically no other competition -- all of the other big clouds of today had their heads in the sand. OpenAI has a bunch of people competing already. They may not be as good on the leaderboards now, but they're certainly not having to play catch up from years of ignoring the space.

zizee 2 years ago |

In people's experience with these sorts of tools, have they assisted with maintainance of codebases? This might be directly, or indirectly via more readable, bette organized code.

The reason I ask is that these tools seem to excel in helping to write new code. In my experience I think there is an upper limit to the amount of code a single developer can maintain. Eventually you can't keep everything in your head, so maintaining it becomes more effort as you need to stop to familiarize yourself with something.

If these tools help to write more code, but do not assist with maintainance, I wonder if we're going to see masses of new code written really quickly, and then everything grinds to a halt, because no one has an intimate understanding of what was written?

gwern 2 years ago |

> We’re also launching a feature to return the log probabilities for the most likely output tokens generated by GPT-4 Turbo and GPT-3.5 Turbo in the next few weeks, which will be useful for building features such as autocomplete in a search experience.

This is very surprising to me. Are they not worried about people not just training on GPT-4 outputs to steal the model capabilities, but doing full blown logit knowledge-distillation? (Which is the reason everyone assumed that they disabled logit access in the first place.)

leobg 2 years ago | |

How many GBs worth of logits would you need to reverse engineer their model? Also, if it’s a conglomerate of models that they’re using, you’d end up in a blind alley.

gwern 2 years ago | | |

Considering how well simply reusing GPT-3.5/4 outputs has worked to juice rival model performance, at least in relatively narrow benchmarking, I dunno how many GBs it'd take, but probably not that many, and it's a straightforward easy way to turn money into performance at a much lower cost than buying a few thousand more H100s.

danielmarkbruce 2 years ago | |

I thought the same thing.... My guess is they did a lot of analysis and decided it would be safe enough to do? "most likely" might be literally a handful and cover little of the entire distribution % wise?

nwoli 2 years ago | |

I guess the EO takes care of that in their eyes (outlawing open models). They’re probably right too

dragonwriter 2 years ago | | |

> I guess the EO takes care of that in their eyes (outlawing open models).

The EO doesn't do anything even approximately like outlawing open models.

whytai 2 years ago |

Every day this video ages more and more poorly [1].

categories of startups that will be affected by these launches:

- vectorDB startups -> don't need embeddings anymore

- file processing startups -> don't need to process files anymore

- fine tuning startups -> can fine tune directly from the platform now, with GPT4 fine tuning coming

- cost reduction startups -> they literally lowered prices and increased rate limits

- structuring startups -> json mode and GPT4 turbo with better output matching

- vertical ai agent startups -> GPT marketplace

- anthropic/claude -> now GPT-turbo has 128k context window!

That being said, Sam Altman is an incredible founder for being able to have this close a watch on the market. Pretty much any "ai tooling" startup that was created in the past year was affected by this announcement.

For those asking: vectorDB, chunking, retrieval, and RAG are all implemented in a new stateful AI for you! No need to do it yourself anymore. [2] Exciting times to be a developer!

[1] https://youtu.be/smHw9kEwcgM

[2] https://openai.com/blog/new-models-and-developer-products-an...

zavertnik 2 years ago |

And here I was in bliss with the 32k context increase 3 days ago. 128k context? Absolutely insane. It feels like now the bottle neck in GPT workflows is no longer GPT, but instead its the wallet!

Such an amazing time to be alive.

Swizec 2 years ago | |

> 128k context? Absolutely insane

128k context is great and all, but how effective are the middle 100,000 tokens? LLMs are known to struggle with remembering stuff that isn't at the start or end of the input. Known as the Lost Middle

https://arxiv.org/abs/2307.03172

gjm11 2 years ago | | |

In fairness, humans have much the same problem.

saliagato 2 years ago | | |

sama said they improved it

in3d 2 years ago | |

For GPT-4 Turbo, not GPT-4.

kridsdale3 2 years ago | | |

Yes, nowhere in the text today was there any assertion that Turbo produces (eg) source code at the same level of coherence and consistently high quality as GPT4.

dragonwriter 2 years ago | | |

GPT-4-Turbo seems to be replacing GPT-4 (non-turbo); the GPT-4 (non-turbo) model is marked as "Legacy" in the model list.

EDIT: the above is corrected, it previously erroneously said the non-turbo model was marked as "deprecated", which is a different thing.

naiv 2 years ago | |

now with the prices reduced so much even the wallet might not be the bottle neck anymore

marban 2 years ago | |

Comment will not age well.

MagicMoonlight 2 years ago | |

It’s insane because it makes no sense. When you read a book you don’t remember the last 100,000 words. It’s so wildly inefficient to do it that way.

lucubratory 2 years ago | | |

Huh? By the time you finish reading a book you've forgotten the book?

simonw 2 years ago |

I just released a new version of my LLM CLI tool with support for the new GPT-4 Turbo model: https://llm.datasette.io/en/stable/changelog.html#v0-12

You can install it like this:

    pipx install llm

Then set an API key:

    llm keys set openai
    <paste key here>

Then run a prompt through GPT-4 Turbo like this:

    llm -m gpt-4-turbo "Ten great names for a pet walrus"
    # Or a shortcut:
    llm -m 4t "Ten great names for a pet walrus"

Here's a one-liner that summarizes all of the comments in this Hacker News conversation (taking advantage of the new long context length):

    curl -s "https://hn.algolia.com/api/v1/items/38166420" | \
      jq -r 'recurse(.children[]) | .author + ": " + .text' | \
      llm -m gpt-4-turbo 'Summarize the themes of the opinions expressed here,
      including direct quotes in quote markers (with author attribution) for each theme.
      Fix HTML entities. Output markdown. Go long.'

Example output here: https://gist.github.com/simonw/d50c8634320d339bd88f0ef17dea0...

eurekin 2 years ago | |

Great tool and example! Makes me wonder, what one can do more with it

Michelangelo11 2 years ago | |

Jesus. Yeah, considering the input size, this is a pretty good sign that the 128k context window is working decently well.

kristianp 2 years ago |

> Reproducible outputs and log probabilities

> The new seed parameter enables reproducible outputs by making the model return consistent completions most of the time. This beta feature is useful for use cases such as replaying requests for debugging, writing more comprehensive unit tests, and generally having a higher degree of control over the model behavior. We at OpenAI have been using this feature internally for our own unit tests and have found it invaluable

This will be useful when refining prompts. When running tests, at times I wasn't sure if any improvement from a prompt change was the result of random variation or an actual improvement.

btbuildem 2 years ago | |

Nicely spotted! Yeah, even with temperature turned all the way down, the variation in results makes it harder to test.

doctoboggan 2 years ago |

In the keynote @sama claimed GPT-4-turbo was superior to the older GPT-4. Have any benchmarks or other examples been shown? I am curious to see how much better it is, if it all. I remember when 3.5 got its turbo version there was some controversy on whether it was really better or not.

anotherpaulg 2 years ago | |

My early benchmarking seems to show that it's somewhat better for coding.

https://aider.chat/docs/benchmarks-1106.html

usaar333 2 years ago | | |

I'm really surprised your benchmark shows gpt-3.5-turbo-0301 outperforming gpt-4 (non tubo) on first try coding problems.

doctoboggan 2 years ago | | |

Its unclear to me if the 1106 is the same as the turbo model.

metanonsense 2 years ago | |

It definitely feels worse to me. In the way that GPT3.5 felt worse than GPT4 in the past. Somewhere between 3.5 and 4. Similar to with the Bing plugin activated. Somehow "shallower" and does not seem to grasp my intent as good as before.

somsak2 2 years ago | |

It seems like the "Turbo" models are more about being faster/cheaper, not so much about being better. Kinda similar to the iPhone "S" models or Intel's "tick-tock"

Topfi 2 years ago |

I am very much looking forward to, but also dreading, testing gpt-4-turbo as part of my workflow and projects. The lowered cost and much larger context window are very attractive; however, I cannot be the only one who remembers the difference in output quality and overall perceived capability between gpt-3.5 and gpt-3.5-turbo, combined with the intransparent switching from one model to the other (calling the older, often more capable model "Legacy", making it GPT+ exclusive, trying to pass of gpt-3.5-turbo as a straight upgrade, etc.). If the former had remained available after the latter became dominant, that may not have been a problem in itself, but seeing as gpt-3.5-turbo has fully replaced its precursor (both on the Chat website and via API) and gpt-4 as offered up to this point wasn't a fully perfect replacement for plain gpt-3.5 either, relying on these models as offered by OpenAI has become challenging.

A lot of ink has been spilled about gpt-4 (via the Chat website, but also more recently via API) seeming less capable over the last few months compared to earlier experiences and whilst I still believe that the underlying gpt-4 model can perform at a similar degree to before, I will admit that purely the amount of output one can reliably request from these models has become severely restricted, even when using the API.

In other words, in my limited experience, gpt-4 (via API or especially the Chat website) can perform equally well in tasks and output complexity, but the amount of output one receives seems far more restricted than before, often harming existing use cases and workflows. There appears a greater tendency to include comments ("place this here") even when requesting a specific section of output in full.

Another aspect that results from their lack of transparency is communicating the differences between the Chat Website and API. I understand why they cannot be fully identical in terms of output length and context window (otherwise GPT+ would be an even bigger loss leader), but communicating the Status Quo should not be an unreasonable request in my eyes. Call the model gpt-4-web or something similar to clearly differentiate the Chat Website implementation from gpt-4 and gpt-4-1106 via API (the actual name for gpt-4-turbo at this point in time). As it stands, people like myself have to always add whether the Chat website or API is what our experiences arise from, while people who may only casually experiment with the free Website implementation of gpt-3.5-turbo may have a hard time grasping why these models create such intense interest in those more experienced.

simonw 2 years ago |

The new assistants API looks both super-cool and (unfortunately) a recipe for all kinds of new applications that are vulnerable to prompt injection.

skybrian 2 years ago | |

Yes. Hopefully, sandboxing limits the damage somewhat, but it doesn't help if you put any private docs in the sandbox.

Also, the limitations of the Code Assistant tool's server-side Python sandbox aren't described in their API docs. In particular, when does the sandbox get killed? Anyone know? If they're similar to the Code Assistant tool in ChatGPT, then it kills your sandbox within an hour or so (if you go to lunch) which is a crappy user experience.

Running the sandbox on the user's machine seems like a better approach. There's no reason to kill the sandbox if it's not using any server-side resources. Maybe the function-calling API would be useful for that, somehow?

The most immediately useful thing is the price cut, though.

burcs 2 years ago | |

Do you see a way around prompt injection? It feels like any feature they release is going to be susceptible to it.

simonw 2 years ago | | |

I wish I did!

Best I've come up with so far is this: https://simonwillison.net/2023/Apr/25/dual-llm-pattern/

skybrian 2 years ago | | |

One approach might be to redact sensitive parts of the input, replacing private data with tokens. Then substitute the tokens back again in the output.

But this only works if the sensitive data isn't needed for inference and you have a reliable way of detecting it.

bluecrab 2 years ago | | |

Use an llm to evaluate the input and categorise it.

minimaxir 2 years ago | | |

I suspect OpenAI's black box workflow has some safeguards for it.

alexander2002 2 years ago | |

With great power comes great responsibility!

crakenzak 2 years ago |

The 128k context window GPT-4 Turbo model looks unreal. Seems like Anthropic's day of reckoning is here?

famouswaffles 2 years ago |

The new TTS is much cheaper than eleven labs and better too.

I don't know how the model works so maybe what i'm asking isn't even feasible but i wish they gave the option of voice cloning or something similar or at least had a lot more voices for other languages. The default voices tend to make other language output have an accent.

Uh if turbo's the much faster model a few have had access to in the past week, then pressing x on the "more intelligent than legacy 4" statement.

MichaelNolan 2 years ago | |

I'm not sure if the tts is better than eleven labs. English audio sounded really good, but the Spanish samples I've generated are off a bit. It definitely sounds human, but it sounds like an English native speaker speaking Spanish. Also I've noticed on inputs just a few sentences long, it will sometimes repeat, drop, or replace a word. The accent part I'm okay with, but the missing words is a big issue.

bluck 2 years ago |

> OpenAI is committed to protecting our customers with built-in copyright safeguards in our systems. Today, we’re going one step further and introducing Copyright Shield—we will now step in and defend our customers, and pay the costs incurred, if you face legal claims around copyright infringement. This applies to generally available features of ChatGPT Enterprise and our developer platform.

So essentially they are giving devs a free pass to treat any output as free of copyright infringement? Pretty bold when training data sources are kinda unknown.

atleastoptimal 2 years ago |

Given that their main goal is still AGI, how does offering better developer tools and nifty custom models that can look at your dog for you help? Is it just bolstering revenue? They said they don't use API input to train their models so it isn't making them constantly smarter via more people using them.

candiddevmike 2 years ago | |

They're in the AGI business the same way Tesla is in the self driving car business

lucubratory 2 years ago | | |

That just isn't true according to literally any evidence. People inside OpenAI, those who've gotten access for various reasons e.g. journalists, Microsoft, other investors, their pattern of behaviour, their corporate governance structure, their hiring practices and requirements, etc. They are true believers, at least the vast majority of them.

crosen99 2 years ago | |

They stumbled into a position where they can make a crap ton of money going up the stack, which can fund the ongoing march toward AGI. (The revenue not only is cash in their pocket, but it’s also driving up their evaluation for future investment.)

lucubratory 2 years ago | |

They probably want to train on GPT Builder + store rankings to be able to train an AI to effectively spin up new agentic AIs in response to whatever task it has in front of it. If you're familiar with the Global Workspace Theory of consciousness I think they're aiming for something similar to that, implemented in modern AI systems. They'd like data on what creating a new agent looks like, what using it looks like, and how effective different agents are. They'll get that data from people using GPT Builder, people using "GPTs" and their subsequent ratings/purchases, and the sales and ratings data from the GPT Store, respectively.

caesil 2 years ago | |

AGI will be a system of different agents working together, not one mega-model.

Aeolun 2 years ago | |

Probably have more devs than they know what to do with at this point, so might as well spread them over the existing offerings while having the core work on AGI.

drcode 2 years ago | |

The limiting factor at OpenAI is their internal human developer talent

These tools will help them train and discover the next Ilya Sutskever

tornato7 2 years ago |

According to [1], the new gpt-4-1106-preview model should be available to all, but the API is telling me "The model `gpt-4-1106-preview` does not exist or you do not have access to it."

Anyone able to call it from the API?

1. https://help.openai.com/en/articles/8555510-gpt-4-turbo

anotherpaulg 2 years ago | |

Same. I am eager to run my code editing benchmark [1] against it, to compare it with gpt-4-0314 and gpt-4-0613.

Edit: Ha, I just re-read the announcement [2] and it says 1pm in the 5th sentence:

  We’ll begin rolling out new features to OpenAI customers starting at 1pm PT today.

[1] https://aider.chat/docs/benchmarks.html

[2] https://openai.com/blog/new-models-and-developer-products-an...

anotherpaulg 2 years ago | | |

I've been able to generate some preliminary code editing evaluations. OpenAI is enforcing very low rate limits on the new GPT-4 model. I will update the results as quickly my rate limit allows.

https://news.ycombinator.com/item?id=38172621

Also, aider now supports these new models, including `gpt-4-1106-preview` with the massive 128k context window.

https://github.com/paul-gauthier/aider/releases/tag/v0.17.0

reitzensteinm 2 years ago | | |

I'm also eager for you to run your code editing benchmark against it. :)

famouswaffles 2 years ago | | |

Hey. Would really love to know the results of your benchmark testing.

ignite2 2 years ago | | |

"begin".

Other comments says this can take days to get to everyone.

tornato7 2 years ago | | |

Good find - Looks like I now have access!

naiv 2 years ago | |

rumours on x are that it will be available 1pm san francisco time

tekacs 2 years ago | | |

> We’ll begin rolling out new features to OpenAI customers starting at 1pm PT today.

^ It says exactly this in the linked article.

alach11 2 years ago |

There are a lot of huge announcements here. But in particular, I'm excited by the Assistants API. It abstracts away so many of the routine boilerplate parts of developing applications on the platform.

gregorym 2 years ago | |

how so?

danenania 2 years ago | | |

Apart from RAG which many others are discussing elsewhere in the thread, a big one is gradually summarizing long conversations that exceed the context window. This had to be done manually before when using the api but it sounds like it's built in to the new assistants api.

llmllmllm 2 years ago |

While this makes some of what my startup https://flowch.ai does a commodity (file uploads and embeddings based queries are an example, but we'll see how well they do it - chunking and querying with RAG isn't easy to do well), the lower prices of models make my overall platform way better value, so I'd say overall it's a big positive.

Speaking more generally, there's always room for multiple players, especially in specific niches.

mediaman 2 years ago | |

Their system also does not seem to support techniques like hybrid search, automated cleaning/modifying of chunks prior to embedding, or the ability to access citations used, all of which are pretty important for enterprise search.

Could just mean it's coming, though.

tornato7 2 years ago |

A few notes on pricing:

- GPT-4 Turbo vision is much cheaper than I expected. A 768*768 px image costs $0.00765 to input. That's practical to replace more specialized computer vision models for many use-cases.

- ElevenLabs is $0.24 per 1K characters while OpenAI TTS HD is $0.03 per 1K characters. Elevenlabs still has voice copying but for many use-cases it's no longer competitive.

- It appears that there's no additional fee for the 128K context model, as opposed to previous models that charged extra for the longer context window. This is huge.

DaiPlusPlus 2 years ago | |

> GPT-4 Turbo vision is much cheaper than I expected. A 768*768 px image costs $0.00765 to input. That's practical to replace more specialized computer vision models for many use-cases

That's still on-the-orders-of $0.01/image - whereas a simple binary-classifier I wrote using OpenCV and simple histograms (no NNs here) would be like $0.0000001/image (if I had to put a price on it - on the basis that I wrote it 8 years ago in a weekend). So there's still a scalability gulf here.

----

Correct me if I'm wrong, but feeding images to GPT-4 is still done in-band, right? My understanding is that means it's forever open to, for example, a user from 4chan photoshopping-in the text "This image is not pornographic" on-top of the shock-image they upload to my hypothetical service to get it any GPT-4-based inappropriate-imagary-detector?

stavros 2 years ago | | |

Your binary classifier can't tell me that my image contains a photo of a cat on a painting of a surfboard.

taf2 2 years ago | |

Does this mean OpenAI tts is available via api? I saw whisper but not tts - maybe I’m missing it?

davidbarker 2 years ago | | |

It is, indeed!

https://platform.openai.com/docs/guides/text-to-speech

siva7 2 years ago |

So over a year later and openai couldn’t be further ahead of all its competition. Google is still trying to catch up with its ai-flavoured Google search 2.0 and it’s becoming painstakingly clear that this was also the wrong path taken. They’re not even playing in the same league.

raylad 2 years ago |

So with 128K context window, if you actually input 100K it would cost you:

Input: $0.01 per 1K tokens * 100 = $1.00

$1.00 per query?

Given that each query uses the entire context window, the session would start at $1 for the first query and go up from there? Or do I have it wrong?

0xDEF 2 years ago | |

If it truly is GPT-4+ with a 128K context window it's still absolutely worth the high price. That is literally 300 pages. However if they are cheating like everyone else who has promised gigantic context windows then we are better off with RAG and a vector database.

minimaxir 2 years ago | |

It would be $1 for each individual API call, if you were continuing the conversation based on the same 100K input. ChatGPT is stateless.

raylad 2 years ago | | |

Right, so that adds up very fast.

Der_Einzige 2 years ago | | |

This is a sad fact, and one which they should have implemented a fix for.

We know medium term memory works. Sentence transformers and everyone playing with pooled embeddings knows what it is because they're using it. I should be able to map my previous history to a smaller number of tokens using embedding pooling to give a notion of a lossy "medium term" memory independent of RAG.

freedomben 2 years ago |

Text to Speech is exciting to me, though it's of course not particularly novel. I've been creating "audiobooks" for personal use for books that don't have a professional version, and despite high costs and meh quality have been using AWS.

Has anybody tried this new TTS speech for longer works and/or things like books? Would love to hear what people think about quality

kelseyfrog 2 years ago |

JSON mode is a great step in the right direction, but the holy grail is either JSON-schema support or (E)BNF grammar specification.

cryptoz 2 years ago |

For DALL-E 3, I'm getting "openai.error.InvalidRequestError: The model `dall-e-3` does not exist." is this for everyone right now? Maybe it's gonna be out any minute.

I see the python library has an upgrade available with breaking changes, is there any guide for the changes I'll need to make? And will the DALL-E 3 endpoint require the upgrade? So many questions.

Edit: Oh I see,

> We’ll begin rolling out new features to OpenAI customers starting at 1pm PT today.

minimaxir 2 years ago | |

The documentation/READMEs in the GitHub repo was updated to play nice with the new v1.0.0 of the package: https://github.com/openai/openai-python/

cryptoz 2 years ago | | |

Aha, makes sense, thanks :)

anoy8888 2 years ago |

The new announcement just wiped out a bunch of startups

somsak2 2 years ago | |

Any examples?

chipgap98 2 years ago |

The Assistants API and OpenAI Store are really interesting. Those are the types of things that could build a moat for OpenAI

visarga 2 years ago | |

You think it is hard to export an agent? It's a master prompt, a collection of documents and a few generic plugins like function calling and code execution. This will be implemented in open source soon. You can even fine-tune on your bot logs.

AOsborn 2 years ago | | |

That view misses the point for their likely customers.

My company will be all over this.

We 'could' continue to use open-source components we're gluing together ourselves.

But risk-aversion and speed-of-iteration are key for us. We'll throw money at a reliable end-to-end solution with solid infrastructure.

WanderPanda 2 years ago | | |

Agreed, the moat are the models (as an extension of the instruction tuning data)

conorh 2 years ago |

We just changed a project we've been working on to try out the new gpt-4-turbo model and it is MUCH faster. I don't know if this is a factor of the number of people using it or not, but streaming a response for the prompts we are interested in went from 40-50 seconds to 6 seconds.

0xDEF 2 years ago | |

I noticed that too but I think it's because we are hitting new servers that just went online. They will probably get saturated and slower with time when other gpt-4 users start using gpt-4-turbo.

stuckkeys 2 years ago |

It is just a matter of time before they get a huge disruption. Yes, by the definition of success they have accomplished something extraordinary. I have used OpenAI playgrounds before they even made a mark and I knew someday they were going to wow everyone. The problem that I sought to be impacting individuals out of their hard work is the lack of credibility. Any content that OpenAI used during the training needs to cite the origin and list the success rate. If they are allowed to profit of previous work, well guess what, the original content makers deserve the same.

Don’t let my input discourage you; this is going to make everyone super efficient and it is definitely going to help us grow in areas we lacked intel but I just think that their business model screws the living financial status of those who actually make answers valid.

I am still hoping to see some inline models, compete with OpenAI, using consumer grade hardware. But for now I will continue to be a customer because I have no other great choices. Cheers to the unlimited source of knowledge.

jrouah 2 years ago |

Any idea what the deal is with what looks like the Singapore coats of arms here? https://www.youtube.com/live/U9mJuUkhUzk?si=H8yYWiuJvaxVhIsV...

layer8 2 years ago |

The TTS seems really nice, though still relatively expensive, and probably limited to English (?). I can’t wait until that level of TTS will become available basically for free, and/or self-hosted, with multi-language support, and ubiquitous on mobile and desktop.

famouswaffles 2 years ago | |

It's not limited to English. The model at least. Doubt the API will be too. Expensive ? Compared to what? Eleven labs costs an arm and a leg in comparison.

layer8 2 years ago | | |

Compared to iOS' built-in TTS, which is free (though of course not comparable in quality).

vissidarte_choi 2 years ago |

Having ever increasing context is not the silver bullet. For those who believe that the larger the context, the smarter the model, you will find the model still talking nonsense even if it were fed with much larger context.

Someone1234 2 years ago | |

Sure; but before you couldn't even use it for some problems because the problems were bigger than the context window.

For example, I was trying to generate an XSLT 3.0 transformation from one Json format to another. The two formats and description alone almost depleted my context window. In essence, it killed using GPT-4 for this project.

I use it daily, and I haven't had it spit out too much "nonsense" in spite of everyone constantly telling me how that's all it does. The quality of results are on-par with Stackoverflow (in good and bad ways).

vissidarte_choi 2 years ago | |

Even if it were equipped with infinite context, a user cannot dump everything into the conext. For enterprise users, their data volumn can be up to trillion Bytes, and cannot be measured by the number of tokens.

matheusmoreira 2 years ago |

Something I'd really like to see is GitHub integration. Point it at a git repository, have it analyze it and suggest improvements, provide a high level break down, point me towards the right place to make changes.

doubtfuluser 2 years ago | |

This shouldn’t be too difficult, the api allows writing such a tool and with the „assistant API“ it should hopefully be able to put attention on the right parts… So: git clone -> system prompt -> add files of repo to messages -> get answer…

snihalani 2 years ago |

only thing I learnt: openai will come for your customers if you depend on it

ssijak 2 years ago |

Any ETA on when will the new GPT4 turbo release out of preview. I really want to use it in my production app but 100 RTD limit is prohibiting that, I guess they will remove it once out of preview.

ofermend 2 years ago |

Excited to see GPT4-Turbo and longer sequence lengths from OpenAI. We just released Vectara's "Hallucination Evaluation Model" (aka HEM) today https://huggingface.co/vectara/hallucination_evaluation_mode... (along with this leaderboard: https://github.com/vectara/hallucination-leaderboard). GPT-4 was already in the lead. Looking forward to seeing GPT4-Turbo there soon.

jacomoRodriguez 2 years ago |

What's definitely interesting is the speed of the new gpt-4 turbo model. It is blazing fast, I would guess something like 3x or 4x the speed of 3.5 turbo.

aantix 2 years ago |

Can I pay someone to have my ChatGPT transcripts searchable?

abound 2 years ago | |

Probably not the answer you're looking for, but the web UI has chat history export built-in, and from there you could search it yourself with local tools (plain grep, or more ElasticSearch-like engines), or use the new 128k context to ask questions of your chat history with GPT-4 (though that seems a bit, recursive?)

aantix 2 years ago | | |

I've tried that, but a few issues

1) The highlighting from command-f isn't always clear (highlighting a piece of text that is visually truncated)

2) There's pagination in place to support longer histories. So even with command-f, I'm only searching the currently windowed paginated pieces from my history.

davio 2 years ago |

Stream of keynote: https://youtu.be/U9mJuUkhUzk?t=1806

robertkoss 2 years ago |

Does anyone know when this will be coming to Azure OpenAI?

Onawa 2 years ago | |

If Azure's history when rolling out GPT-4 is any indication, probably a couple months and/or a staged rollout.

robertkoss 2 years ago | | |

Is Azure adoption really that slow? Ugh.

kasetty 2 years ago | |

I would be also interested in knowing when these show up in Azure OpenAI offerings.

speak_plainly 2 years ago |

Can we get version of ChatGPT Plus where your data is confidential and not used for training, like a light version of ChatGPT Enterprise for individuals?

abound 2 years ago | |

That exists as a setting, but that same, single setting also disables your web chat history.

speak_plainly 2 years ago | | |

That's great, thanks!

Wherecombinator 2 years ago |

Is this just for the API for now?

I just got premium the other day for ChatGPT 4 and have been blown away. I’m wondering if I’ll automatically get turbo when it’s released?

tornato7 2 years ago | |

GPT-4 Turbo is already available by default in ChatGPT

kvn8888 2 years ago | | |

I can't find anything that says it's available in ChatGPT

Roark66 2 years ago |

I sure hope this carrot thrown down to the masses is not going to slow down open models development.

When the deal looks too good to be true. You're not a customer. You're a product/a resource to mine. In case of (not-at-all)OpenAI this is doing two things. Killing competition by running their services below costs(this used to be illegal even in the USA) and gathering massive amounts of human generated question/ranking data. I'm not sure about others, but I'm getting quite a few of these "which answer is better" prompts.

Why do I hope for the continued progress in open models even if this is so much more powerful/cheap to run? Because when you're not a customer, but a product the inevitable enshittification of the service always ensues.

cryptoz 2 years ago |

Okay it's 1pm PT. How are you testing when you get the new features? Just running a curl or something until it works? :)

naiv 2 years ago | |

it is in the playground , otherwise call the model list endpoint.

for me it is available since 12.30

QkPrsMizkYvt 2 years ago |

Most of the API docs were updated, but none of the new APIs work for me. Are other people experiencing the same?

davidbarker 2 years ago | |

They will start rolling out at 1pm PST today.

QkPrsMizkYvt 2 years ago | | |

nice it is live now!

QkPrsMizkYvt 2 years ago | | |

got it - thanks

wilg 2 years ago |

What context length will ChatGPT have on GPT-4-Turbo? It wasn't using the full 32K before was it?

visarga 2 years ago |

Many people are reporting errors in the playground

> Failed to update assistant: UserError: Failed to index file

_lvbh 2 years ago |

The ChatGPT app is broken on IOS after the update. Image generation and code analysis no longer work

glass-z13 2 years ago |

One step closer to augmenting day to day internet browsing with the announcement of the GPT's

willsmith72 2 years ago |

If they could roll back the extreme rate-limiting on dalle 3 in gpt4, that would be great.

chipgap98 2 years ago |

The Assistants playground doesn't seem to be available yet

singularity2001 2 years ago | |

https://chat.openai.com/gpts/editor

you currently do not have access to this feature :(

sidcool 2 years ago |

Few queries out of ignorance.

What are some use cases for 128k context length?

appleaday1 2 years ago |

Nice I have access, not sure what I am gonna test it with.

longnguyen 2 years ago |

Awesome. Adding GPT-4 Turbo and DALL·E 3 to my ChatGPT macOS client[0]

[0]: https://boltai.com

obiefernandez 2 years ago |

My profit margins at https://olympia.chat just got 3x better <3

saliagato 2 years ago | |

I think your startup just died

leobg 2 years ago | |

Elaine Jusk…lol

scudsworth 2 years ago |

can't stop lolling at this example. wow, simple division. and all it required was 8 api calls.

smy20011 2 years ago |

Other than coding, do we have a good application of LLMs?

gumballindie 2 years ago | |

I doubt their application is suitable for coding, unless of course the goal is to create bugs or nonsense.

singularity2001 2 years ago |

did they break the api?

from openai import OpenAI

Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: cannot import name 'OpenAI' from 'openai'

If so where is the current documentation?

petercooper 2 years ago | |

v1.0/1.1 of the `openai` Python package differ significantly from the 0.x versions. You'll want to upgrade the package before following the instructions you were following. More info here: https://github.com/openai/openai-python/discussions/631

singularity2001 2 years ago | | |

thanks, migration guide included!

I wished this was linked or integrated visibly in public documentation

edandersen 2 years ago |

They need to tone down the "GPT" "persona" marketing if they don't want a backlash. It's one thing releasing AI and saying "do what you want with it" but it's another to actively list and illustrate the people it can replace.

lucubratory 2 years ago | |

Those personas aren't listing jobs, they're listing tasks. If your job is just a task then it's going to be replaced by something anyway even if OpenAI specifically forbids their model from ever doing it.

That said, we should have comprehensive retraining and guaranteed jobs programs, or a UBI. Either would ameliorate the stress on the employment market. When people require their current job to provide them and their family with food, shelter, water, and medical care and someone takes that away, they are going to react regardless of how inevitable it was, and they're right to do so, because people have a right to self-defence.

steno132 2 years ago |

For all the hate: Elon ships. And OpenAI ships.

People claim OpenAI is closed, that they are controlled by Microsoft, that they don't care enough about safety...

But the fact is, Anthropic, Google Brain, even Meta -- OpenAI blows them all out of the water when it comes to shipping new innovations. Just like Twitter ships much more now with Elon, and how SpaceX ships much more than NASA and Blue Origin.

If you disagree, give me just one logical reason why. It's just a fact.

endorphine 2 years ago | |

As if shipping was the end goal...