Anthropic's 100k context is now available in the web UI (twitter.com)

253 points by jlowin 3 years ago | 176 comments

Claude 100k 1.3 blew me away.

I gave it the task of extracting a specific column of information from a table inside a PDF, identifying the column only by its header text. The text was extracted with Tesseract, with no extra layers on top. (For those who haven't tried extracting tables with OCR: it's a non-trivial problem, and the output is a mess.)

With more than 40k tokens in context, it extracted the data with 100% accuracy.

Changing the prompt to target a different column from the same table worked perfectly as well. To test whether it was somehow hallucinating, I changed a character in the table in the OCR text, and it accurately extracted the new data too.

One of those "Jaw to the floor" moments for me.

I did the same task in GPT-4 (limiting the context to 8k tokens), and it worked, but at ~4x the cost and without being able to feed it the whole document.

arnaudsm 3 years ago | |

Using LLMs with 100GB VRAM to convert PDFs to CSVs is truly depressing, but I am sure many companies will love it.

2023 office software already uses 1000x more resources than 1990s software. I bet we are ready to do that again.

visarga 3 years ago | | |

Not just PDFs with tables. It works on any semi-structured document with key-value pairs like invoices, purchase orders, receipts, tickets, forms, error messages, logs, etc.

The "Information Extraction from semistructured and unstructured documents" task is seeing a huge leap. Just 3 years ago it was very tedious to train a model to solve a single use case; now they all work.

But if you do make the effort to train a specialised model for a single document type, the narrow model surpasses GPT-3.5 and GPT-4.

version_five 3 years ago | | |

Consulting companies are paying juniors > $150k per year to do this kind of thing. In some objective sense, it's absurd, but locally, it makes more sense to use an expensive gpu than an MBA class president. And in 10 years, everyone's phone will have that much compute anyway.

csomar 3 years ago | | |

It's funny, but React/Node/Electron apps will suddenly look minimalist once everyone and his brother starts adding a neural model that consumes 10GB of VRAM/RAM to their app.

martythemaniak 3 years ago | | |

You're missing the developer time. You no longer have to spend hours (or days, perhaps weeks depending on the sources) stringing together random libs, munging and cleaning data, testing, etc etc.

celestialcheese 3 years ago | | |

If you’ve never built PDF or archive document parsing systems, you don’t know true pain.

I see it as incredible. Most PDFs that I see are basically just thin wrappers around image scans of documents that don’t exist anywhere anymore: archives from estates, manuals, etc.

These techniques of using LLMs to clean OCR output are game-changing, because the best in class before was human-in-the-loop systems that required huge amounts of rewriting to get usable output.

Now LLMs are unlocking previously difficult data sources for significantly less money.

SongofEarth 3 years ago | | |

On YouTube there are timer and stopwatch videos with millions of views. People are streaming 1080p video for something that can be implemented locally in 20 lines of code, but does it really matter? It won't make a dent in Google's revenue.

If LLMs are deployed in large enough scale, the convenience really could justify the cost.

yawnxyz 3 years ago | | |

we also had more secretaries and people who just retyped things all day in the 90s!

throwaway888abc 3 years ago | | |

It's worth double for the increase in accuracy. Don't let me go to Amazon Mechanical poor souls Turk.

https://en.wikipedia.org/wiki/Amazon_Mechanical_Turk

juancampa 3 years ago | | |

The better version of this is using this massive LLM to _create a program_ that can then extract the same data from similar PDFs. That way the high cost is incurred only once.
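As an illustration of the kind of small, reusable program such an LLM might be asked to write once, here is a minimal sketch. It assumes the OCR output is a header row followed by rows whose cells are separated by tabs or runs of two or more spaces; `extract_column` is a hypothetical name, and real Tesseract output is often messier than this.

```python
import re

# Assumed cell boundary: tabs, or runs of 2+ spaces (single spaces stay inside cells).
CELL_SPLIT = re.compile(r"\t+| {2,}")


def extract_column(ocr_text: str, header: str) -> list[str]:
    """Return the cells under `header`, treating the first non-empty line as the header row."""
    lines = [ln for ln in ocr_text.splitlines() if ln.strip()]
    if not lines:
        return []
    headers = [h.strip() for h in CELL_SPLIT.split(lines[0].strip())]
    try:
        idx = headers.index(header)
    except ValueError:
        raise KeyError(f"header {header!r} not found in {headers}") from None
    column = []
    for line in lines[1:]:
        cells = [c.strip() for c in CELL_SPLIT.split(line.strip())]
        # A short row (missing cells) yields "" instead of raising IndexError.
        column.append(cells[idx] if idx < len(cells) else "")
    return column


sample = "Item\tQty\tPrice\nWidget\t3\t9.99\n\nGadget  5  4.50\n"
print(extract_column(sample, "Price"))
```

Running the generated parser costs nothing per document, but it would still need checking whenever a new document layout shows up.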

anonymouse008 3 years ago | |

> text extracted using tesseract

You're saying the raw text, without normalizing the rows and columns (basically tab-, space- or newline-delimited text with sporadic lines per row), was all you needed to send? I still have to normalize my tables even for GPT-4, I guess because I have weird merged rows and columns that encode grouping information on top of the table data itself.
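A minimal sketch of a basic normalization pass, assuming tabs or runs of two or more spaces mark cell boundaries. It only turns the lines into a rectangular grid by padding short rows at the end; it does not reconstruct merged cells, which is the harder part described above. `normalize_table` and `to_tsv` are hypothetical names.

```python
import re

# Assumption: tabs or 2+ spaces separate cells; single spaces belong to cell text.
CELL_SPLIT = re.compile(r"\t+| {2,}")


def normalize_table(ocr_text: str, fill: str = "") -> list[list[str]]:
    """Turn loosely delimited OCR lines into a rectangular grid of cells."""
    rows = []
    for line in ocr_text.splitlines():
        if not line.strip():
            continue  # drop blank lines between rows
        rows.append([c.strip() for c in CELL_SPLIT.split(line.strip())])
    width = max((len(r) for r in rows), default=0)
    # Pad short rows at the end so every row has the same number of cells.
    return [r + [fill] * (width - len(r)) for r in rows]


def to_tsv(grid: list[list[str]]) -> str:
    """Serialize the grid as tab-separated text, e.g. to paste into a prompt."""
    return "\n".join("\t".join(row) for row in grid)


print(to_tsv(normalize_table("Name  Score\nAlice\t90\n\nBob  85  extra")))
```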

celestialcheese 3 years ago | | |

Exactly. I just sent the raw Tesseract output, with no formatting or "fix the OCR text" step. So the data looked like:

```
col1col2col3\nrow label\tdatapoint1\tdatapoint2...
```

Very messy.

I don't think this generalizes with the same 100% accuracy to arbitrary OCR output (it can be _really_ bad). I'm still planning to do a first pass with a better table OCR system like Textract, Document AI, PaddlePaddle's table recognition, etc., which should improve accuracy.

swyx 3 years ago | | |

Even better: you can do it by copy-pasting from a PDF into GPT on your phone! https://twitter.com/swyx/status/1610247438958481408

modernpink 3 years ago | |

What was the dollar cost to do this work? Iterating over a 40k context must be expensive.

celestialcheese 3 years ago | | |

~$0.45

nightski 3 years ago |

The discourse has made it seem that, with context length, larger is always better. I'm wondering if there is any degradation in the quality of results when the context is scaled this large. Does it scale without loss of performance? Or is there a point where, even though you can fit in a lot more information, performance starts to degrade?

phillipcarter 3 years ago | |

In a brief test, I found that the bigger context window only meant that I could stuff a whole schema into the input. It still hallucinated a value. When I added a vector-embedding lookup so that only the top k most "relevant" fields went into the prompt, it did exactly what I wanted: https://twitter.com/_cartermp/status/1657037648400117760
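A minimal sketch of the top-k idea (not the commenter's code, which the comment doesn't show): score each schema field by cosine similarity to the question and keep only the k best in the prompt. `embed` here is a toy bag-of-words stand-in for a real embedding model, and the field names are invented for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_fields(question: str, fields: list[str], k: int = 3) -> list[str]:
    """Return the k schema fields most similar to the question."""
    q = embed(question)
    return sorted(fields, key=lambda f: cosine(q, embed(f)), reverse=True)[:k]


# Hypothetical schema: "name: description" strings.
fields = [
    "http.status_code: HTTP response status code",
    "duration_ms: request latency in milliseconds",
    "user.id: identifier of the user",
    "db.statement: SQL query text",
]
print(top_k_fields("which requests had high latency in milliseconds", fields, k=1))
```

Only the selected fields go into the prompt, so the model has fewer irrelevant names to pick a hallucinated value from.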

YMMV.

koboll 3 years ago | | |

The fundamental problem seems to be that it's still slightly sub-GPT-3.5-quality, and even a long context window can't fix that. It will remember things from many many tokens ago, but it still doesn't reliably produce passable work.

The combination of a GPT-4-quality model and a long context window will unlock a lot of applications that now rely on somewhat lossy window-prying hacks (e.g., summarizing chunks). But any model quality below that won't move the needle much in terms of what useful work is possible, with the exception of fairly simple summarization and text analysis tasks.

rpcope1 3 years ago | |

Well, a larger context makes it easier to integrate other tools, like a vector database for information retrieval to jam into the context, and the more context, the more potentially relevant information can be added. For models like llama, where context is (usually) max 2K tokens, you're sort of limited as to how much potentially relevant information you can add when doing complex tasks.
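To make the "how much retrieved information can you add" point concrete, here is a minimal sketch that greedily packs the highest-scoring retrieved chunks into a token budget. The ~4 characters per token ratio is only a rough rule of thumb for English (real code would count with the model's tokenizer), and the budget should leave room for the question and the answer.

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)


def pack_context(chunks: list[tuple[float, str]], budget: int) -> list[str]:
    """Greedily add the highest-scoring chunks that still fit in `budget` tokens."""
    selected, used = [], 0
    for score, text in sorted(chunks, key=lambda c: c[0], reverse=True):
        cost = approx_tokens(text)
        if used + cost <= budget:
            selected.append(text)
            used += cost
    return selected


# (relevance score, chunk text) pairs, e.g. from a vector database query.
chunks = [(0.9, "a" * 400), (0.8, "b" * 800), (0.5, "c" * 200)]
print(len(pack_context(chunks, budget=200)))   # small window: some chunks are skipped
print(len(pack_context(chunks, budget=2000)))  # larger window: everything fits
```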

emptysongglass 3 years ago |

Any magic tricks to gaining access apart from waiting for months? I've been using GPT-4 and love it but would really love to test that 100k context window with long running chatbots.

famouswaffles 3 years ago | |

Claude-Instant-100k is available on Poe.com (but only usable as a paying subscriber). Claude-plus-100k isn't up yet but I'm guessing that's a matter of time.

dmix 3 years ago | | |

Nice to see Poe is an actual iOS app for AI chat. Using ChatGPT via the Home Screen “app” is extremely frustrating because it logs you out constantly (maybe due to using Google to auth).

marcopicentini 3 years ago |

Any timeframe when it will be released to the public?

We are in the middle of developing an app, and we are not able to do it with OpenAI's limited context window. We have already submitted a request for access.

pmarreck 3 years ago | |

There are tricks you can use to make better use of a smaller context window, such as sub-summaries and attention tricks. That's how there are already products on the market that consume entire big PDFs and let you query them. Granted, a larger context window would still work better, but it's possible.
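The "sub-summaries" trick can be sketched as a map-reduce loop: summarize each chunk, then summarize groups of summaries, until one summary remains. This is a generic sketch, not how any particular product works; `summarize` would wrap an LLM call in practice and is just a parameter here.

```python
from typing import Callable


def hierarchical_summary(
    chunks: list[str],
    summarize: Callable[[str], str],
    group_size: int = 4,
) -> str:
    """Summarize chunks, then summarize groups of summaries, until one remains."""
    if not chunks:
        return ""
    if group_size < 2:
        raise ValueError("group_size must be at least 2")  # otherwise the loop never shrinks
    level = [summarize(c) for c in chunks]
    while len(level) > 1:
        level = [
            summarize("\n".join(level[i:i + group_size]))
            for i in range(0, len(level), group_size)
        ]
    return level[0]


# Placeholder "summarizer" that keeps the first word, standing in for an LLM call.
def first_word(text: str) -> str:
    words = text.split()
    return words[0] if words else ""


print(hierarchical_summary(["alpha beta", "gamma delta", "epsilon", "zeta eta", "theta"], first_word))
```

Each level only has to fit `group_size` summaries into the window, which is why this works with small contexts, at the cost of losing detail at every level.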

yawnxyz 3 years ago | | |

It's using "overlapping chunking" methods, and it usually works for generic PDFs. It really falls apart on technical documents, SOPs and research articles, where you need context from chunks far above. Using vector DBs also doesn't work well, because you have to fiddle with window size and overlap, which change depending on what kind of paper you're uploading. It's a mess and takes too long.
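For reference, "overlapping chunking" usually means a sliding window: fixed-size chunks where each one repeats the last few tokens of the previous chunk. A minimal sketch, with `size` and `overlap` being exactly the knobs the comment says need retuning per document type:

```python
def overlapping_chunks(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Split `tokens` into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


# Words stand in for tokens here; real pipelines would use the model's tokenizer.
print(overlapping_chunks("a b c d e f g h i j".split(), size=4, overlap=2))
```

Anything that refers back further than one window (a definition on page 2 used on page 30) still ends up in a different chunk, which is the failure mode described above.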

marcopicentini 3 years ago | | |

The problem is that summarizing a text of 100k tokens costs $2 using Davinci.
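The $2 figure is plain per-token arithmetic: 100k tokens at $0.02 per 1k tokens. A sketch, assuming text-davinci-003's 2023 list price of $0.02 per 1k tokens for both prompt and output (the comment doesn't say which Davinci model). Since Davinci's window was only about 4k tokens, a real run would also be chunked, and any overlap between chunks would add to this.

```python
def cost_usd(prompt_tokens: int, output_tokens: int,
             usd_per_1k_prompt: float, usd_per_1k_output: float) -> float:
    """Cost of API usage under per-1k-token pricing."""
    return (prompt_tokens * usd_per_1k_prompt
            + output_tokens * usd_per_1k_output) / 1000


# Assumed rate: $0.02 per 1k tokens; output tokens ignored since a summary is short.
print(round(cost_usd(100_000, 0, 0.02, 0.02), 2))  # 2.0
```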

modernpink 3 years ago | |

What are the commercial applications of mega context window LLMs at current prices? I would guess mainly legal. And what strategies would you rely on to reduce the accumulating costs over the course of a session?

atemerev 3 years ago |

I don't understand this "slow rollout" approach among OpenAI's competitors. The chat/instruction models are continuously fine-tuned on real dialogues. To get these dialogues en masse, you need to deploy models to the wider public. Otherwise, if you can't quickly grab the streams of real-time human-generated content, you will forever be on the losing side.

People at OpenAI are smart; they understood that quickly. GPT-4 is available nearly everywhere, and lesser models are even free for anyone to use. This required hiring huge teams of moderators, but we are at the land-grab stage, and everyone in the business needs to move fast and break a lot of things. However, GPT-4 and open source models are the only things I can use. Bard "is not available in my country" (Switzerland), and the first thing the Claude access form asks is whether I am based in the US.

Well, their loss.

dataangel 3 years ago | |

It's probably the GPUs, they don't have enough capacity to handle more users. My guess is that GPT4 set off a buying spree. Even for CPUs, I've recently heard lead times for Sapphire Rapids servers are 2-3 months, high end switches 6 months, and those probably have way less demand.

s3p 3 years ago | |

I think it's cloud limitations. Anthropic probably can't scale up extremely fast, and accommodating hundreds of millions of users probably isn't as easy for them as it is for OpenAI.

williamcotton 3 years ago | |

If they are resource-constrained and then opened the floodgates, resulting in poor performance and timeouts for every user, it seems like it would sour more milk than it would otherwise.

nl 3 years ago | |

Is Bard still unavailable?

It was unavailable to Australia until last week but was made more widely available at Google I/O.

It's pretty good, too!

atemerev 3 years ago | | |

Still unavailable here in Switzerland.

okdood64 3 years ago |

New to ML here, what’s the difference between parameters and context?

sghiassy 3 years ago | |

Parameters are like the number of neurons in your brain

Context is how much short term memory you can retain at any one time (think how many cards you can remember the order of in a deck of cards)

Closi 3 years ago | |

Parameters - number of internal variables/weights in the model

Context - Length of input/output buffer (number of input/output tokens possible).
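The "input/output buffer" description can be made concrete: the prompt and the model's reply share one token budget, while the parameter count stays the same no matter how much you send. A minimal sketch with token counts assumed to be known already (real code would count them with the model's tokenizer); 8,192 is the GPT-4 8k window size, and 100,000 approximates Claude's 100k window.

```python
def fits_context(prompt_tokens: int, max_output_tokens: int, context_window: int) -> bool:
    """The prompt and the generated output must fit in the same context window."""
    return prompt_tokens + max_output_tokens <= context_window


def messages_to_drop(message_tokens: list[int], max_output_tokens: int, context_window: int) -> int:
    """How many of the oldest chat messages must be dropped so the rest still fit."""
    drop = 0
    while drop < len(message_tokens) and not fits_context(
        sum(message_tokens[drop:]), max_output_tokens, context_window
    ):
        drop += 1
    return drop


history = [3000, 3000, 3000]  # token counts of past messages, oldest first
print(messages_to_drop(history, 1000, 8192))     # 8k window: oldest message must go
print(messages_to_drop(history, 1000, 100_000))  # 100k window: everything fits
```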

capableweb 3 years ago | |

Other answers are already good, just offering yet another difference.

Parameters are set indirectly via training; they are the weights of the model itself.

Context is what you as a user pass to the model when you're using it; the context size decides how much text you can actually pass it.

Being able to pass more context means you can (hopefully) make it understand more things that weren't part of the initial training.

flerovium 3 years ago |

POC or STFU

We can't assess how good it is if it's in closed beta. It's all cherry-picked twitter.

nico 3 years ago |

It’s also available here on Google Colab: https://twitter.com/gpt_index/status/1657757847965380610?s=4...

anotheryou 3 years ago | |

no. you still need to bring your own api key for that.

syntaxing 3 years ago |

Is there a trick to getting access? I’ve been on the waitlist for GPT-4 and Claude for a while. Been building some proof of concepts with GPT-3.5 but having better models would be a huge help.

gee_m_cee 3 years ago | |

If you're referring to a paid account, I never received a notification about my GPT-4 waitlist spot. I waited a while for one, and then, at the prompting of a colleague, I just found a spot in the web UI to sign up. After one false start, it just worked.

pmoriarty 3 years ago | |

Try going through poe.com. I got access right away.

pr337h4m 3 years ago |

Also available on poe.com

rgbrgb 3 years ago | |

great domain. what is pricing?

s3p 3 years ago | | |

$20/month for 1000 queries if I remember correctly

wangg 3 years ago |

Sharing that this is available on Poe.com from Quora.

jlowin 3 years ago |

The 100k context was originally released only via API, but I just noticed that it's now available in the Claude web UI.

greyman 3 years ago | |

What is the URL of Claude web UI? I somehow cannot find it.

pmoriarty 3 years ago | | |

Also https://poe.com/Claude-instant-100k

Veen 3 years ago | | |

console.anthropic.com

tikkun 3 years ago |

I requested access when it was released.

Other HN readers, how many days did it take you from requesting access to Claude to having API access? I didn't use it prior to 100K so I don't have an existing API account.

famouswaffles 3 years ago | |

Requested access way before 100k and still haven't gotten in.

npsomaratna 3 years ago | | |

Same here. Been waiting for a couple of months now.

malux85 3 years ago | | |

Yeah me too, waiting patiently as context windows are our biggest blocker on more complex chemistry simulations

Mockapapella 3 years ago | | |

been a couple months for me as well. Actually forgot about `claude` and have just been using OpenAI's API instead.

tikkun 3 years ago | | |

Could you send me an email? I've liked a few of your comments, want to say hi over email. Email in profile.

lachlan_gray 3 years ago | |

Randomly gained access long after I had forgotten I signed up, maybe 3 or 4 months

ntonozzi 3 years ago | |

I requested access on March 14th or 15th and got it on March 20th.

absentmoon 3 years ago | | |

Did you fill in the form with super compelling use case or something?

anotheryou 3 years ago | |

did any of you get a confirmation mail or something?

rmckayfleming 3 years ago | | |

Nope. Nothing. I’ve been waiting since they released it. Part of me thinks it might be because I responded yes to “Outside of the US”.

ChikkaChiChi 3 years ago |

Is there a place I can track all releases, announcements, and invite links?

thomasahle 3 years ago |

This is the world we are entering of "commercial AI" rather than public, peer reviewed AI. No benchmarks. No discussion of pros and cons. No careful comparison with state of the art. Just big numbers and big announcements.

bulbosaur123 3 years ago |

Where can I actually physically use it? Or is it again only limited to chosen ones?

arpowers 3 years ago |

Is it useful?

greyman 3 years ago | |

You mean the Claude bot in general? For me, yes, I use it daily. Compared to GPT, it answers more quickly and in a friendlier way, and in general it is less woke. I use GPT-4 as a fallback when I need more reasoning capability; there GPT-4 is better. To sum it up, if you find GPT-3.5 and GPT-4 useful, then yes, Claude is useful as well.

13415 3 years ago | | |

Out of curiosity, what do you mean by "less woke"? Does it frequently insult minorities or make racist remarks?

Edit: To clarify, I was mostly interested in examples and side by side comparisons to better understand what OP meant, not political discussions.

s3p 3 years ago | | |

Another person addicted to using the word "woke".... sigh

arpowers 3 years ago | |

The vast majority of AI tools are vaporware mock-ups …

Adobe Firefly is best example of “just ship a mock-up of the feature” Ai marketing

viggity 3 years ago | | |

Firefly has some genuinely cool shit in it (their text treatments are pretty neat), but overall quality is dramatically lacking because they only train on images they have explicit rights to.

adamsmith143 3 years ago | | |

Of course Adobe put out crap but Claude is a real product, not vaporware...

weird-eye-issue 3 years ago | | |

Bad take