GPT-4 could pass bar exam, AI researchers say (the-decoder.com)

102 points by nafeen 3 years ago | 160 comments

I feel like I can now see the event horizon of commoditized intelligence. No idea what society (is "society" even the right word? Who knows) is going to look like on the other side of it, but it is going to be wildly different. Perhaps a brief period where everyone is using an AI to do their job, uh, I mean, assist their work, but beyond that it's unknowable.

Moreover, this looks like it is going to be happening sooner rather than later.

Aperocky 3 years ago | |

GPT has no reasoning ability, it has billions of parameters that make it pretend it has it, purely going off of previously digested material.

As long as it comes across some reasoning process that has not been seen before in the training wordset, which can be as easy as a middle school math question, it fails. Because it has no ability to extrapolate logic.

If it manages to pass the Bar test, that says more about the Bar test than it says about GPT.

kuschku 3 years ago | | |

Most jobs today don't need novel reasoning. This is the equivalent of the steam engine for intelligence.

During industrialization, machines did not replace all jobs, but they replaced or changed most jobs. The same will happen here.

A typical office job will have a few hours a week of actual, intensive thought. The vast majority of time will be spent doing simple, repetitive work. This work can be automated, or at least significantly sped up, using technology like GPT.

“write an API client for …”, “integrate APIs … and …” can easily be automated. Yes, you'll still have to write the business logic, but that's not the majority of your work today. You could even have it write unit tests based on the JIRA ticket description.

The same applies to many other jobs.

nopinsight 3 years ago | | |

You are implying either:

* Understanding complex language does not require logic/reasoning,

* There are infinitely many forms of logic/reasoning or at least more than those existing in a vast training set.

Neither of which is likely true.

What do you think of the Minerva system, which can solve multi-step quantitative reasoning questions better than many competent students and most adults?

https://ai.googleblog.com/2022/06/minerva-solving-quantitati...

Note: If you look at LSAT test samples, many questions are tests of complex logical reasoning, a requisite for legal professions.

ok123456 3 years ago | | |

Most people's reasoning ability functions at this level.

sebzim4500 3 years ago | | |

>As long as it comes across some reasoning process that has not been seen before in the training wordset, which can be as easy as a middle school math question

Is this true even if you tell it to show its working? In my experience that drastically improves its ability to do math problems.

Workaccount2 3 years ago | | |

Until someone can point out the difference between neuronal reasoning and silicon reasoning, I remain completely agnostic about the underlying mechanics of whatever model.

Gun to my head where I had to put money down, I would put it on "Brains are not nearly as special as we (they?) think they are." No fairy dust or supernatural beings required, brains are just another AI model (and likely not even a particularly great one).

swagmoney1606 3 years ago | | |

I've already been using GPT and ChatGPT to much success for my work.

Yes, it doesn't have reasoning ability, but being able to manage knowledge and information in the way that these models can is still an amazing feat.

flatline 3 years ago | | |

It does have some ability to extrapolate to new problems, provided its training corpus has reasonably close coverage. It is not going to be making new scientific discoveries or insights but then neither are most people. With a sufficiently large training set I think these models can achieve human parity for a subset of language generation tasks, and be effectively of human intelligence. They nearly already have.

It doesn’t matter to me if they have “reasoning” capabilities or not if the outcome is the same.

I think we are a long ways off from AGI still.

wrycoder 3 years ago | |

Kurzweil's "Singularity" is upon us, but he's now being cagey about it.

He says it's still years away. His interview with Lex Fridman[0] was pretty tame - I didn't learn much new from it. Kurzweil deflected the Singularity segment to be a discussion about the history of computer power.

Remember that Kurzweil is Director of Engineering[1] at Google, with the mandate to "bring natural language understanding to Google"[2]. He started there in 2012, just after publishing his book, "How to Create a Mind"[3], and that's exactly what he and his team have been doing for ten years. Publication of his new book, "The Singularity is Nearer"[4] is now pushed out to mid 2023. Maybe he'll change the title to "Here" by then. (It's hard to believe that OpenAI is actually ahead of Google.)

Fridman made the point that maybe we won't realize at the time that the Singularity is passing, and only understand later that it did. Kurzweil didn't disagree.

[0] https://www.youtube.com/watch?v=ykY69lSpDdo

[1] https://archive.is/vVEBv

[2] https://en.wikipedia.org/wiki/Ray_Kurzweil

[3] https://www.amazon.com/How-Create-Mind-Thought-Revealed-eboo...

[4] https://www.amazon.com/s?k=kurzweil+singularity+is+nearer

oidar 3 years ago | | |

>It's hard to believe that OpenAI is actually ahead of Google.

Are Google's LLMs available for us to test out? From what I've gleaned, they've locked them up - I'd love to compare GPT vs Google's LLMs.

tazjin 3 years ago | |

I think we're very close to Saturday from Clippy[0].

By this I don't mean an AI acting by itself with its own motivations, as in the story. I'm only talking about humans with malicious purposes subverting established verification & communication methods the way the AI in the story does.

Essentially, if you do anything security related, we might only be O(months) away from you needing to stop using basically any electronic communication for your purposes. Companies can't have online meetings anymore in which decisions are made, everything will have to be more analog, more in-person.

Look at the kind of access the Russian comedians Vovan & Lexus [1] have gotten. Without advanced AI, just a little social engineering, they got heads of state on the phone. Now combine this with the kind of text/audio/video synthesis we're not too far away from, and you have an absolute recipe for disaster ...

[0]: https://www.gwern.net/fiction/Clippy#saturday

[1]: https://en.wikipedia.org/wiki/Vovan_and_Lexus

forgetfulness 3 years ago | |

We were perhaps a bit too enamored with the idea that it was intellect that made us unique, and thus knowledge workers would be the last to be replaced. Pouring our brains out by the Petabytes for neural networks to pick them up made the economics just work for an AI industrial revolution to start from there.

mikepurvis 3 years ago | | |

I feel a bit like this with the whole firestorm around AI artwork as well: it's been a big wakeup call to people who have been creating using technology-assisted workflows for decades, but still felt in their gut that they were bringing something unique to the table and were therefore "safe" from being completely automated away. That hitting the button for magic eraser or magic lasso or magic color correction was somehow okay in a way that the AI itself sitting in the driver's seat was not.

Now that's been reduced to pointing out minor flaws that the next generation of AI artists will trivially resolve, and sharing memes beseeching other humans to participate in a boycott.

There's real pain and angst there, and I don't want to be callous about it with a comparison to buggy-whip manufacturers or something. But I wish the participants in these types of discussions were able to zoom out a bit and see that there's a larger societal issue here around automation, and that the real solution is going to be rethinking the basic economics of how we distribute wealth in a time of extraordinary machine-driven productivity— productivity that is no longer just about assembly lines and primary industries, but now also includes an increasing bite out of realms previously classified as "knowledge work".

mdp2021 3 years ago | | |

No, we were enamored with the idea that intelligence was well distributed between people, as if following Descartes' massive incipit "Good sense must be the best distributed thing in the world, given that nobody seems to be asking for more".

Inability to recognize intelligence is and will be devastating.

BitwiseFool 3 years ago | |

I feel like there is a difference between being able to pass a bar exam and being a "good" lawyer. I suspect AI tools would enhance the jobs of clerks rather than attorneys, mostly because clerks spend a great deal of time going over case law, text, and doing research.

mdp2021 3 years ago | | |

> enhance the jobs of clerks

We already did, it is called "Case Based Reasoning" within Decision Support Systems.

xiphias2 3 years ago | | |

While it won't be a good lawyer, it can replace lots of bad lawyers when people just want to send some legal papers or ask for some legal advice.

belter 3 years ago | |

Only that there is no intelligence being commoditized...Yet.

And that is obvious if you ask one of these models a meta question, for example: "If a person says I am lying, are they lying or saying the truth?"

You will see these models spit out a canned, elegant response, talking about how a statement could possibly be true or false, how some people are not able to attest whether another is truthful or not... But no mention of the Liar Paradox.

So we are not yet ready to say: "Your Honor, it's not fair! My Lawyer is version 2.2.3 with SP1 while the Prosecution is version 4.0 with an additional Cloud Based Elastic Inference!"

addisonl 3 years ago | | |

>It is impossible to determine whether a person is lying or telling the truth when they make a statement like "I am lying." The statement is self-contradictory, as it asserts that the person is both lying and telling the truth at the same time. This creates a paradox, as it is impossible for the statement to be both true and false at the same time. The Liar Paradox has been the subject of philosophical and logical study for centuries, and there is no universally agreed upon resolution to it.

ChatGPT's response to me asking "If a person says I am lying, are they lying or saying the truth?"

nopinsight 3 years ago |

Related: "Large Language Models Encode Clinical Knowledge" https://arxiv.org/abs/2212.13138

"On the MedQA dataset consisting of USMLE style questions with 4 options, our Flan-PaLM 540B model achieved a multiple-choice question (MCQ) accuracy of 67.6%..."

"The percentages of correctly answered items required to pass varies by Step and from form to form within each Step. However, examinees typically must answer approximately 60 percent of items correctly to achieve a passing score." -- https://www.usmle.org/bulletin-information/scoring-and-score...

It seems like the models in the paper could pass USMLE already.

Some tests suggest that Med-PaLM is close to human clinicians in many aspects, incl reasoning (Figures 6-7). Other tests show that Med-PaLM still returns inappropriate/incorrect results much more often than clinicians do, however (Figure 8).
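Taking the two quoted figures at face value, the comparison is simple arithmetic. A rough sketch only: the 60% threshold is the approximate figure from the USMLE bulletin quote, MedQA is not an actual USMLE form, and the function name is made up for illustration.

```python
# Figures quoted above; both are approximate and not directly comparable.
MEDQA_ACCURACY = 67.6    # Flan-PaLM 540B MCQ accuracy on MedQA (percent)
APPROX_PASS_MARK = 60.0  # "approximately 60 percent" per the USMLE bulletin


def margin_over_pass(accuracy_pct: float, pass_pct: float = APPROX_PASS_MARK) -> float:
    """Return how many percentage points the accuracy sits above the pass mark."""
    return accuracy_pct - pass_pct


margin = margin_over_pass(MEDQA_ACCURACY)
print(f"{margin:.1f} points above the approximate pass mark")
```

A positive margin here only says the model would likely clear the multiple-choice threshold, nothing about the clinical-quality gaps shown in Figure 8.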

lukko 3 years ago | |

I'm kind of surprised the model doesn't score higher, as there is a clear pattern to questions + answers and there would be a huge amount of training data for USMLE. But as stated elsewhere, there is an enormous gap between passing exams and treating real patients as a doctor. It's rarely about making obscure diagnoses found in exam questions, but about managing illness in the context of a patient and their lifestyle, with many very human aspects - difficult communication, ethics & assessing family dynamics. Written exams only assess whether a medical student has the minimum required knowledge to practice; there are also lots of practical exams and communication scenarios required. It may well be the same for lawyers - passing the bar does not really relate to actual day-to-day practice.

pyb 3 years ago |

Sounds like they didn't have access to GPT-4, but "Based on anecdotal evidence"... they still predict this.

minimaxir 3 years ago | |

For some reason, there's a thought-leader sect of Twitter talking about how good GPT-4 is, despite OpenAI having provided zero hints of what GPT-4 could entail or be differentiated from GPT-3/chatGPT.

naillo 3 years ago | |

Source: "I have a hunch"

danenania 3 years ago | | |

My knee always gets achey right before a technological singularity hits.

swyx 3 years ago | |

yeah this is really low quality for HN. source is basically "trust me i heard a guy who knows a guy"

munchler 3 years ago | |

They’re extrapolating from the performance of GPT-3.5. It’s speculative, but not anecdotal. GPT has improved rapidly over time, so it's not a huge leap to predict that GPT-4 will be even better.

booleandilemma 3 years ago | |

Sounds like they're writing science fiction then.

michpoch 3 years ago | |

Maybe they asked chatGPT.

ldh0011 3 years ago |

fwiw I had my dad ask ChatGPT relatively high-level questions about his field of practice in the state he is licensed in. Some were very good answers but some were wildly off. The ones that seemed to be better were questions about a concept (i.e. "What is x concept in law") while the incorrect ones were the ones asking for specifics ("What is the statute of limitations for x in y state").

czzr 3 years ago |

There’s a gap between passing the bar exam and actually practicing law - I’m pretty certain that I (someone with no legal training whatsoever) could pass the bar exam if you gave me unlimited access to the internet and a couple of additional hours to write the test. However, I don’t think that would make me an effective lawyer.

Ultimately standardised tests are proxy measurements of legal ability - it’s easy to see how a LLM could subvert the proxy without being sufficiently reliable in real life.

I do expect that even unreliable versions will be very useful tools for practicing lawyers, though.

allochthon 3 years ago | |

> I do expect that even unreliable versions will be very useful tools for practicing lawyers, though.

Agreed. It's like being able to call up a map on Google Maps for an area that you're already familiar with. The map can help you remember things about the area and terrain that you might not have recalled right away. A kind of cognitive aid.

microtherion 3 years ago |

IF it could (I wouldn't know one way or the other), I'd consider that a damning indictment of the Bar Exam failing to test for sentience, rather than evidence of GPT-4 having attained the same.

xyzelement 3 years ago | |

Bar exam is not a test of sentience but of the ability to recall, interpret, and apply the law. Because law is an entirely textual thing, I would expect GPT to be exceedingly well suited for it.

I've said for a long time that most doctors and lawyers are just databases with quick and imperfect retrieval.

booleandilemma 3 years ago | | |

And so as AI advances, the goal posts for what counts as intelligence are moved yet again.

morsecodist 3 years ago |

I am sorry but this title is click bait. These researchers ran GPT-3.5 on only the multiple choice sections of the Bar and it passed 2/7 sections. Is this really impressive? Absolutely. But the only element of the article that is about GPT-4 potentially passing the Bar is one paragraph near the end:

> According to the researchers, the history of large language model development strongly suggests that such models could soon pass all categories of the MBE portion of the Bar Exam. Based on anecdotal evidence related to GPT-4 and LAION’s Bloom family of models, the researchers believe this could happen within the next 18 months.

GPT-4 could potentially pass the Bar, it could potentially do a lot of things. But by their own admission the researchers have no hard evidence for this.

preommr 3 years ago |

How soon before this qualifies as a public defender? Gonna put this on my dystopia bingo.

elicksaur 3 years ago |

Such a milestone would say more about the Bar Exam (and other standardized tests) being a poor proxy for wisdom, than the advancement of computers.

dragonwriter 3 years ago |

> By passing this exam, lawyers are admitted to the bar of a U.S. state.

No, they aren't.

Meeting certain preparatory requirements (the details vary, but in most US jurisdictions an accredited/approved law school program or, in some, what amounts to an apprenticeship with a licensed practitioner of a certain duration and standard is required) and then passing the bar exam allows this.

The difference is important: the bar exam is not seen, standing alone, as adequate proof of readiness.

w1nst0nsm1th 3 years ago |

AI seems to be the next financial buzzword, after crypto, gig economy, CDO, dotcom, and so on.

I saw a video a few days ago saying we are coming out of the data era and entering the 'Knowledge Era' thanks to AI, where knowledge is following a logarithmic path. A 'revolution', a 'paradigm shift', and other bubblebabble.

Who was saying that? A 30-year-old startup CEO wearing... a t-shirt and jeans... You see the pattern.

I'm not an AI specialist, but from what I know, current AIs are nothing more than fine-tuned statistical algorithms.

Here is a short French video with English subtitles from Arte, the German-French public cultural television channel, about a painting coming from Midjourney: https://www.arte.tv/en/videos/110342-003-A/the-world-in-imag...

The video explains very well what AIs are able to do (and consequently what they can't do) if you listen to (read) carefully what the art historian says about the painting, which won first prize at the 2022 Colorado art festival.

In short, the painting is nothing new by itself but a patchwork of elements from different periods of art history. In other words, a statistical average of previous paintings, photographs, drawings, etc., based on the artist's prompts in Midjourney.

Not to say the painting is awful. I personally find it beautiful and could happily put it in my living room, but it definitely shows how current AI works, as commented on by an art historian who has no stake in the AI game.

nafeen 3 years ago |

Bar exam down. Medical next?

While GPT-3 wasn't advanced enough to crack a medical exam, it was used for notable contributions. For example, this is an interesting 2021 paper about "Medically Aware GPT-3 as a Data Generator" - https://aclanthology.org/2021.nlpmc-1.9.pdf

Would love to see if GPT-4 is advanced enough to take a medical exam.

Aardwolf 3 years ago |

I envisioned a cocktail shaking robot, but apparently Bar Exam is an exam for US lawyers

ben_w 3 years ago |

I want to perform some research of my own on which exams chatGPT can and can't do. It's multilingual, so can people from outside the UK (I already know where to get those) point me at some example exams and marking schemes? Any level, not just top.

Currently have Polish school maths: https://news.ycombinator.com/item?id=34205732

dmurray 3 years ago | |

Ireland Leaving Cert (17-18 year olds)

https://theleavingcert.com/exam-papers/

ben_w 3 years ago | | |

Thanks :)

softwaredoug 3 years ago |

The Bar Exam is multiple choice, right?

This isn't grading some freeform essay or generating arbitrary legal opinion. It's answering from a limited set of answers.

IMO it's cool, but not THAT shocking given what we've seen from ChatGPT? Especially given GPT 3.5 is only 17% below human test takers?

morsecodist 3 years ago | |

From the article it looks like there are multiple choice and written sections but they only ran the model on the multiple choice portion.

post-it 3 years ago | |

No, you're thinking of the LSAT.

Iwan-Zotow 3 years ago |

So, how would new knowledge be created?

GPT has no reasoning capability. So, as time goes on, the body of available information will fill up with GPT-X made-up answers. It means GPT-X+1 will be trained on GPT-X generated data. So, without reasoning, how will this thing work in the long run?

criddell 3 years ago | |

I wouldn't assume that future versions are going to work the same way past versions did.

Iwan-Zotow 3 years ago | | |

Maybe, maybe not.

The problem is with data/content creation. If all new data are created with GPT-3, how will it help GPT-4?

No new original content -> no new model

charcircuit 3 years ago |

How do they know GPT-4 will be enough to let it pass? Is there even a big enough difference in the training data for it to improve in the areas it was struggling with?

sebzim4500 3 years ago | |

Rumours are that GPT-4 is a significant improvement over GPT-3.5. Given how big an improvement GPT-3.5 is over GPT-3 I am inclined to believe them. Probably we will find out for sure in a few months.

secondcoming 3 years ago |

How long until it's smart enough to be a judge?

klntsky 3 years ago | |

Never, assuming current legislation.

First of all, it is not formalized (despite being written with the use of bureaucratic language). So, there's no way to validate the output. Secondly, the judicial system is based on the authority of the state (which manifests clearly in its ability to alter the rules). Why would any sovereign ruler(s) want to get rid of their authority?

The only use cases would be automatic fines for speeding or inappropriate parking - but it's already there.

mdp2021 3 years ago | |

...After sharpness and judgement have been implemented.

Incidentally: there is an interesting video interview with Noam Chomsky and Gary Marcus on the limits of current attempts at https://www.youtube.com/watch?v=PBdZi_JtV4c

...And Gary Marcus saying just before 7:00 that "something is missing" (understatement): ontology.

Gary Marcus: «...and these systems fall apart left and right».

Nice summary from Gary Marcus: «What they do is, they perpetuate past data - they don't really understand the world».

kneebonian 3 years ago | |

I don't know about judge but it could probably outperform most of Congress at this point.

BeFlatXIII 3 years ago |

How much of the bar exam consists of confident rhetoric using deductive logic? That seems to be right up the alley for GPT models.

ss108 3 years ago | |

A minority.

It's mostly about having stored legal rules in long term memory.

chrismcb 3 years ago |

I would think that it could pass most tests, as the tests are generally based on factual information and not creativity.

micromacrofoot 3 years ago |

You know how hard it can be to talk to an actual support person at some companies? Imagine that for everything.

DisjointedHunt 3 years ago |

If you're not actively building it or related tech, you shouldn't carry the label "Researcher" in the press.

It's like: "I'm a doctor of homeopathy so I can write a headline for a story about a neural chip implant"

machiaweliczny 3 years ago |

How is the baseline 50% in a 4-choice exam?
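For reference, uniform random guessing on a question with k options scores 1/k in expectation, so 25% for four options. A minimal sketch of that arithmetic plus a quick simulation; the function names are made up for illustration and nothing here comes from the article itself.

```python
import random


def chance_baseline(num_choices: int) -> float:
    """Expected accuracy of uniform random guessing on one question."""
    return 1.0 / num_choices


def simulate_guessing(num_questions: int, num_choices: int, seed: int = 0) -> float:
    """Empirical accuracy of random guesses against random answer keys."""
    rng = random.Random(seed)
    correct = sum(
        rng.randrange(num_choices) == rng.randrange(num_choices)
        for _ in range(num_questions)
    )
    return correct / num_questions


print(chance_baseline(4))
print(simulate_guessing(100_000, 4))
```

A 50% figure would therefore have to come from something other than blind guessing, such as a different definition of the baseline.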

moneywoes 3 years ago |

Wonder how data biases will surface

sdenton4 3 years ago | |

The fun part here is that most humans in the legal profession carry pretty extreme biases, judges included... The hope for legal AI is that you could progressively improve the biases, instead of waiting N years for a bad judge to retire and maaaaybe get replaced by someone better.

tetris11 3 years ago | | |

Who though? Who has access to the resources to push the boundaries of next-gen AI except the rich, who already have their own biases? The AI that the public will get will be just as useful as the tech that the public gets now: limited, isolating, and designed to restrict their freedoms in exchange for easy entertainment.

SV_BubbleTime 3 years ago | |

This is what I found immediately interesting about ChatGPT.

I asked about controversial topics. Its answers didn't seem to reflect biases that were programmed in; rather, it gave traditional media more weight than what turned out to be the truth, which was only accepted much later on and still runs against the media retelling.

I lost a lot of faith in it knowing it was more CNN than careful deliberating AI.

anononaut 3 years ago | | |

It's well documented that controversial topics are subject to varying degrees of censorship and prompt editing/modification/appending. What you think may be a response in alignment with corporate media may in fact be corporations disallowing you from obtaining actual responses through various means that are being tested now. We can't know unless we have open source access to the unmodified model.

notwokeno 3 years ago |

The bar exam answer key can pass the bar exam, that doesn't mean that it would be a good lawyer.

criddell 3 years ago | |

We don't ask students to calculate sin(1.234) by hand these days. Exams for mechanical engineering students assume they will have a calculator with SIN and EXP buttons.

It may soon be time to update the bar exam and assume law students have access to AI tools.
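For a sense of what "by hand" would involve, here is a sketch of the Maclaurin series sin(x) = x - x^3/3! + x^5/5! - ..., which is roughly what the SIN button spares you. The function name and the number of terms are illustrative choices, and real calculators may use other methods.

```python
import math


def sin_taylor(x: float, terms: int = 8) -> float:
    """Approximate sin(x) by summing the first `terms` Maclaurin series terms."""
    total = 0.0
    term = x  # first term: x^1 / 1!
    for n in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2n+2)(2n+3)) to step x^(2n+1)/(2n+1)!
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total


approx = sin_taylor(1.234)
print(approx, math.sin(1.234))
```

Even the first three or four terms already mean a fair amount of long multiplication and division on paper.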

nigerianbrince 3 years ago |

You passed the bar!