GPT-4V(ision) system card [pdf](cdn.openai.com)
OpenAI have the resources to also publish this as HTML. They chose not to.
They're not alone in this - most of the academic and research world, plus the whole concept of a "whitepaper", seems predicated on the idea of publishing PDFs.
Is this some stupid thing where human beings are expected to attach more prestige to information published in this way?
PDFs are a terrible way of publishing information in 2023:
- they render poorly on mobile devices, where many (most?) people do their reading
- they're hard to copy and paste information out of
- you can't link to headings within them (like HTML fragment links)
- you can't easily run them through translation tools like the one built into Chrome
The benefits of PDF I can see are:
1. Easier to print and get the exact expected output
2. You can save one file offline
3. Easier to author
I'm not arguing to replace PDFs with HTML (though I wouldn't miss them personally) - I'm saying publish documents as both!
Provide an HTML version and a PDF alternative for people who want it.
Am I missing something here? Why does the academic and research world stubbornly stick to such a hostile way of publishing their results?
This isn't necessarily true anymore: HTML content can stay up on the web forever and a PDF can change, but people still prefer to cite something that looks like a paper document.
Since a whitepaper is often meant to be cited, it's published as a PDF to take advantage of this preference.
The best approach is to publish a PDF for citation along with a public HTML demo, like https://jonbarron.info/mipnerf360/
With web pages, you have to download all the linked files, turn them into a deterministic archive, and hope that the included JavaScript doesn't pull in any dynamic content (which isn't really practical to begin with).
Acrobat Reader solves this with its ‘liquid mode’. But yeah, it would be nice if there were a FOSS renderer that did the same.
I was looking over older state building codes from the early '90s for a homeowners association issue.
Most of these older codes are scanned pictures of the text.
It would be interesting if there were some type of OCR extension for ChatGPT where you could upload images of the pages and it could OCR them and work with the text.
The same thing happens with city council agendas today. They put out 300-page PDF documents made up entirely of scanned images of the text. It is really hard to search them and figure out what is going on.
Check out aihub.instabse.com or docsumo.com
This also seems to acknowledge that the model has deep bias-related flaws and instead of treating the causes, they are going after symptoms.
- ramped up to 16k BeMyEyes + 1k developer alpha testers over 6 months
- reduced frequency and severity of hallucinations
- improved OCR and quality of descriptions
- great demand for describing people without affecting privacy/bias - intentionally refusing person identification 98% of the time and lowering accuracy to 0%. also declining a whole lot of problematic queries, per fig 8
- converting known jailbreaks to images to defend against multimodal jailbreaks. ironic how jailbreak collection websites probably made it a lot easier to break the jailbreaks
- interesting descriptions of mitigation process in 2.4.2.
discussion linked https://twitter.com/swyx/status/1706359912283152556