GPT-4V(ision) system card [pdf](cdn.openai.com)

46 points by juunge 2 years ago | 15 comments

simonw 2 years ago |

Genuine question: why is this only published as a PDF?

OpenAI have the resources to also publish this as HTML. They chose not to.

They're not alone in this - most of the academic and research world, plus the whole concept of a "whitepaper", seems predicated on the idea of publishing PDFs.

Is this some stupid thing where human beings are expected to attach more prestige to information published in this way?

PDFs are a terrible way of publishing information in 2023:

- they render poorly on mobile devices, where many (most?) people do their reading

- they're hard to copy and paste information out of

- you can't link to headings within them (like HTML fragment links)

- you can't easily run them through translation tools like the one built into Chrome

The benefits of PDF I can see are:

1. Easier to print and get the exact expected output

2. You can save one file offline

3. Easier to author

I'm not arguing to replace PDFs with HTML (though I wouldn't miss them personally) - I'm saying publish documents as both!

Provide an HTML version and a PDF alternative for people who want it.

Am I missing something here? Why does the academic and research world stubbornly stick to such a hostile way of publishing their results?

lwneal 2 years ago | |

I think it's about citation. Traditionally, a pdf is a complete and finished work, analogous to a published journal article or book. It is static content and will not change, unlike HTML which might be "under construction".

This isn't necessarily still true: HTML content can stay up on the web forever and a pdf can change, but people still prefer to cite something that looks like a paper document.

Since a whitepaper is often meant to be cited, it's published as a pdf to take advantage of this preference.

The best approach is to publish a PDF for citation along with a public HTML demo, like https://jonbarron.info/mipnerf360/

civilitty 2 years ago | | |

It's also feasible to track changes this way. Download the PDF and compare the md5/crc/sha hash to an older pdf file - if they're the same, then there haven't been any changes.

With web pages, you have to download all the linked files and turn them into a deterministic archive and hope that the Javascript included doesn't pull any dynamic content (which isn't really practical to begin with).
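A minimal sketch of the hash comparison described above, assuming you have kept an older copy of the PDF on disk. The function names `file_sha256` and `pdf_unchanged` are made up for illustration; only `hashlib` from the Python standard library is used.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large PDFs never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pdf_unchanged(old_path, new_path):
    """True if the two files have the same SHA-256 digest (identical bytes)."""
    return file_sha256(old_path) == file_sha256(new_path)
```

Matching digests mean the bytes are identical. A mismatch only tells you the bytes differ, not what changed: a PDF that was re-exported with new metadata would likely hash differently even if the visible text is the same.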

simonw 2 years ago | | |

This is a really convincing answer, thank you.

solveit 2 years ago | |

I would guess OpenAI uses LaTeX as their default choice because they want to write equations.

behnamoh 2 years ago | | |

I’ve been using Word Equations and have never needed to use LaTeX. I think most people are still under the wrong impression that only LaTeX can handle math expressions. I use my custom shortcuts in Word to speed it up tho.

yberreby 2 years ago | | |

Using LaTeX does not necessarily imply rendering to PDF.

https://ar5iv.labs.arxiv.org/

behnamoh 2 years ago | |

On top of the other comments, using PDF also makes it harder to crawl data to train or finetune language models. I absolutely hate the PDF format for the reasons you mentioned. I went to great lengths to find PDF viewers with dark mode and Vim key bindings just to make my PDF experience better.

Tijdreiziger 2 years ago | |

> they render poorly on mobile devices, where many (most?) people do their reading

Acrobat Reader solves this with their ‘liquid mode’. But yeah, it would be nice if there was a FOSS renderer to do the same.

simonw 2 years ago | | |

Apparently https://www.zotero.org/ is an open source tool that can render PDFs in that way - I haven't tried it myself yet.

tmaly 2 years ago |

Looking at this, it gave me this other idea.

I was looking over older state building codes from the early 90s for a homeowners association issue.

Most of these older codes are scanned pictures of the text.

It would be interesting if there were some type of OCR extension for ChatGPT where you could upload images of the pages and it could OCR them and work with the text.

The same situation happens with city council agendas today. They publish 300-page PDF documents made up entirely of scanned images of the text. It is really hard to search them and figure out what is going on.

dhalp 2 years ago | |

There are a few companies who are actively tackling these types of challenges. The space is called "Intelligent Document Processing".

Check out aihub.instabse.com or docsumo.com

hsdropout 2 years ago |

In this PDF there is an example of a controversial output in response to an image for job applicants. The "solution" was to decline to answer that category of question. This doesn't feel like a reasonable approach, as it will become a game of whack-a-mole.

This also seems to acknowledge that the model has deep bias-related flaws and instead of treating the causes, they are going after symptoms.

stoicbatman 2 years ago |

An interesting perspective on the use of PDFs in the academic and research world. What I find striking is how PDFs have remained so prevalent despite the rapid digital transformation in recent years. While the static nature of PDFs lends itself to easy citation, it's time we reconsidered the emphasis on format over functionality.

swyx 2 years ago |

my notes:

- ramped up to 16k BeMyEyes + 1k developer alpha testers over 6 months

- reduced frequency and severity of hallucinations

- improved OCR and quality of descriptions

- great demand for describing people without affecting privacy/bias - intentionally refusing person identification 98% of the time and lowering accuracy to 0%. also declining a whole lot of problematic queries, per fig 8

- converting known jailbreaks to images to defend against multimodal jailbreaks. ironic how jailbreak collection websites probably made it a lot easier to break the jailbreaks

- interesting descriptions of mitigation process in 2.4.2.

discussion linked https://twitter.com/swyx/status/1706359912283152556