Nougat: Neural Optical Understanding for Academic Documents

Nougat: Neural Optical Understanding for Academic Documents (facebookresearch.github.io)

54 points by falkaer 2 years ago | 27 comments

echo_time 2 years ago |

Funnily enough, Example Page 1 is wrong: it renders du^n as du^*, and then nu^n-1 as nw^*-1.

It is impressive but...it really feels like those are the details that really really matter.

nicodjimenez 2 years ago | |

The second page is even worse. It ends in repeated \cdots and doesn't finish parsing the page. Also, it read the number 73 as 3, I guess because the previous section number was 2.

orbital-decay 2 years ago | |

The main issue with OCR of anything with math in it is always that it has to be not 99.99% but 100% correct. Which is probably not possible.
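To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes character errors are independent, and the page and paper sizes are made-up round figures, not measurements of Nougat or any other tool. Under those assumptions, 99.99% per-character accuracy leaves only about three in four pages completely error-free, and only around one in twenty ten-page papers.

```python
def p_error_free(char_accuracy: float, n_chars: int) -> float:
    """Probability that every one of n_chars is recognized correctly,
    assuming each character fails independently (a simplification)."""
    return char_accuracy ** n_chars


# Hypothetical sizes: ~3,000 characters per page, 10 pages per paper.
page = p_error_free(0.9999, 3_000)
paper = p_error_free(0.9999, 30_000)

print(f"error-free page:  {page:.1%}")
print(f"error-free paper: {paper:.1%}")
```

Real OCR errors tend to cluster in hard regions such as dense math, so the independence assumption is only a rough guide.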

stevenae 2 years ago |

Missed opportunity to call it "...Texts" instead of "Documents".

rnadomvirlabe 2 years ago |

If you are interested in this sort of thing for producing LaTeX documents, I've had good experience with [Mathpix](https://mathpix.com) in the past. It can take most anything I throw at it, and even render matrices well.

gremlinunderway 2 years ago |

This is great, but when are academia, business, and government going to finally get off PDF as a typical standard? It's awful, not adaptive for mobile, and a pain in the ass to work with for any kind of development.

vosper 2 years ago | |

What's a good alternative, for users and developers?

I don't have any love for PDF, but I'm actually not sure what's more cross-platform. Any browser will render PDF, so everyone already has a viewer on their computer. A browser will also print any document to PDF, and many other editors can export to PDF (though perhaps not import for editing).

It can't be replaced by an Office format, like docx, because even today apps like Pages can't render MS Office docs correctly half the time.

Doesn't seem like HTML would fly, either, given all the kinds of things that get embedded into PDF.

harshreality 2 years ago | | |

HTML and various JavaScript libraries, like MathJax for math or other libraries for charts and graphs.

> Doesn't seem like HTML would fly, either, given all the kinds of things that get embedded into PDF.

That's ironic. Browser PDF readers, at least open source ones, render PDFs as HTML using javascript. At least I'm sure about FF because I just checked that text from a native-digital pdf showed up in the DOM in developer tools.

froh 2 years ago | |

pdf looks the same everywhere and is self-contained: an "immutable" document which looks the same for everyone if it hashes to the same sha... key. that has value on its own.

harshreality 2 years ago | | |

There's nothing immutable about pdfs. If you have an "original" document, it'll always hash to whatever it hashes to. I fail to see the point. You can cite md5 hashes on LG the same whether they're pdfs or epubs or, heaven forbid, azw3 (amazon's proprietary epub-like format).
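For anyone following the hashing argument, a minimal sketch with Python's standard `hashlib`: a digest is a property of the bytes, not of the container format, so identical bytes always match and a one-byte edit does not. The byte strings here are invented stand-ins for real files, and the helper name `file_digest` is just for illustration.

```python
import hashlib


def file_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of raw bytes; the file format plays no role."""
    return hashlib.new(algorithm, data).hexdigest()


# Hypothetical contents standing in for a PDF (or EPUB, or anything else).
original = b"%PDF-1.7 hypothetical document bytes"
copy = bytes(original)      # byte-for-byte identical copy
edited = original + b" "    # a single extra byte

same = file_digest(original) == file_digest(copy)
changed = file_digest(original) != file_digest(edited)
print(same, changed)
```

The same check works for any format, which is the point being made: the hash pins down one specific file, whatever kind of file it is.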

What's the obsession with "looking the same everywhere"?

Page references: this shouldn't be a thing. Academia has already solved this problem for notable texts. Rather than nearly uncountable numbers of paragraphs that all run together, paragraphs or short sections or lines are numbered. See any good edition of Plato or Aristotle, or just about any notable play or longer poem ever translated. Relying on a single published layout of a work to reference is dumb.

Citing exact line numbers isn't even necessary for native-language works. When they're digital, search works. It works even better in flowed-format texts than it does in pdfs, which sometimes, depending on how the pdf was constructed, won't match text properly across newlines.
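The newline problem can be worked around on the search side. The sketch below is one way to do it under stated assumptions: the `extracted` string imitates text pulled out of a PDF with a hard line break in the middle of a phrase, and `find_across_linebreaks` is a hypothetical helper, not part of any PDF library.

```python
import re


def find_across_linebreaks(text: str, phrase: str) -> bool:
    """Search for phrase while letting any run of whitespace in it
    (spaces, tabs, newlines) match any run of whitespace in text."""
    words = phrase.split()
    pattern = r"\s+".join(re.escape(word) for word in words)
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


# Imitation of PDF-extracted text with a line break inside the phrase.
extracted = "Relying on a single published\nlayout of a work"

naive_hit = "published layout" in extracted
tolerant_hit = find_across_linebreaks(extracted, "published layout")
print(naive_hit, tolerant_hit)
```

This does not handle words hyphenated across lines, which need separate treatment.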

Visual quality: As long as images (data, charts, graphs, photographs) are not degraded beyond usefulness, the actual text, and its display, is up to the reader application. Everyone uses the web complete with MathJax, and those pages don't have Knuth-approved formatting in every respect. But they're good enough, and they work everywhere on every device without squinting or pinch to zoom. There are some people who insist on putting pre-rendered images of math in HTML, and they always look worse, because they don't match the text without a lot of work to have extra high-res images that are auto-scaled according to viewport and surrounding font size. That is work that I bet not many people have ever done in the history of HTML publishing.

anigbrowl 2 years ago | |

> not adaptive for mobile

I look at pdfs on my phone all the time, it's great. 'Optimized for mobile' usually means oversized fonts and a shitty UI so I get RSI in my thumb from endless scrolling.

PDF is kind of an ugly format, but the problem with realtime text flow etc. is that designers (at the behest of clients) are always trying to look visually distinct, and as a result nothing is standardized or predictable at the rendering end. 95% of digital layout is ass compared to the print version.

yonatan8070 2 years ago | | |

It works, but that doesn't mean it's good. A document designed to be printed on an A4 page isn't easy to read when scaled down to the size of a phone, especially for people with suboptimal eyesight, and if you zoom in to read the text, you need to scroll horizontally for every line you want to read.

adr1an 2 years ago | |

We need more DjVu!

ttul 2 years ago |

“Hey guys, we need to train a large language model to understand all of science. All we have is a stack of old papers from the dawn of time and we need to convert them into LaTeX…”

stavros 2 years ago |

Semi offtopic, but don't pretend you're making an acronym when the letters aren't the first ones in each word.

Or, as I like to call it, SOUNDYMAREHEATRONER.

mikewang 2 years ago |

I ran a quick test: 1. It is heavy on GPU resources (~16G). 2. I tried some documents in languages other than English, and the results are poor.

poulpy123 2 years ago |

it's interesting to see they didn't pick their examples to exclude all failures. I find that really great and it should be done more often.

czbond 2 years ago |

pretty impressive