Falsehoods programmers believe about video

Falsehoods programmers believe about video (haasn.xyz)

250 points by pomfpomfpomf3 9 years ago | 133 comments

derefr 9 years ago |

> rendering subtitles at the output resolution is better than rendering them at the video resolution

I would like to know what's wrong with this approach. I watch a lot of commentated speed-run videos: that's often something like ~244p video, plus soft subtitles. The subtitles get rendered at the source resolution (presumably, into the video framebuffer) and then upscaled along with the image, forcing them to be a tiny blurry mess instead of the crisp, readable text they could be.

CoolGuySteve 9 years ago | |

It's also missing the most common error I see: conflating subtitles with closed captions.

Closed captions are positioned on the screen to indicate who's talking, have descriptive text for sound effects, and should be in a high-contrast, easy-to-read font (most people with hearing deficiencies also have problems seeing, e.g. out-of-date prescriptions for both hearing aids and eyeglasses).

As far as I know, QuickTime does it right but the Apple TV, Netflix, and YouTube fuck it up, but that's because I helped write the QuickTime one way back.

deadmutex 9 years ago | | |

AFAIK, the YouTube implementation does all of those.

Here is a demo: https://www.youtube.com/watch?v=BbqPe-IceP4

Please do not spread falsehoods.

Disclaimer: I work at YouTube.

tantalor 9 years ago | | |

Okay, so how are subtitles different?

akiselev 9 years ago | |

I think that point should be amended to say "rendering subtitles at the output resolution is always better than rendering them at the video resolution." You don't want to upscale 244p soft subtitles to 1080p but you do want to default to giving video authors creative control over how the subtitles are displayed. The ASS subtitle format allows for some very complex styling that can be used as an artistic element in video (or just to make sure there's proper contrast, can be read by color blind people, character differentiation, etc.) so you generally don't want to assume anything. There's also the issue of coordinates for where the subtitles are supposed to be that all go to shit if you render them on a transformed (up/downscaled) frame.

haasn 9 years ago | | |

This comment is pretty much what I was going for. I've reworded it to make it clearer.

The issue you can run into in practice is stuff like softsubbed signs, which can clash and look out of place with the native video if you render them at full res. There's also a related issue, which is that if you're using something like motion interpolation (e.g. “smoothmotion”, “fluidmotion” etc. or even stuff like MVTools/SVP), softsubbed signs will not match the video during pans etc., making them stutter and look very out-of-place - the only way to fix that is to render them on top of the video before applying the relevant motion interpolation algorithms.

Personally I've always wished for a world in which subtitles are split into two files, one for dialogue and one for signs, with an ability to distinguish between the two. (Heck, I think softsubbed signs should just be separate transparent video streams that are overlaid on top of the native picture, allowing you to essentially hardsub signs while still being capable of disabling them.)

Also, sometimes, rendering at full resolution is prohibitively expensive, e.g. watching heavily softsubbed 720p content on a 4K screen.
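
To put a rough number on the 720p-on-4K case, here is a minimal sketch. It assumes subtitle rasterization cost grows roughly with the number of pixels covered, which is only an approximation; real renderers vary. The function name `pixel_ratio` is made up for illustration.

```python
def pixel_ratio(src, dst):
    """Return how many times more pixels dst has than src.

    src and dst are (width, height) tuples.
    """
    src_w, src_h = src
    dst_w, dst_h = dst
    return (dst_w * dst_h) / (src_w * src_h)


# 1280x720 video shown on a 3840x2160 display
ratio = pixel_ratio((1280, 720), (3840, 2160))
print(ratio)  # 9.0
```

Under that assumption, rendering at the display resolution touches nine times as many pixels per frame as rendering at the video resolution.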

JoshTriplett 9 years ago | | |

> There's also the issue of coordinates for where the subtitles are supposed to be that all go to shit if you render them on a transformed (up/downscaled) frame.

Sure, you have to transform the coordinates to the output. But still, better to render fonts at the final resolution; they'll always look better than if scaled after rendering.
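
A minimal sketch of that coordinate transform, assuming the video fills the output with no letterboxing or cropping (otherwise an offset is needed as well). The helper names are hypothetical and not taken from any subtitle library.

```python
def map_to_output(x, y, video_size, output_size):
    """Scale a point from video coordinates to output coordinates."""
    video_w, video_h = video_size
    out_w, out_h = output_size
    return x * out_w / video_w, y * out_h / video_h


def scaled_font_size(size, video_size, output_size):
    """Scale a font size by the vertical scale factor."""
    return size * output_size[1] / video_size[1]


video = (320, 240)
output = (1440, 1080)
print(map_to_output(160, 220, video, output))  # (720.0, 990.0)
print(scaled_font_size(12, video, output))     # 54.0
```

The glyphs are then rasterized once at the scaled size, instead of being rasterized small and stretched along with the picture.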

kazinator 9 years ago | |

Maybe nothing is wrong; just that maybe it's not always strictly better. Suppose you are asked to form a plan for adding subtitle support to some unfamiliar video platform. It's probably best to start with an open mind about where in the pipeline subtitles will be composed with the video.

the8472 9 years ago | |

In fact, rendering subtitles at the display resolution is one of the big selling points of the xy-subfilter + madvr renderer combination.

The only practical downside I have noticed is that accurate rendering of subs containing complex vector graphics or effects (ASS supports that) at > HD resolutions takes a lot of CPU time, sometimes more than a single core can handle in realtime.

There probably is a lot of potential for optimization, but those are hobby projects for their maintainers.

jheriko 9 years ago | |

the point is precisely that it is more complicated than this obvious interpretation.

whilst i don't necessarily agree... i do agree that if you want to conform to specs then you can't go thinking this way.

franciscop 9 years ago |

The original one ( http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-b... ) left me baffled. Then I realized you have to strike a balance; otherwise you cannot deal with names at all. Where to draw the line depends on your industry/customers, but I'd safely say that the line is drawn too restrictively nowadays, so these lists are somewhat useful, and of course they are interesting.

nhaehnle 9 years ago |

This is a good list, but it would be so much better with some (brief) pointers to counter-examples to the beliefs.

donatj 9 years ago |

- "all subtitle files are UTF-8 encoded"

Hah, this strikes really close to home. I've had to work with so, so many subtitle files in Eastern European and Turkish Windows codepages that are mostly but not entirely compatible with Win-1252. There's no way to tell them apart programmatically, so you check that the extended characters make sense. It's a bit of a nightmare.
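
One way to sketch the "check that the extended characters make sense" step, as a heuristic only. The candidate list and the `expected_chars` set are assumptions the caller has to supply (for example, the accented letters of the language the file is supposed to be in); `guess_decode` is a made-up name.

```python
CANDIDATES = ["cp1250", "cp1251", "cp1252", "cp1254"]


def guess_decode(raw, expected_chars, candidates=CANDIDATES):
    """Guess the encoding of raw bytes and return (encoding, text).

    Valid UTF-8 wins outright. Otherwise each candidate codepage is
    scored by how many of its non-ASCII characters are in expected_chars.
    """
    try:
        return "utf-8", raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    best = None
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # byte undefined in this codepage
        score = sum(1 for c in text if ord(c) > 127 and c in expected_chars)
        if best is None or score > best[0]:
            best = (score, enc, text)

    if best is None:
        raise ValueError("no candidate codepage could decode the input")
    return best[1], best[2]


turkish = set("çğıöşüÇĞİÖŞÜ")
raw = "Şimdi gidiyoruz, değil mi?".encode("cp1254")
print(guess_decode(raw, turkish)[0])  # cp1254
```

Ties go to whichever candidate comes first in the list, so this can still guess wrong on short files with few extended characters.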

smallnamespace 9 years ago |

This article would be infinitely better if it provided any counterexamples.

iopq 9 years ago |

> my hardware contexts will survive the user’s coffee break

hell, they don't survive alt-tabbing into a game that has a different resolution than the monitor

pvdebbe 9 years ago | |

Heh... for some reason youtube can't survive when I start a video on my monitor and then I switch outputs to TV using an xrandr script by closing one output and opening the other. I thought it was possible to continue the video that way but once I noticed it doesn't work, it made sense immediately.

Mplayer and co, on the other hand can cope with it but my window manager can mess it up so I don't bother.

tuxidomasx 9 years ago |

This list makes me not want to program any [video stuff]

scottlamb 9 years ago |

From the article:

> I can exclusively use the video clock for timing

Heh. I just finished writing up a design doc to address problems I had with this, and I referenced "Falsehoods programmers believe about time". Then I opened Hacker News and saw this article. So this is very timely for me.

(My doc: https://github.com/scottlamb/moonfire-nvr/blob/new-schema/de...)

jheriko 9 years ago |

it is true, video is a nightmare mess littered with weird functionality nobody needs. (limited range only just disappeared in rec 2100, optionally??? really??? i'm not worried about my electron gun in my CRT from 1975 these days...nor do i want to know what a Y or a Cb or a Cr means because everything is RGB and B&W TV is long dead... and 4:2:2 is not exactly compression so much as computational overhead etc.. etc.)
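
For anyone who hasn't run into limited range: in 8-bit video, limited-range luma uses 16 for black and 235 for white instead of 0 and 255. A minimal sketch of expanding luma to full range follows; chroma uses a different span (16 to 240) and is not handled here, and the function name is hypothetical.

```python
def limited_to_full_luma(y):
    """Expand an 8-bit limited-range luma value (16..235) to full range (0..255).

    Values outside 16..235 are clamped to the 0..255 output range.
    """
    full = round((y - 16) * 255 / 219)
    return max(0, min(255, full))


print(limited_to_full_luma(16))   # 0
print(limited_to_full_luma(235))  # 255
```

Skipping this step, or applying it twice, is what produces washed-out or crushed blacks.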

it's a nightmare, but the reason for these observations is precisely that it shouldn't be a nightmare. this area of programming is a wasteland ... nobody that good wants to solve these trivial problems :/

lolc 9 years ago |

And this is why I don't do video. (And have lots of respect for the people who write the libraries I use.)

FranOntanaya 9 years ago |

Could write an entire page just on subtitles.

antirez 9 years ago |

There is a lot of potential information in such a list. But in this form it is quite a "trust me" thing that does not really add to the reader's knowledge.

milansuk 9 years ago |

Nice one! Now I would like to see an article like this, but about ciphers, hashes, digital signatures, etc.

the_duke 9 years ago |

An explanation for each 'falsehood' would have been nice

ryanmarsh 9 years ago |

Well video programming just sounds delightful.

/sarcasm

justinlaster 9 years ago |

> a H.264 hardware decoder can decode all H.264 files

and

> video decoding is easily parallelizable

At a previous job, I don't know if it was just the field I was in or just bad luck, but having to explain this over and over again was kind of a personal nightmare.

That being said, this is an excellent list!

microcolonel 9 years ago |

I don't think programmers believe any of the video decoding falsehoods; not because they know any better, but because they know they don't know.

Also, none of these unfounded preconceptions make intuitive sense, so I don't see why people would believe them.

imaginenore 9 years ago |

> interlaced video files no longer exist

Interlaced video files should no longer exist.

Seriously, fk interlaced video.

> upscaling algorithms can invent information that doesn’t exist in the image

That's not a falsehood. Upscaling does invent information that doesn't exist in the image.

mrob 9 years ago | |

"Information" in the information theory sense. The output of a deterministic upscaling algorithm can be exactly described by the input and the algorithm. There's no added information, only a different way of presenting the original information.

jeff_tyrrill 9 years ago | |

> Interlaced video files should no longer exist.

Yes, they should, as should silent movies, black and white movies, old game consoles with exotic output formats like vector graphics, and the like.

It is a worthy endeavor to create and maintain video playback software that lets people consume beloved content that was made to the technology of its day, including home videos, sports games, TV shows with special effects edited in 60i, and video games.

emcq 9 years ago | |

Perhaps the author was being pedantic, but from an information-theoretic perspective it is correct that you cannot invent information with upscaling.

The upscaled image does not have more information than what was in the original image; you can reconstruct the upscaled image given only the information available in the original image, the output resolution dimensions, and upscaling algorithm.
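
A toy illustration of that point, using nearest-neighbour upscaling by an integer factor on a list-of-lists "image". This is just the simplest deterministic scaler, not what any real player uses, and the function names are made up.

```python
def upscale_nearest(image, factor):
    """Upscale a 2D list of pixels by repeating each pixel factor x factor times."""
    out = []
    for row in image:
        wide = [px for px in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def downscale_nearest(image, factor):
    """Undo upscale_nearest by keeping every factor-th pixel and row."""
    return [row[::factor] for row in image[::factor]]


src = [[1, 2], [3, 4]]
big = upscale_nearest(src, 2)
print(big)  # [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(downscale_nearest(big, 2) == src)  # True
```

The 4x4 output is fully determined by the 2x2 input, the factor, and the algorithm, so storing it takes no more information than storing those three things.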

imaginenore 9 years ago | | |

That's like saying fractal images are not information. Just because something is generated by a formula, doesn't mean it's not new information.

AznHisoka 9 years ago |

can we have falsehoods programmers believe besides video that are more common? this list probably is relevant for 1% of programmers here.

greenyoda 9 years ago | |

Just type "falsehoods programmers believe" into the search box at the bottom of the page and you'll get a ton of previous articles on falsehoods in various domains that have been posted here over the years:

https://hn.algolia.com/?query=falsehoods%20programmers%20bel...

And while this topic is not personally relevant to me since I don't work with video decoding, I do find learning about different technologies interesting. Reading this gives me an appreciation for how much effort goes into making video, something we all take for granted, work.

If people only posted articles that were relevant to a majority of readers, HN would be a much less interesting place.

fgandiya 9 years ago | |

Here's a whole list! https://github.com/kdeldycke/awesome-falsehood/blob/master/R...