BTW This is the best sci-fi book ever.
Traditional codecs have always focused on trade offs among encode complexity, decode complexity, and latency. Where complexity = compute. If every target device ran a 4090 at full power, we could go far below 22kbps with traditional codec techniques for content like this. 22kbps isn't particularly impressive given these compute constraints.
This is my field, and trust me we (MPEG committees, AOM) look at "AI" based models, including GANs constantly. They don't yet look promising compared to traditional methods.
Oh and benchmarking against a video compression standard that's over twenty years old isn't doing a lot either for the plausibility of these methods.
Learned video codecs definitely do look promising: Microsoft's DCVC-FM (https://github.com/microsoft/DCVC) beats H.267 in BD-rate. Another benefit of the learned approach is being able to run on soon-to-be-commodity NPUs, without special hardware accommodation requirements.
In the CLIC challenge, hybrid codecs (traditional + learned components) are so far the best, so that has been a letdown for pure end to end learned codecs, agree. But something like H.267 is currently not cheap to run either.
Agreed hybrid presents real opportunity.
Someone was just having fun here, it's not as if they present it as a general codec.
This is interesting tech, and the considerations in the introduction are particularly noteworthy. I never considered the possibility of animating 2D avatars with no 3D pipeline at all.
> On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity. Indeed, the full LivePortrait model has 130m parameters compared to DCVC’s 20 million. While that’s tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run it in real time (in addition to parameters, a large culprit is using expensive warping operations). That means deploying to edge runtimes such as Apple Neural Engine is still quite a ways ahead.
It’s very cool that this is possible, but the compression use case is indeed .. a bit far fetched. An insanely large model requiring the most expensive consumer GPU to run on both ends and at the same time being limited in bandwidth so much (22kbps) is a _very_ limited scenario.
However, it does raise an interesting property in that if you are on the spectrum or have ADHD, you only need one headshot of yourself staring directly at the camera and then the capture software can stop you from looking at your taskbar or off into space.
I don't know. I think you'd be surprised.
That's already kind of an issue with vloggers. Often they're looking just left or right of the camera at a monitor or something.
Reminds me of the video chat in Metal Gear Solid 1 https://youtu.be/59ialBNj4lE?t=21
If you could reserve a small portion of the radio bandwidth to broadcast a thumbnail + low bandwidth compressed representation of the face movements, you could technically have something similar without encoding any video (think low res, eye + mouth movements).
Maybe there is a custom web filter in there somewhere that could block particular people and images of them.
Does anyone else remember the weirder (for lack of a better term) features of MPEG-4 part 2, like face and body animation? It did something like that, but as far as I know nearly no one used that feature for anything.
https://en.wikipedia.org/wiki/Face_Animation_Parameter
and in the worst, trust on the internet will be heavily undermined
...as long as the model doesn't include data to put a shoe on one's head.
Lossiness definitely matters when you’re doing forensics. But not for consumers.
If you just want to bop to Taylor who the fuck cares. The iPod ended that argument. Yes I can be a perfectionist, or I can have one thousand songs in my pocket. That was more than half of your collection for many people at the time.
24fps * 52 facial 3D markers * 16bit packed delta planar projected offsets (x,y) = 19.968 kbps
And this is done in Unreal games on a potato graphics card all the time:
https://apps.apple.com/us/app/live-link-face/id1495370836
I am sure calling modern heuristics "AI" gets people excited, but it doesn't seem "Magical" when trivial implementations are functionally equivalent. =3
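For anyone who wants to check the 19.968 kbps figure, here is a minimal sketch of the arithmetic. It assumes each marker's (x, y) delta is packed into 16 bits in total (so 8 bits per axis) and that 1 kbps means 1000 bit/s; the function name is made up for illustration.

```python
def marker_stream_kbps(fps: int, markers: int, bits_per_marker: int) -> float:
    """Raw bitrate in kbps (1 kbps = 1000 bit/s) for per-frame marker deltas."""
    return fps * markers * bits_per_marker / 1000


# Figures from the comment above: 24 fps, 52 markers,
# 16 bits per packed (x, y) delta (assumed: 8 bits per axis).
print(marker_stream_kbps(24, 52, 16))  # 19.968
```

This is the raw payload only. Any container overhead or entropy coding of the deltas would move the number up or down.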
- Arthur C. Clarke
It just means that a person can't readily distinguish between the compressed image and the uncompressed image. Usually because it takes some aspect(s) of the human visual system into account.
[1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=per...
As an example, crf=18 in libx264 is considered “perceptually lossless” for most video content.
for the record, I found liveportrait to be well within the uncanny valley. it looks great for ai generated avatars, but the difference is very perceptually noticeable on familiar faces. still it's great.
"no perceived loss" is a perfectly internally consistent and sensible concept and is actually orthogonal to whether it's actually lossless or lossy.
For instance an actually lossless block of data could be perceptually lossy if displayed the wrong way.
In fact, even actual lossless data is always actually lossy, and only ever "perceptually lossless", and there is no such thing as actually lossless, because anything digital is always only a lossy approximation of anything analog. There is loss both at the ADC and at the DAC stage.
If you want to criticize a term for being nonsense misleading dishonest bullshit, then I guess "lossless" is that term, since it never existed and never can exist.
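The ADC point can be illustrated with a rough sketch, assuming a Python float stands in for the "analog" value (it is of course already digital) and a simple rounding quantizer stands in for the converter. The `quantize` helper is hypothetical, not any real ADC model.

```python
import math


def quantize(sample: float, bits: int) -> float:
    """Round a sample in [-1.0, 1.0] to the nearest level of a signed `bits`-bit grid."""
    scale = 2 ** (bits - 1) - 1
    return round(sample * scale) / scale


# One cycle of a sine wave plays the role of the "analog" signal.
signal = [math.sin(2 * math.pi * n / 1000) for n in range(1000)]

# Largest difference between the signal and its 16-bit version.
max_error = max(abs(s - quantize(s, 16)) for s in signal)
print(max_error)  # small, but not zero
```

Even at 16 bits the error is bounded by half a quantization step rather than being zero, which is the sense in which a "lossless" file is still an approximation of the analog source.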
In that scenario it certainly would not be `transparent` ie visually without any lossy artifacts. But your perception of it would look lossless.
The future is going to be weird.
"As a rule, strong feelings about issues do not emerge from deep understanding." -Sloman and Fernbach
No doubt encoders and the codecs themselves have improved vastly since then. It would be interesting to see if I could tell the difference in a double-blind test today.
It's easy enough to specify an average person looking very closely, or a 99th percentile person, or something like that, and show the statistics backing it up.
On the other hand I think both books developed ideas wonderfully and there are bits of them I keep coming back to, even if I'll probably never reread them
Though, I somewhat doubt even 22kbps is available generally.
Does the compression algorithm work well for transparency? Yes, its effect on transparency is totally transparent! In fact the transparency is fully transparently compressed by our codec.
Yeah, don't do this please. Perceptually lossless is a term I've heard lots of times before and companies developing codecs usually have a fairly strong technical basis for making the claim. As in, it's not like they just glance at the results and say "yep, looks good to me". Rather, they'll be looking at spectral curves and image diffs - probably also motion diffs for videos - and checking whether the losses are small enough to be undetectable to human eyes.
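As a very rough idea of what an "image diff" check can look like, here is a sketch that computes PSNR between two flat lists of 8-bit pixel values. PSNR is swapped in here as a simple, well-known stand-in; it is only a crude proxy for perception, and the comment above does not say which metrics codec developers actually use.

```python
import math


def psnr(original, compressed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if not original or len(original) != len(compressed):
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / len(original)
    if mse == 0:
        return math.inf  # bit-identical, i.e. actually lossless
    return 10 * math.log10(peak ** 2 / mse)


# Toy example: three pixels, two of them off by one level.
print(psnr([10, 20, 30], [11, 19, 30]))
```

A higher value means a smaller diff. Deciding which threshold counts as "undetectable" is exactly the part that needs viewing tests rather than a formula.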
https://www.npr.org/sections/therecord/2015/06/02/411473508/...
That being said, 320Kbps AAC encoded by Core Audio I’ve found to be pretty much transparent with anything I’ve thrown at it. Anything less than that (256Kbps AAC, 320Kbps MP3, etc) I can ABX sometimes, as long as I’m familiar with the source material, and usually only with quality headphones. Although no streaming services provide that, so I’m stuck with ALAC through Apple Music for streaming (which is more convenient than my old solution, which was transcoding and transferring to an iPod ~20k songs selected from ~90k in my library based on a variety of rules that never gave me the song I’m looking for). And really, ~900Kbps lossless is pretty easy to justify these days with 5G data speeds and generally much higher data transfer limits.
The other downside to storing lossy encodings these days is the fact that almost everyone uses Bluetooth for their listening, which is an additional lossy encoding. While 256Kbps AAC/320Kbps MP3 might be transparent in some cases, when it’s re-encoded it very rarely is (in my experience)
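For "I can ABX sometimes" claims, a quick way to see whether a run of trials is distinguishable from guessing is a one-sided binomial test. This is a minimal sketch assuming independent trials and a 50% chance of a correct answer when guessing; the 12-out-of-16 figure is just an illustrative number, not a result from this thread.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of getting at least `correct` right out of `trials` by pure guessing."""
    # Sum the binomial tail for a fair coin, then divide by all 2**trials outcomes.
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


# Illustrative run: 12 correct answers out of 16 trials.
print(abx_p_value(12, 16))  # roughly 0.038
```

A small value suggests the listener really is hearing a difference; a handful of trials with one or two lucky hits will not get there.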
I always intend to figure out how that works, because i don't feel a lot of audiophiles are actually speaking truth in many cases lol. Still, i don't know - i can't remember my sources to figure it out for myself :/
Smells like rationalization to me.
https://www.unrealengine.com/en-US/metahuman
The artifacts in raster image data are nowhere near what a reasonable model can achieve even at low resolutions. =3
On the other hand AI videos can be easily mistaken for people or hyper realistic physical sculptures.
https://img-9gag-fun.9cache.com/photo/aYQ776w_460svvp9.webm
There's something basic about how light works that traditional computer graphics still fails to grasp. Looking at its productions and comparing them to what AI generates is like looking at the output of an amateur and an artist. Sure, maybe the artist doesn't always draw all 5 fingers but somehow captures the essence of the image in a seemingly random arrangement of light and dark strokes, while the amateur just tries to do their best but fails in some very significant ways.
Are you sure? After all, you can effectively summarize meetings in plain text, which is extremely restricted in comparison to the original input. Granted, the exact manner of speech, the motions, and all the subtleties should also be included to be fair, but that information is still far too limited to fill the 20 kbps bandwidth.
We need far more bandwidth only because we don't yet have an efficient way to reconstruct the input faithfully from such highly condensed information. Whenever we actually could, we ended up having a very efficient lossy algorithm that still preserves enough information for us humans. Unless you are strictly talking about the lossless compression---which is however very irrelevant in this particular topic---, we should expect much more compression in the future even though that might not be feasible today.
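To put a rough number on the plain-text comparison, here is a back-of-the-envelope sketch. The speaking rate (150 words per minute) and the average word size (6 bytes including the space) are illustrative assumptions, not measurements, and the text is left uncompressed.

```python
def transcript_bps(words_per_minute: float, bytes_per_word: float) -> float:
    """Bit/s needed to stream an uncompressed transcript of speech."""
    return words_per_minute * bytes_per_word * 8 / 60


# Assumed figures: 150 words per minute, about 6 bytes per word.
rate = transcript_bps(150, 6)
print(rate)          # 120.0 bit/s
print(20000 / rate)  # how many times this fits into 20 kbps
```

Under those assumptions the words themselves take a small fraction of the 20 kbps budget, which leaves the rest for tone, timing and motion.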
There is "Is identical", "looks identical" and "has lost sufficient detail to clearly not be the original." - being able to differentiate between these three states is useful.
The other two are variations of lossy.
Calling one of them "perceptually lossless" is cheating, to the disadvantage of algorithms that honestly advertise themselves as lossy while still achieving "looks identical" compression.
It's also used in the first paragraph of the Wikipedia article on the term "transparency" as it relates to data compression.
It’s certainly true that Vinge doesn’t spend much time on the engineering details, but I find him unusually clear on “imagine if we had this kind of impossible-now technology, but the rest of what we know about physics remained, how would people behave?”
He was, after all, a physics professor.
Rainbow’s End is much clearer on this than his distant future stuff, of course.
That award goes to Greg Egan who has a full list of citations on his website for each of his novels, as well as a list of mathematicians and physicists he requested help from.
If you want to read books that occasionally delve into pages of equations, Greg Egan is the author for you! (Seriously though, really good books, and the implications of his "what-ifs" are pretty damn cool)
The short stories "Luminous" and "Dark Integers", the novels "Diaspora" and "Schild's Ladder". So good.
qntm (another author) hits somewhat similarly.
Actually, he was a mathematics and computer science teacher at San Diego State University.
Maybe it's not productive to quibble about definitions like this, but FWIW I don't agree with this criterion. I would argue Greg Egan's work, for example, is just about the "hardest" sci-fi there is, and yet much of that work takes place in universes that are entirely unlike our own.
Personally, I think what makes for "hard" sci-fi is that the rules of the universe are well-laid-out and consistent, and that the story springs (at least in some significant part) out of the consequences of those rules. That may mean a story set in the "future", where we have new technology or discover new physics, or "alternate universe" sci-fi like Egan's.
You don’t claim to be definitive?
Definitely worth giving it a try if you're a programmer, just for the fact that it's written by another programmer: the opening scene where they find a bunch of rules written down and just follow them reminds me of ACPI; the discussion of public-key cryptography and shipping drives full of one-time-pad around the galaxy; the "compression scheme" with the video.
One could rely on the media encoder to garble output enough to look more plausible (people on potato devices are used to looking at garbage content.) However, at the end of the day the "uncanny valley" effect takes over every time even for live action data in an auto-generated asset, as the missing data can't be "Magically" recovered with 100% certainty.
Bye =3
In movies it can be done with enough manual tweaking by artists and a lot of photographic content around to borrow a sense of reality from.
"Potato" devices by which I assume you mean average phones, currently have better resolutions than PCs had very recently and a lot still do (1080p).
And a photo on 480p still looks more real than anything CGI (not AI).
Your signature is hilarious. I won't comment about the reasons because I don't want this whole thread to get flagged.
https://www.youtube.com/watch?v=vJG698U2Mvo
Several 8bit games had their own aesthetic charm, but were at least fun...
Cheers, =3
So it’s not universal, but it is an accepted definition that takes any deviation from the possible or probable (for example, including faster-than-light travel or paranormal powers) to be a mark of "softness."
https://en.wikipedia.org/wiki/Soft_science_fiction
Popular science fiction is generally extremely soft, but occasionally you get stuff like The Cold Equations where the plot is driven by real world constraints. Even then it included FTL so a purist would call it soft.
In defense of hard science fiction, it’s a meaningful category to talk about even if it’s not something you personally care about. People often want to weaken it but that just opens a door for a new category say “scientific science fiction” and we are back to square one.
Asking questions like what does AGI look like when they can’t just magically solve all issues can be fun. Hand waving the singularity as some religious event can also make interesting stories, but so can considering how chaos theory limits what computation can actually achieve.
Greg Egan's law changes are on the level of "I consulted with a bunch of theoretical physics professors and asked them what the implication of tweaking this one fundamental constant would be, then I spent years meticulously crafting a world that takes into account those implications, and I had other physics professors check over my work to make sure it was within the bounds of actuality, and then I wrote a story about characters in this new world."
> Asking questions like what does AGI look like when they can’t just magically solve all issues can be fun.
Greg Egan actually has a great book about this! Permutation City. CPU cycles aren't unlimited, and there are tons of ethical problems being confronted with the entire "simulate a person" thing.
It’s common for the rules of the universe to be adapted to fit the plot of random Star Trek episodes.
HP is not considered science fiction because of the trappings of the story. People use spells and enchanted objects for telekinesis, teleportation, and time travel not psychic abilities and technology to do the same things.
> critically examine its own premise
A great deal of science fiction doesn’t do that while plenty of fantasy does.
the laws of the universe in Harry Potter are so fickle and ever changing with the plot line that to me, it can only be considered soft. compare with Egan who takes a given cosmology and then works 100% within that world. that's hard.
Consider The Martian: early versions were posted online and the author changed what resources the character had to work with at the beginning. So what feels like a creative solution to limited resources was really giving the character exactly what they needed after a solution was found. By only examining a work, we can’t distinguish ‘soft’ physics updated as the plot demands from a story based around fixed rules.
One of the more interesting science fiction short stories I read seemed to have very inconsistent time travel, but on closer reading you find the two different methods involved had two different sets of rules. It’s easy to say something is inconsistent, but any possible story has a corresponding set of rules that work.
It’s rather similar to considering what characters may have been lying in a story.
> but on closer reading
does not make the inconsistencies go away, but they multiply.
Again based on specific assumptions. The universe of possibilities includes very strange places.
that's an unjustified assumption.
If nothing else pure randomness is an unsatisfying possibility as is a full branching search of every possible state for a universe.