3K, 60fps, 130ms: achieving it with Rust (blog.tonari.no)
Oh please. This is just Rust sensationalism. People don't truly believe Rust is faster than C, do they?
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
https://benchmarksgame-team.pages.debian.net/benchmarksgame/...
I don't think this is true; the whole point of Rust is that unsafe operations are explicit, not that you never do so.
Also, I looked at the first one, and it's only using unsafe on what are basically op code calls; I don't think it is realistic to complain about that.
Of those 4 tasks, the Rust programs for task 3 and task 4 do not use the keyword "unsafe".
For task 2, the spectral-norm Rust #6 program does use "unsafe" but #5 does not and it's almost as fast.
>I can send an IP packet to Europe faster than I can send a pixel to the screen. How f’d up is that?
and to relate to the other post about landlines: https://twitter.com/ID_AA_Carmack/status/992778768417722368
>I made a long internal post yesterday about audio latency, and it included “Many people reading this are too young to remember analog local phone calls, and how the lag from cell phones changed conversations.”
Is there somewhere to read about the changes in question?
I'm old enough to remember extensive use of analog landlines, and can't really think of any difference to a cellphone other than audio quality.
Either Cisco needed to bring down the cost massively to expand access or someone needed to build it in major cities and bill by the hour to compete against flying. None of those happened so it stayed a niche. Compared to those experiences more than a decade ago the common VC is still very slowly catching up. Part of it is setup, like installing VC rooms with 2 smaller TVs side by side instead of one large one so you can see the document and the other people at decent sizes. But part of it is still the technology. Those "telepresences" were almost surely on a dedicated link running on the telecom core network that guaranteed quality instead of routing through the internet and randomly failing. I suspect getting really low latency will require that kind of telecom level QoS otherwise you'll be increasing buffer sizes to avoid freezes.
Something to consider is that there are alternative techniques to interframe compression. Intraframe compression (e.g. JPEG) can bring your encoding latency per frame down to 0~10ms at the cost of a dramatic increase in bandwidth. Other benefits include the ability to instantly draw any frame the moment you receive it, because every single JPEG contains 100% of the data. With almost all video codecs, you must have some prior # of frames in many cases to reconstitute a complete frame.
For certain applications on modern networks, intraframe compression may not be as unbearable an idea as it once was. I've thrown together a prototype using LibJpegTurbo and I am able to get a C#/AspNetCore websocket to push a framebuffer drawn in safe C# to my browser window in ~5-10 milliseconds @ 1080p. Testing this approach at 60fps redraw with event feedback has proven that ideal localhost roundtrip latency is nearly indistinguishable from native desktop applications.
The ultimate point here is that you can build something that runs with better latency than any streaming offering on earth right now - if you are willing to make sacrifices on bandwidth efficiency. My 3 weekend project arguably already runs much better than Google Stadia regarding both latency and quality, but the market for streaming game & video conference services which require 50~100 Mbps (depending on resolution & refresh rate) constant throughput is probably very limited for now. That said, it is also not entirely non-existent - think about corporate networks, e-sports events, very serious PC gamers on LAN, etc. Keep in mind that it is virtually impossible to cheat at video games delivered through these types of streaming platforms. I would very much like to keep the streaming gaming dream alive, even if it can't be fully realized until 10gbps+ LAN/internet is default everywhere.
I was able to get latency down to 50ms, streaming to a browser using MPEG1[1]. The latency is mostly the result of 1 frame (16ms) delay for a screen capture on the sender + 2-3 frames of latency to get through the OS stack to the screen at the receiving end. En- and decoding was about ~5ms. Plus of course the network latency, but I only tested this on a local wifi, so it didn't add much.
[1] https://phoboslab.org/log/2015/07/play-gta-v-in-your-browser...
All the benefits of efficient codecs, more manageable handling of the latency downsides.
The challenge you'll run into instantly with JPEG is that the file size increase & encoding/decoding time on large resolutions outstrip any benefits you get in your limited tests. For video game applications you have to figure out how you're going to pipeline your streaming more efficiently than transferring a small 10 kb image, as otherwise you're transferring each full uncompressed frame to the CPU, which is expensive. Doing JPEG compression on the GPU is probably tricky. Finally, decode is the other side of the problem. HW video decoders are embarrassingly fast & super common. Your JPEG decode is going to be significantly slower.
EDIT: For your weekend project, are you testing it with cloud servers or locally? I would be surprised if under equivalent network conditions you're outperforming Stadia, so be careful that you're not benchmarking local network performance against Stadia's production performance on public networks.
(Be gentle on your coworkers and use cabled headphones.)
AptX low latency codec adds only 40ms max.
Just buy headphones with good low latency support. They aren't even expensive anymore.
Why can't I have both? Wifi doesn't seem to have this latency problem.
The thing is that when we talk in a room, sound will take <10ms to reach my ears from your mouth. This is what "enables" all of the human turn taking cues in conversation (eye contact, picking up whether a sentence is about to end/whether it's a good time to chime in/etc) - I've been looking for work from people who've tried to see at what point things start feeling really bad (is it 10ms, or 50ms?), but haven't found much so far. No matter what it is though, it's likely that long distance digital communications just cannot match it.
See also this interesting comment about the feeling of "closeness" from phone copper wires:
https://news.ycombinator.com/item?id=22931809
Landlines were so fast and so "direct" in their latency (where distance correlates very directly with time, due to a lack of "hops") that local phone calls were faster than the speed of sound across a table, and for a bit after they came out--before people generally got used to seemingly random latency--local calls felt "intimate", like as if you were talking to someone in bed with their head right next to you; I also have heard stories of negotiators who had gotten really tuned to analyzing people's wait times while thinking that long distance calls were confusing and threw them off their game.
It seems normal phones are able to do it, though. At least it seems normal phones suffer fewer latency problems.
In a way, simplicity in technology often means better performance.
Digital communication could cheat, though!
There's a lot of latency hiding you can do, if you can predict well enough what's coming next. Humans are fairly predictable most of the time.
I don’t have many other comments to make other than I am surprised rust-analyzer was only mentioned in passing.
That's just one of their dependencies. It's possible to know every line without rewriting. And it's possible to rewrite and still not know every line.
They seem to strike a reasonable balance.
I bet IBM didn't expect using off the shelf components would mean that the IBM PC was the standard for the next 30 years and it wouldn't be theirs.
-----
As someone who works at a company that relied heavily on video conferencing (half the devs offshore): every single major solution absolutely sucks. They are flaky and unreliable, sound quality is poor, video rate is poor (and this is with fat pipes at both ends), and worst of all is latency. Latency when trying to have a round-table conversation with people remotely is horrific. It is good to see someone pushing the limits; Skype et al haven't gotten much better in the last decade, yet my internet connection at home/work is 50 times faster and even mid-range business laptops have much improved graphics grunt.
As for sound, I don’t think audiophile quality is necessary...
Given you'll need about 10Mbps upstream for 60fps 3K video, it seems a little unreasonable not to add on a 320kbps (or more) audio stream.
It could make this useful for things like streaming music concerts.
JackTrip is the resulting software -- not end-user friendly, but apparently it works.
https://ccrma.stanford.edu/groups/soundwire/software/jacktri...
(Some basic numbers: sound takes about 1 ms to travel a foot, so every ms is a foot of separation between musicians; 30ms of latency = 30 ft separation = the max for jamming. So 130ms is not low enough.)
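The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the constant names and the 30 ms jamming limit are taken from the comment as stated, and 1 ft/ms is itself an approximation (sound in room-temperature air covers closer to 1.13 ft per millisecond).

```python
# Rule of thumb from the comment: roughly one foot of travel per millisecond.
FEET_PER_MS = 1.0      # approximation; about 1.13 ft/ms in room-temperature air
JAM_LIMIT_MS = 30.0    # the comment's stated upper bound for playing together


def equivalent_separation_ft(latency_ms: float) -> float:
    """Distance two musicians would have to stand apart to hear the same delay."""
    return latency_ms * FEET_PER_MS


def ok_for_jamming(latency_ms: float) -> bool:
    """True if the latency is within the comment's jamming limit."""
    return latency_ms <= JAM_LIMIT_MS


for latency in (10, 30, 130):
    print(latency, "ms ~", equivalent_separation_ft(latency), "ft,",
          "ok" if ok_for_jamming(latency) else "too far")
```

By this measure, 130ms is like trying to play with someone standing 130 feet away.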
Regarding the premise of high latency in webrtc: Google Stadia has ~160ms round trip latency at 4k from my Macbook to a data center, so it’s not like that’s unachievable.
Is it live streaming or is it the transport?
Are they doing video encoding (the audio encoding seems to be done by that webrtc-audio thing)?
Have they chosen a progressive encoding format that compresses frames and pumps them out to the wire as soon as they're done?
Is TCP or UDP involved or a new Layer 3 protocol entirely?
Have I just missed all of those parts or were they really missing amid all the Rust celebration?
tonari is the entire stack, similar in "feature scope" to WebRTC but with different goals and target environments.
> Are they doing video encoding (the audio encoding seems to be done by that webrtc-audio thing)?
Yep, this includes video encoding and transport. We don't use the WebRTC audio library for encoding or transport, just for echo cancellation and other helpful acoustic processing.
> Have they chosen a progressive encoding format that compresses frames and pumps them out to the wire as soon as they're done?
Yep, basically, if by that you mean we don't use B-frames or other codec features that would require buffering multiple video frames before receiving a compressed stream, so we're able to send out encoded frames as they arrive.
> Is TCP or UDP involved or a new Layer 3 protocol entirely?
We encapsulate our protocol in UDP since we operate on normal internet - a new protocol is out of the question without a huge lobbying force and 15 years of patience on your side.
> Have I just missed all of those parts or were they really missing amid all the Rust celebration?
We intentionally didn't get into the protocol details because we are saving that for a dedicated post (and code to back it up).
Looking forward to the technical post. If you're planning on releasing all of this royalty-free and opensource, you'd be quite a boon to the free and open internet. Getting this picked up by the likes of Mozilla and getting it into a browser would be amazing.
It seems that it can only connect to publicly visible hosts? Overall it looks like somebody should develop an application on top of this.
like Brian's 1970s-era MacBook Pro
That's a writer(s) who knows what it's like to read long (aka thorough) technical articles and not bore the readers to death. Great article!
After interaction with both rustfmt and go fmt, I have concluded that .editorconfig is solving a problem that really shouldn't be solved. We went through the ordeal of defining our C# coding standards where I work and, let me tell you, people (myself included) care very deeply about their way of structuring code. And it's a bloody waste of their time.
Having the language designers say, "here is how our language should be structured" is a breath of fresh air.
From a product point of view, I find it interesting that the illustrations/concept videos for these things always show people interacting very closely to the wall - e.g. playing chess, sitting around a table, etc.
https://tonari.no/static/media/family.48218197.svg
But in practice, people tend to keep their distance from it. E.g. the pictures of this setup tend to show people clustered in their own group on each side of the wall, with a solid 2-3 meters from the wall.
https://blog.tonari.no/images/ea56c74d-a55d-4183-9a7b-d69795...
It makes sense, it's awkward to be close to a large solid (emissive) surface, and humans instinctively get closer to their in group when faced with an out group. I wonder how the system could be designed to encourage participants being closer, if there is an advantage to that.
Even over wifi, speedtest shows 4ms/100mb/100mb on my internet connection, but Zoom, FaceTime, and others never use more than about 0.8Mbit/s for a video stream, and the resulting quality of audio and video is...understandably poor.
Latency too totally feels like a software problem, perhaps with too many layers of abstraction. (60fps->16ms for the camera, ~10ms for encoding with NVENC/equivalents, 35ms measured one-way latency from my laptop to my parents 4000km away, ~10ms decode, 16ms frame delay = 87ms one way). Maybe I'm asking for too much from non-realtime systems (I'm used to RTOS, extensive use of DMA, zero-copy network drivers, etc), but it seems that there is a lot of room to improve.
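For what it's worth, the one-way budget in the parenthetical above adds up as claimed. A minimal sketch in Python, where the stage labels are just illustrative names for the figures quoted in the comment:

```python
# One-way latency budget from the comment above, in milliseconds.
# The stage names are illustrative labels, not output from any real tool.
budget_ms = {
    "camera frame time (60fps)": 16,
    "encode (NVENC or equivalent)": 10,
    "network, one way": 35,
    "decode": 10,
    "display frame delay": 16,
}


def total_latency_ms(stages: dict) -> int:
    """Sum the per-stage latencies."""
    return sum(stages.values())


def frame_time_ms(fps: float) -> float:
    """Time between frames at a given frame rate."""
    return 1000.0 / fps


print(total_latency_ms(budget_ms))  # 87
```

The network leg is well under half of the total, which is why the rest of the stack is worth attention.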
I have been itching to convert a small headshot videostream (think under 100x100px) to audio, stream it over mumble and then convert it back to video, just to see what the latency is like. It would obviously be a big undertaking, but not as big as this methinks.
This rings very true for every high-performance thing I've ever worked on, from games to trading systems.
I totally feel you. It's impressive what the WebRTC implementation has achieved, but it's just not pleasant at all to work with it.
Latency happens throughout the whole stack. Unfortunately, much would need to be fixed outside this project to achieve any further significant improvement.
Operating System, firmware, blackbox hardware are some other non-negligible sources of latency. Everything adds up.
They actually work on reducing latency and pushing high res video if your connection supports it.
for crate in $(ls */Cargo.toml | xargs dirname); do
cargo build
Why do this instead of `cargo build --workspace`?
Is it so you can time the individual crates?
But as long as we're nitpicking, nobody should just pipe `ls` into `xargs` like this, since it fails if anything has spaces in it.
Instead, do:
for cargo_toml in */Cargo.toml; do
    crate="$(dirname "${cargo_toml}")"
    pushd "${crate}"
    # ...
    popd
done
Don't be that person who writes a script which won't tolerate spaces in filenames!
Beyond that, I wish the article had explained a bit better why it chose these "better-than-std" crates. I'm actually using all the std variants in my projects, and I'm curious to know if I'm missing out or if I just happen not to hit their limitations.
At least for parking_lot, its README has a long list with its advantages over std: https://github.com/Amanieu/parking_lot/blob/master/README.md
To clarify, we're targeting "transparent" sounding audio, not "FLACs or bust" audio. Right now we send stereo 48kHz 96kb/s Opus (CELT, not SILK) that we found hit the voice transparency sweet-spot compared to the lossless audio source. We had used higher bitrates in the past, and could easily go back to them, but quality plateaued at around 96k in our experimentation.
More than choosing sane transparent-sounding encoding parameters, the biggest difference in fidelity by far was choosing the correct microphones and speakers for accurate reproduction of voices.
Are you using 48khz for a specific reason?
Audiophile grade at least has roots in high fidelity.
Does it though? Audiophiles generally seem to eschew fidelity in favour of something that sounds subjectively nice, including the psychoacoustic effects of spending a lot of money.
Eg. they seem very fond of "warmth". If you asked me to make something sound "warm", I'd be applying some soft clipping and dampening the top end, not eliminating sources of distortion.
Edit: If you actually wanted high fidelity, you'd use studio headphones / monitors, which are designed to be "unflattering", so you can be confident you'll hear any issues when mixing / mastering. People don't normally listen for pleasure with those, because they become fatiguing after a few hours.
Choosing equipment because you like the sound is a very reasonable thing to do, but it's not the same as pursuing fidelity.
It's too bad they didn't explain it. I expected they meant allowance for "full bandwidth" audio (possibly including music you can listen to).
Video conferencing systems generally use voice-only codecs compressed to shit, full of artifacts in the voice range and utterly dead outside of it.
Zoom calls do not sound like the other person is there in the room with you. Microphones are terrible, and there are compression artifacts, latency, packet loss, background noise, and tiny speakers. No one could possibly close their eyes and forget that the other person is not there in the room with them, on any POTS or VOIP technology that exists. But what if you could create an audio communications system with an actual illusion of auditory presence? Sounds amazing!
And given that this company is trying to create wall-screen, life-size ultra-HD video conferences, I'm pretty sure that "audiophile" is exactly what they're going for. Personally, as a remote worker, I would absolutely swoon for this.
Google is well-aware of this, thus Fuchsia.
SeL4 would make a good base for such a device.
"only" is positive thinking.
I do play some rhythm games (LLSIF, deresute, mirishita) on Android. The difference between "only adds 30ms" and plugging my headphones directly to the headphone jack is the difference between unplayable and playable. The games do have a latency compensation setting (with a calibration procedure), but compensation is no substitute for the real thing: Low latency.
And even with a 30ms delay, using it for both headphones and mic on both talkers means 30 x 2 x 2 = 120ms of delay.
(The main reason for the advice of "Don't put spaces in paths" is really only because it breaks lots of poorly-written software... but that's not an excuse for your software to be poorly-written!)
I was never satisfied with the lower latency bound for that approach and felt like I had to keep pushing into latency territory that was lower than my frame time.
That said, MPEG1 was probably the simplest way to get nearly-ideal latency conditions for an interframe approach.
I guess as long as keyframes are common and packet loss is low it'd work well enough.
Any rhythm game player will disagree.
Some games (e.g. llsif, for android) have "perfect" window sized to 16ms (a video frame). Even with latency compensation, these are unplayable on bluetooth yet fine on headphone jack. As the game has calibration, the resulting offset is seen to be at least 30ms worse on bluetooth.
The only case that started to suck was the metro-local, and even then it was indistinguishable from the other cases until resolution or framerate were increased to the point of saturating the link.
One technique I did come up with to combat the exact concern raised above regarding encoding time relative to resolution is to subdivide the task into multiple tiles which are independently encoded in parallel across however many cores are available. When using this approach, it is possible to create the illusion that you are updating a full 1080/4k+ scene within the same time frame that a tile (e.g. 256x256) would take to encode+send+decode. This approach is something that I have started to seriously investigate for purposes of building universal 2d business applications, as in these types of use cases you only have to transmit the tiles which are impacted by UI events and at no particular frame rate.
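A rough sketch of the tiling idea described above, in Python. It is not the commenter's implementation: `zlib` stands in for a JPEG encoder, the frame is assumed to be 8-bit grayscale stored row-major, and the helper names (`split_tiles`, `encode_tile`, `encode_dirty_tiles`) are made up for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # tile edge in pixels (the example size from the comment)


def split_tiles(frame: bytes, width: int, height: int, tile: int = TILE):
    """Yield ((x, y), tile_bytes) for an 8-bit grayscale, row-major frame."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            w = min(tile, width - tx)
            rows = [frame[y * width + tx:y * width + tx + w]
                    for y in range(ty, min(ty + tile, height))]
            yield (tx, ty), b"".join(rows)


def encode_tile(item):
    """Compress one tile. zlib is a stand-in; a real pipeline would use JPEG."""
    pos, pixels = item
    return pos, zlib.compress(pixels, 1)


def encode_dirty_tiles(prev: bytes, cur: bytes, width: int, height: int):
    """Encode, in parallel, only the tiles that changed between two frames."""
    prev_tiles = dict(split_tiles(prev, width, height))
    dirty = [(pos, px) for pos, px in split_tiles(cur, width, height)
             if prev_tiles[pos] != px]
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(encode_tile, dirty))
```

Because each tile is encoded independently, the wall-clock cost of a full update approaches the cost of a single tile as core count grows, and a UI that only touches one corner of the screen only pays for the tiles in that corner.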
In terms of encoding the audio, I don't know that I would. For video, going from MPEG->JPEG brought the perfect trade-off. For reducing audio latency, I think you would just need to be sending raw PCM samples as soon as you generate them. Maybe in really small batches (in case you have a client super-close to the server and you want virtually 0 latency). If you use small batches of samples you could probably start thinking about MP3, but raw 44.1KHz 16-bit stereo audio is only about 1.41 Mbps. Most cellphones wouldn't have a problem with that these days.
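The raw PCM numbers are easy to check. A small Python sketch, with hypothetical function names, showing both the bitrate of uncompressed audio and how much latency a batch of samples adds before it can be sent:

```python
def pcm_bitrate_bps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def batch_latency_ms(samples_per_batch: int, sample_rate_hz: int) -> float:
    """Time needed to collect one batch of samples before it can be sent."""
    return 1000.0 * samples_per_batch / sample_rate_hz


# 44.1 kHz, 16-bit, stereo
print(pcm_bitrate_bps(44_100, 16, 2))   # 1411200 bits/s, about 1.41 Mbps
print(batch_latency_ms(441, 44_100))    # 10.0 ms for a 441-sample batch
```

The second function is the trade-off in a nutshell: smaller batches mean lower latency but more packets per second.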
Edit: The fundamental difference in information theory regarding video and audio is the dimensionality. JPEG makes sense for video, because the smallest useful unit of presentation is the individual video frame. For audio, the smallest useful unit of presentation is the PCM sample, but the hazard is that these are fed in at a substantially higher rate (44k/s) than with video (60/s), so you need to buffer out enough samples to cover the latency rift.
Oscillations are the main issue with "catch-ups" in synchronization, and dropping frames once your buffer is too far behind is often a more pleasant artifact. It's not really a one-size-fits-all engineering problem.
The attached documents have additional information on implementation and (non) usage, including the deadline to migrate legacy military systems. It's sadly quite cumbersome to go through the tens of PDFs to find the relevant information.
That being said since they're drop-in replacements for the most part I suppose I could just try to rebuild my project with this crate and see if I notice a difference performance-wise.
With crossbeam for example, you can hit roadblocks with std since their channels are MPSC, whereas crossbeam supports MPMC channels (and is faster than std in every meaningful measurement last I checked).
Reading the description it almost seemed too good to be true but if it's indeed objectively better in basically every situation I should probably give it a try.
Rust leaves a lot of improvements to its standard library to the community, so these improvements start off as separate libraries for faster iteration. The most recent example I remember is the hashbrown crate replacing the standard HashMap.
The latency doesn't come from bluetooth radio part itself (there ARE low latency BT headphones after all).
It comes from the fact that all audio is encoded (usually into SBC or AAC or AptX), transmitted and then decoded in the headphones. And each of those steps has buffers. And those buffers are configured by the manufacturer.
The bigger the buffer, the more stable the audio connection - there's less stuttering, less dropouts. But every buffer in the chain adds latency.
So why can't you have both? You sure can. You just need to somehow find headphones and a PC that doesn't add latency to bluetooth. Sadly that's not something that's usually documented in technical specs.
But pretty much all serious gamers use an ethernet connection because wifi is a pain in the ass. In fact, the first thing a support representative for any game will tell you when complaining about excessive lag is to try a wired connection.
I only use wifi where I cannot attach a cable. I will run 15m ethernet cable on an apartment's floor if I have to, in order not to have to use wifi.
There is some patented codec I think that does allow low latency bluetooth streaming (forgot the name) but that's not heavily implemented in my experience.
Wifi is one of the best things you can do to add unreliability and latency.
https://en.wikipedia.org/wiki/Cisco_TelePresence#/media/File...
That way, when you turn your head to the person on each tv, it would seem as if you were actually looking at them.
And you can mitigate that by not using tons of bandwidth in the background while gaming.
https://web.archive.org/web/20200310174634/https://people.xi...
The trick is that our hearing systems are logarithmic (we can't hear a quiet sound next to a loud sound--that's what compression relies on), so they map to floating point numbers better (ie. 16-bit floating point is way more than enough).
24-bits is effectively for recording engineers so they have lots of headroom and don't have to worry about clipping basically at all (6dB per bit implies about 18dB of extra headroom which is a LOT).
However, when you calculate non-linear audio effects, you want extra bit depth (generally floating point) because cancellation and multiplication in your intermediate results can really move your noise floor up into bits that humans can actually hear.
The effect of bit depth has little to do with how you perceive the sound; what adding more bits does is allowing for more dynamic range, i.e. more difference between the loudest possible and the quietest possible sound. More bits brings down the noise floor. This means that for example the final part of a fade-out retains more detail at 24 bits than at 16, but this difference is not something that you would be able to observe in normal listening conditions.
If you'd like to learn more about the effects of bit depth, I would recommend “Digital Show & Tell” by Monty of Xiph at https://www.xiph.org/video/.
That really doesn't make any sense. The bit depth provides for a dynamic range, meaning the difference between the loudest and quietest sounds which can be encoded. 16 bits is enough to go from "mosquito in the room" to "jackhammer right in your ear". Congratulations, 24 bits let you go up to "head in the output nozzle of a rocket taking off" with room to spare, that's… not very useful?
Now what might make sense — aside from plain placebo — is a difference in mastering. For instance lots of SACD comparisons at the time were really comparing differences in mastering, with the SACD converted to regular CDDA turning out way superior to the CD version because the mastering of the CD was so much worse.
The "Loudness Wars" is an especially bad period of horrible mastering, and it went from the mid 90s to the early-mid 2010s (which doesn't mean that regular-CD has gone back to "super awesome", just that you're unlikely to have clipping throughout a piece these days).
Dynamic range is not the loudest sound / quietest sound ratio (as one would expect), but the loudest sound / noise level ratio. Otherwise you would need to count additional bits to encode the quietest sound with low enough quantization noise.
The threshold of hearing could be as low as -9 dB SPL, so one would want the noise level below that. Therefore, with 96 dB dynamic range from 16 bits, the loudest representable sound would be, say, 86 dB SPL. But symphonic orchestra music may have peaks way above 100 dB.
And some are all about accuracy and measurements.
For instance, I use Sennheiser HD600[0], which I strongly recommend, attached to Topping DX3 Pro (old model)[1], which I cannot recommend, as the v2 model shipping now is garbage[2], a consequence of a redesign to work around high fault rates. Mine is fine as problem units fail within weeks, and I've had it for years.
[0]: https://reference-audio-analyzer.pro/en/report/hp/sennheiser...
[1]: https://www.audiosciencereview.com/forum/index.php?threads/r...
[2]: https://www.audiosciencereview.com/forum/index.php?threads/m...
Also warmth is just a single quality. I have a pair of very accurate “cold” headphones that I prefer for music and a pair of “warm” headphones for electronic music and gaming.
Past the headphones, it is not so much warmth as it is space in the sound for me. My headphone amplifier sounds effortless and that’s the best way I can describe the quality of what I hear.
The characteristic of warmth is related to amplification of certain harmonics as well as equalization in the signal. This is fairly well understood by now.
I don't really know what, if anything, that means. But if we're talking about fidelity, surely the ideal would be no sound signature? If a particular "sound signature" makes it sound "warm", surely it's decreasing the fidelity?
You're saying that I ought to judge the merits of audiophile equipment by the subjective measure of whether I like the sound of it. Which is the metric I said audiophiles would favour.
> If apply "some soft clipping" it will sound bad
Soft clipping often sounds nice, which is why it's very commonly applied to music. You're saying that eg. the sound of a classic Vox amp is bad, which I guess you're free to believe if that's what your ears tell you, but it's certainly not an objective truth.
That might sound good! But it's a less-than-perfect reproduction of the source signal.
If there's a better explanation than what I've come across every time I've searched for this, I'm all ears and honestly open to being corrected.
The audiophile world would do well to adopt the concept of double-blind study.
“Listening fatigue” when you know which is which is simply placebo.
When people talk about 24 bits (and >48kHz) in the context of "audiophilia", it's generally about the data at rest and "HD audio" (aka 24 bit music files and downloads). Not about the bit depth of the processing pipeline for which it's generally acknowledged that yes, >16 bit depth does make sense for the audio processing pipeline (as well as the original recording).
Nobody said it would hurt so I’m not sure why you’re pointing out the consensus like it’s some sort of profound statement.
> If your recording is not loud, you lose your dynamic range
If your sound engineer is wasting your dynamic range, maybe get a better sound engineer? And if they manage to fuck up something at the core of their job, there’s no reason they wouldn’t fuck up just as much with 24 bits to waste.
> So it is true in both senses.
In no meaning of “true” and “both” in common use.
Unless this was a double-blind study and the audio levels were exactly the same between runs, this is useless data. Even a 0.1dB SPL difference between runs is noticeable (people gravitate to louder sounds as better).
> every time I switch sound cards to 24 bits
This may be related to the sound card. I use an external DAC, not a soundcard, as most soundcards that come with computers are not up to par.
Changing 16 bits to 24 bits should not change the audio in a way that is discernible to the human ear.
> This may be related to the sound card. I use an external DAC, not a soundcard, as most soundcards that come with computers are not up to par.
For simplicity, I did not talk about them separately. BTW, following your logic there is no point in buying a DAC unless there was a double-blind study comparing these DACs to cheaper sound cards. Both are 16-bit/48000, aren't they?
> Changing 16 bits to 24 bits should not change the audio in a way that is discernible to the human ear.
This is a bold statement, which itself begs proof.
Only if one doesn't understand what those bits mean or what they correspond to.
These bits are important for quantization, which is the process of converting analog sound into digital numbers. On a graph, X = time and Y = amplitude. The higher the bits, the higher the resolution.
A 16bit recording has 2^16 steps (discrete values) available for amplitude (65,536) and a 24bit recording is 2^24 or 16,777,216 steps.
So why is this important? Well, a 24-bit recording can more finely record differences in amplitude. Given that 1 bit = 6dB, a regular 16-bit recording already has a dynamic range of 96dB. A 24-bit recording has a dynamic range of >144dB. Permanent hearing loss begins at ~125-130dB SPL.
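The step counts and the "6dB per bit" rule above both fall out of the same formula, since each extra bit doubles the number of amplitude steps and 20·log10(2) is about 6.02. A short Python check, with hypothetical function names:

```python
import math


def quantization_steps(bits: int) -> int:
    """Number of discrete amplitude values available at a given bit depth."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(bits, quantization_steps(bits), round(dynamic_range_db(bits), 1))
```

This is the idealized figure; real converters and dithering shift the usable range somewhat, which the sketch ignores.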
You do not hear the difference because if you were listening to a 24-bit recording on a 24-bit capable system at sound levels loud enough to actually discern a difference, you would have permanently damaged your ears. Actually, I believe that applies to 20-bit, let alone 24-bit.
So why do 24-bit or higher recordings even exist? They are useful for people mixing and working with the raw audio, before it gets processed down to 16bit audio for distribution. At 24-bit resolution you have a larger amount of headroom before you start clipping, so it's easier to work with considering you have X amount of bits that are just part of the noise floor.
This is also assuming your input files are actually 24-bit to begin with. The vast majority of files are 16-bit because there is literally no point as a consumer to have larger file sizes for no humanly audible benefit.
44.1kHz 16-bit files are all that you need as a human consumer of audio. 48kHz has to do with video and is not better than 44.1kHz because you (a human) cannot hear the difference. 44.1kHz is 22.05kHz x 2. Humans hear sound from 20Hz to 20kHz -at best-, and that assumes perfect hearing with no degradation. We sample at 44.1kHz because of the Nyquist-Shannon sampling theorem: the sample rate has to be at least twice the highest frequency you want to capture, and a 22.05kHz limit gives us just a bit of room above 20kHz to apply the filters that avoid aliasing. [2]
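A minimal sketch of the sampling arithmetic above, assuming an ideal sampler with no anti-aliasing filter in front of it. The function names are hypothetical and chosen just for this example.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def alias_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a sampled pure tone shows up after folding."""
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    rate = 44_100
    print(nyquist_hz(rate))            # 22050.0
    print(nyquist_hz(rate) - 20_000)   # 2050.0 Hz of room for the filter
    print(alias_hz(1_000, rate))       # 1000, below Nyquist so unchanged
    print(alias_hz(25_000, rate))      # 19100, folds back into the audible band
```

The last line is why the filter is needed: an unfiltered 25kHz tone would be recorded as 19.1kHz, which is audible.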
So I reiterate my initial assumption: flicking a switch to change from 16bit to 24bit should not magically change the quality of audio (in a humanly discernible manner), and that is assuming the file being played is 24bit lossless audio in the first place.
> BTW, following your logic there is no point in buying a DAC
We're talking about dedicated external equipment vs an onboard soundcard+amp, which is generally neglected. Not -all- onboard cards suck of course; the Realtek ALC1220 chip on my mobo seems to be comparable to or better than entry-level DACs, going by the specs I'm seeing. This is assuming no interference is happening, which is more likely around unshielded electrical components. If you don't believe this is a thing, ask why the audio industry uses thick XLR [shielded AND grounded] cables as standard.
Certain headphones require equipment that can drive them properly, whether it's an onboard soundcard+amp or a DAC+amp. For example, my sennheiser hd600s are 300Ω but some models go up to 600Ω. And yes, the quality of the amp/preamp does make a huge difference.
If one can show mathematically that a component is unable to drive another component, or is otherwise sub-par, one doesn't exactly need double-blind ABX trials. Those are for questions like "Is Monster's $200 cable better than <X> standard cable?" or "Is a McIntosh amp better than a $<amount> competitor?".
I don't need to do a double-blind ABX study to realize that Beats headphones are drastically worse in performance than Sennheiser HD600s: [3], [4], [5]
[0]: https://www.mojo-audio.com/blog/the-24bit-delusion/
[1]: https://web.archive.org/web/20200202124704/https://people.xi...
[2]: https://en.wikipedia.org/wiki/44,100_Hz#Origin
[3]: https://reference-audio-analyzer.pro/en/report/hp/monster-be...
[4]: https://reference-audio-analyzer.pro/en/report/hp/sennheiser...
[5]: https://reference-audio-analyzer.pro/en/report/hp/audio-tech...
That is the common belief (although supporting evidence is lacking, and human hearing is known to have excellent dynamic range), but it only holds as long as the mastering work was well done. 24bit makes human error much less destructive and is very welcome, much more so than absurdly high sample rates (96kHz, reproducing sounds up to 48kHz as per Nyquist), which are of dubious value.
>my sennheiser hd600s are 300Ω
At some frequencies. At some others, it's more like 600Ω. Impedance is seldom stable across the frequency range in headphones.
Amplifier design should account for this and still provide enough power[0].
The output impedance of a headphone jack should be low relative to the low end of the headphone's impedance range (a 1:10 ratio is commonly cited, which means <2Ω in practice, as 20-30Ω headphones are very common); otherwise the frequency response is impaired.
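The reason output impedance affects frequency response can be sketched with a plain voltage divider, treating the headphone as a resistance that changes with frequency. This is a simplification (real impedance is complex-valued), the function names are made up, and the 20-60Ω headphone is a hypothetical example; the 300-600Ω range is the one mentioned above for the hd600.

```python
import math


def level_db(load_ohms: float, source_ohms: float) -> float:
    """Voltage reaching the load relative to the unloaded output, in dB."""
    return 20 * math.log10(load_ohms / (load_ohms + source_ohms))


def response_swing_db(z_min: float, z_max: float, source_ohms: float) -> float:
    """Level difference between the frequencies where the headphone
    impedance is highest and where it is lowest."""
    return level_db(z_max, source_ohms) - level_db(z_min, source_ohms)


if __name__ == "__main__":
    for source in (2.0, 6.0, 15.0):
        low_z = response_swing_db(20.0, 60.0, source)      # hypothetical low-Z headphone
        high_z = response_swing_db(300.0, 600.0, source)   # range quoted for the hd600
        print(f"{source:>4.0f} ohm out: {low_z:.2f} dB swing (20-60 ohm), "
              f"{high_z:.2f} dB swing (300-600 ohm)")
```

With a 15Ω output the hypothetical 20-60Ω headphone swings by almost 3dB across its range, while at 2Ω the swing stays around half a dB. High-impedance headphones are much less sensitive to this particular flaw.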
>Not -all- onboard cards suck of course
But most do. The design of audio circuitry in motherboards doesn't get that much attention. None of my motherboards have good sound. Flaws vary. Some are lowpassed (greedy anti-aliasing filter). Some are noisy. Most have excessive output impedance (typically more than 6Ω, and at times higher than 15Ω). None can output enough power[0] for hd600 (my favourite pair).
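A back-of-the-envelope sketch of the power problem, assuming a purely resistive load and a hypothetical onboard output that tops out at 1V RMS (that figure is an assumption for illustration, not a measurement of any particular board). The function name is made up as well.

```python
def power_mw(open_circuit_v_rms: float, load_ohms: float,
             source_ohms: float = 0.0) -> float:
    """Power delivered into a resistive load, in milliwatts."""
    # Voltage actually reaching the load after the output impedance takes its share.
    v_load = open_circuit_v_rms * load_ohms / (load_ohms + source_ohms)
    return 1000 * v_load ** 2 / load_ohms


if __name__ == "__main__":
    v_max = 1.0  # hypothetical maximum output voltage, volts RMS
    for load in (32.0, 300.0, 600.0):
        print(f"{load:>5.0f} ohm load: {power_mw(v_max, load):.2f} mW")
```

At the same output voltage, a 300Ω load receives roughly a tenth of the power a 32Ω load does, so an output that is fine for low-impedance earbuds can run out of voltage swing on high-impedance headphones.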