Don't get me wrong, this sort of thing is a valuable exercise and we are better off with better encoders for these older codecs. But look at the numbers for Opus on this benchmark. It simply blows all the AAC encoders out of the water even at 64 kbps.
ndiddy 1 days ago [-]
The biggest advantage for having a good AAC encoder isn't efficiency, it's that for nearly the past 2 decades the de facto standard for live streamed video has been RTMP with H.264 video and AAC audio. There is basically no support for any other codecs. If you want to send a video stream to Youtube or Twitch, you will be sending H.264 and AAC. If you want an idea of how ubiquitous this is, I just checked in OBS and it will not even let you select different video and audio codecs in streaming mode, it just (correctly) assumes that anybody who's streaming will be streaming H.264 and AAC.
jshier 20 hours ago [-]
YouTube actually supports H.265 and VP9 ingest, depending on the streaming protocol. I can actually stream 4K@60 H.265 from my Mac Studio with < 5% CPU usage due to the hardware encoder support in OBS.
yeah I use vp9 with opus for uploads by choice, it's great!
YouTube serves vp9 but it always re-encodes my videos as AV1. Annoying.
(I would upload as AV1 but the encoder is slooooooooow.)
nwallin 5 hours ago [-]
> (I would upload as AV1 but the encoder is slooooooooow.)
If you're using libaom, try switching to libsvtav1. It's still slow, but it's slooow instead of slooooooooow.
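For anyone wanting to try the swap, here is a rough sketch of the two FFmpeg invocations, built as argument lists. It assumes an FFmpeg build with both libaom and SVT-AV1 enabled; the file names and the CRF, preset, and cpu-used values are placeholders for illustration, not tuned recommendations.

```python
def av1_command(src, dst, encoder="libsvtav1"):
    """Build an ffmpeg argument list for AV1 video plus Opus audio."""
    if encoder not in ("libsvtav1", "libaom-av1"):
        raise ValueError(f"unknown AV1 encoder: {encoder}")
    cmd = ["ffmpeg", "-i", src, "-c:v", encoder, "-crf", "30"]
    if encoder == "libsvtav1":
        # SVT-AV1 speed knob: higher preset numbers are faster.
        cmd += ["-preset", "8"]
    else:
        # libaom constant-quality mode needs -b:v 0; -cpu-used trades quality for speed.
        cmd += ["-b:v", "0", "-cpu-used", "4"]
    cmd += ["-c:a", "libopus", "-b:a", "128k", dst]
    return cmd


# Placeholder file names; pass the list to subprocess.run() to actually encode.
print(" ".join(av1_command("input.mkv", "output.webm")))
print(" ".join(av1_command("input.mkv", "output.webm", encoder="libaom-av1")))
```

Building the command as a list rather than a string avoids shell quoting problems with odd file names.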
booi 24 hours ago [-]
Also the fact that hardware-accelerated AAC and even full AAC offload is ubiquitous in modern-ish hardware. I think my rice cooker can play AAC audio
lesscraft 23 hours ago [-]
No one really offloads AAC, apart from Apple. Opus can be decoded on very cheap microcontrollers entirely in software using the reference library.
AnggaSP 13 hours ago [-]
Others absolutely do: Android did with low-power audio. It even goes a step further by offloading Bluetooth processing to the DSP.
I’m not in this space anymore, but as of the Android 5-6 era, AAC and Bluetooth were offloaded to the Hexagon DSP on Qualcomm devices.
ZeroGravitas 10 hours ago [-]
You might be referring to this but on top of hardware decode some Bluetooth setups can send the actual AAC file to the headphones and decode it there.
Traditionally Bluetooth audio meant decoding and reencoding it into a crappier codec before transmission. So it's an efficiency and quality win.
I think some Google Pixel Bud Pro earphones do this for Opus but that is rarer (there's a few other codecs that have been done like this over the years by different manufacturers).
philistine 19 hours ago [-]
On a microcontroller doing nothing else sure. But on a phone, a tablet, a laptop, you absolutely want hardware decode to preserve your battery life.
nulld3v 16 hours ago [-]
That's their point though. Basically no modern phone/laptop/tablet other than Apple offloads audio decoding (of any codec) to hardware. You can check this on Android phones by installing the Codec Info app.
philistine 3 hours ago [-]
Yeah no. All chips in computers, tablets, etc. have hardware decode. Intel chips have hardware decode. AMD, Arm, Raspberry Pi, what have you.
LtdJorge 2 hours ago [-]
I’m pretty sure no x86 chip has hardware decode/encode for audio. Together with dGPUs, they tend to have decoders for JPEG and decoders/encoders for H.264, H.265, AV1 and sometimes VP9.
coldtea 7 hours ago [-]
Snapdragon chips do (used in many/most Androids), and Samsung's own Exynos also does, iirc.
If the OS/platform doesn't use it that could be another thing, but those chips do offer audio codec decoding, including AAC.
cogman10 15 hours ago [-]
Audio decode is extremely cheap. It's true that a hardware implementation will be more efficient, but really not a whole lot more.
repelsteeltje 1 days ago [-]
Sample-accurate editing with AAC is a pain though. Especially if you also have video, because frame rates are usually incompatible.
If you want flexibility without fully transcoding both audio and video, Opus is your friend
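To put numbers on the frame-rate mismatch, here is a small sketch. It assumes 48 kHz audio, the 1024-sample AAC-LC frame, and Opus at its common 20 ms frame size (960 samples at 48 kHz); the function name is made up for illustration. It finds how many whole audio frames and video frames pass before their boundaries line up, which is where a cut needs no re-encoding of a partial audio frame.

```python
from fractions import Fraction

AAC_FRAME = 1024  # samples per AAC-LC frame
OPUS_FRAME = 960  # samples in a 20 ms Opus frame at 48 kHz


def first_shared_cut(audio_frame_samples, video_fps, sample_rate=48000):
    """Return (audio_frames, video_frames) spanning the first common boundary."""
    audio_frames_per_second = Fraction(sample_rate, audio_frame_samples)
    # Audio frames per video frame, reduced to lowest terms p/q:
    # p audio frames cover exactly q video frames.
    ratio = audio_frames_per_second / Fraction(video_fps)
    return ratio.numerator, ratio.denominator


print(first_shared_cut(AAC_FRAME, 25))   # (15, 8)
print(first_shared_cut(AAC_FRAME, 30))   # (25, 16)
print(first_shared_cut(OPUS_FRAME, 25))  # (2, 1)
```

So at 25 fps, AAC frame boundaries only coincide with video frame boundaries every 8 video frames, while two 20 ms Opus frames fit each video frame exactly.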
ErroneousBosh 10 hours ago [-]
Editing with any playback-only format like AAC or H.264/5 is a pain.
Everyone I've seen complaining about slow choppy playback in DaVinci Resolve appears to be using long-GOP codecs which require a massive amount of processing to decode. It's something like playing out two seconds of video to access every single frame.
ksncksmckwkf 24 hours ago [-]
Opus is your friend as long as the software you’re using supports it. Besides, Apple’s AAC-LC can beat out Opus in low-bitrate scenarios.
Whether you like it or not, AAC is still the standard.
someonebaggy 18 hours ago [-]
The RTMP protocol comes from Adobe Flash, which only supported a limited set of codecs, the only still useful ones being H264 and AAC. Nobody published the needed protocol extension "enhanced RTMP" until 2022 and it still isn't supported widely. RTMP is not a generic container for any codec, like Matroska - RTMP is tightly coupled to the codec.
CharlesW 1 days ago [-]
Plus, at 96+ kbps (assuming an Apple-quality AAC-LC encoder) Opus loses its quality advantage. So at higher bitrates, the benefit of choosing Opus is that encoders/decoders are royalty-free.
pkulak 21 hours ago [-]
Am I reading that chart wrong? I see Opus ahead across every bitrate.
CharlesW 19 hours ago [-]
The evaluation tools used are helpful for encoder development, but at best they're imperfect proxies for human perception, and their predictions are often inconsistent with the human experience. I assume that statements like "apparently the best AAC encoder" aren't meant to be taken too seriously, since everybody who does this stuff knows that ABX/MUSHRA tests with real humans are what tell the tale.
That paper was published in 2014. The reference Opus encoder has certainly had a number of improvements that affect sound quality since then, whereas very few AAC implementations have.
stefan_ 21 hours ago [-]
I think often of how all it would have taken was a bomb for the 10 or so people that years ago at some browser vendor consortium out of pure self centeredness went „nah lets fragment“. We could have saved many many collective years, electricity and eyeballs simply watching the most basic content.
derf_ 20 hours ago [-]
At one point in I think 2012 three of us who normally all live in different countries were riding in the same car in Australia. We advised the driver to be extra careful (she was dating one of us, so incentives were aligned).
But it is nice to hear that you have been thinking of us, too.
arikrahman 14 hours ago [-]
Took me a second to realize you were talking about the encoder not the model before going into this article
jck86 1 days ago [-]
Choosing a lossy audio codec has become such a no-brainer. Either use Opus and be done with it, or if for some reason Opus cannot be used, then use AAC for compatibility, with an insanely high bitrate for good quality without having to do research on what encoder and mode to pick.
Still having a good quality and default aac encoder is great. Though I don't get why it is mainly CBR.
BoingBoomTschak 23 hours ago [-]
Eh, I prefer Vorbis mostly because it's still competitive at transparent bitrates (esp. with Aotuv patches) and benefits from a much saner volume normalization spec (simply transfer RG 2.0 tags from the FLAC source): Xiph decided to exclude peak information from Opus' spec while adding that weird thing where album gain is stored in the format header and additional track gain in the metadata.
It also uses less battery on my Rockbox'd Clip+.
jck86 23 hours ago [-]
For replaygain purposes simply ignore the spec and use RG 2.0 tags? That works with Opus too and hardly any players support Opus R128 gain anyway. For very low spec devices Vorbis would do a bit better though. For legacy devices legacy codecs can be a better fit indeed.
But would you really store new material encoded in Vorbis just to be able to play it on an old device? Vorbis can sound fine, even at lower bitrates like 128k or 96k, but Opus would sound much better. So perhaps then use Vorbis at higher bitrates like +192k? I prefer Vorbis to Aac but at that bitrate minor intricacies of the container format become more important than the codec because audio quality wise they are near indistinguishable.
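On the ReplayGain versus R128 tag question, a sketch of the conversion as I understand it. The assumptions: ReplayGain 2.0 targets -18 LUFS, the Opus R128_TRACK_GAIN / R128_ALBUM_GAIN tags target -23 LUFS, and the R128 tags are stored as Q7.8 fixed-point integers (1/256 dB steps), per my reading of RFC 7845. The function names are made up, and the Opus header output gain is ignored here.

```python
def rg_to_r128(rg_gain_db):
    """Convert a ReplayGain 2.0 gain in dB to an R128_*_GAIN tag string."""
    r128_db = rg_gain_db - 5.0   # move the target from -18 LUFS to -23 LUFS
    q78 = round(r128_db * 256)   # Q7.8 fixed point, 1/256 dB resolution
    if not -32768 <= q78 <= 32767:
        raise ValueError("gain does not fit a signed 16-bit Q7.8 value")
    return str(q78)


def r128_to_rg(tag_value):
    """Convert an R128_*_GAIN tag string back to a ReplayGain 2.0 gain in dB."""
    return int(tag_value) / 256 + 5.0


tag = rg_to_r128(-7.5)
print(tag, r128_to_rg(tag))
```

Note that there is no peak value to carry over, which is the omission complained about above.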
ksncksmckwkf 24 hours ago [-]
[flagged]
jck86 23 hours ago [-]
> Falser words hath never been spoken.
Why? Care to explain?
I have fully switched to opus for lossy since I cannot be bothered to find the sweet spot for aac encoders and bitrate. Opus simply is too good and convenient and has been for ages.
What other lossy codec is better and for what reason? Under what circumstances and use cases? I really need to put effort into looking for edge cases to not choose opus.
Aac is good too, but way too many choices to make for storing mass material for the long term and be sure the quality is always good enough.
Aachen 19 hours ago [-]
I have the same experience as you and wondered if GP (green account at the time of writing) was trolling. Their other comments seem reasonable though. Based on those, I'm guessing their issue might be compatibility. (Still a flippant and useless comment in isolation, though)
I've not had any issues myself (all players I use supported it already once I learned of Opus' existence about six years ago), but GP doesn't seem to be the only one in this thread. Platforms like youtube don't seem to have an incentive to switch, everyone wants to be compatible with them and they'll re-encode uploads anyhow
unethical_ban 17 hours ago [-]
Since you didn't detail your objection, I assume you're angry at the idea of choosing a lossy codec. Parent comment means that if one is to choose a lossy codec, the choices are simple.
This essentially causes opus to never be used in games or in things in stores that may have issues with specific licenses.
scratcheee 23 hours ago [-]
That’s going a bit far. I’m in the games industry and have used opus regularly, it’s a great codec for games, often the hardware decoding is so restricted that we’re using software regardless so we might as well use something like opus.
The licensing restriction is unfortunate, but only restrictive for those with very specific goals, under normal conditions BSD is a wonderful license for game devs since you’re free to use the code and only have to add an acknowledgement somewhere.
I suppose a public domain game might hit the same limitation, though as a non-lawyer I would guess the chance of anyone with standing trying to sue anyone implementing from this spec is realistically zero (though I don’t fault stb for being unwilling to roll those dice!)
duskwuff 21 hours ago [-]
> under normal conditions BSD is a wonderful license for game devs since you’re free to use the code and only have to add an acknowledgement somewhere.
And it's not as though libopus is an outlier in using a BSD license. A lot of other commonly used libraries have similar licenses; a few examples that come to mind which are likely to show up in games are zlib, curl, Lua, and SDL.
chaosharmonic 20 hours ago [-]
libopus isn't even an outlier in using it for a media format specifically. See: everything coming out of the Alliance for Open Media
a1o 7 hours ago [-]
The game doesn’t have to be publicly licensed, the issue is for a library used by this game - or their engine. This remark is what blocks anyone from Valve making their own opus compatible library to use on their engines and supported libraries from what I could tell.
upofadown 2 hours ago [-]
The linked article makes the argument that looking at the BSD licensed example code in the RFC that defines Opus would mean that code written based on that understanding would be a derivative work and would have to be BSD licensed. This seems to have something to do with the fact that "clean-room design"[1] is a thing. But as the Wikipedia article points out:
>Clean-room design is usually employed as best practice, but not strictly required by law.
As the article points out, if this was actually true then we could change the licensing on code examples found in RFCs to fix the issue, but there doesn't seem to be any actual issue here. Imagine a world where simply reading some code caused licensing issues...
This essay says it's not possible to make a public-domain implementation of Opus. But it could be released under BSD (as libopus is), which is fine for games, as evidenced by the Licenses section of the credits in many games.
ack_complete 23 hours ago [-]
Most games use the sound support that comes with their game engine or choice of sound system, so I don't think the lack of an STB version is an issue. Performance is more of a problem. Audiokinetic, the makers of the popular Wwise audio system, estimate that Opus takes ~3-5x the CPU of Vorbis:
He's definitely being way over pedantic. Reading the law like HN programmers imagine it works, rather than how it actually works.
The intent of the legal language in that spec is pretty clearly that you have to use the BSD license if you copy that code, but if you merely read it to understand the spec then you don't.
account42 6 hours ago [-]
Opus is used in games.
sbseitz 19 hours ago [-]
Most of my collection is Opus 256K, the only downside is support. A lot of tools like Bliss/Roon don't support it :(
skydhash 1 days ago [-]
I would like Opus, but I’m using a subsonic client on iOS and my choice has been Flac (Alac?), MP3, or AAC. Opus wouldn’t play (There are some that supported it, but I didn’t like their UX).
lutoma 6 hours ago [-]
You can give Arpeggi a try. It’s still in beta and Testflight only, but already (imo) by far the best iOS app for Navidrome/subsonic servers. It also supports Opus playback on current iOS versions (since Apple added native support for the codec).
I read almost all the way through your comment thinking there was a decent probability you were saying this new AAC encoder was written with Claude Opus.
theandrewbailey 24 hours ago [-]
I've never been an AI guy, and have more fascination with audio. I've long stopped being excited when I read "Opus" on HN. It's refreshing when it turns out to be the audio codec.
Aachen 19 hours ago [-]
To be fair, Opus was never a great name. I always feel the need to specify further when using it outside of a clear context of music codecs (also way before Claude was announced). Love it in every other way though
pezezin 17 hours ago [-]
No, he was talking about the Opus Dei because this code quality can only be reached by God himself /jk
numlock86 8 hours ago [-]
Opus, the codec, has been a thing looong time before Claude.
divan 7 hours ago [-]
why is this downvoted? for people who aren't in audio codec dev space, parent comment reads exactly as 'Opus 4.8 rewrote the codec and blew out all competitors'
spider-mario 5 hours ago [-]
> for people who aren't in audio codec dev space
You don’t need to be in the audio codec dev space to have heard of one of the most widespread audio codecs of the last decade (used by YouTube, WhatsApp, SoundCloud and added to WebM in 2013).
divan 4 hours ago [-]
Sure, you could've heard about it outside of that space. Yet, most likely, you have not.
It just feels wrong when HN commenters downvote comments that try to clear up legitimate confusion.
subarctic 8 hours ago [-]
Definitely thought you meant claude opus but now from reading a couple other comments it sounds like you mean something else called opus?
Nice, I'm looking forward to seeing how this performs in practice. FFmpeg's previous AAC encoder produced poor quality output and often had irritating chirping artifacts, so I've always had to install Apple's Core Audio encoder on any computer I do video recording on to get decent sound. I've done A/B/X comparisons and found that a 320kbps MP3 sounds better than a 320kbps AAC encoded by FFmpeg, but about the same as a 256kbps AAC encoded by Core Audio. If installing Core Audio is no longer necessary, that'll be a huge improvement and people who use something like OBS to do screen recordings or streaming will get a massive sound quality boost the next time they update.
madars 24 hours ago [-]
A useful project related to Apple's Core Audio is qaac - it wraps iTunes Windows DLL's in a standalone encoding tool with a CLI interface. I believe it even works under Wine on Linux: https://web.archive.org/web/20250814194428/https://www.andre... So you don't need a Mac or even a full iTunes installation to get high quality AAC encoding.
winstonwinston 23 hours ago [-]
I was using the FDK AAC encoder; I didn’t know Apple's encoder was available for systems other than Apple's. Though I once compared FDK AAC to Apple AAC at 192kbps and couldn’t tell the difference, while the old FFmpeg AAC encoder fell apart at this bitrate.
ndiddy 16 hours ago [-]
It gets installed when you install iTunes. If you don't want to install iTunes, you can pull out the codec installer by opening an old version of the iTunes installer in 7-zip and extracting the MSI. Here's a copy I keep around for whenever I have to do a screen recording on a new computer, it's signed by Apple so you don't have to trust me. https://www.infochunk.com/obs/AppleApplicationSupport64.msi
kderbe 24 hours ago [-]
In the Hydrogenaudio discussion thread's metrics table, the new encoder scores better than Core Audio. But this is at constant bitrate (CBR) [edit: maybe not? see lesscraft's reply below]. Core Audio also has variable bitrate modes (TVBR) which the new encoder lacks.
So maybe Core Audio will continue to be the best when TVBR is available, but I'm hopeful the new FFmpeg encoder will be "good enough", especially if more folks find and contribute problem samples to help tune it.
lesscraft 23 hours ago [-]
The benchmarks were made using afconvert on OSX with the default VBR settings.
repelsteeltje 1 days ago [-]
Why not use a lossless codec if you care about quality? Or use Opus, decent for speech and works pretty much anywhere these days.
Fnoord 4 hours ago [-]
Because almost all people cannot hear the difference between a high quality lossy codec versus lossless in a double blind test. They think they do, but they don't.
CharlesW 24 hours ago [-]
> Why not use a lossless codec if you care about quality?
(1) Lossy codecs are transparent at half the file size (or less) of FLAC/ALAC.
(2) AAC (strictly, AAC-LC) is universal, where FLAC and Opus are not yet there.
ksncksmckwkf 24 hours ago [-]
You can care about quality to the extent that a lossy codec allows. Lossless is not always necessary or wanted. This is like saying “why care about transcoding quality when you can keep the video as is?”. There’s a myriad of use cases and preferences at play here.
cosmic_cheese 24 hours ago [-]
There are a ton of older, but still perfectly usable devices that support AAC well but not Opus.
moniosi 21 hours ago [-]
i will never understand apples cuckoldry for proprietary codecs, if it wasn't for their adoption of h265 we would live in the av1 utopia
kasabali 9 hours ago [-]
because H265 predates AV1 by 5 years. H265’s contemporary was VP9, which was honestly worse than H264 done with a good encoder like x264.
HugoTea 1 days ago [-]
>FFmpeg's AAC DEcoder is busted with regards to stereo PNS, and the bug may be in other AAC decoders too, so we work around it in the encoder. Since no other encoder used PNS, the bug was not found until now.
I don't know what PNS is, but I bet this has been bothering someone's niche use-case for 20 years
lesscraft 23 hours ago [-]
The issue was twofold, on one hand, using TNS on top of PNS meant the noise that got inserted was shaped by TNS, which is nonsense since the decoder generated the noise, not the encoder. This made PNS explode.
The second, biggest issue was that using PNS in combination with any stereo tools resulted in noise leaking in both channels equally, ruining stereo imaging. So the best and only thing to do was to enable PNS only if the band in both channels is noise (or is sufficiently non-tonal and masked).
Hah, this sounds like the audio equivalent of Netflix’s grain reconstruction.
BoingBoomTschak 23 hours ago [-]
Netflix's or AV1's FGS?
dcrazy 21 hours ago [-]
Netflix developed it as a member of AOM.
mondainx 4 hours ago [-]
This is a great update with a clear break-down with lots of detail; bravo lynne! For naysayers Opus is great and has its place, but AAC isn't going anywhere.
superzazu 1 days ago [-]
> The encoder was mainly optimized for 48Khz audio. Get over it. It's 2026, resampling is free, 48Khz is the standard. 44.1Khz will work, and so will 96Khz but use 48Khz if you want the best quality.
Is 48kHz really the standard nowadays?
Joeboy 24 hours ago [-]
I think the closest thing to an actual "standard" is AES5-2018, "Recommended practice for professional digital audio".
Abstract:
> A sampling frequency of 48 kHz is recommended for the origination, processing, and interchange of audio programs employing pulse-code modulation. Recognition is also given to the use of a 44.1-kHz sampling frequency related to certain consumer digital applications, the use of a 32-kHz sampling frequency for transmission-related applications, and the use of a 96-kHz sampling frequency for applications requiring a higher bandwidth or more relaxed anti-alias filtering. This revision further quantifies the preferred choices for higher sampling frequencies.
Edit: From my personal perspective, 44.1kHz is a legacy minor annoyance
legdoge 1 days ago [-]
AAC has a strange quirk that the window size is dependent on the sampling rate, thus requiring a complete psychoacoustics reoptimization of all encoder parameters for each sampling rate, since a 20msec window sounds very different than a 60msec window, to human ears.
This was of course fixed in Opus.
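A quick way to see the effect, assuming the standard AAC-LC frame of 1024 new samples (the long transform window spans 2048 with 50% overlap). The sample rates listed are just examples.

```python
AAC_LC_FRAME = 1024  # new samples per AAC-LC frame; the long window spans 2048


def frame_ms(sample_rate, frame_samples=AAC_LC_FRAME):
    """Duration in milliseconds of one frame at the given sample rate."""
    return 1000 * frame_samples / sample_rate


for rate in (16000, 32000, 44100, 48000, 96000):
    print(f"{rate:>6} Hz: frame {frame_ms(rate):6.2f} ms, "
          f"long window {2 * frame_ms(rate):6.2f} ms")
```

The frame is about 21 ms at 48 kHz but 64 ms at 16 kHz, which roughly matches the 20 ms versus 60 ms contrast described above. The sample count is fixed, so the time span the psychoacoustic model sees changes with every sample rate.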
spider-mario 5 hours ago [-]
By just always using 48 kHz, from what I recall?
pipo234 1 days ago [-]
48kHz makes alignment between video and audio so much easier. (I.e.: Lip synchronization after edits)
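A small sketch of why: count the audio samples that fall in one video frame at common frame rates. The set of rates is my own pick for illustration.

```python
from fractions import Fraction

VIDEO_RATES = {
    "24": Fraction(24),
    "25": Fraction(25),
    "29.97": Fraction(30000, 1001),
    "30": Fraction(30),
}


def samples_per_video_frame(sample_rate, fps):
    """Exact number of audio samples per video frame, as a fraction."""
    return Fraction(sample_rate) / fps


for sample_rate in (48000, 44100):
    for name, fps in VIDEO_RATES.items():
        n = samples_per_video_frame(sample_rate, fps)
        kind = "whole" if n.denominator == 1 else "fractional"
        print(f"{sample_rate} Hz @ {name} fps: {float(n):.2f} samples ({kind})")
```

48 kHz gives a whole number of samples per frame at 24, 25, and 30 fps. 44.1 kHz manages that at 25 and 30 but not at 24, and neither rate is whole at 29.97.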
asveikau 1 days ago [-]
I know the opus codec assumes everything is 48kHz and will resample inputs to that.
lesscraft 23 hours ago [-]
Pretty much all DACs run at 48Khz by default due to operating systems picking it as a sane default.
bpye 22 hours ago [-]
Pipewire will quite happily pipe through audio without resampling if it is the only source on a system. You can see this by running pw-top and using speaker-test with various sample rates.
chronogram 8 hours ago [-]
Even if it reports 44.1 it resamples internally to 48 in all hardware I've seen.
atoav 1 days ago [-]
More or less. Streaming is often done with 48, video content has been 48 for a while now, so unless you still produce content for CDs it is the standard.
44100 Hz had reasons no longer really needed (storing audio in 3 samples per line in VHS: 490 lines × 3 samples × 30 GPS = 44100 sample/s).
Quality-wise both are more than enough and 99.99% of people would not be able to tell them apart in a blind test. Higher sample rates than 48kHz only needed when you want to pitch down ultrasonic recordings (of whales, bats and other such animals for example).
Aside from this higher than 48 kHz sample rates may have only downsides, like increased size and potential distortion in the ultrasonic frequency range that has sidebands in the audible range. Yet there is a persistent, but unscientific "more-is-better"-crowd in the HiFi-sector.
someonebaggy 18 hours ago [-]
VHS doesn't store audio in samples nor does it have 490 lines or 30 G(?)PS. NTSC uses 525 lines per frame and PAL uses 625, both with interlacing at 60 fields per second. The VHS system is analog for audio and video, though analog video has discrete lines, and VHS records discrete stripes on the tape which should be one field each.
44100 was chosen for CD, as 20kHz upper limit of human hearing, doubled for Nyquist theorem, plus a 10% guard band so that anti-aliasing filters don't have to be made of magical fairy dust, plus a bit (maybe to make it relatively prime with something else in the system).
The first digital audio systems encoded the audio as a black-and-white video signal on video tapes. 44100 Hz was selected as it was the highest sampling rate achievable on both NTSC and PAL video tapes.
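The arithmetic behind that, as it is usually told: the video-based PCM recorders stored 3 samples per usable line, with 245 usable lines per field on 60-field monochrome NTSC-style video and 294 on 50-field PAL-style video. Those line counts are the commonly cited figures, not something stated in this thread, so treat them as an assumption.

```python
SAMPLES_PER_LINE = 3


def video_pcm_rate(usable_lines_per_field, fields_per_second):
    """Audio sample rate when samples are stored on video lines."""
    return usable_lines_per_field * fields_per_second * SAMPLES_PER_LINE


ntsc = video_pcm_rate(245, 60)  # 525-line, 60-field monochrome video
pal = video_pcm_rate(294, 50)   # 625-line, 50-field video

# The hearing-limit argument: 20 kHz, doubled for Nyquist, plus a 10% guard band.
nyquist_with_guard = 20_000 * 2 * 11 // 10

print(ntsc, pal, nyquist_with_guard)
```

Both video systems land on the same figure, and it sits just above the 44000 Hz that the guard-band argument asks for.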
duped 1 days ago [-]
> Higher sample rates than 48kHz only needed when you want to pitch down ultrasonic recordings (of whales, bats and other such animals for example).
There are numerous use cases for higher sample rates that go beyond this but it's hard to talk about it without starting flame wars filled with junk science.
zamadatix 1 days ago [-]
Say it or don't but "I have evidence otherwise but don't think I should say" is just as bad a flame war gateway as tempting the junk science audiophiles directly.
duped 1 days ago [-]
Higher sample rates are lower latency for the same block size and resampling is not "free" (pick 2: performance, aliasing, latency) so there can be advantages to working with audio archived at higher sample rates.
But all the advantages come down to professional or editing use cases. There's next to zero advantage to using it as a storage format for listening. Just like 24 bit audio (do you have an amp with 96dB SNR?).
Just personally, I have seen little evidence (personally, professionally, or academically) that there is any advantage for lossless audio for consumer applications. For professional applications there are plenty, and it's endlessly tiring to convince people that "no, actually I need 96kHz for my use case."
Where the audiophiles have _some_ argument here is the design of reconstruction filters which I've heard alleged can perform better in the audible frequency range if the stop band is outside of it. But I have never personally tested this, nor cared enough to. But the theory is sound.
Whether or not it's perceptible depends on what you're measuring, though. In theory, there should be perceptual differences in sound localization if your DAC's reconstruction filter is at 24kHz vs 48kHz since it will change the group delay in a critical frequency region, where you'll get sound at >~2kHz arriving later at the lower sample rate. I think it would be extremely hard to test this though, because humans are really shitty at sound localization to begin with, and practically speaking most recorded material is processed to shit in that frequency range to intentionally decorrelate the channels for the perception of "width."
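For the bit-depth aside, here is the textbook figure for ideal uniform quantization with a full-scale sine: SNR is about 6.02 dB per bit plus 1.76 dB. This is a sketch of the theoretical limit only; it says nothing about what a given converter or amp actually achieves.

```python
import math


def quantization_snr_db(bits):
    """Theoretical SNR of a full-scale sine against uniform quantization noise."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


def level_ratio_db(bits):
    """Ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(f"{bits}-bit: {level_ratio_db(bits):.2f} dB step ratio, "
          f"{quantization_snr_db(bits):.2f} dB sine SNR")
```

The 96 dB figure corresponds to 16 bits. 24 bits is roughly 144 to 146 dB on paper, far beyond the analog chain being asked about.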
amluto 21 hours ago [-]
> Higher sample rates are lower latency for the same block size
This is a truly bizarre statement. On the one hand, of course higher sampling rates are lower latency for the same block size measured in samples. But all sampling rates have (almost [0]) identical latency for the same block size measured in time and lower sampling rates allow less computation for those shorter blocks.
[0] If you are concerned about needing to know future samples in order to calculate the actual signal amplitude at a time between samples, then (a) this matters less at higher sampling rates and (b) this is at most a small number of samples and we're talking about block sizes that presumably exceed, say, 5, so this isn't really a big deal.
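Both framings in numbers, as a minimal sketch. The block sizes are arbitrary examples, and this counts only one buffer's worth of delay, not converter or driver overhead.

```python
def block_latency_ms(block_samples, sample_rate):
    """Time covered by one block of audio, in milliseconds."""
    return 1000 * block_samples / sample_rate


def block_for_latency(target_ms, sample_rate):
    """Block size in samples that covers roughly target_ms."""
    return round(target_ms * sample_rate / 1000)


# Same block size in samples: doubling the rate halves the latency.
print(block_latency_ms(256, 48000), block_latency_ms(256, 96000))

# Same latency in time: the lower rate just needs a smaller block.
print(block_latency_ms(128, 48000), block_latency_ms(256, 96000))
print(block_for_latency(2.0, 48000), block_for_latency(2.0, 96000))
```

A 128-sample block at 48 kHz and a 256-sample block at 96 kHz cover the same span of time. The 48 kHz block just has half as many samples to process.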
duped 19 hours ago [-]
The unit of a block size is samples (frames, technically), not seconds. When configuring audio devices for playback you tune both sample rate and block size for latency. It used to be far more common to tune sample rate than block size alone for tracking. This is getting into the weeds of actual devices though.
Also to your point, this is why compliant peak meters use a mandatory 4x upsampling at 48k.
Sesse__ 11 hours ago [-]
> Also to your point, this is why compliant peak meters use a mandatory 4x upsampling at 48k.
This isn't due to latency, it's because the true peak (in the analog waveform) could be between samples.
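A toy illustration of an inter-sample peak: a full-scale sine at a quarter of the sample rate, offset by 45 degrees, so every sample lands at about 0.707 of the true peak. As a simplification, this sketch evaluates the known waveform at four times as many points instead of running an interpolation filter the way a real true-peak meter would.

```python
import math


def sampled_peak(oversample=1, cycles=8):
    """Largest sample magnitude of a unit sine at fs/4 with a 45 degree offset."""
    points_per_cycle = 4 * oversample
    return max(
        abs(math.sin(2 * math.pi * n / points_per_cycle + math.pi / 4))
        for n in range(points_per_cycle * cycles)
    )


plain = sampled_peak(oversample=1)
dense = sampled_peak(oversample=4)
print(f"sample peak {plain:.4f}, 4x peak {dense:.4f}, "
      f"under-read {20 * math.log10(dense / plain):.2f} dB")
```

The plain sample peak under-reads by about 3 dB here. Looking at 4x the points finds the real peak.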
duped 4 hours ago [-]
I didn't say it did?
toast0 21 hours ago [-]
> Just personally, I have seen little evidence (personally, professionally, or academically) that there is any advantage for lossless audio for consumer applications
I think the advantage of lossless audio is for archival: rip once, archive as lossless; then you can reencode your library with the latest and greatest lossy encoders over time, or just use the lossless if your player can manage it, cpu and storage is less of a limiting factor for players than 20 years ago.
I don't know how many people are actually managing their libraries these days though, so I dunno if makes a huge difference.
duped 16 hours ago [-]
I wouldn't call archiving a consumer application but I understand the point. Really it gets back to the word: fidelity. Some say it means "truth" but really it's Latin for faithful or, in the context of audio, perceptually identical (a faithful representation). Even among highly trained and skilled listeners, lossy codecs are faithful and imperceptible.
spider-mario 5 hours ago [-]
It’s maybe not a universal consumer application, but it doesn’t seem that outlandish to imagine that some of them might want to protect their personal collection. Or am I that extraordinarily attached to mine?
Dylan16807 21 hours ago [-]
> Higher sample rates are lower latency for the same block size
And if your goal is latency, it makes far more sense to change the block size rather than the sample rate.
> But all the advantages come down to professional or editing use cases.
That sounds about right.
jpc0 12 hours ago [-]
Group delay is a poor argument.
Unless you also have a pretty decent monitoring system the group delay of the speakers isn't going to be consistent so the filters before them wouldn't matter all that much...
Even in that case I would have a hard time believing that any human in a blind test would be able to perceive a group delay of even 360deg above 2k...
You are talking about sub-millisecond differences in the time frequency content arrives at the ears; just tilting your head slightly will have a greater impact...
skydhash 1 days ago [-]
I know that with oscilloscopes, it’s recommended to use 5x instead of Nyquist 2x of the highest frequency you want to use, but the most reasonable argument I’ve heard for higher than 48kHz sampling is digital audio effects.
But for the end result 48kHz is more than necessary. I can’t even hear any frequency above 17kHz.
Aurornis 20 hours ago [-]
> I know that with oscilloscopes, it’s recommended to use 5x instead of Nyquist 2x of the highest frequency you want to use.
For capturing analog signals, 2.5X is enough headroom.
The 5X recommendation is probably for digital signals where the frequency refers to the baud rate, not the highest frequency coming through. A fast switching digital signal will have components with higher bandwidth than the fundamental. Using a higher multiple of samples (assuming the bandwidth is there) will let you see the shape of the waveform and rise and fall times better.
dcrazy 24 hours ago [-]
Yes, bit depth headroom is very useful for audio production to avoid aliasing. Pro DAWs support 96KHz.
adgjlsfhk1 21 hours ago [-]
yeah for real time signals higher frequency makes sense (very briefly before you fft and kill the high frequencies), but for stored signals nyquist is king.
atoav 13 hours ago [-]
> But for the end result 48kHz is more than necessary. I can’t even hear any frequency above 17kHz
And even if you could, would the frequencies that all humans lose with age really be all that essential for the enjoyment of music? We are talking about frequencies most instruments won't even produce unless severely abused.
For some reasons in audiophile-land the magic is always in some elusive outer realms and never right there where the important stuff happens. They spend a fortune on speaker cables, while often not giving a second thought on room acoustics beyond the cosmetic. The magic sparkle is all the way in the ultrasonic, while their listening spaces have deep nulls in the mid-range due to comb filtering from reflective surfaces caused by a lack of acoustic treatment.
I love music (enough to have mixed it for a living) and to me it is very clear how the priorities are ordered when it comes to audio fidelity:
1. Room Acoustics
2. Speakers
3. Electronics & Digital
Going from the back: Assuming you don't get the cheapest of the cheap and don't abuse the gear by making it do things it wasn't built for, electronics and digital audio nowadays are transparent. That means it essentially sounds the same if operated within spec. Even a 0,50 € IC will have distortion figures so staggeringly low that they are below human perception, and equipment is getting better still. A decent opamp can have distortion figures like 0.005 % THD with a linear frequency response all the way up to radio frequencies. There can be challenges with driving very weird speakers or headphones, but if you have the right combination of gear it doesn't have to be expensive to be indistinguishably good in its audio performance.
This means speakers are way more important than the electronics before them. Their distortion numbers are multiple orders of magnitude higher (in the ballpark of 3% THD), their frequency response is inherently problematic (often many dB up and down even in expensive speakers), they will have different beaming characteristics at different frequencies, small speakers lack bass, placement is essential, etc. So getting good speakers is important.
But all of this is dwarfed by the impact of acoustics. The position of the speakers alone makes a huge difference. The impact of an acoustically untreated space is severe: you can get a completely smeared time response with deep nulls of 20dB and more while other frequencies are highly resonant. Even a budget speaker won't have problems of that magnitude.
So get some ok electronics, even more ok speakers, but invest the bulk of the money/time into the setup of the room itself.
Many audiophiles have that priority list reversed. Room acoustics suck. You need to measure a lot, add ugly absorbers in inconvenient places, and can't place speakers where they look nice and conserve space but must place them where they work well acoustically. There is no ideal solution and everything is a compromise. So buying a gold-plated HDMI cable and imagining the improvement appears to be the better option. Only that you might be doing it in a room where a positional difference of a few centimeters massively changes the frequency response at the listening position.
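To put rough numbers on the comb filtering described above: a single reflection that travels an extra distance d arrives d/c later than the direct sound, and the two cancel at odd multiples of 1/(2 × delay). A minimal sketch, assuming one idealized reflection and a speed of sound of 343 m/s; the function name and the example distances are made up for illustration, and real rooms have many reflections of varying strength.

```python
# Null frequencies of the comb filter created by ONE idealized reflection.
# Assumptions: speed of sound 343 m/s, reflection as loud as the direct sound.
SPEED_OF_SOUND_M_S = 343.0


def comb_null_frequencies(extra_path_m, count=3):
    """Return the first `count` cancellation frequencies in Hz.

    extra_path_m is how much farther the reflected sound travels
    than the direct sound.
    """
    delay_s = extra_path_m / SPEED_OF_SOUND_M_S
    # Cancellation happens where the delay equals an odd number of half periods.
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]


if __name__ == "__main__":
    for extra_path in (1.00, 1.05):  # a 5 cm change in path length
        nulls = ", ".join(f"{f:.1f} Hz" for f in comb_null_frequencies(extra_path))
        print(f"extra path {extra_path:.2f} m -> nulls at {nulls}")
```

With these assumptions, a 5 cm change in path length moves the first null from about 171.5 Hz to about 163.3 Hz, which is the "few centimeters" sensitivity the comment describes.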
xuhu 1 days ago [-]
For one, audio transcription services that use Whisper will sample the input down to 16kHz mono first.
daneel_w 24 hours ago [-]
Yes and no. It is the standard for audio in film, which explains the author's focus. But is the audio CD bigger and more "standarder" than DVD and Blu-Ray? I think they're equals, and I personally think this encoder only makes sense for video content. Given all the caveats the author mentions (in particular about the sample rate) I would steer clear of using it when ripping CDs.
izacus 1 days ago [-]
Yes, pretty much all new hardware uses it as default output setting as well (by that I mean laptops, phones, smart speakers, etc.)
TheChaplain 1 days ago [-]
48kHz has been the recommended setting with Premiere Pro as long as I can remember.
44.1kHz, isn't that what lameMP3 uses as default?
williadc 1 days ago [-]
It's what CDs use, so it would make sense for mp3 encoders to follow suit.
pseudosavant 17 hours ago [-]
I applaud a new/better FFMPEG AAC encoder, but there are two pretty massive caveats that are mentioned in the specifics that need to be called out:
- CBR only
- Only optimized for 48khz sampling
Not being able to do quality-based variable bitrate encoding is a major gap, and since all of the CD audio in the world is at 44.1k sampling, that seems like a huge miss too.
ezoe 3 hours ago [-]
Why do you need VBR for audio encoding? VBR audio encoding sounds horrible and it can't save much of bitrate anyway.
lesscraft 16 hours ago [-]
You can use -q:a for "true" VBR, but its metrics are a few percent lower (imperceptible, we still win).
"The benchmarks I posted were done mainly on 44.1Khz. I tuned by ear on 48Khz data though, so some of the windowing/transient logic is tied to 48Khz. It translated to 44.1Khz well enough that I left it as-is, since the timing difference isn't that large."
JSR_FDED 23 hours ago [-]
It’s fascinating so much of this comes down to the developer’s own ears - disturbing and quite cool at the same time how subjective this is
ant6n 18 hours ago [-]
The table and comparison use “Google's new Zimtohrli, ViSQOL, and my own hearing”
MrBuddyCasino 11 hours ago [-]
In audio, this is usually the case. Musepack was niche-popular for some time as a simple, but very well tuned codec.
It's the same with speakers and headphones. People think it's the component quality, but it’s mostly competency in overall audio physics and the ability to tune well.
sneezychl 1 days ago [-]
A very welcome addition, hopefully I can replace fdk-aac
ant6n 20 hours ago [-]
This is truly a representative of the old internet: somebody codes up the best AAC encoder ever, and the first response comes from some admin, and it's some bickering about 48Khz vs 44Khz.
SideQuark 17 hours ago [-]
It’s not that cynical. The author didn’t test on the most common rate in use, so it would be ludicrous for any serious project to wholesale replace a decades old working pipeline with it. It makes perfect sense to wait till due diligence is done.
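For a sense of scale on the author's remark, quoted earlier in the thread, that the timing difference between 44.1 kHz and 48 kHz "isn't that large": AAC-LC normally codes 1024 samples per frame and switches to eight short blocks of 128 samples around transients, so the same block simply lasts a little longer at the lower rate. A small sketch of that arithmetic; the constant and function names are illustrative and not taken from the encoder.

```python
# Duration of AAC-LC blocks at the two sample rates discussed in the thread.
AAC_LC_FRAME_SAMPLES = 1024  # samples per AAC-LC frame (long block)
AAC_LC_SHORT_BLOCK = 128     # samples per short block (eight per frame)


def duration_ms(samples, sample_rate_hz):
    """Length of `samples` samples in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


if __name__ == "__main__":
    for rate in (44_100, 48_000):
        frame = duration_ms(AAC_LC_FRAME_SAMPLES, rate)
        short = duration_ms(AAC_LC_SHORT_BLOCK, rate)
        print(f"{rate} Hz: frame {frame:.2f} ms, short block {short:.2f} ms")
    stretch = 48_000 / 44_100 - 1
    print(f"blocks last {stretch:.1%} longer at 44.1 kHz")
```

A frame comes out at roughly 21.3 ms at 48 kHz versus 23.2 ms at 44.1 kHz, a stretch of just under 9%. Whether tuning done by ear at one rate carries over is still something only listening tests at the other rate can settle.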
ant6n 11 hours ago [-]
It's not cynical. It's dismissive. Especially given that these codecs work in the frequency domain anyway.
>>...use 48Khz if you want the best quality.
>Yet most of the worlds audio is 44KHz...
functionmouse 20 hours ago [-]
Last time I used ffmpeg to encode songs for my iPod nano they were broken; playback was interrupted by pops and clicks every few seconds. I wonder if this is fixed now?
esafak 22 hours ago [-]
HA, a blast from the past, when audio encoders were making strides and collecting mp3s was a thing. Same for video encoders.
amluto 21 hours ago [-]
It was kind of fun being able to easily distinguish 128kbps MP3 from the source audio. (Some early encoders were really bad.)
MrBuddyCasino 9 hours ago [-]
These kinds of forums are some of the best parts of the internet. Many still exist, but they're slowly shrinking unfortunately.
ximdotro 20 hours ago [-]
Nice, I can’t wait to see how this turns out in practice.
refulgentis 24 hours ago [-]
The older I get, the more it seems possible to ping-pong between rewrites for good reasons (e.g. here: it maxes the metrics, but I find it hard to believe VBR and non-48 kHz support are silly things not worth investing in)
binaryturtle 20 hours ago [-]
I always encode my AAC with VBR. Why wouldn't you, right? I guess I'll stick to apple or fdkaac for now.
timcobb 16 hours ago [-]
Why do you record AAC?
Marsymars 13 hours ago [-]
It’s better than mp3, and my car supports it from a usb stick.
thomasnowhere 20 hours ago [-]
[dead]
thisislife2 1 days ago [-]
Flagged for the wrong link.
defrost 1 days ago [-]
Hopefully they see this - there's still time to edit the submission link.
ledoge 1 days ago [-]
It doesn't let me edit the link, but I'm confused by what even happened here... I posted this from my phone and that wrong link doesn't show up in my clipboard history.
Our software follows redirs and somehow we got a 302 to our own IP. Perhaps it is someone's idea of a bot detector?
Aachen 18 hours ago [-]
Unrelated: Hey, I sent hn@ycombinator.com two emails. One was May 6th, the other June 18th (UTC+2). The former's subject is "Broken prev/next links sometimes". In the latter, I've asked to let me know if it arrived. It didn't bounce so your email server has acknowledged receipt, but based on fast responses to previous emails and someone else mentioning randomly that you responded quickly to theirs in iirc early June, I'm starting to assume you're not seeing mine. I don't know how else to reach out than via an off-topic comment or a dummy submission or so. Is there a fallback mechanism to use when your email doesn't?
defrost 15 hours ago [-]
Well, they don't read all the emails -or- respond to @dang, @mod, etc.
Your options are: ride a comment (as you did, 9 hours, no response, likely didn't read) or lean on a frequent flyer (the only privilege I have is increased ability to [dead] obvious spam not caught by filters - but it gets my emails seen)
I sent them an email in the past minute - Good Luck! (YMMV)
defrost 1 days ago [-]
Your options are:
* quick email to HN@ycombinator.com with a "Help Me please!!" and the link (mods can edit the link in and sideline (hide) these comments)
* Just live with the rotting fish head of public boo boo (we've all made mistakes, as the Dalek said whilst climbing down off the dustbin)
* I can kill the whole thing dead.
https://developers.google.com/youtube/v3/live/guides/ingesti...
Traditionally Bluetooth audio meant decoding and reencoding it into a crappier codec before transmission. So it's an efficiency and quality win.
I think some Google Pixel Bud Pro earphones do this for Opus but that is rarer (there's a few other codecs that have been done like this over the years by different manufacturers).
If the OS/platform doesn't use it that could be another thing, but those chips do offer audio codec decoding, including AAC
If you want flexibility without fully transcoding both audio and video, Opus is your friend
Everyone I've seen complaining about slow choppy playback in DaVinci Resolve appears to be using long-GOP codecs which require a massive amount of processing to decode. It's something like playing out two seconds of video to access every single frame.
Whether you like it or not, AAC is still the standard.
On Opus vs. AAC specifically, there's a long history of studies like https://www.researchgate.net/publication/301428302_Perceived... to help answer that question. (There are interesting charts at the top of page 1175.)
But it is nice to hear that you have been thinking of us, too.
Still having a good quality and default aac encoder is great. Though I don't get why it is mainly CBR.
It also uses less battery on my Rockbox'd Clip+.
But would you really store new material encoded in Vorbis just to be able to play it on an old device? Vorbis can sound fine, even at lower bitrates like 128k or 96k, but Opus would sound much better. So perhaps then use Vorbis at higher bitrates like +192k? I prefer Vorbis to Aac but at that bitrate minor intricacies of the container format become more important than the codec because audio quality wise they are near indistinguishable.
Why? Care to explain?
I have fully switched to opus for lossy since I cannot be bothered to find the sweet spot for aac encoders and bitrate. Opus simply is too good and convenient and has been for ages.
What other lossy codec is better and for what reason? Under what circumstances and use cases? I really need to put effort into looking for edge cases to not choose opus.
Aac is good too, but way too many choices to make for storing mass material for the long term and be sure the quality is always good enough.
I've not had any issues myself (all players I use supported it already once I learned of Opus' existence about six years ago), but GP doesn't seem to be the only one in this thread. Platforms like youtube don't seem to have an incentive to switch, everyone wants to be compatible with them and they'll re-encode uploads anyhow
https://nothings.org/stb/stb_opus.html
This essentially causes opus to never be used in games or in things in stores that may have issues with specific licenses.
The licensing restriction is unfortunate, but only restrictive for those with very specific goals, under normal conditions BSD is a wonderful license for game devs since you’re free to use the code and only have to add an acknowledgement somewhere.
I suppose a public domain game might hit the same limitation, though as a non-lawyer I would guess the chance of anyone with standing trying to sue anyone implementing from this spec is realistically zero (though I don’t fault stb for being unwilling to roll those dice!)
And it's not as though libopus is an outlier in using a BSD license. A lot of other commonly used libraries have similar licenses; a few examples that come to mind which are likely to show up in games are zlib, curl, Lua, and SDL.
>Clean-room design is usually employed as best practice, but not strictly required by law.
As the article points out, if this was actually true then we could change the licensing on code examples found in RFCs to fix the issue, but there doesn't seem to be any actual issue here. Imagine a world where simply reading some code caused licensing issues...
[1] https://en.wikipedia.org/wiki/Clean-room_design
https://www.audiokinetic.com/en/community/blog/a-guide-for-c...
The intent of the legal language in that spec is pretty clearly that you have to use the BSD license if you copy that code, but if you merely read it to understand the spec then you don't.
I take it you mean this Opus (https://en.wikipedia.org/wiki/Opus_(audio_format)) not that Opus (https://en.wikipedia.org/wiki/Claude_(AI)).
I read almost all the way through your comment thinking there was a decent probability you were saying this new AAC encoder was written with Claude Opus.
You don’t need to be in the audio codec dev space to have heard of one of the most widespread audio codecs of the last decade (used by YouTube, WhatsApp, SoundCloud and added to WebM in 2013).
It just feels wrong when HN commenters downvote comments that try to clear up legitimate confusion.
Been around a lot longer than Claude Opus.
So maybe Core Audio will continue to be the best when TVBR is available, but I'm hopeful the new FFmpeg encoder will be "good enough", especially if more folks find and contribute problem samples to help tune it.
(1) Lossy codecs are transparent at half the file size (or less) of FLAC/ALAC.
(2) AAC (strictly, AAC-LC) is universal, where FLAC and Opus are not yet there.
I don't know what PNS is, but I bet this has been bothering someone's niche use-case for 20 years
Is 48kHz really the standard nowadays?
Abstract:
> A sampling frequency of 48 kHz is recommended for the origination, processing, and interchange of audio programs employing pulse-code modulation. Recognition is also given to the use of a 44.1-kHz sampling frequency related to certain consumer digital applications, the use of a 32-kHz sampling frequency for transmission-related applications, and the use of a 96-kHz sampling frequency for applications requiring a higher bandwidth or more relaxed anti-alias filtering. This revision further quantifies the preferred choices for higher sampling frequencies.
Edit: From my personal perspective, 44.1kHz is a legacy minor annoyance
This was of course fixed in Opus.
44100 Hz was chosen for reasons that no longer really apply (storing audio as 3 samples per line on VHS: 490 lines × 3 samples × 30 fps = 44100 samples/s).
Quality-wise both are more than enough and 99.99% of people would not be able to tell them apart in a blind test. Sample rates higher than 48kHz are only needed when you want to pitch down ultrasonic recordings (of whales, bats and other such animals, for example).
Aside from this higher than 48 kHz sample rates may have only downsides, like increased size and potential distortion in the ultrasonic frequency range that has sidebands in the audible range. Yet there is a persistent, but unscientific "more-is-better"-crowd in the HiFi-sector.
44100 was chosen for CD, as 20kHz upper limit of human hearing, doubled for Nyquist theorem, plus a 10% guard band so that anti-aliasing filters don't have to be made of magical fairy dust, plus a bit (maybe to make it relatively prime with something else in the system).
The first digital audio systems encoded the audio as a black-and-white video signal on video tapes. 44100 Hz was selected as it was the highest sampling rate achievable on both NTSC and PAL video tapes.
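The video-tape arithmetic mentioned in this thread is easy to check. A short sketch: the NTSC figures (490 usable lines, 30 frames/s, 3 samples per line) are the ones quoted above, while the PAL figures (588 lines, 25 frames/s) are the commonly cited counterpart and are an assumption here, as are the function and variable names.

```python
# Sample rates that fall out of storing PCM audio as a video signal,
# plus the "20 kHz, doubled, plus 10% guard band" estimate from the thread.
SAMPLES_PER_LINE = 3


def pcm_adaptor_rate(lines_per_frame, frames_per_second):
    """Audio samples per second when each usable video line carries 3 samples."""
    return lines_per_frame * SAMPLES_PER_LINE * frames_per_second


ntsc_rate = pcm_adaptor_rate(490, 30)  # figures quoted in the thread
pal_rate = pcm_adaptor_rate(588, 25)   # assumed PAL counterpart

# 20 kHz hearing limit, doubled for Nyquist, plus a 10% guard band.
guard_band_rate = 20_000 * 2 * 110 // 100

if __name__ == "__main__":
    print("NTSC:", ntsc_rate)
    print("PAL:", pal_rate)
    print("Nyquist plus guard band:", guard_band_rate)
```

Both video layouts land on exactly 44100 samples/s, and the guard-band estimate gives 44000, which fits the comment's "plus a bit".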
There are numerous use cases for higher sample rates that go beyond this but it's hard to talk about it without starting flame wars filled with junk science.
But all the advantages come down to professional or editing use cases. There's next to zero advantage to using it as a storage format for listening. Just like 24 bit audio (do you have an amp with 96dB SNR?).
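The 96 dB figure in the question above is roughly the theoretical range of 16-bit audio. A minimal sketch of where it comes from, assuming an ideal quantizer and counting only the ratio between full scale and one quantization step (the function name is illustrative):

```python
import math


def dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)


if __name__ == "__main__":
    for bits in (16, 24):
        print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

Each extra bit adds about 6 dB, so 16 bits gives a little over 96 dB and 24 bits a little over 144 dB. A playback chain would need a noise floor below the 16-bit figure before the extra bits could matter for listening.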
Just personally, I have seen little evidence (personally, professionally, or academically) that there is any advantage for lossless audio for consumer applications. For professional applications there are plenty, and it's endlessly tiring to convince people that "no, actually I need 96kHz for my use case."
Where the audiophiles have _some_ argument here is the design of reconstruction filters which I've heard alleged can perform better in the audible frequency range if the stop band is outside of it. But I have never personally tested this, nor cared enough to. But the theory is sound.
Whether or not it's perceptible depends on what you're measuring, though. In theory, there should be perceptual differences in sound localization if your DAC's reconstruction filter is at 24kHz vs 48kHz since it will change the group delay in a critical frequency region, where you'll get sound at >~2kHz arriving later at the lower sample rate. I think it would be extremely hard to test this though, because humans are really shitty at sound localization to begin with, and practically speaking most recorded material is processed to shit in that frequency range to intentionally decorrelate the channels for the perception of "width."
This is a truly bizarre statement. On the one hand, of course higher sampling rates are lower latency for the same block size measured in samples. But all sampling rates have (almost [0]) identical latency for the same block size measured in time and lower sampling rates allow less computation for those shorter blocks.
[0] If you are concerned about needing to know future samples in order to calculate the actual signal amplitude at a time between samples, then (a) this matters less at higher sampling rates and (b) this is at most a small number of samples and we're talking about block sizes that presumably exceed, say, 5, so this isn't really a big deal.
Also to your point, this is why compliant peak meters use a mandatory 4x upsampling at 48k.
This isn't due to latency, it's because the true peak (in the analog waveform) could be between samples.
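The inter-sample peak problem can be shown with a worst-case tone. A sketch under stated assumptions: a full-scale sine at one quarter of the sample rate, phase-shifted by 45 degrees so that every sample lands at about 0.707 of the true peak. The "4x" grid here just evaluates the same sine analytically on a denser grid; a real true-peak meter would upsample the stored samples with an interpolation filter instead. All names are illustrative.

```python
import math


def peak_on_grid(freq_hz, phase_rad, sample_rate_hz, n_samples):
    """Largest absolute value of a unit-amplitude sine seen on a sample grid."""
    return max(
        abs(math.sin(2 * math.pi * freq_hz * n / sample_rate_hz + phase_rad))
        for n in range(n_samples)
    )


SAMPLE_RATE = 48_000
TONE = SAMPLE_RATE / 4  # 12 kHz: exactly four samples per cycle
PHASE = math.pi / 4     # samples straddle the crest instead of hitting it

sample_peak = peak_on_grid(TONE, PHASE, SAMPLE_RATE, 64)
dense_peak = peak_on_grid(TONE, PHASE, 4 * SAMPLE_RATE, 256)

if __name__ == "__main__":
    print(f"sample peak:  {sample_peak:.4f} ({20 * math.log10(sample_peak):.2f} dBFS)")
    print(f"4x grid peak: {dense_peak:.4f} ({20 * math.log10(dense_peak):.2f} dBFS)")
```

The sample values under-read the waveform's real peak by about 3 dB in this case, which is the gap oversampling in a peak meter is meant to close.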
I think the advantage of lossless audio is for archival: rip once, archive as lossless; then you can reencode your library with the latest and greatest lossy encoders over time, or just use the lossless if your player can manage it, cpu and storage is less of a limiting factor for players than 20 years ago.
I don't know how many people are actually managing their libraries these days though, so I dunno if makes a huge difference.
And if your goal is latency, it makes far more sense to change the block size rather than the sample rate.
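The block-size point is just a division. A minimal sketch, counting only the time needed to fill one buffer and ignoring converter and driver overhead (an assumption; the function name is illustrative):

```python
def block_latency_ms(block_samples, sample_rate_hz):
    """Time needed to fill one processing block, in milliseconds."""
    return 1000.0 * block_samples / sample_rate_hz


if __name__ == "__main__":
    for block, rate in ((128, 96_000), (64, 48_000), (128, 48_000)):
        print(f"{block} samples @ {rate} Hz: {block_latency_ms(block, rate):.3f} ms")
```

128 samples at 96 kHz and 64 samples at 48 kHz take the same time to fill (about 1.33 ms), but the 48 kHz version processes half as many samples per second.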
> But all the advantages come down to professional or editing use cases.
That sounds about right.
Unless you also have a pretty decent monitoring system the group delay of the speakers isn't going to be consistent so the filters before them wouldn't matter all that much...
Even in that case I would have a hard time believing that any human in a blind test would be able to perceive a group delay of even 360deg above 2k...
You are talking about sub-millisecond differences in the time at which frequency content arrives at the ears; just tilting your head slightly will have a greater impact...
Link should be: https://hydrogenaudio.org/index.php/topic,129691.0.html