> I can't take all of the credit. My little robot intern (Opus 4.5) has been very helpful with the busy work, leaving me free to handle the trickier planning and implementation. ;)
Audio systems get used for more than playing back music and film soundtracks.
People use audio systems at home to play electronic instruments. People also play video games. People do all kinds of stuff.
Latency is an important factor in these things.
Even videoconferencing and podcasting: With a microphone pointed at your face and a set of headphones used for monitoring that microphone, latency matters.
(It matters more to some people than others -- some people can tolerate hearing themselves later and continue to speak just fine, while some others increasingly sound like they're having a stroke as monitoring latency goes up and eventually become unable to produce coherent strings of phonemes.)
Hello. I am the creator of this project! Nominal latency is currently 8ms, with ±1ms of variance. All output channels are phase-locked, so this doesn't present a problem for multi-way crossover implementations.
I wonder if 264/520 kB RAM is also enough for a high quality parametric stereo reverb/echo effect? Should fit about 3/6 seconds of uncompressed 16-bit 44.1/48 kHz audio.
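As a rough sanity check on those figures, here is a minimal sketch of the arithmetic. It assumes the whole SRAM is available as a delay buffer (it isn't, since the firmware needs some of it) and that "kB" means 1024 bytes; `buffer_seconds` is a made-up helper name.

```python
def buffer_seconds(ram_bytes, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Seconds of uncompressed PCM audio that fit in ram_bytes."""
    bytes_per_second = sample_rate_hz * bytes_per_sample * channels
    return ram_bytes / bytes_per_second


# Assumed sizes: 264 KiB and 520 KiB of SRAM, 16-bit samples.
small_ram = 264 * 1024
large_ram = 520 * 1024

for label, ram, rate in (("264 kB @ 44.1 kHz", small_ram, 44100),
                         ("520 kB @ 48 kHz", large_ram, 48000)):
    mono = buffer_seconds(ram, rate)
    stereo = buffer_seconds(ram, rate, channels=2)
    print(f"{label}: {mono:.2f} s mono, {stereo:.2f} s stereo")
```

So the 3/6 second estimate holds for a single 16-bit channel; a stereo buffer gets about half of that.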
Also: Raspberry Pi Ltd - please keep increasing the RAM size in future iterations to unlock even more use cases.
Default specs matter a lot for worldwide availability and affordability, as well as for the willingness of people to spend a lot of time creating free software for it.
520KB of SRAM is actually on the high end for microcontrollers. It doesn't seem like much but SRAM is on-die and significantly lower density than DRAM. For comparison, it's the same type of memory used for CPU caches, which are also small!
You can easily find dev boards with 8MB of PSRAM online if you need it. Or you can buy the PSRAM and hook it up yourself. If you still need more memory than that then you're looking at the wrong chip for the job.
I have always wondered what kind of bandwidth you could get from multiple channels of PSRAM driven by PIO/DMA. Individually they're not so speedy (although the APS6408L-OCH-BA seems pretty crazy), but how many can you run simultaneously? In terms of the RP2350, it would be fascinating to see how many times a second you could replace the entire contents of SRAM.
PSRAM is a possibility that I have explored for offloading the delay line buffers, which occupy quite a significant chunk of SRAM at the moment. It should be fast enough.
Yes, I was thinking of it more like bank switching.
Although, going back to the start of the thread, where the suggestion was adding more RAM to future chips, perhaps the request could be for support for multiple channels in the future.
It's the age-old question of parallel vs. serial vs. multi-channel serial.
> high quality parametric stereo reverb/echo effect
I’m sometimes annoyed that the home audio/audiophile world is so separate from the live/professional world.
For playing recordings with fancy effects, you can throw massive overkill CPUs at it with small batches, BruteFIR-style, or you can do high-latency FFT filters, and you can get essentially perfect FIR reverb effects with a latency vs. complexity tradeoff.
But the algorithm in the middle exists and is not that exotic. You divide your impulse response into a very short piece at the beginning, then a longer piece after that, then a longer piece after that, in exponentially increasing pieces. And then you add up the results, with straight addition and multiplication for the short one, and (carefully scheduled to avoid stalls) FFT convolution for the long ones, and you get basically arbitrarily long FIR filters with logarithmic amortized complexity per sample and as low as zero-sample latency if you are so inclined.
I think this is called “non-uniform partitioning” or something to that effect. I’m not aware of any serious, public implementation for audio use.
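To make the partitioning idea concrete, here is a minimal offline sketch. It only demonstrates the identity the scheme rests on: split the impulse response into doubling-size pieces, convolve each piece separately (direct for the first, FFT for the rest), delay each result by its piece's offset, and the sum equals the full convolution. It is not a real-time implementation, there is no block scheduling, and all function names are made up for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft_convolve(a, b):
    """Linear convolution via zero-padded FFTs."""
    size = len(a) + len(b) - 1
    n = 1
    while n < size:
        n *= 2
    fa = fft(list(a) + [0.0] * (n - len(a)))
    fb = fft(list(b) + [0.0] * (n - len(b)))
    y = fft([p * q for p, q in zip(fa, fb)], inverse=True)
    return [v.real / n for v in y[:size]]


def direct_convolve(a, b):
    """Plain multiply-and-add convolution."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out


def partition_sizes(total, first=4):
    """Piece lengths that double each time; the last piece is truncated."""
    sizes, size, remaining = [], first, total
    while remaining > 0:
        take = min(size, remaining)
        sizes.append(take)
        remaining -= take
        size *= 2
    return sizes


def partitioned_convolve(signal, impulse, first=4):
    """Sum of per-piece convolutions, each shifted by its piece's offset."""
    out = [0.0] * (len(signal) + len(impulse) - 1)
    offset = 0
    for idx, size in enumerate(partition_sizes(len(impulse), first)):
        piece = impulse[offset:offset + size]
        # Short head: direct form. Longer tails: FFT convolution.
        if idx == 0:
            conv = direct_convolve(signal, piece)
        else:
            conv = fft_convolve(signal, piece)
        for i, v in enumerate(conv):
            out[offset + i] += v
        offset += size
    return out
```

In a streaming version the head piece is what lets the first output sample appear immediately, while each longer piece has until its own offset to deliver its contribution; that scheduling is the hard part and is not shown here.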
Tangentially related, I recently had some hand-me-down high-end full tower speakers lose their integrated subwoofer amps. I bypassed them and wired in an external amp but people said the integrated DSP would be missing. That's when I learned about CamillaDSP [1] and CamillaFIR [2]. I got a calibrated UMIK-1 microphone and did a frequency sweep in the room. Then I applied the Camilla-computed FIR filter to my snapcast-sourced music stream on the Raspberry Pi 3 B I have networked into the living room. Now I have room-corrected and loudspeaker corrected fancy DSP and the speakers sound better than ever. Pretty fun, and very cheap. The Pi3 runs it using about 20% of its CPU. Not bad! I did the same process up in my office with some desk speakers and they sound great too (that time using EasyEffects to apply the filter in real-time rather than CamillaDSP).
FWIW, I've tried Dirac Live and compared it to the correction suggested by REW [0]. In both cases, the measurements were taken with a UMIK-1, and the correction was done on a computer. Contrary to GP, I didn't have to fix borked components, just a random, untreated living room.
Dirac seemed to have a fairly heavy-handed correction. In my case, I only had fairly narrow frequency ranges that needed correcting, but Dirac seemed to move much wider ranges at a time. It's also nearly impossible to tweak; you basically can only increase/decrease "the lows" or "the highs". But maybe I'm missing something.
In contrast, the suggestions produced by REW were loaded in EasyEffects on Linux, and I could tweak everything to my heart's content. But I actually just left it alone, since it was good enough.
I also have a UMIK-1, and tried the REW route once, but it made everything worse. I suspect a lot of the know-how in Dirac is how to automatically get good results.
In my case, the setup is pretty simple. I have full-range floorstanders that only take a single input, and I mostly wanted to control some booming in my listening position. So there's no crossover to handle or anything fancy.
Maybe for more involved situations Dirac does a better job, but, in my case, it didn't really solve anything. Also, I see they now have this newer "bass control" thing, and it's not clear if my version had it when I last tested it (around November 2025).
It's equal parts science and art. Best left as a last resort, never a shortcut. These utilities are generally directed towards much louder systems, in bigger spaces, with far more than a few speakers. Everything that has bled down into the consumer space is a band aid for people who either can't or don't know/care/want to treat their rooms and position speakers correctly.
Ideally you want to be going into it intending to correct a specific aspect of the room or the speakers, after already ensuring that you've placed the speakers correctly for the room and listening position. If you did not use a tape measure and the full dimensions of the speakers, start over. One of the most useful things REW/Dirac can do for you is confirm that you've placed everything correctly. It is not a magic "make it sound better" utility.
Hate to sound like an ad but the most impressive thing I've purchased wrt audio in the past 20 years has been some isolators from https://isoacoustics.com/. It's legit engineering magic, you will spend the first hour thinking something is wrong with your body because you can no longer feel the sound.
> a band aid for people who either can't or don't know/care/want to treat their rooms and position speakers correctly.
Indeed, but I'd bet many people are in the "can't" category. Especially for low frequencies, you need pretty hefty treatment to make a difference, which is oftentimes impractical to install in a room which wasn't designed for that. And I seriously doubt any sizable number of rooms in apartments are designed for that. Combine this with the ungodly amount of snake oil peddled, and I can easily understand why many people look at Dirac and similar solutions.
And while they are band-aids, in many cases that's enough. I used to live in a studio apartment where room correction made a night and day difference to my listening position. Elsewhere the sound wasn't that great, but I didn't really care since I never listened from there. I was renting, and the space was rather small, so there was no way to install any useful treatment.
In my current apartment it works much worse, it's actually close to useless. But it's rather bigger, so I could put in some treatment. But I've spent a lot of time researching this, and it's still not clear how to go about doing this. People can't even seem to agree on what kind of material to look at. And while I love listening to music, I'm not keen on investing thousands, plus time living under construction for weeks just to throw multiple solutions at the walls and see what sticks.
I've done quite extensive testing with Dirac (with a MiniDSP Flex), rePhase, normal PEQs, BruteFIR, CamillaDSP, etc.
Dirac is the most user-friendly of the bunch, but honestly, once you limit the correction to below the Schroeder frequency, I cannot tell them apart. So for my systems I just stick to a few PEQs targeting the main peaks under 300 Hz.
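For anyone wondering what a single PEQ band amounts to, here is a minimal sketch of one peaking biquad using the widely circulated RBJ "Audio EQ Cookbook" formulas. The 60 Hz / -6 dB / Q=4 values are made-up example settings, not measurements from anyone's room, and the function names are illustrative only.

```python
import cmath
import math


def peaking_eq(f0_hz, gain_db, q, fs_hz):
    """Return (b, a) biquad coefficients for one peaking band, a[0] == 1."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, f_hz, fs_hz):
    """Magnitude response of the biquad at f_hz, in dB."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


# Hypothetical example: tame a 60 Hz room peak by 6 dB at 48 kHz.
b, a = peaking_eq(60.0, -6.0, 4.0, 48000.0)
for f in (30, 60, 120, 1000):
    print(f"{f:5d} Hz: {response_db(b, a, f, 48000.0):+.2f} dB")
```

The band applies the full cut at its center frequency and relaxes back toward 0 dB away from it, which is why a handful of these aimed at the worst low-frequency peaks can be enough.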
Ah that’s super cool. Wish I knew about this a week earlier. Just last week I got the iLoud sub to correct speakers for my living room because I wanted a standalone piece of equipment that’s not my PC that can hold the corrected EQ/phase.
I use a UCA202 for the same purpose. Does yours output static sometimes when it sits for too long? Based on my testing this seems to be a Linux thing instead of a Behringer thing.
I wonder if you could do the same thing in reverse and have a cheap way to get multiple inputs.
I would love a cheap way to add 8–16 inputs to my PC; all the audio interfaces I found cost quite a bit.
Even worse, the ENOB is closer to 9 bits in testing. It’s got horrible DNL/INL. Totally worthless for any audio unless you’re trying to do chiptunes or something.
Yes, but this project doesn't do anything analog to begin with. It could just have several S/PDIF and I2S inputs, and convert that to USB. You probably don't want any processing then, and just pass the digital inputs straight to USB. The limit of how many channels you could simultaneously process would then be the USB bandwidth.
The analog input will use separate ADC modules, just as the analog output uses separate DACs. DSPi itself is purely digital (OK, excepting the PWM-based sub out). These modules cost just a few dollars on AliExpress and offer ~96 dB SINAD.
The Topping Pro audio interfaces have ludicrously good inputs. The E8x8 has eight analog ins and eight outs plus more connectivity for $450. It is very cheap for what you get. The inputs are crazy good. $450 is also a good chunk of cash, so…
For the $450 you get a lot of stuff. Preamps for mic and guitar pickups. Powerful headphone amp. It's clearly worth it if you make use of some of it, and potentially even just for the inputs alone. $450/8 = $56 per ludicrously clean input is good.
I bought an E1x2 kind of as a joke. Just to see how bad it was. It's actually really, really good.
And also:
It's actually possible to gang together multiple disparate audio interfaces. Let the audio stack keep them in sync with ASRC. Aggregate Device on macOS can do this. People say you can't but you can. Linux is good for this too. If you find a cheaper per channel input, this can actually be done; Piecemeal it.
Thanks for the suggestion. I was hoping for something cheaper since I don't need really high quality. For now, I'm using a bunch of cheap USB soundcards that are good enough, but having multiple USB devices makes routing hell.
A Behringer UMC1820 does that combination of things (cheap, lots of analog IO, PC interface) very well. It provides 8 inputs OOTB.
For more inputs, a Behringer ADA8200 can be connected with a garden-variety TOSLINK cable, bringing the total to 16.
Or: Two UMC1820s, clocked together using that same TOSLINK cable. That provides 16 inputs that are all identical and also operating in lock-step.
In terms of cost: A smart way to play with this stuff is to buy used gear, and treat eBay as a long-term rental program. Just buy it, use it, and when you want to try something different: Sell it. It works because the depreciation on stuff like this is basically a straight line once the initial hit of turning "new" into "used" gear is over with.
The long-term rental cost then is mostly a combination of time, shipping expense, and seller fees. Keep it as long as you want. :)
edit: alright. so the UMC1820 is apparently having production issues right now, which constrains supply, so prices are higher than normal. On a normal day, they sell for $229 new. I've bought them for ~$100 used. Things will go back to normal soon enough.
Pi or pi pico? At first glance it looks like that software is designed for double precision floats. That would certainly be some compute. The M0+ doesn't have hardware floating point let alone double precision. The M33 on the newer chip I think has hardware single precision float so a simple find-replace should let it go.
If it's not doing anything else and the sample rates aren't outrageous it might be doable but I'd have to dig into the code more to see how much work they're doing per sample.
Looking through the GitHub and the AudioScienceReview link - this appears to be specifically about firmware features. You'd need to ensure that the hardware inputs on the device have an input impedance of at least 250 kΩ, probably closer to 1 MΩ, to prevent loading and signal loss if plugging a guitar right in. I'd also assume (didn't see confirmation) that I/O is at line level, which is significantly higher than instrument (passive guitar) level, but this device can clearly add/adjust gain along the way. If you use active pickups with a built-in preamp like EMGs, it would probably work just fine.
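To put rough numbers on the loading point, here is a simplified sketch that treats the pickup as a plain resistive source feeding the input as a voltage divider. The 10 kΩ source value is an assumption for illustration only; real passive pickups are inductive, so the actual loss varies with frequency and tends to hit the highs first.

```python
import math


def loading_loss_db(source_ohms, input_ohms):
    """Level change from a resistive source driving a resistive input."""
    return 20 * math.log10(input_ohms / (source_ohms + input_ohms))


ASSUMED_PICKUP_OHMS = 10e3  # hypothetical, purely resistive model

for z_in in (10e3, 250e3, 1e6):
    loss = loading_loss_db(ASSUMED_PICKUP_OHMS, z_in)
    print(f"input {z_in / 1e3:6.0f} kOhm: {loss:+.2f} dB")
```

Under this model an input impedance equal to the source costs about 6 dB, while 250 kΩ and up costs only a fraction of a dB.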
Since a Raspberry Pi Pico doesn’t have built-in audio output ports, I think the main thing blocking ordinary people from using it is figuring out the hardware? A link to a tutorial for how to add audio output would be useful.
For USB input and SPDIF output, all that you need is a TOSLINK TX module(s) or a couple of capacitors and a resistor if you want coaxial SPDIF. For I2S output, the PCM5102A modules that you find on Amazon work very well, with very reasonable performance (SNR >100dB, THD+N ~95dB).
For 2.1 configurations in a pinch, the firmware includes a software DAC that's more than adequate to drive a subwoofer, so only one external DAC is needed.
Maybe start by omitting the part in the first paragraph about how it acts as a USB sound card. :)
I mean, I know what you meant, but that's pretty misleading phrasing for many people.
It's not much of a stretch to think that most people interpret a "USB sound card" as a thing with analog audio on one side and USB on the other side. But other than the subwoofer output, we don't have any analog IO on a Pico running this firmware.
A sampling rate of 192kHz is overkill. And 192kHz exists as a sample rate in the audio world because it is overkill.
With a Nyquist frequency of 96kHz, all of the arguments about whether a person can hear up to e.g. 22.05kHz, 24kHz, or if there's something meaningful all the way up at 48kHz, become completely and totally moot.
Those arguments were always such tiresome ordeals.
The cost of dissolving those arguments is just some bandwidth and CPU cycles -- which is to say, it costs approximately nothing.
Oh, it's worse than that: for distribution and playback, sampling at more than 48kHz is likely worse in many ways due to unwanted ultrasonic noise and increased intermodulation distortion. 96/24 makes sense for production, and 96/float56 is common in DSP chains.
When the production produces unwanted ultrasonic noise, then that's not a sampling rate problem. It is instead a production problem.
And that's perfectly OK, too: The neat part about having too much data is that other end-users (like you and me) are free to throw it away as expeditiously as we choose to.
To that end: I, for one, welcome our 192kHz overlords. (And then I'll shove it through my hardware DSP that operates at 24-bit 48kHz and fuhgettaboutit.)
I don't LISTEN to music in 192kHz. I listen in 48kHz like everyone else and it sounds perfectly fine. I do MIX my music in 192kHz, however, before its final export to 48kHz. It is about the anti-aliasing principle I described in my post above. But while I'm mixing, my audio clock is at 192kHz, and I can't escape that. Hence I will be looking at how to run this project on a beefier device that could run at a 192kHz sample rate.
In my personal experience as a music producer for the last 36 years, MIXING hundreds of channels benefits enormously from the available bandwidth. Think of it in terms of graphics (anti-)aliasing. If you open your canvas in 1920x1080, for example, and draw a diagonal line, your line will be jagged (aliased) to a certain extent. If you, on the other hand, start a canvas in 7680x4320 and draw the same diagonal line, and then rescale your output back to FullHD, your line will be perfectly smooth with no visible aliasing whatsoever. It is absolutely the same principle when mixing music: I MIX everything in 192kHz and I PUBLISH in 48kHz. And, yes, my ears can hear the difference perfectly fine. But do people like me, who are forced to run their audio clock at 192kHz most of the time, deserve a DSP processor like this? It could be very useful, yes.
We've been using Nyquist's work and anti-aliasing filters for nearly as long as we've been using digital audio at all.
Your DAW (or whatever) may be able to show you the stairsteps of individual samples on a screen, but with a functional playback system it is never that way at all by the time things become analog again. Instead, it's always smoothed out by the DAC's reconstruction filter (often loosely called an anti-aliasing filter).
It works this way regardless of sampling rate. The stairsteps don't make it outside of number-land. You can run your DAW at 48KHz, 96KHz, or 192KHz, and signals below the least-common-denominator cutoff frequency will be identical on an oscilloscope -- and free of stairsteps. (Try it sometime. It's fun.)
Aliasing is a solved problem that has been solved for a long time. Your analogy about scaling and diagonal lines is actually a decent visual representation of how this stuff works, except it has already been working that way without being deliberately clever with overkill sampling rates.
Meanwhile: This Pi Pico DSP stack is structured very heavily towards being the last digital stage of a listening system. As-constructed, it's quite clearly evident that it is really not meant to be anything else. A person can certainly bend it to be other things (yay open source!), but you've probably already got a set of filters well-integrated into your existing toolchain that work superbly.
But if that's what you want, then by all means: Use it. Integer sampling rate conversions are trivial operations to get correct. To get the 96kHz that this project works with from your 192kHz workflow, it's just a matter of throwing away half of the samples and playing back whatever remains. Any aliasing is out-of-band, and is removed by the anti-aliasing filter that is part of the digital-to-analog stage.
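For the curious, here is a minimal sketch of that 2:1 conversion. One hedge: if the 192 kHz material carries energy between 48 and 96 kHz, simply dropping samples folds it below 48 kHz in the 96 kHz stream, so resamplers commonly low-pass first. The 31-tap Hamming-windowed sinc below is an arbitrary illustrative choice, not what this project or any particular DAW does, and the function names are made up.

```python
import math


def lowpass_taps(num_taps=31, cutoff=0.25):
    """Hamming-windowed sinc low-pass; cutoff is in cycles per input sample."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        if x == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]  # normalize to unity gain at DC


def fir_valid(samples, taps):
    """FIR filter output for positions where the taps fully overlap the input."""
    n = len(taps)
    return [sum(taps[k] * samples[i + n - 1 - k] for k in range(n))
            for i in range(len(samples) - n + 1)]


def drop_half(samples):
    """Keep every second sample (192 kHz -> 96 kHz)."""
    return samples[::2]


def tone(freq_hz, fs_hz=192000, count=512):
    return [math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(count)]


def peak(samples):
    return max(abs(s) for s in samples)


taps = lowpass_taps()
ultrasonic = tone(80000)  # lands at 16 kHz if samples are dropped unfiltered
audible = tone(1000)
print("80 kHz, drop only:      ", round(peak(drop_half(ultrasonic)), 3))
print("80 kHz, filter + drop:  ", round(peak(drop_half(fir_valid(ultrasonic, taps))), 3))
print("1 kHz, filter + drop:   ", round(peak(drop_half(fir_valid(audible, taps))), 3))
```

If the mix has nothing meaningful above 48 kHz to begin with, the plain drop and the filtered version give effectively the same result in the audio band.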
Neat! I've been using a Teensy 4 for some of these things recently. The Teensy Audio Library is pretty good, but even though it is open source, it is pretty well tied to the Teensy hardware.
Also, for those watching for it: https://www.audiosciencereview.com/forum/index.php?threads/i...
https://github.com/WeebLabs/DSPi/commit/ba8e481570e6a5ce3d35...
The end-to-end delay is about 10ms, according to this comment:
https://www.audiosciencereview.com/forum/index.php?threads/i...
[1] https://github.com/HEnquist/camilladsp
[2] https://github.com/VilhoValittu/CamillaFIR
---
[0] https://www.roomeqwizard.com/
I have one and personally didn't bother, did the usual UMIK-1 + REW to create the room correction.
> https://www.minidsp.com/products/dirac-series/index.php?opti...
The loudspeaker would have used one; a driver is both cheaper and of higher quality.
https://www.raspberrypi.com/news/upcycle-a-sonos-play1/
There are other projects for the Pico which implement S/PDIF in.
In either case, since it is digital, the quality (or lack of) of the internal ADCs should not matter.
The cheapest option is probably some Behringer mixer with enough inputs and multitrack interface over USB, like XR18.
https://topping.pro/E8x8-Pre/
What are the odds a Raspberry Pi could keep up with BTrack?
https://github.com/adamstark/BTrack
But there seem to be new features being planned all the time, so who knows what it might do in the future.
There will also be an official plug-and-play custom board that includes all of the relevant IO, connectors and codecs.
I had a project in mind that was waiting for something like this! :)
It is not 100% plug and play as you can choose your own software.
A custom board sounds great, too.
Please let the man cook. :)
You can't hear it if it isn't there. :)
https://github.com/WeebLabs/DSPi/blob/main/Documentation/Roa...