Designing a signal processing environment (RFC) - part 1
This is the first post in the “thinking out loud” series about SignalDub - the signal processing framework and modular environment I’m developing.
What’s the functionality I need most when processing audio signals in-the-box (i.e. using a computer)? Dubbing. It has multiple meanings, and all of them are valid in this context:
- adding effects in a creative way, using them as an instrument (delay and reverb in particular)
- experimenting, using devices not-always-the-way-they-were-designed
I feel that not only music producers and sound engineers, but also visual artists, broadcast engineers… anyone working with multimedia deserves a system that allows such experimentation.
Being open source is a pretty obvious requirement. You can’t have a hackable system if its core can’t be changed. And if the source code is kept secret, the knowledge used to develop it is wasted.
Most audio software today is modular. You can use whatever DAW you want with your soundcard, and insert whatever plugin you want on a track in the DAW. But have you ever tried to insert an EQ on a delay’s feedback path? Or distort it?
In the analog world - easy. Instead of using the delay’s feedback knob, make a return->send loop on the mixing console and insert whatever you want. Or open the delay unit, solder two more sockets, and you have a custom signal path.
In a computer - not so easy. The DSP engine will tell you that you have a loop and refuse to run, or it will miscalculate latencies. If it somewhat works, you’ll have one block of latency (32-1024 samples, depending on how low you can go without overloading the CPU and causing glitches) in the loop path.
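To make the contrast concrete, here is a minimal sketch - in Rust, my choice of language, not SignalDub’s actual engine - of a delay with a filter inserted in its feedback path, processed sample by sample. Because each sample’s feedback is computed immediately, the loop latency is just the delay length, independent of any audio block size:

```rust
/// One-pole low-pass filter, inserted in the feedback path as an example
/// of "insert whatever you want" inside the loop.
struct OnePole {
    state: f32,
    coeff: f32,
}

impl OnePole {
    fn process(&mut self, x: f32) -> f32 {
        self.state += self.coeff * (x - self.state);
        self.state
    }
}

/// A delay line with a filter in its feedback path, evaluated per sample.
struct FilteredDelay {
    buffer: Vec<f32>,
    pos: usize,
    feedback: f32,
    filter: OnePole,
}

impl FilteredDelay {
    fn new(len: usize, feedback: f32, coeff: f32) -> Self {
        Self {
            buffer: vec![0.0; len],
            pos: 0,
            feedback,
            filter: OnePole { state: 0.0, coeff },
        }
    }

    fn process(&mut self, input: f32) -> f32 {
        let delayed = self.buffer[self.pos];
        // The filtered, attenuated output is fed back immediately, so the
        // loop latency is the delay length - not the audio block size.
        let fed_back = self.filter.process(delayed) * self.feedback;
        self.buffer[self.pos] = input + fed_back;
        self.pos = (self.pos + 1) % self.buffer.len();
        delayed
    }
}

fn main() {
    // Feed an impulse into a 4-sample delay and watch the filtered echoes.
    let mut delay = FilteredDelay::new(4, 0.5, 0.3);
    for n in 0..12 {
        let x = if n == 0 { 1.0 } else { 0.0 };
        println!("{n}: {:.4}", delay.process(x));
    }
}
```

A block-based engine would instead read a whole block of delayed samples before writing any feedback back, which is exactly where the extra block of loop latency comes from.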
What to do? We can easily go down to 1 sample of latency even when there are loops, but if each module is a different plugin, that will increase the overhead of running their code and passing data between them. We’ll talk about overcoming inter-plugin overhead in the next part.
Going down to 0 samples - no loop latency at all - is also possible, and desirable if we want to replicate the features of analog gear, but it’s more complicated mathematically.
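To see why zero-delay feedback is harder, consider the simplest instantaneous loop, y[n] = x[n] + k·y[n]: the output appears on both sides, so the engine has to solve an equation rather than just evaluate one. A Rust sketch (illustrative names, not SignalDub code) comparing the closed-form solution with a naive fixed-point iteration:

```rust
/// Closed-form solution of y = x + k*y, i.e. y = x / (1 - k).
fn solve_closed_form(x: f32, k: f32) -> f32 {
    x / (1.0 - k)
}

/// Naive alternative: iterate y <- x + k*y until it settles (needs |k| < 1).
fn solve_fixed_point(x: f32, k: f32) -> f32 {
    let mut y = x; // start from the "one sample late" answer
    for _ in 0..50 {
        y = x + k * y;
    }
    y
}

fn main() {
    let (x, k) = (1.0, 0.5);
    // Both approaches agree: the instantaneous loop settles at 2.0.
    println!("closed form: {}", solve_closed_form(x, k));
    println!("fixed point: {}", solve_fixed_point(x, k));
}
```

For a linear loop the closed form is trivial; the real mathematical work starts when the feedback path contains nonlinear devices, where only iterative or implicit solvers remain.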
This next idea doesn’t have a hardware/analog counterpart - it even contradicts that approach - but it comes from a logical assumption: computers should serve humans (never the other way around) and save them from doing the same thing again and again, because that’s just boring. That’s how good programmers and systems administrators think, and that’s why various programming paradigms (procedural, functional, generic) were invented.
How to do it in practice? Let’s come back to modularity. Let users create their own modules, as if they took a few guitar effect pedals, put them in a rack case and brought the knobs and buttons out to the rack’s front panel. But in the digital world we can clone the finished device instead of buying new stomp boxes and doing all the work each time a new unit is needed. Goooood for experimental use, isn’t it?
And you don’t need to be a programmer to make your own virtual devices!
I’m quite surprised how unpopular this concept is in artistic software.
I’m under the impression that most vendors of commercial DAWs don’t test them enough in the field. Otherwise they would realize that during a spontaneous jam session, the ability to record the signal - the DAW’s output, or the soundcard’s input if an external mixer is used - is essential. Yes, recording audio is a feature that exists in every DAW, but combine it with using the DAW for looping or arranging, and it becomes either impossible or not-so-obvious, requiring reading the manual or watching tutorials - the last thing you want when you’re in a creative flow. And you end up plugging a portable recorder into the mixer’s headphone output…
No problem like that would even exist if the system were really modular and allowed running multiple instances of a DAW simultaneously, sharing the same soundcard of course. The Linux audio ecosystem has allowed this for years, using the JACK audio server. Having multiple timelines & independent transports within a single DAW would also work, and would have other artistic use cases.
Recording the master output is often good enough, but what if we want more freedom when making remixes afterwards? Multitrack stems are the answer. What if we want to change the timbre but keep all the notes played? MIDI recording is the answer. What if certain virtual knobs of virtual devices are manipulated live? Automation recording comes into play. A good DAW should record audio, MIDI and automation without requiring the user to manually arm each track and parameter that needs to be recorded.
Changes within the project made during a live performance (adding/removing tracks or other elements, changing the routing) should also be recorded. This seems hard, but if we treat every change as an event which triggers an action within the host, the events can be recorded just like notes played. An open question is how to visualize such changes in the timeline GUI.
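As a sketch of that event-driven idea (all names here are hypothetical, not an actual SignalDub API), structural edits can live in the same timestamped log as notes and parameter changes:

```rust
/// Musical and structural changes share one event type, so a live
/// performance can be replayed change-for-change.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    NoteOn { track: u32, pitch: u8, velocity: u8 },
    ParamChange { device: u32, param: u32, value: f32 },
    // Structural edits are recorded exactly like musical events:
    AddTrack { track: u32 },
    Connect { from: u32, to: u32 },
}

/// One entry in the project's event log, stamped in samples since start.
#[derive(Debug, Clone, PartialEq)]
struct Timestamped {
    time_samples: u64,
    event: Event,
}

#[derive(Default)]
struct EventLog {
    events: Vec<Timestamped>,
}

impl EventLog {
    fn record(&mut self, time_samples: u64, event: Event) {
        self.events.push(Timestamped { time_samples, event });
    }

    /// Replay everything up to `until`, handing each event to the host.
    fn replay(&self, until: u64, mut apply: impl FnMut(&Event)) {
        for e in self.events.iter().filter(|e| e.time_samples <= until) {
            apply(&e.event);
        }
    }
}

fn main() {
    let mut log = EventLog::default();
    log.record(0, Event::AddTrack { track: 1 });
    log.record(480, Event::NoteOn { track: 1, pitch: 60, velocity: 100 });
    log.record(960, Event::ParamChange { device: 7, param: 0, value: 0.5 });

    let mut replayed = 0;
    log.replay(500, |_| replayed += 1); // replays the AddTrack and the NoteOn
    println!("events replayed: {replayed}");
}
```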
But do we need to record audio output of virtual synthesizers and effects chains if we have their whole input (notes, parameter changes) saved? It depends.
It would be nice to:
- save only a virtual device’s input signals (including control signals, such as virtual knob turns) and avoid spending disk space on its output signals
- be able to open the saved project after 50 years and get exactly the same sound
Fortunately, it is possible, if we constrain what the virtual device can do and how it’s made.
- First, it must not communicate with the outside world using methods that the host has no control over.
- Second, it must not generate (pseudo)random numbers on its own. If it needs them, the host will provide them, generating pseudorandom numbers deterministically from a seed saved with the project.
- Finally, the device must be programmed using technologies that will be in use as long as computers are in use.
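The second constraint could look something like this sketch, where the host owns a seed saved with the project and uses a small, platform-independent generator (xorshift64*, chosen here purely for illustration - none of these names are SignalDub APIs), so every render reproduces identical “random” values:

```rust
/// Host-owned random stream; the seed is part of the saved project file.
struct HostRng {
    state: u64,
}

impl HostRng {
    fn from_project_seed(seed: u64) -> Self {
        Self { state: seed.max(1) } // xorshift state must not start at 0
    }

    /// xorshift64* step: deterministic, platform-independent integer math,
    /// so it yields the same bits on any CPU architecture.
    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.state = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }
}

fn main() {
    // Two renders from the same saved seed produce identical streams,
    // which is exactly the bit-perfect reproducibility we want.
    let mut render_a = HostRng::from_project_seed(42);
    let mut render_b = HostRng::from_project_seed(42);
    for _ in 0..5 {
        assert_eq!(render_a.next_u64(), render_b.next_u64());
    }
    println!("both renders agree");
}
```

Floating-point determinism across compilers is a separate, harder problem; integer-based generators like this one sidestep it entirely.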
We also need the ability to save the project as a self-contained archive that can be extracted and edited any time in the future - but if it’s not edited, the rendered output must be bit-perfect identical to the render the day it was saved.
Unfortunately, this means that the .so or .dll LV2/VST binaries of many existing plugins are out of the question, since they are just binaries… dependent on a CPU architecture and on operating system calls. However, if the DSP algorithm is deterministic, the plugin is cross-platform and it doesn’t use external dependencies, it could possibly be compiled after 50 years, for whatever CPU architecture we’ll use then - provided the plugin API doesn’t bitrot. Which brings us to…
We probably need a new plugin API. I’ll tell you why in the next part.
Rust’s package manager, cargo, is a good example: libraries are distributed as source code, and an exact code version, confirmed by a cryptographic hash of the source, is compiled on demand when a dependent project needs it. See how it fits the digital preservation requirement?
What exactly will SignalDub be? I don’t precisely know. Not yet.
Maybe a library for realtime signal processing, like TensorFlow is for deep learning.
Maybe a modular environment like PureData or BespokeSynth.
Maybe a Digital Audio Workstation app. Or a framework that someone will use to write a DAW.
Maybe all of the above.
But what I know is that it’ll help me and like-minded music producers and sound engineers do cool things.
Stay tuned. In the following parts, we’ll think about how the signal rendering engine should work internally, and we’ll talk about user interface considerations.
This is a Request for Comments, so if you think you know how to implement some of the ideas I’ve written about, or how to improve them, write about it!
The comment system is not ready yet (it’s not easy if you don’t want to be cloud-dependent), so for now please write to
comments at lumifaza dot org and I’ll put your comments here manually, answering questions if necessary. You can also reach me on unfa’s RocketChat (username