How To Build An Audio Processor In Your Browser

Audio Spatializer, submission for BrickHack4

Some Background Information

I spent last weekend hacking at RIT’s BrickHack 4. Normally, I love to spend my hackathons building multiplayer games, but this time, I wanted to try working on an idea that I’d had in the back of my TODO list for a while. This idea came to me while I was listening to some nightcore (a remixed track of a song that speeds it up, increases its pitch, and puts more emphasis on a strong beat). Specifically, I was listening to one on YouTube.

If you skip to the 0:36 mark of that video, you can hear an interesting audio effect if you’re wearing headphones. The beat jumps between the left and right ear (this also happens at a few other points in the video). This was such an interesting effect that I wondered if it was possible to create it algorithmically for other songs. My end goal for this project was to be able to take some dubstep and use stereo effects to switch the beat between the left and right ear and create an immersive listening experience.

During the hackathon, I worked with my friend Andrew Searns to try and create a web application to do this (Andrew also showed me other interesting songs where the artist played around with stereo effects). While it wasn’t as successful as we expected, we had a lot of fun coming up with an architecture to do this, and I personally learned quite a bit of CSS Flexbox while designing the application interface. In this techsploration, I’ll take you through my thought process and show you how to generalize the code we wrote to build an audio processing framework in the browser.

The Audio Streaming Architecture

Okay, let’s figure out what we need to make this work. Ideally, we want users to simply paste a YouTube link into our application, which will handle streaming the audio, spatializing it, and then playing it. Obviously, the first thing we’ll need is some way to do that audio streaming.
I’ll discuss how we set that up in this section, and we’ll talk about the audio spatialization part later.

Because we wanted to develop and iterate quickly, we used node.js (also because I happen to like node.js very much). During the hackathon, we found two well documented ways to do audio streaming from YouTube. O̶n̶e̶ ̶o̶p̶t̶i̶o̶n̶ ̶w̶a̶s̶ ̶t̶h̶e̶ ̶p̶a̶c̶k̶a̶g̶e̶ ̶y̶o̶u̶t̶u̶b̶e̶-̶a̶u̶d̶i̶o̶-̶s̶t̶r̶e̶a̶m̶,̶ ̶a̶n̶d̶ ̶t̶h̶e̶ ̶o̶t̶h̶e̶r̶ ̶o̶p̶t̶i̶o̶n̶ ̶w̶a̶s̶ ̶t̶h̶e̶ ̶p̶a̶c̶k̶a̶g̶e̶ ̶y̶t̶d̶l̶-̶c̶o̶r̶e̶ ̶(̶w̶h̶i̶c̶h̶ ̶y̶o̶u̶t̶u̶b̶e̶-̶a̶u̶d̶i̶o̶-̶s̶t̶r̶e̶a̶m̶ ̶a̶c̶t̶u̶a̶l̶l̶y̶ ̶d̶e̶p̶e̶n̶d̶s̶ ̶o̶n̶)̶.̶ ̶A̶f̶t̶e̶r̶ ̶a̶ ̶l̶o̶t̶ ̶o̶f̶ ̶i̶t̶e̶r̶a̶t̶i̶o̶n̶ ̶a̶n̶d̶ ̶t̶e̶s̶t̶i̶n̶g̶,̶ ̶w̶e̶ ̶s̶e̶t̶t̶l̶e̶d̶ ̶o̶n̶ ̶u̶s̶i̶n̶g̶ ̶y̶o̶u̶t̶u̶b̶e̶-̶a̶u̶d̶i̶o̶-̶s̶t̶r̶e̶a̶m̶ ̶t̶o̶ ̶p̶i̶p̶e̶ ̶t̶h̶e̶ ̶a̶u̶d̶i̶o̶ ̶s̶t̶r̶e̶a̶m̶ ̶d̶i̶r̶e̶c̶t̶l̶y̶ ̶t̶o̶ ̶t̶h̶e̶ ̶c̶l̶i̶e̶n̶t̶.̶

EDIT: I’ve revisited this project, and the youtube-audio-stream package is a bit funky sometimes and isn’t as actively maintained as ytdl-core. I’ve rewritten the project to use ytdl-core, but for legacy reasons I will not change this blog post. Worry not, this change is almost a drop-in replacement and only affects around 5 lines of code.

To stream the audio, the client sends the YouTube video ID to our server, which uses youtube-audio-stream to fetch the audio from YouTube and then pipe it back to the client.

Diagram created using websequencediagrams.com

We decided to offload the audio processing to the client to reduce the load on our server. This turned out to be the most pragmatic solution for our purpose and was simple to code and put together.

Now you might be looking at this and asking why we didn’t cut out the middleman and have the web browser directly request the audio stream from YouTube. Cross-Origin Resource Sharing (CORS) restrictions prevent us from making a request from the client directly to YouTube. Additionally, we wanted our server to provide a communication channel between all the clients (for reasons we will discuss in a later section).
The minimal working example of the architecture described above consists of three files. It might look like a lot of code to digest, but it’s actually quite simple and most of it is just boilerplate. Let’s run through it.

audio-processor-server.js is a pretty standard boilerplate express.js server with two routes defined: one for serving the HTML page to the client, and one for serving the audio stream to the client. The server will also statically serve files in the /client folder, which is where we will store audio-processor.js.

audio-processor.html is a plain HTML page with nothing but an input field for a YouTube video ID and a submit button.

audio-processor.js is the client side script loaded into the HTML page to send a request to the server for an audio stream. We’re using an XHR request to fetch the audio stream so that we can specify a return type of arraybuffer. Using the Web Audio API’s AudioContext object, we can decode the arraybuffer into an AudioBuffer object containing the decoded PCM audio data. This buffer is passed into the processAudio() method (not defined in the example stub), which you can substitute with any function you want.

As an example, we used a processAudio() function that does some naive beat detection. We passed the audio data through a low pass filter and isolated all points above a certain relative value. We’d like to thank Joe Sullivan for his fantastic article explaining and providing the low pass filter code we used.

A branch of our project contains the sample code described above. You can do all sorts of interesting things using the PCM audio data in the AudioBuffer.

Note that the code in audio-processor.js is still relevant even if you don’t want to load audio from YouTube. You can apply it to statically loaded audio files or any other audio sources you may want to use. I recommend browsing through the Web Audio API documentation to explore all the cool things you can do with an audio buffer.
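To make the naive beat detection idea above a little more concrete, here is a rough sketch of the two steps (low pass filter, then keep the points above a relative threshold). It is not the code from our project: the function names, the default cutoff of 150 Hz, the 0.9 threshold, and the minimum gap between beats are all assumptions picked for illustration, and it swaps in a simple one-pole low pass filter on the raw samples so that it runs without a browser. In the browser you would feed it something like the Float32Array returned by AudioBuffer.getChannelData(0).

```javascript
// Illustrative sketch only: names and default values are assumptions,
// not the code from the original project.

// One-pole low pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1]).
function lowPass(samples, sampleRate, cutoffHz) {
  const dt = 1 / sampleRate;
  const rc = 1 / (2 * Math.PI * cutoffHz);
  const alpha = dt / (rc + dt);
  const out = new Float32Array(samples.length);
  let prev = 0;
  for (let i = 0; i < samples.length; i++) {
    prev = prev + alpha * (samples[i] - prev);
    out[i] = prev;
  }
  return out;
}

// Returns the sample indices where the filtered signal reaches at least
// relativeThreshold * (its maximum absolute value). After each hit we skip
// minGapSeconds of audio so one drum hit is not counted many times.
function detectBeats(samples, sampleRate, options = {}) {
  const { cutoffHz = 150, relativeThreshold = 0.9, minGapSeconds = 0.25 } = options;
  const filtered = lowPass(samples, sampleRate, cutoffHz);

  let max = 0;
  for (const value of filtered) max = Math.max(max, Math.abs(value));
  if (max === 0) return []; // silence: nothing to detect

  const threshold = max * relativeThreshold;
  const minGap = Math.floor(minGapSeconds * sampleRate);
  const beats = [];
  for (let i = 0; i < filtered.length; i++) {
    if (Math.abs(filtered[i]) >= threshold) {
      beats.push(i);
      i += minGap; // jump past the rest of this beat
    }
  }
  return beats;
}
```

Dividing each returned index by the sample rate gives the beat time in seconds, which is the form you would want for scheduling anything against playback.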
An Alternative Audio Streaming Architecture

Suppose, however, that you want to perform the audio processing on the server. This has several advantages over doing the processing on the client: server side caching, architecture specific processing libraries, or simply convenience. The disadvantage is that many concurrent requests will put a lot of load on your server. We also experimented with doing the processing server side during the hackathon using an architecture like this:

Diagram created using websequencediagrams.com

We chose not to do this in order to reduce server load, but I’ll describe the sample code for this architecture as well in case you would like to replicate it for your own projects.

audio-processor-server.js is again a simple express.js boilerplate server, but with a few extra pieces this time. In this version, we save the audio stream to the file system as a .flv file. This is not necessary if you just want to pipe the audio stream into your custom audio processor, but we found that it allowed for more flexibility and control. Like before, this server has two routes defined: one for serving the HTML page and one for serving the audio streams. Again, the server will also statically serve files in the /client folder.

audio-processor.html is the same plain HTML page with an input field for the YouTube video ID and a submit button.

audio-processor.js is a client side script loaded into the HTML page that simply requests an audio file using an HTMLAudioElement and plays it.

If you choose to do the processing server side, be aware that the Web Audio API is a browser API, not part of the JavaScript language itself. Therefore, it is not available to the node.js process running on the server.

If you want even more granularity, you can actually combine the two methods described above to do processing on the server and additional post-processing on the client. We didn’t try this, but you can probably do a lot of cool things by combining backend specific libraries with the power of the Web Audio API.
The rest of the techsploration will be about how we connected this audio streaming architecture to the Resonance Audio API for our hack. If you were only interested in building an in-browser audio processing architecture, then you can stop reading here or skip to the end.

Spatializing The Audio Stream

Once we got past the challenge of figuring out the best way to manipulate an audio stream, we needed to figure out how to move the audio in stereo to create the effect we wanted. We used the Google Resonance Audio API to do this. Given an audio source and a vector position, this API allowed us to position the audio source around the listener in 3D space. You can explore this effect in their demos (best experienced with headphones).

Our plan was to generate a list of vector values based on the audio data and use the Resonance Audio API to move the sound source around while the audio was playing. This was pretty trivial to do using setInterval() and synchronizing the list of vector positions with the current frame in the audio. Since the sampling rate of the audio buffer was known, we could calculate exactly which vector position to place the audio source at.

The hard part (and the coolest part) was figuring out how to generate the list of vector positions properly so that the stereo panning was in sync with the beat. We didn’t want to just oscillate between left and right, and we didn’t want the movement pattern to be repetitive and boring. Andrew and I experimented with various movement patterns and developed a scheme to encode them.

Locations around the head at which we can place the audio source. Image from 3D audio experience (Imgur).
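The synchronization step mentioned above (picking the right vector position from the current frame and the known sampling rate) could look roughly like the sketch below. The names positionIndexAt, startPositionUpdates, and framesPerPosition are made up for this example, and it assumes one precomputed position per fixed-size block of frames, which may not match how our project stored them. The applyPosition callback is a placeholder for whatever moves the source, for example a wrapper around a Resonance Audio source’s setPosition(x, y, z).

```javascript
// Illustrative sketch only: all names and the fixed block size are assumptions.

// Convert the playback time into a frame number using the sampling rate,
// then into an index into the precomputed list of positions.
function positionIndexAt(currentTimeSeconds, sampleRate, framesPerPosition, positionCount) {
  const currentFrame = Math.floor(currentTimeSeconds * sampleRate);
  const index = Math.floor(currentFrame / framesPerPosition);
  // Clamp so we never read outside the list near the start or end of the track.
  return Math.min(Math.max(index, 0), positionCount - 1);
}

// Poll the playback time with setInterval() and move the source accordingly.
// getCurrentTime: () => seconds, e.g. () => audioElement.currentTime
// positions: array of [x, y, z] triples
// applyPosition: (x, y, z) => void
function startPositionUpdates(getCurrentTime, sampleRate, framesPerPosition, positions, applyPosition, intervalMs = 50) {
  return setInterval(() => {
    const i = positionIndexAt(getCurrentTime(), sampleRate, framesPerPosition, positions.length);
    const [x, y, z] = positions[i];
    applyPosition(x, y, z);
  }, intervalMs);
}
```

Because the index is derived from the playback time on every tick rather than from a counter, the positions stay aligned with the audio even if a timer tick is late or the user seeks. Pass the returned timer ID to clearInterval() when playback stops.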
We named the movement patterns Transforms and decided on 5 distinct ways in which we could move the audio around the listener: Jumps (which move the audio source to a random location around the listener), Flips (which move the audio source to the direct opposite side of the listener), Rotates (which slowly rotate the audio source around the listener), Delays (which hold the audio source in place), and Resets (which reset the audio source to the origin). Using the naive beat detection algorithm described before, we transitioned from one Transform to another on every beat. To generate sequences of Transforms, we used a Markov chain, initially seeding every transition with equal probability.

Here’s where we actually utilized the server as a relay point between the connected clients. We attempted to make the sequence generation somewhat intelligent by adding a button to the interface allowing the user to vote up or vote down the current audio spatialization pattern. The server would then take this into account by increasing or decreasing the probability of the sequences used in the song the user was listening to.

Diagram made with draw.io

The new Markov chain would be propagated to all clients, and any new clients that visited the site would use the new probabilities to generate their spatialization. Even though we fully hashed out this idea, miscellaneous bugs and issues with JavaScript objects prevented us from fully implementing it in time for the hackathon submission.

Afterthoughts

This was a fantastic project to design and play with. There are a lot of interesting things to explore in the Web Audio API. I don’t know much about audio encoding, but I imagine the resources there would be a lot more useful if I had a little more background knowledge about audio codecs and PCM data.

If you’d like to check out our project in its entirety, here is a link to the git repository. There are three notable branches. The master branch contains our implementation of the audio spatialization project.
The blog-min-example branch contains the client side processing example described in the first section, and the blog-min-example-2 branch contains the server side processing example described in the second section.

If you enjoyed this techsploration, consider hitting the clap icon or following me on Twitter for more content like this. Thanks very much for reading!
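As a postscript to the spatialization section: the Markov chain of Transforms and the voting idea described there could be sketched roughly as follows. Since we never finished that part, treat this as one possible shape for it rather than what our project does. The function names, the lowercase state names, and especially the vote rule (scale each used transition by 10% and renormalize its row) are assumptions made for this example.

```javascript
// Illustrative sketch only: names and the vote update rule are assumptions.
const TRANSFORMS = ['jump', 'flip', 'rotate', 'delay', 'reset'];

// Build a chain where every transition starts with equal probability.
function createUniformChain(states = TRANSFORMS) {
  const chain = {};
  for (const from of states) {
    chain[from] = {};
    for (const to of states) chain[from][to] = 1 / states.length;
  }
  return chain;
}

// Sample the next Transform from the current one's row of probabilities.
// `random` is injectable so the function can be tested deterministically.
function nextTransform(chain, current, random = Math.random) {
  let r = random();
  let last = current;
  for (const [to, p] of Object.entries(chain[current])) {
    last = to;
    if (r < p) return to;
    r -= p;
  }
  return last; // fallback in case floating point error leaves r slightly >= 0
}

// One Transform per beat: generate `length` Transforms starting after `start`.
function generateSequence(chain, start, length, random = Math.random) {
  const sequence = [];
  let current = start;
  for (let i = 0; i < length; i++) {
    current = nextTransform(chain, current, random);
    sequence.push(current);
  }
  return sequence;
}

// Return a new chain in which every transition used in `sequence` is made
// more likely (vote > 0) or less likely (otherwise), with rows renormalized.
function applyVote(chain, sequence, vote, step = 0.1) {
  const factor = vote > 0 ? 1 + step : 1 - step;
  const updated = JSON.parse(JSON.stringify(chain)); // plain-object deep copy
  for (let i = 0; i + 1 < sequence.length; i++) {
    const row = updated[sequence[i]];
    row[sequence[i + 1]] *= factor;
    const total = Object.values(row).reduce((sum, p) => sum + p, 0);
    for (const to of Object.keys(row)) row[to] /= total;
  }
  return updated;
}
```

In the design we had in mind, the server would hold the chain, apply votes as they arrive, and send the updated chain to every connected client. Keeping applyVote free of side effects makes that easier, since the server can swap in the new chain in one step.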
