Guide: Spatial audio at DeoVR: Everything you need to know

8 days ago

DeoVR supports spatial audio. But what is it and how does it work? In this blog we'll explain everything you need to know about spatial audio in VR at DeoVR.

Important! Only tick the 'spatial audio' box when uploading VR videos if you are certain your video uses a full-sphere surround sound format such as Ambisonics. If you're not sure, leave the box unchecked - otherwise, your upload will fail.

How does spatial audio work?

Spatial audio allows you to experience sound in VR as you would in real life when moving and rotating your head. This delivers deeper immersion, with the audio perfectly matching what you see in virtual reality.

Spatial audio (also referred to as 3D audio or 360 audio) is an audio experience where the sound changes with the movement of the viewer’s head. 3D audio effects manipulate the audio waves produced by stereo speakers, surround-sound speakers, speaker-arrays, or headphones.

This involves the simulated placement of virtual sound sources anywhere in a simulated three-dimensional space, including behind, above or below the listener. To account for direction, distance, and environmental factors, content creators produce this type of “spatially oriented” audio experience when creating soundtracks. Spatial audio uses an ambisonic audio format.
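To make the idea concrete, here is a minimal Python sketch of how a player might place a virtual source and then compensate for head rotation. It is not DeoVR's implementation. It assumes the four first-order components W, X, Y, Z described in the next section, an ambiX-style convention (X points forward, Y to the left, Z up, W carries the unscaled signal), and hypothetical function names.

```python
import math

def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Pan one mono sample to first-order components (W, X, Y, Z).

    Azimuth is measured counterclockwise from straight ahead, so
    +90 degrees is to the listener's left (an assumed convention).
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                  # omnidirectional part
    x = sample * math.cos(az) * math.cos(el)    # front/back
    y = sample * math.sin(az) * math.cos(el)    # left/right
    z = sample * math.sin(el)                   # up/down
    return w, x, y, z

def rotate_for_head_yaw(wxyz, head_yaw_deg):
    """Counter-rotate the sound field when the head turns left by head_yaw_deg.

    Only X and Y change for a rotation about the vertical axis.
    """
    w, x, y, z = wxyz
    t = math.radians(head_yaw_deg)
    x_rot = x * math.cos(t) + y * math.sin(t)
    y_rot = y * math.cos(t) - x * math.sin(t)
    return w, x_rot, y_rot, z

# A source straight ahead; the viewer then turns their head 90 degrees left.
front = encode_first_order(1.0, azimuth_deg=0.0)
turned = rotate_for_head_yaw(front, head_yaw_deg=90.0)
```

After the turn, the X component is (numerically) zero and the Y component is negative, meaning the source is now heard on the listener's right. That is what happens in real life when you turn your head to the left of a sound in front of you.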

What is Ambisonics?

Ambisonics is a full-sphere surround sound format: in addition to the horizontal plane, it covers sound sources above and below the listener.

Ambisonics offers a matrix coding system for all stages of sound processing: A-format for the raw microphone signals and B-format for studio sound processing. The Ambisonics system is compatible with mono, stereo, and surround systems, and can also provide a "periphonic" system (from the Greek for "sound around") by adding height information.

The main signal formats are as follows:

The A-format consists of the signals from four (or more) cardioid microphones arranged on a tetrahedron (see Fig. 1): left front (LF), right front (RF), left back (LB), and right back (RB). These signals can also be generated by appropriately panning mono signals from distributed microphones.
The B-format consists of signals obtained by a sum-difference transformation of the A-format signals:

X = 0.5((LF - LB) + (RF - RB))

Y = 0.5((LF - RB) - (RF - LB))

Z = 0.5((LF - LB) + (RB - RF))

W = 0.5(LF + LB + RF + RB)

The W signal is obtained by adding all the signals in phase. B-format signals can also be generated directly using the combination of microphones shown in Fig. 4a: three microphones with a "figure-of-eight" directional characteristic, oriented along the three perpendicular X, Y and Z axes, and one non-directional microphone, which produces the W signal (an extension of MS stereophony; see Fig. 2).
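As a rough illustration, the sum-difference transformation can be sketched in a few lines of Python. This assumes the usual tetrahedral layout (LF and RB capsules pointing up, RF and LB pointing down) and the standard form of the equations: X = 0.5((LF - LB) + (RF - RB)), Y = 0.5((LF - RB) - (RF - LB)), Z = 0.5((LF - LB) + (RB - RF)). The function names are hypothetical. Real converter plugins also apply microphone-specific filtering and calibration, which this sketch leaves out.

```python
def a_to_b(lf, rf, lb, rb):
    """Convert one A-format sample frame to B-format (W, X, Y, Z)."""
    w = 0.5 * (lf + lb + rf + rb)          # all capsules summed in phase
    x = 0.5 * ((lf - lb) + (rf - rb))      # front minus back
    y = 0.5 * ((lf - rb) - (rf - lb))      # left minus right
    z = 0.5 * ((lf - lb) + (rb - rf))      # up minus down
    return w, x, y, z

def convert_frames(frames):
    """Convert a list of (LF, RF, LB, RB) frames to (W, X, Y, Z) frames."""
    return [a_to_b(*frame) for frame in frames]

# Equal signal on all four capsules: only the omnidirectional W remains.
diffuse = a_to_b(1.0, 1.0, 1.0, 1.0)

# Signal on the left-front (upward) capsule only: front, left and up are all positive.
left_front_only = a_to_b(1.0, 0.0, 0.0, 0.0)
```

The two examples give a quick sanity check: identical capsule signals cancel in X, Y and Z, while a signal on a single capsule produces direction components whose signs match where that capsule points.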

Ambisonics is a widely used format for capturing, transmitting and decoding spatial audio. Depending on the order (0 to 4), it uses 1, 4, 9, 16 or 25 channels.
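These channel counts follow a simple rule: a full-sphere Ambisonic signal of order N needs (N + 1) squared channels. A small Python sketch (the function name is ours, purely for illustration):

```python
def ambisonic_channels(order):
    """Number of channels in a full-sphere Ambisonic signal of the given order."""
    if order < 0:
        raise ValueError("order must be zero or greater")
    return (order + 1) ** 2

# Orders 0 to 4 give the channel counts listed above.
counts = [ambisonic_channels(n) for n in range(5)]
```

So a first-order recording has 4 channels and a second-order recording has 9, which are the two counts used in the workflow later in this guide.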

How to make spatial audio VR content for DeoVR

Note: if your video has ambisonic audio, make sure to select the Spatial Audio checkbox during uploading.

First, you need to convert Ambisonic A-format to B-format.

The conversion to B-format differs from one spatial microphone to another. For example, the Zoom H3-VR records B-format directly if that option is selected in the microphone settings, which means you don't need to do steps 1-5 of these instructions. Check your microphone's specification to find out which format it records.

Download and install Facebook 360 Workstation

Let's use the 8-channel Voyage Audio Spatial Mic as an example of how to make a Spatial Audio TBE (Two Big Ears) file for the DeoVR player.

The TBE format is an 8-channel format used by Facebook's Spatial Workstation and is part of the "spatial audio" 8+2 channels available for Facebook's 360 videos. TBE is basically 2nd-order Ambisonics in which one of the nine 2nd-order channels (R) has been removed, reducing the number of channels to 8.

1) Open the recorded audio file in Reaper. This file comes straight from the microphone, so it is in Ambisonic A-format.

2) To convert it to B-format, open the free plugin provided by the microphone's manufacturer. To do this, press FX > Voyage Audio > Spatial Mic Converter > Add.

3) Adjust the settings as required.

4) Everything is ready. Now you can process the master track with effects such as equalization, compression and limiting.

5) Render the audio: choose Stems in the menu, set the number of channels to 9 manually (4 channels for 1st order) and press Render.
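Before moving on, it can be worth checking that the rendered file really has the expected number of channels. The sketch below uses Python's standard wave module and assumes the render is a plain PCM WAV file. Some multichannel renders use header variants that older versions of the module may reject, in which case a different reader is needed. The function names are hypothetical.

```python
import io
import math
import wave

def ambisonic_order_of(wav_source):
    """Return the Ambisonic order implied by a WAV file's channel count.

    wav_source may be a file path or a binary file object. Returns None
    when the channel count is not a perfect square (1, 4, 9, 16, ...).
    """
    with wave.open(wav_source, "rb") as wav:
        channels = wav.getnchannels()
    root = math.isqrt(channels)
    return root - 1 if root * root == channels else None

def make_silent_wav(channels, frames=10):
    """Build a short silent 16-bit WAV in memory, for trying out the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(48000)
        wav.writeframes(b"\x00" * (channels * 2 * frames))
    buffer.seek(0)
    return buffer

# A 9-channel render corresponds to 2nd order, a 4-channel render to 1st order.
second_order = ambisonic_order_of(make_silent_wav(9))
first_order = ambisonic_order_of(make_silent_wav(4))
```

With a real render you would pass its file path instead of the in-memory buffer. A result of None (for example, for an 8-channel file) suggests the channel count was not set as described in step 5.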

6) Next, we need to combine the audio track with the video. To do this, open FB360 Spatial Workstation and select:

Output Format: FB360 Matroshka (Experimental)
Spatial Audio: B-format (2nd order ambiX)
Spatial audio file: select 9-channel audio
Head-Locked Stereo: empty or choose stereo audio file (if needed)
Video file: select the video file
Press Encode

Important!

Don't skip this step: you need to add a second audio stream to the finished MKV file. To do this:

1. Download MKVToolNix.
2. Prepare a stereo or mono track. It needs to use the AAC codec. You can use any converter for this, for example, the AIMP audio converter:

Set the output format to AAC (or AAC in an M4A container) at the maximum bitrate.

Then open MKVToolNix and, in the first block, add the video and the audio. In the second block, move the AAC audio track to the bottom of the list. Then press Start.
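If you prefer the command line, the same two steps can be approximated with ffmpeg and mkvmerge (the command-line tool that ships with MKVToolNix). The Python sketch below only builds and runs the commands. The file names and the 320k bitrate are assumptions for illustration, and this is an alternative to the GUI workflow described above rather than the procedure DeoVR documents.

```python
import subprocess

def aac_encode_command(source_audio, aac_output, bitrate="320k"):
    """ffmpeg command that encodes a stereo/mono track to AAC."""
    return ["ffmpeg", "-i", source_audio, "-c:a", "aac", "-b:a", bitrate, aac_output]

def mkvmerge_command(spatial_mkv, aac_audio, output_mkv):
    """mkvmerge command that muxes the encoded MKV with the extra AAC track.

    Input files are listed in order, so the AAC track ends up after the
    tracks of the first file (the equivalent of moving it down in the GUI).
    """
    return ["mkvmerge", "-o", output_mkv, spatial_mkv, aac_audio]

def run(command):
    """Run a command and raise an error if the tool reports a failure."""
    subprocess.run(command, check=True)

# Hypothetical file names, for illustration only.
encode_cmd = aac_encode_command("stereo_mix.wav", "stereo_mix.m4a")
mux_cmd = mkvmerge_command("video_spatial.mkv", "stereo_mix.m4a", "video_final.mkv")
```

Calling run(encode_cmd) and then run(mux_cmd) would execute both steps, provided ffmpeg and mkvmerge are installed and on your PATH. It is still worth opening the result in MKVToolNix to confirm the track order.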

Useful links:

See and hear how it works in DeoVR here: The Conscious Outlaws - I'd Rather Be Homeless Than Home with You (acoustic version)
Read the full spatial audio DeoVR specs.
Find more information about FB360 Workstation by reading this PDF guide.

If you have any questions about recording spatial audio, feel free to contact us at audio@deovr.com or evgeniy.kordiuk@deovr.com.