
Conversation

@mattbirman

Explainer for improving WebAudio's `OfflineAudioContext.startRendering()` by adding streaming output.

@mattbirman force-pushed the mattbirman/webaudio-offlineaudiocontext-explainer branch from 9aa7b65 to 9cefcb6 on October 31, 2025 11:31
@mattbirman changed the title from "[WIP] WebAudio OfflineAudioContext startRendering() explainer" to "WebAudio OfflineAudioContext startRendering() explainer" on Nov 1, 2025
mattbirman and others added 4 commits on November 1, 2025 11:45
Extraneous 'is'

Co-authored-by: Nishitha Burman Dey <54861497+nishitha-burman@users.noreply.github.com>
@mattbirman (Author):

Thanks all, I've updated the PR incorporating the feedback. Looking forward to your thoughts.

@mattbirman marked this pull request as ready for review on November 3, 2025 03:24
@mattbirman changed the title from "WebAudio OfflineAudioContext startRendering() explainer" to "WebAudio OfflineAudioContext streaming API" on Nov 3, 2025
@gabrielsanbrito (Contributor) left a comment

I like the detailed "user-facing problem" section. Just one minor comment, otherwise LGTM!


### Output format

There is an open question of which data format `startRenderingStream()` should return. The options under consideration are `AudioBuffer`, planar `Float32Array`, or interleaved `Float32Array`.
Contributor:

How is the size of each audio chunk delivered through the audio stream determined? Do we need to give developers control of this parameter?
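
One hypothetical shape such a control could take (the `chunkLength` option below is purely illustrative and not part of the proposal):

```js
// Hypothetical sketch only: `chunkLength` is not in the proposal.
// It would let callers choose how many frames each delivered chunk holds.
const stream = context.startRenderingStream({
  format: "f32-planar",
  chunkLength: 1024, // frames per chunk; a default could be the 128-frame render quantum
});
```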


```js
// From https://developer.mozilla.org/en-US/docs/Web/API/AudioData/format
enum AudioFormat {
```
Contributor:

I'm new to this issue, but it seems like two separate feature requests:

(1) OfflineAudioContext streaming.
(2) WebAudio/WebCodecs interop, or adding more audio formats to Web Audio.

It might be more tractable to solve (1) first with AudioBuffers and then tackle (2) afterwards. Then again, looking at past discussions, maybe bundling these is the way to go. Either way, I like how this explainer goes through all of the different options. Even if we tackle (1) first, we'll want to do so in a way that's forward-looking to support (2).


I don't think we should add the idea of sample formats to WebAudio. All of it is already designed to be f32 from start to finish and I think it would be a lot to change that.

@mattbirman (Author) on Nov 4, 2025:

The code comment above the interface explains that it comes from an existing part of the spec; I don't propose adding these formats to WebAudio. The only addition is f32-interleaved, which leaves the question of how we add this format if we want to use it as the output format.

@mattbirman (Author):

I've removed this section.

```js
const reader = context.startRenderingStream({ format: "f32-planar" }).getReader();
while (true) {
  // get the next chunk of data from the stream
  const result = await reader.read();
```
Contributor:

Can we expand on this example code to actually use result so we can see what the result contains?

```js
  break;
}
```

```js
const buffers = result.value;
```
Contributor:

Does the value contain multiple buffers?

@mattbirman (Author):

If we go with f32-interleaved, then this value will contain interleaved Float32Array values.
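
For illustration, a minimal sketch of consuming one such chunk, assuming a stereo context where `result.value` is a single interleaved `Float32Array` (samples ordered L0, R0, L1, R1, ...):

```js
// Assumption: stereo output, `result.value` is one interleaved Float32Array.
const interleaved = result.value;
const frameCount = interleaved.length / 2; // two channels
const left = new Float32Array(frameCount);
const right = new Float32Array(frameCount);
for (let i = 0; i < frameCount; i++) {
  left[i] = interleaved[2 * i];      // even indices: left channel
  right[i] = interleaved[2 * i + 1]; // odd indices: right channel
}
```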


- Simple to integrate with existing event-driven patterns

### Cons
Contributor:

I think the main con is that the web platform already has a pattern for streaming (ReadableStreams). Using this provides interop with the rest of the platform that uses ReadableStreams. You implicitly mentioned this above with the BYOB reader. ReadableStreams also provides a pattern that many developers are familiar with. One other potential alternative would be to use a MediaStream instead of a ReadableStream. The non-offline AudioContext provides MediaStreams through the MediaStreamAudioDestinationNode.
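
For reference, a BYOB read loop over the proposed stream might look like the sketch below, assuming `startRenderingStream()` returns a byte stream (`type: "bytes"`); the BYOB mechanics are standard Streams API, while `processChunk` is a placeholder for application code:

```js
// Assumes the returned ReadableStream is a byte stream, so a BYOB
// (bring-your-own-buffer) reader is available. Runs in an async context.
const stream = context.startRenderingStream({ format: "f32-planar" });
const reader = stream.getReader({ mode: "byob" });

let buffer = new ArrayBuffer(4096); // room for 1024 f32 samples
while (true) {
  // Lend our buffer to the stream; it comes back (detached and re-wrapped)
  // as `value`, filled with as much data as was available.
  const { value, done } = await reader.read(new Float32Array(buffer));
  if (done) break;
  processChunk(value);   // placeholder for application code
  buffer = value.buffer; // recycle the buffer for the next read
}
```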

@matanui159 left a comment

Sorry for all the nitpicks 😅

**Pros**

- semantically closest to the `startRendering()` API
- does not add a new type to the WebAudio spec


WebAudio already uses `Float32Array` in its `AudioBuffer`s and as the inputs/outputs of `AudioWorkletProcessor.process()`.
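
For reference, a minimal passthrough `AudioWorkletProcessor` showing that existing planar `Float32Array` surface:

```js
// process() receives planar Float32Arrays: inputs[0] and outputs[0] are
// arrays with one Float32Array plane per channel.
class PassthroughProcessor extends AudioWorkletProcessor {
  process(inputs, outputs) {
    for (let channel = 0; channel < outputs[0].length; channel++) {
      if (inputs[0][channel]) {
        outputs[0][channel].set(inputs[0][channel]);
      }
    }
    return true; // keep the processor alive
  }
}
registerProcessor("passthrough-processor", PassthroughProcessor);
```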


**Cons**

- requires `startRenderingStream()` to return an array of `Float32Array`s in planar format, one for each output channel


To clarify this point more: we could technically have each chunk of Float32Array be planar (e.g. with 128 left samples followed by 128 right samples), but it makes more sense for a stream of samples (with no explicit separation like AudioBuffer provides) to be interleaved (it also makes BYOB easier, since you can read as many samples as you need).

The better alternative, if we wanted to stream planar Float32Array audio data, is an array of ReadableStreams, each with their own Float32Array output (not an array of Float32Array). This would still allow BYOB reading, but with explicitly separate planes for each channel.
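
A sketch of that alternative, assuming a hypothetical variant of `startRenderingStream()` that returns one `ReadableStream` per output channel (`handleChannelChunk` is a placeholder):

```js
// Hypothetical: an array of ReadableStreams, one per output channel,
// each yielding Float32Array chunks for just that channel's plane.
const channelStreams = context.startRenderingStream({ format: "f32-planar" });
await Promise.all(
  channelStreams.map(async (stream, channel) => {
    for await (const chunk of stream) {
      handleChannelChunk(channel, chunk); // placeholder for application code
    }
  })
);
```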

**Pros**

- allows for streaming a single stream of data, rather than one for each channel
- interoperates with WebCodecs APIs because they operate with interleaved streams of data


WebCodecs supports both planar AND interleaved audio, so this point isn't true (and since you can get the individual planes from an AudioBuffer, it can also easily be used in WebCodecs).
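
To illustrate, a minimal sketch of wrapping an `AudioBuffer`'s planes for WebCodecs using the standard `AudioData` constructor (the helper name here is made up):

```js
// Copies an AudioBuffer's channel planes into a single backing buffer
// and wraps it as a WebCodecs AudioData in "f32-planar" format.
function audioBufferToAudioData(buffer, timestampUs = 0) {
  const planes = new Float32Array(buffer.length * buffer.numberOfChannels);
  for (let c = 0; c < buffer.numberOfChannels; c++) {
    planes.set(buffer.getChannelData(c), c * buffer.length);
  }
  return new AudioData({
    format: "f32-planar",
    sampleRate: buffer.sampleRate,
    numberOfFrames: buffer.length,
    numberOfChannels: buffer.numberOfChannels,
    timestamp: timestampUs, // microseconds
    data: planes,
  });
}
```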


**Cons**

- introduces a new type to the WebAudio spec, `f32-interleaved`, which does not exist at the moment


To clarify more on this point too: most of WebAudio uses AudioBuffer for raw data, with the only exception (afaik?) being the AudioWorkletProcessor, which uses planes of Float32Array. Even though a single stream of interleaved Float32Array makes the most sense for a BYOB stream, it would be inconsistent with the only other usage of Float32Array in WebAudio, which could be confusing.


#### Recommendation

`f32-interleaved` as it is the most interoperable with other media APIs, like WebCodecs, and simplifies processing with other data streams such as video.


Again, WebCodecs supports both. I don't know what other web APIs there are that only support interleaved?

@mattbirman (Author):

So what should we put here for the format? `f32` or `f32-planar`?

@SteveBeckerMSFT (Contributor):

Thanks for the updates! LGTM!
