# WebAudio OfflineAudioContext.startRendering() streaming output

## Authors:

- [Matt Birman](mailto:mattbirman@microsoft.com)

## Table of Contents

[You can generate a Table of Contents for markdown documents using a tool like [doctoc](https://github.com/thlorenz/doctoc).]

<!-- START doctoc generated TOC please keep comment here to allow auto update -->
<!-- END doctoc generated TOC please keep comment here to allow auto update -->

## Introduction

WebAudio `OfflineAudioContext.startRendering()` allocates an `AudioBuffer` large enough to hold the entire rendered output of the WebAudio graph before returning. For example, a 4 hour graph at 48 kHz with 4 channels produces roughly 11 GB of in-memory float32 data in the `AudioBuffer`. This behaviour makes the API unsuitable for very long offline renders or very large channel/length combinations. There is no simple way to chunk the output or consume it as a stream.
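
For a concrete sense of scale, the arithmetic behind that figure is below (the numbers come from the example above, not from any API constant):

```typescript
// Back-of-the-envelope size of the AudioBuffer that startRendering()
// must allocate up front for the example above.
const seconds = 4 * 60 * 60; // 4 hour render
const sampleRate = 48_000;   // 48 kHz
const channels = 4;
const bytesPerSample = 4;    // float32

const totalBytes = seconds * sampleRate * channels * bytesPerSample;
console.log(totalBytes);             // 11_059_200_000 bytes ≈ 11 GB
console.log(totalBytes / 1024 ** 3); // ≈ 10.3 GiB
```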

The [spec](https://webaudio.github.io/web-audio-api/#dom-offlineaudiocontext-startrendering) explicitly says at step 5: "Create a new AudioBuffer ... with ... length and sampleRate ... Assign this buffer to an internal slot", which means the API design currently mandates the full buffer allocation.

The participants in the [GitHub discussion](https://github.com/WebAudio/web-audio-api/issues/2445) agree that incremental delivery of data is necessary: either streaming chunks of rendered audio or dispatching data in pieces rather than everything at once, so that memory usage is bounded and the data can be processed/consumed as it is produced.

## User-Facing Problem

The user in this context is the web developer using the WebAudio API. Their goal is to perform media processing with the feature-rich WebAudio API without taking a dependency on a third-party library to render the graph in an offline context. Because the current WebAudio `OfflineAudioContext` API is not suitable for this use case, the developer must either write a WASM audio-processing library or pull in an existing third-party dependency to perform the workload.

### Goals

- Allow streaming data out of a WebAudio `OfflineAudioContext.startRendering()` for rendering large WebAudio graphs faster than realtime

### Non-goals

- Change the existing `startRendering()` behavior; this API change is purely additive

## Proposed Approach

The preferred approach is to add an `outputMode` parameter to `startRendering()` that allows consumers to choose the behavior of the offline rendering.

```typescript
type OutputMode = "audiobuffer" | "stream" | "none";

interface OfflineAudioContext {
  // "audiobuffer" is the default when outputMode is omitted,
  // preserving the existing behavior.
  startRendering(outputMode?: OutputMode): Promise<AudioBuffer | ReadableStream | void>;
}
```

In `"stream"` mode the implementation will not allocate a giant `AudioBuffer` up front. Instead it will render in render quanta (e.g., 128 frames at a time) and enqueue chunks onto the returned `ReadableStream`. Calling [`getReader()`](https://developer.mozilla.org/en-US/docs/Web/API/ReadableStream/getReader) on the stream returns a reader; `reader.read()` resolves with successive chunks until no more data is available, at which point it resolves with `done: true`. In this mode, the user can read chunks as they arrive and consume them for storage, transcoding via WebCodecs, sending to a server, etc.

Memory usage is bounded by the size of each chunk plus the backlog of chunks that have been rendered but not yet read.
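
One way an implementation could keep that backlog bounded is to drive rendering from the stream's `pull()` callback, so rendering pauses whenever the consumer falls behind. The sketch below is illustrative only, not the proposed spec algorithm: `renderQuantum()` is a hypothetical stand-in for the engine's internal render loop, and the chunk shape (one `Float32Array` per channel) is an open design question.

```typescript
// Illustrative producer sketch, not the proposed spec algorithm.
// Assumes a hypothetical renderQuantum() that renders the next 128 frames
// and returns one Float32Array per channel, or null when the graph is done.
declare function renderQuantum(): Float32Array[] | null;

const rendered = new ReadableStream<Float32Array[]>(
  {
    // pull() is only called while the internal queue is below the
    // high-water mark, so an unread backlog pauses rendering.
    pull(controller) {
      const chunk = renderQuantum();
      if (chunk === null) {
        controller.close(); // end-of-stream: reader.read() resolves with done: true
      } else {
        controller.enqueue(chunk);
      }
    },
  },
  new CountQueuingStrategy({ highWaterMark: 8 }) // at most ~8 unread chunks buffered
);
```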

### Pros

- Aligns well with other web streaming APIs, such as [WHATWG Streams](https://streams.spec.whatwg.org/#readablestream) as used by WebCodecs
- Works with very large durations; no upper limit on WebAudio graph duration
- Flexible usage scenarios for consumers

### Cons

- Widens the return type: existing code expects an `AudioBuffer` result, so the new modes must remain strictly opt-in
- Requires a spec change
- Needs definitions for sensible chunk sizes, backpressure, error handling, and end-of-stream behavior

### Implementing `OfflineAudioContext.startRendering()` streaming behaviour with this approach

```typescript
/*
 * New API
 */

const streamingContext = new OfflineAudioContext(/* numberOfChannels, length, sampleRate */);
// build up WebAudio graph
const readable: ReadableStream = await streamingContext.startRendering("stream");
const reader: ReadableStreamDefaultReader = readable.getReader();
while (true) {
  const result = await reader.read();
  if (result.done) break;
  const buffers = result.value; // process the chunk (store, encode, upload, ...)
}

/*
 * Existing API unchanged
 */
const offlineContext = new OfflineAudioContext(/* numberOfChannels, length, sampleRate */);
// build up WebAudio graph
const renderedBuffer: AudioBuffer = await offlineContext.startRendering();
```
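
As a concrete consumption scenario, the chunks from the `"stream"` example could be fed straight into a WebCodecs `AudioEncoder` for faster-than-realtime transcoding. The sketch below stands in for the read loop in the snippet above, reusing its `reader`, and assumes each chunk is planar float32 data, one `Float32Array` per channel; that chunk shape is an assumption, not something the proposal has fixed.

```typescript
// Sketch: transcode streamed render output to Opus via WebCodecs.
// Assumes chunks are Float32Array[] (one array per channel, f32-planar).
const sampleRate = 48000;
const numberOfChannels = 2;

const encoder = new AudioEncoder({
  output: (chunk) => {
    // mux the EncodedAudioChunk into a container, upload it, etc.
  },
  error: (e) => console.error(e),
});
encoder.configure({ codec: "opus", sampleRate, numberOfChannels });

let framesWritten = 0;
while (true) {
  const { done, value: channels } = await reader.read();
  if (done) break;
  const numberOfFrames = channels[0].length;
  // Pack the per-channel arrays into one contiguous planar buffer.
  const data = new Float32Array(numberOfFrames * numberOfChannels);
  channels.forEach((c: Float32Array, i: number) => data.set(c, i * numberOfFrames));
  encoder.encode(
    new AudioData({
      format: "f32-planar",
      sampleRate,
      numberOfFrames,
      numberOfChannels,
      timestamp: (framesWritten / sampleRate) * 1e6, // microseconds
      data,
    })
  );
  framesWritten += numberOfFrames;
}
await encoder.flush(); // drain any pending encodes before finishing
```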

## Alternatives considered

[This should include as many alternatives as you can,
from high level architectural decisions down to alternative naming choices.]

### [Alternative 1]

[Describe an alternative which was considered,
and why you decided against it.
This alternative may have been part of a prior proposal in the same area,
or it may be new.
If you did any research in making this decision, discuss it here.]

### [Alternative 2]

[You may not have decided about some alternatives.
Describe them as open questions here, and adjust the description once you make a decision.]

### [Alternative 3]

[etc.]

## Accessibility, Internationalization, Privacy, and Security Considerations

[Highlight any accessibility, internationalization, privacy, and security implications
that have been taken into account during the design process.]

## Stakeholder Feedback / Opposition

[Implementors and other stakeholders may already have publicly stated positions on this work. If you can, list them here with links to evidence as appropriate.]

- [Implementor A] : Positive
- [Stakeholder B] : No signals
- [Implementor C] : Negative

[If appropriate, explain the reasons given by other implementors for their concerns.]

## References & acknowledgements

[Your design will change and be informed by many people; acknowledge them in an ongoing way! It helps build community and, as we only get by through the contributions of many, is only fair.]

[Unless you have a specific reason not to, these should be in alphabetical order.]

Many thanks for valuable feedback and advice from:

- [Person 1]
- [Person 2]
- [etc.]

Thanks to the following proposals, projects, libraries, frameworks, and languages
for their work on similar problems that influenced this proposal.

- [Framework 1]
- [Project 2]
- [Proposal 3]
- [etc.]