Session Date/Time: 24 Jul 2025 15:00

mlcodec

Summary

The Machine Learning for Audio Coding (MLCodec) working group met at IETF 123 to discuss progress on several audio codec extensions and enhancements. The session covered updates to the Opus Extension Mechanism, Deep Redundancy (DRED) extension, speech coding enhancements including bandwidth extension, scalable quality extensions, and comprehensive speech quality testing methodologies. Key technical discussions focused on model optimization, test vector implementation, and evaluation frameworks for ML-based audio coding techniques.

Key Discussion Points

Opus Extension Mechanism

Version 04 published, incorporating the changes discussed in Bangkok (IETF 122) for frame separators with zero increment
Clarified that frame separators with non-zero increment are not repeated by the repeat extensions mechanism
Improved clarity and precision following feedback from Mark Harris

Deep Redundancy (DRED) Extension

New model is 3x smaller than the previous version while delivering better quality
Encoder reduced to ~200K weights (45KB), decoder to ~290K weights (365KB)
Improvements from enhanced training schedule, soft quantization, and architectural changes including sparsity in GRU layers and dimensional reduction in convolutional layers
Test vectors now available in binary format with comprehensive validation tools
Three types of test vectors: DRED feature decoder, vocoder validation, and integration testing
Uses entropy coding and provides comparison tools for compliance verification

Speech Coding Enhancements

Blind bandwidth extension (BWENet) fully integrated into Opus, extending wideband to fullband
Requires both compile-time and runtime activation; operates at decoder complexity ≥4 with 48 kHz output
Works with SILK-only wideband or hybrid mode without CELT high-end data
Proposed evaluation methodology using energy correlation in frequency bands to validate extensions
Testing on the EARS dataset shows bandwidth extension consistently passes requirements
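
One way to read the energy-correlation proposal above is as a per-band comparison of frame energy tracks between a fullband reference and the bandwidth-extended output. The following is a minimal Python sketch under that assumption. The frame size, band edges, log-energy floor, and function names are illustrative choices, not taken from the proposal, and no pass threshold is implied.

```python
import cmath
import math


def band_energies(signal, frame_size, band_edges):
    """Return a list of per-band energies for each non-overlapping frame."""
    frames = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        # Naive DFT over the non-negative frequency bins (adequate for a sketch).
        power = []
        for k in range(frame_size // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(frame))
            power.append(abs(acc) ** 2)
        # band_edges holds (low_bin, high_bin) pairs, high bin exclusive.
        frames.append([sum(power[lo:hi]) for lo, hi in band_edges])
    return frames


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def band_energy_correlation(reference, decoded, frame_size=64,
                            band_edges=((0, 8), (8, 16), (16, 33))):
    """Correlate log band-energy tracks of two signals, one value per band.

    Assumption: log energy with a small floor is used; the default frame
    size and band edges are hypothetical.
    """
    ref = band_energies(reference, frame_size, band_edges)
    dec = band_energies(decoded, frame_size, band_edges)
    n = min(len(ref), len(dec))
    result = []
    for b in range(len(band_edges)):
        ref_track = [math.log10(ref[i][b] + 1e-9) for i in range(n)]
        dec_track = [math.log10(dec[i][b] + 1e-9) for i in range(n)]
        result.append(pearson(ref_track, dec_track))
    return result
```

A value near 1 in a band means the decoded energy envelope in that band tracks the reference over time, regardless of a constant gain offset. Bands above the wideband cutoff are the ones of interest when checking a bandwidth extension.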

Scalable Quality Extension (Opus HD)

Version 03 draft supports >8-bit depth, >20kHz bandwidth, and higher bitrates
Uses entropy-coded refinement symbols for higher quality layer
Test vectors available at 96kHz with strict decoder conformance and looser encoder conformance
Achieves 100-115dB THD+N at 480 kbps per channel
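
For scale, the quoted figures can be related to bit depth using the standard ideal-quantizer relation SNR ≈ 6.02·N + 1.76 dB for a full-scale sine. The sketch below assumes the 100-115 dB values are signal-to-(noise plus distortion) magnitudes; the helper name is hypothetical.

```python
def effective_bits(sinad_db: float) -> float:
    """Effective number of bits for a SINAD in dB (ideal quantizer, full-scale sine)."""
    return (sinad_db - 1.76) / 6.02


# Assumption: the quoted THD+N range is treated as a positive SINAD in dB.
for sinad_db in (100.0, 115.0):
    print(f"{sinad_db:.0f} dB -> {effective_bits(sinad_db):.1f} effective bits")
```

Under that assumption, the range corresponds to roughly 16.3 to 18.8 effective bits.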

Speech Quality Testing Battery

Comprehensive evaluation framework for generative codecs addressing range equalization biases
Implements MUSHRA-1S methodology with fixed anchors and references
Tests both clean speech quality and real-world conditions (noise/reverb)
Includes intelligibility testing using Diagnostic Rhyme Test for phoneme-level assessment
Results show DRED performance comparable to Opus at various bitrates with some differences in noisy conditions
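
For reference, Diagnostic Rhyme Test results are commonly reported with a chance-corrected score, since each trial is a two-alternative choice. The sketch below shows that conventional formula; the session summary does not state which scoring the battery uses, so treat this as an assumption, and the function name as hypothetical.

```python
def drt_score(correct: int, incorrect: int) -> float:
    """Chance-corrected DRT score in percent: 100 * (right - wrong) / total."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("at least one response is required")
    return 100.0 * (correct - incorrect) / total
```

With this correction, pure guessing (half right, half wrong) scores 0 rather than 50, so `drt_score(96, 4)` gives 92.0 rather than 96.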

Decisions and Action Items

Opus Extension Mechanism: Ready for working group last call pending final review
DRED Extension:
- Final model training to be completed within one month
- Jean-Marc to incorporate test vector details in next draft update
Chairs Action: Determine permanent archival location for normative weights and test vectors for both DRED and speech enhancements
Chairs Action: Check with ADs regarding Creative Commons Non-Commercial licensing for the EARS dataset
Scalable Quality Extension: Initiate adoption call on mailing list following positive room consensus

Next Steps

Run working group last call for Opus Extension Mechanism (2-3 weeks)
Complete final DRED model training and update draft with test vector specifications
Resolve permanent storage locations for normative reference materials
Process adoption call for Scalable Quality Extension
Continue development of speech enhancement quantization and PLC masking
Gather working group input on packet loss testing scenarios for comprehensive evaluation framework
Consider integration of evaluation methodologies into respective specification documents