Session Date/Time: 24 Jul 2025 15:00
mlcodec
Summary
The Machine Learning for Audio Coding (MLCodec) working group met at IETF 123 to discuss progress on several audio codec extensions and enhancements. The session covered updates to the Opus Extension Mechanism, the Deep Redundancy (DRED) extension, speech coding enhancements including bandwidth extension, the scalable quality extension, and a comprehensive speech quality testing methodology. Key technical discussions focused on model optimization, test vector implementation, and evaluation frameworks for ML-based audio coding techniques.
Key Discussion Points
Opus Extension Mechanism
- Version 04 published with changes discussed in Bangkok regarding frame separators with zero increment
- Clarified that frame separators with non-zero increment are not repeated by the repeat extensions mechanism
- Improved clarity and precision following feedback from Mark Harris
Deep Redundancy (DRED) Extension
- New model is 3x smaller than the previous version while delivering better quality
- Encoder reduced to ~200K weights (45KB), decoder to ~290K weights (365KB)
- Improvements from enhanced training schedule, soft quantization, and architectural changes including sparsity in GRU layers and dimensional reduction in convolutional layers
- Test vectors now available in binary format with comprehensive validation tools
- Three types of test vectors: DRED feature decoder, vocoder validation, and integration testing
- Uses entropy coding and provides comparison tools for compliance verification
Speech Coding Enhancements
- Blind bandwidth extension (BWENet) fully integrated into Opus, extending wideband to fullband
- Requires compile-time and runtime activation, operates at decode complexity ≥4 and 48kHz output
- Works with SILK-only wideband or hybrid mode without CELT high-end data
- Proposed evaluation methodology using energy correlation in frequency bands to validate extensions
- Testing on EAS dataset shows bandwidth extension consistently passes requirements
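
The band-energy correlation idea mentioned above can be illustrated with a minimal sketch. It assumes the metric is a Pearson correlation of per-band log energies across frames; the metric actually proposed to the working group may differ in framing, band layout, and thresholds. The names `band_energies`, `pearson`, and `band_energy_correlation` are hypothetical, and the naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def band_energies(frame, band_edges):
    """Energy per band for one frame, using a naive O(n^2) DFT."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        power.append(abs(acc) ** 2)
    # band_edges holds DFT bin indices; band i covers bins [edges[i], edges[i+1]).
    return [sum(power[lo:hi]) for lo, hi in zip(band_edges[:-1], band_edges[1:])]


def pearson(x, y):
    """Pearson correlation; NaN when either input has zero variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def band_energy_correlation(ref, test, frame_len, band_edges):
    """Correlate per-band log energies of `ref` and `test` across frames."""
    def log_energies(signal):
        frames = [signal[i:i + frame_len]
                  for i in range(0, len(signal) - frame_len + 1, frame_len)]
        # Small floor avoids log(0) on silent bands (assumed value).
        return [[math.log10(e + 1e-12) for e in band_energies(f, band_edges)]
                for f in frames]

    ref_e, test_e = log_energies(ref), log_energies(test)
    n_bands = len(band_edges) - 1
    return [pearson([f[b] for f in ref_e], [f[b] for f in test_e])
            for b in range(n_bands)]


def two_tone(low_amps, high_amps, frame_len=32):
    """Toy signal: one tone per band, with a per-frame amplitude envelope."""
    out = []
    for a, b in zip(low_amps, high_amps):
        for t in range(frame_len):
            out.append(a * math.sin(2 * math.pi * 4 * t / frame_len)
                       + b * math.sin(2 * math.pi * 12 * t / frame_len))
    return out


reference = two_tone([1, 2, 3, 4], [4, 3, 2, 1])
# "Extended" signal whose high band follows the wrong envelope.
mismatched = two_tone([1, 2, 3, 4], [1, 2, 3, 4])
edges = [0, 8, 17]  # two bands over the 17 bins of a 32-sample frame

print(band_energy_correlation(reference, reference, 32, edges))
print(band_energy_correlation(reference, mismatched, 32, edges))
```

In this toy case the low band correlates strongly in both comparisons, while the high band correlation turns negative for the mismatched signal, which is the kind of per-band failure such a check is meant to expose.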
Scalable Quality Extension (Opus HD)
- Version 03 draft supports >8-bit depth, >20kHz bandwidth, and higher bitrates
- Uses entropy-coded refinement symbols for higher quality layer
- Test vectors available at 96kHz with strict decoder conformance and looser encoder conformance
- Achieves 100-115dB THD+N at 480 kbps per channel
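
For readers unfamiliar with the THD+N figure quoted above, the following is a minimal sketch of how such a number can be computed for a single test tone. It assumes the buffer holds a whole number of tone cycles and expresses the result as the residual-to-fundamental power ratio in dB, so better performance is more negative; the measurement setup behind the reported figures is not described in these notes. The function name `thd_plus_n_db` is hypothetical.

```python
import math


def thd_plus_n_db(signal, cycles):
    """THD+N in dB for a tone completing `cycles` whole periods in `signal`.

    Everything that is not the fundamental (harmonics, noise, DC) counts
    as residual. Requires 0 < cycles < len(signal) / 2.
    """
    n = len(signal)
    w = 2 * math.pi * cycles / n
    # Project onto cosine and sine at the fundamental; these are orthogonal
    # to every other whole-cycle component over the buffer.
    a = 2 / n * sum(s * math.cos(w * t) for t, s in enumerate(signal))
    b = 2 / n * sum(s * math.sin(w * t) for t, s in enumerate(signal))
    fund_power = (a * a + b * b) / 2
    resid_power = sum((s - a * math.cos(w * t) - b * math.sin(w * t)) ** 2
                      for t, s in enumerate(signal)) / n
    if resid_power == 0:
        return -math.inf
    return 10 * math.log10(resid_power / fund_power)


# Unit tone plus a third harmonic at 1e-5 of its amplitude (illustrative values).
n, cycles = 480, 10
tone = [math.sin(2 * math.pi * cycles * t / n)
        + 1e-5 * math.sin(2 * math.pi * 3 * cycles * t / n)
        for t in range(n)]
print(round(thd_plus_n_db(tone, cycles), 2))
```

With a harmonic at 1e-5 of the fundamental's amplitude, the ratio comes out at about -100 dB, which corresponds to the lower end of the magnitude range quoted above.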
Speech Quality Testing Battery
- Comprehensive evaluation framework for generative codecs addressing range equalization biases
- Implements MUSHRA-1S methodology with fixed anchors and references
- Tests both clean speech quality and real-world conditions (noise/reverb)
- Includes intelligibility testing using Diagnostic Rhyme Test for phoneme-level assessment
- Results show DRED performance comparable to Opus at various bitrates with some differences in noisy conditions
Decisions and Action Items
- Opus Extension Mechanism: Ready for working group last call pending final review
- DRED Extension:
  - Final model training to be completed within one month
  - Jean-Marc to incorporate test vector details in next draft update
- Chairs Action: Determine permanent archival location for normative weights and test vectors for both DRED and speech enhancements
- Chairs Action: Check with ADs regarding Creative Commons Non-Commercial licensing for EAS dataset
- Scalable Quality Extension: Initiate adoption call on mailing list following positive room consensus
Next Steps
- Run working group last call for Opus Extension Mechanism (2-3 weeks)
- Complete final DRED model training and update draft with test vector specifications
- Resolve permanent storage locations for normative reference materials
- Process adoption call for Scalable Quality Extension
- Continue development of speech enhancement quantization and PLC masking
- Gather working group input on packet loss testing scenarios for comprehensive evaluation framework
- Consider integration of evaluation methodologies into respective specification documents