An Ultra-low-bitrate Neural Speech Codec with Plain-to-Pseudo Synergistic Vector Quantization
¹ NERC-SLIP, University of Science and Technology of China, Hefei, China
² iFLYTEK Research, China
³ Department of Electronic Engineering, Tsinghua University, Beijing, China
Most neural speech codecs adopt a residual vector quantizer (RVQ), in which successive VQs contribute with decreasing importance. However, assigning the same bitrate to all VQs wastes bits and results in a relatively high overall bitrate. To address this issue, we propose an ultra-low-bitrate neural speech codec, termed P2PSynCodec, which incorporates a plain-to-pseudo synergistic vector quantizer (P2PSVQ). The P2PSVQ extends RVQ on one key principle: the less important VQs at the rear stages are allocated zero bitrate. Specifically, the P2PSVQ is a cascaded structure composed of a plain VQ and multiple pseudo VQs that work in a synergistic manner. The plain VQ serves as the foundation, producing basic tokens through quantization, while the pseudo VQs generate auxiliary tokens through prediction, so these auxiliary tokens do not count toward the bitrate. Experiments show that P2PSynCodec, despite operating at merely 0.5 kbps, maintains speech quality comparable to that of competing codecs operating at 2.0 kbps.
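To make the bitrate argument concrete, the sketch below compares a uniform-bitrate RVQ with a plain-plus-pseudo arrangement in which only the plain VQ's tokens are counted. The frame rate, codebook size, and number of stages are hypothetical values chosen purely for illustration; they are not taken from the paper, and the function name `bitrate_bps` is likewise an assumption.

```python
import math


def bitrate_bps(frame_rate_hz, codebook_sizes):
    """Bits per second for the quantizers whose token indices are counted.

    Each counted VQ contributes log2(codebook size) bits per frame.
    """
    bits_per_frame = sum(math.log2(size) for size in codebook_sizes)
    return frame_rate_hz * bits_per_frame


# Hypothetical configuration (illustrative only, not the paper's settings).
FRAME_RATE_HZ = 50    # token frames per second
CODEBOOK_SIZE = 1024  # entries per codebook, i.e. 10 bits per token
NUM_STAGES = 4        # quantizer stages in the cascade

# Conventional RVQ: every stage is counted at the same bitrate.
rvq_bps = bitrate_bps(FRAME_RATE_HZ, [CODEBOOK_SIZE] * NUM_STAGES)

# Plain-to-pseudo arrangement: only the plain VQ is counted;
# the pseudo VQs' tokens are predicted and contribute zero bits.
p2p_bps = bitrate_bps(FRAME_RATE_HZ, [CODEBOOK_SIZE])

print(f"RVQ, {NUM_STAGES} stages: {rvq_bps / 1000:.1f} kbps")
print(f"Plain VQ + {NUM_STAGES - 1} pseudo VQs: {p2p_bps / 1000:.1f} kbps")
```

Under these assumed values the counted bitrate falls by a factor equal to the number of stages, because the rear stages no longer contribute any bits.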
3.3 Comparison with Baseline Codecs
Sampling rate: 16 kHz
Setting: Comparisons at equal ultra-low bitrate (0.5 kbps)
Example 1
2961_961_000004_000002.wav
| MDCTCodec @ 0.5 kbps | DAC @ 0.5 kbps | BigCodec @ 0.5 kbps | WavTokenizer @ 0.5 kbps | P2PSynCodec @ 0.5 kbps |
|---|---|---|---|---|
Example 2
4970_29093_000037_000000.wav
| MDCTCodec @ 0.5 kbps | DAC @ 0.5 kbps | BigCodec @ 0.5 kbps | WavTokenizer @ 0.5 kbps | P2PSynCodec @ 0.5 kbps |
|---|---|---|---|---|
Example 3
260_123288_000033_000000.wav
| MDCTCodec @ 0.5 kbps | DAC @ 0.5 kbps | BigCodec @ 0.5 kbps | WavTokenizer @ 0.5 kbps | P2PSynCodec @ 0.5 kbps |
|---|---|---|---|---|
Example 4
6829_68769_000071_000001.wav
| MDCTCodec @ 0.5 kbps | DAC @ 0.5 kbps | BigCodec @ 0.5 kbps | WavTokenizer @ 0.5 kbps | P2PSynCodec @ 0.5 kbps |
|---|---|---|---|---|
Example 5
8463_294825_000015_000002.wav
| MDCTCodec @ 0.5 kbps | DAC @ 0.5 kbps | BigCodec @ 0.5 kbps | WavTokenizer @ 0.5 kbps | P2PSynCodec @ 0.5 kbps |
|---|---|---|---|---|
Comparisons with High-Bitrate Codecs
Setting: Baselines operate at higher bitrates (1.5 to 2.0 kbps); P2PSynCodec remains at 0.5 kbps.
Example 1
2961_961_000004_000002.wav
| MDCTCodec @ 2.0 kbps | DAC @ 2.0 kbps | SQCodec @ 1.5 kbps | WavTokenizer @ 2.0 kbps | P2PSynCodec @ 0.5 kbps |
|---|---|---|---|---|
Example 2
4970_29093_000037_000000.wav
| MDCTCodec @ 2.0 kbps | DAC @ 2.0 kbps | SQCodec @ 1.5 kbps | WavTokenizer @ 2.0 kbps | P2PSynCodec @ 0.5 kbps |
|---|---|---|---|---|
Example 3
260_123288_000033_000000.wav
| MDCTCodec @ 2.0 kbps | DAC @ 2.0 kbps | SQCodec @ 1.5 kbps | WavTokenizer @ 2.0 kbps | P2PSynCodec @ 0.5 kbps |
|---|---|---|---|---|
Example 4
6829_68769_000071_000001.wav
| MDCTCodec @ 2.0 kbps | DAC @ 2.0 kbps | SQCodec @ 1.5 kbps | WavTokenizer @ 2.0 kbps | P2PSynCodec @ 0.5 kbps |
|---|---|---|---|---|
Example 5
8463_294825_000015_000002.wav
| MDCTCodec @ 2.0 kbps | DAC @ 2.0 kbps | SQCodec @ 1.5 kbps | WavTokenizer @ 2.0 kbps | P2PSynCodec @ 0.5 kbps |
|---|---|---|---|---|