Sound demos for "MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios"

Xiao-Hang Jiang, Yang Ai, Rui-Chen Zheng, Hui-Peng Du, Ye-Xin Lu, Zhen-Hua Ling

National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P. R. China

 

Section 4.3 Comparison with Baseline Codecs

(The sampling rate is 48 kHz)

Example 1 (p360, male speaker, at 6 kbps)
Raw Audio            
         
SoundStream Encodec HiFi-Codec AudioDec DAC APCodec MDCTCodec
Example 1 (p360, male speaker, at 9 kbps)
  Raw Audio            
           
  SoundStream Encodec   AudioDec DAC APCodec MDCTCodec
   
Example 1 (p360, male speaker, at 12 kbps)
  Raw Audio            
           
  SoundStream Encodec HiFi-Codec AudioDec DAC APCodec MDCTCodec
 
Example 2 (p361, female speaker, at 6 kbps)
Raw Audio              
           
SoundStream Encodec HiFi-Codec AudioDec DAC APCodec MDCTCodec
Example 2 (p361, female speaker, at 9 kbps)
  Raw Audio            
           
  SoundStream Encodec   AudioDec DAC APCodec MDCTCodec
   
Example 2 (p361, female speaker, at 12 kbps)
  Raw Audio            
           
  SoundStream Encodec HiFi-Codec AudioDec DAC APCodec MDCTCodec
 
Example 3 (p364, male speaker, at 6 kbps)
Raw Audio              
           
SoundStream Encodec HiFi-Codec AudioDec DAC APCodec MDCTCodec
Example 3 (p364, male speaker, at 9 kbps)
  Raw Audio            
           
  SoundStream Encodec   AudioDec DAC APCodec MDCTCodec
   
Example 3 (p364, male speaker, at 12 kbps)
  Raw Audio            
           
  SoundStream Encodec HiFi-Codec AudioDec DAC APCodec MDCTCodec
 
Example 4 (s5, female speaker, at 6 kbps)
Raw Audio              
           
SoundStream Encodec HiFi-Codec AudioDec DAC APCodec MDCTCodec
Example 4 (s5, female speaker, at 9 kbps)
  Raw Audio            
           
  SoundStream Encodec   AudioDec DAC APCodec MDCTCodec
   
Example 4 (s5, female speaker, at 12 kbps)
  Raw Audio            
           
  SoundStream Encodec HiFi-Codec AudioDec DAC APCodec MDCTCodec
 

Section 4.4 Ablation Studies

(The sampling rate is 48 kHz)

Example 1 (p361, female speaker, at 6 kbps)
MDCTCodec        
     
w/o Melloss w/o Qloss w/o MDCTloss rep. MRD rep. MPD
Example 2 (p363, male speaker, at 6 kbps)
MDCTCodec        
     
w/o Melloss w/o Qloss w/o MDCTloss rep. MRD rep. MPD
Example 3 (p374, male speaker, at 6 kbps)
MDCTCodec        
     
w/o Melloss w/o Qloss w/o MDCTloss rep. MRD rep. MPD
Example 4 (s5, female speaker, at 6 kbps)
MDCTCodec        
     
w/o Melloss w/o Qloss w/o MDCTloss rep. MRD rep. MPD

Section 4.5 Validation of Generalization

(Results on VCTK dataset, 48 kHz, 6 kbps)

Example 1
Raw Audio    
   
DAC APCodec MDCTCodec
Example 2
Raw Audio    
   
DAC APCodec MDCTCodec

(Results on CommonVoice dataset, 48 kHz, 6 kbps)

Example 1
Raw Audio    
 
DAC APCodec MDCTCodec
Example 2
Raw Audio    
 
DAC APCodec MDCTCodec

(Results on Opencpop dataset, 48 kHz, 6 kbps)

Example 1
Raw Audio    
 
DAC APCodec MDCTCodec
Example 2
Raw Audio    
 
DAC APCodec MDCTCodec

(Results on FSD50K dataset, 48 kHz, 6 kbps)

Example 1
Raw Audio    
 
DAC APCodec MDCTCodec
Example 2
Raw Audio    
   
DAC APCodec MDCTCodec