Sound demos for "ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram"

Xiao-Hang Jiang, Hui-Peng Du, Yang Ai, Ye-Xin Lu, Zhen-Hua Ling

National Engineering Research Center of Speech and Language Information Processing, University of Science and Technology of China, Hefei, P. R. China

Analysis-synthesis results

(The sampling rate is 16 kHz)

Example 1 (LJ001-0012)
Raw Audio
ESTVocoder HiFi-GAN SiFi-GAN Vocos
Example 2 (LJ009-0040)
Raw Audio
ESTVocoder HiFi-GAN SiFi-GAN Vocos
Example 3 (LJ016-0238)
Raw Audio
ESTVocoder HiFi-GAN SiFi-GAN Vocos
Example 4 (LJ020-0062)
Raw Audio
ESTVocoder HiFi-GAN SiFi-GAN Vocos
Example 5 (LJ022-0069)
Raw Audio
ESTVocoder HiFi-GAN SiFi-GAN Vocos
Example 6 (LJ029-0066)
Raw Audio
ESTVocoder HiFi-GAN SiFi-GAN Vocos
Example 7 (LJ033-0037)
Raw Audio
ESTVocoder HiFi-GAN SiFi-GAN Vocos
Example 8 (LJ038-0084)
Raw Audio
ESTVocoder HiFi-GAN SiFi-GAN Vocos
Example 9 (LJ041-0114)
Raw Audio
ESTVocoder HiFi-GAN SiFi-GAN Vocos
Example 10 (LJ049-0154)
Raw Audio
ESTVocoder HiFi-GAN SiFi-GAN Vocos

Text-to-speech results

(The sampling rate is 16 kHz)

Example 1
Raw Audio
ESTVocoder HiFi-GAN Vocos
Example 2
Raw Audio
ESTVocoder HiFi-GAN Vocos
Example 3
Raw Audio
ESTVocoder HiFi-GAN Vocos
Example 4
Raw Audio
ESTVocoder HiFi-GAN Vocos
Example 5
Raw Audio
ESTVocoder HiFi-GAN Vocos