Dong Wang, Associate Researcher

  • Email: wangdong99@mails.tsinghua.edu.cn
  • Address: Room 1-304, FIT Building, Tsinghua University, Haidian District, Beijing

Education

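October 2006 - January 2010, University of Edinburgh, PhD

September 1999 - June 2002, Tsinghua University, Master's degree

October 1995 - June 1999, Tsinghua University, Bachelor's degree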

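Work Experience

December 2017 - present, Associate Researcher, Tsinghua University

January 2012 - November 2017, Assistant Researcher, Tsinghua University / Deputy Director, Center for Speech and Language Technologies, Research Institute of Information Technology

July 2011 - December 2011, Senior Research Scientist, Nuance (USA)

April 2010 - July 2011, Postdoctoral Researcher, EURECOM (France)

October 2006 - January 2010, Marie Curie Research Fellow, University of Edinburgh (UK)

July 2004 - September 2006, Senior Software Engineer, IBM China

October 2002 - July 2004, Software Engineer, Oracle China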

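Academic Appointments

January 2025 - present, Deputy Director, Research Center for Artificial Intelligence General Education, Department of Computer Science and Technology, Tsinghua University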

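Research Areas

(1) Speech information processing (speech recognition, speaker recognition)

(2) Multimodal speech information processing (lip reading, visual speech generation)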

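Research Overview

Extracting key information from speech signals is an important research area in artificial intelligence. Research directions include: (1) modeling, decomposition, and reconstruction of speech information; (2) processing of overlapped (mixed) speech; (3) audio-visual multimodal speech processing. Related research results have been deployed in Huawei in-vehicle products. Together with partner companies, the group has also developed voice alarm devices for campuses and for elderly people living at home, helping to protect the health and lives of children and the elderly.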

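Awards and Honors

(1) Beijing Science and Technology Progress Award, Second Prize (2021)

(2) China Industry-University-Research Cooperation Innovation and Promotion Award, Excellence Award (2022)

(3) Speech Communication Five-Year (2019-2023) Best Paper Award (2023)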

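Academic Output

(1) Books:

[1] Dong Wang, Yiran Wang, Wenqiang Du, 《人工智能通识》(小学版) (Artificial Intelligence General Education, Primary School Edition), Tsinghua University Press, 2025, ISBN: 9787302688570

[2] Dong Wang, Yunqi Cai, Hongzheng Tan, 《人工智能通识》(初中版) (Artificial Intelligence General Education, Junior High School Edition), Tsinghua University Press, 2025, ISBN: 9787302688464

[3] Dong Wang, Lantian Li, 《人工智能通识》(高中版) (Artificial Intelligence General Education, Senior High School Edition), Tsinghua University Press, 2025, ISBN: 9787302688457

[4] Dong Wang, Shaoping Ma, 《人工智能通识》 (Artificial Intelligence General Education), Tsinghua University Press, 2025, ISBN: 9787302680147

[5] Dong Wang, Shaoping Ma, 《图解人工智能》 (Illustrated Artificial Intelligence), Tsinghua University Press, 2023, ISBN: 9787302637127

[6] Dong Wang, 《机器学习导论》 (Introduction to Machine Learning), Tsinghua University Press, 2021, ISBN: 9787302546054

[7] Zhiyuan Tang, Lantian Li, Dong Wang, Ying Shi, Yunqi Cai, Thomas Fang Zheng, 《语音识别基本法》 (The Basic Law of Speech Recognition), Publishing House of Electronics Industry, 2021, ISBN: 9787121404788

[8] Dong Wang, Jie Li, Sha Xu, 《人工智能》 (Artificial Intelligence), Tsinghua University Press, 2019, ISBN: 9787302531876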

(2) Selected journal papers

[1] Cai Y, Li J*, Wang D*. Fast and generalizable micromagnetic simulation with deep neural nets. Nature Machine Intelligence, 2024: 1-14. (Nature sub-journal)

[2] Ying Shi, Lantian Li, Dong Wang, Jiqing Han, Keyword Guided Target Speech Recognition, IEEE Signal Processing Letters.

[3] Wan Lin, Lantian Li*, Dong Wang*, A Simple Unsupervised Knowledge-Free Domain Adaptation for Speaker Recognition, Applied Sciences, 14(3), 2024

[4] Li L, Wang D, Abel A, et al. On evaluation trials in speaker verification. Applied Intelligence, 2023: 1-18.

[5] Cai Y, Li L, Abel A, Xiaoyan Zhu, Dong Wang*, Maximum Gaussianality training for deep speaker vector normalization. Pattern Recognition, 2024, 145: 109977.

[6] Haoran Sun, Dong Wang, Lantian Li, Chen Chen, Thomas Fang Zheng, Random Cycle Loss and Its Application to Voice Conversion, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol.45, no.8, August 2023.

[7] Du, W.; Maimaitiyiming, Y.; Nijat, M.; Li, L.; Hamdulla, A.; Wang, D. Automatic Speech Recognition for Uyghur, Kazakh, and Kyrgyz: An Overview. Appl. Sci. 2023, 13, 326.

[8] Lantian Li, Ruiqi Liu, Jiawen Kang, Yue Fa, Hao Cui, Yunqi Cai, Ravichander Vipperla, Thomas Fang Zheng and Dong Wang. "CN-Celeb: multi-genre speaker recognition", Speech Communication, 2022. (5-year Best Paper)

[9] Li, L., Wang, D., Kang, J., Wang, R., Wu, J., Gao, Z., & Chen, X., "A Principle Solution for Enroll-Test Mismatch in Speaker Recognition", IEEE Transactions on Audio, Speech and Language Processing, 2022.

[10] Yunqi Cai, Lantian Li, Andrew Abel, Xiaoyan Zhu, Dong Wang, "Deep Normalization for Speaker Vectors", IEEE Transactions on Audio, Speech and Language Processing, 2020.

[11] Dong Wang, "A Simulation Study on Optimal Scores for Speaker Recognition", EURASIP Journal on Audio, Speech, and Music Processing, 2020.

[12] Yang Wang, Dong Wang, "Market Symmetry and Its Application to Pattern-Matching-Based Portfolio Selection", The Journal of Financial Data Science, Spring 2019, 1 (2) 78-93.

[13] Zhiyuan Tang, Dong Wang, Yixiang Chen, Lantian Li, and Andrew Abel, "Phonetic Temporal Neural Model for Language Identification", IEEE Trans. on Audio, Speech and Language Processing, vol.26, no.1, 2018, pp.134-144.

[14] Zhiyuan Tang, Lantian Li, Dong Wang, Ravi Vipperla, "Collaborative Joint Training with Multi-task Recurrent Model for Speech and Speaker Recognition", IEEE Trans. on Audio, Speech and Language Processing, vol.25, no.3, March 2017.

[15] Xi Ma, Dong Wang, Javier Tejedor, "Similar Word Model for Unfrequent Word Enhancement in Speech Recognition", TASLP, vol.24, no.10, 2016.

[16] Shi Yin, Chao Liu, Zhiyong Zhang, Yiye Lin, Dong Wang, Javier Tejedor, Thomas Fang Zheng, Yinguo Li, "Noisy Training for Deep Neural Networks in Speech Recognition", EURASIP Journal on Audio, Speech, and Music Processing, 2015(2)

[17] Dong Wang, Ravichander Vipperla, Nick Evans, Thomas Fang Zheng, "Online Non-negative Convolutive Pattern Learning for Speech Signals", IEEE Trans. on Signal Processing, vol.61, no.1, pp.44-56.

[18] Nick Evans, Simon Bozonnet, Dong Wang, Corinne Fredouille and Raphael Troncy, "A Comparative Study of Bottom-up and Top-down Approaches to Speaker Diarization", IEEE Trans. on Audio, Speech and Language Processing, vol.20, no.2, 2012.

[19] Dong Wang, Simon King, Joe Frankel, "Stochastic Pronunciation Modelling for Out-of-Vocabulary Term Detection", IEEE Trans. on Audio, Speech and Language Processing, vol.18, no.8, November 2010.

[20] Dong Wang, Javier Tejedor, Simon King, Joe Frankel, "Term-dependent Confidence Normalization for Out-of-Vocabulary Spoken Term Detection", Journal of Computer Science and Technology (JCST), vol.27, no.2, 2012.

[21] Javier Tejedor, Alejandro Echeverria, Dong Wang, Ravichander Vipperla, "Evolutionary discriminative confidence estimation for spoken term detection", Multimedia Tools and Applications, no.4, 2011.

[22] Dong Wang, Simon King, "Letter-to-Sound Pronunciation Prediction using Conditional Random Field", IEEE Signal Processing Letters, vol.18, no.2, February 2011, pp.122-125

[23] Javier Tejedor, Dong Wang, Joe Frankel, Simon King, Jose Colas, "A comparison of grapheme and phoneme-based units for spoken term detection in Spanish", Speech Communication, 2008, 50(11-12):980-991, November-December

(3) Recent conference papers

[1] Zhiyuan Tang, Dong Wang, Shen Huang, Shidong Shang, Pinyin Regularization in Error Correction for Chinese Speech Recognition with Large Language Models, Interspeech 2024

[2] Mewlude Nijat, Chen Chen, Dong Wang, Askar Hamdulla, UY/CH-CHILD -- A Public Chinese L2 Speech Database of Uyghur Children, Interspeech 2024

[3] Junming Yuan, Ying Shi, Lantian Li, Dong Wang, Askar Hamdulla, Few-Shot Keyword Spotting from Mixed Speech, Interspeech 2024

[4] Ying Shi, Lantian Li, Shi Yin, Dong Wang, Jiqing Han, Serialized Output Training by Learned Dominance, Interspeech 2024

[5] Chen Chen, Zehua Liu, Xiaolou Li, Lantian Li, Dong Wang, CNVSRC 2023: The First Chinese Continuous Visual Speech Recognition Challenge, Interspeech 2024

[6] Xiaolou Li, Zehua Liu, Chen Chen, Lantian Li, Li Guo, Dong Wang, Zero-Shot Fake Video Detection by Audio-Visual Consistency, Interspeech 2024

[7] Tianhao Wang, Lantian Li, Dong Wang, SE/BN Adapter: Parametric Efficient Domain Adaptation for Speaker Recognition, Interspeech 2024

[8] Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang, A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition, Interspeech 2024

[9] Pengqi Li, Tianhao Wang, Lantian Li, Askar Hamdulla, Dong Wang, How Phonemes Contribute to Deep Speaker Models? ICASSP 2024

[10] Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang, An Investigation of Distribution Alignment in Multi-Genre Speaker Recognition, ICASSP 2024

[11] Zhenyu Zhou, Junhui Chen, Namin Wang, Lantian Li, Dong Wang, Adversarial Data Augmentation for Robust Speaker Verification, International Conference on Communication and Information Processing (ICCIP) 2023 (Best Paper)

[12] Chen Chen, Dong Wang*, Lantian Li, CN-CVS: A Mandarin Audio-Visual Dataset for Large Vocabulary Continuous Visual to Speech Synthesis, ICASSP 2023.

[13] Shi, Y., Wang, D.*, Li, L., Han, J., Yin, S. (2023) Spot Keywords From Very Noisy and Mixed Speech. Proc. INTERSPEECH 2023, 1488-1492

[14] Li, L., Li, X., Jiang, H., Chen, C., Hou, R., Wang, D.* (2023) CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition. Proc. INTERSPEECH 2023, 2118-2122

[15] Li, P., Li, L., Hamdulla, A., Wang, D.* (2023) Visualizing Data Augmentation in Deep Speaker Recognition. Proc. INTERSPEECH 2023, 2243-2247

[16] Wang, J., Wang, X., Wang, N., Li, L., Wang, D.* (2023) Ordered and Binary Speaker Embedding. Proc. INTERSPEECH 2023, 4683-4687

[17] Wei, X., Chen, J., Zheng, Z., Guo, L., Li, L., Wang, D. (2023) A Multi-Scale Attentive Transformer for Multi-Instrument Symbolic Music Generation. Proc. INTERSPEECH 2023, 5391-5395

[18] Haoran Sun, Chen Chen, Lantian Li, Dong Wang*, CycleFlow: Purify Information Factors by Cycle Loss, Odyssey 2022 (Best Student Paper)

[19] Pengqi Li, Lantian Li, Askar Hamdulla, Dong Wang*, Reliable Visualization for Deep Speaker Recognition, INTERSPEECH 2022

[20] Lantian Li, Ruiqian Nai, Dong Wang*, Real Additive Margin Softmax for Speaker Verification, ICASSP 2022.

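Student Supervision

Master's students: Chao Liu, Yang Wang, Haoran Sun, Chen Chen

Jointly supervised PhD students: Lantian Li, Zhiyuan Tang

Jointly supervised postdoctoral researcher: Yunqi Cai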