ZHENG Fang, Research Professor

  • Email : fzheng@tsinghua.edu.cn
  • Phone : 010-62796393
  • Address : Room 3-411 FIT Building, Tsinghua University, Beijing, 100084, China
Education Background

Sep. 1994-May 1997 Ph.D., Computer Application, Dept. of Computer Science and Technology, Tsinghua University

Sep. 1990-Jun. 1992 M.E., Computer Application, Dept. of Computer Science and Technology, Tsinghua University

Sep. 1985-Jun. 1990 B.A., Computer Science and Technology, Dept. of Computer Science and Technology, Tsinghua University


Work Experience

Nov. 2019-present Beijing National Research Center for Information Science and Technology: Executive Director, Intelligence Science Division (Dec. 2019-Dec. 2022); Director, Team for Speech and Language Technologies (Nov. 2019-present)

Dec. 2004-Oct. 2019 Center for Speech and Language Technologies, Research Institute of Information Technology, Director

Apr. 2004-Oct. 2019 Research Institute of Information Technology, Tsinghua University, Vice Dean

Apr. 2002-present Beijing d-Ear Technologies Co., Ltd., Founder, Chairman

Sep. 2001-Mar. 2002 Weniwen Technologies Limited (Hong Kong), VP

Sep. 1994-Jun. 2001 Dept. of Computer Science and Technology, Tsinghua University, Faculty


Academic Affiliations

Mar.2024-present  Director of the FinTech Committee of China Electronics Industry Federation

Jan.2016-Dec.2019  General Co-Chair of APSIPA ASC

Jan.2013-Dec.2014  General Co-Chair of APSIPA ASC 2011, APSIPA Distinguished Lecturer

Jul.2012-present  Director of the Speech Committee of the Chinese Information Processing Society

Oct.2011-present  Associate editor of APSIPA Transactions on SIP

Jul.2007-present Deputy Director of Security Standardization Technical Committee for Biometric Identification Application
Oct.2005-present Editorial board member of Speech Communication


Professional Affiliations

Beijing d-Ear Technologies, Co. Ltd., Founder, Chairman


Research Areas

(1) Speech recognition

(2) Natural language processing

(3) Speaker recognition


Research Overview

Automatic speech recognition:
(1) An improved feature extraction method, FBE-MFCC (formant-based frequency band energy MFCC), is proposed. Formant-based band energies are taken into account during analysis, which improves the discriminability of the extracted features and their robustness to noise.
(2) The concept of an extended Chinese phonetic alphabet, the method of acoustic fine modeling, the method of context-dependent weighting, and related techniques are proposed, giving an acoustic-level solution to the problems of casual pronunciation and accent in speech recognition.
(3) A WST (word search tree) structure is proposed to describe the internal relationships among words. It solves the decoding problem in continuous speech recognition from a structural perspective and thus provides a linguistic-level solution to the accent problem in Chinese speech recognition.
(4) The method of Chinese syllable mapping and the model of an acoustic corrector are proposed. With only a small dialect-background database, a dialect-adapted Mandarin recognizer can be conveniently obtained directly from a standard Mandarin recognizer, which also facilitates acoustic model training for low-resource languages.

Automatic language understanding:
(5) A spoken dialogue framework is proposed in which a robust semantic analyzer, a dialogue manager based on a topic-number forest structure, a text generator, and other components are configurable modules, making it feasible and efficient to customize spoken dialogue systems.

Speaker recognition:
(6) A Cohort-based speaker model synthesis algorithm is proposed to solve the channel mismatch problem.
(7) A database for studying the time-varying characteristics of voiceprints was constructed, in which the same speakers record the same utterances at different times (spanning more than 5 years at one-week intervals). Based on it, a feature extraction method that determines the frequency resolution of each band according to the ratio of speaker discriminability to time discriminability is investigated and proposed, which addresses the problem of time-varying voiceprints.
(8) Dual-spectrum analysis (in the signal domain), F-ratio-based feature selection (in the feature domain), and multi-model fusion (in the model domain) are combined to comprehensively solve the problem of recording replay attack detection in voiceprint recognition.
(9) A phoneme-class-based voiceprint recognition method for ultra-short speech is proposed, which reduces the required utterance length from 20 seconds to 1-2 seconds without degrading recognition performance, improving the user experience.
For trusted identity authentication:
(10) The concept is proposed that trusted identity authentication based on biometrics must meet at least three technical requirements: accurate biometric recognition, the capability to resist prosthetic (spoofing) attacks, and the capability to detect the user's real intention.
(11) Taking advantage of the characteristics of speech signals, a method to prevent prosthetic attacks during voiceprint-based identity authentication is proposed and implemented, including: recognizing the password text randomly generated by the system while performing voiceprint recognition, allowing the user to self-define the pronunciation of the password text, and detecting whether the speech has been recorded and replayed.
(12) A method for real-intention detection that comprehensively utilizes speech recognition, emotion recognition, and semantic understanding is proposed and implemented.
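To illustrate the feature-domain idea behind items (7) and (8), the sketch below computes a Fisher ratio (F-ratio) per feature dimension: between-speaker variance divided by within-speaker variance, so that dimensions with high speaker discriminability score high. This is a minimal, self-contained sketch of the generic F-ratio criterion only; the function name `f_ratio` and the synthetic data are illustrative assumptions, not the author's actual implementation.

```python
import numpy as np

def f_ratio(features, labels):
    """Per-dimension Fisher ratio: between-class / within-class variance.

    features: (n_samples, n_dims) array of feature vectors
    labels:   per-sample class (e.g. speaker) identifiers
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    overall_mean = features.mean(axis=0)
    between = np.zeros(features.shape[1])
    within = np.zeros(features.shape[1])
    for c in np.unique(labels):
        x = features[labels == c]
        mu = x.mean(axis=0)
        # class-size-weighted spread of class means around the global mean
        between += len(x) * (mu - overall_mean) ** 2
        # spread of samples around their own class mean
        within += ((x - mu) ** 2).sum(axis=0)
    # guard against division by zero for constant dimensions
    return between / np.maximum(within, 1e-12)
```

Dimensions can then be ranked by this score and the top-scoring ones retained, which is the usual way an F-ratio criterion drives feature selection.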


Awards and Honors

(1) Beijing City Patriotic and Meritorious Personnel (1997)
(2) National Huo Yingdong Education Outstanding Young Teacher Award (1999)
(3) First Prize of Beijing Higher Education Teaching Achievement (2000)
(4) Second Prize of Beijing Science and Technology Progress (2001, ranked first)
(5) China Industry-Education Cooperation Innovation Award (2009, ranked first)
(6) Champion of the ASVspoof 2019 Anti-Recording-Replay Attack Challenge Task (2019)
(7) China AI Golden Goose Award (2020)
(8) First Prize of Chinese Institute of Electronics Technology Invention (2021, ranked first)
(9) Second Prize of China Industry-Education Cooperation Innovation Promotion Award (2022, ranked first)
(10) Special Prize of Capital Financial Innovation Incentive Project (2022, ranked first)
(11) Second Prize of Beijing City Science and Technology Award (2023, ranked first)


Academic Achievements

(1) Led, or participated as a key member in, the research and development of more than 30 national key projects and international cooperation projects, and won more than 10 awards from the Ministry of Education, the Ministry of Science and Technology, and Beijing Municipality.
(2) Published more than 310 academic papers in well-known domestic and international journals and conferences, including 13 papers that won excellent paper awards (3 as first author); published 14 monographs.
Representative publications are as follows:
[1]Tongxu Li, Hui Zhang, Thomas Fang Zheng, “The Voiceprint Recognition Technology and Its Applications in Unsupervised Identity Authentication,” Chinese Association for Artificial Intelligence Transactions (in Chinese), 8(9): 46-54, 2018

[2]Lantian Li, Dong Wang, Chenhao Zhang, and Thomas Fang Zheng, "Improving short utterance speaker recognition by modeling speech unit classes," IEEE/ACM Trans. on Audio, Speech, and Language Processing, pp. 1129-1139, vol. 24, no. 6, June 2016

[3]Linlin Wang, Jun Wang, Lantian Li, Thomas Fang Zheng, Frank K. Soong, “Improving speaker verification performance against long-term speaker variability,” Speech Communication, 79 (2016), 14-29, Mar. 2016

[4]Miao Fan, Qiang Zhou, Thomas Fang Zheng, Ralph Grishman. “Distributed Representation Learning for Knowledge Bases with Entity Descriptions,” Pattern Recognition Letters, DOI: 10.1016/j.patrec.2016.09.005, Elsevier.

[5]Miao Fan, Qiang Zhou, Andrew Abel, Thomas Fang Zheng, Ralph Grishman, “Probabilistic Belief Embedding for Large-Scale Knowledge Population,” Cognitive Computation, December 2016, Volume 8, Issue 6, pp. 1087-1102

[6]Meng Sun, Xiongwei Zhang, Hugo Van hamme, and Thomas Fang Zheng, "Unseen noise estimation using separable deep auto encoder for speech enhancement," IEEE/ACM Transactions on Audio, Speech, and Language Processing, pp. 93-104, Vol. 24, No. 1, Jan. 2016 (DOI 10.1109/TASLP.2015.2498101)

[7]Guoyu Tang, Yunqing Xia, Erik Cambria, Peng Jin, Thomas Fang Zheng, “Document representation with statistical word senses in cross-lingual document clustering,” Vol. 29, No. 2 (2015), International Journal of Pattern Recognition and Artificial Intelligence, World Scientific Publishing Company

[8]Shi Yin, Chao Liu, Zhiyong Zhang, Yiye Lin, Dong Wang, Javier Tejedor, Thomas Fang Zheng and Yingguo Li, “Noisy Training for Deep Neural Networks in Speech Recognition,” EURASIP Journal on Audio, Speech, and Music Processing, 2015, 2015:2

[9]Dong Wang, Ravichander Vipperla, Nicholas Evans, Thomas Fang Zheng, “Online Non-Negative Convolutive Pattern Learning for Speech Signals,” IEEE Trans. on Signal Processing, 61(1): 44-56, Jan. 1, 2013

[10]Mijit Ablimit, Sardar Parhat, Askar Hamdulla, Thomas Fang Zheng, “Multilingual Stemming and Term Extraction for Uyghur, Kazak and Kirghiz,” the 10th APSIPA Annual Summit and Conference (APSIPA ASC 2018), November 12-15, 2018, 587-590, Hawaii, USA

[11]Thomas Fang Zheng, “Speech Signal for Unsupervised Identity Authentication,” APSIPA 10th Anniversary Magazine, pp. 26-28, Nov. 2018, Hawaii, USA

[12]Lantian Li, Zhiyuan Tang, Dong Wang, Thomas Fang Zheng, “Full-Info Training for Deep Speaker Feature Learning,” International Conference on Acoustics, Speech and Signal Processing (ICASSP’18), pp. 5369-5373, Apr. 15-20, 2018, Calgary, Alberta, Canada

[13]Lantian Li, Dong Wang, Yixiang Chen, Ying Shi, Zhiyuan Tang, Thomas Fang Zheng, “Deep Factorization for Speech Signal,” International Conference on Acoustics, Speech and Signal Processing (ICASSP’18), pp. 5094-5098, Apr. 15-20, 2018, Calgary, Alberta, Canada

[14]Xingliang Cheng, Xiaotong Zhang, Mingxing Xu, and Thomas Fang Zheng, “MMANN: Multimodal Multilevel Attention Neural Network for Horror Clip Detection,” the 10th APSIPA Annual Summit and Conference (APSIPA ASC 2018), November 12-15, 2018, 329-334, Hawaii, USA

[15]Xiaotong Zhang, Xingliang Cheng, Mingxing Xu, Thomas Fang Zheng, “Imbalance Learning-based Framework for Fear Recognition in the MediaEval Emotional Impact of Movies Task,” pp. 3678-3682, Interspeech 2018, 2-6 September 2018, Hyderabad, India, DOI: 10.21437/Interspeech.2018-1744
[16] Replay Detection using CQT-based Modified Group Delay Feature and ResNeWt Network in ASVspoof 2019
[17] Xiaolong Wu, Chang Feng, Mingxing Xu, Thomas Fang Zheng, Askar Hamdulla, “DialoguePCN: Perception and Cognition Network for Emotion Recognition in Conversations,” IEEE Access, vol. 11, pp. 141251-141260, 2023, DOI 10.1109/ACCESS.2023.3342456
Book: Robustness-Related Issues in Speaker Recognition

(3)Possess 16 invention patents (including one international invention patent) and one utility model patent.
The representative patents obtained in recent years are as follows:
[1] A training method and system for a language model based on a distributed neural network, 201410067916, 2014.02.27, China
[2] A method and system for voice password authentication, 201710052098, 2017.01.22, China
[3] A system and method for voice identity confirmation based on dynamic password voice, ZL 201310123555.0, 2013.10.12, China
[4] A voice access system based on dynamic digital verification code, ZL 201620119381.X, 2016, China
[5] A method and device for automatic reconstruction of voiceprint models, ZL 201510061721.8, 2015.1.06, China
[6] A method for dual authentication of fingerprints and voiceprints, ZL 2015100479665, 2015.10.04, China
[7] A feature extraction method and device for voice replay detection, ZL20180191512.9, China
(4) The "Unsupervised Identity Authentication System Based on Dynamic Password Voice" has passed the scientific and technological achievement appraisal of the Chinese Institute of Electronics, with the conclusion "overall at the international leading level".


Talent Development

Since 1998, I have been supervising graduate students and have so far trained a total of 71 master's and doctoral students.


Team Members

Mingxing Xu, Dong Wang, Li Zhao, Qiang Zhou, Xiaojun Wu, Chao Zhang