Invited Trend / Overview Talks

Automatic Speech Recognition: Trials, Tribulations and Triumphs by Sadaoki Furui
On Fusion of Visual Quality Assessment Methods by C.-C. Jay Kuo
The Computational Approach to Lossy Compression and its Applications to Image / Video Coding by En-hui Yang
Base Station Assignment and Transceiver Design for Heterogeneous Networks by Zhi-Quan Tom Luo
Model-Based Imaging by Charles A. Bouman
Language Processing as Signal Processing by Mari Ostendorf


Automatic Speech Recognition: Trials, Tribulations and Triumphs

Abstract: Although many important scientific advances have taken place in automatic speech recognition (ASR) research, we have also encountered a number of practical limitations that hinder the widespread deployment of applications and services. In most speech recognition tasks, human subjects produce one to two orders of magnitude fewer errors than machines. One of the most significant differences is that human subjects are far more flexible and adaptive than machines in the face of the many sources of speech variation, including individuality, speaking style, additive noise, and channel distortion. How to train and adapt statistical models for speech recognition using a limited amount of data is one of the most important research issues.

What we know about human speech processing and the natural variation of speech is very limited. It is especially important to clarify the mechanisms underlying speaker-to-speaker variability, and to devise methods for simultaneously modeling multiple sources of variation through statistical analysis of large-scale databases. Future systems will need efficient ways of representing, storing, and retrieving various knowledge resources.

Data-intensive science is rapidly emerging in the scientific and computing research communities. The speech databases/corpora used in ASR research and development typically contain 100 to 1,000 hours of utterances, which is too small given the variety of sources of variation. Many problems must be solved before we can efficiently construct and exploit the huge speech databases that will be essential to next-generation ASR systems.

Speaker Bio: Sadaoki Furui received the B.S., M.S., and Ph.D. degrees from the University of Tokyo, Japan in 1968, 1970, and 1978, respectively. After joining the Nippon Telegraph and Telephone Corporation (NTT) Labs in 1970, he worked on speech analysis, speech recognition, speaker recognition, speech synthesis, speech perception, and multimodal human-computer interaction. From 1978 to 1979, he was a visiting researcher at AT&T Bell Laboratories, Murray Hill, New Jersey. He was a Research Fellow and the Director of Furui Research Laboratory at NTT Labs. He became a Professor at Tokyo Institute of Technology in 1997, and was given the title of Professor Emeritus in 2011. He has authored or coauthored over 900 published papers and books, including "Digital Speech Processing, Synthesis and Recognition." He was elected a Fellow of the IEEE (1993), the Acoustical Society of America (ASA) (1996), the Institute of Electronics, Information and Communication Engineers of Japan (IEICE) (2001), and the International Speech Communication Association (ISCA) (2008). He received the Paper Award and the Achievement Award from the IEICE (1975, 1988, 1993, 2003, 2003, 2008), and the Paper Award from the Acoustical Society of Japan (ASJ) (1985, 1987). He received the Senior Award and Society Award from the IEEE SP Society (1989, 2006), the ISCA Medal for Scientific Achievement (2009), and the IEEE James L. Flanagan Speech and Audio Processing Award (2010). He received the NHK (Nippon Hoso Kyokai: Japan Broadcasting Corporation) Broadcast Cultural Award (2012) and the Okawa Prize (2013). He also received the Achievement Award from the Minister of Science and Technology and the Minister of Education, Japan (1989, 2006), and the Purple Ribbon Medal from the Japanese Emperor (2006).



On Fusion of Visual Quality Assessment Methods

Abstract: A new methodology for objective image quality assessment (IQA) based on the multi-method fusion (MMF) principle has been proposed recently. The idea is motivated by the observation that no single method gives the best performance in all situations, so fusing several methods may lead to better performance. In this talk, I will present two fusion approaches. The first obtains a nonlinear combination of scores from multiple methods, with suitable weights learned in a training process. The second adopts different IQA methods in different blocks, depending on block content and distortion type, so that fusion is conducted in the spatial domain; it is called block-based MMF, or BMMF for short. Numerous experimental results show that these fusion-based IQA methods perform significantly better than state-of-the-art single IQA methods. The talk will conclude with challenges and future extensions, including HD, 3D, and scene-adaptive image/video quality assessment methods.
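As a minimal numerical sketch of the fusion principle described above (illustrative only: the scores, the subjective ratings, and the use of a plain least-squares fit are all assumptions, whereas MMF itself learns a nonlinear combination), one can fit fusion weights to subjective quality scores on training data:

```python
import numpy as np

# Hypothetical training data: each row holds the scores that three
# different IQA methods assign to one image; mos holds the subjective
# mean opinion scores the fused predictor should match.
scores = np.array([
    [0.81, 0.75, 0.90],
    [0.40, 0.55, 0.35],
    [0.65, 0.60, 0.70],
    [0.20, 0.30, 0.25],
])
mos = np.array([0.85, 0.42, 0.66, 0.22])

# Learn fusion weights (plus a bias term) by least squares on the training set.
X = np.hstack([scores, np.ones((scores.shape[0], 1))])
w, *_ = np.linalg.lstsq(X, mos, rcond=None)

def fused_score(method_scores):
    """Combine per-method quality scores into a single fused prediction."""
    return float(np.append(method_scores, 1.0) @ w)
```

In practice a trained nonlinear regressor would replace the least-squares fit; the point is only that the fused predictor is learned from data rather than fixed a priori.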

Speaker Bio: Dr. C.-C. Jay Kuo received the Ph.D. degree from the Massachusetts Institute of Technology in 1987. He is now with the University of Southern California (USC) as Professor of EE, CS and Mathematics. His research interests are in the areas of digital media processing, multimedia compression, communication and networking technologies, and embedded multimedia system design. Dr. Kuo has guided about 115 students to their Ph.D. degrees and supervised 25 postdoctoral research fellows. Currently, his research group at USC consists of around 30 Ph.D. students (see website http://viola.usc.edu), making it one of the largest academic research groups in multimedia technologies. He is a co-author of about 200 journal papers, 850 conference papers and 10 books. Dr. Kuo is a Fellow of AAAS, IEEE and SPIE. He is Editor-in-Chief of the IEEE Transactions on Information Forensics and Security and Editor Emeritus of the Journal of Visual Communication and Image Representation (an Elsevier journal). He was on the Editorial Board of the IEEE Signal Processing Magazine in 2003-2004, the IEEE Transactions on Speech and Audio Processing in 2001-2003, the IEEE Transactions on Image Processing in 1995-1998, and the IEEE Transactions on Circuits and Systems for Video Technology in 1995-1997. Dr. Kuo received the National Science Foundation Young Investigator Award (NYI) and Presidential Faculty Fellow (PFF) Award in 1992 and 1993, respectively. He received best paper awards from the Multimedia Communication Technical Committee of the IEEE Communications Society in 2005, from the IEEE Vehicular Technology Fall Conference (VTC-Fall) in 2006, and from the IEEE Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP) in 2006. He was an IEEE Signal Processing Society Distinguished Lecturer in 2006, a recipient of the Okawa Foundation Research Award in 2007, the recipient of the Electronic Imaging Scientist of the Year Award in 2010, and the holder of the Fulbright-Nokia Distinguished Chair in Information and Communications Technologies in 2010-2011.



The Computational Approach to Lossy Compression and its Applications to Image / Video Coding

Abstract: Ever since rate distortion theory was introduced by Shannon in 1948, and particularly in 1959, it has been recognized as providing, in principle, a theoretical basis for many practically important lossy compression problems. More than half a century later, it is fair to say that rate distortion theory in the Shannon probabilistic sense has not yet had as profound an impact on practice as one might expect. Its major limitations lie in two aspects:
(1) modeling---the theory in the Shannon probabilistic sense often assumes analytically tractable source models, such as stationary sources, and yet real-world data are often nonstationary and may not fit any analytical model; even when they do, such a model is very difficult to construct;
(2) separation---the theory in the Shannon probabilistic sense is often concerned with asymptotic performance, and the lossless coding/reproduction sequence space is either decoupled from quantization or ignored completely.

In this talk, we will present the computational approach to lossy compression, which has been quietly developed in recent years by the speaker and others. Unlike the Shannon probabilistic approach, the computational approach does not assume any model for the data to be encoded; it fully integrates lossless coding, the reproduction sequence space, and quantization into one optimization problem. When the data to be encoded are actually stationary, the computational approach coincides with the Shannon probabilistic approach. We will also discuss recent successful applications of the computational approach to image and video coding, and its impact on the design of HEVC, the newest video coding standard, as well as future video coding standards.
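To make the "one optimization problem" viewpoint concrete, here is a sketch of classical entropy-constrained scalar quantizer design, in which codeword placement and codelengths are optimized jointly under a single Lagrangian cost. This is an illustrative textbook instance of the computational approach, not the speaker's specific algorithm, and the sample data below are hypothetical:

```python
import numpy as np

def ecsq(samples, codebook, lam, iters=20):
    """Entropy-constrained scalar quantizer design: alternately assign
    each sample to the codeword minimizing distortion + lam * codelength,
    then re-estimate centroids and code probabilities, so quantization
    and lossless-coding cost are optimized in one loop."""
    codebook = np.array(codebook, dtype=float)
    probs = np.full(len(codebook), 1.0 / len(codebook))
    for _ in range(iters):
        lengths = -np.log2(np.maximum(probs, 1e-12))   # ideal codelengths
        cost = (samples[:, None] - codebook[None, :]) ** 2 + lam * lengths
        assign = np.argmin(cost, axis=1)               # Lagrangian assignment
        for k in range(len(codebook)):
            sel = samples[assign == k]
            if sel.size:
                codebook[k] = sel.mean()               # centroid update
            probs[k] = sel.size / len(samples)         # probability update
    return codebook, probs, assign
```

The Lagrange multiplier lam trades distortion against rate; sweeping it traces out an operational rate-distortion curve computed directly from the data, with no source model assumed.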

Speaker Bio: En-hui Yang has been with the Dept. of Electrical and Computer Engineering, University of Waterloo, Ontario, Canada since June 1997, where he is now a Professor and Canada Research Chair in information theory and multimedia compression. He is the founding director of the Leitch-University of Waterloo multimedia communications lab, and a co-founder of SlipStream Data Inc. (now a subsidiary of BlackBerry). He currently also serves as an Overseas Advisor for the Overseas Chinese Affairs Office of the City of Shanghai, sits on the Overseas Expert Advisory Committee for the Overseas Chinese Affairs Office of the State Council of China, and serves on a Review Panel for the International Council for Science.

He served, among many other roles, as an Associate Editor for IEEE Transactions on Information Theory, a general co-chair of the 2008 IEEE International Symposium on Information Theory, a technical program vice-chair of the 2006 IEEE International Conference on Multimedia & Expo (ICME), the chair of the award committee for the 2004 Canadian Award in Telecommunications, a co-editor of the 2004 Special Issue of the IEEE Transactions on Information Theory, a co-chair of the 2003 US National Science Foundation (NSF) workshop on the interface of Information Theory and Computer Science, and a co-chair of the 2003 Canadian Workshop on Information Theory.

A Fellow of the IEEE, the Canadian Academy of Engineering, and the Royal Society of Canada (The Academies of Arts, Humanities and Sciences of Canada), Dr. Yang is also a recipient of several research awards, including the prestigious inaugural (2007) Ontario Premier's Catalyst Award for the Innovator of the Year, and the 2007 Ernest C. Manning Award of Distinction, one of Canada's most prestigious innovation prizes. Products based on his inventions and commercialized by SlipStream received the 2006 Ontario Global Traders Provincial Award. With over 200 papers and more than 200 patents/patent applications worldwide, his research work has had an impact on the daily lives of hundreds of millions of people in over 170 countries, through commercialized products, video coding open sources, and video coding standards. In 2011, he was selected for inclusion in Canadian Who's Who.



Base Station Assignment and Transceiver Design for Heterogeneous Networks

Abstract: We consider the interference management problem in a multicell MIMO heterogeneous network. Within each cell there is a large number of distributed micro/pico base stations (BSs) that can potentially be coordinated for joint transmission. To reduce coordination overhead, we consider user-centric BS clustering, so that each user is served by only a small number of (potentially overlapping) BSs. Thus, given the channel state information, our objective is to jointly design the BS clustering and the linear beamformers for all BSs in the network. In this talk, we formulate this problem from a sparse optimization perspective and analyze its computational complexity. We show that the problem is NP-hard in general and identify cases in which it is polynomial-time solvable to global optimality. For the general setting, we propose an efficient algorithm based on iteratively solving a sequence of group LASSO problems. A novel feature of the proposed algorithm is that it performs BS clustering and beamformer design jointly, rather than separately as in existing approaches to partial coordinated transmission.
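The sparsity mechanism behind a group-LASSO formulation can be illustrated by its proximal operator, block soft-thresholding: when a BS-user beamformer vector has small enough norm, the whole vector is set exactly to zero, deactivating that link and thereby selecting the serving cluster. The sketch below shows only this one building block, with hypothetical numbers, not the full iterative algorithm of the talk:

```python
import numpy as np

def group_soft_threshold(v, tau):
    """Proximal operator of the group-LASSO penalty tau * ||v||_2.
    Shrinks the whole vector v toward zero and returns exactly zero
    when ||v||_2 <= tau -- zeroing a beamformer removes that BS-user link."""
    norm = np.linalg.norm(v)
    if norm <= tau:
        return np.zeros_like(v)
    return (1.0 - tau / norm) * v

# Hypothetical beamformers from two candidate BSs toward one user:
strong = np.array([3.0, 4.0])   # strong link: survives, merely shrunk
weak = np.array([0.1, 0.1])     # weak link: zeroed out of the cluster
```

Because entire vectors, not individual entries, are driven to zero, the penalty prunes whole BS-user links, which is exactly the user-centric clustering behavior described above.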

Speaker Bio: Zhi-Quan Tom Luo received his PhD in Operations Research from MIT in 1989. He was with McMaster University, Canada, from 1989 to 2003, where he served as Head of the ECE Department and held a Canada Research Chair in Information Processing. Since 2003 he has been with the ECE Department at the University of Minnesota (Twin Cities) as a full professor, where he holds an endowed ADC Chair. His research interests include optimization algorithms, signal processing, and digital communication.

Prof. Luo is a Fellow of IEEE and SIAM. His current professional activities include: Editor-in-Chief, IEEE Transactions on Signal Processing (2012-present); Editorial Board Member, IEEE Signal Processing Magazine (2010-2012) and IEEE Journal of Selected Topics in Signal Processing (2013-present); Associate Editor, Mathematics of Operations Research, INFORMS (2007-present) and Management Science (2009-present); Member/Vice Chair/Chair/Past Chair, SPS Signal Processing for Communications and Networking Technical Committee (2005-present); and Member, SPS Publications Board (2012-present). Dr. Luo is a recipient of the 2004, 2009 and 2011 IEEE SPS Best Paper Awards, the 2011 EURASIP Best Paper Award, and the 2011 ICC Best Paper Award. He was awarded the Farkas Prize by the INFORMS Optimization Society in 2010.



Model-Based Imaging

Abstract: Over the last two decades, model-based imaging techniques have emerged as a principled framework for understanding and solving many of the most important problems in imaging research. The approach of model-based imaging is to construct a model of both the image and the imaging system, and then to use this integrated model either to reconstruct an unknown image or to estimate unknown parameters. For example, model-based image reconstruction and parameter estimation can be used to robustly form images from sensors with uncertain calibration. In addition, model-based imaging can serve as a framework for optimizing the static and dynamic design of imaging sensor systems themselves.

In this talk, we review some techniques and recent successes in model-based imaging. The two application domains we consider are tomographic reconstruction from multislice helical-scan CT and electron microscopy, two very different sensors that share much in common when viewed from the perspective of model-based imaging. For both cases, we discuss a variety of technical innovations that either improve image quality or reduce the computational burden. We then show results that demonstrate the value of the methods, both quantitatively and qualitatively, on a variety of real and simulated datasets. Finally, we conclude with a philosophical discussion of the future potential of model-based methods, and we present some emerging ideas that have the potential to change the field.
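As a toy illustration of the model-based viewpoint (all quantities hypothetical: a random linear forward model and a quadratic first-difference smoothness prior, far simpler than the models used for CT or microscopy), the MAP estimate minimizes a combined data-fit and prior cost and, in this quadratic case, has a closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = rng.normal(size=(12, n))                   # hypothetical linear sensor model
x_true = np.linspace(0.0, 1.0, n)              # smooth ground-truth signal
y = A @ x_true + 0.001 * rng.normal(size=12)   # noisy measurements

# First-difference operator D encodes a smoothness prior on x.
D = (np.eye(n) - np.eye(n, k=1))[:-1]
lam = 0.01

# MAP estimate: minimize ||y - A x||^2 + lam * ||D x||^2 via the normal equations.
x_map = np.linalg.solve(A.T @ A + lam * D.T @ D, A.T @ y)
```

The same structure (forward model plus image prior, combined in one cost) underlies practical model-based reconstruction; real systems replace the quadratic prior with more expressive ones and solve the optimization iteratively.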

Speaker Bio: Charles A. Bouman is the Michael J. and Katherine R. Birck Professor of Electrical and Computer Engineering at Purdue University, where he also holds an appointment in the School of Biomedical Engineering and serves as a co-director of Purdue's Magnetic Resonance Imaging Facility. He received his B.S.E.E. degree from the University of Pennsylvania, his M.S. degree from the University of California at Berkeley, and his Ph.D. from Princeton University in 1989.

Professor Bouman's research focuses on inverse problems, stochastic modeling, and their application to a wide variety of imaging problems, including tomographic reconstruction, image processing, and rendering. Prof. Bouman is a Fellow of the IEEE, AIMBE, IS&T, and SPIE. He has served as Editor-in-Chief of the IEEE Transactions on Image Processing, a Distinguished Lecturer for the IEEE Signal Processing Society, a member of the IEEE Signal Processing Society's Board of Governors, and Vice President of Publications for the IS&T Society. Currently, he is Vice President for Technical Directions of the IEEE Signal Processing Society.



Language Processing as Signal Processing

Abstract: Traditionally, signal processing has involved continuous-valued signals (e.g. audio, video, sonar, etc.) that we transform, enhance, and recognize. Language, represented in terms of word sequences, is incorporated into signal processing as a discrete process generated by a Markov source that is used as a prior in, e.g., speech recognition. Words are characterized with non-parametric multinomial distributions depending on the word history or other categorical variables. However, social media and online interactions give us many more applications for language processing, and it is also of interest to transform, enhance, and recognize text. There is growing interest in continuous-space representations of language, which offer the potential for using signal processing tools to solve these problems. In this talk, we survey work in continuous-space modeling of language, including latent semantic analysis, neural network models, and an exponential model that treats unseen events as a rank regularization problem. These models provide transformations of language that map words to a continuous space in which neighbors have syntactic/semantic similarity. We can extend this approach to mixed discrete and continuous models by incorporating methods for learning sparse elements of language. Inspection of the sparse component provides insights into the idiosyncrasies of speakers and speaking style, as well as vocabulary acquisition.
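A minimal sketch of the continuous-space idea, using hand-made three-dimensional word vectors (hypothetical values standing in for trained embeddings from the models surveyed above), shows how nearest neighbors in the space reflect semantic similarity:

```python
import numpy as np

# Hypothetical hand-made "embeddings"; trained models learn such vectors
# from data rather than having them assigned by hand.
embed = {
    "king":  np.array([0.90, 0.80, 0.10]),
    "queen": np.array([0.85, 0.75, 0.20]),
    "apple": np.array([0.10, 0.20, 0.90]),
}

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(word):
    """Nearest neighbor of `word` under cosine similarity."""
    return max((w for w in embed if w != word),
               key=lambda w: cosine(embed[word], embed[w]))
```

Once words live in a continuous space, standard signal processing tools (projections, transforms, sparse decompositions) apply directly to language.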

Speaker Bio: Mari Ostendorf is a Professor of Electrical Engineering at the University of Washington. After receiving her PhD in electrical engineering from Stanford University, she worked at BBN Laboratories, then Boston University, and then joined the University of Washington (UW) in 1999. At UW, she is an Endowed Professor of System Design Methodologies in Electrical Engineering and an Adjunct Professor in Computer Science and Engineering and in Linguistics. From 2009 to 2012, she served as the Associate Dean for Research and Graduate Studies in the College of Engineering. She has previously been a visiting researcher at the ATR Interpreting Telecommunications Laboratory and at the University of Karlsruhe, and a Scottish Informatics and Computer Science Alliance Distinguished Visiting Fellow. Currently, she is an Australia-America Fulbright Scholar at Macquarie University. Prof. Ostendorf's research interests are in dynamic and linguistically motivated statistical models for speech and language processing. Her work has resulted in over 200 publications and two paper awards. Prof. Ostendorf has served as co-Editor of Computer Speech and Language and as Editor-in-Chief of the IEEE Transactions on Audio, Speech and Language Processing, and she is currently the VP Publications for the IEEE Signal Processing Society. She is also a member of the ISCA Advisory Council. She is a Fellow of IEEE and ISCA, a recipient of the 2010 IEEE HP Harriett B. Rigas Award, and a 2013 IEEE Signal Processing Society Distinguished Lecturer.