Korean AI Association



Keynote Speakers
▶ X-to-Speech
Mark Hasegawa-Johnson (UIUC)
Computers talk.  But what should they say?  Most current software uses speech as a way of communicating text to the user.  I argue that this is limiting: speech can carry a lot more information than text.
For example, consider first the problem of generating spoken descriptions of images (image2speech). An image2speech system is similar to an image captioning system, but instead of generating text, it generates spoken descriptions directly. More specifically, instead of generating letters, the sequence-to-sequence LSTM generates acoustic units, learned in a semi-supervised fashion from a speech corpus (transcribed or untranscribed) in the same language. Evaluation is difficult: because there are no "words" anywhere in the training or testing data, it is not possible to compute word error rate, word-based BLEU, or any other traditional speech or machine translation metric. Nevertheless, examples suggest that the system learns to generate meaningful and largely correct spoken image captions.
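Since the system emits acoustic-unit tokens rather than words, one natural workaround is to compute a BLEU-style n-gram overlap over unit sequences instead of word sequences. The sketch below is illustrative only; `unit_bleu` and its toy unit labels are assumptions for this example, not artifacts of the talk.

```python
import math
from collections import Counter

def unit_bleu(candidate, reference, max_n=4):
    """BLEU-style geometric mean of n-gram precisions over acoustic-unit tokens."""
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        overlap = sum((cand & ref).values())          # clipped n-gram matches
        total = max(sum(cand.values()), 1)
        log_prec_sum += math.log(max(overlap, 1e-9) / total)
    # Brevity penalty discourages trivially short outputs.
    bp = min(1.0, math.exp(1.0 - len(reference) / max(len(candidate), 1)))
    return bp * math.exp(log_prec_sum / max_n)

# Toy example: unit labels stand in for learned acoustic units.
hypothesis = ["u3", "u17", "u5", "u9"]
reference = ["u3", "u17", "u5", "u9", "u3", "u12"]
score = unit_bleu(hypothesis, reference)
```

Such a unit-level score still depends on the consistency of the learned unit inventory, which is part of why evaluation remains an open problem for image2speech.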
As a second example, suppose that you have speech in the voice of one talker, and you want to generate the same utterance in the voice of another talker, without having any training data in the second talker's voice. We have developed a system called AutoVC (an autoencoder for zero-shot voice conversion) that can generate speech in the voice of any talker without requiring speech in that talker's voice; all it requires is a speaker ID vector. During training, the output of the LSTM content encoder is downsampled at a tunable rate: the rate must be low enough that speech synthesis without a speaker ID fails, yet high enough that speech synthesis with a speaker ID succeeds. In testing, the system achieves a mean opinion score (MOS) of 3.2 for cross-gender conversion into a previously unseen voice.
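The tunable bottleneck described above can be caricatured in a few lines: downsample the content codes in time, upsample back by repetition, and pair each frame with the target speaker's ID vector before decoding. This is a toy sketch of the idea only, not the actual AutoVC network; all names and values are illustrative.

```python
def bottleneck(content_codes, factor):
    """Downsample content codes in time, then upsample back by repetition.

    `factor` is the tunable rate: a larger factor narrows the bottleneck,
    squeezing out residual speaker information, but a factor that is too
    large destroys the linguistic content as well.
    """
    kept = content_codes[::factor]         # keep every factor-th frame
    up = []
    for code in kept:
        up.extend([code] * factor)         # repeat to restore the original length
    return up[:len(content_codes)]

def decoder_frames(content_codes, target_speaker_id, factor):
    """Pair each bottlenecked content frame with the target speaker ID vector."""
    return [(code, target_speaker_id) for code in bottleneck(content_codes, factor)]

# Toy example: 8 content frames, bottleneck factor 4, hypothetical speaker ID.
frames = decoder_frames(list(range(8)), "spk_042", factor=4)
```

In the real system the decoder is a neural vocoder pipeline and the speaker ID is a learned embedding, but the design tension is the same: the bottleneck must discard speaker identity while preserving content.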
Mark Hasegawa-Johnson has been on the faculty of the University of Illinois since 1999, where he is currently a Professor of Electrical and Computer Engineering. He received his Ph.D. from MIT in 1996, with a thesis titled "Formant and Burst Spectral Measures with Quantitative Error Models for Speech Sound Classification," after which he was a post-doc at UCLA from 1996 to 1999 on a project entitled "Factor Analysis of MRI-Derived Articulator Shapes." Prof. Hasegawa-Johnson is a Fellow of the Acoustical Society of America and a Senior Member of the IEEE and ACM. He is currently Treasurer of ISCA and a Senior Area Editor of the IEEE Transactions on Audio, Speech, and Language Processing. He has published 280 peer-reviewed journal articles and conference papers in the general area of automatic speech analysis, including machine learning models of articulatory and acoustic phonetics, prosody, dysarthria, non-speech acoustic events, audio source separation, and under-resourced languages.
Fabio Bonsignorio (SSSA)
The significant progress we are witnessing in sensors, actuators, materials, the Internet of Things, perception, and many other areas of Robotics and AI enables a radical redesign of production and distribution processes in every economic sector, from Industry to Agriculture and Logistics. The most mature of these changes, those affecting industry, are referred to as the 'Digitization of European Industry' at the European level and as 'Industry 4.0' in Italy and Germany. The same technologies can be applied to Agriculture, leading to the concept of Precision Agriculture, or 'Agriculture 4.0'. The blockchain, a public ledger whose information is protected by a consensus algorithm, demonstrates the possibility of creating networks of robotic devices and of solving coordination problems in multi-agent networks. Distributed computing platforms such as the Ethereum blockchain are already showing how robotic applications can be made more responsive, more secure, and simpler to develop. Nevertheless, there are many open questions about building heterogeneous robotic applications on blockchain technologies. Which issues (technical, production, legal) slow down the development and widespread real-world deployment of blockchain-based robotic applications? Will decentralized technologies offer enough scalability and security for a large-scale network of CPSs? What new types of applications can be built on these technologies? The aim of this talk is to give a deep, hands-on understanding of the opportunities and challenges of the radical transformation of product and service ecologies made possible by recent concurrent advances in IoT, machine and deep learning, computer vision, and object recognition, and of the critical role that blockchain technologies can play in the transition to a smart society and (post-scarcity?) economy enabled by the massive deployment of robotics and AI systems.
Fabio Bonsignorio is a Visiting Professor at the BioRobotics Institute of the Scuola Superiore Sant'Anna in Pisa. He was a professor at the University Carlos III of Madrid until 2014 (in 2009 he was awarded the Santander Chair of Excellence in Robotics there). He is founder and CEO of Heron Robots (www.heronrobots.com). He worked in the R&D departments of several major companies for more than 20 years. He is currently a member of the Research Board of Directors of SPARC. He pioneered and introduced the topic of reproducible research and benchmarking in Robotics and AI: he coordinated the EURON Special Interest Group on Good Experimental Methodology and Benchmarking in Robotics and is co-chair of the IEEE RAS TC-PEBRAS. He was general co-chair of the IEEE RAS 2015 Summer School on Replicable and Measurable Robotics Research, and was the corresponding editor of the Special Issue on Replicable and Measurable Robotics Research in the IEEE Robotics and Automation Magazine, which appeared in September 2015. He designed the Reproducible Articles in the IEEE Robotics and Automation Magazine. He served on the Program Committee of the European Robotics Forum in 2017 and 2018, and does so again for 2019.