
Korean AI Association

Domestic Conference

Speakers and Abstracts
Keynote Speech
 
November 19 (Thursday) [10:00-11:00]
 
November 19 (Thursday) [11:00-12:00]
Track 1: Computer Vision

Dr. Wenjun Zeng (Abstract & Bio)

Sr. Principal Research Manager, Microsoft Research

IEEE Fellow

 

Track 2: NLP

Professor Seung-won Hwang (Abstract & Bio)

Yonsei University

 


November 20 (Friday) [11:00-12:00]
Track 1: NLP
Track 2: Computer Vision
Part 1 [11:00-11:30]

Dr. Joon-young Lee (Abstract & Bio)

Research Scientist, Adobe Research

 

Part 2 [11:30-12:00]

Title: Perceiving Humans in 3D from Single Images

Dr. Hanbyul Joo (Abstract & Bio)

Research Scientist, Facebook AI Research

 

Machine Learning Session
 
November 19 (Thursday) [13:00-15:00]
Part 1 [13:00-13:30]
Part 2 [13:30-14:00]
 
Part 3 [14:00-14:30]
 
Part 4 [14:30-15:00]

Professor Yung-Kyun Noh (Abstract & Bio)

Hanyang University

 

Machine Learning: NeurIPS 2020 Preview Session
 
November 20 (Friday) [13:00-15:00]
Part 1 [13:00-13:20]

Professor Jinwoo Shin (Abstract & Bio)

KAIST

 

Part 2 [13:20-13:40]

Title: Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous Graphs

Professor Hyunwoo Kim (Abstract & Bio)

Korea University

 

Part 3 [13:40-14:00]

Title: Rotation-Invariant Local-to-Global Representation Learning for 3D Point Cloud

Professor Bohyung Han (Abstract & Bio)

Seoul National University

 

Part 4 [14:00-14:20]

Title: Learning to represent data across domains

Professor Seunghoon Hong (Abstract & Bio)

KAIST

 

Part 5 [14:20-14:40]

Title: Deep Diffusion-Invariant Wasserstein Distributional Classification

Professor Junseok Kwon (Abstract & Bio)

Chung-Ang University

 

Part 6 [14:40-15:00]

Title: MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

Professor Sungju Hwang (Abstract & Bio)

KAIST

Parallel Sessions
 
November 19 (Thursday) [15:00-18:00]
Track 1: Computer Vision
Part 1 [15:00-15:30]
 

Part 2 [15:30-16:00]

Title: Continual Learning with Node-Importance based Adaptive Group Sparse Regularization

Professor Taesup Moon (Abstract & Bio)

Sungkyunkwan University

 

Part 3 [16:00-16:30]

Title: Towards machines that reconstruct and understand 3D scenes

Professor Jaesik Park (Abstract & Bio)

POSTECH

 

Part 4 [16:30-17:00]

Title: Learning architectures for binary networks and beyond

Professor Jonghyun Choi (Abstract & Bio)

GIST

 

Part 5 [17:00-17:30]

Title: Mixture Density Object Detector: Learning Bounding Boxes without Ground Truth Assignment

Professor Nojun Kwak (Abstract & Bio)

Seoul National University

 

Part 6 [17:30-18:00]

Title: Neural Motion Feature Learning for Video Understanding

Professor Minsu Cho (Abstract & Bio)

POSTECH

 


November 19 (Thursday) [15:00-18:00]
Track 2: Industry: "Idea into Action" 
Part 1 [15:00-16:00]

Presentation 1: Data and Talent - Essentials for AI Innovation 

by Korea Startup Forum

 
Part 2 [16:00-17:00]

Presentation 2: Introduction of AI open network, AI i-CON

(Innovation & Communication Open Network)

 
Part 3 [17:00-18:00]

Presentation 3: 4 X AI Companies

 

Parallel Sessions
 
November 20 (Friday) [15:00-18:00]
Track 1: NLP & Speech
Part 1 [15:00-15:30]

 

Part 2 [15:30-16:00]

Title: Question and Answer system with Deep Learning

Professor Kyomin Jung (Abstract & Bio)

Seoul National University

 

Part 3 [16:00-16:30]

Title: Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness

Professor Gunhee Kim (Abstract & Bio)

Seoul National University

 

Part 4 [16:30-17:00]

Title: Conversation Model and its Evaluation

Professor JinYeong Bak (Abstract & Bio)

Sungkyunkwan University

 

Part 5 [17:00-17:30]

Title: Natural Speech Interface – Pre-processing and synthesis perspective

Professor Hong-Goo Kang (Abstract & Bio)

Yonsei University

 

Part 6 [17:30-18:00]

Title: A Review of On-Device Fully Neural End-to-End Automatic Speech Recognition Algorithms

Dr. Chanwoo Kim (Abstract & Bio)

Samsung Electronics

 


November 20 (Friday) [15:00-18:00]
Track 2: Young Researchers & Award Papers
 

 

Abstract & Biography 

 

Title: Bring 10x Speedup to NLP Model Training

Dr. Tie-Yan Liu

Assistant Managing Director, Microsoft Research Asia

Abstract: Thanks to the adoption of deep learning technologies, great progress has been made in the field of NLP in recent years. However, since the birth of the Transformer (in 2017), innovation on new machine learning models for NLP has been slowing down, while more attention has been paid to using larger data to train larger models. For example, GPT-3, the SOTA pre-trained language model, contains 175 billion parameters, and its training costs around 2 million GPU hours and 12 million dollars. Such a trend may lead to a very high entry barrier in the field of NLP and prevent the majority of researchers from conducting cutting-edge research. To tackle this problem, it is crucial to invent more efficient ways to train NLP models. In this talk, we will discuss how to achieve this goal through a comprehensive exploration of training data, objective functions, model architecture, and optimization strategy. With innovations in all these aspects, we successfully accelerated the training of BERT by an order of magnitude. More importantly, our proposed approach is not restricted to BERT and has general implications for accelerating many other NLP models.

Bio: Tie-Yan Liu is an assistant managing director of Microsoft Research Asia, a fellow of the IEEE, and a distinguished scientist of the ACM. He is also an adjunct faculty member at Carnegie Mellon University (CMU), Tsinghua University, and the University of Science and Technology of China. He is well known for his pioneering work on machine learning for information retrieval, and recently he has done impactful research on deep learning, reinforcement learning, and distributed learning. He has published 200+ papers in top conferences and journals, with tens of thousands of citations. He has been invited to serve as general chair, PC chair, local chair, or area chair for a dozen top conferences including WWW/WebConf, SIGIR, KDD, ICML, NIPS, ICLR, IJCAI, AAAI, and ACL, as well as associate editor of ACM Transactions on Information Systems, ACM Transactions on the Web, and IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). He has won several best (student) paper awards and most cited paper awards. He received the Most Cited Chinese Researcher Award from Elsevier (2017, 2018), the China AI Leader Award – Technical Innovation (2018), the Most Influential Scholar Award from AMiner (2007-2017), and 30 Best AI Use Cases of the Year from Synced (2019). His team released LightGBM in 2017, which has become one of the most popular machine learning tools in Kaggle and KDD Cup competitions; his team helped Microsoft achieve human parity in machine translation in 2018 and won 8 championships at the WMT machine translation contest in 2019; his team also built the world's best Mahjong AI, named Suphx, which achieved 10 dan on the Tenhou Mahjong platform in 2019. In addition, his team has helped several industrial partners, including Asset Management China (AMC), China Taiping, OOCL, SF-Express, FareastTone, and Sanofi, achieve digital transformation.


Dr. Ming Zhou

Assistant Managing Director, Microsoft Research Asia

Research Manager of Natural Language Computing Group

President of ACL (2019)

Abstract: There were 23.9 million professional developers in 2019, and the population is expected to reach 28.7 million in 2024. With the growing population of developers, code intelligence, which aims to leverage AI to help software developers improve the productivity of the development process, is growing increasingly important in both the software engineering and artificial intelligence communities.

This presentation will first introduce the pre-trained models for language and multi-modalities (image, video, and documents), and then report the recent progress of MSRA in code intelligence, including pre-trained models for code (CodeBERT/CodeGPT), a new benchmark, CodeXGLUE (https://github.com/microsoft/CodeXGLUE), and a new evaluation metric, CodeBLEU.

Bio: Dr. Ming Zhou is an Assistant Managing Director of Microsoft Research Asia and research manager of the Natural Language Computing Group. He was the president of the Association for Computational Linguistics (ACL) in 2019 and is a vice-president of the China Computer Federation (CCF). His research interests include next-generation search engines, statistical and neural machine translation, question answering, chatbots, computer poetry, riddle solving and generation, knowledge graphs, commonsense graphs, semantic parsing, text mining, user modeling, and recommendation systems. His team developed MS IME, Bing Dictionary, and MS Couplet, and has been the most important provider of NLP technologies to all MS products including Bing, News, Ads, Word, PowerPoint, Azure, and Xiaoice. Recently he has led his group to develop state-of-the-art pre-trained models for language, image, video, document, and code, making a great impact on MS products and NLP research communities.


Title: Human Centric Spatial Understanding

Dr. Wenjun Zeng

Sr. Principal Research Manager, Microsoft Research

IEEE Fellow

Abstract: Many vertical domains for AI (e.g., retail, manufacturing) require effective human-centric spatial understanding technologies to extract insights from visual sensor data. In this talk, we highlight the challenges and report some of the vision technologies we have developed at Microsoft Research Asia in the past years that aim to help us better understand the physical space around us. The core technologies include human detection/tracking, multi-view 3D human pose estimation, person re-identification across multiple cameras, action recognition, as well as domain generalization and adaptation technologies that help scale these solutions to different settings. I will also discuss some future directions.

Bio: Wenjun (Kevin) Zeng is a Sr. Principal Research Manager and a member of the Senior Leadership Team at Microsoft Research Asia. He has been leading the video analytics research powering Microsoft Cognitive Services, Azure Media Analytics Services, Microsoft Office, Dynamics, and Windows Machine Learning since 2014. He was with the Computer Science Department of the University of Missouri from 2003 to 2016, most recently as a Full Professor. Prior to that, he worked for PacketVideo Corp, San Diego, CA; Sharp Labs of America, Camas, WA; Bell Labs, Murray Hill, NJ; and Panasonic Technology, Princeton, NJ. He received his B.E., M.S., and Ph.D. degrees from Tsinghua University, the University of Notre Dame, and Princeton University, respectively. He is on the Editorial Board of the International Journal of Computer Vision, and was an Associate Editor and Steering Committee member for a number of IEEE journals. He has served as the General Chair or TPC Chair for several IEEE conferences (e.g., ICME 2018, ICIP 2017). He is a Fellow of the IEEE.


Title: Robustifying NLP using Counterfactual Knowledge

Professor Seung-won Hwang

Yonsei University

Abstract: Despite the super-human accuracy of successful deep models on NLP tasks, their robustness is reportedly limited due to their blind reliance on biases and spurious patterns. We have preliminary findings that an understanding of causality is what humans use to distinguish spurious from meaningful patterns, and we present our recent work on injecting such understanding into models and annotations. We find such injection contributes to enhancing the robustness of models without increasing human annotation overheads. This talk summarizes our papers published in 2020 as well as some ongoing efforts. Further details can be found at http://dilab.yonsei.ac.kr/~swhwang
Bio: Prof. Seung-won Hwang is a Professor of Computer Science at Yonsei University. Prior to joining Yonsei, she was an Associate Professor at POSTECH for 10 years, after her PhD from UIUC. Her recent research interest has been data and language understanding and intelligence, leading to 100+ publications at top-tier AI, DB/DM, and NLP venues, including ACL, AAAI, IJCAI, NAACL, SIGMOD, VLDB, and ICDE. She has received a best paper runner-up award and an outstanding collaboration award from WSDM and Microsoft Research, respectively.


Title: Multilingual Multimodal Pre-training

Dr. Nan Duan

Principal Research Manager, Microsoft Research

Abstract: This talk will review multilingual multimodal pre-training, a cutting-edge topic in the AI field aimed at learning universal representations that can map objects occurring in different modalities or expressed in different languages to vectors in a common semantic space. Three of the latest pre-trained models from MSRA will be highlighted, including Unicoder for multilingual tasks, Unicoder-VL for image-language tasks, and Unicoder-VL for video-language tasks. Besides, we will also show how such large-scale pre-trained models can be applied in real-world products with massive numbers of users, such as Bing and Ads. Last, we will discuss the challenges and future directions.

Bio: Dr. Nan Duan is a principal researcher at Microsoft Research Asia. He is also an adjunct professor at Tianjin University. His research interests include question answering, semantic parsing, multilingual multimodal pre-training, and machine reasoning. He gave tutorials at EMNLP and NLPCC and served as evaluation co-chair of NLPCC and as an area chair for NAACL and IJCAI. He was awarded CCF NLPCC Distinguished Young Scientist in 2019. He has published 80+ research papers, and his research has been applied in many Microsoft products.


Title: Application-focused Video AI Research

Dr. Joon-young Lee

Research Scientist, Adobe Research

Abstract: As a researcher in an industrial research lab, it is always challenging to balance academic impact and product impact. In this talk, I plan to introduce my video-AI-related research and talk about my efforts to make an impact on both academia and products.

Bio: Joon-Young Lee is a Senior Research Scientist at the Creative Intelligence Lab, Adobe Research in San Jose, CA. His main research areas are Computer Vision and Deep Learning. He received his Ph.D. degree from KAIST in 2015 and joined Adobe Research in October 2015. He served as an Area Chair for ICCV 2019, CVPR 2020, and ECCV 2020. At Adobe Research, he has transferred techniques to several products such as After Effects, AEM, Livefyre, Photoshop Elements, Photoshop Express, Lightroom mobile, and Creative SDK. He has over 60 technical publications in top-tier international conferences and journals, and he holds more than 40 issued/pending US patents. For more information, please see his personal webpage (https://joonyoung-cv.github.io/).


Title: Perceiving Humans in 3D from Single Images

Dr. Hanbyul Joo

Research Scientist, Facebook AI Research

Abstract: In this talk, I will survey various methods to represent humans in 3D. I will introduce my early exploration in building a sensor system, the Panoptic Studio, equipped with more than 500 synchronized cameras, which can capture high-resolution 3D human data. Then, I will discuss my ongoing effort to build a system to perceive and understand human movements in 3D from monocular videos in the wild.

Bio: Hanbyul Joo is a Research Scientist at Facebook AI Research (FAIR), Menlo Park. His research is at the intersection of computer vision, graphics, and machine learning, focusing on building systems to perceive and understand humans in 3D. Hanbyul received his PhD from the Robotics Institute, Carnegie Mellon University. His research has been covered in various media outlets including Discovery, Reuters, NBC News, The Verge, and WIRED. He is a recipient of the Samsung Scholarship and the Best Student Paper Award at CVPR 2018.


Professor Kee-Eung Kim

KAIST

Abstract: Recent advances in deep reinforcement learning have shown remarkable achievements. We now have reinforcement learning agents that surpass human expert level in many tasks, such as arcade video games and Go. Yet, these are all achieved in simulated environments, where trial and error is regarded as cheap. In order to make a tangible impact on real-world environments, this basic assumption behind reinforcement learning has to be addressed as a top priority. Fortunately, there is a growing body of research on this matter, exemplified by approaches such as sim2real and offline reinforcement learning. In this talk, I will go over the challenges that need to be addressed to make reinforcement learning practical in real-world scenarios, and present our work on offline reinforcement learning inspired by cross-validation in supervised learning.

Bio: Kee-Eung Kim received the B.S. degree in computer science from KAIST, Daejeon, South Korea, in 1995, and the ScM and Ph.D. degrees in computer science from Brown University, Providence, RI, USA, in 1998 and 2001, respectively. From 2001 to 2006, he was a Senior Software Engineer with Samsung SDS, South Korea, and a Senior Research Staff Member with the Samsung Advanced Institute of Technology, South Korea. In 2006, he joined the faculty of the Computer Science Department at KAIST, where he is currently a Professor in the Graduate School of AI. His research interests are representations and algorithms for sequential decision-making problems in artificial intelligence and machine learning, including Markov decision processes and reinforcement learning.


Professor Jungseul Ok

POSTECH

Abstract: Adaptive sampling is essential in machine learning, where data collection is a bottleneck. Various kinds of domain knowledge capturing the structure underlying the target system, e.g., continuity or unimodality, have been used to improve sample efficiency. Most research in this line assumes that the structure is explicitly described in detail. In practice, however, structural information is given only implicitly. In the case of a continuity structure, for example, we may know that continuity exists, while the detailed description, such as the Lipschitz constant or embedding, remains latent. Hence, we propose a meta-learning scenario in which the structural information is extracted from previous experience and transferred to new tasks. In particular, we focus on the continuity structure in the multi-armed bandit problem, the classical problem of adaptive sampling. We provide an efficient framework with not only provable guarantees but also empirical superiority.
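
As a concrete reference point for the adaptive sampling setting described above, here is a minimal UCB1 bandit loop (a standard textbook baseline, not the speaker's meta-learning framework; the Bernoulli arms and horizon are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.5, 0.7])   # hypothetical Bernoulli arms
n_arms, horizon = len(true_means), 2000
counts, sums = np.zeros(n_arms), np.zeros(n_arms)

for t in range(horizon):
    if t < n_arms:                        # pull each arm once to initialize
        arm = t
    else:                                 # UCB1 index: mean + sqrt(2 ln t / n)
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = rng.binomial(1, true_means[arm])
    counts[arm] += 1
    sums[arm] += reward

print("pulls per arm:", counts)           # most pulls concentrate on the best arm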

Bio: Jungseul Ok is an assistant professor in the Department of Computer Science and Engineering and the Graduate School of Artificial Intelligence, and a member of the Machine Learning Lab, at POSTECH. He completed his Ph.D. in the School of Electrical Engineering at the Korea Advanced Institute of Science and Technology (KAIST), South Korea, under the supervision of Prof. Yung Yi and Prof. Jinwoo Shin. After graduation, he worked as a postdoctoral researcher with Prof. Alexandre Proutiere in the School of Electrical Engineering at KTH, Stockholm, Sweden, and with Prof. Sewoong Oh in the Paul G. Allen School of Computer Science & Engineering, University of Washington, WA, US.


Professor Juho Lee

KAIST

Abstract: While various complexity measures for deep neural networks exist, specifying an appropriate measure capable of predicting and explaining generalization in deep networks has proven challenging. We propose Neural Complexity (NC), a meta-learning framework for predicting generalization. Our model learns a scalar complexity measure through interactions with many heterogeneous tasks in a data-driven way. The trained NC model can be added to the standard training loss to regularize any task learner in a standard supervised learning scenario. We contrast NC's approach against existing manually-designed complexity measures and other meta-learning models, and we validate NC's performance on multiple regression and classification tasks.
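
As a rough sketch of the plug-in usage described above (an added scalar penalty on top of the task loss), the snippet below wires a stand-in network into a supervised objective; nc_model here is an untrained placeholder, not the paper's trained Neural Complexity model, and the toy learner and weight 0.1 are assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
# placeholder for a trained Neural Complexity model: maps features to a
# scalar predicted generalization gap
nc_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
learner = nn.Linear(16, 1)                 # toy task learner
x, y = torch.randn(64, 16), torch.randn(64, 1)

task_loss = F.mse_loss(learner(x), y)
penalty = nc_model(x).mean()               # learned complexity measure
loss = task_loss + 0.1 * penalty           # regularized objective
loss.backward()                            # the learner trains as usual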

Bio: Juho Lee is an assistant professor at the Graduate School of AI, KAIST. He finished his Ph.D. at POSTECH and worked as a postdoc in the computational statistics and machine learning group at the University of Oxford. His research mainly focuses on Bayesian nonparametric models and Bayesian deep learning.


Professor Yung-Kyun Noh

Hanyang University

Abstract: Despite the consistency property in the theory of nearest neighbor methods, which relates the algorithm to the theoretical minimum error (the Bayes error), algorithms using nearest neighbors are not preferred by researchers because they are considered too simple and old-fashioned. However, due to this simplicity, the analysis of nearest neighbor methods is tractable and can produce non-asymptotic theories. These methods have simply not yet seen datasets big enough to enjoy the theoretical predictions, and the current algorithmic and systems technologies are immature. In this talk, I will introduce some of my recent works implementing models that modify the geometry around the points of interest and perform nearest neighbor methods with many data points as if we were effectively using even more data than what is actually given. I will show derivations of equations that take advantage of the entire information within finite but plentiful data and achieve inference and estimation results seemingly as if we had used infinite data. By doing this, we believe nearest neighbor methods can be considered a breakthrough showing asymptotic performance through the smart usage of extremely large datasets.
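
For readers who want the baseline being defended here, a bare-bones k-nearest-neighbor classifier is a few lines of NumPy (plain Euclidean metric on synthetic data; the speaker's geometry-modifying models are not shown):

import numpy as np

rng = np.random.default_rng(0)
# two synthetic Gaussian classes
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

def knn_predict(query, X, y, k=5):
    # vote among the k training points closest in Euclidean distance
    d = np.linalg.norm(X - query, axis=1)
    nearest = y[np.argsort(d)[:k]]
    return np.bincount(nearest).argmax()

print(knn_predict(np.array([1.8, 2.1]), X, y))   # -> 1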
Bio: Yung-Kyun Noh is currently an Assistant Professor in the Department of Computer Science of Hanyang University. His research interests are metric learning and dimensionality reduction in machine learning, and he is especially interested in applying the statistical theory of nearest neighbors to real and large datasets. He received his B.S. in Physics from POSTECH, Korea and his Ph.D. in Computer Science from Seoul National University. He was a Research Assistant Professor in the Department of Computer Science at KAIST and a BK Assistant Professor in the School of Mechanical and Aerospace Engineering at Seoul National University in Korea. He worked in the GRASP Robotics Laboratory at the University of Pennsylvania in Philadelphia, USA, at RIKEN AIP in Tokyo, Japan, and at the Mayo Clinic in Rochester, USA, as a visiting researcher, performing machine learning research on nonparametric methods.


Professor Jinwoo Shin

KAIST

Abstract: In this talk, I first overview NeurIPS 2020, e.g., the recent trend in machine learning topics and the performance of Korean researchers. Then, I will focus on contrastive learning, one of the most successful recent concepts for representation learning in an unsupervised (or self-supervised) manner. In particular, I will present how it is useful for novelty detection, i.e., identifying whether a given sample is drawn from outside the training distribution, which is essential for reliable machine learning. Our key idea is that in addition to contrasting a given sample with other instances as in conventional contrastive learning methods, we contrast the sample with distributionally-shifted augmentations of itself. Our experiments demonstrate the superiority of our method under various novelty detection scenarios, including unlabeled one-class, unlabeled multi-class and labeled multi-class settings, with various image benchmark datasets. This is a joint work with Jihoon Tack (KAIST AI), Sangwoo Mo (KAIST EE) and Jongheon Jeong (KAIST EE).
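
The contrast-with-shifted-augmentations idea can be sketched as an InfoNCE-style loss in which shifted (e.g., rotated) copies of the anchor act as extra negatives; the dummy encoder, data, and temperature below are illustrative assumptions, not the actual CSI implementation:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
encoder = torch.nn.Linear(32, 8)          # dummy stand-in for a CNN encoder

def info_nce(anchor, positive, negatives, tau=0.5):
    a = F.normalize(encoder(anchor), dim=1)
    p = F.normalize(encoder(positive), dim=1)
    n = F.normalize(encoder(negatives), dim=1)
    pos = torch.exp((a * p).sum(1) / tau)    # agreement with a mild augmentation
    neg = torch.exp(a @ n.t() / tau).sum(1)  # shifted (e.g., rotated) copies repel
    return -torch.log(pos / (pos + neg)).mean()

x = torch.randn(16, 32)                    # batch of flattened inputs
x_aug = x + 0.1 * torch.randn_like(x)      # mild augmentation -> positive
x_shift = torch.randn(16, 32)              # placeholder for rotated versions -> negatives
print(info_nce(x, x_aug, x_shift).item())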

Bio: Jinwoo Shin is currently an associate professor (jointly affiliated) in the Graduate School of AI and the School of Electrical Engineering at KAIST. He is also a KAIST endowed chair professor. He obtained B.S. degrees (in Math and CS) from Seoul National University in 2001, and the Ph.D. degree (in Math) from Massachusetts Institute of Technology in 2010 with George M. Sprowls Award (for best MIT CS PhD theses). He was a postdoctoral researcher at Algorithms & Randomness Center, Georgia Institute of Technology in 2010-2012 and Business Analytics and Mathematical Sciences Department, IBM T. J. Watson Research in 2012-2013. Dr. Shin's early works are mostly on applied probability and theoretical computer science. After he joined KAIST in Fall 2013, he started to work on the algorithmic foundations of machine learning. He received the Rising Star Award in 2015 from the Association for Computing Machinery (ACM) Special Interest Group for the computer systems performance evaluation community (SIGMETRICS). He also received Kenneth C. Sevcik Award at ACM SIGMETRICS/Performance 2009, Best Publication Award from INFORMS Applied Probability Society 2013, Best Paper Award at ACM MOBIHOC 2013, Bloomberg Scientific Research Award 2015 and ACM SIGMETRICS Test of Time Award 2019.


Title: Self-supervised Auxiliary Learning with Meta-paths for Heterogeneous Graphs

Professor Hyunwoo Kim

Korea University

Abstract: Graph neural networks have shown superior performance in a wide range of applications. However, auxiliary tasks for heterogeneous graphs, which contain rich semantic information with various types of nodes and edges, have been less explored in the literature. In this talk, we propose a novel self-supervised auxiliary learning method using meta-paths, which are composite relations of multiple edge types. Our method learns to learn a primary task by predicting meta-paths as auxiliary tasks. The proposed method can identify an effective combination of auxiliary tasks and automatically balance them to improve the primary task. Our method can be applied to any graph neural network in a plug-in manner without manual labeling or additional data. This talk also briefly introduces ongoing research at MLVLab http://mlv.korea.ac.kr.
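
One simple way to picture "automatically balancing" auxiliary tasks is to make the mixing weights trainable; the toy below uses a softmax over learnable logits, whereas the paper learns the balance with meta-learning over meta-path prediction tasks, so this is only a rough analogy with made-up loss values:

import torch

torch.manual_seed(0)
primary_loss = torch.tensor(1.0, requires_grad=True)
# stand-ins for auxiliary losses, e.g., one per meta-path prediction task
aux_losses = torch.tensor([0.8, 1.5, 0.3], requires_grad=True)

logits = torch.zeros(3, requires_grad=True)   # trainable balance parameters
weights = torch.softmax(logits, dim=0)        # weights sum to one
total = primary_loss + (weights * aux_losses).sum()
total.backward()                              # the balance logits receive gradients too
print(weights.detach())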

Bio: Hyunwoo J. Kim is an assistant professor in the Department of Computer Science at Korea University. Prior to this position, he worked at Amazon Lab126 in Sunnyvale, California. He earned a Ph.D. in the Department of Computer Sciences at the University of Wisconsin-Madison (Ph.D. minor: statistics) under the supervision of Dr. Vikas Singh. His research interests include computer vision, manifold statistics, and deep learning for structured data on graphs and manifolds.


Title: Rotation-Invariant Local-to-Global Representation Learning for 3D Point Cloud

Professor Bohyung Han

Seoul National University

Abstract: We present a local-to-global representation learning algorithm for 3D point cloud data, which is appropriate for handling various geometric transformations, especially rotation, without explicit data augmentation with respect to the transformations. Our model takes advantage of multi-level abstraction based on graph convolutional neural networks, which constructs a descriptor hierarchy to encode rotation-invariant shape information of an input object in a bottom-up manner. The descriptors at each level are obtained from a neural network based on a graph via stochastic sampling of 3D points, which is effective in making the learned representations robust to the variations of input data. The proposed algorithm achieves state-of-the-art performance on rotation-augmented 3D object recognition and segmentation benchmarks, and we further analyze its characteristics through comprehensive ablative experiments.
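
The reason descriptors built from point-pair geometry can be rotation-invariant is easy to verify numerically: pairwise distances within a point cloud are unchanged by an orthogonal transform. The check below is only illustrative; the paper learns hierarchical descriptors on top of such invariant quantities:

import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
cloud = rng.normal(size=(100, 3))                 # random 3D point cloud
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))      # random orthogonal matrix (rotation or reflection)
rotated = cloud @ Q.T

print(np.allclose(pdist(cloud), pdist(rotated)))  # True: pairwise distances are invariant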

Bio: Bohyung Han is currently a Professor in the Department of Electrical and Computer Engineering at Seoul National University, Korea. Prior to his current position, he was an Associate Professor in the Department of Computer Science and Engineering at POSTECH and a visiting research scientist in the Machine Intelligence Group at Google, Venice, CA, USA. He received his Ph.D. degree from the Department of Computer Science at the University of Maryland, College Park, MD, USA, in 2005. He has served or will serve as an Area Chair or Senior Program Committee member of numerous major conferences in computer vision, machine learning, and artificial intelligence, a General Chair of ACCV 2022, a Tutorial Chair of ICCV 2019, and a Demo Chair of ECCV 2022. He is also serving as an Associate Editor for TPAMI and MVA, and an Area Editor for CVIU. He is interested in various topics in computer vision and machine learning with an emphasis on deep learning. His research group won the Visual Object Tracking (VOT) Challenge in 2015 and 2016.


Title: Learning to represent data across domains

Professor Seunghoon Hong

KAIST

Abstract: How can we describe noisy data using a compact and meaningful set of features? Learning to represent data has been the core problem of machine learning, and remarkable progress has been achieved in various data domains, such as images, language, audio signals, etc. However, learning to align such representations ‘across’ multiple heterogeneous domains still remains an open problem, although it is essential for learning from multi-modal input sources (e.g., observations taken from multiple sensors). In this talk, I will introduce the problem of cross-domain disentanglement learning, and our recent work that enables unsupervised training of aligned representations across domains.

Bio: Seunghoon Hong is an assistant professor at the School of Computing, KAIST. Before joining KAIST, he was a postdoctoral fellow at the University of Michigan and a visiting research faculty member on the Google Brain team. His research interests lie at the intersection of machine learning and computer vision, with a specific focus on learning with least supervision and deep generative models. He received the B.S. and Ph.D. degrees from the Department of Computer Science and Engineering at POSTECH, Pohang, Korea in 2011 and 2017, respectively.


Title: Deep Diffusion-Invariant Wasserstein Distributional Classification

Professor Junseok Kwon

Chung-Ang University

Abstract: We present a novel classification method called deep diffusion-invariant Wasserstein distributional classification (DeepWDC). DeepWDC represents input data and labels as probability measures to address severe perturbations in input data. It can output the optimal label measure in terms of diffusion invariance, where the label measure is stationary over time and becomes equivalent to a Gaussian measure. Furthermore, DeepWDC minimizes the 2-Wasserstein distance between the optimal label measure and the Gaussian measure, which reduces the Wasserstein uncertainty. Experimental results demonstrate that DeepWDC can substantially enhance the accuracy of several baseline deterministic classification methods and outperforms state-of-the-art methods on 2D and 3D data containing various types of perturbations (e.g., rotations, impulse noise, and down-scaling).
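
For intuition about the distance being minimized, the 2-Wasserstein distance between two Gaussian measures has a closed form, and the helper below simply evaluates it (this is the generic Gaussian formula, unrelated to the DeepWDC training code; the example means and covariances are made up):

import numpy as np
from scipy.linalg import sqrtm

def w2_squared_gaussian(m1, c1, m2, c2):
    # W2^2(N(m1,c1), N(m2,c2)) = ||m1-m2||^2 + Tr(c1 + c2 - 2(c1^1/2 c2 c1^1/2)^1/2)
    s1 = np.real(sqrtm(c1))
    cross = np.real(sqrtm(s1 @ c2 @ s1))
    return float(np.sum((m1 - m2) ** 2) + np.trace(c1 + c2 - 2.0 * cross))

m1, c1 = np.zeros(2), np.eye(2)
m2, c2 = np.ones(2), 2.0 * np.eye(2)
print(w2_squared_gaussian(m1, c1, m2, c2))   # 2 + 2*(sqrt(2)-1)^2 ~= 2.343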

Bio: Junseok Kwon received the PhD degree in electrical engineering and computer science in 2013, the MSc degree in 2008 on the topic of object tracking (supervised by Prof. Kyoung Mu Lee), and the BSc degree in 2006, all from Seoul National University, Korea. He is an associate professor in the School of Computer Science and Engineering at Chung-Ang University, Seoul, Korea. He was a postdoctoral researcher under Prof. Luc Van Gool in the Computer Vision Laboratory at ETH Zurich from 2013 to 2014. His research interests include visual tracking, visual surveillance, and Monte Carlo sampling methods and their variants.


Title: MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures

Professor Sungju Hwang

KAIST

Abstract: Regularization and transfer learning are two popular techniques to enhance model generalization on unseen data, which is a fundamental problem of machine learning. Regularization techniques are versatile, as they are task- and architecture-agnostic, but they do not exploit the large amounts of data available. Transfer learning methods learn to transfer knowledge from one domain to another, but may not generalize across tasks and architectures, and may introduce additional training cost for adapting to the target task. To bridge the gap between the two, we propose a transferable perturbation, MetaPerturb, which is meta-learned to improve generalization performance on unseen data. MetaPerturb is implemented as a set-based lightweight network that is agnostic to the size and the order of the input, which is shared across the layers. Then, we propose a meta-learning framework to jointly train the perturbation function over heterogeneous tasks in parallel. As MetaPerturb is a set-function trained over diverse distributions across layers and tasks, it can generalize to heterogeneous tasks and architectures. We validate the efficacy and generality of MetaPerturb trained on a specific source domain and architecture, by applying it to the training of diverse neural architectures on heterogeneous target datasets against various regularizers and fine-tuning. The results show that the networks trained with MetaPerturb significantly outperform the baselines on most of the tasks and architectures, with a negligible increase in parameter size and no hyperparameters to tune.

Bio: Sung Ju Hwang is an associate professor in the Graduate School of Artificial Intelligence and School of Computing at KAIST. Prior to working at KAIST, he was an assistant professor in the School of Electrical and Computer Engineering at UNIST, and before that he was a postdoctoral research associate at Disney Research. He completed his Ph.D. in computer science at the University of Texas at Austin, under the supervision of Professor Kristen Grauman. His research interests are mainly in developing novel models and algorithms for tackling new challenges in deploying artificial intelligence systems to various real-world application domains, with a special focus on low-resource learning, on-device learning, safe/secure learning, and large-scale learning. Sung Ju Hwang is also a co-founder of AItrics, an AI startup company located in Seoul.


Professor Seungryong Kim

Korea University

Abstract: Numerous computer vision and computational photography applications require the points on an object in one image to be matched with their corresponding object points in another image. Establishing such pixel-level correspondences can facilitate a variety of computer vision applications including nonparametric scene parsing, semantic segmentation, object detection, and image manipulation. A major challenge of correspondence learning with recent deep CNNs is the lack of ground-truth correspondence maps for training. More recently, contrastive learning has been an effective tool in unsupervised visual representation learning, but there have been no studies that learn pixel-level correspondences with contrastive learning. To move a step forward, we have explored how dense correspondence benefits from the contrastive learning paradigm. This talk summarizes our main efforts to incorporate contrastive learning for establishing pixel-level correspondences and its applications including semantic correspondence, landmark detection, photorealistic style transfer, and exemplar-based image-to-image translation. Further details can be found at https://cvlab.korea.ac.kr/.

Bio: Prof. Seungryong Kim is an assistant professor in the Department of Computer Science and Engineering at Korea University. Before joining Korea University, he was a postdoctoral researcher in the School of Computer and Communication Science at École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, and a postdoctoral researcher in the School of Electrical and Electronic Engineering at Yonsei University, Seoul, Korea. He received the B.S. and Ph.D. degrees from the School of Electrical and Electronic Engineering at Yonsei University, Seoul, Korea, in 2012 and 2018, respectively. His current research interests include computer vision, computational photography, and machine learning, particularly representation learning, visual scene reconstruction and understanding.


Professor Taesup Moon

Sungkyunkwan University

Abstract: In this talk, I will present a recently developed novel regularization-based continual learning method, dubbed Adaptive Group Sparsity based Continual Learning (AGS-CL), which uses two group sparsity-based penalties. Our method selectively employs the two penalties when learning each node based on its importance, which is adaptively updated after learning each new task. By utilizing the proximal gradient descent method for learning, exact sparsity and freezing of the model are guaranteed, and thus the learner can explicitly control the model capacity as the learning continues. Furthermore, as a critical detail, we re-initialize the weights associated with unimportant nodes after learning each task in order to prevent the negative transfer that causes catastrophic forgetting and to facilitate efficient learning of new tasks. Throughout extensive experimental results, we show that our AGS-CL uses much less additional memory space for storing the regularization parameters, and that it significantly outperforms several state-of-the-art baselines on representative continual learning benchmarks for both supervised and reinforcement learning tasks. This work will also be presented at the upcoming NeurIPS 2020 conference.
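
The primitive behind the "exact sparsity and freezing" guarantee is the proximal operator of a group-L2 penalty, which zeroes out an entire group (node) whenever its norm falls below a threshold; here it is in isolation, with made-up groups, penalty weight, and step size, not the full AGS-CL algorithm:

import numpy as np

def group_prox(w, lam, eta):
    # proximal step for eta * lam * ||w||_2 on one group of weights:
    # shrink the whole group toward zero; below the threshold it becomes exactly zero
    norm = np.linalg.norm(w)
    scale = max(0.0, 1.0 - eta * lam / norm) if norm > 0 else 0.0
    return scale * w

rng = np.random.default_rng(0)
groups = [rng.normal(0, 0.1, 5), rng.normal(0, 2.0, 5)]   # a weak node vs. a strong node
for g in groups:
    print(group_prox(g, lam=1.0, eta=0.3))   # weak group -> exact zeros; strong group only shrinks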

Bio: Taesup Moon received the B.S. degree in electrical engineering from Seoul National University, Seoul, Korea, in 2002 and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 2004 and 2008, respectively. From 2008 to 2012, he was a Scientist at Yahoo! Labs, Sunnyvale, CA, and he held a Post-Doctoral Researcher appointment with the Department of Statistics, UC Berkeley from 2012 to 2013. From 2013 to 2015, he was a Research Staff Member with Samsung Advanced Institute of Technology (SAIT), and from 2015 to 2017, he was an Assistant Professor at the Department of Information and Communication Engineering, Daegu-Gyeongbuk Institute of Science and Technology (DGIST). Currently, he is an Associate Professor at the Department of Electrical and Computer Engineering, Sungkyunkwan University (SKKU), Suwon, South Korea. His current research interests are in machine/deep learning, signal processing, information theory, and various (big) data science applications.


Title: Towards machines that reconstruct and understand 3D scenes

Professor Jaesik Park

POSTECH

Abstract: Recovering and understanding a 3D scene is an essential task for artificial intelligence systems to interact with the real world. This talk will cover my recent work on 3D reconstruction, learning-based geometric features, 3D semantic segmentation, and point cloud registration. Moreover, the talk will introduce the new version of Open3D, equipped with a machine learning module.

Bio: Jaesik Park is an Assistant Professor at POSTECH. He received his Bachelor's degree from Hanyang University in 2009, and he received his Master's degree and Ph.D. degree from KAIST in 2011 and 2015, respectively. He worked at the Intelligent Systems Lab at Intel as a Research Scientist. His research interests cover various 3D vision topics. He is a co-founder of Open3D and a recipient of the Microsoft Research Asia Fellowship and two Samsung Humantech Awards. He has served as an Area Chair for ICCV 2019, CVPR 2020, and CVPR 2021.


Title: Learning architectures for binary networks and beyond

Professor Jonghyun Choi

GIST

Abstract: The large computational complexity of current algorithms is one of the bottlenecks in deploying AI systems such as visual recognition models. Neural networks with 1-bit precision offer very high memory efficiency along with computational benefits. I will introduce a few ideas for finding better architectures for binary networks using search. In addition, I will introduce simpler training methods for the searched architectures. We show that the searched architectures outperform state-of-the-art binary networks, and that the simpler training regime further improves the utilization of the smaller networks, outperforming state-of-the-art binary networks with higher FLOPs.

(Relevant paper link: https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123570562.pdf)

Bio: Jonghyun Choi is an assistant professor in the AI graduate school/EECS at GIST (Gwangju Institute of Science and Technology). Prior to GIST, he was a research scientist at the Allen Institute for Artificial Intelligence (AI2) from 2016 to 2018 and is currently an affiliated research scientist there. He received a PhD from the University of Maryland, College Park in 2015, with a distinguished dissertation fellowship, under the supervision of Prof. Larry S. Davis. His research interests include supervision- and resource-efficient computer vision and learning methods, and high-level visual reasoning.


Title: Mixture Density Object Detector: Learning Bounding Boxes without Ground Truth Assignment

Professor Nojun Kwak

Seoul National University

Abstract: In multi-object detection using neural networks, most methods train a network based on ground truth assignment, which makes the training heuristic and complicated. In this work, we reformulate the multi-object detection task as a problem of density estimation of bounding boxes. Instead of using a ground-truth-assignment-based method, we train a network by estimating the probability density of bounding boxes in an input image using a mixture model. For this purpose, we propose a novel network for object detection called the Mixture Density Object Detector (MDOD), and the corresponding objective function for density-estimation-based training. Unlike ground-truth-assignment-based methods, our proposed method gets rid of the cumbersome processes of matching between ground truth boxes and their predictions, as well as the heuristic anchor design. It is also free from the problem of foreground-background imbalance. We applied MDOD to the MS COCO dataset. Our proposed method not only approaches the multi-object detection problem in a new way but also improves detection performance through MDOD.
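
The density-estimation view can be made concrete with the negative log-likelihood of a one-dimensional Gaussian mixture, the kind of objective a mixture density network minimizes per box coordinate (the mixture parameters and observed values below are dummies; MDOD's actual output distribution and dimensionality differ):

import numpy as np
from scipy.special import logsumexp

def mixture_nll(x, pis, mus, sigmas):
    # mean negative log-likelihood of samples x under a 1-D Gaussian mixture
    x = np.asarray(x)[:, None]
    log_comp = (np.log(pis) - np.log(sigmas) - 0.5 * np.log(2 * np.pi)
                - 0.5 * ((x - mus) / sigmas) ** 2)
    return -logsumexp(log_comp, axis=1).mean()

# dummy mixture "predicted by the network" and observed box-center coordinates
pis, mus, sigmas = np.array([0.3, 0.7]), np.array([0.2, 0.8]), np.array([0.05, 0.1])
print(mixture_nll([0.21, 0.78, 0.75], pis, mus, sigmas))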

Bio: Nojun Kwak received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer science from Seoul National University, Seoul, in 1997, 1999, and 2003, respectively. From 2003 to 2006, he was with Samsung Electronics, Seoul. From 2007 to 2013, he was a Faculty Member with the Department of Electrical and Computer Engineering, Ajou University, Suwon, South Korea. Since 2013, he has been with the Graduate School of Convergence Science and Technology, Seoul National University, where he is currently a Professor. His current research interests include feature learning by deep neural networks and their applications in various areas of pattern recognition, computer vision, and image processing.


Title: Neural Motion Feature Learning for Video Understanding

Professor Minsu Cho

POSTECH

Abstract: Motion plays a crucial role in understanding videos and most state-of-the-art neural models for video classification incorporate motion information typically using optical flows extracted by a separate off-the-shelf method. As the frame-by-frame optical flows require heavy computation, incorporating motion information has remained a major computational bottleneck for video understanding. In this work, we replace external and heavy computation of optical flows with internal and light-weight learning of motion features. We propose a trainable neural module for effective motion feature extraction. Inserted in the middle of any neural network, it learns to establish correspondences across frames and convert them into motion features, which are readily fed to the next downstream layer for better prediction. We demonstrate that the proposed method provides a significant gain on four standard benchmarks for action recognition with only a small amount of additional cost, outperforming the state of the art on Something-Something-V1&V2 datasets.
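
A minimal version of "establishing correspondences across frames" is a local correlation volume between the feature maps of two frames; the sketch below uses wrap-around shifts for brevity, and its shapes and displacement window are illustrative, not the paper's exact module:

import torch

torch.manual_seed(0)
f1, f2 = torch.randn(1, 16, 8, 8), torch.randn(1, 16, 8, 8)  # features of two frames

# local correlation: for each displacement in a 3x3 window, the channel-wise
# dot product between frame-1 features and shifted frame-2 features
corrs = []
for dy in (-1, 0, 1):
    for dx in (-1, 0, 1):
        shifted = torch.roll(f2, shifts=(dy, dx), dims=(2, 3))  # wrap-around for brevity
        corrs.append((f1 * shifted).sum(dim=1))                 # (1, 8, 8) per displacement
motion_feat = torch.stack(corrs, dim=1)    # (1, 9, 8, 8): ready for a downstream conv layer
print(motion_feat.shape)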

Bio: Minsu Cho is an associate professor of Computer Science and Engineering at POSTECH, South Korea, leading the POSTECH Computer Vision Lab. Before joining POSTECH in the fall of 2016, he worked as a postdoc (2012-2015) and a starting researcher (2015-2016) at Inria (the French National Institute for computer science and applied mathematics) and ENS (École Normale Supérieure), Paris, France. He completed his Ph.D. in 2012 at Seoul National University, Korea. His research lies in the areas of computer vision and machine learning, especially in the problems of visual correspondence, action recognition, and object recognition with minimal supervision. In general, he is interested in the relationship between correspondence and supervision in visual learning. He is an editorial board member of the International Journal of Computer Vision and has served as an area chair for top computer vision conferences including CVPR 2018, ICCV 2019, CVPR 2020-2021, and IJCAI 2021.


Title: Neural Interface for World Knowledge

Professor Minjoon Seo

KAIST & Engineering Manager, Naver

Abstract: Modern natural language and machine learning problems are increasingly dependent on external world knowledge, and thus how we encode and interact with knowledge data has become very important as well. In this talk, I will discuss different approaches in the literature for handling world knowledge, and in particular, our recent work on creating a memory interface for accessing massive unstructured knowledge data. I will then discuss both observed and hypothetical advantages of such a memory-based approach, especially how it can help us perform more complex interactions with world knowledge, including language reasoning.

Bio: Minjoon Seo is an Engineering Manager at Naver and an incoming Assistant Professor at the KAIST Graduate School of AI. He finished his Ph.D. at the University of Washington, advised by Hannaneh Hajishirzi and Ali Farhadi. His research interest is in natural language processing and machine learning, and in particular, how knowledge data can be encoded (e.g. external memory and language models), accessed (e.g. question answering and dialog), and produced (e.g. scientific reasoning). His studies were supported by a Facebook Fellowship and an AI2 Key Scientific Challenges Award. He previously co-organized MRQA 2018, MRQA 2019, and RepL4NLP 2020.

Professor Kyomin Jung

Seoul National University

Abstract: Recently, various Deep Neural Network (DNN) models have shown excellence in question answering (QA) tasks. In this talk, I will present the basic ideas of QA systems with DNN models and present our work on them. Specifically, we investigate human-like QA systems that generate answers in natural language form, aligned with the given context. Furthermore, our research aims to fully incorporate external knowledge into the QA system, integrating deep neural network-based models and human-generated expert knowledge bases. To this end, we have developed model frameworks that detect supporting sentences in knowledge-based QA.

Bio: Kyomin Jung is an associate professor in the Electrical and Computer Engineering department at Seoul National University (SNU), and has a joint appointment in the Mathematics department at SNU. He worked in the KAIST Computer Science department from 2009 to 2013. He received his Ph.D. from the MIT Department of Mathematics in 2009 and his B.Sc. from the Seoul National University Department of Mathematics in 2003. During his Ph.D., he held research internships at Microsoft Research Cambridge (2008), IBM T.J. Watson Research (2007), and Bell Labs (2006). He is a recipient of the excellent new faculty funding from NRF Korea in 2012. His research areas include machine learning and natural language processing and their applications to various services. His work on machine learning and natural language processing has been published in major journals and conferences including IEEE PAMI, NIPS, ICML, AAAI, ACL, NAACL, ICDM, SIGMETRICS, and ICDE.

Professor Gunhee Kim

Seoul National University

Abstract: We explore the task of improving persona consistency of dialogue agents. Recent models tackling consistency often train with additional Natural Language Inference (NLI) labels or attach trained extra modules to the generative agent for maintaining consistency. However, such additional labels and training can be demanding. Also, we find even the best-performing persona-based agents are insensitive to contradictory words. Inspired by social cognition and pragmatics, we endow existing dialogue agents with public self-consciousness on the fly through an imaginary listener. Our approach, based on the Rational Speech Acts framework (Frank and Goodman, 2012), can enforce dialogue agents to refrain from uttering contradiction. We further extend the framework by learning the distractor selection, which has been usually done manually or randomly. Results on Dialogue NLI (Welleck et al., 2019) and PersonaChat (Zhang et al., 2018) dataset show that our approach reduces contradiction and improves consistency of existing dialogue models. Moreover, we show that it can be generalized to improve context-consistency beyond persona in dialogues.
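
The Rational Speech Acts machinery behind the imaginary listener is Bayes' rule over personas followed by a rescoring of candidate utterances; the probabilities and the rationality weight beta below are invented purely to show the mechanics, not the paper's learned distributions:

import numpy as np

# base speaker S0: rows = 2 personas, cols = 3 candidate utterances
s0 = np.array([[0.6, 0.3, 0.1],      # persona A favors utterance 0
               [0.1, 0.3, 0.6]])     # persona B favors utterance 2
prior = np.array([0.5, 0.5])

# imaginary listener: L(persona | utterance) proportional to S0(utterance | persona) * prior
listener = s0 * prior[:, None]
listener /= listener.sum(axis=0, keepdims=True)

# self-conscious speaker S1 for persona A: rescale each candidate by how strongly
# the listener would attribute it to persona A, then renormalize
beta = 2.0
s1_a = s0[0] * listener[0] ** beta
s1_a /= s1_a.sum()
print(s1_a)   # probability mass shifts further toward persona-consistent utterances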

Bio: Gunhee Kim has been an associate professor in the Department of Computer Science and Engineering of Seoul National University since 2015. He was a postdoctoral researcher at Disney Research for one and a half years. He received his PhD in 2013 under the supervision of Eric P. Xing from the Computer Science Department of Carnegie Mellon University. Prior to starting his PhD study in 2009, he earned a master's degree under the supervision of Martial Hebert at the Robotics Institute, CMU. His research interests are solving computer vision and web mining problems that emerge from big image data shared online, by developing scalable and effective machine learning and optimization techniques. He is a recipient of the 2014 ACM SIGKDD doctoral dissertation award and the 2015 Naver New Faculty Award.


Title: Conversation Model and its Evaluation

Professor JinYeong Bak

Sungkyunkwan University

Abstract: Many neural network-based open-domain conversation models have seen success in recent years. Despite these recent successes, open-domain conversation models still face challenges in imitating human-level conversation. One of the challenges is the consideration of the speakers in a conversation. Knowing information about a speaker, such as her linguistic style or personal information, can help predict her response, and knowing more about both speakers from their previous conversations can help predict the contents of their conversation. I focus on modeling speakers and their behaviors in conversations so that the conversation model can generate human-like responses. In this talk, I will introduce two recent research projects on conversation modeling: generating personalized conversational responses (EMNLP 2019) and evaluating conversational responses without human-labeled data (ACL 2020). I will also show future directions of my research on building conversation models and their evaluation.

Bio: JinYeong Bak is an assistant professor in the College of Computing at Sungkyunkwan University. His research interests include analyzing human behaviors from their conversations and building open-domain conversation models from the insights of that analysis. He worked at Microsoft Research Asia as a research intern and at United Nations Global Pulse Lab Jakarta as a junior data scientist. He received his Ph.D. and M.S. from KAIST and a B.S. from Sungkyunkwan University. His research has been published in ACL, EMNLP, CHI, and ICWSM. His personal homepage: https://nosyu.github.io and lab homepage: https://hli.skku.edu

Professor Hong-Goo Kang

Yonsei University

Abstract: Speech is one of the most effective media to control or communicate with digital devices. The main goal of speech interface research is to provide naturalness similar to human-to-human communication. Thanks to the advances in deep learning-based technology, speech interface systems can be applied for various applications such as AI speakers, robots, and so on. This talk briefly explains the core modules for natural speech interface, then introduces recent research trends and directions, from a perspective of speech enhancement and speech synthesis techniques.

Bio: Hong-Goo Kang is a professor of Electrical and Electronic Engineering at Yonsei University. Prior to joining Yonsei, he was a Senior Technical Staff Member at AT&T Labs-Research (Florham Park, NJ) for 6.5 years, after his PhD from Yonsei. In 2008~2009 and 2015~2016, respectively, he worked at Broadcom (Irvine, CA) and Google (Mountain View, CA) as a visiting scholar, where he participated in various projects on speech signal processing. His research interests include speech/audio signal processing, machine learning, and human-computer interfaces.


Title: A Review of On-Device Fully Neural End-to-End Automatic Speech Recognition Algorithms

Dr. Chanwoo Kim

Samsung Electronics

Abstract: In this presentation, we review various end-to-end automatic speech recognition algorithms and their optimization techniques for on-device applications. Conventional speech recognition systems comprise a large number of discrete components such as an acoustic model, a language model, a pronunciation model, a text normalizer, an inverse text normalizer, a decoder based on a Weighted Finite State Transducer (WFST), and so on. To obtain sufficiently high speech recognition accuracy with such conventional systems, a very large language model (up to 100 GB) is usually needed. Hence, the corresponding WFST size becomes enormous, which prohibits on-device implementation. Recently, fully neural network end-to-end speech recognition algorithms have been proposed. Examples include speech recognition systems based on Monotonic Chunk-wise Attention (MoCha), the Recurrent Neural Network Transducer (RNN-T), transformer-based speech recognition systems, and so on. These fully neural network-based systems require much smaller memory footprints compared to conventional algorithms, so their on-device implementation has become feasible. In this talk, we review such end-to-end speech recognition models and extensively discuss their structures, performance, and advantages compared to conventional algorithms.

Bio: Dr. Chanwoo Kim has been a vice president at Samsung Research, leading the Speech Processing Lab, since Feb. 2018. He was a senior software engineer on the Google speech team between Feb. 2013 and Feb. 2018, where he worked on acoustic modeling for speech recognition systems and on enhancing noise robustness using deep learning techniques. He was a speech scientist on the Microsoft speech team from 2011 to 2013. Dr. Kim received his Ph.D. from the Language Technologies Institute, School of Computer Science, Carnegie Mellon University in 2010. He received his B.S. and M.S. degrees in Electrical Engineering from Seoul National University in 1998 and 2001, respectively. Dr. Kim's doctoral research was focused on enhancing the robustness of automatic speech recognition systems in noisy environments. Between 2003 and 2005, Dr. Kim was a Senior Research Engineer at LG Electronics, where he worked primarily on embedded signal processing and protocol stacks for multimedia systems. Prior to his employment at LG, he worked for EdumediaTek and SK Teletech as an R&D engineer.

 


Dr. Joon Son Chung

Naver

Abstract: Supervised learning with deep neural networks has brought phenomenal advances to many fields of research, but the performance of such systems relies heavily on the quality and quantity of annotated databases tailored to the particular application. It can be prohibitively difficult to manually collect and annotate databases for every task. There is a plethora of data on the internet that is not used in machine learning due to the lack of such annotations. Self-supervised learning allows a model to learn representations using properties inherent in the data itself, such as natural co-occurrence.

In this talk, I will introduce research from myself and others on self-supervised learning, with a focus on cross-modal self-supervision. Our work demonstrates that representations can be learnt from unlabelled videos by learning correlation between images and audio. The learnt representations can be used for downstream tasks such as speaker recognition, face recognition, automatic speech recognition and lip reading, on many of which we outperform fully supervised baselines. Other noteworthy applications of self-supervision include separating simultaneous speech from video, and generating music from silent video of instruments.
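
Cross-modal self-supervision of this kind is often set up as a symmetric InfoNCE objective over an image-audio similarity matrix, with co-occurring pairs from the same video as positives; the two linear "encoders" and the temperature below are placeholders, not the speaker's models:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
img_enc = torch.nn.Linear(512, 128)    # placeholder visual encoder
aud_enc = torch.nn.Linear(40, 128)     # placeholder audio encoder

imgs, auds = torch.randn(32, 512), torch.randn(32, 40)   # co-occurring pairs: row i <-> row i
zi = F.normalize(img_enc(imgs), dim=1)
za = F.normalize(aud_enc(auds), dim=1)

logits = zi @ za.t() / 0.07            # similarity matrix with temperature
targets = torch.arange(32)             # the matching pair sits on the diagonal
loss = 0.5 * (F.cross_entropy(logits, targets) +
              F.cross_entropy(logits.t(), targets))
print(loss.item())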

Bio: Joon Son Chung is a research scientist at the Naver R&D Center. He received his BA and PhD from the University of Oxford, working with Prof. Andrew Zisserman. He has published in top-tier venues including TPAMI and IJCV, and has received best paper awards at Interspeech and ACCV. His research interests include speaker recognition, cross-modal learning, visual speech synthesis, and audio-visual speech recognition.

Dr. Jinhyuk Lee

Korea University

Abstract: Recent advancements in natural language processing (NLP) enable us to deal with the vast amount of textual information that we produce every day. Although there has been a surge of interest in developing "open"-domain NLP models, it is essential to build models that better understand text in expert domains such as biomedical articles. In this talk, I'm going to introduce some of the recent progress in open-domain NLP models, focusing especially on question answering models, and show how generalizable these models are across different domains. Then, I'll discuss how we can build better domain-specific NLP models with two different approaches: domain adaptation and domain generalization.

Bio: Jinhyuk Lee is a research professor at Korea University. His research is based on natural language processing and deep learning. Specifically, he is interested in building machine reading models that can accumulate knowledge and extending them to the biomedical domain, which requires an understanding of expert knowledge. He received his Ph.D. in Computer Science and Engineering at Korea University, advised by Prof. Jaewoo Kang. Previously, he received his B.S. in Computer Science and Engineering at Korea University.


Title: Development of Multi-modal Video QA & Dialogue Systems

Ph.D. Candidate Junyeong Kim

KAIST

Abstract: Understanding visual information along with natural language has become a desideratum in the recent research community. Notable efforts have made progress towards bridging the fields of computer vision and natural language processing, and have opened the door to methods for multi-modal video question answering and dialogue tasks. In this presentation, I present recent progress on multi-modal video question answering and dialogue. To be specific, I will present two methods, Progressive Attention Memory Network (PAMN) [CVPR'19] and Modality Shifting Attention Network (MSAN) [CVPR'20], tackling the multi-modal video question answering problem. Furthermore, I will present ongoing research, Structured Co-reference Graph Attention (SCGA), tackling video-grounded dialogue systems.

Bio: Junyeong Kim is a 4th-year Ph.D. student in the Artificial Intelligence and Machine Learning Lab in the School of Electrical Engineering at KAIST, since 2017. He received both the B.S. and M.S. degrees in electrical engineering from KAIST, in 2015 and 2017, respectively. His research interests lie in multi-modal video question answering, video-grounded dialogue, and vision-language reasoning. He focuses on developing AI agents that can 'observe' and 'converse' as humans do.