Abs: How can we enable robots to reason efficiently at both the discrete task level and the continuous motion level to achieve high-level goals such as tidying up a room or constructing a building? This is a challenging problem that requires integrated reasoning about combinatorial aspects, such as deciding which object to manipulate, and about the feasibility of each motion, such as satisfying collision-free constraints. The classical robotics approach is to design a planner that, given an initial state, a goal, and a transition model, computes a plan. The advantage of such a pure-planning approach is its immense generalization capability: for any given state and goal, the planner will find a solution if there is one. The inherent drawback, however, is that planners typically do not make use of planning experience and compute a plan from scratch every time they encounter a new problem. For complex problems, this renders them extremely inefficient. Alternatively, we can take a pure-learning approach in which we learn, from either reinforcement signals or demonstrations, a policy that maps a state to an action to execute. The advantage here is that computing the next action becomes much cheaper than in pure planning because it comes down to making a prediction with a function approximator. The drawback, however, is that the policy is brittle. If it encounters a state that is very different from the ones seen in the training set, it is likely to make mistakes and may end up in a situation where it does not know how to proceed. Our approach is to take the middle ground between these two extremes to achieve the best of both worlds. More concretely, in this talk, I will introduce several algorithms that learn to guide a planner from planning experience.
Bio: Beomjoon Kim is a director of the Intelligent mobile manipulation (IM^2) lab and an Assistant Professor at the Kim Jaechul Graduate School of AI at the Korea Advanced Institute of Science and Technology (KAIST). He obtained his PhD in Computer Science from MIT CSAIL, working on integrating task and motion planning with learning. Before that, he obtained his Master's degree at McGill University, working on combining reinforcement learning and learning from demonstrations and applying them to a wheelchair robot. Prior to that, he received his undergraduate degree in computer science and statistics from the University of Waterloo. His research goal is to build an intelligent mobile manipulator that can work in diverse and unstructured environments.
Abs: The goal of video understanding is to develop algorithms that enable machines to understand videos at the level of human experts. Researchers have tackled various domains, including video classification, search, and personalized recommendation. Thanks to the emergence of deep learning, the video understanding ability of machines has improved significantly over the last decade. Nevertheless, the models and algorithms currently used for video understanding are still relatively preliminary compared to those in other domains such as images, sound, or natural language. In this tutorial, we will go over the key ideas proposed during the last decade, focusing mainly on video classification and representation learning. If time permits, we may also discuss recent trends in multimodal approaches.
Bio: Joonseok Lee is an assistant professor at the Graduate School of Data Science at Seoul National University, leading the Visual Information Processing lab (http://viplab.snu.ac.kr). Since 2015, he has been a research engineer at Google Research. He mainly works on multi-modal video representation learning and recommendations. He earned his Ph.D. in Computer Science from the Georgia Institute of Technology in 2015, under the supervision of Dr. Guy Lebanon and Prof. Hongyuan Zha. His thesis is about local approaches for collaborative filtering, with recommendation systems as the main application. He completed three internships during his Ph.D., at Amazon, Microsoft Research, and Google. Before coming to Georgia Tech, he worked at NHN Corp. in Korea (2007-2010). He received his B.S. degree in computer science and engineering from Seoul National University. He co-organized the YouTube-8M Large-Scale Video Understanding Workshop as a program chair, and served as the publicity chair for the AISTATS 2015 conference. More information is available on his website (http://www.joonseok.net).
Abs: Although many AI-based methods have recently improved performance on various computer vision tasks, a large amount of labeled data remains essential for high performance. In this presentation, we will introduce several computer vision tasks that reduce the amount of labeled data required and make use of unlabeled data: visual tracking, domain adaptation, and active learning. Visual tracking can use as little as one labeled sample to track a target, but the tracker must be robust to variation in the target's appearance. This robustness can be obtained by training on the tracking results, treating them as pseudo-labels. Thus, visual trackers need an online training step, which results in a large computational cost. To solve this problem, we propose several methods that reduce the computational cost of visual trackers so that they can run in real time. In addition, we introduce our research on domain adaptation, which improves image classification performance across various test environments. Domain adaptation improves classification performance when the training and test environments differ, even though label information for the test environment is not given. Thus, through domain adaptation, we can adapt a pre-trained AI network to a novel environment without any labeling cost. Finally, we introduce active learning, which selects the best subset of unlabeled samples to be labeled in order to improve labeling efficiency. A number of studies have improved labeling efficiency, but many of them share the limitation that they fail to select effective samples in class-imbalanced datasets. To address this limitation, we develop a novel derivation that works even on class-imbalanced datasets, which improves the real-world applicability of active learning algorithms.
2020.03~Present: Assistant Professor, Dept. of Advanced Imaging, Chung-Ang University, Seoul, Korea
2018.06~2020.02: Research Engineer, Samsung SDS, Seoul, Korea
2018.08: Ph.D., Dept. of ECE, Seoul National University, Seoul, Korea
Major Topics: Visual Tracking, Surveillance Camera, Semi-supervision
Detailed Information & Homepage: https://www.vilab.cau.ac.kr
Abs: As the size of large-scale electronic health records (EHR) grows, researchers are able to make clinical predictions, diagnoses, or recommendations with modern neural networks. The distinguishing property of EHR is its heterogeneous multi-modality, but most deep learning approaches do not take full advantage of this multi-modal nature. In this tutorial, we will first briefly look at how to build basic medical prediction models based on EHR, then talk about Transformers and how we can train machines to understand multi-modal EHR.
Bio: Edward Choi is an assistant professor at the Graduate School of AI at KAIST. He received his PhD in computer science from Georgia Tech in 2018, where he studied interpretable deep learning models for healthcare analytics under the supervision of Professor Jimeng Sun. Prior to joining KAIST in 2020, Edward was a software engineer at Google Health Research, where he focused on developing deep learning models for processing electronic health records. His current research interests include machine learning, healthcare, and natural language processing.
Abs: Deep learning methods inspired by differential equations have proliferated over the past couple of years, ranging from physics-informed neural networks to neural ordinary/controlled/rough differential equations. Many phenomena in the fields of natural science, social science, finance, engineering, and so forth can be described by differential equations. In addition, deep neural network-based differential equations are now being used for various applications, such as image classification, recommendation, and time-series forecasting. In those applications, implicit governing equations are approximated by neural networks. Therefore, the essential technology is how to extract (or train) those domain-specific implicit governing equations from data. In this tutorial, we will introduce the technological background and successful applications of this line of research.
Bio: Dr. Noseong Park earned his Ph.D. in computer science from the University of Maryland, College Park in 2016. He was an assistant professor at the University of North Carolina at Charlotte from 2016 to 2018, and at George Mason University from 2018 to 2020. He is currently an assistant professor at Yonsei University. His main research interests are data mining, machine learning, and big data analytics.
Abs: Retrieving a document of interest from a large text corpus such as Wikipedia is a long-standing challenge that has been approached in various ways as the AI community moved from rule-based models to feature-based models and deep neural networks. In this tutorial, I will give a compact, historical overview of retrieval methods, including Bag-of-Words, TF-IDF, BM25, and dense passage retrieval. I will also discuss how these methods can be sped up by using an inverted index and by leveraging open-source libraries such as faiss.
Bio: Minjoon Seo is an Assistant Professor at the KAIST Graduate School of AI. He received his PhD in Computer Science from the University of Washington and his BS in Electrical Engineering & Computer Science from the University of California, Berkeley. His research interests are in natural language processing and machine learning, and in particular, how knowledge data can be encoded (e.g. external memory and language model), accessed (e.g. question answering and dialog), and produced (e.g. scientific reasoning). He is the recipient of a Facebook Fellowship and an AI2 Key Scientific Challenges Award. He previously co-organized MRQA 2018, MRQA 2019, MRQA 2021, and RepL4NLP 2020.