Prof. Yong Jae Lee (University of Wisconsin-Madison)
Title: Beyond Understanding: Toward Controllable and Agentic Multimodal Models
Abs:
The field of AI has been undergoing a transformative shift with the emergence of generalist models capable of performing a wide range of understanding and generation tasks. Trained on massive, internet-scale datasets---often unlabeled or weakly labeled---many of these models are multimodal, seamlessly integrating vision, language, audio, and action. In this talk, I will first present our work on LLaVA, a family of intelligent assistants that interpret the visual world and communicate naturally in language. I will highlight strategies to make these models more controllable, efficient, and agentic, enabling them to not only describe but also act upon the world around them. I will conclude with reflections on current limitations and opportunities for advancing toward more grounded, interactive AI systems.
Bio:
Yong Jae Lee is a Professor in the Department of Computer Sciences at the University of Wisconsin-Madison, and a Research Scientist at Adobe Research. His core research interests are in computer vision and machine learning, with a focus on creating robust AI systems that can understand our multimodal world with minimal human supervision. Before joining UW-Madison in 2021, he spent one year as an AI Visiting Faculty at Cruise, and before that, six years as an Assistant and then Associate Professor at UC Davis. He received his PhD from the University of Texas at Austin in 2012 advised by Kristen Grauman, and was a postdoc at Carnegie Mellon University (2012-2013) and UC Berkeley (2013-2014) advised by Alyosha Efros. He is a recipient of the Army Research Office Young Investigator Program Award, NSF CAREER Award, industry awards from Amazon, Adobe, Samsung, and Sony, UC Davis College of Engineering Outstanding Junior Faculty Award, UW-Madison SACM Student Choice Professor of the Year Award, Susan Beth Horwitz Professorship, and H. I. Romnes Faculty Fellowship. He and his collaborators received the Most Innovative Award at the COCO Object Detection Challenge ICCV 2019 and the Best Paper Award at BMVC 2020.
Prof. Kyunghyun Cho (New York University)
Title: Reality Checks
Abs:
Despite its amazing success, leaderboard chasing has become something researchers dread and mock. When implemented properly and executed faithfully, leaderboard chasing can lead to fast and easily reproducible progress in science, as is evident from the remarkable progress we have seen with machine learning, or more broadly artificial intelligence, in recent decades. That does not mean, however, that it is easy to implement and execute leaderboard chasing properly. In this talk, I will go over four case studies demonstrating issues that ultimately prevent leaderboard chasing from being a valid scientific approach. The first case study concerns the lack of proper hyperparameter tuning in continual learning, the second the lack of consensus on evaluation metrics in machine unlearning, the third the challenges of properly evaluating the evaluation metrics themselves in free-form text generation, and the final one wishful thinking. By going over these cases, I hope we can collectively acknowledge some of our own fallacies, think about the underlying causes behind them, and come up with better ways to approach artificial intelligence research.
- Education
- Career Highlights