Abs:
This lecture aims to build an understanding of the basic concepts of speech recognition. We define the speech recognition problem from a sequence-to-sequence (Seq2Seq) perspective, survey the major deep-learning-based speech recognition techniques with a focus on DNN-WFST, CTC, RNN-T, Seq2Seq, and Transformer models, and study language models centered on n-grams and decoding networks centered on WFSTs.
Intended audience: Those encountering the engineering problem formulation and algorithms of speech recognition for the first time. (The lecture assumes familiarity with MLPs and back-propagation; prior exposure to the basic concepts of RNNs, Seq2Seq, attention, and Transformers will help in following the material.)
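As a taste of one technique named in the abstract, the following is a minimal sketch of CTC training using PyTorch's built-in nn.CTCLoss. The toy encoder, vocabulary size, and tensor shapes are illustrative assumptions, not the lecture's actual setup.

```python
# Minimal CTC training sketch. The encoder, vocabulary size, and shapes
# below are illustrative assumptions only.
import torch
import torch.nn as nn

vocab_size = 30          # assumed: characters plus the CTC blank (index 0)
feat_dim, hidden = 80, 256

# Toy acoustic encoder: frame features -> per-frame label posteriors.
encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
classifier = nn.Linear(hidden, vocab_size)
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

batch, frames, target_len = 4, 100, 20
feats = torch.randn(batch, frames, feat_dim)            # dummy features
targets = torch.randint(1, vocab_size, (batch, target_len))
input_lengths = torch.full((batch,), frames, dtype=torch.long)
target_lengths = torch.full((batch,), target_len, dtype=torch.long)

hidden_states, _ = encoder(feats)
log_probs = classifier(hidden_states).log_softmax(-1)   # (B, T, V)

# nn.CTCLoss expects (T, B, V). CTC marginalizes over all blank-augmented
# alignments between the frame sequence and the target label sequence.
loss = ctc_loss(log_probs.transpose(0, 1), targets,
                input_lengths, target_lengths)
loss.backward()
```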
Bio:
Supervised learning with deep neural networks has brought phenomenal advances to many fields of research, but the performance of such systems relies heavily on the quality and quantity of annotated databases tailored to the particular application. It can be prohibitively difficult to manually collect and annotate databases for every task, and a plethora of data on the internet goes unused in machine learning for lack of such annotations. Self-supervised learning allows a model to learn representations using properties inherent in the data itself, such as natural co-occurrence. In this talk, I will introduce recent works on self-supervised learning of audio and speech representations. Recent work demonstrates that phonetic and semantic representations of audio and speech can be learnt from unlabelled audio and video. The learnt representations can be used for downstream tasks such as automatic speech recognition, speaker recognition, face recognition, and lip reading. Other noteworthy applications include localizing sound sources in images and separating simultaneous speech from video.
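A common way to exploit the natural co-occurrence mentioned above is a cross-modal contrastive (InfoNCE-style) objective: audio and video clips that occur together are pulled close in embedding space, while non-matching pairs are pushed apart. The sketch below illustrates this idea only; the embedding dimensions and dummy encoder outputs are hypothetical placeholders, not any specific paper's architecture.

```python
# InfoNCE-style cross-modal contrastive loss sketch. Encoders are
# omitted; the dummy embeddings stand in for their outputs.
import torch
import torch.nn.functional as F

def infonce(audio_emb, video_emb, temperature=0.07):
    """audio_emb, video_emb: (B, D) embeddings of co-occurring clips."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature      # (B, B) similarity matrix
    labels = torch.arange(a.size(0))      # matching pairs on the diagonal
    # Symmetric loss: retrieve video from audio and audio from video.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

audio_emb = torch.randn(8, 128, requires_grad=True)  # dummy audio features
video_emb = torch.randn(8, 128, requires_grad=True)  # dummy video features
loss = infonce(audio_emb, video_emb)
loss.backward()
```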
Representation learning is a fundamental task in machine learning, as the success of machine learning relies on the quality of representation. Over the last decade, visual representation learning has usually been done by supervised learning, taking deep convolutional neural network (CNN) features trained on a labeled large-scale image dataset. However, recent works in self-supervised learning (SSL) suggest that representation learning without labels is effective, often outperforming its supervised counterpart. In this talk, I will first summarize recent advances in self-supervised representation learning for visual recognition. Then, I will describe their limitations and discuss several ways to improve them.