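Opening remarks |
9:05-9:50 |
Jingdong Wang (王井东), Microsoft Research Asia. Talk title: Transformer Encoder and Decoder for Visual Recognition |
9:50-10:35 |
Wu Liu (刘武), JD AI Research. Talk title: Machine Vision in the Intelligent Supply Chain |
10:35-11:20 |
Wenjing Qiao (乔文静), Shandong Boang Information Technology Co., Ltd. (山东博昂信息科技有限公司). Talk title: Applications of AI Algorithms in Natural Scene Text Detection and Recognition |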
Talk title: Transformer Encoder and Decoder for Visual Recognition
Transformers have two major applications in visual recognition: as an encoder (backbone) and as a decoder. In this talk, I will discuss the relation between dynamic depth-wise convolution and local attention in the Transformer encoder. Then, I will present two applications of the Transformer decoder: OCRNet for semantic segmentation and Conditional DETR for object detection. Finally, I will illustrate that the attention mechanism in Transformers is more intuitive and explainable than convolution for visual recognition.
Jingdong Wang is a Senior Principal Research Manager with the Visual Computing Group at Microsoft Research Asia, Beijing, China. He received the B.Eng. and M.Eng. degrees from the Department of Automation at Tsinghua University in 2001 and 2004, respectively, and the PhD degree from the Department of Computer Science and Engineering, the Hong Kong University of Science and Technology, Hong Kong, in 2007. His areas of interest include neural network design, human pose estimation, large-scale indexing, and person re-identification. He is/was an Associate Editor of the IEEE TPAMI, the IEEE TMM, the IEEE TCSVT, and the International Journal of Computer Vision, and is an area chair of several leading computer vision and AI conferences, such as CVPR, ICCV, ECCV, ACM MM, IJCAI, and AAAI. He was elected an IAPR Fellow, an ACM Distinguished Member, and an Industrial Distinguished Lecturer Program (iDLP) speaker of the IEEE Circuits and Systems Society.
His representative works include the deep high-resolution network (HRNet), interleaved group convolutions, discriminative regional feature integration (DRFI) for supervised saliency detection, neighborhood graph search (NGS, SPTAG) for large-scale similarity search, and composite quantization for compact coding. He has shipped a number of technologies to Microsoft products, including Bing search, Bing Ads, Cognitive service, and XiaoIce Chatbot. The NGS algorithm developed in his group serves as a basic building block in many Microsoft products. In the Bing image search engine, the key color filter function is based on the salient object algorithm developed in his group. He pioneered the development of a commercial color-sketch image search system. More information about Dr. Jingdong Wang can be found at https://jingdongwang2017.github.io/.
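Wu Liu is a Senior Researcher at JD AI Research. He received his PhD from the Institute of Computing Technology, Chinese Academy of Sciences, in 2015. His research interests are multimedia and computer vision, and he has published more than 70 papers in leading international conferences and journals. He received the IEEE Transactions on Multimedia 2019 Best Paper Award, the IEEE Multimedia Magazine 2018 Best Paper Award, the IEEE ICME 2016 Best Student Paper Award, and the Chinese Academy of Sciences Outstanding Doctoral Dissertation Award. At JD, he led teams that won first place in two tasks of the IEEE CVPR 2018 human pose estimation challenge, and built deployed products such as an intelligent checkout counter and a smart-campus ReID system. He has served as Program Committee Chair of ACM MM Asia 2021 and as an Area Chair for AAAI 2021 and ACM MM 2019-2021, among others.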