
Shandong Artificial Intelligence Lecture Series (Session 12)


Speakers: Jingdong Wang (王井东), Wu Liu (刘武), Wenjing Qiao (乔文静)
Time: 2021-08-27 09:00:00
Venue: Online


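Agenda:

9:00

Opening remarks

9:05-9:50

Jingdong Wang, Microsoft Research Asia

Talk title: Transformer Encoder and Decoder for Visual Recognition

9:50-10:35

Wu Liu, JD AI Research

Talk title: Machine Vision in the Intelligent Supply Chain

10:35-11:20

Wenjing Qiao, Shandong Boang Information Technology Co., Ltd. (山东博昂信息科技有限公司)

Talk title: Applications of AI Algorithms in Natural Scene Text Detection and Recognition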


Talk title: Transformer Encoder and Decoder for Visual Recognition

Abstract

Transformer has two major applications in visual recognition: as an encoder (backbone) and as a decoder. In this talk, I will discuss the relation between dynamic depth-wise convolution and local attention in the Transformer encoder. Then, I will present two applications of the Transformer decoder: OCRNet for semantic segmentation and Conditional DETR for object detection. Finally, I will illustrate that the attention mechanism in Transformer is more intuitive and explainable than convolution for visual recognition.

Speaker bio

Jingdong Wang is a Senior Principal Research Manager with the Visual Computing Group at Microsoft Research Asia, Beijing, China. He received the B.Eng. and M.Eng. degrees from the Department of Automation at Tsinghua University in 2001 and 2004, respectively, and the PhD degree from the Department of Computer Science and Engineering, the Hong Kong University of Science and Technology, Hong Kong, in 2007. His areas of interest include neural network design, human pose estimation, large-scale indexing, and person re-identification. He is/was an Associate Editor of the IEEE TPAMI, the IEEE TMM, the IEEE TCSVT and International Journal of Computer Vision, and is an area chair of several leading Computer Vision and AI conferences, such as CVPR, ICCV, ECCV, ACM MM, IJCAI, and AAAI. He was elected as an IAPR Fellow, an ACM Distinguished Member, and an Industrial Distinguished Lecturer Program (iDLP) speaker of the IEEE Circuits and Systems Society.


His representative works include deep high-resolution network (HRNet), interleaved group convolutions, discriminative regional feature integration (DRFI) for supervised saliency detection, neighborhood graph search (NGS, SPTAG) for large-scale similarity search, and composite quantization for compact coding. He has shipped a number of technologies to Microsoft products, including Bing search, Bing Ads, Cognitive service, and XiaoIce Chatbot. The NGS algorithm developed in his group serves as a basic building block in many Microsoft products. In the Bing image search engine, the key color filter function is based on the salient object algorithm developed in his group. He pioneered the development of a commercial color-sketch image search system. More information about Dr. Jingdong Wang can be found at https://jingdongwang2017.github.io/.


Talk title: Machine Vision in the Intelligent Supply Chain

Abstract

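With the rapid development of the global economy, efficient and collaborative supply chains have become indispensable to global marketization. The supply chain has moved from its traditional stage into a new one, the intelligent supply chain, which will be an important lever for industrial upgrading, especially for the transformation of the real economy. Drawing on value-chain data from across the industry and its accumulated AI technology, JD has worked with a number of universities, research institutes, and companies to build a national-level open innovation platform for intelligent supply chain AI, which is open source and offers open services to external users. This talk will present the latest research progress on the intelligent supply chain platform, with a focus on how machine vision helps upgrade the supply chain, cutting costs and improving efficiency across the whole chain.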

Speaker bio

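Wu Liu is a Senior Researcher at JD AI Research. He received his PhD from the Institute of Computing Technology, Chinese Academy of Sciences, in 2015. His research focuses on multimedia and computer vision, and he has published more than 70 papers in top international conferences and journals. He received the IEEE Trans. on Multimedia 2019 Best Paper Award, the IEEE Multimedia Magazine 2018 Best Paper Award, the IEEE ICME 2016 Best Student Paper Award, and the Chinese Academy of Sciences Outstanding Doctoral Dissertation Award. At JD, he led his team to first place in two tasks of the IEEE CVPR 2018 global human pose estimation challenge, and built deployed products including an intelligent checkout counter and a smart campus ReID system. He has served as Program Committee Chair of ACM MM Asia 2021 and as an Area Chair for AAAI 2021 and ACM MM 2019 to 2021, among others.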


Talk title: Applications of AI Algorithms in Natural Scene Text Detection and Recognition

Abstract

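This talk covers the broad application of AI algorithms to natural scene text recognition. It includes the use of traditional machine learning methods for natural scene text detection and recognition and the problems they face; research on text recognition techniques based on deep learning; and the optimization, acceleration, and deployment of deep learning algorithms.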

Speaker bio

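Wenjing Qiao holds a master's degree from Northwestern Polytechnical University, with research interests in artificial intelligence, machine learning, and pattern recognition, and is currently an algorithm engineer at Shandong Boang Information Technology Co., Ltd. (山东博昂信息科技有限公司). Qiao's work centers on developing machine-learning-based algorithms for image processing and video analysis, including OCR character recognition, vehicle attribute analysis, large-scene video structuring, semantic segmentation, and human behavior analysis.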

