Seminar: Evaluating Multimodal AI: A Dual Perspective on Image Understanding and Generation Capabilities (Posted: 2025-05-08)

Speaker: Zhenhui Jiang, Professor, University of Hong Kong

Host: Xu Haifeng (徐海峰), Associate Professor, Antai College of Economics and Management, Shanghai Jiao Tong University

Time: Monday, May 26, 2025, 10:30-12:00

Venue: Room A511, Antai Building, Xuhui Campus, Shanghai Jiao Tong University

 

Abstract:

Multimodal AI models have made remarkable progress in visual understanding and image generation. This project develops a theoretical framework for evaluating the image-related capabilities of state-of-the-art AI models and empirically assesses their performance. We evaluate image understanding across three core dimensions (visual perception and recognition, visual reasoning and analysis, and visual aesthetics and creativity) while also incorporating safety and responsibility. Using a carefully constructed test suite comprising both curated and newly developed questions, we assessed 20 leading models and found that GPT-4o and Claude ranked highest overall, while several Chinese models, including Tongyi Qianwen-VL and Step-1V, performed well, particularly when safety metrics were considered.

 

For image generation, we focus on two core tasks: new image creation and image revision. Drawing on multi-dimensional test sets, we evaluated 23 models, including 15 text-to-image models and 7 multimodal large language models (LLMs). Results show that ByteDance's Dreamina and Doubao, along with Baidu's Ernie Bot, led in both content quality for new image generation and effectiveness in image revision. Notably, multimodal LLMs outperformed text-to-image models overall. Taken together, our evaluations offer a rigorous, comparative perspective on how today's leading AI models perform in image-related tasks, providing valuable benchmarks for future research and practical deployment.

 

Speaker Bio:

Zhenhui (Jack) Jiang is a Professor of Innovation and Information Management and the Padma and Hari Harilela Professor in Strategic Information Management at HKU Business School. He previously served as the Area Head of Innovation and Information Management at HKU. Prior to joining the University of Hong Kong, he was a full professor of Information Systems and Analytics at the School of Computing, National University of Singapore.

 

Prof. Jiang's research explores the broad economic and behavioral impacts of cutting-edge information technologies (IT) as well as the effective user interface design of such technologies. His work addresses significant issues in AI, human-computer interaction, digital innovation, e-commerce, information privacy, social media, and healthcare. Prof. Jiang currently serves as a Senior Editor for MIS Quarterly and has previously served on the editorial boards of many leading Information Systems journals, such as Journal of AIS (Senior Editor), Information Systems Research (Associate Editor), MIS Quarterly (Associate Editor), and IEEE Transactions on Engineering Management. His research has been published in premier business journals, such as MIS Quarterly, Information Systems Research, Management Science, and Journal of MIS, as well as in top-tier Computer Science conferences, such as CHI.

 

All faculty and students are welcome to attend!