Seminar: Topic Modeling: Optimal Estimation and Statistical Inference
Posted: 2022-03-04


Title: Topic Modeling: Optimal Estimation and Statistical Inference

Speaker: Ruijia Wu (吴瑞佳), Ph.D. Candidate, University of Pennsylvania

Host: Xuemin Lin (林学民), Chair Professor, Antai College of Economics and Management, Shanghai Jiao Tong University

Time: Wednesday, March 9, 2022, 9:30-11:00 (via Tencent Meeting)

(SJTU faculty and students who need the meeting ID and password should email dbi@acem.sjtu.edu.cn before 12:00 noon on March 8.)

Abstract

With the development of computer technology and the internet, ever larger amounts of textual data are generated and collected every day. Analyzing this vast, unstructured textual data and extracting meaningful, actionable information from it is a significant challenge. Many machine learning and natural language processing algorithms have been developed for text classification, clustering, and information retrieval. Driven by applications in a wide range of fields, there is a growing need for computationally efficient statistical methods, with theoretical guarantees, for analyzing massive amounts of textual data.

In the first part of the talk, I will present algorithms for unsupervised topic modeling under the probabilistic latent semantic indexing (pLSI) model. We propose novel, computationally fast algorithms for estimation and inference of both the word-topic matrix and the topic-document matrix, and we investigate their theoretical properties. In the second part, I will discuss supervised topic modeling, which jointly considers a collection of documents and their paired side information. We develop a bias-adjusted algorithm to study the regression coefficients in supervised topic modeling under a generalized linear model formulation. I will also introduce an approach to constructing valid confidence intervals. Applications of the proposed methods reveal meaningful latent topic structures in textual data.

About the Speaker

Ruijia Wu is a Ph.D. candidate in the Department of Statistics and Data Science, the Wharton School, University of Pennsylvania. She obtained her B.A. and M.A. in Mathematics from the University of Oxford. Her research interests include statistical machine learning, high-dimensional statistics, and their applications. 

All faculty and students are welcome to attend!