Lecture: Basics of Pre-Training A Large Language Model (posted 2026-07-03)


Title: Basics of Pre-Training A Large Language Model

Speaker: Zhengyuan Zhou, Associate Professor, New York University

Host: Zhiyu Zeng, Assistant Professor, Antai College of Economics and Management, Shanghai Jiao Tong University

Time: Friday, July 10, 2026, 14:00-15:30

Venue: Room A511, Pao Siu Loong Library, Antai College of Economics and Management, Shanghai Jiao Tong University

Abstract:

This talk provides a simple tutorial on how a large language model (LLM) is pre-trained. The focus is on the generative pre-trained transformer (GPT), which generates text autoregressively through next-token prediction. Starting with the historical background that preceded transformers, the talk walks through the main steps of the LLM pre-training pipeline as well as the components of the transformer, most notably the attention mechanism. The main goal is to give the audience a basic understanding of how pre-training works, so that they can follow the now-common conversations and literature on the topic without feeling lost.

Speaker Bio:

Zhengyuan Zhou is currently an associate professor in the Department of Technology, Operations and Statistics at New York University's Stern School of Business. Before joining NYU Stern, Professor Zhou spent 2019-2020 as a Goldstine Research Fellow at IBM Research. He received a BA in Mathematics and a BS in Electrical Engineering and Computer Sciences, both from UC Berkeley, and then a PhD in Electrical Engineering from Stanford University in 2019. His research lies at the intersection of machine learning, stochastic optimization, and game theory, and focuses on using tools from these fields to develop methodological frameworks for data-driven decision-making problems.


All faculty and students are welcome to attend!