AI/ML STIG Lecture Series

Artificial Intelligence and Machine Learning Science and Technology Interest Group (AI/ML STIG)

Module 7: Reinforcement Learning

Location

Virtual

Dates

11 May 2026
4:00pm ET

Community

AI/ML STIG

Type

Seminar

Reinforcement Learning Fundamentals

Speaker

Carol Cuesta-Lazaro, IAS/Flatiron

An introduction to reinforcement learning from its classical foundations to its central role in modern large language models. Trace the arc from TD-Gammon and AlphaGo through DQN and multi-agent emergent behavior to RLHF and RLVR, and learn how policy gradients turn a non-differentiable reward signal into a usable training objective.

Topics Covered

  • A brief history of RL: TD-Gammon, AlphaGo, DQN, RLHF, and RLVR
  • The agent-environment loop: states, actions, policies, and rewards
  • How RL differs from supervised learning: shifting data, exploration vs. exploitation, evaluative and delayed rewards
  • The non-differentiability of the learning problem and policy gradients
  • The REINFORCE estimator, baselines, and reward-to-go for variance reduction
  • Reinforcement Learning from Human Feedback (RLHF) for instruction-tuned models
  • Reinforcement Learning from Verifiable Rewards (RLVR) and reasoning in LLMs
  • Open questions: is RL teaching new capabilities or sharpening existing ones?
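
As a rough illustration of the policy-gradient topics listed above (not material taken from the lecture), the sketch below applies the REINFORCE estimator with a running-average baseline to a toy two-armed bandit. The function names, the bandit setup, and every hyperparameter are assumptions chosen for illustration. Because each episode is a single step, reward-to-go reduces to the immediate reward here.

```python
import math
import random


def softmax(logits):
    """Convert action preferences into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train_bandit(arm_means, steps=5000, lr=0.05, noise=0.5, seed=0):
    """REINFORCE with a running-average baseline on a toy bandit.

    All parameters are hypothetical illustration values, not settings
    from the lecture.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(arm_means)  # policy parameters (one logit per arm)
    baseline = 0.0                   # running estimate of the average reward
    for t in range(1, steps + 1):
        probs = softmax(logits)
        # Sample an action from the current policy; sampling provides exploration.
        action = rng.choices(range(len(probs)), weights=probs)[0]
        # The reward comes from the environment, so it cannot be differentiated.
        reward = rng.gauss(arm_means[action], noise)
        # Subtracting the baseline lowers variance without biasing the gradient.
        advantage = reward - baseline
        # For a softmax policy, the gradient of log pi(action) with respect to
        # logit k is (1 if k == action else 0) - pi(k).
        for k in range(len(logits)):
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_pi
        # Update the baseline only after it has been used for this step.
        baseline += (reward - baseline) / t
    return softmax(logits)


final_probs = train_bandit([0.0, 1.0])
print(final_probs)
```

The only gradient taken is of the log-probability of the sampled action, which is what lets a non-differentiable reward act as a training signal. Under these assumed settings the policy should come to favor the second arm, whose mean reward is higher.
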
