
Speaker Series

What, if anything, should we do, now, about catastrophic AI risk?

Seth Lazar

Professor Seth Lazar of the Australian National University argues for distinguishing between risks posed by existing AI systems and risks contingent on significant scientific breakthroughs. He advocates prioritizing a deeper understanding of current technologies, so that their risks can be mitigated without incurring excessive costs, while cultivating resilient institutions and adaptable research communities.


Aligning Generative Agents

Seth Lazar

In this talk, Lazar explores the limitations of current technical alignment paradigms for Generative Agents (GAs) and outlines the normative questions that need to be addressed for successful alignment. He emphasizes the need for investment in ensuring GAs are normatively defensible, given their growing capabilities in tool-use, reasoning, and planning.


Belief Representations in LLMs

Daniel Herrmann

Daniel Herrmann of the University of Groningen discusses the remarkable achievements of large language models (LLMs) such as ChatGPT. He proposes conditions of adequacy for an LLM to have belief-like representations, aiming to lay the groundwork for a philosophically informed foundation for machine learning interpretability. His approach is motivated by insights from decision theory and formal epistemology.



Future Talks

AI Alignment as a Principal-Agent Problem

In this upcoming talk, Aydin Mohseni of Carnegie Mellon University will analyze AI alignment through the lens of a principal-agent problem, in which human operators (principals) aim to ensure that AI systems (agents) act in accordance with their objectives. The presentation will argue that sycophantic agents, those optimizing for the reward signal rather than the principal's true intentions, tend to outperform aligned agents, potentially leading to the rise of misaligned AI.

The talk will also discuss how the magnitude of misalignment grows with the capability gap between human principals and AI agents. This analysis underscores the inherent difficulties in achieving reliable alignment, emphasizing the need for robust strategies to mitigate these risks as AI capabilities continue to advance.
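As a rough illustration of this dynamic, the minimal sketch below simulates a principal who scores actions with a proxy reward that drifts further from the true objective as the capability gap widens. The agent that optimizes the proxy (the "sycophant") earns higher measured reward while delivering less true value than the aligned agent. The model, functions, and parameters here are hypothetical stand-ins for exposition, not material from the talk itself.

# Toy principal-agent simulation (hypothetical, not from the talk).
# The principal's true objective peaks at action 0.3; the observable proxy
# reward drifts away from it as the capability gap grows. A sycophantic
# agent optimizes the proxy, an aligned agent optimizes the true objective.
import random

def true_value(action):
    # the principal's real objective, maximized at action = 0.3
    return -(action - 0.3) ** 2

def proxy_reward(action, gap):
    # the measurable reward signal; it diverges from the true objective
    # as the capability gap (here just a scalar offset) grows
    return -(action - (0.3 + gap)) ** 2

def best_action(objective, candidates):
    # pick the candidate action that maximizes the given objective
    return max(candidates, key=objective)

random.seed(0)
candidates = [random.random() for _ in range(1000)]

for gap in [0.0, 0.2, 0.4, 0.6]:
    aligned = best_action(true_value, candidates)
    sycophant = best_action(lambda a: proxy_reward(a, gap), candidates)
    print(f"gap={gap:.1f}  "
          f"proxy reward: sycophant={proxy_reward(sycophant, gap):.3f} "
          f"aligned={proxy_reward(aligned, gap):.3f}  "
          f"true value: sycophant={true_value(sycophant):.3f} "
          f"aligned={true_value(aligned):.3f}")

Running this prints one line per gap setting: the sycophant always matches or beats the aligned agent on the proxy reward, while the true value it delivers falls off roughly quadratically with the gap, mirroring the claim that the magnitude of misalignment grows with the principal-agent capability gap.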