Focus Areas

1. AI safety

Making text generation models like OpenAI's ChatGPT safe for humans to use is a problem that cannot be solved by algorithms alone. It requires a collective effort to adjust the governance of AI, design content filters, and continuously monitor and probe models for vulnerabilities.

Figure: Large language models, like OpenAI's ChatGPT, are vulnerable to attacks that cause these models to generate harmful content.

The technical part of my research agenda involves designing attacks, defenses, and benchmarks to stress test large models that process text, images, and speech. For example, I proposed SmoothLLM, one of the first defenses against jailbreaking attacks.

Figure: SmoothLLM defends large language models against adversarial attacks.
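
At a high level, SmoothLLM perturbs several copies of an incoming prompt, queries the model on each copy, and aggregates the responses by majority vote. The snippet below is a minimal sketch of that randomized-perturbation idea; `query_llm` and `is_jailbroken` are hypothetical placeholders for the target model and a harmfulness check, not part of any released implementation.

```python
import random
import string

def perturb(prompt: str, q: float = 0.1) -> str:
    """Randomly replace a fraction q of the characters in the prompt."""
    chars = list(prompt)
    n_swaps = int(len(chars) * q)
    for idx in random.sample(range(len(chars)), n_swaps):
        chars[idx] = random.choice(string.printable)
    return "".join(chars)

def smooth_defense(prompt: str, query_llm, is_jailbroken,
                   n_copies: int = 10, q: float = 0.1) -> str:
    """Query the model on several perturbed copies of the prompt and
    return a response consistent with the majority vote.

    query_llm(prompt) -> str and is_jailbroken(response) -> bool are
    hypothetical placeholders for the target model and a harmfulness check.
    """
    responses = [query_llm(perturb(prompt, q)) for _ in range(n_copies)]
    votes = [is_jailbroken(r) for r in responses]
    majority = sum(votes) > len(votes) / 2
    # Return any response whose label agrees with the majority vote.
    for response, vote in zip(responses, votes):
        if vote == majority:
            return response
    return responses[0]
```

The intuition is that character-level jailbreak strings are brittle: a small random perturbation usually breaks the attack, so the majority of perturbed copies produce safe responses.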

I'm also interested in (1) understanding the mechanisms and data that cause large models to generate harmful content and (2) measuring the vulnerabilities of large models when used in fields like robotics.

Figure: We show that LLM-controlled robots can be jailbroken with 100% success rates.

Outside of academia, I'm interested in contributing to the ongoing debate about how AI models should be governed. I recently contributed to a public policy proposal and an open letter, later covered in The Washington Post, calling for more robust oversight of large models.

2. Out-of-distribution generalization

Deep learning has an amazing capacity to recognize and interpret the data it sees during training. But what happens when neural networks interact with data very different from what they've seen before?

Figure: An overview of out-of-distribution generalization in medical imaging.

This problem is called out-of-distribution (OOD) generalization. My work in this area, which uses tools from robust optimization theory and generative models, has looked at OOD problems in self-driving, medical imaging, and drug discovery. I am also interested in algorithms that yield provable guarantees on the performance of models when evaluated OOD.
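
To make the robust-optimization angle concrete, here is a hedged sketch of one standard formulation: minimizing the worst-case loss across a set of training environments (for example, different hospitals or imaging devices). It is an illustrative PyTorch objective, not the specific algorithm from any one of my papers.

```python
import torch

def worst_case_env_loss(model, batches_by_env):
    """Worst-case empirical risk over a dictionary of training environments.

    batches_by_env maps an environment name to an (inputs, labels) batch.
    Minimizing the max over environments is one robust-optimization proxy
    for out-of-distribution generalization.
    """
    env_losses = [
        torch.nn.functional.cross_entropy(model(x), y)
        for x, y in batches_by_env.values()
    ]
    return torch.stack(env_losses).max()
```

Because the objective is a hard max, each gradient step backpropagates only through the currently worst environment; smoother variants reweight environments rather than taking the max.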

3. Adversarial robustness

Much has been written about the tendency of neural networks to make incorrect predictions when their input data is perturbed by a malicious, or even adversarial, user. Despite thousands of papers on the topic, it remains unclear how to make neural networks more robust.

Figure: View of adversarial robustness from the dual perspective.

My work on robustness is guided by two mantras:

  • Designing robust defenses requires first identifying strong attacks.
  • Vulnerabilities should be identified, open-sourced, and resolved as fast as possible, but no faster.

I'm interested in designing new attacks and defenses for neural networks in the setting of perturbation-based, norm-bounded adversaries, and in understanding the fundamental, statistical limits of how robust different architectures can be.
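
To make the threat model concrete, the sketch below implements projected gradient descent (PGD), a standard norm-bounded attack. It is a minimal PyTorch illustration under an l_inf budget; the hyperparameters are common defaults for image classifiers, not values tied to my papers.

```python
import torch

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Projected gradient descent under an l_inf-norm-bounded perturbation.

    Returns x_adv with ||x_adv - x||_inf <= eps and pixel values in [0, 1].
    """
    x, y = x.detach(), y.detach()
    # Start from a random point inside the perturbation ball.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Take an ascent step on the loss, then project back onto the eps-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

In practice one would call `pgd_attack(model.eval(), images, labels)` on a batch and measure how often the model's predictions flip, which is exactly the kind of strong attack the first mantra asks for before designing a defense.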

Figure: Interpolation between average- and worst-case robustness.

My research involves duality-inspired defense algorithms and probabilistic perspectives that interpolate between average- and worst-case robustness.
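
One illustrative way to realize this interpolation is to sample random perturbations and penalize only the hardest fraction of them, a CVaR-style objective. The sketch below is a hedged example of that idea in PyTorch; the quantile parameter `rho`, which controls where the loss sits between the average and worst case, is an assumption made for illustration rather than a quantity taken from my papers.

```python
import torch

def probabilistic_robust_loss(model, x, y, eps=8 / 255, n_samples=20, rho=0.1):
    """CVaR-style loss over random l_inf perturbations of x.

    rho close to 1 recovers the average case over random noise; rho close
    to 0 approaches the worst case over the sampled perturbations.
    """
    losses = []
    for _ in range(n_samples):
        delta = torch.empty_like(x).uniform_(-eps, eps)
        logits = model((x + delta).clamp(0, 1))
        losses.append(torch.nn.functional.cross_entropy(logits, y))
    losses = torch.stack(losses)
    k = max(1, int(rho * n_samples))
    # Average only the hardest rho-fraction of the sampled perturbations.
    return torch.topk(losses, k).values.mean()
```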