Research

My goal is to make AI safe for people to use. Because modern AIs are so complex and capable, finding ways to make AI safe involves tools from different areas of math, statistics, and engineering.

Focus Areas

1. AI safety

Making text generation models like OpenAI's ChatGPT safe for humans to use is a problem that cannot be solved by algorithms alone. It requires a collective effort to adjust the governance of AI, design content filters, and continuously monitor and probe these models for vulnerabilities.

Large language models, like OpenAI's ChatGPT, are vulnerable to attacks that cause these models to generate harmful content.

The technical part of my research agenda is to design attacks, defenses, and benchmarks to stress test large models that process text, images, and speech. I'm also interested in (1) understanding the mechanisms and data that cause large models to generate harmful content and (2) measuring the vulnerabilities of large models when used in fields like robotics.

SmoothLLM defends large language models against adversarial attacks.
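
For intuition, here is a minimal sketch of the randomized-smoothing idea behind SmoothLLM: query the model on several randomly perturbed copies of a prompt and aggregate the responses by majority vote. This is a simplified illustration, not the published implementation; the `generate` and `is_jailbroken` callables and all hyperparameters are placeholders.

```python
import random

def perturb(prompt: str, q: float = 0.1) -> str:
    """Randomly replace a fraction q of the characters in the prompt."""
    chars = list(prompt)
    if not chars:
        return prompt
    n_swaps = max(1, int(q * len(chars)))
    for i in random.sample(range(len(chars)), n_swaps):
        chars[i] = chr(random.randint(32, 126))  # random printable ASCII
    return "".join(chars)

def smooth_generate(prompt, generate, is_jailbroken, n_copies=5):
    """Query the model on perturbed copies of the prompt and return a response
    that agrees with the majority vote on whether the output was harmful."""
    responses = [generate(perturb(prompt)) for _ in range(n_copies)]
    votes = [is_jailbroken(r) for r in responses]
    majority = sum(votes) > len(votes) / 2
    for response, vote in zip(responses, votes):
        if vote == majority:
            return response
    return responses[0]
```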

Outside of academia, I'm interested in contributing to the ongoing debate about how AI models should be governed. I was recently part of a public policy proposal and open letter, later covered in The Washington Post, calling for more robust oversight of large models.

2. Out-of-distribution generalization

Deep learning has an amazing capacity to recognize and interpret the data it sees during training. But what happens when neural networks interact with data very different from what they've seen before?

An overview of out-of-distribution generalization in medical imaging.

This problem is called out-of-distribution (OOD) generalization. My work in this area, which uses tools from robust optimization theory and generative models, has looked at OOD problems in self-driving, medical imaging, and drug discovery. I am also interested in algorithms that yield provable guarantees on the performance of models when evaluated OOD.
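
To make the robust-optimization angle concrete, the sketch below shows one simple objective of this flavor: instead of minimizing the average loss over pooled training data, minimize the loss on the worst training environment. This is a generic illustration rather than the exact objective from any of my papers; `model`, `optimizer`, and `environments` are assumed to be defined elsewhere.

```python
import torch
import torch.nn.functional as F

def worst_environment_loss(model, environments):
    """Compute the classification loss on each training environment
    (e.g., each hospital's scans) and return the worst one."""
    env_losses = [F.cross_entropy(model(x), y) for x, y in environments]
    return torch.stack(env_losses).max()

# Hypothetical training step:
# loss = worst_environment_loss(model, environments)
# loss.backward()
# optimizer.step()
```

Optimizing the worst case over environments, rather than the average, is one way to hedge against the shifts a model may encounter at test time.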

3. Adversarial Robustness

Much has been written about the tendency of neural networks to make incorrect predictions when their input data is perturbed by a malicious, or even adversarial, user. Despite thousands of papers on the topic, it remains unclear how to make neural networks more robust.

View of adversarial robustness from the dual perspective.

My work on robustness is guided by two mantras:

  • Designing robust defenses requires first identifying strong attacks.
  • Vulnerabilities should be identified, open-sourced, and resolved as fast as possible, but no faster.

I'm interested in designing new attacks and defenses for neural networks in the setting of perturbation-based, norm-bounded adversaries, and in understanding the fundamental, statistical limits of how robust different architectures can be. My research involves duality-inspired defense algorithms and probabilistic perspectives of robustness.
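
As a concrete example of this threat model, the sketch below implements a standard projected gradient descent (PGD) attack for an ℓ∞-bounded adversary: repeatedly ascend the loss and project back into a small ball around the clean input. This is textbook code rather than a method from my papers; the model, data batch, and hyperparameters are placeholders, and inputs are assumed to live in [0, 1].

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Find an adversarial example within an ell-infinity ball of radius eps."""
    x = x.detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball around x.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```

Evaluating a defense then amounts to measuring accuracy on `pgd_attack(model, x, y)` rather than on the clean inputs.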

Publications

JailbreakBench

Jailbreaking LLM-Controlled Robots

Alexander Robey, Zachary Ravichandran, Vijay Kumar, Hamed Hassani, George J. Pappas

JailbreakBench

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Patrick Chao*, Edoardo Debenedetti*, Alexander Robey*, Maksym Andriushchenko*, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

Text-to-image

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter

Safe Harbor

A Safe Harbor for AI Evaluation and Red Teaming

Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

Semantic smoothing

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

Jiabao Ji*, Bairu Hou*, Alexander Robey*, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang

Model-based verification

Data-Driven Modeling and Verification of Perception-Based Autonomous Systems

Thomas Waite, Alexander Robey, Hamed Hassani, George J. Pappas, Radoslav Ivanov

PAIR

Jailbreaking Black Box Large Language Models in Twenty Queries

Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

SmoothLLM

SmoothLLM: Defending Large Language Models against Jailbreaking Attacks

Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas

Non-zero-sum AT

Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Alexander Robey*, Fabian Latorre*, George J. Pappas, Hamed Hassani, Volkan Cevher

ROCBF

Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations

Lars Lindemann, Alexander Robey, Lejun Jiang, Satyajeet Das, Stephen Tu, Nikolai Matni

Distribution shift verification

Toward Certified Robustness Against Real-World Distribution Shifts

Haoze Wu*, Teruhiro Tagomori*, Alexander Robey*, Fengjun Yang*, Nikolai Matni, George J. Pappas, Hamed Hassani, Corina Pasareanu, Clark Barrett

QRM

Probable Domain Generalization via Quantile Risk Minimization

Cian Eastwood*, Alexander Robey*, Shashank Singh, Julius von Kügelgen, Hamed Hassani, George J. Pappas, Bernhard Schölkopf

Stable imitation learning

On the Sample Complexity of Stability Constrained Imitation Learning

Stephen Tu, Alexander Robey, Tingnan Zhang, Nikolai Matni

Chordally sparse LipSDP

Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural Networks

Anton Xue, Lars Lindemann, Alexander Robey, Hamed Hassani, George J. Pappas, Rajeev Alur

Long-tailed robustness

Do Deep Networks Transfer Invariances Across Classes?

Allan Zhou*, Fahim Tajwar*, Alexander Robey, Tom Knowles, George J. Pappas, Hamed Hassani, Chelsea Finn

Probabilistic robustness

Probabilistically Robust Learning: Balancing Average- and Worst-case Performance

Alexander Robey, Luiz F. O. Chamon, George J. Pappas, Hamed Hassani

Semi-infinite robustness

Adversarial Robustness with Semi-Infinite Constrained Learning

Alexander Robey*, Luiz Chamon*, George J. Pappas, Hamed Hassani, Alejandro Ribeiro

MBDG

Model-Based Domain Generalization

Alexander Robey, George J. Pappas, Hamed Hassani

CDCG

Optimal Algorithms for Submodular Maximization With Distributed Constraints

Alexander Robey, Arman Adibi, Brent Schlotfeldt, Hamed Hassani, George J. Pappas

RHCBF

Learning Robust Hybrid Control Barrier Functions for Uncertain Systems

Alexander Robey*, Lars Lindemann*, Stephen Tu, Nikolai Matni

HCBF

Learning Hybrid Control Barrier Functions from Data

Lars Lindemann, Haimin Hu, Alexander Robey, Hanwen Zhang, Dimos V. Dimarogonas, Stephen Tu, Nikolai Matni

CBF

Learning Control Barrier Functions from Expert Demonstrations

Alexander Robey*, Haimin Hu*, Lars Lindemann, Hanwen Zhang, Dimos V. Dimarogonas, Stephen Tu, Nikolai Matni

Adversarial trade-off

Provable Tradeoffs in Adversarially Robust Classification

Edgar Dobriban, Hamed Hassani, David Hong, Alexander Robey

MBRDL

Model-Based Robust Deep Learning: Generalizing to Natural, Out-of-Distribution Data

Alexander Robey, Hamed Hassani, George J. Pappas

LipSDP

Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks

Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, George J. Pappas

Fourier Ptychography

Optimal Physical Preprocessing for Example-Based Super-Resolution

Alexander Robey, Vidya Ganapati