Research

My goal is to make AI safe for people to use. Because modern AIs are so complex and capable, finding ways to make AI safe involves tools from different areas of math, statistics, and engineering.

Focus Areas

1. AI safety

Making text generation models like OpenAI's ChatGPT safe for humans to use is a problem that cannot be solved by algorithms alone. It requires a collective effort to adjust the governance of AI, design content filters, and continuously monitor and probe vulnerabilities.

Large language models, like OpenAI's ChatGPT, are vulnerable to attacks that cause these models to generate harmful content.

The technical part of my research agenda is to design attacks, defenses, and benchmarks to stress test large models that process text, images, and speech. I'm also interested in (1) understanding the mechanisms and data that cause large models to generate harmful content and (2) measuring the vulnerabilities of large models when used in fields like robotics.

SmoothLLM defends large language models against adversarial attacks.
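To make the defense concrete, here is a minimal sketch of the randomized-smoothing idea behind SmoothLLM: perturb several copies of an incoming prompt, query the model on each, and aggregate the responses by majority vote. The `generate` and `is_jailbroken` functions below are hypothetical placeholders for an LLM query and an attack-success judge; they are not part of any specific library.

```python
# Minimal sketch of a SmoothLLM-style randomized-smoothing defense.
# `generate` and `is_jailbroken` are hypothetical placeholders supplied
# by the caller (an LLM query function and an attack-success judge).
import random
import string

def perturb(prompt: str, rate: float = 0.1) -> str:
    """Randomly swap a fraction of the characters in the prompt."""
    chars = list(prompt)
    n_swap = max(1, int(rate * len(chars)))
    for idx in random.sample(range(len(chars)), n_swap):
        chars[idx] = random.choice(string.printable)
    return "".join(chars)

def smooth_generate(prompt: str, generate, is_jailbroken, n_copies: int = 8) -> str:
    """Query the model on perturbed copies and aggregate by majority vote."""
    responses = [generate(perturb(prompt)) for _ in range(n_copies)]
    votes = [is_jailbroken(r) for r in responses]
    majority_jailbroken = sum(votes) > n_copies / 2
    # Return a response consistent with the majority vote.
    for response, vote in zip(responses, votes):
        if vote == majority_jailbroken:
            return response
    return responses[0]
```

The rough intuition is that character-level perturbations tend to break the brittle adversarial suffixes that jailbreaks rely on, while leaving benign prompts largely intact.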

Outside of academia, I'm interested in contributing to the ongoing debate about how AI models should be governed. I recently contributed to a public policy proposal and open letter, later covered in The Washington Post, calling for more robust oversight of large models.

2. Out-of-distribution generalization

Deep learning has an amazing capacity to recognize and interpret the data it sees during training. But what happens when neural networks interact with data very different from what they've seen before?

An overview of out-of-distribution generalization in medical imaging.

This problem is called out-of-distribution (OOD) generalization. My work in this area, which uses tools from robust optimization theory and generative models, has looked at OOD problems in self-driving, medical imaging, and drug discovery. I am also interested in algorithms that yield provable guarantees on the performance of models when evaluated OOD.
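As a rough illustration of the robust-optimization side of this work, the sketch below implements a worst-case-over-environments training step: each gradient update minimizes the loss on the environment where the model currently performs worst. The model, optimizer, and per-environment batches are hypothetical placeholders, and this is only one of several possible OOD objectives (methods like Quantile Risk Minimization, listed below, replace the max with a quantile of the per-environment risks).

```python
# Minimal sketch of a worst-case-over-environments training step,
# one robust-optimization approach to OOD generalization.
# `model`, `optimizer`, and `env_batches` are hypothetical placeholders.
import torch
import torch.nn.functional as F

def robust_training_step(model, optimizer, env_batches):
    """One gradient step on the loss of the worst-performing environment."""
    env_losses = []
    for x, y in env_batches:  # one (inputs, labels) batch per environment
        logits = model(x)
        env_losses.append(F.cross_entropy(logits, y))
    worst_loss = torch.stack(env_losses).max()  # min-max objective
    optimizer.zero_grad()
    worst_loss.backward()
    optimizer.step()
    return worst_loss.item()
```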

3. Adversarial Robustness

Much has been written about the tendency of neural networks to make incorrect predictions when their input data is perturbed by a malicious, or even adversarial, user. Despite thousands of papers on the topic, it remains unclear how to make neural networks more robust.

View of adversarial robustness from the dual perspective.

My work on robustness is guided by two mantras:

  • Designing robust defenses requires first identifying strong attacks.
  • Vulnerabilities should be identified, open-sourced, and resolved as fast as possible, but no faster.

I'm interested in designing new attacks and defenses for neural networks in the setting of perturbation-based, norm-bounded adversaries, and in understanding the fundamental, statistical limits of how robust different architectures can be. My research involves duality-inspired defense algorithms and probabilistic views of robustness.
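For readers unfamiliar with the norm-bounded threat model, the following is a minimal sketch of a projected gradient descent (PGD) attack, the canonical strong attack in this setting. The classifier, loss, and hyperparameters are generic assumptions for illustration rather than any specific method from my papers.

```python
# Minimal sketch of a PGD attack under an l_infinity norm-bounded adversary.
# `model` is assumed to be a differentiable image classifier with inputs in [0, 1].
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Find a perturbation delta with ||delta||_inf <= eps that maximizes the loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()        # gradient-ascent step
            delta.clamp_(-eps, eps)                   # project onto the eps-ball
            delta.copy_((x + delta).clamp(0, 1) - x)  # keep the image in [0, 1]
        delta.grad.zero_()
    return (x + delta).detach()
```

In the mantra above, "strong attacks" in this setting typically means running such attacks with many steps and random restarts before claiming that a defense works.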

Publications

JailbreakBench

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Patrick Chao*, Edoardo Debenedetti*, Alexander Robey*, Maksym Andriushchenko*, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

Text-to-image

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter

Safe Harbor

A Safe Harbor for AI Evaluation and Red Teaming

Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

Semantic smoothing

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

Jiabao Ji*, Bairu Hou*, Alexander Robey*, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang

Model-based verification

Data-Driven Modeling and Verification of Perception-Based Autonomous Systems

Thomas Waite, Alexander Robey, Hamed Hassani, George J. Pappas, Radoslav Ivanov

PAIR

Jailbreaking Black Box Large Language Models in Twenty Queries

Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

SmoothLLM

SmoothLLM: Defending Large Language Models against Jailbreaking Attacks

Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas

Non-zero-sum AT

Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Alexander Robey*, Fabian Latorre*, George J. Pappas, Hamed Hassani, Volkan Cevher

ROCBF

Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations

Lars Lindemann, Alexander Robey, Lejun Jiang, Satyajeet Das, Stephen Tu, Nikolai Matni

Distribution shift verification

Toward Certified Robustness Against Real-World Distribution Shifts

Haoze Wu*, Teruhiro Tagomori*, Alexander Robey*, Fengjun Yang*, Nikolai Matni, George J. Pappas, Hamed Hassani, Corina Pasareanu, Clark Barrett

QRM

Probable Domain Generalization via Quantile Risk Minimization

Cian Eastwood*, Alexander Robey*, Shashank Singh, Julius von Kügelgen, Hamed Hassani, George J. Pappas, Bernhard Schölkopf

Stable imitation learning

On the Sample Complexity of Stability Constrained Imitation Learning

Stephen Tu, Alexander Robey, Tingnan Zhang, Nikolai Matni

Chordally sparse LipSDP

Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural Networks

Anton Xue, Lars Lindemann, Alexander Robey, Hamed Hassani, George J. Pappas, Rajeev Alur

Long-tailed robustness

Do Deep Networks Transfer Invariances Across Classes?

Allan Zhou*, Fahim Tajwar*, Alexander Robey, Tom Knowles, George J. Pappas, Hamed Hassani, Chelsea Finn

Probabilistic robustness

Probabilistically Robust Learning: Balancing Average- and Worst-case Performance

Alexander Robey, Luiz F. O. Chamon, George J. Pappas, Hamed Hassani

Semi-infinite robustness

Adversarial Robustness with Semi-Infinite Constrained Learning

Alexander Robey*, Luiz Chamon*, George J. Pappas, Hamed Hassani, Alejandro Ribeiro

MBDG

Model-Based Domain Generalization

Alexander Robey, George J. Pappas, Hamed Hassani

CDCG

Optimal Algorithms for Submodular Maximization With Distributed Constraints

Alexander Robey, Arman Adibi, Brent Schlotfeldt, Hamed Hassani, George J. Pappas

RHCBF

Learning Robust Hybrid Control Barrier Functions for Uncertain Systems

Alexander Robey*, Lars Lindemann*, Stephen Tu, Nikolai Matni

HCBF

Learning Hybrid Control Barrier Functions from Data

Lars Lindemann, Haimin Hu, Alexander Robey, Hanwen Zhang, Dimos V. Dimarogonas, Stephen Tu, Nikolai Matni

CBF

Learning Control Barrier Functions from Expert Demonstrations

Alexander Robey*, Haimin Hu*, Lars Lindemann, Hanwen Zhang, Dimos V. Dimarogonas, Stephen Tu, Nikolai Matni

Adversarial trade-off

Provable Tradeoffs in Adversarially Robust Classification

Edgar Dobriban, Hamed Hassani, David Hong, Alexander Robey

MBRDL

Model-Based Robust Deep Learning: Generalizing to Natural, Out-of-Distribution Data

Alexander Robey, Hamed Hassani, George J. Pappas

LipSDP

Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks

Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, George J. Pappas

Fourier Ptychography

Optimal Physical Preprocessing for Example-Based Super-Resolution

Alexander Robey and Vidya Ganapati