Alex Robey :: Papers

Safety Guardrails for LLM-Enabled Robots

Zachary Ravichandran, Alexander Robey, Vijay Kumar, George J. Pappas, Hamed Hassani

arXiv | website | bibtex Under review

Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks

Hanjiang Hu, Alexander Robey, Changliu Liu

arXiv | bibtex Under review

Jailbreaking LLM-Controlled Robots

Alexander Robey, Zachary Ravichandran, Vijay Kumar, Hamed Hassani, George J. Pappas

Jailbreaking Black Box Large Language Models in Twenty Queries

Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong

arXiv | code | WIRED article | VentureBeat article | bibtex SatML 2025

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Patrick Chao*, Edoardo Debenedetti*, Alexander Robey*, Maksym Andriushchenko*, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramer, Hamed Hassani, Eric Wong

arXiv | code | leaderboard | dataset | bibtex NeurIPS 2024

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation

Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J. Zico Kolter

arXiv | code | bibtex Under review

A Safe Harbor for AI Evaluation and Red Teaming

Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng-Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson

Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing

Jiabao Ji*, Bairu Hou*, Alexander Robey*, George J. Pappas, Hamed Hassani, Yang Zhang, Eric Wong, Shiyu Chang

arXiv | code | bibtex Under review

Data-Driven Modeling and Verification of Perception-Based Autonomous Systems

Thomas Waite, Alexander Robey, Hassani Hamed, George J. Pappas, Radoslav Ivanov

arXiv | code | bibtex Under review

SmoothLLM: Defending Large Language Models against Jailbreaking Attacks

Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas

arXiv | code | DebugML blog | Penn article | bibtex Under review

Adversarial Training Should Be Cast as a Non-Zero-Sum Game

Alexander Robey*, Fabian Latorre*, George J. Pappas, Hamed Hassani, Volkan Cevher

arXiv | EPFL article | bibtex ICLR 2024

Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations

Lars Lindemann, Alexander Robey, Lejun Jiang, Satyajeet Das, Stephen Tu, Nikolai Matni

arXiv | code | bibtex OJCSYS 2024

Toward Certified Robustness Against Real-World Distribution Shifts

Haoze Wu*, Teruhiro Tagomori*, Alexander Robey*, Fengjun Yang*, Nikolai Matni, George J. Pappas, Hamed Hassani, Corina Pasareanu, Clark Barrett

arXiv | bibtex SaTML 2023

Probable Domain Generalization via Quantile Risk Minimization

Cian Eastwood*, Alexander Robey*, Shashank Singh, Julius von Kügelgen, Hamed Hassani, George J. Pappas, Bernhard Schölkopf

arXiv | code | bibtex NeurIPS 2022

On the Sample Complexity of Stability Constrained Imitation Learning

Stephen Tu, Alexander Robey, Tingnan Zhang, Nikolai Matni

arXiv | bibtex L4DC 2022 (Oral)

Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural Networks

Anton Xue, Lars Lindemann, Alexander Robey, Hamed Hassani, George J. Pappas, Rajeev Alur

arXiv | code | bibtex CDC 2022

Do Deep Networks Transfer Invariances Across Classes?

Allan Zhou*, Fahim Tajwar*, Alexander Robey, Tom Knowles, George J. Pappas, Hamed Hassani, Chelsea Finn

arXiv | code | bibtex ICLR 2022

Probabilistically Robust Learning: Balancing Average- and Worst-case Performance

Alexander Robey, Luiz F. O. Chamon, George J. Pappas, Hamed Hassani

arXiv | code | bibtex ICML 2022

Adversarial Robustness with Semi-Infinite Constrained Learning

Alexander Robey*, Luiz Chamon*, George J. Pappas, Hamed Hassani, Alejandro Ribeiro

arXiv | code | bibtex NeurIPS 2021

Model-Based Domain Generalization

Alexander Robey, George J. Pappas, Hamed Hassani

arXiv | code | bibtex NeurIPS 2021

Optimal Algorithms for Submodular Maximization With Distributed Constraints

Alexander Robey, Arman Adibi, Brent Schlotfeldt, Hamed Hassani, George J. Pappas

arXiv | bibtex L4DC 2021

Learning Robust Hybrid Control Barrier Functions for Uncertain Systems

Alexander Robey*, Lars Lindemann*, Stephen Tu, Nikolai Matni

arXiv | code | bibtex ADHS 2021

Learning Hybrid Control Barrier Functions from Data

Lars Lindemann, Haimin Hu, Alexander Robey, Hanwen Zhang, Dimos V. Dimarogonas, Stephen Tu, Nikolai Matni

arXiv | code | bibtex CoRL 2020

Learning Control Barrier Functions from Expert Demonstrations

Alexander Robey*, Haimin Hu*, Lars Lindemann, Hanwen Zhang, Dimos V. Dimarogonas, Stephen Tu, Nikolai Matni

arXiv | code | bibtex CDC 2020

Provable Tradeoffs in Adversarially Robust Classification

Edgar Dobriban, Hamed Hassani, David Hong, Alexander Robey

arXiv | bibtex Trans. on Information Theory

Model-Based Robust Deep Learning: Generalizing to Natural, Out-of-Distribution Data

Alexander Robey, Hamed Hassani, George J. Pappas

arXiv | code | bibtex arXiv

Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks

Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, George J. Pappas

arXiv | code | bibtex NeurIPS 2019 (Spotlight)

Optimal Physical Preprocessing for Example-Based Super-Resolution

Alexander Robey and Vidya Ganapati

arXiv | bibtex Optics Express

Publications