2025
Safety Pretraining: Toward the Next Generation of Safe AI
P. Maini*, S. Goyal*, D. Sam*, A. Robey, Y. Savani, Y. Jiang, A. Zou, M. Fredrikson, Z. C. Lipton, J. Z. Kolter
NeurIPS 2025 [pdf] [website]
Antidistillation Sampling
Y. Savani*, A. Trockman*, Z. Feng, Y. Xu, A. Schwarzschild, A. Robey, M. Finzi, J. Z. Kolter
NeurIPS 2025 [website]
Embodied AI: Emerging Risks and Opportunities for Policy Action
J. Perlo, A. Robey, F. Barez, L. Floridi, J. Mökander
NeurIPS 2025 [pdf]
Jailbreaking LLM-Controlled Robots
A. Robey, Z. Ravichandran, V. Kumar, H. Hassani, G. J. Pappas
ICRA 2025 [pdf] [website]
Jailbreaking Black Box Large Language Models in Twenty Queries
P. Chao, A. Robey, E. Dobriban, H. Hassani, G. J. Pappas, E. Wong
IEEE SaTML 2025 [pdf] [website]
SmoothLLM: Defending LLMs Against Jailbreaking Attacks
A. Robey, E. Wong, H. Hassani, G. J. Pappas
TMLR 2025 [pdf] [blog]
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
Y. He, A. Robey, N. Murata, Y. Jiang, J. Williams, G. J. Pappas, H. Hassani, Y. Mitsufuji, R. Salakhutdinov, J. Z. Kolter
TMLR 2025 [pdf]
Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing
J. Ji*, B. Hou*, A. Robey*, G. J. Pappas, H. Hassani, Y. Zhang, E. Wong, S. Chang
IJCNLP-AACL 2025 [pdf]
Safety Guardrails for LLM-Enabled Robots
Z. Ravichandran, A. Robey, V. Kumar, G. J. Pappas, H. Hassani
Under Review [pdf] [website]
Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
S. Ball*, N. Hasrati*, A. Robey, A. Schwarzschild, F. Kreuter, J. Z. Kolter, A. Risteski
arXiv 2025 [pdf]
Jailbreaking in the Haystack
R. R. Shah, C. H. Wu, Z. Zhong, A. Robey, A. Raghunathan
arXiv 2025 [pdf]
Benchmarking Misuse Mitigation Against Covert Adversaries
D. Brown*, M. Sabbaghi*, L. Sun, A. Robey, G. J. Pappas, E. Wong, H. Hassani
arXiv 2025 [pdf]
Evaluating LLM Memorization Using Soft Token Sparsity
Z. Feng, Y. E. Xu, P. Maini, A. Robey, A. Schwarzschild, J. Z. Kolter
arXiv 2025 [pdf]
Preventing Robotic Jailbreaking via Multimodal Domain Adaptation
F. Marchiori, R. Sinha*, C. Agia*, A. Robey, G. J. Pappas, M. Conti, M. Pavone
arXiv 2025 [pdf]
Evaluating Language Model Reasoning about Confidential Information
D. Sam, A. Robey, A. Zou, M. Fredrikson, J. Z. Kolter
arXiv 2025 [pdf]
Command-V: Pasting LLM Behaviors via Activation Profiles
B. Wang, A. Schwarzschild, A. Robey, A. Payani, C. Fleming, M. Sun, D. Ippolito
arXiv 2025 [pdf]
Adversarial Attacks on Robotic Vision Language Action Models
E. K. Jones, A. Robey, A. Zou, Z. Ravichandran, G. J. Pappas, H. Hassani, M. Fredrikson, J. Z. Kolter
arXiv 2025 [pdf]
Existing Large Language Model Unlearning Evaluations Are Inconclusive
Z. Feng*, Y. E. Xu*, A. Robey, R. Kirk, X. Davies, Y. Gal, A. Schwarzschild, J. Z. Kolter
arXiv 2025 [pdf]
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
H. Hu, A. Robey, C. Liu
Under Review [pdf]
2024
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
P. Chao*, E. Debenedetti*, A. Robey*, M. Andriushchenko*, F. Croce, V. Sehwag, E. Dobriban, N. Flammarion, G. J. Pappas, F. Tramèr, H. Hassani, E. Wong
NeurIPS 2024 [pdf] [website]
A Safe Harbor for AI Evaluation and Red Teaming
S. Longpre, S. Kapoor, K. Klyman, A. Ramaswami, R. Bommasani, B. Blili-Hamelin, Y. Huang, A. Skowron, Z. Yong, S. Kotha, Y. Zeng, W. Shi, X. Yang, R. Southen, A. Robey, P. Chao, D. Yang, R. Jia, D. Kang, S. Pentland, A. Narayanan, P. Liang, P. Henderson
ICML 2024 (Oral) [pdf]
Adversarial Training Should Be Cast as a Non-Zero-Sum Game
A. Robey*, F. Latorre*, G. J. Pappas, H. Hassani, V. Cevher
ICLR 2024 [pdf]
Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations
L. Lindemann, A. Robey, L. Jiang, S. Das, S. Tu, N. Matni
OJCSYS 2024 [pdf]
2023
Toward Certified Robustness Against Real-World Distribution Shifts
H. Wu*, T. Tagomori*, A. Robey*, F. Yang*, N. Matni, G. J. Pappas, H. Hassani, C. Pasareanu, C. Barrett
IEEE SaTML 2023 [pdf]
Provable Tradeoffs in Adversarially Robust Classification
E. Dobriban, H. Hassani, D. Hong, A. Robey
IEEE Trans. Info. Theory 2023 [pdf]
Data-Driven Modeling and Verification of Perception-Based Autonomous Systems
T. Waite, A. Robey, H. Hassani, G. J. Pappas, R. Ivanov
Under Review [pdf]
2022
Probable Domain Generalization via Quantile Risk Minimization
C. Eastwood*, A. Robey*, S. Singh, J. von Kügelgen, H. Hassani, G. J. Pappas, B. Schölkopf
NeurIPS 2022 [pdf]
On the Sample Complexity of Stability Constrained Imitation Learning
S. Tu, A. Robey, T. Zhang, N. Matni
L4DC 2022 (Oral) [pdf]
Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural Networks
A. Xue, L. Lindemann, A. Robey, H. Hassani, G. J. Pappas, R. Alur
CDC 2022 [pdf]
Do Deep Networks Transfer Invariances Across Classes?
A. Zhou*, F. Tajwar*, A. Robey, T. Knowles, G. J. Pappas, H. Hassani, C. Finn
ICLR 2022 [pdf]
Probabilistically Robust Learning: Balancing Average- and Worst-case Performance
A. Robey, L. F. O. Chamon, G. J. Pappas, H. Hassani
ICML 2022 [pdf]
2021
Adversarial Robustness with Semi-Infinite Constrained Learning
A. Robey*, L. F. O. Chamon*, G. J. Pappas, H. Hassani, A. Ribeiro
NeurIPS 2021 [pdf]
Model-Based Domain Generalization
A. Robey, G. J. Pappas, H. Hassani
NeurIPS 2021 [pdf]
Optimal Algorithms for Submodular Maximization With Distributed Constraints
A. Robey, A. Adibi, B. Schlotfeldt, H. Hassani, G. J. Pappas
L4DC 2021 [pdf]
Learning Robust Hybrid Control Barrier Functions for Uncertain Systems
A. Robey*, L. Lindemann*, S. Tu, N. Matni
ADHS 2021 [pdf]
2020
Learning Hybrid Control Barrier Functions from Data
L. Lindemann, H. Hu, A. Robey, H. Zhang, D. V. Dimarogonas, S. Tu, N. Matni
CoRL 2020 [pdf]
Learning Control Barrier Functions from Expert Demonstrations
A. Robey*, H. Hu*, L. Lindemann, H. Zhang, D. V. Dimarogonas, S. Tu, N. Matni
CDC 2020 [pdf]
Model-Based Robust Deep Learning: Generalizing to Natural, Out-of-Distribution Data
A. Robey, H. Hassani, G. J. Pappas
arXiv 2020 [pdf]
2019
Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks
M. Fazlyab, A. Robey, H. Hassani, M. Morari, G. J. Pappas
NeurIPS 2019 (Spotlight) [pdf]
2018
Optimal Physical Preprocessing for Example-Based Super-Resolution
A. Robey, V. Ganapati
Optics Express 2018 [pdf]