2025
Safety Pretraining: Toward the Next Generation of Safe AI
Antidistillation Sampling
Embodied AI: Emerging Risks and Opportunities for Policy Action
Jailbreaking LLM-Controlled Robots
Jailbreaking Black Box Large Language Models in Twenty Queries
SmoothLLM: Defending LLMs Against Jailbreaking Attacks
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
Defending Large Language Models against Jailbreak Attacks via Semantic Smoothing
Safety Guardrails for LLM-Enabled Robots
Toward Understanding the Transferability of Adversarial Suffixes in Large Language Models
Jailbreaking in the Haystack
Benchmarking Misuse Mitigation Against Covert Adversaries
Evaluating LLM Memorization Using Soft Token Sparsity
Preventing Robotic Jailbreaking via Multimodal Domain Adaptation
Evaluating Language Model Reasoning about Confidential Information
Command-V: Pasting LLM Behaviors via Activation Profiles
Adversarial Attacks on Robotic Vision Language Action Models
Existing Large Language Model Unlearning Evaluations Are Inconclusive
Steering Dialogue Dynamics for Robustness against Multi-turn Jailbreaking Attacks
2024
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
A Safe Harbor for AI Evaluation and Red Teaming
Adversarial Training Should Be Cast as a Non-Zero-Sum Game
Learning Robust Output Control Barrier Functions from Safe Expert Demonstrations
2023
Toward Certified Robustness Against Real-World Distribution Shifts
Provable Tradeoffs in Adversarially Robust Classification
Data-Driven Modeling and Verification of Perception-Based Autonomous Systems
2022
Probable Domain Generalization via Quantile Risk Minimization
On the Sample Complexity of Stability Constrained Imitation Learning
Chordal Sparsity for Lipschitz Constant Estimation of Deep Neural Networks
Do Deep Networks Transfer Invariances Across Classes?
Probabilistically Robust Learning: Balancing Average- and Worst-case Performance
2021
Adversarial Robustness with Semi-Infinite Constrained Learning
Model-Based Domain Generalization
Optimal Algorithms for Submodular Maximization With Distributed Constraints
Learning Robust Hybrid Control Barrier Functions for Uncertain Systems
2020
Learning Hybrid Control Barrier Functions from Data
Learning Control Barrier Functions from Expert Demonstrations
Model-Based Robust Deep Learning: Generalizing to Natural, Out-of-Distribution Data
2019
Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks
2018
Optimal Physical Preprocessing for Example-Based Super-Resolution