Safe Vision Language Action Models via Barrier Enhanced Flow Matching

The XXXXXXXX University
Submitted to IEEE Robotics and Automation Letters (RA-L)
Barrier Enhanced VLA Architecture

Our framework modifies the Flow Matching denoising process within the model so that it inherently generates safe trajectories, using a smooth Log-Sum-Exponential aggregate barrier.

Abstract

This article introduces a modular inference framework that integrates Vision-Language-Action (VLA) foundation models with formal Control Barrier Function (CBF) safety guarantees. Unlike existing methods that apply an external safety filter to a model's final output, our approach modifies the Flow Matching denoising process within the model so that it inherently generates safe trajectories. By using a smooth Log-Sum-Exponential (LSE) aggregate barrier, we enforce safety across all steps of each action chunk without compromising the semantic intent of the generative model. The framework requires neither safety-specific datasets nor costly retraining. Hardware experiments demonstrate that our framework achieves reliable safety without degrading the success rate of the baseline model.
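The aggregate barrier can be sketched as a numerically stable smooth minimum. This is a minimal sketch assuming a temperature parametrization h_LSE = -tau * log(sum(exp(-h / tau))) taken over every barrier value at every step of the chunk; the function name `lse_aggregate`, the temperature `tau`, and the example values are hypothetical, not taken from the paper.

```python
import numpy as np

def lse_aggregate(h, tau=0.05):
    """Smooth minimum over all barrier values h (any shape), e.g. an
    (H, M) array holding M barrier values at each of H chunk steps.

    h_lse = -tau * log(sum(exp(-h / tau))), computed with a min-shift
    for numerical stability. Returns (h_lse, weights), where weights
    = d h_lse / d h is a softmax over -h / tau and sums to 1.
    """
    h = np.asarray(h, dtype=float)
    m = h.min()
    e = np.exp(-(h - m) / tau)   # the minimal entry contributes exp(0) = 1
    s = e.sum()
    return m - tau * np.log(s), e / s

# Hypothetical values: 4 chunk steps x 2 barriers (e.g. wall, sphere).
h = np.array([[0.30, 0.12],
              [0.20, 0.08],
              [0.10, 0.15],
              [0.05, 0.40]])
h_lse, w = lse_aggregate(h, tau=0.02)
print(h.min(), h_lse)   # h_lse never exceeds min(h)
```

Because h_LSE never exceeds the smallest individual barrier value, keeping the single scalar h_LSE non-negative keeps every step of the chunk safe, and the softmax weights concentrate the gradient on the steps closest to violation.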

Barrier Demonstration

Wall Barrier

Restricting the end-effector from crossing a plane in 3D space to avoid collisions with the table.

Spherical Barrier

Maintaining safety by keeping the robot frame outside a defined spherical region.
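Barriers of the following form would implement the two demonstrations. This is a sketch under assumptions: the plane is given by a point p and a normal n, the sphere by a center c and a radius r, and the squared-distance form is chosen only because it is smooth everywhere. The function names and all numbers are hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np

def wall_barrier(x, normal, point):
    """h(x) = n . (x - p): positive on the side the unit normal n points to."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)
    return (np.asarray(x, dtype=float) - np.asarray(point, dtype=float)) @ n

def sphere_barrier(x, center, radius):
    """h(x) = ||x - c||^2 - r^2: positive outside the sphere."""
    d = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return np.sum(d * d, axis=-1) - radius ** 2

# Hypothetical chunk of three end-effector positions (metres).
chunk = np.array([[0.2, 0.0, 0.10],
                  [0.3, 0.0, 0.12],
                  [0.4, 0.0, -0.01]])
h_wall = wall_barrier(chunk, normal=[0, 0, 1], point=[0, 0, 0])   # table at z = 0
h_sph = sphere_barrier(chunk, center=[0.3, 0.0, 0.1], radius=0.05)
print(h_wall, h_sph)   # a negative entry marks an unsafe step
```

Both functions accept a whole (H, 3) chunk, so their outputs can be stacked into the (H, M) array that an LSE aggregate consumes.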

Policy Comparison

No Filter: PI-0

Baseline: E2E-CBF

Proposed: CBF-FM

Performance metrics evaluated across 150+ hardware trials, comparing our proposed CBF-FM architecture against baseline models.

Metric             Base VLA (PI-0)   E2E-CBF   CBF-FM (Ours)
Safety Rate (%)    15.0              68.18     100.0
Success Rate (%)   75.0              68.18     77.41

*Trials included scenarios with objects placed both inside and outside the unsafe collision zones. Safety violations in these trials are not critical to the robot's health or to the task objective, so an unsafe rollout can still count as a success; the success rate measures task completion only. In our experiments, CBF-FM matches the baseline success rate while incurring zero safety violations.

Generative Model Error Bound and Safety Guarantees

We show that our safety modifications do not alter the semantic intent of the flow-matching model, and that the error between the safely generated distribution and the target distribution is bounded in Wasserstein distance.
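The bound itself is proved in the paper; the snippet below only illustrates how such a distance could be measured empirically along one action dimension. It assumes equal sample counts and uses a simple clamp to a safe interval as a stand-in for the safety modification, which is not the CBF-FM update; `wasserstein_1d` is a hypothetical helper.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Empirical W1 distance between two equal-size 1-D sample sets.

    In 1-D the optimal coupling matches sorted samples, so W1 is the
    mean absolute difference of the order statistics.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample counts")
    return np.mean(np.abs(a - b))

rng = np.random.default_rng(0)
nominal = rng.normal(size=10_000)   # stand-in for unconstrained samples
safe = np.maximum(nominal, -1.0)    # stand-in "safe" samples: x >= -1
w1 = wasserstein_1d(nominal, safe)
print(w1)
```

Since the clamp is monotone, the sorted coupling pairs each sample with its own corrected value, so W1 equals the average correction and stays small when only a few samples lie in the unsafe region.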

A comprehensive safety analysis is provided in the paper, which guarantees forward invariance of the safe set for the entire action chunk along the denoising time.
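The mechanism can be sketched with a toy denoising loop: at each Euler step the nominal velocity is replaced by the closest velocity satisfying the CBF condition dh/ds >= -alpha * h on the LSE aggregate barrier, which in continuous time keeps h >= 0 for all denoising times s whenever h >= 0 at the start. Everything here is hypothetical: the chunk holds end-effector positions rather than joint commands, the nominal field `k * (target - A)` stands in for the learned Action Expert, and the wall, `tau`, `alpha`, and step count are illustrative. Euler discretization only approximates continuous-time invariance, so a small step size is used.

```python
import numpy as np

def lse(h, tau):
    """Smooth min of h and its gradient weights (softmax of -h / tau)."""
    m = h.min()
    e = np.exp(-(h - m) / tau)
    return m - tau * np.log(e.sum()), e / e.sum()

def cbf_filter(v, grad, h, alpha):
    """Closed form of  min ||u - v||^2  s.t.  <grad, u> >= -alpha * h."""
    c = np.sum(grad * v) + alpha * h
    if c >= 0.0:
        return v  # nominal velocity already satisfies the CBF condition
    return v - c * grad / np.sum(grad * grad)  # minimal correction along grad

def denoise(A0, target, normal, offset, steps=200, tau=0.1, alpha=10.0,
            k=5.0, safe=True):
    """Euler-integrate dA/ds = u over denoising time s in [0, 1].

    A is an (H, 3) chunk; the wall barrier per step is h_i = n . a_i - offset.
    """
    A = A0.copy()
    ds = 1.0 / steps
    for _ in range(steps):
        v = k * (target - A)                    # hypothetical nominal field
        if safe:
            h, w = lse(A @ normal - offset, tau)
            grad = w[:, None] * normal[None, :]  # d h_LSE / d A
            v = cbf_filter(v, grad, h, alpha)
        A = A + ds * v
    return A

H = 8
rng = np.random.default_rng(0)
A0 = rng.normal(scale=0.2, size=(H, 3))
A0[:, 2] = 0.5 + np.abs(A0[:, 2])   # every step starts above the table
target = np.zeros((H, 3))
target[:, 2] = -0.3                 # nominal intent dips below z = 0
n = np.array([0.0, 0.0, 1.0])

A_nom = denoise(A0, target, n, 0.0, safe=False)
A_safe = denoise(A0, target, n, 0.0, safe=True)
print((A_nom @ n).min(), (A_safe @ n).min())
```

Only the velocity component along the barrier gradient is modified, so the lateral (x, y) motion of the chunk is identical with and without the filter, which loosely mirrors the claim that the semantic intent is preserved.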

Smoothing Parameter and Safety Margin
Impact of the smoothing parameter on the safety margin. Higher values of the barrier smoothing parameter correspond to larger safety margins.
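One way to read this margin trend, assuming the aggregate is written with a temperature tau over N individual barrier values h_i (a hypothetical parametrization; the paper's smoothing parameter may instead be defined as its reciprocal), is the following standard LSE bound:

```latex
\begin{aligned}
h_{\mathrm{LSE}}(A) &= -\tau \log \sum_{i=1}^{N} \exp\!\left(-\frac{h_i(A)}{\tau}\right),
  \qquad m = \min_i h_i(A), \\
e^{-m/\tau} \;\le\; \sum_{i=1}^{N} e^{-h_i(A)/\tau} &\;\le\; N\, e^{-m/\tau}
  \quad\Longrightarrow\quad
  m - \tau \log N \;\le\; h_{\mathrm{LSE}}(A) \;\le\; m .
\end{aligned}
```

Hence h_LSE >= 0 implies every h_i >= 0, and the worst-case extra margin is tau log N, which grows linearly in tau. For example, N = 10 and tau = 0.01 give at most 0.01 ln 10, about 0.023, in the units of h.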

Trajectory Smoothing and Velocity Enforcement

Beyond collision avoidance, our formulation enables direct optimization of the action chunk for physical feasibility. By incorporating a sparse finite-difference matrix D into the CBF Quadratic Program (QP), we can:

  • Enforce Joint Velocity Limits: Hard constraints ensure that no joint velocity command exceeds the physical limits of the SO-101 arm, preventing motor saturation.
  • Optimize for Smoothness: An additional cost term minimizes the difference between consecutive actions, significantly reducing jerky motions and improving the quality of the generated trajectory.
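The two ingredients above can be sketched as a small QP over one action chunk. In the paper, D is built into the CBF QP solved during denoising; this sketch instead post-processes a single chunk and uses a hand-rolled ADMM loop so that it needs only NumPy. The smoothness weight `lam`, the ADMM penalty `rho`, the function name, and the dense D (a sparse matrix such as `scipy.sparse.diags` would suit long chunks) are all assumptions.

```python
import numpy as np

def smooth_velocity_qp(a_ref, v_max, dt, lam=0.1, rho=1.0, iters=5000):
    """Approximately solve, with ADMM,
        min_a  0.5*||a - a_ref||^2 + 0.5*lam*||D a||^2
        s.t.   |D a| <= v_max * dt   (elementwise, per joint)
    where D is the forward finite-difference matrix along time.

    a_ref: (H, J) chunk of H steps for J joints; v_max: (J,) limits.
    """
    H = a_ref.shape[0]
    D = np.diff(np.eye(H), axis=0)            # (H-1, H): rows e_{k+1} - e_k
    K_inv = np.linalg.inv(np.eye(H) + (lam + rho) * (D.T @ D))
    bound = np.asarray(v_max, dtype=float) * dt  # per-step displacement limit
    z = np.clip(D @ a_ref, -bound, bound)
    u = np.zeros_like(z)
    a = a_ref.copy()
    for _ in range(iters):
        a = K_inv @ (a_ref + rho * (D.T @ (z - u)))  # quadratic a-update
        Da = D @ a
        z = np.clip(Da + u, -bound, bound)           # project onto velocity box
        u += Da - z                                  # scaled dual update
    return a

dt, v_max = 0.1, np.array([2.0, 1.0])
step = np.r_[np.zeros(5), np.ones(5)]           # abrupt jump of 1.0 rad
a_ref = np.stack([step, 0.5 * step], axis=1)    # hypothetical (10, 2) chunk
a = smooth_velocity_qp(a_ref, v_max, dt)
vel = np.diff(a, axis=0) / dt
print(np.abs(vel).max(axis=0))                  # per-joint peak velocity
```

The abrupt jump in `a_ref` becomes a ramp whose per-step velocity stays within each joint's limit, while the smoothness term spreads the motion over neighbouring steps instead of concentrating it in one.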

This integrated approach ensures that the "Safe VLA" trajectories are not only collision-free but also dynamically consistent and smooth for real-world execution.

Joint Velocity Profiles
Joint velocity profiles showing strict adherence to maximum limits (red dashed lines) and the resulting smooth path generated by the Action Expert.