ForceFlow: Learning to Feel and Act via Contact-Driven Flow Matching

Shuoheng Zhang, Yifu Yuan, Hongyao Tang, YAN ZHENG, Qiaojun Yu, Pengyi Li, Guowei Huang, Helong Huang, Xingyue Quan, Jianye HAO
Tianjin University, Huawei Noah’s Ark Lab, Shanghai AI Lab

Highlights

Force-Aware Flow Matching

ForceFlow models a deterministic velocity field over a hybrid action space to jointly generate motion and target contact force.

Vision-to-Force Handover

A VLM first localizes the target, then control transitions to a force-driven policy for high-precision contact interaction.

Strong Real-World Results

Across six tasks, ForceFlow reaches an 81.67% average success rate, outperforming ForceVLA (45.00%) by 36.67 percentage points.

Figure 1: Overview of ForceFlow with Vision-to-Force (V2F) handover and flow matching policy.

Abstract

Existing imitation learning methods have enabled robots to interact autonomously with physical environments, but contact-rich manipulation remains challenging due to complex contact dynamics and the need for precise force control. We present ForceFlow, a force-aware reactive framework built on flow matching. ForceFlow integrates temporal force history with asymmetric multimodal fusion, and introduces a joint prediction design that outputs both action and next-step contact force. To improve robustness under spatial shifts, ForceFlow adopts a Vision-to-Force (V2F) handover: a VLM handles coarse target localization and the force-aware policy handles local contact regulation. Experiments on six real-world contact-rich tasks show that ForceFlow improves average success rate from 45% (ForceVLA) to 81.67%, while also demonstrating strong force fidelity and zero-shot OOD generalization.

Method Overview

  1. Temporal Force Modeling + Asymmetric Fusion: Force history is used as a global condition and visual features are integrated via cross-attention, preventing visual dominance over subtle force signals.
  2. Hybrid Action Generation: The model predicts a hybrid action a_t = [Δp_t, f_{t+1}], where Δp_t is the motion command and f_{t+1} is the target contact force for the next step, coupling motion with proactive force regulation through flow matching.
  3. V2F Handover: A VLM localizes semantic target points in the global scene (approach stage), then ForceFlow performs high-frequency force-aware interaction (interaction stage).
Figure 2: Overview of the ForceFlow architecture.
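
To make the hybrid action generation step more concrete, here is a minimal NumPy sketch of flow matching over a 4-D action (3-D motion plus one force value). It is not the authors' implementation: the action layout, the straight-line probability path, the Euler integrator, and the `toy_velocity` stand-in (which replaces the learned, observation-conditioned network with a closed-form field) are all assumptions made for illustration.

```python
import numpy as np

ACTION_DIM = 4  # assumed layout: 3-D motion command + 1 next-step force target


def toy_velocity(x, t, target):
    """Closed-form stand-in for a learned velocity field v(x, t | observation).

    On the straight-line path x_t = (1 - t) * x_0 + t * x_1, the velocity
    that carries x_t to x_1 at t = 1 is (x_1 - x_t) / (1 - t).
    """
    return (target - x) / (1.0 - t)


def sample_hybrid_action(target, num_steps=10, rng=None):
    """Integrate dx/dt = v(x, t) from Gaussian noise (t=0) toward t=1 with Euler steps."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = rng.standard_normal(ACTION_DIM)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1, so the division above stays finite
        x = x + dt * toy_velocity(x, t, target)
    delta_p, force_next = x[:3], float(x[3])
    return delta_p, force_next


def flow_matching_loss(pred_velocity, x0, x1):
    """Mean squared error against the straight-line regression target x1 - x0."""
    return float(np.mean((pred_velocity - (x1 - x0)) ** 2))


# Hypothetical demonstration action: small motion plus a 5 N force target.
demo_action = np.array([0.01, -0.02, 0.0, 5.0])
delta_p, force_next = sample_hybrid_action(demo_action)
```

Because motion and force share one vector, a single integration pass yields both quantities together, which is the coupling the hybrid action is meant to provide. In a real policy the velocity network would be conditioned on force history and visual features rather than on a known target.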

Task Suite

Figure 3: Contact-rich task suite evaluated in the paper.
Category | Task | Key Challenge
Short-Horizon Contact | Stamp | Visual ambiguity in paper thickness, force-triggered stamping
Short-Horizon Contact | Plug | Coarse visual alignment with force-guided insertion
Short-Horizon Contact | Press Button | Different spring constants and trigger depths
Short-Horizon Contact | USB Insert | Sub-millimeter tolerance and geometric jamming
Continuous Contact | Clean Whiteboard | Stable normal force tracking on planar surface
Continuous Contact | Clean Vase | Adaptive force regulation on curved, non-linear surface

Main Results

Success Rate (SR, %, 20 trials per task)

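Method | Stamp | Plug | Press | Insert | Clean WB | Clean Vase | Avg.
pi0.5 | 0 | 60 | 30 | 45 | 10 | 0 | 24.17
ACT | 0 | 30 | 5 | 0 | 15 | 0 | 8.33
Diffusion Policy | 0 | 40 | 20 | 50 | 75 | 0 | 30.83
ForceVLA | 20 | 70 | 65 | 15 | 100 | 0 | 45.00
ForceFlow (w/o Force) | 20 | 75 | 0 | 40 | 100 | 30 | 44.17
ForceFlow (Ours) | 85 | 90 | 90 | 60 | 100 | 65 | 81.67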
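
The Avg. column is the unweighted mean of the six per-task success rates. The short check below reproduces it, along with the gap to ForceVLA; the per-task values are transcribed from the table, and the variable names are illustrative.

```python
# Per-task success rates (%) in the order:
# Stamp, Plug, Press, Insert, Clean WB, Clean Vase.
success_rates = {
    "pi0.5": [0, 60, 30, 45, 10, 0],
    "ACT": [0, 30, 5, 0, 15, 0],
    "Diffusion Policy": [0, 40, 20, 50, 75, 0],
    "ForceVLA": [20, 70, 65, 15, 100, 0],
    "ForceFlow (w/o Force)": [20, 75, 0, 40, 100, 30],
    "ForceFlow (Ours)": [85, 90, 90, 60, 100, 65],
}

TRIALS_PER_TASK = 20  # stated in the table heading

# Unweighted mean over tasks, rounded to two decimals as in the table.
averages = {
    method: round(sum(rates) / len(rates), 2)
    for method, rates in success_rates.items()
}

# Gap between the full model and ForceVLA, in percentage points.
gap_vs_forcevla = round(averages["ForceFlow (Ours)"] - averages["ForceVLA"], 2)

# With 20 trials per task, each rate corresponds to a whole number of successes.
successes_ours = [rate * TRIALS_PER_TASK // 100 for rate in success_rates["ForceFlow (Ours)"]]

for method, avg in averages.items():
    print(f"{method}: {avg:.2f}")
print(f"Gap vs ForceVLA: {gap_vs_forcevla:.2f} percentage points")
```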

Force Fidelity (MAE Cost, N, lower is better)

Method | Stamp | Plug | Press | Insert | Clean WB | Clean Vase | Avg.
pi0.5 | 31.99 | 21.41 | 17.39 | 50.89 | 11.93 | 7.87 | 23.58
ACT | 31.86 | 25.54 | 31.81 | 38.71 | 11.91 | 30.45 | 28.38
Diffusion Policy | 32.26 | 15.79 | 24.86 | 23.85 | 8.22 | 23.56 | 21.42
ForceVLA | 30.03 | 9.59 | 30.94 | 37.82 | 20.16 | 11.29 | 23.31
ForceFlow (w/o Force) | 30.03 | 13.36 | 37.50 | 34.75 | 7.16 | 13.24 | 22.67
ForceFlow (Ours) | 10.61 | 3.58 | 5.03 | 21.79 | 4.59 | 3.76 | 8.23
Figure 4: Statistical force distribution across tasks (maximum or average effective forces).
Figure 5: Alignment between predicted and measured contact forces during execution.
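
As a rough illustration of how alignment between two force traces can be scored, the sketch below computes a mean absolute error in newtons. This page does not spell out how the MAE cost in the table is defined (for example, which reference force it is measured against), so the function and the sample traces here are hypothetical and only show the generic metric.

```python
import numpy as np


def force_mae(predicted, measured):
    """Mean absolute error (N) between two equally sampled force traces."""
    predicted = np.asarray(predicted, dtype=float)
    measured = np.asarray(measured, dtype=float)
    if predicted.shape != measured.shape:
        raise ValueError("force traces must have the same shape")
    return float(np.mean(np.abs(predicted - measured)))


# Made-up traces for illustration only, not data from the paper.
predicted_force = [0.0, 2.0, 4.0, 5.0]
measured_force = [0.5, 2.5, 3.0, 5.0]
mae = force_mae(predicted_force, measured_force)
print(f"MAE: {mae:.2f} N")
```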

Generalization and Ablation

Zero-Shot Force Generalization (SR, %)

Method | Press | Clean WB | Clean Vase
pi0.5 | 0 | 0 | 0
ACT | 0 | 0 | 0
Diffusion Policy | 0 | 0 | 0
ForceVLA | 40 | 90 | 0
ForceFlow (Ours) | 80 | 100 | 60

Spatial OOD (SR, %)

Method | Press | Plug | Clean WB
pi0.5 | 0 | 0 | 0
ACT | 0 | 0 | 0
Diffusion Policy | 0 | 0 | 0
ForceVLA | 0 | 0 | 0
ForceFlow | 0 | 0 | 0
ForceFlow + V2F | 40 | 10 | 50
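
The Spatial OOD results isolate the contribution of the V2F handover. As a rough picture of that two-stage structure, the sketch below switches from an approach stage to an interaction stage. The handover trigger (a distance threshold), the proportional approach step, and every name (`V2FController`, `handover_radius`, `approach_gain`, `force_policy`) are illustrative assumptions; the source only states that a VLM localizes the target before the force-aware policy takes over.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


class Stage(Enum):
    APPROACH = "approach"
    INTERACTION = "interaction"


@dataclass
class V2FController:
    target_xyz: Vec3                # would come from a VLM localizer (hypothetical interface)
    handover_radius: float = 0.02   # assumed switch distance in metres
    approach_gain: float = 0.5      # assumed proportional gain for the approach step
    stage: Stage = Stage.APPROACH

    def step(self, ee_xyz: Vec3, force_policy: Callable[[Vec3], Vec3]) -> Vec3:
        """Return a motion command for the current end-effector position."""
        offset = tuple(t - p for t, p in zip(self.target_xyz, ee_xyz))
        distance = sum(d * d for d in offset) ** 0.5
        # One-way handover: once the policy takes over, it keeps control.
        if self.stage is Stage.APPROACH and distance <= self.handover_radius:
            self.stage = Stage.INTERACTION
        if self.stage is Stage.APPROACH:
            return tuple(self.approach_gain * d for d in offset)
        return force_policy(ee_xyz)


def press_down(_ee_xyz: Vec3) -> Vec3:
    """Placeholder for the force-aware policy: a small downward motion."""
    return (0.0, 0.0, -0.001)


controller = V2FController(target_xyz=(0.0, 0.0, 0.0))
far_cmd = controller.step((0.1, 0.0, 0.0), press_down)    # still approaching
near_cmd = controller.step((0.01, 0.0, 0.0), press_down)  # inside radius: handover
```

Making the switch one-way keeps the force-aware policy in charge even if contact pushes the end effector outside the radius; whether the actual system behaves this way is not stated in the source.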

Ablation (Stamp)

Variant | SR (%) | Force (N)
w/o Force History (1-step) | 55 | 15.50
w/o Force Prediction | 80 | 12.52
w/o Both (1-step + No Pred) | 40 | 18.21
ForceFlow (Full) | 85 | 10.61
Figure 6: OOD evaluation setup for force-side and vision-side generalization.

Task Demo Videos

All videos are shown at 1x speed.

Stamp

Plug

Press Button

USB Insert

Clean Whiteboard

Clean Vase

Qualitative Analysis: Cucumber Peeling

We qualitatively evaluate ForceFlow on a cucumber peeling task, where the cucumber is stabilized on V-groove supports and the blade must hold precise normal force along a varying-stiffness surface. Vision-centric baselines, misled by visual ambiguity, tend to either miss contact or over-press and jam the tool. ForceFlow instead uses temporal force history to detect the onset of resistance at first contact and settle at an optimal cutting depth, and then leverages active force prediction to adapt downward pressure to local bumps and tapering ends during sliding. The uniform, unbroken peeled strips evidence a stable interaction envelope and consistent cutting depth.

Figure 7: Cucumber peeling rollout (left) and peeled-skin strips collected across trials (right). The robot maintains stable normal force along the curved surface, producing continuous, uniform-width strips.
Video: ForceFlow performing cucumber peeling at 1x speed.

Data Collection

A short look at the teleoperated demonstration collection process used to build the ForceFlow dataset.

Teleoperated collection pipeline for ForceFlow demonstrations.

BibTeX

@misc{zhang2026forceflow,
  title={ForceFlow: Learning to Feel and Act via Contact-Driven Flow Matching},
  author={Shuoheng Zhang and Yifu Yuan and Hongyao Tang and YAN ZHENG and Qiaojun Yu and Pengyi Li and Guowei Huang and Helong Huang and Xingyue Quan and Jianye HAO},
  year={2026},
  eprint={2605.11048},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2605.11048}
}