RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation

Pingchuan Ma¹*, Tao Du¹*, Joshua B. Tenenbaum¹, Wojciech Matusik¹, and Chuang Gan²

¹MIT CSAIL ²MIT-IBM Watson AI Lab

Illustrations of the environments used in our experiments.

Abstract

This work considers identifying the parameters that characterize a physical system's dynamic motion directly from a video whose rendering configurations are inaccessible. Existing solutions either require massive training data or fail to generalize to unknown rendering configurations. We propose a novel approach that combines domain randomization with differentiable rendering gradients to address this problem. Our core idea is to train a rendering-invariant state-prediction (RISP) network that transforms image differences into state differences independent of rendering configurations, e.g., lighting, shadows, or material reflectance. To train this predictor, we formulate a new loss on rendering variances using gradients from differentiable rendering. Moreover, we present an efficient, second-order method to compute the gradients of this loss, allowing it to be integrated seamlessly into modern deep learning frameworks. We evaluate our method in rigid-body and deformable-body environments on four tasks: state estimation, system identification, imitation learning, and visuomotor control, including the challenging task of emulating the dexterous motion of a robotic hand from a video. Compared with existing methods, our approach achieves significantly lower errors in almost all tasks and generalizes better to unknown rendering configurations.
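The idea of a loss on rendering variances can be illustrated with a toy example. The sketch below is a minimal, hypothetical illustration and not the authors' implementation. It assumes a 1-D state s (the position of a bright blob on a strip of pixels), a hand-written differentiable renderer whose configuration psi = (brightness, ambient) stands in for lighting, and a linear predictor. Under these assumptions, it adds to the usual state-prediction error a penalty on how much the predictor's gradient with respect to the state, obtained through the chain rule from the renderer's analytic gradient, varies across rendering configurations. The names render, render_grad, predict, predictor_state_grad, and risp_style_loss are illustrative, and the exact norms and weighting used in the paper may differ.

```python
import math
from statistics import pvariance

# Hypothetical toy setup; not the RISP authors' implementation.
# A 1-D "renderer": a Gaussian blob at position s on a strip of pixels.
# psi = (brightness, ambient) stands in for rendering configurations such as lighting.
PIXELS = [i / 10.0 for i in range(11)]  # pixel centers in [0, 1]
WIDTH = 0.02  # blob width parameter, an arbitrary choice


def render(s, psi):
    """Image R(s, psi): one intensity per pixel."""
    brightness, ambient = psi
    return [brightness * math.exp(-(x - s) ** 2 / WIDTH) + ambient for x in PIXELS]


def render_grad(s, psi):
    """Analytic dR/ds: the differentiable-rendering gradient of each pixel w.r.t. the state."""
    brightness, _ = psi
    return [brightness * math.exp(-(x - s) ** 2 / WIDTH) * 2.0 * (x - s) / WIDTH
            for x in PIXELS]


def predict(theta, bias, image):
    """Hypothetical linear state predictor N_theta(I) = theta . I + bias."""
    return sum(t * p for t, p in zip(theta, image)) + bias


def predictor_state_grad(theta, s, psi):
    """Chain rule: d N_theta(R(s, psi)) / ds = theta . dR/ds."""
    return sum(t * g for t, g in zip(theta, render_grad(s, psi)))


def risp_style_loss(theta, bias, states, configs, gamma=0.1):
    """Mean squared state-prediction error plus gamma times the variance,
    across rendering configurations, of the predictor's gradient w.r.t. the state."""
    pred_err = 0.0
    grad_var = 0.0
    for s in states:
        for psi in configs:
            pred_err += (predict(theta, bias, render(s, psi)) - s) ** 2
        grads = [predictor_state_grad(theta, s, psi) for psi in configs]
        grad_var += pvariance(grads)  # population variance over configurations
    n = len(states)
    return pred_err / (n * len(configs)) + gamma * grad_var / n
```

With identical configurations the penalty vanishes, while varying the brightness makes it positive for any predictor whose state gradient is nonzero. In a realistic setting N would be a neural network, and differentiating this penalty with respect to its weights requires second-order derivatives; that is the cost the paper's efficient second-order method addresses, and which the linear toy model sidesteps.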

Paper


International Conference on Learning Representations (ICLR), 2022 [Oral]


Citation

@inproceedings{ma2021risp,
  title={RISP: Rendering-Invariant State Predictor with Differentiable Simulation and Rendering for Cross-Domain Parameter Estimation},
  author={Ma, Pingchuan and Du, Tao and Tenenbaum, Joshua B and Matusik, Wojciech and Gan, Chuang},
  booktitle={International Conference on Learning Representations},
  year={2022}
}

Results

The RISP reconstruction (left) from a real-world video clip of a quadcopter (right).

Imitation learning results from a target video (Target) using RISP (Ours) and other methods.