Memory-based manipulation with SAM2Act
Note
This software was modified after running the experiments. Functionality should still be the same.
The following covers inference and data collection with a Franka Emika Panda, a Robotiq gripper, and an Intel RealSense camera. Instructions for other robots may differ. For example, if you use the original Franka gripper, you need to change the robot configuration. Please see the paper and the project webpage for more details.
Installation
Installation of SAM2Act is covered in the SAM2Act code repository.
See the section Environment setup in the Readme.
You can install the extras “real” for sam2act with pip install -e '.[real]'.
This will automatically install RoBits and other dependencies.
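After installing, a quick check can confirm that the two command-line entry points used on this page, rb and sam2act-agent, are available on the PATH. The following is a minimal sketch; the helper name missing_entry_points is hypothetical and not part of SAM2Act or RoBits.

```python
import shutil

# Commands referenced on this page: `rb` and `sam2act-agent`.
EXPECTED_COMMANDS = ("rb", "sam2act-agent")


def missing_entry_points(commands=EXPECTED_COMMANDS):
    """Return the commands that cannot be found on the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_entry_points()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All entry points found.")
```

If a command is reported as missing, the most likely cause is that the environment in which the extras were installed is not the active one.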
For the robot setup, see the System and Robot setup section. In particular,
check the camera calibration with rb camera calibrate extrinsics. If
necessary, adjust the current camera calibration using the sliders.
Inference
1. Make sure that the system has booted the real-time kernel by running
   uname -a
   See System and Robot setup for more details.
2. Ensure that the robot is safe to operate.
3. The default configuration assumes that the robot is reachable at 172.16.0.2, which is the factory default. If you want to use the command to unlock the robot, either create another user in Franka’s web interface or update the configuration.
4. Turn the robot on and unlock it with
   rb panda unlock
5. Toggle the e-stop.
6. You can test your robot setup with
   rb info pose
   or
   rb move up
7. Load the weights and run the inference with the entry point sam2act-agent provided by sam2act. You can also specify the command options so you won’t be prompted:
   sam2act-agent --robot-name "robot_panda_real" --execution-mode auto --instruction "turn on the lamp" --model-path /home/markus/models_sam2act/model_real_lamp_1/model_9.pth
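The kernel check and the sam2act-agent call from the steps above can be combined into a small pre-flight script. This is a sketch under stated assumptions: it assumes that a real-time kernel advertises PREEMPT_RT (or PREEMPT RT) in its version string, which is common but not guaranteed, and the helper names is_realtime_kernel and build_agent_command are hypothetical. Only the command options shown above are used, and the command is printed for review rather than executed.

```python
import platform
import shlex
from pathlib import Path


def is_realtime_kernel(version: str) -> bool:
    """Heuristic: PREEMPT_RT kernels usually say so in the version string."""
    normalized = version.upper().replace(" ", "_")
    return "PREEMPT_RT" in normalized


def build_agent_command(instruction, model_path,
                        robot_name="robot_panda_real", execution_mode="auto"):
    """Assemble the sam2act-agent call with the options shown on this page."""
    return [
        "sam2act-agent",
        "--robot-name", robot_name,
        "--execution-mode", execution_mode,
        "--instruction", instruction,
        "--model-path", str(model_path),
    ]


if __name__ == "__main__":
    # platform.version() reports the kernel version string on Linux.
    if not is_realtime_kernel(platform.version()):
        print("Warning: kernel does not look like a PREEMPT_RT build.")
    model = Path("/home/markus/models_sam2act/model_real_lamp_1/model_9.pth")
    if not model.is_file():
        print(f"Warning: model checkpoint not found at {model}")
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(build_agent_command("turn on the lamp", model)))
```

Printing the command keeps the script free of side effects; the robot only moves once you run the printed command yourself.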
Training
Note
Documentation and best practices for real-world training will be available soon.
Since the keyframe heuristic relies on pauses, it is necessary to record and replay the trajectory or use scripted demonstrations. See Data collection for more details.
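To illustrate why pauses matter, the sketch below marks a timestep as a keyframe candidate when all joint velocities stay near zero for a few consecutive steps. This is an illustrative assumption about pause-based keyframe detection, not the SAM2Act implementation; the function name pause_keyframes, the velocity threshold, and the min_steps window are hypothetical.

```python
def pause_keyframes(joint_velocities, threshold=0.01, min_steps=3):
    """Return the indices at which a pause of at least `min_steps` steps begins.

    joint_velocities: sequence of per-timestep joint velocity vectors.
    """
    keyframes = []
    run = 0  # length of the current run of near-zero-velocity steps
    for t, velocities in enumerate(joint_velocities):
        if max(abs(v) for v in velocities) < threshold:
            run += 1
            if run == min_steps:
                keyframes.append(t - min_steps + 1)  # first step of the pause
        else:
            run = 0
    return keyframes


if __name__ == "__main__":
    # Toy trajectory: motion, a pause, more motion, a final pause.
    trajectory = [[0.5, 0.2]] * 4 + [[0.0, 0.0]] * 3 + [[0.3, 0.1]] * 2 + [[0.001, 0.0]] * 5
    print(pause_keyframes(trajectory))
```

A continuously hand-guided demonstration rarely contains such pauses, so a heuristic of this kind would find few or no keyframes in it. Replaying a recorded trajectory or scripting the demonstration makes the pauses explicit.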
References
`
SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation.
Haoquan Fang, Markus Grotz, Wilbert Pumacay, Yi Ru Wang, Dieter Fox, Ranjay Krishna, Jiafei Duan
International Conference on Machine Learning (ICML), 2025.
`