Memory-based manipulation with SAM2Act ====================================== .. note:: This software was modified after running the experiments. Functionality should still be the same. The following covers inference and data collection with a Franka Emika Panda, Robotiq gripper, and an Intel Realsense camera. Instructions for other robots might differ. For example, if you use the original Franka gripper you need to change the configuration for the robot. Please see the paper and the `project webpage `_ for more details. Installation ------------ Installation of **SAM2Act** is covered in the `SAM2Act code repository `_. See the section `Environment setup `_ in the Readme. You can install the extras "real" for sam2act with ``pip install -e '.[real]'``. This will automatically install **RoBits** and other dependencies. For the robot setup see the :doc:`system_setup` section for details on the installation. Especially, check the camera calibration with ``rb camera calibrate extrinsics``. If necessary, adjust the current camera calibration by adjusting the sliders. Inference --------- 1. Make sure that the system has booted the real-time kernel by running ``uname -a``. See :doc:`system_setup` for more details. 2. Ensure that the robot is safe to operate. The default configuration assumes that the robot is reachable on `172.16.0.2`, which is the factory default. Either create another user in Franka's webinterface or update the configuration if you want to use the cmd to unlock the robot. Turn the robot on and unlock it with ``rb panda unlock``. 3. Toggle the e-stop. You can test your robot setup with ``rb info pose`` or ``rb move up`` 4. Load the weights and run the inference with the entry point ``sam2act-agent`` provided by sam2act. You can also specify the command options so you won't be prompted. .. code-block:: bash sam2act-agent --robot-name "robot_panda_real" --execution-mode auto --instruction "turn on the lamp" --model-path /home/markus/models_sam2act/model_real_lamp_1/model_9.pth Training -------- .. note:: Documentation and best practice for real-world training will be available soon Since the keyframe heuristic relies on pauses it is necessary to record and replay the trajectory or use scripted demonstrations. See :doc:`data_collection` for more details. References ---------- ``` SAM2Act: Integrating Visual Foundation Model with A Memory Architecture for Robotic Manipulation. Haoquan Fang, Markus Grotz, Wilbert Pumacay, Yi Ru Wang, Dieter Fox, Ranjay Krishna, Jiafei Duan International Conference on Machine Learning (ICML), 2025. ```