A research project from InternRobotics.
Sim1 is a physics-aligned simulator and data stack for dual-arm cloth manipulation: teleoperation, diffusion-based data generation, replay, filtering, and optional photorealistic rendering, built on Newton and NVIDIA Warp. This repository contains the full pipeline, from interactive control and synthetic trajectory generation to rendering and LeRobot-style dataset export.
- Installation
- Quick Start — Interactive Teleoperation
- Quick Start — Data Generation
- Rendering Pipeline
- Data Conversion
- Project Structure
- TODO List
- Citation
- License
## Installation

Use Python 3.11 with conda (environment name `sim1`) and CUDA toolkit >= 12.4 if you want GPU acceleration.
Reference: Newton Installation Guide
```bash
conda create -n sim1 python=3.11 -y
conda activate sim1
```

Clone with submodules so `components/render/MeisterRender` (SIM1MeisterRender, `main` branch) is checked out automatically:
```bash
git clone --recurse-submodules https://github.com/InternRobotics/SIM1.git sim1
cd sim1
```

With `sim1` active, from the repository root:
```bash
conda activate sim1
bash setup.sh
```

All Python dependencies (simulation, DataGen, asset download helpers, the optional full render stack, and post-install checks) are installed by `setup.sh` only; no separate render dependency install is required. Open that file for the full list, the optional environment variables (`SIM1_SKIP_RENDER`, `TORCH_INDEX_URL`), and the exact pip commands. For render usage notes (not dependency installation), see `components/render/README.md`.
Run the download script once from the repository root. It downloads required assets into assets/.
```bash
# From the repository root (after setup.sh)
bash download_assets.sh
```

Verify the environment:

```bash
conda activate sim1
python -c "import newton; print('Newton version:', newton.__version__)"
python -c "import warp as wp; print('Warp OK')"
python -c "import torch, torchvision; print('torch', torch.__version__, 'cuda', torch.cuda.is_available())"
```

Newton smoke test (MuJoCo humanoid + `nv_humanoid.xml`; needs a display for the GL viewer):
```bash
cd newton
python newton/examples/robot/example_robot_humanoid.py
```

Equivalent: `python -m newton.examples robot_humanoid` (from the same `newton/` directory). The MJCF asset is `newton/examples/assets/nv_humanoid.xml`.
After `bash download_assets.sh`, you should see at least:

- `assets/acone/acone.urdf`
- `assets/cloth/short-shirt.usdc`
- `assets/model/flow_ckpt_three.pth`
- `assets/sim_teleoperated_npz/` (reference NPZ subset)
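Before launching anything, it can help to confirm the bundle downloaded completely. A minimal sketch, assuming only the file list above; this helper is not part of the repository:

```python
from pathlib import Path

# Minimal files expected after `bash download_assets.sh` (taken from the list above).
EXPECTED = [
    "acone/acone.urdf",
    "cloth/short-shirt.usdc",
    "model/flow_ckpt_three.pth",
    "sim_teleoperated_npz",
]

def missing_assets(assets_root: str) -> list[str]:
    """Return the expected asset paths that are absent under assets_root."""
    root = Path(assets_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]

if __name__ == "__main__":
    missing = missing_assets("assets")
    if missing:
        print("Missing assets:", ", ".join(missing))
    else:
        print("Asset bundle looks complete.")
```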
## Quick Start — Interactive Teleoperation

Launch a real-time interactive simulation with keyboard-driven dual-arm control:
```bash
python apps/teleoperation_app.py --task lift_manip_shirt
```

The key bindings below match the startup prompt printed by `apps/teleoperation_app.py`:
| Key | Action |
|---|---|
| `W/S`, `A/D`, `Q/E`, `X`, `Z/C` | Left gripper: front/back, left/right, down/up, toggle grip, pitch |
| `U/J`, `H/K`, `Y/I`, `N`, `B/M` | Right gripper: front/back, left/right, down/up, toggle grip, pitch |
| Arrow keys, mouse left-drag, scroll | Camera: move, look, zoom |
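One way to picture the table above is as a lookup from key to a per-arm end-effector delta. This is purely illustrative; the real handler lives in `apps/teleoperation_app.py`, and the axis names and `(arm, axis, sign)` encoding here are our assumptions, not the repo's code:

```python
# Illustrative only: the key table encoded as a lookup a keyboard handler
# could translate into signed per-arm moves. Not the repository's actual code.
KEYMAP = {
    # left gripper
    "w": ("left", "front", +1), "s": ("left", "front", -1),
    "a": ("left", "side",  +1), "d": ("left", "side",  -1),
    "q": ("left", "up",    -1), "e": ("left", "up",    +1),
    "x": ("left", "grip",   0),  # toggle, no displacement
    "z": ("left", "pitch", +1), "c": ("left", "pitch", -1),
    # right gripper
    "u": ("right", "front", +1), "j": ("right", "front", -1),
    "h": ("right", "side",  +1), "k": ("right", "side",  -1),
    "y": ("right", "up",    -1), "i": ("right", "up",    +1),
    "n": ("right", "grip",   0),
    "b": ("right", "pitch", +1), "m": ("right", "pitch", -1),
}

def step_delta(key: str, step: float = 0.01):
    """Map a key press to (arm, axis, signed displacement), or None if unbound."""
    entry = KEYMAP.get(key.lower())
    if entry is None:
        return None
    arm, axis, sign = entry
    return arm, axis, sign * step
```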
For headless or remote machines, enable WebSocket streaming:
```bash
python apps/teleoperation_app.py --task lift_manip_shirt --stream --host 0.0.0.0 --ws-port 8765 --http-port 8080
```

Then open `http://<server-ip>:8080` in a browser to view and control the simulation remotely.
## Quick Start — Data Generation

`run_pipeline.sh` runs the full data path in one shot: generate → Kalman smooth → replay (NPZ + USD) → filter. Generation uses the diffusion-policy path; the robot URDF and cloth USD come from the default Hugging Face bundle location `assets/` (see the asset download step in Installation). The script prints `HF assets : …` on startup. For extra diversity at replay, add `--position-randomize`.
From the repository root (the conda env, clone, and asset download steps are covered in Installation):

```bash
bash run_pipeline.sh --num 100

# Optional: randomize cloth pose at replay
bash run_pipeline.sh --num 100 --position-randomize
```
By default, DataGen reads references from `assets/sim_teleoperated_npz` (downloaded from Hugging Face). You can override the reference source with `--ref_npz_folder`. Generated trajectories are written under `--data_folder` (`gen/`, `gen/kf/`). Replay outputs are saved under `replay/pipeline_output_XXXX/` (the script prints the path). Use that session folder as `--root_dir` for the Rendering Pipeline.
All run_pipeline.sh options (advanced)
| Option | Description | Default |
|---|---|---|
| `--data_folder DIR` | Output data root (`gen/`, `gen/kf/`) | with `--ref_npz_folder`: `./dataset/example`; otherwise auto: `<SIM1_ASSETS_ROOT>/sim_teleoperated_npz` (fallback `./dataset/example`) |
| `--ref_npz_folder DIR` | Reference NPZ source for DataGen (`<DIR>/npz/*.npz` or `<DIR>/*.npz`) | unset (use `--data_folder` layout) |
| `--num N` | Trajectories to generate (DP pipeline only) | 10 |
| `--workers N` | Parallel workers (smooth + filters) | 8 |
| `--skip_smooth` / `--skip_replay` / `--skip_filter` | Skip a stage | off |
| `--folder_name NAME` | `replay/<NAME>_XXXX/` base name | `pipeline_output` |
| `--position-randomize` | Random cloth pose at replay; the joint filter also runs EE reachability (FK). Omit it and the joint filter uses `--no-check-ee` (jump / mutation checks only) | off |
| `--ref_usd PATH` | Reference USD for the aligned cloth filter (with randomization); auto-picked if omitted | auto |
| `--skip_asset_check` | Do not verify the HF bundle (`SIM1_ASSETS_ROOT`) before running | off |
```text
run_pipeline.sh
│
├─ 1. Generate → apps/datagen_app.py --use_dp --mode fine (DP only; fixed in script)
│       → <data_folder>/gen/*.npz
├─ 2. Smooth   → scripts/smooth_trajectory_multi_thread.py (Kalman; fixed variances in script)
│       → <data_folder>/gen/kf/*.npz
├─ 3. Replay   → apps/replay_app.py [--position-randomize]
│       → replay/<folder_name>_NNNN/{npz,usd}/
└─ 4. Filter   → filter_joint_unreachable.py (joint jump + first-5 mutation; + EE FK if --position-randomize)
        → filter_cloth_quality.py (aligned + --ref-usd if randomize, else direct)
```
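The smooth stage applies a Kalman filter per channel. As a 1-D illustration of the idea only (the actual variances and multi-threaded implementation live in `scripts/smooth_trajectory_multi_thread.py`; `q` and `r` below are arbitrary):

```python
def kalman_smooth_1d(zs, q=1e-4, r=1e-2):
    """Forward-pass scalar Kalman filter under a random-walk state model.

    zs: noisy measurements; q: process variance; r: measurement variance.
    Illustrative sketch only, not the repository's smoothing code.
    """
    x, p = zs[0], 1.0          # state estimate and its variance
    out = [x]
    for z in zs[1:]:
        p += q                 # predict: variance grows by process noise
        k = p / (p + r)        # Kalman gain
        x += k * (z - x)       # update toward the new measurement
        p *= (1.0 - k)         # shrink variance after the update
        out.append(x)
    return out
```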
```text
replay/
└── pipeline_output_0001/
    ├── npz/
    ├── usd/
    ├── npz_bad_cloth/
    ├── usd_bad_cloth/
    ├── npz_unreachable/          # joint / EE rejects (see filter_joint_unreachable.py logs)
    └── cloth_filter_summary.txt
```
Manual step-by-step (only if you are not using run_pipeline.sh)
1. Generate:

   ```bash
   python apps/datagen_app.py --data_folder assets/sim_teleoperated_npz --num 100 --use_dp --mode fine
   ```

2. Smooth:

   ```bash
   python scripts/smooth_trajectory_multi_thread.py assets/sim_teleoperated_npz/gen assets/sim_teleoperated_npz/gen/kf --method kalman --workers 8
   ```

3. Replay:

   ```bash
   python apps/replay_app.py assets/sim_teleoperated_npz/gen/kf --folder_name my_replay
   ```

   Optional cloth position randomization at replay: add `--position-randomize` (then use the matching manual filters in step 4).

4. Filter:

   4a. Joint / EE filter (add `--no-check-ee` to skip EE FK; joint checks always run):

   ```bash
   python scripts/filter_joint_unreachable.py ./replay/my_replay_0001/npz --usd-dir ./replay/my_replay_0001/usd --workers 8
   ```

   4b. Cloth quality (add `--ref-usd ...` if you used randomization):

   ```bash
   python scripts/filter_cloth_quality.py ./replay/my_replay_0001
   ```
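The joint-jump part of the filter can be pictured as a consecutive-frame delta check. A minimal sketch, assuming an illustrative threshold; the real thresholds and the first-5-frame mutation logic live in `scripts/filter_joint_unreachable.py`:

```python
def has_joint_jump(traj, max_step=0.2):
    """Flag a trajectory if any joint moves more than max_step (rad)
    between consecutive frames.

    traj: list of per-frame joint-value lists. Illustrative sketch only;
    max_step is an assumed value, not the repository's threshold.
    """
    for prev, curr in zip(traj, traj[1:]):
        if any(abs(c - p) > max_step for p, c in zip(prev, curr)):
            return True
    return False
```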
## Rendering Pipeline

Convert simulation USD output to photorealistic data: `main.py` runs Steps 1–3 by default (USD → blend → cameras → `blend_out/`). Step 4 (MeisterRender path tracing + LMDB) writes under `out_updated/<record_id>/`; run it via `batch_step4.sh`, or inline with `main.py --step4`.
MeisterRender lives in the git submodule `components/render/MeisterRender` (InternRobotics/SIM1MeisterRender, `main`). Clone with `git clone --recurse-submodules` (see Installation) so it is checked out automatically.
Environment: use the same `sim1` env; the render stack is installed by `setup.sh` unless you set `SIM1_SKIP_RENDER=1` (see the comments in `setup.sh`). For render usage and step notes, see `components/render/README.md`.
```bash
conda activate sim1

# One-click render on the latest replay/pipeline_output_XXXX
bash components/render/run_latest.sh

# Optional: use another replay prefix
# bash components/render/run_latest.sh --session-prefix my_run
```

Rendering resolves the HF bundle via `SIM1_ASSETS_ROOT` (default `<repo>/assets/`). HDRI / table / cloth glTF roots default to `assets/random/{bg,table,mat}/` inside that bundle (`scripts/sim1_asset_paths.py`); no extra export is required for the usual layout.
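The resolution order described above can be sketched as follows. The real logic is in `scripts/sim1_asset_paths.py`; the function name and return shape here are ours, and the subfolder names are read off the text above rather than verified against that script:

```python
import os
from pathlib import Path

def resolve_asset_roots(repo_root: str) -> dict[str, Path]:
    """SIM1_ASSETS_ROOT wins if set; otherwise fall back to <repo>/assets.
    HDRI / table / cloth glTF roots default to assets/random/{bg,table,mat}/.
    Illustrative sketch, not the repository's resolver."""
    bundle = Path(os.environ.get("SIM1_ASSETS_ROOT", str(Path(repo_root) / "assets")))
    random_root = bundle / "random"
    return {
        "bundle": bundle,
        "hdri": random_root / "bg",
        "table": random_root / "table",
        "cloth_gltf": random_root / "mat",
    }
```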
## Data Conversion

After Step 4 rendering, trajectories are stored under `replay/<session>/out_updated/<record_id>/` as LMDB + `meta_info.pkl`. To convert them into a LeRobot v2 dataset for training, use `components/lmdb2lerobot/`.
One-time environment setup (a separate conda env `lerobot`, Python 3.12 — see the full docs for details):
```bash
bash components/lmdb2lerobot/setup_conda_lerobot.sh
conda activate lerobot
```

Single session → LeRobot dataset:

```bash
# Auto-detect the latest replay/pipeline_output_XXXX, then:
#   src = <latest>/out_updated
#   out = <latest>/lerobot_dataset
bash components/lmdb2lerobot/run_local.sh

# Optional: explicit paths are still supported
# bash components/lmdb2lerobot/run_local.sh \
#   --src ./replay/my_session/out_updated \
#   --out ./replay/my_session/lerobot_dataset
```

This runs LMDB → LeRobot conversion, then sim2real, then removes near-static frames by default (pass `--keep-static-frames` to skip that step).
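A minimal sketch of the near-static-frame pruning idea: drop frames whose action barely moved since the last kept frame. The actual criterion and threshold are inside `components/lmdb2lerobot`; `eps` and the max-abs-delta metric here are assumptions for illustration:

```python
def keep_dynamic_frames(actions, eps=1e-3):
    """Return indices of frames whose action moved at least eps (max-abs
    delta per dimension) since the last *kept* frame. Frame 0 is always kept.

    actions: list of per-frame action vectors (lists of floats).
    Illustrative sketch, not the repository's pruning code.
    """
    kept = [0]
    for i in range(1, len(actions)):
        delta = max(abs(a - b) for a, b in zip(actions[i], actions[kept[-1]]))
        if delta >= eps:
            kept.append(i)
    return kept
```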
Batch / multi-GPU (optional): `components/lmdb2lerobot/run_batch.sh` — see `components/lmdb2lerobot/README.md`.
## Project Structure

```text
sim1/
├── setup.sh                  # Dependency installation
├── download_assets.sh        # Hugging Face -> assets/ (Sim1_Assets + Sim1_Dataset/sim_teleoperated_npz only)
├── run_pipeline.sh           # Data generation pipeline (generate → smooth → replay → filter)
├── apps/
│   ├── teleoperation_app.py  # Interactive teleoperation entry point
│   ├── datagen_app.py        # SIM1-DataGen entry (diffusion-policy mode)
│   ├── datagen_fine_app.py   # Optional fine-grained DataGen entry
│   └── replay_app.py         # Trajectory replay (headless)
├── replay_batch.sh           # Batch replay script
│
├── newton/                   # Newton physics engine (local install)
│   ├── pyproject.toml
│   └── newton/               # Newton source code
│
├── assets/                   # Robot URDFs, meshes, render assets
├── configs/                  # Task configuration files
├── envs/                     # Simulation environments
├── tasks/                    # Task definitions (cloth manipulation)
├── stream/                   # WebSocket streaming server + web UI
│
├── components/
│   ├── datagen/              # SIM1-DataGen core (splitter, selector, diffusion)
│   │   ├── datagen_core.py   # DataGenerator class
│   │   ├── splitter.py       # Trajectory splitter
│   │   ├── selector.py       # Segment selector
│   │   ├── traj_df/          # Diffusion model for trajectory generation
│   │   └── configs/          # Task split configurations
│   ├── function/             # Utility functions (FK, IK, video, analysis)
│   ├── randomization/        # Environment randomization
│   ├── recorder/             # Dual-arm data recorder
│   ├── render/               # USD → Blender → MeisterRender (git submodule) pipeline
│   └── lmdb2lerobot/         # LMDB → LeRobot v2 (+ sim2real + remove_static_frames by default)
│
├── scripts/                  # Post-processing scripts
│   ├── smooth_trajectory_multi_thread.py  # Kalman smoothing (used by run_pipeline.sh)
│   ├── filter_joint_unreachable.py        # Joint jump + optional EE reachability (see --no-check-ee)
│   ├── filter_cloth_quality.py            # Cloth-quality filter (used by run_pipeline.sh)
│   └── convert_ee_quat.py                 # EE pose conversion (used by datagen)
│
├── module_train/             # Training modules
│   ├── trajectory_discriminator/
│   └── trajectory_generator/
│
└── dataset/                  # Example datasets (npz, segments, etc.)
```
## TODO List

- Simulation assets — Robot URDFs, cloth meshes, render assets.
- Public datasets — Open-sourced a subset of trajectories and rendered data.
- Data generation pipeline — Supports one-command generate → smooth → replay → filter.
- Training utilities — Includes policy and trajectory training modules.
- Open-source 10,000 trajectories.
- Upgrade to latest Newton — Bump bundled `newton/` to upstream; adapt API changes in envs/tasks/components.
- Integrate libuipc solver — Optional libuipc cloth/deformable backend for richer contact and friction.
## Citation

If you use Sim1 (code, assets, or datasets) in research, please cite the paper below.
```bibtex
@misc{zhou2026sim1physicsalignedsimulatorzeroshot,
      title={SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds},
      author={Yunsong Zhou and Hangxu Liu and Xuekun Jiang and Xing Shen and Yuanzhen Zhou and Hui Wang and Baole Fang and Yang Tian and Mulin Yu and Qiaojun Yu and Li Ma and Hengjie Li and Hanqing Wang and Jia Zeng and Jiangmiao Pang},
      year={2026},
      eprint={2604.08544},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2604.08544},
}
```

## License

Unless otherwise noted, this repository is released under the Apache License 2.0, while language data is released under CC BY-NC-SA 4.0. Newton and other third-party components follow their own licenses; see, for example, `newton/LICENSE.md`.
