Efficient Object Rearrangement via Multi-view Fusion (ICRA 2024)

The user provides a single goal image of a tidy tabletop, and the robot rearranges the current cluttered tabletop to match it.

We propose a perception module, based on a visual localization pipeline, that associates the goal image with multi-view observations of the scene, which improves system efficiency. Because existing baselines were not open-sourced, we built the dataset and system framework from scratch.
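
As a rough illustration of what associating a goal image with multi-view observations can involve, the sketch below ranks observed views by how many local feature descriptors they share with the goal image, using mutual nearest-neighbour matching on cosine similarity. This is a generic retrieval-style step of the kind found in visual localization pipelines, not the method from the paper: the function names (cosine, mutual_matches, rank_views), the 0.8 similarity threshold, and the toy descriptors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def mutual_matches(goal_desc, view_desc, min_similarity=0.8):
    """Return (goal_idx, view_idx) pairs that are each other's nearest neighbour.

    min_similarity is a hypothetical threshold, not a value from the paper.
    """
    if not goal_desc or not view_desc:
        return []
    sim = [[cosine(g, v) for v in view_desc] for g in goal_desc]
    # Best view descriptor for every goal descriptor, and the reverse.
    best_for_goal = [
        max(range(len(view_desc)), key=lambda j: sim[i][j])
        for i in range(len(goal_desc))
    ]
    best_for_view = [
        max(range(len(goal_desc)), key=lambda i: sim[i][j])
        for j in range(len(view_desc))
    ]
    return [
        (i, j)
        for i, j in enumerate(best_for_goal)
        if best_for_view[j] == i and sim[i][j] >= min_similarity
    ]


def rank_views(goal_desc, views):
    """Return (index of the view with the most matches, match count per view)."""
    counts = [len(mutual_matches(goal_desc, v)) for v in views]
    best = max(range(len(counts)), key=counts.__getitem__)
    return best, counts


# Toy descriptors: one view shares no features with the goal, the other
# contains the same features in a different order.
goal = [[1, 0, 0, 0], [0, 1, 0, 0]]
far_view = [[0, 0, 1, 0], [0, 0, 0, 1]]
near_view = [[0, 1, 0, 0], [1, 0, 0, 0]]
best_view, match_counts = rank_views(goal, [far_view, near_view])
```

In a real system the descriptors would come from a learned or hand-crafted feature extractor, and the matched view would typically feed a pose-estimation step; both are outside the scope of this sketch.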