ObjectFolder models the multisensory behaviors of real objects through two datasets: 1) ObjectFolder 2.0, a dataset of 1,000 neural objects encoded as implicit neural representations with simulated multisensory data, and 2) ObjectFolder Real, a dataset of multisensory measurements for 100 real-world household objects, built on a newly designed pipeline for collecting the 3D meshes, videos, impact sounds, and tactile readings of real-world objects. The project also includes a standard benchmark suite of 10 tasks for multisensory object-centric learning, centered around object recognition, reconstruction, and manipulation with sight, sound, and touch. We open-source both datasets and the benchmark suite to catalyze and enable new research in multisensory object-centric learning in computer vision, robotics, and beyond.

ObjectFolder Datasets


    ObjectFolder Real

  • 100 real-world household objects.
  • A tailored data collection pipeline for each modality.
  • For vision, we scan the 3D meshes of objects in a dark room and record HD videos of each object rotating in a lightbox.
  • For audio, we build a professional anechoic chamber with a tailored object platform and then collect impact sounds by striking the objects at different surface locations with an impact hammer.
  • For touch, we equip a Franka Emika Panda robot arm with a GelSight robotic finger and collect tactile readings at the exact surface locations where impact sounds are collected.
  • Explore ObjectFolder Real

    ObjectFolder 2.0

  • 1,000 multisensory neural objects.
  • Each neural object is represented by an Object File, a compact neural network that encodes the object's intrinsic visual, acoustic, and tactile sensory data.
  • Querying the Object File with extrinsic parameters (e.g., camera viewpoint and lighting conditions for vision, impact location and strength for audio, contact location and gel deformation for touch), we can obtain the corresponding sensory signal at a particular location or condition.
  • Explore ObjectFolder 2.0
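To make the querying idea concrete, here is a minimal sketch of how an Object File can be queried per modality. The sub-network sizes, input encodings, and method names below are illustrative assumptions, not the actual ObjectFolder 2.0 API; the real Object Files are trained implicit neural representations, whereas this sketch uses randomly initialized placeholder networks just to show the query interface.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(in_dim, hidden, out_dim):
    """Build a tiny 2-layer MLP with random placeholder weights."""
    w1, b1 = rng.standard_normal((in_dim, hidden)), np.zeros(hidden)
    w2, b2 = rng.standard_normal((hidden, out_dim)), np.zeros(out_dim)
    def forward(x):
        h = np.maximum(x @ w1 + b1, 0.0)  # ReLU hidden layer
        return h @ w2 + b2
    return forward

class ObjectFile:
    """One neural object: three implicit sub-networks, one per modality.
    All dimensions here are hypothetical, chosen only for illustration."""
    def __init__(self):
        # Vision: camera viewpoint [3] + lighting direction [3] -> RGB value
        self.vision = mlp(6, 32, 3)
        # Audio: impact location [3] + impact strength [1] -> short waveform
        self.audio = mlp(4, 32, 128)
        # Touch: contact location [3] + gel deformation [1] -> 8x8 tactile map
        self.touch = mlp(4, 32, 64)

    def query_vision(self, viewpoint, lighting):
        return self.vision(np.concatenate([viewpoint, lighting]))

    def query_audio(self, impact_loc, strength):
        return self.audio(np.concatenate([impact_loc, [strength]]))

    def query_touch(self, contact_loc, deformation):
        return self.touch(np.concatenate([contact_loc, [deformation]])).reshape(8, 8)

# Query one neural object with extrinsic parameters for each modality.
obj = ObjectFile()
rgb = obj.query_vision(np.array([0.0, 0.0, 1.0]), np.array([1.0, 1.0, 0.0]))
wave = obj.query_audio(np.array([0.1, 0.2, 0.3]), 0.5)
tactile = obj.query_touch(np.array([0.1, 0.2, 0.3]), 0.02)
print(rgb.shape, wave.shape, tactile.shape)  # (3,) (128,) (8, 8)
```

The design point the sketch captures is that each object is a single compact network rather than stored raw data: any viewpoint, impact, or contact condition can be queried on demand, so the sensory data need not be enumerated ahead of time.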

    ObjectFolder Benchmarks

    Object Recognition

  • Cross-Sensory Retrieval
  • Contact Localization
  • Material Classification
  • Explore Object Recognition Tasks

    Object Reconstruction

  • 3D Shape Reconstruction
  • Sound Generation of Dynamic Objects
  • Visuo-Tactile Cross-Generation
  • Explore Object Reconstruction Tasks

    Object Manipulation

  • Grasp-Stability Prediction
  • Contact Refinement
  • Surface Traversal
  • Dynamic Pushing
  • Explore Object Manipulation Tasks

    The ObjectFolder Team

    Ruohan Gao
    Yiming Dou
    Hao Li
    Yen-Yu Chang
    Zilin Si
    Tanmay Agarwal
    Samuel Clarke
    Shivani Mall
    Yunzhu Li
    Jeannette Bohg
    Wenzhen Yuan
    Li Fei-Fei
    Jiajun Wu