Manipulation Data

These recordings and data-files are part of a dataset collected during the development of the TOMSY Collaborative Project funded by the European Union under the 7th Framework Programme (FP7).

This particular set of files is a self-contained collection of recordings and data-files that describe a manipulation both visually and geometrically. In line with the TOMSY project’s goals, these recordings focus on interactions that modify the topological structure (e.g. open/close object, separate/unite object), kinematic state (e.g. fold/unfold box) and/or relational state (e.g. on-top-of, inside) of the objects in the workspace.

In each compressed folder, you will find:

  • Video files of the front and side views of the workspace, as provided by high-resolution, high-fps industrial uEye USB3 cameras.
  • Video files of a front view of the workspace, as both a colour video and a depth map, as provided by one Microsoft Kinect.
  • Precise ground-truth tracking data of the human hands and other involved objects, as provided by the Polhemus G4 magnetic trackers. A video showing each sensor’s trajectory during the motion sequence is also provided for showcase purposes.
  • Timestamp files for each of the above, indicating the most precise available time of capture for each data frame.
  • A mosaic video showing all of the above videos simultaneously, for an overall view of the recorded manipulation.

uEye Cameras

The uEye USB3.0 Industrial cameras record 1280×1024 frames at 60 fps.

Each camera is identified by a number (e.g. ueye_rgb_1, ueye_rgb_2, etc.). Note that this ID does not necessarily indicate the camera’s position during the recording (front, left or right), and that the position may change between recordings.

Microsoft Kinect

The Microsoft Kinect hardware records 640×480 frames at 30 fps.

The depth measurement is packed bit-by-bit and is also provided in video format. The Kinect provides 12 bits of depth information at each pixel location, and these bits are packed into an RGB block. The packing method may differ between folders; each folder has its own README file with more information on this aspect.
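As an illustration of what unpacking a 12-bit depth value from an RGB pixel can look like, here is a minimal sketch. The bit layout used here (upper 4 bits in the low nibble of R, lower 8 bits in G, B unused) is purely an assumption for the example, and the function names are hypothetical; the actual layout for a given folder is the one described in that folder's README.

```python
def unpack_depth(r, g, b):
    """Recover a 12-bit depth value from one RGB pixel.

    ASSUMED layout (hypothetical, check the folder's README):
    the 4 most significant depth bits are in the low nibble of R,
    the 8 least significant bits are in G, and B is ignored.
    """
    return ((r & 0x0F) << 8) | (g & 0xFF)


def pack_depth(depth):
    """Inverse of unpack_depth under the same assumed layout."""
    if not 0 <= depth < 4096:
        raise ValueError("depth must fit in 12 bits")
    return (depth >> 8) & 0x0F, depth & 0xFF, 0
```

If a folder uses a different layout, only the shifts and masks in these two functions should need to change.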

G4 Polhemus Magnetic Trackers

The Polhemus G4 Tracker provides position and orientation measurements for each sensor in the scene. In the .dat files, each frame in time is represented as an NS×7 matrix, where NS is an upper bound on the number of sensors in the scene (which means that some rows are dummies and do not contain actual data).

Each row is divided into a position part (first 3 elements) and a quaternion part (last 4 elements). Together, these determine the pose of the sensor with respect to the world coordinate frame, whose origin lies approximately on the table surface.
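A position plus a unit quaternion can be turned into a 4×4 homogeneous transform, which is often the more convenient form for further processing. The sketch below shows one way to do this for a single 7-element row. It assumes the quaternion is stored in (w, x, y, z) order, which is not stated above and should be verified against the data; the function name is hypothetical.

```python
def pose_to_matrix(row):
    """Convert one 7-element row into a 4x4 homogeneous transform.

    ASSUMPTION: the quaternion is ordered (w, x, y, z). If the files
    use (x, y, z, w) instead, change the unpacking on the next line.
    """
    px, py, pz, w, x, y, z = row
    norm = (w * w + x * x + y * y + z * z) ** 0.5
    if norm == 0.0:
        raise ValueError("zero-norm quaternion, cannot build a rotation")
    # Normalise so that small numerical drift does not skew the rotation.
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w), px],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w), py],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y), pz],
        [0.0, 0.0, 0.0, 1.0],
    ]
```

The resulting matrix maps points from the sensor frame to the world frame, under the ordering assumption stated above.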

Each hand carries 4 sensors. For example, for the right hand (called “rh”):

  • one sensor on the thumb (“rh:thumb”).
  • one sensor on the index finger (“rh:index”).
  • one sensor on the middle finger (“rh:middle”).
  • one sensor on the back (“rh:back”).

For other types of objects, the number and position of the sensors depend on the kinematic structure of the object itself, with the underlying assumption that one additional sensor is used for each kinematic degree of freedom beyond simple position and orientation.

To determine which row of the frame matrix corresponds to which sensor, use the meta file, which describes object identities and sensor-object associations. The meta file associates each sensor name with a hub id (hid) and a sensor id (sid); the row of the sensor within the frame matrix is row = 3*hid + sid.
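The row formula can be sketched in a few lines. In the example below, the sensor table is a hypothetical stand-in for what would be read from a meta file: the (hid, sid) values are made up, and the sketch assumes that ids and rows are both counted from zero, which should be checked against the actual files.

```python
def sensor_row(hid, sid):
    """Row of a sensor in the frame matrix, using row = 3*hid + sid."""
    return 3 * hid + sid


# Hypothetical sensor table, standing in for the contents of a meta file.
# The (hid, sid) values below are invented for illustration only.
sensors = {
    "rh:thumb": (0, 0),
    "rh:index": (0, 1),
    "rh:back": (1, 0),
}


def sensor_pose(frame, name, table=sensors):
    """Return the 7-element row of the named sensor from one frame matrix."""
    hid, sid = table[name]
    return frame[sensor_row(hid, sid)]
```

Here `frame` is any NS×7 sequence of rows, so `sensor_pose(frame, "rh:back")` would return row 3 under the made-up table above.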

Timestamp Files

Each data file (.mp4 or .dat) comes with its own timestamp file, which has the same name as the original data file with the suffix ‘.times’.

The format of the timestamp files is simple: each line contains two values:

  • the first is the number of the data-frame as counted by the specific hardware (as opposed to the index of the data-frame in the data-stream). If the hardware ever failed to capture a frame, the failure shows up as non-consecutive values of this first number.
  • the second is the capture time of that frame, measured in seconds from the beginning of that recording day.
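The two-column format lends itself to a simple check for dropped frames. The following sketch parses the lines of a timestamp file and reports where the hardware frame number skips. It assumes the two values on each line are separated by whitespace and that the frame number is an integer; the function names are hypothetical.

```python
def parse_times(lines):
    """Parse timestamp lines into (frame_number, seconds) pairs.

    ASSUMPTION: the two values on each line are whitespace-separated.
    """
    records = []
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        records.append((int(parts[0]), float(parts[1])))
    return records


def find_gaps(records):
    """Return (previous, current) frame numbers that are not consecutive."""
    gaps = []
    for (prev, _), (cur, _) in zip(records, records[1:]):
        if cur != prev + 1:
            gaps.append((prev, cur))
    return gaps
```

Since `parse_times` accepts any iterable of strings, an open file object can be passed to it directly.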