|
Please cite: | Brostow, Fauqueur, Cipolla,
"Semantic Object Classes in Video: A High-Definition Ground Truth
Database" (submitted for review to Pattern Recognition Letters, Special Issue on Video-based Object and Event Analysis, 2007) |
|||||
Description: |
|
The Cambridge-Toyota Labeled
Video Database (CamToy) is the first collection of videos with object
class semantic labels, complete with metadata. The database provides
ground truth labels that associate each pixel with one of 32 semantic classes. The database addresses the need for experimental data to quantitatively evaluate emerging algorithms. While most videos are filmed with fixed-position CCTV-style cameras, our data was captured from the perspective of a driving automobile. The driving scenario increases the number and heterogeneity of the observed object classes. Over ten minutes of high-quality 30Hz footage is provided, with corresponding semantically labeled images at 1Hz and, in part, 15Hz.

The CamToy Database offers four contributions that are relevant to object analysis researchers. First, the per-pixel semantic segmentation of over 700 images was specified manually, and was then inspected and confirmed by a second person for accuracy. Second, the high-quality, high-resolution color video images in the database provide valuable extended-duration digitized footage for those interested in driving scenarios or ego-motion. Third, we filmed calibration sequences for the camera color response and intrinsics, and computed a 3D camera pose for each frame in the sequences. Finally, in support of expanding this or other databases, we offer custom-made labeling software to assist users who wish to paint precise class labels for other images and videos.

We evaluated the relevance of the database by measuring the performance of an algorithm from each of three distinct domains: multi-class object recognition, pedestrian detection, and label propagation. |
|||||
Overview Video: |
Avi, 30 MB, Xvid compressed (see the playback tips, or get the free Mac/Windows player), or Mpg, 11 MB, MPEG-1 compressed (more compatible, but lower quality) |
||||||
|
|||||||
CamToy Database (only samples until accepted for publication) |
|||||||
Original Video Sequences: |
|||||||
seq06R0 Description: 3030 frames at 30Hz == 1:41 min; Sample Frame; Preview Video in MPG1; Video File in MXF format* |
|||||||
seq16E5
Description: 6120 frames at 30Hz == 3:24 min; Sample Frame; Preview Video in MPG1; Video File in MXF format*
seq16E5_15Hz
Description: 202 frames at 30Hz == 0:06 min; Sample Frame; Preview Video in MPG1; Video File in MXF format* |
|||||||
seq05VD
Description: 5130 frames at 30Hz == 2:51 min; Sample Frame; Preview Video in MPG1; Video File in MXF format* |
|||||||
seq01TP Description: 3720 frames at 30Hz == 2:04 min; Sample Frame; Preview Video in MPG1; Video File in MXF format* |
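The durations listed above follow directly from the frame counts and the 30Hz frame rate. A minimal sketch of that arithmetic is given here; the function name `duration_mmss` is a hypothetical helper for illustration, not part of the database tools.

```python
def duration_mmss(frames: int, fps: int = 30) -> str:
    """Convert a frame count at a given frame rate to an m:ss string.

    Fractional seconds are truncated, which matches how the
    durations are listed on this page (e.g. 202 frames -> 0:06).
    """
    seconds = frames // fps
    return f"{seconds // 60}:{seconds % 60:02d}"


# Frame counts taken from the sequence descriptions above.
for name, frames in [("seq06R0", 3030), ("seq16E5", 6120),
                     ("seq05VD", 5130), ("seq01TP", 3720)]:
    print(name, duration_mmss(frames))
```

The same helper can be used to sanity-check a downloaded sequence once its frames have been counted.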
|||||||
Intrinsic Calibration: |
(Temporary: currently included in the Camera
Pose data below; it will also be provided separately) |
||||||
Camera Pose Trajectories: |
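Example camera pose
trajectory, stored in Boujou Animation Format: each line containing "AddDecompCameraKey" stores a K matrix, an R matrix, and a t vector, such that P = K * R * [I -t], where [I -t] is the 3x4 matrix formed by the 3x3 identity followed by the column -t |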
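As a rough illustration of the formula P = K * R * [I -t], the sketch below composes the 3x4 projection matrix and projects a 3D point into the image. It assumes K, R, and t have already been read from an "AddDecompCameraKey" line; parsing the file is not shown, because the exact field layout is not described here. The numeric values and the helper names (`matmul`, `projection_matrix`, `project`) are hypothetical. Under this convention, t acts as the camera center in world coordinates, since P maps the point t to the zero vector.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product (a is n x m, b is m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def projection_matrix(K: Matrix, R: Matrix, t: Sequence[float]) -> Matrix:
    """Compose the 3x4 matrix P = K * R * [I -t]."""
    i_minus_t = [[1.0, 0.0, 0.0, -t[0]],
                 [0.0, 1.0, 0.0, -t[1]],
                 [0.0, 0.0, 1.0, -t[2]]]
    return matmul(matmul(K, R), i_minus_t)


def project(P: Matrix, X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D world point X to pixel coordinates (u, v)."""
    Xh = list(X) + [1.0]  # homogeneous coordinates
    x, y, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return x / w, y / w


# Hypothetical example values, NOT taken from the database.
K = [[100.0, 0.0, 320.0], [0.0, 100.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [1.0, 2.0, 0.0]

P = projection_matrix(K, R, t)
u, v = project(P, [1.0, 2.0, 10.0])  # point straight ahead of the camera
print(u, v)
```

A point directly in front of the camera center lands on the principal point of K, which is a quick way to check that a parsed pose uses the expected convention.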
||||||
Class
Labels and Pseudocolors: |
Listing of (RGB)-Class assignments (alphabetical); Listing in color-order used by MSRC (with "XX")
|
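Because each labeled frame encodes the class of a pixel as a pseudocolor, a typical first step is to convert RGB values into integer class indices. The sketch below shows one possible way to do this. The palette entries (`ClassA`, `ClassB`, `ClassC`) and their colors are hypothetical placeholders; the actual 32 names and RGB triples should be taken from the listing above. Indexing classes in alphabetical order is also only an assumption, chosen to mirror the alphabetical listing.

```python
from typing import Dict, List, Sequence, Tuple

RGB = Tuple[int, int, int]

# Hypothetical placeholder palette; replace with the real (RGB)-Class listing.
EXAMPLE_PALETTE: Dict[str, RGB] = {
    "ClassA": (255, 0, 0),
    "ClassB": (0, 255, 0),
    "ClassC": (0, 0, 255),
}


def build_color_index(palette: Dict[str, RGB]) -> Dict[RGB, int]:
    """Map each pseudocolor to an integer index, classes sorted by name."""
    return {rgb: idx for idx, (_, rgb) in enumerate(sorted(palette.items()))}


def decode_labels(pixels: Sequence[Sequence[RGB]],
                  lookup: Dict[RGB, int],
                  unknown: int = -1) -> List[List[int]]:
    """Convert a 2D grid of RGB pixels to class indices.

    Colors missing from the palette are mapped to `unknown`.
    """
    return [[lookup.get(tuple(px), unknown) for px in row] for row in pixels]


lookup = build_color_index(EXAMPLE_PALETTE)
tiny_image = [[(255, 0, 0), (0, 0, 255)],
              [(0, 255, 0), (9, 9, 9)]]
print(decode_labels(tiny_image, lookup))
```

Mapping unlisted colors to a sentinel value, rather than raising an error, makes it easy to spot pixels whose color was altered, for example by lossy compression or resizing.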
||||||
Hand-Labeled Frames: |
|||||||
seq06R0 |
|||||||
seq16E5
seq16E5_15Hz Description: 101 frames at 15Hz == 0:06 min; Sample Frame; Preview Video in MPG1; Video as zipped PNGs |
|||||||
seq05VD
Description: 101 frames at 1Hz == 1:41 min; Sample Frame; Preview Video in MPG1; Video as zipped PNGs |
|||||||
seq01TP
Description: 124 frames at 1Hz == 2:04 min; Sample Frame; Preview Video in MPG1; Video as zipped PNGs |
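The labeled frame counts are consistent with sampling the 30Hz video at a fixed stride: every 30th frame for 1Hz labels and every 2nd frame for 15Hz labels. The sketch below computes candidate frame indices under that assumption. The helper name `labeled_frame_indices` and the `offset` parameter are hypothetical, and the page does not state which frame within each interval was actually labeled, so starting at frame 0 is a guess.

```python
from typing import List


def labeled_frame_indices(n_frames: int, video_hz: int = 30,
                          label_hz: int = 1, offset: int = 0) -> List[int]:
    """Return indices of video frames that would carry labels.

    Assumes a constant stride of video_hz / label_hz frames,
    starting at `offset` (the true starting frame is not documented).
    """
    if video_hz % label_hz != 0:
        raise ValueError("label rate must evenly divide the video rate")
    step = video_hz // label_hz
    return list(range(offset, n_frames, step))


# Frame counts taken from the sequence descriptions on this page.
print(len(labeled_frame_indices(3720)))               # seq01TP at 1Hz
print(len(labeled_frame_indices(202, label_hz=15)))   # seq16E5_15Hz
```

The counts returned for seq01TP and seq16E5_15Hz can be compared with the 124 and 101 hand-labeled frames listed above.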
|||||||
Paint-Stroke Logs of Manual
Labeling: |
Example log file, where each
of the user's mouse-strokes was recorded to include: the class label being applied, size and type of brush or pre-segmentation used, location of each click point and drag-path, and duration for each stroke. |
||||||
InteractLabeler Software: |
InteractLabeler.zip for
Windows (3.4 MB) |
||||||
*MXF format: |
This format is like Avi or
Quicktime in that it is a wrapper for multimedia files. In our case,
only the video channel contains data, and it is in HD format. To decode, use this
utility (link) along with the scripts provided. |