Database Overview

For the purposes of experimental evaluation with eye-tracking data, and since only a few databases with audiovisual eye-tracking data exist, we decided to collect such data for two databases, SumMe [1] and ETMD [2]. The SumMe database contains 25 unstructured videos, while ETMD contains 12 videos from six different Hollywood movies, for a total of 37 videos lasting approximately 2 h (171,000 frames). Because of this total length, the participants and the videos were each split into two equivalent groups, containing half of the people and half of the videos, respectively. Thus, each video was seen by 10 different subjects. The subjects were recruited through the National Technical University of Athens, with ages ranging from 23 to 55 years (mean 35). Almost all subjects were naive as to the purposes of the experiment, and they all had normal vision. The employed videos ranged from 38 to 388 s in length and were converted from their original sources to MOV video format.

Data collection procedure

Eye movements were monitored binocularly via an SR Research EyeLink 2000 desktop-mounted eye-tracker with a 1000 Hz sampling rate. Videos were displayed on a 1600 × 900 monitor at a distance of 90 cm from the viewer. Audio was delivered in stereo through headphones. A chin and head rest was used during the experiment to minimize the viewer's movement and avoid continuous recalibration. Presentation was controlled using the SR Research Experiment Builder software. The subjects were informed only that they would watch some videos and that they should avoid moving during video playback. The order of the clips was randomized across participants. The whole experimental procedure for each participant lasted approximately 90 min, including instructions, calibration, testing, and short breaks if needed.

Regarding calibration, a 13-point binocular calibration preceded the experiment. Before each video, if the central fixation error exceeded a pre-defined threshold of 0.5°, a full calibration was repeated. The central fixation marker also served as a cue for the participant and offered an optional break point in the procedure. After checking for a central fixation, the start of each trial was triggered manually.

Regarding post-processing, the 1000 Hz raw eye-tracking recordings were downsampled to match each video's frame rate. One sample frame per video with its corresponding eye-tracking data superimposed, together with the distribution of the eye-tracking data over the whole video, can be found in Fig. 1 and Fig. 2 for all videos of the SumMe and ETMD databases. The data are publicly released and can be downloaded using the link below.
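The post-processing step above states only that the 1000 Hz recordings were downsampled to each video's frame rate; the exact method is not described. The following is a minimal sketch of one plausible approach, assuming that all valid samples falling within a frame's time interval are averaged and that missing samples (e.g. blinks) are marked as None. The function name downsample_gaze and its parameters are hypothetical and not part of the released data or tools.

```python
from statistics import mean

def downsample_gaze(samples, sample_rate_hz, fps, n_frames):
    """Average raw gaze samples that fall inside each video frame interval.

    samples: list of (x, y) tuples recorded at sample_rate_hz; a missing
             sample (assumed here to mark e.g. a blink) is given as None.
    Returns a list with n_frames entries, each an (x, y) tuple or None
    when no valid sample exists for that frame.
    NOTE: averaging is an assumption; the source does not state the method.
    """
    frames = []
    for k in range(n_frames):
        # Index range of the raw samples covering frame k.
        start = round(k * sample_rate_hz / fps)
        end = round((k + 1) * sample_rate_hz / fps)
        valid = [s for s in samples[start:end] if s is not None]
        if valid:
            frames.append((mean(x for x, _ in valid),
                           mean(y for _, y in valid)))
        else:
            frames.append(None)
    return frames

# Toy input: 80 ms of made-up 1000 Hz data for a 25 fps video (40 samples per frame).
raw = [(float(i), 2.0 * i) for i in range(80)]
per_frame = downsample_gaze(raw, sample_rate_hz=1000, fps=25, n_frames=2)
```

Picking the single sample closest to each frame timestamp would be an equally reasonable alternative; averaging simply reduces sample-level noise at the cost of blurring saccades that occur within a frame.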

Data annotation

SumMe database
- 25 unstructured videos from YouTube, etc. (two of them without audio)
- eye-tracking data from 10 viewers
- free viewing
- video list:
  Air_Force_One.mp4
  Base_jumping.mp4
  Bearpark_climbing.mp4
  Bike_Polo.mp4
  Bus_in_Rock_Tunnel.mp4
  car_over_camera.mp4
  Car_railcrossing.mp4
  Cockpit_Landing.mp4
  Cooking.mp4
  Eiffel_Tower.mp4
  Excavators_river_crossing.mp4
  Fire_Domino.mp4
  Jumps.mp4
  Kids_playing_in_leaves.mp4
  Notre_Dame.mp4
  Paintball.mp4
  paluma_jump.mp4
  playing_ball.mp4
  Playing_on_water_slide.mp4
  Saving_dolphines.mp4
  Scuba.mp4
  Statue_of_Liberty.mp4
  St_Maarten_Landing.mp4
  Uncut_Evening_Flight.mp4
  Valparaiso_Downhill.mp4

ETMD database
- 12 Hollywood movie videos
- eye-tracking data from 10 viewers
- free viewing
- video list:
  CHI_1_color.avi
  CHI_2_color.avi
  CRA_1_color.avi
  CRA_2_color.avi
  DEP_1_color.avi
  DEP_2_color.avi
  FNE_1_color.avi
  FNE_2_color.avi
  GLA_1_color.avi
  GLA_2_color.avi
  LOR_1_color.avi
  LOR_2_color.avi

Eye-tracking data structure

Folder: audio_stereo
- Contains the stereo audio tracks in wav format (16-bit, 44100 Hz)

Folder: audio_mono
- Contains the one-channel audio tracks in wav format (16-bit, 44100 Hz). These are the same tracks as above, converted to one-channel audio.

Folder: video
- Contains the videos in their original format for both the SumMe and ETMD databases

Folder: eyetracking
- Contains the collected eye-tracking data

Matlab structure: all_videos.mat
eye_data_all.(video_name) --> 3D array: 2 x Nframes x Nparticipants
  1st dimension: (x,y) coordinates
  2nd dimension: frame number
  3rd dimension: participant number (from 1 to 10)

For example, eye_data_all.Base_jumping(:,456,4) returns the (x,y) coordinates of the eye-tracking data for frame 456 as viewed by the 4th participant.
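For readers working outside Matlab, the 2 x Nframes x Nparticipants layout described above can be illustrated with a short sketch. It assumes the array has already been loaded into nested Python lists indexed as [coordinate][frame][participant]; loading the .mat file itself is not shown. The helper names gaze_at and gaze_all_participants and the toy values are hypothetical. Note that Matlab indices are 1-based while Python indices are 0-based, so the helpers subtract 1.

```python
def gaze_at(eye_data, frame, participant):
    """Return (x, y) for a 1-based frame and participant number.

    Mirrors the Matlab expression eye_data_all.<video>(:, frame, participant).
    eye_data is assumed to be nested lists shaped [2][n_frames][n_participants].
    """
    x = eye_data[0][frame - 1][participant - 1]
    y = eye_data[1][frame - 1][participant - 1]
    return (x, y)

def gaze_all_participants(eye_data, frame):
    """Return the list of (x, y) points of every participant for one frame."""
    xs = eye_data[0][frame - 1]
    ys = eye_data[1][frame - 1]
    return list(zip(xs, ys))

# Toy array with made-up values: 2 coordinates x 3 frames x 2 participants.
toy = [
    [[10, 11], [20, 21], [30, 31]],  # x coordinates
    [[15, 16], [25, 26], [35, 36]],  # y coordinates
]

print(gaze_at(toy, 2, 1))              # (20, 25)
print(gaze_all_participants(toy, 3))   # [(30, 35), (31, 36)]
```

Collecting the points of all participants for a frame, as in gaze_all_participants, is the usual starting point for building a per-frame fixation map.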

Download

You can download this database by clicking this link:
If you use the corpus, please cite:
For more information, please email Antigoni Tsiami (antsiami@cs.ntua.gr)

References

[1] M. Gygli, H. Grabner, H. Riemenschneider, L. Van Gool, Creating summaries from user videos, in: Proc. European Conf. on Computer Vision, 2014, pp. 505-520.
[2] P. Koutras, P. Maragos, A perceptually based spatio-temporal computational framework for visual saliency estimation, Signal Process.: Image Commun., 2015, pp. 15-31.