Fast forwarding Egocentric Videos by Listening and Watching
2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Sight and Sound Workshop
Abstract
The remarkable technological advance in well-equipped wearable devices is pushing an increasing production of long first-person videos. However, since most of these videos contain long and tedious parts, they are forgotten or never watched. Despite the large number of techniques proposed to fast-forward these videos by highlighting relevant moments, most of them are image-based only, disregarding other relevant sensors present in current devices, such as high-definition microphones. In this work, we propose a new approach to fast-forward videos using psychoacoustic metrics extracted from the soundtrack. These metrics estimate the annoyance of a segment, allowing our method to emphasize moments of sound pleasantness. The efficiency of our method is demonstrated through qualitative and quantitative results in terms of speed-up and instability.
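As a rough illustration of the core idea, the sketch below (not the authors' implementation) scores each soundtrack segment with a simple loudness proxy, standing in for the richer psychoacoustic annoyance metrics used in the paper, and maps higher annoyance to larger speed-up factors. The use of librosa, the segment_speedups function, and all parameter values are assumptions for illustration only.

# Minimal sketch: louder (assumed more annoying) segments get
# larger speed-up factors; quieter, more pleasant ones are kept
# closer to real time. Names and parameters are illustrative.
import numpy as np
import librosa

def segment_speedups(audio_path, segment_sec=5.0,
                     min_speedup=2.0, max_speedup=12.0):
    """Return one speed-up factor per fixed-length audio segment."""
    y, sr = librosa.load(audio_path, sr=None, mono=True)
    hop = int(segment_sec * sr)
    # RMS energy per segment as a crude stand-in for annoyance.
    scores = np.array([np.sqrt(np.mean(y[i:i + hop] ** 2))
                       for i in range(0, len(y), hop)])
    # Normalize to [0, 1]; higher score means a less pleasant segment.
    norm = (scores - scores.min()) / (np.ptp(scores) + 1e-8)
    return min_speedup + norm * (max_speedup - min_speedup)

# Example usage: one factor per 5-second window of the soundtrack.
factors = segment_speedups("egocentric_clip.wav")
print(factors)

The resulting per-segment factors could then drive adaptive frame sampling of the video; the paper combines such audio cues with visual information rather than using loudness alone.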
Citation
@InProceedings{furlan2018cvprw,
  author    = {Vinicius S. Furlan and Ruzena Bajcsy and Erickson R. Nascimento},
  title     = {Fast forwarding Egocentric Videos by Listening and Watching},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Sight and Sound},
  pages     = {2504--2507},
  publisher = {{IEEE} Computer Society},
  year      = {2018}
}
Baselines
We compare the proposed methodology against the following methods:
- Multi-Importance Fast-Forward (MIFF) – Silva et al., JVCI 2018.
Datasets
We conducted the experimental evaluation using the following datasets:
- Dataset of Multimodal Semantic Egocentric Videos (DoMSEV) – Silva et al., A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos, CVPR 2018.