timed metadata-based adaptation of sensor-enriched video streams 

Data Exchange

Regarding the data exchange process, the only reference point required by the client application during setup is a playlist file maintained on the server. The playlist starts with the locations of the application Parameter List and Parameter Map, which are used to initialize the Audio and Synthesis Engines. It also contains the location of the file holding the initialization information for the codec used by the Kinect video stream.
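A minimal sketch of this setup step is given below. It assumes, hypothetically, that the playlist is a JSON document exposing parameterList, parameterMap and videoInit fields; the actual playlist syntax is not specified here. Fetching uses the XMLHttpRequest API, as in the client described later in this section.

// Fetch a JSON resource with XMLHttpRequest (the API used by the client).
function fetchJson(url) {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.responseType = 'json';
    xhr.onload = () => resolve(xhr.response);
    xhr.onerror = () => reject(new Error('Failed to fetch ' + url));
    xhr.send();
  });
}

// Client setup: read the playlist and resolve the resources it points to.
// Field names (parameterList, parameterMap, videoInit) are illustrative.
async function initializeClient(playlistUrl) {
  const playlist = await fetchJson(playlistUrl);
  const parameterList = await fetchJson(playlist.parameterList); // Audio Engine init
  const parameterMap = await fetchJson(playlist.parameterMap);   // Synthesis Engine init
  const videoInitUrl = playlist.videoInit;                       // codec init for the Kinect stream
  return { parameterList, parameterMap, videoInitUrl };
}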
The server is responsible for updating the playlist by adding the Joint Coordinates (Kt) and Parameter Values (Et) whenever the Mapping Engine provides new output (at approx. 30 Hz). To achieve synchronization, the timestamp of the Coordinates provided by the Kinect middleware is preserved, as is the timing of the video frames. Note that the Coordinate frames and the video frames are not aligned (details in Section 3.4), but preserving source timing information is essential for implementing techniques that counterbalance the misalignment, for adding filtering, etc. The recorded video is encoded in fragmented H.264 (MPEG-4 AVC) [51] and the playlist references the resulting segments. To balance file size against latency, we chose a segment duration of 1 s, thus a new location is added to the playlist at a 1 Hz rate. Figure 18 shows an overview of the server data update mechanism. On the client side, both the playlist and the video segments are fetched from within the browser using the XMLHttpRequest API (AJAX).
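The sketch below illustrates how such a client could poll the playlist and retrieve new data. It assumes the same hypothetical JSON playlist as above, now with metadata (timestamp, Kt, Et) and segments arrays, and reuses the fetchJson helper from the previous sketch; it is not the application's actual polling code.

let metadataCursor = 0;              // index of the last metadata entry already delivered
const fetchedSegments = new Set();   // segment URLs already requested

async function pollPlaylist(playlistUrl, onMetadata, onSegment) {
  const playlist = await fetchJson(playlistUrl);

  // Deliver new (timestamp, Kt, Et) entries; source timestamps are preserved
  // so the application can compensate for the Kinect/video misalignment.
  for (; metadataCursor < playlist.metadata.length; metadataCursor++) {
    const entry = playlist.metadata[metadataCursor];
    onMetadata(entry.timestamp, entry.Kt, entry.Et);
  }

  // Request only the newly referenced 1 s video segments.
  for (const url of playlist.segments) {
    if (fetchedSegments.has(url)) continue;
    fetchedSegments.add(url);
    const xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.responseType = 'arraybuffer';
    xhr.onload = () => onSegment(xhr.response); // e.g. append to a MediaSource buffer
    xhr.send();
  }
}

// Example usage, polling at the same 1 Hz rate at which segments are listed
// (handleMetadata and appendSegment are application-provided callbacks):
// setInterval(() => pollPlaylist('playlist.json', handleMetadata, appendSegment), 1000);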
The video segments and the timed metadata can be packaged in mp4 containers for later offline distribution. With the timing capabilities of the mp4 file format, we are able to achieve frame-accurate synchronization while using a widely supported format. For consuming mp4 files from the browser, libraries that analyze and extract data, such as mp4box.js, can be utilized.
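As an illustration, the sketch below reads timed metadata samples from such a file with mp4box.js. It assumes the metadata is stored in its own (non-audio/video) track and that MP4Box is the global exposed by the mp4box.js browser build; the track-selection logic is a simplification rather than the actual packaging layout.

// Parse an mp4 file with mp4box.js and extract samples from the first
// track that is neither audio nor video (assumed to be the metadata track).
const mp4file = MP4Box.createFile();

mp4file.onReady = (info) => {
  const avIds = new Set([...info.videoTracks, ...info.audioTracks].map((t) => t.id));
  const metaTrack = info.tracks.find((t) => !avIds.has(t.id));
  if (!metaTrack) return;
  mp4file.setExtractionOptions(metaTrack.id, null, { nbSamples: 100 });
  mp4file.start();
};

mp4file.onSamples = (trackId, user, samples) => {
  for (const s of samples) {
    const seconds = s.cts / s.timescale;              // frame-accurate presentation time
    const payload = new TextDecoder().decode(s.data); // raw metadata payload bytes
    console.log('track', trackId, 'at', seconds, 's:', payload);
  }
};

// Feed the file bytes (e.g. fetched as an ArrayBuffer); mp4box.js expects
// the byte offset of each appended buffer in its fileStart property.
function appendMp4Buffer(arrayBuffer, fileStart = 0) {
  arrayBuffer.fileStart = fileStart;
  mp4file.appendBuffer(arrayBuffer);
  mp4file.flush();
}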

Using Timed Metadata to Generate Visualizations

For rendering input-based visualizations, the application uses the <canvas> element. Vector graphics are rendered on the canvas through its CanvasRenderingContext2D API. Figure 19a shows an example content-based visualization, with a gradient based on the current Stereo Panning, pointers on the projected coordinates of the performer’s hands, and sprites generated dynamically according to the sound frequency. More specifically, we create a canvas-wide black-to-white gradient; the Audio Panning min/max values [-100%, 100%] are then mapped to the position of a blue gradient "stop", with possible values [0, canvas.width]. In the illustrated example, the Audio Panning value is close to 0% – i.e., the left and right audio outputs have similar volume – which is reflected by the blue gradient stop being centered on the canvas. Similarly, the current Audio Frequency [0, 22 kHz] is mapped to the sprite generation frequency [10, 30 Hz]. Alternatively, for rendering the video of the performance, the <video> element is used instead. Since we have a continuous video stream, the frames are buffered and loaded using the MediaSource API. Figure 19b shows an application screenshot displaying the performance video at the same moment as the previous figure.
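A minimal sketch of these two mappings follows, assuming a simple linear mapping helper; the function and variable names are illustrative and not taken from the application code.

// Linearly map a value from an input range to an output range.
function mapRange(value, inMin, inMax, outMin, outMax) {
  return outMin + ((value - inMin) / (inMax - inMin)) * (outMax - outMin);
}

// panning in [-100, 100] (%), frequencyHz in [0, 22000] (Hz).
function renderVisualization(canvas, panning, frequencyHz) {
  const ctx = canvas.getContext('2d');

  // Canvas-wide black-to-white gradient with a blue stop whose position
  // follows the current Stereo Panning: -100% -> left edge, 0% -> center,
  // +100% -> right edge.
  const stopX = mapRange(panning, -100, 100, 0, canvas.width);
  const gradient = ctx.createLinearGradient(0, 0, canvas.width, 0);
  gradient.addColorStop(0, 'black');
  gradient.addColorStop(stopX / canvas.width, 'blue'); // color-stop offsets are in [0, 1]
  gradient.addColorStop(1, 'white');
  ctx.fillStyle = gradient;
  ctx.fillRect(0, 0, canvas.width, canvas.height);

  // Audio Frequency [0, 22 kHz] -> sprite generation rate [10, 30 Hz].
  const spriteRateHz = mapRange(frequencyHz, 0, 22000, 10, 30);
  return spriteRateHz; // used by the caller to schedule sprite creation
}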

State of the Art

In this Section, we overview some existing platforms for sharing geotagged multimedia content. We selected three cases to present, according to relevance and popularity criteria. First, the Instagram sharing platform, a popular application and website for sharing UGC photographs and videos. Second, Google Maps, a map service that also offers the capability to browse user-submitted photographs for a target location. Third, the GeoUGV platform, the result of a research project, which enables recording sensor-enriched UGC videos and navigating through the collection using a map interface. Finally, before presenting our contribution, we overview some other tools and techniques that can serve auxiliary roles in such applications.

Table of contents:

1 introduction 
1.1 Video Streams with Timed Metadata
1.2 Characteristics and Challenges
1.3 Classification of Extended AV Streams Systems
1.4 Architectures for Delivery of Extended AV Streams
2 essential elements of the state of the art 
2.1 Standardization Efforts
2.2 Delivery of Video with Timed Metadata
3 timed metadata for control of interactive multimedia applications 
3.1 Scenario Description
3.2 State of The Art
3.3 Audio Synthesis Application Example
3.4 System Performance
3.5 Discussion
4 spatiotemporal navigation for extended video streams 
4.1 State of The Art
4.2 Spatiotemporal Video Navigation
4.3 Software For Adaptive Playback of User-Generated Content (SWAPUGC)
4.4 Discussion
5 timed metadata-based adaptation of sensor-enriched video streams 
5.1 State of the Art
5.2 Proposal Overview
5.3 Stream Selection Policies
5.4 Implementation Considerations
5.5 Experimental Setup
5.6 User Study
5.7 Discussion and Future Work
6 buffer management for synchronous and low-latency playback of multi-stream user-generated content
6.1 State of The Art
6.2 Client-side Buffering Scheme
6.3 Discussion
7 conclusion 
7.1 Outlook and Potential Application Fields
7.2 Research Perspectives and Future Work
8 deliverables 
8.1 Publications
8.2 Software
bibliography 
