Resampling


Data generation in emoTouch

The recording of user interaction in emoTouch is event-driven. Every time a user interacts with the interface, e.g. moves a slider, an event is triggered and recorded in the database with the exact timestamp at which it occurred. There are different kinds of events for different kinds of user interaction, e.g. slider moves, button pushes and checkbox changes. Additional events are triggered and recorded when, for example, a part begins, a video starts playing or a user rotates their mobile device.

The event-driven architecture of emoTouch stores data with the best possible time accuracy. The actual accuracy of the recorded timestamps depends on various parameters, e.g. the speed of the device and the network connection. However, emoTouch tries to measure and compensate for factors such as network latency and stores the timestamp of each event as accurately as possible. At the same time, the event-driven architecture is very data-efficient: when no interaction occurs (e.g. because the user doesn’t move a slider for a while), no events are triggered, so there is no unnecessary network traffic or data storage.

Necessity of resampling

The initial event-driven data that emoTouch records in the database is called 'Raw data'. Although the raw data contains the event timestamps with the best possible time accuracy, it is not suitable for most types of data analysis. For any analysis that involves calculations on the current values of several participants (e.g. the mean slider position across all participants at a specific point in time), you have to use 'Resampled data' with a fixed sampling rate. Resampled data contains, for example, the current position of a slider for all participants at regular time intervals, e.g. every 100 milliseconds.

Example

Figure A: Raw Data

Participants A (blue) and B (green) are moving a slider, starting from a common initial slider value of 3 at timestamp t0. The first to move the slider is participant A at timestamp t1, changing the slider value to 1. At t5 both participants happen to move their sliders at exactly the same time. Only the changes to the slider values at timestamps t0 to t5 are recorded in the database (figure A), and nothing more.

If you try to calculate the mean slider value for participants A and B at timestamps t0 to t5, the results will mostly be incorrect, because at t1, t2, t3 and t4 the database contains ONLY a single value for ONE participant. The mean calculated at these timestamps is simply the value of the single participant who triggered the event, which is obviously incorrect. Only at t0 and t5 are TWO values stored in the database, giving the correct means of 3 (t0) and 4 (t5).
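The problem can be reproduced with a few lines of Python. This is only a sketch: the row layout `(time_ms, participant, value)`, the millisecond timestamps and all slider values other than those stated in the text (both at 3 at t0, A changing to 1 at t1, a mean of 4 at t5) are invented for illustration, and the function names are hypothetical rather than part of emoTouch.

```python
from collections import defaultdict

# Hypothetical raw events as (time_ms, participant, value) rows.
# Only t0 (both at 3), t1 (A -> 1) and the t5 mean of 4 come from the text;
# every other timestamp and value is invented for illustration.
raw_events = [
    (0, "A", 3), (0, "B", 3),      # t0
    (70, "A", 1),                  # t1
    (260, "B", 2),                 # t2
    (340, "A", 4),                 # t3
    (520, "B", 5),                 # t4
    (700, "A", 5), (700, "B", 3),  # t5
]

def naive_mean_per_timestamp(events):
    """Average only the values that happen to be recorded at each timestamp."""
    by_time = defaultdict(list)
    for time_ms, _participant, value in events:
        by_time[time_ms].append(value)
    return {t: sum(v) / len(v) for t, v in sorted(by_time.items())}

def true_mean_at(events, time_ms, participants=("A", "B")):
    """Average each participant's most recent value at or before time_ms."""
    latest = {}
    for t, participant, value in sorted(events):
        if t <= time_ms:
            latest[participant] = value
    return sum(latest[p] for p in participants) / len(participants)

naive = naive_mean_per_timestamp(raw_events)
print(naive[70])                     # mean over A's single row at t1
print(true_mean_at(raw_events, 70))  # mean over A's new and B's unchanged value
```

At t1 the naive calculation only sees participant A's row, whereas the second function also takes into account that participant B is still at the initial value.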

Resampling process

The solution for this problem is the use of resampled data. Resampling creates a grid of equidistant timestamps with the time interval ∆t. The value of the slider for EACH participant at EACH resampled timestamp is stored in the database.

Figure B: Resampled Data

Figure B shows how the events from the raw data are mapped onto the new timestamp grid in the resampled data. t0 to t5 are the original timestamps from the raw data; r0 to r8 (red) are the newly created timestamps of the resampled data. As you can see, the current slider values of both participants are stored in the database at every resampled timestamp. You can therefore perform calculations on the resampled data that are not possible with the raw data, e.g. calculating mean values for every point in the new timestamp grid. If several raw events fall within one grid interval, only the first or the last of these values is carried over into the resampled data; the others are discarded. This downsampling therefore always involves some loss of information. The downsampling mode can be set in the emoTouch resample dialogue.
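The mapping onto the grid can be sketched in Python as follows. This is not emoTouch's actual implementation: the function `resample`, the convention that a grid point r collects the events from the interval (r - ∆t, r], the carrying forward of the previous value when an interval is empty, and the example timestamps and values are all assumptions made for illustration.

```python
def resample(events, dt_ms, end_ms, mode="last"):
    """Map one participant's raw events onto an equidistant grid.

    events: list of (time_ms, value), sorted by time, first event at 0 ms.
    Assumption: grid point r takes the events in the interval (r - dt_ms, r];
    if that interval is empty, the previous value is carried forward.
    """
    if mode not in ("first", "last"):
        raise ValueError("mode must be 'first' or 'last'")
    grid, current, i = [], None, 0
    for r in range(0, end_ms + 1, dt_ms):
        bucket = []
        # Collect all raw events not yet consumed that lie at or before r.
        while i < len(events) and events[i][0] <= r:
            bucket.append(events[i][1])
            i += 1
        if bucket:
            # Downsampling: keep only the first or the last value of the interval.
            current = bucket[0] if mode == "first" else bucket[-1]
        grid.append((r, current))
    return grid

# Hypothetical raw events per participant as (time_ms, value).
raw_a = [(0, 3), (70, 1), (340, 4), (700, 5)]
raw_b = [(0, 3), (260, 2), (520, 5), (700, 3)]

grid_a = resample(raw_a, dt_ms=100, end_ms=800)  # grid r0..r8
grid_b = resample(raw_b, dt_ms=100, end_ms=800)

# With one value per participant at every grid point, a mean is always defined.
means = [(r, (a + b) / 2) for (r, a), (_, b) in zip(grid_a, grid_b)]
print(means)
```

In this sketch the `mode` argument plays the role of the downsampling mode: with several events inside one interval, `"first"` and `"last"` keep different values and discard the rest.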

Impact of resampling

However, the resampled timestamps are slightly less accurate. For example, the first change from value 3 to 1 by participant A did NOT occur at r1, as the value change in the resampled data might suggest, but slightly earlier at t1. Because t1 does not fall on the equidistant resampling grid, r1 is the first resampled timestamp that reflects the change. The smaller the resampling interval ∆t, the smaller the mean time deviation between the original and the resampled timestamps. In addition, resampled data files are considerably larger than the raw data. The raw data in figure A needs only 8 time/participant/value rows to store the event information, while the resampled data needs 16 of these rows. Resampled data stores a lot of redundant information, even when there is no user interaction at all. The smaller the resampling interval ∆t, the larger the amount of redundant information.
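Both effects can be estimated with simple arithmetic. The following sketch assumes that an event is reflected at the first grid timestamp at or after it, and that the resampled data holds one row per participant and grid timestamp for a single slider. The function names, event times and session size are hypothetical.

```python
def reflection_delay_ms(event_ms, dt_ms):
    """Time between a raw event and the first grid timestamp at or after it."""
    return (-event_ms) % dt_ms

def resampled_row_count(duration_ms, dt_ms, participants):
    """Rows for one slider: one per participant and grid timestamp, including 0 ms."""
    return (duration_ms // dt_ms + 1) * participants

# Hypothetical raw event times in milliseconds.
event_times = [70, 260, 340, 520, 700]

for dt in (100, 50):
    delays = [reflection_delay_ms(t, dt) for t in event_times]
    print(dt, sum(delays) / len(delays))  # mean time deviation for this interval

# Hypothetical session: 5 minutes, 20 participants.
for dt in (100, 10):
    print(dt, resampled_row_count(300_000, dt, 20))
```

Halving the interval reduces the mean deviation in this example, while a tenfold smaller interval produces roughly ten times as many rows, regardless of how often the participants actually moved the slider.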

Because resampled data is essential for many numerical and statistical methods of analysing timeline data, emoTouch calculates an initial resampled data set with a 100 ms time interval (10 Hz) after a realisation has finished. The raw data itself is stored safely as well and is never changed. If the initial 10 Hz resampling doesn’t meet your needs, you can calculate further resamplings with different sample rates in the 'Realisations' section (tab 'Resamplings'). These new resamplings are always calculated from the raw data. Keep in mind, however, that a very small resampling interval may take a long time to calculate and may produce a very large data set with a lot of redundant information. It is therefore a good idea to consider carefully which time resolution is REALLY needed and useful for the specific project.

(Note: The calculation of resamplings is currently not yet possible for playback projects, as the sessions can take place at completely different times and must first be synchronised to the media stimulus. The function will be implemented in a later version)

Resampling of playback projects

In playback projects, the sessions usually take place asynchronously. Since version 1.7.0, the data is therefore pre-processed and synchronised before evaluation.

→ To the article "Data Preprocessing and Synchronisation of Playback Projects"