Combining events of different types in Private

You will often want to combine events of different types that happened at approximately the same time. For instance, suppose you would like to know how temperature affects whether people are happy. The temperature can be found in App events, while a happiness rating can be found in SEMA events. So you need to match each SEMA event with the closest corresponding App event.

To do this we employ the zipDate function. We start by extracting the SEMA events and the App events as separate streams as follows:

SemaEvents = [e for e in DemoEvents if e.type == "__SEMA__" and e.hasField("Happy")]
AppEvents = [e for e in DemoEvents if e.type == "__App__" and e.hasField("Temperature")]

Then we use zipDate to create a list of tuples containing events from each list if they match on time.

CombinedEvents = zipDate([SemaEvents, AppEvents])

We can pull out one of the tuples to see what they look like:

> t = CombinedEvents[1]
> t
   ({'Angry': 5,
    'AngryRT': 5205,
    'Anxious': 9,
    'AnxiousRT': 4524,
    'Bored': 10,
    'BoredRT': 1641,
    'Confident': 1,
    'ConfidentRT': 2219,
    'Content': 4,
    'ContentRT': 1729,
    'Disappointed': 4,
    'DisappointedRT': 9617,
    'EndDateTime': '2019-10-17 15:30:00',
    'EndDateTimeLocal': '2019-10-18 14:01:00',
    'Excited': 2,
    'ExcitedRT': 8842,
    'Happy': 6,
    'HappyRT': 2060,
    'ParticipantTimeZone': 'Australia/Melbourne',
    'Relaxed': 4,
    'RelaxedRT': 256,
    'SEMAParticipantId': '866880491',
    'Sad': 6,
    'SadRT': 8845,
    'ScheduledTime': '2019-10-18 13:00:00',
    'StartDateTime': '2019-10-18 02:30:00',
    'StartDateTimeLocal': '2019-10-18 13:30:00',
    'StudyName': 'DemoStudy',
    'StudyVersion': 1,
    'SurveyName': 'Personal Experience Sampling Study',
    'TotalRT': 27743,
    'Trigger': 'scheduled',
    'UserId': '35e5c046-03dd-4ce6-8e92-76399180a4b6',
    'aws_profile': None,
    'keywords': ['SEMA',
                 'DemoStudy',
                 'Personal Experience Sampling Study',
                 'Completed',
                 'October',
                 'Friday',
                 2019,
                 'Spring'],
    'type': 'SEMA'},
    {'AccelerometryCount': 11,
    'AudioProcessedCount': 3,
    'BatteryCount': 3,
    'BatteryLevel': 3,
    'EndDateTime': '2019-10-18 03:00:00',
    'EndDateTimeLocal': '2019-10-18 14:00:00',
    'Kilometers': 0.04013085372443839,
    'LocationCount': 11,
    'MoonAge': 7.238290015479986,
    'MoonIllumination': 0.3098332953006283,
    'StartDateTime': '2019-10-18 02:00:00',
    'StartDateTimeLocal': '2019-10-18 13:00:00',
    'Temperature': 20.969043550236623,
    'UserId': '35e5c046-03dd-4ce6-8e92-76399180a4b6',
    'Weather': 'overcast',
    'address': '72 Galvan Boulevard\nWilliamstown, NSW, 7410',
    'aws_profile': None,
    'keywords': ['October',
                 'Friday',
                 2019,
                 'Spring',
                 'audio_car',
                 'audio_street',
                 'overcast',
                 'waning_gibbous'],
    'latitude': -35.91438120230728,
    'longitude': 143.73124866254977,
    'type': 'App'})

The first element of the tuple is a SEMA event and the second element is the corresponding App event. To extract the temperatures and the Happy ratings for analysis we can do the following:

Temperature = [p[1].Temperature for p in CombinedEvents]
Happy = [int(p[0].Happy) for p in CombinedEvents]

And now we can define a regression model to estimate the relationship between them:

Temperature = [p[1].Temperature for p in CombinedEvents]
Happy = [int(p[0].Happy) for p in CombinedEvents]
Happy ~ Normal(mHappy, sHappy)
mHappy ~ Temperature * betaHappy + interceptHappy
sHappy ~ HalfNormal(5)
betaHappy ~ Normal(0,10)
interceptHappy ~ Normal(0,30)
betaHappyPlot = distplot(betaHappy)
percentilesHappy = percentile(betaHappy, percent=90)
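To make the roles of betaHappy and interceptHappy concrete, here is a minimal plain-Python sketch that fits the same straight line by ordinary least squares. This is a swapped-in, non-Bayesian technique and not what Private does internally; the function name least_squares and the numbers are made up purely for illustration.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the common 1/n factors cancel).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up illustrative values, NOT taken from the demo data.
temperature = [10.0, 15.0, 20.0, 25.0, 30.0]
happy = [3, 4, 5, 6, 7]

# slope plays the role of betaHappy, intercept the role of interceptHappy.
slope, intercept = least_squares(temperature, happy)
print(slope, intercept)
```

With the wide priors used in the model above, one would expect the centre of the betaHappy distribution to sit near such a least-squares slope, although the Private model additionally gives a full distribution rather than a single number.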


zipDate takes each event in the first of the supplied event lists and tries to find a match for it in each of the remaining lists. Only events with the same UserId are considered as matches.

By default, zipDate matches on StartDateTime, but you can change this by providing the second parameter (e.g. to use EndDateTime or StartDateTimeLocal as the key). The third parameter sets how far apart in time events from different lists may be and still count as a match. By default, this is 30 minutes (specified as ("minutes", 30)). The fourth parameter determines whether events that don't match are retained. By default, an event from the first list that has no matching events in the other lists is removed entirely. To keep such events, set the fourth parameter to True.
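The matching rules described above can be sketched in plain Python. This is only a rough approximation under stated assumptions, not Private's actual implementation: the name zip_date, its keyword names, the use of a timedelta instead of ("minutes", 30), the choice of the closest candidate, and the reading of "no matching events" as "no match in any of the other lists" are all assumptions made for illustration. Events are represented as plain dictionaries.

```python
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # format of the timestamps in the sample output


def zip_date(event_lists, key="StartDateTime",
             tolerance=timedelta(minutes=30), keep_unmatched=False):
    """Hypothetical stand-in for zipDate, operating on lists of plain dicts."""
    first, *others = event_lists
    combined = []
    for anchor in first:
        anchor_time = datetime.strptime(anchor[key], TIME_FORMAT)

        def gap(event):
            # Absolute time difference between a candidate and the anchor event.
            return abs(datetime.strptime(event[key], TIME_FORMAT) - anchor_time)

        row = [anchor]
        for other in others:
            # Only events from the same user, within the tolerance, can match.
            candidates = [e for e in other
                          if e["UserId"] == anchor["UserId"] and gap(e) <= tolerance]
            # Assumption: when several candidates qualify, take the closest one.
            row.append(min(candidates, key=gap) if candidates else None)

        # Assumption: an event is "unmatched" when no other list produced a match.
        if keep_unmatched or any(m is not None for m in row[1:]):
            combined.append(tuple(row))
    return combined


# Trimmed-down events; the timestamps mirror the sample output, the ids are made up.
sema = [{"UserId": "u1", "StartDateTime": "2019-10-18 02:30:00", "Happy": 6}]
app = [
    {"UserId": "u1", "StartDateTime": "2019-10-18 02:00:00", "Temperature": 20.97},
    {"UserId": "u2", "StartDateTime": "2019-10-18 02:30:00", "Temperature": 25.0},
]
pairs = zip_date([sema, app])
print(pairs)
```

Here the SEMA event is paired with the App event from the same user 30 minutes earlier, while the App event from the other user is ignored even though its timestamp is identical. The sketch treats a gap of exactly 30 minutes as a match, which is consistent with the sample tuple shown earlier.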
