I have exported my Apple Health data using the Health app’s export functionality. We now have a folder of .xml files to deal with. Eventually I’ll switch to using my partner’s tracking data–it will be a far better proxy of the baby’s sleep habits–but she is currently at a baby class so I’ll need to grab it later.
Let’s take a look at export.xml which I think should contain the sleep data.
Instead of spending more time trying to parse the xml file myself, let’s turn to Google. This site seems to have some explanations of what’s going on with the Apple Watch data. It looks like indeed HKQuantityTypeIdentifier is just a prefix for a field name. There should be a field called HKCategoryTypeIdentifierSleepAnalysis which sounds like what I am after!
Exploring the data
This blog gives a snippet that should convert the entire xml import into a dataframe. This has saved a ton of time!
Warning: There was 1 warning in `mutate()`.
ℹ In argument: `value = as.numeric(value)`.
Caused by warning:
! NAs introduced by coercion
I don’t really understand what’s going on here yet so let’s try to pull it apart bit by bit.
According to the snippet the information I should be interested in is contained in health['//Record']. Inspecting this with typeof(health['//Record']) I can see that it’s a list so let’s just grab the first element:
health['//Record'][[1]] # remember that R is 1-indexed, unlike Python!
Ok now we’re talking. It looks like those HKQuantityTypeIdenifiers are identified with some field called type. We also seem to have some metadata about where this data came from - seems like this entry is from the Apple Health app? We have some info about the app itself (perhaps ‘15.6.1’ corresponds to the Health app, or it could be iOS version maybe?).
We have a few datetime fields that I’ll need to dig into, and then a value ‘6.3333’ and a unit ‘ft’. It looks like this entry must have been when I entered my height when I first got my iPhone.
Three records of my heart rate, by the looks of things. I was right that our previous sourceName corresponded to iPhone Health app - clearly these three entries are from my Apple Watch.
The rest of the earlier snippet just looks like it’s converting these entries into a dataframe using XML:::xmlAttrsToDataFrame(). I can’t easily find documentation for this but I will search harder later. In particular I will want to see whether it has more arguments than just stringsasfactors.
Taking a look at the dataframe we have made:
summary(health_df)
type sourceName sourceVersion unit
Length:505685 Length:505685 Length:505685 Length:505685
Class :character Class :character Class :character Class :character
Mode :character Mode :character Mode :character Mode :character
creationDate startDate endDate value
Length:505685 Length:505685 Length:505685 Min. : 0.000
Class :character Class :character Class :character 1st Qu.: 0.365
Mode :character Mode :character Mode :character Median : 1.266
Mean : 27.002
3rd Qu.: 35.696
Max. :1290.000
NA's :17576
dim(health_df)
[1] 505685 8
Okay so we have half a million rows and eight columns. Let’s take a look inside:
We’ve already cast our values to numeric, which is great. Looks like we have some metadata (sourceName, sourceVersion) about how the data was collected. I should take a quick glance at these fields but I can probably ignore them.
The type field looks to contain my different health data, so I’ll want to filter by that. unit is probably nothing I need to think about, but once I’ve found my sleep data I should just check that I can safely ignore the field.
Then we have creationDate, startDate and endDate. I will need to cast these to datetimes (& work out how to work with datetimes in R!) and then work out how they relate to the value field for the sleep data that I am interested in. I assume that the relationship between the datetime fields and value will differ depending on type so there’s not going to be a one-size-fits-all approach here.
In the next post I will start to dig in to these fields!