'AppleHealthAnalysis' is an R package to import and analyse data generated by the iOS Apple Health app. This includes data entered by the user, data from other iOS apps, and any data generated by an Apple Watch.
The main function of the package is to parse the exported XML file and convert the data into an R data frame. From there the user can use R's built-in functions to analyse and plot the data. Some built-in analysis functions will be provided for convenience, but with the amount of data available, users can use their imagination to find their own ways of analysing it.
I'm a fan of Hadley Wickham's tidyverse, and of dplyr in particular, though I have a bit to go before using them to their full potential. I've kept the imported data as data frames; converting them to tibbles may make things more efficient in the future.
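As a rough sketch of the kind of analysis I have in mind, here is some hypothetical dplyr code. The column names (`type`, `startDate`, `value`) are assumptions about how `ah_import_xml()` labels the Record attributes, so they may differ from what the package actually returns:

```r
# Hypothetical downstream analysis -- the column names "type", "startDate"
# and "value" are assumptions and may not match ah_import_xml()'s output
library(dplyr)
library(tibble)
library(AppleHealthAnalysis)

health_data <- as_tibble(ah_import_xml("export.xml"))  # tibbles print more tidily

# Total step count per day, using the standard HealthKit type identifier
daily_steps <- health_data %>%
  filter(type == "HKQuantityTypeIdentifierStepCount") %>%
  mutate(day = as.Date(startDate)) %>%
  group_by(day) %>%
  summarise(steps = sum(as.numeric(value), na.rm = TRUE))
```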
For modern computers there should not be too much of a problem in using the package to analyse Apple Health data. However, as the amount of data in an individual's Apple Health app grows over time, this may cause problems with memory allocation in R, especially on computers that still have 2/4/8 GB of RAM. As far as I understand, R needs to hold all data in memory, though there may be clever people who can work around this; I think the Microsoft R implementation may be able to do disk-based streaming of data.
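If memory is a concern, base R can at least report how big the imported objects are. A minimal check, assuming `health_data` has already been created with `ah_import_xml()` as shown later in this README:

```r
# Report the in-memory size of the imported data, then let R release
# anything that is no longer referenced
print(format(object.size(health_data), units = "MB"))
gc()
```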
As I was developing the package, I noted the following memory use when stepping through each step in the XML import (and this is with a fairly small Apple Health data set!):
- 14 MB - Apple Health exported ZIP file
- 28.2 MB - unzipped XML file (plus a separate CDA XML file: not sure what this does)
- 46 MB - XML extract containing the XML "Records" elements
- 109.3 MB - R list of the "Records" elements (the ones we are interested in)
- 3.2 MB - the resulting exported XLSX file
So the main memory requirement is in the XML extraction. I think dplyr's piping system is the most efficient way of getting the data out of the XML file, but suggestions on how to reduce the resource requirements would be greatly appreciated.
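For anyone who wants to experiment, here is a rough sketch of that extraction step using the `xml2` package. It is illustrative only and not necessarily how `ah_import_xml()` is implemented internally:

```r
# Illustrative Record extraction with xml2 -- not necessarily how
# ah_import_xml() does it internally
library(xml2)
library(purrr)
library(dplyr)
library(tibble)

doc     <- read_xml("export.xml")          # the whole file is parsed into memory
records <- xml_find_all(doc, "//Record")   # the "Record" elements

# One named character vector of attributes per Record; Records can carry
# different attribute sets, which bind_rows() tolerates. This is the step
# that creates the large intermediate list noted above.
health_data <- xml_attrs(records) %>%
  map(~ as_tibble(as.list(.x))) %>%
  bind_rows()
```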
Download the GitHub repository and open the RStudio project. Then do the following:
# Get some Apple Health data and put the export.xml file into your working directory
library(devtools)
install_github("deepankardatta/AppleHealthAnalysis")
library(AppleHealthAnalysis)
health_data <- ah_import_xml("export.xml")
ah_shiny(health_data)
# Explore
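Beyond the Shiny app, the imported data frame can be explored directly. Here is a hypothetical ggplot2 example, again assuming the column names used in the dplyr sketch above:

```r
# Hypothetical plot of heart rate over time -- column names are assumptions
library(dplyr)
library(ggplot2)

health_data %>%
  filter(type == "HKQuantityTypeIdentifierHeartRate") %>%
  mutate(start = as.POSIXct(startDate)) %>%
  ggplot(aes(x = start, y = as.numeric(value))) +
  geom_point(alpha = 0.3) +
  labs(x = "Date", y = "Heart rate (bpm)")
```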
- Get feedback on the code efficiency
- Find someone to help make the code more efficient
- If data parsing takes a long time, think about a progress bar (see the sketch after this list)
- Add more built-in reports
- Work out if there is any more useful data in the exported XML files to use
- Shiny dashboard
- Data comparisons
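As a minimal sketch of the progress-bar idea above, base R's `txtProgressBar()` could wrap whatever loop does the parsing; the loop body here is purely illustrative:

```r
# Illustrative progress bar around a record-parsing loop
n  <- length(records)   # e.g. the Record nodes from the xml2 sketch above
pb <- txtProgressBar(min = 0, max = n, style = 3)
for (i in seq_len(n)) {
  # ... convert records[[i]] into a data frame row here ...
  setTxtProgressBar(pb, i)
}
close(pb)
```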
I had a look at a few things to help me make this package:
- http://www.ryanpraski.com/apple-health-data-how-to-export-analyze-visualize-guide/#4
- https://gist.github.com/ryanpraski/ba9baee2583cfb1af88ca4ec62311a3d
- http://rpubs.com/Ranthony__/visualizing-iphone-health-app-data-in-R
- http://www.tdda.info/in-defence-of-xml-exporting-and-analysing-apple-health-data