Data Pulled is Inconsistent with Apple Health #3

Open
benjaminwnelson opened this issue Mar 1, 2018 · 22 comments
@benjaminwnelson
Collaborator

Hi,

I've been working through the downloaded .xml file and used your package to convert it to a data frame, but when I look at data from specific times, the values do not match what the Health app on my phone shows for those times. Any ideas why this could be?

Thanks again for an awesome package.

Best,
Ben

@deepankardatta
Owner

It's difficult to actually say without examining the data. Do you have an example?

One thing I did notice is that Apple Health data is recorded over quite short time periods, which are then aggregated into hourly totals.
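To illustrate how that kind of aggregation can hide individual readings, here is a minimal sketch in base R. The data frame is made up for illustration, the `value` column name is an assumption, and whether the Health app averages or sums a given metric is also an assumption rather than something confirmed in this thread.

```r
# Hypothetical fine-grained heart rate samples (values invented for illustration)
samples <- data.frame(
  endDate = as.POSIXct(
    c("2018-03-07 07:01:12", "2018-03-07 07:05:40",
      "2018-03-07 07:58:03", "2018-03-07 08:02:10"),
    tz = "UTC"
  ),
  value = c(62, 70, 66, 80)
)

# Label each sample with the hour it falls in (kept in UTC here)
samples$hour <- format(samples$endDate, "%Y-%m-%d %H:00", tz = "UTC")

# Collapse the individual samples into one mean value per hour
hourly <- aggregate(value ~ hour, data = samples, FUN = mean)
print(hourly)
```

A single reading picked out of the export would then be compared against a summary figure on the phone, so the two need not agree even when the timestamps are right.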

The other potential option is that it could be due to time zone conversion. You could try manipulating the data with the lubridate package, or tinker with the code in the library, to see if it makes a difference?

@benjaminwnelson
Collaborator Author

benjaminwnelson commented Mar 7, 2018

Here's a picture from the health app vs the final heart rate output from the xml file. You can see that the sampling time is not lined up and the values are different. Note that the heart rate value is column number 5 in the excel picture below.

screen shot 2018-03-07 at 7 10 50 am

excel

@deepankardatta
Owner

Interesting. Will have to look at my own data. This might take a bit of time to work out what's going on as busy with a few other things. Are you good at R and able to have a look yourself to see where the issue is?

@benjaminwnelson
Collaborator Author

I'm pretty busy at the moment as well. I will try to look into this when I have some time. I'll let you know if I figure anything out.

@deepankardatta
Owner

My data input times seem to match up mostly with my Apple Health iOS display. (They are sometimes off by a minute or two on the phone, but the Excel output I have shows the precision of the timing in the Apple Watch data.)

I've updated the library with some minor changes. Could you send me a snippet of your Excel file, just for the timings you sent above, but in the slightly different format produced by the code below?

library(AppleHealthAnalysis)
library(openxlsx)
health_data <- ah_import_xml("unused/export.xml")
write.xlsx( health_data , "unused/health_data.xlsx" )

It might be that this is all just due to time zone formats, or daylight savings time, but no way to tell without looking at the data.

@deepankardatta deepankardatta self-assigned this Mar 9, 2018
@benjaminwnelson
Collaborator Author

benjaminwnelson commented Mar 9, 2018

Here it is. Yes, I'm wondering if the time zone could have to do with it. When I load the package it says,

Warning message:
In format.POSIXlt(as.POSIXlt(x, tz), format, usetz, ...) :
unknown timezone 'zone/tz/2018c.1.0/zoneinfo/America/Los_Angeles'

Here is a screenshot of the excel file in the format you specified:

screen shot 2018-03-09 at 3 08 08 pm

@deepankardatta
Owner

I assume I'm working at GMT. LA should be GMT-8. Could you have a look at the readings around these offset times to see if that actually matches up?

@benjaminwnelson
Collaborator Author

Unfortunately, it looks like that isn't the issue. If I subtract or even add 8 hours the values still don't line up. I checked on a random value for a different day and the value was off by around 60.

screen shot 2018-03-10 at 8 44 53 am

@deepankardatta
Owner

Not sure what to make of this. What OS and version of R are you using? (Is it an OS problem e.g. tidyverse/lubridate#615 (comment))

Might also be a timezone issue as apparently this can be OS dependent.

Another way to investigate is to go the other way round: note the heart rates from the iOS display and see where in your Excel file that pattern fits.

@deepankardatta
Owner

Could you try the following for me to see what your computer does, after building the github version of the library please?

library(AppleHealthAnalysis)
library(lubridate)
test_time <- ymd_hms("2010-12-13 15:30:30")
tz(test_time)
health_data <- ah_import_xml("export.xml")
tz(health_data)

@deepankardatta
Owner

Or actually even try this (feel free to skip steps above that have already been done):

library(AppleHealthAnalysis)
library(lubridate)
library(openxlsx)
Sys.timezone()
health_data <- ah_import_xml("export.xml")
tz(health_data$endDate)
health_data$endDate2 <- force_tz( health_data$endDate , tzone = "America/Los_Angeles", roll = FALSE)
tz(health_data$endDate2)
write.xlsx( health_data , "health_data_force_tz.xlsx" )

I wonder whether there is an issue with which time zones the OS timezone database recognises as valid. These lines should force the Los Angeles time zone onto a new column in the data frame.
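One way to check what the timezone database recognises is sketched below, using only base R. The zone names are simply the ones mentioned in this thread, and the result depends on the machine it is run on, so no particular output is assumed.

```r
# Time zone R believes the system is using (can be NA if it cannot be determined)
print(Sys.timezone())

# Names known to the time zone database R is using on this machine
known_zones <- OlsonNames()

# Check whether the zones discussed in this thread are recognised
zones_to_check <- c("America/Los_Angeles", "US/Pacific", "UTC")
recognised <- zones_to_check %in% known_zones
names(recognised) <- zones_to_check
print(recognised)
```

If a zone comes back as FALSE, conversions to that zone are likely to fall back to UTC with a warning, which may be related to the "unknown timezone" message reported earlier in the thread.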

@benjaminwnelson
Collaborator Author

For your post stating, "Could you try the following for me to see what your computer does, after building the github version of the library please?

library(AppleHealthAnalysis)
library(lubridate)
test_time <- ymd_hms("2010-12-13 15:30:30")
tz(test_time)
health_data <- ah_import_xml("export.xml")
tz(health_data)"

Here is a screenshot:
screen shot 2018-03-12 at 8 25 47 am

@benjaminwnelson
Collaborator Author

benjaminwnelson commented Mar 12, 2018

Just ran the second set of code you sent. It creates a new column for date/time, but it matches exactly the column that is already there.

When I visually inspect values they are still off.

In addition, it's now not pulling dates past 3/4/18.

screen shot 2018-03-12 at 8 33 28 am

@deepankardatta
Owner

Remind me again what OS are you using? sys.timezone() should return something.

The second column of dates (force_tz) gets R to treat all times as being in the Los Angeles time zone. The first column is presumed to be UTC, which could have been the reason the times were off (presuming you are actually in Los Angeles).
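As a rough illustration of what force_tz does compared with with_tz, here is a small sketch on a single made-up timestamp. It assumes the exported times really are UTC, which is not confirmed for every export. force_tz keeps the clock reading and changes the instant it refers to, while with_tz keeps the instant and changes the clock reading.

```r
library(lubridate)

# A made-up timestamp, assumed to be in UTC as the import presumes
utc_time <- ymd_hms("2018-03-07 15:10:50", tz = "UTC")

# force_tz: same clock reading, relabelled as Los Angeles time (a different instant)
forced <- force_tz(utc_time, tzone = "America/Los_Angeles")

# with_tz: same instant, shown as the Los Angeles clock reading
shifted <- with_tz(utc_time, tzone = "America/Los_Angeles")

print(format(forced, "%Y-%m-%d %H:%M:%S"))   # clock reading unchanged from the UTC column
print(format(shifted, "%Y-%m-%d %H:%M:%S"))  # clock reading moved to local time
```

This is why a force_tz column can look identical to the original column when printed or written to a spreadsheet: only the time zone label has changed, not the displayed clock time.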

I'm actually out of ideas without actually being at your computer to see what's going on. Unless there's someone else who is using the package to test their data who can replicate and help debug what is going on.

In summary
(1) My exported data seems to be matching up
(2) Your exported data isn't matching up - not sure why
(3) Your computer has an odd timezone issue - however not sure how this would affect your data
(4) Not sure why you can't pull dates past 3/4/18. I've changed nothing to prevent this, and again I can only think you've maybe accidentally used an old data file?

I do not think there is much more I can do to help at this time. If your export.xml file is small enough, one option is that I can take a look at it directly to see if there's anything else I can do - DM me if you want to do this.

@benjaminwnelson
Collaborator Author

I ran sys.timezone() and got Error in sys.timezone() : could not find function "sys.timezone"

I'll try to figure out more on my end and then update you.

@deepankardatta
Owner

Sorry - the S in Sys.timezone() should be capitalised as in the output you posted at the top

@benjaminwnelson
Collaborator Author

This code transformed the time and now the values line up.

health_data$new_date<- lubridate::ymd_hms(health_data$endDate) - lubridate::hours(7)
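One caveat with a fixed offset is that Los Angeles is UTC-8 in winter and UTC-7 during daylight saving time, which began on 11 March 2018. The sketch below uses two made-up UTC timestamps either side of that change to compare a fixed 7-hour subtraction with a zone-aware conversion. It assumes the source times are UTC.

```r
library(lubridate)

# Two made-up UTC timestamps, one before and one after the 2018 DST change
utc_times <- ymd_hms(c("2018-03-07 15:00:00", "2018-03-12 15:00:00"), tz = "UTC")

# Fixed offset: always subtracts 7 hours (result is still labelled UTC)
fixed_offset <- utc_times - hours(7)

# Zone-aware: uses the offset in force on each date
zone_aware <- with_tz(utc_times, tzone = "America/Los_Angeles")

print(hour(fixed_offset))  # same local hour for both dates
print(hour(zone_aware))    # one hour earlier for the date before the DST change
```

Under these assumptions the fixed offset only agrees with local time for dates inside daylight saving time, so the zone-aware conversion is likely the safer choice for an export spanning several months.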

@deepankardatta
Owner

Fantastic. Glad to hear it's working. Was it just an issue with the Apple Watch which the reset fixed then?

Could you kindly try this to see if this also works:

library(AppleHealthAnalysis)
library(lubridate)
health_data$new_date_2 <- with_tz( health_data$endDate , tzone = "America/Los_Angeles" )

If it does I can see if I can integrate it into the package

@benjaminwnelson
Collaborator Author

That partly worked. It worked for every row except the first two.

@deepankardatta
Owner

I can't explain why. However, let's take the small victories: your dataset is working!! Hopefully there will be more people to test the library as it develops.

There is a preliminary shiny dashboard that you can use to look at some of the data (not fully functional though!). Per the other thread, for idle and active time you might need to do some addition and subtraction.

I hope you have fun exploring the data!

@benjaminwnelson
Collaborator Author

Thanks so much for assisting in figuring this out! This is a great package!

@EricGoldsmith
Collaborator

FWIW, this is the pattern I use for converting from Apple Health's UTC times to my local timezone:

healthData <- ah_import_xml("export.xml") %>%
  mutate(endDate = with_tz(endDate, tzone = "US/Pacific"))

Note the modification of the date column in-place (rather than creating a new column - less confusion over which to use), and the simplified timezone name (more obvious what timezone you're in). Requires dplyr and lubridate packages.
