Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Many dates read from UFDR can be decoded using a wrong timezone #1879

Closed
wladimirleite opened this issue Sep 14, 2023 · 13 comments · Fixed by #1884
Closed

Many dates read from UFDR can be decoded using a wrong timezone #1879

wladimirleite opened this issue Sep 14, 2023 · 13 comments · Fixed by #1884
Assignees
Labels

Comments

@wladimirleite
Copy link
Member

This was reported by another user and took me a while to figure it out, as it seems to be a rare issue.
In that particular case, there was a question about the calls made in a very specific period of time, and the results taken from IPED were different from the ones shown by CellebriteReader (opening the UFDR directly).
There were several calls with wrong dates, but many others with correct values.
An example below:

IPED:             2020-05-18T18:06:19Z
CellebriteReader: 18/05/2020 21:06:19(UTC+0)
XML:              2020-05-18T18:06:19.000-03:00

image
image

<field name="TimeStamp" type="TimeStamp">
  <value type="TimeStamp" format="TimeStampKnown" formattedTimestamp="2020-05-18T18:06:19-03:00">2020-05-18T18:06:19.000-03:00</value>
</field>
@wladimirleite wladimirleite self-assigned this Sep 14, 2023
@wladimirleite
Copy link
Member Author

wladimirleite commented Sep 14, 2023

Looking at the code, I found that this issue was caused by the usage of lastDateFormat in UfedXmlReader class.

The original date in the example was "2020-05-18T18:06:19.000-03:00".
There are 6 DateFormat's defined in the reader (df1, df2 etc.).
The problem was caused by the fact that for this particular date format, two of them will work, producing different results:

df1 ("yyyy-MM-dd'T'HH:mm:ss.SSSXXX") : 2020-05-18T21:06:19Z
df2 ("yyyy-MM-dd'T'HH:mm:ss.SSS") : 2020-05-18T18:06:19Z

The correct one is the first. It should always be tried before the second, as the latter is a substring of the former.
If a date does not contain the timezone, it is parsed by df2 (df1 fails). Then lastDateFormat is set to df2. And if the next date to be parsed is in the same format as I mentioned above, the timezone information is lost, as lastDateFormat (which was set to df2) will be used before trying df1.

@lfcnassif
Copy link
Member

Good catch! Thanks for investigating this @tc-wleite! Seems lastDateFormat optimization wasn't a good idea from my side...

@wladimirleite
Copy link
Member Author

I am inspecting ~1000 UFDRs, generated by different users, using different Cellebrite programs versions, including the most recent ones, to see all date formats present and how frequently each one appears.
That should allow checking if the 6 supported date formats are still enough, and which would be the best order to try them.

@wladimirleite
Copy link
Member Author

This test was a bit overkill :-)
Anyway, at least no unhandled date formats were found, and the vast majority of the dates use the first format (df1). In fact, only df1 and df2 were used.
So there won't be any performance penalty caused by removing lastDateFormat.

df1: 360,887,518
df2:   9,146,514

I will submit a PR fixing this issue.

@lfcnassif
Copy link
Member

Thank you @tc-wleite!

lfcnassif added a commit that referenced this issue Sep 16, 2023
Always try DateFormats in the same order, to avoid timezone loss (#1879)
@lfcnassif
Copy link
Member

lfcnassif commented Sep 18, 2023

Hi @tc-wleite. I understood when df2 is used, all remaining df1 dates could be parsed using it then. Do you think this issue is enough to roll a 4.1.5 fixing release?

@lfcnassif
Copy link
Member

If you think this is common (and not rare), we can think about publishing a new release, since, in my opinion, silent issues producing wrong results are worse than issues missing results or explicit ones (aborting bugs, infinite loops, etc).

@wladimirleite
Copy link
Member Author

Hi @tc-wleite. I understood when df2 is used, all remaining df1 dates could be parsed using it then.

Correct, as it seems that there are no more dates using df3 to df6 (at least in the UFDRs I used to test).

If you think this is common (and not rare), we can think about publishing a new release, since, in my opinion, silent issues producing wrong results are worse than issues missing results or explicit ones (aborting bugs, infinite loops, etc).

Well, I think it is quite common.
Two typical behaviors (number of dates parsed by each DateFormat, after the bug fix, as they are read from the XML file):

UFDR 1
    df1 : 42567
    df2 : 5
    df1 : 7492 (*)

UFDR 2
   df1 : 1031495
   df2 : 267
   df1 : 58000  (*)
   df2 : 88
   df1 : 74116  (*)
   df2 : 57614 
   df1 : 8123  (*)
   df2 : 3
   df1 : 39619 (*)
   df2 : 380

(*) means that these dates would lose their timezone (before the bug fix).

In most files, there a lot of "df1 dates" in the beginning.

@lfcnassif
Copy link
Member

Thanks! And about df2 dates, does Cellebrite software display some timezone for them? We are using local timezone for conversion right?

@wladimirleite
Copy link
Member Author

Great question!
I will check and let you know.

@wladimirleite
Copy link
Member Author

And about df2 dates, does Cellebrite software display some timezone for them?

The time zone to be displayed is configurable (I use UTC+0, but it can be set to another time zone or to the device's timezone). But as this is dinamic, these options don't affect how the extracted data is interpreted (and written in the report.xml).

We are using local timezone for conversion right?

I checked a few cases, and we are using UTC+0, which seems correct, as the dates shown in Cellebrite tools match with the ones present by IPED.
And in the code there is a df.setTimeZone(TimeZone.getTimeZone("GMT"));.

@lfcnassif
Copy link
Member

I checked a few cases, and we are using UTC+0, which seems correct, as the dates shown in Cellebrite tools match with the ones present by IPED.

Thank you @tc-wleite for checking!

And in the code there is a df.setTimeZone(TimeZone.getTimeZone("GMT"));.

I was out of my computer, so I couldn't check, sorry...

@wladimirleite
Copy link
Member Author

I was out of my computer, so I couldn't check, sorry...

It was important to check this, as it could lead to another hard-to-notice issue. But it seems to be consistent now.

@lfcnassif lfcnassif changed the title Some dates read from UFDR may lost timezone information Many dates read from UFDR can miss timezone information Oct 4, 2023
@lfcnassif lfcnassif changed the title Many dates read from UFDR can miss timezone information Many dates read from UFDR can be decoded using a wrong timezone Oct 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants