-
Notifications
You must be signed in to change notification settings - Fork 243
Fixed invalid number mappings for datetime strings #157
Conversation
osxcollector/osxcollector.py
Outdated
@@ -355,10 +355,11 @@ def _normalize_val(val, key=None): | |||
:returns: A string | |||
""" | |||
# If the key hints this is a timestamp, try to use some popular formats | |||
if key and any([-1 != key.lower().find(hint) for hint in ['time', 'utc', 'date', 'accessed']]): | |||
if key and any([-1 != key.lower().find(hint) for hint in ['time', 'utc', 'date', 'accessed']]) and not 'times' in key.lower(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas if we could refactor the first part of the if statement to also use the in
keyword instead of any
+ find
+ in
combo?
osxcollector/osxcollector.py
Outdated
if ts: | ||
return _datetime_to_string(ts) | ||
if not ts: | ||
ts = datetime.fromtimestamp(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment about what is going on here. It maybe not clear why we are assigning the default datetime here when there is not ts
calculated before.
…t conforming to timestamp heuristics
9c2e5c1
to
2c1e80e
Compare
Currently the heuristics for checking if a key maps to a timestamp is seeing if the key A contains certain keywords, and B has a value that is within a certain epoch timeframe. Timestamps failing to meet B should thus be given a default timestamp value. These fields must not however be grouped together with non-timestamp keys meeting A, which represent a huge subset. One solution would be to whitelist all known timestamp keys for timestamp rendering, and giving those whose values fail to be rendered the default timestamp. However, this would be overkill since there are hundreds if not thousands of osxcollector-output timestamp-based keys. The most cost-effective solution right now in my opinion is to simply check for known timestamp keys failing to meet B . From what I've seen, this list is quite small, and we can for now just check it in the code by means of list filtering as done here. As an extension this could be instead made into a JSON containing all keys with known mapping issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ship it!
This fixes GH-156, where timestamp keys whose values could not be correctly rendered to timestamp strings were ultimately mapped to their original Number types instead of string types. As described in the issue, the problem with this was that when later ingested by analytics frameworks like Elasticsearch, type conflicts could arise since some values were interpreted as longs and others as strings. The fix is to by default return a very early
datetime
if rendering fails.