
Optimize read_datetime #1019


Closed · wants to merge 1 commit

Conversation

jverswijver (Contributor):

Switching from floor division to string processing yields a ~35% speedup in read_datetime execution time, measured by profiling with cProfile.
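
For context, the two decoding strategies compare roughly like this (an illustrative sketch using a YYYYMMDD-packed integer, not the actual datajoint code):

import timeit

packed = 20220504  # illustrative packed date: YYYYMMDD

def floor_division():
    # decode year/month/day arithmetically
    return packed // 10000, (packed // 100) % 100, packed % 100

def string_slicing():
    # decode year/month/day by slicing the string form
    s = str(packed)
    return int(s[:4]), int(s[4:6]), int(s[-2:])

print(timeit.timeit(floor_division, number=1_000_000))
print(timeit.timeit(string_slicing, number=1_000_000))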

@zitrosolrac self-requested a review May 4, 2022 19:26
zitrosolrac (Contributor) left a comment:

I reviewed the performance profiles using SnakeViz and they reflect the performance upgrade.

Comment on lines +495 to +497
year=int(date_str[:4]) if date_str[:4] != "" else 0,
month=int(date_str[4:6]) if date_str[4:6] != "" else 0,
day=int(date_str[-2:] if date_str[-2:] != "" else 0),
Member:

Suggested change
year=int(date_str[:4]) if date_str[:4] != "" else 0,
month=int(date_str[4:6]) if date_str[4:6] != "" else 0,
day=int(date_str[-2:] if date_str[-2:] != "" else 0),
year=int(date_str[:4] or 0),
month=int(date_str[4:6] or 0),
day=int(date_str[-2:] or 0),
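
Note that the fallback must sit inside the int() call, since int("") raises rather than returning a falsy value:

int("" or 0)   # 0: "" is falsy, so int() receives 0
int("") or 0   # ValueError: invalid literal for int() with base 10: ''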

Comment on lines +504 to +507
hour=int(time_str[-12:-10]) if time_str[-12:-10] != "" else 0,
minute=int(time_str[-10:-8]) if time_str[-10:-8] != "" else 0,
second=int(time_str[-8:-6]) if time_str[-8:-6] != "" else 0,
microsecond=int(time_str[6:12]) if time_str[6:12] != "" else 0,
Member:

Suggested change
hour=int(time_str[-12:-10]) if time_str[-12:-10] != "" else 0,
minute=int(time_str[-10:-8]) if time_str[-10:-8] != "" else 0,
second=int(time_str[-8:-6]) if time_str[-8:-6] != "" else 0,
microsecond=int(time_str[6:12]) if time_str[6:12] != "" else 0,
hour=int(time_str[-12:-10] or 0),
minute=int(time_str[-10:-8] or 0),
second=int(time_str[-8:-6] or 0),
microsecond=int(time_str[6:12] or 0),

date = (
datetime.date(year=date // 10000, month=(date // 100) % 100, day=date % 100)
Member:

Why the change? Here are some timing tests:
[screenshot: timing tests]

if date >= 0
else None
)
time = (
datetime.time(
hour=(time // 10000000000) % 100,
Member:

Why is this better? Here is the timing test:

[screenshot: timing test]

jverswijver (Contributor Author):

> The string parsing takes longer according to %%timeit tests and is more verbose.

The way I found the speedup: I profiled the unpacking of an np.array of 100,000 datetime objects, then overloaded the read_datetime method and profiled unpack again. In the cProfile results, the string-processing method spent less total time in read_datetime. But it seems you get different results when you profile it.

Do you want to tag up on this sometime? I have a Python script that generates the cProfile profiles, which you can then visualize with a Python package called snakeviz. I can send you the script so we can look for any error in my logic.
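
Roughly, the script does something like this (a sketch; I'm assuming datajoint.blob.pack/unpack as the entry points):

import cProfile
import datetime
import numpy as np
from datajoint import blob

# build a large object array of datetimes to stress read_datetime
data = np.array([datetime.datetime(2022, 5, 4, 12, 0, 0)] * 100_000)
packed = blob.pack(data)

# profile just the unpacking step and dump the stats for snakeviz
profiler = cProfile.Profile()
profiler.runcall(blob.unpack, packed)
profiler.dump_stats("unpack.prof")
# then visualize:  snakeviz unpack.prof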

I will attach pictures of the visualized performance profiles below.

before overload:
[screenshots: SnakeViz profiles before overloading read_datetime]

after:
[screenshots: SnakeViz profiles after overloading read_datetime]

Specifically, I looked at the decrease in total execution time as well as the tottime column, which reports the time spent inside each function itself, summed across all calls. Please let me know what you think @dimitri-yatsenko

Member:

I just don't see a compelling reason why the string processing would produce a speedup. I think the blob deserialization is slow because of Python's need to loop through the numbers, calling datetime.date separately for each date and each time.

Member:

A real speedup can potentially be produced by using numpy.datetime64 type support.
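
For example, decoding could become a single vectorized cast instead of a Python-level loop (a sketch; the microseconds-since-epoch encoding here is assumed for illustration):

import numpy as np

# int64 microseconds since the Unix epoch (assumed encoding)
raw = np.array([1652270400000000, 1652274000000000], dtype=np.int64)
decoded = raw.view("datetime64[us]")  # one cast, no per-element datetime() calls
print(decoded)  # ['2022-05-11T12:00:00' '2022-05-11T13:00:00']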

jverswijver (Contributor Author):

Yes, and I also think we could speed up the process by multiprocessing the unpacking of arrays.

@dimitri-yatsenko (Member), May 11, 2022:

Here is a 500x improvement in decoding speed:

[screenshot: timing benchmark]

Member:

np.datetime64 did not exist when we made the original time serializer.

@dimitri-yatsenko (Member), May 11, 2022:

For now, we can recommend the workaround for storing datetimes as int64 as shown. We can add native support for the datetime64 data type, which would eliminate the need for converting into uint64 and back.
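
Something along these lines (a sketch of the workaround, not a finalized API):

import numpy as np

# store: cast datetime64 values to int64 before packing into the blob
stamps = np.arange("2022-05-01", "2022-05-04", dtype="datetime64[D]").astype("datetime64[us]")
as_int64 = stamps.astype(np.int64)  # this is what gets serialized

# fetch: one vectorized cast restores the datetimes
restored = as_int64.astype("datetime64[us]")
assert (restored == stamps).all()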

@dimitri-yatsenko (Member):

The string parsing takes longer according to %%timeit tests and is more verbose.

@jverswijver (Contributor Author):

superseded by #1036
