datetime optimization

Hi,
I noticed that datetime parsing for large SQL/CSV tables is really slow. Would it be acceptable to cache the results of repeated parses? For example, instead of:

``` python
import datetime

# FMT is the strptime format string for the column, defined elsewhere.
def parse_date(date_str):
    return datetime.datetime.strptime(date_str, FMT)

def parse_date_col(str_col):
    return [parse_date(date_str) for date_str in str_col]
```

use

``` python
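import datetime

# FMT is the strptime format string for the column, defined elsewhere.
def parse_date(date_str):
    return datetime.datetime.strptime(date_str, FMT)

def parse_date_col(str_col):
    cache = {}  # maps each distinct date string to its parsed datetime
    for date_str in str_col:
        if date_str not in cache:
            cache[date_str] = parse_date(date_str)
    return [cache[date_str] for date_str in str_col]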
```

This works because string hashing, comparison, and dictionary insertion are all much faster than `strptime`.

For tables where dates are repeated many times, `strptime` then runs once per distinct string rather than once per row, which can result in an orders-of-magnitude speedup.
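To make the comparison concrete, here is a rough timing sketch. The format string, the made-up column (365 distinct dates, each repeated 100 times), and the name `parse_date_col_cached` are all assumptions for illustration; actual numbers will depend on the data and the machine.

``` python
import datetime
import timeit

FMT = "%Y-%m-%d"  # assumed format, for illustration only

def parse_date(date_str):
    return datetime.datetime.strptime(date_str, FMT)

def parse_date_col(str_col):
    # Baseline: one strptime call per row.
    return [parse_date(date_str) for date_str in str_col]

def parse_date_col_cached(str_col):
    # Cached variant (hypothetical name): one strptime call per distinct string.
    cache = {}
    for date_str in str_col:
        if date_str not in cache:
            cache[date_str] = parse_date(date_str)
    return [cache[date_str] for date_str in str_col]

# Made-up column: 365 distinct dates, each repeated 100 times.
start = datetime.date(2014, 1, 1)
distinct = [(start + datetime.timedelta(days=i)).strftime(FMT) for i in range(365)]
str_col = distinct * 100

if __name__ == "__main__":
    t_plain = timeit.timeit(lambda: parse_date_col(str_col), number=3)
    t_cached = timeit.timeit(lambda: parse_date_col_cached(str_col), number=3)
    print(f"uncached: {t_plain:.3f}s  cached: {t_cached:.3f}s")
```

Both functions return the same list, so the cached version is a drop-in replacement as long as the column can be iterated twice (a list works, a one-shot generator does not).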

Thanks
Charles


datetime optimization #9594
