Skip to content

Timestamp methods #17876

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Oct 15, 2017 · 11 comments
Closed

Timestamp methods #17876

jbrockmendel opened this issue Oct 15, 2017 · 11 comments
Labels
Internals Related to non-user accessible pandas implementation

Comments

@jbrockmendel
Copy link
Member

jbrockmendel commented Oct 15, 2017

[ ] - Timestamp classmethods should return cls, not hard-coded Timestamp
[ ] - ... if and only if is_timestamp is updated to recognized subclasses.
[ ] - Timestamp._get_field does not need to be in the user-facing namespace
[x] - Timestamp._get_named_field does not need to be in the user-facing namespace
[ ] - Timestamp._get_start_end_field does not need to be in the user-facing namespace
[ ] - NaT / timedelta should behave like NaT / Timedelta #17955
[ ] - _assert_tzawareness_compat doesn't actually return the declared return type
[ ] - Is it the case that _localize_tso argument obj will always have obj.tzinfo == None? This is the case in all existing usages.

A few quick questions:

  1. __new__ returns Timestamp instead of cls. Is this intentional? Ditto Timestamp.replace returning Timestamp instead of self.__class__.

    • This might be solved by making create_timestamp_from_ts a classmethod. Maybe this function was written before cython supported cdef classmethods?
  2. Similarly, is_timestamp will miss subclasses of Timestamp. That is appreciably faster than isinstance(obj, Timestamp), so this may just be something we have to live with.

  3. Some Timestamp methods are cpdef but may not need to be user-facing, i.e. could just be cdef: _get_field, _get_named_field, _get_start_end_field

  4. Timestamp._get_named_field is only used once, for weekday_name. This calls

    val = self._maybe_convert_value_to_local()
    out = get_date_name_field(np.array([val], dtype=np.int64), field)
    return out[0]

which seems like huge overkill for return {0: 'Monday', ..., 6: 'Sunday'}[self.weekday()]. Was there a decision at some point that the latter is less performant?

@gfyoung gfyoung added the Internals Related to non-user accessible pandas implementation label Oct 15, 2017
@jreback
Copy link
Contributor

jreback commented Oct 15, 2017

4. is not locale safe (and less general); imagine returning month names for example
1. prob could be changed
2. not sure this actually matters
3. prob could be changed

@jbrockmendel
Copy link
Member Author

Yah, definitely all super-low priority.

Not sure about your numbering scheme, except for 4. locale-safeness would be a good reason for the status-quo, but the current version also hard-codes English names.

In a quick %timeit comparison, using a dict lookup is ~3x faster.

@jorisvandenbossche
Copy link
Member

@jbrockmendel fixed the numbering (sometimes it is a bit annoying that you can write whathever number in markdown and it fixes it for you :-))

@jbrockmendel
Copy link
Member Author

jbrockmendel commented Oct 17, 2017

Related enough:

>>> pd.NaT / pd.Timedelta(seconds=1)
nan
>>> pd.NaT / pd.Timedelta(seconds=1).to_pytimedelta()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'NaTType' and 'datetime.timedelta'
  1. Treating NaT/timedelta the same as NaT/Timedelta would amount to changing an isinstance(other, Timedelta) check to isinstance(other, timedelta) (or actually, a more performant C version). Is treating these differently an intentional choice?

@jbrockmendel
Copy link
Member Author

  1. prob could be changed
  2. not sure this actually matters

Fair enough. Changing 1 without changing 2 is liable to cause headaches down the road, so I'm going to mentally mark these as "wont fix"

@jbrockmendel
Copy link
Member Author

jbrockmendel commented Oct 17, 2017

Two more:

  1. _localize_tso docstring reads Take a TSObject in UTC and localizes to timezone tz.. In practice I think obj.tzinfo is always None on the way in, but there are no checks to that effect. Is there any scenario in which this function would be called and obj.tzinfo would be non-null (and non-UTC)?

  2. Timestamp method cdef int _assert_tzawareness_compat(_Timestamp self, object other) except -1: never actually returns an int, so the return type can be removed -- and with it the ugly "except -1" clause. Sidenote: the cython people tell me that declaring _Timestamp self is perfunctory.

@jbrockmendel
Copy link
Member Author

Updated numbering scheme above. 1-4 have been answered. 5) will be actionable if I get the go-ahead.

@jreback
Copy link
Contributor

jreback commented Oct 23, 2017

pd.NaT / pd.Timedelta(seconds=1).to_pytimedelta()
Traceback (most recent call last):
File "", line 1, in
TypeError: unsupported operand type(s) for /: 'NaTType' and 'datetime.timedelta'

should probably work, I think if we can de-couple NaT from datetime/timedelta directly (but make it act like duck), then a few things will be easier

@jbrockmendel
Copy link
Member Author

should probably work, I think if we can de-couple NaT from datetime/timedelta directly (but make it act like duck), then a few things will be easier

OK. Implementing this in #17955. Regardless of the end-game for de-coupling NaT, #17793 should be a step in the right direction.

@jreback
Copy link
Contributor

jreback commented Oct 23, 2017

can you update top section with check boxes as needed (then can reference the closing PR).

@jbrockmendel jbrockmendel mentioned this issue Oct 27, 2017
59 tasks
@jbrockmendel
Copy link
Member Author

Just ported this list above over to the TODO list in #17652; this can be closed if you want to de-clutter the tracker.

@jreback jreback closed this as completed Oct 27, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

No branches or pull requests

4 participants