[Python][Compute] Make pyarrow.compute.strptime %Y handling consistent with time.strptime #38528
Comments
Thanks for reporting this @no23reason! PyArrow uses the system's strptime and hence doesn't really follow Python's strptime conventions.
Thank you @rok for the clarification and the umbrella link. I played some more with the C version of strptime (https://onlinegdb.com/Et55gUIpp) and indeed, it parses years with fewer than 4 digits 🤯
Unfortunately, the dialects can't be reconciled. We could add a dialect parameter to control the behavior, but I suspect it would be a significant amount of work and we're not seeing strong demand for it.
Makes sense. Maybe it would be enough to mention in the pyarrow compute docs that the function follows C++ semantics, not Python ones 🤔
@no23reason that would be good! Want to open a PR?
Sure, I'll try to get to it later today :)
Sorry for the radio silence. I was able to extend the documentation locally, but I was struggling with the Python build needed to preview the changes in the built docs. I will give it some more time when I'm able.
No worries!
To prevent possible confusion with the compute strptime function, we now explicitly mention that the C/C++ semantics are followed.
Hm, it seems that as a first-time contributor I cannot run the preview: #38665. Can you please take a look? :)
It seems so! I started the CI and the docs build.
Hm, it seems something is wrong with the CI :( I tried running the
let's hope the CI comes around :)
Splits a long sentence, makes the language more directive. Co-authored-by: Rok Mihevc <rok@mihevc.org>
### Rationale for this change

To prevent possible confusion with the compute strptime function, we now explicitly mention that the C/C++ semantics are followed.

### What changes are included in this PR?

The documentation of the `format` parameter of the `strptime` function is expanded.

### Are these changes tested?

N/A documentation change only.

### Are there any user-facing changes?

Just the documentation.

* Closes: #38528

Lead-authored-by: Dan Homola <dan.homola@hotmail.cz>
Co-authored-by: Dan Homola <dan.homola@gooddata.com>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Rok Mihevc <rok@mihevc.org>
Describe the enhancement requested
The `pyarrow.compute.strptime` function handles the `%Y` format part (i.e. 4-digit year) differently from the built-in `time.strptime`: when the year part of the input has only two digits, `time.strptime` fails to parse it, while `pyarrow.compute.strptime` parses it with no error, yielding a result with a year in the 1st century.

For example
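A minimal sketch of the reported behavior, assuming a two-digit-year input such as `23-01-01` (a hypothetical value chosen for illustration). The helper names `python_accepts` and `pyarrow_parse` are illustrative, not part of either library. Because `pyarrow.compute.strptime` relies on the platform's C `strptime`, no expected output is shown for the pyarrow call.

```python
import time


def python_accepts(value: str, fmt: str) -> bool:
    """Return True if the built-in time.strptime parses value with fmt."""
    try:
        time.strptime(value, fmt)
    except ValueError:
        return False
    return True


def pyarrow_parse(value: str, fmt: str):
    """Parse value with pyarrow.compute.strptime; return None if pyarrow is missing."""
    try:
        import pyarrow as pa
        import pyarrow.compute as pc
    except ImportError:
        return None
    return pc.strptime(pa.array([value]), format=fmt, unit="s")


FMT = "%Y-%m-%d"
SAMPLE = "23-01-01"  # hypothetical input with a two-digit year

print(python_accepts("2023-01-01", FMT))  # True: four-digit year matches %Y
print(python_accepts(SAMPLE, FMT))        # False: time.strptime raises ValueError
print(pyarrow_parse(SAMPLE, FMT))         # platform-dependent; see the note below
```

According to the report, the last call does not raise and instead produces a timestamp whose year falls in the 1st century.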
I believe `pyarrow.compute.strptime` should also fail in this case, as it most likely means that the format is wrong.

Component(s)

Python