Fix segmentation fault when JSON serializing a PeriodIndex #47431
small comment
doc/source/whatsnew/v1.4.3.rst
@@ -30,6 +30,7 @@ Fixed regressions
- Fixed regression in :func:`assert_index_equal` when ``check_order=False`` and :class:`Index` has extension or object dtype (:issue:`47207`)
- Fixed regression in :func:`read_excel` returning ints as floats on certain input sheets (:issue:`46988`)
- Fixed regression in :meth:`DataFrame.shift` when ``axis`` is ``columns`` and ``fill_value`` is absent, ``freq`` is ignored (:issue:`47039`)
- Fixed regression in :meth:`DataFrame.to_json` when ``index`` is of the type ``PeriodIndex`` causing a segmentation violation (:issue:`46683`)
:class:`Index` and :class:`PeriodIndex`
``index`` refers to the parameter in the constructor of ``DataFrame`` and not the class. Is there some kind of markup to indicate parameters?
The ``PeriodIndex`` has been marked as a class.
For parameter types, double backticks are fine. However, the note refers to ``to_json``, so it is not obvious that ``index`` is a parameter name. Perhaps reword to avoid this.
I see the confusion. Is the new version clearer?
lgtm.
I think marking this as a class too is clearer, since both are interchangeable in this context. Otherwise, please add something like "when argument ``index`` is ...".
Fixed
@@ -238,8 +238,10 @@ static PyObject *get_values(PyObject *obj) {
PyErr_Clear();
} else if (PyObject_HasAttrString(values, "__array__")) {
// We may have gotten a Categorical or Sparse array so call np.array
PyObject *array_values = PyObject_CallMethod(values, "__array__",
Do you know how the memory management works here? I'm not entirely sure that it is safe to reassign to values after decrementing. It may be happenstance that this improves the odds of delaying garbage collection. Maybe we can just return array_values here directly?
In C there is no automatic Python-style reference counting. I assume that the Python object returned by PyObject_CallMethod has a reference count of at least 1, so that it does not get deallocated.
The next line decrements the refcount of the original values object, which in this case reduces it to 0, causing it to be freed immediately. This was the problem causing the segfault.
This fix delays the destruction of the original values object, and since the array_values object should have a refcount high enough not to be destroyed, this works.
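The ordering described above can be modelled in plain Python as a rough sketch. It is not the C code from this PR: the `Values` class and `get_values_sketch` function are hypothetical names made up for illustration, and the immediate-free behaviour relies on CPython's reference counting.

```python
import weakref


class Values:
    """Hypothetical stand-in for an object that exposes __array__."""

    def __array__(self):
        # Returns a brand-new object, so the caller holds a reference to it.
        return [1, 2, 3]


def get_values_sketch(values_factory):
    """Call __array__ first, then release the original object.

    Returns the converted result and a flag telling whether the original
    object was already freed by the time we return (CPython behaviour).
    """
    values = values_factory()
    probe = weakref.ref(values)  # observes the object without keeping it alive

    # 1. Call the method while `values` is still alive.
    array_values = values.__array__()
    # 2. Only now drop the last reference to the original object.
    del values
    # 3. Carry on with the result; the original is gone, the result is not.
    return array_values, probe() is None


result, original_freed = get_values_sketch(Values)
```

Releasing the original object before step 1 would leave nothing valid to call the method on, which is the use-after-free situation in the C version.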
Also, in case array_values were equal to NULL, the if statement on line 252 (new situation) would not be executed if array_values were returned directly.
> I must say that I assume that the Python object returned by PyObject_CallMethod has at least a reference count of 1, so that it does not get deallocated.
Yep, typically calls to PyObject_* functions will give you ownership of the reference to an object, which is pretty similar to +1 on a refcount.
> The next line decrements the refcount of the original values object, which in this case will reduce it to 0, causing it to be freed immediately.
Could be wrong, but I think it gets freed on the next garbage collector run, not necessarily immediately when the count reaches zero. It would definitely be safer here to return rather than re-using the variable if we can.
not for this PR, but this part of the thread re-ups my desire to move this logic out of C
The Python documentation on reference counting is definitely a good resource. Worth a read:
https://docs.python.org/3/c-api/intro.html#reference-count-details
I think this code would be very difficult to port to Cython, maybe not even worth it, but of course anything is possible with time and effort.
When the reference count reaches zero, a Python object gets freed (see Py_DECREF). Garbage collection is used to free up objects which are held in circular references that are no longer referenced from any other objects.
This was also the cause of the segmentation fault, as a freed object was being used (use after free type of bug).
There is nothing to be gained by returning immediately with respect to influencing the reference count.
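The distinction between refcount-driven deallocation and the cyclic garbage collector can be checked from Python itself. The following is a small sketch, assuming CPython; `Node`, `freed_without_gc` and `cycle_needs_gc` are hypothetical names used only for this illustration.

```python
import gc
import weakref


class Node:
    """Hypothetical object used only to observe deallocation."""

    def __init__(self):
        self.other = None


def freed_without_gc():
    """Drop the only reference with the cyclic collector switched off."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        node = Node()
        probe = weakref.ref(node)
        del node  # refcount reaches zero here
        return probe() is None  # freed although no collector run happened
    finally:
        if was_enabled:
            gc.enable()


def cycle_needs_gc():
    """Objects in a reference cycle survive until the collector runs."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        a, b = Node(), Node()
        a.other, b.other = b, a  # a <-> b keep each other's refcount above zero
        probe = weakref.ref(a)
        del a, b
        alive_before = probe() is not None
        gc.collect()  # an explicit collection breaks the cycle
        freed_after = probe() is None
        return alive_before, freed_after
    finally:
        if was_enabled:
            gc.enable()
```

The first function shows an acyclic object disappearing as soon as its last reference is dropped; the second shows that only cyclic garbage waits for a collector run.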
Makes sense. Thanks for clarifying
lgtm ex @WillAyd comment.
175d8e1 to 521dfeb
Thanks @roberthdevries
…izing a PeriodIndex
Fixes #46683
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.