-
-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix param parsing. #286
Fix param parsing. #286
Conversation
Closes numpy#285 This fixes two tings: - When first sentence of the docstring is onteh first line, Parameters is not properly parse, which for example mis parsed numpy.array docstring. - many project have paremeters description list with ` :` afer the name, even if no type is present. If there is no space after the `:` the parameter name includes the ` :` which is most likely wrong.
Codecov Report
@@ Coverage Diff @@
## master #286 +/- ##
==========================================
- Coverage 93.10% 91.99% -1.12%
==========================================
Files 7 7
Lines 1261 1261
==========================================
- Hits 1174 1160 -14
- Misses 87 101 +14 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the indenting/parsing gets screwed up for more than just the Parameters
section. Using np.array.__doc__
as and example::
>>> print(NumpyDocString(np.array.__doc__))
array(object, dtype=None, \*, copy=True, order='K', subok=False, ndmin=0)
Create an array.
Parameters
----------
object : array_like
An array, any object exposing the array interface, an object whose
__array__ method returns an array, or any (nested) sequence.
dtype : data-type, optional
The desired data-type for the array. If not given, then the type will
be determined as the minimum type required to hold the objects in the
sequence.
copy : bool, optional
If true (default), then the object is copied. Otherwise, a copy will
only be made if __array__ returns a copy, if obj is a nested sequence,
or if a copy is needed to satisfy any of the other requirements
(`dtype`, `order`, etc.).
order : {'K', 'A', 'C', 'F'}, optional
Specify the memory layout of the array. If object is not an array, the
newly created array will be in C order (row major) unless 'F' is
specified, in which case it will be in Fortran order (column major).
If object is an array the following holds.
===== ========= ===================================================
order no copy copy=True
===== ========= ===================================================
'K' unchanged F & C order preserved, otherwise most similar order
'A' unchanged F order if input is F and not C, otherwise C order
'C' C order C order
'F' F order F order
===== ========= ===================================================
When ``copy=False`` and a copy is made for other reasons, the result is
the same as if ``copy=True``, with some exceptions for `A`, see the
Notes section. The default order is 'K'.
subok : bool, optional
If True, then sub-classes will be passed-through, otherwise
the returned array will be forced to be a base-class array (default).
ndmin : int, optional
Specifies the minimum number of dimensions that the resulting
array should have. Ones will be pre-pended to the shape as
needed to meet this requirement.
Returns
-------
out : ndarray
An array object satisfying the specified requirements.
See Also
--------
`empty_like`_
Return an empty array with shape and type of input.
`ones_like`_
Return an array of ones with shape and type of input.
`zeros_like`_
Return an array of zeros with shape and type of input.
`full_like`_
Return a new array with shape of input filled with value.
`empty`_
Return a new uninitialized array.
`ones`_
Return a new array setting values to one.
`zeros`_
Return a new array setting values to zero.
`full`_
Return a new array of given shape filled with value.
Notes
-----
When order is 'A' and `object` is an array in neither 'C' nor 'F' order,
and a copy is forced by a change in dtype, then the order of the result is
not necessarily 'C' as expected. This is likely a bug.
Examples
--------
>>> np.array([1, 2, 3])
array([1, 2, 3])
Upcasting:
>>> np.array([1, 2, 3.0])
array([ 1., 2., 3.])
More than one dimension:
>>> np.array([[1, 2], [3, 4]])
array([[1, 2],
[3, 4]])
Minimum dimensions 2:
>>> np.array([1, 2, 3], ndmin=2)
array([[1, 2, 3]])
Type provided:
>>> np.array([1, 2, 3], dtype=complex)
array([ 1.+0.j, 2.+0.j, 3.+0.j])
Data-type consisting of more than one element:
>>> x = np.array([(1,2),(3,4)],dtype=[('a','<i4'),('b','<i4')])
>>> x['a']
array([1, 3])
Creating an array from sub-classes:
>>> np.array(np.mat('1 2; 3 4'))
array([[1, 2],
[3, 4]])
>>> np.array(np.mat('1 2; 3 4'), subok=True)
matrix([[1, 2],
[3, 4]])
Note that some sections are fully dedented (e.g. See Also) whereas others are partially dedented (Parameters) and others have only the headings dedented (Notes, Examples). Maybe there's a more general fix than the change to _parse_param_list
to get the dedenting correct for the entire docstring when the first line doesn't match the indenting scheme of the rest of the docstring?
That was the first fix I tried, but this make Personally I end up always calling it as so |
Yes, I noticed this too when I was messing around with alternatives (see #287, which will fail one validation test). It seems to me much more straightforward to catch issues due to dedenting errors as far upstream as possible, rather than modifying the downstream functions to handle the corner cases of less-robust docstring cleaning. However, if there are strong backwards compatibility concerns with the validation module, then the approach in #287 is a non-starter. |
Yeah, I went with not validating, and just re-emitting the docstring from the Numpydoc |
Hmm, on the one hand the validation should definitely catch indentation errors, but it would be nice if the docstring parsing were more resilient to indentation problems. |
Yeah, I think the two logics are intertwined too much, and the "parsing"/"guessing"/"validating" should be different steps. There are many docstrings around that have invalid section names, and right now can't be parsed as Numpydoc Errors immediately, though that might be for a longer term project to clean that up. |
... including in numpy itself! (numpy/numpy#16791) I agree it's probably a larger project, so let's not worry about it in this PR. I think the changes here look good. The only additions I'd make is to apply a similar |
The problem being that this matched the relatively obscure syntax for ReST definition lists... |
ok, I made |
9066488
to
7f40a9f
Compare
7f40a9f
to
f8b424a
Compare
Anything I can do to push this forward ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than two nitpicks, LGTM
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
Co-authored-by: Eric Larson <larson.eric.d@gmail.com>
Thanks ! |
Thanks @Carreau ! |
Closes #285
This fixes two tings:
When first sentence of the docstring is onteh first line, Parameters
is not properly parse, which for example mis parsed numpy.array
docstring.
many project have paremeters description list with
:
afer thename, even if no type is present. If there is no space after the
:
the parameter name includes the
:
which is most likely wrong.