More list printing bugs in 1.6 #40508

Open
pdeffebach opened this issue Apr 17, 2021 · 5 comments
Labels
docsystem (the documentation building system), feature (new feature / enhancement requests), markdown

Comments

@pdeffebach
Contributor

Looking at the CSV documentation in Julia 1.6, I get this big block instead of a nice list:


    •  File layout options: • header=1: the header argument can be an Int, indicating the row to parse for column names; or a Range, indicating a
       span of rows to be concatenated together as column names; or an entire Vector{Symbol} or Vector{String} to use as column names; if a file
       doesn't have column names, either provide them as a Vector, or set header=0 or header=false and column names will be auto-generated (Column1,
       Column2, etc.). Note that if a row number header and comment or ignoreemtpylines are provided, the header row will be the first
       non-commented/non-empty row after the row number, meaning if the provided row number is a commented row, the header row will actually be the
       next non-commented row. • normalizenames=false: whether column names should be "normalized" into valid Julia identifier symbols; useful when
       iterating rows and accessing column values of a row via getproperty (e.g. row.col1) • datarow: an Int argument to specify the row where the
       data starts in the csv file; by default, the next row after the header row is used. If header=0, then the 1st row is assumed to be the start
       of data; providing a datarow or skipto argument does not affect the header argument. Note that if a row number datarow and comment or
       ignoreemtpylines are provided, the data row will be the first non-commented/non-empty row after the row number, meaning if the provided row
       number is a commented row, the data row will actually be the next non-commented row. • skipto::Int: identical to datarow, specifies the
       number of rows to skip before starting to read data • footerskip::Int: number of rows at the end of a file to skip parsing. Do note that
       commented rows (see the comment keyword argument) do not count towards the row number provided for footerskip, they are completely ignored by
       the parser • limit: an Int to indicate a limited number of rows to parse in a csv file; use in combination with skipto to read a specific,
       contiguous chunk within a file; note for large files when multiple threads are used for parsing, the limit argument may not result in exact
       an exact # of rows parsed; use threaded=false to ensure an exact limit if necessary • transpose::Bool: read a csv file "transposed", i.e.
       each column is parsed as a row • comment: rows that begin with this String will be skipped while parsing. Note that if a row number header or
       datarow and comment are provided, the header/data row will be the first non-commented/non-empty row after the row number, meaning if the
       provided row number is a commented row, the header/data row will actually be the next non-commented row. • ignoreemptylines::Bool=true:
       whether empty rows/lines in a file should be ignored (if false, each column will be assigned missing for that empty row) • threaded::Bool:
       whether parsing should utilize multiple threads; by default threads are used on large enough files, but isn't allowed when transpose=true;
       only available in Julia 1.3+ • tasks::Integer=Threads.nthreads(): for multithreaded parsing, this controls the number of tasks spawned to
       read a file in chunks concurrently; defaults to the # of threads Julia was started with (i.e. JULIA_NUM_THREADS environment variable) •
       lines_to_check::Integer=5: for multithreaded parsing, a file is split up into tasks # of equal chunks, then lines_to_check # of lines are
       checked to ensure parsing correctly found valid rows; for certain files with very large quoted text fields, lines_to_check may need to be
       higher (10, 30, etc.) to ensure parsing correctly finds these rows • select: an AbstractVector of Int, Symbol, String, or Bool, or a
       "selector" function of the form (i, name) -> keep::Bool; only columns in the collection or for which the selector function returns true will
       be parsed and accessible in the resulting CSV.File. Invalid values in select are ignored. • drop: inverse of select; an AbstractVector of
       Int, Symbol, String, or Bool, or a "drop" function of the form (i, name) -> drop::Bool; columns in the collection or for which the drop
       function returns true will ignored in the resulting CSV.File. Invalid values in drop are ignored.

I will have to check whether this problem is also present on master.

@KristofferC
Member

Hopefully fixed by #40203. Should be in 1.6.1.

@mgkuhn added the docsystem label on Apr 24, 2021
@mgkuhn
Contributor

mgkuhn commented Apr 24, 2021

Using Julia 1.6.1 on Ubuntu Linux 20.04 with CSV v0.8.4, I get the following in an 80-character-wide terminal:

julia> using CSV
help?> CSV.File
  CSV.File(source; kwargs...) => CSV.File
[...]
    •  File layout options:
       • header=1: the header argument can be an Int, indicating
       the row to parse for column names; or a Range,
       indicating a span of rows to be concatenated together as
       column names; or an entire Vector{Symbol} or
       Vector{String} to use as column names; if a file doesn't
       have column names, either provide them as a Vector, or
       set header=0 or header=false and column names will be
       auto-generated (Column1, Column2, etc.). Note that if a
       row number header and comment or ignoreemtpylines are
       provided, the header row will be the first
       non-commented/non-empty row after the row number,
       meaning if the provided row number is a commented row,
       the header row will actually be the next non-commented
       row.
       • normalizenames=false: whether column names should be
       "normalized" into valid Julia identifier symbols; useful
       when iterating rows and accessing column values of a row
       via getproperty (e.g. row.col1)
[...]

This is better than in the original report, but still not correct: the text of second-level items wraps to the indentation level of first-level items. I would have expected the output to look like this:

    •  File layout options:
       • header=1: the header argument can be an Int, indicating
         the row to parse for column names; or a Range,
         indicating a span of rows to be concatenated together as

The same problem can be seen in other docstrings that use nested lists, e.g.

julia> using Distributed
help?> addprocs
[...]
    •  shell: specifies the type of shell to which ssh connects on the
       workers.
       • shell=:posix: a POSIX-compatible Unix/Linux shell (bash,
       sh, etc.). The default.
       • shell=:wincmd: Microsoft Windows cmd.exe.
[...]
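
For illustration, the expected hanging indent for a nested item can be sketched in a few lines. This is a language-neutral sketch written in Python with the standard `textwrap` module, not the code of Julia's Markdown renderer; the function name `wrap_list_item` and its `indent` and `bullet` parameters are made up for this example, and the default indent of 7 columns is simply read off the output above.

```python
import textwrap


def wrap_list_item(text, width=80, indent=7, bullet="• "):
    """Wrap one list item so that continuation lines align with the item text.

    `indent` is the column where the bullet starts (hypothetical parameter);
    continuation lines are indented by `indent` plus the width of the bullet.
    """
    first_prefix = " " * indent + bullet
    hanging_prefix = " " * (indent + len(bullet))
    return textwrap.fill(
        text,
        width=width,
        initial_indent=first_prefix,
        subsequent_indent=hanging_prefix,
    )


item = (
    "header=1: the header argument can be an Int, indicating the row to parse "
    "for column names; or a Range, indicating a span of rows"
)
print(wrap_list_item(item, width=66))
```

The point of the sketch is only that the continuation prefix depends on the nesting level of the item being wrapped, rather than on the level of its parent.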

@pdeffebach
Contributor Author

I also see the above. The continuation lines of each nested item (the second line onward) should be indented by two more spaces.

But this is still an improvement and is readable.

@KristofferC added the bug, markdown, and regression labels on Apr 25, 2021
@vtjnash added the feature label and removed the bug and regression labels on Sep 29, 2021
@vtjnash
Member

vtjnash commented Sep 29, 2021

This can be improved further, but it doesn't seem to be a bug.

@KristofferC
Member

KristofferC commented Sep 29, 2021

I am pretty sure this is both a bug and a regression, since this worked fine in 1.5. It started going wrong with #37087, and several attempts were made to fix it: #37235, #38502, #40203.
