Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle leading whitespace on code blocks #134

Merged
merged 7 commits into from
Apr 21, 2018

Conversation

bobbypriam
Copy link
Contributor

Tries to address #133.

The implementation brings in Str module to split strings (since split_on_chars is available only since 4.04.0). Let me know if you have concerns.

I also wondered whether we need to specially handle weird spaces and tabs combo, but decided not to since I don't think there's a use case for that.

Copy link
Contributor

@aantron aantron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bobbypriambodo, thanks! It looks good, and the new tests make it easy to review :)

  • Indeed, we should avoid the Str module. There are two alternatives.

    1. The better one is probably that odoc depends on astring, which has Astring.cuts. As a side note, astring should be listed as a direct dependency of odoc. Right now, we are picking it up transitively. We can save this for a later issue, or you're welcome to tweak the opam file now :)
    2. We have some quickly-made function for this in the parser. Maybe that should be replaced by astring as well, but there is also the option to factor it out and call it from the lexer.
  • I agree that it's better to avoid dealing with mixes of spaces and tabs. We can wait until someone submits an issue showing us what actually needs to be fixed. I suspect the case will be a number of tabs followed by a number of spaces, but such source formatting is often inconsistent in how many leading tabs there will be, so the appearance will depend on the tab width, and basically we will have a mess. It's difficult even to handle pure tabs correctly when it comes to display, since we don't know the width.

let least_amount_of_whitespace =
lines
|> List.map count_leading_whitespace
|> List.fold_left min max_int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add a comment, that using max_int is okay because if the list is empty, we will not use the result of the fold (in the map below).

@bobbypriam
Copy link
Contributor Author

bobbypriam commented Apr 21, 2018

Thanks for the review. I went with the astring approach since it touches less files :D I've also added it to the depends field on the opam file.

Thanks for catching on max_int, but it makes me wonder if the fact that it needed a comment means it's a bit of a smell. Unfortunately I can't think of a more straightforward implementation now, so I just added the note anyway.

Do tell if you have more concerns, would be happy to address them.

@aantron aantron merged commit c1cdfb6 into ocaml:master Apr 21, 2018
@aantron
Copy link
Contributor

aantron commented Apr 21, 2018

Thanks for all that, it looks good to me :)

The max_int is a little bit of a smell, I guess, but I also can't think of something better.

@aantron
Copy link
Contributor

aantron commented Apr 21, 2018

@bobbypriambodo There is one last thing remaining. I think we need to filter out empty or all-whitespace lines from computing the minimum:

{[
let () =
  Lwt_main.run begin
    let%lwt data = Lwt_io.(read_line stdin) in
    let%lwt () = Lwt_io.printl data in
    Lwt.return ()
  end

(* ocamlfind opt -linkpkg -package lwt.ppx,lwt.unix echo.ml && ./a.out *)
]}

In that example, the blank line is completely empty, so even when I indent the first and last line by two spaces, the blank line, which has no characters, forces the overall amount of leading whitespace to be computed as zero. Blank lines could have whitespace characters in general, though, even though that's sloppy coding :)

Of course, after ignoring the blank line, we will get two for the amount of leading space, so we have to be prepared for the blank line having less characters than that to substring out.

Let me know if you'd like me to fix this, but a quick follow-on PR is also welcome :)

@bobbypriam
Copy link
Contributor Author

Ah, that is true, I already tried it on several examples but none had that characteristics. Sorry for that. I think I'll be able to fix it tomorrow (it's past 11 PM on this part of the world). If that's not quick enough then please handle it as you see fit. Thanks.

@aantron
Copy link
Contributor

aantron commented Apr 21, 2018

Thanks, that is fine :) and sorry for not catching right away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants