List item indentation #2

Open
LoneRifle opened this issue Jun 4, 2019 · 3 comments
@LoneRifle
Collaborator

Our target documents contain a large number of lists with indented list items, but this project is currently unable to detect them and convert them appropriately. Detect such lists and output the corresponding Markdown.

@jenlky

jenlky commented Sep 30, 2019

I am currently working on this. Will create a PR soon.

@jenlky

jenlky commented Oct 1, 2019

The approach I have in mind is to first detect indented list items like "Eintrag 1.2" and "3ter Level 1.1.2, aber mit so langem Text", then extract the numbers from them ("1.2" and "1.1.2" respectively). Finally, add spaces in front of the dash in "- Eintrag 1.2", with the amount of spaces depending on how deeply the item is nested.
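To make the proposal concrete, here is a minimal Python sketch of the number-based idea. It is only an illustration: the regular expression, the function name `indent_by_number`, the two-space indent, and the first sample item "- Eintrag 1" are assumptions, not code from this project.

```python
import re

# Assumed pattern: a dotted number such as "1.2" or "1.1.2" somewhere in the text.
NUMBER_RE = re.compile(r"\d+(?:\.\d+)+")


def indent_by_number(item: str, indent: str = "  ") -> str:
    """Indent a '- ' list item by the depth of the dotted number in its text."""
    match = NUMBER_RE.search(item)
    # "1.2" has one dot (depth 1), "1.1.2" has two dots (depth 2).
    depth = match.group().count(".") if match else 0
    return indent * depth + item


items = [
    "- Eintrag 1",  # hypothetical top-level item, no dotted number
    "- Eintrag 1.2",
    "- 3ter Level 1.1.2, aber mit so langem Text",
]
indented = [indent_by_number(item) for item in items]
print("\n".join(indented))
```

Note that this sketch depends entirely on the item text containing a dotted number, so it would not indent nested items that lack one.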

What do you think? Do you think that would suffice?

@LoneRifle
Collaborator Author

You shouldn't have to rely on the content of a list item to decide whether it should be indented, so don't extract the numbers. Instead, try to work out the relative position of the list items on the page to determine whether something is indented.
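As a rough illustration of the position-based idea, the Python sketch below derives the nesting level from each item's horizontal position relative to the items before it. Everything in it is an assumption for illustration: the `ListItem` type, the `x` coordinate (assumed to be available from whatever PDF parser is in use), the `tolerance` value, and the sample coordinates. It is not this project's implementation.

```python
from dataclasses import dataclass


@dataclass
class ListItem:
    text: str
    x: float  # left edge of the item on the page (assumed to come from the PDF parser)


def to_markdown(items, tolerance=2.0, indent="  "):
    """Assign nesting levels from relative x positions, ignoring item text."""
    lines = []
    stack = []  # x positions of the currently open nesting levels
    for item in items:
        # Close every level that starts clearly to the right of this item.
        while stack and item.x < stack[-1] - tolerance:
            stack.pop()
        # Open a new level if this item starts clearly to the right of the current one.
        if not stack or item.x > stack[-1] + tolerance:
            stack.append(item.x)
        lines.append(indent * (len(stack) - 1) + "- " + item.text)
    return "\n".join(lines)


# Hypothetical coordinates; only the relative order of the x values matters.
sample = [
    ListItem("Eintrag 1", 50.0),
    ListItem("Eintrag 1.2", 70.0),
    ListItem("3ter Level 1.1.2, aber mit so langem Text", 90.0),
    ListItem("Eintrag 2", 50.5),
]
markdown = to_markdown(sample)
print(markdown)
```

The tolerance absorbs small jitter in extracted coordinates, so items whose left edges differ by a fraction of a unit are still treated as the same level.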

If you want a richer source of PDFs, consider the judgments found on Singapore Law Watch.
