Cannot "parse" wikimedia #20
Comments
It looks like tables can currently be parsed only under specific circumstances. The first line must contain a distinct character that indicates where the columns are, and all the remaining column separators must line up with the separators in that header line. A more general solution might be to compare each line and find the (non-alphanumeric?) characters that are the same all the way from the bottom to the top of the table, or that are at least tied for the most frequent in a single column. Dealing with HTML and wikimedia syntax would take a bit more work. There is a JavaScript implementation of an HTML-to-CSV parser here: https://gist.github.com/adilapapaya/9787842
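A rough sketch of the "compare each line" idea, in case it helps. This is not the project's current parser: `findSeparatorColumns` and `splitAtColumns` are hypothetical names, and the sketch only covers the strict case where the same non-alphanumeric character appears at the same position in every line (the "tied for the most" variant is not handled).

```javascript
// Hypothetical helper: return the column positions at which every line holds
// the same non-alphanumeric, non-whitespace character.
function findSeparatorColumns(lines) {
  if (lines.length === 0) return [];
  // Only positions that exist in every line can be shared separators.
  const width = Math.min(...lines.map((line) => line.length));
  const columns = [];
  for (let i = 0; i < width; i++) {
    const ch = lines[0][i];
    const isCandidate = !/[A-Za-z0-9\s]/.test(ch);
    if (isCandidate && lines.every((line) => line[i] === ch)) {
      columns.push(i);
    }
  }
  return columns;
}

// Hypothetical helper: cut each line at the given positions and trim the cells.
function splitAtColumns(lines, columns) {
  return lines.map((line) => {
    const cells = [];
    let start = 0;
    for (const col of columns) {
      cells.push(line.slice(start, col).trim());
      start = col + 1; // skip the separator character itself
    }
    cells.push(line.slice(start).trim());
    return cells;
  });
}

const sampleTable = ["Name  | Qty", "Apple | 3", "Pear  | 12"];
const sampleColumns = findSeparatorColumns(sampleTable); // [6]
console.log(splitAtColumns(sampleTable, sampleColumns));
// [["Name", "Qty"], ["Apple", "3"], ["Pear", "12"]]
```

Tables with outer borders would produce empty first and last cells with this approach, so those would need to be dropped in a follow-up step.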
Checks if spaces are being used as vertical separators if no other separators are found. Corrected the recursion to remove the first element from the array each iteration. Combined notifications so only one alert box is ever shown when the user uses the parse functionality.
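For illustration only, the space fallback could look roughly like the sketch below. It is not the code from this change: `findSpaceGaps` and `splitOnSpaceGaps` are hypothetical names, and the sketch assumes a column boundary is any run of positions where every line has a space (or has already ended).

```javascript
// Hypothetical helper: return [start, end) ranges of positions in which every
// line has a space. Positions past the end of a shorter line count as spaces.
function findSpaceGaps(lines) {
  const width = Math.max(...lines.map((line) => line.length));
  const gaps = [];
  let start = -1;
  for (let i = 0; i < width; i++) {
    const allSpaces = lines.every((line) => i >= line.length || line[i] === " ");
    if (allSpaces && start === -1) start = i;
    if (!allSpaces && start !== -1) {
      gaps.push([start, i]);
      start = -1;
    }
  }
  // A gap still open at the end is trailing padding, not a separator.
  return gaps;
}

// Hypothetical helper: split every line at the shared space gaps.
function splitOnSpaceGaps(lines) {
  const gaps = findSpaceGaps(lines);
  return lines.map((line) => {
    const cells = [];
    let from = 0;
    for (const [start, end] of gaps) {
      cells.push(line.slice(from, start).trim());
      from = end;
    }
    cells.push(line.slice(from).trim());
    return cells;
  });
}

console.log(splitOnSpaceGaps(["Name   Qty", "Apple  3", "Pear   12"]));
// [["Name", "Qty"], ["Apple", "3"], ["Pear", "12"]]
```

One known weakness of this approach: cells that contain several words can be split by accident if their inner spaces happen to line up in every row.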
Regarding the original issue, I don't think parsing wikimedia is much of a priority here. Wikimedia supports so many different table formats, and I don't really understand why. It's almost as if they had an old way, then changed it, and never removed the first one. Since you are attempting to parse in JavaScript, it can definitely get tricky, as you said. Because the table definitions aren't consistent, I'd avoid that idea altogether. For now, I'd disable the parse button when wikimedia is selected and show a message below it letting people know it's not supported. That keeps the site looking good, though I don't know how many people besides me have tried or would try this.
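Something along these lines is what I have in mind. It is only a sketch: the format value `"wikimedia"`, the element ids `parse-button` and `parse-message`, and both function names are assumptions, not names taken from the site's code.

```javascript
// Hypothetical list of output styles that cannot be parsed back into input.
const UNSUPPORTED_PARSE_FORMATS = new Set(["wikimedia"]);

// Decide whether the parse button should be usable for the selected style.
function parseButtonState(format) {
  const supported = !UNSUPPORTED_PARSE_FORMATS.has(format);
  return {
    disabled: !supported,
    message: supported ? "" : "Parsing " + format + " tables is not supported yet.",
  };
}

// Apply the state to the page. The element ids are assumptions.
function applyParseButtonState(doc, format) {
  const state = parseButtonState(format);
  doc.getElementById("parse-button").disabled = state.disabled;
  doc.getElementById("parse-message").textContent = state.message;
}
```

In the browser this would be called as `applyParseButtonState(document, selectedFormat)` whenever the output style changes, so the user sees the note instead of an error prompt.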
Summary
If you load the default input by refreshing the page, switch your output style to wikimedia, and hit "parse" on the output, you'll get an error prompt. When you hit "OK", your input goes blank but your output remains. Switching the output style afterwards affects neither the input nor the output.
Investigation
Current default input (with correct tabs):
Current wikitable output of above input:
Prompt