Cannot find grid on the plot I'm trying to read #4

Open
bedaro opened this issue Aug 8, 2020 · 1 comment
bedaro commented Aug 8, 2020

Hi,

I'm trying to extract data from this PDF containing several hundred plots. https://fortress.wa.gov/ecy/publications/documents/0803037appc.pdf

For example, I'm starting with the second plot on page 14 for Alki East Chlorophyll-a. If all I do is save this page as an SVG using Inkscape 1.0, I get this:

svg2data/svg2data.py in get_axes(lines, width, height)
    730                 cleaned_axes[i].append(axes[i][j])
    731     axes = cleaned_axes
--> 732     axes_min = np.array([axes[0][0]['min'][0],axes[1][0]['min'][1]])
    733     axes_max = np.array([axes[0][0]['max'][0],axes[1][0]['max'][1]])
    734     new_lines = []

IndexError: list index out of range

Next I tried to make the job easier by deleting everything from the page except the plot I want. Same error.
Next, I used Inkscape's Resize Page to Selection (the resulting SVG is attached, with the extension changed to satisfy GitHub):
[0803037appc_p14.txt](https://github.com/peterstangl/svg2data/files/5046091/0803037appc_p14.txt)
This produces:

svg2data/svg2data.py in __init__(self, filename, test, debug)
    104         and debug != 'get_axes'
    105         and debug != 'connect_graphs'):
--> 106             grids = calibrate_grid(axes,phrases,width,height)
    107         elif debug == 'calibrate_grid':
    108             self.debug = {'axes':axes,

svg2data/svg2data.py in calibrate_grid(axes, phrases, width, height)
   1033                         axis_scaling = 'linear'
   1034                 else:
-> 1035                     raise Exception('no grid found!')
   1036                 grids_calibr[axis_type]['type']=axis_scaling
   1037                 grids_calibr[axis_type]['grid']=grid_calibr

Exception: no grid found!

Any ideas why it can't find the plot? Is a grid required?

@peterstangl
Owner

Hi @bedaro,
In this case, "grid" only means that the code can find ticks and their corresponding values on the x and y axes. I see two things the code might not be able to handle:

  • The tick values on the x-axis are rotated. The code only looks for unrotated text in the vicinity of the axis ticks. Rotated text might have unrotated coordinates that are too far from the x-axis to be recognized. This is probably the source of the error message, since the code cannot find the text corresponding to the values on the x-axis.
  • The values on the x-axis are not numerical. This problem would remain even if the code were able to find the values. At the moment, only numerical values are supported, and dates are not recognized.
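To illustrate the first point, here is a minimal sketch of how a rotated label's effective position could be computed from its SVG `transform` attribute. This is not code from svg2data: the helper name `effective_position` is hypothetical, and it assumes a single `matrix(...)` or `rotate(...)` transform on the text element itself, ignoring any transforms inherited from parent groups.

```python
import math
import re


def effective_position(x, y, transform):
    """Apply one SVG transform string to the point (x, y).

    Hypothetical helper, not part of svg2data. Only a single
    matrix(a,b,c,d,e,f) or rotate(angle[,cx,cy]) is supported.
    """
    match = re.fullmatch(r"\s*(matrix|rotate)\s*\(([^)]*)\)\s*", transform)
    if match is None:
        raise ValueError("unsupported transform: %r" % transform)
    kind = match.group(1)
    # SVG allows numbers to be separated by commas and/or whitespace.
    values = [float(v) for v in re.split(r"[\s,]+", match.group(2).strip())]
    if kind == "rotate":
        if len(values) == 1:
            angle, cx, cy = values[0], 0.0, 0.0
        elif len(values) == 3:
            angle, cx, cy = values
        else:
            raise ValueError("rotate() takes 1 or 3 numbers")
        cos_t = math.cos(math.radians(angle))
        sin_t = math.sin(math.radians(angle))
        # Rotation about (cx, cy), written as an equivalent matrix.
        a, b, c, d = cos_t, sin_t, -sin_t, cos_t
        e = cx - cos_t * cx + sin_t * cy
        f = cy - sin_t * cx - cos_t * cy
    else:
        a, b, c, d, e, f = values
    # SVG matrix convention: x' = a*x + c*y + e, y' = b*x + d*y + f
    return (a * x + c * y + e, b * x + d * y + f)
```

The transformed coordinates, rather than the raw `x` and `y` attributes, could then be compared against the tick positions when searching for nearby labels.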

To work around these problems in the SVG file, one could remove the dates on the x-axis and replace them with (unrotated) numerical values.
To solve these problems in the code, the functions that look for the tick values would have to be modified to also find rotated text. In addition, the strings describing the dates would have to be converted to numerical values.
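As a rough sketch of the date conversion step, a tick label could be parsed with the standard library and mapped to a day count so that the axis can be calibrated linearly. The helper name `date_label_to_number` is hypothetical, and the default format string `%m/%d/%Y` is only an assumption; it would need to match however the dates are actually written in the plot.

```python
from datetime import datetime


def date_label_to_number(label, fmt="%m/%d/%Y"):
    """Convert a date tick label to a float day count.

    Hypothetical helper, not part of svg2data. The result is the
    proleptic Gregorian ordinal, so consecutive days differ by 1.0.
    The default format is an assumption about the label text.
    """
    return float(datetime.strptime(label.strip(), fmt).toordinal())
```

Because the values are evenly spaced in days, the resulting axis can be treated like any other linear numerical axis, and extracted x-values can be converted back to dates with `datetime.fromordinal` after rounding to an integer.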
