Validate machine-readable legacy format #23
@efucile I encountered some errors. You can browse what I could import here. I'm not sure what's causing the errors. I've never had trouble parsing these files for previous versions, so it seems something's changed in the formatting. I'll try to narrow it down and provide some feedback. XML: Table B - 111 elements were imported, 1619 generated errors. CSV: Table B - 1698 elements inserted, 32 errors. |
Error sources we've identified:
|
This is a branch to work on this issue: https://github.com/wmo-im/BUFR4/tree/legacyFormats |
@tomkralidis thanks for the heads-up. Since bebfa61 all tables import correctly in CSV and XML formats. |
Hello @efucile. I tried importing the XML files with our internal import tool (BUFRCREX_TableB_en.xml, BUFR_TableD_en.xml, CREX_TableD_en.xml). The import was successful, and the results were the same as for the previous files (published by Atsushi, I assume). |
I can import the XML files as well, though for some reason my parser didn't like the empty elements and failed to recognize them as null strings, even after I added an extra space character to change them from, e.g. <Title_en/> to <Title_en />. But empty elements are legal XML constructs, so I can work around those on my end by just deleting them altogether from my own copies of the tables after I download them, and before running my parser. On a related note, I have a question for @efucile - are you still planning to distribute XSD schema definition files with future version releases? I didn't see any such files in the xml subdirectory, so I tested your latest files by validating them against modified earlier copies that I'd already downloaded. Again, I can work around this on my end if need be, but it's generally good practice to distribute XSD files with XML files. |
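For readers hitting the same parser problem, here is a minimal sketch of reading empty elements such as <Note_en/> as empty strings, using Python's standard xml.etree.ElementTree. The sample record is hypothetical: the root name dataroot and the fields FXY and ElementName_en are assumptions for illustration, not taken from the published tables.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; element names other than Note_en are assumed.
SAMPLE = """<dataroot>
  <BUFRCREX_TableB_en>
    <FXY>001001</FXY>
    <ElementName_en>WMO block number</ElementName_en>
    <Note_en/>
  </BUFRCREX_TableB_en>
</dataroot>"""


def record_to_dict(record):
    """Map each child tag to its text, treating empty elements as ""."""
    # ElementTree reports the text of <Note_en/> as None, so fall back to "".
    return {child.tag: (child.text or "").strip() for child in record}


root = ET.fromstring(SAMPLE)
rows = [record_to_dict(record) for record in root]
print(rows[0])
```

With this approach the downloaded files do not need to be edited before parsing, since both <Note_en/> and <Note_en></Note_en> end up as an empty string.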
@jbathegit thanks for checking the xml files.
|
|
@jbathegit thanks for explaining. Now it is clear to me. |
Hello @efucile. I imported the txt files with an internal library. I see that the commit I made yesterday failed. Should I open a pull request so it can be reviewed? On a related note, are you still planning to distribute txt files for CCT, e.g. Common_C11_20200506_en.txt, with future version releases? |
@marijanacrepulja we don't want the No column, because every time you add one line you have to renumber the whole table. This is inefficient, unnecessary and error-prone. If the column is a requirement, you can write a script that adds No and we can run it automatically, but we cannot have that column in the CSV files. Where did you make the changes to the scripts? Please revert them. |
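A script of the kind suggested above could look roughly like the following sketch, which prepends a running No column when a CSV table is exported. The function name add_row_numbers and the sample columns are hypothetical; this is not the script used in the repository.

```python
import csv
import io


def add_row_numbers(src, dst, column="No"):
    """Copy CSV rows from src to dst, prepending a 1-based counter column."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return  # empty input: nothing to write
    writer.writerow([column] + header)
    for number, row in enumerate(reader, start=1):
        writer.writerow([number] + row)


# Hypothetical two-row table; the real tables have more columns.
source = io.StringIO(
    "FXY,ElementName_en\n"
    "001001,WMO block number\n"
    "001002,WMO station number\n"
)
out = io.StringIO()
add_row_numbers(source, out)
print(out.getvalue())
```

Because the numbers are generated at export time, the source CSV files stay free of the column and inserting a line never forces a renumbering by hand.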
@efucile That is OK; there is no requirement for a No column. I changed the scripts in the legacyFormats branch and have reverted them. |
I don’t see the changes. In which branch? |
It is in branch legacyFormats. |
Archive.zip |
I also integrated the edit for CSVtoXML in CCT, but I don't think it's giving us the results we want: |
XML: I've checked these but I still see the empty XML tags, e.g. <Note_en/>. I believe the XML tables previously had something like this: <Note_en>""</Note_en>. As it is, I can only import 113 elements from Table B and 571 elements from Table D; all others produce errors (meaning no code/flag tables!). CSV: Code/flag tables import fine with only 1 error. It looks like the headers are still repeated though; I see them on lines 354, 1445, 1595, 1610, 1615, 2274, 2291, 2402, 2469, 2476, 2627, 3688, 3870, 3955, 4039, 4046, 4374, 4401, 4415, 4441, 4455, 5005, 5108, and 5341. Table B imports fine but produces errors where the header line is repeated. Table D imports OK (601 items imported), but the header is repeated there too, and this causes errors for the software. |
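Repeated header lines like those listed above can be located automatically. The sketch below uses Python's csv module to report the line number of every row identical to the first (header) row; the function name find_repeated_headers and the sample data are hypothetical.

```python
import csv
import io


def find_repeated_headers(lines):
    """Return the 1-based line numbers of rows that repeat the header row."""
    reader = csv.reader(lines)
    header = next(reader, None)
    repeats = []
    for row in reader:
        if row == header:
            # line_num counts the physical lines read from the source so far.
            repeats.append(reader.line_num)
    return repeats


# Hypothetical table with the header repeated on line 3.
sample = io.StringIO(
    "FXY,ElementName_en\n"
    "001001,WMO block number\n"
    "FXY,ElementName_en\n"
    "001002,WMO station number\n"
)
repeated = find_repeated_headers(sample)
print(repeated)
```

Running a check like this before publishing a zip would flag leftover headers without opening each table in a text editor.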
Please test this new zip of BUFR XML and TXT. I believe that all of the repeating headers are cleaned up and the empty XML elements are removed. Other differences (aside from amendments) and questions:
|
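For reference, removing empty XML elements can be sketched as follows with Python's standard xml.etree.ElementTree. This is an illustration only, not the conversion script used for the zip; the function name drop_empty_elements and the element names other than Note_en are assumptions.

```python
import xml.etree.ElementTree as ET


def drop_empty_elements(root):
    """Remove leaf elements with no text and no attributes; return the count."""
    removed = 0
    # Snapshot the traversal first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            is_leaf = len(child) == 0
            has_text = bool((child.text or "").strip())
            if is_leaf and not has_text and not child.attrib:
                parent.remove(child)
                removed += 1
    return removed


# Hypothetical record containing both spellings of an empty element.
root = ET.fromstring(
    "<dataroot><rec><FXY>001001</FXY><Note_en/><Status_en></Status_en></rec></dataroot>"
)
removed = drop_empty_elements(root)
cleaned = ET.tostring(root, encoding="unicode")
print(removed, cleaned)
```

The pass is deliberately single-level: a parent that becomes empty after its children are removed is kept, so whole records are never dropped by accident.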
Hi @amilan17 I opened up the Table B CSV with a text editor and can see the repeated headers, so I don't think it's worthwhile to check the CSVs again - maybe the file is old? Regarding your points (3) and (4) - that's great! That makes a lot more sense now. Regarding the XML files, I still see the empty
to this:
Then the element imports fine. Could you change the script so that the I also still can't import the code/flag tables. I think this is also because most of the elements are empty:
Table D also produces the same successes and errors. It seems that these files are unchanged, as all of the errors that were in the previous ones are still present in the most recent upload. Are you sure that BUFR-2020-Nov-10.zip was updated? |
Oops. I'm checking now! |
BUFR-2020-Nov-10.zip |
Hi @amilan17 , that does it - everything now imports fine, thanks for the update! |
@wmo-im/tt-tdcf -- Will you also review ASAP and review the questions I have? Thanks!
|
Hi @amilan17, |
@marijanacrepulja - Good catch. I removed the extra lines and it looks better now. |
Machine-readable codes are published at https://community.wmo.int/activity-areas/wmo-codes/manual-codes/latest-version |
The machine-readable legacy format has been included in the master branch. We need to verify that the format satisfies users' requirements.