-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EWE and OEWN script don't write lines of the same length #1054
Comments
Thanks for the comment. Actually as part of #1030, I plan to remove the write functionality from the scripts in OEWN entirely, so this bug is a "won't fix". It would be good however to have a Python based method for making programmatic changes in the YAML (possibly integrated with @goodmami's The Rust system for EWE uses some very custom code, to attempt to get the same result as Python, but it obviously fails from time-to-time on this. I will look into some of the "flickers" and see if I can fix this again. |
The following commit 44d25fc now adopts the use of Python formatting so the following should not change the source YAML from glob import glob
import yaml
for f in glob("src/yaml/*.yaml"):
data = yaml.load(open(f), Loader=yaml.CLoader)
with open(f, "w") as out:
out.write(yaml.dump(data, default_flow_style=False,
allow_unicode=True)) I believe this should close the issue. @1313ou does this satisfy you? |
It would be nice to have a Python idempotent load()-save() function pair for the wn model/object. It doesn't shock me as irrelevant to the minimal core library in /scripts. Quite the contrary : as I see it the core must consist of a load()/save() pair. The XML conversion is certainly desirable as part of the core too, though I don't use it (I'd rather use YAML directly). One is then free to implement textual/structural modifications of the model/object as one likes and EWE is certainly a good option. Modification code doesn't have to be part of the core and if there's on thing to ditch it's this ... and offload editing to EWE ... and constrain EWE to output the same format. The function above does the job for the synsets with a one-line modification. So if the save function is 'fixed' this way it will do the job too. And the code can be used as a library, which is what I did. So I'm confident the goal can be reached with minimal work. Also, one can possibly tweak the configuration passed to yaml.dump() so that the line length is adjusted as desired. Still to be explored. My understanting is that @goodmami's wn library works wih XML not YAML. Or does it ? |
Sorry my previous comment was started before I got yours. |
Was the commit produced by EWE ? |
...
Wn does not read or write the YAML format, nor does it directly support editing of wordnets (but see https://github.com/Hypercookie/wn-editor for an extension for editing). You can directly manipulate the database via SQL statements or work with in-memory representations of WN-LMF files, but neither of these are very user-friendly. I'm not entirely opposed to adding support for the YAML format, assuming it has stabilized. |
After examining and testing your suggested code I can say it works fine for normalizing the YAML. It looks like long lines are wrapped at 90 or thereabouts. So I take it PRs are to be normalized this way to avoid "flicker". But I realize now my comment was two-fold and also about the need for a minimal library, that I advocated in an earlier comment, with, at its core, a load_model()/save_model() pair that brings a live wn object into memory and persists it to files. The current definition of the model will do but can be optimized and the current solution I use, adapted from your code, is not satisfactory in that it has to be patched. If you replace line wordnet_yaml:416
|
Yes, this was produced by EWE and verified by the script above.
It is 80 characters. EWE was wrapping at 80 UTF-8 bytes rather than characters, which was causing the flicker
I meant that we should have an internal load/save script that uses the
Yes, implemented. It seems that this was some legacy hack based on sense IDs that does not apply any more. |
Fine now. Thanks John. |
Basic load/save support could be this oewnio.py file now:
|
Recent commit rewrites shorter lines for definitions (and BTW examples).
The different scripts (EWE's and OEWN's) must have different YAML-output configurations. The latter uses yaml.dump(data, stream,Dumper, **kwds) from the library, with hardly any argument, so it must rely on defaults. I don't know about the former.
Could they be made to agree on line length ? If they don't we will have different formats depending on the tool used, generating spurious 'flickering' changes.
More generally, a default YAML configuration is desirable.
The 'normalized' lines I produced (in the quotation/apostrophe overhaul) used the following code, adapted from wordnet_yaml.py:
ALSO:
Incidentally, the code was adapted from wordnet_yaml.py, because the original (and still current) code contains to my sense a bug which makes the saving non deterministic in members' order, which it changes. The culprit being entries_ordered function. Testing involves loading the yaml data and rewriting it without any change. This should be a blank operation but is not.
The text was updated successfully, but these errors were encountered: