-
-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Line numbers #647
base: main
Are you sure you want to change the base?
Line numbers #647
Conversation
Thank you @acoleman2000 for this! Can you run |
To re-create "metaschema.py" do
|
schema_salad/python_codegen.py
Outdated
r = CommentedMap() | ||
if line_info is not None: | ||
self._doc = line_info | ||
if (type(self._doc) == CommentedMap): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally use isinstance
. The difference is that a direct comparison on type will not match subclasses, whereas isinstance
is true both for the same class and for subclasses.
Practically that may not matter if CommentedMap
doesn't have any subclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in commit 149b1ba.
for key in val: | ||
if doc: | ||
if isinstance(key, (int, float, bool, str)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the key shouldn't ever be anything other than a string, so you can either assume it is a string, or keep the type check but throw an error on a non-string subscript.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed in commit 0522ce8
schema_salad/python_codegen.py
Outdated
@@ -170,6 +170,8 @@ def begin_class( | |||
self.out.write(" pass\n\n\n") | |||
return | |||
|
|||
field_names.append("_doc") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the fields of the model (which are closely related to, but not exactly the same, as the fields of the class being generated). Why are you adding this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted the original CommentedMap that had line column information to be associated with the top level class (e.g., Workflow) in order to update the attributes that are being saved. I wasn't sure how to save this Commented Map besides saving it as a class attribute. The save method called by the user goes directly to the save method within the class, so I didn't see a way of passing the Commented Map to the first call to save. If you have any ideas of doing this an alternate way I would be happy to reformat it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually changed it to have a global doc variable and have the save methods take a list of strings representing what level in the doc that object is. This is useful for the second issue as well, since there are places that error messages need line numbers appended where no version of the doc is present.
for key in keys: | ||
if isinstance(doc, CommentedMap): | ||
doc = doc.get(key) | ||
elif isinstance(doc, (CommentedSeq, list)) and isinstance(key, int): | ||
if key < len(doc): | ||
doc = doc[key] | ||
else: | ||
doc = None | ||
else: | ||
doc = None | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need some discussion about what's going on here. It looks like you're using "keys" to find a path through the original document, to find the leaf node that has the line number info we want.
What happens when you have a field with mapSubject
and it's been converted (internally) from a dict to a list? In this case it is intentional for save()
to emit the normalized form, which is the list form, but that may or may not correspond to the original document, depending on the original document used the list form or the dict form.
if doc: | ||
if i in doc: | ||
r.lc.data[i] = doc.lc.data[i] | ||
new_keys.append(i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
append is a destructive modification, so appending to new_keys
is also modifying the contents of keys
which is probably not what you intended.
@@ -43,6 +43,9 @@ | |||
IdxType = MutableMapping[str, Tuple[Any, "LoadingOptions"]] | |||
|
|||
|
|||
doc_line_info = CommentedMap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doc_line_info
being a global variable, how about having save()
take LoadingOptions
and using original_doc
?
schema_salad/python_codegen.py
Outdated
saved_val = saved_val[0] | ||
r["{fieldname}"] = saved_val | ||
|
||
max_len = add_kv(old_doc = doc, new_doc = r, line_numbers = line_numbers, key = "{fieldname}", val = r.get("{fieldname}"), max_len = max_len, cols = cols) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have r
, key
and val
together already, I would consider moving the assignment of r["{fieldname}"] = saved_val
into add_kv()
so the add_kv()
method is responsible for setting both the value and metadata of the entry at the same time.
…and less conditionals - as well as some bug fixes
…wherever metaschema.py is ignored
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #647 +/- ##
==========================================
- Coverage 83.68% 83.63% -0.06%
==========================================
Files 22 22
Lines 4580 4497 -83
Branches 1239 1242 +3
==========================================
- Hits 3833 3761 -72
+ Misses 483 470 -13
- Partials 264 266 +2 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting! I left some comments. Would be nice if this PR doesn't add 100,000+ lines
@@ -46,7 +47,7 @@ help: Makefile | |||
## cleanup : shortcut for "make sort_imports format flake8 diff_pydocstyle_report" | |||
cleanup: sort_imports format flake8 diff_pydocstyle_report | |||
|
|||
## install-dep : install most of the development dependencies via pip | |||
## install-dep : inshttps://github.com/common-workflow-language/cwltool/issues?q=is%3Aissue+is%3Aopen+author%3Atom-tantall most of the development dependencies via pip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
# names = [] | ||
# for name in field_names: | ||
# names.append("('%s', 0)"%name) | ||
|
||
# self.serializer.write( | ||
# fmt(f"""ordered_attrs = CommentedMap(["{', '.join(names)}])\n""", 4) | ||
# ) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# names = [] | |
# for name in field_names: | |
# names.append("('%s', 0)"%name) | |
# self.serializer.write( | |
# fmt(f"""ordered_attrs = CommentedMap(["{', '.join(names)}])\n""", 4) | |
# ) |
return max_len + 1, inserted_line_info | ||
|
||
|
||
@no_type_check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
schema_salad/tests/cwl_v1_2.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between this and schema_salad/cwl_v1_2.py
?
|
||
from ruamel.yaml.comments import CommentedMap | ||
|
||
import schema_salad.tests.cwl_v1_2 as cwl_v1_2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we keeping schema_salad/tests/cwl_v1_2.py
up to date? Maybe it would be better to generate that when the tests are run..
I made changes to
python_codegen.py
,python_codegen_support.py
, and introduced a test filetest_line_numbers.py
that intergrates with the test suite.I identified several blockers within the current code preventing line numbers from being associated with keys during the saving process.
During the loading process, the cwl is read in and saved as a CommentedMap, which has associated line numbers. However in the
_document_load
method inpython_codegen_support.py
the CommentedMap was replaced with a dictionaryI replaced this code with
to keep doc in CommentedMap form.
Additionally, I noticed in the
fromDoc
methoddoc
was being set to None or overriden to be something else, so I saved the original passed in doc asself._doc
, following the naming conventions.I wanted to use the lc info from the original YAML passed in, so I modified the save method for each class to take in line_numbers, a CommentedMap. If line_numbers isn't null, it replaces the self._doc field. This is done to save the original CommentedMap and propagate it downwards.
python_codegen_support.py
I added several methods.
I added a method that extracts the max_line (+ 1) number from a CommentedMap. This iterates through the child with the highest line number until it reaches the end). This is used to insert the line column info for new fields in the returned doc.
I added a method that adds a the kv lc info into the returned doc. This is the real meat of the change. This takes a CommentedMap to insert into, an old CommentedMap, a dictionary of line numbers, and a dictionary of line numbers to maximum col used in the line, and a
max_len
variable. First the method checks if the key is in the line numbers, and then inserts the old lc info directly info the new Commented Map. Then, if the key isn't in the line numbers, it checks if the value is in the line numbers and inserts it using that line number with an adjusted column number (based on the length of the key and the maximum col for that line). It then checks if the value is in the old_doc, and inserts with that lc information. Finally, if neither the key or the val is the line numbers, it inserts it tomax_len
, and increasesmax_len
by 1. It has appropriate logic for DSL expansion:I added a method that pulls out the lc info for all kv pairs in a Commented doc. For example, if a CommentedMap was like
orderddict("key, "value")
with lc info["key": [1, 0, 1, 6]]
it would return{"key": {"line":1, "col": 0}, "value':{"line":1, "col":6}}
I also modified the save method. It changes the return type from list/dict to CommentedSeq/CommentedMap, takes in a
doc
field, and if the k/v pair is in thedoc
, it adds thelc
info to the return type.I added a method,
iterate_through_doc
, that has no type check and takes a list of keys, and iterates through the global doc to the appropriate place. It has no type check since it goes from CommentedMap -> CommentedSeq before eventually ending up at a CommentedMap (or None)python_codegen.py
I modified several things in python_codegen.py
First, I modified the fromDoc attribute to save the
self._doc
attribute to the class.I modified the save method. I changed the return type
r
from dict to CommentedMap. I added the code to override theself._doc
, calculatemax_len
,line_numbers
, and set an empty dictionary to storecol
info. I also updatedmax_len
after inserting each class attribute tor
by callingadd_kv
, which also adds the lc value tor
.To prevent issues of something like the
outputs
key being before aninputs
key and overexpanding, causing inconsistency with line numbers, I iterate through all keys in the line number doc and add the line numbers, before going through all attributes like normal.Additionally, due to array expansion and DSL expansion, sometimes there is a shift down. To appropriately make sure everything ends up on the same line, I added
shift
counter that says how many lines to shift down for a value.test_line_numbers
I added 3 tests.
outputs
field being beforeinputs
.