Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Index in namelist #1259

Merged
merged 25 commits into from
Mar 24, 2017
Merged

Index in namelist #1259

merged 25 commits into from
Mar 24, 2017

Conversation

jedwards4b
Copy link
Contributor

@jedwards4b jedwards4b commented Mar 17, 2017

Adds support for namelist entries of the form foo(3) = 'a'

Test suite: scripts_regression_tests.py, --namelist-only tests for cesm prealpha and aux_clm45
Note there are some expected failures here due to bugs found in the process. One bug was that
for multi-instance cases all namelists were using the 0001 stream file and they each should have
a separate file. Another bug was that only changes in user_nl_d***_0001 were being used in all
instances (user_nl_d***_000n where n>1 was being ignored)
Test baseline: cesm2_0_alpha06g
Test namelist changes:
Test status: bit for bit

Fixes #1248
Fixes #1274

User interface changes?:

Code review: @jgfouca @mvertens

@@ -112,7 +112,7 @@

# Fortran syntax regular expressions.
# Variable names.
FORTRAN_NAME_REGEX = re.compile(r"^[a-z][a-z0-9_]{0,62}$", re.IGNORECASE)
FORTRAN_NAME_REGEX = re.compile(r"^[a-z][a-z0-9_]{0,62}(\((\d+)\))?$", re.IGNORECASE)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allows for the possibility of an index in the variable name such as foo(3)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about foo(-3) or foo(+3), both of which are also valid?
I have a regex for that which also includes (integer constant) array sections if you are interested.
As a side note, since you use re.IGNORECASE, can we get rid of all the scary language in namelist_definition_drv.xml about the id having to be lower case?

@@ -171,6 +171,20 @@ def is_valid_fortran_name(string):
"""
return FORTRAN_NAME_REGEX.search(string) is not None

def fortran_name_index(string):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a valid index exists return that value. Note that we are not yet handling array sections such as (:2) or (3:5)

@@ -1414,6 +1453,18 @@ def _parse_variable_name(self, allow_equals=True):
if not is_valid_fortran_name(text_check):
raise _NamelistParseError("%r is not a valid variable name at %s" %
(str(text), self._line_col_string()))

# a particular value of a namelist array may be assigned using syntax
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create a list of indices here and pop them off later where they are used.

@@ -1905,17 +1968,23 @@ def parse_namelist(self):
>>> _NamelistParser("! Comment \n &group /! Comment ").parse_namelist()
{u'group': {}}
>>> _NamelistParser("&group1\n foo='bar','bazz'\n,, foo2=2*5\n / &group2 /").parse_namelist()
{u'group1': {u'foo': [u"'bar'", u"'bazz'", u''], u'foo2': [u'2*5']}, u'group2': {}}
{u'group1': {u'foo': [u"'bar'", u"'bazz'", u''], u'foo2': [u'5', u'5']}, u'group2': {}}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side effect these values are expanded a little earlier than they were before

self._streams_namelists[variable] = []
self._streams_namelists["streams"] = []

def new_instance(self):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean data stored from previous instances so that it isn't copied into new instance by instance here I am refering to the cesm ensemble mode functionality.

Copy link
Contributor

@jgfouca jgfouca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks for improving tests.

@jgfouca
Copy link
Contributor

jgfouca commented Mar 17, 2017

I will wait for @mvertens before merging.

@mvertens
Copy link
Contributor

mvertens commented Mar 17, 2017 via email

@jedwards4b
Copy link
Contributor Author

Having the doctests there really helped in the development process.

@jgfouca
Copy link
Contributor

jgfouca commented Mar 18, 2017

@mvertens OK, I will hold off until you guys tell me it's ready.

Copy link

@gold2718 gold2718 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that your current implementation, in addition to only allowing setting of single values, assumes that all arrays read in a namelist are indexed to begin at one. This is not a valid assumption in Fortran.
I recommend less processing, just spit valid array sections out in the order in which they appear. An alternative would be to optionally specify bounds in the namelist definition file.

if vname in self._nameindex:
self._nameindex[vname].append(nindex)
else:
self._nameindex[vname] = [nindex]
return text.lower()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty line before return (end of previous if block)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -112,7 +112,7 @@

# Fortran syntax regular expressions.
# Variable names.
FORTRAN_NAME_REGEX = re.compile(r"^[a-z][a-z0-9_]{0,62}$", re.IGNORECASE)
FORTRAN_NAME_REGEX = re.compile(r"^[a-z][a-z0-9_]{0,62}(\((\d+)\))?$", re.IGNORECASE)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about foo(-3) or foo(+3), both of which are also valid?
I have a regex for that which also includes (integer constant) array sections if you are interested.
As a side note, since you use re.IGNORECASE, can we get rid of all the scary language in namelist_definition_drv.xml about the id having to be lower case?

m = FORTRAN_NAME_REGEX.search(string)
if m is not None and m.group(2) is not None:
return int(m.group(2))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning an int here is restrictive in that you will not be able to handle a list of ints, such as created by an array section. Of course, even a list will not always help (see below).

@@ -618,6 +634,9 @@ def expand_literal_list(literals):
if FORTRAN_REPEAT_PREFIX_REGEX.search(literal) is not None:
num, _, value = literal.partition('*')
expanded += int(num) * [value]
elif nindex is not None:
expanded = ['']*(nindex-1)
expanded.append(literal)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic will not help much with an input like:
foo(5:6) = 2*5

if vname in self._nameindex:
self._nameindex[vname].append(nindex)
else:
self._nameindex[vname] = [nindex]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If fortran_name_index returned a list of integers instead of a single int, you could always use extend.
Before you work on that, be sure you have figured out how to handle sections correctly.
For instance, foo(2::2) is valid as foo(::) or foo(8,4,-1.
Another issue with trying to insert values in lists (as opposed to just listing each section in order) is that you do now know the bounds of the actual array. It is perfectly valid for an array in a namelist to have negative bounds (e.g., integer :: foo(-3:3)). Not knowing those bounds, how will you process:

foo(:) = 16*'a'
foo(-3) = 'b'

@jedwards4b
Copy link
Contributor Author

@gold2718: I believe that implementing a complete fortran namelist parser may be beyond the scope or intent of this issue. Do we want something that will be ready for cesm 2 or do we want a complete solution?

@gold2718
Copy link

@jedwards4b, That questions is above my pay grade of course. I suggest this be brought up at the next CIME meeting.

Whatever solution is decided on (for any particular CIME tag, not just the CESM release) should be properly documented -- especially where it deviates from the Fortran standard.

If it is decided to keep the current version, documentation should at least have:

"The namelist code assumes that all arrays have a lower bound of one unless the user enters an index of zero in an array section in which case, the assumption is that the array has a lower bound of zero."

because that is how the current code sort of behaves. However, even that is not correct.

For fun, stick this in your user_nl_datm and try ./preview\_namelists:

dtlimit(6) = 6.0
DTLIMIT(3) = 3.0
dtlimit(0) = 0.0
dtlimit(1) = 1.0

Funny how dtlimit(1) overwrote dtlimit(0). Any why is there one 'default' entry of 1.5 but not others? I don't think this is ready as it stands. I think rejecting zero as a valid index (and documenting it) would allow the assumption of a lower bound of one to work.

@jedwards4b
Copy link
Contributor Author

@gold2718: Out of curiosity I tried a few of these in a short fortran program. None of intel, pgi and gnu compilers recognize array sections in input at all. I think that it would be okay to document that we assume all arrays begin at 1 and I think that that is the case in all current namelists. I will look into the dtlimit example you raised.

@jedwards4b
Copy link
Contributor Author

@gold2718 The number of default values set in an array is something that was not changed with this PR - I have determined that this is done in nmlgen.py subroutine _to_namelist_literals and the number of values set is determined by enumerate(value). I haven't figured out why the result of this appears generally to be 3. I will open this as a separate issue.

@gold2718
Copy link

@jedwards4b ???
Before I began, I also wrote a short program and had no issues reading in various array sections on Yellowstone (Intel) or Hobart (Nag). Program and one trial namelist below (I tried lots of variants):

program foo
  implicit none
  integer :: ierr
  integer :: unitn = 63
  integer :: item(-8:8)
  namelist /foo_nl/ item
  item(:) = -1
  open(unitn, file="foo.nl", status='old', delim='APOSTROPHE')
  read(unitn, foo_nl, iostat = ierr)
  if (ierr /= 0) then
    write(6, *) 'ERROR reading foo_nl, iostat = ',ierr
  else
    write(6, '(8i5)') item
  end if
  close(unitn)
end program foo

foo.nl

&foo_nl
  item(::2)    = -8, -6
  item(8:4:-1) = 8,,,,4
  ITEM(1:2) = ,2
  item(-3) = -3
  item(5:6) = 2*5
/

The result

   -8   -1   -6   -1   -1   -3   -1   -1
   -1   -1    2   -1    4    5    5   -1
    8

@jedwards4b
Copy link
Contributor Author

@gold2718 I stand corrected on array section syntax. I did not appreciate the importance of the right hand side of the above assignments.

@gold2718
Copy link

Before this PR is closed, I would like to hear from @mvertens, @rljacob, and @jgfouca on the issue of how to handle the issue of array sections in namelist input. Three clear options I see are:

  1. Only handle whole arrays or single elements to the left of the equals sign along with the assumption that the namelist array has a lower bound of one. This option would include documentation of the limitation in the CIME manual (sections 6.3, 6.4, 6.5 & 6.6) and possibly an additional message when an unhandled case is detected.
  2. Scrap this PR and implement the full standard.
  3. Merge this PR with a plan to implement the full standard prior to the release (thus avoiding the need for the additional documentation mentioned in option 1).

Thoughts? Additional options? Please note a possible issue with options 2 & 3 in my next comment.

@gold2718
Copy link

Why is it so hard to implement the full Fortran standard for namelist input? I think it is a combination of the fact that we do not know the bounds of the target array along with a decision to try and combine all mentions of the array into a single namelist entry. For instance, the sequence:

foo = 1, 2, 3, 4, 5
foo(-3) = 9

requires no knowledge of the array bounds, just an assumption that foo has at least five elements and bounds that include index minus three. However, if we want to combine these into a single entry, we need to know where to put the nine. Our current namelist definition file does not contain this information so I see the need for a decision before attempting to handle the full Fortran namelist input standard for arrays:

  1. Extend the syntax of the <type> tag for namelist entries to allow specification of array bounds (as in Fortran programs). This allows for proper interpretation of non-positive integer as well as array-section element selections. It also allows for rejection of out-of-bounds elements.
  2. Do not try to fold multiple array element setting into a single namelist entry (i.e., keep multiple entries as in the example above).

Thoughts on these options? Am I missing a third option?

@rljacob
Copy link
Member

rljacob commented Mar 21, 2017

So did this work in CIME2 because CIME2 didn't check what was in user_nl?

@mvertens
Copy link
Contributor

@rjacob - no CIME2 did check what was in user_nl - but it used the perl namelist generator which handled single array indices.
@gold2718 - I would recommend having at least the behavior we had with the perl namelist parsing of user_nl_xxx. At this point. I would like to understand if the perl namelist parsing handled array syntax behavior. If not - I would recommend going with (1) for now and dealing handling more sophisticated namelist parsing after the cesm2.0 release in an update.

@rljacob
Copy link
Member

rljacob commented Mar 21, 2017

Agreed. The original issue was about the missing CIME2 functionality so restoring that should be enough for now.

@jedwards4b
Copy link
Contributor Author

I've rerun scripts_regression_tests.py after these changes - all pass.

@jedwards4b
Copy link
Contributor Author

I have tested the cesm yellowstone prealphas in namelist-only mode with this tag, all tests pass except ERS_Ly20_N2_P2.f09_g16_gl10.TG1.yellowstone_pgi which failed due to an existing bug in multiinstance cases which this PR fixes. I am on vacation next week so I would really appreciate a thorough review of this PR asap if you can. Thanks

Copy link

@gold2718 gold2718 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still unhappy with the line number logged in the _NamelistParseError on line 1814 (namelist.py). It does not correspond to any sort of actual line. Please either fix it or remove it.

Also, if a rank>1 array is found, use a better error message than:
"Error in parsing namelist: expected literal value, but got 'dtlimit(1'"
Looking for the comma may help distinguish which unsupported situation you have found.

If you try to set an array element past the top of the array, you get:
"ERROR: Variable 'dtlimit' has invalid value"
which may be hard to interpret (you have to count the number of elements and see that it is too many.

The user_nl_<model> is still processed out of order.
For instance, this combo:

dtlimit(3) = 22.0
DTLIMIT(3) = 3.0

gives the correct result (dtlimit(3) = 3.0), however, this one:

dtlimit(6) = 19.0
dtlimit(6:99:3) = 6.0,9.0,12.0,15.0,18.0

gives dtlimit(6) = 19.0. Not sure how that happens (but it is the ordering because if I comment out the first line, the elements are correct).
Here is my test case and the wrong (and what I think is correct) answer:

dtlimit(6) = 19.0
dtlimit(6:99:3) = 6.0,9.0,12.0,15.0,18.0
dtlimit(1:2:3) = 4.0
dtlimit(3) = 22.0
DTLIMIT(3) = 3.0
dtlimit(8:10:2) = 8.0, 10.0
dtlimit(1) = 1.0
dtlimit(::2) = 1.0, 3.0, 5.0

results in:

  dtlimit = 1.0, 1.5, 3.0, , , 19.0, , , 9.0, 10.0, , 12.0, , 8.0, 15.0, , ,

I think it should be:

  dtlimit = 1.0, 1.5, 3.0, ,5.0 , 6.0, ,8.0 , 9.0, 10.0, , 12.0, , , 15.0, , , 18.0

minindex, maxindex, step = get_fortran_variable_indices(variable_name, var_size)
variable_name = get_fortran_name_only(variable_name.lower())

expect(minindex > 0, "CIME indices < 1 not supported in CIME interface to fortran namelists minindex=%s"%minindex)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would change this to
"Array lower bounds /= 1 are not supported in CIME interface to fortran namelists ..."
I added the ... because a minimum index without the array name is not very useful (unless you get the file and line number to display correctly). If you do print it out, use 'lower bound' instead of 'minindex'.
Also, change the expect logic to minindex == 1 because that is the correct current requirement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the expect logic is correct here - if in user_nl you have dtlimit(6) then minindex=maxindex=6,step=1 .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no way of knowing or trapping for fortran lower array bounds > 1

@jedwards4b
Copy link
Contributor Author

@gold2718 I believe that I've addressed all of the issues you raised - please try again.

Thanks

@jedwards4b
Copy link
Contributor Author

Reran scripts_regression_tests.py and cesm yellowstone prealphas in namelist-only mode

@jedwards4b
Copy link
Contributor Author

In addition to the above I also ran the aux_clm45 tests:

:) git checkout master
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
yslogin6: ~/sandboxes/cesm2_0_alpha/cime/scripts
:) execca ./create_test --namelists-only --xml-machine yellowstone --xml-category aux_clm45  --generate jpe_aux_clm45_master  1> cts.out 2>&1
yslogin6: ~/sandboxes/cesm2_0_alpha/cime/scripts
:) git checkout index_in_namelist
Switched to branch 'index_in_namelist'
yslogin6: ~/sandboxes/cesm2_0_alpha/cime/scripts
:) execca ./create_test --namelists-only --xml-machine yellowstone --xml-category aux_clm45  --compare jpe_aux_clm45_master  1> cts.out 2>&1

All pass with the exception of a multiinstance case that was broken in master and fixed in this PR.

Copy link

@gold2718 gold2718 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Results are correct!
Just fix the comment and merge this baby (as far as I'm concerned).


# Check if can actually change this variable via filename change
# Check if can actuax1lly change this variable via filename change

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked the original spelling better :)

(([+-]?\d+)?:([+-]?\d+)?:?([+-]?\d+)?)) # colon seperated triplet
\))?\s*$""" # end optional index expression
, re.IGNORECASE | re.VERBOSE)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why all these blank lines?

self._groups[group_name][variable_name].extend(['']*(minindex-tlen-1))

for i in range(minindex-1, maxindex+step, step):
while len(self._groups[group_name][variable_name]) < i+1:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic is incorrect. For instance:
dtlimit(20:22:2) = 20.0, 22.0, 24.0
should generate an error since the slice only indicates two elements but you accept (and set) three. This is an ERROR in Fortran and should be trapped. If you trap the error earlier, then you can maybe keep the logic.
I believe the correct number of steps is:
1 + ((maxindex - minindex) / step)

@gold2718
Copy link

Just tried:
dtlimit(::2) = 1.0, 3.0, 5.0
and got
ERROR: Too many values for array dtlimit(::2)

if minindex > tlen:
self._groups[group_name][variable_name].extend(['']*(minindex-tlen-1))

for i in range(minindex-1, maxindex+step, step):
Copy link

@gold2718 gold2718 Mar 24, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This range does not seem to work when step < 0.
dtlimit(25:24:-1) = 25.0, 24.0
only sets spot 25.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link

@gold2718 gold2718 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the horse is finally dead, we can stop beating it (i.e., could not find any issues this time)!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants