Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature #2669 proj #2672

Merged
merged 14 commits into from
Sep 5, 2023
Merged

Feature #2669 proj #2672

merged 14 commits into from
Sep 5, 2023

Conversation

JohnHalleyGotway
Copy link
Collaborator

@JohnHalleyGotway JohnHalleyGotway commented Aug 29, 2023

Expected Differences

This enhances MET to compile against and link to the proj library. Note that for dtcenter/METbaseimage#12, I pushed the dtcenter/met-base:v2.1 Docker image which provides an environment for GHA in which these changes compile. The Makefiles have all been updated so that anywhere -lvx_grid appears, -lproj is added.

Unrelated but also included:

  • Fixes to the diff logic to correctly handle diffing the SEEPS, SEEPS_MPR, and MODE_CTS line types. Note that MTD output is still not included in the diff logic. But I decided that adding it in the existing convoluted way is not worth the effort.
  • Delete development.kiowa since kiowa has been decommissioned.
  • Delete test_const_dakota.R since a machine-specific configuration is no longer required.

Note also that the vx_grid library HAS NOT yet been updated to actually use the Proj library. @rgbullock will work on that.

  • Do these changes introduce new tools, command line arguments, or configuration file options? [No]

    If yes, please describe:

  • Do these changes modify the structure of existing or add new output data types (e.g. statistic line types or NetCDF variables)? [No]

    If yes, please describe:

Pull Request Testing

  • Describe testing already performed for these changes:

    Tested compilation on seneca and via GHA with v2.1 of the MET base image.

  • Recommend testing for the reviewer(s) to perform, including the location of input datasets, and any additional instructions:

    Browse changes and confirm that MET compiles via GHA and no differences to the unit test output are flagged.

A review from @jprestop is required because of the huge implications on installation.
A review from @georgemccabe is optional to keep him in the loop on dependency changes and his knowledge of the METbaseimage setup for GHA.

  • Do these changes include sufficient documentation updates, ensuring that no errors or warnings exist in the build of the documentation? [Yes]
    Added MET_PROJ info to the installation chapter which should be easy to incorporate into the overhaul of that chapter that @jprestop and @j-opatz are working on.

  • Do these changes include sufficient testing updates? [Yes]
    None needed.

  • Will this PR result in changes to the test suite? [No]

    If yes, describe the new output and/or changes to the existing output:

  • Please complete this pull request review by [Wed 8/30/23].

Pull Request Checklist

See the METplus Workflow for details.

  • Review the source issue metadata (required labels, projects, and milestone).
  • Complete the PR definition above.
  • Ensure the PR title matches the feature or bugfix branch name.
  • Define the PR metadata, as permissions allow.
    Select: Reviewer(s) and Development issue
    Select: Milestone as the version that will include these changes
    Select: Coordinated METplus-X.Y Support project for bugfix releases or MET-X.Y.Z Development project for official releases
  • After submitting the PR, select the ⚙️ icon in the Development section of the right hand sidebar. Search for the issue that this PR will close and select it, if it is not already selected.
  • After the PR is approved, merge your changes. If permissions do not allow this, request that the reviewer do the merge.
  • Close the linked issue and delete your feature or bugfix branch from GitHub.

…Also delete development.kiowa since that machine was decommissioned in June 2023.
…compiling the 50 applications that already link to -lvx_grid.
…include the Proj library in the rpath settings.
…r or not MET successfully compiles in that METbaseimage
…a4 to test with the correct base image version
…updating this feature branch to use that image when compiling via GHA.
@JohnHalleyGotway JohnHalleyGotway added this to the MET 12.0.0 milestone Aug 29, 2023
@JohnHalleyGotway JohnHalleyGotway linked an issue Aug 29, 2023 that may be closed by this pull request
21 tasks
@JohnHalleyGotway
Copy link
Collaborator Author

I notice failures in this GHA run related to python embedding. Will investigate further and debug.

TEST: gen_vx_mask_PYTHON                - FAIL -   1.023 sec
/usr/local/share/met/../../bin/gen_vx_mask \
      PYTHON_NUMPY \
      PYTHON_NUMPY \
      /data/output/met_test_output/gen_vx_mask/PYTHON_FCST_or_OBS_mask.nc \
      -type data \
      -input_field 'name="/usr/local/share/met/python/examples/read_ascii_numpy.py /met/MET-feature_2669_proj-PR/data/python/fcst.txt FCST";' \
      -mask_field  'name="/usr/local/share/met/python/examples/read_ascii_numpy.py /met/MET-feature_2669_proj-PR/data/python/obs.txt  OBS";' \
      -thresh gt0 -union -v 3
...
DEBUG 3: Initializing MET compile time python instance: /usr/local/bin/python3
FATAL ERROR (SEGFAULT): Process 2376 got signal 11 @ local time = [202](https://github.com/dtcenter/MET/actions/runs/6014551488/job/16315492945#step:4:203)3-08-29 16:46:28Z

configure Outdated Show resolved Hide resolved
configure.ac Outdated Show resolved Hide resolved
@JohnHalleyGotway
Copy link
Collaborator Author

These changes failed in this GHA run. All instances of python embedding cause a segfault. I pulled the Docker image and was able to replicate the bad behavior locally:

docker pull dtcenter/met-dev:feature_2669_proj-PR
docker run -it --rm dtcenter/met-dev:feature_2669_proj-PR /bin/bash
/usr/local/bin/plot_data_plane PYTHON_NUMPY plot.ps 'name="/usr/local/share/met/python/examples/read_ascii_numpy.py /met/MET-feature_2669_proj-PR/data/python/fcst.txt FCST";'

DEBUG 1: Start plot_data_plane by root(0) at 2023-08-29 20:01:50Z  cmd: /usr/local/bin/plot_data_plane PYTHON_NUMPY plot.ps name="/usr/local/share/met/python/examples/read_ascii_numpy.py /met/MET-feature_2669_proj-PR/data/python/fcst.txt FCST"; 
DEBUG 1: Opening data file: PYTHON_NUMPY
FATAL ERROR (SEGFAULT): Process 9 got signal 11 @ local time = 2023-08-29 20:01:50Z
FATAL ERROR (SEGFAULT): Look for a core file in /met
FATAL ERROR (SEGFAULT): Process command line: /usr/local/bin/plot_data_plane PYTHON_NUMPY plot.ps name="/usr/local/share/met/python/examples/read_ascii_numpy.py /met/MET-feature_2669_proj-PR/data/python/fcst.txt FCST"; 
Segmentation fault

This is the same issue that RAL-IT encountered when installing MET into /usr/local and @torle1f diagnosed the problem.

Manually compiling python 3.10 (as we are doing in METbaseimage) and running pip commands introduces linking problems, as demonstrated by this ldd command:

root@e2b70e2f1e04:/met# ldd /usr/local/lib/python3.10/site-packages/netCDF4.libs/libnetcdf-1281958a.so.19 
	linux-vdso.so.1 (0x00007ffe8abd4000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x000014dcb3f8b000)
	libhdf5_hl-f45ebdf9.so.200.1.0 => not found
	libhdf5-c6d05200.so.200.2.0 => not found
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x000014dcb3f6a000)
	libsz-57467d8a.so.2.0.1 => not found
	libz.so.1 => /usr/local/lib/libz.so.1 (0x000014dcb3f49000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x000014dcb3dc4000)
	libblosc-a9e8a7c7.so.1.21.1 => not found
	libzstd-c4cc7033.so.1.5.2 => not found
	libcurl-47074032.so.4.8.0 => not found
	libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x000014dcb3c40000)
	libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x000014dcb3c26000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x000014dcb3a64000)
	/lib64/ld-linux-x86-64.so.2 (0x000014dcb4248000)

So Python's NetCDF library doesn't know where to find the needed libraries and defaults to a bad/incompatible choice from /usr/local/lib. The fix is being more careful in the manual compilation of python, using rpath settings to tell it where to find its required libraries.

@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented Sep 1, 2023

I do note that this apparent linker problem also exists in the dtcenter/met-base:v2.0_debian10 Docker image we've been using for the last 6 months. So that likely isn't the (only) problem to be fixed. I continue to believe the problem is a conflict between the NetCDF library used for the Python netCDF4 module versus the version used to compile MET.

Comparing to the successful setup on seneca, I do notice that the custom Python 3.10 installation DOES NOT have a netCDF4.libs subdirectory in /usr/local/met-python3.10/lib/python3.10/site-packages. Presumably, it was compiled in such a way so that it wasn't required. I'll attempt to achieve the same result with Docker.

@JohnHalleyGotway
Copy link
Collaborator Author

JohnHalleyGotway commented Sep 2, 2023

All of the unit tests finally passed although some differences were flagged in this GHA run:

  1. In tc_stat/ALAL2010_stat.out very small difference in the CRTK_ERR values exist. The differences occur in the 5-th decimal place and are simply the result of using a newer gfortran compiler version. Note that .out files are diffed using the diff command. These minor CRTK_ERR diffs are not flagged in the .tcst files because they are not large enough to be flagged.
  2. The minor differences from (1) propagate to differences in the resulting plots: plot_tc/TK_ERR_boxplot.png and plot_tc/ABS_AMAX_WIND-BMAX_WIND_boxplot.png. However visual inspection of the plots yield no real differences.
  3. The last source of differences is significant. It appears in pb2nc_indy/nam.20210311.t00z.prepbufr.tm00.pbl.nc. All the differences are in the derived HPBL output variable. Four of the existing values are modified in the output, in the 2nd decimal place:
< ADPUPA 72251   20210311_000000 27.78  -97.5100   15 HPBL   NA 0 9  252.54010010
---
> ADPUPA 72251   20210311_000000 27.78  -97.5100   15 HPBL   NA 0 9  252.53479004
--
< ADPUPA 72202   20210311_000000 25.76  -80.3800    4 HPBL   NA 0 9  399.12905884
---
> ADPUPA 72202   20210311_000000 25.76  -80.3800    4 HPBL   NA 0 9  399.13281250
--
< ADPUPA 71802   20210311_000000 47.52  -52.7800  112 HPBL   NA 0 9  157.78698730
---
> ADPUPA 71802   20210311_000000 47.52  -52.7800  112 HPBL   NA 0 9  157.78948975
--
< ADPUPA 72388   20210311_000000 36.05 -115.1800  697 HPBL   NA 0 9 1999.55712891
---
> ADPUPA 72388   20210311_000000 36.05 -115.1800  697 HPBL   NA 0 9 1999.50708008

In addition, the new output increases the number of derived HPBL values from 392 to 432. I inspected the new values and they look to be reasonable. But I have no good reason to offer why a newer gfortran compiler version would increase the number of derived HPBL obs by 40!

That being said, the along/cross track error and HBPL observations are all derived using Fortran code. So I'm pretty confident that these diffs are all due to changing gfortran compiler versions.

Copy link
Collaborator

@jprestop jprestop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JohnHalleyGotway Thanks for your patience and perseverance on this task! I scanned through the code changes and looked at others more closely. I saw the failed check in the differences and reviewed your comment on them. It sounds like these are acceptable differences. I approve this request.

Copy link
Collaborator

@georgemccabe georgemccabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good and the diffs were explained in the comments.

A note about the order of merging of this and the corresponding METbaseimage PR:

To test these changes, we had to create the dtcenter/met-base:v2.1 DockerHub tag that uses the build script from the feature_2669_proj branch. We will want to update the dtcenter/METbaseimage Dockerfile to no longer reference the feature branch, as it will go away once this PR is merged.
We discussed that we should merge this pull request, then update the dtcenter/METbaseimage#14 branch to use develop instead of feature_2669_proj in the URL to the build script. We can retag v2.1 in METbaseimage after it is merged, which will kick off a rebuild of the Docker image. This last step is not mandatory, but would be good to do to ensure that we don't introduce a typo in the METbaseimage PR that we will have to track down later.

@JohnHalleyGotway JohnHalleyGotway merged commit 1ccc067 into develop Sep 5, 2023
32 of 33 checks passed
JohnHalleyGotway added a commit to dtcenter/METbaseimage that referenced this pull request Sep 5, 2023
…Dockerfile here to pull compile_MET_all.sh from develop rather than the feature_2669_proj feature branch.
@JohnHalleyGotway JohnHalleyGotway deleted the feature_2669_proj branch September 6, 2023 19:39
@JohnHalleyGotway JohnHalleyGotway restored the feature_2669_proj branch September 6, 2023 19:39
JohnHalleyGotway pushed a commit that referenced this pull request Sep 6, 2023
Co-authored-by: jprestop <jpresto@ucar.edu>
Co-authored-by: Seth Linden <linden@seneca.rap.ucar.edu>
Co-authored-by: John Halley Gotway <johnhg@ucar.edu>
Co-authored-by: Daniel Adriaansen <dadriaan@ucar.edu>
Co-authored-by: John and Cindy <halleygotway@Halleys-Mac-mini.local>
Co-authored-by: rgbullock <bullock@ucar.edu>
Co-authored-by: Randy Bullock <bullock@seneca.rap.ucar.edu>
Co-authored-by: Dave Albo <dave@seneca.rap.ucar.edu>
Co-authored-by: Howard Soh <hsoh@seneca.rap.ucar.edu>
Co-authored-by: George McCabe <23407799+georgemccabe@users.noreply.github.com>
Co-authored-by: hsoh-u <hsoh@ucar.edu>
Co-authored-by: MET Tools Test Account <met_test@seneca.rap.ucar.edu>
Co-authored-by: Seth Linden <linden@ucar.edu>
Co-authored-by: lisagoodrich <33230218+lisagoodrich@users.noreply.github.com>
Co-authored-by: davidalbo <dave@ucar.edu>
Co-authored-by: Lisa Goodrich <lisag@ucar.edu>
Co-authored-by: metplus-bot <97135045+metplus-bot@users.noreply.github.com>
Co-authored-by: j-opatz <59586397+j-opatz@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Vigh <jvigh@ucar.edu>
Co-authored-by: Tracy Hertneky <39317287+hertneky@users.noreply.github.com>
Co-authored-by: David Albo <dave@ucar.edu>
Co-authored-by: Dan Adriaansen <dadriaan@ucar.edu>
fix 2518 dtypes appf docs (#2519)
fix 2531 compilation errors (#2533)
fix #2531 compilation_errors_configure (#2535)
fix #2514 develop clang (#2563)
fix #2575 develop python_convert (#2576)
Fix Python environment issue (#2407)
fix definitions of G172 and G220 based on comments in NOAA-EMC/NCEPLIBS-w3emc#157. (#2406)
fix #2380 develop override (#2382)
fix #2408 develop empty config (#2410)
fix #2390 develop compile zlib (#2404)
fix #2412 develop climo (#2422)
fix #2437 develop convert (#2439)
fix for develop, for #2437, forgot one reference to the search_parent for a dictionary lookup.
fix #2452 develop airnow (#2454)
fix #2449 develop pdf (#2464)
fix #2402 develop sonarqube (#2468)
fix #2426 develop buoy (#2475)
fix 2596 main v11.1 rpath compilation (#2614)
fix #2514 main_v11.1 clang (#2628)
fix #2644 develop percentile (#2647)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: 🏁 Done
Development

Successfully merging this pull request may close these issues.

Enhance MET to compile and link against the Proj library
3 participants