docs: Tooling and automation for support matrix #290

Merged
merged 1 commit into from
May 8, 2022

Member

@mikemckiernan mikemckiernan commented May 5, 2022

  • Extract the broad swath of data from
    the NGC containers.
  • Create RST tables from the data, by year.
  • Add tests.
  • Rebase after docs-ext-toc was merged.
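As a rough illustration of the "by year" grouping mentioned above, here is a minimal sketch. It assumes release tags follow the `YY.MM` pattern seen in this thread (for example `22.04`); the function name `group_by_year` is hypothetical and is not taken from the scripts in this PR.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_year(release_tags: List[str]) -> Dict[str, List[str]]:
    """Group YY.MM release tags under a four-digit year (assumes years 20YY)."""
    by_year: Dict[str, List[str]] = defaultdict(list)
    for tag in sorted(release_tags):
        # "22.04" -> "2022"; the "20" prefix is an assumption of this sketch.
        year = "20" + tag.split(".")[0]
        by_year[year].append(tag)
    return dict(by_year)


print(group_by_year(["22.04", "21.12", "22.03"]))
# {'2021': ['21.12'], '2022': ['22.03', '22.04']}
```

Each year's list could then feed one table per year.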

I can be persuaded not to commit the docs-ci workflow file.

I created the PNGs for the notebooks because the embedded data kept getting flagged by codespell.

In the end, it all seems to work, after I committed to main in my fork:

Oh, and here too! I wasn't sure that'd work: https://nvidia-merlin.github.io/Merlin/review/pr-290/support_matrix/index.html

@mikemckiernan mikemckiernan requested a review from rnyak May 5, 2022 20:17
@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #290 of commit 5d44453e548cf5cbae36fa8b5d4015b003dc2f70, no merge conflicts.
Running as SYSTEM
Setting status of 5d44453e548cf5cbae36fa8b5d4015b003dc2f70 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/68/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/290/*:refs/remotes/origin/pr/290/* # timeout=10
 > git rev-parse 5d44453e548cf5cbae36fa8b5d4015b003dc2f70^{commit} # timeout=10
Checking out Revision 5d44453e548cf5cbae36fa8b5d4015b003dc2f70 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5d44453e548cf5cbae36fa8b5d4015b003dc2f70 # timeout=10
Commit message: "docs: Tooling and automation for support matrix"
 > git rev-list --no-walk 3fca4a55f53ddcbf00b603312241c8247ca0db80 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins8925814107767920868.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1 item

tests/unit/test_version.py . [100%]

============================== 1 passed in 0.02s ===============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins718900722889556412.sh

@mikemckiernan mikemckiernan added the documentation Improvements or additions to documentation label May 5, 2022
@github-actions

github-actions bot commented May 5, 2022

Documentation preview

https://nvidia-merlin.github.io/Merlin/review/pr-290

@mikemckiernan mikemckiernan added this to the Merlin 22.05 milestone May 5, 2022
@mikemckiernan
Member Author

@rnyak, I tagged you for review because I modified two notebooks. I wanted you to know in case you have test cases in flight.


# disable code-complexity checks for now
# TODO: should we configure the thresholds for these rather than just disable?
; too-many-function-args,
Contributor

Mostly just curious: what does the leading semicolon on these lines do?

Member Author

@karlhigley, ha. I didn't even question it. I highlighted the lines in VS Code and pressed Ctrl+/ to comment them out, and VS Code added the semicolons. I believe the semicolon acts as a comment character because I added some of those explicitly to some of the functions.

Contributor

Oh interesting! I thought it might be a comment, but then I realized # was already a comment character. It turns out to be impossible to search for information on this, because all you get is info on the pylint checks about semicolons.
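One likely explanation, assuming the file is an INI-style config that is read with Python's `configparser`: by default that parser treats both `#` and `;` as full-line comment prefixes. The sketch below shows this with the standard library only. The section contents are illustrative and are not copied from the repository's config.

```python
import configparser

# Illustrative INI text; the option values are examples, not the real file.
RC_TEXT = """
[MESSAGES CONTROL]
disable=
    fixme,
    # too-many-branches,
    ; too-many-function-args,
    invalid-name
"""

parser = configparser.ConfigParser()  # default comment_prefixes are ('#', ';')
parser.read_string(RC_TEXT)

raw_value = parser["MESSAGES CONTROL"]["disable"]
# Split the multi-line value into individual check names.
disabled = [name.strip() for name in raw_value.replace("\n", ",").split(",") if name.strip()]
print(disabled)
# ['fixme', 'invalid-name']
```

Both commented lines are dropped from the value, whichever prefix they use.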

client = docker.from_env()
container = None
try:
# runtime="nvidia",
Contributor

Does specifying runtime="nvidia" break something? Just not necessary?

Member Author

The runtime isn't installed on the GitHub runner. IIRC (and that's dicey) it produced an error on my first run. I don't need it for the data extraction because I'm not running any analytics.

Contributor

That makes sense!
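If the NVIDIA runtime ever needs to be optional rather than commented out, one approach is to add the `runtime` argument only when it is requested. This is a minimal sketch under that assumption; `build_run_kwargs` and `start_container` are hypothetical helpers, not functions from this PR. It relies on the `runtime` and `detach` keyword arguments of `containers.run` in the Docker SDK for Python.

```python
from typing import Any, Dict


def build_run_kwargs(use_nvidia_runtime: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for containers.run(); omit runtime unless asked."""
    kwargs: Dict[str, Any] = {"detach": True}
    if use_nvidia_runtime:
        # Only valid on hosts where the NVIDIA container runtime is installed.
        kwargs["runtime"] = "nvidia"
    return kwargs


def start_container(image: str, use_nvidia_runtime: bool = False):
    """Start a container from `image` and return the container object."""
    import docker  # third-party Docker SDK, imported lazily so the helper above has no dependency

    client = docker.from_env()
    return client.containers.run(image, **build_run_kwargs(use_nvidia_runtime))
```

On a runner without the runtime, calling `start_container(image)` with the default leaves `runtime` out entirely.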

Member

@benfred benfred left a comment

This looks great! I'm excited to get this functionality in.

One minor issue though - I'm seeing 'not applicable' versions here for 22.04 for cudf/nvtabular etc: https://nvidia-merlin.github.io/Merlin/review/pr-290/support_matrix/support_matrix_merlin_training.html#xx-container-images - both of which should be on that container. I'm a little confused as to why the cudf/nvtabular versions are showing up for 22.03 but not 22.04 =(

I'm also wondering if it might be easier to maintain the smx2rst script using a templating library - rather than building up the RST ourselves in python (maybe something like jinja2 https://jinja.palletsprojects.com/en/3.1.x/templates/ ). But since you've got this working already, I don't think we need to change here - just something to think about in the future.
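To make the templating idea concrete, here is a minimal sketch of rendering an RST `list-table` with jinja2. It is not the smx2rst implementation. The template layout, the package names, and the version numbers are all made up for illustration, and "Not applicable" is used as the fallback text for a missing version.

```python
from jinja2 import Environment

# Hypothetical template; trim_blocks removes the newline after each block tag.
TEMPLATE_SOURCE = """\
.. list-table::
   :header-rows: 1

   * - Component
{% for release in releases %}
     - {{ release }}
{% endfor %}
{% for name, versions in components.items() %}
   * - {{ name }}
{% for release in releases %}
     - {{ versions.get(release, "Not applicable") }}
{% endfor %}
{% endfor %}
"""

env = Environment(trim_blocks=True, lstrip_blocks=True)
template = env.from_string(TEMPLATE_SOURCE)

# Made-up data: release tags mapped to package versions.
releases = ["22.03", "22.04"]
components = {
    "example-package-a": {"22.03": "1.0.0", "22.04": "1.1.0"},
    "example-package-b": {"22.03": "0.9.0"},
}

rst = template.render(releases=releases, components=components)
print(rst)
```

The table layout then lives in the template, so changing a column or a directive option does not require touching the Python that collects the data.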

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #290 of commit 0c2f1cf7d23ca1f5fddf95095b465e20e7ac4841, no merge conflicts.
Running as SYSTEM
Setting status of 0c2f1cf7d23ca1f5fddf95095b465e20e7ac4841 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/70/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/290/*:refs/remotes/origin/pr/290/* # timeout=10
 > git rev-parse 0c2f1cf7d23ca1f5fddf95095b465e20e7ac4841^{commit} # timeout=10
Checking out Revision 0c2f1cf7d23ca1f5fddf95095b465e20e7ac4841 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0c2f1cf7d23ca1f5fddf95095b465e20e7ac4841 # timeout=10
Commit message: "docs: Tooling and automation for support matrix"
 > git rev-list --no-walk 885366f0ef77f0e14cfbe3eace95d65da4d0473c # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins652111999545755881.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1 item

tests/unit/test_version.py . [100%]

============================== 1 passed in 0.01s ===============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins2999333153103710278.sh

@mikemckiernan mikemckiernan force-pushed the mx-automation branch 2 times, most recently from 1255df3 to b1cbf45 Compare May 6, 2022 19:36
@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #290 of commit b1cbf453035942e878396b1bb8faa56e01dd3387, no merge conflicts.
Running as SYSTEM
Setting status of b1cbf453035942e878396b1bb8faa56e01dd3387 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/71/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/290/*:refs/remotes/origin/pr/290/* # timeout=10
 > git rev-parse b1cbf453035942e878396b1bb8faa56e01dd3387^{commit} # timeout=10
Checking out Revision b1cbf453035942e878396b1bb8faa56e01dd3387 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f b1cbf453035942e878396b1bb8faa56e01dd3387 # timeout=10
Commit message: "docs: Tooling and automation for support matrix"
 > git rev-list --no-walk 0c2f1cf7d23ca1f5fddf95095b465e20e7ac4841 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins8838689680984873676.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1 item

tests/unit/test_version.py . [100%]

============================== 1 passed in 0.01s ===============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins5445242447122355265.sh

@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #290 of commit f1017d21774bfe15fc70ace9365f0aa0eecace1b, no merge conflicts.
Running as SYSTEM
Setting status of f1017d21774bfe15fc70ace9365f0aa0eecace1b to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/72/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/290/*:refs/remotes/origin/pr/290/* # timeout=10
 > git rev-parse f1017d21774bfe15fc70ace9365f0aa0eecace1b^{commit} # timeout=10
Checking out Revision f1017d21774bfe15fc70ace9365f0aa0eecace1b (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f1017d21774bfe15fc70ace9365f0aa0eecace1b # timeout=10
Commit message: "docs: Tooling and automation for support matrix"
 > git rev-list --no-walk b1cbf453035942e878396b1bb8faa56e01dd3387 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins5260512312924266804.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 1 item

tests/unit/test_version.py . [100%]

============================== 1 passed in 0.01s ===============================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins7353678990772643368.sh

@mikemckiernan
Member Author

This looks great! I'm excited to get this functionality in.

One minor issue though - I'm seeing 'not applicable' versions here for 22.04 for cudf/nvtabular etc: https://nvidia-merlin.github.io/Merlin/review/pr-290/support_matrix/support_matrix_merlin_training.html#xx-container-images - both of which should be on that container. I'm a little confused as to why the cudf/nvtabular versions are showing up for 22.03 but not 22.04 =(

I'm also wondering if it might be easier to maintain the smx2rst script using a templating library - rather than building up the RST ourselves in python (maybe something like jinja2 https://jinja.palletsprojects.com/en/3.1.x/templates/ ). But since you've got this working already, I don't think we need to change here - just something to think about in the future.

Geez. Thanks for the review and the suggestion to use pip show. Looks better now: https://nvidia-merlin.github.io/Merlin/review/pr-290/support_matrix/support_matrix_merlin_training.html#xx-container-images

@benfred
Member

benfred commented May 7, 2022

Looks a lot better! I see cudf versions showing up now. Is it possible to get the HugeCTR version on https://nvidia-merlin.github.io/Merlin/review/pr-290/support_matrix/support_matrix_merlin_training.html#xx-container-images? It should be installed on the merlin-training and merlin-inference containers.

* Extract the broad swath of data from
  the NGC containers.
* Create RST tables from the data, by year.
* Add tests.
* Rebase after docs-ext-toc was merged.
* Add logging for troubleshooting unexpected results.
* Add get_from_pip function from Ben.
@mikemckiernan
Member Author

Looks a lot better! I see cudf versions showing up now. Is it possible to get the HugeCTR version on https://nvidia-merlin.github.io/Merlin/review/pr-290/support_matrix/support_matrix_merlin_training.html#xx-container-images? It should be installed on the merlin-training and merlin-inference containers.

I managed to get the merlin-training 22.04 container. I'm fairly sure that HugeCTR is in there, but there's some issue:

python -c 'import hugectr as x; print(x.__version__);'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: libnvidia-ml.so.1: cannot open shared object file: No such file or directory

...and there's something up with this too:

pip show hugectr
WARNING: Package(s) not found: hugectr

I agree the version is missing from the table, but it seems that both I and customers need some gymnastics to find it.
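The two failures above suggest a lookup that tries package metadata first and then falls back to importing the module, returning nothing when both fail. The sketch below shows that pattern for a local Python environment. It is an assumption about one way to structure the check, and `get_version` is a hypothetical name rather than the `get_from_pip` function in this PR.

```python
import importlib
from importlib import metadata
from typing import Optional


def get_version(package: str) -> Optional[str]:
    """Return the version of `package`, or None if it cannot be determined."""
    # Ask the installed-distribution metadata first (similar to `pip show`).
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        pass
    # Fall back to importing the module. The import can fail for reasons other
    # than absence, such as a missing shared library like libnvidia-ml.so.1.
    try:
        module = importlib.import_module(package)
    except ImportError:
        return None
    return getattr(module, "__version__", None)
```

A caller can map a `None` result to "Not applicable" in the table, or log it for troubleshooting.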

@mikemckiernan mikemckiernan merged commit 20ea0b1 into NVIDIA-Merlin:main May 8, 2022
@mikemckiernan mikemckiernan deleted the mx-automation branch May 12, 2022 19:29
Labels
documentation Improvements or additions to documentation
4 participants