Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Run the test suite #70

Closed
wants to merge 2 commits into from
Closed

Run the test suite #70

wants to merge 2 commits into from

Conversation

maresb
Copy link
Contributor

@maresb maresb commented Jun 25, 2021

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@maresb
Copy link
Contributor Author

maresb commented Jun 28, 2021

@conda-forge-admin, please rerender

@github-actions
Copy link
Contributor

Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.

recipe/meta.yaml Outdated
@@ -33,6 +35,9 @@ requirements:
- mkl-service
- cxx-compiler # [not win]
- m2w64-toolchain # [win]
- libpython # [win]
- numba # [not linux]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

numba is not a pymc3 dependency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some reason, I don't remember what though.

My guess is that some test was failing without. But if so, maybe that test was optional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having numba in the dependencies tricks conda into resolving a compatible of dependencies.
A while back I did really extensive testing, comparing different sets of dependencies, diffing the resulting environments and checking if PyMC3 was importing without warnings...
That's why numba is included in https://github.com/pymc-devs/pymc3/wiki/Installation-Guide-(Windows)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess they just did a better job at defining the dependencies compared to the authors of Theano/Aesara conda recipes.

@@ -33,6 +35,9 @@ requirements:
- mkl-service
- cxx-compiler # [not win]
- m2w64-toolchain # [win]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these dependencies should be in the aesara feedstock, not this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe it's best to keep them for this version and try to fix in Aesara soon afterwards? After removing this we'd then ≤-pin Aesara to the packages which are fixed?

BTW, I don't see Aesara as a direct dependency here. Is it indirect?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

theano-pymc was renamed to aesara, but pymc3 stable does not yet depend on it. So we might need to wait until 1. aesara has a reliable compile chain dependencies, and 2. we have a pymc3 release that depends on aesara instead of theano-pymc.

recipe/meta.yaml Outdated
@@ -33,6 +35,9 @@ requirements:
- mkl-service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also optional.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that numpy and pandas are optional, or something about the pinning is optional?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mkl-service is not a requirement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, it seems that GitHub glitched and was showing pandas/numpy instead of mkl-service.

@maresb
Copy link
Contributor Author

maresb commented Aug 13, 2021

@twiecki please feel free to commit directly yourself. I enabled edits by maintainers.

@maresb
Copy link
Contributor Author

maresb commented Aug 13, 2021

Some broken pip dependencies:

https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=363667&view=logs&j=9c5ef928-2cd6-52e5-dbe6-9d173a7d951b&t=20c71c51-4b27-578b-485d-06ade2de1d00&l=1271

And under Windows, it's showing a stack overflow. 😣

Still waiting for more stuff to complete.

@michaelosthege
Copy link
Contributor

And under Windows, it's showing a stack overflow. 😣

The test suite is too large to run in one test job on Windows. It's also reproducible locally that none of these tests fail individually, but give the stack overflow when all of them run in the same pytest call.

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I was trying to look for recipes to lint for you, but it appears we have a merge conflict.
Please try to merge or rebase with the base branch to resolve this conflict.

Please ping the 'conda-forge/core' team (using the @ notation in a comment) if you believe this is a bug.

@conda-forge-linter
Copy link

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@twiecki
Copy link
Contributor

twiecki commented Aug 18, 2021

I think we should figure out what adding numba is doing and solving the underlying issue, because it really doesn't make any sense to add it here.

@maresb
Copy link
Contributor Author

maresb commented Aug 18, 2021

@twiecki I completely agree, I just want to reach a state where the CI tests are passing.

Now it's saying pytest failed, even though there were no errors and only warnings? Any idea what's going on?

@maresb
Copy link
Contributor Author

maresb commented Aug 20, 2021

I am getting closer, thanks very much @michaelosthege for the suggestion to run the tests individually to avoid the stack overflow under Windows.

Now just Windows is failing with some theano compilation errors:

3.7:

E Exception: ('Compilation failed (return status=1): C:\Users\VSSADM~1\AppData\Local\Temp\ccbQJyYb.s: Assembler messages:\r. C:\Users\VSSADM~1\AppData\Local\Temp\ccbQJyYb.s:262: Error: invalid register for .seh_savexmm\r. ', 'FunctionGraph(Elemwise{ge,no_inplace}(v, <TensorType(float64, (True,))>))')

3.8

E Exception: ('Compilation failed (return status=1): C:\Users\VSSADM~1\AppData\Local\Temp\cckOnRXa.s: Assembler messages:\r. C:\Users\VSSADM~1\AppData\Local\Temp\cckOnRXa.s:81: Error: invalid register for .seh_savexmm\r. ', 'FunctionGraph(Elemwise{ge,no_inplace}(v, <TensorType(float64, (True,))>))')

3.9

E Exception: ("Compilation failed (return status=1): C:\Users\VSSADM~1\AppData\Local\Temp\cctE7GCa.o: In function run':\r. C:/Users/VssAdministrator/AppData/Local/Theano/compiledir_Windows-10-10.0.14393-SP0-Intel64_Family_6_Model_79_Stepping_1_GenuineIntel-3.9.6-64/tmpv8iwac39/mod.cpp:1612: undefined reference to dgemm_'\r. C:\Users\VSSADM~1\AppData\Local\Temp\cctE7GCa.o:mod.cpp:(.rdata$.refptr.dgemm_[.refptr.dgemm_]+0x0): undefined reference to `dgemm_'\r. collect2.exe: error: ld returned 1 exit status\r. ", 'FunctionGraph(BatchedDot(<TensorType(float64, row)>, <TensorType(float64, row)>))')

Do we need to mess with cxxflags? Or is this a BLAS thing? Any ideas @twiecki or @michaelosthege?

@twiecki
Copy link
Contributor

twiecki commented Aug 20, 2021

This problems was mentioned here: pymc-devs/pymc#4749 (seems to be a m2g64-toolchain issue, can you confirm that it gets installed?), as well as here: Theano/Theano#6693 (solved by a compiler flag).

Perhaps the latter is the safer fix?

@maresb
Copy link
Contributor Author

maresb commented Aug 20, 2021

In the testing environment, m2w64-toolchain is indeed installed here.

I'm not actually sure how one should set compiler flags from conda. Primarily because I don't work with C++ much, so it's quite mysterious to me. And secondly because it seems very risky and complicated. For example, if we mess with the CXXFLAGS environment variable, what if it conflicts with other programs? I'm pretty stuck on a good way forward. Any suggestions?

@twiecki
Copy link
Contributor

twiecki commented Aug 20, 2021

The compiler flag I think we would set in aesara or pymc3, we already do this here: https://github.com/pymc-devs/pymc3/blob/main/pymc3/__init__.py#L40 Could we test this here?

Curious if @michaelosthege has any ideas.

@michaelosthege
Copy link
Contributor

The compiler flag I think we would set in aesara or pymc3, we already do this here: https://github.com/pymc-devs/pymc3/blob/main/pymc3/__init__.py#L40 Could we test this here?

Curious if @michaelosthege has any ideas.

I'm a biologist. This is pretty much black magic to me.
My knowledge about compiler flags ends at the point where I know they exist and they do stuff. With the how the compiler does things I guess.

I think we should figure out what adding numba is doing and solving the underlying issue, because it really doesn't make any sense to add it here.

Maybe check out this for comparison? https://github.com/conda-forge/numba-feedstock/blob/master/recipe/meta.yaml

@maresb
Copy link
Contributor Author

maresb commented Aug 20, 2021

@conda-forge-admin, please rerender

2 similar comments
@maresb
Copy link
Contributor Author

maresb commented Aug 20, 2021

@conda-forge-admin, please rerender

@maresb
Copy link
Contributor Author

maresb commented Aug 20, 2021

@conda-forge-admin, please rerender

@maresb
Copy link
Contributor Author

maresb commented Aug 20, 2021

@conda-forge-admin, please rerender

@github-actions
Copy link
Contributor

Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.

@maresb
Copy link
Contributor Author

maresb commented Aug 20, 2021

@conda-forge-admin, please rerender

@maresb maresb closed this Aug 20, 2021
@maresb maresb reopened this Aug 20, 2021
@maresb
Copy link
Contributor Author

maresb commented Aug 20, 2021

I rebased on the new version.

CI seems to be down. I'll probably have to try again tomorrow.

@twiecki
Copy link
Contributor

twiecki commented Aug 24, 2021

restarted the tests (I think).

@maresb
Copy link
Contributor Author

maresb commented Aug 24, 2021

@conda-forge-admin, please rerender

@maresb
Copy link
Contributor Author

maresb commented Aug 24, 2021

In Python 3.7 there is a missing DLL when loading netcdf4.

In 3.8 and 3.9 there is an error about the following failed assertion in check_logp().

        domains = paramdomains.copy()
        domains["value"] = domain
        for pt in product(domains, n_samples=n_samples):
            pt = Point(pt, model=model)
>           assert_almost_equal(
                logp(pt),
                logp_reference(pt),
                decimal=decimal,
                err_msg=str(pt),
            )
E           AssertionError: 
E           Arrays are not almost equal to 6 decimals
E           {'alpha': array(20.), 'beta': array(0.9), 'value': array(100.)}
E           Mismatched elements: 1 / 1 (100%)
E           Max absolute difference: 3.86856262e+25
E           Max relative difference: 4.70326902e-16
E            x: array(-8.225263e+40)
E            y: array(-8.225263e+40)

pymc3\tests\test_distributions.py:611: AssertionError

It looks like numpy's definition of being equal to 6 digits means "6 digits after the 1's place" instead of "6 significant digits".

@twiecki
Copy link
Contributor

twiecki commented Oct 12, 2021

@maresb I merged the mkl-services PR. Should we revisit this?

@maresb
Copy link
Contributor Author

maresb commented Oct 12, 2021

@twiecki, sure! I just rebased on master, so hopefully some of the latest changes have fixed some of the issues.

I don't have any time to work on this, but please feel free to commit to this branch.

Also please see the commented-out changes. For instance, we may need to pin the semver package. Unfortunately I've fairly completely forgotten where I was at.

@maresb
Copy link
Contributor Author

maresb commented Oct 12, 2021

@conda-forge-admin, please rerender

@maresb
Copy link
Contributor Author

maresb commented Oct 13, 2021

Windows is segfaulting right away on tests\test_pickling.py even though it's running as an individual test. I can move that one to the end and see if the other tests make any more progress.

OSX/3.7 stalled out mysteriously. Maybe it was a glitch.

I'll rebase master and try again...

@maresb
Copy link
Contributor Author

maresb commented Oct 13, 2021

@conda-forge-admin, please rerender

@github-actions
Copy link
Contributor

Hi! This is the friendly automated conda-forge-webservice.
I tried to rerender for you, but it looks like there was nothing to do.

@maresb
Copy link
Contributor Author

maresb commented Oct 13, 2021

Windows tests are still segfaulting. :( @michaelosthege, do you have any ideas?

@michaelosthege
Copy link
Contributor

@maresb the tests segfault on Windows when you run too much at once. Where "too much" is something like less than a quarter of all tests. Looks like the CI here is trying to run everything at once..

@maresb
Copy link
Contributor Author

maresb commented Oct 14, 2021

@michaelosthege, on precisely this advice from you, I already split the tests so that each test file runs individually. 😉 Unfortunately it's still segfaulting on Windows, so I was hoping that you might have another clever idea.

@michaelosthege
Copy link
Contributor

michaelosthege commented Oct 14, 2021

@maresb oh I see.
In one log it shows this:

conda activate log
%SRC_DIR%>SET DISTUTILS_USE_SDK=1 

%SRC_DIR%>SET MSSdk=1 

%SRC_DIR%>if not exist "C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\" ) 

%SRC_DIR%>if not exist "C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\BuildTools\" ) 

%SRC_DIR%>if not exist "C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\" (set "VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\" ) 

%SRC_DIR%>IF NOT "1" == "" (
set "INCLUDE=%PREFIX%\Library\include;"  
 set "LIB=%PREFIX%\Library\lib;"  
 set "CMAKE_PREFIX_PATH=%PREFIX%\Library;" 
) 

%SRC_DIR%>call :GetWin10SdkDir 

%SRC_DIR%>call :GetWin10SdkDirHelper HKLM\SOFTWARE\Wow6432Node  1>nul 2>&1 

%SRC_DIR%>if errorlevel 1 call :GetWin10SdkDirHelper HKCU\SOFTWARE\Wow6432Node  1>nul 2>&1 

%SRC_DIR%>if errorlevel 1 call :GetWin10SdkDirHelper HKLM\SOFTWARE  1>nul 2>&1 

%SRC_DIR%>if errorlevel 1 call :GetWin10SdkDirHelper HKCU\SOFTWARE  1>nul 2>&1 

%SRC_DIR%>if errorlevel 1 exit /B 1 

%SRC_DIR%>exit /B 0 

%SRC_DIR%>for /F %i in ('dir /ON /B "C:\Program Files (x86)\Windows Kits\10\\include\10.*"') DO (SET WindowsSDKVer=%~i ) 

%SRC_DIR%>(SET WindowsSDKVer=10.0.10240.0 ) 

%SRC_DIR%>(SET WindowsSDKVer=10.0.10586.0 ) 

%SRC_DIR%>(SET WindowsSDKVer=10.0.14393.0 ) 

%SRC_DIR%>(SET WindowsSDKVer=10.0.15063.0 ) 

%SRC_DIR%>(SET WindowsSDKVer=10.0.16299.0 ) 

%SRC_DIR%>(SET WindowsSDKVer=10.0.17134.0 ) 

%SRC_DIR%>(SET WindowsSDKVer=10.0.17763.0 ) 

%SRC_DIR%>if errorlevel 1 (echo "Didn't find any windows 10 SDK. I'm not sure if things will work, but let's try..." )  else (echo Windows SDK version found as: "10.0.17763.0" ) 
Windows SDK version found as: "10.0.17763.0"

%SRC_DIR%>IF "win-64" == "win-64" (
set "CMAKE_GEN=Visual Studio 15 2017 Win64"  
 set "BITS=64" 
)  else (
set "CMAKE_GEN=Visual Studio 15 2017"  
 set "BITS=32" 
) 

%SRC_DIR%>pushd C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\ 

When I see this on my PC I trash the environment right away and start over. It may work, but this is definitely not normal.
Could be unrelated though.

Do you know which test causes the segfault? I couldn't see that from the logs..
Also maybe there's still some notion of "process session" and the split-up didn't help. The CI on our main repo splits things across jobs that run more separate than the tests here(?)

@maresb
Copy link
Contributor Author

maresb commented Jun 13, 2022

This is now mostly obsolete, so I'm closing this out.

It could however be a useful reference in the future if we want to run pytest in pymc-feedstock.

@maresb maresb closed this Jun 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants