Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Windows paths in tests of mingw-w64 executables #2105

Merged
merged 13 commits into from
Apr 11, 2022

Conversation

mjwoods
Copy link
Contributor

@mjwoods mjwoods commented Sep 4, 2021

The mingw-w64 environment is designed to produce native Windows software, but it supports build systems based on POSIX-style utilities such as shell scripts. The bash shell provided with mingw-w64 is derived from Cygwin, and it uses Unix-style paths within a root directory tree that is mounted from a Windows directory. When invoking Windows executables, the bash shell automatically converts command arguments that appear to be Unix paths into the Windows format. However, the automatic conversion causes a few problems in netcdf tests.

For example, URLs of the file:// variety are not converted to Windows format, and the corresponding files are not found by Windows programs because they do not respect the same mount points as the bash shell.

Also, some of the netcdf tests compare ideal outputs created by scripts with files created by compiled (Windows) executables, and the automatic path conversions may cause such tests to fail.

To avoid the automatic path conversions, Windows paths can be obtained in the mingw-w64 bash shell using the command pwd -W. In this PR, I have added logic in test_common.in to detect the mingw shell and define the shell variable PWDCMD to the appropriate form of pwd. The variable is then used in place of pwd commands in test scripts.

The test script ncdap_test/testpathcvt.sh required further treatment, because it tests path conversions in a variety of Unix formats, which were failing due to the automatic conversions performed by bash. The conversions by bash are disabled completely by defining the environment variable MSYS2_ARG_CONV_EXCL='*'.

@mjwoods mjwoods requested a review from WardF as a code owner September 4, 2021 07:49
@DennisHeimbigner
Copy link
Collaborator

I would be happier if we could isolate these changes to libdispatch/dpathcvt.c
and use the ncpathcvt utility to do the necessary conversions in shell scripts.

@DennisHeimbigner
Copy link
Collaborator

Since mingw has these peculiarities, we probably need to add a NCPD_MINGW
case to include/ncpathcvt.h and modify dpathcvt.c to handle mingw.

@mjwoods mjwoods marked this pull request as draft September 5, 2021 12:56
@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 6, 2021

Hi @DennisHeimbigner , I do not think this problem can be solved in the dpathcvt code. Only the bash shell (and related POSIX utilities) know the mount points in use, and there is no way for native Windows executables (such as ncdump.exe) to find out the mount points used by the calling shell.

For example, if pwd returns /home/user/netcdf-c/mybuild, the corresponding URL file:///home/user/netcdf-c/mybuild is not translated by bash when passed as an argument to ncdump. But ncdump cannot know that /home is actually c:/mingw64 (or any user specified installation prefix). If instead we use pwd -W, the resulting URL would be file://c:/mingw64/home/user/netcdf-c/mybuild.

I have tried to find a solution that is less invasive, and the best I can suggest is to use an alias for pwd when the shell is identified as mingw (please see mjwoods@86b7fa3). This fixes the path problems in all existing tests, without touching as many lines as the previous solution involving $PWDCMD.

@DennisHeimbigner
Copy link
Collaborator

Is there a run-time library equivalent to pwd -W?
If so, then presumably dpathcvt can invoke it when under mingw
to get the mount point if it exists.

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 6, 2021

I don't think there is a run-time library for mingw, apart from the Microsoft C run-time (and some wrapper code defined in the mingw header files).

The mingw build environment is based on Cygwin, so the POSIX parts of the environment are linked to a Cygwin-like run-time library. But the executables produced by mingw are intended to work without the Cygwin run-time. This is a deliberate choice by the mingw designers, and I can only speculate that it may be necessary to allow direct use of Microsoft APIs.

@DennisHeimbigner
Copy link
Collaborator

I was looking at this:

https://stackoverflow.com/questions/28533664/how-to-prevent-msys-to-convert-the-file-path-for-an-external-program

It seems to me that best solution is to add

MSYS_NO_PATHCONV=1

to test_common.in and then treat msys/mingw as windows.
Will that work?

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 6, 2021

I think the variable has changed to MSYS2_ARG_CONV_EXCL, as described in the mingw-w64 porting guide .

If you have concerns about adding my changes to netcdf, I can patch netcdf in the build script for the mingw-w64-netcdf package. But patches tend to have a limited shelf life, and it may help the netcdf developers to know what is needed by mingw packages (and other software that depends on them, such as R).

@DennisHeimbigner
Copy link
Collaborator

DennisHeimbigner commented Sep 6, 2021

Correct me, but I thought we had two problems with mingw/msys.

  1. the strcasecmp (and other) functions.
  2. the path conversion problem.

Your changes for #1 see a good solution. It is #2 I am concerned about.
I am looking for the smallest possible change because it appears that otherwise
we have a massive analysis and set of changes to all the test scripts. That makes
me cringe.

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 6, 2021

I forgot to answer your question about disabling pathname translations by setting a variable ...

The pathname translation already ignores URLs, which is why tests with file:// URLs are failing. Turning off pathname translation completely causes even more problems.

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 6, 2021

Hi Dennis, you mention #2 above - that isn't one of my PRs. But I do understand the desire to keep changes small and simple, so I have trimmed this PR to the smallest change that still works.

@DennisHeimbigner
Copy link
Collaborator

DennisHeimbigner commented Sep 6, 2021

I am confused. What PR is not yours?
Never mind, I see that github did a bad autocomplete. Sorry for the confusion.

@DennisHeimbigner
Copy link
Collaborator

My assumption is that the programs that get a URL argument will eventually
invoke NCpathcvt which ideally converts the URL path to a local one.
What I do not understand is why this failing.

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 6, 2021

The problem is caused by the virtual root directory that bash uses in msys2/mingw. When bash is launched, it mounts the msys2/mingw installation directory (on Windows) as a root directory, so paths inside that directory are referenced by stripping off the root prefix (e.g. c:/mingw64/mypath -> /mypath).

NCpathcvt does a good job of converting paths to/from a variety of formats, but it cannot determine that [dap4]file:///mypath/myfile.nc is supposed to be [dap4]file://c:/mingw64/mypath/myfile.nc, unless the prefix is added explicitly in the bash shell before invoking netcdf executables. That is the role of pwd -W in my proposed change to test_common.in.

If you have access to an msys2/mingw installation, you could try running a few experiments to see what I mean. Ordinary paths are converted by the shell so they are prefixed by the root directory, but URLs miss out on the automatic conversion.

@DennisHeimbigner
Copy link
Collaborator

but it cannot determine that [dap4]file:///mypath/myfile.nc is supposed to be [dap4]file://c:/mingw64/mypath/myfile.nc
Actually, NC_pathcvt should be able to if it knows that it is running under mingw
and what mapping is in effect; this is more or less the same as how it converts
c:\x to /cygdrive/c/x.
That is why I asked about figuring out the mount point programmatically.
Alternatively, can we assume that the mount is '/' -> 'c:\mingw64'
and build it into NC_pathcvt?.
Now that I think about it, a better solution is for NC_pathcvt
to look for an environment variable that defines the mapping.
Then, in test_common.in, we can set that variable using pwd -W.
Does that work?

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 6, 2021

I can't find any existing environment variables that show the path mappings used by bash (in msys2/mingw).

I do like your idea of defining an environment variable inside test_common.in, which could be used in NCpathcvt when converting absolute paths from Unix form to canonical form. Even so, it seems more complicated than defining an alias (or function or variable) for pwd -W.

Hopefully you now have a clear idea of what is causing the problem. I'll try to implement whichever solution you think is best.

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 6, 2021

Also, it is not correct to assume that / -> c:\mingw64, because the msys2/mingw distribution can be installed anywhere the user wants.

@DennisHeimbigner
Copy link
Collaborator

The problem is that aliasing pwd -R does not go far enough.
We need to support scripts produced by others outside of the netcdf-c testing regime,
which means that NC_pathcvt needs to handle msys mounted paths.

Probably the only way to get the actual mount points is to parse /etc/fstab,
but presumably that requires us to know where /etc/fstab is located.
Too bad msys does not have the equivalent to the cygpath command.
The msys people did not think this out very well.

@DennisHeimbigner
Copy link
Collaborator

Out of curiosity what output do you get when you do the command "mount"?

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 7, 2021

Hi Dennis,

I was only thinking about solving the problem in bash shell scripts, which is why I used the built-in pwd -W. But if you want a solution that works outside bash, msys2 provides cygpath and mount commands.

I wasn't sure if those commands would work outside the msys2 shell, so I tried running cmd from bash in the msys2 terminal, which runs the DOS command prompt. From there, I tried the cygpath and mount commands:

C:\msys64\home\milto>mount
mount
C:/msys64 on / type ntfs (binary,noacl,auto)
C:/msys64/usr/bin on /bin type ntfs (binary,noacl,auto)
C: on /c type ntfs (binary,noacl,posix=0,user,noumount,auto)
D: on /d type ntfs (binary,noacl,posix=0,user,noumount,auto)

C:\msys64\home\milto>cygpath -w /
cygpath -w /
C:\msys64\

These commands are in the PATH used by the shell and inherited by external commands (including Windows programs). So it would be possible to run the commands in NCpathcvt, if netcdf is built with mingw and the input path is in Unix format (without a drive letter). We would also need to handle cases when the commands are not found, as may occur when netcdf executables are launched outside of the bash shell - in such cases, the Unix path could be used without adding a prefix, because there are no mount points to worry about.

The presence of a cygpath command in the PATH does not imply that executables are running under msys2, because users could set their own PATH to include msys2 utilities. If NCpathcvt always tries to use cygpath, some users will be confused by the results. To avoid this problem, we probably need to disable path remapping by default and use an environment variable to enable it.

@DennisHeimbigner
Copy link
Collaborator

The reason I asked about mount was because it means there is code I could use
to parse the fstab to get the mount points.

I noticed that the mount output did not include c:/mingw64, which is the mount point
you mentioned before. Instead it uses c:/msys64.
Any idea why the difference?

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 8, 2021

c:/mingw64 was just my choice of example from earlier - the difference is not significant.

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 8, 2021

It looks like NCpathcvt could discover if it is running under the msys shell by checking the environment variables MSYS and/or MSYSTEM. If one or both of those variables are defined, it would make sense to use mount or cygpath to convert paths.

@DennisHeimbigner
Copy link
Collaborator

Refresh my memory:
what tests fail if you set MSYS2_ARG_CONV_EXCL="*" in test_common.in?

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 8, 2021

From memory, most tests fail if path conversions are disabled by setting MSYS2_ARG_CONV_EXCL="*". Using pwd -W can compensate for that though, because the paths are converted explicitly.

@DennisHeimbigner
Copy link
Collaborator

Sorry for all the questions; I do not have anywhere to install msys/mingw.
Anyway, what does pwd return (when you are, say, in "/") when the MSYS2_ARG_CONV_EXCL="*"
is set?

@DennisHeimbigner
Copy link
Collaborator

Here is my confusion. The tests that have urls should either use a relative path
or a constructed absolute path.
In either case the path is made absolute using the pwd command or the pwd() c library function.
I am baffled why this fails when MSYS2_ARG_CONV_EXCL="*" is set.

@DennisHeimbigner
Copy link
Collaborator

Well in any case let implement your solution (this PR)
until we can figure out a more general solution (if possible).
Thanks for catching this; we do not do much testing of msys2 since
it is not a common platform.

@DennisHeimbigner
Copy link
Collaborator

One last question: from where did you download msys2?

@mjwoods
Copy link
Contributor Author

mjwoods commented Sep 8, 2021

Relative paths should work fine, with or without MSYS2_ARG_CONV_EXCL.

I suggest that msys2 is growing in importance now, because R has adopted it as their build platform on Windows.

You can download msys2 from https://www.msys2.org/wiki/MSYS2-installation .

I'm happy to help with a longer-term solution, if you think it would be useful.

@mjwoods mjwoods marked this pull request as ready for review October 2, 2021 22:48
@DennisHeimbigner
Copy link
Collaborator

Your PR is confusing. Many of the changes look like stuff we have already done, like nczarr filters.
What version of netcdf are you working from?

@mjwoods
Copy link
Contributor Author

mjwoods commented Oct 2, 2021

I needed to fix a conflict, because my branch was based on unidata/master but we are merging into unidata/main. The only relevant difference was that ncdap_test/testpathcvt.sh was renamed to ncdump/testpathcvt.sh in unidata/main.

@DennisHeimbigner
Copy link
Collaborator

The problem we have is that when there are lots of "irrelevant" changes, we have to look
at all of them to ensure that there are no security issues. This PR changes too much
for me to be ok with it.

@mjwoods
Copy link
Contributor Author

mjwoods commented Oct 3, 2021

Hi @DennisHeimbigner , are we looking at the same changes? The end result of this PR is only a few lines changed in 2 files. Some of the intermediate commits made bigger changes, but I have reverted them.

@mjwoods
Copy link
Contributor Author

mjwoods commented Oct 3, 2021

Your PR is confusing. Many of the changes look like stuff we have already done, like nczarr filters.
What version of netcdf are you working from?

Sorry for the confusion. This was probably caused when I merged main into my branch, so that I could fix a conflict. Since then, I started with a clean copy of main and rebased my changes onto it, requiring a forced push into this branch. The end result is a total of 11 lines changed in 2 files.

@WardF
Copy link
Member

WardF commented Oct 4, 2021

@mjwoods thanks for the clarification; I just realized the test workflow was awaiting approval to run; I've triggered that and will review the PR as well, thanks!

DennisHeimbigner added a commit to DennisHeimbigner/netcdf-c that referenced this pull request Nov 3, 2021
1. Issue Unidata#2043
   * FreeBSD build fails because of conflicts in defining the fileno() function. So removed all extern declarations of fileno.

2. Issue Unidata#2124
   * There were a couple of problems here.
     * I was conflating msys with mingw and they need separate handling of paths. So treat mingw like windows.
     * memio.c was not always writing the full content of the memory to file. Untested fix by properly accounting for zero size writes.
     * Fix bug when skipping white space in tst_xcache.c

3. Issue Unidata#2105
   * On MINGW, bash and other POSIX utilities use a mounted root directory,
     but executables compiled for Windows do not recognise the mount point.
     Ensure that Windows paths are used in tests of Windows executables.

4. Issue Unidata#2132
   * Apparently the Intel C compiler on OSX defines isnan etc.
     So disable declaration in dutil.c under that condition.

5. Fix and re-enable test_rcmerge.sh by allowing override of where to
   look for .rc files

6. CMakeLists.txt suppresses certain ncdump directory tests because of differences in printing floats/doubles.
   * Extend the list to include those that also fail under mingw.
   * Suppress the mingw tests in ncdump/Makefile.am
@WardF
Copy link
Member

WardF commented Mar 10, 2022

Reviewing this one to determine whether it was incorporated already (or not) and then either incorporating or closing.

@WardF WardF added this to the 4.9.0 milestone Mar 10, 2022
@WardF WardF self-assigned this Apr 11, 2022
@WardF WardF merged commit 1cb28a9 into Unidata:main Apr 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants