An option or function to get all the problems compiled and saved, so that they do not need to be compiled on the fly #48
Comments
Hello Zaikun,
Thank you very much for getting in touch. A PyCUTEst function that precompiles (or caches, as we call it) all the CUTEst problems is definitely something that we could easily add if you think that it would be useful. @lindonroberts, what do you think? The only issue I can see with this is that the problems will only be compiled with their default sizes, whereas some users may want to use larger or smaller problem parameter values to get larger or smaller dimensional versions of a particular problem. At the moment it is possible to get a list of all possible CUTEst problems using the
I have successfully used GitHub Actions to precompile a C++/Python project (jfowkes/gofit), so I think it should be relatively easy to set up a pre-compiled PyCUTEst. However, my concern is that it would be huge given the size of the CUTEst test collection. How large is the precompiled MatCUTEst?
Kind regards,
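A minimal sketch of what such a precompile-everything helper could look like. The name precompile_problems and the injected import_fn callable are hypothetical and not part of PyCUTEst; the helper only loops over problem names, triggers one build per problem with default parameters, and records failures instead of stopping.

```python
from typing import Callable, Dict, Iterable


def precompile_problems(
    names: Iterable[str],
    import_fn: Callable[[str], object],
) -> Dict[str, str]:
    """Try to build every problem once; return {name: error message} for failures.

    import_fn is whatever compiles and caches a single problem (hypothetical
    parameter, injected so that the loop itself has no hard dependency).
    """
    failures: Dict[str, str] = {}
    for name in names:
        try:
            # Builds the problem with its default parameters only.
            import_fn(name)
        except Exception as exc:
            # Keep going so that one bad problem does not abort the whole run.
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures
```

With PyCUTEst installed, this would presumably be called as precompile_problems(pycutest.find_problems(), pycutest.import_problem). Non-default problem sizes would still need a separate build through the sifParams argument of pycutest.import_problem.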
Hi, Jaroslav! Thank you for the timely response.
This is a good question. The "raw" compiled files take about 6 GB. However, compressing them with 7-Zip gives a file of about 150 MB. The reason behind this high compression ratio, as speculated by Tom, is that the OUTSIF.d files are essentially highly sparse matrices. The other files, AUTOMAT.d and *.mexa64, are negligible in size compared with OUTSIF.d. I also tried Zip and Gzip, but their compression ratios are not as impressive.
In the compiled version of MatCUTEst, I compressed all the compiled problems into two 7-Zip files (full.matcutest.7z.001&2). Each of them is less than 90 MB, so GitHub allows them to be stored in a repo. In this way, we can use MatCUTEst in GitHub Actions by checking out the repo and unpacking the 7z files, which takes less than a minute. One may consider caching the problems so that later actions can use them without installing MatCUTEst again, but it is not worthwhile, as the installation is swift.
Thank you for your consideration.
Best wishes,
Hi Zaikun,
That's an impressive compression ratio! Indeed, looking at the OUTSIF.d files, they do seem to be some kind of text-based sparse matrix. You should find that XZ, which is open-source, produces files about as small as 7-Zip does. I'm happy to look into this for PyCUTEst, although we have a couple of major PRs we need to merge first.
Kind regards,
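The compress-then-split approach described above can be sketched with Python's standard lzma module, which writes the XZ format. The function names and the part-size parameter are illustrative assumptions; this is not how MatCUTEst builds its 7z parts, only the same idea applied to an in-memory byte string.

```python
import lzma
from typing import List


def compress_and_split(data: bytes, max_part_size: int) -> List[bytes]:
    """XZ-compress data and cut the result into parts of at most max_part_size bytes."""
    if max_part_size <= 0:
        raise ValueError("max_part_size must be positive")
    packed = lzma.compress(data, preset=9)  # preset 9 favours size over speed
    return [
        packed[start:start + max_part_size]
        for start in range(0, len(packed), max_part_size)
    ]


def join_and_decompress(parts: List[bytes]) -> bytes:
    """Reassemble the parts in order and recover the original bytes."""
    return lzma.decompress(b"".join(parts))
```

Each part can then be stored as a separate file that stays below the hosting size limit, and a CI job only has to concatenate the parts and decompress them.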
Hello Jaroslav,
Great! Thank you very much for considering my suggestion. I look forward to seeing it implemented, while fully understanding that it may take some time for various reasons. Once it is available, I suppose we will use it in the CI of COBYQA @ragonneau, which can also serve as a tester of the implementation.
I would like to mention another related point. It might be a good idea to have a (simple) function that checks all the compiled problems to see whether they work properly without failures. See, for example,
According to Tom @ragonneau, it does happen sometimes that some cached problems fail to load when he uses PyCUTEst. I suppose Tom can tell you more if you need information on this. It would be great to identify all these problems and find out why they fail.
Many thanks again for your attention! Have a nice weekend!
Best regards,
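A rough sketch of such a checking function follows. The name check_problems and the injected load_fn are hypothetical; the only assumption about a loaded problem is that it exposes a starting point x0 and an objective obj(x), as PyCUTEst problem objects do.

```python
from typing import Callable, Dict, Iterable, Optional


def check_problems(
    names: Iterable[str],
    load_fn: Callable[[str], object],
) -> Dict[str, Optional[str]]:
    """Load each problem and evaluate its objective once at the starting point.

    Returns {name: None} when the check passes and {name: error message}
    when loading or evaluating raises an exception.
    """
    report: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            problem = load_fn(name)
            problem.obj(problem.x0)  # one evaluation as a smoke test
            report[name] = None
        except Exception as exc:
            report[name] = f"{type(exc).__name__}: {exc}"
    return report
```

With PyCUTEst, load_fn would presumably be pycutest.import_problem, and the names could be taken from the cache listing (pycutest.all_cached_problems() returns name/parameter pairs, so the names would need to be extracted first).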
Hello Jaroslav,
Another strong motivation for pre-compiling the problems is that the compiled problems can be used in parallel. In MATLAB, the compilation of the problems is not thread-safe because it writes some library files (.o or .a), but the compiled problems are. I am not sure what will happen in Python.
Best regards,
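One conservative pattern for Python, sketched under assumptions: serialize the build/load step behind a lock and let only the subsequent work overlap. The names run_in_parallel, import_fn and work_fn are hypothetical, and whether PyCUTEst problem objects can safely be evaluated from several threads is not established in this thread and would need testing.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

# A single lock shared by all workers, so only one build/load runs at a time.
_build_lock = threading.Lock()


def run_in_parallel(
    names: Iterable[str],
    import_fn: Callable[[str], object],
    work_fn: Callable[[object], object],
    max_workers: int = 4,
) -> Dict[str, object]:
    """Load problems one at a time, but run work_fn on them concurrently."""

    def task(name: str) -> object:
        with _build_lock:
            problem = import_fn(name)  # serialized: may write library files
        return work_fn(problem)  # not serialized: may overlap across workers

    name_list = list(names)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as name_list.
        return dict(zip(name_list, pool.map(task, name_list)))
```

If every problem is already compiled and cached, the lock only guards the comparatively cheap loading step.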
@zaikunzhang - thanks for the suggestion (and glad you've found PyCUTEst useful)! I think this is a sensible option, but I'm not sure how this would be used in practice. If we have a zip file in the repo, is the idea that people can just download + extract this file as part of a CI tool (or their own testing), while still installing the package from PyPI? I don't know how many people use GitHub (as opposed to PyPI) to get the code, but this would substantially increase the repo size.
@jfowkes - I'm wondering if it would make sense for this to be in a related but distinct repo for this reason? (not sure, just an idea)
I'm also not sure - would this be the full compiled problem (including *.o and cutestitf.c, *.so), or just the CUTEst files .d/.f? If the latter, we would need some extra code to build the relevant problem libraries too, which would take some time to run. If the former, there would need to be separate versions for each platform/Python version.
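If fully compiled problems were shipped, each archive would have to be keyed by platform and Python version. A small sketch of one possible key, using only the standard library; the function name and the tag format are made up for illustration and do not correspond to any existing PyCUTEst convention.

```python
import platform
import sys


def cache_tag() -> str:
    """Build a tag such as '<system>-<machine>-py<major>.<minor>' for this interpreter."""
    system = platform.system().lower() or "unknown"
    machine = platform.machine().lower() or "unknown"
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return f"{system}-{machine}-py{version}"
```

A downloader could then pick the archive whose name contains this tag and fall back to on-the-fly compilation when no matching archive exists.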
Hi Lindon @lindonroberts !
I agree that how to organize the package needs discussion if the compiled problems are included as a zip, and the complexity is high if you aim to support many platforms. For MatCUTEst, what I do is the following.
I understand that Python packages are managed in a completely different way.
I am not sure. For MatCUTEst, I just did some experiments and identified the minimal set of files needed. Hope this is helpful.
Best regards,
Yes, I think we would have to do this as a separate repository/package, something like
Hello, Jaroslav and Lindon (and anyone else who is maintaining PyCUTEst),
First of all, thank you very much for making CUTEst available under Python. It has been playing an essential role during our development of COBYQA @ragonneau . I know it takes a huge effort to make this happen.
Is your feature request related to a problem? Please describe
Sorry if I overlooked something and this feature is already available. Is there an option/function that allows us to get all the problems compiled and saved, so that they do not need to be compiled again later? This question is relevant for two reasons. First, some difficult problems take a significant amount of time and computing power to compile. Second, the compilation may fail from time to time (for various reasons, e.g., limited resources or some undiscovered bug). It would be great if all the problems could be prepared beforehand, once and for all, so that users never need to worry about compilation when using these problems for testing.
Describe the solution you'd like
Make it possible to get all the problems compiled and saved, so that they do not need to be compiled a second time. Sure, some problems accept some options when being compiled (e.g., dimension). In that case, we may only compile the problem with the default option.
Describe alternatives you've considered
You may see my package MatCUTEst, which is the MATLAB counterpart of PyCUTEst (sure, CUTEst already provides a MATLAB interface; MatCUTEst makes it easier to use). When MatCUTEst is installed, all the problems will be compiled (it may take some time). After that, all problems are available via the function macup (an analog of pycutest.import_problem), which does nothing more than locate the compiled problem without doing any compilation. This saves enormous time during testing.
In addition, I also made a compiled version of MatCUTEst, which can be used without any compilation (of course, it works only for limited combinations of OS and MATLAB). With MatCUTEst, it is easy to use CUTEst in GitHub Actions, particularly with GitHub-hosted runners, which would be rather time- and resource-consuming if problems needed to be compiled on the fly. It would be great if the same could be done for PyCUTEst. Anyone who wants to use PyCUTEst in CI would benefit.
Thank you very much for your attention.
Best regards,
Zaikun