Pytest testing support #21
I pushed a couple of commits:
I also edited the description to add open questions on testing with different configurations.
All in all, I think this is a great step in the direction of making contributing to the code easier and safer for new developers, great work! Here are some of my thoughts (ramblings?) about the questions:
In an ideal world, we would perhaps refactor the code so that we have one database interface that handles the actual SQL stuff. Then we mock methods like
Another option is self-hosted runners. The upside is that we can have a runner container that already has all the basic software installed, and that we won't depend on how many free GH Actions minutes we are given. The downside is that we must host the container ourselves somewhere and think about security more rigorously. The main security concern is that if a pipeline is run automatically, e.g. on a pull request, a malicious actor can execute arbitrary code on our machine. Language Bank does have a self-hosted runner in use for one repository, though.
Or if that's overkill, we could do more targeted parametrization, perhaps with
@aajarven, thanks for your insights! Here are a couple of comments.
Sorry for taking so long to get to this. I really appreciate the work you've done, as adding tests has been on our wish list for far too long! I ran into some trouble installing the CWB Perl tools (we don't usually use this part of CWB), so I tried replacing
requirements.txt:
- Update mysqlclient to 1.3.14, so that running on Ubuntu 22.04 does not crash with "ImportError: libmysqlclient.so.20: cannot open shared object file: No such file or directory".

Add facility for encoding VRT files into CWB corpora: fixture "corpora" that uses tests.corpusutils.CWBEncoder.

tests:
- Add an empty __init__.py to make tests a package, to make calling pytest directly work (and not only "python -m pytest").

Add tests/testutils.py:
- Function get_response_json can be called from tests for endpoints and helps in making them slightly more compact and less repetitive.

tests/conftest.py:
- Add fixtures cache_dir and corpus_config_dir for creating temporary cache and corpus configuration directories, and use them in fixture app for overriding the app configuration defaults.

tests/conftest.py:
- Define fixture corpus_configs: Copy corpus configurations from the configuration source directory (data/corpora/config) to a temporary test directory.

tests:
- Make fixtures "app" and "client" return a function that takes an optional dict argument for overriding default configuration values.
- Update tests to use the factory fixtures.

tests/dbutils.py:
- Support expansion of variables "{var}" in the "definitions" of table information items.
tests/data/db/tableinfo/relations.yaml:
- Define variable "rel_type" and use it as the type of field "rel" in the table definitions.

tests/dbutils.py:
- KorpDatabase: Add methods execute, execute_file for executing SQL statements in a string or file. Convert other code to use the methods.

tests/conftest.py:
- Add fixture database_tables for importing database data by table types and corpora.
tests/dbutils.py:
- Add import_tables (and auxiliary methods): Find data files (TSV or SQL) based on corpus name and table type. This requires the filenames in the table info files to contain a "{corpus}" placeholder for corpus id.
tests/data/db/tableinfo/*.yaml:
- Add placeholder "{corpus}" for corpus ids in filename patterns.
- Make the filename patterns slightly more explicit.

tests/functional/test_lemgram_count.py, tests/functional/test_timespan.py:
- Add argument "corpus" to the function returned by a local fixture.
- Use fixture database_tables and load database data only for the corpora actually used.
- Use tests.testutils.make_liststr to convert a list of corpora to a string.

tests/dbutils.py:
- KorpDatabase.execute_get_cursor: If cursor.connection.commit() fails with MySQLdb.ProgrammingError, try cursor.execute("COMMIT;").
- KorpDatabase._get_db_names: Do not commit after "SHOW DATABASES" to guarantee retrieving database names via the cursor (which does not work after cursor.execute("COMMIT;")).

tests/dbutils.py:
- KorpDatabase.import_table_files: Commit after importing the SQL file, as not doing so sometimes caused tests using the imported data not to find data.

tests/functional/test_relations.py:
- Parametrize the test with word and corpora.
- Test that all the relations contain the word as either "head" or "dep".

tests/dbutils.py:
- Table info files: Use "{CORPUS}" (or "{corpus}") in "tablename" as a placeholder for the corpus name (id) in uppercase (or lowercase), instead of "{1:u}" or similar, simplifying code and removing the need for a custom formatter.
- Remove class KorpDatabase.CaseConversionFormatter.
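
The "{corpus}" / "{CORPUS}" placeholders and "{var}" variables described in these commits can be expanded with plain `str.format`, which is presumably why no custom formatter is needed any more. A minimal sketch, assuming a hypothetical helper name (`expand_table_info`) and made-up pattern strings; the actual code in tests/dbutils.py may well differ:

```python
def expand_table_info(template, corpus, variables=None):
    """Expand {corpus}/{CORPUS} and user-defined {var} placeholders.

    Hypothetical helper for illustration only.
    """
    values = dict(variables or {})
    values["corpus"] = corpus.lower()   # lowercase corpus id
    values["CORPUS"] = corpus.upper()   # uppercase corpus id
    return template.format(**values)


# Made-up patterns in the spirit of the table info files
tablename = expand_table_info("relations_{CORPUS}_rel", "testcorpus")
filename = expand_table_info("{corpus}_relations.tsv", "TestCorpus")
definition = expand_table_info(
    "rel {rel_type} NOT NULL", "testcorpus",
    {"rel_type": "ENUM('++','ADV')"})
```

Because `str.format` treats every brace as a placeholder, literal braces in a definition would have to be doubled (`{{`, `}}`).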
@MartinHammarstedt, great that you found time to take a look at this PR! In the meantime, I haven’t been working on Korp as much as I had intended, but I have done something, and I’d still like to make some improvements to the testing facility before you merge it.

I just now pushed commits from last autumn that implement testing MySQL/MariaDB database access and allow testing with different values for configuration variables. I think the changes largely address my open questions 1 and 5 above. For testing endpoints requiring database access, a new database is created (by a specified MySQL user), and database data is imported from SQL or TSV files; please see

I also rebased the whole branch onto the current

I think you’re right in that it would be better to avoid using

In addition, I’d still like to make the following improvements or additions:
I hope to have time to do these in September. The way to add tests for plugins is still an open question, but I think the PR for the plugin system could perhaps address it. (Our extended plugin facility still needs some work before I dare create a PR for it; anyway, it should now be backward-compatible with yours.) Finally, I think it would be great if you could merge this PR before #20 (and other, forthcoming PRs), so that we’d be able to add some tests to the other PRs. However, I’d like to implement the above improvements first.
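
For readers unfamiliar with the factory-fixture approach to configuration overrides (fixtures "app" and "client" returning a function that takes an optional dict), the pattern is roughly the following. This is a minimal sketch, not the code in the PR: `create_app` here is only a stand-in for `korp.create_app`, the configuration names and values are made up, and the `@pytest.fixture` decorator is left out so that the snippet runs on its own.

```python
from types import SimpleNamespace

# Hypothetical defaults; the real ones come from the Korp configuration.
DEFAULT_TEST_CONFIG = {"TESTING": True, "CACHE_DIR": "/tmp/korp-cache"}


def create_app(config_override=None):
    """Stand-in for korp.create_app: merge overrides into the defaults."""
    config = dict(DEFAULT_TEST_CONFIG)
    config.update(config_override or {})
    return SimpleNamespace(config=config)


def make_app_factory(base_overrides=None):
    """Return a function creating an app with optional overrides.

    In a conftest.py, a fixture would return the inner function, so
    that each test can call it with its own configuration values.
    """
    def _app(config=None):
        overrides = dict(base_overrides or {})
        overrides.update(config or {})  # per-test values win
        return create_app(overrides)
    return _app


app = make_app_factory({"CACHE_DIR": "/tmp/test-cache"})
default_app = app()                           # session-wide overrides only
custom_app = app({"CACHE_DISABLED": True})    # plus a per-test override
```

Returning a function instead of a ready-made app lets a test choose configuration values before `create_app` runs, which matters for settings that are read at application creation time.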
As an additional note, the first commit in the PR (upgrading
If desired, the commit in question could be added separately from this PR, or you could replace it with something more appropriate that also fixes the problem mentioned above. In general, supporting a relatively wide range of Python versions does not look very simple to me. In addition, different Linux distributions and versions may also complicate the matter.
This pull request contains support for implementing Pytest tests for the Korp backend, along with a couple of sample tests.
Main features
The main features of the pull request are listed below. Please see `tests/README.md` for some more documentation on how to run and write tests.

- Directory `tests` contains a couple of sample tests and some supporting code. Tests have been divided into functional and unit tests (with subdirectories named correspondingly): functional tests test endpoints as seen by the user, whereas unit tests test individual functions.
- To allow overriding configuration settings for testing purposes, the parameter `config_override` was added to `korp.create_app`.
- `tests/conftest.py` contains a couple of fixtures that should make it easier to set up commonly used test prerequisites, such as corpus data and a test client. More fixtures can be added as deemed useful.
- Requirements for testing are in `requirements-dev.txt`, as advised by @aajarven.
- The (functional) tests can use CWB corpus data, which is encoded for each test session from corpus source files in `tests/data/corpora/src`, as I thought it would make the tests more transparent than using pre-encoded corpus data. It would not seem to be significantly slower if the test corpora are relatively few and small. Encoding the corpus data requires `cwb-encode` and `cwb-make`.
- The corpus data is represented as VRT (VeRticalized Text), the CWB input format, one file per CWB corpus. To make the file more self-contained and to simplify encoding, I opted for listing both positional and structural attributes in the VRT file as comments at the beginning of the file; for example:
If you have suggestions for a better approach for specifying the attribute information (or the whole corpus data), please tell us. It would also be possible to support alternative approaches.
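
As an illustration of how attribute declarations in leading comments could be read before encoding, here is a minimal sketch. The comment syntax (`<!-- positional-attributes: ... -->` and `<!-- structural-attributes: ... -->`) and the function name are assumptions made for this example and not necessarily the format used in the PR; the structural attribute values follow the notation of the `cwb-encode` option `-S`.

```python
import re

# Assumed comment syntax; the format actually used in the tests may differ.
ATTR_COMMENT = re.compile(
    r"<!--\s*(positional|structural)-attributes:\s*(.*?)\s*-->")


def read_vrt_attributes(lines):
    """Collect attribute names from comments at the start of a VRT file."""
    attrs = {"positional": [], "structural": []}
    for line in lines:
        match = ATTR_COMMENT.match(line.strip())
        if match is None:
            break  # stop at the first line that is not a declaration
        attrs[match.group(1)].extend(match.group(2).split())
    return attrs


# Made-up sample corpus with tab-separated positional attributes
sample = [
    "<!-- positional-attributes: word lemma pos -->",
    "<!-- structural-attributes: text:0+title sentence:0+id -->",
    '<text title="Example">',
    '<sentence id="1">',
    "This\tthis\tPRON",
    "</sentence>",
    "</text>",
]
attrs = read_vrt_attributes(sample)
```

The collected names could then be turned into `-P` and `-S` options for `cwb-encode`.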
In addition to the actual corpus data, a corpus needs to have an `.info` file. As a future improvement, if the `.info` file does not exist, its content could be computed from the VRT file.

Open questions
At least the following questions are open:

1. How to test endpoints that use the MySQL database? Should we have a real database, populated for each test session, or should we only mock database access? I think for some tests, a real database would be better. @aajarven mentioned `testing.mysqld` but recommended that we should also investigate other options.
2. How to test plugins? Plugin code and tests may be in another repository, but I think plugin tests should be able to access fixtures and test corpora.
3. Would it be possible to automate testing with GitHub (or with some other test automation service)? In principle perhaps yes, but what about the requirement of having CWB installed?
4. How to organize tests in a good way? Maybe some requests could be made fixtures in the individual test modules, so that it would be easier and less repetitive to test different properties of the result in separate tests. I extracted a common pattern of making a request, asserting its success and getting the JSON result into a function in `tests/testutils.py`.
5. How to test with different values for configuration variables? If a configuration variable is checked when making a request, it is perhaps possible to mock or override it within a test function. But I think that does not work if the configuration variable is used in `korp.create_app`. Should we parametrize the `app` fixture to be able to override configuration variable values, or would some other solution be better?

General comments
I think it made implementing testing easier that the code had been split into modules.
We intend to rebase the branches containing our modifications on this branch, so that we can provide some tests for the modifications for which we make pull requests. However, we do not promise comprehensive tests up front.
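
The request pattern mentioned under the open questions (make a request, assert its success, return the JSON result) can be captured in a small helper. A minimal sketch with a hypothetical signature; the actual `get_response_json` in `tests/testutils.py` may take different arguments. `FakeClient` and `FakeResponse` only stand in for a Flask-style test client so that the snippet runs on its own.

```python
import json


class FakeResponse:
    """Minimal stand-in for a test-client response."""

    def __init__(self, status_code, body):
        self.status_code = status_code
        self.is_json = True
        self._body = body

    def get_json(self):
        return json.loads(self._body)


class FakeClient:
    """Minimal stand-in for a test client; echoes the request as JSON."""

    def get(self, path, query_string=None):
        payload = {"path": path, "query": query_string or {}}
        return FakeResponse(200, json.dumps(payload))


def get_response_json(client, path, **params):
    """Make a GET request, assert that it succeeded and return the JSON."""
    response = client.get(path, query_string=params)
    assert response.status_code == 200
    assert response.is_json
    return response.get_json()


# A test would pass the real client fixture instead of FakeClient()
data = get_response_json(FakeClient(), "/timespan", corpus="TESTCORPUS")
```

With such a helper, a test body can concentrate on asserting properties of the returned data.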