feat(server): Store information about available checkers to the database #4089
`Binary` is deprecated in SQLAlchemy and was removed in 1.4
During database migration, the server gets trapped in a cyclic restart with the following error:
Traceback (most recent call last):
File "/codechecker/lib/python3/codechecker_common/cli.py", line 209, in main
sys.exit(args.func(args))
File "/codechecker/lib/python3/codechecker_server/cmd/server.py", line 425, in __handle
main(args)
File "/codechecker/lib/python3/codechecker_server/cmd/server.py", line 1009, in main
server_init_start(args)
File "/codechecker/lib/python3/codechecker_server/cmd/server.py", line 936, in server_init_start
__db_migration(cfg_sql_server, context.run_migration_root,
File "/codechecker/lib/python3/codechecker_server/cmd/server.py", line 609, in __db_migration
ret = db.upgrade()
File "/codechecker/lib/python3/codechecker_server/database/database.py", line 326, in upgrade
command.upgrade(cfg, "head")
File "/usr/local/lib/python3.9/site-packages/alembic/command.py", line 294, in upgrade
script.run_env()
File "/usr/local/lib/python3.9/site-packages/alembic/script/base.py", line 490, in run_env
util.load_python_file(self.dir, "env.py")
File "/usr/local/lib/python3.9/site-packages/alembic/util/pyfiles.py", line 97, in load_python_file
module = load_module_py(module_id, path)
File "/usr/local/lib/python3.9/site-packages/alembic/util/compat.py", line 182, in load_module_py
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/codechecker/lib/python3/codechecker_server/migrations/report/env.py", line 120, in <module>
run_migrations_online()
File "/codechecker/lib/python3/codechecker_server/migrations/report/env.py", line 114, in run_migrations_online
context.run_migrations()
File "<string>", line 8, in run_migrations
File "/usr/local/lib/python3.9/site-packages/alembic/runtime/environment.py", line 813, in run_migrations
self.get_context().run_migrations(**kw)
File "/usr/local/lib/python3.9/site-packages/alembic/runtime/migration.py", line 560, in run_migrations
step.migration_fn(**kw)
File "/codechecker/lib/python3/codechecker_server/migrations/report/versions/c3dad71f8e6b_store_information_about_enabled_and_disabled_checkers_for_a_run.py", line 235, in upgrade
upgrade_analysis_info()
File "/codechecker/lib/python3/codechecker_server/migrations/report/versions/c3dad71f8e6b_store_information_about_enabled_and_disabled_checkers_for_a_run.py", line 62, in upgrade_analysis_info
_, new_analyzer_command = recompress_zlib_as_tagged_exact_ratio(
File "/codechecker/lib/python3/codechecker_server/migrations/common.py", line 89, in recompress_zlib_as_tagged_exact_ratio
data = raw_zlib_decode_buf(value)
File "/codechecker/lib/python3/codechecker_server/migrations/common.py", line 26, in raw_zlib_decode_buf
return zlib.decompress(value)
TypeError: a bytes-like object is required, not 'NoneType'
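The traceback ends with `zlib.decompress()` receiving `None`, which is what one would see if, for instance, the migrated column held a NULL for some row. A minimal sketch of a NULL-tolerant decode step follows. The helper name `decode_or_none` is hypothetical; it is not CodeChecker's `raw_zlib_decode_buf`, and it is not necessarily how the problem was actually fixed in the patch.

```python
import zlib
from typing import Optional


def decode_or_none(value: Optional[bytes]) -> Optional[bytes]:
    """Decompress a zlib buffer, passing NULL (None) values through.

    Hypothetical helper: a row whose column is NULL has nothing to
    decompress, so it is returned unchanged instead of raising TypeError.
    """
    if value is None:
        return None
    return zlib.decompress(value)


# Example rows: one with a compressed payload, one with a NULL column.
packed = zlib.compress(b"analyze command")
rows = [packed, None]
decoded = [decode_or_none(v) for v in rows]
```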
@vodorok I have identified the cause of that behaviour and fixed it. Turns out I made the bogus assumption that […] Unfortunately, I encountered some pretty weird behaviour that I need to track down before this patch is touched...
Please measure the store time of Xerces-C analyzed with --enable-all. If the store time hit is not severe (under 5%), the patch can be merged. My other comments are questions or suggestions.
s_ver = None
prod_status[pd.endpoint] = (status, s_ver, package_schema,
                            db_location)
except Exception:
Was there a concrete exception occurring here? Shouldn't we catch that one specifically?
The main exception here came from the absolutely unnecessary `sess.commit()` timing out. After removing that line from here and moving it before the products are iterated, I was not able to produce an exception here. However, the entire server, in principle, should NOT die just because one product had an issue, so I think a catch-all case is appropriate here. Keeping the server alive should be paramount.
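The catch-all argument above can be sketched as a loop that records a per-product outcome instead of letting one failure propagate. This is only an illustration under stated assumptions: `migrate_all`, `migrate_one` and the product names are hypothetical, and the real server code in `server.py` is structured differently.

```python
import logging
from typing import Callable, Dict, Iterable

LOG = logging.getLogger("migration-sketch")


def migrate_all(products: Iterable[str],
                migrate_one: Callable[[str], None]) -> Dict[str, bool]:
    """Run `migrate_one` for each product, isolating failures per product."""
    results: Dict[str, bool] = {}
    for endpoint in products:
        try:
            migrate_one(endpoint)
            results[endpoint] = True
        except Exception:
            # Catch-all on purpose: a broken product must not stop the
            # remaining products (or the server) from coming up.
            LOG.exception("Migration of product '%s' failed", endpoint)
            results[endpoint] = False
    return results


def _fake_migrate(endpoint: str) -> None:
    """Stand-in for a real schema upgrade; fails for one product."""
    if endpoint == "broken":
        raise RuntimeError("schema upgrade failed")


outcome = migrate_all(["alpha", "broken", "beta"], _fake_migrate)
```

The failed product is logged with its traceback and marked as unsuccessful, while the products after it are still processed.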
@vodorok Thanks for the mention of the potential issue. Indeed, there were problems with the added […] With this patch, this had gone up to 1731.29 s. However, undoing the […] So I've gone around and changed a few more things, because it turns out doing a […]
Unfortunately, it looks like this is a 16.2% reduction in time required on this test. I hope that's not a problem. 😏
This patch allows users to find out whether a checker was enabled or disabled during the analysis, irrespective of whether the checker produced any reports (which might have been deleted from the server since!). This improves auditing capabilities, as the definite knowledge of which checkers were _available_ is kept in the database and is not lost after the analysis temporaries are cleaned up from the analysing client.

Features:
- Create a new table, `checkers`, to store a unique ID (per product database) for a checker's name.
- Add information about checkers and enabledness to the database, based on the `metadata.json`, if available.
- Extend the `AnalysisInfo` API object to report the collected information to the client.

Refactoring:
- Normalise the use of the `checkers` table by lifting additional checker-unique information (`severity`) from `reports`, leaving only a `FOREIGN KEY` in the `reports` table.
- Ensure that all versions of `metadata.json` are represented the same way in memory once the `MetadataInfoParser` has succeeded.
- Ensure that a long migration of a report database does not result in time-outs for the connection of the configuration database, and that the failure in the migration of one product does not kill the entire server.
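To make the shape of the change easier to picture, here is a rough sketch of the normalised layout using Python's built-in `sqlite3`. Everything below is an assumption for illustration: the join table name `analysis_info_checkers`, the column names, and the sample checkers are hypothetical, and the real schema is defined through SQLAlchemy models and Alembic migrations rather than raw DDL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE checkers (
    id            INTEGER PRIMARY KEY,
    analyzer_name TEXT NOT NULL,
    checker_name  TEXT NOT NULL,
    severity      INTEGER NOT NULL DEFAULT 0,
    UNIQUE (analyzer_name, checker_name)   -- compound uniqueness
);
CREATE TABLE analysis_info_checkers (      -- hypothetical join table
    analysis_info_id INTEGER NOT NULL,
    checker_id       INTEGER NOT NULL REFERENCES checkers (id),
    enabled          BOOLEAN NOT NULL,
    PRIMARY KEY (analysis_info_id, checker_id)
);
CREATE TABLE reports (
    id         INTEGER PRIMARY KEY,
    checker_id INTEGER NOT NULL REFERENCES checkers (id)
);
""")
conn.executemany(
    "INSERT INTO checkers (id, analyzer_name, checker_name) VALUES (?, ?, ?)",
    [(1, "clangsa", "core.DivideZero"),
     (2, "clang-tidy", "bugprone-use-after-move"),
     (3, "clangsa", "alpha.core.CastSize")])
conn.executemany(
    "INSERT INTO analysis_info_checkers VALUES (?, ?, ?)",
    [(1, 1, True), (1, 2, True), (1, 3, False)])
conn.execute("INSERT INTO reports (checker_id) VALUES (1)")

# Checkers that were enabled in analysis 1 but produced no reports:
# exactly the information that was previously invisible on the server.
silent_but_enabled = conn.execute("""
    SELECT c.analyzer_name, c.checker_name
    FROM analysis_info_checkers AS aic
    JOIN checkers AS c ON c.id = aic.checker_id
    LEFT JOIN reports AS r ON r.checker_id = c.id
    WHERE aic.analysis_info_id = 1 AND aic.enabled AND r.id IS NULL
""").fetchall()

# The compound UNIQUE constraint rejects a second ID for the same pair.
try:
    conn.execute("INSERT INTO checkers (analyzer_name, checker_name) "
                 "VALUES ('clangsa', 'core.DivideZero')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```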
Well, in this case, I think I can make an exception and allow a deviation from the 5% run time change :). Great work!
LGTM!
…their statistics It is a new feature based on Ericsson#4089. The new Analysis statistics tab on the Statistics page is able to list all enabled checkers for the runs that are selected in the report filter (or for all runs if no run is selected). The table lists all checkers that were enabled in at least one of the runs according to the latest analysis. It also shows the checker severity, the status, and the number of closed and outstanding reports. The status can inform the user that the specific checker was "Enabled in all" runs or "Enabled in all runs except these", where the words "runs" and "these" are links that list the appropriate runs. Closed and outstanding report counts depend on the review and detection status. These statistics represent the number of closed and outstanding reports that belong to runs that were created with the new DB schema.
Fixing report sorting on unique mode. After modifying the DB schema in the Ericsson#4089 PR, the unique mode query of the getRunResults endpoint was changed, and as a result, report sorting is not working properly. Now, the unique mode query is redesigned and uses the row_number() function to filter unique reports correctly. The WHERE clause is also modified: it no longer contains the report annotation filter. Annotation filtering remains in the HAVING clause.
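The `row_number()` approach mentioned in this commit message can be illustrated with a small, self-contained query. This is a sketch under assumptions and not the server's actual query: the table layout and the `bug_hash` and `checker_name` columns are simplified stand-ins, and it uses the standard-library `sqlite3` module (window functions need SQLite 3.25 or newer) instead of the SQLAlchemy query the server builds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reports (
    id           INTEGER PRIMARY KEY,
    bug_hash     TEXT NOT NULL,
    checker_name TEXT NOT NULL
);
INSERT INTO reports (id, bug_hash, checker_name) VALUES
    (1, 'h2', 'core.NullDereference'),
    (2, 'h1', 'core.DivideZero'),
    (3, 'h2', 'core.NullDereference'),
    (4, 'h1', 'core.DivideZero');
""")

# Number the reports within each bug hash, keep only the first one per
# hash, and sort the surviving (unique) rows afterwards.
unique_sorted = conn.execute("""
    SELECT id, bug_hash, checker_name
    FROM (
        SELECT id, bug_hash, checker_name,
               ROW_NUMBER() OVER (PARTITION BY bug_hash ORDER BY id) AS rn
        FROM reports
    ) AS ranked
    WHERE rn = 1
    ORDER BY checker_name, id
""").fetchall()
```

Because the de-duplication happens in the inner query, the outer `ORDER BY` sorts exactly one row per bug hash.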
This patch contains the back-end developments and solutions to collecting the information needed for #4049.
Summary
In #4049, it was noted that losing the information about which checkers executed during analysis is confusing to developers who have access only to the web-based interface, without anything from the `analyze` command (such as CI logs). The confusion stems from checkers that were enabled but did not produce any reports not showing up anywhere on the interface. This is exacerbated by the fact that the "analysis command" currently shown on the Web UI often contains references to random temporary files created by CI systems, whose contents cannot be seen.

The goal of this initiative is to deterministically show the knowledge gathered from the `metadata.json`s (if applicable) about which checkers of which analysers executed. Having this list at hand is also useful from an auditing point of view, to ensure that it can be seen whether the executed analysers contained a particular checker or not. Thus, we move from the following distinction: […] over to a more descriptive one: […]
Description
This patch implements the back-end changes required to support this new feature. No front-end changes available, yet.
1. A new table, `checkers`, is created, which deterministically stores a unique ID (per database) for each `(analyser_name, checker_name)` pair. The uniqueness of the IDs for these pairs (henceforth, checkers) is enforced on the database level with a compound `UNIQUE KEY`.
2. `checkers` is filled with the names of all checkers identified during a run's store in a transactionally separated way, preceding the actual Report storage process. This ensures that during a (potentially concurrent) storage process, constraint violations do not tear down the progress of an entire run's contents. (A generously timed retry logic is implemented to ease this process, but it is not guaranteed to be infallible.)
3. `analysis_info`, which previously stored only the `analyze` command line, is extended to `JOIN` onto the `checkers` table and stores a single `BOOL` to say whether the joined checker was enabled in the joined `analysis_info`.
   - […] `analysis_info` for all `run_histories`, this list will trivially work in the same way as retrieving the `analyze` command line for a past snapshot of a Run currently works.
   - […] `Run`s might skew these results. It's safer (or at least less misleading) just to say `"No information."` for runs that were stored prior to the upgrade.
4. […] (`map<analyser_name: string, map<checker_name: string, enabled: bool>>`) now stored in the database as of point 3.
5. […] `checkers` instead. This simplifies the storage and server-start clean-up logic somewhat while complicating matters elsewhere during querying. The following columns are removed (essentially, uniqued out) from Reports: `checker_id` (string, "name"), `analyzer_name`, `severity`.
   - […] `reports` is pre-collected and dumped to `checkers`, with the new `FOREIGN KEY`s appropriately established as well. This means that for a product with a diverse set of checkers that all produced warnings, the vast majority of the "potential checker ID universe" will practically be allocated during the schema upgrade.
   - […] `checker_cat` (-egory, string) and `bug_type` (string) columns of Reports is dropped irrevocably, and not restored during a downgrade. It was never clear whether these values were uniquely the same for a particular checker, so moving these to the `checkers` table might not be a feasible solution. However, as these columns contained generic data that was never exposed over the API, storing them is deemed unnecessary. The semantics for whatever was available in these columns were never actually defined and were very analyser-specific or specific to how the report-converter dealt with the raw output of an analyser.

Miscellaneous
This patch also contains the following quality improvements that are not directly related to the feature implementation but were designed and developed in parallel to this feature's development.
- […] `Binary` type to `LargeBinary` consistently over the database schema. This means that this patch supersedes [fix] Replace Binary with LargeBinary #3736. To answer @vodorok's question added to #3736, SQLAlchemy's source code shows the established inheritance relationship: `_Binary` ← `LargeBinary` ← `Binary`, and `Binary.__init__()` only delegates to `LargeBinary.__init__()`. Thus, it is clear that the two types behave exactly the same across all database dialects, and the changes of both #3736 and this patch are safe without additional schema migration commands.
- […] `c3dad71f8e6b_...` of this patch. We will be updating multiple millions of rows in chunks, in multiple passes (one read, one collate, one write), so the accountability of where the migration is going benefits a lot from these improvements.
- `metadata.json`s previously stored the `checkers` "list" in a hard-to-understand and unwieldy-to-use data structure: `checkers: Dict[analyser_name: str, _: Union[enabled_checkers: List[name: str] | _: Dict[name: str, enabled: bool]]]`. This made accessing the actual data troublesome, as all client code (including new things I was writing for this patch) had to branch on whether the available data was "old-style" (only the list of enabled checkers) or "new-style" (the list of all found checkers, each marked with its enabledness bit). What's more, the tests were conflating what it means to be a `v1` `metadata.json`. The original implementation, up to 7254d05 in September 2017, contained the former `List[str]` grouped by the analyser. In March 2020, bd775d6 introduced `v2` files, but at this point, `v1` was already doing `Dict[str, bool]`s. In between the two points, in December 2019, 0cd28ac changed the format without the concept of "format versions", turning the list into `Dict[str, bool]`. Nevertheless, several test files that do not indicate the version still test whether the "real" `v1` is understood by the system. I fixed all these issues by transitioning to the current `v2` format in memory as soon as possible, creating an appropriate `checkers: Dict[str, Dict[str, bool]]` data structure irrespective of the format of the `metadata.json` in the input.
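The normalisation described in the last point can be sketched roughly as follows. The function name `normalise_checkers` and the sample checker names are hypothetical, and the real `MetadataInfoParser` handles considerably more than this (format versions, multiple tools, other keys); the sketch only shows how both historical shapes end up as `Dict[str, Dict[str, bool]]`.

```python
from typing import Dict, List, Union

# Per analyser: either the old-style list of enabled checker names, or
# the new-style mapping of checker name to its enabledness bit.
RawCheckers = Dict[str, Union[List[str], Dict[str, bool]]]


def normalise_checkers(raw: RawCheckers) -> Dict[str, Dict[str, bool]]:
    """Convert either historical shape to analyser -> checker -> enabled."""
    normalised: Dict[str, Dict[str, bool]] = {}
    for analyser, checkers in raw.items():
        if isinstance(checkers, dict):
            # New-style: all found checkers with an explicit flag.
            normalised[analyser] = {name: bool(enabled)
                                    for name, enabled in checkers.items()}
        else:
            # Old-style: only enabled checkers were listed, so every
            # listed name is enabled and nothing is known about the rest.
            normalised[analyser] = {name: True for name in checkers}
    return normalised


old_style = {"clangsa": ["core.DivideZero"]}
new_style = {"clang-tidy": {"bugprone-use-after-move": True,
                            "misc-unused-parameters": False}}
```

With this in place, client code can iterate over one shape only, for example `normalise_checkers(old_style)["clangsa"]["core.DivideZero"]`, without branching on the input format.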