-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collect, store and send aggregate metrics, take 2 #691
Conversation
ed81264
to
c3280ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I should have spotted the issue with that global because I had exactly the same problem myself a while back which is why get_connection()
now uses a thread-local.
I'm not quite convinced the changes here are the right approach though. There's a test failure with:
sqlite3.OperationalError: cannot change into wal mode from within a transaction
But I think that indicates that get_connection()
isn't the right place to do this set up.
Stores and updates running totals/aggregates of metrics whilst a job is EXECUTING. To keep executor state separate from scheduler state, we store this state in a separate db file altogether. This prevents lock contention on the main db file. Unlike the last time we did this, it should not grow to exessive size, as we only store 1 small row per job. For each metric, we calculate and store: lastest sample, cumsum, current mean (measured at this point in time), and peak value. We also add them to the tick trace spans as can be useful.
This modifies local executor get_status() to return any collected metrics. This is not used right now, but potentially in future we might want to make scheduling decisions on it.
c3280ea
to
f7577d1
Compare
Ok, I've completely reworking this, after we reverted the old. This PR now includes the 3 reverted commits, and then 1 additional commit, which changes the connection management to a) avoid process global and b) support readonly mode |
This removes the use of a process global, and adds support for readonly connections to help reduce contention on the metrics db with multithreaded access. Adding readonly mode, and the initial PR feedback, led to a re-working of the connection management. Rather than re-use `lib.database.get_connection()`, we instead implement our own version. This gives us control, and allows us to efficiently assure appropriate set up as well as cleanly handle readonly connections. One small chnage with this - we cannot use an in-memory db in testing for the metrics db, as `mode=memory` is not compatible with setting `mode=ro`. Each test still gets their own db, but its on disk.
f7577d1
to
596532c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this makes sense. Small str
/Path
type confusion which will need fixing though.
This PR contains the 3 reverted commits #686, and then an additional commit to fix the use of a global variable to cache the connection string in multithreaded application. This is known to be Bad Juju, and I regret the error.
The sync, run and stats thread all have a db connection to the metrics
db, so this cause sqlite to barf.
This was not caught in the test suite because we do not currently have
a functional test that runs the full multithreaded service and tries to
run a job with it. This is in theory possible (it's what the
a backend-server tests for the test backend do), but its also hard to
set up and slow and requires lxc currently, so I have not done that
here.
Once this lands, I will test it does fix the multithreaded issue by
using the backend-server functional tests to fully exercise it.
The commit message for the fix is:
This removes the use of a process global, and adds support for readonly
connections to help reduce contention on the metrics db with
multithreaded access.
Adding readonly mode, and the initial PR feedback, led to a re-working
of the connection management. Rather than re-use
lib.database.get_connection()
, we instead implement our own version.This gives us control, and allows us to efficiently assure appropriate
set up as well as cleanly handle readonly connections.
One small chnage with this - we cannot use an in-memory db in testing
for the metrics db, as
mode=memory
is not compatible with settingmode=ro
. Each test still gets their own db, but its on disk.