Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data source synchronization performance improvement #1184

Merged
merged 12 commits into from
Jul 26, 2021
Merged

Data source synchronization performance improvement #1184

merged 12 commits into from
Jul 26, 2021

Conversation

Jylhis
Copy link
Contributor

@Jylhis Jylhis commented Jun 7, 2021

[Note by admin] Please visit #1186 for discussion.


I've been looking into improving the performance of synchronizing members from data sources. I moved a couple of queries outside of loops and enabled transactions for the insert/update/remove loops.

This PR currently work in progress. It is lacking some error handling and it fails with SQLite, because of the current SQLite locking workaround.

I would like comments if this type of changes would be acceptable and suggestions for improvements!

Benchmarks

add 20 000, remove 20 000 update 20 000 members

Before After
Time (sec) 263 67.2
Calls to do_prepared_query 240 032 120 032

add 60 000, update 40 000 members

Before After
Time (sec) 569 141
Calls to do_prepared_query 580 032 260 032

@ikedas
Copy link
Member

ikedas commented Jun 7, 2021

Hi @Jylhis ,
I think we'd be better to have a new issue to discuss on your proposal and related PRs. Could you please do it?

@Jylhis
Copy link
Contributor Author

Jylhis commented Jun 7, 2021

Hi @ikedas ,
I created the issue with some more information on our case.

@Jylhis Jylhis changed the title WIP data source performance improvement Data source synchronization performance improvement Jun 15, 2021
@Jylhis Jylhis marked this pull request as ready for review June 15, 2021 06:48
@ikedas
Copy link
Member

ikedas commented Jun 16, 2021

Conflict between this pr and #1189 was resolved.

@ikedas
Copy link
Member

ikedas commented Jun 30, 2021

Some other comments:

  • add_list_admin and delete_list_admin may also be wrapperd by begin-commit.
  • Typos: "$sdm->errstr" should be "$sdm->error".

@Jylhis
Copy link
Contributor Author

Jylhis commented Jul 12, 2021

add_list_admin is little bit trickier to wrap in begin-commit because if the replace option is set then the function tries to do UPDATE and INSERT. There might be a case when both UPDATE and INSERT would be executed successfully.

sympa/src/lib/Sympa/List.pm

Lines 3488 to 3529 in 978001b

if ( $options{replace}
and @set_list
and $sdm
and $sth = $sdm->do_prepared_query(
sprintf(
q{UPDATE admin_table
SET %s
WHERE role_admin = ? AND user_admin = ? AND
list_admin = ? AND robot_admin = ?},
join(', ', map { sprintf '%s = ?', $_ } @set_list)
),
@val_list,
$role,
$user->{email},
$self->{'name'},
$self->{'domain'}
)
and $sth->rows # If no affected rows, then insert a new row
) {
return 1;
}
@set_list = @map_field{@key_list};
@val_list = @{$user}{@key_list};
if ( @set_list
and $sdm
and $sdm->do_prepared_query(
sprintf(
q{INSERT INTO admin_table
(%s, role_admin, user_admin, list_admin, robot_admin)
VALUES (%s, ?, ?, ?, ?)},
join(', ', @set_list),
join(', ', map {'?'} @set_list)
),
@val_list,
$role,
$who,
$self->{'name'},
$self->{'domain'}
)
) {
return 1;
}

I also noticed that inside the User constructor there is call to get_global_user function, which does one SELECT. As far as I know this query would not be executed at the right time when transactions are used. This didn't cause any errors in my tests with add_list_member before, so I'm not sure if it's necessary or I just didn't happen to hit any cases where problem would be triggered.

sympa/src/lib/Sympa/List.pm

Lines 3301 to 3306 in 978001b

Sympa::User->new(
$who,
'gecos' => $new_user->{'gecos'},
'lang' => $new_user->{'lang'},
'password' => $new_user->{'password'}
)

and the query:

sympa/src/lib/Sympa/User.pm

Lines 495 to 523 in 978001b

sub get_global_user {
$log->syslog('debug2', '(%s)', @_);
my $who = Sympa::Tools::Text::canonic_email(shift);
## Additional subscriber fields
my $additional = '';
if ($Conf::Conf{'db_additional_user_fields'}) {
$additional = ', ' . $Conf::Conf{'db_additional_user_fields'};
}
push @sth_stack, $sth;
my $sdm = Sympa::DatabaseManager->instance;
unless (
$sdm
and $sth = $sdm->do_prepared_query(
sprintf(
q{SELECT email_user AS email, gecos_user AS gecos,
password_user AS password,
cookie_delay_user AS cookie_delay, lang_user AS lang,
attributes_user AS attributes, data_user AS data,
last_login_date_user AS last_login_date,
wrong_login_count_user AS wrong_login_count,
last_login_host_user AS last_login_host%s
FROM user_table
WHERE email_user = ?},
$additional
),
$who

@ikedas
Copy link
Member

ikedas commented Jul 12, 2021

add_list_admin is little bit trickier to wrap in begin-commit because if the replace option is set then the function tries to do UPDATE and INSERT. There might be a case when both UPDATE and INSERT would be executed successfully.

Please tell me, what exactly are the cases in which both UPDATE and INSERT would be executed successfully?

I also noticed that inside the User constructor there is call to get_global_user function, which does one SELECT. As far as I know this query would not be executed at the right time when transactions are used. This didn't cause any errors in my tests with add_list_member before, so I'm not sure if it's necessary or I just didn't happen to hit any cases where problem would be triggered.

Maybe I don't understand the problem, but executing SELECT doesn't seem to cause a problem as long as the same connection is kept alive ($sdm is a singleton instance, that is, the same connection can be obtained no matter how many times the ->instance method of Sympa::DatabaseManager is called.).

@Jylhis
Copy link
Contributor Author

Jylhis commented Jul 13, 2021

My understanding was that with transactions the queries will be executed when they are committed. So in the add_list_admin it adds the update query to transaction but might not return the affected rows, then it would add also insert to the transaction and during the commit, execute both. After reading on it little bit, this might not be as big of a problem as I thought as newer databases seem to be able to get the affected rows during transaction, if I have understood the documentation correctly.

I have similar concern for the SELECT statement, that it might be executed only during the commit.

@ikedas
Copy link
Member

ikedas commented Jul 13, 2021

No need to worry about that. Updates during a transaction appear to be reflected immediately, however updates after BEGIN may be recovered (cancelled) by ROLLBACK or by breakage of connection without COMMIT. On the other hand, updates can be invisible to the other sessions (the connection by the other process) until COMMIT is executed.

Markus Jylhänkangas and others added 6 commits July 13, 2021 16:04
  - Add Sympa::DataSource::is_external().
  - use Sympa::List::get_exclusion(). As a side work
    Sympa::List::is_member_excluded() was deprecated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants