PS-9149 Release tasks for PS-8.3.0 #5283

Merged
merged 995 commits into trunk on Apr 19, 2024

Conversation

adivinho
Contributor

No description provided.

Ole John Aske and others added 30 commits November 3, 2023 19:57
… Multi_Transporter.

Note: With all patches up to and including this one, all Transporter methods
previously overridden by Multi_Transporter have now been eliminated, or
implemented as methods with a 'require(false)' to prove that they are never
used.
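
A minimal sketch of the 'require(false)' pattern mentioned above (class and
method names are hypothetical, not the actual NDB code):

    #include <cassert>

    // Stand-in for NDB's internal require() macro (an assumption for this
    // sketch; the real macro lives in the NDB kernel headers).
    #define require(cond) assert(cond)

    // Hypothetical illustration only - not the actual Multi_Transporter class.
    class Multi_Transporter_sketch {
     public:
      // Formerly a real override inherited from Transporter; kept only to
      // prove that it is never called.
      bool connect_client() {
        require(false);
        return false;
      }
    };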

Change-Id: Ia1f774300168b968942e3705986c92fd6676282b
(cherry picked from commit ee86144475a85481a315d6593e536b4d5dc0fb61)
…ock_send_transporter()

Use these to eliminate 'lock' / 'unlock' on the Multi_Transporter

Change-Id: I124339222839aee600c31d8da78eefeb665acc72
(cherry picked from commit de0565ffa027e8e0b1790daacdb7c426349ea77a)
…being a Transporter any more

As the Multi_Transporter is no longer a subclass of Transporter,
it can no longer be stored in theNodeIdTransporters[].

-> theNodeIdTransporters[] is enhanced to always store the initial
base transporter for a specific NodeId. This also enables us to
simplify the implementation of ::get_node_base_transporter(), as we
no longer need to handle that theNodeIdTransporters[] may be a
Multi_Transporter.

In case a Multi_Transporter is created for a NodeId, it is now
stored in theNodeIdMultiTransporters[] instead. This array replaces
theMultiTransporters[] with slightly changed semantics - it is indexed
by NodeId instead of 0..nMultiTransporters.

::get_node_multi_transporter() is also enhanced to look up
the Multi_Transporter directly from theNodeIdMultiTransporters[].

Change-Id: I8d9ad9dde262380da8dc5555a310eeffbdf0c487
(cherry picked from commit cb6f19b151cb5f2731e43874d1804d7a3856656b)
…y more.

Class Multi_Transporter will no longer inherit from class Transporter.
Remove the unused overridden methods inherited from class Transporter.

Change-Id: I87feb2673f02d1eeaac8331058cddc1a24d2ca77
(cherry picked from commit a93039734304105d3d23de49ffea5e280a412a0b)
Change-Id: Ie49c590f21f9479e1c8c1fffa305f54e36242683
The test executes a number of updates against a table it created.
Some of the updates are executed as 'NoCommit', with a final 'Commit'
for the entire transaction.

ERROR_INSERT is used to crash the node on the commit (as opposed
to the intermediate execute-NoCommits).

The intention of the test case is to check for a regression of Bug#34216,
where: 'During TC-take-over (during NF) commit messages can come
out of order to TUP' - thus presumably hitting the ERROR_INSERT
on execute-commit. Such failures are now intermittently detected
by the failing test case.

The root cause is that there is other background activity going on
on the data nodes as well. In particular, the update operations may
trigger ndb_index_stat updates performing READ operations on
the table being updated, as well as accessing the system tables.

The patch enhances the ERROR_INSERT 5048/5049 code to also require that
it is a ZUPDATE being committed, and that the table is
a UserTable.
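
A hedged sketch of the tightened guard described above (the constant value
and helper shape are assumptions; the real NDB kernel code differs):

    // Hypothetical sketch only - not the actual DBLQH/DBTUP implementation.
    bool should_fire_error_insert(unsigned error_insert, unsigned op_type,
                                  bool is_user_table) {
      const unsigned ZUPDATE = 2;  // illustrative value, not the real constant
      if (error_insert != 5048 && error_insert != 5049) return false;
      // Only fire for a ZUPDATE being committed on a user table, so background
      // ndb_index_stat reads on system tables no longer trip the error insert.
      return op_type == ZUPDATE && is_user_table;
    }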

Change-Id: I4061fa4521b8f670b3783a3aa0b6256bca15a7d0
Change-Id: I2f471fdb6936244c923bc7506afc4acf745b29ba
When setting up a new set of neighbour transporters, the old set
of neighbours may still have pending data waiting to be sent.

In order to ensure that available data on these transporters
will still be sent, we need to ensure that the transporters becoming
non-neighbours are inserted into the list of transporters we need
to send on.

The patch enhances startChangeNeighbour() such that it checks whether
any of the old set of neighbour transporters has 'm_data_available',
and uses insert_trp() to insert them into the list of non-neighbour
transporters.

Note that we now also need to clear the m_neighbour_trp flag before
doing such inserts, else insert_trp() would not insert the
TrpId into the non-neighbour list.

The patch also redeclares some local variables referring to a
'transporter id' from Uint32 to TrpId.

A few asserts are also added to ensure the consistency of the
Transporter list structures.
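
A simplified sketch of the flow described above (types, members, and
insert_trp() are stand-ins; the real TransporterRegistry code differs):

    #include <vector>

    using TrpId = unsigned int;

    struct TrpSketch {
      TrpId id;
      bool m_data_available;
      bool m_neighbour_trp;
    };

    // Demote old neighbours that still have buffered data onto the
    // non-neighbour send list. The neighbour flag must be cleared first,
    // since insert_trp() (modelled here as push_back) refuses neighbours.
    void demote_old_neighbours(std::vector<TrpSketch *> &old_neighbours,
                               std::vector<TrpId> &non_neighbour_list) {
      for (TrpSketch *trp : old_neighbours) {
        trp->m_neighbour_trp = false;
        if (trp->m_data_available) {
          non_neighbour_list.push_back(trp->id);
        }
      }
    }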

Change-Id: I87bb539868ef33eb19e37ed90b30f1a3e1e07680
Change-Id: Ie7255da6abb25906e71b2b19af2d066da59f3259
…t an equijoin

This commit fixes a regression introduced by the fix for
Bug#34764211 "Too high row estimate for semijoin" (Change-Id:
I231cd0c8ef504d64cd835184a39f9975066b61bf). Some semijoins may be
transformed into an inner join between the right hand side aggregated
on the join fields, and the original left hand side. That fix
utilized this transform to make better row estimates, by estimating the
number of output rows as:

CARD(left_hand_relation) * inner_join_selectivity * CARD(d)

where:

* inner_join_selectivity is the cardinality of an inner join on the same
  predicate, divided by the cardinality of a cross join.

* 'd' is the set of distinct rows from right_hand_relation, when only
  looking at those columns that appear in the join predicate.
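
For illustration only (these numbers are made up, not taken from the bug
report): with CARD(left_hand_relation) = 1000, inner_join_selectivity = 0.01
and CARD(d) = 50, the formula gives 1000 * 0.01 * 50 = 500 estimated output
rows.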

The regression happens for a semijoin that is not a pure
equijoin. That is, the join predicate is something other than a
conjunction of 'left_table.field=right_table.field' terms. In this
case JoinPredicate.semijoin_group is empty, because the
semijoin-to-inner-join transform will never happen, and the
number of aggregated rows was wrongly set to 1. This fix corrects that.

This fix adds a new function EstimateSemijoinFanOut() that collects the
fields from the right hand relation that appear in the join predicate.
Unlike JoinPredicate.semijoin_group, this works for both semijoin and
antijoin, and for arbitrary predicates, not just conjunctions of
field=field. That function then estimates CARD(d), using the same apparatus
that we use for e.g. DISTINCT.

Change-Id: I02224ff4f64315f2b0b92d6b883fe08e2f6a9975
PROBLEM
-------

1. When innodb_validate_tablespace_paths=off, InnoDB does not
   validate the tablespaces during startup.
2. Since the tablespaces are not validated, we fail to initialize
   the in-memory filesystem hash map which maps a space id to its
   tablespace.
3. If ibuf entries are present during startup, a background
   thread tries to merge these into the appropriate tablespace.
4. The background thread searches for the tablespace in the hash
   map; since it cannot find the tablespace, it assumes that the
   tablespace is deleted and drops the ibuf entries silently.
5. This leads to corruption in the tablespace because the number
   of primary and secondary index entries differs.

FIX
---
1. If ibuf entries are present during startup, validate all the
   tablespaces irrespective of the innodb_validate_tablespace_paths
   setting.
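
A minimal sketch of the decision described in the fix (names are assumptions,
not the actual InnoDB startup code):

    // Hypothetical sketch: force tablespace validation whenever change-buffer
    // (ibuf) entries exist, regardless of the user-visible setting.
    bool must_validate_tablespaces(bool validate_paths_setting,
                                   bool ibuf_has_entries) {
      if (ibuf_has_entries) return true;  // pending merges need validated spaces
      return validate_paths_setting;      // otherwise honour the sysvar
    }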

Change-Id: I41bf5f39f654ce50c9fa47b6dd3d0153ba829308
…t an equijoin

Post-push fix: This commit fixes the following warning:
"sql/join_optimizer/cost_model.h:228:1: error: control reaches end of
 non-void function [-Werror=return-type]".

Change-Id: I41fbdd94b6442458a906d12bc176bb72c6b9000a
Problem:
The MySQL Server fails when replicating "GRANT NDB_STORED_USER ..." with
a replication filter turned on. This occurs because the replication filter
causes all non-updating queries to return an error, due to the
assumption that only changes need to be replicated.

Analysis:
To handle the GRANT statement a `SELECT ... FROM
information_schema.user_privileges` is used. This is a non-updating
query and thus triggers an error in the MySQL Server when replication
filters are in use.

Solution:
Install an empty replication filter while running the distributed privilege
queries, thus successfully getting a result returned.
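
A hedged sketch of the pattern in the solution, expressed as a scope guard
(the real ndbcluster code and replication filter API are different):

    // Hypothetical RAII guard: swap in an "empty" replication filter while
    // the distributed privilege query runs, then restore the original one.
    template <class Filter>
    class Empty_filter_guard {
     public:
      explicit Empty_filter_guard(Filter *&slot) : slot_(slot), saved_(slot) {
        slot_ = nullptr;  // nullptr stands in for an empty filter here
      }
      ~Empty_filter_guard() { slot_ = saved_; }

     private:
      Filter *&slot_;
      Filter *saved_;
    };

Wrapped around the privilege query, such a guard keeps the active filter from
rejecting the non-updating SELECT and restores it afterwards.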

Thanks to Mikael Ronström for providing the steps necessary
to reproduce this problem.

Contributed by: Mikael Ronström

Change-Id: I2c171f64b7776ac2410d2d918f591b16bc8c1a04
In the Acl_change_notification constructor, remove the ambiguity
between a function parameter and a member variable both called
"users".

Change-Id: I8fba20f562026606411eb8b67cd11d56af09e11e
Problem:
Crash in the replication applier when handling NDB synchronized privileges.

Analysis:
A fatal failure occurs when the query return code indicates success but no
result set is available. Such a problem scenario can be seen in the
replication applier, for example in the problem described in BUG#35928350.

Solution:
Handle the case when the expected result set is not available by logging an
error message and returning failure. This should make for more stable
behaviour when a query does not return the expected result set.
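
A hedged sketch of the defensive check described above (function and types
are illustrative only, not the ndbcluster plugin code):

    #include <cstdio>

    struct Result_set_sketch {};  // stands in for the real result set type

    // Treat "query succeeded but returned no result set" as a logged failure
    // instead of dereferencing a null result pointer.
    bool handle_privilege_query_result(int query_rc,
                                       const Result_set_sketch *result) {
      if (query_rc != 0) return false;  // the query itself failed
      if (result == nullptr) {
        std::fprintf(stderr,
                     "Expected result set from privilege query not available\n");
        return false;  // fail cleanly rather than crashing the applier
      }
      return true;
    }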

Change-Id: I4623ff8be5e59cbc3edcbd1627cbedda6dbab5ce
Post-push fix: revert bad .result file.

Change-Id: I78f6aa63e4dd3d98be94b98d410ef4e598a23718
…O 8.0

Backport from mysql-trunk.

1) Bug#35211828: Derived condition pushdown with rollup gives wrong results
2) Bug#35498378: MYSQLD CRASH - ASSERTION NULLPTR != DYNAMIC_CAST<TARGET>(ARG) FAILED

Change-Id: I26833f6cc240bc0484e689021a5346029419a0cc
Change-Id: If4b103cff903f3994f8f38410d7c0f6bfeabeabb
authentication_ldap_sasl_client plug-in

So far, LDAP authentication has been limited depending on the authentication
mechanism and the platform the server or client runs on.
This worklog improves both the client-side and server-side Windows plugins,
so that they support the SASL SCRAM and SASL GSSAPI authentication mechanisms.

Change-Id: I60c2ce4925f8d6c18e202b59c54a96303ea8e2ba
…the thread to start

Problem:
When the Transaction_monitor_thread is created, the thread that requested
the creation waits until the Transaction_monitor_thread starts running.
However, upon running, the Transaction_monitor_thread does not unlock the
mutex, causing the creator of the thread to wait for a long time to read
the thread's running status.

Analysis:
Checked and observed that all other threads unlock the mutex upon running.
In the Transaction_monitor_thread the lock was never released after setting
the status of the thread to running.

Fix:
The code has been improved to unlock the mutex upon thread creation after
setting the status of the thread to running. The scope of the lock has been
reduced.
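
A minimal sketch of the corrected startup handshake, with standard C++
primitives standing in for the server's mysql_mutex/mysql_cond wrappers:

    #include <condition_variable>
    #include <mutex>

    std::mutex run_lock;
    std::condition_variable run_cond;
    bool thread_running = false;

    // Thread body: publish the running status, then release the lock right
    // away so the creator thread is not left waiting on it.
    void transaction_monitor_thread_body() {
      {
        std::lock_guard<std::mutex> guard(run_lock);  // narrow lock scope
        thread_running = true;
        run_cond.notify_all();
      }  // mutex released here
      // ... long-running monitoring work continues without holding the lock
    }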

Change-Id: I07f961346b99a740c76b1e315850921a2f4fcdb8
authentication_ldap_sasl_client plug-in

Post-push fix: auth_ldap_sasl_mechanism.cc:111:2: error: extra ';'
outside of a function is incompatible with C++98
[-Werror,-Wc++98-compat-extra-semi]

Change-Id: Ida99c78d538f1422e32b8fe4b5caca24fae7f28b
…gIfPossible

A query with redundant elements in the ORDER BY clause could fail if
the query optimization took so long that the secondary engine
requested a restart of the optimization with a different set of
parameters to restrict the search space.

BuildInterestingOrders() got confused because Query_block::order_list
was in an inconsistent state when the optimization was restarted. It
was inconsistent because the optimizer had modified the intrusive list
pointers in JOIN::order to remove redundant elements. JOIN::order
shares some data with Query_block::order_list, so some of the
underlying data of Query_block::order_list was modified as a result
without the parent object's knowledge, and the parent object became
inconsistent.

Query_block::order_list is restored before the next execution of a
prepared statement for this exact reason; see
Query_block::restore_cmd_properties() and the doxygen comment for
Query_block::order_list. But it is not restored when the optimization
is restarted. Since it's not safe to access Query_block::order_list
after such modifications, the optimizer should use JOIN::order
instead.

Fixed by making BuildInterestingOrders() inspect JOIN::order instead
of Query_block::order_list.

For completeness and consistency, the patch also replaces usages of
Query_block::group_list in the hypergraph optimizer with corresponding
data structures in JOIN, even though the hypergraph optimizer doesn't
currently mutate the group list in a similar way.
BuildInterestingOrders() now uses JOIN::group_list instead of
Query_block::group_list, and EstimateAggregateRows() uses
JOIN::group_fields. (EstimateAggregateRows() already used
JOIN::group_fields for some cases, but only when called by the old
optimizer. It could not use it for the hypergraph optimizer, because
the hypergraph optimizer didn't populate group_fields until after
EstimateAggregateRows(). The patch moved the hypergraph optimizer's
call to make_group_fields() a little earlier so that
EstimateAggregateRows() could use the same code for both optimizers.)

Change-Id: I4790482416e8935b7918cccd36f316d88b1d5700
Original patch broke the Pb2 tests basic_tls.test and tls_required.test.

Post-push patch, fixing TransporterRegistry::is_encrypted_link(NodeId).

The root cause was an incorrect transfer of the Multi_Transporter::is_encrypted
method into TransporterRegistry::is_encrypted_link(). We need to get the
'is_encrypted' property from the first active multi-transporter, not from the
base transporter, which may already have been closed if we have switched to
using the multi-transporters.
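
A simplified sketch of the lookup order described above (members and helpers
are assumptions, not the actual TransporterRegistry code):

    #include <vector>

    struct TransporterSketch {
      bool encrypted;
      bool is_encrypted() const { return encrypted; }
    };

    // Prefer the first active multi-transporter; fall back to the base
    // transporter only when no multi-transporter is in use.
    bool is_encrypted_link_sketch(
        const std::vector<TransporterSketch *> &active_multi_trps,
        const TransporterSketch *base_trp) {
      if (!active_multi_trps.empty())
        return active_multi_trps.front()->is_encrypted();
      return base_trp != nullptr && base_trp->is_encrypted();
    }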

Change-Id: If77531ca151839499e1c89358a5f9e6ea336a7c1
Post-push fix:
Broken build for clang on Windows: error: unused variable 'trp_id'

Change-Id: I9f637a53267280c90fbcb6266a17b3bb7f7650bf
When the client aborts a TLS handshake because the certificate can't
be verified (unknown CA), the router logs:

   ERROR ... classic::loop() processor failed:
   error:0A000418:SSL routines::tlsv1 alert unknown ca
   (tls_err:167773208)

That error shouldn't be logged as ERROR.

Change
------

- close the connection without raising a "processor failed" error,
  if the TLS handshake fails with the client
  - log a message at INFO level explaining why the TLS handshake failed.
- close the connection without raising a "processor failed" error,
  if it is closed without a COM_QUIT
- decode more TLS alert values for debugging.

Change-Id: I3a492189288d2c430744ec2a32dc40b70ffd0f11
Change-Id: I5fcdc6fe6e06b030f42422627d794b4267b2384c
…bal pointers

Fix this TODO:
// TODO(tdidriks) check name rather than address:
extern MYSQL_STRINGS_EXPORT CHARSET_INFO my_charset_gb18030_chinese_ci;
extern MYSQL_STRINGS_EXPORT CHARSET_INFO my_charset_utf16le_general_ci;

Character sets/collations should always be loaded/initialized properly
with get_collation_number() or some other defined function in the
mysys/strings API.
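
A small hedged usage example of the intended lookup style (the header path
and exact signature are assumptions, not verified against the tree):

    // Hypothetical usage sketch: resolve a collation through the strings API
    // rather than taking the address of an exported global CHARSET_INFO.
    #include "mysql/strings/m_ctype.h"  // assumed header location

    const CHARSET_INFO *lookup_gb18030_collation() {
      // get_charset_by_name(name, flags); 0 = no special flags assumed
      return get_charset_by_name("gb18030_chinese_ci", 0);
    }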

Change-Id: Ifa89308481d9a3db7428c1f3ed4ba375d1681209
Problem:

Test `ndb_rpl.ndb_rpl_log_updates` fails occasionally on PB2.

Analysis:

The errors are mostly related to synchronization issues, namely:
- Some inserts are not yet applied in the replica cluster in time (but
the updates are in the relay log)
- The table definition may not have been coordinated throughout the
cluster

Solution:

Make each source change wait until it has been applied to the binlog.

Change-Id: I2e79de9d784c87a2b0151b6a9a19413dd5f81b96
oleksandr-kachan and others added 22 commits March 6, 2024 19:12
https://perconadev.atlassian.net/browse/PS-9071

Most of the test_router_stacktrace tests expect to see the
my_print_stacktrace() method in the stack trace to find out whether a
stack trace was provided. Recent upstream changes in mysys/stacktrace.cc
removed my_print_stacktrace() from the stack trace.

Updated the stacktrace::full() invocation in my_print_stacktrace() not
to skip itself while printing the stack trace.
https://perconadev.atlassian.net/browse/PS-9071

Tests lack knowledge of some Percona-specific and debug-build-specific
system variable names.
Updated the known system variables lists so that the tests are able to
categorize them properly.
PS-9117: Make innodb_interpreter_output sysvar readonly
https://perconadev.atlassian.net/browse/PS-9071

The rpl_rocksdb_row_img_idx_* MTR tests started failing after changing the
default value of the binlog-transaction-dependency-tracking sysvar to WRITESET.

There is an issue with one specific configuration where the table on the main
server uses InnoDB as the storage engine, the corresponding table on the
replica uses RocksDB, and at the same time the table on the replica doesn't
have a directly defined key. With this configuration and WRITESET being used
to track transaction dependencies, the replica fails to apply the generated
transactions in parallel. The test fails by timeout on the replica while
trying to lock transactions.

To fix the test issue, binlog_transaction_dependency_tracking was changed to
COMMIT_ORDER for the affected tests. The issue itself will be processed
separately. The same issue exists for PS versions < 8.3 as well; it is just
not visible in MTR tests for older versions.
…_string()

https://perconadev.atlassian.net/browse/PS-9125

There was a missing '\0' at the end of the buffer containing the formatted
gtid_next in Gtid_specification::automatic_to_string(), which led to
garbage symbols at the end of the @@session.gtid_next sysvar when read.
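
A hedged sketch of the class of fix (buffer size and format string are
illustrative, not the actual Gtid_specification::automatic_to_string() code):

    #include <cstddef>
    #include <cstdio>

    // snprintf always NUL-terminates within the given size, so the formatted
    // value can later be read back without trailing garbage characters.
    void format_automatic_gtid_next(char *buf, std::size_t buf_size,
                                    const char *tag) {
      std::snprintf(buf, buf_size, "AUTOMATIC:%s", tag);
    }
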
…on_udf_digest_table.inc)

https://perconadev.atlassian.net/browse/PS-9071

There is an issue with running the component_encryption_udf MTR tests on the
el7 platform with the alternative system openssl11 lib. The test script is
not able to find the correct openssl binary, which is needed to identify the
openssl version.

To fix the issue, encryption_udf_digest_table.inc was modified to read the
openssl version from the Tls_library_version variable.
https://perconadev.atlassian.net/browse/PS-9071

Exclude the following tests from the ASAN test run:
- binlog_mysqlbinlog_4g_start_position is a big test and requires even
  more time with ASAN.
- buffered_error_log, processlist_tid - report internal lib issues.
PS-9149 Release tasks for PS-8.3.0
PS-9149 Release tasks for PS 8.3.0
PS-9149 Release tasks for PS 8.3.0
PS-9149 Release tasks for PS 8.3.0
Collaborator

@percona-ysorokin percona-ysorokin left a comment


LGTM

@oleksandr-kachan oleksandr-kachan merged commit 0616acc into trunk Apr 19, 2024
28 of 36 checks passed