Update relstorage to 4.1.1 #385

Open
wants to merge 1 commit into
base: master

Conversation

pyup-bot
Collaborator

This PR updates RelStorage[postgresql] from 2.1.1 to 4.1.1.

Changelog

4.1.1

==================

- Make certain lock-related errors in very large transactions no
longer produce huge (unusable) error messages when logged or
printed. Now such messages are truncated. Previously, they were
allowed to grow without bounds. See :issue:`511`.
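
The truncation described above can be pictured with a minimal sketch. This is not RelStorage's code: the helper name ``truncate_message``, the 1024-character default, and the marker text are all assumptions made for illustration.

```python
def truncate_message(message, limit=1024, marker="... [truncated]"):
    """Return message unchanged if it is short enough, else cut it down.

    Hypothetical helper; assumes limit >= len(marker).
    """
    if len(message) <= limit:
        return message
    keep = max(limit - len(marker), 0)
    return message[:keep] + marker


# A lock error that names every object ID in a very large transaction.
huge = "Unable to lock objects: " + ", ".join(str(oid) for oid in range(100_000))
short = truncate_message(huge, limit=200)
```

Capping the text at formatting time keeps log lines readable while the exception itself can still carry the full data.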

4.1.0

==================

- Update the bundled version of the Boost libraries from 1.75 to 1.83
to support newer compilers like GCC 13.
- Compile in C++ 11 mode instead of whatever the compiler default was
(sometimes C++ 03), because the latter is deprecated by Boost.
- Stop relying on an undeclared dependency on ``six``. See
:issue:`504`.
- Drop support for Python 3.8.
- Add support for Python 3.13.

4.0.0

==================

- Packaging: Support for Python 3.12 uses released dependencies.
- Packaging: Fix a build error with gcc 13.
- Packaging: x86_64 manylinux wheels are built using a newer
supported manylinux Docker image.
- Packaging: Testing on Windows has moved from MySQL 5.7 to MySQL 8.

4.0.0a1

====================

- Drop support for Python versions that are end of life, including
everything less than 3.8.
- Add the "Requires Python" metadata to prevent installation on Python
< 3.8.
- Add support for Python 3.11.
- Add preliminary support for Python 3.12. This is using a pre-release
version of Cython 3.
- Bump tested database drivers to their latest versions. In
particular, the ``mysql-connector-python`` supported version is now
8.0.32, which introduces charset changes.
- pg8000: Require 1.29.0. See :issue:`495`.
- Fix the SQLite ZODB URI resolver. The ``data_dir`` query parameter
replaces the ``path`` query parameter.
- Remove the (local) runtime (install) dependency on
``setuptools`` / ``pkg_resources``. This was undeclared.
- History-preserving storage: Make deleting an object create a new
transaction with the new state set to NULL. This leaves the previous
revision of the object accessible. Previously, the most recent
revision of the object became unavailable. See :pr:`484`, with
thanks to Kirill Smelkov.
- Add support for MySQL 8.0.20 and above. In version 8.0.19, MySQL
deprecated the traditional ``SET col = VALUES(col)`` upsert syntax
in favor of a more PostgreSQL like ``SET col = excluded.col``
syntax. In version 8.0.20, MySQL started issuing warnings about the
older syntax, and in certain database drivers (MySQL
Connector/Python 8.0.32+) these warnings became ``TypeError`` exceptions
(due to a bug in the driver). Now, we use the new syntax on versions
that support it.
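
As a rough illustration of the MySQL upsert change in the last item, the sketch below picks a statement form from the server version. The function name, table, and column names are hypothetical, and the ``(8, 0, 19)`` threshold is taken from the version numbers quoted above, not from RelStorage's source.

```python
def upsert_statement(server_version, table="my_table", key="id", col="val"):
    """Build an INSERT ... ON DUPLICATE KEY UPDATE statement for MySQL.

    server_version is a tuple such as (8, 0, 32). All identifiers here
    are illustrative placeholders.
    """
    insert = f"INSERT INTO {table} ({key}, {col}) VALUES (%s, %s)"
    if server_version >= (8, 0, 19):
        # Row-alias form: name the new row and refer to its columns.
        return f"{insert} AS excluded ON DUPLICATE KEY UPDATE {col} = excluded.{col}"
    # Traditional form using the VALUES() function.
    return f"{insert} ON DUPLICATE KEY UPDATE {col} = VALUES({col})"


old_sql = upsert_statement((5, 7, 30))
new_sql = upsert_statement((8, 0, 32))
```

Choosing the syntax once per connection, based on the reported server version, avoids the deprecation warnings without dropping support for older servers.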

3.5.0

==================

- Add support for Python 3.10.
- Stop accidentally enabling unsafe math optimizations in compiled
manylinux binaries.
- Stop testing PostgreSQL 9.6. None of the code required to support
this obsolete version of Postgres was removed, but it is no longer
officially supported and the code will be removed in the future.
- NOTE: Expect this to be the last major release to contain support
for obsolete versions of Python, including Python 2.7 and Python
3.6. This major release may not include binary wheels for these platforms.

3.5.0a6

====================

- Correct a packaging problem in 3.5.0a5 (which is not on PyPI).

3.5.0a5

====================

- Allow RelStorage to be used in a FIPS enabled environment. See
:issue:`480`.
- Fix ``RelStorage.zap_all()`` and ``zodbconvert --clear`` against
existing PostgreSQL databases with very large numbers of Blobs and
relatively small amounts of shared memory (e.g., default values for
``max_locks_per_transaction`` and ``max_connections``). Previously,
this could raise an ``out of shared memory`` error. See
:issue:`468`.
- Use C++ hashmaps and sets to store maps and sets of transaction IDs
and object IDs instead of using BTrees. The memory footprint is about
the same, but the performance is better for common operations (e.g.,
``O(1)`` for lookups instead of logarithmic.) See :pr:`479`.
- Rewrite the cache vacuum algorithm and supporting data structures to
be substantially faster. See :issue:`474`.

3.5.0a4

====================

- Stop closing RDBMS connections when ``tpc_vote`` raises a
semi-expected ``TransientError`` such as a ``ConflictError``.
- PostgreSQL: Now uses advisory locks instead of row-level locks
during the commit process. This benchmarks substantially faster and
reduces the potential for table bloat.

For environments that process many large, concurrent transactions,
or deploy many RelStorage instances to the same database server, it
might be necessary to increase the PostgreSQL configuration value
``max_locks_per_transaction``. The default value of 64 is multiplied
by the default value of ``max_connections`` (100) to allow for 6,400
total objects to be locked across the entire database server. See
`the PostgreSQL documentation
<https://www.postgresql.org/docs/13/runtime-config-locks.html>`_ for
more information.

.. caution:: Be careful deploying this version while older versions
            are executing. There could be a small window of time
            where the locking strategies are different, leading to
            database corruption.

.. note:: Deploying multiple RelStorage instances to separate
         schemas in the same PostgreSQL database (e.g., the default
         of "public" plus another) has never been supported. It is
         even less supported now.

See :pr:`476`.
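
The lock budget mentioned above for ``max_locks_per_transaction`` is simple arithmetic. The sketch below follows the formula in the PostgreSQL documentation, which also counts ``max_prepared_transactions`` (0 by default); the function name is an assumption.

```python
def lock_table_capacity(max_locks_per_transaction=64,
                        max_connections=100,
                        max_prepared_transactions=0):
    """Approximate number of objects the shared lock table can track."""
    return max_locks_per_transaction * (
        max_connections + max_prepared_transactions
    )


default_capacity = lock_table_capacity()
# Doubling the per-transaction setting doubles the shared budget.
raised_capacity = lock_table_capacity(max_locks_per_transaction=128)
```

Because the budget is shared across all sessions, one very large transaction can consume far more than 64 locks as long as the server-wide total is not exhausted.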

3.5.0a3

====================

- PostgreSQL: Stop sorting rows unnecessarily during the
``lock_and_move`` part of ``tpc_finish`` (MySQL was already not
sorting). On larger transactions and/or busier servers, this shows a
slight performance increase in benchmarks.

- Include the transaction ID in log messages about long-running
transactions (once available).

3.5.0a2

====================

- Revert :issue:`469` and return to taking shared locks before
exclusive locks. Testing in a large, busy application indicated that
performance was overall slightly worse this way. See :pr:`471`.

- Use the cache to cheaply check if a ``readCurrent()`` violation
will take place during an early part of two-phase commit, instead of
waiting until ``tpc_vote`` when we've sent data to the database. If
the cache can prove that there is a newer version of an object
stored, the conflict error will be raised during ``commit``; if the
cache can't prove it, the error will still be raised during
``tpc_vote``.

This more closely matches what ``FileStorage`` does and can help
avoid some unnecessary work.

- Fix the speed of getting the approximate number of objects in a
storage by using ``len(storage)`` on PostgreSQL. This was a
regression after 3.0a13.

3.5.0a1

====================

- Increase the default value of the ``RS_CACHE_MVCC_MAX_DEPTH``
advanced tuning parameter from 100 to 1000 based on observations of
production workloads. (Connections that haven't polled
for the last ``RS_CACHE_MVCC_MAX_DEPTH`` committed transactions ---
and thus are expected to have a large number of invalidations ---
are "detached" and forced to invalidate their entire persistent
object cache if they get used again.)

- Add StatsD counter metric
"relstorage.cache.mvcc.invalidate_all_detached" that is incremented
when a previously-detached Connection is required to invalidate its
entire persistent object cache. In a well-tuned environment, this
counter should be very low and as such is not sampled but always sent.

- Fix the logging of some environment variables RelStorage uses.

- If there is a read conflict error, PostgreSQL no longer holds any
database locks while the error is raised and the transaction is
rolled back in Python. Previously, shared locks could be held during
this process, preventing other transactions from moving forward.

- Take exclusive locks first, and then shared locks in NOWAIT mode.
This reverses :pr:`317`, but it eliminates the requirement that the
database server finds and breaks deadlocks (by eliminating
deadlocks). Deadlocks could never be resolved without retrying the
entire transaction, and which transaction got killed was unknowable.
Provisions are made to keep fast detection of ``readCurrent``
conflicts. Benchmarks with zodbshootout find no substantial
differences. See :issue:`469`.

3.4.5

==================

- Scale the new timing metrics introduced in 3.4.2 to milliseconds.
This matches the scale of other timing metrics produced
automatically by the use of ``perfmetrics`` in this package.
Similarly, append ``.t`` to the end of their names for the same
reason.
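
A minimal sketch of the rescaling and renaming described above, assuming a duration measured in seconds. The helper name is hypothetical, the metric name is borrowed from the 3.4.2 entry, and whether RelStorage rounds to an integer is not stated in the changelog.

```python
def to_timer_metric(name, seconds):
    """Return (metric_name, milliseconds) for a duration in seconds.

    Hypothetical helper: appends ``.t`` and converts to milliseconds.
    """
    return f"{name}.t", round(seconds * 1000)


metric = to_timer_metric("relstorage.storage.tpc_vote.objects_locked", 0.25)
```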

3.4.4

==================

- Fix an exception sending stats when TPC is aborted because of an error
during voting such as a ``ConflictError``. This only affected those
deployments with perfmetrics configured to use a StatsD client. See
:issue:`464`.

3.4.3

==================

- PostgreSQL: Log the backend PID at the start of TPC. This can help
correlate error messages from the server. See :issue:`460`.

- Make more conflict errors include information about the OIDs and
TIDs that may have been involved in the conflict.

- Add support for pg8000 1.17 and newer; tested with 1.19.2. See
:issue:`438`.

3.4.2

==================

- Fix write replica selection after a disconnect, and generally
further improve handling of unexpectedly closed store connections.

- Release the critical section a bit sooner at commit time, when
possible. Only affects gevent-based drivers. See :issue:`454`.

- Add support for mysql-connector-python-8.0.24.

- Add StatsD counter metrics
"relstorage.storage.tpc_vote.unable_to_acquire_lock",
"relstorage.storage.tpc_vote.total_conflicts",
"relstorage.storage.tpc_vote.readCurrent_conflicts",
"relstorage.storage.tpc_vote.committed_conflicts", and
"relstorage.storage.tpc_vote.resolved_conflicts". Also add StatsD
timer metrics "relstorage.storage.tpc_vote.objects_locked" and
"relstorage.storage.tpc_vote.between_vote_and_finish" corresponding
to existing log messages. The rate at which these are sampled, as
well as the rate at which many method timings are sampled, defaults
to 10% (0.1) and can be controlled with the
``RS_PERF_STATSD_SAMPLE_RATE`` environment variable. See :issue:`453`.
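
To make the sampling rate above concrete, here is a small sketch of what a 10% rate means in practice. Only the variable name ``RS_PERF_STATSD_SAMPLE_RATE`` and the 0.1 default come from the changelog; the two helper functions are assumptions and do not reflect how RelStorage or ``perfmetrics`` implement sampling.

```python
import os
import random


def sample_rate(environ=os.environ, default=0.1):
    """Read the sampling rate, falling back to 10%."""
    return float(environ.get("RS_PERF_STATSD_SAMPLE_RATE", default))


def should_send(rate, rng=random.random):
    """Decide whether one measurement is sent; rng() is in [0.0, 1.0)."""
    return rng() < rate


rate = sample_rate({"RS_PERF_STATSD_SAMPLE_RATE": "0.5"})
sent = sum(should_send(rate) for _ in range(1000))  # roughly half
```

A rate of ``1.0`` sends every measurement and ``0.0`` sends none.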

3.4.1

==================

- RelStorage has moved from Travis CI to `GitHub Actions
<https://github.com/zodb/relstorage/actions>`_ for macOS and Linux
tests and manylinux wheel building. See :issue:`437`.
- RelStorage is now tested with PostgreSQL 13.1. See :issue:`427`.
- RelStorage is now tested with PyMySQL 1.0. See :issue:`434`.
- Update the bundled boost C++ library from 1.71 to 1.75.
- Improve the way store connections are managed to make it less likely
a "stale" store connection that hasn't actually been checked for
liveness gets used.

3.4.0

==================

- Improve the logging of ``zodbconvert``. The regular minute logging
contains more information and takes blob sizes into account, and
debug logging is more useful, logging about four times a minute.
Some extraneous logging was bumped down to trace.

- Fix psycopg2 logging debug-level warnings from the PostgreSQL server
on transaction commit about not actually being in a transaction.
(Sadly this just squashes the warning, it doesn't eliminate the
round trip that generates it.)

- Improve the performance of packing databases, especially
history-free databases. See :issue:`275`.

- Give ``zodbpack`` the ability to check for missing references in
RelStorages with the ``--check-refs-only`` argument. This will
perform a pre-pack with GC, and then report on any objects that
would be kept and refer to an object that does not exist. This can
be much faster than external scripts such as those provided by
``zc.zodbdgc``, though it definitely only reports missing references
one level deep.

This is new functionality. Feedback, as always, is very welcome!

- Avoid extra pickling operations of transaction meta data extensions
by using the new ``extension_bytes`` property introduced in ZODB
5.6. This results in higher-fidelity copies of storages, and may
slightly improve the speed of the process too. See :issue:`424`.

- Require ZODB 5.6, up from ZODB 5.5. See :issue:`424`.

- Make ``zodbconvert`` *much faster* (around 5 times faster) when the
destination is a history-free RelStorage and the source supports
``record_iternext()`` (like RelStorage and FileStorage do). This
also applies to the ``copyTransactionsFrom`` method. This is disabled
with the ``--incremental`` option, however. Be sure to read the
updated zodbconvert documentation.

3.3.2

==================

- Fix an ``UnboundLocalError`` in case a store connection could not be
opened. This error shadowed the original error opening the
connection. See :issue:`421`.

3.3.1

==================

- Manylinux wheels: Do not specify the C++ standard to use when
compiling. This seemed to result in an incompatibility with
manylinux1 systems that was not caught by ``auditwheel``.

3.3.0

==================

- The "MySQLdb" driver didn't properly use server-side cursors when
requested. This would result in unexpected increased memory usage
for things like packing and storage iteration.

- Make RelStorage instances implement
``IStorageCurrentRecordIteration``. This lets both
history-preserving and history-free storages work with
``zodbupdate``. See :issue:`389`.

- RelStorage instances now pool their storage connection. Depending on
the workload and ZODB configuration, this can result in requiring
fewer storage connections. See :issue:`409` and :pr:`417`.

There is a potential semantic change: Under some circumstances, the
``loadBefore`` and ``loadSerial`` methods could be used to load
states from the future (not visible to the storage's load
connection) by using the store connection. This ability has been
removed.

- Add support for Python 3.9.

- Drop support for Python 3.5.

- Build manylinux x86-64 and macOS wheels on Travis CI as part of the
release process. These join the Windows wheels in being
automatically uploaded to PyPI.

3.2.1

==================

- Improve the speed of loading large cache files by reducing the cost
of cache validation.

- The timing metrics for ``current_object_oids`` are always collected,
not just sampled. MySQL and PostgreSQL will only call this method
once at startup during persistent cache validation. Other databases
may call this method once during the commit process.

- Add the ability to limit how long persistent cache validation will
spend polling the database for invalid OIDs. Set the environment
variable ``RS_CACHE_POLL_TIMEOUT`` to a number of seconds before
importing RelStorage to use this.

- Avoid an ``AttributeError`` if a persistent ``zope.component`` site
manager is installed as the current site, it's a ghost, and we're
making a load query for the first time in a particular connection.
See :issue:`411`.

- Add some DEBUG level logging around forced invalidations of
persistent object caches due to exceeding the cache MVCC limits. See
:issue:`338`.
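
For the ``RS_CACHE_POLL_TIMEOUT`` item above, the ordering matters: the variable has to be in the environment before RelStorage is imported. A minimal sketch, assuming an arbitrary limit of 30 seconds:

```python
import os

# Assumption: 30 seconds is only an illustrative value.
# setdefault() leaves any value already exported by the shell untouched.
os.environ.setdefault("RS_CACHE_POLL_TIMEOUT", "30")

# Only import RelStorage after this point, for example:
# import relstorage
```

Setting the variable in the process manager or shell that launches the application achieves the same thing without touching application code.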

3.2.0

==================

- Make the ``gevent psycopg2`` driver support critical sections. This
reduces the amount of gevent switches that occur while database
locks are held under a carefully chosen set of circumstances that
attempt to balance overall throughput against latency. See
:issue:`407`.

- Source distributions: Fix installation when Cython isn't available.
Previously it incorrectly assumed a '.c' extension which led to
compiler errors. See :issue:`405`.

- Improve various log messages.

3.1.2

==================

- Fix the psycopg2cffi driver inadvertently depending on the
``psycopg2`` package. See :issue:`403`.
- Make the error messages for unavailable drivers include more
information on underlying causes.
- Log a debug message when an "auto" driver is successfully resolved.
- Add a ``--debug`` argument to the ``zodbconvert`` command line tool
to enable DEBUG level logging.
- Add support for pg8000 1.16. Previously, a ``TypeError`` was raised.

3.1.1

==================

- Add support for pg8000 >= 1.15.3. Previously, a ``TypeError`` was
raised.

- SQLite: Committing a transaction releases some resources sooner.
This makes it more likely that auto-checkpointing of WAL files will be
able to reclaim space in some scenarios. See :issue:`401`.

3.1.0

==================

- Use unsigned BTrees for internal data structures to avoid wrapping
in large databases. Requires BTrees 4.7.2.
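
The wrapping problem can be illustrated in general terms with 64-bit integers. This sketch only shows how the same bit pattern reads as signed versus unsigned; which internal values RelStorage stores in these BTrees is not specified here.

```python
import ctypes

big = 2 ** 63  # one past the largest signed 64-bit integer

# ctypes does no overflow checking, so the value is reinterpreted.
as_signed = ctypes.c_int64(big).value     # wraps to a negative number
as_unsigned = ctypes.c_uint64(big).value  # preserved as-is
```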

3.0.1

==================

- Oracle: Fix an AttributeError saving to Oracle. See :pr:`380` by Mauro
Amico.

- MySQL+gevent: Release the critical section a bit sooner. See :issue:`381`.

- SQLite+gevent: Fix possible deadlocks with gevent if switches
occurred at unexpected times. See :issue:`382`.

- MySQL+gevent: Fix possible deadlocks with gevent if switches
occurred at unexpected times. See :issue:`385`. This also included
some minor optimizations.

.. caution::

  This introduces a change in a stored procedure that is not
  compatible with older versions of RelStorage. When this version
  is first deployed, if there are older versions of RelStorage
  still running, they will be unable to commit. They will fail with
a transient conflict error; they may attempt retries, but will not
  succeed. Read-only transactions will continue to work.

3.0.0

==================

- Build binary wheels for Python 3.8 on Windows.

3.0rc1

===================

- SQLite: Avoid logging (at DEBUG level) an error executing ``PRAGMA
OPTIMIZE`` when closing a read-only (load) connection. Now, the
error is avoided by making the connection writable.

- PostgreSQL: Reduce the load connection's isolation level from
``SERIALIZABLE`` to ``REPEATABLE READ`` (two of the three other
supported databases also operate at this level). This allows
connecting to hot standby/streaming replicas. Since the connection
is read-only, and there were no other ``SERIALIZABLE`` transactions
(the store connection operates in ``READ COMMITTED`` mode), there
should be no other visible effects. See :issue:`376`.

- PostgreSQL: pg8000: Properly handle a ``port`` specification in the
``dsn`` configuration. See :issue:`378`.

- PostgreSQL: All drivers pass the ``application_name`` parameter at
connect time instead of later. This solves an issue with psycopg2
and psycopg2cffi connecting to hot standbys.

- All databases: If ``create-schema`` is false, use a read-only
connection to verify that the schema is correct.

- Packaging: Prune unused headers from the include/ directory.

3.0b3

==================

- SQLite: Fix a bug that could lead to invalid OIDs being allocated if
transactions were imported from another storage.

3.0b2

==================

- SQLite: Require the database to be in a dedicated directory.

.. caution::

  This introduces a change to the <sqlite3> configuration.
  Please review the documentation. It is possible to migrate a
  database created earlier to the new structure, but no automated
  tooling or documentation is provided for that.

- SQLite: Allow configuration of many of SQLite's PRAGMAs for advanced
tuning.

- SQLite: Fix resetting OIDs when zapping a storage. This could be a
problem for benchmarks.

- SQLite: Fix large prefetches resulting in ``OperationalError``.

- SQLite: Improve the speed of copying transactions into a SQLite
storage (e.g., with zodbconvert).

- SQLite: Substantially improve general performance. See :pr:`368`.

- SQLite: Add the ``gevent sqlite3`` driver that periodically yields
to the gevent loop at configurable intervals.

- PostgreSQL: Improve the speed of writes when using the 'gevent
psycopg2' driver.

3.0b1

==================

- Make SQLite and Oracle both use UPSERT queries instead of multiple
database round trips.

- Fix an exception with large transactions on SQLite.

- Fix compiling the C extension on very new versions of Microsoft
Visual Studio.

3.0a13

===================

- Further speed improvements and memory efficiency gains of around 30%
for the cache.

- Restore support for Python 2.7 on Windows.

- No longer require Cython to build from a sdist (.tar.gz).

- Add support for using a SQLite file as a RelStorage backend, if all
processes accessing it will be on a single machine. The advantage
over FileStorage is that multiple processes can use the database
concurrently. To allow multiple processes to use a FileStorage one
must deploy ZEO, even if all processes are on a single machine. See
:pr:`362`.

- Fix and test Oracle. The minimum required cx_oracle is now 6.0.

- Add support for Python 3.8.

3.0a12

===================

- Add the ``gevent psycopg2`` driver to allow using the fast psycopg2
driver with gevent.

- Conflict resolution prefetches data for conflicted objects, reducing
the number of database queries and locks needed.

- Introduce a driver-agnostic method for elevating database connection
priority during critical times of two-phase commit, and implement it
for the ``gevent MySQLdb`` driver. This reduces the amount of gevent
switches that occur while database locks are held under a carefully
chosen set of circumstances that attempt to balance overall
throughput against latency. See :issue:`339`.

- Drop support for Python 2.7 on Windows. The required compiler is
very old. See :issue:`358`.

- Substantially reduce the overhead of the cache, making it more
memory efficient. Also make it substantially faster. This was done
by rewriting it in C. See :issue:`358`.

3.0a11

===================

- Make ``poll_invalidations`` handle other retryable internal
exceptions besides just ``ReadConflictError`` so they don't
propagate out to ``transaction.begin()``.

- Make the zodburi resolver entry points not require a specific
RelStorage extra such as 'postgres', in case there is a desire to
use a different database driver than the default that's installed
with that extra. See :issue:`342`, reported by Éloi Rivard.

- Make the zodburi resolvers accept the 'driver' query parameter to
allow selecting a specific driver to use. This functions the same as
in a ZConfig configuration.

- Make the zodburi resolvers more strict on the distinction between
boolean arguments and arbitrary integer arguments. Previously, a
query like ``?read_only=12345&cache_local_mb=yes`` would have been
interpreted as ``True`` and ``1``, respectively. Now it produces errors.

- Fix the calculation of the persistent cache size, especially on
Python 2. This is used to determine when to shrink the disk cache.
See :issue:`317`.

- Fix several race conditions when packing history-free storages
through a combination of changes in ordering and more strongly
consistent (``READ ONLY REPEATABLE READ``) transactions.
Reported in :issue:`325` by krissik with initial PR by Andreas
Gabriel.

- Make ``zodbpack`` pass RelStorage specific options like
``--prepack`` and ``--use-prepack-state`` to the RelStorage, even
when it has been wrapped in a ``zc.zlibstorage``.

- Reduce the amount of memory required to pack a RelStorage through
more careful datastructure choices. On CPython 3, the peak
memory usage of the prepack phase can be up to 9 times less. On
CPython 2, pre-packing a 30MM row storage required 3GB memory; now
it requires about 200MB.

- Use server-side cursors during packing when available, further
reducing the amount of memory required. See :issue:`165`.

- Make history-free database iterators from the same storage use a
consistent view of the database (until a transaction is committed
using the storage or ``sync()`` is called). This prevents data loss
in some cases. See :issue:`344`.

- Make copying transactions *from* a history-free RelStorage (e.g., with
``zodbconvert``) require substantially less memory (75% less).

- Make copying transactions *to* a RelStorage clean up temporary blob
files.

- Make ``zodbconvert`` log progress at intervals instead of for every
transaction. Logging every transaction could add significant overhead
unless stdout was redirected to a file.

- Avoid attempting to lock objects being created. See :issue:`329`.

- Make cache vacuuming faster.
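
The stricter zodburi query handling described earlier in this release can be sketched as follows. The set of accepted boolean spellings and the function names are assumptions; only the parameter names ``read_only`` and ``cache_local_mb`` come from the changelog.

```python
from urllib.parse import parse_qs

# Assumed spellings; the real resolver may accept a different set.
_TRUE = {"true", "yes", "on", "1"}
_FALSE = {"false", "no", "off", "0"}


def parse_bool(value):
    """Accept only recognised boolean spellings."""
    lowered = value.lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def parse_query(query):
    """Convert known query parameters, raising ValueError on bad input."""
    converters = {"read_only": parse_bool, "cache_local_mb": int}
    raw = parse_qs(query)
    return {
        name: converters[name](values[-1])
        for name, values in raw.items()
        if name in converters
    }


options = parse_query("read_only=true&cache_local_mb=10")
```

With these rules, ``?read_only=12345&cache_local_mb=yes`` raises ``ValueError`` instead of being silently coerced.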

3.0a10

===================

- Fix a bug where the persistent cache might not properly detect
object invalidations if the MVCC index pulled too far ahead at save
time. Now it explicitly checks for invalidations at load time, as
earlier versions did. See :pr:`343`.

- Require perfmetrics 3.0.

3.0a9

==================

- Several minor logging improvements.

- Allow many internal constants to be set with environment variables
at startup for experimentation. These are presently undocumented; if
they prove useful to adjust in different environments they may be
promoted to full configuration options.

- Fix importing RelStorage when ``zope.schema`` is not installed.
``zope.schema`` is intended to be a test dependency and optional for
production deployments. Reported in :issue:`334` by Jonathan Lung.

- Make the gevent MySQL driver more efficient at avoiding needless waits.

- Due to a bug in MySQL (incorrectly rounding the 'minute' value of a
timestamp up), TIDs generated in the last half second of a minute
would suddenly jump ahead by 4,266,903,756 integers (a full minute).

- Fix leaking an internal value for ``innodb_lock_timeout`` across
commits on MySQL. This could lead to ``tpc_vote`` blocking longer
than desired. See :issue:`331`.

- Fix ``undo`` to purge the objects whose transaction was revoked from
the cache.

- Make historical storages read-only, raising
``ReadOnlyHistoryError``, during the commit process. Previously this
was only enforced at the ``Connection`` level.

- Rewrite the cache to understand the MVCC nature of the connections
that use it.

This eliminates the use of "checkpoints." Checkpoints established a
sort of index for objects to allow them to be found in the cache
without necessarily knowing their ``_p_serial`` value. To achieve
good hit rates in large databases, large values for the
``cache-delta-size-limit`` were needed, but if there were lots of
writes, polling to update those large checkpoints could become very
expensive. Because checkpoints were separate in each ZODB connection
in a process, and because when one connection changed its
checkpoints every other connection would also change its checkpoints
on the next access, this could quickly become a problem in highly
concurrent environments (many connections making many large database
queries at the same time). See :issue:`311`.

The new system uses a series of chained maps representing polling
points to build the same index data. All connections can share all
the maps for their view of the database and earlier. New polls add
new maps to the front of the list as needed, and old maps are
removed once they are no longer needed by any active transaction.
This simulates the underlying database's MVCC approach.

Other benefits of this approach include:

- No more large polls. While each connection still polls for each
 transaction it enters, they now share state and only poll against
 the last time a poll occurred, not the last time they were used.
 The result should be smaller, more predictable polling.

- Having a model of object visibility allows the cache to use more
 efficient data structures: it can now use the smaller LOBTree to
 reduce the memory occupied by the cache. It also requires
 fewer cache entries overall to store multiple revisions of an
 object, reducing the overhead. And there are no more key copies
 required after a checkpoint change, again reducing overhead and
 making the LRU algorithm more efficient.

- The cache's LRU algorithm is now at the object level, not the
 object/serial pair.

- Objects that are known to have been changed but whose old revision
 is still in the cache are preemptively removed when no references
 to them are possible, reducing cache memory usage.

- The persistent cache can now guarantee not to write out data that
 it knows to be stale.

Dropping checkpoints probably makes memcache less effective, but
memcache hasn't been recommended for a while.
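
The chained-map idea described above can be sketched in a few lines. This is a simplified model, not RelStorage's implementation: the class and method names are hypothetical, and folding old maps into one (rather than dropping them outright) is a design choice made here so that lookups stay correct.

```python
class PollingIndex:
    """Newest-first chain of {oid: tid} maps, one per poll."""

    def __init__(self):
        self._maps = []  # list of (poll_tid, changes), newest first

    def add_poll(self, poll_tid, changes):
        """Record the objects changed since the previous poll."""
        self._maps.insert(0, (poll_tid, dict(changes)))

    def lookup(self, oid, view_tid):
        """Return the newest known tid of oid visible at view_tid."""
        for poll_tid, changes in self._maps:
            if poll_tid <= view_tid and oid in changes:
                return changes[oid]
        return None

    def vacuum(self, oldest_active_tid):
        """Fold maps no active transaction can tell apart into one."""
        keep = [m for m in self._maps if m[0] > oldest_active_tid]
        old = [m for m in self._maps if m[0] <= oldest_active_tid]
        if old:
            merged = {}
            for _, changes in reversed(old):  # oldest first, newer wins
                merged.update(changes)
            keep.append((old[0][0], merged))
        self._maps = keep


index = PollingIndex()
index.add_poll(10, {1: 10, 2: 10})
index.add_poll(20, {1: 20})
```

Every connection shares the same chain and passes its own ``view_tid``, so a poll by one connection benefits all the others.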

3.0a8

==================

- Improve the safety of the persistent local cache in high-concurrency
environments using older versions of SQLite. Perform a quick
integrity check on startup and refuse to use the cache files if they
are reported corrupt.

- Switch the order in which object locks are taken: try shared locks
first and only then attempt exclusive locks. Shared locks do not
have to block, so a quick lock timeout here means that a
``ReadConflictError`` is inevitable. This works best on PostgreSQL
and MySQL 8, which support true non-blocking locks. On MySQL 5.7,
non-blocking locks are emulated with a 1s timeout. See :issue:`310`.

.. note:: The transaction machinery will retry read conflict errors
         by default. The more rapid detection of them may lead to
         extra retries if there was a process still finishing its
         commit. Consider adding small sleep backoffs to retry
         logic.

- Fix MySQL to immediately roll back its transaction when it gets a
lock timeout, while still in the stored procedure on the database.
Previously it would have required a round trip to the Python
process, which could take an arbitrary amount of time while the
transaction may have still been holding some locks. (After
:issue:`310` they would only be shared locks, but before they would
have been exclusive locks.) This should make for faster recovery in
heavily loaded environments with lots of conflicts. See :issue:`313`.

- Make MySQL clear its temp tables using a single round trip.
Truncation is optional and disabled by default. See :issue:`319`.

- Fix PostgreSQL to not send the definition of the temporary tables
for every transaction. This is only necessary for the first
transaction.

- Improve handling of commit and rollback, especially on PostgreSQL.
We now generate many fewer unneeded rollbacks. See :issue:`289`.

- Stop checking the status of ``readCurrent`` OIDs twice.

- Make the gevent MySQL driver yield more frequently while getting
large result sets. Previously it would block in C to read the entire
result set. Now it yields according to the cursor's ``arraysize``.
See :issue:`315`.

- Polling for changes now iterates the cursor instead of using
``fetchall()``. This can reduce memory usage and provide better
behaviour in a concurrent environment, depending on the cursor
implementation.

- Add three environment variables to control the odds of whether any
given poll actually suggests shifted checkpoints. These are all
floating point numbers between 0 and 1. They are
``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_FULL`` (default to 0.7,
i.e., 70%), ``RELSTORAGE_CP_REPLACEMENT_BEGIN_CONSIDERING_PERCENT``
(default 0.8) and ``RELSTORAGE_CP_REPLACEMENT_CHANCE_WHEN_CLOSE``
(default 0.2). (There are corresponding class variables on the
storage cache that could also be set.) Use values of ``1``, ``1``
and ``0`` to restore the old completely deterministic behaviour.
It's not clear whether these will be useful, so they are not
officially options yet but they may become so. Feedback is
appreciated! See :issue:`323`.

.. note::

  These were removed in 3.0a9.
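
The note earlier in this release suggests adding small sleep backoffs to retry logic. A minimal sketch of that idea follows; ``TransientConflict`` is a stand-in for a retryable error such as ``ReadConflictError``, and the helper name, attempt count, and delays are all assumptions.

```python
import random
import time


class TransientConflict(Exception):
    """Stand-in for a retryable conflict error."""


def run_with_backoff(func, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Call func, retrying transient conflicts with a growing pause."""
    for attempt in range(attempts):
        try:
            return func()
        except TransientConflict:
            if attempt == attempts - 1:
                raise  # out of retries, let the caller see the error
            # Exponential backoff with jitter so retries do not align.
            sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

The short pause gives a competing process time to finish its commit, so the retry is less likely to hit the same conflict immediately.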

3.0a7

==================

- Eliminate runtime dependency on ZEO. See :issue:`293`.

- Fix a rare race condition allocating OIDs on MySQL. See
:issue:`283`.

- Optimize the ``loadBefore`` method. It appears to be mostly used in
the tests.

- Fix the blob cache cleanup thread to use a real native thread if
we're monkey-patched by gevent, using gevent's thread pool.
Previously, cleaning up the blob cache would block the event loop
for the duration. See :issue:`296`.

- Improve the thread safety and resource usage of blob cache cleanup.
Previously it could spawn many useless threads.

- When caching a newly uploaded blob for a history free storage, if
there's an older revision of the blob in the cache, and it is not in
use, go ahead and preemptively remove it from disk. This can help
prevent the cache size from growing out of hand and limit the number
of expensive full cache checks required. See :issue:`297`.

- Change the default value of the configuration setting
``shared-blob-dir`` to false, meaning that the default is now to use
a blob cache. If you were using shared blobs before, you'll need to
explicitly set a value for ``shared-blob-dir`` to ``true`` before
starting RelStorage.

- Add an option, ``blob-cache-size-check-external``, that causes the
blob cache cleanup process to run in a subprocess instead of a
thread. This can free up the storage process to handle requests.
This is not recommended on Windows. (``python -m
relstorage.blobhelper.cached /path/to/cache size_in_bytes`` can be
used to run a manual cleanup at any time. This is currently an
internal implementation detail.)

- Abort storage transactions immediately when an exception occurs.
Previously this could be specified by setting the environment
variable ``RELSTORAGE_ABORT_EARLY``. Aborting early releases
database locks to allow other transactions to make progress
immediately. See :issue:`50`.

- Reduce the strength of locks taken by ``Connection.readCurrent`` so
that they don't conflict with other connections that just want to
verify they haven't changed. This also lets us immediately detect a
conflict error with an in-progress transaction that is trying to
alter those objects. See :issue:`302`.

- Make databases that use row-level locks (MySQL and PostgreSQL) raise
specific exceptions on failures to acquire those locks. A different
exception is raised for rows a transaction needs to modify compared
to rows it only needs to read. Both are considered transient to
encourage transaction middleware to retry. See :issue:`303`.
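
A minimal sketch of the retry pattern that transient errors are meant to enable follows. The exception classes here are stand-ins defined for the example (they are not RelStorage's real class names), and ``run_with_retries`` is a hypothetical helper, not part of any transaction middleware.

```python
import time


class TransientError(Exception):
    """Stand-in base class for errors that are safe to retry."""


class RowLockForModifyError(TransientError):
    """Hypothetical: could not lock rows the transaction needs to modify."""


class RowLockForReadError(TransientError):
    """Hypothetical: could not lock rows the transaction only reads."""


def run_with_retries(work, attempts=3, delay=0.0):
    """Call work(); retry on TransientError, re-raising after the last attempt."""
    for attempt in range(1, attempts + 1):
        try:
            return work()
        except TransientError:
            if attempt == attempts:
                raise
            time.sleep(delay)


calls = []


def flaky_commit():
    # Fails twice with different lock errors, then succeeds.
    calls.append(len(calls) + 1)
    if len(calls) == 1:
        raise RowLockForModifyError("rows locked by another writer")
    if len(calls) == 2:
        raise RowLockForReadError("rows locked for readCurrent")
    return "committed"


result = run_with_retries(flaky_commit)
```

Because both errors share a transient base class, generic retry code does not need to know which kind of lock failed, while callers that care can still catch the specific subclass.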

- Move more of the vote phase of transaction commit into a database
stored procedure on MySQL and PostgreSQL, beginning with taking the
row-level locks. This eliminates several more database round trips
and the need for the Python thread (or greenlet) to repeatedly
release and then acquire the GIL while holding global locks. See
:issue:`304`.

- Make conflict resolution require fewer database round trips,
especially on PostgreSQL and MySQL, at the expense of using more
memory. In the ideal case it now only needs one (MySQL) or two
(PostgreSQL) queries. Previously it needed at least twice the number
of trips as there were conflicting objects. On both databases, the
benchmarks are 40% to 80% faster (depending on cache configuration).
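
The general idea of trading round trips for memory can be sketched as fetching the committed state of every conflicting object in a single query and holding the results in a dictionary. The table layout and the ``load_committed_states`` helper below are simplified assumptions that use ``sqlite3`` for portability; they are not the queries RelStorage issues.

```python
import sqlite3

# Hypothetical, simplified table used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_state (zoid INTEGER PRIMARY KEY, tid INTEGER, state BLOB)"
)
conn.executemany(
    "INSERT INTO object_state VALUES (?, ?, ?)",
    [(oid, 10, b"state-%d" % oid) for oid in range(1, 6)],
)


def load_committed_states(conn, oids):
    """Fetch the state of every conflicting OID in one query, not one per OID."""
    oids = list(oids)
    placeholders = ", ".join("?" for _ in oids)
    rows = conn.execute(
        "SELECT zoid, tid, state FROM object_state WHERE zoid IN (%s)" % placeholders,
        oids,
    )
    # All states are held in memory at once: fewer trips, more memory.
    return {zoid: (tid, state) for zoid, tid, state in rows}


states = load_committed_states(conn, [2, 4])
```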

3.0a6

==================

Enhancements
------------

- Eliminate a few extra round trips to the database on transaction
completion: One extra ``ROLLBACK`` in all databases, and one query
against the ``transaction`` table in history-preserving databases.
See :issue:`159`.

- Prepare more statements used during regular polling.

- Gracefully handle certain disconnected exceptions when rolling back
connections in between transactions. See :issue:`280`.

- Fix a cache error ("TypeError: NoneType object is not
subscriptable") when an object had been deleted (such as through
undoing its creation transaction, or with ``multi-zodb-gc``).

- Implement ``IExternalGC`` for history-preserving databases. This
lets them be used with `zc.zodbdgc
<https://pypi.org/project/zc.zodbdgc/>`_, allowing for
multi-database garbage collection (see :issue:`76`). Note that you
must pack the database after running ``multi-zodb-gc`` in order to
reclaim space.

.. caution::

  It is critical that ``pack-gc`` be turned off (set to false) in a
  multi-database and that only ``multi-zodb-gc`` be used to perform
  garbage collection.

Packing
~~~~~~~

- Make ``RelStorage.pack()`` also accept a TID from the RelStorage
database to pack to. The usual Unix timestamp form for choosing a
pack time can be ambiguous in the event of multiple transactions
within a very short period of time. This is mostly a concern for
automated tests.

Similarly, it will accept a value less than 0 to mean the most
recent transaction in the database. This is useful when machine
clocks may not be well synchronized, or from automated tests.

Implementation
--------------

- Remove vestigial top-level thread locks. No instance of RelStorage
is thread safe.

RelStorage is an ``IMVCCStorage``, which means that each ZODB
``Connection`` gets its own new storage object. No visible storage
state is shared among Connections. Connections are explicitly
documented as not being thread safe. Since 2.0, RelStorage's
Connection instances have taken advantage of that fact to be a
little lighter weight through not being thread safe. However, they
still paid the overhead of locking method calls and code complexity.

The top-level storage (the one belonging to a ``ZODB.DB``) still
used heavyweight locks in earlier releases. ``ZODB.DB.storage`` is
documented as being only useful for tests, and the ``DB`` object
itself does not expose any operations that use the storage in a way
that would require thread safety.

The remaining thread safety support has been removed. This
simplifies the code and reduces overhead.

If you were previously using the ``ZODB.DB.storage`` object, or a
``RelStorage`` instance you constructed manually, from multiple
threads, instead make sure each thread has a distinct
``RelStorage.new_instance()`` object.
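
One way to follow that advice is to keep a per-thread instance in ``threading.local``. In the sketch below, ``FakeStorage`` is a stand-in that only mimics the ``new_instance()`` method, and ``storage_for_this_thread`` is a hypothetical helper; substitute your real storage object.

```python
import threading


class FakeStorage:
    """Stand-in for a storage exposing new_instance(); not RelStorage itself."""

    def new_instance(self):
        return FakeStorage()


root_storage = FakeStorage()
_local = threading.local()


def storage_for_this_thread():
    # Lazily create one private instance per thread and reuse it afterwards.
    inst = getattr(_local, "storage", None)
    if inst is None:
        inst = _local.storage = root_storage.new_instance()
    return inst


seen = []


def worker():
    seen.append(storage_for_this_thread())


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Each worker ends up with its own object, and the shared root instance is never used directly from a worker thread.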

- A ``RelStorage`` instance now only implements the appropriate subset
of ZODB storage interfaces according to its configuration. For
example, if there is no configured ``blob-dir``, it won't implement
``IBlobStorage``, and if ``keep-history`` is false, it won't
implement ``IStorageUndoable``.

- Refactor RelStorage internals for a cleaner separation of concerns.
This includes how (some) queries are written and managed, making it
easier to prepare statements, but only those actually used.


MySQL
-----

- On MySQL, move allocating a TID into the database. On benchmarks
of a local machine this can be a scant few percent faster, but it's
primarily intended to reduce the number of round-trips to the
database. This is a step towards :issue:`281`. See :pr:`286`.

- On MySQL, set the connection timezone to be UTC. This is necessary
to get values consistent between ``UTC_TIMESTAMP``,
``UNIX_TIMESTAMP``, ``FROM_UNIXTIME``, and Python's ``time.gmtime``,
as used for comparing TIDs.

- On MySQL, move most steps of finishing a transaction into a stored
procedure. Together with the TID allocation changes, this reduces
the number of database queries from::

 1 to lock
  + 1 to get TID
  + 1 to store transaction (0 in history free)
  + 1 to move states
  + 1 for blobs (2 in history free)
  + 1 to set current (0 in history free)
  + 1 to commit
 = 7 or 6 (in history free)

down to 1. This is expected to be especially helpful for gevent
deployments, as the database lock is held, the transaction finalized
and committed, and the database lock released, all without involving
greenlets or greenlet switches. By allowing the GIL to be released
longer it may also be helpful for threaded environments. See
:issue:`281` and :pr:`287` for benchmarks and specifics.

.. caution::

 MySQL 5.7.18 and earlier contain a severe bug that causes the
 server to crash when the stored procedure is executed.


- Make PyMySQL use the same precision as mysqlclient when sending
floating point parameters.

- Automatically detect when MySQL stored procedures in the database
are out of date with the current source in this package and replace
them.

PostgreSQL
----------

- As for MySQL, move allocating a TID into the database.

- As for MySQL, move most steps of finishing a transaction into a
stored procedure. On psycopg2 and psycopg2cffi this is done in a
single database call. With pg8000, however, it still takes two, with
the second call being the COMMIT call that releases locks.

- Speed up getting the approximate number of objects
(``len(storage)``) in a database by using the estimates collected by
the autovacuum process or analyzing tables, instead of asking for a
full table scan.

3.0a5

==================

- Reduce the time that MySQL will wait to perform OID garbage
collection on startup. See :issue:`271`.

- Fix several instances where RelStorage could attempt to perform
operations on a database connection with outstanding results on a
cursor. Some database drivers can react badly to this, depending on
the exact circumstances. For example, mysqlclient can raise
``ProgrammingError: (2014, "Commands out of sync; you can't run this
command now")``. See :issue:`270`.

- Fix the "gevent MySQLdb" driver to be cooperative during ``commit``
and ``rollback`` operations. Previously, it would block the event
loop for the entire time it took to send the commit or rollback
request, the server to perform the request, and the result to be
returned. Now, it frees the event loop after sending the request.
See :issue:`272`.

- Call ``set_min_oid`` less often if a storage is just updating
existing objects, not creating its own.

- Fix an occasional possible deadlock in MySQL's ``set_min_oid``. See
:pr:`276`.

3.0a4

==================

- Add support for the ZODB 5 ``connection.prefetch(*args)`` API. This
takes either OIDs (``obj._p_oid``) or persistent ghost objects, or
an iterator of those things, and asks the storage to load them into
its cache for use in the future. In RelStorage, this uses the shared
cache and so may be useful for more than one thread. This can be
3x or more faster than loading objects on-demand. See :issue:`239`.

- Stop chunking blob uploads on PostgreSQL. All supported PostgreSQL
versions natively handle blobs greater than 2GB in size, and the
server was already chunking the blobs for storage, so our layer of
extra chunking has become unnecessary.

.. important::

  The first time a storage is opened with this version,
  blobs that have multiple chunks will be collapsed into a single
  chunk. If there are many blobs larger than 2GB, this could take
  some time.

  It is recommended you have a backup before installing this
  version.

  To verify that the blobs were correctly migrated, you should
  clean or remove your configured blob-cache directory, forcing new
  blobs to be downloaded.

- Fix a bug that left large objects behind if a PostgreSQL database
containing any blobs was ever zapped (with ``storage.zap_all()``).
The ``zodbconvert`` command, the ``zodbshootout`` command, and the
RelStorage test suite could all zap databases. Running the
``vacuumlo`` command included with PostgreSQL will free such
orphaned large objects, after which a regular ``vacuumdb`` command
can be used to reclaim space. See :issue:`260`.

- Conflict resolution can use data from the cache, thus potentially
eliminating a database hit during a very time-sensitive process.
Please file issues if you encounter any strange behaviour when
concurrently packing to the present time and also resolving
conflicts, in case there are corner cases.

- Packing a storage now invalidates the cached values that were packed
away. For the global caches this helps reduce memory pressure; for
the local cache this helps reduce memory pressure and ensure a more
useful persistent cache (this probably matters most when running on
a single machine).

- Make MySQL use ``ON DUPLICATE KEY UPDATE`` rather than ``REPLACE``.
This can be friendlier to the storage engine as it performs an
in-place ``UPDATE`` rather than a ``DELETE`` followed by an
``INSERT``. See :issue:`189`.

- Make PostgreSQL use an upsert query for moving rows into place on
history-preserving databases.
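
The difference between a replace and an upsert can be observed with SQLite, which offers analogous statements (``REPLACE`` and ``INSERT ... ON CONFLICT DO UPDATE``, the latter assuming SQLite 3.24 or newer). This is an illustration of the concept only; the table is hypothetical and the MySQL and PostgreSQL syntax differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the TEXT key keeps SQLite's internal rowid separate.
conn.execute("CREATE TABLE object_state (zoid TEXT PRIMARY KEY, state TEXT)")
conn.executemany(
    "INSERT INTO object_state VALUES (?, ?)", [("a", "v1"), ("b", "v1")]
)


def rowid_of(zoid):
    query = "SELECT rowid FROM object_state WHERE zoid = ?"
    return conn.execute(query, (zoid,)).fetchone()[0]


def state_of(zoid):
    query = "SELECT state FROM object_state WHERE zoid = ?"
    return conn.execute(query, (zoid,)).fetchone()[0]


before_a, before_b = rowid_of("a"), rowid_of("b")

# REPLACE deletes the conflicting row and inserts a fresh one.
conn.execute("REPLACE INTO object_state VALUES (?, ?)", ("a", "v2"))

# An upsert updates the existing row in place.
conn.execute(
    "INSERT INTO object_state VALUES (?, ?) "
    "ON CONFLICT(zoid) DO UPDATE SET state = excluded.state",
    ("b", "v2"),
)

replaced_row_moved = rowid_of("a") != before_a
upserted_row_kept = rowid_of("b") == before_b
```

Both statements leave the same visible data behind; the changed internal row identifier shows that the replace was a delete followed by an insert.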

- Support ZODB 5's parallel commit feature. This means that the
database-wide commit lock is taken much later in the process, and
held for a much shorter time than before.

Previously, the commit lock was taken during the ``tpc_vote`` phase,
and held while we checked ``Connection.readCurrent`` values, and
checked for (and hopefully resolved) conflicts. Other transaction
resources (such as other ZODB databases in a multi-db setup) then
got to vote while we held this lock. Finally, in ``tpc_finish``,
objects were moved into place and the lock was released. This
prevented any other storage instances from checking for
``readCurrent`` or conflicts while we were doing that.

Now, ``tpc_vote`` is (usually) able to check
``Connection.readCurrent`` and check and resolve conflicts without
taking the commit lock. Only in ``tpc_finish``, when we need to
finally allocate the transaction ID, is the commit lock taken, and
only held for the duration needed to finally move objects into
place. This allows other storages for this database, and other
transaction resources for this transaction, to proceed with voting,
conflict resolution, etc, in parallel.

Consistent results are maintained by use of object-level row
locking. Thus, two transactions that attempt to modify the same
object will now only block each other.

There are two exceptions. First, if the ``storage.restore()`` method
is used, the commit lock must be taken very early (before
``tpc_vote``). This is usually only done as part of copying one
database to another. Second, if the storage is configured with a
shared blob directory instead of a blob cache (meaning that blobs
are *only* stored on the filesystem) and the transaction has added
or mutated blobs, the commit lock must be taken somewhat early to
ensure blobs can be saved (after conflict resolution, etc, but
before the end of ``tpc_vote``). It is recommended to store blobs on
the RDBMS server and use a blob cache. The shared blob layout can be
considered deprecated for this reason.

In addition, the new locking scheme means that packing no longer
needs to acquire a commit lock and more work can proceed in parallel
with regular commits. (There may, however, have been some speed
regressions in the deletion phase of packing on MySQL; this has not
been benchmarked.)

.. note::

  If the environment variable ``RELSTORAGE_LOCK_EARLY`` is
  set when RelStorage is imported, then parallel commit will not be
  enabled, and the commit lock will be taken at the beginning of
  the tpc_vote phase, just like before: conflict resolution and
  readCurrent will all be handled with the lock held.

  This is intended for use diagnosing and temporarily working
  around bugs, such as the database driver reporting a deadlock
  error. If you find it necessary to use this setting, please
  report an issue at https://github.com/zodb/relstorage/issues.

See :issue:`125`.
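
The locking scheme described above can be modelled with a toy in-memory store: voting takes only per-object locks, and the global commit lock is held just long enough to allocate a TID and move data into place. Everything here (``ToyStore``, its methods, failing immediately instead of waiting) is an assumption made for illustration, not RelStorage's implementation.

```python
import threading


class ToyStore:
    """Toy model: per-object locks during vote, a short global lock at finish."""

    def __init__(self):
        self.commit_lock = threading.Lock()
        self.row_locks = {}
        self.row_locks_guard = threading.Lock()
        self.last_tid = 0
        self.data = {}

    def _row_lock(self, oid):
        with self.row_locks_guard:
            return self.row_locks.setdefault(oid, threading.Lock())

    def tpc_vote(self, changes):
        # Lock only the objects being changed, in sorted order.
        # A real database would wait (with a timeout) rather than fail at once.
        held = []
        for oid in sorted(changes):
            lock = self._row_lock(oid)
            if not lock.acquire(blocking=False):
                for h in held:
                    h.release()
                raise RuntimeError("object %r is locked" % (oid,))
            held.append(lock)
        return held

    def tpc_finish(self, changes, held):
        # The global lock covers only TID allocation and moving data.
        with self.commit_lock:
            self.last_tid += 1
            tid = self.last_tid
            for oid, state in changes.items():
                self.data[oid] = (tid, state)
        for lock in held:
            lock.release()
        return tid


store = ToyStore()
tx1, tx2 = {"a": "1"}, {"b": "2"}
held1 = store.tpc_vote(tx1)
held2 = store.tpc_vote(tx2)  # disjoint objects: both votes proceed
try:
    store.tpc_vote({"a": "3"})  # same object as tx1: blocked
    conflicted = False
except RuntimeError:
    conflicted = True
tid1 = store.tpc_finish(tx1, held1)
tid2 = store.tpc_finish(tx2, held2)
```

In this model only transactions touching the same object interfere with each other during voting, which is the property the row-level locking is meant to provide.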

- Deprecate the option ``shared-blob-dir``. Shared blob dirs prevent
using parallel commits when blobs are part of a transaction.

- Remove the 'umysqldb' driver option. This driver exhibited failures
with row-level locking used for parallel commits. See :issue:`264`.

- Migrate all remaining MySQL tables to InnoDB. This is primarily the
tables used during packing, but also the table used for allocating
new OIDs.

Tables will be converted the first time a storage is opened that is
allowed to create the schema (``create-schema`` in the
configuration; default is true). For large tables, this may take
some time, so it is recommended to finish any outstanding packs
before upgrading RelStorage.

If schema creation is not allowed, and required tables are not using
InnoDB, an exception will be raised. Please contact the RelStorage
maintainers on GitHub if you have a need to use a storage engine
besides InnoDB.

This allows for better error detection during packing with parallel
commits. It is also required for `MySQL Group Replication
<https://dev.mysql.com/doc/refman/8.0/en/group-replication-requirements.html>`_.
Benchmarking also shows that creating new objects can be up to 15%
faster due to faster OID allocation.

Things to be aware of:

 - MySQL's `general conversion notes
   <https://dev.mysql.com/doc/refman/8.0/en/converting-tables-to-innodb.html>`_
   suggest that if you had tuned certain server parameters for
   MyISAM tables (which RelStorage only used during packing) it
   might be good to evaluate those parameters again.
 - InnoDB tables may take more disk space than MyISAM tables.
 - The ``new_oid`` table may temporarily have more rows in it at one
   time than before. They will still be garbage collected
   eventually. The change in strategy was necessary to handle
   concurrent transactions better.

See :issue:`188`.

- Fix an ``OperationalError: database is locked`` that could occur on
startup if multiple processes were reading or writing the cache
database. See :issue:`266`.

3.0a3

==================

- Zapping a storage now also removes any persistent cache files. See
:issue:`241`.

- Zapping a MySQL storage now issues ``DROP TABLE`` statements instead
of ``DELETE FROM`` statements. This is much faster on large
databases. See :issue:`242`.

- Work around the PyPy 7.1 JIT bug using MySQL Connector/Python. It is no
longer necessary to disable the JIT in PyPy 7.1.

- On PostgreSQL, use PostgreSQL's efficient binary ``COPY FROM`` to
store objects into the database. This can be 20-40% faster. See
:issue:`247`.

- Use more efficient mechanisms to poll the database for current TIDs
when verifying serials in transactions.

- Silence a warning about ``cursor.connection`` from pg8000. See
:issue:`238`.

- Poll the database for the correct TIDs of older transactions when
loading from a persistent cache, and only use the entries if they
are current. This restores the functionality lost in the fix for
:issue:`249`.

- Increase the default cache delta limit sizes.

- Fix a race condition accessing non-shared blobs when the blob cache
limit was reached which could result in blobs appearing to be
spuriously empty. This was only observed on macOS. See :issue:`219`.

- Fix a bug computing the cache delta maps when restoring from
persistent cache that could cause data from a single transaction to
be stale, leading to spurious conflicts.

3.0a2

==================

- Drop support for PostgreSQL versions earlier than 9.6. See
:issue:`220`.

- Make MySQL and PostgreSQL use a prepared statement to get
transaction IDs. PostgreSQL also uses a prepared statement to set
them. This can be slightly faster. See :issue:`246`.

- Make PostgreSQL use a prepared statement to move objects to their
final destination during commit (history free only). See
:issue:`246`.

- Fix an issue with persistent caches written to from multiple
instances sometimes getting stale data after a restart. Note: This
makes the persistent cache less useful for objects that rarely
change in a database that features other actively changing objects;
it is hoped this can be addressed in the future. See :issue:`249`.

3.0a1

==================

- Add support for Python 3.7.

- Drop support for Python 3.4.

- Drop support for Python 2.7.8 and earlier.

- Drop support for ZODB 4 and ZEO 4.

- Officially drop support for versions of MySQL before 5.7.9. We haven't
been testing on anything older than that for some time, and older
than 5.6 for some time before that.

- Drop the ``poll_interval`` parameter. It has been deprecated with a
warning and ignored since 2.0.0b2. See :issue:`222`.

- Drop support for pg8000 older than 1.11.0.

- Drop support for MySQL Connector/Python older than 8.0.16. Many
older versions are known to be broken. Note that the C extension,
while available, is not currently recommended due to internal
errors. See :issue:`228`.

- Test support for MySQL Connector/Python on PyPy. See :issue:`228`.

.. caution:: Prior to PyPy 7.2 or RelStorage 3.0a3, it is necessary to disable JIT
            inlining due to `a PyPy bug
            <https://bitbucket.org/pypy/pypy/issues/3014/jit-issue-inlining-structunpack-hh>`_
            with ``struct.unpack``.

- Drop support for PyPy older than 5.3.1.

- Drop support for the "MySQL Connector/Python" driver name since it
wasn't possible to know if it would use the C extension or the
Python implementation. Instead, explicitly use the 'Py' or 'C'
prefixed name. See :pr:`229`.

- Drop the internal and undocumented environment variables that could be
used to force configurations that did not specify a database driver
to use a specific driver. Instead, list the driver in the database
configuration.

- Fix a bug where opening a RelStorage configuration object read from
ZConfig more than once lost the database driver setting, reverting
to 'auto'. The setting is now retained. See :issue:`231`.

- Fix compatibility with mysqlclient 1.4 on Python 3. See :issue:`213`.

- Drop support for mysqlclient < 1.4.

- Make driver names in RelStorage configurations case-insensitive
(e.g., 'MySQLdb' and 'mysqldb' are both valid). See :issue:`227`.

- Rename the column ``transaction.empty`` to ``transaction.is_empty``
for compatibility with MySQL 8.0, where ``empty`` is now a reserved
word. The migration will happen automatically when a storage is
first opened, unless it is configured not to create the schema.

.. note:: This migration has not been tested for Oracle.

.. note:: You must run this migration *before* attempting to upgrade
         a MySQL 5 database to MySQL 8. If you cannot run the
         upgrade through opening the storage, the statement is
         ``ALTER TABLE transaction CHANGE empty is_empty BOOLEAN
         NOT NULL DEFAULT FALSE``.

- Stop getting a warning about invalid optimizer syntax when packing a
MySQL database (especially with the PyMySQL driver). See
:issue:`163`.

- Add ``gevent MySQLdb``, a new driver that cooperates with gevent
while still using the C extensions of ``mysqlclient`` to communicate
with MySQL. This is now recommended over ``umysqldb``, which is
deprecated and will be removed.

- Rewrite the persistent cache implementation. It now is likely to
produce much higher hit rates (100% on some benchmarks, compared to
1-2% before). It is currently slower to read and write, however.
This is a work in progress. See :pr:`243`.

- Add more aggressive validation and, when possible, corrections for
certain types of cache consistency errors. Previously an
``AssertionError`` would be raised with the message "Detected an
inconsistency between RelStorage and the database...". We now
proactively try harder to avoid that situation based on some
educated guesses about when it could happen, and should it still
happen we now reset the cache and raise a type of ``TransientError``
allowing the application to retry. A few instances where previously
incorrect data could be cached may now raise such a
``TransientError``. See :pr:`245`.