Fix boost URL #11858
Merged
saintstack approved these changes Jan 3, 2025
Result of foundationdb-pr-clang-ide on Linux CentOS 7
Result of foundationdb-pr-macos-m1 on macOS Ventura 13.x
Result of foundationdb-pr-cluster-tests on Linux CentOS 7
Result of foundationdb-pr-clang-arm on Linux CentOS 7
Result of foundationdb-pr-macos on macOS Ventura 13.x
Result of foundationdb-pr-clang on Linux CentOS 7
Result of foundationdb-pr on Linux CentOS 7
spraza approved these changes Jan 3, 2025
LGTM
MarkSh1 added a commit to owtech/foundationdb that referenced this pull request Apr 8, 2025
* address comments * address comments * Format S3Client_cli.actor.cpp * batch dumping at SS * Rename test script to s3client from s3cp * Formatting * condense manifest content * Rust external workload modifications (#11805) * Rust external workload modifications - add readme - add simulation configuration file - minor Rust bindings changes - fix FDBPerfMetric::format_code default value in the C++ bindings - Add CWorkload.c to CMake - Fix cpp_workload test file --------- Signed-off-by: Eloi DEMOLIS <eloi.demolis@clever-cloud.com> * Run cycle tests after restore to validate correctness (#11814) * Remove duplicate code. Move BackupTLSConfig.* from fdbbackup to fdbclient so can be used in fdbclient. Remove the copies of BackupTLSConfig we had in place named BlobTLSConfig.*. Keep the old name though it a little clunky. * - When version vector is enabled, make proxies advance min committed version only after receiving a commit version reply from the sequencer. Advancing the min committed version prior to that point may result in invalid DBRecoveryDurability errors (if a recovery happens after the advancement) in simulation tests. * Fix usage formatting issue * Fix 'ERROR: unknown option: `'' when no options supplied. (#11818) Bug introduced by recent refactor adding being able to specify storage as an option. Found by Paymaan Raza. Co-authored-by: michael stack <stack@duboce.com> * Seaweed process was not getting cleaned up after test. (#11816) Seaweed is started in a subprocess so setting the global had no effect. Instead write the pid to a file so its available at cleanup time. Found by Zhe Wang. Co-authored-by: michael stack <stack@duboce.com> * Fix a heap-use-after-free bug Found by ASAN s3_backup_tests * Fix another ASAN bug restartShardTrackers will invalidate `it->range()` used in the trace events. 
* - In case of spill by reference, log servers should use the logic that (#11815) is based over "TagData::popped" to decide how long to keep the disk queue positions of versions in memory (instead of using the logic that is based over "LogData::persistentDataVersion", which is applicable to spill by value case). * address comments * fmt * Fix race condition in rocksdb checkpoint readers (#11819) * Add gRPC support to FDB (#11782) * Implement gRPC support * Move some CMake stuff around. * Fix typo * Add some test * Add async client * Add test for checking destroy * [testing] Automatically discover unit-test and register as ctest This patch adds `collect_unit_tests()` to CMake which searches over the codebase and finds all the unit-tests written using Flow's TEST_CASE macro and register as ctest. The test then can be then run using ctest command or directly via Test Explorer in VSCode. * Fix some tests * Use NetworkAddress * Add another variant of call method * Add a failed call test * Refactor * Cleanup shutdown * Start working on streaming * Implement server streaming * Cleanup some unnecessary templating * Cleanup some tests * WIP Client Streaming * WIP * File Transfer WIP * Remove UnitTest.h * Take grpc addresses from command line * startup grpc in fdbserver * Cancel if future ref is 0 * noop * Update some Cmake files * Fix some build/run issues * Review comments and remove file transfer * Compile with gRPC present * format * Address review comments * Add assert * fix FLOW_GRPC_ENABLED flag * include grpc/proto headers for generated files * fix arm build not finding generated proto * add debug message for protobuf generation * add generated dir again * add check for protoc compiler * Rename error variable in go tests to err #8828 (#11716) * Rename error variable in go tests to err #8829 renamed the variables from e to err as mentioned in the https://go.dev/doc/effective_go * Update packaging/docker/samples/golang/app/main.go * fixed accidental renames, renamed 
file from directoryLayer.go to directory_layer.go, to work on go formatting * Rename error variable in go tests to err #8829 renamed the variables from e to err as mentioned in the https://go.dev/doc/effective_go * fixed accidental renames, renamed file from directoryLayer.go to directory_layer.go, to work on go formatting * Update packaging/docker/samples/golang/app/main.go * renamed directoryPartition.go -> directory_partition.go and directorySubspace.go -> directory_subspace.go * updated: comments in files that were renamed, fixed accidental rename in bindings/go/src/fdb/fdb.go * Update doc.go Removed unintentional whitespaces due to formatter * Update get_encryption_keys.go removed unintentional whitespaces in get_encryption_keys.go * removed accidental whitespaces in get_encryption_keys.go * fixed few minor issues while renaming variable * updated: minor tweaks, typos * updated: CMakeLists * removed: named return value * handling nil Pointer exception --------- Co-authored-by: Vishesh Yadav <vishesh3y@gmail.com> * Ignore some ctests for ASAN * Fix a few issues with gcc 13 (#11833) * error: invalid operands to binary expression in ClogRemoteTLog.actor.cpp on appleclang (#11830) * On both Apple clang version 15.0.0 (clang-1500.1.0.2.5) and 16.0.0, compile fails here: /Users/stack/checkouts/fdb/foundationdb/fdbserver/workloads/ClogRemoteTLog.actor.cpp:254:67: error: invalid operands to binary expression ('Standalone<StringRef>' and 'Optional<Standalone<StringRef>>') if (ssi.locality.dcId().present() && ssi.locality.dcId().get() == g_simulator->remoteDcId) * Update fdbserver/workloads/ClogRemoteTLog.actor.cpp Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com> * Formatting --------- Co-authored-by: stack <stack@duboce.com> Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com> * Fix issues with clang 19 (#11834) * Fix issues with clang 19 * Fix format * Ignore --undefined-version for gcc * Fix java lib missing symbols 
issue (#11836) Clang19 doesn't like missing symbols * Pause perpetual storage wiggle when TSS count target is met. (#11823) * TSS pause * Add condition * Add transaction store mutation tracking capability (#11831) * Add transaction store mutation tracking * Adding DEBUG_TRANSACTION_STATE_STORE messages --------- Co-authored-by: Dan Lambright <hlambright@apple.com> * Make ClogRemoteTlog pass when check failures count as Joshua failures (#11837) * Make BulkDump work with S3 (#11822) * init * Add bulkdump to blobstore:// (s3) * cmake/CompileBoost.cmake Add boost url. Needed parsing blobstore:// urls. * documentation/sphinx/source/bulkdump.rst Minor edit to allow addition of blobstore target. * fdbcli/BulkDumpCommand.actor.cpp * fdbclient/BulkDumping.cpp s/blobstore/s3/ -- more generic and aligns with how backup/restore refers to "s3" thingies. * fdbclient/include/fdbclient/S3Client.actor.h * fdbclient/S3Client.actor.cpp Add batch upload handler. * fdbclient/tests/seaweedfs_fixture.sh Add run seaweed method. Also look for weed and if installed use it else download. * fdbserver/BulkDumpUtil.actor.cpp appendToPath does the right thing when passed an URL Add bulkDumpTransportBlobstore_impl. Add upload to blobstore. * tests/loopback_cluster/run_custom_cluster.sh Complain if unrecognized arguments. * Add ctest for bulkload with simple bulkdump test for now. * Add new test to ctest list * fix bugs * nit * nits * nits --------- Co-authored-by: stack <stack@duboce.com> * Make sharded rocks deterministic in simulation (phase 1) (#11841) * Fix storage server crashes (#11843) gcc build randomly crashes because the StorageServer structure exceeded 16KB size, but is still FastAllocated. Thus, LatencySample can overwrite memory during initialization, causing random segfaults. Added a static assertion to catch this problem for future modifications to the structure. 
* Lower bound version of CC_DEGRADED_PEER_DEGREE_TO_EXCLUDE (#11840) * Fix stack use-after-return bugs (#11846) Variables before "wait()" are temporary ones that will be destructed in the actor compiled code. So adding "state" to keep them live while executing the "wait()" calls. This is found by ASAN. * Refactor locality-based exclusion checks to reduce additional overhead (#11838) * Refactor locality-based exclusion checks to reduce additional overhead * Update exclusion logic to prevent copies * Update developer-guide.rst: Fix typo 'guaranatees' -> 'guarantees' (#11851) * Refine transaction store mutation tracking (#11844) * Make `ByteArrayUtil#EMPTY_BYTES` public * Upgrade awssdk to 1.11.473 (#11853) Old version has compiling errors. * - Use "count()", instead of "size()", to find the number of bits set (#11857) in a dynamic_bitset structure. * Fix boost URL (#11858) See https://github.com/boostorg/boost/issues/996 * Fix missing bindingtester dependency * Improve BulkLoad/Dump implementation (#11842) * Improve BulkLoad/Dump implementation * make bulkload test data folder inside simfdb folder * simplify code * use manifest in bulkdump metadata * use manifest in bulkload * apply bulkload fileset to bulkload and fix bugs of bytesampling value generation * remove BulkDumpFileFullPathSet * address comments * address comments * address comments * Add checksum checking of downloads. Add cleanup of test data. * fdbclient/ClientKnobs.cpp * fdbclient/include/fdbclient/ClientKnobs.h Add knob BLOBSTORE_ENABLE_ETAG_ON_GET * fdbclient/S3BlobStore.actor.cpp Optionally check etag (md5) volunteered by s3 against the content we have downloaded and fail if not equal (TODO: check the checksum after we've saved the content to the filesystem -- would require good bit of a refactoring). * fdbclient/S3Client.actor.cpp Add deleteResource support. * fdbclient/S3Client_cli.actor.cpp Add COMMAND support; currently either 'cp' or 'rm'. 
Set the knob blobstore_enable_etag_on_get to true by default for s3client. * fdbclient/tests/s3client_test.sh Add clean up of resources written up to s3 at end of test. (Awkward in bash) * jzhou77 Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.4 to 3.1.5. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/jinja/compare/3.1.4...3.1.5) --- updated-dependencies: - dependency-name: jinja2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * fix uninitialized value in BulkLoadManifest (#11869) * Update downgrade tests to have a new split point (#11867) * Update downgrade tests to have a new split point * Disable sharded rocks for tests downgrading to < 7.3.51 * Bumps [setuptools](https://github.com/pypa/setuptools) from 65.5.1 to 70.0.0. Bumps [setuptools](https://github.com/pypa/setuptools) from 65.5.1 to 70.0.0. - [Release notes](https://github.com/pypa/setuptools/releases) - [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst) - [Commits](https://github.com/pypa/setuptools/compare/v65.5.1...v70.0.0) --- updated-dependencies: - dependency-name: setuptools dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.4 to 43.0.1. Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.4 to 43.0.1. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pyca/cryptography/compare/42.0.4...43.0.1) --- updated-dependencies: - dependency-name: cryptography dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add rocksdb version to status json. (#11868) * Add rocksdb version to status json. * update schema * Check the requested path in the fdb Kubernetes sidecar * Add test cases for path checks * Bump golang.org/x/net from 0.23.0 to 0.33.0 in /fdbkubernetesmonitor (#11870) Bumps [golang.org/x/net](https://github.com/golang/net) from 0.23.0 to 0.33.0. - [Commits](https://github.com/golang/net/compare/v0.23.0...v0.33.0) --- updated-dependencies: - dependency-name: golang.org/x/net dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Clarifying documentation on blob backup URL and credentials file. * documentation/sphinx/source/backups.rst Minor edit. Add more examples making it clearer how to do S3 backup URLs in particular. Explain the 'trick' for omitting key, secret, and token from URL instead picking them up from the credentils file. * fdbclient/S3Client_cli.actor.cpp Minor cleanup of usage. * Address review comments * Formatting * Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.4 to 43.0.1. Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.4 to 43.0.1. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pyca/cryptography/compare/42.0.4...43.0.1) --- updated-dependencies: - dependency-name: cryptography dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.4 to 3.1.5. Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.4 to 3.1.5. 
- [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/jinja/compare/3.1.4...3.1.5) --- updated-dependencies: - dependency-name: jinja2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix Get.svg in the commit design doc * Block until a signal is received in sidecar mode (#11874) * Block until a signal is received in sidecar mode * Formatting * docs: update GoDoc for ReadTransact() Mention that R/O transactions are garbage-collected once futures go out of scope. * docs: mention when fdb_database_get_client_status() returns an empty string * Go binding: add GetClientStatus method to Database Allow fetching client status JSON information for any database with multi-version client enabled; the raw JSON is returned, so that multiple versions of FoundationDB are supported without any Go structure constraint. * BulkLoad Job Framework and Co-Testing BulkLoad and BulkDump (#11865) * add bulkload job framework and fix bugs * add BulkLoadChecksum, fix CI issue * nits * nits * address comments * mitigate perpetual wiggle to make sure DD can select a valid team to inject data * fix submitBulkDumpJob and submitBulkLoadJob * change remoteRoot to jobRoot * add comments * Have ctests use s3 if it is available. Fix object integrity check; original approach doesn't work when serverside encryption is enabled (awz:kms). * contrib/SimpleOpt/include/SimpleOpt/SimpleOpt.h Address sanitizer was complaining about how SimpleOpt manipulates the array of options. While memcpy inside a buffer is 'odd', it seems fine. Its old code. Leaving it. * fdbbackup/tests/s3_backup_test.sh Pass in weed_dir rather than rely on fixture global (the latter didn't work). 
* fdbclient/ClientKnobs.cpp * fdbclient/include/fdbclient/ClientKnobs.h * fdbclient/include/fdbclient/S3BlobStore.h Add a knob to ask for object integrity check on download from s3. BLOBSTORE_ENABLE_OBJECT_INTEGRITY_CHECK replaces BLOBSTORE_ENABLE_ETAG_ON_GET which doesn't work when serverside encodes content (found in testing). * fdbclient/S3BlobStore.actor.cpp Implement object integrity check on download. If enable_object_integrity_check is set, we use sha256 in place of md5 as our hash. Removed a redundant 'verify' of md5 check. * fdbclient/S3Client.actor.cpp Remove unhelpful comments. * fdbclient/S3Client_cli.actor.cpp Add support for enable_object_integrity_check. This knob replaces enable_etag_on_get which didn't work when awz:kms serverside encryption was enabled. Add error code on exit when exception. * fdbclient/include/fdbclient/S3Client.actor.h Move an include (address a review comment from previous commit). * fdbclient/tests/aws_fixture.sh Add an aws fixture of utility that can be shared. * fdbclient/tests/bulkload_test.sh Use imported log_test_result * fdbclient/tests/s3client_test.sh Add using s3 if available; otherwise, do seaweedfs. * fdbclient/tests/seaweedfs_fixture.sh WEED_DIR global doesn't work so have caller pass it in for each method instead. * * fdbclient/S3BlobStore.actor.cpp Fix compile fail. * fdbclient/tests/aws_fixture.sh * fdbclient/tests/seaweedfs_fixture.sh * fdbclient/tests/tests_common.sh Rename of local variable so they don't clash w/ varibles set by the caller. * fdbclient/tests/bulkload_test.sh Refactoring in preparation for this test to go against s3. Currently only works against seaweed. (Will do in a follow-on PR. I need to do a bit of work first to make this possible). * fdbclient/tests/s3client_test.sh Refactor removing duplicated code. Added a test to prove s3 works using old md5 hash; i.e. disabled integrity check. * * fdbbackup/tests/s3_backup_test.sh Refactor to go against s3 if available. 
* fdbclient/tests/aws_fixture.sh Add aws_setup utility shared by scripts going against s3. * fdbclient/tests/bulkload_test.sh Comment out verification for now. Redo of how we use seaweed (less code). * fdbclient/tests/fdb_cluster_fixture.sh Take knobs when starting backup_agent. * fdbclient/tests/s3client_test.sh Explain the OKTETO_NAMESPACE variable. Add logging of whether we are going against s3 or seaweed. We don't know certificate and key talking to s3. Move common setup code out to aws and weed fixtures. * fdbclient/tests/seaweedfs_fixture.sh Make it so less methods to call running seaweed. * Move export under s3 clause; doesn't make sense when seaweed is the backing store * Dump out 1k lines of log instead of 50 so can hopefully see why the failure on test machine * Add cleanup of test data after test is done * Add more variety to the random temp name making; we seem to have been using an old directory left over which caused start of weed to fail. * fdbbackup/tests/s3_backup_test.sh Remove unused S3_RESOURCE * fdbclient/tests/aws_fixture.sh * fdbclient/tests/seaweedfs_fixture.sh Mix in process id into tmp dir name. * fdbclient/tests/bulkload_test.sh Add in (disabled) use s3 code if it available. * Extend gray failure recentHealthTriggeredRecoveryTime state to reflect any recovery * Extend gray failure recentHealthTriggeredRecoveryTime state to reflect any recovery, including non-gray failure triggered ones * Update knob documentation * Add log * Update recovery doc with CC orchestrated process (#11883) * Update recovery doc with CC orchestrated process The doc is outdated since FDB 7.1 * Rewrite a few sentences per review comments * Bulkload FDBCLI Command (#11886) * Parallelize Fetching BulkLoad Manifest Metadata (#11884) * * fdbclient/tests/seaweedfs_fixture.sh The search for 'address in use' was overly specific. Loosen it up. 
* Add ENABLE_VERSION_VECTOR_REPLY_RECOVERY switch (#11889) Co-authored-by: Dan Lambright <hlambright@apple.com> * Fix the scope of sharded rocks checkpoint determinism flag (#11893) * - Do not do the replication policy validation check when trying to find the recovery version in the context of version vector. We will need to extend the version vector recovery algorithm to do this check in an efficient manner later. * Rocksdb manual flush code changes (#11849) * Add knob for direct IO * Add custom compaction policy based on number of range deletions in file * compaction policy * fix build error * Improve AuditLocationMetadataPostCheck coverage (#11888) * improve-auditLocationMetadataPostCheck-coverage * address comments * nit * Use s3 if available when running the bulkload test. It was disabled until we made it so the SS could talk to s3, included in this PR. Also finished the bulkload test. It only had the bulkdump portion. bulkload support was recentlty added so finish off the test here by adding bulkload of the bulkdump and then verifying all data present. Added passing knobs to the fdb cluster so available to the fdbserver when it goes to talk to s3. Also added passing SS count to start in fdb cluster. * fdbclient/tests/fdb_cluster_fixture.sh Add ability to pass multiple knobs to fdb cluster and to specify more than just one SS. * fdbserver/fdbserver.actor.cpp Add --blob-server option and processing of FDB_BLOB_CREDENTIALS if present (hijacked the unused, unadvertised -- blob-credentials-file). * tests/loopback_cluster/run_custom_cluster.sh Allow passing more than just one knob. * fdbclient/BulkLoading.cpp * fdbclient/include/fdbclient/BulkLoading.h Added getPath * fdbclient/S3BlobStore.actor.cpp Fix bug where we were doubling up the first '/' on a path if it had a root '/' already (s3 treats /a/b as distinct from /a//b). * fdbclient/S3Client.actor.cpp Fix up of traceevent Types. * fdbclient/tests/bulkload_test.sh Enable being able to use s3 if available. 
Pick up jobid when bulkdumping. Feed it to new bulkload method. Add verification all data present post-bulkload. * fdbserver/BulkLoadUtil.actor.cpp Add support for blobstore. * tests/loopback_cluster/run_custom_cluster.sh Bug fix -- we were only able to pass in one knob. Allow passing multiple. * Change how we process array passed to a function -- the bash on test servers seems to behave differently * Release notes for 7.3.58 and 7.3.59 * Release notes for 7.3.58 and 7.3.59 * Address feedback * New restore consolidated commit (#11901) * New restore consolidated commit This change adds RestoreDispatchPartitionedTaskFunc to restore from partitioned-format backup. * ArenaBlock::totalSize parameter pass by ref * Fix format issues identified by CI * Refactor backup mutation serialization * address comments * Update the latest stable release to 7.3.57 (#11909) * Fix cycle test valgrind issue #11906 * Pause store wiggle if all SS does not have minimum available space. (#11905) * Handling for 'line 16: kill: Binary: arguments must be process or job IDs' On cleanup after tests, don't fail. Also print PIDs for fdb processes in case there an issue here. * Using knob to enable physical shard move in failure injection workloads #11914 * Allow tht FDB_PIDS may not be set * fix shardedrocksdb knob and add ENFORCE_SHARDED_ROCKSDB_SIM_IF_AVALIABLE (#11916) * Close DB properly in unit tests. #11915 * Push the fdb-kubernetes-monitor 'unified' image. (#11919) See https://github.com/FoundationDB/fdb-kubernetes-tests/pull/859 Co-authored-by: michael stack <stack@duboce.com> * Fix assertion failure in GcGenerations workload. Sometimes Cycle Setup can take a long time, so we need to enable connection failures injection for clogRemoteDc() to work properly. Otherwise, assertions in generateMultipleTxnGenerations() can fire, because of no clogging and recovery count can go down. 
* Fix double new database in ChangeConfig workLoad #11927 * Fix double new database in ChangeConfigWorkLoad * address comment * address comments * Close and delete DB when checkpoint reader gets an error. #11925 * Update ConfigureDatabase workload to issue aggressive storage migration if needed * Extend connection failure in GcGenerations workload The Tester may disable the connection failure after the GcGenerations enables it. So we want to extend the connection failure for the Tester in this case. * Fix floating comparison issue * Add disableConnectionFailures back Otherwise, the connection can stay clogged, because previous extend may be just enabling clogging without unclogging. * Bulkload Engine Support General Storage Engine and Fix BulkLoad Bugs (#11898) * bulkload support general engine and fix bugs * add comments * improve test coverage and fix bug * nits and address comments * nit * nits * fix data inconsistency bug due to bulkload metadata * fix ss bulkload task metadata bugs * nit and fix CI issue * fix bugs of restore ss bulkload metadata * use ssBulkLoadMetadata for fetchKey and general kv engine * cleanup bulkload file for fetchkey * fix CI issue * fix simulation stuck due to repeated re-recruitment of unfit dd * randomly do available space check when finding the dest team for bulkload in simulation * address conflict * code clean up * update BulkDumping.toml same to BulkLoading.toml * consolidate ss fetchkey and fetchshard failed to read bulkload task metadata * fix DD bulkload job busy loop bug which causes segfault and test terminate unexpectedly in joshua test * nit * fix ss busy loop for bulkload in fetchkey * use sqlite for bulkload ctest * fix bulkload ctest stuck issue due to merge and change storage engine to ssd * fix comments for CC recruit DD * address comments * address comments * add comments * fix ci format issue * address comments * add comments * Fix off by one randomInt bug in ConfigureDatabase workload * Gray failure observability 
(#11923) * Improve BulkLoad Implementation (#11929) * improve bulkload code * address CI * disable audit storage replica check and distributed consistency check in bulkload and bulkdump simulation test * fix ci * disable waitForQuiescence in bulkload and bulkdump tests * Fix startMoveShards() caused corruption (#11933) At commit: fff5439e with clang, seed -f ./tests/slow/SharedDefaultBackupCorrectness.toml -s 2189316179 -b on We found a corruption where the destination storage server can get the incorrect serverKeys mutations. Note this only happens when shard_encode_location_metadata is enabled. The reason is that one of the actors in the previous iteration encountered transaction_too_old error, and the transaction restarted. However, because the actors are not cancelled, these can still modify the next transaction that retried. * Do not use "safe range" logic in recovery code when version vector is enabled #11887 * - When version vector is enabled use "min(DV)" as the recovery version when trying to decide whether to restart recovery or not. * - Address a review comment * - Address a review comment * Holds onto temporary variables' memories Otherwise, StringRef points to free'ed memory locations. * reduce ShouldCheckPeer frequency and increase max_trace_lines for bulkload tests (#11935) * Refactor BulkLoad Engine and Improve Trace Events (#11937) * refactor bulkload engine framework * add time span measure * fmt * Bump cryptography from 43.0.1 to 44.0.1 in /tests/authorization #11939 Bumps [cryptography](https://github.com/pyca/cryptography) from 43.0.1 to 44.0.1. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pyca/cryptography/compare/43.0.1...44.0.1) --- updated-dependencies: - dependency-name: cryptography dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Update shared rocksdb knobs. #11936 (#11938) * Add 7.3.60, 7.3.61 release notes (#11943) Documentation changes only. * improve-bulkload (#11941) * Disable attrition fault injection in snapshot workload * Migration to consider wiggling based on perpetualStorageEngine and not on configureStorageEngine (#11917) * update logs (#11944) * Refactor initialize_logger_level and unit_tests_version_510 (#11879) * Refactor initialize_logger_level and unit_tests_version_510 * clang-format fixed * Apply clang-format * Fix a compiling error * Fix SIGSEGV --------- Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com> Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com> * Add multiparting to s3client. (#11920) * Add multiparting to s3client. Fix boost::urls::parse_uri 's dislike of credentialed blobstore urls. * fdbclient/BulkLoading.cpp Add blobstore regex to extract credentials before feeding the boost parse_uri. * fdbclient/include/fdbclient/S3BlobStore.h * fdbclient/S3BlobStore.actor.cpp Add cleanup of failed multipart -- abortMultiPartUpload l(s3 will do this in the background eventually but lets clean up after ourselves). Also add getObjectRangeMD5 so can do multipart checksumming. * fdbclient/S3Client.actor.cpp Change upload file and download file to do multipart always. Retry too. * fdbclient/S3Client_cli.actor.cpp Add command line to trace rather than output. * Address Zhe review * More logging around part upload and download * Undo assert that proved incorrect; restore the old length math doing copy in readObject. Cleanup around TraceEvents in HTTTP.actor. * Undo commented out cleanup -- for debugging * formatting --------- Co-authored-by: stack <stack@duboce.com> * Add compile time switch NO_MULTIREGION_TEST. (#11931) * Add compile time switch NO_MULTIREGION_TEST. 
When set, simulation tests will not create configurations with more than one region. Tests requiring multiple regions are ignored. * While the RUN_IGNORED_TESTS setting allows running tests that have been marked as ignored, this should not apply to multiregion tests. Multiregion tests must be completely disabled if the NO_MULTIREGION setting is enabled. --------- Co-authored-by: Dan Lambright <hlambright@apple.com> * Disable noSim/ShardedRocksDBCheckpointTest.toml * Add gRPC file transfer service (#11892) Add gRPC file transfer service * grpc: Add file size check * grpc: change test addresses * Fix CI/CD failure * Disable gRPC for build * Fixes for new gRPC in new build image * Move FileTransfer definitions to CPP file * DataMove Should Decide BulkLoading After Old DataMove Actor Has Been Cleared (#11947) * fix bulkload bug * fix CI * Allow BulkLoadEngine to Handle Non-Retriable Task (#11950) * enable-bulkload-engine-accept-unretriable-task * nit and fmt * fix bug * Update 7.3.59 as the latest release (#11955) * Update 7.3.59 as the latest release * Update cmake and boost versions used for compiling * Handle cases when backup worker pulling may miss mutations I.e., throw an error to trigger a recovery. * Fix start version after backup worker exits noop mode * Save NOOP progress of backup workers This is needed so that CC knows the lower bound of versions that can be included in a backup. * Pause backup workers during quite database Because in NOOP mode, backup workers still writes to the database, and cause non-empty storage queues. * It's fine to ignore mutations if noop mode popped them * Delay updating pop version in noop mode until it's saved Otherwise, the pop version can become larger than the actual saved version when switching to the regular pulling mode. Because the pop version is larger, mutations larger than saved version could be popped and no long available. 
* Fix start version for pullAsyncData * Address comments * Enable BulkLoad Job to Give Up Unretrievable Task and Fix DDStuck Bug (#11952) * enable bulkload job to give up unretriable task * fix ddstuck bug * Add ability to ignore multiple tests (#11956) * Add ability to ignore multiple tests - Also ignores gRPC unit tests * Update fdbserver/workloads/UnitTests.actor.cpp Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com> * ignore grpc from other toml files --------- Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com> * Fix rocksdb crash caused because of passing uninitialized metadata to ExportColumnFamily (#11957) * Conditionally disable backup worker * RandomMoveKey should choose SSes from different data halls (#11964) * DDShardLost should be an error in simulation * fix randomMoveKey workload * revert DDShardLost severity change * Remove per thread histogram in storage engine and fix bugs in range scan. (#11967) * Simplify BulkLoad Job Metadata (#11959) * address comments in the PR 11952 * code refactor and simplification * avoid task outdated in DDBulkLoadJobExecute * nit * fix CI issue * Add compile switch to disable restart simulation tests * - Correct an issue to do with populating the list of reporting log servers during recovery with version vector - the list of reporting log servers should include even those that have an empty unknown committed version list. 
* Improve BulkLoad TraceEvent (#11971) * improve bulkload event * fmt * Improve BulkDump Implementation (#11974) * bulkdump code refactor * fix bugs * improve * Update main branch to 8.0 (#11968) * Add BulkloadJob Cancellation (#11976) * add bulkload cancellation * reduce frequency of job cancellation in tests * fix bulkload assert failure * nits * fix busy loop in bulkload/dump workload * fix workload * but * address comments and CI failures * add task count trace event * Fix isOnMainThread in Simulation and Testing (#11978) * Fix isOnMainThread in Simulation and Testing isOnMainThread() is used to check if the currently running task is on the FDB's event loop. However, in simulation this behaviour is broken and always returns false. In other modes such as UnitTest mode since `runTests()` is called before `g_network->run()`, but without a wait() statement the event loop never gets chance to set itself as main thread and the tests never sees current thread as main thread. Therefore we add a yield inside `runTests()` so yield control back to caller block and continue with g_network->run() which eventually schedule it back after initialization. * Update fdbserver/tester.actor.cpp Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com> --------- Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com> * Do not pick SS with a colocated LR in ExcludeIncludeStorageServersWorkload (#11980) * Release notes for 7.3.62 and 7.3.63 (#11982) * rocksdb: fix crash due to uninitialized/stale ColumnFamilyHandle `CreateColumnFamilyWithImport()` expects that the value inside handle is `nullptr`. This patch fixed a codepath where we pass a stale handle left by destroyed column family. * Revert "Update main branch to 8.0 (#11968)" This reverts commit 710f3f3083b845b0ae5f94b9a2e58eced826f463. 
* Update future protocol versions for 7.4 protocol version binaries * FDB cmake: update to latest production ready 7.3 and 7.1 patch releases * AuditStorage Documentation (#11983) * audit doc * fix ci * address comments * address comments * Enable TRACK_TLOG_RECOVERY as default (#11987) Test RECORD_RECOVER_AT_IN_CSTATE and TRACK_TLOG_RECOVERY in buggify with random on or off. * Build a sidecar container that refreshes s3 credentials (#11945) * packaging/docker/Dockerfile Add fdb-aws-s3-credentials-fetcher-sidecar container. Runs perpetual script that writes blob-credentials.json to /var/fdb. * packaging/docker/build-images.sh Build and publish new sidecar container * packaging/docker/fdb-aws-s3-credentials-fetcher/README.md * packaging/docker/fdb-aws-s3-credentials-fetcher/fdb-aws-s3-credentials-fetcher.go * packaging/docker/fdb-aws-s3-credentials-fetcher/go.mod * packaging/docker/fdb-aws-s3-credentials-fetcher/go.sum Script that fetches credentials via IRSA (IAM Roles for Service Accounts). * packaging/docker/fdb-aws-s3-credentials-fetcher/fdb-aws-s3-credentials-fetcher.go Match the key generated by fdbserver internally. * fdbclient/S3BlobStore.actor.cpp Add some logging around fail-to-find-credentials -- why. * * fdbclient/tests/aws_fixture.sh Use the fdb-aws-s3-credentials-fetcher script fetching credentials if available in ctests. * fdbclient/tests/s3client_test.sh TMPDIR might not be defined when we print usage. Co-authored-by: Johannes Scheuermann <johscheuer@users.noreply.github.com> * Bump cryptography from 43.0.1 to 44.0.1 in /tests/TestRunner (#11989) Bumps [cryptography](https://github.com/pyca/cryptography) from 43.0.1 to 44.0.1. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pyca/cryptography/compare/43.0.1...44.0.1) --- updated-dependencies: - dependency-name: cryptography dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Add an option to Cycle workload to skip setup phase (#11990) Useful for testing upgrade/downgrade tests. * Temporarily disabling backup dry run request until the issue is fixed (#11991) * Version vector: compute locations only once during commits. (#11924) * During commits with version vector enabled, compute location list only once, as recalcuating could generate a different random number, hence a different set of locations. * Respond to review comments. * Select replicas from locations returned from resolver. * Respond to review comments --------- Co-authored-by: Dan Lambright <hlambright@apple.com> * Add BulkLoad History (#11992) * add bulkload history * address comments * address comments * Fix DCC tester (#11995) * AsyncTaskExecutor: lightweight wrapper for `IThreadPool` This patch implements `AsyncTaskExecutor` for asynchronous execution of tasks in a separate thread pool. We already have `IThreadPool` however its API is more well suited for bigger tasks. This just provides an easier to use API. There is `AsyncTaskThread` which is similar in nature, but this is not re-wrapping IThreadPool hence has ability to have multiple worker threads. We can potentially replace that with this component by setting `num_threads = 1`. TODO: Move this to `flow/include` instead of here. * gRPC server life-cycle management and AsyncTaskExecution This patch has two set of changes: - Whenever a service is registered and removed from server, we need to restart gRPC server. GrpcServer provides some methods that can be used by worker actors so that the life of services registered by them can tied to the life of the worker role itself. - Replace asio::thread_pool with AsyncTaskExecutor both in client and server. 
* Update 7.3.63 as the stable latest release (#11999) * Reduce some parameter values for StoreFrontTest (#11998) * Hold `ThreadReturnPromiseStream` reference when sending value/error When a value/error is sent via `ThreadReturnPromiseStream` we assume that the underlying `PromiseStream` will be alive when the client waits. However, if the last `ThreadReturnPromiseStream` gets destroyed after sending values/end_of_stream(), the underlying `PromiseStream` will as well resulting in `broken_promise`. This happens because the actual work of sending the value/error is deferred on the main thread. This is likely to happen because the sender did its work and it isn't supposed to check if client got the value. Hence, little reason to keep the promise. Meanwhile, client is free to read values from its future whenever it needs to. This patch just holds the reference to underlying `NotifiedQueue` by copying `PromiseStream` until the value/error is sent. The test added would fail without this patch. * Add move constructors for `ThreadReturnPromise*` and delete copy constructors Copy-constructor can be added back if necessary. Meanwhile, its simpler to enforce only copy of ThreadReturnPromise* family, and avoid scattering it all over places. * Bump jinja2 from 3.1.5 to 3.1.6 in /flow/protocolversion (#12002) Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.5 to 3.1.6. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](https://github.com/pallets/jinja/compare/3.1.5...3.1.6) --- updated-dependencies: - dependency-name: jinja2 dependency-type: direct:production ... 
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Make bulkload file reads and writes async and memory parsimonious (#11997) * * fdbclient/S3Client.actor.cpp Change field names so capitialized (convention) Add duration as field to traces. * fdbserver/BulkLoadUtil.actor.cpp When the job-manifest is big, processing blocks so much getBulkLoadJobFileManifestEntryFromJobManifestFile fails. * Make bulkload file reads and writes async and memory parsimonious. In tests at scale, processing a large job-manifest.txt was blocking and causing the bulk job to fail. This is part 1 of two patches. The second is to address data copy added in the below when we made methods ACTORs (ACTOR doesn't allow passing by reference). * fdbserver/BulkDumpUtil.actor.cpp Removed writeStringToFile and buldDumpFileCopy in favor of new methods in BulkLoadUtil. Made hosting functions ACTORs so could wait on async calls. * fdbserver/BulkLoadUtil.actor.cpp Added async read and write functions. * fdbserver/DataDistribution.actor.cpp Making uploadBulkDumpJobManifestFile async made it so big bulkloads work. * fix memory corruption in writeBulkFileBytes and fix read options in getBulkLoadJobFileManifestEntryFromJobManifestFile * If read or write < 1MB, do it in a single read else do multiple read/writes * * packaging/docker/fdb-aws-s3-credentials-fetcher/fdb-aws-s3-credentials-fetcher.go Just be blunt and write out the credentials. Trying to figure when the blob credentials have expired is error prone. Co-authored-by: michael stack <stack@duboce.com> Co-authored-by: Zhe Wang <zhe.wang@wustl.edu> * Handle Exceptions in AsyncTaskExecutor Forwards FDB's `Error` type thrown by tasks in `AsyncTaskExecutor`. Any other kind of exception is forwarded as `unknown_error()`. 
* ThreadReturnPromise* in AsyncTaskExecutor don't need to be pointers * Improve Range Lock and Add Documentation (#11986) * rangelock doc * nits * fix ci * fix ci * nits * address comments * nits * nit * make read lock exclusive * fix * fix CI * improve doc * fix bug * address simulation failues * fix bugs * nits * Never absorb wrong_shard_server in LoadBalance replicaComparison (#12006) * Never absorb wrong_shard_server in LoadBalance replicaComparison * Add comment * Throw wrong_shard_server() instead of Error(error_code_wrong_shard_server) * Improve BulkLoad Test Coverage And Fix Bugs (#12009) * Set max_read_transaction_life_versions for KillRegionCycle.toml Simulation found an assertion failure in SS: ASSERT(rollbackVersion >= data->storageVersion()); The reason is that storage version is updated to a version larger than the forced recovery version, due to only 1'000'000 for max_read_transaction_life_versions. Also added debugging for cumulative checksum mutations. See rdar://144550725 20250309-185039-jzhou-5145c65b0e8071b7 * Fix RangeLock in BulkDump Test and Avoid Memory Copy For Async Read/Write Bulk Files (#12007) * Increase TLOG_MAX_CREATE_DURATION in simulation * Edit of bulkload/bulkdump cli. (#12012) * fdbcli/BulkDumpCommand.actor.cpp * fdbcli/BulkLoadCommand.actor.cpp Print out the bulkdump description rather than usage so user has a chance of figuring out what it is they entered incorrectly. Make bulkdump and bulkload align by using 'cancel' instead of 'clear' in both and ordering the sub-commands the same for bulkload and bulkdump. Add more help to the description. Bulkload was missing mention of the jobid needed specifying a bulkload. 
* documentation/sphinx/source/bulkdump.rst s/clearBulkDumpJob/cancelBulkDumpJob/ Co-authored-by: stack <stack@duboce.com> * Track shard moves for version vector (#11977) * Track shard moves for version vector * Don't broadcast to all TL when a different CP had a metadata mutation, unless on shard moves * update lastShardMove on resolver * Respond to review comments --------- Co-authored-by: Dan Lambright <hlambright@apple.com> * A Couple for Fixes for BulkDump and RangeLock (#12013) * fix lockrange test and improve bulk dump * fix bulkdump stuck error * remove unnecessary yield when read/write bulk files * remove unnecessary string creation in read/write bulk files * Add checksumming across multipart upload and download (#11988) * Hash file before uploading. Add it as tag after successful multipart upload. On download, after the file is on disk, get its hash and compare to that of the tag we get from s3. * fdbclient/CMakeLists.txt Be explicit what s3client needs. * fdbclient/S3BlobStore.actor.cpp * fdbclient/include/fdbclient/S3BlobStore.h Add putObjectTags and getObjectTags * fdbclient/S3Client.actor.cpp Add calculating checksum, adding it as tags on upload, fetching on download, and verifying match if present. Clean up includes. Less logging. * fdbclient/tests/s3client_test.sh Less logging. * Make failed checksum check an error (and mark non-retryable) --------- Co-authored-by: michael stack <stack@duboce.com> * Request reboot for TSS data move conflicts in simulation (#12008) * Request reboot for TSS data move conflicts in simulation * Add comment * Update storageserver.actor.cpp Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com> --------- Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com> * Added compaction knobs. 
(#12018) * Add replica comparison wrong_shard_server trace event (#12020) * Add replica comparison wrong_shard_server trace event * Suppress trace for 1 sec * Extend the unicast based recovery algorithm to do the replication policy check (#11996) * - Extend the unicast based recovery algorithm to do the replication policy check * - Review comments related changes * - Review and compilation related changes * Adding rocksdb obsolete files size property in metrics. (#12017) * persist bulkload task count in bulkload job (#12022) * Add Error Message To BulkLoadJob Metadata (#12024) * add error message to bulkload metadata * remove TODOs and add error message for bulkload job manifest map creation failures * nits * More cleanup of bulk* cli (#12015) Tighten up options for bulk*. Compound 'local' and 'blobstore' as 'dump'/'load'. Ditto for 'history'. Make it so 'bulkload mode' works like 'bulkdump mode': i.e. dumps current mode. If mode is not on for bulk*, ERROR in same manner as for writemode. Make it so we can return bulk* subcommand specific help rather than dump all help when an issue. Make the commands match in the ctest * ENABLE_VERSION_VECTOR_REPLY_RECOVERY can be T only if ENABLE_VERSION_VECTOR_TLOG_UNICAST is T (#12021) * ENABLE_VERSION_VECTOR_REPLY_RECOVERY can be T only if ENABLE_VERSION_VECTOR_TLOG_UNICAST is T * Respond to review comments --------- Co-authored-by: Dan Lambright <hlambright@apple.com> * Initialize lastShardMove for recovery txn and in CommitBatchContext (#12027) * Add BulkLoad Task Count to BulkLoad FDBCLI Command (#12029) * change a event name * add bulkload task count to fdbcli * nit * Fix use-after-move issue in AsyncTaskExecutor `getFuture()` should be called before post as `send`/`sendError` operation in `ThreadReturnPromise` moves the underlying Promise to `tagAndForward()`. Ideally, `ThreadReturnPromise` behavior should stay consistent with the `Promise`. 
However, the problem is that it relies on the invariant that there will always be one owner of its internal `Promise` which is either itself or `tagOrForward` -- which is necessary to ensure that only one thread can operate on the Promise's internal state (ref count, flags etc) and avoid race conditions. This patch (1) makes sure that in case of `post()` function we get future before, (2) adds an ASSERT as this should never happen, (3) documentation for future users and (4) a test case for potentially fixing this in future. * BulkLoadJob Should Not Schedule Completed BulkLoadTask (#12030) * make bulkload job manager logic clear * bypass task if the task has been completed * improve scheduleBulkLoadJob * Add Verbose Level for BulkLoad Trace Events (#12034) * add level for DDBulkLoad except for datadistribution * nits * Add a bulkload user guide (#12033) * Add a bulkload user guide * Forgot to add a file * Address review comments --------- Co-authored-by: stack <stack@duboce.com> * avoid shard merge when bulkload (#12035) * Allow One BulkloadTask Do Multiple Manifests (#12036) * Implement TLS support for Flow/gRPC This patch adds TLS support for GrpcServer and AsyncGrpcClient by implementing `GrpcCredentialsProvider` and using that to get channel credentials. It adds `FlowGrpc` which is a flow global instance, and initializes TLS credentials that are consistent with the ones provided to FlowTransport. - Added `FlowGrpc` to manage gRPC server initialization and TLS configuration globally. - `GrpcCredentialsProvider` abstracts secure/insecure communications configurations for server/clients. - Introduced `GrpcTlsCredentialProvider` for dynamic TLS certificate reloading from filesystem and `GrpcTlsCredentialStaticProvider` for static in-memory credentials. - Updated `GrpcServer` to accept a `GrpcCredentialProvider`, enabling dynamic TLS credential management. 
- Modified `fdbserver` to use `FlowGrpc::init()` for gRPC server initialization instead of `GrpcServer::initInstance()`, aligning it with FlowTransport behavior. - Modified `GrpcServer::run()` to use the provided `GrpcCredentialProvider` instead of hardcoded insecure credentials. Testing: - Implemented a basic mTLS test case (`/fdbrpc/grpc/basic_tls`) to verify secure gRPC connections using `GrpcTlsCredentialStaticProvider`. Todo: - Generate certificates during testruns instead statically. - Add test for `GrpcTlsCredentialProvider` which reads keys/certs from filesystem and monitors changes. - Verify peers rules/criterias like FDB --verify-peer feature. * Add more gRPC/TLS tests * Fix a restore bug due to a race (#12037) Found by simulation: seed: -f tests/slow/ApiCorrectnessAtomicRestore.toml -s 177856328 -b on Commit: 51ad8428e0fbe1d82bc76cf42b1579f51ecf2773 Compiler: clang++ Env: Rhel9 okteto applyMutations() has processed version 801400000-803141392, and before calling sendCommitTransactionRequest(), which was going to update apply begin version to 803141392. But DID NOT wait for the transaction commit. Then there is an update on the apply end version to 845345760, which picks up the PREVIOUS apply begin version 801400000. Thus started another applyMutation() with version range 801400000-845345760. Note because previous applyMutation() has finished and didn't wait for the transaction commit, thus the starting version is wrong. As a result, this applyMutation() re-processed version range 801400000-803141392. The test failed during re-processing, because mutations are missing for the overlapped range. The fix is to wait for the transaction to commit in sendCommitTransactionRequest(). This bug probably affects DR as well. See rdar://146877552 20250317-162835-jzhou-ff4c4d6d7c51bfed * Disable enable_version_vector_reply_recovery in version vector tests. 
(#12032) Co-authored-by: Dan Lambright <hlambright@apple.com> * A Couple of Fixes and Improvements for BulkLoad/Dump (#12040) * Change backup worker memory accounting to use message sizes Previously I used arena size for accouting, which has problems such as arena memory usage changes and arena memory block circular reference. And we are acquiring flow lock multiple times, even though the cursor has already fetched data into memory. This change simplifies the accounting to just use message sizes, reducing the number of flow lock acquisitions. * Fix lock take bytes Avoid calling take() when the number of bytes is 0 and add an overhead factor for memory usage of a mutation. * Address a comment 20250318-000905-jzhou-f92206b417dfabb7 * Fix an assertion failure lock->release(toRelease) can trigger assertion failure that release size is larger than the active size. This is because lock->take() has not returned yet. So the memory released has not been reserved. Fix by breaking release into two steps. After the first release(), the lock->take() will be unblocked, thus the second release() does the correct accounting. gcc correctness: 20250319-002950-jzhou-a9ecc09dd1c8812a * disable mutation checksum and accumulative checksum by default and add trace for audit storage (#12042) * audit replica should not read empty range to check (#12043) * Fix backup worker assertion failure on memory usage (#12046) * Fix backup worker assertion failure on memory usage pullAsyncData() can be cancelled thus not take() some of the memory used by the in memory queue. 20250320-171737-jzhou-752fd8c45fdadd4a 100k tests/slow/ParallelRestoreNewBackupCorrectnessAtomicOp.toml 20250320-172251-jzhou-3bc7db8e7ce5e1de * Refactor such that messages in the queue have to reserve memory first Thus, when we release the memory, it must have already been reserved. 
100k ParallelRestoreNewBackupCorrectnessAtomicOp.toml 20250320-234748-jzhou-a159972dd0a72e03 20250320-235252-jzhou-54a6ae8c67873b59 * Fix a backup worker assertion failure The pop version obtained from GRV proxy replies may go backwards, which can cause assertion failure when pulling mutations, i.e., missing mutations. Fix this bug and add an assertion that popVersion can go lower. 100k 20250321-012019-jzhou-01eb56938cf0a3fc 100k tests/slow/ParallelRestoreNewBackupCorrectnessAtomicOp.toml 20250321-012144-jzhou-68181bb51aedbb5d * Fix assertion failure of triggered version The triggered version could be the same as popVersion. 20250321-025429-jzhou-a088c3f119331b93 100k tests/slow/ParallelRestoreNewBackupCorrectnessAtomicOp.toml 20250321-025532-jzhou-77e2dbd5fd035157 * Update fdbserver/BackupWorker.actor.cpp Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com> --------- Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com> * Disable version vector unicast with idempotent transactions (#12039) * Disable version vector unicast with idempotent transactions * Respond to review requests --------- Co-authored-by: Dan Lambright <hlambright@apple.com> * Reject Range Lock/Unlock Requests with Conflicting Range, User, or Lock Type (#12047) * fix range lock * make bulkload workload correct * fix bugs and improve test coverage * nits * address comments * nits * address comments * fix compilation errors * Added new metrics to the json schema (#109) --------- Signed-off-by: Eloi DEMOLIS <eloi.demolis@clever-cloud.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Zhe Wang <zhe.wang@wustl.edu> Co-authored-by: michael stack <stack@duboce.com> Co-authored-by: Eloi Démolis <43861898+Wonshtrum@users.noreply.github.com> Co-authored-by: flowguru <77984096+flowguru@users.noreply.github.com> Co-authored-by: Sreenath Bodagala <sbodagala@apple.com> Co-authored-by: Michael Stack <saintstack@users.noreply.github.com> 
Co-authored-by: Jingyu Zhou <jingyu_zhou@apple.com> Co-authored-by: Jingyu Zhou <jingyuzhou@gmail.com> Co-authored-by: Sreenath Bodagala <82616783+sbodagala@users.noreply.github.com> Co-authored-by: Syed Paymaan Raza <1238752+spraza@users.noreply.github.com> Co-authored-by: Vishesh Yadav <vishesh3y@gmail.com> Co-authored-by: Rudraditya Thakur <164143622+h4ck3r-04@users.noreply.github.com> Co-authored-by: Yao Xiao <87789492+yao-xiao-github@users.noreply.github.com> Co-authored-by: Dan Lambright <dlambrig@gmail.com> Co-authored-by: Dan Lambright <hlambright@apple.com> Co-authored-by: Johannes Scheuermann <johscheuer@users.noreply.github.com> Co-authored-by: Tarik Demirci <tarikdemirci@users.noreply.github.com> Co-authored-by: Jon Chambers <jon.chambers@gmail.com> Co-authored-by: Kornelijus Survila <kornholijo@gmail.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Johannes M. Scheuermann <jscheuermann@apple.com> Co-authored-by: daleiz <30970925+daleiz@users.noreply.github.com> Co-authored-by: gm42 <16498973+gm42@users.noreply.github.com> Co-authored-by: neethuhaneesha <nbingi@apple.com> Co-authored-by: hao fu <hfu5@apple.com> Co-authored-by: Vivek Raj <77738940+rajv79@users.noreply.github.com> Co-authored-by: Vishesh Yadav <vishesh_yadav@apple.com> Co-authored-by: Oleg Samarin <osamarin68@gmail.com>
See boostorg/boost#996
Code-Reviewer Section
The general pull request guidelines can be found here.
Please check each of the following things and check all boxes before accepting a PR.
For Release-Branches
If this PR is made against a release-branch, please also check the following:
release-branch or main if this is the youngest branch)