Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-768 Store config uniqueness enforced by StoreManager #1555

Draft
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

mkeen
Copy link

@mkeen mkeen commented Dec 29, 2024

Description

Ensuring that when new Stores are added to StoreManager, that memory, experimental, filesystem, grpc, and redis_store stores will not experience overlap within a StoreManager instance. The same data store may be used for multiple values for the listed types only if relevant StoreSpec configuration values differ (such as connection details, key namespace, etc). Dictated by disallow_duplicates_digest implementation being present on the relevant StoreSpec enum type.

Primary changes:

  • Added a new helper: make_and_add_store_to_manager to make sure the store_factory function is not used for config that overlaps on enforced Store entities
  • Fortified add_store so that duplicate-named items cannot be added
  • Ensuring that unique-enforced StoreSpec entities are defined in stores.rs and enforced by store_manager.rs.
  • Ensured that StoreSpec entities that don't need to be unique aren't encumbered by uniqueness enforcement

DRAFT STATUS: Need to finalize tests and do a bit more cleanup where possible

Fixes #768

Type of change

  • [ x ] Bug fix (non-breaking change which fixes an issue)
  • [ x ] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to
    not work as expected)
  • [ x ] This change requires a documentation update

How Has This Been Tested?

Please also list any relevant details for your test configuration

Checklist

  • Updated documentation if needed
  • [ x ] Tests added/amended
  • [ x ] bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

@CLAassistant
Copy link

CLAassistant commented Dec 29, 2024

CLA assistant check
All committers have signed the CLA.

@mkeen mkeen changed the title GH-768 Store config uniqueness enforiced by StoreManager GH-768 Store config uniqueness enforced by StoreManager Dec 29, 2024
Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond simplification requested, everything looks good.

.await?,
);
&StoreSpec::filesystem(FilesystemSpec {
content_path: current_dir
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is too complicated. Would be simpler to keep MemorySpec here. I know it isn't pure but we don't have consistent way to test FilesystemStore. We need one.

Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good @mkeen There's a few things to tidy up and I also want one of two other maintainers to look at the rename and deletion of store_factory. After lots of thought, I'm in favor of that change, but we simply need to boost awareness.

I want to think just for a few more hours also about what the error text should say when there are conflicting store configurations. What you have is fine, but can it be improved. I'm not sure.

Ok(data_len.try_into().unwrap())
}

async fn make_store_manager() -> Result<Arc<StoreManager>, Error> {
let store_manager = Arc::new(StoreManager::new());
store_manager.add_store(
make_and_add_store_to_manager(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For my understanding, what is the reason why you moved away from store_factory rather than modifying store_factory?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still using store_factory to build the Store, but wrapping up in make_and_add_store_to_manager because it makes building and adding a Store to a StoreManager a single step from a public-facing perspective.

@@ -35,7 +35,7 @@ use nativelink_proto::google::devtools::build::v1::{
PublishBuildToolEventStreamRequest, PublishLifecycleEventRequest, StreamId,
};
use nativelink_service::bep_server::BepServer;
use nativelink_store::default_store_factory::store_factory;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems heavy at first read but maybe it is an improvement to both the naming and the functionality. store_factory as a name says nothing about what its function is, but developers could go look at the definition. make_and_add_store_to_manager is definitely clearer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

store_factory is still used but I made private and think that make_and_add_store_to_manager is better as a front-facing API because it makes and associates the Store with a StoreManager -- rather than leaving them as separate steps. The door is still open to manually build any Store instance though and call add_store on StoreManager.

@@ -0,0 +1,48 @@
// Copyright 2024 The NativeLink Authors. All rights reserved.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PERFECT!

@@ -12,11 +12,21 @@
// See the License for the specific language governing permissions and
// limitations under the License.




Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these spaces for?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove these and make sure formatting gets run again. My mistake on these spaces here.

use std::collections::{HashMap, HashSet};



Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and these? I mean empty lines.





Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and these?

@@ -45,7 +55,7 @@ use nativelink_service::cas_server::CasServer;
use nativelink_service::execution_server::ExecutionServer;
use nativelink_service::health_server::HealthServer;
use nativelink_service::worker_api_server::WorkerApiServer;
use nativelink_store::default_store_factory::store_factory;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@allada and @aaronmondal you two are going to need to sign off on moving away from store_factory. I would say this new name is more self-describing for sure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The big thing here guys is that make_and_add_store_to_manager enforces that all factory-built stores are part of a StoreManager so it's less likely that an unmanaged Store would be hanging around. Still possible to build them separately and pass into add_store this way though (useful for testing and probably other reasons). The old store_factory is still there and used by the new make_and_add_store_to_manager function, but store_factory is no longer public in this changeset.

@@ -13,7 +13,9 @@
// limitations under the License.

use std::collections::HashMap;
use std::ptr;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this.. will remove

}

for existing_store in stores.values().into_iter() {
if ptr::eq(&store, existing_store) {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove this check. Thanks to rust's memory model and the fact that in this design, ownership of the struct instance is passed to the StoreManager on line 58. We don't have Store instance pointer being passed around with a risk of passing the same pointer N times. We have a single instance of which ownership is transferred exactly once in guaranteed fashion (thanks Rust).

So instance protection is no longer a goal or needed. But I do think the config overlap detection is valuable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Rust is an underrated comment.

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 LGTMs obtained, and 0 of 14 files reviewed, and pending CI: Bazel Dev / macos-13, Bazel Dev / macos-14, Bazel Dev / ubuntu-24.04, NativeLink.com Cloud / Remote Cache / macos-14, NativeLink.com Cloud / Remote Cache / ubuntu-24.04, Remote / lre-cc / large-ubuntu-22.04, Remote / lre-rs / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (22.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-22.04, and 12 discussions need to be resolved


nativelink-config/src/stores.rs line 435 at r11 (raw file):

}

impl StoreSpec {

nit: We don't allow complex functions in the config. We use these configs as simple definitions with some macro magic (like number->string conversions and such in serde).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NativeLink Should Not Start if CAS and AC are the Same
4 participants