Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make CAS and AC server not to use the same store #1175

Conversation

aleksdmladenovic
Copy link
Contributor

@aleksdmladenovic aleksdmladenovic commented Jul 19, 2024

Description

When CAS and AC uses the same store, it might produce unexpected side-effects and wrong behaviors. To prevent system from such situation, it currently raises an error when CAS and AC uses the same store.

Fixes #768

Type of change

Please delete options that aren't relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • This change requires a documentation update

How Has This Been Tested?

Please also list any relevant details for your test configuration

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend see some docs

This change is Reviewable

@aleksdmladenovic
Copy link
Contributor Author

Hope you to review this. cc: @allada , @MarcusSorealheis

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 2 discussions need to be resolved

a discussion (no related file):
We should also create an integration test for this.



src/bin/nativelink.rs line 230 at r1 (raw file):

        // Check if CAS and AC uses the same store.
        if let Some(cas_config) = &services.cas {

This is O(N^2). Can we instead create a hashmap, of one of them then index into the other to check? This will make it O(max(N, M)) which is literally exponentially more efficient.

Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 2 discussions need to be resolved

a discussion (no related file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

We should also create an integration test for this.

+1


Copy link
Contributor Author

@aleksdmladenovic aleksdmladenovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 2 discussions need to be resolved

a discussion (no related file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

+1

I will add integration test in mins.



src/bin/nativelink.rs line 230 at r1 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

This is O(N^2). Can we instead create a hashmap, of one of them then index into the other to check? This will make it O(max(N, M)) which is literally exponentially more efficient.

My algorithm is already O(N * logM).
I do not agree to this.
Well, fyi, cas_config and ac_config are already hashmaps here.
There is only one for loop here for looping through the hashmap cas_config.
And then ac_config.get(instance_name) - this part is O(logM) because ac_config is already HashMap.
Since we have hashmaps by default there's no need to create a new hashmap or something.

Hope you to check the code again. cc: @allada

Copy link
Contributor Author

@aleksdmladenovic aleksdmladenovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 2 discussions need to be resolved

a discussion (no related file):

Previously, aleksdmladenovic (Aleksandar Mladenovic) wrote…

I will add integration test in mins.

cc: @allada , @MarcusSorealheis


@MarcusSorealheis
Copy link
Collaborator

Oh wait, @aleksdmladenovic you're already working on this issue. I'll check it now.

Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a test, for sure. All you need to do is include a config JSON where the AC and CAS use the same store type.

Reviewable status: 0 of 1 LGTMs obtained, and 2 discussions need to be resolved

@aleksdmladenovic aleksdmladenovic force-pushed the prevent-store-share branch 2 times, most recently from 452ba6b to 5aba799 Compare July 22, 2024 18:15
@CLAassistant
Copy link

CLAassistant commented Jul 22, 2024

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor Author

@aleksdmladenovic aleksdmladenovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an integration test for this. cc: @allada , @MarcusSorealheis

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 2 discussions need to be resolved

Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some weird failure not related to the PR, I believe, at first glance. It was caused by some permissions issue with Tekton so I re-ran the commit. We should not be able to fail due to Tekton cc @aaronmondal in case it happens again.

I'll dig into the test now.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Remote / large-ubuntu-22.04, integration-tests (20.04), integration-tests (22.04), pre-commit-checks, and 2 discussions need to be resolved

a discussion (no related file):

Previously, aleksdmladenovic (Aleksandar Mladenovic) wrote…

cc: @allada , @MarcusSorealheis

Needs test.


Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You also have some pre-commit issues to work out. Probably something like trailing whitespace.

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Remote / large-ubuntu-22.04, integration-tests (20.04), integration-tests (22.04), pre-commit-checks, and 2 discussions need to be resolved

Copy link
Contributor Author

@aleksdmladenovic aleksdmladenovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. @MarcusSorealheis

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Analyze (javascript-typescript), Analyze (python), Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, pre-commit-checks, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 2 discussions need to be resolved

a discussion (no related file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

Needs test.

Done. cc: @allada , @MarcusSorealheis


@aleksdmladenovic aleksdmladenovic force-pushed the prevent-store-share branch 2 times, most recently from 89c5ba1 to 9d5e661 Compare July 24, 2024 02:08
Copy link
Contributor Author

@aleksdmladenovic aleksdmladenovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made it pass all CI/CD. FYI, I've installed nix environment locally and finally it went through all pre-commit checks.
For my further commits, it won't fall into pre-commit issues again. :)

Waiting for your feedback. cc: @allada, @MarcusSorealheis

Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Publish nativelink-worker-init, Remote / large-ubuntu-22.04, and 2 discussions need to be resolved

Copy link
Member

@allada allada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 3 discussions need to be resolved


integration_tests/store_conflict_test.sh line 26 at r4 (raw file):

EXIT_CODE=0
sudo docker compose --file docker-compose.test.yml up -d

@aaronmondal , this feels like the wrong way to use docker for integration tests. Do you see any low hanging fruit on how to make this less hacky?

Copy link
Member

@aaronmondal aaronmondal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 3 discussions need to be resolved


integration_tests/store_conflict_test.sh line 26 at r4 (raw file):

Previously, allada (Nathan (Blaise) Bruer) wrote…

@aaronmondal , this feels like the wrong way to use docker for integration tests. Do you see any low hanging fruit on how to make this less hacky?

First I thought no, but actually yes!

For things that only need to test specific config options we can start the executable directly without wrapping it in Docker. The nativelink-is-installable test does this:

  1. The test: https://github.com/TraceMachina/nativelink/blob/main/tools/nativelink-is-executable-test.nix (you could create the nativelink config inline in the script).
  2. Import it in the flake:
    nativelink-is-executable-test = import ./tools/nativelink-is-executable-test.nix {inherit pkgs nativelink;};
  3. Export it as flake output:
    inherit publish-ghcr local-image-test nativelink-is-executable-test nativelink nativelink-debug native-cli lre-cc nativelink-worker-init;
  4. Add it as a github action or in this case just add it as a step above the step here so it can reuse the cache:
    - name: Test nix run

@MarcusSorealheis
Copy link
Collaborator

Magicians.

Copy link
Contributor Author

@aleksdmladenovic aleksdmladenovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 LGTMs obtained, and 3 discussions need to be resolved


integration_tests/store_conflict_test.sh line 26 at r4 (raw file):

Previously, aaronmondal (Aaron Siddhartha Mondal) wrote…

First I thought no, but actually yes!

For things that only need to test specific config options we can start the executable directly without wrapping it in Docker. The nativelink-is-installable test does this:

  1. The test: https://github.com/TraceMachina/nativelink/blob/main/tools/nativelink-is-executable-test.nix (you could create the nativelink config inline in the script).
  2. Import it in the flake:
    nativelink-is-executable-test = import ./tools/nativelink-is-executable-test.nix {inherit pkgs nativelink;};
  3. Export it as flake output:
    inherit publish-ghcr local-image-test nativelink-is-executable-test nativelink nativelink-debug native-cli lre-cc nativelink-worker-init;
  4. Add it as a github action or in this case just add it as a step above the step here so it can reuse the cache:
    - name: Test nix run

Thanks. @aaronmondal.

Let me try that way. :)

@aleksdmladenovic aleksdmladenovic force-pushed the prevent-store-share branch 2 times, most recently from f35455b to 36b0491 Compare August 2, 2024 04:25
Copy link
Contributor Author

@aleksdmladenovic aleksdmladenovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I added an integration test for checking this feature in a non-hacky way using nix and github nix workflow in a way as @aaronmondal has mentioned. :)

I hope you to check this and work on other issues.

cc: @aaronmondal . @allada , @MarcusSorealheis

Reviewable status: 0 of 2 LGTMs obtained, and 0 of 6 files reviewed, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 3 discussions need to be resolved

@MarcusSorealheis
Copy link
Collaborator

This feature is something we'd ideally like to get into the next release.

Copy link
Collaborator

@MarcusSorealheis MarcusSorealheis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 2 LGTMs obtained, and 0 of 6 files reviewed, and 5 discussions need to be resolved


src/bin/nativelink.rs line 254 at r5 (raw file):

                            return Err(make_err!(
                                Code::InvalidArgument,
                                "CAS and AC use the same store '{}' in the config",

The error text here should be:

"CAS and AC cannot use the same store '{}' in the config"

It's open question for @bclark8923 and @adam-singer as to whether or not we should also link to the documentation on this issue.


tools/nativelink-prevent-store-conflict-test.nix line 62 at r5 (raw file):

  print_error_output=$(cat <<EOF
  Error: Error { code: InvalidArgument, messages: ["CAS and AC use the same store 'FILESYSTEM_STORE' in the config"] }

again, we should be clear that users cannot use

Copy link
Contributor Author

@aleksdmladenovic aleksdmladenovic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed your changes. cc: @MarcusSorealheis, @allada

Reviewable status: 0 of 2 LGTMs obtained, and 0 of 6 files reviewed, and pending CI: Bazel Dev / ubuntu-22.04, Cargo Dev / macos-13, Cargo Dev / ubuntu-22.04, Installation / macos-13, Installation / macos-14, Installation / ubuntu-22.04, Local / ubuntu-22.04, Publish image, Publish nativelink-worker-init, Publish nativelink-worker-lre-cc, Remote / large-ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), integration-tests (20.04), integration-tests (22.04), macos-13, ubuntu-20.04 / stable, ubuntu-22.04, ubuntu-22.04 / stable, windows-2022 / stable, and 5 discussions need to be resolved


src/bin/nativelink.rs line 254 at r5 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

The error text here should be:

"CAS and AC cannot use the same store '{}' in the config"

It's open question for @bclark8923 and @adam-singer as to whether or not we should also link to the documentation on this issue.

Done.


tools/nativelink-prevent-store-conflict-test.nix line 62 at r5 (raw file):

Previously, MarcusSorealheis (Marcus Eagan) wrote…

again, we should be clear that users cannot use

Done.

@aleksdmladenovic
Copy link
Contributor Author

Rebased. cc: @MarcusSorealheis

@aleksdmladenovic
Copy link
Contributor Author

Hope you to merge this. cc: @allada, @MarcusSorealheis, @aaronmondal

When CAS and AC uses the same store, it might produce unexpected
side-effects and wrong behaviors. To prevent system from such situation,
it currently raises an error when CAS and AC uses the same store.
An integration test is added for this using nix test file and github's
nix workflow.
@aleksdmladenovic
Copy link
Contributor Author

Rebased and handled conflicts.

Hope you to check this. cc: @MarcusSorealheis, @allada, @aaronmondal

@MarcusSorealheis
Copy link
Collaborator

#1555 obviates this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NativeLink Should Not Start if CAS and AC are the Same
5 participants