Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow stream creation from ingestor in distributed deployments #980

Merged
merged 5 commits into from
Nov 12, 2024

Conversation

Anirudhxx
Copy link
Contributor

Fixes #825.

Description


As per @nikhilsinhaparseable comment in #825 (comment):

  • add auth token (named querier_auth_token) and domain/port (querier_endpoint) in parseable.json that tells you where and how to connect to query server
  • create a lock when querier is creating stream based on ingestor's signal - this is required for scenario when multiple ingestors send the stream creation request to querier at the same time and querier has to ensure it creates only once (TODO: need some help in testing this though)
  • migrate the existing parseable.json to add the new fields (from point 1) so that older deployments when migrated to this new change, can add this new field at the time of upgrade (migration code also in place, we need to update and create new version). (v5)

This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Copy link
Contributor

github-actions bot commented Nov 2, 2024

CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅

@Anirudhxx
Copy link
Contributor Author

I have read the CLA Document and I hereby sign the CLA

@Anirudhxx
Copy link
Contributor Author

recheck

nitisht added a commit to parseablehq/.github that referenced this pull request Nov 2, 2024
@nitisht
Copy link
Member

nitisht commented Nov 4, 2024

Thank you for the PR @Anirudhxx , we'll review and get back shortly

@nitisht nitisht requested review from nikhilsinhaparseable and parmesant and removed request for nikhilsinhaparseable November 4, 2024 11:50
@nikhilsinhaparseable
Copy link
Contributor

@Anirudhxx first of all thank you for sending the PR, I think you have done a great job understanding the codebase very well, secondly, you were able to understand the requirements really well as most of the expectations are met.

I have reviewed the code and did a test run to validate your PR, I see a few issues -

  1. parseable.json in storage, in querier staging, or in ingestor's staging -- all of them have this - "querier_endpoint":null,"querier_auth_token":null
  2. because of this, when ingestor tries to forward the stream creation request to the querier, it fails with below error-
thread 'actix-rt|system:0|arbiter:7' panicked at server/src/handlers/http/cluster/mod.rs:852:76:
called `Option::unwrap()` on a `None` value
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

please fix and let me know, thanks!

@Anirudhxx
Copy link
Contributor Author

@nikhilsinhaparseable Pushed 00fa1d3.

  • Handled the case where querier_endpoint and querier_auth_token will be populated in case where server mode is Query or All.
  • In Ingest mode this would be an empty string. But since that could never happen I suppose it should be fine. Let me know if there's a more elegant way to handle this?

@nikhilsinhaparseable
Copy link
Contributor

nikhilsinhaparseable commented Nov 6, 2024

@Anirudhxx the changes work well, thanks for the same.

Please take care of below issues -

  1. Say you have 2 ingestors and 1 querier, you use one of the ingestor and ingest in a new stream which forwards the create stream call to the querier. The querier again forwards the request to all the ingestors which is redundant for the ingestor which initialised the request.
    There are two ways to fix it -
    a. either let ingestor forward the request to all the other ingestors along with querier
    b. or querier forwards the request to all the other ingestors except the one which initialised the request

  2. I get an error when migrating from old to new deployment -
    thread 'main' panicked at server/src/storage/store_metadata.rs:281:64: called Result::unwrap()on anErrvalue: Error("missing fieldquerier_endpoint", line: 1, column: 255) note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

steps to reproduce -

  1. checkout main branch, compile and start the server in query mode, verify that parseable.json in storage as well as in staging does not have the endpoint or token
  2. stop the server
  3. checkout your branch, compile and start the server again, you will get this error.

you may also have to think of migrating from old standalone deployment to new deployment with endpoint and token in parseable.json and verify.

Thanks!

@nikhilsinhaparseable
Copy link
Contributor

@Anirudhxx did you get a chance to look at the changes requested to the PR?
please update.

Thanks!

@Anirudhxx
Copy link
Contributor Author

Anirudhxx commented Nov 11, 2024

  • querier forwards the request to all the other ingestors except the one which initialised the request
  • I get an error when migrating from old to new deployment
  • think of migrating from old standalone deployment to new deployment with endpoint and token in parseable.json and verify.
    • @nikhilsinhaparseable The parseable.json doesn't get migrated right after running the Server in Distributed mode, although it does migrate after rerunning the Query Server. These are the state changes the parseable.json file goes through. PTAL and lmk if this is a known bug or something introduced in this PR.
# After starting in Standalone mode
anirudh@ubuntu ~/parseable (feat/create-stream-ingestor)> bat --plain ./staging/.parseable.json | jq
{
  "version": "v4",
  "mode": "s3",
  "staging": "/home/anirudh/parseable/staging",
  "storage": "http://localhost:9000/parseable",
  "deployment_id": "01JCD0AMDVAKRKQRDS7A4HCRSN",
  "users": [],
  "streams": [],
  "server_mode": "All",
  "roles": {},
  "default_role": null
}

# Restart in Distributed Mode
anirudh@ubuntu ~/parseable (feat/create-stream-ingestor)> bat --plain ./staging/.parseable.json | jq
{
  "version": "v4",
  "mode": "s3",
  "staging": "/home/anirudh/parseable/staging",
  "storage": "http://localhost:9000/parseable",
  "deployment_id": "01JCD0AMDVAKRKQRDS7A4HCRSN",
  "users": [],
  "streams": [],
  "server_mode": "Query",
  "roles": {},
  "default_role": null,
  "querier_endpoint": null,
  "querier_auth_token": null
}

# Restart Again in Distributed Mode
anirudh@ubuntu ~/parseable (feat/create-stream-ingestor)> bat --plain ./staging/.parseable.json | jq
{
  "version": "v5",
  "mode": "s3",
  "staging": "/home/anirudh/parseable/staging",
  "storage": "http://localhost:9000/parseable",
  "deployment_id": "01JCD0AMDVAKRKQRDS7A4HCRSN",
  "users": [],
  "streams": [],
  "server_mode": "Query",
  "roles": {},
  "default_role": null,
  "querier_endpoint": "0.0.0.0:8000",
  "querier_auth_token": "Basic YWRtaW46YWRtaW4="
}
anirudh@ubuntu ~/parseable (feat/create-stream-ingestor)>

Copy link
Contributor

@nikhilsinhaparseable nikhilsinhaparseable left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Anirudhxx i just figured out that overwriting the Server Mode in parseable.json in storage from All to Query is not happening in the main branch also, we will get it fixed internally.
Your changes are good to merge.

Thanks a lot for the contribution!

@nitisht nitisht merged commit 46686dc into parseablehq:main Nov 12, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow Stream Creation from Ingestors in Distributed Deployments
3 participants