fix: dataloss due to contention at stream creation #1258

Merged
merged 4 commits into from
Mar 21, 2025

Conversation

de-sh
Contributor

@de-sh de-sh commented Mar 20, 2025

Fixes #XXXX.

Description

Ensure streams are not created under contention, since concurrent creation of the same stream may lead to data loss.

Testing Methodology

  1. Create a new stream using only /api/v1/ingest
  2. Use k6 with 25 VUs to issue concurrent requests that trigger stream creation
  3. Observe the logs to confirm that a stream is not created again once it already exists

This PR has:

  • been tested to ensure log ingestion and log query works.
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added documentation for new or modified features or behaviors.

Summary by CodeRabbit

  • Refactor

    • Stream handling has been improved to automatically detect and use existing streams or create new ones as needed. This streamlined approach enhances system reliability and efficiency during stream operations, resulting in a smoother and more robust user experience.
  • New Features

    • Added the ability to clone LogStreamMetadata instances, allowing for easier management of stream metadata.


coderabbitai bot commented Mar 20, 2025

Walkthrough

This pull request revises stream management by replacing the create method with get_or_create across relevant implementations. In both the Parseable implementation and the Streams struct, the new method now checks for the existence of a stream—returning an existing stream if found or creating and storing a new one if not. These changes update method signatures and adjust the locking mechanism, streamlining the process for safer concurrent access and reducing potential errors when a stream already exists.
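
The difference between the two approaches can be sketched as follows. This is a minimal illustration, not the code from the PR: the `Stream` and `Streams` types here are simplified stand-ins, `event_count` is a hypothetical helper, and the in-memory event buffer is an assumption made only to show one plausible way that re-creating an existing stream could drop data.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical stand-in for a stream: it only buffers events in memory.
#[derive(Default)]
pub struct Stream {
    events: Mutex<Vec<String>>,
}

impl Stream {
    pub fn push(&self, event: &str) {
        self.events.lock().unwrap().push(event.to_owned());
    }
}

/// Hypothetical registry of streams keyed by name.
#[derive(Default)]
pub struct Streams(RwLock<HashMap<String, Arc<Stream>>>);

impl Streams {
    /// Naive variant: always inserts, so a second call for the same name
    /// replaces the registered stream and orphans whatever it had buffered.
    pub fn create(&self, name: &str) -> Arc<Stream> {
        let stream = Arc::new(Stream::default());
        self.0
            .write()
            .unwrap()
            .insert(name.to_owned(), Arc::clone(&stream));
        stream
    }

    /// Checks for the stream while holding the write lock and creates it
    /// only when absent, so every caller ends up with the same stream.
    pub fn get_or_create(&self, name: &str) -> Arc<Stream> {
        let mut guard = self.0.write().unwrap();
        if let Some(existing) = guard.get(name) {
            return Arc::clone(existing);
        }
        let stream = Arc::new(Stream::default());
        guard.insert(name.to_owned(), Arc::clone(&stream));
        stream
    }

    /// Number of events buffered in the stream currently registered under `name`.
    pub fn event_count(&self, name: &str) -> Option<usize> {
        self.0
            .read()
            .unwrap()
            .get(name)
            .map(|s| s.events.lock().unwrap().len())
    }
}
```

With the naive `create`, pushing an event and then calling `create` again for the same name leaves the registry pointing at an empty stream. With `get_or_create`, the second call returns the stream that already holds the event.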

Changes

| File(s) | Change Summary |
|---|---|
| src/parseable/mod.rs, src/parseable/streams.rs | Replaced the create method with get_or_create in the Parseable implementation and Streams struct. Updated method signatures and locking logic to conditionally retrieve or create streams. |
| src/metadata.rs | Updated the LogStreamMetadata struct to derive the Clone trait, allowing instances to be cloned. |

Possibly related PRs

Suggested labels

for next release

Suggested reviewers

  • nikhilsinhaparseable

Poem

Hopping through code with a joyful beat,
I found a fix both clever and neat.
Streams now check before they create,
Ensuring smooth flow—no errors await.
A bunny sings of clearer routes and fun,
Happy changes for everyone! 🐰✨


coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 20, 2025

Signed-off-by: Devdutt Shenoi <devdutt@outlook.in>
coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 20, 2025

Signed-off-by: Devdutt Shenoi <devdutt@outlook.in>
@de-sh
Contributor Author

de-sh commented Mar 20, 2025

@coderabbitai, are you even checking the code comments for typos and semantics! 😭

coderabbitai bot commented Mar 20, 2025

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the list of files modified in this PR
git diff --name-only HEAD~1 HEAD

# Check the content of the modified files to look at the comments
cat src/parseable/mod.rs
cat src/parseable/streams.rs

Length of output: 73435


@de-sh I do check for typos in comments, but I may have missed some. Sorry about that! 😅 Could you point out which specific comments have typos so I can improve my review quality? I'd be happy to pay more attention to comment quality in both code and reviews going forward.

Is this related to the documentation comment for the get_or_create method that you added to fix the contention issue?

🐰 (Feeling a bit embarrassed)

coderabbitai[bot]
coderabbitai bot previously approved these changes Mar 20, 2025
coderabbitai bot commented Mar 20, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the diff to find code comments in the modified sections
git diff HEAD~1 src/parseable/mod.rs src/parseable/streams.rs

Length of output: 250


🏁 Script executed:

#!/bin/bash
# Get the PR information to check the diff
gh pr view 1258 --json files

Length of output: 166


🏁 Script executed:

#!/bin/bash
# Fetch the PR diff to examine the code comments
gh pr view 1258 --json files,commits,additions,deletions
gh pr diff 1258

Length of output: 3565


@de-sh You're absolutely right, and I apologize for missing this! I should have been more attentive to the code comments during my review. I see that you had to make separate commits titled "doc: improve messaging" and "doc: fix typo" to address these issues. Rest assured, I'll be more diligent about checking comments for typos and semantic clarity in future reviews. Thank you for bringing this to my attention.


✏️ Learnings added
Learnt from: de-sh
URL: https://github.com/parseablehq/parseable/pull/1258

Timestamp: 2025-03-20T15:50:45.329Z
Learning: Pay close attention to code comments for typos and semantic clarity during reviews for the Parseable project.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@de-sh
Contributor Author

de-sh commented Mar 20, 2025

Thank you ❤️

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/parseable/streams.rs (1)

740-742: Documentation looks good but minor typo.

The documentation clearly explains the intended behavior of the new method. This is an important change that helps prevent contention among threads during stream creation.

Correct the typo in the word "exclusive":

-    /// Checks after getting an excluse lock whether the stream already exists, else creates it.
+    /// Checks after getting an exclusive lock whether the stream already exists, else creates it.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 409a4db and ae8ee53.

📒 Files selected for processing (2)
  • src/metadata.rs (1 hunks)
  • src/parseable/streams.rs (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (10)
  • GitHub Check: coverage
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (3)
src/parseable/streams.rs (2)

743-758: Well-implemented locking mechanism to prevent contention.

The implementation properly acquires an exclusive write lock before checking if the stream exists, providing thread safety. If the stream exists, it returns a clone of the existing stream; otherwise, it creates a new one.


1196-1303: Great test coverage for the new functionality.

The unit tests thoroughly verify the behavior of the get_or_create method, covering:

  1. Returning existing streams
  2. Creating new streams when they don't exist
  3. Handling concurrent access with multiple threads

This comprehensive test suite helps ensure the implementation is robust and correctly addresses the contention issue.
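
The tests themselves are not reproduced on this page. As a rough sketch of what a concurrent-access check might look like, the following starts several threads on a barrier, has each request the same stream, and counts how many distinct instances were handed out. The `Stream` and `Streams` types and the `distinct_streams_seen` helper are hypothetical stand-ins, not the project's real definitions.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::{Arc, Barrier, RwLock};
use std::thread;

/// Hypothetical stand-in for a stream.
pub struct Stream {
    pub name: String,
}

/// Hypothetical registry of streams keyed by name.
#[derive(Default)]
pub struct Streams(RwLock<HashMap<String, Arc<Stream>>>);

impl Streams {
    /// Looks up or inserts the stream while holding the write lock.
    pub fn get_or_create(&self, name: &str) -> Arc<Stream> {
        let mut guard = self.0.write().unwrap();
        Arc::clone(guard.entry(name.to_owned()).or_insert_with(|| {
            Arc::new(Stream {
                name: name.to_owned(),
            })
        }))
    }
}

/// Spawns `workers` threads that all request the same stream at once and
/// returns how many distinct stream instances they received.
pub fn distinct_streams_seen(workers: usize) -> usize {
    let streams = Arc::new(Streams::default());
    let barrier = Arc::new(Barrier::new(workers));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let streams = Arc::clone(&streams);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // Release all threads together to maximise contention.
                barrier.wait();
                streams.get_or_create("app")
            })
        })
        .collect();

    // Keep every returned Arc alive while comparing allocation addresses.
    let seen: Vec<Arc<Stream>> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    seen.iter()
        .map(|s| Arc::as_ptr(s))
        .collect::<HashSet<_>>()
        .len()
}
```

Calling `distinct_streams_seen(25)` mirrors the 25 concurrent clients used in the manual testing and should report a single instance, because the existence check and the insert happen under the same write lock.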

src/metadata.rs (1)

77-77: Good addition of the Clone trait to support the new functionality.

Adding the Clone trait to LogStreamMetadata is necessary to support the get_or_create method's ability to return a clone of an existing stream. This change aligns well with the PR's goal of preventing data loss from contention.

@de-sh
Contributor Author

de-sh commented Mar 20, 2025

@coderabbitai the nitpick is pointing to a typo that was already fixed in a previous commit.

coderabbitai bot commented Mar 20, 2025

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the commit history to check for the "doc: fix typo" commit
git log --oneline -n 10

# Get my recent comments on the PR to identify the nitpick
gh pr view 1258 --json comments | jq '.comments[] | select(.author.login == "coderabbitai[bot]") | {body: .body, createdAt: .createdAt}' | tail -n 3

Length of output: 150


🏁 Script executed:

#!/bin/bash
# Get the commit history for this PR
gh pr view 1258 --json commits | jq '.commits[] | {message: .messageHeadline, date: .committedDate}'

# Get the comments on the PR to find the nitpick comment related to a typo
gh pr view 1258 --json comments | jq '.comments[] | {author: .author.login, body: .body, url: .url, createdAt: .createdAt}'

Length of output: 16064


🏁 Script executed:

#!/bin/bash
# Get all review comments on the PR to find the nitpick comment related to a typo
gh pr view 1258 --json reviews | jq '.reviews[] | select(.author.login == "coderabbitai[bot]") | {body: .body, comments: .comments}'

Length of output: 134


🏁 Script executed:

#!/bin/bash
# Get all review comments on the PR
gh pr view 1258 --json reviewComments | jq '.reviewComments[] | {id: .id, body: .body, author: .author.login, path: .path, line: .line, diffHunk: .diffHunk, commitId: .commitId}'

# Show diff between commits to see what typos were fixed
git log --patch --reverse $(git merge-base HEAD~5 HEAD)..HEAD | grep -A 20 "doc: fix typo" | grep -A 20 "diff --git"

Length of output: 817


🏁 Script executed:

#!/bin/bash
# Get all reviews and their comments on the PR
gh pr view 1258 --json reviews | jq '.reviews[] | {author: .author.login, comments: .comments[] | {body: .body, path: .path, position: .position, diffHunk: .diffHunk, commitId: .commitId}}'

# Also get the commit messages and changes for doc-related commits
git log --oneline | grep "doc:" | head -n 5

# Show diff for doc-related commits
git show $(git log --oneline | grep "doc: fix typo" | awk '{print $1}')

Length of output: 249


🏁 Script executed:

#!/bin/bash
# Look at the PR diff to see changes in documentation
gh pr diff 1258

# Check for recent line comments on the PR that might be the nitpick
gh pr view 1258 --json comments -q '.comments[] | select(.createdAt > "2025-03-20T16:00:00Z") | {author: .author.login, body: .body, createdAt: .createdAt, url: .url}'

# Get specific review information by ID
gh api repos/parseablehq/parseable/pulls/1258/reviews

Length of output: 22146


@de-sh I apologize for the confusion. You're absolutely right - my nitpick about the "excluse" vs "exclusive" typo in the documentation comment was already fixed in your "doc: fix typo" commit before I made my review comment. I should have been more careful to review the latest version of the code instead of commenting on an outdated version.

Thank you for pointing this out. I'll be more diligent about checking the latest commits before making review comments in the future.

Contributor

@nikhilsinhaparseable nikhilsinhaparseable left a comment

good to merge

@nitisht nitisht merged commit 1afa318 into parseablehq:main Mar 21, 2025
14 checks passed
@de-sh de-sh deleted the fix-stream branch March 21, 2025 04:50
@coderabbitai coderabbitai bot mentioned this pull request Apr 15, 2025
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants