add telemetry type to a dataset #1392


Merged
merged 4 commits into parseablehq:main from tag-telemetry-type on Jul 29, 2025

Conversation

nikhilsinhaparseable (Contributor) commented Jul 28, 2025

add optional header X-P-Telemetry-Type = logs/traces/metrics to the ingestion and stream creation endpoints
this tags a dataset with one of the telemetry types: logs, traces, or metrics
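For illustration, a client might set the header like this (host, port, credentials, and stream name are placeholders; the endpoint paths follow Parseable's public API):

```sh
# create a stream tagged as traces
curl -u admin:admin -X PUT "http://localhost:8000/api/v1/logstream/app-traces" \
  -H "X-P-Telemetry-Type: traces"

# ingest into it; the header is optional and should default to logs when omitted
curl -u admin:admin -X POST "http://localhost:8000/api/v1/ingest" \
  -H "X-P-Stream: app-traces" \
  -H "X-P-Telemetry-Type: traces" \
  -H "Content-Type: application/json" \
  -d '[{"message": "span started", "trace_id": "abc123"}]'
```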

Summary by CodeRabbit

  • New Features

    • Added support for a telemetry type field in stream metadata to categorize streams as logs, metrics, traces, or events.
    • Stream information endpoints now display telemetry type details.
    • HTTP endpoints accept and process a new telemetry type header for precise stream creation and ingestion.
  • Improvements

    • Upgraded object store and schema versions to v7 to support telemetry type.
    • Enhanced migration logic to add telemetry type information to existing streams during upgrades.

coderabbitai bot (Contributor) commented Jul 28, 2025

Walkthrough

The changes introduce a new TelemetryType enum and propagate a telemetry_type field throughout stream metadata, storage, and handler logic. This includes updating struct definitions, constructors, migration logic, and HTTP header parsing to support and persist telemetry type information for streams. Version constants are incremented to reflect schema changes.
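For orientation, the enum plausibly looks like this sketch (the real definition in src/handlers/mod.rs may differ in variants, derives, and serde attributes; the events variant is taken from the feature summary above):

```rust
use std::fmt;

// Sketch only: the key and variant set are assumed from the PR discussion.
pub const TELEMETRY_TYPE_KEY: &str = "x-p-telemetry-type";

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum TelemetryType {
    #[default]
    Logs,
    Metrics,
    Traces,
    Events,
}

impl From<&str> for TelemetryType {
    fn from(s: &str) -> Self {
        match s.to_ascii_lowercase().as_str() {
            "metrics" => TelemetryType::Metrics,
            "traces" => TelemetryType::Traces,
            "events" => TelemetryType::Events,
            _ => TelemetryType::Logs, // unrecognized values fall back to logs
        }
    }
}

impl fmt::Display for TelemetryType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            TelemetryType::Logs => "logs",
            TelemetryType::Metrics => "metrics",
            TelemetryType::Traces => "traces",
            TelemetryType::Events => "events",
        })
    }
}
```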

Changes

  • TelemetryType Enum & Key (src/handlers/mod.rs): Adds public TELEMETRY_TYPE_KEY constant and TelemetryType enum with conversion and display implementations.
  • Handler Logic: Ingest & OTEL (src/handlers/http/ingest.rs): Updates ingest and OTEL handler functions to extract and pass TelemetryType from headers or as fixed variants.
  • Logstream HTTP Handler (src/handlers/http/logstream.rs): Adds telemetry_type to StreamInfo construction and removes an extraneous blank line.
  • Logstream Utils (src/handlers/http/modal/utils/logstream_utils.rs): Adds telemetry_type field to PutStreamHeaders and parses it from HTTP headers (see the parsing sketch after this list).
  • Metadata Struct & Constructor (src/metadata.rs): Adds telemetry_type to LogStreamMetadata struct and its constructor.
  • Migration Logic (src/migration/mod.rs, src/migration/stream_metadata_migration.rs): Adds v6→v7 migration step handling telemetry_type; updates setup and migration flows accordingly.
  • Parseable & Stream Creation (src/parseable/mod.rs): Propagates telemetry_type through stream creation, update, and metadata management methods.
  • Prism Logstream (src/prism/logstream/mod.rs): Includes telemetry_type in StreamInfo returned by helper.
  • Storage Field Stats (src/storage/field_stats.rs): Passes TelemetryType::Logs to create_stream_if_not_exists in field stats logic.
  • Storage Structs & Versioning (src/storage/mod.rs): Adds telemetry_type to ObjectStoreFormat and StreamInfo, updates version constants to v7.
  • Kafka Processor (src/connectors/kafka/processor.rs): Adds default TelemetryType argument to stream creation call in event chunk processing.
  • Prism Home Dataset Type Refactor (src/prism/home/mod.rs): Replaces local DataSetType enum with TelemetryType and uses telemetry_type field from metadata for dataset type determination.
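As referenced in the Logstream Utils entry, the header parsing plausibly reduces to a small helper like this (actix-web HeaderMap; the function name and fallback behavior are assumptions, not the merged code):

```rust
use actix_web::http::header::HeaderMap;

// Hypothetical helper: read the optional X-P-Telemetry-Type header and fall
// back to the enum's default (logs) when it is absent or not valid UTF-8.
// TELEMETRY_TYPE_KEY and TelemetryType are the sketches from above.
fn telemetry_type_from_headers(headers: &HeaderMap) -> TelemetryType {
    headers
        .get(TELEMETRY_TYPE_KEY)
        .and_then(|value| value.to_str().ok())
        .map(TelemetryType::from)
        .unwrap_or_default()
}
```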

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTPHandler
    participant Parseable
    participant Storage
    participant Metadata

    Client->>HTTPHandler: POST /ingest (with x-p-telemetry-type header)
    HTTPHandler->>Parseable: create_stream_if_not_exists(..., telemetry_type)
    Parseable->>Storage: create_stream(..., telemetry_type)
    Storage->>Metadata: Store LogStreamMetadata with telemetry_type
    Metadata-->>Storage: Acknowledge
    Storage-->>Parseable: Stream created/exists
    Parseable-->>HTTPHandler: Result
    HTTPHandler-->>Client: HTTP Response
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • add dataset name and type to home api response #1271: Refactors src/prism/home/mod.rs to replace local DataSetType enum with TelemetryType and changes dataset type determination logic; modifies the same module but implements a different approach to dataset type handling.

Suggested labels

for next release

Suggested reviewers

  • parmesant

Poem

🐇 A telemetry tale hops bright and clear,
Through headers and streams it draws near.
Logs, metrics, traces, all aligned,
In every struct and code defined.
Version seven leads the way,
With Rabbit cheers, hip-hip-hooray! 🎉


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1055c81 and 4c00b5e.

📒 Files selected for processing (2)
  • src/connectors/kafka/processor.rs (2 hunks)
  • src/prism/home/mod.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/connectors/kafka/processor.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.395Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, the v5_v6 function produces v7-format output when these constants are set to "v7", not v6-format output.
src/prism/home/mod.rs (4)

Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?) ensures proper error handling in both modes.

Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Learnt from: nikhilsinhaparseable
PR: #1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.

Learnt from: nikhilsinhaparseable
PR: #1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (5)
src/prism/home/mod.rs (5)

32-35: LGTM: Clean import restructuring for TelemetryType

The import change properly brings in TelemetryType from the handlers module, aligning with the broader refactoring to use a unified telemetry type enum across the codebase.


43-44: LGTM: Type alias updated consistently

The StreamMetadataResponse type alias correctly reflects the new return type using TelemetryType instead of the previous local enum.


57-57: LGTM: Struct field type updated appropriately

The DataSet struct's dataset_type field now uses the unified TelemetryType, maintaining consistency with the overall refactoring.


207-207: LGTM: Function signature updated correctly

The return type of get_stream_metadata has been properly updated to include TelemetryType in the tuple, maintaining the same structure while using the new type.


236-236: telemetry_type is always present on ObjectStoreFormat; no changes needed

The ObjectStoreFormat struct in src/storage/mod.rs includes

```rust
#[serde(default)]
pub telemetry_type: TelemetryType,
```

so deserialization will always populate telemetry_type (defaulting to TelemetryType::Logs if absent). The stream-metadata migration (e.g. in stream_metadata_migration.rs) also explicitly inserts a telemetry_type for older formats, and there are tests covering every flow. You can safely use stream_jsons[0].telemetry_type without additional guards.
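For example, assuming TelemetryType additionally derives Deserialize, the guarantee is easy to see in a few lines:

```rust
use serde::Deserialize;

// Minimal mirror of the relevant field; assumes TelemetryType implements
// Deserialize and Default (defaulting to Logs), as the snippet above implies.
#[derive(Deserialize)]
struct StreamJsonView {
    #[serde(default)]
    telemetry_type: TelemetryType,
}

fn main() {
    // The key is absent from the JSON, so serde falls back to the default.
    let parsed: StreamJsonView = serde_json::from_str("{}").unwrap();
    assert_eq!(parsed.telemetry_type, TelemetryType::Logs);
}
```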


coderabbitai bot previously approved these changes Jul 28, 2025
coderabbitai bot (Contributor) left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41f5c1e and 53b7462.

📒 Files selected for processing (2)
  • src/connectors/kafka/processor.rs (1 hunks)
  • src/migration/stream_metadata_migration.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
src/connectors/kafka/processor.rs (4)

Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?) ensures proper error handling in both modes.

Learnt from: nikhilsinhaparseable
PR: #1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.

Learnt from: nikhilsinhaparseable
PR: #1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.

src/migration/stream_metadata_migration.rs (3)

Learnt from: nikhilsinhaparseable
PR: #1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.

Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?) ensures proper error handling in both modes.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
🔇 Additional comments (3)
src/migration/stream_metadata_migration.rs (3)

20-25: LGTM! Import changes are appropriate.

The added TelemetryType and json macro imports are necessary for the new v6_v7 migration function.


233-270: Excellent migration implementation with robust error handling.

The v6_v7 migration function correctly:

  • Updates version fields to current constants
  • Safely navigates the nested JSON structure with proper null handling
  • Maps OTEL-specific log source formats to appropriate telemetry types
  • Defaults sensibly to TelemetryType::Logs for unrecognized formats
  • Preserves existing telemetry_type values to avoid overwriting customizations

The error handling with unwrap_or("json") ensures the migration won't fail on malformed metadata.
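Putting those behaviors together, the migration plausibly has this shape (a sketch over serde_json::Value; key names and constants are assumptions drawn from this review, not the merged code):

```rust
use serde_json::{json, Value};

// Assumed to mirror the real constants referenced in this PR.
const CURRENT_OBJECT_STORE_VERSION: &str = "v7";
const CURRENT_SCHEMA_VERSION: &str = "v7";

// Hypothetical v6 -> v7 step: bump version fields, then derive
// telemetry_type from the first log_source entry if it is not already set.
pub fn v6_v7(mut stream_metadata: Value) -> Value {
    let map = stream_metadata
        .as_object_mut()
        .expect("stream metadata is a JSON object");
    map.insert("objectstore-format".into(), json!(CURRENT_OBJECT_STORE_VERSION));
    map.insert("version".into(), json!(CURRENT_SCHEMA_VERSION));

    // Preserve an existing telemetry_type instead of overwriting it.
    if !map.contains_key("telemetry_type") {
        let format = map
            .get("log_source")
            .and_then(|sources| sources.as_array())
            .and_then(|sources| sources.first())
            .and_then(|entry| entry.get("log_source_format"))
            .and_then(|fmt| fmt.as_str())
            .unwrap_or("json"); // malformed metadata degrades gracefully
        let telemetry_type = match format {
            "otel-traces" => "traces",
            "otel-metrics" => "metrics",
            _ => "logs", // "otel-logs", "json", "kinesis", ... all map to logs
        };
        map.insert("telemetry_type".into(), json!(telemetry_type));
    }
    stream_metadata
}
```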


341-402: Excellent comprehensive test coverage for v6_v7 migration.

The new test suite thoroughly covers all critical scenarios:

  • ✅ All three OTEL format mappings (logs→logs, traces→traces, metrics→metrics)
  • ✅ Default behavior for standard formats (json, kinesis→logs)
  • ✅ Preservation of existing telemetry_type values
  • ✅ Edge cases (missing log_source, empty arrays)

The test structure is consistent, names are descriptive, and expected outputs correctly reflect v7 format. This provides strong confidence in the migration function's reliability.
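As a flavor of what one such case might look like (a hypothetical test against the migration sketch above, not the repository's actual suite):

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use serde_json::json;

    #[test]
    fn v6_v7_maps_otel_traces_to_traces() {
        let input = json!({
            "version": "v6",
            "objectstore-format": "v6",
            "log_source": [{ "log_source_format": "otel-traces" }]
        });
        let migrated = v6_v7(input);
        assert_eq!(migrated["version"], "v7");
        assert_eq!(migrated["telemetry_type"], "traces");
    }
}
```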

coderabbitai bot previously approved these changes Jul 28, 2025
coderabbitai bot mentioned this pull request Jul 29, 2025
nitisht merged commit 32bfc04 into parseablehq:main Jul 29, 2025
13 checks passed
nikhilsinhaparseable deleted the tag-telemetry-type branch July 29, 2025 07:49