add telemetry type to a dataset #1392


Merged
merged 4 commits into parseablehq:main from tag-telemetry-type on Jul 29, 2025

Conversation

nikhilsinhaparseable (Contributor) commented Jul 28, 2025

add optional header X-P-Telemetry-Type = logs/traces/metrics to the ingestion and stream creation endpoints
this tags a dataset with one of the telemetry types: logs, traces, or metrics
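For illustration, a client might set the header like this (host, port, credentials, and stream name are placeholders; the endpoint paths follow Parseable's public API):

```sh
# create a stream tagged as traces
curl -u admin:admin -X PUT "http://localhost:8000/api/v1/logstream/app-traces" \
  -H "X-P-Telemetry-Type: traces"

# ingest into it; the header is optional and should default to logs when omitted
curl -u admin:admin -X POST "http://localhost:8000/api/v1/ingest" \
  -H "X-P-Stream: app-traces" \
  -H "X-P-Telemetry-Type: traces" \
  -H "Content-Type: application/json" \
  -d '[{"message": "span started", "trace_id": "abc123"}]'
```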

Summary by CodeRabbit

  • New Features

    • Added support for a telemetry type field in stream metadata to categorize streams as logs, metrics, traces, or events.
    • Stream information endpoints now display telemetry type details.
    • HTTP endpoints accept and process a new telemetry type header for precise stream creation and ingestion.
  • Improvements

    • Upgraded object store and schema versions to v7 to support telemetry type.
    • Enhanced migration logic to add telemetry type information to existing streams during upgrades.

coderabbitai bot (Contributor) commented Jul 28, 2025

Walkthrough

The changes introduce a new TelemetryType enum and propagate a telemetry_type field throughout stream metadata, storage, and handler logic. This includes updating struct definitions, constructors, migration logic, and HTTP header parsing to support and persist telemetry type information for streams. Version constants are incremented to reflect schema changes.
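For orientation, the enum plausibly looks like this sketch (the real definition in src/handlers/mod.rs may differ in variants, derives, and serde attributes; the events variant is taken from the feature summary above):

```rust
use std::fmt;

// Sketch only: the key and variant set are assumed from the PR discussion.
pub const TELEMETRY_TYPE_KEY: &str = "x-p-telemetry-type";

#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum TelemetryType {
    #[default]
    Logs,
    Metrics,
    Traces,
    Events,
}

impl From<&str> for TelemetryType {
    fn from(s: &str) -> Self {
        match s.to_ascii_lowercase().as_str() {
            "metrics" => TelemetryType::Metrics,
            "traces" => TelemetryType::Traces,
            "events" => TelemetryType::Events,
            _ => TelemetryType::Logs, // unrecognized values fall back to logs
        }
    }
}

impl fmt::Display for TelemetryType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            TelemetryType::Logs => "logs",
            TelemetryType::Metrics => "metrics",
            TelemetryType::Traces => "traces",
            TelemetryType::Events => "events",
        })
    }
}
```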

Changes

  • TelemetryType Enum & Key (src/handlers/mod.rs): Adds public TELEMETRY_TYPE_KEY constant and TelemetryType enum with conversion and display implementations.
  • Handler Logic: Ingest & OTEL (src/handlers/http/ingest.rs): Updates ingest and OTEL handler functions to extract and pass TelemetryType from headers or as fixed variants.
  • Logstream HTTP Handler (src/handlers/http/logstream.rs): Adds telemetry_type to StreamInfo construction and removes an extraneous blank line.
  • Logstream Utils (src/handlers/http/modal/utils/logstream_utils.rs): Adds telemetry_type field to PutStreamHeaders and parses it from HTTP headers (see the parsing sketch after this list).
  • Metadata Struct & Constructor (src/metadata.rs): Adds telemetry_type to LogStreamMetadata struct and its constructor.
  • Migration Logic (src/migration/mod.rs, src/migration/stream_metadata_migration.rs): Adds v6→v7 migration step handling telemetry_type; updates setup and migration flows accordingly.
  • Parseable & Stream Creation (src/parseable/mod.rs): Propagates telemetry_type through stream creation, update, and metadata management methods.
  • Prism Logstream (src/prism/logstream/mod.rs): Includes telemetry_type in StreamInfo returned by helper.
  • Storage Field Stats (src/storage/field_stats.rs): Passes TelemetryType::Logs to create_stream_if_not_exists in field stats logic.
  • Storage Structs & Versioning (src/storage/mod.rs): Adds telemetry_type to ObjectStoreFormat and StreamInfo, updates version constants to v7.
  • Kafka Processor (src/connectors/kafka/processor.rs): Adds default TelemetryType argument to stream creation call in event chunk processing.
  • Prism Home Dataset Type Refactor (src/prism/home/mod.rs): Replaces local DataSetType enum with TelemetryType and uses telemetry_type field from metadata for dataset type determination.
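As referenced in the Logstream Utils entry, the header parsing plausibly reduces to a small helper like this (actix-web HeaderMap; the function name and fallback behavior are assumptions, not the merged code):

```rust
use actix_web::http::header::HeaderMap;

// Hypothetical helper: read the optional X-P-Telemetry-Type header and fall
// back to the enum's default (logs) when it is absent or not valid UTF-8.
// TELEMETRY_TYPE_KEY and TelemetryType are the sketches from above.
fn telemetry_type_from_headers(headers: &HeaderMap) -> TelemetryType {
    headers
        .get(TELEMETRY_TYPE_KEY)
        .and_then(|value| value.to_str().ok())
        .map(TelemetryType::from)
        .unwrap_or_default()
}
```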

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTPHandler
    participant Parseable
    participant Storage
    participant Metadata

    Client->>HTTPHandler: POST /ingest (with x-p-telemetry-type header)
    HTTPHandler->>Parseable: create_stream_if_not_exists(..., telemetry_type)
    Parseable->>Storage: create_stream(..., telemetry_type)
    Storage->>Metadata: Store LogStreamMetadata with telemetry_type
    Metadata-->>Storage: Acknowledge
    Storage-->>Parseable: Stream created/exists
    Parseable-->>HTTPHandler: Result
    HTTPHandler-->>Client: HTTP Response
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • add dataset name and type to home api response #1271: Refactors src/prism/home/mod.rs to replace local DataSetType enum with TelemetryType and changes dataset type determination logic; modifies the same module but implements a different approach to dataset type handling.

Suggested labels

for next release

Suggested reviewers

  • parmesant

Poem

🐇 A telemetry tale hops bright and clear,
Through headers and streams it draws near.
Logs, metrics, traces, all aligned,
In every struct and code defined.
Version seven leads the way,
With Rabbit cheers, hip-hip-hooray! 🎉


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1055c81 and 4c00b5e.

📒 Files selected for processing (2)
  • src/connectors/kafka/processor.rs (2 hunks)
  • src/prism/home/mod.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/connectors/kafka/processor.rs
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: nikhilsinhaparseable
PR: parseablehq/parseable#1392
File: src/migration/stream_metadata_migration.rs:303-322
Timestamp: 2025-07-28T17:10:39.395Z
Learning: In Parseable's migration system (src/migration/stream_metadata_migration.rs), each migration function updates the metadata to the current latest format using CURRENT_OBJECT_STORE_VERSION and CURRENT_SCHEMA_VERSION constants, rather than producing incremental versions. For example, the v5_v6 function produces v7-format output when these constants are set to "v7", not v6-format output.
src/prism/home/mod.rs (4)

Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?) ensures proper error handling in both modes.

Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Learnt from: nikhilsinhaparseable
PR: #1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.

Learnt from: nikhilsinhaparseable
PR: #1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
🔇 Additional comments (5)
src/prism/home/mod.rs (5)

32-35: LGTM: Clean import restructuring for TelemetryType

The import change properly brings in TelemetryType from the handlers module, aligning with the broader refactoring to use a unified telemetry type enum across the codebase.


43-44: LGTM: Type alias updated consistently

The StreamMetadataResponse type alias correctly reflects the new return type using TelemetryType instead of the previous local enum.


57-57: LGTM: Struct field type updated appropriately

The DataSet struct's dataset_type field now uses the unified TelemetryType, maintaining consistency with the overall refactoring.


207-207: LGTM: Function signature updated correctly

The return type of get_stream_metadata has been properly updated to include TelemetryType in the tuple, maintaining the same structure while using the new type.


236-236: telemetry_type is always present on ObjectStoreFormat; no changes needed

The ObjectStoreFormat struct in src/storage/mod.rs includes

```rust
#[serde(default)]
pub telemetry_type: TelemetryType,
```

so deserialization will always populate telemetry_type (defaulting to TelemetryType::Logs if absent). The stream-metadata migration (e.g. in stream_metadata_migration.rs) also explicitly inserts a telemetry_type for older formats, and there are tests covering every flow. You can safely use stream_jsons[0].telemetry_type without additional guards.
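For example, assuming TelemetryType additionally derives Deserialize, the guarantee is easy to see in a few lines:

```rust
use serde::Deserialize;

// Minimal mirror of the relevant field; assumes TelemetryType implements
// Deserialize and Default (defaulting to Logs), as the snippet above implies.
#[derive(Deserialize)]
struct StreamJsonView {
    #[serde(default)]
    telemetry_type: TelemetryType,
}

fn main() {
    // The key is absent from the JSON, so serde falls back to the default.
    let parsed: StreamJsonView = serde_json::from_str("{}").unwrap();
    assert_eq!(parsed.telemetry_type, TelemetryType::Logs);
}
```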


coderabbitai bot previously approved these changes Jul 28, 2025
coderabbitai bot (Contributor) left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41f5c1e and 53b7462.

📒 Files selected for processing (2)
  • src/connectors/kafka/processor.rs (1 hunks)
  • src/migration/stream_metadata_migration.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
src/connectors/kafka/processor.rs (4)

Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?) ensures proper error handling in both modes.

Learnt from: nikhilsinhaparseable
PR: #1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.

Learnt from: nikhilsinhaparseable
PR: #1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.

src/migration/stream_metadata_migration.rs (3)

Learnt from: nikhilsinhaparseable
PR: #1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.

Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.

Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern !PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?) ensures proper error handling in both modes.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default x86_64-apple-darwin
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
🔇 Additional comments (3)
src/migration/stream_metadata_migration.rs (3)

20-25: LGTM! Import changes are appropriate.

The added TelemetryType and json macro imports are necessary for the new v6_v7 migration function.


233-270: Excellent migration implementation with robust error handling.

The v6_v7 migration function correctly:

  • Updates version fields to current constants
  • Safely navigates the nested JSON structure with proper null handling
  • Maps OTEL-specific log source formats to appropriate telemetry types
  • Defaults sensibly to TelemetryType::Logs for unrecognized formats
  • Preserves existing telemetry_type values to avoid overwriting customizations

The error handling with unwrap_or("json") ensures the migration won't fail on malformed metadata.
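Putting those behaviors together, the migration plausibly has this shape (a sketch over serde_json::Value; key names and constants are assumptions drawn from this review, not the merged code):

```rust
use serde_json::{json, Value};

// Assumed to mirror the real constants referenced in this PR.
const CURRENT_OBJECT_STORE_VERSION: &str = "v7";
const CURRENT_SCHEMA_VERSION: &str = "v7";

// Hypothetical v6 -> v7 step: bump version fields, then derive
// telemetry_type from the first log_source entry if it is not already set.
pub fn v6_v7(mut stream_metadata: Value) -> Value {
    let map = stream_metadata
        .as_object_mut()
        .expect("stream metadata is a JSON object");
    map.insert("objectstore-format".into(), json!(CURRENT_OBJECT_STORE_VERSION));
    map.insert("version".into(), json!(CURRENT_SCHEMA_VERSION));

    // Preserve an existing telemetry_type instead of overwriting it.
    if !map.contains_key("telemetry_type") {
        let format = map
            .get("log_source")
            .and_then(|sources| sources.as_array())
            .and_then(|sources| sources.first())
            .and_then(|entry| entry.get("log_source_format"))
            .and_then(|fmt| fmt.as_str())
            .unwrap_or("json"); // malformed metadata degrades gracefully
        let telemetry_type = match format {
            "otel-traces" => "traces",
            "otel-metrics" => "metrics",
            _ => "logs", // "otel-logs", "json", "kinesis", ... all map to logs
        };
        map.insert("telemetry_type".into(), json!(telemetry_type));
    }
    stream_metadata
}
```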


341-402: Excellent comprehensive test coverage for v6_v7 migration.

The new test suite thoroughly covers all critical scenarios:

  • ✅ All three OTEL format mappings (logs→logs, traces→traces, metrics→metrics)
  • ✅ Default behavior for standard formats (json, kinesis→logs)
  • ✅ Preservation of existing telemetry_type values
  • ✅ Edge cases (missing log_source, empty arrays)

The test structure is consistent, names are descriptive, and expected outputs correctly reflect v7 format. This provides strong confidence in the migration function's reliability.
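As a flavor of what one such case might look like (a hypothetical test against the migration sketch above, not the repository's actual suite):

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use serde_json::json;

    #[test]
    fn v6_v7_maps_otel_traces_to_traces() {
        let input = json!({
            "version": "v6",
            "objectstore-format": "v6",
            "log_source": [{ "log_source_format": "otel-traces" }]
        });
        let migrated = v6_v7(input);
        assert_eq!(migrated["version"], "v7");
        assert_eq!(migrated["telemetry_type"], "traces");
    }
}
```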

coderabbitai bot previously approved these changes Jul 28, 2025
coderabbitai bot mentioned this pull request Jul 29, 2025
nitisht merged commit 32bfc04 into parseablehq:main Jul 29, 2025
13 checks passed
nikhilsinhaparseable deleted the tag-telemetry-type branch July 29, 2025 07:49