add telemetry type to a dataset #1392
Conversation
Walkthrough

The changes introduce a new telemetry type for datasets: an optional `X-P-Telemetry-Type` header on the ingestion and stream creation endpoints is propagated through stream creation and persisted in the stream metadata.
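Based on the names that appear in this review (`TelemetryType::Logs`, traces, metrics), the new type plausibly looks something like the sketch below; the exact derives and module path are assumptions, not the actual Parseable definition:

```rust
// Hypothetical shape of the new telemetry type, inferred from the review
// comments in this PR; the real definition in Parseable may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize, serde::Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum TelemetryType {
    Logs,
    Traces,
    Metrics,
}
```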
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant HTTPHandler
    participant Parseable
    participant Storage
    participant Metadata
    Client->>HTTPHandler: POST /ingest (with x-p-telemetry-type header)
    HTTPHandler->>Parseable: create_stream_if_not_exists(..., telemetry_type)
    Parseable->>Storage: create_stream(..., telemetry_type)
    Storage->>Metadata: Store LogStreamMetadata with telemetry_type
    Metadata-->>Storage: Acknowledge
    Storage-->>Parseable: Stream created/exists
    Parseable-->>HTTPHandler: Result
    HTTPHandler-->>Client: HTTP Response
```
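The diagram shows the header traveling from the HTTP handler into stream creation. As a minimal sketch of the extraction step, assuming an actix-web handler (Parseable's HTTP framework) and the hypothetical `TelemetryType` enum sketched above, with logs as an assumed fallback for a missing or unrecognized value:

```rust
use actix_web::HttpRequest;

// Hypothetical helper: reads the optional x-p-telemetry-type header.
// The fallback to Logs is an assumption, not confirmed by this PR.
fn telemetry_type_from_headers(req: &HttpRequest) -> TelemetryType {
    req.headers()
        .get("x-p-telemetry-type")
        .and_then(|value| value.to_str().ok())
        .map(|value| match value.to_ascii_lowercase().as_str() {
            "traces" => TelemetryType::Traces,
            "metrics" => TelemetryType::Metrics,
            // "logs" and unrecognized values both land here.
            _ => TelemetryType::Logs,
        })
        .unwrap_or(TelemetryType::Logs)
}
```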
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- src/connectors/kafka/processor.rs (1 hunks)
- src/migration/stream_metadata_migration.rs (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
src/connectors/kafka/processor.rs (4)
Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
Learnt from: nikhilsinhaparseable
PR: #1276
File: src/prism/logstream/mod.rs:0-0
Timestamp: 2025-03-28T06:17:01.201Z
Learning: In the Parseable datasets API, specific stream names don't need to be logged in error cases because the API is called from the Parseable UI where only authorized users can access and the streams in the request are pre-filtered based on user authorization.
Learnt from: nikhilsinhaparseable
PR: #1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.
src/migration/stream_metadata_migration.rs (3)
Learnt from: nikhilsinhaparseable
PR: #1271
File: src/prism/home/mod.rs:207-224
Timestamp: 2025-03-26T06:13:48.898Z
Learning: In the Parseable codebase, if a stream is found, the stream_jsons array will always have at least one element. Additionally, for any valid stream_json object, the log_source array will always have at least one element. This is a design invariant that makes additional null checks unnecessary.
Learnt from: nikhilsinhaparseable
PR: #1263
File: src/handlers/http/ingest.rs:300-310
Timestamp: 2025-03-26T06:44:53.362Z
Learning: In Parseable, every stream is always associated with a log_source - no stream can exist without a log_source. For otel-traces and otel-metrics, strict restrictions are implemented where ingestion is rejected if a stream already has a different log_source format. However, regular logs from multiple log_sources can coexist in a single stream.
Learnt from: de-sh
PR: #1185
File: src/handlers/http/logstream.rs:255-261
Timestamp: 2025-02-14T09:49:25.818Z
Learning: In Parseable's logstream handlers, stream existence checks must be performed for both query and standalone modes. The pattern `!PARSEABLE.streams.contains(&stream_name) && (PARSEABLE.options.mode != Mode::Query || !PARSEABLE.create_stream_and_schema_from_storage(&stream_name).await?)` ensures proper error handling in both modes.
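The existence-check pattern quoted in the learnings above can be hard to read inline. A minimal self-contained sketch, with `Parseable`, `Mode`, and the error type as simplified stand-ins for the real internals:

```rust
#[derive(PartialEq)]
enum Mode {
    Query,
    Standalone,
}

struct Parseable {
    mode: Mode,
    streams: Vec<String>,
}

impl Parseable {
    fn contains(&self, name: &str) -> bool {
        self.streams.iter().any(|s| s == name)
    }

    // Stand-in: the real method rebuilds the stream and its schema from
    // object storage, returning true when the stream exists there.
    async fn create_stream_and_schema_from_storage(&self, _name: &str) -> Result<bool, String> {
        Ok(false)
    }
}

async fn ensure_stream_exists(p: &Parseable, stream_name: &str) -> Result<(), String> {
    // Error only when the stream is unknown locally AND we either are not a
    // query node or storage also has no record of it.
    if !p.contains(stream_name)
        && (p.mode != Mode::Query
            || !p.create_stream_and_schema_from_storage(stream_name).await?)
    {
        return Err(format!("stream {stream_name} not found"));
    }
    Ok(())
}
```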
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: Quest Smoke and Load Tests for Standalone deployments
- GitHub Check: Quest Smoke and Load Tests for Distributed deployments
- GitHub Check: Build Default x86_64-apple-darwin
- GitHub Check: Build Default aarch64-unknown-linux-gnu
- GitHub Check: Build Default x86_64-pc-windows-msvc
- GitHub Check: Build Default x86_64-unknown-linux-gnu
- GitHub Check: Build Default aarch64-apple-darwin
- GitHub Check: Build Kafka aarch64-apple-darwin
- GitHub Check: Build Kafka x86_64-unknown-linux-gnu
- GitHub Check: coverage
🔇 Additional comments (3)
src/migration/stream_metadata_migration.rs (3)
`20-25`: LGTM! Import changes are appropriate. The addition of the `TelemetryType` import and the `json` macro import is necessary for the new `v6_v7` migration function.
`233-270`: Excellent migration implementation with robust error handling.

The `v6_v7` migration function correctly:

- Updates version fields to current constants
- Safely navigates the nested JSON structure with proper null handling
- Maps OTEL-specific log source formats to appropriate telemetry types
- Defaults sensibly to `TelemetryType::Logs` for unrecognized formats
- Preserves existing `telemetry_type` values to avoid overwriting customizations

The error handling with `unwrap_or("json")` ensures the migration won't fail on malformed metadata.
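As a rough illustration of the behavior described above, a hedged sketch of a migration with this shape follows. The `log_source` and `telemetry_type` field names and the `unwrap_or("json")` fallback come from the review itself; the `log_source_format` key and the exact JSON layout are assumptions, not the actual Parseable code:

```rust
use serde_json::{json, Value};

// Hypothetical v6→v7 migration sketch: bumps version fields, maps OTEL log
// source formats to a telemetry type, and defaults everything else to logs.
fn v6_v7(mut metadata: Value) -> Value {
    let obj = metadata
        .as_object_mut()
        .expect("stream metadata is a JSON object");

    // Update version fields to the new format.
    obj.insert("version".to_string(), json!("v7"));
    obj.insert("objectstore-format".to_string(), json!("v7"));

    // Preserve an existing telemetry_type so customizations survive.
    if obj.contains_key("telemetry_type") {
        return metadata;
    }

    // Safely navigate the nested structure, tolerating missing or malformed
    // fields by falling back to "json".
    let format = obj
        .get("log_source")
        .and_then(Value::as_array)
        .and_then(|sources| sources.first())
        .and_then(|entry| entry.get("log_source_format"))
        .and_then(Value::as_str)
        .unwrap_or("json");

    let telemetry_type = match format {
        "otel-traces" => "traces",
        "otel-metrics" => "metrics",
        // otel-logs, json, kinesis, and anything unrecognized become logs.
        _ => "logs",
    };
    obj.insert("telemetry_type".to_string(), json!(telemetry_type));
    metadata
}
```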
`341-402`: Excellent comprehensive test coverage for the v6_v7 migration.

The new test suite thoroughly covers all critical scenarios:
- ✅ All three OTEL format mappings (logs→logs, traces→traces, metrics→metrics)
- ✅ Default behavior for standard formats (json, kinesis→logs)
- ✅ Preservation of existing telemetry_type values
- ✅ Edge cases (missing log_source, empty arrays)
The test structure is consistent, names are descriptive, and expected outputs correctly reflect v7 format. This provides strong confidence in the migration function's reliability.
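For illustration, two tests in this style are sketched below; they exercise the hypothetical `v6_v7` sketch shown earlier, with fixtures trimmed to only the fields that sketch touches:

```rust
#[cfg(test)]
mod tests {
    use super::*;
    use serde_json::json;

    #[test]
    fn v6_v7_maps_otel_traces_to_traces() {
        let v6 = json!({
            "version": "v6",
            "objectstore-format": "v6",
            "log_source": [{ "log_source_format": "otel-traces" }],
        });
        let v7 = v6_v7(v6);
        assert_eq!(v7["telemetry_type"], "traces");
        assert_eq!(v7["version"], "v7");
    }

    #[test]
    fn v6_v7_preserves_existing_telemetry_type() {
        let v6 = json!({
            "version": "v6",
            "telemetry_type": "metrics",
            "log_source": [{ "log_source_format": "json" }],
        });
        let v7 = v6_v7(v6);
        // Existing customizations must not be overwritten.
        assert_eq!(v7["telemetry_type"], "metrics");
    }
}
```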
Add an optional header `X-P-Telemetry-Type` with value logs/traces/metrics to the ingestion and stream creation endpoints. This tags a dataset as one of the telemetry types: logs, traces, or metrics.