
Conversation


@nikhilsinhaparseable nikhilsinhaparseable commented Jan 8, 2026

All fields prefixed with @ will be renamed to have a _ prefix.
This makes the fields queryable.

Summary by CodeRabbit

  • New Features

    • Fields whose names start with "@" are automatically normalized to use a "_" prefix before schema inference and validation.
  • Bug Fixes

    • Detection of name collisions between original and normalized fields now triggers an error to prevent ambiguous data ingestion.
  • Chores

    • Improved logging of ingestion errors and added documentation for the new key-normalization behavior.


@nitisht nitisht requested a review from parmesant January 8, 2026 10:28
@nitisht nitisht changed the title from "normalise field name: change prefix from @ to _" to "normalise field name: change prefix from @ to _ to allow proper querying" Jan 8, 2026
@nitisht nitisht closed this Jan 8, 2026
@nitisht nitisht reopened this Jan 8, 2026

coderabbitai bot commented Jan 8, 2026

Walkthrough

Introduces JSON key normalization: a public normalize_field_name utility replaces a leading @ with _, applied to incoming JSON keys via a new rename_json_keys function (with collision checks) and used when determining/overriding field types. Also adds logging of ingest errors.
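A minimal sketch of the utility described above (the signature matches the diff in this PR; the body is illustrative, not the exact merged code):

```rust
// Sketch of the normalization utility from the walkthrough.
// '@' is a single ASCII byte, so replace_range(0..1, "_") is a safe
// in-place swap that avoids allocating a new String.
pub fn normalize_field_name(name: &mut String) {
    if name.starts_with('@') {
        name.replace_range(0..1, "_");
    }
}

fn main() {
    let mut field = String::from("@timestamp");
    normalize_field_name(&mut field);
    assert_eq!(field, "_timestamp");

    // Names without the '@' prefix pass through unchanged.
    let mut plain = String::from("level");
    normalize_field_name(&mut plain);
    assert_eq!(plain, "level");
}
```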

Changes

  • Field name normalization utility — src/event/format/mod.rs: Added pub fn normalize_field_name(name: &mut String) and used it in override_data_type to normalize field names before matching/upgrading types.
  • JSON key normalization — src/event/format/json.rs: Added private rename_json_keys that renames object keys starting with @ using normalize_field_name, preserves non-object values, and errors on key collisions; to_data now calls this before further processing.
  • Ingest error logging — src/handlers/http/ingest.rs: Added a tracing::error import and logs the error string inside error_response via error!("{self}") before constructing the response.
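The collision check in rename_json_keys can be illustrated with a simplified sketch over a plain map (the real function walks serde_json objects; rename_keys here is a hypothetical stand-in showing only the rename-plus-collision behavior):

```rust
use std::collections::BTreeMap;

// If normalizing "@foo" would produce a key "_foo" that already exists
// in the object, renaming is rejected instead of silently overwriting.
fn rename_keys(map: BTreeMap<String, String>) -> Result<BTreeMap<String, String>, String> {
    let mut out = BTreeMap::new();
    for (mut key, value) in map {
        if key.starts_with('@') {
            key.replace_range(0..1, "_");
        }
        if out.insert(key.clone(), value).is_some() {
            return Err(format!("key collision after normalization: {key}"));
        }
    }
    Ok(out)
}

fn main() {
    // "@ts" is renamed to "_ts" without conflict.
    let mut ok = BTreeMap::new();
    ok.insert("@ts".to_string(), "1".to_string());
    assert!(rename_keys(ok).unwrap().contains_key("_ts"));

    // Both "@ts" and "_ts" present: ambiguous, so an error is returned.
    let mut clash = BTreeMap::new();
    clash.insert("@ts".to_string(), "1".to_string());
    clash.insert("_ts".to_string(), "2".to_string());
    assert!(rename_keys(clash).is_err());
}
```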

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

for next release

Poem

🐰
I hopped through keys with whiskers bright,
Swapped at-signs for underscores at night,
I checked for clashes, kept order true,
Tiny hops, big changes—hooray for new! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
  • Description check — ⚠️ Warning: The description is too minimal; it lacks context, rationale, and testing confirmation, and does not reference a related issue as the template requires. Resolution: add sections describing the problem, the rationale for the solution, and key implementation details, and check off the testing/documentation items from the template.
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 62.50%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (1 passed)
  • Title check — ✅ Passed: The title clearly describes the main change: normalizing field names by replacing the '@' prefix with '_' to enable querying.


✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Fix all issues with AI agents
In src/event/format/json.rs:
- Around lines 121-122: collect_keys currently returns raw field names (e.g. "@foo") that are only normalized later by rename_json_keys, so derive_arrow_schema misses matches against the stored schema and re-infers it. Fix by normalizing the collected keys immediately after collect_keys and before calling derive_arrow_schema (i.e. pass the normalized keys into derive_arrow_schema), updating rename_json_keys, collect_keys, and derive_arrow_schema as needed so the schema lookup uses the normalized names.
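A hedged sketch of that ordering (collect_keys and derive_arrow_schema are the PR's functions; normalize_keys below is a hypothetical helper shown only to illustrate the suggested pre-normalization step):

```rust
// Hypothetical helper: normalize a batch of collected keys before any
// schema lookup, so "@foo" matches a stored "_foo" field.
fn normalize_keys(keys: Vec<String>) -> Vec<String> {
    keys.into_iter()
        .map(|mut k| {
            if k.starts_with('@') {
                k.replace_range(0..1, "_");
            }
            k
        })
        .collect()
}

fn main() {
    // Keys as collect_keys would return them, before normalization.
    let collected = vec!["@timestamp".to_string(), "level".to_string()];
    let normalized = normalize_keys(collected);
    // These are the names the schema lookup should use.
    assert_eq!(
        normalized,
        vec!["_timestamp".to_string(), "level".to_string()]
    );
}
```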
🧹 Nitpick comments (2)
src/event/format/mod.rs (2)

60-67: Consider optimizing to avoid allocation.

The format! macro allocates a new String. Since the PR aims to "avoid extra allocation by mutating the var," consider using replace_range for in-place mutation:

⚡ More efficient in-place mutation
 #[inline]
 pub fn normalize_field_name(name: &mut String) {
-    if let Some(stripped) = name.strip_prefix('@') {
-        *name = format!("_{}", stripped);
+    if name.starts_with('@') {
+        name.replace_range(0..1, "_");
     }
 }

347-376: Logic is correct but consider adding clarification.

The normalization flow is correct: the field name is normalized for schema creation and pattern matching (line 357), while the original field name is used for value lookup from the JSON map (line 350). This works because JSON key normalization happens later in the pipeline via rename_json_keys.

Consider adding a brief comment at line 347-349 to clarify this subtle but important distinction for future maintainers.

📝 Suggested clarifying comment
         .map(|field| {
+            // Normalize field name for schema - JSON keys are normalized later in the pipeline
             let mut field_name = field.name().to_string();
             normalize_field_name(&mut field_name);
             match (schema_version, map.get(field.name())) {
📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31a3ad9 and 277674d.

📒 Files selected for processing (2)
  • src/event/format/json.rs
  • src/event/format/mod.rs
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-18T09:59:20.177Z
Learnt from: nikhilsinhaparseable
Repo: parseablehq/parseable PR: 1415
File: src/metrics/mod.rs:700-756
Timestamp: 2025-09-18T09:59:20.177Z
Learning: In src/event/mod.rs, the parsed_timestamp used in increment_events_ingested_by_date() is correctly UTC-normalized: for dynamic streams it remains Utc::now(), and for streams with time partition enabled it uses the time partition value. Both cases result in proper UTC date strings for metrics labeling, preventing double-counting issues.

Applied to files:

  • src/event/format/mod.rs
🧬 Code graph analysis (1)
src/event/format/json.rs (2)
src/event/format/mod.rs (1)
  • normalize_field_name (63-67)
src/utils/arrow/mod.rs (1)
  • get_field (76-84)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: coverage
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Kafka aarch64-apple-darwin
🔇 Additional comments (1)
src/event/format/json.rs (1)

294-296: LGTM!

The normalization here is necessary and correct since the schema being validated against contains normalized field names (with "_" prefix instead of "@").


@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/handlers/http/ingest.rs (1)

546-557: Fix: Avoid logging client errors at ERROR level; use conditional log levels.

The error!("{self}") syntax at line 547 is correct—error! is a tracing macro and will compile fine. However, the underlying concern is valid: logging all errors (including 4xx client errors) at ERROR level creates unnecessary noise. PostError::status_code() returns BAD_REQUEST (4xx) for ~19 error variants including malformed requests, invalid parameters, and stream not found conditions. These should be logged at WARN level, not ERROR. Reserve ERROR logging for server errors (5xx).

The proposed fix with conditional logging based on status code is appropriate:

Proposed fix
-use tracing::error;
+use tracing::{error, warn};

@@
     fn error_response(&self) -> actix_web::HttpResponse<actix_web::body::BoxBody> {
-        error!("{self}");
+        let status = self.status_code();
+        if status.is_client_error() {
+            warn!(status = %status, error = %self, "request failed");
+        } else {
+            error!(status = %status, error = %self, "request failed");
+        }
         match self {
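A standalone sketch of the status-class branching above (no actix-web or tracing; the 4xx-vs-5xx split is the assumption carried over from the proposed fix):

```rust
// Maps an HTTP status code to the log level suggested in the review:
// 4xx client errors -> WARN, everything else (notably 5xx) -> ERROR.
fn log_level_for(status: u16) -> &'static str {
    if (400..500).contains(&status) {
        "WARN"
    } else {
        "ERROR"
    }
}

fn main() {
    assert_eq!(log_level_for(400), "WARN"); // malformed request
    assert_eq!(log_level_for(404), "WARN"); // e.g. stream not found
    assert_eq!(log_level_for(500), "ERROR"); // server-side failure
}
```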
🧹 Nitpick comments (1)
src/handlers/http/ingest.rs (1)

27-27: Prefer importing macros you’ll actually use (error!/warn!) after fixing the call site.

Once the logging call is corrected (see below), consider importing both macros (or using tracing::error! / tracing::warn! explicitly) to avoid another edit later.

📜 Review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 277674d and 0e0a8f8.

📒 Files selected for processing (3)
  • src/event/format/json.rs
  • src/event/format/mod.rs
  • src/handlers/http/ingest.rs
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/event/format/json.rs
  • src/event/format/mod.rs
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-09-06T04:26:17.191Z
Learnt from: parmesant
Repo: parseablehq/parseable PR: 1424
File: src/enterprise/utils.rs:65-72
Timestamp: 2025-09-06T04:26:17.191Z
Learning: In Parseable's metastore implementation, MetastoreError::to_detail() returns a MetastoreErrorDetail struct (not a string), which contains structured error information including operation, message, stream_name, and other contextual fields. This struct is designed to be boxed in ObjectStorageError::MetastoreError(Box<MetastoreErrorDetail>).

Applied to files:

  • src/handlers/http/ingest.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: Quest Smoke and Load Tests for Standalone deployments
  • GitHub Check: Quest Smoke and Load Tests for Distributed deployments
  • GitHub Check: Build Default x86_64-pc-windows-msvc
  • GitHub Check: Build Default aarch64-apple-darwin
  • GitHub Check: Build Default x86_64-unknown-linux-gnu
  • GitHub Check: Build Default aarch64-unknown-linux-gnu
  • GitHub Check: Build Kafka x86_64-unknown-linux-gnu
  • GitHub Check: Build Kafka aarch64-apple-darwin
  • GitHub Check: coverage


@nitisht nitisht merged commit ac1e6e0 into parseablehq:main Jan 14, 2026
12 checks passed

3 participants