feat: add json bulk stream #701

Merged: 1 commit merged into main from feat/bulk-json-stream on Feb 25, 2025

Conversation

gfyrag
Contributor

@gfyrag gfyrag commented Feb 21, 2025

Fixes LX-1

@gfyrag gfyrag requested a review from a team as a code owner February 21, 2025 11:22

coderabbitai bot commented Feb 21, 2025

Warning

Rate limit exceeded

@gfyrag has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 22 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 51cd04a and c991ecb.

📒 Files selected for processing (8)
  • internal/api/bulking/bulker_test.go (12 hunks)
  • internal/api/bulking/handler_stream_json.go (1 hunks)
  • internal/api/bulking/handler_stream_json_test.go (1 hunks)
  • internal/api/bulking/handler_stream_text.go (3 hunks)
  • internal/api/bulking/handler_stream_text_test.go (2 hunks)
  • internal/api/router.go (1 hunks)
  • internal/api/v2/routes.go (1 hunks)
  • test/e2e/api_bulk_test.go (2 hunks)

Walkthrough

This pull request introduces a new JSON stream bulk handler for processing bulk JSON streaming requests along with its factory, termination, and channel management methods. It also refactors the existing script stream bulk handler to a text stream bulk handler, updating method names and constructors accordingly. Router configurations are modified to use a default bulk handler factory helper, and additional tests, including an end-to-end ledger bulk creation test using JSON streams, have been added to verify the new functionality.

Changes

Files and change summary:

  • internal/api/bulking/handler_stream_json.go, internal/api/bulking/handler_stream_json_test.go: Adds a new JSONStreamBulkHandler with methods for initializing bulk channels, decoding JSON, handling results, graceful termination, and its associated factory and tests for bulk JSON streaming.
  • internal/api/bulking/handler_stream_text.go, internal/api/bulking/handler_stream_text_test.go: Refactors the existing ScriptStreamBulkHandler into TextStreamBulkHandler by renaming types, methods, and constructors, with corresponding test updates.
  • internal/api/router.go, internal/api/v2/routes.go: Updates router configuration by replacing an explicit bulk handler factory mapping with a default configuration using the new WithDefaultBulkHandlerFactories function that encapsulates bulk handler factory setup based on a maximum bulk size.
  • test/e2e/api_bulk_test.go: Introduces a new end-to-end test scenario for bulk ledger creation via a JSON stream, validating the proper handling of bulk operations in the API.
  • internal/api/bulking/bulker_test.go: Adds an import statement for the pointer package and modifies the return value of a mocked method in the TestBulk function to enhance test specificity regarding expected output.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Router
    participant Factory
    participant JSONHandler
    participant Goroutine1 as JSON Decoder
    participant Goroutine2 as Result Collector

    Client->>Router: Submit bulk JSON stream request
    Router->>Factory: Retrieve BulkHandlerFactory
    Factory-->>Router: Return JSONStreamBulkHandler
    Router->>JSONHandler: Invoke GetChannels
    JSONHandler->>Goroutine1: Spawn JSON decoding routine
    JSONHandler->>Goroutine2: Spawn result collection routine
    Client->>Router: Send termination signal
    Router->>JSONHandler: Invoke Terminate
    JSONHandler-->>Router: Return aggregated results
    Router-->>Client: Respond with JSON results

Suggested labels

build-images

Suggested reviewers

  • flemzord
  • Dav-14

Poem

I’m a happy rabbit, code in full bloom,
Bulk streams and tests lighting up the room.
JSONs hop in rows, text streams play their part,
Routers and factories dancing, a work of art.
♡ With every change, I nibble joy with glee,
Hop along with CodeRabbit—carefree and free!
🐇 Hop, hop, hooray for new code to see!



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (1)
internal/api/bulking/handler_stream_text.go (1)

33-33: 💡 Verification agent

🧩 Analysis chain

Verify the ParseTextStream function implementation.

The ParseTextStream function is called but not visible in the provided code. Please ensure it's properly implemented and handles all edge cases.


🏁 Script executed:

#!/bin/bash
# Search for ParseTextStream implementation
ast-grep --pattern 'func ParseTextStream($$$) $$$'

Length of output: 8575


ParseTextStream Implementation Verified – Minor Header Format Check Suggested

The ParseTextStream function is implemented in internal/api/bulking/text_stream.go and appears to handle header processing, body reading, and scanner errors appropriately. However, please consider adding an explicit check to ensure that each header part (split on "=") produces both a key and a value before accessing parts2[1] to avoid potential index-out-of-range panics when malformed headers are encountered.

  • Location: internal/api/bulking/text_stream.go (lines 12–83)
  • Suggestion: Validate the result of strings.Split(part, "=") to ensure it returns at least two elements before processing the header key.
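A minimal sketch of the suggested check, assuming each header part has the form key=value. The helper name parseHeaderPart is hypothetical and is not taken from text_stream.go; it only illustrates returning an error instead of indexing past the end of the split result.

```go
package main

import (
	"fmt"
	"strings"
)

// parseHeaderPart is a hypothetical helper: it splits one "key=value" header
// part and reports an error instead of panicking when the "=" is missing.
func parseHeaderPart(part string) (string, string, error) {
	parts := strings.SplitN(part, "=", 2)
	if len(parts) != 2 || parts[0] == "" {
		return "", "", fmt.Errorf("malformed header part %q: expected key=value", part)
	}
	return parts[0], parts[1], nil
}

func main() {
	for _, part := range []string{"type=script", "malformed"} {
		key, value, err := parseHeaderPart(part)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("key=%s value=%s\n", key, value)
	}
}
```

Using SplitN with a limit of 2 keeps any further "=" characters inside the value rather than silently dropping them.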
🧹 Nitpick comments (5)
internal/api/bulking/handler_stream_json.go (1)

26-26: Consider configuring the JSON decoder.

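The default decoder configuration might not be optimal for streaming use cases. Note that encoding/json's Decoder exposes no size-limit setter, so consider bounding the request body instead to limit memory use from oversized input. The change below assumes the http.ResponseWriter is available as w at this point, and the 1 MiB cap on the whole body is only an example value.

-dec := json.NewDecoder(r.Body)
+dec := json.NewDecoder(http.MaxBytesReader(w, r.Body, 1024*1024)) // 1 MiB body limit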
internal/api/bulking/handler_stream_text_test.go (1)

72-73: Consider making timeout duration configurable.

The hardcoded 100ms timeout might be too aggressive for slower systems or debug scenarios.

+const testTimeout = 100 * time.Millisecond
+
 select {
 case <-send:
-case <-time.After(100 * time.Millisecond):
+case <-time.After(testTimeout):
     t.Fatal("should have received send channel")
 }

Also applies to: 81-82

internal/api/bulking/handler_stream_json_test.go (1)

83-85: Enhance response validation.

The test should validate the actual content of the response, not just its length.

 response, ok := api.DecodeSingleResponse[[]APIResult](t, w.Result().Body)
 require.True(t, ok)
 require.Len(t, response, testCase.expectScriptCount)
+for _, result := range response {
+    require.Equal(t, "CREATE_TRANSACTION", result.Action)
+    require.NotEmpty(t, result.Data)
+}
internal/api/v2/routes.go (1)

140-140: Consider extracting the magic number.

The bulk max size of 100 is hardcoded. Consider using the DefaultBulkMaxSize constant from router.go for consistency.

-	WithDefaultBulkHandlerFactories(100),
+	WithDefaultBulkHandlerFactories(DefaultBulkMaxSize),
test/e2e/api_bulk_test.go (1)

199-259: LGTM! Good test coverage for the new JSON stream functionality.

The test properly verifies the successful creation of transactions using the JSON stream bulk handler.

Consider adding test cases for:

  • Error handling when JSON is malformed
  • Behavior with empty stream
  • Handling of the bulk max size limit
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bf5b817 and 8f320b3.

📒 Files selected for processing (7)
  • internal/api/bulking/handler_stream_json.go (1 hunks)
  • internal/api/bulking/handler_stream_json_test.go (1 hunks)
  • internal/api/bulking/handler_stream_text.go (3 hunks)
  • internal/api/bulking/handler_stream_text_test.go (1 hunks)
  • internal/api/router.go (1 hunks)
  • internal/api/v2/routes.go (1 hunks)
  • test/e2e/api_bulk_test.go (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Tests
🔇 Additional comments (6)
internal/api/bulking/handler_stream_text.go (2)

8-15: LGTM! The struct definition is well-organized.

The TextStreamBulkHandler struct has a clear separation of concerns with fields for channel management, results tracking, and error handling.


23-48: LGTM! The goroutines are well-structured.

The implementation correctly:

  • Handles context cancellation
  • Uses proper channel closing with defer
  • Manages error propagation

Also applies to: 49-63

internal/api/bulking/handler_stream_json.go (1)

8-15: LGTM! The struct and factory implementations are clean.

The JSONStreamBulkHandler and its factory follow the same pattern as TextStreamBulkHandler, maintaining consistency in the codebase.

Also applies to: 76-77

internal/api/bulking/handler_stream_text_test.go (1)

55-55: LGTM! The handler instantiation is correctly updated.

The change from NewScriptStreamBulkHandler to NewTextStreamBulkHandler is consistent with the refactoring.

internal/api/router.go (1)

82-82: LGTM! Good refactoring.

The change simplifies the bulk handler factory configuration by using a default implementation, making the code more maintainable and reducing duplication.

internal/api/v2/routes.go (1)

129-135: LGTM! Well-structured bulk handler factories configuration.

The implementation:

  • Adds support for JSON stream content type
  • Maintains backward compatibility
  • Follows good practices by encapsulating factory logic

Comment on lines +33 to +38 (internal/api/bulking/handler_stream_json.go)
nextElement := &BulkElement{}
err := dec.Decode(nextElement)
if err != nil {
	h.err = err
	return
}

🛠️ Refactor suggestion

Enhance JSON error handling.

The current implementation doesn't distinguish between EOF and other JSON decoding errors. Consider adding specific handling for io.EOF to gracefully terminate the stream.

 nextElement := &BulkElement{}
 err := dec.Decode(nextElement)
 if err != nil {
+    if err == io.EOF {
+        return
+    }
     h.err = err
     return
 }

Comment on lines +27 to +36 (internal/api/bulking/handler_stream_json_test.go)
for _, testCase := range []testCase{
	{
		name: "nominal",
		stream: `
{"action": "CREATE_TRANSACTION", "data": {"postings": [{"source": "world", "amount": 100, "asset": "USD", "destination": "bank"}]}}
{"action": "CREATE_TRANSACTION", "data": {"postings": [{"source": "world", "amount": 200, "asset": "USD", "destination": "bank"}]}}
`,
		expectScriptCount: 2,
	},
} {

🛠️ Refactor suggestion

Add more test cases for error scenarios.

The test suite would benefit from additional test cases covering:

  • Invalid JSON syntax
  • Missing required fields
  • Malformed actions
  • Empty stream
 for _, testCase := range []testCase{
     {
         name: "nominal",
         stream: `
{"action": "CREATE_TRANSACTION", "data": {"postings": [{"source": "world", "amount": 100, "asset": "USD", "destination": "bank"}]}}
{"action": "CREATE_TRANSACTION", "data": {"postings": [{"source": "world", "amount": 200, "asset": "USD", "destination": "bank"}]}}
`,
         expectScriptCount: 2,
     },
+    {
+        name: "invalid_json",
+        stream: `{"invalid json`,
+        expectedError: true,
+        expectedStatusCode: http.StatusBadRequest,
+    },
+    {
+        name: "missing_action",
+        stream: `{"data": {}}`,
+        expectedError: true,
+        expectedStatusCode: http.StatusBadRequest,
+    },
 }
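To see what the proposed inputs do at the decoding layer alone, here is a small table-driven sketch using only encoding/json. The element type and decodeAll are hypothetical stand-ins, not the PR's types, and nothing here says how the actual handler maps these outcomes to HTTP status codes.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// element is a hypothetical stand-in for BulkElement.
type element struct {
	Action string          `json:"action"`
	Data   json.RawMessage `json:"data"`
}

// decodeAll decodes every JSON value in stream. A clean end of input
// (io.EOF) is treated as normal termination, not as an error.
func decodeAll(stream string) ([]element, error) {
	dec := json.NewDecoder(strings.NewReader(stream))
	var out []element
	for {
		var e element
		err := dec.Decode(&e)
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, e)
	}
}

func main() {
	cases := []struct{ name, stream string }{
		{"nominal", `{"action": "CREATE_TRANSACTION", "data": {}}` + "\n" +
			`{"action": "CREATE_TRANSACTION", "data": {}}`},
		{"empty_stream", ""},
		{"invalid_json", `{"invalid json`},
		{"missing_action", `{"data": {}}`},
	}
	for _, c := range cases {
		elements, err := decodeAll(c.stream)
		fmt.Printf("%s: %d element(s), err=%v\n", c.name, len(elements), err)
	}
}
```

With plain encoding/json, a missing field decodes to its zero value without an error, so a rejection of the missing_action case would have to come from validation after decoding; whether the handler performs that validation is not shown in this conversation.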

@gfyrag gfyrag force-pushed the feat/bulk-json-stream branch from 8f320b3 to 51cd04a Compare February 21, 2025 11:47

codecov bot commented Feb 21, 2025

Codecov Report

Attention: Patch coverage is 88.88889% with 7 lines in your changes missing coverage. Please review.

Project coverage is 81.67%. Comparing base (bf5b817) to head (c991ecb).
Report is 3 commits behind head on main.

Files with missing lines:

  • internal/api/bulking/handler_stream_json.go: patch coverage 89.58%, 5 lines missing ⚠️
  • internal/api/bulking/handler_stream_text.go: patch coverage 75.00%, 2 lines missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #701      +/-   ##
==========================================
+ Coverage   81.55%   81.67%   +0.11%     
==========================================
  Files         131      132       +1     
  Lines        7086     7137      +51     
==========================================
+ Hits         5779     5829      +50     
- Misses       1004     1006       +2     
+ Partials      303      302       -1     


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
internal/api/bulking/handler_stream_json.go (2)

8-15: Add a doc comment for clarity and maintainability.

Consider adding a top-level comment describing the concurrency model (channels, goroutines, etc.) that this struct implements, along with usage guidelines about reading from and closing channels. This helps future maintainers quickly grasp the intended usage.


64-70: Gracefully handle early termination scenario.

Currently, if r.Context() is canceled before <-h.terminated, the function exits without writing a response, possibly leaving clients without a final status. Consider sending a partial or error response to notify clients that the stream ended prematurely.

internal/api/bulking/handler_stream_text_test.go (1)

87-91: Consider increasing the timeout or handling slow I/O gracefully.

Relying on a fixed 100ms limit for channel closure might cause flakiness in slower environments. You could consider using a higher timeout or using a more robust synchronization approach (like a WaitGroup).

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f320b3 and 51cd04a.

📒 Files selected for processing (8)
  • internal/api/bulking/bulker_test.go (2 hunks)
  • internal/api/bulking/handler_stream_json.go (1 hunks)
  • internal/api/bulking/handler_stream_json_test.go (1 hunks)
  • internal/api/bulking/handler_stream_text.go (3 hunks)
  • internal/api/bulking/handler_stream_text_test.go (2 hunks)
  • internal/api/router.go (1 hunks)
  • internal/api/v2/routes.go (1 hunks)
  • test/e2e/api_bulk_test.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • internal/api/router.go
  • test/e2e/api_bulk_test.go
  • internal/api/v2/routes.go
  • internal/api/bulking/handler_stream_json_test.go
  • internal/api/bulking/handler_stream_text.go
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Tests
🔇 Additional comments (3)
internal/api/bulking/handler_stream_json.go (1)

33-38: Distinguish between EOF and other JSON decoding errors.

This mirrors a previously raised suggestion to differentiate between io.EOF (a normal stream shutdown) and other errors—e.g., partial decoding failures—to avoid marking an orderly end of stream as an error. For instance:

 if err != nil {
+    if errors.Is(err, io.EOF) {
+        return
+    }
     h.err = err
     return
 }
internal/api/bulking/handler_stream_text_test.go (1)

55-55: Constructor usage looks good.

The handler instantiation aligns with the updated naming convention.

internal/api/bulking/bulker_test.go (1)

108-110: Pointer usage for log ID is valid.

Using pointer.For(1) is a clear, explicit way to initialize the ID pointer. This ensures the test check is unambiguous about the expected ID value.

Comment on lines +23 to +44 (internal/api/bulking/handler_stream_json.go)
go func() {
	defer close(h.channel)

	dec := json.NewDecoder(r.Body)

	for {
		select {
		case <-r.Context().Done():
			return
		default:
			nextElement := &BulkElement{}
			err := dec.Decode(nextElement)
			if err != nil {
				h.err = err
				return
			}

			h.actions = append(h.actions, nextElement.GetAction())
			h.channel <- *nextElement
		}
	}
}()

⚠️ Potential issue

Address potential data race during concurrent writes to shared fields.

The fields actions, results, and err are appended or updated in separate goroutines. Meanwhile, Terminate() reads them after <-h.terminated. Although one goroutine closes h.terminated, there's no guarantee the decoding goroutine is also finished by that time. This can lead to a data race.

A common fix is to use synchronization (e.g., a sync.WaitGroup) to ensure all goroutines are done before reading shared state in Terminate(). For example:

 func (h *JSONStreamBulkHandler) GetChannels(...) {
+    h.wg.Add(2) // one per goroutine; wg is a new sync.WaitGroup field on the handler

     go func() {
+        defer h.wg.Done()
         // decode loop ...
     }()

     go func() {
+        defer h.wg.Done()
         // results loop ...
     }()
 }

 func (h *JSONStreamBulkHandler) Terminate(...) {
+    // wait for both goroutines to finish before writing the response
+    h.wg.Wait()
     writeJSONResponse(...)
 }

Also applies to: 46-59
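The WaitGroup approach can be sketched as a runnable program. Everything here is a simplified assumption: handler is a hypothetical stand-in for JSONStreamBulkHandler, strings replace the real element and result types, and process plays the role of the bulker that consumes elements and produces results.

```go
package main

import (
	"fmt"
	"sync"
)

// handler is a hypothetical, stripped-down stand-in for JSONStreamBulkHandler.
type handler struct {
	wg      sync.WaitGroup
	actions []string // written only by the producer goroutine
	results []string // written only by the collector goroutine
}

// start launches the producer and collector goroutines. It returns the
// channel of elements to process and the channel on which results are sent back.
func (h *handler) start(inputs []string) (<-chan string, chan<- string) {
	elements := make(chan string)
	results := make(chan string)

	h.wg.Add(2)
	go func() {
		defer h.wg.Done()
		defer close(elements)
		for _, in := range inputs {
			h.actions = append(h.actions, in)
			elements <- in
		}
	}()
	go func() {
		defer h.wg.Done()
		for r := range results {
			h.results = append(h.results, r)
		}
	}()
	return elements, results
}

// terminate waits for both goroutines before reading the shared slices.
func (h *handler) terminate() (int, int) {
	h.wg.Wait()
	return len(h.actions), len(h.results)
}

// process consumes every element, reports one result per element, and then
// asks the handler for the final counts.
func process(h *handler, inputs []string) (int, int) {
	elements, results := h.start(inputs)
	for e := range elements {
		results <- "ok:" + e
	}
	close(results)
	return h.terminate()
}

func main() {
	actions, results := process(&handler{}, []string{"a", "b", "c"})
	fmt.Println("actions:", actions, "results:", results)
}
```

Because terminate reads the slices only after Wait returns, each slice has a single writer that is guaranteed to have finished, which is the property the review comment asks for.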

@gfyrag gfyrag force-pushed the feat/bulk-json-stream branch from 51cd04a to c991ecb Compare February 21, 2025 11:52
@gfyrag gfyrag added this pull request to the merge queue Feb 25, 2025
Merged via the queue into main with commit fdd78b5 Feb 25, 2025
10 checks passed
@gfyrag gfyrag deleted the feat/bulk-json-stream branch February 25, 2025 12:55
@coderabbitai coderabbitai bot mentioned this pull request Feb 26, 2025
2 participants