Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: sanitize messageID from \u0000 and irregular utf8 runes #4063

Merged
merged 8 commits into from
Nov 3, 2023

Conversation

lvrach
Copy link
Member

@lvrach lvrach commented Oct 31, 2023

Description

Issue

An incident in production was caused due to \u0000 being passed as a "messageId" by the client. Causing the messageId to be stored as empty in gateway db. Even if we check for empty IDs and populate if missing.

It happened because:

  1. When we check for messageId we pass it via strings.TrimSpace and check if empty. \u0000 was not trimmed and it was not empty
  2. No new uuid was generated, we passed \u0000 to the database
  3. Postgres doesn't accept \u0000 into JSONB column and we get his error
rudder-server-db-1           | 2023-10-31 19:22:33.416 UTC [245] ERROR:  unsupported Unicode escape sequence
rudder-server-db-1           | 2023-10-31 19:22:33.416 UTC [245] DETAIL:  \u0000 cannot be converted to text.
rudder-server-db-1           | 2023-10-31 19:22:33.416 UTC [245] CONTEXT:  JSON data, line 1: ...","integrations":{"All":true},"messageId":"\u0000...
  1. We have a mechanism that captures the error and sanizes the JSON payload
  2. \u0000 gets remove and the payload was stored to db with empty messageId.
  3. badger DB complained when trying to store with an empty key and we panicked:
2023/10/31 14:13:41 notifying bugsnag: Key cannot be empty
panic: Key cannot be empty [recovered]
   panic: Key cannot be empty [recovered]
   panic: Key cannot be empty

github.com/rudderlabs/rudder-server/services/dedup.(*badgerDB).Get(0x2fa3840?, {0x0?, 0x0?})
   /rudder-server/services/dedup/badger.go:41 +0xce
github.com/rudderlabs/rudder-server/services/dedup.(*dedup).Set(0xc00270c1e0, {{0x0?, 0xc0026e6800?}, 0x2?})
   /rudder-server/services/dedup/dedup.go:100 +0xee
github.com/rudderlabs/rudder-server/processor.(*Handle).processJobsForDest(0xc000868c00, {0xc0013c4080?, 0x1b?}, {{0xc020088000, 0x7d0, 0x2000}, 0x1, {0x3d46960, 0xc00101ab80}})
   /rudder-server/processor/processor.go:1501 +0x272d

Solution

Proper sanitization when checking themessageId.

The simple solution was to remove just the \u0000. However, I wanted to handle other issues associated with UTF8, and looked into similar issues and how we can better sanitize our input.

I started with unicode.IsPrint, but testing under a list of invisible characters I figured there were some gaps, thus https://invisible-characters.com was introduced.

Invisible characters, with or without malicious intent can confuse when debugging. Also, they are not handled properly by common string manipulation functions. Given the importance of the messageId, it is better to clean it properly even if it requires some extra complexity and CPU cycles. For the same reasons, I've moved the sanitization function under utils.

An alternative solution is to accept a much stricter format, uuid for example. Or as an optimisation test that is a valid uuid before trying to sanitise. However, this is not a costly operation and almost all of the time no mutation is required (thus no memory allocation.

benchmark

BenchmarkMessageID/in-place_for_loop_-_dirty-10                 14419621                84.58 ns/op            0 B/op          0 allocs/op
BenchmarkMessageID/in-place_for_loop_-_dirty-10                 14321311                83.38 ns/op            0 B/op          0 allocs/op
BenchmarkMessageID/in-place_for_loop_-_dirty-10                 14148620                83.31 ns/op            0 B/op          0 allocs/op
BenchmarkMessageID/in-place_for_loop_-_proper-10                14772222                80.95 ns/op            0 B/op          0 allocs/op
BenchmarkMessageID/in-place_for_loop_-_proper-10                14782982                81.37 ns/op            0 B/op          0 allocs/op
BenchmarkMessageID/in-place_for_loop_-_proper-10                14586229                83.10 ns/op            0 B/op          0 allocs/op
BenchmarkMessageID/strings_map_-_dirty-10                        5674894               210.4 ns/op            48 B/op          1 allocs/op
BenchmarkMessageID/strings_map_-_dirty-10                        5623113               212.4 ns/op            48 B/op          1 allocs/op
BenchmarkMessageID/strings_map_-_dirty-10                        5839114               205.6 ns/op            48 B/op          1 allocs/op
BenchmarkMessageID/strings_map_-_proper-10                       6251143               190.1 ns/op             0 B/op          0 allocs/op
BenchmarkMessageID/strings_map_-_proper-10                       6238122               190.0 ns/op             0 B/op          0 allocs/op
BenchmarkMessageID/strings_map_-_proper-10                       6308074               189.7 ns/op             0 B/op          0 allocs/op

Linear Ticket

https://linear.app/rudderstack/issue/PIPE-468/sanitize-message-id

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@lvrach lvrach changed the title fix: sanitize messageID to avoid \u0000 and irregular utf8 runes fix: sanitize messageID to remove \u0000 and irregular utf8 runes Oct 31, 2023
Copy link

codecov bot commented Oct 31, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (8dd4ab9) 71.74% compared to head (f5c419e) 71.74%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4063      +/-   ##
==========================================
- Coverage   71.74%   71.74%   -0.01%     
==========================================
  Files         373      374       +1     
  Lines       54905    54919      +14     
==========================================
+ Hits        39389    39399      +10     
- Misses      13192    13194       +2     
- Partials     2324     2326       +2     
Files Coverage Δ
gateway/handle.go 89.06% <100.00%> (+0.08%) ⬆️
utils/misc/unicode.go 100.00% <100.00%> (ø)

... and 11 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@lvrach lvrach changed the title fix: sanitize messageID to remove \u0000 and irregular utf8 runes fix: sanitize messageID from \u0000 and irregular utf8 runes Nov 1, 2023
@lvrach lvrach marked this pull request as ready for review November 1, 2023 08:18
gateway/handle.go Outdated Show resolved Hide resolved
@lvrach lvrach requested a review from atzoum November 2, 2023 07:52
go.mod Outdated
@@ -290,7 +290,7 @@ require (
golang.org/x/net v0.17.0 // indirect
golang.org/x/sys v0.13.0 // indirect
golang.org/x/term v0.13.0 // indirect
golang.org/x/text v0.13.0 // indirect
golang.org/x/text v0.13.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[minor[:
We can probably move this to direct dependencies?

@lvrach lvrach merged commit 408046d into master Nov 3, 2023
35 checks passed
@lvrach lvrach deleted the fix/pipe-468-sanitizeMesssageId branch November 3, 2023 08:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants