Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

where_object_changes returns more records than expected #803

Closed
jaredbeck opened this issue May 11, 2016 · 3 comments
Closed

where_object_changes returns more records than expected #803

jaredbeck opened this issue May 11, 2016 · 3 comments

Comments

@jaredbeck
Copy link
Member

jaredbeck commented May 11, 2016

The readme says:

Known issue: inconsistent results for numeric values due to limitations of SQL wildcard matchers against the serialized objects.

I asked Ben about this in https://github.com/airblade/paper_trail/pull/801/files

does this "known issue" still exist, re: inconsistent results for numeric values? In the YAML serializer, the value is succeeded by a newline, and in the JSON serializer, care has been taken to succeed the value by a comma or brace.

Yes I believe this issue does still exist. I'm not super knowledgable on wildcard matchers, but I did some investigation and this dates back to 1ad7838, which I believe shows evidence of issues when you are using numbers such as 10, 100, 1000, etc, since I think certain flavors of SQL have difficulty matching on those values.

What should our next step be in troubleshooting this? Can we get a failing test by reverting 1ad7838, or should we write a new test?

jaredbeck added a commit that referenced this issue Jul 8, 2017
Minor improvements and additions to tests.

Trying to reproduce issue #803.
jaredbeck added a commit that referenced this issue Jul 8, 2017
Minor improvements and additions to tests.

Trying to reproduce issue #803.
jaredbeck added a commit that referenced this issue Jul 9, 2017
Minor improvements and additions to tests.

Trying to reproduce issue #803.
@jaredbeck
Copy link
Member Author

Bug

with_object_changes matches more records than it should when json is stored in a text column. This is not the default configuration, and there's no way to know how few people are affected by this bug.

Reproduction

This bug can be reproduced in the existing test suite by reverting part of 1ad7838.

# spec/models/version_spec.rb
+          let(:int) { 1 }
-          let(:int) { rand(5) + 2 }

This patch will cause with_object_changes to return more records than expected and the tests will fail.

Analysis

Consider where_object_changes(an_integer: 1), which generates the following SQL:

"versions"."object_changes" ILIKE '%"an_integer":[1,%' 
OR "versions"."object_changes" ILIKE '%"an_integer":[%,1]%'

See the problem yet? It's in the second half of the disjunction. Think about what those wildcards might match. What about a version record with object_changes like this?

{"name":[null,"Retta"],"an_integer":[null,0],"id":[null,1]}

We don't want to match this record. We don't want where_object_changes(an_integer: 1) to return this record. You see why it does though, right? That % wildcard matches everything between "an_integer":[ and the 1] at the very end.

Suggestions

I don't know how to fix this, so I might add a runtime warning to the method asking people to come here and join the discussion. Here are some ideas to get the conversation going.

Have where_object_changes raise an error if object_changes is a text column

The simplest solution. I think it works fine for json and jsonb columns, so there's no need to remove the where_object_changes method entirely.

A Fix for Postgres only, using ::json->

--- a/lib/paper_trail/serializers/json.rb
+++ b/lib/paper_trail/serializers/json.rb
@@ -39,8 +39,16 @@ module PaperTrail
         json_value = value.to_json
 
         # Need to check first (before) and secondary (after) fields
-        arel_field.matches("%\"#{field}\":[#{json_value},%").
-          or(arel_field.matches("%\"#{field}\":[%,#{json_value}]%"))
+        column_reference = format("%s.%s", arel_field.relation.name, arel_field.name)
+        ::Arel::Nodes::SqlLiteral.new(
+          "(#{column_reference}::json->'#{field}')::text ilike '[#{json_value},%]' " \
+          "or (#{column_reference}::json->'#{field}')::text ilike '[%,#{json_value}]'"
+        )

The above passes the tests, but it's not usable as is, of course, because it's postgres-only and more importantly it has SQL-injection vulnerabilities.

Use regular expressions instead of the like operator

arel has a matches_regexp these days, but might not be supported in rails 4, sqlite, or mysql. Also, it's hard to come up with a perfect regular expression for this situation.

@jaredbeck jaredbeck changed the title where_object and where_object_changes do not handle integers where_object_changes returns more records than expected Jul 9, 2017
jaredbeck added a commit that referenced this issue Jul 9, 2017
jaredbeck added a commit that referenced this issue Jul 9, 2017
@jaredbeck
Copy link
Member Author

I don't know how to fix this, so I might add a runtime warning to the method asking people to come here and join the discussion.

I added the warning (in #979), it's been two months and 47,000 gem downloads, and no one has joined the discussion here, so I'll probably change the warning to an error in the next major release unless I hear from someone with a better idea.

@jaredbeck
Copy link
Member Author

Closed by #993. To be released as 8.0.0.

sergey-alekseev added a commit to sergey-alekseev/paper_trail that referenced this issue Sep 29, 2017
History
---
[The first problem mention](paper-trail-gem@1ad7838).
[The major issue](paper-trail-gem#803) with relevant discussions.

Details
---
I see the problem only with one specific case – when the method is `where_object_changes` and the serializer is YAML. Which is a default configuration.

I see there are 6 types of queries with a numeric value for `where_object_changes` and `where_object` (grabbed from `version_spec.rb`):
```
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND (object_changes @> '{"an_integer":[1]}')  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND ("versions"."object_changes" ILIKE '%
an_integer:
- 1
%' OR "versions"."object_changes" ILIKE '%
an_integer:
-%
- 1
%')  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND (((object_changes->>'an_integer' ILIKE '[1,%') OR (object_changes->>'an_integer' ILIKE '[%,1]%')))  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

SELECT "versions".* FROM "versions" WHERE (object @> '{"an_integer":1}')

SELECT "versions".* FROM "versions" WHERE (object->>'an_integer' = '1')

SELECT "versions".* FROM "versions" WHERE ("versions"."object" ILIKE '%
an_integer: 1
%')
```

The only problematic one is `paper-trail-gem#2`. It incorrectly matches `object_changes` like
```
---
name:
-
- Kimberli
an_integer:
-
- 0
created_at:
-
- 2017-09-28 18:30:13.369889000 Z
updated_at:
-
- 2017-09-28 18:30:13.369889000 Z
id:
-
- 1
---
name:
- foobar
- Hayes
an_integer:
- 77
- 1
updated_at:
- 2017-09-28 18:30:13.383966000 Z
- 2017-09-28 18:30:13.395539000 Z
```
sergey-alekseev added a commit to sergey-alekseev/paper_trail that referenced this issue Oct 8, 2017
History
---
[The first problem mention](paper-trail-gem@1ad7838).
[The major issue](paper-trail-gem#803) with relevant discussions.

Details
---
I see the problem only with one specific case – when the method is `where_object_changes` and the serializer is YAML. Which is a default configuration.

I see there are 6 types of queries with a numeric value for `where_object_changes` and `where_object` (grabbed from `version_spec.rb`):
```
.where_object_changes
N1
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND (object_changes @> '{"an_integer":[1]}')  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

N1
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND ("versions"."object_changes" ILIKE '%
an_integer:
- 1
%' OR "versions"."object_changes" ILIKE '%
an_integer:
-%
- 1
%')  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

N3
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND (((object_changes->>'an_integer' ILIKE '[1,%') OR (object_changes->>'an_integer' ILIKE '[%,1]%')))  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

.where_object
N4
SELECT "versions".* FROM "versions" WHERE (object @> '{"an_integer":1}')

N5
SELECT "versions".* FROM "versions" WHERE (object->>'an_integer' = '1')

N6
SELECT "versions".* FROM "versions" WHERE ("versions"."object" ILIKE '%
an_integer: 1
%')
```

The only problematic one is N2. It incorrectly matches `object_changes` like
```
---
name:
-
- Kimberli
an_integer:
-
- 0
created_at:
-
- 2017-09-28 18:30:13.369889000 Z
updated_at:
-
- 2017-09-28 18:30:13.369889000 Z
id:
-
- 1
---
name:
- foobar
- Hayes
an_integer:
- 77
- 1
updated_at:
- 2017-09-28 18:30:13.383966000 Z
- 2017-09-28 18:30:13.395539000 Z
```
sergey-alekseev added a commit to sergey-alekseev/paper_trail that referenced this issue Oct 22, 2017
History
---
[The first problem mention](paper-trail-gem@1ad7838).
[The major issue](paper-trail-gem#803) with relevant discussions.

Details
---
I see the problem only with one specific case – when the method is `where_object_changes` and the serializer is YAML. Which is a default configuration.

I see there are 6 types of queries with a numeric value for `where_object_changes` and `where_object` (grabbed from `version_spec.rb`):
```
.where_object_changes
N1
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND (object_changes @> '{"an_integer":[1]}')  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

N1
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND ("versions"."object_changes" ILIKE '%
an_integer:
- 1
%' OR "versions"."object_changes" ILIKE '%
an_integer:
-%
- 1
%')  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

N3
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND (((object_changes->>'an_integer' ILIKE '[1,%') OR (object_changes->>'an_integer' ILIKE '[%,1]%')))  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

.where_object
N4
SELECT "versions".* FROM "versions" WHERE (object @> '{"an_integer":1}')

N5
SELECT "versions".* FROM "versions" WHERE (object->>'an_integer' = '1')

N6
SELECT "versions".* FROM "versions" WHERE ("versions"."object" ILIKE '%
an_integer: 1
%')
```

The only problematic one is N2. It incorrectly matches `object_changes` like
```
---
name:
-
- Kimberli
an_integer:
-
- 0
created_at:
-
- 2017-09-28 18:30:13.369889000 Z
updated_at:
-
- 2017-09-28 18:30:13.369889000 Z
id:
-
- 1
---
name:
- foobar
- Hayes
an_integer:
- 77
- 1
updated_at:
- 2017-09-28 18:30:13.383966000 Z
- 2017-09-28 18:30:13.395539000 Z
```
jaredbeck pushed a commit that referenced this issue Oct 23, 2017
* clarify the issue with `where_object_changes` for numeric values

History
---
[The first problem mention](1ad7838).
[The major issue](#803) with relevant discussions.

Details
---
I see the problem only with one specific case – when the method is `where_object_changes` and the serializer is YAML. Which is a default configuration.

I see there are 6 types of queries with a numeric value for `where_object_changes` and `where_object` (grabbed from `version_spec.rb`):
```
.where_object_changes
N1
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND (object_changes @> '{"an_integer":[1]}')  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

N1
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND ("versions"."object_changes" ILIKE '%
an_integer:
- 1
%' OR "versions"."object_changes" ILIKE '%
an_integer:
-%
- 1
%')  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

N3
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND (((object_changes->>'an_integer' ILIKE '[1,%') OR (object_changes->>'an_integer' ILIKE '[%,1]%')))  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

.where_object
N4
SELECT "versions".* FROM "versions" WHERE (object @> '{"an_integer":1}')

N5
SELECT "versions".* FROM "versions" WHERE (object->>'an_integer' = '1')

N6
SELECT "versions".* FROM "versions" WHERE ("versions"."object" ILIKE '%
an_integer: 1
%')
```

The only problematic one is N2. It incorrectly matches `object_changes` like
```
---
name:
-
- Kimberli
an_integer:
-
- 0
created_at:
-
- 2017-09-28 18:30:13.369889000 Z
updated_at:
-
- 2017-09-28 18:30:13.369889000 Z
id:
-
- 1
---
name:
- foobar
- Hayes
an_integer:
- 77
- 1
updated_at:
- 2017-09-28 18:30:13.383966000 Z
- 2017-09-28 18:30:13.395539000 Z
```
aried3r pushed a commit to aried3r/paper_trail that referenced this issue Dec 14, 2020
aried3r pushed a commit to aried3r/paper_trail that referenced this issue Dec 14, 2020
* clarify the issue with `where_object_changes` for numeric values

History
---
[The first problem mention](paper-trail-gem@1ad7838).
[The major issue](paper-trail-gem#803) with relevant discussions.

Details
---
I see the problem only with one specific case – when the method is `where_object_changes` and the serializer is YAML. Which is a default configuration.

I see there are 6 types of queries with a numeric value for `where_object_changes` and `where_object` (grabbed from `version_spec.rb`):
```
.where_object_changes
N1
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND (object_changes @> '{"an_integer":[1]}')  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

N1
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND ("versions"."object_changes" ILIKE '%
an_integer:
- 1
%' OR "versions"."object_changes" ILIKE '%
an_integer:
-%
- 1
%')  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

N3
SELECT "versions".* FROM "versions" WHERE "versions"."item_id" = 1 AND "versions"."item_type" = 'Widget' AND (((object_changes->>'an_integer' ILIKE '[1,%') OR (object_changes->>'an_integer' ILIKE '[%,1]%')))  ORDER BY "versions"."created_at" ASC, "versions"."id" ASC

.where_object
N4
SELECT "versions".* FROM "versions" WHERE (object @> '{"an_integer":1}')

N5
SELECT "versions".* FROM "versions" WHERE (object->>'an_integer' = '1')

N6
SELECT "versions".* FROM "versions" WHERE ("versions"."object" ILIKE '%
an_integer: 1
%')
```

The only problematic one is N2. It incorrectly matches `object_changes` like
```
---
name:
-
- Kimberli
an_integer:
-
- 0
created_at:
-
- 2017-09-28 18:30:13.369889000 Z
updated_at:
-
- 2017-09-28 18:30:13.369889000 Z
id:
-
- 1
---
name:
- foobar
- Hayes
an_integer:
- 77
- 1
updated_at:
- 2017-09-28 18:30:13.383966000 Z
- 2017-09-28 18:30:13.395539000 Z
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant