Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

object_changes benchmarks with text, json and jsonb as column types #964

Closed
aried3r opened this issue May 18, 2017 · 16 comments
Closed

object_changes benchmarks with text, json and jsonb as column types #964

aried3r opened this issue May 18, 2017 · 16 comments

Comments

@aried3r
Copy link
Member

aried3r commented May 18, 2017

Following up on the discussion here I tried to to some benchmarking.

Using this code I got the following results:

Warming up --------------------------------------
     PaperTrail text   550.000  i/100ms
     PaperTrail json   418.000  i/100ms
    PaperTrail jsonb   462.000  i/100ms
  PaperTrail jsonb i   472.000  i/100ms
  ActiveRecord jsonb   538.000  i/100ms
ActiveRecord jsonb i   526.000  i/100ms
Calculating -------------------------------------
     PaperTrail text      5.530k (± 9.0%) i/s -     27.500k in   5.015177s
     PaperTrail json      4.203k (± 8.6%) i/s -     20.900k in   5.008627s
    PaperTrail jsonb      4.575k (± 7.6%) i/s -     23.100k in   5.078690s
  PaperTrail jsonb i      4.451k (±12.4%) i/s -     22.184k in   5.102453s
  ActiveRecord jsonb      5.243k (± 8.7%) i/s -     26.362k in   5.068018s
ActiveRecord jsonb i      5.241k (± 8.3%) i/s -     26.300k in   5.053713s

Comparison:
     PaperTrail text:     5529.8 i/s
  ActiveRecord jsonb:     5242.9 i/s - same-ish: difference falls within error
ActiveRecord jsonb i:     5240.8 i/s - same-ish: difference falls within error
    PaperTrail jsonb:     4574.9 i/s - 1.21x  slower
  PaperTrail jsonb i:     4451.0 i/s - 1.24x  slower
     PaperTrail json:     4202.8 i/s - 1.32x  slower

Which I think is strange. The differences are very slim and the text column type provides the fastest result.

I'm not sure if I'm doing anything wrong in the setup or if the samples are too small (both in amount and size of the JSON stored) or anything. But I thought I'd go ahead and maybe someone can explain the results better.

@jaredbeck
Copy link
Member

Interesting. You might try adding an index specifically on the name: CREATE INDEX ON jsonb_with_index_versions((object_changes->>'name')).

@aried3r
Copy link
Member Author

aried3r commented May 19, 2017

I've updated the gist with the index and tried with 100.000 records:

Warming up --------------------------------------
     PaperTrail text   536.000  i/100ms
     PaperTrail json   436.000  i/100ms
    PaperTrail jsonb   453.000  i/100ms
  PaperTrail jsonb i   482.000  i/100ms
  ActiveRecord jsonb   506.000  i/100ms
ActiveRecord jsonb i   525.000  i/100ms
Calculating -------------------------------------
     PaperTrail text      5.877k (±14.6%) i/s -     28.944k in   5.045688s
     PaperTrail json      4.475k (±14.8%) i/s -     22.236k in   5.094213s
    PaperTrail jsonb      4.588k (±15.6%) i/s -     22.650k in   5.073612s
  PaperTrail jsonb i      4.608k (±14.4%) i/s -     23.136k in   5.139816s
  ActiveRecord jsonb      5.362k (±14.3%) i/s -     26.312k in   5.018899s
ActiveRecord jsonb i      5.381k (±14.8%) i/s -     26.775k in   5.099043s

Comparison:
     PaperTrail text:     5876.9 i/s
ActiveRecord jsonb i:     5381.3 i/s - same-ish: difference falls within error
  ActiveRecord jsonb:     5362.2 i/s - same-ish: difference falls within error
  PaperTrail jsonb i:     4607.7 i/s - same-ish: difference falls within error
    PaperTrail jsonb:     4588.1 i/s - same-ish: difference falls within error
     PaperTrail json:     4475.1 i/s - same-ish: difference falls within error

@jaredbeck
Copy link
Member

Hmm, so the more specific index barely has any effect. I'm really surprised! I'll be curious to hear what other people make of these findings. Thanks Anton.

@aried3r
Copy link
Member Author

aried3r commented May 19, 2017

Whoops, I just noticed I made a big mistake while using activerecord-import. I'm now creating 100.000 records with one version instead of one record with 100.000 versions.

Will update on Monday!

@jaredbeck
Copy link
Member

Hi Anton, are you still working on these benchmarks? I'd love to see a summary of your findings make it into the readme eventually (Section 6.b. Custom Serializer?).

@aried3r
Copy link
Member Author

aried3r commented Jul 6, 2017

Hey! Sorry for letting this go like this. I'll try to have some new results up by the start of next week if that's ok?

@aried3r
Copy link
Member Author

aried3r commented Jul 21, 2017

I updated the benchmark, but it has become very inefficient since I only hacked it together very quickly. As you can see in the Gist, I create a lot of hashes, resulting in an immense use of memory. But otherwise creating so many versions is unbearably slow. Using raw SQL would probably yield better results at this point.

Anyway, here are the results. I can't seem to shake the feeling that I'm doing something very wrong. At least to me the results are very surprising, because I can reproduce the results mentioned in this article.

Warming up --------------------------------------
     PaperTrail text   580.000  i/100ms
     PaperTrail json   442.000  i/100ms
    PaperTrail jsonb   458.000  i/100ms
  PaperTrail jsonb i   475.000  i/100ms
  ActiveRecord jsonb   576.000  i/100ms
ActiveRecord jsonb i   591.000  i/100ms
Calculating -------------------------------------
     PaperTrail text      6.623k (± 4.7%) i/s -     33.060k in   5.002984s
     PaperTrail json      4.668k (± 6.1%) i/s -     23.426k in   5.038770s
    PaperTrail jsonb      5.115k (± 5.4%) i/s -     25.648k in   5.029418s
  PaperTrail jsonb i      5.054k (± 5.7%) i/s -     25.175k in   5.001075s
  ActiveRecord jsonb      5.410k (±11.1%) i/s -     27.072k in   5.090030s
ActiveRecord jsonb i      5.395k (± 9.0%) i/s -     27.186k in   5.086443s

Comparison:
     PaperTrail text:     6623.1 i/s
  ActiveRecord jsonb:     5410.2 i/s - 1.22x  slower
ActiveRecord jsonb i:     5394.5 i/s - 1.23x  slower
    PaperTrail jsonb:     5114.9 i/s - 1.29x  slower
  PaperTrail jsonb i:     5053.5 i/s - 1.31x  slower
     PaperTrail json:     4668.3 i/s - 1.42x  slower

I'll give it another shot at some point, probably rewrite the benchmark.

@jaredbeck
Copy link
Member

Yeah, I'm also surprised. I'd expect json to be faster than text, but I guess the -> and @> operators add some expense. The postgres query planner agrees:

create table z (
  col_text text,
  col_json json,
  col_jsonb jsonb,
  col_jsonbi jsonb
);
create index ix_z_col_jsonbi on z using gin ((col_jsonbi->'name'));

explain select * from z where col_text like '%banana%';
                     QUERY PLAN                     
----------------------------------------------------
 Seq Scan on z  (cost=0.00..16.50 rows=1 width=128)
   Filter: (col_text ~~ '%banana%'::text)
(2 rows)

explain select * from z where (col_json->'name')::text like '%banana%';
                             QUERY PLAN                             
--------------------------------------------------------------------
 Seq Scan on z  (cost=0.00..20.40 rows=1 width=128)
   Filter: (((col_json -> 'name'::text))::text ~~ '%banana%'::text)
(2 rows)

jared=# explain select * from z where (col_jsonb->'name')::text like '%banana%';
                             QUERY PLAN                              
---------------------------------------------------------------------
 Seq Scan on z  (cost=0.00..20.40 rows=1 width=128)
   Filter: (((col_jsonb -> 'name'::text))::text ~~ '%banana%'::text)
(2 rows)

The key observation here is that the max. estimated cost is 16.50 for text and 20.40 for json. It's not a benchmark, just an estimate on an empty table, so take that with a grain of salt.

However, our use of the like operator on text columns has always struck me as error prone. I recently discovered a bug, in fact: #803 (comment)

So, I'd recommend using json/b anyway, even if it is slower.

I don't see any problems with your benchmark, so I think we're ready to summarize our findings and put them in the readme. Can you take a stab at that, please?

@seanlinsley
Copy link
Member

Trying to query a JSON string using like reminds me of this famous SO post; it's always going to be a hack.

The move to JSON is only really valuable for correctness in your queries.

JSONB only really shines when you want to do more advanced things like:

  • querying a nested array to check partial containment with @>
  • checking whether one or many keys do or don't exist in object_changes
  • building fast, complex indexes

@seanlinsley
Copy link
Member

jsonb_path_ops is a faster GIN index by only supporting @>.

When dealing with large amounts of data, another way to make queries fast is by creating an index for each item_type that only indexes the JSON keys in object_changes that you know you'll be querying (or by excluding the ones you won't).

@seanlinsley
Copy link
Member

It's surprising that the numbers are all so similar. You may want to add an index to aid the like queries on both the YAML and JSON versions.

Since there's only one record for each table, with 100k versions for that single record, the indexes on item_type and item_id probably aren't doing anything.

You may want to analyze the database before running these queries, to tell Postgres to update its internal statistics.

How much memory does your computer have? Benchmark.ips could be causing Postgres to cache the data in memory, deciding that that's easier than actually using the index.

Hmm, actually, 5k iterations a second? That's 5 database queries every millisecond, which seems unlikely (this is Postgres, not Redis). Are you sure you don't need to call count on the ActiveRecord query objects inside of Benchmark.ips?

@jaredbeck
Copy link
Member

Stellar analysis, thanks Sean! It sounds like we have some more experimentation to do.

Trying to query a JSON string using like reminds me of this famous SO post; it's always going to be a hack.

Yes! I'm worried there are more bugs in where_object and where_object_changes that we haven't found yet. Did you see the one I found recently? (#803 (comment)) Nefarious!

The move to JSON is only really valuable for correctness in your queries.

Yeah, that's why I would recommend json/b.

@aried3r
Copy link
Member Author

aried3r commented Aug 2, 2017

@seanlinsley, thanks for the explanation! Would you be up for adapting the gist? It seems you have more insight into the whole JSON/JSONB column type than I do :)

@jaredbeck
Copy link
Member

This has been a good discussion. Thanks Anton and Sean. I'll close this for now, and maybe we can use it as the basis for some more formal benchmarks in the future.

@ziaulrehman40
Copy link

Its been quiet some time, and postgres has built a lot on jsonb columns and their performance, i wonder if there are any changes in the benchmarks now? 🤔

@aried3r
Copy link
Member Author

aried3r commented Nov 5, 2021

@ziaulrehman40, I think I did the benchmarks wrong. Like @seanlinsley mentioned I should have analyzed before.

Having said that, we've been using JSONB ever since with paper_trail and in other places and have not faced any performance problems for our use cases, given the right indexes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants