object_changes benchmarks with text, json and jsonb as column types #964

aried3r · 2017-05-18T19:01:07Z

Following up on the discussion here I tried to to some benchmarking.

Using this code I got the following results:

Warming up --------------------------------------
     PaperTrail text   550.000  i/100ms
     PaperTrail json   418.000  i/100ms
    PaperTrail jsonb   462.000  i/100ms
  PaperTrail jsonb i   472.000  i/100ms
  ActiveRecord jsonb   538.000  i/100ms
ActiveRecord jsonb i   526.000  i/100ms
Calculating -------------------------------------
     PaperTrail text      5.530k (± 9.0%) i/s -     27.500k in   5.015177s
     PaperTrail json      4.203k (± 8.6%) i/s -     20.900k in   5.008627s
    PaperTrail jsonb      4.575k (± 7.6%) i/s -     23.100k in   5.078690s
  PaperTrail jsonb i      4.451k (±12.4%) i/s -     22.184k in   5.102453s
  ActiveRecord jsonb      5.243k (± 8.7%) i/s -     26.362k in   5.068018s
ActiveRecord jsonb i      5.241k (± 8.3%) i/s -     26.300k in   5.053713s

Comparison:
     PaperTrail text:     5529.8 i/s
  ActiveRecord jsonb:     5242.9 i/s - same-ish: difference falls within error
ActiveRecord jsonb i:     5240.8 i/s - same-ish: difference falls within error
    PaperTrail jsonb:     4574.9 i/s - 1.21x  slower
  PaperTrail jsonb i:     4451.0 i/s - 1.24x  slower
     PaperTrail json:     4202.8 i/s - 1.32x  slower

Which I think is strange. The differences are very slim and the text column type provides the fastest result.

I'm not sure if I'm doing anything wrong in the setup or if the samples are too small (both in amount and size of the JSON stored) or anything. But I thought I'd go ahead and maybe someone can explain the results better.

The text was updated successfully, but these errors were encountered:

jaredbeck · 2017-05-18T20:30:23Z

Interesting. You might try adding an index specifically on the name: CREATE INDEX ON jsonb_with_index_versions((object_changes->>'name')).

aried3r · 2017-05-19T16:00:51Z

I've updated the gist with the index and tried with 100.000 records:

Warming up --------------------------------------
     PaperTrail text   536.000  i/100ms
     PaperTrail json   436.000  i/100ms
    PaperTrail jsonb   453.000  i/100ms
  PaperTrail jsonb i   482.000  i/100ms
  ActiveRecord jsonb   506.000  i/100ms
ActiveRecord jsonb i   525.000  i/100ms
Calculating -------------------------------------
     PaperTrail text      5.877k (±14.6%) i/s -     28.944k in   5.045688s
     PaperTrail json      4.475k (±14.8%) i/s -     22.236k in   5.094213s
    PaperTrail jsonb      4.588k (±15.6%) i/s -     22.650k in   5.073612s
  PaperTrail jsonb i      4.608k (±14.4%) i/s -     23.136k in   5.139816s
  ActiveRecord jsonb      5.362k (±14.3%) i/s -     26.312k in   5.018899s
ActiveRecord jsonb i      5.381k (±14.8%) i/s -     26.775k in   5.099043s

Comparison:
     PaperTrail text:     5876.9 i/s
ActiveRecord jsonb i:     5381.3 i/s - same-ish: difference falls within error
  ActiveRecord jsonb:     5362.2 i/s - same-ish: difference falls within error
  PaperTrail jsonb i:     4607.7 i/s - same-ish: difference falls within error
    PaperTrail jsonb:     4588.1 i/s - same-ish: difference falls within error
     PaperTrail json:     4475.1 i/s - same-ish: difference falls within error

jaredbeck · 2017-05-19T16:52:12Z

Hmm, so the more specific index barely has any effect. I'm really surprised! I'll be curious to hear what other people make of these findings. Thanks Anton.

aried3r · 2017-05-19T17:50:58Z

Whoops, I just noticed I made a big mistake while using activerecord-import. I'm now creating 100.000 records with one version instead of one record with 100.000 versions.

Will update on Monday!

jaredbeck · 2017-06-26T15:35:56Z

Hi Anton, are you still working on these benchmarks? I'd love to see a summary of your findings make it into the readme eventually (Section 6.b. Custom Serializer?).

aried3r · 2017-07-06T09:02:01Z

Hey! Sorry for letting this go like this. I'll try to have some new results up by the start of next week if that's ok?

aried3r · 2017-07-21T13:50:37Z

I updated the benchmark, but it has become very inefficient since I only hacked it together very quickly. As you can see in the Gist, I create a lot of hashes, resulting in an immense use of memory. But otherwise creating so many versions is unbearably slow. Using raw SQL would probably yield better results at this point.

Anyway, here are the results. I can't seem to shake the feeling that I'm doing something very wrong. At least to me the results are very surprising, because I can reproduce the results mentioned in this article.

Warming up --------------------------------------
     PaperTrail text   580.000  i/100ms
     PaperTrail json   442.000  i/100ms
    PaperTrail jsonb   458.000  i/100ms
  PaperTrail jsonb i   475.000  i/100ms
  ActiveRecord jsonb   576.000  i/100ms
ActiveRecord jsonb i   591.000  i/100ms
Calculating -------------------------------------
     PaperTrail text      6.623k (± 4.7%) i/s -     33.060k in   5.002984s
     PaperTrail json      4.668k (± 6.1%) i/s -     23.426k in   5.038770s
    PaperTrail jsonb      5.115k (± 5.4%) i/s -     25.648k in   5.029418s
  PaperTrail jsonb i      5.054k (± 5.7%) i/s -     25.175k in   5.001075s
  ActiveRecord jsonb      5.410k (±11.1%) i/s -     27.072k in   5.090030s
ActiveRecord jsonb i      5.395k (± 9.0%) i/s -     27.186k in   5.086443s

Comparison:
     PaperTrail text:     6623.1 i/s
  ActiveRecord jsonb:     5410.2 i/s - 1.22x  slower
ActiveRecord jsonb i:     5394.5 i/s - 1.23x  slower
    PaperTrail jsonb:     5114.9 i/s - 1.29x  slower
  PaperTrail jsonb i:     5053.5 i/s - 1.31x  slower
     PaperTrail json:     4668.3 i/s - 1.42x  slower

I'll give it another shot at some point, probably rewrite the benchmark.

jaredbeck · 2017-07-24T03:43:40Z

Yeah, I'm also surprised. I'd expect json to be faster than text, but I guess the -> and @> operators add some expense. The postgres query planner agrees:

create table z (
  col_text text,
  col_json json,
  col_jsonb jsonb,
  col_jsonbi jsonb
);
create index ix_z_col_jsonbi on z using gin ((col_jsonbi->'name'));

explain select * from z where col_text like '%banana%';
                     QUERY PLAN                     
----------------------------------------------------
 Seq Scan on z  (cost=0.00..16.50 rows=1 width=128)
   Filter: (col_text ~~ '%banana%'::text)
(2 rows)

explain select * from z where (col_json->'name')::text like '%banana%';
                             QUERY PLAN                             
--------------------------------------------------------------------
 Seq Scan on z  (cost=0.00..20.40 rows=1 width=128)
   Filter: (((col_json -> 'name'::text))::text ~~ '%banana%'::text)
(2 rows)

jared=# explain select * from z where (col_jsonb->'name')::text like '%banana%';
                             QUERY PLAN                              
---------------------------------------------------------------------
 Seq Scan on z  (cost=0.00..20.40 rows=1 width=128)
   Filter: (((col_jsonb -> 'name'::text))::text ~~ '%banana%'::text)
(2 rows)

The key observation here is that the max. estimated cost is 16.50 for text and 20.40 for json. It's not a benchmark, just an estimate on an empty table, so take that with a grain of salt.

However, our use of the like operator on text columns has always struck me as error prone. I recently discovered a bug, in fact: #803 (comment)

So, I'd recommend using json/b anyway, even if it is slower.

I don't see any problems with your benchmark, so I think we're ready to summarize our findings and put them in the readme. Can you take a stab at that, please?

seanlinsley · 2017-07-24T04:35:49Z

Trying to query a JSON string using like reminds me of this famous SO post; it's always going to be a hack.

The move to JSON is only really valuable for correctness in your queries.

JSONB only really shines when you want to do more advanced things like:

querying a nested array to check partial containment with @>
checking whether one or many keys do or don't exist in object_changes
building fast, complex indexes

seanlinsley · 2017-07-24T05:04:42Z

jsonb_path_ops is a faster GIN index by only supporting @>.

When dealing with large amounts of data, another way to make queries fast is by creating an index for each item_type that only indexes the JSON keys in object_changes that you know you'll be querying (or by excluding the ones you won't).

seanlinsley · 2017-07-24T05:40:20Z

It's surprising that the numbers are all so similar. You may want to add an index to aid the like queries on both the YAML and JSON versions.

Since there's only one record for each table, with 100k versions for that single record, the indexes on item_type and item_id probably aren't doing anything.

You may want to analyze the database before running these queries, to tell Postgres to update its internal statistics.

How much memory does your computer have? Benchmark.ips could be causing Postgres to cache the data in memory, deciding that that's easier than actually using the index.

Hmm, actually, 5k iterations a second? That's 5 database queries every millisecond, which seems unlikely (this is Postgres, not Redis). Are you sure you don't need to call count on the ActiveRecord query objects inside of Benchmark.ips?

jaredbeck · 2017-07-24T15:43:03Z

Stellar analysis, thanks Sean! It sounds like we have some more experimentation to do.

Trying to query a JSON string using like reminds me of this famous SO post; it's always going to be a hack.

Yes! I'm worried there are more bugs in where_object and where_object_changes that we haven't found yet. Did you see the one I found recently? (#803 (comment)) Nefarious!

The move to JSON is only really valuable for correctness in your queries.

Yeah, that's why I would recommend json/b.

aried3r · 2017-08-02T09:51:45Z

@seanlinsley, thanks for the explanation! Would you be up for adapting the gist? It seems you have more insight into the whole JSON/JSONB column type than I do :)

jaredbeck · 2017-11-01T16:14:00Z

This has been a good discussion. Thanks Anton and Sean. I'll close this for now, and maybe we can use it as the basis for some more formal benchmarks in the future.

ziaulrehman40 · 2021-11-04T16:22:12Z

Its been quiet some time, and postgres has built a lot on jsonb columns and their performance, i wonder if there are any changes in the benchmarks now? 🤔

aried3r · 2021-11-05T11:30:58Z

@ziaulrehman40, I think I did the benchmarks wrong. Like @seanlinsley mentioned I should have analyzed before.

Having said that, we've been using JSONB ever since with paper_trail and in other places and have not faced any performance problems for our use cases, given the right indexes.

jaredbeck added Documentation labels Jun 26, 2017

jaredbeck closed this as completed Nov 1, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

object_changes benchmarks with text, json and jsonb as column types #964

object_changes benchmarks with text, json and jsonb as column types #964

aried3r commented May 18, 2017

jaredbeck commented May 18, 2017

aried3r commented May 19, 2017

jaredbeck commented May 19, 2017

aried3r commented May 19, 2017

jaredbeck commented Jun 26, 2017

aried3r commented Jul 6, 2017

aried3r commented Jul 21, 2017

jaredbeck commented Jul 24, 2017

seanlinsley commented Jul 24, 2017

seanlinsley commented Jul 24, 2017

seanlinsley commented Jul 24, 2017

jaredbeck commented Jul 24, 2017

aried3r commented Aug 2, 2017

jaredbeck commented Nov 1, 2017

ziaulrehman40 commented Nov 4, 2021

aried3r commented Nov 5, 2021

object_changes benchmarks with text, json and jsonb as column types #964

object_changes benchmarks with text, json and jsonb as column types #964

Comments

aried3r commented May 18, 2017

jaredbeck commented May 18, 2017

aried3r commented May 19, 2017

jaredbeck commented May 19, 2017

aried3r commented May 19, 2017

jaredbeck commented Jun 26, 2017

aried3r commented Jul 6, 2017

aried3r commented Jul 21, 2017

jaredbeck commented Jul 24, 2017

seanlinsley commented Jul 24, 2017

seanlinsley commented Jul 24, 2017

seanlinsley commented Jul 24, 2017

jaredbeck commented Jul 24, 2017

aried3r commented Aug 2, 2017

jaredbeck commented Nov 1, 2017

ziaulrehman40 commented Nov 4, 2021

aried3r commented Nov 5, 2021