Soft delete instead of disconnect for containers models migrations #15250

Contributor

@Ladas Ladas commented May 30, 2017

Soft delete instead of disconnect for containers models migrations + data migration specs

Using a partial index actually has a better cost than indexing :ems_id and using disconnect:

Tested with the following queries, on 20k deleted and 20k not-deleted container images:
pp ems = ManageIQ::Providers::ContainerManager.find(26)
pp ems.container_images.where("id > 1111").limit(1000).explain
pp ems.container_images.order(:name).limit(1000).explain
pp ContainerImage.not_deleted.limit(1000).explain
ContainerImage.not_deleted.order(:name).limit(1000).explain

The soft delete with a partial index on :deleted (covering only rows where deleted is false):

add_index :container_images, :deleted, name: "index_container_images_on_deleted_false", where: "NOT deleted"
----------------------------------------------------------------------
 ContainerImage Load (5.1ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted" = $2 AND (id > 1111) LIMIT $3  [["ems_id", 26], ["deleted", "f"], ["LIMIT", 1000]]
  ContainerImage Inst Including Associations (16.2ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted" = $2 AND (id > 1111) LIMIT $3 [["ems_id", 26], ["deleted", "f"], ["LIMIT", 1000]]
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..141.19 rows=1000 width=950)
   ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.29..2749.01 rows=19508 width=950)
         Index Cond: (deleted = false)
         Filter: ((id > 1111) AND (ems_id = '26'::bigint))
(4 rows)

  ContainerImage Load (21.5ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted" = $2 ORDER BY "container_images"."name" ASC LIMIT $3  [["ems_id", 26], ["deleted", "f"], ["LIMIT", 1000]]
  ContainerImage Inst Including Associations (19.8ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted" = $2 ORDER BY "container_images"."name" ASC LIMIT $3 [["ems_id", 26], ["deleted", "f"], ["LIMIT", 1000]]
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3823.80..3826.30 rows=1000 width=950)
   ->  Sort  (cost=3823.80..3873.95 rows=20060 width=950)
         Sort Key: name
         ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.29..2723.93 rows=20060 width=950)
               Index Cond: (deleted = false)
               Filter: (ems_id = '26'::bigint)
(6 rows)

  ContainerImage Load (3.4ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted" = $1 LIMIT $2  [["deleted", "f"], ["LIMIT", 1000]]
  ContainerImage Inst Including Associations (12.0ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted" = $1 LIMIT $2 [["deleted", "f"], ["LIMIT", 1000]]
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..134.81 rows=1000 width=950)
   ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.29..2698.85 rows=20060 width=950)
         Index Cond: (deleted = false)
(3 rows)

  ContainerImage Load (19.0ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted" = $1 ORDER BY "container_images"."name" ASC LIMIT $2  [["deleted", "f"], ["LIMIT", 1000]]
  ContainerImage Inst Including Associations (10.6ms - 1000rows)
 => EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted" = $1 ORDER BY "container_images"."name" ASC LIMIT $2 [["deleted", "f"], ["LIMIT", 1000]]
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3798.72..3801.22 rows=1000 width=950)
   ->  Sort  (cost=3798.72..3848.87 rows=20060 width=950)
         Sort Key: name
         ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.29..2698.85 rows=20060 width=950)
               Index Cond: (deleted = false)
(5 rows)

The :ems_id => nil disconnect

add_index :container_images, :ems_id, name: "index_container_images_on_ems_id"
-------------------------------------------------------------------------------
ContainerImage Load (4.2ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND (id > 1111) LIMIT $2  [["ems_id", 26], ["LIMIT", 1000]]
  ContainerImage Inst Including Associations (40.4ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND (id > 1111) LIMIT $2 [["ems_id", 26], ["LIMIT", 1000]]
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..147.09 rows=1000 width=948)
   ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.29..2882.43 rows=19633 width=948)
         Index Cond: (ems_id = '26'::bigint)
         Filter: (id > 1111)
(4 rows)

  ContainerImage Load (20.4ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 ORDER BY "container_images"."name" ASC LIMIT $2  [["ems_id", 26], ["LIMIT", 1000]]
  ContainerImage Inst Including Associations (14.2ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 ORDER BY "container_images"."name" ASC LIMIT $2 [["ems_id", 26], ["LIMIT", 1000]]
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3939.84..3942.34 rows=1000 width=948)
   ->  Sort  (cost=3939.84..3990.36 rows=20207 width=948)
         Sort Key: name
         ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.29..2831.91 rows=20207 width=948)
               Index Cond: (ems_id = '26'::bigint)
(5 rows)

  ContainerImage Load (3.2ms)  SELECT  "container_images".* FROM "container_images" WHERE ("container_images"."ems_id" IS NOT NULL) LIMIT $1  [["LIMIT", 1000]]
  ContainerImage Inst Including Associations (29.8ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE ("container_images"."ems_id" IS NOT NULL) LIMIT $1 [["LIMIT", 1000]]
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..140.42 rows=1000 width=948)
   ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.29..2831.91 rows=20207 width=948)
         Index Cond: (ems_id IS NOT NULL)
(3 rows)

  ContainerImage Load (19.9ms)  SELECT  "container_images".* FROM "container_images" WHERE ("container_images"."ems_id" IS NOT NULL) ORDER BY "container_images"."name" ASC LIMIT $1  [["LIMIT", 1000]]
  ContainerImage Inst Including Associations (12.4ms - 1000rows)
 => EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE ("container_images"."ems_id" IS NOT NULL) ORDER BY "container_images"."name" ASC LIMIT $1 [["LIMIT", 1000]]
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3939.84..3942.34 rows=1000 width=948)
   ->  Sort  (cost=3939.84..3990.36 rows=20207 width=948)
         Sort Key: name
         ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.29..2831.91 rows=20207 width=948)
               Index Cond: (ems_id IS NOT NULL)
(5 rows)
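
To make the comparison above easier to read, here is a small sketch that puts the total planner cost of the top Limit node side by side for each of the four queries. The numbers are copied from the EXPLAIN output above; the query labels and the `saving_percent` helper are names made up for this illustration. Planner cost is an estimate in arbitrary units, not a measured time.

```ruby
# Total planner cost (the second number in "cost=startup..total") of the
# top-level Limit node, copied from the EXPLAIN output in this comment.
# The query labels are shorthand invented for this sketch.
planner_costs = {
  "ems scope, id > 1111"    => { soft_delete: 141.19,  disconnect: 147.09 },
  "ems scope, order(name)"  => { soft_delete: 3826.30, disconnect: 3942.34 },
  "all active"              => { soft_delete: 134.81,  disconnect: 140.42 },
  "all active, order(name)" => { soft_delete: 3801.22, disconnect: 3942.34 },
}

# Relative saving of the partial index on :deleted versus the :ems_id index,
# expressed in percent of the :ems_id cost.
saving_percent = lambda do |costs|
  ((costs[:disconnect] - costs[:soft_delete]) / costs[:disconnect] * 100).round(1)
end

savings = planner_costs.transform_values { |costs| saving_percent.call(costs) }

savings.each do |query, percent|
  puts format("%-25s %4.1f%% lower estimated cost", query, percent)
end
```

In these plans the partial index comes out a few percent cheaper on every query, so the difference is real but small.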

Contributor Author

Ladas commented May 30, 2017

@miq-bot assign @agrare

Contributor Author

Ladas commented May 30, 2017

@miq-bot add_label enhancement

Contributor Author

Ladas commented May 30, 2017

cc @cben @simon3z

Contributor

simon3z commented May 30, 2017

cc @zeari

Contributor Author

Ladas commented May 31, 2017

@agrare @Fryguy I wonder if :deleted is the best name; it could also be :archived, or ...?

Member

@kbrock kbrock left a comment

good start.

just indexing deleted will help with using bitmap indexes, but these are not that good.
We need to get the where clause into the actual index that will be used.

So I'm guessing we're going to need to look at how deleted will actually be used and change most of our indexes for these tables.

end

def disconnect_to_soft_delete(model)
  model.where(:ems_id => nil).find_each do |rec|
Member

Is it possible to move over to a single query?

model.where(:ems_id => nil).update_all('ems_id = old_ems_id, deleted = true')
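
As a rough illustration of what such a single statement would do to each disconnected row, here is a plain-Ruby sketch. It assumes the SET clause assigns both columns (ems_id = old_ems_id, deleted = true). The `row` struct and the sample data are hypothetical stand-ins for a containers table; this is not the migration code itself.

```ruby
# Hypothetical in-memory stand-in for a containers table row.
row = Struct.new(:id, :ems_id, :old_ems_id, :deleted, keyword_init: true)

rows = [
  row.new(id: 1, ems_id: 26,  old_ems_id: nil, deleted: false), # still connected
  row.new(id: 2, ems_id: nil, old_ems_id: 26,  deleted: false), # disconnected
  row.new(id: 3, ems_id: nil, old_ems_id: nil, deleted: false), # disconnected, no old ems
]

# Mirrors: UPDATE ... SET ems_id = old_ems_id, deleted = true WHERE ems_id IS NULL
soft_delete_disconnected = lambda do |table|
  table.select { |r| r.ems_id.nil? }.each do |r|
    r.ems_id  = r.old_ems_id # reconnect to the old manager when one is recorded
    r.deleted = true         # mark the row as soft deleted
  end
end

soft_delete_disconnected.call(rows)
rows.each { |r| puts r.to_h }
```

Rows that were never disconnected are left alone, and a row with no old_ems_id stays without a manager but is still flagged as deleted.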

@Ladas Ladas force-pushed the soft_delete_instead_of_disconnect_for_containers_models_migrations branch 2 times, most recently from 0666604 to 6c89530 Compare June 5, 2017 08:26
Contributor Author

Ladas commented Jun 5, 2017

@kbrock data migration is using update_all now

@cben @kbrock I am using just the :deleted_on column now, but the query cost went up almost 3x. I guess it's not that bad in the end, what do you think? The partial index is only slightly better now, so it's not really worth it.

using partial index on deleted_on IS NULL

ContainerImage Load (4.8ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL AND (id > 1111) LIMIT $2  [["ems_id", 26], ["LIMIT", 1000]]
  ContainerImage Inst Including Associations (18.2ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL AND (id > 1111) LIMIT $2 [["ems_id", 26], ["LIMIT", 1000]]
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..419.70 rows=1000 width=922)
   ->  Index Scan using index_container_images_on_deleted_on_null on container_images  (cost=0.29..8284.82 rows=19753 width=922)
         Index Cond: (deleted_on IS NULL)
         Filter: ((id > 1111) AND (ems_id = '26'::bigint))
(4 rows)

  ContainerImage Load (27.2ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $2  [["ems_id", 26], ["LIMIT", 1000]]
  ContainerImage Inst Including Associations (21.8ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $2 [["ems_id", 26], ["LIMIT", 1000]]
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=9283.32..9285.82 rows=1000 width=922)
   ->  Sort  (cost=9283.32..9333.40 rows=20034 width=922)
         Sort Key: name
         ->  Index Scan using index_container_images_on_deleted_on_null on container_images  (cost=0.29..8184.87 rows=20034 width=922)
               Index Cond: (deleted_on IS NULL)
               Filter: (ems_id = '26'::bigint)
(6 rows)

  ContainerImage Load (3.3ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL LIMIT $1  [["LIMIT", 1000]]
  ContainerImage Inst Including Associations (26.5ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL LIMIT $1 [["LIMIT", 1000]]
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..202.51 rows=1000 width=922)
   ->  Index Scan using index_container_images_on_deleted_on_null on container_images  (cost=0.29..8084.92 rows=39979 width=922)
         Index Cond: (deleted_on IS NULL)
(3 rows)

  ContainerImage Load (40.2ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $1  [["LIMIT", 1000]]
  ContainerImage Inst Including Associations (11.8ms - 1000rows)
 => EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $1 [["LIMIT", 1000]]
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10276.93..10279.43 rows=1000 width=922)
   ->  Sort  (cost=10276.93..10376.88 rows=39979 width=922)
         Sort Key: name
         ->  Index Scan using index_container_images_on_deleted_on_null on container_images  (cost=0.29..8084.92 rows=39979 width=922)
               Index Cond: (deleted_on IS NULL)
(5 rows)

using index on :deleted_on

 ContainerImage Load (3.6ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL AND (id > 1111) LIMIT $2  [["ems_id", 26], ["LIMIT", 1000]]
  ContainerImage Inst Including Associations (30.7ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL AND (id > 1111) LIMIT $2 [["ems_id", 26], ["LIMIT", 1000]]
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..818.82 rows=1000 width=924)
   ->  Index Scan using index_container_images_on_deleted_on_null on container_images  (cost=0.29..8069.36 rows=9858 width=924)
         Index Cond: (deleted_on IS NULL)
         Filter: ((id > 1111) AND (ems_id = '26'::bigint))
(4 rows)

  ContainerImage Load (29.3ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $2  [["ems_id", 26], ["LIMIT", 1000]]
  ContainerImage Inst Including Associations (15.1ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $2 [["ems_id", 26], ["LIMIT", 1000]]
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8517.52..8520.02 rows=1000 width=924)
   ->  Sort  (cost=8517.52..8542.52 rows=10001 width=924)
         Sort Key: name
         ->  Index Scan using index_container_images_on_deleted_on_null on container_images  (cost=0.29..7969.17 rows=10001 width=924)
               Index Cond: (deleted_on IS NULL)
               Filter: (ems_id = '26'::bigint)
(6 rows)

  ContainerImage Load (3.0ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL LIMIT $1  [["LIMIT", 1000]]
  ContainerImage Inst Including Associations (10.0ms - 1000rows)
EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL LIMIT $1 [["LIMIT", 1000]]
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..196.64 rows=1000 width=924)
   ->  Index Scan using index_container_images_on_deleted_on_null on container_images  (cost=0.29..7868.98 rows=40075 width=924)
         Index Cond: (deleted_on IS NULL)
(3 rows)

  ContainerImage Load (40.4ms)  SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $1  [["LIMIT", 1000]]
  ContainerImage Inst Including Associations (14.8ms - 1000rows)
 => EXPLAIN for: SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $1 [["LIMIT", 1000]]
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10066.25..10068.75 rows=1000 width=924)
   ->  Sort  (cost=10066.25..10166.44 rows=40075 width=924)
         Sort Key: name
         ->  Index Scan using index_container_images_on_deleted_on_null on container_images  (cost=0.29..7868.98 rows=40075 width=924)
               Index Cond: (deleted_on IS NULL)
(5 rows)
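
For a quick check of the "almost 3x" figure, the sketch below pulls the total cost out of the top-level Limit lines and divides the :deleted_on partial-index cost by the boolean partial-index cost for the same query. The lines are copied from the EXPLAIN output in this thread; the `limit_lines` and `total_cost` names are assumptions made for this illustration.

```ruby
# Top-level Limit lines copied from the EXPLAIN output in this thread,
# in the same query order for both variants.
limit_lines = {
  boolean_partial: [
    "Limit  (cost=0.29..141.19 rows=1000 width=950)",
    "Limit  (cost=3823.80..3826.30 rows=1000 width=950)",
    "Limit  (cost=0.29..134.81 rows=1000 width=950)",
    "Limit  (cost=3798.72..3801.22 rows=1000 width=950)",
  ],
  deleted_on_partial: [
    "Limit  (cost=0.29..419.70 rows=1000 width=922)",
    "Limit  (cost=9283.32..9285.82 rows=1000 width=922)",
    "Limit  (cost=0.29..202.51 rows=1000 width=922)",
    "Limit  (cost=10276.93..10279.43 rows=1000 width=922)",
  ],
}

# Extract the total cost from "cost=startup..total".
total_cost = ->(line) { Float(line[/cost=\d+\.\d+\.\.(\d+\.\d+)/, 1]) }

# Ratio of :deleted_on cost to boolean cost, query by query.
ratios = limit_lines[:deleted_on_partial]
  .zip(limit_lines[:boolean_partial])
  .map { |deleted_on, boolean| (total_cost.call(deleted_on) / total_cost.call(boolean)).round(2) }

puts ratios.inspect
```

The ratios are only a rough guide: the estimated row counts differ between the two runs (for example rows=20060 versus rows=39979 on the unscoped queries), so the two data sets were not identical.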

Contributor Author

Ladas commented Jun 5, 2017

@lpichler could you review the migrations?

@Ladas Ladas force-pushed the soft_delete_instead_of_disconnect_for_containers_models_migrations branch from 6c89530 to e317c15 Compare June 7, 2017 16:15
Contributor Author

Ladas commented Jun 7, 2017

@kbrock @cben @agrare ok, I switched back to using the :deleted boolean, since it's more performant. So this should be ready now.

Member

kbrock commented Jun 8, 2017

cool, the schema.yml update looks like the only missing bit

Member

@kbrock kbrock left a comment

Happy with these changes.

Once we change the code for deleted, we can see what other indexes are needed.

Contributor

@lpichler lpichler left a comment

👍

Ladas added 6 commits June 8, 2017 10:35
- Add :deleted to the disconnectable Containers tables, which will allow us to mark soft deleted records.
- Change disconnected Containers records to soft deleted, reconnecting the old_ems_id if possible.
- Specs for changing disconnected Containers records to soft deleted
- Add partial index to :deleted col to speed up queries. It has to be a partial index, otherwise the cost is too high and PG will not pick it.
- Fix the comment :deleted => true goes to :ems_id => nil
- Optimize data migration queries to run in 1 query per model
Member

Fryguy commented Jun 13, 2017

@agrare Can you do a first-pass review?


add_index :container_definitions, :deleted,
          :name  => "index_container_definitions_on_deleted_false",
          :where => "NOT deleted"
Member

What does a partial index on the boolean do? Why have the where clause at all?

Contributor Author

For a boolean it's recommended, since PG would not pick a normal index on a boolean because the cost is too high.

add_column :container_groups, :deleted, :boolean, :default => false, :null => false
add_column :container_images, :deleted, :boolean, :default => false, :null => false
add_column :container_projects, :deleted, :boolean, :default => false, :null => false
add_column :containers, :deleted, :boolean, :default => false, :null => false
Member

@Fryguy Fryguy Jun 13, 2017

I would prefer default_value_for over the :default => false, :null => false, or does having it in the column contribute to better performance?

Additionally, this PR doesn't handle upgrading existing values in the database to set the default to true where things are already deleted. NVM, didn't see the second follow-up migration.

Contributor Author

right, I want just true/false ensured, so the indexing space is small

Member

default_value_for ensures a value and would keep the indexing space small as well

Contributor Author

Is default_value_for a Ruby thing, done on the model? It makes sense to have this ensured at the DB layer.

Member

Not going to have this debate again. 😄 We intentionally avoid DB layer logic (foreign keys, default values, etc), so that we can maintain maximum flexibility for release branches. That is, since we don't allow schema changes / db migrations on release branches, if we make a mistake we can't fix it. (Additionally, when we were on rubyrep, these constraints had always proven to be practically impossible for replication. Now that we are on pglogical, that argument is probably moot.)

I've been hesitant to allow these through unless they are absolutely necessary (e.g. there'd have to be a compelling performance reason)

Contributor Author

Right, since we now use DB-level replication, that argument is no longer valid, but it's good to know the past reasons. :-)

So saying that, what are the other concrete blockers? Given we want to design it right, this should be the correct way. So I would like to avoid bad DB design, unless there are real concerns.

Btw. I can't think of any reason that would require a later change in this column. Deleted true/false should be the end state.

Member

Fryguy commented Jun 13, 2017

@Ladas Can you give me a tl;dr on your EXPLAINs there? Maybe a little chart or something that can show me the pertinent details in a glance?

Contributor Author

Ladas commented Jun 14, 2017

@Fryguy right, so the explains are testing a few SQL queries we use throughout MIQ, and it seems that using a partial index on a boolean has the lowest cost. The cost of using :deleted_on => nil was almost 3x as high.

The queries were tested on a DB having 1M+ records in each table, to make sure a sequential scan will not be picked. (1M on a different manager, 40k on the tested manager, of which 20k are active and 20k disconnected)

Member

Fryguy commented Jun 16, 2017

> The cost of using :deleted_on => nil was almost 3x as high.

but 3x on what magnitude? Are we talking 0.001ms to 0.003 ms? 😆

Member

Fryguy commented Jun 16, 2017

I depesz-ed your explains above for readability (I just can't read them haha). however it looks like you just ran them with EXPLAIN instead of EXPLAIN ANALYZE. Can you do the latter, which tends to give more interesting results as you also get the real-world query times?

using partial index on deleted_on IS NULL

SELECT "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL AND (id > 1111) LIMIT $2 [["ems_id", 26], ["LIMIT", 1000]]
https://explain.depesz.com/s/mOZk

SELECT "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $2 [["ems_id", 26], ["LIMIT", 1000]]
https://explain.depesz.com/s/2N98

SELECT "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL LIMIT $1 [["LIMIT", 1000]]
https://explain.depesz.com/s/BbU8

SELECT "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $1 [["LIMIT", 1000]]
https://explain.depesz.com/s/5jaW

using index on :deleted_on

SELECT "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL AND (id > 1111) LIMIT $2 [["ems_id", 26], ["LIMIT", 1000]]
https://explain.depesz.com/s/BrUK

SELECT "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = $1 AND "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $2 [["ems_id", 26], ["LIMIT", 1000]]
https://explain.depesz.com/s/sDvO

SELECT "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL LIMIT $1 [["LIMIT", 1000]]
https://explain.depesz.com/s/tVoa

SELECT "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT $1 [["LIMIT", 1000]]
https://explain.depesz.com/s/KyJl


ProTip: Open the links for the same query in two separate tabs then click back and forth to see the differences.

Contributor

zakiva commented Jun 19, 2017

@Ladas Please note that Container Node will also be archived instead of being deleted; I'm adding this in #15351

Contributor Author

Ladas commented Jun 19, 2017

@zakiva could you do it this way then? So we don't have to do more data migrations.

Contributor

zakiva commented Jun 19, 2017

> @zakiva could you do it this way then? So we don't have to do more data migrations.

@Ladas #15351 is targeted to 5.8.1 BZ: https://bugzilla.redhat.com/show_bug.cgi?id=1460401 and we would like to merge this as soon as possible, so I prefer not to create dependency on this PR with all its migrations.
cc @simon3z @moolitayer

Contributor Author

Ladas commented Jun 20, 2017

@zakiva ah, I think we can't put #15351 into 5.8.1, since we can't backport a migration. @Fryguy right?

Contributor Author

Ladas commented Jun 20, 2017

@Fryguy let me measure the EXPLAIN ANALYZE and get back to you :-)

Contributor Author

Ladas commented Jun 20, 2017

@Fryguy this is explain analyze for the partial index on :deleted

pp ems = ManageIQ::Providers::ContainerManager.find(26)
conl = ->(q) { ActiveRecord::Base.connection.execute("EXPLAIN ANALYZE #{q.to_sql}") }
pp conl.call(ems.container_images.where("id > 1111").limit(1000)).to_a
pp conl.call(ems.container_images.order(:name).limit(1000)).to_a
pp conl.call(ContainerImage.active.limit(1000)).to_a
pp conl.call(ContainerImage.active.order(:name).limit(1000)).to_a

   (1.7ms)  EXPLAIN ANALYZE SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = 26 AND "container_images"."deleted" = 'f' AND (id > 1111) LIMIT 1000
[{"QUERY PLAN"=>
   "Limit  (cost=0.41..245.49 rows=1000 width=911) (actual time=0.026..0.989 rows=1000 loops=1)"},
 {"QUERY PLAN"=>
   "  ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.41..3176.08 rows=12958 width=911) (actual time=0.025..0.900 rows=1000 loops=1)"},
 {"QUERY PLAN"=>"        Index Cond: (deleted = false)"},
 {"QUERY PLAN"=>"        Filter: ((id > 1111) AND (ems_id = 26))"},
 {"QUERY PLAN"=>"Planning time: 0.180 ms"},
 {"QUERY PLAN"=>"Execution time: 1.074 ms"}]
   (32.9ms)  EXPLAIN ANALYZE SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = 26 AND "container_images"."deleted" = 'f' ORDER BY "container_images"."name" ASC LIMIT 1000
[{"QUERY PLAN"=>
   "Limit  (cost=3860.84..3863.34 rows=1000 width=911) (actual time=32.225..32.340 rows=1000 loops=1)"},
 {"QUERY PLAN"=>
   "  ->  Sort  (cost=3860.84..3893.55 rows=13082 width=911) (actual time=32.223..32.279 rows=1000 loops=1)"},
 {"QUERY PLAN"=>"        Sort Key: name"},
 {"QUERY PLAN"=>"        Sort Method: top-N heapsort  Memory: 1068kB"},
 {"QUERY PLAN"=>
   "        ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.41..3143.57 rows=13082 width=911) (actual time=0.017..22.617 rows=20107 loops=1)"},
 {"QUERY PLAN"=>"              Index Cond: (deleted = false)"},
 {"QUERY PLAN"=>"              Filter: (ems_id = 26)"},
 {"QUERY PLAN"=>"              Rows Removed by Filter: 20001"},
 {"QUERY PLAN"=>"Planning time: 0.096 ms"},
 {"QUERY PLAN"=>"Execution time: 32.415 ms"}]
   (1.4ms)  EXPLAIN ANALYZE SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted" = 'f' LIMIT 1000
[{"QUERY PLAN"=>
   "Limit  (cost=0.41..78.07 rows=1000 width=911) (actual time=0.022..0.934 rows=1000 loops=1)"},
 {"QUERY PLAN"=>
   "  ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.41..3111.06 rows=40056 width=911) (actual time=0.022..0.851 rows=1000 loops=1)"},
 {"QUERY PLAN"=>"        Index Cond: (deleted = false)"},
 {"QUERY PLAN"=>"Planning time: 0.086 ms"},
 {"QUERY PLAN"=>"Execution time: 1.010 ms"}]
   (43.4ms)  EXPLAIN ANALYZE SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted" = 'f' ORDER BY "container_images"."name" ASC LIMIT 1000
[{"QUERY PLAN"=>
   "Limit  (cost=5307.29..5309.79 rows=1000 width=911) (actual time=42.769..42.882 rows=1000 loops=1)"},
 {"QUERY PLAN"=>
   "  ->  Sort  (cost=5307.29..5407.43 rows=40056 width=911) (actual time=42.767..42.814 rows=1000 loops=1)"},
 {"QUERY PLAN"=>"        Sort Key: name"},
 {"QUERY PLAN"=>"        Sort Method: top-N heapsort  Memory: 1066kB"},
 {"QUERY PLAN"=>
   "        ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.41..3111.06 rows=40056 width=911) (actual time=0.021..21.500 rows=40108 loops=1)"},
 {"QUERY PLAN"=>"              Index Cond: (deleted = false)"},
 {"QUERY PLAN"=>"Planning time: 0.094 ms"},
 {"QUERY PLAN"=>"Execution time: 42.940 ms"}]
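
Since the EXPLAIN ANALYZE results come back as an array of hashes keyed by "QUERY PLAN", the interesting numbers can be pulled out with a few regular expressions. The sketch below does that for the first plan above; the `summarize` lambda and its hash keys are hypothetical names, and the label format ("Planning time:", "Execution time:") is assumed to match the output shown in this comment.

```ruby
# First EXPLAIN ANALYZE result from the comment above, as returned by
# connection.execute(...).to_a (one hash per plan line).
plan_rows = [
  { "QUERY PLAN" => "Limit  (cost=0.41..245.49 rows=1000 width=911) (actual time=0.026..0.989 rows=1000 loops=1)" },
  { "QUERY PLAN" => "  ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.41..3176.08 rows=12958 width=911) (actual time=0.025..0.900 rows=1000 loops=1)" },
  { "QUERY PLAN" => "        Index Cond: (deleted = false)" },
  { "QUERY PLAN" => "        Filter: ((id > 1111) AND (ems_id = 26))" },
  { "QUERY PLAN" => "Planning time: 0.180 ms" },
  { "QUERY PLAN" => "Execution time: 1.074 ms" },
]

# Reduce a plan to the index used and the measured timings in milliseconds.
summarize = lambda do |rows|
  lines  = rows.map { |row| row["QUERY PLAN"] }
  timing = ->(label) { Float(lines.grep(/\A#{label} time:/).first[/([\d.]+) ms/, 1]) }
  {
    index:        lines.join("\n")[/Index Scan using (\S+)/, 1],
    planning_ms:  timing.call("Planning"),
    execution_ms: timing.call("Execution"),
  }
end

summary = summarize.call(plan_rows)
puts summary
```

Running the same summary over each plan gives real execution times to compare, rather than planner estimates.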

Contributor Author

Ladas commented Jun 20, 2017

@Fryguy and for general index on :deleted_on

 (2.6ms)  EXPLAIN ANALYZE SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = 26 AND "container_images"."deleted_on" IS NULL AND (id > 1111) LIMIT 1000
[{"QUERY PLAN"=>
   "Limit  (cost=0.42..715.01 rows=1000 width=913) (actual time=0.027..1.637 rows=1000 loops=1)"},
 {"QUERY PLAN"=>
   "  ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.42..9285.89 rows=12994 width=913) (actual time=0.025..1.457 rows=1000 loops=1)"},
 {"QUERY PLAN"=>"        Index Cond: (deleted_on IS NULL)"},
 {"QUERY PLAN"=>"        Filter: ((id > 1111) AND (ems_id = 26))"},
 {"QUERY PLAN"=>"Planning time: 0.217 ms"},
 {"QUERY PLAN"=>"Execution time: 1.788 ms"}]
(44.6ms)  EXPLAIN ANALYZE SELECT  "container_images".* FROM "container_images" WHERE "container_images"."ems_id" = 26 AND "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT 1000
[{"QUERY PLAN"=>
   "Limit  (cost=9904.25..9906.75 rows=1000 width=913) (actual time=43.564..43.680 rows=1000 loops=1)"},
 {"QUERY PLAN"=>
   "  ->  Sort  (cost=9904.25..9937.04 rows=13117 width=913) (actual time=43.563..43.607 rows=1000 loops=1)"},
 {"QUERY PLAN"=>"        Sort Key: name"},
 {"QUERY PLAN"=>"        Sort Method: top-N heapsort  Memory: 1068kB"},
 {"QUERY PLAN"=>
   "        ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.42..9185.06 rows=13117 width=913) (actual time=0.030..27.377 rows=20107 loops=1)"},
 {"QUERY PLAN"=>"              Index Cond: (deleted_on IS NULL)"},
 {"QUERY PLAN"=>"              Filter: (ems_id = 26)"},
 {"QUERY PLAN"=>"              Rows Removed by Filter: 20001"},
 {"QUERY PLAN"=>"Planning time: 0.174 ms"},
 {"QUERY PLAN"=>"Execution time: 43.895 ms"}]
   (0.8ms)  EXPLAIN ANALYZE SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL LIMIT 1000
[{"QUERY PLAN"=>
   "Limit  (cost=0.42..225.64 rows=1000 width=913) (actual time=0.011..0.547 rows=1000 loops=1)"},
 {"QUERY PLAN"=>
   "  ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.42..9084.23 rows=40332 width=913) (actual time=0.011..0.490 rows=1000 loops=1)"},
 {"QUERY PLAN"=>"        Index Cond: (deleted_on IS NULL)"},
 {"QUERY PLAN"=>"Planning time: 0.052 ms"},
 {"QUERY PLAN"=>"Execution time: 0.599 ms"}]
   (41.6ms)  EXPLAIN ANALYZE SELECT  "container_images".* FROM "container_images" WHERE "container_images"."deleted_on" IS NULL ORDER BY "container_images"."name" ASC LIMIT 1000
[{"QUERY PLAN"=>
   "Limit  (cost=11295.59..11298.09 rows=1000 width=913) (actual time=40.953..41.050 rows=1000 loops=1)"},
 {"QUERY PLAN"=>
   "  ->  Sort  (cost=11295.59..11396.42 rows=40332 width=913) (actual time=40.952..41.004 rows=1000 loops=1)"},
 {"QUERY PLAN"=>"        Sort Key: name"},
 {"QUERY PLAN"=>"        Sort Method: top-N heapsort  Memory: 1066kB"},
 {"QUERY PLAN"=>
   "        ->  Index Scan using index_container_images_on_deleted_false on container_images  (cost=0.42..9084.23 rows=40332 width=913) (actual time=0.012..18.389 rows=40108 loops=1)"},
 {"QUERY PLAN"=>"              Index Cond: (deleted_on IS NULL)"},
 {"QUERY PLAN"=>"Planning time: 0.060 ms"},
 {"QUERY PLAN"=>"Execution time: 41.234 ms"}]

@Ladas
Contributor Author

Ladas commented Jun 20, 2017

@Fryguy so the EXPLAIN ANALYZE results are comparable, but the table is not that large (80k images in 2 managers), and the timings also vary between runs.

cc @kbrock

So would you rather use the index on deleted_on => nil? The higher cost might become noticeable on larger tables.
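
To put numbers on the cost-versus-time point, here is a minimal sketch (not part of the PR) that pulls the estimated cost and measured time out of two Index Scan lines copied from the plans above, both for the not-deleted LIMIT 1000 query without ORDER BY. The helper names `estimated_total_cost` and `actual_total_time_ms` are hypothetical, and a single run on this data set is only a rough indication.

```ruby
# Index Scan lines copied from the EXPLAIN ANALYZE output in this thread.
BOOL_PARTIAL_INDEX_PLAN =
  "Index Scan using index_container_images_on_deleted_false on container_images  " \
  "(cost=0.41..3111.06 rows=40056 width=911) (actual time=0.022..0.851 rows=1000 loops=1)"
DELETED_ON_INDEX_PLAN =
  "Index Scan using index_container_images_on_deleted_false on container_images  " \
  "(cost=0.42..9084.23 rows=40332 width=913) (actual time=0.011..0.490 rows=1000 loops=1)"

# Hypothetical helper: the planner's estimated total cost,
# i.e. the second number in "cost=startup..total".
def estimated_total_cost(plan_line)
  match = plan_line.match(/cost=\d+\.\d+\.\.(\d+\.\d+)/)
  raise ArgumentError, "no cost found in plan line" unless match
  match[1].to_f
end

# Hypothetical helper: the measured time in ms for the node,
# i.e. the second number in "actual time=first_row..all_rows".
def actual_total_time_ms(plan_line)
  match = plan_line.match(/actual time=\d+\.\d+\.\.(\d+\.\d+)/)
  raise ArgumentError, "no actual time found in plan line" unless match
  match[1].to_f
end

cost_ratio = estimated_total_cost(DELETED_ON_INDEX_PLAN) /
             estimated_total_cost(BOOL_PARTIAL_INDEX_PLAN)

puts format("estimated cost ratio (deleted_on / deleted): %.2f", cost_ratio)
puts format("actual time: %.3f ms (deleted) vs %.3f ms (deleted_on)",
            actual_total_time_ms(BOOL_PARTIAL_INDEX_PLAN),
            actual_total_time_ms(DELETED_ON_INDEX_PLAN))
```

The estimated cost is a planner figure in arbitrary units, while the actual time is what was measured on this run, which is why the two can point in different directions on a table of this size.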

@Fryguy
Member

Fryguy commented Jun 20, 2017

Would like @simon3z to review before proceeding to make sure this works for him and his team.

@simon3z
Contributor

simon3z commented Jun 20, 2017

> Would like @simon3z to review before proceeding to make sure this works for him and his team.

Thanks @Fryguy, indeed we need @cben and @zeari to look at this (and officially approve) because it's touching several things they worked on.

@Ladas
Contributor Author

Ladas commented Jun 20, 2017

@zeari @cben probably the last thing to decide is whether to use a :deleted bool column or just :deleted_on. The :deleted_on index has a higher cost, but the actual speed difference might be visible only on larger tables. So I am inclined to go with :deleted_on, since it's just simpler for now.
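
A minimal plain-Ruby sketch of why :deleted_on alone can be enough: the boolean can be derived from the timestamp, so a separate :deleted column would only duplicate it. `ContainerImageRow` and `not_deleted` below are hypothetical in-memory stand-ins for illustration, not the real model or scope.

```ruby
# Hypothetical stand-in for a container_images row; not the real ActiveRecord model.
ContainerImageRow = Struct.new(:name, :ems_id, :deleted_on) do
  # The flag is derived from the timestamp, so no separate :deleted column is stored.
  def deleted?
    !deleted_on.nil?
  end
end

# Plain-Ruby equivalent of filtering on "deleted_on IS NULL".
def not_deleted(rows)
  rows.reject(&:deleted?)
end

rows = [
  ContainerImageRow.new("image-a", 26, nil),
  ContainerImageRow.new("image-b", 26, Time.utc(2017, 6, 20))
]

puts not_deleted(rows).map(&:name).inspect
```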

@zeari
Copy link

zeari commented Jun 22, 2017

> @zeari @cben probably the last thing to decide is whether to use a :deleted bool column or just :deleted_on. The :deleted_on index has a higher cost, but the actual speed difference might be visible only on larger tables. So I am inclined to go with :deleted_on, since it's just simpler for now.

I'm OK with that.

…nnect

Migration keeps :deleted_on as the only source of truth for disconnect
@Ladas Ladas force-pushed the soft_delete_instead_of_disconnect_for_containers_models_migrations branch from b11b3cc to c6199d7 Compare June 22, 2017 09:33
@Ladas
Contributor Author

Ladas commented Jun 22, 2017

cc @zeari @cben @agrare

OK, done. Changed to use :deleted_on as the only source of truth for disconnect.

So it should be ready to merge if you guys agree :-) Also, if you agree with the data migration: we are basically just putting old_ems_id into ems_id.
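
A rough plain-Ruby sketch of that data migration idea, run on in-memory rows rather than the database. `DisconnectedRow` and `restore_ems_id!` are hypothetical names, and filling in a missing deleted_on is an assumption of this sketch, not something confirmed by the migration itself.

```ruby
# Hypothetical stand-in for a row of one of the containers tables.
DisconnectedRow = Struct.new(:id, :ems_id, :old_ems_id, :deleted_on)

# For rows that were disconnected the old way (ems_id cleared, old_ems_id kept),
# put old_ems_id back into ems_id so that deleted_on alone marks the row as gone.
def restore_ems_id!(rows, fallback_deleted_on:)
  rows.each do |row|
    next unless row.ems_id.nil? && !row.old_ems_id.nil?

    row.ems_id = row.old_ems_id
    # Assumption: a disconnected row without a timestamp gets the fallback,
    # otherwise it would look active again after the migration.
    row.deleted_on ||= fallback_deleted_on
  end
end

rows = [
  DisconnectedRow.new(1, 26, nil, nil),                       # still connected, untouched
  DisconnectedRow.new(2, nil, 26, Time.utc(2017, 5, 30)),     # disconnected, keeps its timestamp
  DisconnectedRow.new(3, nil, 26, nil)                        # disconnected, gets the fallback
]

restore_ems_id!(rows, fallback_deleted_on: Time.utc(2017, 6, 22))
rows.each { |row| puts row.to_a.inspect }
```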

Reword the migration comments
@Ladas Ladas force-pushed the soft_delete_instead_of_disconnect_for_containers_models_migrations branch from 4ce65c3 to 64e655f Compare June 22, 2017 09:43
@simon3z
Contributor

simon3z commented Jun 22, 2017

@cben please review/approve, thanks.

Update specs to work with deleted_on
@miq-bot
Member

miq-bot commented Jun 22, 2017

Checked commits Ladas/manageiq@9f2ce26~...143d67d with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
3 files checked, 2 offenses detected

db/migrate/20170530102536_use_deleted_in_containers_tables.rb

@Fryguy
Member

Fryguy commented Jun 22, 2017

Database migrations have now been moved to the https://github.com/ManageIQ/manageiq-schema repo. Please see http://talk.manageiq.org/t/new-split-repo-manageiq-schema/2478 for instructions on how to transfer your database migrations. If this PR contains only migrations, I will leave it open for a short time during the transition, after which I will close this if it has not been moved over.

@Ladas
Contributor Author

Ladas commented Jun 23, 2017

replaced with ManageIQ/manageiq-schema#18
