Skip query references in Rbac when not needed #17141

Merged (3 commits) on Apr 9, 2018

@NickLaMuro NickLaMuro commented Mar 13, 2018

Makes calling .references on the Rbac scope more conservative, since it has the possibility of causing some nasty join bombs.

Detailed explanation

Given roughly these table counts:

container_images:       700
containers:             2500
openscap_results:       100
openscap_rule_results:  70000
custom_attributes:      20000

Where half of the custom_attributes are associated with container_images. If the following MiqReport:

cols:
- name
- virtual_custom_attribute_build-date:SECTION:docker_labels
- virtual_custom_attribute_io.openshift.build.name:SECTION:docker_labels
- virtual_custom_attribute_io.openshift.build.namespace:SECTION:docker_labels
include:
  openscap_rule_results:
    columns:
    - severity
  containers:
    columns:
    - state

is run, there would be a "LEFT JOIN" bomb that ends up returning about 40GB of data from the database. Nothing in this case makes use of the references for the result, yet prior to this commit the query would still return the extra table data, very inefficiently and with loads of duplicate rows.
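For a rough sense of scale, here is a minimal back-of-the-envelope sketch (not part of the PR). It assumes child rows are spread evenly across the parent rows, which real data will not do exactly, and the helper name `estimated_join_rows` is made up for illustration. It multiplies the average number of child rows per `container_images` row across the three joined associations, which is how a single query with several LEFT JOINed has_many tables fans out.

```ruby
# Rough estimate of the rows returned when several has_many associations are
# LEFT JOINed onto one parent table in a single query.
# Assumption (not from the PR): child rows are spread evenly across parents.
def estimated_join_rows(parent_count, child_counts)
  per_parent = child_counts.values.reduce(Rational(1)) do |product, child_count|
    # A parent with no children still yields one row from a LEFT JOIN.
    product * [Rational(child_count, parent_count), Rational(1)].max
  end
  (parent_count * per_parent).round
end

child_counts = {
  :containers            => 2_500,
  :openscap_rule_results => 70_000,
  :custom_attributes     => 10_000 # "half" of the 20,000 custom_attributes
}

rows = estimated_join_rows(700, child_counts)
puts rows
```

Under those assumptions the estimate comes to roughly 3.57 million joined rows from only 700 images, with every row repeating the columns of each joined table.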

(:custom_attributes => {} is also part of the resulting include, but pretty sure that gets tacked on in the MiqReport#generate call)

Without needing a report, the query that gets executed can be replicated by doing the following:

includes_for_find = {
  :openscap_rule_results => {},
  :containers => {},
  :custom_attributes => {}
}
ContainerImage.includes(includes_for_find)
              .references(includes_for_find)
              .to_sql

Other info

The change to the CloudTenancyMixin was made so that it can be in charge of its own specific references instead of relying on a quirk of ActiveRecord, in which references(nil) will call references for all of the includes values.

So using the example from above, the following is equivalent:

ContainerImage.includes(includes_for_find).references(nil).to_sql

Unless there is a polymorphic relationship that is being referenced in the Rbac filters that can be triggered via SQL, then this would have worked normally by fluke. This makes it so the CloudTenancyMixin is more resilient.

Links

There are still some N+1s with custom_attributes present after this change, so I will look into that, but we aren't bombing out in memory with this in place. This was addressed in #17195

TODO

  • Add dev/test mode log output when EXPLAIN is fired

@miq-bot miq-bot added the wip label Mar 13, 2018
@NickLaMuro NickLaMuro force-pushed the optional_report_references branch 2 times, most recently from d24937c to 83c7e0e Compare March 13, 2018 15:02
kbrock commented Mar 19, 2018

Looking at skip_references

  • :extra_cols is present when there are sql friendly virtual_attribute values. This column embeds the virtual attributes directly into the primary query, removing N+1 lookups later.
  • apply_limit_in_sql is true when the order is sql friendly. If it is false, then we are bringing back all records to be in memory.
  • references and includes have some nuances, and we blindly used both. This looks to start challenging the blanket "just join to everything" attitude from before, and is very welcome.

Could you talk a little about how you came up with the logic for this method?

NickLaMuro commented Mar 19, 2018

@kbrock So I was going to put what I am about to say here in the description, but I realized I have some holes in my logic (I think), so I want to discuss this first before doing that.

First, to your bullet points:

  • :extra_cols is present when there are sql friendly virtual_attribute values. This column embeds the virtual attributes directly into the primary query, removing N+1 lookups later.

Yup, but really it could be anything, and could be abused to use info from the .references as a way of doing some weird data munging. So I put that as something that might need the references. Thinking about it now, that probably wasn't really something that needed to be a check for .references removal, but oh well.

  • apply_limit_in_sql is true when the order is sql friendly. If it is false, then we are bringing back all records to be in memory.

This one I was trying to be a bit clever on, since attrs[:apply_limit_in_sql] takes into account both exp_attrs[:supported_by_sql] and user_filters:

attrs[:apply_limit_in_sql] = (exp_attrs.nil? || exp_attrs[:supported_by_sql]) && user_filters["belongsto"].blank?

Re-looking at this now, this might not have been the best choice, or at least not enough of one. How user_filters is defined is pretty complex, and I don't think the check on user_filters["belongsto"] is enough to completely rule out user_filters needing .references in some use case.
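To make the concern concrete, here is a small plain-Ruby sketch of that boolean (not the ManageIQ code). `blank_value?` is a made-up stand-in for ActiveSupport's `blank?`, and the `"managed"` key and the filter strings are assumptions used only to show that a filter stored under any key other than `"belongsto"` has no effect on the flag.

```ruby
# Plain-Ruby stand-in for ActiveSupport's blank? (assumption: only nil and
# empty collections/strings matter for this illustration).
def blank_value?(value)
  value.nil? || (value.respond_to?(:empty?) && value.empty?)
end

# Mirrors the shape of the expression quoted above.
def apply_limit_in_sql?(exp_attrs, user_filters)
  expression_ok = exp_attrs.nil? || exp_attrs[:supported_by_sql] == true
  expression_ok && blank_value?(user_filters["belongsto"])
end

# No expression, no filters: the limit can be applied in SQL.
no_filters = apply_limit_in_sql?(nil, {"managed" => [], "belongsto" => []})

# A hypothetical "managed" filter is present, but the flag is still true,
# because only "belongsto" is inspected.
managed_only = apply_limit_in_sql?(
  {:supported_by_sql => true},
  {"managed" => [["/hypothetical/tag"]], "belongsto" => []}
)

# A hypothetical "belongsto" filter flips the flag to false.
with_belongsto = apply_limit_in_sql?(
  {:supported_by_sql => true},
  {"managed" => [], "belongsto" => ["/hypothetical/folder"]}
)

puts [no_filters, managed_only, with_belongsto].inspect
```

Whether such a filter could ever require `.references` is exactly the open question raised above; the sketch only shows that the flag would not notice it.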


Another thing that occurred to me is that I am not checking any of the .where clauses to see if there are any columns being targeted through the references clause, since that is also possible. I am starting to think this is a bit of a rabbit hole to find every case where it is safe to remove .references...

My "blunt force" alternative approach that I was considering was simply: if you have 3 or more top level keys in the includes, we won't allow references to be applied to the query, and then add a :force option for those who #YOLOMiqReportAllDayEveryDay and don't mind it potentially being a memory hog. That said, this solution seemed like it might be more invasive, and might require some UI tweaks to make it work as well.

@NickLaMuro

@kbrock Curious on your thoughts regarding the second commit I just added. Implementation details in the commit message.

I think this is the best failsafe I can come up with that won't be a large hit to the platform as a whole, and (should) catch all the edge cases that I undoubtedly missed in the first commit. That said, it is a bit of a change to Rbac so I would appreciate any thoughts on how to do this better.

kbrock commented Mar 26, 2018

@NickLaMuro

doing a scope.explain will both execute the regular query and
the explain

Sigh.

  1. My knee jerk is "no more queries" but to be fair, this is only performed if we were told to skip and it was a mistake?
  2. Do we want to fix code that guesses this wrong? Maybe output for tests or development?
  3. Is it possible to have a special case in reporting to send a skip and detect an issue during the execution of the actual query?

NickLaMuro commented Mar 26, 2018

@kbrock Just wanted to clarify this, in case it was not clear:

doing a scope.explain will both execute the regular query and
the explain

I am working around this by calling connection.explain, so only the explain is executed. It is different from ActiveRecord::Relation#explain (again, not using that), and the EXPLAIN on its own is pretty quick.

My knee jerk is "no more queries" but to be fair, this is only performed if we were told to skip and it was a mistake?

Correct. And again, it is only performing the EXPLAIN to do the check, and that does provide a quick check if the query is invalid or not.

Do we want to fix code that guesses this wrong? Maybe output for tests or development?

The reason why I am not going this route is I think it is a can of worms trying to figure this out. Just think .where("from_some_table.from_some_col = 4") for example, and how do we check for that?

Is it possible to have a special case in reporting to send a skip and detect an issue during the execution of the actual query?

Maybe, though not sure how I feel about this, but my gut says it will make this more ugly. Will have to think on this one.

kbrock commented Mar 28, 2018

Do we want to fix code that guesses this wrong? Maybe output for tests or development?

Yes, we can not detect the imput. But we can detect when the explain blows up
So I was thinking we could output when the explain isn't good stuff.

@NickLaMuro

Do we want to fix code that guesses this wrong? Maybe output for tests or development?

Yes, we can not detect the imput. But we can detect when the explain blows up
So I was thinking we could output when the explain isn't good stuff.

(heh... "imput")

Ah, I see, I misread this. I was reading more the first bit of that question ("Do we want to fix...") and thinking you were suggesting this for this PR. Instead, I assume you are suggesting that we add some debug/warn output when this has to fallback because of the failed EXPLAIN, so then we have a record of what to fix in the future so this is no longer needed, correct?
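The fallback-plus-logging idea discussed here can be sketched without a database. This is a simplified illustration and not the code in `Rbac::Filterer`: `StatementInvalid` stands in for `ActiveRecord::StatementInvalid`, `choose_scope` is a hypothetical name, SQL strings stand in for scopes, and `fake_explain` imitates a database rejecting a query that mentions a table it never joined.

```ruby
# Hypothetical stand-in for ActiveRecord::StatementInvalid so the sketch runs
# without a database connection.
class StatementInvalid < StandardError; end

# explain_sql: callable that runs only the EXPLAIN for a SQL string and raises
#              StatementInvalid when the SQL is malformed.
# logger:      callable that receives a warning message on fallback.
def choose_scope(with_references, without_references, explain_sql, logger)
  explain_sql.call(without_references)
  without_references
rescue StatementInvalid => err
  # Leave a record so the caller that needed .references can be fixed later.
  logger.call("skipping references produced invalid SQL, falling back: #{err.message}")
  with_references
end

# Fake EXPLAIN: rejects SQL that filters on "hosts" without joining it.
fake_explain = lambda do |sql|
  if sql.include?('"hosts"."name"') && !sql.include?('JOIN "hosts"')
    raise StatementInvalid, 'missing FROM-clause entry for table "hosts"'
  end
  "Seq Scan"
end

warnings = []
logger   = ->(message) { warnings << message }

without_refs = 'SELECT "vms".* FROM "vms" WHERE ("hosts"."name" = \'foo\')'
with_refs    = 'SELECT "vms".* FROM "vms" LEFT OUTER JOIN "hosts" ' \
               'ON "hosts"."id" = "vms"."host_id" WHERE ("hosts"."name" = \'foo\')'

chosen = choose_scope(with_refs, without_refs, fake_explain, logger)
puts chosen == with_refs
puts warnings.size
```

The design choice being illustrated is that the check cannot know in advance which inputs need the joins, but it can notice when the cheaper query is rejected and leave a warning behind for whoever needs to fix the caller.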

@NickLaMuro

@kbrock did some pretty hefty rebasing to get some tests in, but I think this should have implemented the feedback from above. Look good?

@NickLaMuro NickLaMuro changed the title from "[WIP] Skip query references in Rbac when not needed" to "Skip query references in Rbac when not needed" Mar 30, 2018
@miq-bot miq-bot removed the wip label Mar 30, 2018
@NickLaMuro

The rest of the rubocop warnings I think are not valid. See ManageIQ/guides#302

@NickLaMuro

@kbrock with the tests being fixed now, does this check the boxes you had? Do we want to pull someone in for a second opinion to see if this seems reasonable to others and not just our own mind-share?

@NickLaMuro NickLaMuro force-pushed the optional_report_references branch 2 times, most recently from b106048 to 3f4d50e Compare April 5, 2018 00:27
@NickLaMuro

@kbrock okay, this just got a bit uglier...

So, as we talked about in some PMs, I added some more integration based tests, and that turned out to be a REALLY GOOD IDEA, since it caught a quirk that probably would have bit us down the road. That said, the fix is disgusting (to put it lightly), so I feel even less happy with how this looks at this point. To avoid a book, basically read some of the info in the comments and the last commit message, and see the parts where I do the SQL SAVEPOINT jazz. If you have questions, I can explain further.

I think we probably want another couple of opinions after this latest change, since while it bit us in tests, it will most likely be an issue in the codebase elsewhere when the explain check fails and is applied. @jrafanie , want to take a look and throw some 🍅 's?

@@ -13,7 +18,8 @@ def tenant_id_clause_format(tenant_ids)
   end

   def tenant_joins_clause(scope)
-    scope.includes(:cloud_tenant => "source_tenant").includes(:ext_management_system)
+    scope.includes(QUERY_REFERENCES)
+         .references(QUERY_REFERENCES) # needed for the where to work

do we need to also make this change over to the other tenant_joins_clause code? e.g.:
app/models/flavor.rb#tenant_joins_clause ?

@NickLaMuro

Damn, you might be right, and it looks like we don't have a test for that either... which is how I caught this in the first place... 😑

@NickLaMuro NickLaMuro Apr 5, 2018

Fixed. Added some tests around it as well that didn't require Rbac.

@jrafanie jrafanie Apr 9, 2018

# needed for the where to work

As someone who doesn't look at query optimizations all day, I'm not sure what this means. Why is this needed to make the where work? Which where? Can we better explain why?


If read correctly, maybe saying something about being explicit instead of relying on references(nil) to blindly do the right thing sometimes...

@NickLaMuro NickLaMuro Apr 9, 2018

@jrafanie yeah, looking at this diff on its own makes it hard to see, but it is for the other method in this short file, #tenant_id_clause_format, to function properly.

So maybe I should change it to:

# needed for the where with `tenant_id_clause_format` to work

?


Yeah, but say what "to work" / function properly means... N+1 in spite of includes? Does it fail to build the sql? How is the existing code working or not working here?

@NickLaMuro NickLaMuro Apr 9, 2018

Means the SQL is invalid without it.

The method in this file, tenant_id_clause_format, has columns that "reference" the other tables it is including

def tenant_id_clause_format(tenant_ids)
  "(tenants.id IN (?) AND ext_management_systems.tenant_mapping_enabled IS TRUE)..."
end

And I explain how it was not failing briefly in my second commit (which adds the .skip_references bit to begin with). Specifically, this is the explanation from that commit:

The change to the CloudTenancyMixin was made so that it can be in charge
of its own specific references instead of relying on a quirk of
ActiveRecord, in which references(nil) will call references for all
of the includes values. Also a few models that overwrote the
tenant_joins_clause method had to be updated as well.

tenant_id_clause_format is almost always called with tenant_joins_clause in Rbac (forget the line specifically, but you can look it up).

Makes calling `.references` on the Rbac scope more conservative, since
it has the possibility of causing some nasty join bombs.

The change to the CloudTenancyMixin was made so that it can be in charge
of its own specific references instead of relying on a quirk of
`ActiveRecord`, in which `references(nil)` will call references for all
of the `includes` values.  Also a few models that overwrote the
tenant_joins_clause method had to be updated as well.

Moved some tests around to also provide some tests in a more
"integration" based fashion.  Should be mostly additive, and the tests
that I moved are ones that I had written myself and weren't really in
the best spot.

This is a bandage for the following issue:

* * *

Given roughly these table counts:

    container_images:       700
    containers:             2500
    openscap_results:       100
    openscap_rule_results:  70000
    custom_attributes:      20000

Where half of the custom_attributes are associated with
container_images.  If the following MiqReport is run:

    cols:
    - name
    - virtual_custom_attribute_build-date:SECTION:docker_labels
    - virtual_custom_attribute_io.openshift.build.name:SECTION:docker_labels
    - virtual_custom_attribute_io.openshift.build.namespace:SECTION:docker_labels
    include:
      openscap_rule_results:
        columns:
        - severity
      containers:
        columns:
        - state

(`:custom_attributes => {}` is also part of the resulting `include`, but
pretty sure that gets tacked on in the `MiqReport#generate` call)

Without needing a report, the query that gets executed can be replicated
by doing the following:

    irb> includes_for_find = {
           :openscap_rule_results => {},
           :containers => {},
           :custom_attributes => {}
         }
    irb> ContainerImage.includes(includes_for_find)
                       .references(includes_for_find)
                       .to_sql

There would be a "LEFT JOIN" bomb that would end up causing about 40GB
of data being returned from the database.  In this case as well, there
is nothing that is making use of the references for the result, but the
extra table data would be returned from the query prior to this commit
in a very inefficient manner with loads of duplicate data.

Currently, there are a lot of factors that determine whether or not a
query can be executed without the `.references` call.  ActiveRecord
itself is even smart enough to figure it out and add it for you if you
use the hash `.where` syntax:

    ContainerImage.includes(:containers => {})
                  .where(:containers => {:name => "foo"})
                  .to_sql

So as a fail safe, if `skip` is passed to method
`Rbac::Filterer#include_references`, then we can do a much cheaper
explain to figure out if it is a valid query before deciding if skipping
is a bad idea.

This means 1 extra query for every Rbac search call that is currently
"skip-able" by our criteria, but they should hopefully be quick to
execute and be a fail safe for what we miss.

Note, doing a `scope.explain` will both execute the regular query and
the explain, and since we have a lot of heavy Rbac calls, this would not
be ideal.  We work around this by calling a private API with `.to_sql`
to only execute the EXPLAIN query, which still returns an error of
`ActiveRecord::StatementInvalid` when it is malformed.

In that same vein, a check if we are in a transaction is also required,
and setting up a sub transaction SAVEPOINT (via
`transaction(:requires_new => true)`) is necessary so we don't pollute
an existing transaction in the process.  More info on that here:

https://stackoverflow.com/a/31146267/3574689

miq-bot commented Apr 5, 2018

Checked commits NickLaMuro/manageiq@826f579~...fcc4633 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
6 files checked, 4 offenses detected

lib/rbac/filterer.rb

kbrock commented Apr 5, 2018

the explain looked like it was going to add a ton of complexity. In the end, it turned out alright.

@NickLaMuro

@Fryguy @jrafanie : Could I have one of you (or both?) take a look at this? After @kbrock and I have been looking at it for a week or so now, I think we have too much mind share to be objective about this PR at the moment. Thanks.

raise ActiveRecord::Rollback
end
end
# If the result of the transaction is non-nil, then the block was

@NickLaMuro when do we expect we'd try to skip references with an invalid scope? What benefit does the above diagnostics provide over trying to use an invalid scope directly and blowing up? In other words, why do you think the above is needed? I'm asking because I don't know.

@NickLaMuro

@NickLaMuro when do we expect we'd try to skip references with an invalid scope?

That is the "Million Dollar Question"...

I think this comment, and the comment from Keenan that spawned it, kind of sums it up:

#17141 (comment)

But basically, I don't think there is a way to tell (properly) that we have all of our bases covered when removing .references. This is basically an attempt to make sure we aren't introducing any new breakages with this new fix. That said, this is straight up a HACK, no question, but I don't know of a better way to handle this. Before that, it was just the first two commits in this PR, and the EXPLAIN came in after the fact when I was questioning whether the work I did would be valid in all cases. I don't necessarily think our tests cover all of our bases either, so that is why I am using this to attempt to make it so we don't make things worse with this "fix".

What benefit does the above diagnostics provide over trying to use an invalid scope directly and blowing up?

Based on what I said above, the point of the diagnostics is to eventually get to a state where the EXPLAIN is deemed not needed. The hope is that it would bombard the logs with output that would incentivize the developer to fix this in a more meaningful way instead of letting the error continue. Though, that said, I doubt these errors would show up in a QE environment, so maybe finding a way to trigger it in those environments might not be a bad idea either.

In other words, why do you think the above is needed?

I realize that this was kind of answered in the above, but to re-state that answer: I don't know.

I have added tests that simulate ways it could happen with the interfaces that Rbac provides, so it is "theoretically" possible, but whether this would actually be something that is triggered in practice is honestly an unknown. I basically was trying to code defensively knowing that I don't know every possible report or automation script that might be going through this that could potentially break without the .references addition in place.


The change to the CloudTenancyMixin was made so that it can be in charge of its own specific references instead of relying on a quirk of ActiveRecord, in which references(nil) will call references for all of the includes values.

I guess I was trying to understand what failing caller code would look like since I don't follow when this optimization would fail and inform the developer to fix it.

As an example, if you rolled back the changes to explicitly add .references to these models instead of relying on the quirk, would this explain debugging have caught an invalid scope?

@NickLaMuro NickLaMuro Apr 9, 2018

As an example, if you rolled back the changes to explicitly add .references to these models instead of relying on the quirk, would this explain debugging have caught an invalid scope?

Yes, but I caught this myself when the tests failed, and this was prior to me adding the EXPLAIN concept after Keenan's first review. Prior to that, I only had a single commit in this PR:

"Skip query references in Rbac when not needed"

I have rebased so many times at this point that the only thing that has stayed consistent is the commit title.

That said, the CloudTenancyMixin happened to be a prime example where we would want to fix things if the "EXPLAIN try/catch" did get triggered, since it is a known and consistent scope. Not sure that will always be the case with this for reporting though.

@NickLaMuro

I guess I was trying to understand what failing caller code would look like since I don't follow when this optimization would fail and inform the developer to fix it.

You will get a PostgreSQL error that the table is not part of the FROM (or JOIN).

@NickLaMuro NickLaMuro Apr 9, 2018

For a quick rails console example, you can run the following and see what happens when .references isn't present:

irb> VmOrTemplate.includes(:host).where('"hosts"."name" = ?', 'foo')
#=> # BOOM!
irb> VmOrTemplate.includes(:host).references(:host).where('"hosts"."name" = ?', 'foo')
#=> # works!

Calling a .to_sql on those lines will also show you the resulting SQL from that, though the second is a little bit tough to digest...

jrafanie commented Apr 9, 2018

@miq-bot add_label gaprindashvili/yes
@miq-bot add_label fine/yes

@jrafanie jrafanie left a comment

Looks good. We'll see what gremlins pop out of this change. What could go wrong?

@jrafanie jrafanie merged commit 05ab3d0 into ManageIQ:master Apr 9, 2018
@jrafanie jrafanie added this to the Sprint 83 Ending Apr 9, 2018 milestone Apr 9, 2018
@jrafanie jrafanie self-assigned this Apr 9, 2018
@NickLaMuro

Looks good. We'll see what gremlins pop out of this change. What could go wrong?

Famous Last words. @jrafanie - 2018

simaishi pushed a commit that referenced this pull request Apr 10, 2018
@simaishi

Gaprindashvili backport details:

$ git log -1
commit 6961422f206ed0d413896c74d74f369e0bccb8cd
Author: Joe Rafaniello <jrafanie@users.noreply.github.com>
Date:   Mon Apr 9 16:23:21 2018 -0400

    Merge pull request #17141 from NickLaMuro/optional_report_references
    
    Skip query references in Rbac when not needed
    (cherry picked from commit 05ab3d0c7be3206541d52a2b950b0623b48f1939)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1565677

simaishi pushed a commit that referenced this pull request Apr 10, 2018
@simaishi

Fine backport details:

$ git log -1
commit eae68eacf8482c100bf51caf971669ea856b0c9a
Author: Joe Rafaniello <jrafanie@users.noreply.github.com>
Date:   Mon Apr 9 16:23:21 2018 -0400

    Merge pull request #17141 from NickLaMuro/optional_report_references
    
    Skip query references in Rbac when not needed
    (cherry picked from commit 05ab3d0c7be3206541d52a2b950b0623b48f1939)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1565678

d-m-u pushed a commit to d-m-u/manageiq that referenced this pull request Jun 6, 2018
…erences

Skip query references in Rbac when not needed
(cherry picked from commit 05ab3d0)

https://bugzilla.redhat.com/show_bug.cgi?id=1565678
@kbrock kbrock mentioned this pull request Sep 13, 2019