-
Notifications
You must be signed in to change notification settings - Fork 898
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip query references in Rbac when not needed #17141
Skip query references in Rbac when not needed #17141
Conversation
d24937c
to
83c7e0e
Compare
Looking at
Could you talk a little as to how you came up with the logic for this method. |
@kbrock So I was going to put what I am about to say here in the description, but I realized I have some holes in my logic (I think), so I want to discuss this first before doing that. First, your to your bullet points:
Yup, but really could be anything, and could be abused to use info from the
This one, I was to be a bit clever on since attrs[:apply_limit_in_sql] = (exp_attrs.nil? || exp_attrs[:supported_by_sql]) && user_filters["belongsto"].blank? Re-looking at this now, this might have not been the best choice, or at least enough of one. How Another thing that occurred to me is that I am not checking any of the My "blunt force" alternative approach that I was considering was just if you have 3 or more top level keys in the |
83c7e0e
to
780e6f9
Compare
598de43
to
03ac680
Compare
@kbrock Curious on your thoughts regarding the second commit I just added. Implementation details in the commit message. I think this is the best failsafe I can come up with that won't be a large hit to the platform as a whole, and (should) catch all the edge that I most undoubtedly missed in the first commit. That said, it is a bit of a change to Rbac so I would appreciate any thoughts on how to do this better. |
Sigh.
|
@kbrock Just wanted to clarify this, in case it was not clear:
I am working around this by doing the
Correct. And again, it is only performing the
The reason why I am not going this route is I think it is a can of worms trying to figure this out. Just think
Maybe, though not sure how I feel about this, but my gut says it will make this more ugly. Will have to think on this one. |
Yes, we can not detect the imput. But we can detect when the |
(heh... "imput") Ah, I see, I misread this. I was reading more the first bit of that question ("Do we want to fix...") and thinking you were suggesting this for this PR. Instead, I assume you are suggesting that we add some debug/warn output when this has to fallback because of the failed |
03ac680
to
449f6ff
Compare
@kbrock did some pretty hefty rebasing to get some tests in, but I think this should have implemented the feedback from above. Look good? |
449f6ff
to
944da19
Compare
The rest of the |
944da19
to
6c193d9
Compare
@kbrock with the tests being fixed now, does this check the boxes you had? Do we want to pull someone in for a second opinion to see if this seems reasonable to others and not just our own mind-share? |
b106048
to
3f4d50e
Compare
@kbrock okay, this just got a bit uglier... So, as we talked about in some PMs, I added some more I think we probably want another couple of opinions after this latest change, since while it bit us in tests, it will most likely be an issue in the codebase elsewhere when the explain check fails and is applied. @jrafanie , want to take a look and throw some 🍅 's? |
@@ -13,7 +18,8 @@ def tenant_id_clause_format(tenant_ids) | |||
end | |||
|
|||
def tenant_joins_clause(scope) | |||
scope.includes(:cloud_tenant => "source_tenant").includes(:ext_management_system) | |||
scope.includes(QUERY_REFERENCES) | |||
.references(QUERY_REFERENCES) # needed for the where to work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to also make this change over to the other tenant_joins_clause
code? e.g.:
app/models/flavor.rb#tenant_joins_clause
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn, you might be right, and it looks like we don't have a test for that either... which is how I caught this in the first place... 😑
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Added some tests around it as well that didn't require Rbac
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# needed for the where to work
As someone who doesn't look at query optimizations all day, I'm not sure what this means. Why is this needed to make the where
work? Which where
? Can we better explain why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If read correctly, maybe saying something about being explicit instead of relying on references(nil)
to blindly do the right thing sometimes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jrafanie yeah, looking at this diff on it's own makes it hard to see, but it is for the other method in this short file, #tenant_id_clause_format
, to function properly.
So maybe I should change it to:
# needed for the where with `tenant_id_clause_format` to work
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but say what "to work" / function properly means... N+1 in spite of includes
? Does it fail to build the sql? How is the existing code working or not working here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Means the SQL is invalid without it.
The method in this file, tenant_id_clause_format
, has columns that "reference" the other tables it is including
def tenant_id_clause_format(tenant_ids)
"(tenants.id IN (?) AND ext_management_systems.tenant_mapping_enabled IS TRUE)..."
end
And I explain how is was not failing briefly in my second commit (which adds the .skip_references
bit to begin with). Specifically, this is the explanation from that commit:
The change to the CloudTenancyMixin was made so that it can be in charge
of it's own specific references instead of relying on a quirk of
ActiveRecord
, in whichreferences(nil)
will call references for all
of theincludes
values. Also a few models that overwrote the
tenant_joins_clause method had to be updated as well.
tenant_id_clause_format
is almost always called with tenant_joins_clause
in Rbac
(forget the line specifically, but you can look it up).
Makes calling `.references` on the Rbac scope more conservative, since it has the possibility of causing some nasty join bombs. The change to the CloudTenancyMixin was made so that it can be in charge of it's own specific references instead of relying on a quirk of `ActiveRecord`, in which `references(nil)` will call references for all of the `includes` values. Also a few models that overwrote the tenant_joins_clause method had to be updated as wel. Moved some tests around to also provide some tests in a more "integration" based fashion. Should be mostly additive, and the tests that I moved are ones that I had written myself and weren't really in the best spot. This is a bandage for for the following issue: * * * Given roughly these kind of table counts container_images: 700 containers: 2500 openscap_results: 100 openscap_rule_results: 70000 custom_attributes: 20000 Where half of the custom_attributes are associated with container_images. If the following MiqReport is run: cols: - name - virtual_custom_attribute_build-date:SECTION:docker_labels - virtual_custom_attribute_io.openshift.build.name:SECTION:docker_labels - virtual_custom_attribute_io.openshift.build.namespace:SECTION:docker_labels include: openscap_rule_results: columns: - severity containers: columns: - state (`:custom_attributes => {}` is also part of the resulting `include`, but pretty sure that gets tacked on in the `MiqReport#generate` call) Without needing a report, the query that gets executed can be replicated by doing the following: irb> includes_for_find = { :openscap_rule_results => {}, :containers => {}, :custom_attributes => {} } irb> ContainerImages.includes(includes_for_find) .references(includes_for_find) .to_sql There would be a "LEFT JOIN" bomb that would end up causing about 40GB of data being returned from the database. In this case as well, there is nothing that is making use of the references for the result, but the extra table data would be returned from the query prior to this commit in a very inefficient manner with loads of duplicate data.
3f4d50e
to
de04e36
Compare
Currently, there are a lot of factors that determine whether or not a query can be executed without the `.references` call. ActiveRecord itself is even smart enough to figured it out and add it for you if you use the hash `.where` syntax: ContainerImage.includes(:containers => {}) .where(:containers => {:name => "foo"}) .to_sql So as a fail safe, if `skip` is passed to method `Rbac::Filterer#include_references`, then we can do an much cheaper explain to figure out if it is a valid query before deciding if skipping is a bad idea. This means 1 extra query to every RBac call search that is currently "skip-able" by our criteria, but they should hopefully be quick to execute and be a fail safe for what we miss. Note, doing a `scope.explain` will both execute the regular query and the explain, and since we have a lot of heavy Rbac calls, this would not be ideal. We work around this by calling a private API with `.to_sql` to only execute the EXPLAIN query, which still returns an error of `ActiveRecord::StatementInvalid` when it is malformed. In that same vain, a check if we are in a transaction is also required, and setting up a sub transaction SAVEPOINT (via `transaction(:requires_new => true)`)is necessary so we don't pollute an existing transaction in the process. More info on that here: https://stackoverflow.com/a/31146267/3574689
de04e36
to
fcc4633
Compare
Checked commits NickLaMuro/manageiq@826f579~...fcc4633 with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0 lib/rbac/filterer.rb
|
the |
raise ActiveRecord::Rollback | ||
end | ||
end | ||
# If the result of the transaction is non-nil, then the block was |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NickLaMuro when do we expect we'd try to skip references with an invalid scope? What benefit does the above diagnostics provide over trying to use an invalid scope directly and blowing up? In other words, why do you think the above is needed? I'm asking because I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NickLaMuro when do we expect we'd try to skip references with an invalid scope?
That is the "Million Dollar Question"...
I think this comment, and the comment from Keenan that spawned it kind sums it up:
But basically, I don't think there is a way to tell (properly) that we have all of our basis covered by removing .references
. This is basically an attempt to to make sure we aren't making any new breakages with this new fix. That said, this is straight up a HACK, no question, but I don't know of a better way to handle this. Before that, it was just the first two commits in this PR, and the EXPLAIN
came in after the fact when I was questioning the work I did would be valid in all cases. I don't necessarily think our tests cover all of our bases either, so that is why I am using this to attempt to make it so we don't make things worse with this "fix".
What benefit does the above diagnostics provide over trying to use an invalid scope directly and blowing up?
Based on what I said above, the point of the diagnostics is to eventually get to a state where the EXPLAIN
is deemed not needed. The hope is that it would bombard the logs with output that would incentivize the developer to fix this in a more meaningful way instead of letting error continue. Though, that said, I doubt these errors would show up in a QE environment, so maybe finding a way to trigger it in those environments might not be a bad idea either.
In other words, why do you think the above is needed?
I realize that this was kind of answered in the above, but to re-state that answer: I don't know.
I have added tests that simulate ways it could happen with the interfaces that Rbac
provides, so it is "theoretically" possible, but whether this would actually be something that is triggered in practice is honestly an unknown. I basically was trying to code defensively knowing that I don't know every possible report or automation script that might be going through this that could potentially break without the .references
addition in place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change to the CloudTenancyMixin was made so that it can be in charge of it's own specific references instead of relying on a quirk of ActiveRecord, in which references(nil) will call references for all of the includes values.
I guess I was trying to understand what failing caller code would look like since I don't follow when this optimization would fail and inform the developer to fix it.
As an example, if you rolled back the changes to explicitly add .references
to these models instead of relying on the quirk, would this explain debugging have caught an invalid scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example, if you rolled back the changes to explicitly add .references to these models instead of relying on the quirk, would this explain debugging have caught an invalid scope?
Yes, but I caught this myself when the tests failed, and this was prior to me adding the EXPLAIN
concept after Keenan's first review. Prior to that, I only had a single commit in this PR:
"Skip query references in Rbac when not needed"
I have rebased so many times at this point that the only thing that has stayed consistent is the commit title.
That said, the CloudTenancyMixin
happened to be a prime example where we would want to fix things if the "EXPLAIN
try/catch" did get triggered, since it is a known and consistent scope. Not sure that will always be the case with this for reporting though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I was trying to understand what failing caller code would look like since I don't follow when this optimization would fail and inform the developer to fix it.
You will get a Postrgresql error that the table is not part of the FROM
(or JOIN
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a quick rails console
example, you can run the following and see what happens when .references
isn't present:
irb> VmOrTemplate.includes(:host).where('"hosts"."name" = ?', 'foo')
#=> # BOOM!
irb> VmOrTemplate.includes(:host).references(:host).where('"hosts"."name" = ?', 'foo')
#=> # works!
Calling a .to_sql
on those lines will also show you the resulting SQL from that, though the second is a little bit tough to digest...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. We'll see what gremlins pop out of this change. What could go wrong?
Famous Last words. @jrafanie - 2018 |
Skip query references in Rbac when not needed (cherry picked from commit 05ab3d0) https://bugzilla.redhat.com/show_bug.cgi?id=1565677
Gaprindashvili backport details:
|
Skip query references in Rbac when not needed (cherry picked from commit 05ab3d0) https://bugzilla.redhat.com/show_bug.cgi?id=1565678
Fine backport details:
|
…erences Skip query references in Rbac when not needed (cherry picked from commit 05ab3d0) https://bugzilla.redhat.com/show_bug.cgi?id=1565678
Makes calling
.references
on the Rbac scope more conservative, since it has the possibility of causing some nasty join bombs.Detailed explanation
Given roughly these kind of table counts
Where half of the
custom_attributes
are associated withcontainer_images
. If the followingMiqReport
:is run, there would be a "LEFT JOIN" bomb that would end up causing about 40GB of data being returned from the database. In this case as well, there is nothing that is making use of the references for the result, but the extra table data would be returned from the query prior to this commit in a very inefficient manner with loads of duplicate data.
(
:custom_attributes => {}
is also part of the resultinginclude
, but pretty sure that gets tacked on in theMiqReport#generate
call)Without needing a report, the query that gets executed can be replicated by doing the following:
Other info
The change to the
CloudTenancyMixin
was made so that it can be in charge of it's own specific references instead of relying on a quirk ofActiveRecord
, in whichreferences(nil)
will call references for all of theincludes
values.So using the example from above, the following is equivalent:
Unless there is a polymorphic relationship that is being referenced in the Rbac filters that can be triggered via SQL, then this would have worked normally by fluke. This makes it so the
CloudTenancyMixin
is more resilient.Links
There is still some N+1's withThis was addressed in #17195custom_attributes
still present after this change, so I will look into that, but we aren't bombing out in memory with this in place.TODO