-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resource protection should be deep, not shallow #105
Conversation
This change addresses a concurrency problem, when two distinct threads processing the same non-object template ended up modifying the same HashResource object simultaneously, leading to TreeMap corruption, which manifested itself as NullPointerException or compiler hang. The trigger was a foreach() loop over a protected hash, which did not propagate the protection down to the hash values; the stack trace leading for this situation was: [panc] org.quattor.pan.dml.data.HashResource.put(HashResource.java:172) [panc] org.quattor.pan.dml.data.Resource.rput(Resource.java:131) [panc] org.quattor.pan.template.BuildContext.setLocalVariable(BuildContext.java:1298) [panc] org.quattor.pan.dml.operators.SetValue.execute(SetValue.java:170) [panc] org.quattor.pan.dml.operators.Assign.execute(Assign.java:65) [panc] org.quattor.pan.dml.operators.IfElse.execute(IfElse.java:109) [panc] org.quattor.pan.dml.DML.execute(DML.java:120) [panc] org.quattor.pan.dml.operators.Foreach.execute(Foreach.java:93) Note there is a similar concurrency issue in org.quattor.pan.type.RecordType.setDefaults(), which is _not_ addressed by this patch. So this patch addresses, but does not completely solve, issues quattor#6 and quattor#56. There is a visible side effect of this patch. There is the following bit of code in the standard template library: variable DISK_VOLUME_PARAMS = { foreach (volume;params;DISK_VOLUME_PARAMS) { if ( (params['type'] == 'partition') && (is_defined(DISK_PART_BY_DEV['changed_part_num'][params['device']])) ) { debug('Updating '+volume+' partition to new name/number: '+DISK_PART_BY_DEV['changed_part_num'][params['device']]); params['device'] = DISK_PART_BY_DEV['changed_part_num'][params['device']]; params['final'] = true; }; }; SELF; }; The code iterates over and changes DISK_VOLUME_PARAMS, and expects the changes to be visible in SELF. After this patch the above code will no longer work, because DISK_VOLUME_PARAMS inside the DML block will have copy-on-write semantics, changing it will not alter SELF. The foreach() loop in the above code needs to be re-written to refrence SELF instead of DISK_VOLUME_PARAMS.
We really don't want the default value of a type to be changed... This patch resolves issues quattor#6 and quattor#56.
This is just for correctness. append() and prepend() should raise an error if ever called on a protected list, rather than silently falling back to the implementation inherited from ListResource.
c661770
to
a5743f1
Compare
@gombasg i'm trying to make sense of this patch. so there is no actual thread-safety issue, this is a different problem in the sense that the protected resources were only shallow protected. in principle, even single threaded panc runs could trigger this, but we were just very lucky that the protected resources were always updated with the same data? (and threading can cause a corrupted resource, leading to "wrong" data or crash; rather than 2 threads updating with different data and leading to wriong profile data) i'm just surprised that the fix didn't involve any concurrency/thread-safety fixes, that's all. |
@@ -25,7 +25,11 @@ public Element duplicate() { | |||
|
|||
@Override | |||
public Element get(Term key) throws InvalidTermException { | |||
return baseList.get(key); | |||
final Element value = baseList.get(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gombasg is this really needed? any element in a protected resource should be protected already. (or is there a way to make a protectedresource with unprotected elements in it? or is the way to enforce that shallow copy of a protected resource becomes a deep-protected copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(this is to my current understanding of the code 😄 )
so suppose you have a protected list with unprotected elements, the code below does the following: it takes the element, makes a protected copy and returns the copy. this is suboptimal. it would be better to protect the element once (but not by calling .protect
), so the next time it doesn't have to make a copy anymore.
if the element is protected, .protect
does not return a copy, so that is fine. (but them you don't need the additional code anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not make a copy - both ProtectedHashResource and ProtectedListResource are just wrappers which say "you can't modify the data through this reference", but the data is not copied.
The copying happens when the resource is used - there are lots of places with logic like "check if the resource is protected, and if it is, make a copy and work with that instead".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gombasg the copying happens 2 lines lower, with the .protect
, so unless the value you are trying to retrieve is null, a copy does happen, unless the value itself is a protected resource, in which case the reference is returned.
my main point is the following: with the current code, everytime a procteted resources is used, this deep-copying occurs, and only because the elements of the protected resource are not/never converted in protected resources themself.
We are beginning to use this patch in production. What is remaining for this to be merged to master? |
This change addresses a concurrency problem, when two distinct threads
processing the same non-object template ended up modifying the same
HashResource object simultaneously, leading to TreeMap corruption, which
manifested itself as NullPointerException or compiler hang. The trigger
was a foreach() loop over a protected hash, which did not propagate
the protection down to the hash values; the stack trace leading for this
situation was:
Note there is a similar concurrency issue in
org.quattor.pan.type.RecordType.setDefaults(), which is not addressed by this
patch. So this patch addresses, but does not completely solve, issues #6
and #56.
There is a visible side effect of this patch. There is the following
bit of code in the standard template library:
The code iterates over and changes DISK_VOLUME_PARAMS, and expects the
changes to be visible in SELF. After this patch the above code will no
longer work, because DISK_VOLUME_PARAMS inside the DML block will have
copy-on-write semantics, changing it will not alter SELF. The foreach()
loop in the above code needs to be re-written to refrence SELF instead
of DISK_VOLUME_PARAMS.