Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resource protection should be deep, not shallow #105

Merged
merged 4 commits into from
Aug 4, 2016

Conversation

gombasg
Copy link
Contributor

@gombasg gombasg commented Apr 12, 2016

This change addresses a concurrency problem, when two distinct threads
processing the same non-object template ended up modifying the same
HashResource object simultaneously, leading to TreeMap corruption, which
manifested itself as NullPointerException or compiler hang. The trigger
was a foreach() loop over a protected hash, which did not propagate
the protection down to the hash values; the stack trace leading for this
situation was:

 [panc] org.quattor.pan.dml.data.HashResource.put(HashResource.java:172)
 [panc] org.quattor.pan.dml.data.Resource.rput(Resource.java:131)
 [panc] org.quattor.pan.template.BuildContext.setLocalVariable(BuildContext.java:1298)
 [panc] org.quattor.pan.dml.operators.SetValue.execute(SetValue.java:170)
 [panc] org.quattor.pan.dml.operators.Assign.execute(Assign.java:65)
 [panc] org.quattor.pan.dml.operators.IfElse.execute(IfElse.java:109)
 [panc] org.quattor.pan.dml.DML.execute(DML.java:120)
 [panc] org.quattor.pan.dml.operators.Foreach.execute(Foreach.java:93)

Note there is a similar concurrency issue in
org.quattor.pan.type.RecordType.setDefaults(), which is not addressed by this
patch. So this patch addresses, but does not completely solve, issues #6
and #56.

There is a visible side effect of this patch. There is the following
bit of code in the standard template library:

variable DISK_VOLUME_PARAMS = {
  foreach (volume;params;DISK_VOLUME_PARAMS) {
    if ( (params['type'] == 'partition') &&
         (is_defined(DISK_PART_BY_DEV['changed_part_num'][params['device']])) ) {
      debug('Updating '+volume+' partition to new name/number: '+DISK_PART_BY_DEV['changed_part_num'][params['device']]);
      params['device'] = DISK_PART_BY_DEV['changed_part_num'][params['device']];
      params['final'] = true;
    };
  };
  SELF;
};

The code iterates over and changes DISK_VOLUME_PARAMS, and expects the
changes to be visible in SELF. After this patch the above code will no
longer work, because DISK_VOLUME_PARAMS inside the DML block will have
copy-on-write semantics, changing it will not alter SELF. The foreach()
loop in the above code needs to be re-written to refrence SELF instead
of DISK_VOLUME_PARAMS.

gombasg added 4 commits April 14, 2016 16:27
This change addresses a concurrency problem, when two distinct threads
processing the same non-object template ended up modifying the same
HashResource object simultaneously, leading to TreeMap corruption, which
manifested itself as NullPointerException or compiler hang. The trigger
was a foreach() loop over a protected hash, which did not propagate
the protection down to the hash values; the stack trace leading for this
situation was:

     [panc] org.quattor.pan.dml.data.HashResource.put(HashResource.java:172)
     [panc] org.quattor.pan.dml.data.Resource.rput(Resource.java:131)
     [panc] org.quattor.pan.template.BuildContext.setLocalVariable(BuildContext.java:1298)
     [panc] org.quattor.pan.dml.operators.SetValue.execute(SetValue.java:170)
     [panc] org.quattor.pan.dml.operators.Assign.execute(Assign.java:65)
     [panc] org.quattor.pan.dml.operators.IfElse.execute(IfElse.java:109)
     [panc] org.quattor.pan.dml.DML.execute(DML.java:120)
     [panc] org.quattor.pan.dml.operators.Foreach.execute(Foreach.java:93)

Note there is a similar concurrency issue in
org.quattor.pan.type.RecordType.setDefaults(), which is _not_ addressed by this
patch. So this patch addresses, but does not completely solve, issues quattor#6
and quattor#56.

There is a visible side effect of this patch. There is the following
bit of code in the standard template library:

    variable DISK_VOLUME_PARAMS = {
      foreach (volume;params;DISK_VOLUME_PARAMS) {
        if ( (params['type'] == 'partition') &&
             (is_defined(DISK_PART_BY_DEV['changed_part_num'][params['device']])) ) {
          debug('Updating '+volume+' partition to new name/number: '+DISK_PART_BY_DEV['changed_part_num'][params['device']]);
          params['device'] = DISK_PART_BY_DEV['changed_part_num'][params['device']];
          params['final'] = true;
        };
      };
      SELF;
    };

The code iterates over and changes DISK_VOLUME_PARAMS, and expects the
changes to be visible in SELF. After this patch the above code will no
longer work, because DISK_VOLUME_PARAMS inside the DML block will have
copy-on-write semantics, changing it will not alter SELF. The foreach()
loop in the above code needs to be re-written to refrence SELF instead
of DISK_VOLUME_PARAMS.
We really don't want the default value of a type to be changed...
This patch resolves issues quattor#6 and quattor#56.
This is just for correctness. append() and prepend() should raise an
error if ever called on a protected list, rather than silently falling
back to the implementation inherited from ListResource.
@gombasg gombasg force-pushed the deep_resource_protection branch from c661770 to a5743f1 Compare April 15, 2016 11:39
@stdweird
Copy link
Member

@gombasg i'm trying to make sense of this patch. so there is no actual thread-safety issue, this is a different problem in the sense that the protected resources were only shallow protected. in principle, even single threaded panc runs could trigger this, but we were just very lucky that the protected resources were always updated with the same data? (and threading can cause a corrupted resource, leading to "wrong" data or crash; rather than 2 threads updating with different data and leading to wriong profile data)

i'm just surprised that the fix didn't involve any concurrency/thread-safety fixes, that's all.

@@ -25,7 +25,11 @@ public Element duplicate() {

@Override
public Element get(Term key) throws InvalidTermException {
return baseList.get(key);
final Element value = baseList.get(key);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gombasg is this really needed? any element in a protected resource should be protected already. (or is there a way to make a protectedresource with unprotected elements in it? or is the way to enforce that shallow copy of a protected resource becomes a deep-protected copy.

Copy link
Member

@stdweird stdweird Apr 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(this is to my current understanding of the code 😄 )
so suppose you have a protected list with unprotected elements, the code below does the following: it takes the element, makes a protected copy and returns the copy. this is suboptimal. it would be better to protect the element once (but not by calling .protect), so the next time it doesn't have to make a copy anymore.
if the element is protected, .protect does not return a copy, so that is fine. (but them you don't need the additional code anyway)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not make a copy - both ProtectedHashResource and ProtectedListResource are just wrappers which say "you can't modify the data through this reference", but the data is not copied.

The copying happens when the resource is used - there are lots of places with logic like "check if the resource is protected, and if it is, make a copy and work with that instead".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gombasg the copying happens 2 lines lower, with the .protect, so unless the value you are trying to retrieve is null, a copy does happen, unless the value itself is a protected resource, in which case the reference is returned.

my main point is the following: with the current code, everytime a procteted resources is used, this deep-copying occurs, and only because the elements of the protected resource are not/never converted in protected resources themself.

@ned21
Copy link
Contributor

ned21 commented May 5, 2016

We are beginning to use this patch in production. What is remaining for this to be merged to master?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants