
fail_on_error clause makes logic engine facts False that would otherwise be true. #318

Open · StevenCTimm opened this issue Mar 25, 2021 · 13 comments

Labels: prj_testing (Issue identified in HEPCloud Project IV integration testing)

@StevenCTimm

I had previously tested that the fail_on_error clause successfully made facts "false" that would otherwise be "error".
But what I am now observing is that some facts that would otherwise be True end up False when wrapped by fail_on_error.

Initially in resource_request.jsonnet I had the following:

      "gcewithininstburnrate": "fail_on_error(financial_params.iloc[0].target_gce_vm_burn_rate>GCE_Burn_Rate.iloc[0].BurnRate)",
      "gceabovebalance": "fail_on_error( financial_params.iloc[0].target_gce_balance<GCE_Billing_Info.iloc[0].Balance)",

For the record:
financial_params.iloc[0].target_gce_balance is -20000
GCE_Billing_Info.iloc[0].Balance is -507
GCE_Burn_Rate.iloc[0].BurnRate is 0.01
financial_params.iloc[0].target_gce_vm_burn_rate is 9

Both of these facts, wrapped by fail_on_error, evaluated to False.

I removed the fail_on_error wrapper and they evaluated to True as they should.
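
For reference, the raw comparisons do hold outside the logic engine. The following is a minimal standalone check, not decisionengine code; the single-row DataFrames are stand-ins built only from the column names and values quoted above:

    import pandas as pd

    # Stand-in data frames holding only the values quoted in this report.
    financial_params = pd.DataFrame(
        [{"target_gce_vm_burn_rate": 9, "target_gce_balance": -20000}]
    )
    GCE_Burn_Rate = pd.DataFrame([{"BurnRate": 0.01}])
    GCE_Billing_Info = pd.DataFrame([{"Balance": -507}])

    # Both comparisons are True, so the wrapped facts should also be True.
    print(financial_params.iloc[0].target_gce_vm_burn_rate > GCE_Burn_Rate.iloc[0].BurnRate)  # True (9 > 0.01)
    print(financial_params.iloc[0].target_gce_balance < GCE_Billing_Info.iloc[0].Balance)     # True (-20000 < -507)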

StevenCTimm added the prj_testing (Issue identified in HEPCloud Project IV integration testing) label on Mar 29, 2021
@knoepfel (Contributor)

I have been able to reproduce the failure. Investigating options.

@knoepfel (Contributor)

The problem is understood. I am exploring options, one of which is described here: https://stackoverflow.com/questions/66890900/replacing-python-call-ast-node-with-try-node
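
For context, the approach discussed in that Stack Overflow question amounts to rewriting the parsed fact expression so that the wrapped sub-expression is evaluated lazily and any exception is mapped to False. The sketch below only illustrates that idea and is not the decisionengine implementation; the names _safe_eval, FailOnErrorRewriter, and evaluate_fact are invented here:

    import ast

    def _safe_eval(thunk):
        """Evaluate the deferred expression; any exception becomes False."""
        try:
            return bool(thunk())
        except Exception:
            return False

    class FailOnErrorRewriter(ast.NodeTransformer):
        """Rewrite fail_on_error(<expr>) calls into _safe_eval(lambda: <expr>)."""
        def visit_Call(self, node):
            self.generic_visit(node)
            if isinstance(node.func, ast.Name) and node.func.id == "fail_on_error":
                thunk = ast.Lambda(
                    args=ast.arguments(posonlyargs=[], args=[], vararg=None,
                                       kwonlyargs=[], kw_defaults=[], kwarg=None,
                                       defaults=[]),  # requires Python 3.8+
                    body=node.args[0],
                )
                return ast.Call(func=ast.Name(id="_safe_eval", ctx=ast.Load()),
                                args=[thunk], keywords=[])
            return node

    def evaluate_fact(expression, names):
        """Evaluate a fact expression string against a dict of named data products."""
        tree = FailOnErrorRewriter().visit(ast.parse(expression, mode="eval"))
        ast.fix_missing_locations(tree)
        return eval(compile(tree, "<fact>", "eval"), {"_safe_eval": _safe_eval, **names})

    # A bad lookup inside the wrapper becomes False; a valid comparison keeps its value.
    print(evaluate_fact("fail_on_error(target > rate)", {"target": 9, "rate": 0.01}))  # True
    print(evaluate_fact("fail_on_error(missing > rate)", {"rate": 0.01}))              # False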

@knoepfel (Contributor)

See PR #322.

@StevenCTimm (Author)

The PR looks promising. Do you have an RPM build artifact of it? I could dig it out myself, but I'd appreciate it if you could save me some time. Steve

@knoepfel (Contributor) commented Apr 1, 2021

@StevenCTimm (Author)

The patch does not seem to be working as designed.
I installed it on fermicloud155 and verified the patch is there.

2021-03-29T10:23:36-0500 - root - BooleanExpression - 4945 - MainThread - DEBUG - calling NamedFact::evaluate()
2021-03-29T10:23:36-0500 - root - datablock - 4945 - MainThread - ERROR - Did not get key in datablock getitem
2021-03-29T10:23:36-0500 - root - datablock - 4945 - MainThread - ERROR - No Key in datablock getitem

All the logic engine BooleanExpressions that are wrapped by fail_on_error now fail in this way.

For reference:

      "awswithininstburnrate": "fail_on_error( financial_params.iloc[0].target_aws_vm_burn_rate>AWS_Burn_Rate.iloc[0].BurnRate)",
      "awswithinbillburnrate": "fail_on_error( financial_params.iloc[0].target_aws_bill_burn_rate>AWS_Billing_Rate[AWS_Billing_Rate['accountName']=='Fermilab'].iloc[0].costRatePerHourInLastSixHours )",
      "awsabovebalance": "fail_on_error( financial_params.iloc[0].target_aws_balance<AWS_Billing_Info[AWS_Billing_Info['AccountName']=='Fermilab'].iloc[0].Balance)",
      "gcewithininstburnrate": "fail_on_error( financial_params.iloc[0].target_gce_vm_burn_rate>GCE_Burn_Rate.iloc[0].BurnRate)",
      "gceabovebalance": "fail_on_error( financial_params.iloc[0].target_gce_balance<GCE_Billing_Info.iloc[0].Balance)",
      "fifenerscbelowlimit": "fail_on_error( Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='fife'].iloc[0].usedAlloc<Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='fife'].iloc[0].currentAlloc)",
      "uscmsnerscbelowlimit": "fail_on_error( Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='uscms'].iloc[0].usedAlloc<Nersc_Allocation_Info[Nersc_Allocation_Info['name']=='uscms'].iloc[0].currentAlloc)"

I have verified that all of the quantities referenced in the above configuration file do exist, so none of these statements should be in an error condition at the moment.

@knoepfel (Contributor) commented Apr 1, 2021

Steve, I'm not positive that the logic engine is to blame here, at least not the fail_on_error component of it. I suspect the exception is actually being raised in a downstream publisher, for example. I wouldn't know for sure, however, until I do some more debugging. Any issues with me playing around with the decisionengine on fermicloud155?

@StevenCTimm (Author)

Go ahead and play with it. Note that the current configuration takes a while to start up, 10 minutes or so.

@StevenCTimm (Author)

All the goods are in /var/log/decisionengine/resource_request.log.

@knoepfel (Contributor) commented Apr 1, 2021

Yep, it's my fault. I modified the datablock code to print the missing key on fermicloud155:

2021-04-01T14:29:44-0500 - root - datablock - 15503 - MainThread - ERROR - Did not get key 'fail_on_error' in datablock __getitem__

Will look for a solution.
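
For clarity, the kind of diagnostic change described here could look roughly like the sketch below. This is not the actual decisionengine datablock code; it only illustrates logging which key the failed lookup was for, which is what makes the 'fail_on_error' name visible in the message above:

    import logging

    class DataBlock(dict):
        """Sketch only: a mapping whose failed lookups log the offending key."""
        def __getitem__(self, key):
            try:
                return super().__getitem__(key)
            except KeyError:
                # Including the key in the message turns the earlier
                # "Did not get key in datablock getitem" into the more useful
                # "Did not get key 'fail_on_error' in datablock __getitem__".
                logging.getLogger().error("Did not get key %r in datablock __getitem__", key)
                raise

Seeing 'fail_on_error' itself reported as the missing key suggests the wrapper name was being resolved as a data-product name rather than being handled by the logic engine.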

@StevenCTimm (Author)

Follow-up: I believe that Kyle did indeed fix this issue, but I have not yet verified it. I will attempt to add the fail_on_error logic back to a 1.7 configuration to be sure it works.

@knoepfel (Contributor) commented Apr 4, 2022

@StevenCTimm, is this issue resolved?

@StevenCTimm (Author) commented Oct 11, 2022 via email
