Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eval clears all variables #984

Closed
wanaryytel opened this issue Oct 21, 2016 · 16 comments
Closed

Eval clears all variables #984

wanaryytel opened this issue Oct 21, 2016 · 16 comments

Comments

@wanaryytel
Copy link

wanaryytel commented Oct 21, 2016

Hey,
I have a small TICKscript for monitoring when the server runs out of free disk space. Looks like this:

// Alerts when more than 3 GB + 10% of the server total disk space are used up.

var gb = 1000 * 1000 * 1000

stream
    |from()
        .measurement('disk')
// Some unexplainable magic happens with this eval. It would be lovely to use but as long as
// it breaks everything, let it be commented out.
//  |eval(lambda: "free" / gb)
//          .as('free_gb')
//      .keep('free', 'total', 'used', 'used_percent')
    |groupBy('host', 'path')
    |window()
        .period(30s)
        .every(30s)
    |alert()
        .id('Free disk space @ {{ index .Tags "host" }}')
        .message('{{ .ID }} is {{ .Level }}! {{ index .Fields "used_percent" | printf "%0.2f" }}% is used up.')

        .crit(lambda: gb / 2 + "total" / 100 * 10 > "free")
        .warn(lambda: 3 * gb + "total" / 100 * 10 > "free")
        .stateChangesOnly(12h)

        .email()
        .slack()

        .log('/var/log/kapacitor/low_disk.log')

PS. Although Github displays it weird, I don't have indentation like that in the actual script.

Problem is the commented out |eval() part that for some reason clears all my incoming variables and if I comment that in, no alerts are triggered since all variables are <nil>. Is this desired behavior? I guess one problem might be that I'm passing more variables to keep() than are changed in that |eval(), but I tried it with only free as well and got the same results. It seems to me that I copied the |eval() directly from documentation and can't figure out what's wrong with it.
But the reason I need it at all is that the incoming data gives free space on disk in bytes and I would like to send out MB-s or GB-s instead of 6584745867 bytes. :) So another question, is there a way to
a) add the eval() output to message()?
or b) do that in any other way? I tried to do some magic with the printf thingie, but I couldn't find any documentation for it anywhere so didn't get anywhere with that either.

I'm running on a 64bit Ubuntu 14.04 and have the latest Kapacitor installed from https://dl.influxdata.com/kapacitor/releases/kapacitor_1.0.2_amd64.deb.

@nathanielc
Copy link
Contributor

@wanaryytel Your example looks correct. Can you share some sample data that can reproduce the error?

@wanaryytel
Copy link
Author

@nathanielc How should that data be gathered? By recording and saving it to a file?

@nathanielc
Copy link
Contributor

Sure that works well, the recording data files are stored in /var/lib/kapacitor/replays by default. Or just a text file with line protocol that reproduces the error would be enough. Thanks

@wanaryytel
Copy link
Author

@nathanielc Is there a way to go through the .srpl file data to make sure there isn't any sensitive data in that? It seems to be binary format by default. Or another way to get the output of the recording?

@nathanielc
Copy link
Contributor

@wanaryytel Sure, the .srpl file is really just gzipped text data so you can extract it and take a look.

@wanaryytel
Copy link
Author

@wanaryytel
Copy link
Author

wanaryytel commented Nov 6, 2016

@nathanielc Hey, I updated to the latest release but it didn't fix the problem. I think you missed the part in the initial description that said "I guess one problem might be that I'm passing more variables to keep() than are changed in that |eval(), but I tried it with only free as well and got the same results."

That's exactly what I'm seeing. Output from the same script's kapactior show low_disk:

ID: low_disk
Error: 
Template: 
Type: stream
Status: enabled
Executing: true
Created: 20 Oct 16 06:55 EDT
Modified: 06 Nov 16 17:35 EST
LastEnabled: 06 Nov 16 17:35 EST
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
// Alerts when more than 3 GB + 10% of the server total disk space are used up.
var gb = 1000 * 1000 * 1000

stream
    |from()
        .measurement('disk')
    // Some unexplainable magic happens with this eval. It would be lovely to use but as long as
    // it breaks everything, let it be commented out.
    |eval(lambda: "free" / gb)
        .as('free_gb')
        .keep('free', 'free_gb')
    |groupBy('host', 'path')
    |window()
        .period(30s)
        .every(30s)
    |alert()
        .id('Free disk space @ `{{ index .Tags "host" }}`')
        .message('{{ .ID }} is *{{ .Level }}*! {{ index .Fields "used_percent" | printf "%0.2f" }}% is used up.')
        .crit(lambda: gb / 2 + "total" / 100 * 10 > "free")
        .warn(lambda: 3 * gb + "total" / 100 * 10 > "free")
        //      .stateChangesOnly(12h)
        .email()
        .slack()
        .log('/var/log/kapacitor/low_disk.log')

DOT:
digraph low_disk {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0" ];
stream0 -> from1 [processed="69"];

from1 [avg_exec_time_ns="6.097µs" ];
from1 -> eval2 [processed="69"];

eval2 [avg_exec_time_ns="114.429µs" eval_errors="0" ];
eval2 -> groupby3 [processed="69"];

groupby3 [avg_exec_time_ns="4.01µs" ];
groupby3 -> window4 [processed="69"];

window4 [avg_exec_time_ns="5.229µs" ];
window4 -> alert5 [processed="13"];

alert5 [alerts_triggered="0" avg_exec_time_ns="320.375µs" crits_triggered="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" ];

stats6 [avg_exec_time_ns="0" ];
stats6 -> derivative7 [processed="6"];

derivative7 [avg_exec_time_ns="0" ];
derivative7 -> alert8 [processed="5"];

alert8 [alerts_triggered="5" avg_exec_time_ns="763.275µs" crits_triggered="5" infos_triggered="0" oks_triggered="0" warns_triggered="0" ];
}

No alerts are triggered with this script. When I comment out the eval, everything works fine. Could you reopen this issue?

@nathanielc
Copy link
Contributor

@wanaryytel I tested this again with your replay file and the above script. It works as is. Both the free and free_gb fields exist on the data when it is passed to the alert node.

Specifically which version did you test?

@wanaryytel
Copy link
Author

@nathanielc Both v1.0.2 and v1.1.0-rc2. Exactly the same behavior. I'm not 100% sure if all the variables are cleared, but when the eval is commented in, no alerts are triggered. It was just my assumption that when all variables are undefined, then that could be a probable reason for not triggering alerts.

@nathanielc
Copy link
Contributor

@wanaryytel Hmm, that is odd.

Can you try this TICKscript? It just has a few log nodes added in:

stream
    |from()
        .measurement('disk')
    // Some unexplainable magic happens with this eval. It would be lovely to use but as long as
    // it breaks everything, let it be commented out.
    |log()
        .prefix('PRE EVAL')
    |eval(lambda: "free" / gb)
        .as('free_gb')
        .keep('free', 'free_gb')
    |log()
        .prefix('POST EVAL')
    |groupBy('host', 'path')
    |window()
        .period(30s)
        .every(30s)
    |alert()
        .id('Free disk space @ `{{ index .Tags "host" }}`')
        .message('{{ .ID }} is *{{ .Level }}*! {{ index .Fields "used_percent" | printf "%0.2f" }}% is used up.')
        .crit(lambda: gb / 2 + "total" / 100 * 10 > "free")
        .warn(lambda: 3 * gb + "total" / 100 * 10 > "free")
        //      .stateChangesOnly(12h)
        .email()
        .slack()
        .log('/var/log/kapacitor/low_disk.log')

Then could you share the log lines that start with PRE/POST EVAL? That should help us get to the bottom of what is going on.

@wanaryytel
Copy link
Author

@nathanielc Ran your script (and added the gb variable at the start) for about 5 minutes, but again - no alerts triggered and no log file created for it.

ID: ld
Error: 
Template: 
Type: stream
Status: enabled
Executing: true
Created: 08 Nov 16 04:54 EST
Modified: 08 Nov 16 04:54 EST
LastEnabled: 08 Nov 16 04:54 EST
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
var gb = 1000 * 1000 * 1000

stream
    |from()
        .measurement('disk')
    // Some unexplainable magic happens with this eval. It would be lovely to use but as long as
    // it breaks everything, let it be commented out.
    |log()
        .prefix('PRE EVAL')
    |eval(lambda: "free" / gb)
        .as('free_gb')
        .keep('free', 'free_gb')
    |log()
        .prefix('POST EVAL')
    |groupBy('host', 'path')
    |window()
        .period(30s)
        .every(30s)
    |alert()
        .id('Free disk space @ `{{ index .Tags "host" }}`')
        .message('{{ .ID }} is *{{ .Level }}*! {{ index .Fields "used_percent" | printf "%0.2f" }}% is used up.')
        .crit(lambda: gb / 2 + "total" / 100 * 10 > "free")
        .warn(lambda: 3 * gb + "total" / 100 * 10 > "free")
        //      .stateChangesOnly(12h)
        .email()
        .slack()
        .log('/var/log/kapacitor/ld.log')

DOT:
digraph ld {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0s" ];
stream0 -> from1 [processed="253"];

from1 [avg_exec_time_ns="8.359µs" ];
from1 -> log2 [processed="253"];

log2 [avg_exec_time_ns="0s" ];
log2 -> eval3 [processed="253"];

eval3 [avg_exec_time_ns="59.18µs" eval_errors="0" ];
eval3 -> log4 [processed="253"];

log4 [avg_exec_time_ns="0s" ];
log4 -> groupby5 [processed="253"];

groupby5 [avg_exec_time_ns="3.283µs" ];
groupby5 -> window6 [processed="253"];

window6 [avg_exec_time_ns="16.909µs" ];
window6 -> alert7 [processed="70"];

alert7 [alerts_triggered="0" avg_exec_time_ns="464.241µs" crits_triggered="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" ];

stats8 [avg_exec_time_ns="13.33µs" ];
stats8 -> derivative9 [processed="23"];

derivative9 [avg_exec_time_ns="5.774µs" ];
derivative9 -> alert10 [processed="22"];

alert10 [alerts_triggered="22" avg_exec_time_ns="0s" crits_triggered="22" infos_triggered="0" oks_triggered="0" warns_triggered="0" ];
}

File at /var/log/kapacitor/ld.log wasn't created, but I do have logfiles for all the other scripts that don't include an eval function. Also checked the kapacitord.err log, which indicated that everything was fine. Any more ideas what I could try?

@nathanielc
Copy link
Contributor

Can you check the /var/log/kapacitor/kapacitor.log file? It will have the PRE EVAL and POST EVAL log lines so we can determine what might be causing the issue.

@nathanielc nathanielc reopened this Nov 15, 2016
@wanaryytel
Copy link
Author

@nathanielc
Hey,
here's an excerpt:

[ld:log2] 2016/11/15 17:27:01 I! PRE EVAL {"Name":"disk","Database":"telegraf","RetentionPolicy":"autogen","Group":"","Dimensions":{"ByName":false,"TagNames":null},"Tags":{"dc":"eu-west-1","fstype":"ext4","host":"host1.company.com","hosttype":"alfa","path":"/","provider":"big"},"Fields":{"free":7524429824,"inodes_free":838045,"inodes_total":1310720,"inodes_used":472675,"total":21137846272,"used":12539834368,"used_percent":62.4983515368576},"Time":"2016-11-15T22:27:00Z"}

[ld:log2] 2016/11/15 17:27:01 I! PRE EVAL {"Name":"disk","Database":"telegraf","RetentionPolicy":"autogen","Group":"","Dimensions":{"ByName":false,"TagNames":null},"Tags":{"dc":"eu-west-1","fstype":"ext3","host":"host1.company.com","hosttype":"alfa","path":"/disk","provider":"big"},"Fields":{"free":3931746304,"inodes_free":262133,"inodes_total":262144,"inodes_used":11,"total":4221763584,"used":75558912,"used_percent":1.8855292503878998},"Time":"2016-11-15T22:27:00Z"}

[ld:log2] 2016/11/15 17:27:01 I! PRE EVAL {"Name":"disk","Database":"telegraf","RetentionPolicy":"autogen","Group":"","Dimensions":{"ByName":false,"TagNames":null},"Tags":{"dc":"eu-west-1","fstype":"ext4","host":"host1.company.com","hosttype":"alfa","path":"/disk2","provider":"big"},"Fields":{"free":17060515840,"inodes_free":1310544,"inodes_total":1310720,"inodes_used":176,"total":21137846272,"used":3003588608,"used_percent":14.969960985721439},"Time":"2016-11-15T22:27:00Z"}

[ld:log4] 2016/11/15 17:27:01 I! POST EVAL {"Name":"disk","Database":"telegraf","RetentionPolicy":"autogen","Group":"","Dimensions":{"ByName":false,"TagNames":null},"Tags":{"dc":"eu-west-1","fstype":"ext4","host":"host1.company.com","hosttype":"alfa","path":"/","provider":"big"},"Fields":{"free":7524429824,"free_gb":7},"Time":"2016-11-15T22:27:00Z"}

[ld:log4] 2016/11/15 17:27:01 I! POST EVAL {"Name":"disk","Database":"telegraf","RetentionPolicy":"autogen","Group":"","Dimensions":{"ByName":false,"TagNames":null},"Tags":{"dc":"eu-west-1","fstype":"ext3","host":"host1.company.com","hosttype":"alfa","path":"/disk","provider":"big"},"Fields":{"free":3931746304,"free_gb":3},"Time":"2016-11-15T22:27:00Z"}

[ld:log4] 2016/11/15 17:27:01 I! POST EVAL {"Name":"disk","Database":"telegraf","RetentionPolicy":"autogen","Group":"","Dimensions":{"ByName":false,"TagNames":null},"Tags":{"dc":"eu-west-1","fstype":"ext4","host":"host1.company.com","hosttype":"alfa","path":"/disk2","provider":"big"},"Fields":{"free":17060515840,"free_gb":17},"Time":"2016-11-15T22:27:00Z"}

@nathanielc
Copy link
Contributor

@wanaryytel Great, the logs show that the fields are being kept correctly. As for why the alert is not triggering it seems to be a different issue. Could it be that these conditions are not correct?

        .crit(lambda: gb / 2 + "total" / 100 * 10 > "free")
        .warn(lambda: 3 * gb + "total" / 100 * 10 > "free")

@wanaryytel
Copy link
Author

@nathanielc It is the first thing that comes to mind, but considering

  1. I have another script (without eval) that has the same threshold values that does alert me with the same data
  2. I have tried much higher thresholds with no avail

it seems improbable at best.

Is there anything else I can try?

@nathanielc
Copy link
Contributor

@wanaryytel Use the |log() to debug why your alert is not firing. With the log node you can see exactly the data that the alert node is processing and determine if it is working as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants