
OPC-UA input plugin collection interval changes after bad quality #10665

Closed

sjwang90 opened this issue Feb 16, 2022 · 12 comments
Labels
area/opcua · bug (unexpected problem or unintended behavior) · platform/windows

Comments

@sjwang90
Contributor

Relevant telegraf.conf

[agent]
    interval = "1s"
    debug = false
    round_interval = true
    flush_interval = "1s"
    flush_jitter = "0s"
    collection_jitter = "0s"
    metric_batch_size = 1000
    metric_buffer_limit = 100000
    quiet = true

[[inputs.opcua]]
    endpoint = "opc.tcp://127.0.0.1:12345"
    connect_timeout = "60s"
    security_policy = "auto"
    security_mode = "auto"
    # certificate = "C:/Source/telegraf/telegraf-selfsigned.crt"
    # private_key = "C:/Source/telegraf/telegraf-selfsigned.key"
    # auth_method = "Certificate"

    ## Entire configuration includes 42 opcua.groups with 21 nodes each
    [[inputs.opcua.group]]
        name = "anodic_current"
        namespace = "2"
        identifier_type = "s"
        tags = [["device", "1001"]]
        nodes = [
            {name="current", identifier="ABCD_1234_RA.ID.I01", data_type="float", tags=[["anode","01"]], description="Current Anode 01 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I02", data_type="float", tags=[["anode","02"]], description="Current Anode 02 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I03", data_type="float", tags=[["anode","03"]], description="Current Anode 03 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I04", data_type="float", tags=[["anode","04"]], description="Current Anode 04 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I05", data_type="float", tags=[["anode","05"]], description="Current Anode 05 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I06", data_type="float", tags=[["anode","06"]], description="Current Anode 06 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I07", data_type="float", tags=[["anode","07"]], description="Current Anode 07 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I08", data_type="float", tags=[["anode","08"]], description="Current Anode 08 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I09", data_type="float", tags=[["anode","09"]], description="Current Anode 09 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I10", data_type="float", tags=[["anode","10"]], description="Current Anode 10 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I11", data_type="float", tags=[["anode","11"]], description="Current Anode 11 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I12", data_type="float", tags=[["anode","12"]], description="Current Anode 12 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I13", data_type="float", tags=[["anode","13"]], description="Current Anode 13 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I14", data_type="float", tags=[["anode","14"]], description="Current Anode 14 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I15", data_type="float", tags=[["anode","15"]], description="Current Anode 15 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I16", data_type="float", tags=[["anode","16"]], description="Current Anode 16 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I17", data_type="float", tags=[["anode","17"]], description="Current Anode 17 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I18", data_type="float", tags=[["anode","18"]], description="Current Anode 18 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I19", data_type="float", tags=[["anode","19"]], description="Current Anode 19 device 1001"},
            {name="current", identifier="ABCD_1234_RA.ID.I20", data_type="float", tags=[["anode","20"]], description="Current Anode 20 device 1001"},
            {name="total_current", identifier="ABCD_1234_RA.ID.I_Total", data_type="float", description="Current Total device 1001"},
        ]

Logs from Telegraf

2022-01-19T18:27:21Z D! [agent] Starting service inputs
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:22Z E! [inputs.opcua] status not OK for node total_current:  Bad (0x80000000)
2022-01-19T18:27:22Z D! [outputs.influxdb] Buffer fullness: 0 / 100000 metrics
2022-01-19T18:27:23Z D! [inputs.opcua] Previous collection has not completed; scheduled collection skipped
2022-01-19T18:27:23Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2022-01-19T18:27:23Z D! [outputs.influxdb] Buffer fullness: 0 / 100000 metrics
2022-01-19T18:27:24Z D! [inputs.opcua] Previous collection has not completed; scheduled collection skipped
2022-01-19T18:27:24Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node current:  Bad (0x80000000)
2022-01-19T18:27:24Z E! [inputs.opcua] status not OK for node total_current:  Bad (0x80000000)
2022-01-19T18:27:24Z D! [outputs.influxdb] Wrote batch of 1000 metrics in 20.0946ms

System info

Telegraf: 1.21
Operating system: Windows Server 2019
Memory available: 6 GB
CPU cores available: unclear, but at least 2
Disk (HDD/SSD/etc.): 50 GB
OPC-UA server: Kepware KEPServerEX 5.21

Docker

No response

Steps to reproduce

  1. Run Telegraf connected to 40+ groups with 20+ nodes each
  2. Get "status not OK for node current: Bad (0x80000000)" errors
  3. Data is only collected at 3s intervals
    ...

Expected behavior

Acquisition should remain at 1s as specified in the agent interval.

Actual behavior

Running with an agent interval of 1s, everything works fine until one or more nodes comes back with a bad quality (0x80000000) value. Once a bad-quality node is observed, the acquisition interval changes to 3s.

Additional info

No response

@sjwang90 added the bug (unexpected problem or unintended behavior) label on Feb 16, 2022
@sjwang90
Contributor Author

sjwang90 commented Feb 16, 2022

To do: determine what the Bad (0x80000000) status means and why it causes this cascade of issues. This python-opcua client bug shows the same error, and there it also takes multiple seconds for the error to appear.

Other Notes:

  1. The warning messages "Collection took longer than expected; not complete after interval of 1s" are what cause the collection to change from every second to every three seconds. The one-second interval does not complete in time because the input is still waiting on the output, possibly due to the bad value, so scheduled collections are skipped until things clear up. This produces the three-second intervals the customer is seeing.

  2. The "status not OK for node current: Bad (0x80000000)" error is from the OPC UA plugin attempting to read from their device and getting 'The value is bad but the reason is unknown'.

@sjwang90
Contributor Author

sjwang90 commented Feb 16, 2022

@magiconair Would you have any insight specifically into the Bad (0x80000000) error and why it could be causing the behavior outlined in the bug report?

To do: determine what the Bad (0x80000000) status means and why it causes this cascade of issues. FreeOpcUa/python-opcua#1004 shows the same error, and there it also takes multiple seconds for the error to appear.

@magichair
Contributor

@sjwang90 this is outside of my area of contribution, sorry. I have no experience with opcua. 🙇

@sjwang90
Contributor Author

Oops sorry tagged the wrong person!

> @sjwang90 this is outside of my area of contribution, sorry. I have no experience with opcua. 🙇

@R290
Contributor

R290 commented Feb 23, 2022

What setting do you use for request_timeout?

@powersj
Contributor

powersj commented Feb 23, 2022

> What setting do you use for request_timeout?

I do not believe the user has it set. Is there a suggested value?

I have also wondered about the interval being 1 second: if there is any delay at all, that could be causing issues.

@R290
Contributor

R290 commented Feb 23, 2022

The default value is 5s, but I'd say it should always be less than the collection interval. No clue whether this is actually part of the cause of the problem. @sjwang90 could you retry while setting request_timeout = "500ms"?
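For reference, a minimal sketch of where that setting goes in the plugin config; the 500ms value is only the suggestion above, not a verified fix:

[[inputs.opcua]]
    endpoint = "opc.tcp://127.0.0.1:12345"
    ## Keep the request timeout below the agent's 1s collection interval so a
    ## slow read fails fast instead of stalling into the next scheduled cycle.
    request_timeout = "500ms"
    connect_timeout = "60s"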

@powersj
Contributor

powersj commented Feb 25, 2022

This is what started showing up in the logs:

2022-02-25T14:21:50Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2022-02-25T14:21:50Z D! [inputs.opcua] Previous collection has not completed; scheduled collection skipped
2022-02-25T14:21:50Z D! [outputs.influxdb] Buffer fullness: 0 / 100000 metrics
2022-02-25T14:21:50Z E! [inputs.opcua] Error in plugin: get Data Failed: RegisterNodes Read failed: The operation timed out. StatusBadTimeout (0x800A0000)
2022-02-25T14:21:51Z D! [outputs.influxdb] Buffer fullness: 0 / 100000 metrics
2022-02-25T14:21:52Z D! [inputs.opcua] Previous collection has not completed; scheduled collection skipped
2022-02-25T14:21:52Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2022-02-25T14:21:52Z D! [outputs.influxdb] Buffer fullness: 0 / 100000 metrics
2022-02-25T14:21:52Z E! [inputs.opcua] Error in plugin: get Data Failed: RegisterNodes Read failed: The operation timed out. StatusBadTimeout (0x800A0000)
2022-02-25T14:21:53Z D! [outputs.influxdb] Buffer fullness: 0 / 100000 metrics
2022-02-25T14:21:54Z W! [inputs.opcua] Collection took longer than expected; not complete after interval of 1s
2022-02-25T14:21:54Z D! [inputs.opcua] Previous collection has not completed; scheduled collection skipped
2022-02-25T14:21:54Z D! [outputs.influxdb] Buffer fullness: 0 / 100000 metrics
2022-02-25T14:21:54Z E! [inputs.opcua] Error in plugin: get Data Failed: RegisterNodes Read failed: The operation timed out. StatusBadTimeout (0x800A0000)

The user also said that with any timeout value under 4 seconds, Telegraf did not appear to get the data.

Does this mean the device may not be able to be queried every second?

@R290
Contributor

R290 commented Feb 27, 2022

What I understand from the comments on the Python bug mentioned by sjwang90 is that the Kepware server does not return for 3-4 seconds when a bad-quality node is part of the registered read request. Currently we do not support subscriptions or events, which is the solution mentioned there (#8083). For now, a hacky workaround could be to split the node groups across several opcua input plugins (sketched below). Only the plugin whose read request contains the bad node would drop a few values, while the others could continue at the once-per-second rate.
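A minimal sketch of that workaround, assuming the reporter's 42 groups are split between two plugin instances (the split point and device tags are hypothetical); each [[inputs.opcua]] instance gets its own connection and its own collection schedule, so a stalled read in one does not block the other:

## Instance 1: first half of the groups
[[inputs.opcua]]
    endpoint = "opc.tcp://127.0.0.1:12345"
    [[inputs.opcua.group]]
        name = "anodic_current"
        tags = [["device", "1001"]]
        # ... nodes for this group ...

## Instance 2: second half of the groups
[[inputs.opcua]]
    endpoint = "opc.tcp://127.0.0.1:12345"
    [[inputs.opcua.group]]
        name = "anodic_current"
        tags = [["device", "1022"]]
        # ... nodes for this group ...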

@powersj
Contributor

powersj commented Feb 28, 2022

> bad-quality node is part of the registered read request

Is this the result of the sensor itself having a bad value? Is there something the OPC UA client should do to avoid the delay?

> For now, a hacky workaround could be to split the node groups across several opcua input plugins. Only the plugin whose read request contains the bad node would drop a few values, while the others could continue at the once-per-second rate.

I agree; without an event-based plugin, this does seem like the right way forward for now.

@R290
Contributor

R290 commented Mar 1, 2022

> Is this the result of the sensor itself having a bad value? Is there something the OPC UA client should do to avoid the delay?

I do indeed think the sensor itself is assigned a bad quality. If the server takes this long to reply, there is not much we can do as a client. A Wireshark capture or similar could provide definitive evidence that the server is causing the delay.

@magiconair

@sjwang90 not really. This error vaguely rings a bell in connection with the Kepware server, depending on how it is hooked up to the backend data collection, but it has been a looong time since I've played with Kepware. Try running go run ./examples/read/read.go from the https://github.com/gopcua/opcua repo to see if you can read the node at all.
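For reference, a minimal sketch along the lines of that read example, pointed at one node from the reporter's config (gopcua's API has shifted across versions, so treat the exact signatures as assumptions; the endpoint and node ID are taken from the config above):

package main

import (
	"context"
	"log"

	"github.com/gopcua/opcua"
	"github.com/gopcua/opcua/ua"
)

func main() {
	ctx := context.Background()

	// Endpoint from the reporter's config; security disabled for a simple
	// connectivity test.
	c := opcua.NewClient("opc.tcp://127.0.0.1:12345",
		opcua.SecurityMode(ua.MessageSecurityModeNone))
	if err := c.Connect(ctx); err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	// ns=2 with a string identifier, matching namespace="2" / identifier_type="s".
	id, err := ua.ParseNodeID("ns=2;s=ABCD_1234_RA.ID.I01")
	if err != nil {
		log.Fatal(err)
	}

	resp, err := c.Read(&ua.ReadRequest{
		NodesToRead:        []*ua.ReadValueID{{NodeID: id}},
		TimestampsToReturn: ua.TimestampsToReturnBoth,
	})
	if err != nil {
		log.Fatalf("Read failed: %v", err)
	}
	// A bad-quality node surfaces here as a non-OK status, e.g. Bad (0x80000000).
	if got := resp.Results[0].Status; got != ua.StatusOK {
		log.Fatalf("status not OK: %v", got)
	}
	log.Printf("value: %v", resp.Results[0].Value.Value())
}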

@powersj closed this as not planned on Feb 9, 2023