usbhid-ups status show alarm when build from source with EATON Ellipse ECO 1600 #1286

Closed
toxic0berliner opened this issue Feb 14, 2022 · 19 comments · Fixed by #1301

@toxic0berliner

Hello,
Using an EATON Ellipse ECO 1600 USB at home, I encountered the known issue #9 : I was unable to change beeper status.
I saw in #801 that it seemed not to have been included in the packages, so I followed the instructions to build from source.

I'm happy to report that I'm now able to disable the beeper successfully.
Sadly, I now have an unhealthy status reported on this brand new UPS, and I believe it's wrong. I'm seeing a "Fan failure!" alarm, but to my knowledge my unit has no fan: I see no vent on the box itself, and the official Eaton companion app for Windows reports no error. So I believe the driver I built is misinterpreting what the UPS reports.

I'm using nut on a raspberry pi 1 model B, so building takes quite a while but I saw no specific error. I'm open to re-build & test or provide more details if anyone is nice enough to look into it.
Not sure what I can provide as traces, I'm not seeing anything in journalctl.
Here is the output of upsc :

myUser@ups1:~ $upsc bureau
Init SSL without certificate database
battery.charge: 100
battery.charge.low: 2
battery.charge.restart: 2
battery.protection: yes
battery.runtime: 2887
battery.type: PbAc
device.mfr: EATON
device.model: Ellipse ECO Ellipse ECO
device.serial: 000000000
device.type: ups
driver.name: usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.synchronous: no
driver.version: 2.7.4-4613-ge1218569
driver.version.data: MGE HID 1.45
driver.version.internal: 0.45
driver.version.usb: libusb-0.1 (or compat)
input.transfer.boost.high: 264
input.transfer.boost.low: 264
input.transfer.high: 264
input.transfer.low: 184
input.transfer.trim.high: 264
input.transfer.trim.low: 264
outlet.1.delay.shutdown: 1
outlet.1.delay.start: 1
outlet.1.desc: PowerShare Outlet 1
outlet.1.id: 2
outlet.1.powerfactor: 1.00
outlet.1.status: on
outlet.1.switchable: no
outlet.2.delay.shutdown: 0
outlet.2.delay.start: 0
outlet.2.desc: PowerShare Outlet 2
outlet.2.id: 3
outlet.2.powerfactor: 0.00
outlet.2.status: on
outlet.2.switchable: no
outlet.desc: Main Outlet
outlet.id: 1
outlet.powerfactor: 25.00
outlet.switchable: no
output.frequency.nominal: 50
output.powerfactor: 2.64
output.voltage: 230.0
output.voltage.nominal: 230
ups.alarm: Emergency stop! Fatal EEPROM fault! Fan failure!
ups.beeper.status: disabled
ups.date: 1970/01/01
ups.delay.shutdown: 20
ups.delay.start: 30
ups.efficiency: 264
ups.firmware: Ellipse ECO
ups.load: 10
ups.mfr: EATON
ups.model: Ellipse ECO Ellipse ECO
ups.power.nominal: 1600
ups.productid: ffff
ups.realpower: 320
ups.serial: 000000000
ups.status: ALARM OL CHRG
ups.test.interval: 1
ups.time: 01:00:02
ups.timer.shutdown: -1
ups.timer.start: -1
ups.vendorid: 0463

Thanks in advance for any help.

@jimklimov
Member

For a few days I have access to an "Eaton Protection Station Protection Station" (consumer UPS with "Schuko" sockets), connected to a Mirabox (armv7), and in fact with a recent NUT master build it behaves similarly with libusb-1.0 as well:

battery.charge: 100
battery.charge.low: 2
battery.charge.restart: 2
battery.runtime: 1875
battery.type: PbAc
device.mfr: EATON
device.model: Protection Station Protection Station
device.serial: AN2E49008
device.type: ups
driver.flag.pollonly: enabled
driver.name: usbhid-ups
driver.parameter.bus: 003
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.product: Protection Station
driver.parameter.productid: FFFF
driver.parameter.serial: AN2E49008
driver.parameter.synchronous: no
driver.parameter.vendor: EATON
driver.parameter.vendorid: 0463
driver.version: 2.7.4-4663-g20dca82
driver.version.data: MGE HID 1.45
driver.version.internal: 0.45
driver.version.usb: libusb-1.0.19 (API: 0x1000103)
input.transfer.boost.high: 284
input.transfer.boost.low: 284
input.transfer.high: 284
input.transfer.low: 161
input.transfer.trim.high: 284
input.transfer.trim.low: 284
outlet.1.delay.shutdown: 1
outlet.1.delay.start: 1
outlet.1.desc: PowerShare Outlet 1
outlet.1.id: 2
outlet.1.powerfactor: 1.00
outlet.1.status: on
outlet.1.switchable: no
outlet.2.delay.shutdown: 1
outlet.2.delay.start: 1
outlet.2.desc: PowerShare Outlet 2
outlet.2.id: 3
outlet.2.powerfactor: 1.00
outlet.2.status: on
outlet.2.switchable: no
outlet.desc: Main Outlet
outlet.id: 1
outlet.powerfactor: 25.00
outlet.switchable: no
output.frequency.nominal: 50
output.powerfactor: 2.84
output.voltage: 230.0
output.voltage.nominal: 230
ups.alarm: Emergency stop! Fatal EEPROM fault! Fan failure!
ups.beeper.status: enabled
ups.date: 1970/01/01
ups.delay.shutdown: 20
ups.delay.start: 30
ups.efficiency: 284
ups.firmware: Protection Station
ups.load: 1
ups.mfr: EATON
ups.model: Protection Station Protection Station
ups.power.nominal: 650
ups.productid: ffff
ups.realpower: 13
ups.serial: AN2E49008
ups.status: ALARM OL
ups.test.interval: 2
ups.time: 00:00:02
ups.timer.shutdown: -1
ups.timer.start: -1
ups.vendorid: 0463

(on a side note, required "pollonly" option - default interrupt mode said connection timed out).

@jimklimov
Member

jimklimov commented Feb 15, 2022

Wondering if this is a similar issue to one uncovered in #1189 discussion (byte-signedness interpretation flaws with recent libusb handling changes after #300).

@nbriggs
Contributor

nbriggs commented Feb 15, 2022

Interesting thought there. Extra bits set in a flag word because something came up negative that shouldn't have. Worth grabbing the debug info while you've got access to a device.
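
To illustrate the hypothesis above (it is only a hypothesis at this point in the thread), here is a minimal C sketch of how a byte held in a signed type can set extra bits when widened into a flag word. The function names are hypothetical and are not taken from the NUT sources.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy widening: a raw byte held in a signed type is sign-extended,
 * so the bit pattern 0x80 becomes 0xFFFFFF80 and 24 extra flag bits
 * appear that the device never sent. */
static uint32_t widen_signed(int8_t raw)
{
    return (uint32_t)(int32_t)raw;
}

/* Safe widening: going through uint8_t keeps only the 8 real bits. */
static uint32_t widen_unsigned(int8_t raw)
{
    return (uint32_t)(uint8_t)raw;
}

/* Small self-check: -128 is the int8_t value with bit pattern 0x80. */
static void check_widening(void)
{
    assert(widen_signed((int8_t)-128) == 0xFFFFFF80u);
    assert(widen_unsigned((int8_t)-128) == 0x00000080u);
}
```

Calling check_widening() passes silently; the point is only that the signed path yields bits 8..31 set while the unsigned path does not.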

@nbriggs
Contributor

nbriggs commented Feb 15, 2022

I'd also look in the debug output for "Lookup ... failed ... for" -- I'm suspicious that the change to eaton_converter_online_fun() that allows it to return NULL where previously it always returned a string ("online" or "!online") might have broken its caller(s).

@toxic0berliner
Author

I'm not sure if I happen to be extremely lucky and someone with actual knowledge stumbled on the exact same error as me within days of me getting my new UPS, or if you're just nice enough to get a loaner unit to investigate some random issue from an unknown guy like me, but thanks anyway for looking into it ;)
I'm not sure what I can do to help since you have the issue reproduced already, but let me know, I can do quite a bit in bash and Linux in general even if I'll need quite a bit of help with using nut as it's my first ups. I'm good enough to build from any branch if you happen to submit a fix ;)
Thanks in advance anyway!

@nbriggs
Contributor

nbriggs commented Feb 16, 2022

@toxic0berliner -- try running (from nut/drivers if it's not an installed binary)
usbhid-ups -a bureau -DDDD -x explore -x vendorid=0463
(assuming "bureau" is what you named your UPS configuration) and interrupt it after about 10 seconds and attach the output here.

@toxic0berliner
Author

Thanks @nbriggs for the directions ;)
bureau is indeed how I named my ups, french for "office" :D
I forwarded stderr to stdout to capture all of it in a file, hope that didn't mess with the expected content.
usbhid-ups.debug.log

Thanks in advance for any help and feel free to ask for anything else I could do to help solve this ;)

Not sure how to flag it, but while it's not a workaround, I can revert to the packaged version of nut, which reports the proper status for my Prometheus/Grafana monitoring. From what I found, it only has issues when changing the beeper status, and now that I've set the UPS to silent the setting doesn't get overwritten, so I could totally live without this fix. No pressure here ;)
And thanks a lot for the very nice work on nut, I picked the UPS for its Linux support and even small issues don't disappoint!

@nbriggs
Contributor

nbriggs commented Feb 16, 2022

@toxic0berliner -- ok, got the log, things check out. Now, if I'm thinking correctly I could use one more log without the -x options, and at a higher debug level (5)--
usbhid-ups -a bureau -DDDDD
if it's the problem I suspect there should be evidence of it in the first 20s or so.

I'm not one of the main developers here but recently I've been asked for ideas about some weird bugs because I tracked down a weird one that affected a NUT installation on a friend's system (only on big-endian, with 32/64 bit differences).
My own systems depend on NUT, too, but in the release configuration because it's plugged into a TrueNAS server where I don't want to perturb the setup.

@jimklimov
Member

My having access to that UPS was a coincidence; I was setting it up for an acquaintance.

Unfortunately, that setup got rebooted and now the alarm is not reproduced:

upsc nutdev1
battery.charge: 100
battery.charge.low: 2
battery.charge.restart: 2
battery.runtime: 1875
battery.type: PbAc
device.mfr: EATON
device.model: Protection Station Protection Station
device.serial: AN2E49008
device.type: ups
driver.flag.pollonly: enabled
driver.name: usbhid-ups
driver.parameter.bus: 003
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.product: Protection Station
driver.parameter.productid: FFFF
driver.parameter.serial: AN2E49008
driver.parameter.synchronous: no
driver.parameter.vendor: EATON
driver.parameter.vendorid: 0463
driver.version: 2.7.4-4698-gcefbdc8
driver.version.data: MGE HID 1.45
driver.version.internal: 0.45
driver.version.usb: libusb-1.0.19 (API: 0x1000103)
input.transfer.boost.high: 284
input.transfer.boost.low: 284
input.transfer.high: 284
input.transfer.low: 161
input.transfer.trim.high: 284
input.transfer.trim.low: 284
outlet.1.delay.shutdown: 1
outlet.1.delay.start: 1
outlet.1.desc: PowerShare Outlet 1
outlet.1.id: 2
outlet.1.powerfactor: 1.00
outlet.1.status: on
outlet.1.switchable: no
outlet.2.delay.shutdown: 1
outlet.2.delay.start: 1
outlet.2.desc: PowerShare Outlet 2
outlet.2.id: 3
outlet.2.powerfactor: 1.00
outlet.2.status: on
outlet.2.switchable: no
outlet.desc: Main Outlet
outlet.id: 1
outlet.powerfactor: 25.00
outlet.switchable: no
output.frequency.nominal: 50
output.powerfactor: 2.84
output.voltage: 230.0
output.voltage.nominal: 230
ups.beeper.status: enabled
ups.date: 1970/01/01
ups.delay.shutdown: 20
ups.delay.start: 30
ups.efficiency: 284
ups.firmware: Protection Station
ups.load: 1
ups.mfr: EATON
ups.model: Protection Station Protection Station
ups.power.nominal: 650
ups.productid: ffff
ups.realpower: 5
ups.serial: AN2E49008
ups.status: OL
ups.test.interval: 2
ups.time: 00:00:02
ups.timer.shutdown: -1
ups.timer.start: -1
ups.vendorid: 0463

or so I thought - Nick's comment above popped while I was posting this, so I re-checked... and now the alarm is there after a minute or two:

battery.charge: 100
battery.charge.low: 2
battery.charge.restart: 2
battery.runtime: 1875
battery.type: PbAc
device.mfr: EATON
device.model: Protection Station Protection Station
device.serial: AN2E49008
device.type: ups
driver.flag.pollonly: enabled
driver.name: usbhid-ups
driver.parameter.bus: 003
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.product: Protection Station
driver.parameter.productid: FFFF
driver.parameter.serial: AN2E49008
driver.parameter.synchronous: no
driver.parameter.vendor: EATON
driver.parameter.vendorid: 0463
driver.version: 2.7.4-4698-gcefbdc8
driver.version.data: MGE HID 1.45
driver.version.internal: 0.45
driver.version.usb: libusb-1.0.19 (API: 0x1000103)
input.transfer.boost.high: 284
input.transfer.boost.low: 284
input.transfer.high: 284
input.transfer.low: 161
input.transfer.trim.high: 284
input.transfer.trim.low: 284
outlet.1.delay.shutdown: 1
outlet.1.delay.start: 1
outlet.1.desc: PowerShare Outlet 1
outlet.1.id: 2
outlet.1.powerfactor: 1.00
outlet.1.status: on
outlet.1.switchable: no
outlet.2.delay.shutdown: 1
outlet.2.delay.start: 1
outlet.2.desc: PowerShare Outlet 2
outlet.2.id: 3
outlet.2.powerfactor: 1.00
outlet.2.status: on
outlet.2.switchable: no
outlet.desc: Main Outlet
outlet.id: 1
outlet.powerfactor: 25.00
outlet.switchable: no
output.frequency.nominal: 50
output.powerfactor: 2.84
output.voltage: 230.0
output.voltage.nominal: 230
ups.alarm: Emergency stop! Fatal EEPROM fault! Fan failure!
ups.beeper.status: enabled
ups.date: 1970/01/01
ups.delay.shutdown: 20
ups.delay.start: 30
ups.efficiency: 284
ups.firmware: Protection Station
ups.load: 1
ups.mfr: EATON
ups.model: Protection Station Protection Station
ups.power.nominal: 650
ups.productid: ffff
ups.realpower: 13
ups.serial: AN2E49008
ups.status: ALARM OL
ups.test.interval: 2
ups.time: 00:00:02
ups.timer.shutdown: -1
ups.timer.start: -1
ups.vendorid: 0463

@jimklimov
Member

eaton-ups-1.txt
eaton-ups-2.txt

The status changes are sent out around the 36th second, probably at some half-a-minute refresh. Already around the 4th second of driver uptime we can see e.g. "fanfail value 1", though not widely announced:

   4.199336     [D5] hid_lookup_usage: UPS -> 00840004
   4.199362     [D5] hid_lookup_usage: PowerSummary -> 00840024
   4.199385     [D5] hid_lookup_usage: PresentStatus -> 00840002
   4.199405     [D5] hid_lookup_usage: FanFailure -> ffff0077
   4.199420     [D1] string_to_path: couldn't parse FanFailure from UPS.PowerSummary.PresentStatus.FanFailure
   4.199433     [D4] string_to_path: depth = 3
   4.199460     [D3] Report[buf]: (4 bytes) => 01 25 00 00
   4.199476     [D5] PhyMax = 0, PhyMin = 0, LogMax = 1, LogMin = 0
   4.199490     [D5] Unit = 00000000, UnitExp = 0
   4.199503     [D5] Exponent = 0
   4.199521     [D2] Path: UPS.PowerSummary.PresentStatus.FanFailure, Type: Feature, ReportID: 0x01, Offset: 0, Size: 1, Value: 1
   4.199536     [D5] hu_find_infoval: found fanfail (value: 1)
   4.199549     [D5] process_boolean_info: fanfail

and half a minute later,

  36.724488     [D3] Report[buf]: (4 bytes) => 01 25 00 00
  36.724505     [D5] PhyMax = 0, PhyMin = 0, LogMax = 1, LogMin = 0
  36.724521     [D5] Unit = 00000000, UnitExp = 0
  36.724535     [D5] Exponent = 0
  36.724555     [D2] Path: UPS.PowerSummary.PresentStatus.NeedReplacement, Type: Feature, ReportID: 0x01, Offset: 7, Size: 1, Value: 0
  36.724571     [D5] hu_find_infoval: found !replacebatt (value: 0)
  36.724585     [D5] process_boolean_info: !replacebatt
  36.724609     [D3] Report[buf]: (4 bytes) => 01 25 00 00
  36.724626     [D5] PhyMax = 0, PhyMin = 0, LogMax = 1, LogMin = 0
  36.724642     [D5] Unit = 00000000, UnitExp = 0
  36.724656     [D5] Exponent = 0
  36.724675     [D2] Path: UPS.PowerSummary.PresentStatus.Good, Type: Feature, ReportID: 0x01, Offset: 5, Size: 1, Value: 1
  36.724691     [D5] hu_find_infoval: found !off (value: 1)
  36.724706     [D5] process_boolean_info: !off
  36.724729     [D3] Report[buf]: (4 bytes) => 01 25 00 00
  36.724746     [D5] PhyMax = 0, PhyMin = 0, LogMax = 1, LogMin = 0
  36.724762     [D5] Unit = 00000000, UnitExp = 0
  36.724776     [D5] Exponent = 0
  36.724795     [D2] Path: UPS.PowerSummary.PresentStatus.FanFailure, Type: Feature, ReportID: 0x01, Offset: 0, Size: 1, Value: 1
  36.724811     [D5] hu_find_infoval: found fanfail (value: 1)
  36.724825     [D5] process_boolean_info: fanfail
  36.724850     [D3] Report[buf]: (4 bytes) => 01 25 00 00
  36.724867     [D5] PhyMax = 0, PhyMin = 0, LogMax = 1, LogMin = 0
  36.724882     [D5] Unit = 00000000, UnitExp = 0
  36.724897     [D5] Exponent = 0
  36.724916     [D2] Path: UPS.PowerSummary.PresentStatus.InternalFailure, Type: Feature, ReportID: 0x01, Offset: 6, Size: 1, Value: 0
  36.724932     [D5] hu_find_infoval: found !commfault (value: 0)
  36.724946     [D5] process_boolean_info: !commfault

It looks very suspicious however that all those Report[buf] are the same 4 bytes: 01 25 00 00 just interpreted against different Path meanings... maybe something is not getting cleared and re-read?

@jimklimov
Member

I don't quite get the log actually, gotta dig into the code too I guess. For each "[D4] Entering libusb_get_report" there is often more than one Path with the (same) Report[buf], so maybe some loop ran away or it is "lockpicking"?.. Many of those reports have different values after a new "Entering..." line, though still many values seem recurrent.

@nbriggs
Contributor

nbriggs commented Feb 16, 2022

I believe that UPS.PowerSummary.PresentStatus is an array of bits which arrive as a single report, so it's extracting each field from the same data bytes as it translates from the HID bytes to the NUT variables. The report descriptor will describe which bit of the report each of the flags is coming from...
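
A minimal C sketch of that idea, under stated assumptions: the first byte of the buffer is the report ID, the payload follows it, and bits are numbered LSB-first within each payload byte. The helper names are hypothetical and this is not the NUT implementation; it just replays the logged report 01 25 00 00 against the offsets printed in the debug output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the single-bit field at bit `offset` of a HID report.
 * Assumption: buf[0] is the report ID, the payload starts at buf[1],
 * and bits are numbered LSB-first within each payload byte. */
static int report_bit(const uint8_t *buf, size_t len, unsigned offset)
{
    size_t byte = 1 + offset / 8;   /* skip the report ID byte */
    assert(byte < len);
    return (buf[byte] >> (offset % 8)) & 1;
}

/* Replays the report from the debug log: 01 25 00 00. */
static void check_logged_report(void)
{
    static const uint8_t report[] = { 0x01, 0x25, 0x00, 0x00 };
    size_t n = sizeof report;

    assert(report_bit(report, n, 0) == 1);  /* FanFailure,      Offset 0 */
    assert(report_bit(report, n, 5) == 1);  /* Good,            Offset 5 */
    assert(report_bit(report, n, 6) == 0);  /* InternalFailure, Offset 6 */
    assert(report_bit(report, n, 7) == 0);  /* NeedReplacement, Offset 7 */
}
```

Under these assumptions the payload byte 0x25 (binary 00100101) reproduces the four values shown in the log, which is consistent with one report being decoded several times at different offsets rather than a stale buffer.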

@nbriggs
Contributor

nbriggs commented Feb 16, 2022

The

   4.199405     [D5] hid_lookup_usage: FanFailure -> ffff0077
   4.199420     [D1] string_to_path: couldn't parse FanFailure from UPS.PowerSummary.PresentStatus.FanFailure

is a little odd. Look in string_to_path for how that can fail. I'll be offline for the next 12h, but I'll check in later.

@jimklimov
Member

Good catch for the bit-by-bit, now I see it is indeed looking at different offsets with size 1

@jimklimov
Member

jimklimov commented Feb 16, 2022

Thanks for string_to_path nudge, that must be it - and mea culpa from overly zealous bug hunting, it seems.

In the first block there (the usage table lookups), a HIDType_t (currently uint32_t) processed as a long ended up negative for 0xFnnnnnnn numbers. Previously (2.7.4) the check was specifically for -1 as the error for an entry not found in the tables; my "fix" to check for >= 0 misfired.
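
A small C sketch of why the two checks differ, assuming a platform where long is 32 bits wide (modelled here with int32_t). The helper names and the USAGE_NOT_FOUND macro are hypothetical; only the usage codes 0xffff0077 (FanFailure) and 0x00840004 (UPS) come from the debug log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define USAGE_NOT_FOUND (-1)   /* assumed "no such entry" sentinel */

/* Reinterpret a 32-bit usage code as a signed 32-bit value, modelling
 * storage in a 32-bit long. memcpy avoids relying on
 * implementation-defined integer conversion. */
static int32_t as_signed32(uint32_t usage)
{
    int32_t s;
    memcpy(&s, &usage, sizeof s);
    return s;
}

/* The check that misfired: any code with the top bit set looks like an error. */
static int found_ge_zero(int32_t usage) { return usage >= 0; }

/* The 2.7.4-style check: only the sentinel itself means "not found". */
static int found_not_minus1(int32_t usage) { return usage != USAGE_NOT_FOUND; }

static void check_fanfailure_usage(void)
{
    int32_t fan = as_signed32(0xffff0077u);  /* FanFailure, from the log */
    int32_t ups = as_signed32(0x00840004u);  /* UPS, from the log */

    assert(fan < 0);
    assert(!found_ge_zero(fan));      /* valid usage wrongly rejected */
    assert(found_not_minus1(fan));    /* valid usage accepted */
    assert(found_ge_zero(ups) && found_not_minus1(ups));
    assert(!found_not_minus1(USAGE_NOT_FOUND));
}
```

Where long is 64 bits the same value would stay positive, so this effect would presumably show up only on 32-bit builds. The -1 sentinel also cannot be told apart from a usage code of 0xffffffff in this model.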

After a few minutes of driver uptime, there is no alarm appearing, and the set of values is a bit different (e.g. no bogus ups.date catching my eye):

battery.charge: 100
battery.charge.low: 20
battery.runtime: 1875
battery.type: PbAc
device.mfr: EATON
device.model: Protection Station 650
device.serial: AN2E49008
device.type: ups
driver.flag.pollonly: enabled
driver.name: usbhid-ups
driver.parameter.bus: 003
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 2
driver.parameter.port: auto
driver.parameter.product: Protection Station
driver.parameter.productid: FFFF
driver.parameter.serial: AN2E49008
driver.parameter.synchronous: no
driver.parameter.vendor: EATON
driver.parameter.vendorid: 0463
driver.version: 2.7.4-4700-g66445ec
driver.version.data: MGE HID 1.45
driver.version.internal: 0.45
driver.version.usb: libusb-1.0.19 (API: 0x1000103)
input.transfer.high: 284
input.transfer.low: 161
outlet.1.desc: PowerShare Outlet 1
outlet.1.id: 2
outlet.1.status: on
outlet.1.switchable: no
outlet.2.desc: PowerShare Outlet 2
outlet.2.id: 3
outlet.2.status: on
outlet.2.switchable: no
outlet.desc: Main Outlet
outlet.id: 1
outlet.power: 25
outlet.switchable: no
output.frequency.nominal: 50
output.voltage: 230.0
output.voltage.nominal: 230
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.delay.start: 30
ups.firmware: 1.13
ups.load: 1
ups.mfr: EATON
ups.model: Protection Station 650
ups.power.nominal: 650
ups.productid: ffff
ups.realpower: 5
ups.serial: AN2E49008
ups.status: OL
ups.timer.shutdown: -1
ups.timer.start: -1
ups.vendorid: 0463

@jimklimov
Member

@toxic0berliner : for a quick check, would be great if you can rebuild local NUT master with this essential fix applied:

https://github.com/networkupstools/nut/pull/1301/files#diff-8dc52da32d935c64c234bbd7955b6db39136d4d21ed30b6e0eca1a58ffb1311cR912

-    if ((usage = hid_lookup_usage(token, utab)) >= 0)
+    if ((usage = hid_lookup_usage(token, utab)) != -1) 

and verify that solves the bogus alarms for you too?

@toxic0berliner
Author

Wow, all this went way over my head, but changing a line and rebuilding is in my skillset ;)
Pi will take ages again, will report back when it's done, might be tomorrow if it's slow and I get to sleep before it ends ;)

@jimklimov
Member

Ok, thanks a lot!
It may help to change the line and rebuild a small scope for this one driver, like

:; (cd drivers && make usbhid-ups)

(and also ccache is your friend, if not yet deployed there)

@toxic0berliner
Author

somehow it did build very fast anyway, had to reboot since I'm unclear as to the unit name for the upstart services, but after reboot, issue is fixed ! I read status as online, no Fan error anymore, and I still am able to disable/enable the beeper which was broken from the packaged version !

All fine for me with this fix ! Congrats, and thank you very very much for the incredible reactivity and help !

Now that won't help a bit with packaging this in the Debian repos, and even less for my other UPS that runs a customized nut version from Synology, but hey, I have the latest one working perfectly on my pi ! thanks a lot !

And congrats again on the very nice instructions, it has been years since I compiled anything and you made it a breeze with your docs !
