Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZNP Adapter Manager/Backup Refactor #303

Merged
merged 115 commits into from
May 13, 2021

Conversation

castorw
Copy link
Contributor

@castorw castorw commented Jan 31, 2021

This PR addresses issues discussed in #286. Although quite large - it is a complex refactor of ZNP adapter startup, backup management and commissioning process.

Status:

  • Introduce advanced structures for NIB and other NV types for easier work with them
  • Introduce unified backup format (inspired by @puddly's [RFC] Unified radio command line interface and standard network backup format zigpy/zigpy#557 (comment))
  • Introduce more comprehensive startup strategy determination process to prevent network re-commissions when not expected (ZNP Adapter Commissioning/Restore Decision Process #286 (comment))
  • Respect target adapter limits and other network properties during adapter restore (a provisioning network is commissioned and relevant backup data is restored over the newly created NIB)
  • Prevent commissioning of network with colliding PAN ID (compatible with previous colliding setups)
  • Ensure NIB conversion between 8-bit platforms (CC2530/CC2531) and 16-bit platforms (CC13x2/CC2538/CC26x2)
  • Pass logger to adapter instances (ability to issue warnings or other possibly relevant stuff)
  • Clear tables before restoring (NWK_SEC_MATERIAL_TABLE, NWK_TCLK_TABLE)
  • Backup devices with short address, IEEEE address, derived APS key with counters (if present) (seed-based/specific)
  • Backup TCLK seed
  • Recover APS link keys (differentiate between seed-based and arbitrary)
  • Type backups properly (currently any - WIP)
  • Research: Ability to use APP_CNF_SET_NWK_FRAME_COUNTER to restore frame counter instead of restoring security material table/ZCD_NW_NWKKEY - may be good for real-time updates - not for cold restore
  • Research: Network update instead of full re-commissioning on parameter change - out of scope of this PR
  • Research extended PAN ID commissioning behaviour (seems to be ignored on some platforms) and add to startups strategy evaluations if feasible - fixed by using ZCD_NV_APS_USE_EXT_PANID as well
  • Test commissioning/restore with default EPID DD:DD:DD:DD:DD:DD:DD:DD
  • Test with physical adapters (CC2531, CC2538, CC2652RB)
  • Test with Z-Stack 1.2 HA (CC2531) - commissioning only, backups are not yet supported
  • Add warnings for configuration inconsistencies
  • Add code comments
  • Add/fix tests

@puddly - you are linking frame counter with network key in your proposal (zigpy/zigpy#557 (comment)), however from what I see the frame counters are bound to extended PAN IDs in NWK_SEC_MATERIAL table. Or am I doing this wrong?

@Koenkk If you'd like I would be happy to discuss this further, preferably in some more interactive manner - Discord, Teams, ... you name it. Thanks.

@Koenkk
Copy link
Owner

Koenkk commented Jan 31, 2021

You can contact me on telegram (@Koenkk)

@puddly
Copy link

puddly commented Jan 31, 2021

@puddly - you are linking frame counter with network key in your proposal (zigpy/zigpy#557 (comment)), however from what I see the frame counters are bound to extended PAN IDs in NWK_SEC_MATERIAL table. Or am I doing this wrong?

It's stored in different places for each Z-Stack version but I believe the only way to reset the counter is to rotate the network key. You can't rotate the EPID. This also matched the structure of the SiLabs EZSP backup format.

Determine if NWK_TCLK_TABLE backup/restore is necessary

You'll need those for APS encryption, which uses unique keys for each device. They're derived from the "TCLK_SEED", the device's IEEE, and a shift, to save space, so you'll have to actually "export" them before use. I haven't had time to investigate if Z-Stack can re-import them in their expanded form, since each stack handles hashed keys differently.

It you want to discuss the backup stuff further (it's not set in stone but diverging from the unified backup format this soon won't make it very unified :) ) or find anything interesting within Z-Stack, you can contact me over Discord: puddly#8728.

@nurikk
Copy link
Contributor

nurikk commented May 26, 2021

@nurikk would it be ok for you if the data is provided as json to the frontend?

yes, should be fine

@Koenkk
Copy link
Owner

Koenkk commented May 27, 2021

@nurikk thinking more about this, there are 3 things to backup: database.db, coordinator_backup.json and the configuration.yaml. When the user would click the backup button in the frontend, I think it's the most handy that a zip file is downloaded with those 3 files in it. Does it make more sense that the data is returned as a binary string instead?

@nurikk
Copy link
Contributor

nurikk commented May 28, 2021

@nurikk thinking more about this, there are 3 things to backup: database.db, coordinator_backup.json and the configuration.yaml. When the user would click the backup button in the frontend, I think it's the most handy that a zip file is downloaded with those 3 files in it. Does it make more sense that the data is returned as a binary string instead?

we can keep backup file as simple json with key-value pairs, key - path, value - data uri like:

{
    "meta": {
        "date": "2021-05-28T00:36:24.604Z",
        "some_random_stats": "here",
         "z2m_version": "0.15.5"
    },
    "files": {
        "./database.db": "data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==",
        "./configuration.yaml": "data:text/plain;base64,......"
    }
}

@Hedda
Copy link
Contributor

Hedda commented May 28, 2021

@nurikk thinking more about this, there are 3 things to backup: database.db, coordinator_backup.json and the configuration.yaml. When the user would click the backup button in the frontend, I think it's the most handy that a zip file is downloaded with those 3 files in it.

The ability for an end-user to get a such full backup package as zip file from the frontend would be awesome and make it very user-friendly to perform manual backups and restores when for example migrating between different computers/installations/setups.

@Koenkk
Copy link
Owner

Koenkk commented May 28, 2021

@nurikk but how would the user restore it? (since it cannot restore a json file, the files need to be replaced)

@nurikk
Copy link
Contributor

nurikk commented May 29, 2021

@nurikk but how would the user restore it? (since it cannot restore a json file, the files need to be replaced)

I thought about two ways of restoring this backups.

  1. User uploads this backup file from ui and boom, magic happens
  2. User drops this backup file into z2m directory and restart z2m, and then magic happens.

But I agree that simple zip file works nice as well KISS :)

@Koenkk
Copy link
Owner

Koenkk commented May 29, 2021

If I provide the info like #303 (comment) (as json), could you contstruct a zip from the frontend?

@nurikk
Copy link
Contributor

nurikk commented May 29, 2021

If I provide the info like #303 (comment) (as json), could you contstruct a zip from the frontend?

Yes

@kirovilya
Copy link
Contributor

A small note / question ... Some iobroker.zigbee users had their own ExtPanID different from the default ("DDDDDDDDDDDDDDDD").
These users now have an error at startup:


Configuration is not consistent with adapter state/backup!
 - PAN ID: configured=5203, adapter=5204
 - Extended PAN ID: configured=341bd2fe158bdcb5, adapter=00124b00192e822f
 - Network Key: configured=0a03050709cb0d0f66020406080a0c0e, adapter=0a03050709cb0d0f66020406080a0c0e
 - Channel List: configured=25, adapter=25
 Please update configuration to prevent further issues.
 If you wish to re-commission your network, please remove coordinator backup at /opt/iobroker/iobroker-data/zigbee_0/nvbackup.json.
 Re-commissioning your network will require re-pairing of all devices!
 Starting zigbee-herdsman problem : "startup failed - configuration-adapter mismatch - see logs above for more information"
 Failed to start Zigbee
 Error herdsman start

It can be seen that the "Extended PAN ID" in the settings and in the adapter do not match.
If they go back to the previous version, then there is no error and everything works.

341bd2fe158bdcb5 - correct extpanid from settings.
00124b00192e822f - extpanid read from chip / adapter, matches ieee chip.

Hence the questions:

  • why is the extpanid not correct in the adapter?
  • previously configured networks did not correctly register extpanid in the adapter?
  • what should such users do now so that they do not have to reconfigure the network again? try to set extpanid DDDDDDDDDDDDDDDD in settings?

@kirovilya
Copy link
Contributor

Another case - wrong values in adapter side:

Configuration is not consistent with adapter state/backup!
 - PAN ID: configured=6754, adapter=65535
 - Extended PAN ID: configured=00124b00192e81a5, adapter=0000000000000000
 - Network Key: configured=01030507090b0d0f00020406080a0c0d, adapter=01030507090b0d0f00020406080a0c0d
 - Channel List: configured=11, adapter=
 Please update configuration to prevent further issues.
 If you wish to re-commission your network, please remove coordinator backup at /opt/iobroker/iobroker-data/zigbee_0/nvbackup.json.
 Re-commissioning your network will require re-pairing of all devices!
 Starting zigbee-herdsman problem : "startup failed - configuration-adapter mismatch - see logs above for more information"
 Failed to start Zigbee
 Error herdsman start

@Koenkk
Copy link
Owner

Koenkk commented May 30, 2021

@castorw can you check this?

@castorw
Copy link
Contributor Author

castorw commented May 31, 2021

  • why is the extpanid not correct in the adapter?

EPID configuration wasn't implemented properly before in ZH. This PR fixed that, but it is possible that adapters initialized by pervious version of ZH may not have EPID properly configured. The occurrence of this issue is related to different firmwares in radios.

  • previously configured networks did not correctly register extpanid in the adapter?

See above.

  • what should such users do now so that they do not have to reconfigure the network again? try to set extpanid DDDDDDDDDDDDDDDD in settings?

I would suggest they set the EPID to the value reported in adapter. This is the easiest way to fix the issue for now.

@castorw
Copy link
Contributor Author

castorw commented May 31, 2021

Another case - wrong values in adapter side:

Configuration is not consistent with adapter state/backup!
 - PAN ID: configured=6754, adapter=65535
 - Extended PAN ID: configured=00124b00192e81a5, adapter=0000000000000000
 - Network Key: configured=01030507090b0d0f00020406080a0c0d, adapter=01030507090b0d0f00020406080a0c0d
 - Channel List: configured=11, adapter=
 Please update configuration to prevent further issues.
 If you wish to re-commission your network, please remove coordinator backup at /opt/iobroker/iobroker-data/zigbee_0/nvbackup.json.
 Re-commissioning your network will require re-pairing of all devices!
 Starting zigbee-herdsman problem : "startup failed - configuration-adapter mismatch - see logs above for more information"
 Failed to start Zigbee
 Error herdsman start

This looks like improperly configured adapter - possibly even in Router/EndDevice mode (wonder if this adapter could have worked before at all). This worked before migrating to updated ZH codebase? Was it run with ZH before?

@kirovilya
Copy link
Contributor

EPID configuration wasn't implemented properly before in ZH. This PR fixed that, but it is possible that adapters initialized by pervious version of ZH may not have EPID properly configured. The occurrence of this issue is related to different firmwares in radios.

Of course the adapter and network were initialized in previous versions of ZH (most users will have this case).
It is necessary to somehow give recommendations on what to do in this case, otherwise there will be many questions about this.

This is the easiest way to fix the issue for now.

it seems to me the simplest way is to indicate DDDDDDDDDDDDDDDD in extpanid.

@kirovilya
Copy link
Contributor

This looks like improperly configured adapter - possibly even in Router/EndDevice mode (wonder if this adapter could have worked before at all). This worked before migrating to updated ZH codebase? Was it run with ZH before?

this adapter used to work as a coordinator (in the previous version of ZH). this is probably a problem with early firmwares for cc2538, but they worked for users. I'm afraid this problem cannot be solved by re-initializing the network. most likely the settings are saved at different addresses than in other firmware.

maybe add the ability to ignore some checks? otherwise some adapters will not be able to work

@castorw
Copy link
Contributor Author

castorw commented May 31, 2021

@kirovilya

  • Yeah, updating the EPID to 0xDDDDDDDDDDDDDDDD is as good solution as putting the value reported by adapter in the startup.
  • Regarding the other issue, I am not really sure this can be caused by different firmware version. Only if NIB was relocated to another NV item ID - which I think is unlikely. Can you provide me with details on the firmware used in the CC2538 radio?

@kirovilya
Copy link
Contributor

Regarding the other issue, I am not really sure this can be caused by different firmware version. Only if NIB was relocated to another NV item ID - which I think is unlikely. Can you provide me with details on the firmware used in the CC2538 radio?

first version of firmware was https://github.com/antst/CC2538-ZNP-Coordinator-firmware (without sources)
then moved here https://github.com/reverieline/CC2538-CC2592-ZNP with sources and history.
there was some moves of NV_MEM reverieline/CC2538-CC2592-ZNP@6329103#diff-b0b2076efea010195f6a749aa25eddc1c1ef974c3d96ad5a2dc1eae71527d542

@castorw
Copy link
Contributor Author

castorw commented Jun 3, 2021

@kirovilya the diff does not imply NV relocation, it's only NV stretch so larger tables can be included. Anyway its unrelated to the NV item indexes which hasn't changed for a long time. I am positive that NIB location (item this problem is related to) is the same for Z-Stack 1.2 and Z-Stack 3.0.x and Z-Stack 3.x.0+ as well. The reason for the NV being in the state described above is unknown to me and so far the only occurrence of such state.

@kirovilya
Copy link
Contributor

I see a lot of problems with the latest update. therefore, I will repeat my proposal to make it possible to disable the check at startup in order to work with the current settings.

@castorw
Copy link
Contributor Author

castorw commented Jun 3, 2021

@kirovilya Could you provide me with NV dump from the adapter which caused this issue (with PANID being 65535)? Possibly using https://github.com/zigpy/zigpy-znp/blob/dev/zigpy_znp/tools/nvram_read.py.

@kirovilya
Copy link
Contributor

kirovilya commented Jun 3, 2021

@castorw I can't do it quickly. this is not my stick, and the user is not an expert. will have to figure out a way for him to execute the code

@castorw
Copy link
Contributor Author

castorw commented Jun 3, 2021

@kirovilya Thank you, this would be essential for further diagnostics. I am trying to think about the ability to disable the checks, but most of the updated code-base loses sense in such way. The goal was to stabilize the way ZH handles network provisioning and not create another cracks allowing improperly configured adapters to run.

@Koenkk any thoughts on this?

@Koenkk
Copy link
Owner

Koenkk commented Jun 3, 2021

@kirovilya @castorw let's continue in #376

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants