Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZNP Adapter Commissioning/Restore Decision Process #286

Closed
castorw opened this issue Jan 10, 2021 · 39 comments
Closed

ZNP Adapter Commissioning/Restore Decision Process #286

castorw opened this issue Jan 10, 2021 · 39 comments
Labels

Comments

@castorw
Copy link
Contributor

castorw commented Jan 10, 2021

Current State

Currently the following startup process is implemented (as per startZnp.ts):

  • Check if the adapter has been configured using hasConfigured NV item:
    • Yes: Check if configuration matches data in NV RAM:
      • Yes: Regular startupFromApp
      • No: Start BDB commissioning
    • No: Check if coordinator backup exists:
      • Yes: Restore from backup and startupFromApp,
      • No: Check if configuration matches data in NV RAM:
        • Yes: Regular startupFromApp
        • No: Start BDB commissioning

BDB Commissioning

Part of BDB commissioning which forms a new ZigBee network is to make sure, no other network with the same PAN ID lives on the same channel in the vicinity of the adapter. This is done by Z-Stack adapter firmware by sending a Beacon Request when forming a new network. Adapter then waits for Beacon frames and determines if the configured PAN ID is in use by other ZigBee network nearby.

If a network with the same PAN ID is present, the commissioning process will keep incrementing the PAN ID by one until suitable (free) PAN ID is found. If the configured PAN ID is NOT found it is used. This information is then stored within NIB in adapters NV memory.

This means that contents of ZCD_NV_PANID and ZCD_NV_NIB may differ after network commissioning. This information is discovered by Z2M by ZDO:extNwkInfo SREQ/SRSP but not taken into account.

Problem

During an adapter upgrade or replacement, where the new adapter has been previously used by Z2M without having NVRAM cleared (eg. used in other network or used before with other parameters), Z2M would start BDB commissioning instead of restoring NV backup to recover the previous network. This is due to the decision process implementation - hasConfigured parameter is set from previous Z2M instance but parameters mismatch -> BDB Commissioning. The expected action, however, would be coordinator restore from backup.

Proposed Solutions

  1. NIB Error - terminate application if adapters NV memory value of ZCD_NV_PANID (therefore configured PAN ID) mismatches PAN ID within ZCD_NV_NIB ,
  2. NIB Management - if ZDO:extNwkInfo SRSP mismatches ZCD_NV_PANID (therefore configured PAN ID) update the PAN ID in NIB and write it back to device - then SREQ SYS:resetReq,
  3. Restore If Config Matches (suggested by @Koenkk) - restore coordinator backup if NV memory and configuration mismatches, but configuration and coordinator backup matches (I would still keep in mind the NIB mismatch thingy),
  4. "Sign" The NV Memory - instead of (or in addition to) simple hasConfigured NV item in adapter introduce a value, which uniquely represents every Z2M instance (eg. UUIDv4) - if the instance ID mismatches (use in different Z2M instance) - coordinator NV is rather restored than network re-commissioned.

This might be 2 issues as well (the decision process and commissioned PAN ID validation). I am posting this issue to open a discussion in this matter since it has caused me several problems and by searching on forums and other issues I see other people struggle with this as well (sometimes without a gist where the problem lies).

If we can find a suitable solution, I am willing to do the implementation and submit a PR.

@Koenkk
Copy link
Owner

Koenkk commented Jan 11, 2021

If a network with the same PAN ID is present, the commissioning process will keep incrementing the PAN ID by one until suitable (free) PAN ID is found.

I'm wondering if this is true. At least for the CC2652 and CC1352 it's not (that's why https://www.zigbee2mqtt.io/information/FAQ.html#how-do-i-migrate-from-a-cc2531-to-a-more-powerful-coordinator-eg-zzh is needed). For the CC2530/CC2531 (1.2) the firmware is hacked always start on the specified pan_id, regardless if any other networks are running on it. Does this only apply to the CC2538 and CC2530/CC2531 on zstack 3.0.x?

Also involving @hacker-cb in the discussion since we've discussed this earlier.

@castorw
Copy link
Contributor Author

castorw commented Jan 11, 2021

Good, I base my CC2538 firmware on @hacker-cb's patches.

Cannot tell about CC2652/CC1352 since I haven't had my hands of one of those yet. I guess these are based on SimpleLink and you may know better if they use the same NV structure or not.

Just tested with CC2531 running pre-built firmware from https://github.com/Koenkk/Z-Stack-firmware/tree/master/coordinator/Z-Stack_3.0.x. In the commissioning process the PAN ID is incremented. I have configured PAN ID 6754 and ended up with 6755:

Zigbee2MQTT:debug 2021-01-11 17:23:14: Using zigbee-herdsman with settings: '{"adapter":{"concurrent":null,"delay":null},"backupPath":"/Users/castor/Documents/zigbee2mqtt-2/data/coordinator_backup.json","databaseBackupPath":"/Users/castor/Documents/zigbee2mqtt-2/data/database.db.backup","databasePath":"/Users/castor/Documents/zigbee2mqtt-2/data/database.db","network":{"channelList":[11],"extendedPanID":[221,221,221,221,221,221,221,221],"networkKey":"HIDDEN","panID":6754},"serialPort":{"path":"/dev/tty.usbmodem144401","rtscts":false}}'
.....
Zigbee2MQTT:debug 2021-01-11 17:23:26: Zigbee network parameters: {"channel":11,"extendedPanID":"0x00124b0019387286","panID":6755}

I'm wondering if this is true. At least for the CC2652 and CC1352 it's not (that's why https://www.zigbee2mqtt.io/information/FAQ.html#how-do-i-migrate-from-a-cc2531-to-a-more-powerful-coordinator-eg-zzh is needed). For the CC2530/CC2531 (1.2) the firmware is hacked always start on the specified pan_id, regardless if any other networks are running on it. Does this only apply to the CC2538 and CC2530/CC2531 on zstack 3.0.x?

Actually I guess it's true as well - the re-commissioning is the reason you need to increment the PAN ID in config, otherwise it would be done automatically making inconsistencies between config / NV RAM and inside NV RAM between configured PAN_ID and NIB.

@hacker-cb
Copy link
Contributor

My last suggestion was to generate instance_id and save it to the NVMEM during initializing.

Koenkk/zigbee2mqtt#4960

@Koenkk
Copy link
Owner

Koenkk commented Jan 11, 2021

The first problem we should tackle it the wrong pan_id being used by the CC2538. This indeed seems to be something in 3.0.x firmwares. Can you apply the following change? https://github.com/Koenkk/Z-Stack-firmware/blob/79d7e2f48f331dca7437b3a58540360ff02d34bc/coordinator/Z-Stack_Home_1.2/firmware.patch#L122

@hacker-cb
Copy link
Contributor

The first problem we should tackle it the wrong pan_id being used by the CC2538. This indeed seems to be something in 3.0.x firmwares. Can you apply the following change? https://github.com/Koenkk/Z-Stack-firmware/blob/79d7e2f48f331dca7437b3a58540360ff02d34bc/coordinator/Z-Stack_Home_1.2/firmware.patch#L122

@Koenkk you wrote to me?

@castorw
Copy link
Contributor Author

castorw commented Jan 11, 2021

@hacker-cb I am going to test it right away.

@castorw
Copy link
Contributor Author

castorw commented Jan 11, 2021

@Koenkk I have found that function before but just assumed it would crash the radio firmware since I expected the underlying proprietary code to call it in loop until suitable PAN ID was found. But it actually worked, the network has been commissioned with colliding PAN ID and basically hijacked my other network.

This might be considered a solution, however it's a pretty rough one - since you can basically create rogue networks with no warning at all. I would still consider finding other solution to mitigate the issue while minimising the possible security impact. However it can still be done if someone really wants to.

So the second issue in question: commissioning instead of coordinator restore.

@Koenkk
Copy link
Owner

Koenkk commented Jan 11, 2021

Good, since the pan id will now match the one in the config I think we can go for solution 3 from the OP?

@castorw
Copy link
Contributor Author

castorw commented Jan 11, 2021

Ok, so I will "rape" my radio firmware to do such in a configurable manner.

For the rest of the issue I think we could go with your solution (3). However, wouldn't you think it would be a good idea to crash the application with error if right after commissioning (initialisation) the response from ZDO:extNwkInfo does not match the configured PAN ID?

Shall I provide a PR or you want to do this yourself?

@Koenkk
Copy link
Owner

Koenkk commented Jan 11, 2021

Agree about: the response from ZDO:extNwkInfo does not match the configured PAN ID?, but note that is is a breaking change for older firmwares. Firmware here have to be updated first since those are the once linked from z2m docs: https://github.com/jethome-ru/zigbee-firmware/tree/master/ti/coordinator/cc2538_cc2592

@castorw
Copy link
Contributor Author

castorw commented Jan 11, 2021

@hacker-cb What do you think about updating the firmware mentioned above to force PAN ID even if conflict is detected?

@Koenkk I don't think it's a breaking change since this will only happen while initialising the adapter. Any adapter that already is configured according to configuration.yaml will continue to work. Only new network initialisations and/or recovers would fail. Even case where coordinator is restored won't make any trouble. This will just prevent people from creating a network where the configured PAN ID differs from the actual PAN ID used by radio and may only reduce number of issues in this matter. And it's not bound to whether the adapter allows it as per currently discussed "hack" or it won't.

@castorw
Copy link
Contributor Author

castorw commented Jan 11, 2021

Okay I have managed to make Force PAN ID Commissioning optional in Z-Stack firmware:

ZComDef.h (add custom NV item):

...
#define ZCD_NV_ZW_FORCE_PAN_ID                    0x04A1
...

ZDApp.c (ignore PAN ID conflicts if configured to do so):

...
uint16 ZDApp_CoordStartPANIDConflictCB( uint16 panid )
{
  uint8 force_pan_id = 0;
  uint8 nv_read_ret = osal_nv_read(ZCD_NV_ZW_FORCE_PAN_ID, 0, 1, (void *)&force_pan_id);
  return (nv_read_ret == SUCCESS && force_pan_id > 0) ? panid : (panid + 1);
}
...

@hacker-cb
Copy link
Contributor

@castorw
I will ask my collegue Egor, who is responsible for the Zigbee firmware.

@0x3EC
Copy link

0x3EC commented Jan 12, 2021

Okay I have managed to make Force PAN ID Commissioning optional in Z-Stack firmware:

ZComDef.h (add custom NV item):

...
#define ZCD_NV_ZW_FORCE_PAN_ID                    0x04A1
...

ZDApp.c (ignore PAN ID conflicts if configured to do so):

...
uint16 ZDApp_CoordStartPANIDConflictCB( uint16 panid )
{
  uint8 force_pan_id = 0;
  uint8 nv_read_ret = osal_nv_read(ZCD_NV_ZW_FORCE_PAN_ID, 0, 1, (void *)&force_pan_id);
  return (nv_read_ret == SUCCESS && force_pan_id > 0) ? panid : (panid + 1);
}
...

I think this is a good solution.
The problem you described often happens. I have test devices running on PANID, PANID +1, PANID +2, PANID +3. If the PANID inconsistency is written in bold bold letters, then this will give the user a chance to "repair" the network, and not reconnect all devices.

@castorw
Copy link
Contributor Author

castorw commented Jan 12, 2021

@0x3EC This can be determined from NIB, however I have found out that NIBs may differ in size (so far 110B and 116B). This is probably due to internal includes based on firmware compile-time configuration. Also 8/16/32-bit architecture of radio MCU may play role due to memory alignment. I will be looking deeper into this later today and try to figure out a suitable way to prevent/detect PAN ID collision and coordinator NV restore issues.

I also had issue restoring coordinator backup - didn't notice it then - but I have restored 110 byte NIB over firmware which produced 116 byte NIBs. This caused the radio to listen on channel 12 instead of 11 as expected and don't know what else. Hopefully I will be able to identify the differences between these NIB formats.

@castorw
Copy link
Contributor Author

castorw commented Jan 12, 2021

So it looks like NIBs generated in CC2530/CC2531 are not aligned since they run 8051 (8-bit) MCU. Therefore size of CC2530/CC2531 NIB is 110 bytes. CC2538 however creates 116 byte NIB due to word-alignment (16-bit aligned memory access). Therefore size of nwkIB_t struct is different on the platforms in compile-time.

That this means: restoring CC2531 backup on CC2538 or other 16/32-bit MCUs shouldn't be done or the NIB should be fixed beforehand.

@Koenkk @hacker-cb @0x3EC Would any of you be able to provide coordinator_backup.json sample from CC13xx/CC2652X? I would really like to compare them to NIBs from CC2531/CC2538.

Here's a little detail on the matter:

                                                                                                       nwkDevAddress
                                                                                                       |       nwkLogicalChannel
                                                                                                       |       |       nwkCoordAddress
                                                                                                       |       |       |       nwkCoordExtAddress
                                                                                                       |       |       |       |                               uint16 panId
                                                                                                       |       |       |       |                               |       uint8 nwkState
                                                                                                       |       |       |       |                               |       |       uint32 channelList
                                                                                                       |       |       |       |                               |       |       |
                                                                                                       _______ ___     _______ _______________________________ _______ ___     _______________
CC2652 NIB 116[025,005,002,051,020,051,000,030,000,000,000,001,005,001,143,000,007,000,002,005,030,000,000,000,018,000,000,000,000,000,000,000,000,000,000,000,018,058,008,000,000,000,004,000,015,015,004,000,001,000,000,000,001,000,000,000,000,221,221,221,221,221,221,221,221,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,015,003,000,001,120,010,001,000,000,000,185,103,000,000]
CC2652 NIB 116[199,005,002,051,020,051,000,030,000,000,000,001,005,001,143,000,020,000,002,005,030,000,000,000,015,000,000,000,000,000,000,000,000,000,000,000,174,119,008,000,000,128,000,000,015,015,004,000,001,000,000,000,001,000,000,000,000,055,167,071,151,119,215,162,036,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,015,003,000,001,120,010,001,000,000,000,000,000,000,000]
CC2538 NIB 116[153,005,002,081,020,081,000,100,000,000,000,001,005,001,143,000,007,000,002,013,030,000,000,000,011,000,000,000,000,000,000,000,000,000,000,000,098,026,008,000,000,008,000,000,015,015,004,000,001,000,000,000,001,000,000,000,000,061,118,217,020,000,075,018,000,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,060,012,000,001,120,010,001,000,000,000,136,031,000,000]
CC2531 NIB 110[207,005,002,016,020,016,000,020,000,000,000,001,005,001,143,   ,007,000,002,005,030,   ,000,000,011,   ,000,000,000,000,000,000,000,000,000,000,098,026,008,   ,000,008,000,000,015,015,004,000,001,000,000,000,001,000,000,000,000,061,118,217,020,000,075,018,000,001,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,060,003,000,001,120,010,001,   ,000,000,108,000,000,   ]
POS            0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99  100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
                                                                           ^                       ^               ^                                                       ^                                                                                                                                                                                                                                                                                       ^                       ^
                                                                           |                       |               |                                                       |                                                                                                                                                                                                                                                                                       |                       |
                                                                           |                       |               |                                                       |                                                                                                                                                                                                                                                                                       |                       pad structure to be (%2 == 0) long
                                                                           |                       |               |                                                       |                                                                                                                                                                                                                                                                                       align uint16_t nwkManagerAddr
                                                                           |                       |               |                                                       align uint32 channelList
                                                                           |                       |               align uint16 nwkCoordAddress
                                                                           |                       align uint16 nwkDevAddress
                                                                           align uint16 TransactionPersistenceTime

EDIT: By the looks of nwk.h inside SimpleLink Z-Stack 3.X.0 the NIB stucture is the same. However the alignment may potentially differ. I do not currently have access to SimpleLink-based hardware (CC135x/CC2652x) so I will need a backup sample.

@puddly
Copy link

puddly commented Jan 12, 2021

I've been messing around with the low-level NIB stuff for a while in zigpy-znp so if you want to compare results, here's what I am using right now: https://github.com/zigpy/zigpy-znp/blob/dev/zigpy_znp/znp/nib.py#L50-L123

A few other data type sizes change between the different architectures so it's not enough to tweak padding. I've documented all of the differences between the two structs in the comment of the CC2531NIB class right below the linked code snippet.

I also have complete NVRAM dumps for a CC2652R, though the format is a little different than what Z2M uses: https://github.com/zigpy/zigpy-znp/tree/dev/tests/nvram. I would be very interested in some complete CC2538 NVRAM dumps if you have any :)

Regarding your main issue with PAN ID conflicts in newer Z-Stacks (especially with existing routers on the network): I have been trying to form a network with a PAN ID of 0xFFFF (to let Z-Stack pick) and then overriding it after a reset by modifying the NIB: https://github.com/zigpy/zigpy-znp/blob/dev/zigpy_znp/zigbee/application.py#L385-L396. I believe the reset is necessary to make sure the frame counter in the NIB is rounded up by Z-Stack, since it's only written to NVRAM periodically (and I think in multiple places in different versions of Z-Stack)

In my tests it has worked and causes my CC2652R to happily form a conflicting network, even with other routers sharing that same PAN ID.

Not entirely related to this discussion, but if you want to collaborate on a unified network backup format between Z2M and zigpy/ZHA and possibly get deCONZ on board, it would be awesome: zigpy/zigpy#557 (comment)

@Koenkk
Copy link
Owner

Koenkk commented Jan 12, 2021

@castorw here you go: https://gist.github.com/Koenkk/bcbd2d472b71676b51216ec697aa8037

@hacker-cb
Copy link
Contributor

@castorw this is from 2652:

coordinator_backup.json
{
  "adapterType": "zStack",
  "time": "Tue, 12 Jan 2021 17:11:13 GMT",
  "meta": {
    "product": 1
  },
  "data": {
    "ZCD_NV_EXTADDR": {
      "id": 1,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        38,
        31,
        129,
        34,
        0,
        75,
        18,
        0
      ],
      "len": 8
    },
    "ZCD_NV_NIB": {
      "id": 33,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        25,
        5,
        2,
        51,
        20,
        51,
        0,
        30,
        0,
        0,
        0,
        1,
        5,
        1,
        143,
        0,
        7,
        0,
        2,
        5,
        30,
        0,
        0,
        0,
        18,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        18,
        58,
        8,
        0,
        0,
        0,
        4,
        0,
        15,
        15,
        4,
        0,
        1,
        0,
        0,
        0,
        1,
        0,
        0,
        0,
        0,
        221,
        221,
        221,
        221,
        221,
        221,
        221,
        221,
        1,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        15,
        3,
        0,
        1,
        120,
        10,
        1,
        0,
        0,
        0,
        185,
        103,
        0,
        0
      ],
      "len": 116
    },
    "ZCD_NV_PANID": {
      "id": 131,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        18,
        58
      ],
      "len": 2
    },
    "ZCD_NV_EXTENDED_PAN_ID": {
      "id": 45,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        221,
        221,
        221,
        221,
        221,
        221,
        221,
        221
      ],
      "len": 8
    },
    "ZCD_NV_NWK_ACTIVE_KEY_INFO": {
      "id": 58,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        0,
        167,
        13,
        78,
        236,
        154,
        129,
        229,
        186,
        153,
        253,
        107,
        247,
        192,
        237,
        0,
        154
      ],
      "len": 17
    },
    "ZCD_NV_NWK_ALTERN_KEY_INFO": {
      "id": 59,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        0,
        167,
        13,
        78,
        236,
        154,
        129,
        229,
        186,
        153,
        253,
        107,
        247,
        192,
        237,
        0,
        154
      ],
      "len": 17
    },
    "ZCD_NV_APS_USE_EXT_PANID": {
      "id": 71,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ],
      "len": 8
    },
    "ZCD_NV_PRECFGKEY": {
      "id": 98,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        167,
        13,
        78,
        236,
        154,
        129,
        229,
        186,
        153,
        253,
        107,
        247,
        192,
        237,
        0,
        154
      ],
      "len": 16
    },
    "ZCD_NV_PRECFGKEY_ENABLE": {
      "id": 99,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        0
      ],
      "len": 1
    },
    "ZCD_NV_CHANLIST": {
      "id": 132,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        0,
        0,
        4,
        0
      ],
      "len": 4
    },
    "ZCD_NV_EX_TCLK_TABLE": {
      "sysid": 1,
      "itemid": 4,
      "subid": 0,
      "product": 1,
      "osal": false,
      "offset": 0,
      "value": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        255,
        0,
        0,
        0
      ],
      "len": 20
    },
    "ZCD_NV_EX_NWK_SEC_MATERIAL_TABLE": {
      "sysid": 1,
      "itemid": 7,
      "subid": 0,
      "product": 1,
      "osal": false,
      "offset": 0,
      "value": [
        247,
        102,
        100,
        0,
        221,
        221,
        221,
        221,
        221,
        221,
        221,
        221
      ],
      "len": 12
    }
  }
}

@hacker-cb
Copy link
Contributor

About Frame Counter problem @0x3EC wrote some time ago: Koenkk/Z-Stack-firmware#225

@0x3EC
Copy link

0x3EC commented Jan 12, 2021

@0x3EC This can be determined from NIB, however I have found out that NIBs may differ in size (so far 110B and 116B). This is probably due to internal includes based on firmware compile-time configuration. Also 8/16/32-bit architecture of radio MCU may play role due to memory alignment. I will be looking deeper into this later today and try to figure out a suitable way to prevent/detect PAN ID collision and coordinator NV restore issues.

I also had issue restoring coordinator backup - didn't notice it then - but I have restored 110 byte NIB over firmware which produced 116 byte NIBs. This caused the radio to listen on channel 12 instead of 11 as expected and don't know what else. Hopefully I will be able to identify the differences between these NIB formats.

In NVMEM, memory is allocated for PANID (#define ZCD_NV_PANID 0x0083). It is necessary to add in the firmware so that the current PANID (on which the network started) is recorded in this memory.

@0x3EC
Copy link

0x3EC commented Jan 12, 2021

coordinator_backup.json

{
  "adapterType": "zStack",
  "time": "Tue, 12 Jan 2021 19:10:01 GMT",
  "meta": {
    "product": 2
  },
  "data": {
    "ZCD_NV_EXTADDR": {
      "id": 1,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        123,
        89,
        16,
        6,
        0,
        75,
        18,
        0
      ],
      "len": 8
    },
    "ZCD_NV_NIB": {
      "id": 33,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        183,
        5,
        2,
        81,
        20,
        81,
        0,
        100,
        0,
        0,
        0,
        1,
        5,
        1,
        143,
        0,
        7,
        0,
        2,
        13,
        30,
        0,
        0,
        0,
        11,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        98,
        26,
        8,
        0,
        0,
        8,
        0,
        0,
        15,
        15,
        4,
        0,
        1,
        0,
        0,
        0,
        1,
        0,
        0,
        0,
        0,
        123,
        89,
        16,
        6,
        0,
        75,
        18,
        0,
        1,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        60,
        12,
        0,
        1,
        120,
        10,
        1,
        0,
        0,
        0,
        97,
        77,
        0,
        0
      ],
      "len": 116
    },
    "ZCD_NV_PANID": {
      "id": 131,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        98,
        26
      ],
      "len": 2
    },
    "ZCD_NV_EXTENDED_PAN_ID": {
      "id": 45,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        221,
        221,
        221,
        221,
        221,
        221,
        221,
        221
      ],
      "len": 8
    },
    "ZCD_NV_NWK_ACTIVE_KEY_INFO": {
      "id": 58,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        0,
        1,
        3,
        5,
        7,
        9,
        11,
        13,
        15,
        0,
        2,
        4,
        6,
        8,
        10,
        12,
        13
      ],
      "len": 17
    },
    "ZCD_NV_NWK_ALTERN_KEY_INFO": {
      "id": 59,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        0,
        1,
        3,
        5,
        7,
        9,
        11,
        13,
        15,
        0,
        2,
        4,
        6,
        8,
        10,
        12,
        13
      ],
      "len": 17
    },
    "ZCD_NV_APS_USE_EXT_PANID": {
      "id": 71,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        0,
        0,
        0,
        0,
        0,
        0,
        0,
        0
      ],
      "len": 8
    },
    "ZCD_NV_PRECFGKEY": {
      "id": 98,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        1,
        3,
        5,
        7,
        9,
        11,
        13,
        15,
        0,
        2,
        4,
        6,
        8,
        10,
        12,
        13
      ],
      "len": 16
    },
    "ZCD_NV_PRECFGKEY_ENABLE": {
      "id": 99,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        0
      ],
      "len": 1
    },
    "ZCD_NV_CHANLIST": {
      "id": 132,
      "offset": 0,
      "osal": true,
      "product": -1,
      "value": [
        0,
        8,
        0,
        0
      ],
      "len": 4
    },
    "ZCD_NV_LEGACY_TCLK_TABLE_START": {
      "id": 257,
      "product": 2,
      "offset": 0,
      "osal": true,
      "value": [
        146,
        26,
        47,
        171,
        93,
        191,
        58,
        16,
        17,
        172,
        207,
        189,
        149,
        209,
        108,
        60
      ],
      "len": 16
    },
    "ZCD_NV_LEGACY_NWK_SEC_MATERIAL_TABLE_START": {
      "id": 117,
      "product": 2,
      "offset": 0,
      "osal": true,
      "value": [
        73,
        75,
        12,
        0,
        123,
        89,
        16,
        6,
        0,
        75,
        18,
        0
      ],
      "len": 12
    }
  }

@puddly
Copy link

puddly commented Jan 12, 2021

In NVMEM, memory is allocated for PANID (#define ZCD_NV_PANID 0x0083). It is necessary to add in the firmware so that the current PANID (on which the network started) is recorded in this memory.

I believe this is only used when the network is being formed, at least on newer devices. Once the network has formed and the NIB's network state attribute has the correct value, the NIB will be the authoritative source after a reset for all of the runtime network state (including the pan id, extended pan id, channel mask, current channel, etc.)

@castorw
Copy link
Contributor Author

castorw commented Jan 12, 2021

I believe this is only used when the network is being formed, at least on newer devices. Once the network has formed and the NIB's network state attribute has the correct value, the NIB will be the authoritative source after a reset for all of the runtime network state (including the pan id, extended pan id, channel mask, current channel, etc.)

From what I have tested for several tens of times you are right! That parameter is only used in BDB commissioning network formation process. And I suppose the other parameters as well (Extended PAN ID, Channel List, Keys, ...). All of these can be found in NIB afterwards.

@castorw
Copy link
Contributor Author

castorw commented Jan 12, 2021

I have updated the original comment with data from CC2652's provided by you all: #286 (comment)

@puddly:
First of all, thanks for sharing your knowledge ;-)

Here are CC2538 NVRAM dumps for you (created with zigpy_znp/tools/nvram_read.py):
https://gist.github.com/castorw/d697a6b0952c9577038296574b872703
https://gist.github.com/castorw/df2cf1c75c4436897925628f067e5063

Regarding the nwkState (the only notable difference betwen 8-bit/16-bit addressing) - I don't think we need to care about this parameter. We can just restore it shifted as is. And even if you want to read it, you still read the same position since the MCUs are all little endian and the maximum value does not reach nowhere near 0xff or above.

The idea of unified backup format seems nice. And the finding that NIB can be safely altered is great. I will dedicate some time for testing this with CC2531/CC2538 and conversions between NIBs as well.

@Koenkk:
To get this issue back on track (since it seems to have spiralled a bit) I would like to split this in separate issues and close this one:

  • ZNP Commissioning PAN ID Collision Management
  • ZNP Adapter Restore Process (NIB Conversion)

Agreed?

@MattWestb
Copy link

One recommendation for Z2M to implanting erasing / resetting of the NVR for TI ZNP so user dont need reflashing it then its being corrupted (oft reported then having strange problems with the mesh and solved then reflashing the ZNP).
ZHA (puddly) have implanting it and its working.

And if going little further doing the same for the other ZNP that Z2M is supporting for getting it working for all ZNP (radio) platforms.

@Koenkk
Copy link
Owner

Koenkk commented Jan 13, 2021

@castorw agree

@castorw
Copy link
Contributor Author

castorw commented Jan 14, 2021

@Koenkk I have decided to give it a spin and already am working on solution for the issues described here. When ready I will update this issue and hopefully produce a PR. If I hit a wall or have any issues I will split this one to smaller parts.

@castorw
Copy link
Contributor Author

castorw commented Jan 15, 2021

The primary goals are:

  • Ensure network is not re-provisioned if user just replaces Z-Stack 3.x.x adapter (even one that's been used before),
  • No network with colliding PAN ID is commissioned (of course can't break existing colliding setups),
  • Warn user if they run network with inconsistent NIB / configuration,
  • Backup NIB is restored properly according to target platform (memory alignment),
  • Backup respects limits/other settings of new adapter (MaxDevice, MaxRouters, ...) by updating NIB generated by target instead of pushing old NIB back,
  • Optionally enhancing backup format (might get inspired by @puddly's idea).

Here's a rough and ugly flow chart of updated startup process I am trying to achieve.
znp_startup

I am already WIP, any thoughts are welcome.

@Koenkk
Copy link
Owner

Koenkk commented Jan 16, 2021

@castorw Following suggestion: "Backup NIB is restored properly according to target platform (memory alignment)" has a low priority here. For the CC2531 1.2 we don't create backups (and I think we should keep it as such, seems to work fine). CC2538 <-> CC2652/CC1352 already works. That would only leave the CC2531 and 3.0 which is a firmware I never recommend (and I expect it is rarely used).

@castorw
Copy link
Contributor Author

castorw commented Jan 16, 2021

@Koenkk It actually is a case I encountered several times and basically it breaks your network up without any warning. I already have base for this ready - NIB structure + conversion methods. Also I am updating the restore process in a way it will basically not use the backup NIB, rather only parts of it. If you change adapter - NIB from new adapter will be used and only required values will be updated so other parts of target adapter are respected.

I have one more question however - is there any way currently I could log a warning from zigbee-herdsman - the wrapped winston seems to be available only within zigbee2mqtt app? If no, is there a preferred way I should do this?

@Koenkk
Copy link
Owner

Koenkk commented Jan 16, 2021

I have one more question however - is there any way currently I could log a warning from zigbee-herdsman - the wrapped winston seems to be available only within zigbee2mqtt app? If no, is there a preferred way I should do this?

This is currently not possible, what do you want to use it for?

@castorw
Copy link
Contributor Author

castorw commented Jan 16, 2021

@Koenkk To issue warnings regarding inconsistencies between:

  • Config <-> Adapter State (NIB),
  • Internal Backup Inconsistencies (conflicting PAN ID),
  • and similar...

@Koenkk
Copy link
Owner

Koenkk commented Jan 16, 2021

@castorw I'm fine with passing an optional logger for such cases. Change the constructor in controller.ts from public constructor(options: Options) { to public constructor(options: Options, logger) {

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale label Mar 3, 2021
@Koenkk Koenkk removed the stale label Mar 3, 2021
@castorw
Copy link
Contributor Author

castorw commented Mar 3, 2021

Thanks for removing the stale tag. Currently short on free time, still fighting tests. Hopefully I will be ready in a few days ;-)

@github-actions
Copy link
Contributor

github-actions bot commented Apr 3, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

@github-actions github-actions bot added the stale label Apr 3, 2021
@castorw
Copy link
Contributor Author

castorw commented Apr 4, 2021

#303 ready for review. I guess we can close this upon review and merge.

@github-actions github-actions bot removed the stale label Apr 5, 2021
@github-actions
Copy link
Contributor

github-actions bot commented May 5, 2021

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants