Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TC-MF-1.21][TC-MF-1.22] Failed to advertise commissionable node #19774

Closed
adamb-q opened this issue Jun 20, 2022 · 8 comments · Fixed by #19900
Closed

[TC-MF-1.21][TC-MF-1.22] Failed to advertise commissionable node #19774

adamb-q opened this issue Jun 20, 2022 · 8 comments · Fixed by #19900

Comments

@adamb-q
Copy link
Contributor

adamb-q commented Jun 20, 2022

In both MF-1.21 and MF-1.22 the DUT behavior is different than described in test instructions. After commissioning max number of fabrics and then trying to commission another one DUT reports:

Failed to advertise commissionable node: Error CHIP:0x00000019

and is not discovered by chip tool when trying to pair. As a result the error code is mDNS timeout instead of:

Device returned status 5 on receiving the NOC

which is expected by instructions.

MF-1.21_DUT.txt
MF-1.21-chip_tool.txt
MF-1.22_DUT.txt
MF-1.22-chip_tool.txt

@bzbarsky-apple
Copy link
Contributor

Error 0x19 is CHIP_ERROR_BUFFER_TOO_SMALL.

@adamb-q What DUT is being used? That is, which exact example app (if this is an example app) and on what exact platform? Is it using minimal mdns or platform mdns?

Is it possible to compile the DUT with CHIP_CONFIG_ERROR_SOURCE set to 1, so that the error will include some information about where it's being returned from?

@bzbarsky-apple
Copy link
Contributor

I suspect based on where I see CHIP_ERROR_BUFFER_TOO_SMALL around advertising code:

  1. This is a Thread device.
  2. CHIP_DEVICE_CONFIG_THREAD_SRP_MAX_SERVICES is set too small: either it's set to the same value as CHIP_CONFIG_MAX_FABRICS or set to CHIP_CONFIG_MAX_FABRICS + 1 but the device supports commissioner discovery.

Would it be possible to check the values of the following configuration variables?

  • CHIP_CONFIG_MAX_FABRICS
  • CHIP_DEVICE_CONFIG_THREAD_SRP_MAX_SERVICES
  • CHIP_DEVICE_CONFIG_ENABLE_COMMISSIONER_DISCOVERY

@adamb-q
Copy link
Contributor Author

adamb-q commented Jun 21, 2022

Hi @bzbarsky-apple, yes this is a Thread device - light app on qpg platform. The configuration is as follows:

#define CHIP_CONFIG_MAX_FABRICS 4
#define CHIP_DEVICE_CONFIG_THREAD_SRP_MAX_SERVICES (CHIP_CONFIG_MAX_FABRICS + 1)
#define CHIP_DEVICE_CONFIG_ENABLE_COMMISSIONER_DISCOVERY 0
#define CHIP_DEVICE_CONFIG_ENABLE_THREAD_COMMISSIONABLE_DISCOVERY 1

unfortunately I cannot recompile with CHIP_CONFIG_ERROR_SOURCE 1 because we don't have enough flash on the device. Any other way to provide useful logs?

@woody-apple
Copy link
Contributor

Cert Blocker Review: Not clear what's going on here yet, @bzbarsky-apple will investigate.

@bzbarsky-apple
Copy link
Contributor

@adamb-q Thank you, that's very useful. Those config defines look reasonable: we have one service for the commissionable advertisement, 4 for the fabrics.

Looking at the MD-1.21 DUT log again, here's something else interesting. Here's us finishing the 4th commissioning and advertising our 5 things (the device has extended discovery enabled, presumably):

[P][SVR] Commissioning completed successfully
[P][DIS] Updating services using commissioning mode 0
[P][DIS] Advertise operational node 56DB792B0D0A224F-0000000000000001
[P][DL] advertising srp service: 56DB792B0D0A224F-0000000000000001._matter._tcp
[P][DIS] Advertise operational node 3BD7C62BDBE8F26C-0000000000000002
[P][DL] advertising srp service: 3BD7C62BDBE8F26C-0000000000000002._matter._tcp
[P][DIS] Advertise operational node 22C285C2FD90B09A-0000000000000003
[P][DL] advertising srp service: 22C285C2FD90B09A-0000000000000003._matter._tcp
[P][DIS] Advertise operational node 33DE8CA151B4F25D-0000000000000004
[P][DL] advertising srp service: 33DE8CA151B4F25D-0000000000000004._matter._tcp
[P][DIS] Advertise commission parameter vendorID=65521 productID=32773 discriminator=3840/15
[P][DL] advertising srp service: 35510C7DED531836._matterc._udp

And here's us trying to advertise after opening another commissioning window:

[P][ZCL] Received command to open commissioning window
[P][DIS] Updating services using commissioning mode 2
[P][DIS] Advertise operational node 56DB792B0D0A224F-0000000000000001
[P][DL] advertising srp service: 56DB792B0D0A224F-0000000000000001._matter._tcp
[P][DIS] Advertise operational node 3BD7C62BDBE8F26C-0000000000000002
[P][DL] advertising srp service: 3BD7C62BDBE8F26C-0000000000000002._matter._tcp
[P][DIS] Advertise operational node 22C285C2FD90B09A-0000000000000003
[P][DL] advertising srp service: 22C285C2FD90B09A-0000000000000003._matter._tcp
[P][DIS] Advertise operational node 33DE8CA151B4F25D-0000000000000004
[P][DL] advertising srp service: 33DE8CA151B4F25D-0000000000000004._matter._tcp
[P][DIS] Advertise commission parameter vendorID=65521 productID=32773 discriminator=3840/15
[E][DIS] Failed to advertise commissionable node: Error CHIP:0x00000019
[P][DL] removing srp service: 35510C7DED531836._matterc._udp
[P][ZCL] Commissioning window is now open

Here's what a previous (successful) opening of a commissioning window looked like:

[P][DIS] Updating services using commissioning mode 2
[P][DIS] Advertise operational node 56DB792B0D0A224F-0000000000000001
[P][DL] advertising srp service: 56DB792B0D0A224F-0000000000000001._matter._tcp
[P][DIS] Advertise operational node 3BD7C62BDBE8F26C-0000000000000002
[P][DL] advertising srp service: 3BD7C62BDBE8F26C-0000000000000002._matter._tcp
[P][DIS] Advertise operational node 22C285C2FD90B09A-0000000000000003
[P][DL] advertising srp service: 22C285C2FD90B09A-0000000000000003._matter._tcp
[P][DIS] Advertise commission parameter vendorID=65521 productID=32773 discriminator=3840/15
[P][DL] advertising srp service: 35510C7DED531836._matterc._udp
[P][DL] removing srp service: B7FDCAD3FB5FEF0E._matterc._udp

Note the order of operations: we advertise all the operational services, then the new commissionable service (because opening a commissioning window creates a new instance name), then we remove the old commissionable service.

In particular, GenericThreadStackManagerImpl_OpenThread<ImplClass>::_AddSrpService will reuse an existing service slot if the instance name matches, but otherwise will look for a new unused slot, even if there are used-but-marked-invalid slots (because maybe those will match a later valid service). So in this case we run out of service slots: we have 5 slots, we have 4 fabrics, we have our old commissionable instance name and our new commissionable instance name.

We could work around that by reusing the _matterc._udp slot for another _matterc._udp instance name, since we only have one of those at a time, but that would not allow us to do the SRP service removal properly for the old instance name. And in any case I expect that would leave us with a failure for the case of UpdateNOC, where one of the operational services would get a new instance name, if it's done when all fabrics are commissioned and extended discovery is being advertised.

@Damian-Nordic @kkasperczyk-no it looks to me like we should have one extra slot in the SRP services array than the max number we plan to advertise at a time, so we allow for the instance name of one of them at a time to change....

@Damian-Nordic
Copy link
Contributor

@adamb-q Posted a PR that changes the default number of service slots: #19900. Can you verify that increasing it helps in your case?

@adamb-q
Copy link
Contributor Author

adamb-q commented Jun 24, 2022

Hi @Damian-Nordic I tested your patch and there is no more error 0x19, but still mDNS timeout when trying to commission N+1-th fabric. Attaching logs:

TC_MF_1_21_patched.log
TC_MF_1_22_patched.log
MF-1.22_patched_DUT.txt
MF-1.21_patched_DUT.txt

@Damian-Nordic
Copy link
Contributor

Ok, so now it looks like: #19642. As I suggested there, it's worth collecting logs from the OTBR as it seems the OTBR cannot handle the SRP update. You may try using a newer OTBR & RCP versions, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants