-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
packing selftests/fix kbuild output #2
packing selftests/fix kbuild output #2
Conversation
Here's a summary of the changes in size for the sja1105 module. From base up to the fe932ef ("net: dsa: sja1105: replace sja1105_xfer_buf()")
These changes seem to add a significant bloat in the function sizes, for an extremely minor savings in the RO Data and Data sections. From fe932ef ("net: dsa: sja1105: replace sja1105_xfer_buf()") to 25e2d5f ("net: dsa: sja1105: convert sja1105_spi_message_pack() to packed_fields array")
Here, we lose 2600 bytes in code at a cost of 2100 bytes in RO data. This is nice, but we're still up about 9500 bytes just from switching to pack/unpack. From 25e2d5f ("net: dsa: sja1105: convert sja1105_spi_message_pack() to packed_fields array") to dcab922 ("net: dsa: sja1105: adapt to struct packed_field_8")
Here, we see that we gain 3k in code for a savings of 1876 bytes in RO data. You're correct that the macro is significantly worse for multiple uses like in sja1105. But the reality is this was a net cost of 1KB, while the other changes just to switch to pack/unpack appears to have cost ~9k bytes. summary of the entire changes since the base to the tip
We gain ~9500 bytes of code, and 76 bytes of RO data, and we save 32 bytes of regular data. The majority of these bytes seem to be from refactoring to pack/unpack, if I am reading this correctly. |
For ice, changing from packed_fields to packed_fields_m with packed_field_8
I suspect the cost of having pack_fields_8 and pack_fields_16 vs having pack_fields_m is worthwhile? I guess it depends on how many call sites actually use the packing library.. It costs ~261 bytes of code but saves ~1300 bytes of RO data to reduce the size down to 4 bytes per field description. For ice that is a definite win. |
…s_m() Jacob Keller came with data in #2 that proves that defining pack_fields_m() and unpack_fields_m() as macros directly callable by consumer drivers is not a great idea. We can hide those macros inside the lib/packing.c translation module, and just provide pack_fields_8(), pack_fields_16(), unpack_fields_8() and unpack_fields_16() as entry points into the library. We can even go one step further and expose just pack_fields() and unpack_fields(), and use the new C11 _Generic() selection feature, which can call one function or the other, depending on the type of the "fields" array - a caveman form of polymorphism. It is evaluated at compile time which function will actually be called. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
3b8582e
to
c77f375
Compare
I did some more work myself on top of this:
|
…rnel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: Patch vladimiroltean#1 adds ctnetlink support for kernel side filtering for deletions, from Changliang Wu. Patch vladimiroltean#2 updates nft_counter support to Use u64_stats_t, from Sebastian Andrzej Siewior. Patch #3 uses kmemdup_array() in all xtables frontends, from Yan Zhen. Patch #4 is a oneliner to use ERR_CAST() in nf_conntrack instead opencoded casting, from Shen Lichuan. Patch #5 removes unused argument in nftables .validate interface, from Florian Westphal. Patch torvalds#6 is a oneliner to correct a typo in nftables kdoc, from Simon Horman. Patch torvalds#7 fixes missing kdoc in nftables, also from Simon. Patch torvalds#8 updates nftables to handle timeout less than CONFIG_HZ. Patch torvalds#9 rejects element expiration if timeout is zero, otherwise it is silently ignored. Patch torvalds#10 disallows element expiration larger than timeout. Patch torvalds#11 removes unnecessary READ_ONCE annotation while mutex is held. Patch torvalds#12 adds missing READ_ONCE/WRITE_ONCE annotation in dynset. Patch torvalds#13 annotates data-races around element expiration. Patch torvalds#14 allocates timeout and expiration in one single set element extension, they are tighly couple, no reason to keep them separated anymore. Patch torvalds#15 updates nftables to interpret zero timeout element as never times out. Note that it is already possible to declare sets with elements that never time out but this generalizes to all kind of set with timeouts. Patch torvalds#16 supports for element timeout and expiration updates. * tag 'nf-next-24-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: set element timeout update support netfilter: nf_tables: zero timeout means element never times out netfilter: nf_tables: consolidate timeout extension for elements netfilter: nf_tables: annotate data-races around element expiration netfilter: nft_dynset: annotate data-races around set timeout netfilter: nf_tables: remove annotation to access set timeout while holding lock netfilter: nf_tables: reject expiration higher than timeout netfilter: nf_tables: reject element expiration with no timeout netfilter: nf_tables: elements with timeout below CONFIG_HZ never expire netfilter: nf_tables: Add missing Kernel doc netfilter: nf_tables: Correct spelling in nf_tables.h netfilter: nf_tables: drop unused 3rd argument from validate callback ops netfilter: conntrack: Convert to use ERR_CAST() netfilter: Use kmemdup_array instead of kmemdup for multiple allocation netfilter: nft_counter: Use u64_stats_t for statistic. netfilter: ctnetlink: support CTA_FILTER for flush ==================== Link: https://patch.msgid.link/20240905232920.5481-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Daniel Machon says: ==================== net: lan966x: use the newly introduced FDMA library This patch series is the second of a 2-part series [1], that adds a new common FDMA library for Microchip switch chips Sparx5 and lan966x. These chips share the same FDMA engine, and as such will benefit from a common library with a common implementation. This also has the benefit of removing a lot of open-coded bookkeeping and duplicate code for the two drivers. In this second series, the FDMA library will be taken into use by the lan966x switch driver. ################### # Example of use: # ################### - Initialize the rx and tx fdma structs with values for: number of DCB's, number of DB's, channel ID, DB size (data buffer size), and total size of the requested memory. Also provide two callbacks: nextptr_cb() and dataptr_cb() for getting the nextptr and dataptr. - Allocate memory using fdma_alloc_phys() or fdma_alloc_coherent(). - Initialize the DCB's with fdma_dcb_init(). - Add new DCB's with fdma_dcb_add(). - Free memory with fdma_free_phys() or fdma_free_coherent(). ##################### # Patch breakdown: # ##################### Patch vladimiroltean#1: select FDMA library for lan966x. Patch vladimiroltean#2: includes the fdma_api.h header and removes old symbols. Patch #3: replaces old rx and tx variables with equivalent ones from the fdma struct. Only the variables that can be changed without breaking traffic is changed in this patch. Patch #4: uses the library for allocation of rx buffers. This requires quite a bit of refactoring in this single patch. Patch #5: uses the library for adding DCB's in the rx path. Patch torvalds#6: uses the library for freeing rx buffers. Patch torvalds#7: uses the library for allocation of tx buffers. This requires quite a bit of refactoring in this single patch. Patch torvalds#8: uses the library for adding DCB's in the tx path. Patch torvalds#9: uses the library helpers in the tx path. Patch torvalds#10: ditch last_in_use variable and use library instead. Patch torvalds#11: uses library helpers throughout. Patch torvalds#12: refactor lan966x_fdma_reload() function. [1] https://lore.kernel.org/netdev/20240902-fdma-sparx5-v1-0-1e7d5e5a9f34@microchip.com/ Signed-off-by: Daniel Machon <daniel.machon@microchip.com> ==================== Link: https://patch.msgid.link/20240905-fdma-lan966x-v1-0-e083f8620165@microchip.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
I've done even more cleanup on the patch set and grouped things a little bit (also rebased onto latest net-next).
Note that the ice patch for pack_fields() should be squashed, I didn't do that. Preferably grouped in 2 sets, as mentioned. |
Sounds good. I'll see about that today hopefully, after I finish double checking ice functionality. |
Current plan:
I can't find a reasonable split to get below 15 patches otherwise, and I think getting the kunit tests and bug fixes in is a good idea. I don't really want to refactor the kunit tests back to using packing, so I think that is a reasonable approach. |
It looks like netdev has an overload of patches from this week, and the merge window may close this weekend. I'm going to wait to send anything for now. I may send next week if the window doesn't close this weekend. |
Yes please. Sounds good for the KUnit tests to cover pack() and unpack() rather than packing(). |
I don't know, it's your choice. The overload of patches is not going to stop me from sending some of my own :) |
This series squashes in a fix for the packing-checks.h generation, moving it
from lib/Makefile to Kbuild. I am not 100% sure why this fixes things, but I
can now build with both KBUILD_OUTPUT and O=..
I also fixed ice to use packed_field_8, saving a few more bytes, and i'm
working on testing the module sizes for sja1105.