[URGENT] Reducing our usage of GitHub Runners #14376

Closed
lupyuen opened this issue Oct 17, 2024 · 70 comments · Fixed by #14377, apache/nuttx-apps#2750, #14386, apache/nuttx-apps#2753 or #14400

@lupyuen
Member

lupyuen commented Oct 17, 2024

Hi All: We have an ultimatum to drastically reduce our usage of GitHub Actions, or our Continuous Integration will halt totally in Two Weeks. Here's what I'll implement within 24 hours for the nuttx and nuttx-apps repos:

  1. When we submit or update a Complex PR that affects All Architectures (Arm, RISC-V, Xtensa, etc): CI Workflow shall run only half the jobs. Previously the CI Workflow ran arm-01 to arm-14; now it will run only arm-01 to arm-07. (This will reduce GitHub Cost by 32%)

  2. When the Complex PR is Merged: CI Workflow will still run all jobs arm-01 to arm-14

    (Simple PRs with One Single Arch / Board will build the same way as before: arm-01 to arm-14)

  3. For NuttX Admins: Our Merge Jobs are now at github.com/NuttX/nuttx. We shall have only Two Scheduled Merge Jobs per day

    I shall quickly Cancel any Merge Jobs that appear in nuttx and nuttx-apps repos. Then at 00:00 UTC and 12:00 UTC: I shall start the Latest Merge Job at nuttxpr. (This will reduce GitHub Cost by 17%)

  4. macOS and Windows Jobs (msys2 / msvc): They shall be totally disabled until we find a way to manage their costs. (GitHub charges 10x premium for macOS runners, 2x premium for Windows runners!)

    Let's monitor the GitHub Cost after disabling macOS and Windows Jobs. It's possible that macOS and Windows Jobs are contributing a huge part of the cost. We could re-enable and simplify them after monitoring.

    (This must be done for BOTH nuttx and nuttx-apps repos. Sadly the ASF Report for GitHub Runners doesn't break down the usage by repo, so we'll never know how much macOS and Windows Jobs are contributing to the cost. That's why we need CI: Disable all jobs for macOS and Windows #14377)

    (Wish I could run NuttX CI Jobs on my M2 Mac Mini. But the CI Script only supports Intel Macs sigh. Buy a Refurbished Intel Mac Mini?)
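
The job-halving rule in items 1 and 2 can be pictured with a small Python sketch. This is only an illustration of the rule as stated above, not the actual logic in the CI workflow files; the function name `select_arm_jobs` and its parameters are hypothetical.

```python
def select_arm_jobs(is_complex_pr, is_merge, total=14):
    """Return the arm-NN build jobs to run under the rule described above."""
    jobs = [f"arm-{n:02d}" for n in range(1, total + 1)]
    # Only a Complex PR that has not been merged yet runs half the jobs.
    if is_complex_pr and not is_merge:
        return jobs[: total // 2]
    # Merged Complex PRs and Simple PRs still run every job.
    return jobs


if __name__ == "__main__":
    print(select_arm_jobs(is_complex_pr=True, is_merge=False))
```

The same pattern would apply to the other architectures, assuming their jobs are numbered the same way.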

We have done an Analysis of CI Jobs over the past 24 hours:

https://docs.google.com/spreadsheets/d/1ujGKmUyy-cGY-l1pDBfle_Y6LKMsNp7o3rbfT1UkiZE/edit?gid=0#gid=0

Many CI Jobs are Incomplete: We waste GitHub Runners on jobs that eventually get superseded and cancelled

Screenshot 2024-10-17 at 1 18 14 PM

When we Halve the CI Jobs: We reduce the wastage of GitHub Runners

Screenshot 2024-10-17 at 1 15 30 PM

Scheduled Merge Jobs will also reduce wastage of GitHub Runners, since most Merge Jobs don't complete (only 1 completed yesterday)

Screenshot 2024-10-17 at 1 16 16 PM
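
For the twice-daily schedule in item 3 (00:00 UTC and 12:00 UTC), here is a rough Python sketch of how the next Merge Job slot could be computed. The helper name `next_merge_slot` is made up for illustration; the Merge Jobs themselves are started by hand as described above.

```python
from datetime import datetime, timedelta, timezone


def next_merge_slot(now):
    """Return the next 00:00 or 12:00 UTC slot strictly after `now`."""
    now = now.astimezone(timezone.utc)
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Candidate slots: today 00:00, today 12:00, tomorrow 00:00.
    for hours in (0, 12, 24):
        slot = midnight + timedelta(hours=hours)
        if slot > now:
            return slot
    raise AssertionError("unreachable: tomorrow 00:00 is always later than now")


if __name__ == "__main__":
    print(next_merge_slot(datetime.now(timezone.utc)))
```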

See the ASF Policy for GitHub Actions

lupyuen added a commit to lupyuen2/wip-nuttx that referenced this issue Oct 17, 2024
This PR disables all CI Jobs for macOS and Windows, to reduce GitHub Cost. Details here: apache#14376
lupyuen added a commit to lupyuen2/wip-nuttx-apps that referenced this issue Oct 17, 2024
This PR disables all CI Jobs for macOS and Windows, to reduce GitHub Cost. Details here: apache/nuttx#14376
@lupyuen
Member Author

lupyuen commented Oct 17, 2024

As commented by @xiaoxiang781216:

> Can we reduce the boards on the Linux host to keep macOS/Windows? It's very easy to break these hosts without this basic coverage.

I suggest that we monitor the GitHub Cost after disabling macOS and Windows Jobs. It's possible that macOS and Windows Jobs are contributing a huge part of the cost. We could re-enable and simplify them after monitoring.
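
To see why macOS and Windows might dominate the bill, here is a small Python sketch that weights runner minutes by the premiums quoted above (10x for macOS, 2x for Windows, with Linux assumed to be the 1x baseline). The minute counts in the example are invented purely for illustration, since the ASF Report doesn't give us a real breakdown.

```python
# Billing multipliers quoted in this issue; Linux as the 1x baseline is an assumption.
MULTIPLIER = {"linux": 1, "windows": 2, "macos": 10}


def billed_minutes(minutes_by_os):
    """Weight raw runner minutes by the per-OS premium."""
    return {name: minutes * MULTIPLIER[name] for name, minutes in minutes_by_os.items()}


def cost_share(minutes_by_os):
    """Fraction of the total billed minutes attributed to each OS."""
    billed = billed_minutes(minutes_by_os)
    total = sum(billed.values())
    return {name: value / total for name, value in billed.items()}


if __name__ == "__main__":
    # Hypothetical usage: macOS is under 10% of the raw minutes here.
    usage = {"linux": 1000, "windows": 100, "macos": 100}
    print(billed_minutes(usage))
    print(cost_share(usage))
```

With these made-up numbers, macOS takes a small slice of the raw minutes but a large slice of the billed minutes, which is the effect we want to measure by disabling those jobs.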

@raiden00pl
Contributor

One of the methods proposed by, if I remember correctly @btashton, is to replace many simple configurations for some boards (mostly for peripherals testing) with one large jumbo config activating everything possible.
This won't work for chips with low memory, but it will save some CI resources anyway.

@lupyuen
Member Author

lupyuen commented Oct 17, 2024

@raiden00pl Yep I agree. Or we could test a complex target like board:lvgl?

@lupyuen
Member Author

lupyuen commented Oct 17, 2024

Here's another comment about macOS and Windows by @yamt: #14377 (comment)

@yamt
Contributor

yamt commented Oct 17, 2024

sorry, let me ask a dumb question.
what plan are we using? https://github.com/pricing
is apache paying for it?

@lupyuen
Member Author

lupyuen commented Oct 17, 2024

> what plan are we using? https://github.com/pricing

@yamt It's probably a special plan negotiated by ASF and GitHub? It's not mentioned in the ASF Policy for GitHub Actions: https://infra.apache.org/github-actions-policy.html

I find this "contract" a little strange. Why are all ASF Projects subjected to the same quotas? And why can't we increase the quota if we happen to have additional funding?

Update: More info here: https://cwiki.apache.org/confluence/display/INFRA/GitHub+self-hosted+runners

> If your project uses GitHub Actions, you share a queue with all other Apache projects using Github Actions, which can quickly lead to frustration for everyone involved. Builds can be stuck in "queued" for 6+ hours.
>
> One option (if you want to stick with GitHub and don't want to use the Infra-managed Jenkins) is for your project to create its own self-hosted runners, which means your jobs will run on a virtual machine (VM) under your project's control. However this is not something to tackle lightly, as Infra will not manage or secure your VM - that is up to you.

Update 2: This sounds really complicated. I'd rather use my own Mac Mini to execute the NuttX CI Tests, once a day?

@yamt
Contributor

yamt commented Oct 17, 2024

> what plan are we using? https://github.com/pricing
>
> @yamt It's probably a special plan negotiated by ASF and GitHub? It's not mentioned in the ASF Policy for GitHub Actions: https://infra.apache.org/github-actions-policy.html

do you know if the macos/windows premium applies as usual?
the policy page seems to have no mention about it.

> I find this "contract" a little strange. Why are all ASF Projects subjected to the same quotas? And why can't we increase the quota if we happen to have additional funding?

yea, i guess projects have very different sizes/demands.
(i feel nuttx is using too much anyway though :-)

@TimJTi
Contributor

TimJTi commented Oct 17, 2024

> ...I'd rather use my own Mac Mini to execute the NuttX CI Tests, once a day?

Is there any merit in "farming out" CI tests to those with boards? I think there was a discussion about NuttX owning a suite of boards but not sure where that got to - and would depend on just 1 or 2 people managing it.

As an aside, is there a guide to self-running CI? As I work on a custom board it would be good for me to do this occasionally but I have no idea where to start!

@lupyuen
Member Author

lupyuen commented Oct 17, 2024

@TimJTi Here's how I do daily testing on Milk-V Duo S SBC: https://lupyuen.github.io/articles/sg2000a

@TimJTi
Contributor

TimJTi commented Oct 17, 2024

> @TimJTi Here's how I do daily testing on Milk-V Duo S SBC: https://lupyuen.github.io/articles/sg2000a

And I just RTFM...the "official" guide is here so I'll review both and hopefully get it working - and submit any tweaks/corrections/enhancements I find are needed to the NuttX "How To" documentation

@jerpelea
Contributor

jerpelea commented Oct 17, 2024 via email

@michallenc
Contributor

michallenc commented Oct 17, 2024

> @TimJTi Here's how I do daily testing on Milk-V Duo S SBC: https://lupyuen.github.io/articles/sg2000a
>
> And I just RTFM...the "official" guide is here so I'll review both and hopefully get it working - and submit any tweaks/corrections/enhancements I find are needed to the NuttX "How To" documentation

These work, but they do not describe the entire CI, just how to run pytest checks for the sim:citest configuration.

@cederom
Contributor

cederom commented Oct 17, 2024

Yes, let's cut what we can (but keep at least minimal functional configure, build, and syntax testing) and see what the cost reduction is. We need to show Apache we are working on the problem. So far optimizations did not cut the usage and we are in danger of losing all CI :-(

On the other hand that seems not fair to share the same CI quota as small projects. NuttX is a fully featured RTOS working on ~1000 different devices. In order to keep project code quality we need the CI.

Maybe it's time to rethink / redesign from scratch the CI test architecture and implementation?

@cederom
Contributor

cederom commented Oct 17, 2024

Another problem is that people very often send unfinished undescribed PRs that are updated without a comment or request that triggers whole big CI process several times :-(

Some changes are required and we cannot avoid that; this is part of the process. But maybe we can make something more "adaptive" so that only minimal CI is launched by default, preferably only in the area that was changed. Then, with all approvals, we can manually trigger one final big check before merge?

Long story short: We can switch CI test runs to manual trigger for now to see how it reduces costs. I would see two buttons to start Basic and Advanced (maybe also Full = current setup) CI.
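
To make the "only in the area that was changed" idea a bit more concrete, here is a minimal Python sketch that guesses the affected architectures from the changed file paths. It assumes paths shaped like arch/arm/src/... as seen in the NuttX tree; the function name `affected_arches` and the fallback rule (anything outside arch/<name>/ means "cannot narrow the build") are assumptions, not an existing CI feature.

```python
def affected_arches(changed_files):
    """Return the set of architectures touched by a change.

    Returns None when any file lies outside arch/<name>/, meaning the
    change touches shared code and the build cannot be narrowed.
    """
    arches = set()
    for path in changed_files:
        parts = path.split("/")
        if len(parts) >= 3 and parts[0] == "arch":
            arches.add(parts[1])
        else:
            return None
    return arches


if __name__ == "__main__":
    print(affected_arches(["arch/arm/src/stm32h5/stm32h5_rcc.c"]))
    print(affected_arches(["sched/task/task_exit.c"]))
```

A Basic run could then build only the returned architectures, with the Full run kept for the final manual trigger.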

@lupyuen
Member Author

lupyuen commented Oct 17, 2024

@cederom Maybe our PRs should have a Mandatory Field: Which NuttX Config to build, e.g. rv-virt:nsh. Then the CI Workflow should do tools/configure.sh rv-virt:nsh && make. Before starting the whole CI Build?
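
Roughly what I have in mind, as a Python sketch: read a config field from the PR description and turn it into the smoke-build command. The field label "Config:" and the helper name `smoke_build_command` are made up for this example; only the command `tools/configure.sh rv-virt:nsh && make` comes from the suggestion above.

```python
import re

# Hypothetical PR field, e.g. a line "Config: rv-virt:nsh" in the PR description.
CONFIG_FIELD = re.compile(r"^Config:\s*([\w-]+:[\w-]+)\s*$", re.MULTILINE)


def smoke_build_command(pr_body):
    """Return the quick build command for the PR, or None if the field is missing."""
    match = CONFIG_FIELD.search(pr_body)
    if match is None:
        return None
    return f"tools/configure.sh {match.group(1)} && make"


if __name__ == "__main__":
    body = "Summary: Fix UART driver\nConfig: rv-virt:nsh\nTesting: QEMU\n"
    print(smoke_build_command(body))
```

If the function returns None, the workflow could stop early and ask the author to fill in the field.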

@cederom
Contributor

cederom commented Oct 17, 2024

> @cederom Maybe our PRs should have a Mandatory Field: Which NuttX Config to build, e.g. rv-virt:nsh. Then the CI Workflow should do tools/configure.sh rv-virt:nsh && make. Before starting the whole CI Build?

People often can't fill in even one single sentence to describe Summary, Impact, Testing :D This may be detected automatically.. or we can just see what architecture is the cheapest one and use it for all basic tests..?

@raiden00pl
Contributor

> Another problem is that people very often send unfinished undescribed PRs that are updated without a comment or request that triggers whole big CI process several times :-(

Often contributors use CI to test all configurations instead of testing changes locally. On one hand I understand this because compiling all configurations on a local machine takes a lot of time, on the other hand I'm not sure if CI is for this purpose (especially when we have limits on its use).

> @cederom Maybe our PRs should have a Mandatory Field: Which NuttX Config to build, e.g. rv-virt:nsh. Then the CI Workflow should do tools/configure.sh rv-virt:nsh && make. Before starting the whole CI Build?

It won't work. Users are lazy, and in order to choose what needs to be compiled correctly, you need a comprehensive knowledge of the entire NuttX, which is not that easy.
The only reasonable option is to automate this process.

@cederom
Contributor

cederom commented Oct 17, 2024

So it looks like for now, where dramatic steps need to be taken, we need to mark all PR as drafts and start CI by hand when we are sure all is ready for merge? o_O

@jerpelea
Contributor

jerpelea commented Oct 17, 2024 via email

xiaoxiang781216 pushed a commit to apache/nuttx-apps that referenced this issue Oct 17, 2024
This PR disables all CI Jobs for macOS and Windows, to reduce GitHub Cost. Details here: apache/nuttx#14376
lupyuen added a commit to lupyuen2/wip-nuttx that referenced this issue Oct 17, 2024
When we submit or update a Complex PR that affects All Architectures (Arm, RISC-V, Xtensa, etc): CI Workflow shall run only half the jobs. Previously CI Workflow will run `arm-01` to `arm-14`, now we will run only `arm-01` to `arm-07`.

When the Complex PR is Merged: CI Workflow will still run all jobs `arm-01` to `arm-14`

Simple PRs with One Single Arch / Board will build the same way as before: `arm-01` to `arm-14`

This is explained here: apache#14376

Note that this version of `arch.yml` has diverged from `nuttx-apps`, since we are unable to merge apache#14377
stbenn pushed a commit to stbenn/nuttx that referenced this issue Oct 25, 2024
Initial STM32H5 Commit

Initial commit of what I deemed essential files for bringing up the STM32H5. src/stm32h5/hardware files were edited by me, but need review. files in src/stm32h5 all need review and edits. include/stm32h5 files need review, some were edited by me.

Add Nucleo-H563ZI Folder

Add the board folder for the nucleo-h563zi. Right now this is largely a copy of the stm32l562e-dk configuration. Some files may be deleted in the future. Also made minor modifications to arch/arm/src/stm32h5/Kconfig file.

hardware/stm32h562xx_rcc.h update

Finished register and bit mapping for STM32H5 RCC

Rename hardware/stm32h5_rcc.h

Renamed stm32h562xx_rcc.h to stm32h5_rcc.h. The RCC register is the same for all versions of the STM32H5.

Defined rcc_enableperipherals functions

Defined all the functions within rcc_enableperipherals. Getting started on stm32h5_stdclockconfig.

Incremental STM32H5 RCC Updates

Incremental Updates apache#2

Added stm32h5_lse.c and stm32h5_lsi.c files. Incremental updates to board.h, stm32h5xx_rcc.c, and hardware/stm32h5_rcc.h

Incremental Updates apache#3

Added stm32h5_hsi48.c and stm32h5_hsi48.h files. Incremental updates to board.h, stm32h5xx_rcc.c, and hardware/stm32h5_rcc.h. Renamed hardware crs file. Fixed lse.c and lsi.c for STM32H5.

Incremental Updates apache#4

Updated setting of VOS for STM32H5. Added HSIDIV definition to hardware/stm32h5_rcc.h for potential of changing HSIDIV from default. Changed board.h to use HSI of 32 MHz, which is the default. We still set SYSCLK to the max of 250MHz.

First STM32H5 PWR Commit

Rewrote hardware/stm32h5_pwr.h. Added stm32h5_pwr.c and stm32h5_pwr.h. Made minor changes to RCC files based on PWR peripheral.

PWR Peripheral Changes

Removed enablesmps function. LDO or SMPS is decided by hardware. Removed enablepwrclk. There is no PWREN for the STM32H5. Rewrote adjustvcore. vcore must be adjusted incrementally.

Incremental Updates apache#5

Changed stm32 to stm32h5 in pwr.c. Added additional logic for selecting PLL sources. Added additional logic for enabling LSE or LSI. Set VCORE properly with stm32h5_pwr.c function. Fixes to adjustvcore function.

STM32H5 Power and RCC cleanup

Fixed some errors with private functions and incorrect preprocessor variables. Changed adjustvcore to not select intermediate VOS levels. Figure 49 in RM shows changing directly from VOS3 to VOS1. Added function adjustvos_ext for externally supplied VCORE. However I'm not sure if VOS should be incremented, then voltage incremented, then frequency incremented, or if VOS should be incremented one by one to final setting, then adjust voltage, then frequency. adjustvos does the former. Won't be used in stdclockconfig.

STM32H5 serial update

This commit primarily adds functionality taken from the stm32g4 lpuart implementation. The template I used, from the stm32l5, already had the LPUART in there but did not calculate the baud correctly. Added more USARTS and UARTS supported by STM32H5. Minor changes to chip.h, stm32h5_start.c, and Kconfig.

STM32H5 Serial Update apache#2

Added support for additional USARTS and UARTS on STM32H5. Other minor serial updates.

Build Fixes

Various fixes to get the stm32h5 arch to build. Many changes to follow. But for now, Nuttx builds.

Remove unnecessary hardware files from STM32H5 directory

More build changes

Even more build fixes

Minor fixes in stm32h5_rcc.c and stm32h5_pwr.c. Changed nucleo-h563zi defconfig to use std clock config. This resulted in errors that were fixed here. Also added stm32h5_lse.c and stm32h5_lsi.c to Make.defs.

Removed legacy pinmap. It is deprecated and should not be used on new designs.

Confirmed hardware crs and i2c files are correct. Will keep them for now.

IRQ info for STM32H52, STM32H53, STM32H56, STM32H57

libcxx: fix compile error

                 from ServiceManager.cpp:17:
/home/ligd/platform/dev/apps/external/android/frameworks/native/libs/binder/ndk/include_cpp/android/binder_to_string.h:71:24: error: expected nested-name-specifier before numeric constant
   71 |     template <typename _U>
      |                        ^~
/home/ligd/platform/dev/apps/external/android/frameworks/native/libs/binder/ndk/include_cpp/android/binder_to_string.h:71:24: error: expected ‘>’ before numeric constant
In file included from /home/ligd/platform/dev/apps/external/android/frameworks/native/libs/binder/aidl/android/os/ConnectionInfo.h:3,
                 from /home/ligd/platform/dev/apps/external/android/frameworks/native/libs/binder/aidl/android/os/IServiceManager.h:3,
                 from /home/ligd/platform/dev/apps/external/android/frameworks/native/libs/binder/aidl/android/os/BnServiceManager.h:4:
/home/ligd/platform/dev/apps/external/android/frameworks/native/libs/binder/ndk/include_cpp/android/binder_to_string.h:72:56: error: no matching function for call to ‘declval<1>()’
   72 |     static auto _test(int) -> decltype(std::declval<_U>().toString(), std::true_type());
      |                                        ~~~~~~~~~~~~~~~~^~
In file included from /home/ligd/platform/dev/nuttx/include/libcxx/__type_traits/is_convertible.h:18,

Signed-off-by: ligd <liguiding1@xiaomi.com>

libc string:Separate code.

Separate the code that follows the BSD license into independent files.

Signed-off-by: yangguangcai <yangguangcai@xiaomi.com>

arch/sim/cmake: remove the host specific -U when HOSTSRCS

fix macos compile hostfs.c compile issue.
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.0.sdk/usr/include/_string.h:131:62: error: expected function body after function declarator
  131 | char    *stpncpy(char *__dst, const char *__src, size_t __n) __OSX_AVAILABLE_STARTING(__MAC_10_7, __IPHONE_4_3);
      |                                                              ^

Signed-off-by: buxiasen <buxiasen@gmail.com>

Revert "libc/lib_bzero:Add bzero prototype."

This reverts commit 908814a.

On macOS, memset is automatically optimized to bzero, which caused a dead loop. As we are not using bzero, the macro re-define should be able to cover the requirements.

Signed-off-by: buxiasen <buxiasen@gmail.com>

arhc/arm64: vector table may be far away form arm64_fatal_handle

use 33-bit (+/-4GB) pc-relative addressing to load
the address of arm64_fatal_handle

Signed-off-by: lipengfei28 <lipengfei28@xiaomi.com>

sim: fix asan address space conflict

Modify the starting position of the elf segment to 0x5000000

==2561587==Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
==2561587==ASan shadow was supposed to be located in the [0x1ffff000-0x3fffffff] range.
==2561587==Process memory map follows:

Signed-off-by: yinshengkai <yinshengkai@xiaomi.com>

arm64/toolchains:Add the following kasan compilation options

Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>

remove unused variable 'cpu_freq'

Signed-off-by: lipengfei28 <lipengfei28@xiaomi.com>

drivers/timers/arch_alarm.c: Remove ndelay_accurate

Using ONESHOT_CURRENT retrieves the tick number multiplied by tick time; thus
it doesn't give the accurate monotonic time - it is quantized by
the tick time. This cannot be used as a ndelay timer, it would always loop
at least to the end of the ongoing tick.

Revert the up_udelay to use the original "coarse" looping. The "accurate" udelay,
if such is needed, should either be done under arch specific code, or there should be
a function for getting the accurate time that is available for all the platforms.

Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>

boards/imx93-evk: Define CONFIG_BOARD_LOOPSPERMSEC

Use value measured with 1.8GHz CPU speed

Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>

arch/x86_64:Fix variable used before assignment

Signed-off-by: liwenxiang1 <liwenxiang1@xiaomi.com>

arch/arm64: vector table 2K align

Signed-off-by: lipengfei28 <lipengfei28@xiaomi.com>

arm/build: suppress LOAD RWX linker warning

Add --no-warn-rwx-segments in case of RAM boot mode to linker to
suppress the below warning:
"nuttx has a LOAD segment with RWX permissions"

Signed-off-by: Jinliang Li <lijinliang1@lixiang.com>

arch/arm64/src/imx9/imx9_lpspi.c: Fix 9-16 bit transfers

Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>

arch/arm64/src/imx9/imx9_lpspi.c: Small cache operation optimization

There is no need to invalidate the RX buffer before every transfer.
It never gets dirty, so it is good to invalidate it initially after allocation,
and after each transfer.

Signed-off-by: Jukka Laitinen <jukkax@ssrc.tii.ae>

libxx: C++ low level library select LIBSUPCXX by default.

Signed-off-by: cuiziwei <cuiziwei@xiaomi.com>

nuttx/sim: Fix m64 build error.

LD:  nuttx
 nuttx.rel: in function `ff_dct32_float_sse2':
 (.text+0x66f9e): relocation truncated to fit: R_X86_64_32S against symbol `ff_cos_32' defined in .bss.ff_cos_32 section in nuttx.rel
 (.text+0x66fa7): relocation truncated to fit: R_X86_64_32S against symbol `ff_cos_32' defined in .bss.ff_cos_32 section in nuttx.rel
 (.text+0x672a6): relocation truncated to fit: R_X86_64_32S against symbol `ff_cos_16' defined in .bss.ff_cos_16 section in nuttx.rel
 (.text+0x672ae): relocation truncated to fit: R_X86_64_32S against symbol `ff_cos_16' defined in .bss.ff_cos_16 section in nuttx.rel
 nuttx.rel: in function `ff_imdct_calc_sse':
 (.text+0x67905): relocation truncated to fit: R_X86_64_32S against symbol `ff_cos_64' defined in .bss.ff_cos_64 section in nuttx.rel
 (.text+0x67948): relocation truncated to fit: R_X86_64_32S against symbol `ff_cos_128' defined in .bss.ff_cos_128 section in nuttx.rel
 (.text+0x67988): relocation truncated to fit: R_X86_64_32S against symbol `ff_cos_256' defined in .bss.ff_cos_256 section in nuttx.rel
 (.text+0x679c8): relocation truncated to fit: R_X86_64_32S against symbol `ff_cos_512' defined in .bss.ff_cos_512 section in nuttx.rel
 (.text+0x67a08): relocation truncated to fit: R_X86_64_32S against symbol `ff_cos_1024' defined in .bss.ff_cos_1024 section in nuttx.rel
 (.text+0x67a48): relocation truncated to fit: R_X86_64_32S against symbol `ff_cos_2048' defined in .bss.ff_cos_2048 section in nuttx.rel
 (.text+0x67a88): additional relocation overflows omitted from the output

Signed-off-by: cuiziwei <cuiziwei@xiaomi.com>

tls.h: list.h should depends on CONFIG_PTHREAD_ATFORK

Signed-off-by: ligd <liguiding1@xiaomi.com>

bluetooth: fix bt missing header files nuttx/wqueue.h

Signed-off-by: ligd <liguiding1@xiaomi.com>

lib_gdbstub: fix container of

Signed-off-by: buxiasen <buxiasen@xiaomi.com>
Signed-off-by: ligd <liguiding1@xiaomi.com>

container_of: fix compile failed cause of list.h not support container_of

Signed-off-by: ligd <liguiding1@xiaomi.com>

nuttx/arch:Enabling ARCH_MATH_H is required when compiling sim with the 13.2 version of the toolchain.

Signed-off-by: cuiziwei <cuiziwei@xiaomi.com>
Signed-off-by: ligd <liguiding1@xiaomi.com>

arm/stm32f401rc-rs485: Add support to WS2812 addressable LED

Signed-off-by: Rodrigo Sim <rcsim10@gmail.com>

syslog: Don't allow blocking when in signal handler

Blocking while running a signal handler is not advisable, instead write
the log string character by character.

There is also a potential for a deadlock, as discussed in apache#6618

Note: querying for rtcb->sigdeliver is not 100% ideal, as it only tells
_if_ a signal handler has been queued, not if it is running. However, it
makes syslog safe / usable which is a debug feature anyhow.

boards/risc-v: Remove ref to riscv_internal.h

`riscv_internal.h` is a private chip level header file,
and it should not be included in the board files.

Signed-off-by: Huang Qi <huangqi3@xiaomi.com>

boards/esp32s3: Merge MCUboot and "simple-boot" linker scripts

To make it easier to keep the linker scripts updated for both
MCUboot and "simple-boot", this commit merges them into a single
linker script with macros to enable/disable specific sections.

task_exit.c: Add missing sched_note_stop()

A regression from apache#13728 ; sched_note_stop() is never called for tasks
that exit normally via exit().

nuttx: Add LIBSUPCXX_TOOLCHAIN to link the prebuilt library provide by toolchain.

Signed-off-by: cuiziwei <cuiziwei@xiaomi.com>

serial/gdbstub:Adjust serial port gdbstub Kconfig dependencies

Signed-off-by: anjiahao <anjiahao@xiaomi.com>

gdbstub:fix typo

Signed-off-by: anjiahao <anjiahao@xiaomi.com>

coredump: coredump_add_memory_region need use flags

Signed-off-by: anjiahao <anjiahao@xiaomi.com>

arm64: fix fvp smp faild to boot

reason:
we should give a busy wait addr

This commit fixes the regression from apache#13640

Signed-off-by: hujun5 <hujun5@xiaomi.com>

CI: Enable sim-02 build when we create or update a Complex PR

CI Build Job sim-02 was disabled to reduce our usage of GitHub Runners, to comply with ASF Policy: apache#14376 (comment)

However this causes the Scheduled Merge Job to fail, due to reduced CI Checks: https://github.com/NuttX/nuttx/actions/runs/11490041505/job/31980056690#step:7:465

This PR re-enables sim-02 when we create or update a Complex PR.

arch/Kconfig: remove ARCH_MATH_H if LIBCXX

Because some libraries do require a full libm implementation.

Signed-off-by: zhanghongyu <zhanghongyu@xiaomi.com>

Documentation: migrate README.txt from boards and fixes for mps boards

migrate some README.txt from boards/ and fixes for mps boards rst

samv7: fix QSPI build

Commit 313d6df caused the following build error:

CC:  fixedmath/lib_b16atan2.c chip/sam_qspi.c: In function 'qspi_memory':
chip/sam_qspi.c:1552:7: warning: implicit declaration of function 'IS_ALIGNED' [-Wimplicit-function-declaration]
 1552 |       IS_ALIGNED((uintptr_t)meminfo->buffer, 4) &&
      |       ^~~~~~~~~~
In file included from chip/sam_qspi.c:41:
chip/sam_qspi.c: In function 'qspi_alloc':
chip/sam_qspi.c:1591:21: warning: implicit declaration of function 'ALIGN_UP' [-Wimplicit-function-declaration]
 1591 |   return kmm_malloc(ALIGN_UP(buflen, 4));

This was caused by missing include of nuttx.h header defining ALIGN_UP
and IS_ALIGNED.

Signed-off-by: Michal Lenc <michallenc@seznam.cz>

mmcsd: SDIO_CAPS_4BIT_ONLY set buswidth MMCSD_SCR_BUSWIDTH_4BIT

uint8_t buswidth:4;              /* Bus widths supported (SD only) */

Signed-off-by: zhangshoukui <zhangshoukui@xiaomi.com>

armv8m/clang.cmake: add armv8m clang config

Its makefile is implemented in arch/arm/src/armv8-m/Toolchain.defs as follows:
ifeq ($(CONFIG_ARM_TOOLCHAIN_CLANG),y)

  ifeq ($(CONFIG_ARCH_CORTEXM23),y)
    TOOLCHAIN_CLANG_CONFIG = armv8m.main_soft_nofp
  else ifeq ($(CONFIG_ARCH_CORTEXM33),y)
    ifeq ($(CONFIG_ARCH_FPU),y)
      TOOLCHAIN_CLANG_CONFIG = armv8m.main_hard_fp
    else
      TOOLCHAIN_CLANG_CONFIG = armv8m.main_soft_nofp
    endif
  else ifeq ($(CONFIG_ARCH_CORTEXM35P),y)
    ifeq ($(CONFIG_ARCH_FPU),y)
      TOOLCHAIN_CLANG_CONFIG = armv8m.main_hard_fp
    else
      TOOLCHAIN_CLANG_CONFIG = armv8m.main_soft_nofp
    endif
  else ifeq ($(CONFIG_ARCH_CORTEXM55),y)
    ifeq ($(CONFIG_ARCH_FPU),y)
      TOOLCHAIN_CLANG_CONFIG = armv8.1m.main_hard_fp
    else
      TOOLCHAIN_CLANG_CONFIG = armv8.1m.main_soft_nofp_nomve
    endif
  else ifeq ($(CONFIG_ARCH_CORTEXM85),y)
    ifeq ($(CONFIG_ARCH_FPU),y)
      TOOLCHAIN_CLANG_CONFIG = armv8.1m.main_hard_fp
    else
      TOOLCHAIN_CLANG_CONFIG = armv8.1m.main_soft_nofp_nomve
    endif
  endif
endif

Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>

Writing documentation related to SPI slave.

Fix build issues

Fix xtensa build error with choice LIBSUPCXX by default.

Signed-off-by: cuiziwei <cuiziwei@xiaomi.com>

sim/cmake: compatible when nuttx COMPILE_OPTIONS is not set yet

Signed-off-by: buxiasen <buxiasen@xiaomi.com>

Fix cdcncm printf formatter compiler warning

esp32s3: Increase the init task stask size when using NSH

After recent changes on nuttx-apps (not limited to, but related
to nuttx-apps#2738, for instance), the stack usage for the NSH
task increased, causing stack overflows under specific situations
(when running `ps` command, for instance). This commit increases
the init task stack size to avoid it. Please note that, even before
these changes, the stack usage of the NSH task was around 90% and,
then, increasing the stack size of it was recommended.

kconfig: Add link parameters that can print remaining memory information

LD: nuttx
Memory region         Used Size  Region Size  %age Used
           flash:      284272 B       512 KB     54.22%
           sram1:       13296 B         2 MB      0.63%
           sram2:          0 GB         2 MB      0.00%
CP: nuttx.hex
CP: nuttx.bin

Signed-off-by: wangmingrong1 <wangmingrong1@xiaomi.com>

Fixed selection of irq file.

Added flash.ld script to nucleo-h563zi/scripts folder. Changed Make.defs to use it. Minor change to Kconfig regarding flash configurations.

Various changes

Fix include guards.
@stbenn
Contributor

stbenn commented Oct 25, 2024

@lupyuen It looks like I made a mistake with some commit messages, which caused our branch to be referenced in a few issues in the apache repo. My apologies. I believe I have removed the commit message references, but if there is anything else I need to do to fix this, please let me know and I will get on it ASAP.

@lupyuen
Member Author

lupyuen commented Oct 25, 2024

@stbenn No worries thanks :-)

@lupyuen
Member Author

lupyuen commented Oct 25, 2024

4 Days to Festivity: Yesterday we consumed 13 Full-Time GitHub Runners (half of the ASF Quota for GitHub Runners)...

Screenshot 2024-10-26 at 7 32 10 AM

Past 7 Days: We used an average of 9 Full-Time GitHub Runners...

Screenshot 2024-10-26 at 7 37 14 AM

So we're on track to make ASF very happy on 30 Oct! Let's monitor today...

(Live Image) (Live Log)

@cederom
Contributor

cederom commented Oct 26, 2024

Thank you @lupyuen for your amazing work!! Have a good calm weekend :-) :-)

@lupyuen
Member Author

lupyuen commented Oct 26, 2024

3 Days to Tranquility: Yesterday was a quiet Saturday (no more Release Builds yay!). We consumed only 4 Full-Time GitHub Runners...

Screenshot 2024-10-27 at 6 08 34 AM

Let's hope today will be a peaceful Sunday...

(Live Image) (Live Log)

@lupyuen
Member Author

lupyuen commented Oct 27, 2024

Something strange about Network Timeouts in our Docker Workflows: First Run fails while downloading something from GitHub:

Configuration/Tool: imxrt1050-evk/libcxxtest,CONFIG_ARM_TOOLCHAIN_GNU_EABI
curl: (28) Failed to connect to github.com port 443 after 134188 ms: Connection timed out
make[1]: *** [libcxx.defs:28: libcxx-17.0.6.src.tar.xz] Error 28

Second Run fails again, while downloading NimBLE from GitHub:

Configuration/Tool: nucleo-wb55rg/nimble,CONFIG_ARM_TOOLCHAIN_GNU_EABI
curl: (28) Failed to connect to github.com port 443 after 134619 ms: Connection timed out
make[2]: *** [Makefile:55: /github/workspace/sources/apps/wireless/bluetooth/nimble_context] Error 2

Third Run succeeds. Why do we keep seeing these errors: GitHub Actions with Docker, can't connect to GitHub itself?

Is something misconfigured in our Docker Image? But the exact same Docker Image runs fine on my own Build Farm. It doesn't show any errors.

Is GitHub Actions starting our Docker Container with the wrong MTU (Network Packet Size)? 🤔

Meanwhile I'm running a script to Restart Failed Jobs on our NuttX Mirror Repos: restart-failed-job.sh
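
For anyone who wants to guard a download step against these timeouts, here is a generic retry-with-backoff sketch in Python. It is not the contents of restart-failed-job.sh, just an illustration of the retry idea; the helper name `run_with_retries` and its defaults are made up.

```python
import time


def run_with_retries(action, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call `action` until it succeeds, doubling the wait after each failure."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError as error:
            # Timeouts and connection failures are OSError subclasses in Python 3.
            last_error = error
            if attempt < attempts:
                sleep(delay_s * 2 ** (attempt - 1))
    raise last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_download():
        # Simulates the pattern above: two timeouts, then success.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TimeoutError("Connection timed out")
        return "downloaded"

    print(run_with_retries(flaky_download, delay_s=0.1))
```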

@lupyuen
Member Author

lupyuen commented Oct 27, 2024

2 Days to Transcendence: Yesterday we consumed 10 Full-Time GitHub Runners. We peaked briefly at 21 while compiling a few NuttX Apps.

Screenshot 2024-10-28 at 6 16 33 AM

Let's keep on monitoring thanks!

(Live Image) (Live Log)

@lupyuen
Member Author

lupyuen commented Oct 28, 2024

Monitoring our CI Servers 24 x 7

This runs on my 4K TV (Xiaomi 65-inch) all day, all night:

Screenshot 2024-10-28 at 1 53 26 PM

When I'm out on Overnight Hikes: I check my phone at every water break:
GridArt_20241028_150938083

I have GitHub Scripts that will run on Termux Android (remember to pkg install gh and set GITHUB_TOKEN):

@cederom
Contributor

cederom commented Oct 28, 2024

Lup's Operations Center =)

@lupyuen
Member Author

lupyuen commented Oct 28, 2024

1 Day to Utopia: Yesterday was a busy Monday, we consumed 14 Full-Time GitHub Runners. That's 56% of the ASF Quota for Full-Time Runners...

Screenshot 2024-10-29 at 6 01 52 AM

We peaked briefly at 26 Full-Time Runners. Let's hang in there thanks! :-)

(Live Image) (Live Log)

@cederom
Contributor

cederom commented Oct 28, 2024

2 days but we should be fine thanks to our Super Hero @lupyuen !! AVE =)

@lupyuen
Member Author

lupyuen commented Oct 28, 2024

Thank you so much @cederom! :-)

@jerpelea
Contributor

jerpelea commented Oct 29, 2024 via email

@lupyuen
Member Author

lupyuen commented Oct 29, 2024

0 Days to Final Audit: ASF Infra Team will be checking on us one last time today! Yesterday was a super busy Tuesday, we consumed 15 Full-Time GitHub Runners (peaked briefly at 31).

Screenshot 2024-10-30 at 6 02 25 AM

Past 7 Days: We consumed an average of 12 Full-Time Runners, which is about half the ASF Quota of 25 Full-Time Runners yay!

Screenshot 2024-10-30 at 6 06 21 AM
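
To put the daily numbers side by side, here is a small Python sketch that converts each day's Full-Time Runner count from the updates above into a share of the ASF Quota of 25. The helper name `quota_share` is made up for illustration. The counts are the ones reported in this thread.

```python
ASF_QUOTA = 25  # Full-Time Runners, as quoted in this thread

def quota_share(runners, quota=ASF_QUOTA):
    """Return the runner count as a percentage of the quota."""
    return 100.0 * runners / quota

# Daily counts reported in the countdown updates
daily = [("4 Days to go", 13), ("3 Days to go", 4), ("2 Days to go", 10),
         ("1 Day to go", 14), ("0 Days to go", 15)]

for label, runners in daily:
    print(f"{label}: {runners} runners = {quota_share(runners):.0f}% of quota")

# Past 7 Days average of 12 runners
print(f"7-day average: {quota_share(12):.0f}% of quota")
```

For example, 14 runners works out to 56% of the quota, matching the figure quoted for Monday.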

FYI: Our "Monthly Bill" for GitHub Actions used to be $18K...

before-30days

Right now our Monthly Bill is $14K. And still dropping!

after-30days

Let's wait for the good news from ASF, thank you everyone! 🙏

(Live Image) (Live Log)

@cederom
Contributor

cederom commented Oct 29, 2024

🙏 🙏 🙏

@lupyuen
Member Author

lupyuen commented Oct 30, 2024

GitHub Actions had some lag issues just now: https://www.githubstatus.com/incidents/9yk1fbk0qjjc

So please ignore the inflated data in our report (everything was delayed during the incident). Thanks!

(Live Image) (Live Log)

@lupyuen
Member Author

lupyuen commented Oct 31, 2024

It's Oct 31 and our CI Servers are still running. We made it yay! 🎉

We've got plenty to do:

  1. We made lots of fixes to the CI Workflow. I'll document everything in an article.

  2. Become more resilient and self-sufficient with Our Own Build Farm (away from GitHub)

  3. Analyse our Build Logs with Our Own Tools (instead of GitHub)

Thank you everyone for making this happen! 🙏

Live Update: Full-Time GitHub Runners

(Live Image) (Live Log)

@lupyuen lupyuen closed this as completed Oct 31, 2024
@cederom
Contributor

cederom commented Oct 31, 2024

BIG THANK YOU @lupyuen FOR YOUR HELP, TIME, AND PATIENCE!!
YOU SAVED NUTTX'S CI MAAAN :-)

lupyuen added a commit to lupyuen2/wip-nuttx-apps that referenced this issue Nov 3, 2024
Due to the [recent cost-cutting](apache/nuttx#14376), we are no longer running PR Merge Jobs in the `nuttx` and `nuttx-apps` repos. For this to happen, I am now running a script on my computer that will cancel any PR Merge Jobs that appear: [kill-push-master.sh](https://github.com/lupyuen/nuttx-release/blob/main/kill-push-master.sh)

This PR disables PR Merge Jobs permanently, so that we no longer need to run the script. This prevents our CI Charges from over-running, in case the script fails to operate properly.
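
As a rough idea of what such a cancel script might do, here is a sketch in Python that drives the GitHub CLI (`gh`). It is not the actual kill-push-master.sh: the function names are hypothetical, and it assumes that Merge Jobs show up as workflow runs triggered by a push to `master`, and that `gh` is installed and authenticated.

```python
import json
import subprocess

def list_push_runs_cmd(repo, branch="master"):
    """Build the gh command that lists runs triggered by pushes to a branch."""
    return ["gh", "run", "list", "--repo", repo, "--event", "push",
            "--branch", branch, "--json", "databaseId,status"]

def cancel_cmds(runs_json, repo):
    """Build one `gh run cancel` command per run that has not completed yet."""
    return [["gh", "run", "cancel", str(run["databaseId"]), "--repo", repo]
            for run in json.loads(runs_json) if run["status"] != "completed"]

def cancel_merge_jobs(repo):
    """List push-triggered runs and cancel the unfinished ones (hypothetical helper)."""
    listing = subprocess.run(list_push_runs_cmd(repo), check=True,
                             capture_output=True, text=True).stdout
    for cmd in cancel_cmds(listing, repo):
        # check=False: a run may finish on its own before we get to cancel it
        subprocess.run(cmd, check=False)
```

A polling script like this only helps while it keeps running, which is why disabling the Merge Jobs in the workflow itself is the safer fix.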
lupyuen added a commit to lupyuen2/wip-nuttx that referenced this issue Nov 3, 2024
xiaoxiang781216 pushed a commit that referenced this issue Nov 4, 2024
xiaoxiang781216 pushed a commit to apache/nuttx-apps that referenced this issue Nov 4, 2024
@lupyuen
Member Author

lupyuen commented Nov 10, 2024

[Article] Optimising the Continuous Integration for NuttX

Within Two Weeks: We squashed our GitHub Actions spending from $4,900 (weekly) down to $890. Thank you everyone for helping out, we saved our CI Servers from shutdown! 🎉
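
As a quick sanity check on the savings, here is a tiny Python sketch that turns the before and after figures from this thread into a percentage reduction. The helper name `percent_reduction` is made up for illustration.

```python
def percent_reduction(before, after):
    """Return how much `after` is below `before`, as a percentage of `before`."""
    return 100.0 * (before - after) / before

# Weekly spending quoted above: $4,900 down to $890
weekly = percent_reduction(4900, 890)

# Trailing 30-day bill quoted on Oct 29: $18K down to $14K (still dropping at the time)
monthly = percent_reduction(18_000, 14_000)

print(f"Weekly spending cut by {weekly:.1f}%")
print(f"30-day bill cut by {monthly:.1f}% so far")
```

The weekly figure comes to roughly an 82% cut. The 30-day figure lags behind because that window still included days from before the changes.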

This article explains everything we did in the (Semi-Chaotic) Two Weeks:

(1) Shut down the macOS and Windows Builds, revive them in a different form

(2) Merge Jobs are super costly, we moved them to the NuttX Mirror Repo

(3) We Halved the CI Checks for Complex PRs

(4) Simple PRs are already quite fast. (Sometimes 12 Mins!)

(5) Coding the Build Rules for our CI Workflow, monitoring our CI Servers 24 x 7

(6) We can’t run All CI Checks, but NuttX Devs can help ourselves!

Check out the article: https://lupyuen.codeberg.page/articles/ci3.html

ci3-title
