
@DedeHai
Collaborator

@DedeHai DedeHai commented Jan 31, 2026

WiFi.hostByName() is blocking, which leads to freezing/glitches on NTP requests. What makes it worse is a bug in the IDF that causes a timeout on every call once a look-up has failed; if that happens, LEDs freeze for a few seconds every 10 seconds.
The new class does the look-up asynchronously without blocking and also handles that bug with workaround code. IDF V5 should have that bug fixed though.

  • adding a wrapper class for DNS lookup (lwIP)
  • replaces blocking WiFi.hostByName() in NTP and virtual Bus DNS look-ups
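
A minimal sketch of the non-blocking pattern described above, written as plain C++ so it compiles off-target. `FakeResolver`, `Lookup`, `DnsStatus` and `onResolved` are hypothetical stand-ins and not the PR's class; on the device the request would go to lwIP's `dns_gethostbyname()`, which returns immediately and reports the result through a callback.

```cpp
#include <atomic>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the network stack: requests are queued and
// answered later, whenever deliver() is called.
struct FakeResolver {
  using Callback = void (*)(const char* name, const uint32_t* ip, void* arg);
  struct Pending { std::string name; Callback cb; void* arg; };
  std::vector<Pending> queue;

  void request(const char* name, Callback cb, void* arg) {
    queue.push_back({name, cb, arg});
  }
  // Answer all queued requests; a null ip means "lookup failed".
  void deliver(const uint32_t* ip) {
    std::vector<Pending> done;
    done.swap(queue);
    for (auto& p : done) p.cb(p.name.c_str(), ip, p.arg);
  }
};

enum class DnsStatus : uint8_t { Idle, Busy, Success, Error };

// Hypothetical lookup object: start() never blocks, the caller polls status().
class Lookup {
 public:
  void start(FakeResolver& r, const char* host) {
    if (_status == DnsStatus::Busy) return;  // one request at a time
    _status = DnsStatus::Busy;
    r.request(host, &Lookup::onResolved, this);
  }
  DnsStatus status() const { return _status.load(); }
  uint32_t ip() const { return _status == DnsStatus::Success ? _ip : 0; }

 private:
  static void onResolved(const char*, const uint32_t* ip, void* arg) {
    auto* self = static_cast<Lookup*>(arg);
    if (ip) { self->_ip = *ip; self->_status = DnsStatus::Success; }  // data first, status last
    else    { self->_status = DnsStatus::Error; }
  }
  std::atomic<DnsStatus> _status{DnsStatus::Idle};
  uint32_t _ip = 0;
};
```

The main loop would call `start()` once and then check `status()` on each pass instead of waiting. Because the raw `this` pointer is handed to the callback, the object in this sketch must outlive the pending request.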

Fixes #3285

Summary by CodeRabbit

  • New Features

    • Non-blocking asynchronous DNS for NTP and hostname resolution with MDNS fallback, ESP32-specific async path, and time-gated hostname lookups to reduce blocking and unnecessary queries.
  • Configuration

    • Default Wi‑Fi AP channel fallback changed from 1 to 6.
  • Behavior Changes

    • NTP sync now waits for DNS resolution; retry/backoff and sync cadence adjusted to handle async resolution and failures.


@coderabbitai
Contributor

coderabbitai bot commented Jan 31, 2026

Walkthrough

Adds a non-blocking AsyncDNS header for lwIP-based DNS resolution, integrates async hostname resolution into BusManager and NTP flows, and changes the AP channel fallback default from 1 to 6 during config deserialization.

Changes

Cohort / File(s) Summary
Async DNS implementation
wled00/asyncDNS.h
New header defining AsyncDNS class, result enum, non-blocking query(), getIP(), renew(), reset(), status(), getErrorCount(), atomic status, raw IP storage, and lwIP callback.
Bus manager integration
wled00/bus_manager.h, wled00/bus_manager.cpp
ESP32-only include of asyncDNS.h; replaces previous synchronous hostname resolution with an async flow using a local AsyncDNS instance, retains mDNS fallback, adds per-hostname resolve cadence gating and a small blocking wait while async resolution completes.
NTP integration
wled00/ntp.cpp
Adds AsyncDNS-based asynchronous resolution for NTP server hostnames (static per-call DNS state), defers sending NTP packets until resolution success, updates retry/backoff and error handling, and removes DNS work from sendNTPPacket.
Configuration tweak
wled00/cfg.cpp
AP config deserialization: invalid apChannel now defaults to channel 6 instead of 1.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Linked Issues check | ❓ Inconclusive | The PR partially addresses #3285 by implementing async DNS to reduce NTP-related freezes, but does not implement the primary requirement of configurable user-specified NTP polling intervals. | Implement user-configurable NTP polling interval (hh:mm format) as specified in #3285 to fully meet the linked issue requirements.
✅ Passed checks (3 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The PR title clearly and concisely describes the main change: implementing async DNS lookup to prevent NTP request freezing.
Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing async DNS lookup and integrating it into NTP and bus manager hostname resolution flows, with a minor unrelated WiFi channel default change.


@DedeHai
Collaborator Author

DedeHai commented Jan 31, 2026

I tested this on a C3 and an ESP8266 for NTP; I have no hardware setup to test the virtual bus changes.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
wled00/bus_manager.cpp (2)

675-679: ⚠️ Potential issue | 🔴 Critical

Initialization of _DNSlookup is outside the ESP32 guard.

Line 675 initializes _DNSlookup = nullptr but this line is before the #ifdef ARDUINO_ARCH_ESP32 block that starts at line 676. Since _DNSlookup is only declared for ESP32 in the header, this will cause a compilation error on ESP8266.

Proposed fix
   _client = IPAddress(bc.pins[0],bc.pins[1],bc.pins[2],bc.pins[3]);
-  _DNSlookup = nullptr;
   #ifdef ARDUINO_ARCH_ESP32
+  _DNSlookup = nullptr;
   _hostname = bc.text;
   resolveHostname(); // resolve hostname to IP address if needed
   #endif

775-781: ⚠️ Potential issue | 🟡 Minor

Missing cleanup of _DNSlookup in BusNetwork::cleanup().

When BusNetwork::cleanup() is called, the _DNSlookup pointer is not deleted, which could cause a memory leak if a DNS lookup is pending when the bus is destroyed.

Proposed fix
 void BusNetwork::cleanup() {
   DEBUGBUS_PRINTLN(F("Virtual Cleanup."));
+  #ifdef ARDUINO_ARCH_ESP32
+  delete _DNSlookup;
+  _DNSlookup = nullptr;
+  #endif
   d_free(_data);
   _data = nullptr;
   _type = I_NONE;
   _valid = false;
 }
🤖 Fix all issues with AI agents
In `@wled00/asyncDNS.h`:
- Around line 35-36: Fix the typo in the comment inside asyncDNS.h: change
"handlig" to "handling" in the sentence "with the current IDF bug external error
handlig is required anyway or dns will just stay stuck" so the comment reads
"...external error handling is required anyway or dns will just stay stuck."

In `@wled00/ntp.cpp`:
- Around line 209-212: The debug prints occur before ntpServerIP is updated,
causing the wrong IP to be logged; move the assignment ntpServerIP =
ntpDNSlookup->getIP() to before the DEBUG_PRINTF_P/Serial.printf_P calls (and
keep sendNTPPacket() after the assignment) so both debug outputs and
sendNTPPacket() use the newly resolved ntpServerIP.
- Line 201: Replace the direct Serial.printf_P debug prints with the conditional
debug macro so debug output is gated; specifically, change the
Serial.printf_P(...) calls that reference ntpServerName (and the other
occurrence) in ntp.cpp to use DEBUG_PRINTF_P(...) instead, preserving the PSTR
format strings and arguments so the messages only appear when debugging is
enabled.
🧹 Nitpick comments (3)
wled00/bus_manager.cpp (2)

716-716: Static nextResolve is shared across all BusNetwork instances.

The static unsigned long nextResolve variable is shared among all network bus instances. This means if you have multiple network buses with different hostnames, they'll all share the same 10-minute resolution timer. This might be intentional to reduce DNS traffic, but if independent resolution per bus is desired, consider making this a member variable instead.
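
To illustrate the difference the comment points at, here is a small sketch contrasting a function-local `static` timer with a per-object member. `DemoBus`, `dueShared` and `dueMember` are hypothetical names for illustration only; the 600000 ms value mirrors the 10-minute timer mentioned above, and wrap-around of the millisecond counter is ignored for brevity.

```cpp
#include <cstdint>

// Hypothetical bus type, only to contrast the two kinds of storage.
struct DemoBus {
  uint32_t memberNext = 0;  // one timer per object

  // Returns true when this call is allowed to start a resolve.
  bool dueShared(uint32_t now) {
    static uint32_t sharedNext = 0;  // one timer for ALL DemoBus objects
    if (now < sharedNext) return false;
    sharedNext = now + 600000;       // 10 minutes
    return true;
  }

  bool dueMember(uint32_t now) {
    if (now < memberNext) return false;
    memberNext = now + 600000;       // 10 minutes, tracked per instance
    return true;
  }
};
```

With two objects, the first call to `dueShared()` on one of them pushes the deadline for both, whereas `dueMember()` lets each object resolve on its own schedule.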


749-752: Unreachable else branch.

This else block handles states other than Error, but at this point in the code, the only possible states are Idle, Busy, Success, or Error. Idle and Success are handled earlier (line 719 checks Success), Busy would mean an in-progress query, and Error is handled in the if block. This else branch appears to be defensive but unreachable.

Consider simplifying or adding a comment
           } else {
-            delete _DNSlookup; // cleanup if other error just in case (should not happen)
-            _DNSlookup = nullptr;
+            // Note: This branch should be unreachable as all DnsResult states are handled above
+            // Kept as defensive cleanup in case of unexpected state
+            delete _DNSlookup;
+            _DNSlookup = nullptr;
           }
wled00/ntp.cpp (1)

184-184: Global pointer declaration could be more explicit.

While C++ zero-initializes global pointers, an explicit = nullptr would make the intent clearer and be consistent with the initialization in bus_manager.cpp.

-AsyncDNS* ntpDNSlookup;
+AsyncDNS* ntpDNSlookup = nullptr;

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@wled00/bus_manager.cpp`:
- Around line 717-718: The static local nextResolve in
BusNetwork::resolveHostname is shared across all instances, causing only one
hostname to be resolved per cycle; make the timer instance-specific by adding an
unsigned long member (e.g. _nextResolve = 0) to the BusNetwork class (in
bus_manager.h) and replace the static nextResolve usage inside
BusNetwork::resolveHostname with this member (_nextResolve), ensuring any logic
that checks or updates nextResolve now operates on the instance field.
- Around line 727-728: The comparison using millis() > nextResolve will fail
after millis() wraps; change the check to use unsigned subtraction so it handles
overflow correctly, e.g. millis() - lastResolve >= 600000, where lastResolve is
the time the previous resolve was started (a bare millis() - nextResolve >= 0 is
always true for unsigned values). Update the condition around nextResolve and
_hostname in the bus_manager.cpp block where millis() and nextResolve are used
so the resolve runs every 600000ms even after overflow, and store the start time
of the last resolve rather than a future deadline.
- Around line 751-754: The AsyncDNS instance can be deleted while a DNS query is
still pending, causing a use-after-free when lwIP invokes the static callback;
add a destructor to the AsyncDNS class that checks for a pending state (e.g.
_status == DnsResult::Busy) and cancels any outstanding query via the
appropriate lwIP cancel API (matching how dns_gethostbyname was used) before
allowing destruction; ensure the else branch that deletes _DNSlookup (and any
other code paths) relies on this destructor to safely unregister/cancel
callbacks so lwIP never receives a pointer to a destroyed AsyncDNS.
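
The rollover point above can be sketched in plain C++. `intervalElapsed` and `deadlinePassed` are hypothetical helper names; `uint32_t` is used here because `millis()` is a 32-bit counter on these targets. Note that a bare unsigned difference is never negative, so the check has to compare the difference against the interval rather than against zero.

```cpp
#include <cstdint>

// Hypothetical helper: true once at least intervalMs have passed since lastMs.
// Unsigned subtraction wraps modulo 2^32, so the elapsed time stays correct
// across a counter rollover as long as the real gap is below about 49.7 days.
inline bool intervalElapsed(uint32_t nowMs, uint32_t lastMs, uint32_t intervalMs) {
  return static_cast<uint32_t>(nowMs - lastMs) >= intervalMs;
}

// The "deadline" form, for comparison: it misfires when the stored deadline
// has wrapped past zero but the current time has not wrapped yet.
inline bool deadlinePassed(uint32_t nowMs, uint32_t deadlineMs) {
  return nowMs > deadlineMs;
}
```

The elapsed-time form stores when the last resolve started; the deadline form stores a future timestamp, which is the value that becomes ambiguous at rollover.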
🧹 Nitpick comments (1)
wled00/bus_manager.cpp (1)

675-681: Consider merging consecutive #ifdef blocks.

Two adjacent #ifdef ARDUINO_ARCH_ESP32 blocks can be combined into one for clarity.

♻️ Proposed refactor
   _client = IPAddress(bc.pins[0],bc.pins[1],bc.pins[2],bc.pins[3]);
   #ifdef ARDUINO_ARCH_ESP32
   _DNSlookup = nullptr;
-  #endif
-  #ifdef ARDUINO_ARCH_ESP32
   _hostname = bc.text;
   resolveHostname(); // resolve hostname to IP address if needed
   #endif

 //int ap_pskl = ap[F("pskl")];
 CJSON(apChannel, ap[F("chan")]);
-if (apChannel > 13 || apChannel < 1) apChannel = 1;
+if (apChannel > 13 || apChannel < 1) apChannel = 6; // reset to default if invalid
Member

this seems unrelated to the PR

Collaborator Author

it is. I should have changed this line in #5115 but was unaware at the time.

@netmindz
Member

Great to find this issue, but I'm not sure async DNS is the right approach. Surely NTP should be its own task, not part of our main loop?

Member

@willmmiles willmmiles left a comment

This is a great idea and definitely the right general approach. The possible use-after-free when the object is destructed needs to be fixed though.

wled00/ntp.cpp Outdated
}
}

AsyncDNS* ntpDNSlookup;
Member

Use std::unique_ptr instead of naked pointers. Also it should be static inside the function.

Collaborator Author

std::unique_ptr gives me errors as make_unique is not defined.

Member

There's a shim in bus_manager.h. (This might be moot re fixing the safety issues though -- shared_ptr might allow a cleaner solution.)
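
For readers unfamiliar with the `make_unique` problem: `std::make_unique` only arrived in C++14, so C++11 toolchains need a small stand-in. The sketch below is an assumption-labelled illustration, not the shim from bus_manager.h; `compat::make_unique`, `Probe` and `lookupSlot` are hypothetical names. It also shows the "static inside the function" ownership suggested in the review.

```cpp
#include <memory>
#include <utility>

namespace compat {
// Hypothetical C++11 stand-in for std::make_unique (single objects only).
template <typename T, typename... Args>
std::unique_ptr<T> make_unique(Args&&... args) {
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
}  // namespace compat

// Hypothetical resource that counts live instances so cleanup is observable.
struct Probe {
  static int& live() { static int n = 0; return n; }
  Probe() { ++live(); }
  ~Probe() { --live(); }
};

// Function-local static owner: the object is created on first need and
// released by reset() or reassignment, with no manual delete anywhere.
inline std::unique_ptr<Probe>& lookupSlot() {
  static std::unique_ptr<Probe> slot;
  return slot;
}
```

Assigning a new object to the slot destroys the previous one automatically, which removes the leak paths a naked pointer has to handle by hand. It does not by itself address an object being destroyed while a callback still refers to it.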

uint16_t _errorcount = 0;

// callback for dns_gethostbyname(), called when lookup is complete or timed out
static void _dns_callback(const char *name, const ip_addr_t *ipaddr, void *arg) {
Member

If this class is destructed while a query is still outstanding, this will result in the arg pointer becoming invalid, resulting in memory corruption when the callback is eventually executed. Disconnecting from the wifi does not dequeue DNS callbacks -- they're supposedly guaranteed to be called "eventually" by the lwip stack.

Fixing this is nontrivial -- the state data needs to be stored on the heap separately from the interface class, and scoped with tools like std::shared_ptr so it won't be released until the callback is run. I made a little sketch of this, I can add it to the PR if you'd like.

Collaborator Author

Good point. I am not opposed to "do it properly" but I also see two more trivial solutions:

  • make the class global and instantiate only one, add a "key" or a char array and share the class between callers. This adds the complexity of a shared resource though and probably has implications I am currently not thinking of.
  • simplest: do not delete if status is busy (works well for NTP use)

Neither of these solves the issue for virtual buses though. Maybe adding another function to the class that actually blocks would simplify its use in places that "need an IP immediately". What I have implemented in busmanager is way too complex IMHO, but mostly because it needs to dance around that bug gracefully.

Member

It's not just deletion: renew() and reset() are also invalid if status == Busy as the results of a new query will be overwritten by the old callback at some point. If a query is stuck, the state object has to be abandoned -- there's no real alternative.

Collaborator Author

I wrote it the way I did because I never got it to not run the callback. You are right that it is not as robust as it could be, but it was not my intention to make it foolproof, i.e. safe against a reset while a request is in progress.
So which one do you prefer:

  • using a separate heap object for the callback as you propose
  • not using a pointer and making it fully static (like I did now in the busmanager), and not going down the road of managing dangling pointers?

To me the option of not using it dynamically makes sense, at least ATM, as DNS is only used for NTP and virtual bus. The class only uses a few bytes of RAM.

Member

@willmmiles willmmiles Feb 1, 2026

I'd like to be able to test/treat an AsyncDNS object as a unit, with safety by design -- "it doesn't crash because we don't happen to delete/reuse it while it's active" isn't good enough IMO, static or otherwise. Eventually that will bite us because the class violates the C++ object requirement that a destructor is always safe.

I took a stab at safety, while keeping the current object and re-use semantics:
willmmiles@26c202a

I think it might be done better with std::future<> style semantics though; the code would be closer to the current class design. Eg. the fundamental call is:

class AsyncDNS {
   public:
       // Start a lookup; an earlier handle may be passed in for re-use
       static std::shared_ptr<AsyncDNS> query(const char* hostname, std::shared_ptr<AsyncDNS> previous = {});

   private:
       AsyncDNS();   // Can only be produced by making a query
};

Member

Future-like formulation: willmmiles@efcfc5e

Collaborator Author

I will say that I'm having a lot of trouble with DNS hanging on my test system. Do you have any references on that IDF v4 bug you mention?

I do too, which is why I am pretty sure it's a bug: it just locks up and does not recover until reconnect. I did not really find concrete info about it, but the changelog mentions some fix for a DNS deadlock: https://github.com/espressif/esp-idf/releases?page=1

Collaborator Author

Future-like formulation: willmmiles@efcfc5e

that is a whole lot of fancy c++ magic I do not (yet) understand. Is it to make it multi-thread safe?

Member

Future-like formulation: willmmiles@efcfc5e

that is a whole lot of fancy c++ magic I do not (yet) understand. Is it to make it multi-thread safe?

Sort of. The goal is to make sure that the addresses that will be touched by the callback in the future cannot under any circumstances be released until the callback has run. std::shared_ptr<> is a reference-counting pointer: it keeps track of how many shared_ptrs address the same storage, so the pointed-to object won't be deleted until the last shared_ptr pointing at it is removed. shared_ptr guarantees that the reference count is atomic and so it is thread-safe in this context.

The first implementation hides the shared_ptr by making the AsyncDNS class a wrapper around it: the only member is a shared_ptr to the DNS results. A copy of the shared_ptr is "gifted" to the callback, who is then guaranteed to be able to write the state. If the AsyncDNS object is destroyed, or the request is abandoned via renew() or reset(), that copy keeps the DNS result struct alive until it's finished with it.
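
The "gifted copy" idea from the paragraph above can be sketched as follows. This is not the code from the linked commit; `FakeStack`, `DnsState` and `SafeLookup` are hypothetical names, and the fake stack stands in for a C-style API that takes a `void*` argument and calls back later. The sketch is single-threaded and leaves out atomics.

```cpp
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a C-style async API with a void* callback argument.
struct FakeStack {
  using Callback = void (*)(const uint32_t* ip, void* arg);
  struct Pending { Callback cb; void* arg; };
  std::vector<Pending> queue;
  void request(Callback cb, void* arg) { queue.push_back({cb, arg}); }
  void deliver(const uint32_t* ip) {  // a null ip means "lookup failed"
    std::vector<Pending> done;
    done.swap(queue);
    for (auto& p : done) p.cb(ip, p.arg);
  }
};

// Result storage lives on the heap, separate from the interface object.
struct DnsState {
  bool done = false;
  uint32_t ip = 0;
};

class SafeLookup {
 public:
  void start(FakeStack& s) {
    _state = std::make_shared<DnsState>();  // abandons any older request
    // Give the callback its own shared_ptr copy, boxed on the heap so it
    // fits through the void* argument.
    auto* gift = new std::shared_ptr<DnsState>(_state);
    s.request(&SafeLookup::onResolved, gift);
  }
  bool done() const { return _state && _state->done; }
  uint32_t ip() const { return _state ? _state->ip : 0; }
  std::weak_ptr<DnsState> watch() const { return _state; }  // demo only

 private:
  static void onResolved(const uint32_t* ip, void* arg) {
    // Take ownership of the boxed copy; it is freed when this scope ends.
    std::unique_ptr<std::shared_ptr<DnsState>> gift(
        static_cast<std::shared_ptr<DnsState>*>(arg));
    if (ip) (*gift)->ip = *ip;
    (*gift)->done = true;
  }
  std::shared_ptr<DnsState> _state;
};
```

Destroying the `SafeLookup` while a request is pending only drops one reference; the boxed copy keeps `DnsState` alive until the callback has written to it. The trade-off is that the boxed copy leaks if the stack never calls back.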

The second implementation strips off the wrapper layer, so clients always have to store a shared_ptr<AsyncDNS>. This is intended to make the behaviour clearer to the client, i.e. the thing you hold is a handle to the real object (and it removes an extra layer of code). The major change to make that work is the interface to query(): it lets a client pass in a shared_ptr<AsyncDNS> that's been created before, so it can re-use the context (i.e. error count) or storage if that request has been completed. (If the request is still busy, or if you pass in an empty pointer, query() makes a new storage object for you.)
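
A rough sketch of that handle-style interface, under stated assumptions: `Handle` and `FakeStack` are hypothetical names, the fake stack replaces the real DNS call, and the error-count behaviour (incremented on failure, kept across re-use) is a guess made for illustration rather than what the linked commit does.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a C-style async API with a void* callback argument.
struct FakeStack {
  using Callback = void (*)(const uint32_t* ip, void* arg);
  struct Pending { Callback cb; void* arg; };
  std::vector<Pending> queue;
  void request(Callback cb, void* arg) { queue.push_back({cb, arg}); }
  void deliver(const uint32_t* ip) {  // a null ip means "lookup failed"
    std::vector<Pending> done;
    done.swap(queue);
    for (auto& p : done) p.cb(ip, p.arg);
  }
};

class Handle {
 public:
  enum class Status : uint8_t { Busy, Success, Error };

  // The only way to obtain a Handle. A finished 'previous' handle is re-used
  // (keeping its error count); a busy or empty one is replaced by new storage.
  static std::shared_ptr<Handle> query(FakeStack& s,
                                       std::shared_ptr<Handle> previous = {}) {
    std::shared_ptr<Handle> h;
    if (previous && previous->_status != Status::Busy) h = std::move(previous);
    else h = std::shared_ptr<Handle>(new Handle());
    h->_status = Status::Busy;
    // The callback gets its own boxed shared_ptr, keeping the storage alive.
    s.request(&Handle::onResolved, new std::shared_ptr<Handle>(h));
    return h;
  }
  Status status() const { return _status; }
  uint32_t ip() const { return _ip; }
  uint16_t errorCount() const { return _errors; }

 private:
  Handle() = default;  // clients cannot construct one directly
  static void onResolved(const uint32_t* ip, void* arg) {
    std::unique_ptr<std::shared_ptr<Handle>> keep(
        static_cast<std::shared_ptr<Handle>*>(arg));
    Handle& h = **keep;
    if (ip) { h._ip = *ip; h._status = Status::Success; }
    else    { h._status = Status::Error; ++h._errors; }
  }
  Status _status = Status::Busy;
  uint32_t _ip = 0;
  uint16_t _errors = 0;
};
```

A caller keeps one `std::shared_ptr<Handle>` and passes it back into `query()` for the next lookup; if the old request is stuck in `Busy`, it is simply left behind and a fresh object is returned.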

I liken the second approach to the use of std::promise<>/std::future<>. Promises and futures are part of the C++11 async library. You can think of them as a thread-safe single result queue: the end holding the std::promise can write a single result value in a safe way, such that the std::future can wait() for it and read it later. Unfortunately actual std::promise and std::future are quite heavyweight, particularly with their use of exceptions.

Collaborator Author

I'll leave this to your judgement. I would keep it as light-weight as possible. Feel free to add to this PR.

@DedeHai
Collaborator Author

DedeHai commented Feb 1, 2026

Great to find this issue, but I'm not sure async DNS is the right approach. Surely NTP should be its own task, not part of our main loop?

That is a good point actually. We should think about putting the rendering and background work in different tasks, but I think the current code structure is not ready for that. Sure, we could put NTP in a task, but that eats up RAM, and on single-core ESPs the Arduino function this PR replaces may still be blocking. I did not look at how the Espressif "wait for bit set" part is implemented; it may not play nicely with the RTOS, as waiting for a bit to be set sounds like a pretty bare-metal thing.
What I have implemented here is the correct way to use the DNS look-up provided by Espressif.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@wled00/ntp.cpp`:
- Line 13: NTP_SYNC_INTERVAL is currently set to 15 (seconds) which is far too
frequent; change the macro NTP_SYNC_INTERVAL back to a realistic interval (e.g.,
replace 15 with the previously-intended 42000UL or another sensible value like
43200UL for 12 hours) and restore/update the comment to reflect "Get fresh NTP
time about twice per day"; if 15 was used only for debugging, add a short
comment noting that and ensure the production value is the larger interval.
🧹 Nitpick comments (3)
wled00/asyncDNS.h (3)

35-47: const qualifiers are misplaced on return types instead of member functions.

The const before the return type (e.g., const IPAddress, const result) has no meaningful effect. To indicate these methods don't modify the object state, move const after the parameter list.

♻️ Suggested fix
-  const IPAddress getIP() {
+  IPAddress getIP() const {
     if (_status != result::Success) return IPAddress(0,0,0,0);
     #ifdef ARDUINO_ARCH_ESP32
       return IPAddress(_raw_addr.u_addr.ip4.addr);
     #else
       return IPAddress(_raw_addr.addr);
     #endif
   }

   void renew() { _status = result::Idle; } // reset status to allow re-query
   void reset() { _status = result::Idle; _errorcount = 0; } // reset status and error count
-  const result status() { return _status; }
-  const uint16_t getErrorCount() { return _errorcount; }
+  result status() const { return _status.load(); }
+  uint16_t getErrorCount() const { return _errorcount; }

50-52: Consider thread safety for _raw_addr and _errorcount access.

While _status is correctly atomic, _raw_addr and _errorcount are accessed from both the lwIP callback context and the main thread without explicit synchronization. In practice, the atomic _status check before reading these values provides implicit ordering (the callback sets _raw_addr/_errorcount before setting _status), but this relies on the atomic's acquire-release semantics.

For robustness, consider either:

  1. Making _errorcount atomic (low overhead)
  2. Adding a comment documenting the synchronization guarantee through _status

This is a minor concern given the current usage pattern where callers always check status() first.


13-64: Document the callback lifecycle constraint to guide future maintainers.

The AsyncDNS class relies on lwIP callbacks that invoke _dns_callback after the query completes. While the current callers (ntp.cpp and bus_manager.cpp) safely handle this by only deleting after completion or using stack allocation, the API should explicitly document this constraint.

ntp.cpp deletes only after Success or Error states (lines 211, 221). bus_manager.cpp avoids dynamic allocation entirely with a stack-allocated instance and polls with a timeout loop—its TODO comment ("make this dynamic? requires to handle the callback properly") shows developers were aware of the callback lifecycle issue.

Add a comment to the class definition clarifying that instances must not be deleted while status() == Busy, or update callers to use stack allocation where feasible (as bus_manager already does).

@netmindz
Member

netmindz commented Feb 1, 2026

Great to find this issue, but I'm not sure async DNS is the right approach. Surely NTP should be its own task, not part of our main loop?

That is a good point actually. We should think about putting the rendering and background work in different tasks, but I think the current code structure is not ready for that. Sure, we could put NTP in a task, but that eats up RAM, and on single-core ESPs the Arduino function this PR replaces may still be blocking. I did not look at how the Espressif "wait for bit set" part is implemented; it may not play nicely with the RTOS, as waiting for a bit to be set sounds like a pretty bare-metal thing. What I have implemented here is the correct way to use the DNS look-up provided by Espressif.

NTP sync should never have been anything other than a background task. Rather than adding a load of complex code to handle async DNS lookups, we should be correcting the failure in design

@willmmiles
Member

NTP sync should never have been anything other than a background task. Rather than adding a load of complex code to handle async DNS lookups, we should be correcting the failure in design

Hard disagree. Every FreeRTOS task requires >4k of RAM for stack and ancillary data structures. That would be an enormous waste just for NTP synchronization.

@netmindz
Member

netmindz commented Feb 1, 2026

NTP sync should never have been anything other than a background task. Rather than adding a load of complex code to handle async DNS lookups, we should be correcting the failure in design

Hard disagree. Every FreeRTOS task requires >4k of RAM for stack and ancillary data structures. That would be an enormous waste just for NTP synchronization.

I retract my statement, I was unaware that there was such a high penalty for running this way

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@wled00/ntp.cpp`:
- Around line 186-229: The DNS helper ntpDNSlookup can remain allocated if the
server becomes a literal IP (the else branch calling sendNTPPacket()); ensure
you delete and null out ntpDNSlookup when ntpServerIP.fromString(ntpServerName)
returns true. In practice, before calling sendNTPPacket() in the IP path (and
also before creating a new AsyncDNS in the hostname path), check if ntpDNSlookup
!= nullptr, call delete ntpDNSlookup, set ntpDNSlookup = nullptr to free the
stale allocation and avoid leaks or stale state.
🧹 Nitpick comments (1)
wled00/ntp.cpp (1)

218-224: Replace retry/backoff magic numbers with named constants.
This makes the backoff policy self-documenting and easier to adjust.

♻️ Suggested refactor
 #define NTP_SYNC_INTERVAL 42000UL //Get fresh NTP time about twice per day
+#define NTP_DNS_MAX_ERRORS 6
+#define NTP_DNS_RETRY_BACKOFF_MS 300000UL // 5 minutes

@@
-            if (ntpDNSlookup->getErrorCount() > 6) {
+            if (ntpDNSlookup->getErrorCount() > NTP_DNS_MAX_ERRORS) {
@@
-            ntpLastSyncTime = millis() - (1000*NTP_SYNC_INTERVAL - 300000); // pause for 5 minutes
+            ntpLastSyncTime = millis() - (1000*NTP_SYNC_INTERVAL - NTP_DNS_RETRY_BACKOFF_MS); // pause for 5 minutes

Based on learnings: In WLED code reviews, when code is modified or added, look for "magic numbers" (hardcoded numeric literals) and suggest replacing them with appropriate defined constants when those constants are meaningful in the context of the PR. This improves code maintainability and reduces the risk of inconsistencies.
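
The back-dating arithmetic in the suggestion above is easy to misread, so here is a worked sketch. The constants come from the diff; `syncDue` and `backdatedLastSync` are hypothetical helpers, and the shape of the "sync is due" check (elapsed time greater than the interval) is an assumption about the surrounding code.

```cpp
#include <cstdint>

constexpr uint32_t NTP_SYNC_INTERVAL = 42000u;          // seconds
constexpr uint32_t NTP_DNS_RETRY_BACKOFF_MS = 300000u;  // 5 minutes
constexpr uint32_t kSyncIntervalMs = 1000u * NTP_SYNC_INTERVAL;

// Assumed shape of the sync check: a new attempt is due once more than
// NTP_SYNC_INTERVAL seconds have elapsed since lastSyncMs.
inline bool syncDue(uint32_t nowMs, uint32_t lastSyncMs) {
  return static_cast<uint32_t>(nowMs - lastSyncMs) > kSyncIntervalMs;
}

// Back-date the last sync so the check above fires again once the backoff
// has passed, instead of waiting for a full sync interval.
inline uint32_t backdatedLastSync(uint32_t nowMs) {
  return nowMs - (kSyncIntervalMs - NTP_DNS_RETRY_BACKOFF_MS);
}
```

Setting the last-sync timestamp to "now minus (interval minus backoff)" makes the elapsed time start at interval minus backoff, so the threshold is crossed just after the backoff expires. The unsigned arithmetic also works shortly after boot, when the back-dated value wraps below zero.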

@willmmiles
Member

NTP sync should never have been anything other than a background task. Rather than adding a load of complex code to handle async DNS lookups, we should be correcting the failure in design

Hard disagree. Every FreeRTOS task requires >4k of RAM for stack and ancillary data structures. That would be an enormous waste just for NTP synchronization.

I retract my statement, I was unaware that there was such a high penalty for running this way

Yeah, the minimum stack requirement for proper Tasks is really kind of frustrating honestly. The end result seems to be that every reasonably sized project ends up writing their own async executive, from as simple as just loop() callback hooks to more complex systems (#4808's deferred executor) all the way to full on async environments (the latest hotness is Embassy in Rust).

softhack007 added a commit to MoonModules/WLED-MM that referenced this pull request Feb 3, 2026

Development

Successfully merging this pull request may close these issues.

Adjustable NTP polling interval

3 participants