
gstreamer - high IAT(Inter Arrival Time) around ~10ms observed between UDP datagrams of SRT client output #673

Closed
ravtr opened this issue Apr 30, 2019 · 20 comments · Fixed by #678
Labels
[core] Area: Changes in SRT library core

Comments

@ravtr

ravtr commented Apr 30, 2019

Using the pipeline below to convert RTP to SRT:
udpsrc name=srtudpsrc multicast-iface=lo address=<address port= caps="application/x-rtp, media=(string)video, clock-rate=(int)90000, payload=(int)33, encoding-name=(string)MP2T" ! queue name=rtpqueue max-size-buffers=80000 max-size-bytes=100000000 max-size-time=10000000000 min-threshold-time=0 min-threshold-buffers=0 min-threshold-bytes=0 leaky=upstream ! rtpjitterbuffer latency=40 ! queue name=rtpjitterbufferqueue ! rtpmp2tdepay ! queue name=outputqueue ! srtclientsink uri= latency=1000 max-lateness=100000000 key-length=16 passphrase=""

When monitored, the input IAT was seen to be 1 ms, but the output of the SRT client is around 10 ms roughly 10% of the time. The same is observed when packets are captured at the SRT output.

I need some help debugging this issue.
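For context, IAT here means the time between consecutive UDP datagrams arriving on a socket. Below is a minimal sketch (not part of the original report) of one way to measure it at the receiving side on Linux; the port number is a placeholder for wherever the SRT client output is received.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>
#include <cstring>

int main()
{
   // Bind to the UDP port where the datagrams under test arrive (placeholder port).
   int s = socket(AF_INET, SOCK_DGRAM, 0);
   sockaddr_in addr;
   std::memset(&addr, 0, sizeof(addr));
   addr.sin_family = AF_INET;
   addr.sin_addr.s_addr = htonl(INADDR_ANY);
   addr.sin_port = htons(5000);
   bind(s, reinterpret_cast<sockaddr*>(&addr), sizeof(addr));

   char buf[2048];
   auto prev = std::chrono::steady_clock::now();
   while (recv(s, buf, sizeof(buf), 0) > 0)
   {
      // Inter-arrival time: gap between this datagram and the previous one.
      const auto now = std::chrono::steady_clock::now();
      const auto iat_us = std::chrono::duration_cast<std::chrono::microseconds>(now - prev).count();
      prev = now;
      std::printf("IAT: %lld us\n", static_cast<long long>(iat_us));
   }
   close(s);
   return 0;
}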

@ravtr ravtr changed the title high IAT(Inter Arrival Time) around ~10ms observed between UDP datagrams of SRT client output gstreamer - high IAT(Inter Arrival Time) around ~10ms observed between UDP datagrams of SRT client output Apr 30, 2019
@maxsharabayko
Collaborator

@ravtr This might be related to #637. SRT's inter-packet sending has a minimum waiting interval of 10 ms. So if each packet should be sent at a 1 ms interval, the first one will have a 10 ms inter-packet delay, and the following 9 will be sent as soon as possible to make the interval 1 ms on average.
According to the experiments in #637, this minimum interval can be reduced to 1 ms, as the sleeping accuracy on Linux is good enough. We are looking forward to checking how this will affect the overall performance of SRT.
Furthermore, there is an option to use a busy-waiting loop for delivery to achieve higher accuracy. To check this, delete the definitions of NO_BUSY_WAITING from srt.h and udt.h. I would also recommend using the following code in CUDT::packData(...) to preserve the inter-packet interval adjustment:

//#ifndef NO_BUSY_WAITING
#if 0
	ts_tk = entertime_tk + m_ullInterval_tk;
#else
	if (m_ullTimeDiff_tk >= m_ullInterval_tk)
	{
		ts_tk = entertime_tk;
		m_ullTimeDiff_tk -= m_ullInterval_tk;
	}
	else
	{
		ts_tk = entertime_tk + m_ullInterval_tk - m_ullTimeDiff_tk;
		m_ullTimeDiff_tk = 0;
	}
#endif	
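To illustrate the averaging behaviour described above, here is a small standalone sketch (not SRT code): a pacer that targets a 1 ms inter-packet interval but whose sleep primitive cannot wait less than 10 ms. Roughly every 10th gap comes out at ~10 ms and the rest are near zero, which matches the observed ~10% of 10 ms gaps.

#include <chrono>
#include <cstdio>
#include <thread>

int main()
{
   using clock = std::chrono::steady_clock;
   const auto interval  = std::chrono::milliseconds(1);   // target inter-packet interval
   const auto min_sleep = std::chrono::milliseconds(10);  // minimum waiting interval

   auto next = clock::now();
   auto prev = clock::now();
   for (int i = 0; i < 30; ++i)
   {
      next += interval;                              // scheduled send time of this packet
      if (clock::now() < next)
         std::this_thread::sleep_for(min_sleep);     // cannot sleep less than 10 ms: overshoots
      const auto now = clock::now();                 // the packet would be sent here
      const auto gap = std::chrono::duration_cast<std::chrono::microseconds>(now - prev).count();
      std::printf("packet %2d  gap = %lld us\n", i, static_cast<long long>(gap));
      prev = now;
   }
   return 0;
}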

@maxsharabayko maxsharabayko added the [core] Area: Changes in SRT library core label Apr 30, 2019
@maxsharabayko maxsharabayko added this to the v.1.3.3 milestone Apr 30, 2019
@ravtr
Author

ravtr commented May 1, 2019

@maxlovic Thanks for the quick reply; I will try the suggested changes and share the results.

@ravtr
Author

ravtr commented May 1, 2019

@maxlovic I have disabled NO_BUSY_WAITING in srt.h and udt.h, and I no longer see the 10 ms peaks.
Below are the versions of the modules we are using:
libsrt - 1.3.1 (libsrt1_1.3.1-1_amd64.deb)

Is there any plan to make this change a configurable feature?
Can I make the change and raise a feature request?

@oviano
Contributor

oviano commented May 1, 2019

Would the NO_BUSY_WAITING change described above be appropriate for Windows too? I understand that Windows timers are even worse, so this could help, right?

@maxsharabayko
Collaborator

@ravtr

Is there any plan to make this change a configurable feature?
Can I make the change and raise a feature request?

We are looking into this. Some tests on several platforms still need to be performed.

  1. I think BUSY_WAITING should definitely become a predefined option, visible in CMake. The current 10 ms waiting should be the default, but one should be able to enable busy waiting as a build option.
  2. The 10 ms interval might be reduced to 1 ms for higher accuracy. We will have to check platform support.

@oviano

Would the NO_BUSY_WAITING change described above be appropriate for Windows too? I understand that Windows timers are even worse, so this could help, right?

Yes, Windows timers are far less accurate. But busy waiting will produce some undesirable CPU load, so it can't be the general-case solution. Maybe the fix will be to wait with a timer if the waiting time is longer than 10 ms, and with busy waiting if it is shorter. We will see.

@oviano
Contributor

oviano commented May 2, 2019

So here is a suggested modified version of CTimer::sleepto() which uses the timer for waits >= 10 ms and busy-waits for any smaller amounts.

The * 10000 could eventually be replaced with some sort of platform-specific granularity constant.

Maybe @ravtr you could test this and compare the CPU load against the change you already made that busy-waits in all cases?

void CTimer::sleepto(uint64_t nexttime)
{
   // Use class member such that the method can be interrupted by others
   m_ullSchedTime = nexttime;

   uint64_t t;
   rdtsc(t);

#ifndef NO_BUSY_WAITING
   uint64_t dt = s_ullCPUFrequency * 10000;
   while (t < m_ullSchedTime && m_ullSchedTime - t >= dt)
#else
   while (t < m_ullSchedTime)
#endif
   {
       timeval now;
       timespec timeout;
       gettimeofday(&now, 0);
       if (now.tv_usec < 990000)
       {
           timeout.tv_sec = now.tv_sec;
           timeout.tv_nsec = (now.tv_usec + 10000) * 1000;
       }
       else
       {
           timeout.tv_sec = now.tv_sec + 1;
           timeout.tv_nsec = (now.tv_usec + 10000 - 1000000) * 1000;
       }
       THREAD_PAUSED();
       pthread_mutex_lock(&m_TickLock);
       pthread_cond_timedwait(&m_TickCond, &m_TickLock, &timeout);
       pthread_mutex_unlock(&m_TickLock);
       THREAD_RESUMED();

       rdtsc(t);
   }

#ifndef NO_BUSY_WAITING
   while (t < m_ullSchedTime)
   {
#ifdef IA32
       __asm__ volatile ("pause; rep; nop; nop; nop; nop; nop;");
#elif IA64
       __asm__ volatile ("nop 0; nop 0; nop 0; nop 0; nop 0;");
#elif AMD64
       __asm__ volatile ("nop; nop; nop; nop; nop;");
#endif

       rdtsc(t);
   }
#endif

}

@ravtr
Author

ravtr commented May 8, 2019

@oviano @maxlovic
As suggested by @oviano, the sleep is made a busy wait only below 10 ms, and the NO_BUSY_WAITING definition has been moved to CMakeLists. Please check the pull request yvvarun#1.
You can plan to merge it based on your schedule. Let me know if you have any comments.
Without the changes, CPU usage was around 30%; with the above change it goes up to 70%.

@oviano
Contributor

oviano commented May 8, 2019

The only thing I notice about that PR is that it doesn't include the change suggested by @maxlovic to preserve the inter-packet interval adjustment. Just mentioning it in case that was an oversight.

@ravtr
Author

ravtr commented May 8, 2019

@maxtomilov I can see that #678 has been raised against master.
Currently we are using 1.3.1. Is it possible for you to provide a pull request against 1.3.1, so that we can use your changes?

@oviano
Contributor

oviano commented May 8, 2019

Also, there are these lines in queue.cpp:

#ifdef NO_BUSY_WAITING
m_pTimer->tick();
#endif

But with the above hybrid sleep/busy-wait loop, you still want it to call ->tick() to wake it up, because it may be in the pthread_cond_timedwait portion of the code, i.e. just make it:

m_pTimer->tick();
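For context, tick() just signals the condition variable that sleepto() waits on, which is why it must be called unconditionally here. A minimal sketch of what it does, using the member names from the sleepto() code in this thread (simplified, not a verbatim copy of the SRT source):

void CTimer::tick()
{
   // Wake up a thread blocked in pthread_cond_timedwait() inside sleepto(),
   // so it can re-check m_ullSchedTime instead of sleeping out the full timeout.
   pthread_cond_signal(&m_TickCond);
}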

@maxsharabayko
Collaborator

maxsharabayko commented May 8, 2019

@ravtr @oviano Using busy waiting together with normal waiting is better added as a next step.
First, the waiting interval can be reduced from 10 ms down to the desired time. We are checking the impact of this change.

@oviano
Contributor

oviano commented May 8, 2019

@ravtr @oviano Using busy waiting together with normal waiting is better added as a next step.
First, the waiting interval can be reduced from 10 ms down to the desired time. We are checking the impact of this change.

Makes sense. What would be the desired time, though? It's going to vary per platform. The best I've been able to achieve on Windows is about 1700 us.

@maxsharabayko
Collaborator

@ravtr Version 1.3.1 with changes from PR 678: link. Please check the default build (without busy waiting).

@ravtr
Author

ravtr commented May 10, 2019

@maxlovic thanks

@oviano
Contributor

oviano commented Aug 26, 2019

So, here is the revised sleepto() I am using, having updated my SRT to 1.3.4.

Taking this second step is very important - it reduces idle CPU from around 20% to almost zero on both Windows machines I have tried.

void CTimer::sleepto(uint64_t nexttime)
{
   // Use class member such that the method can be interrupted by others
   m_ullSchedTime = nexttime;

   uint64_t t;
   rdtsc(t);

#if USE_BUSY_WAITING
   uint64_t dt = s_ullCPUFrequency * 10000;
   while (t < m_ullSchedTime && m_ullSchedTime - t >= dt)
#else
   while (t < m_ullSchedTime)
#endif
   {
      const uint64_t wait_us = (m_ullSchedTime - t) / s_ullCPUFrequency;
      // The while loop ensures that (t < m_ullSchedTime).
      // Division by frequency may lose precision.
      if (wait_us == 0)
         break;

      timeval now;
      gettimeofday(&now, 0);
      const uint64_t time_us = now.tv_sec * uint64_t(1000000) + now.tv_usec + wait_us;
      timespec timeout;
      timeout.tv_sec = time_us / 1000000;
      timeout.tv_nsec = (time_us % 1000000) * 1000;

      THREAD_PAUSED();
      pthread_mutex_lock(&m_TickLock);
      pthread_cond_timedwait(&m_TickCond, &m_TickLock, &timeout);
      pthread_mutex_unlock(&m_TickLock);
      THREAD_RESUMED();

      rdtsc(t);
   }

#if USE_BUSY_WAITING
   while (t < m_ullSchedTime)
   {
#ifdef IA32
      __asm__ volatile ("pause; rep; nop; nop; nop; nop; nop;");
#elif IA64
      __asm__ volatile ("nop 0; nop 0; nop 0; nop 0; nop 0;");
#elif AMD64
      __asm__ volatile ("nop; nop; nop; nop; nop;");
#elif defined(_WIN32) && !defined(__MINGW__)
      __nop ();
      __nop ();
      __nop ();
      __nop ();
      __nop ();
#endif

      rdtsc(t);
   }
#endif

}

@maxsharabayko
Collaborator

@oviano
In your approach you first wait on a CV if the waiting time is more than 10 ms, and for the remaining time you wait in a spin loop (busy waiting).

  1. The 10 ms threshold only makes sense for Windows. For Linux and macOS it is better to use 1 ms.
  2. pthread_cond_timedwait will wait at least the specified time, meaning it can take longer, but not less. It can wake up earlier only due to a spurious wake-up.
    At the same time you are trying to wait on the CV for the whole interval:
    wait_us = (m_ullSchedTime - t) / s_ullCPUFrequency;
    So effectively busy waiting will take place only if a spurious wake-up happens within the interval [t - 10 ms; t), where t is the target wake-up time (timeout).

To sum up, in this approach busy waiting is used only for waiting durations below 10 ms, or after a spurious wake-up when the remaining waiting time is less than 10 ms.

What bitrate did you use for testing?

Can you create a PR?

@oviano
Contributor

oviano commented Aug 27, 2019

Yes, sorry, there was a flaw in my edit this morning. It needs to be more like the code below, so that in busy-waiting mode it sleeps only 10 ms per iteration until there is < 10 ms remaining, at which point it does the actual busy waiting in the second loop.

void CTimer::sleepto(uint64_t nexttime)
{
   // Use class member such that the method can be interrupted by others
   m_ullSchedTime = nexttime;

   uint64_t t;
   rdtsc(t);

#if USE_BUSY_WAITING
   const uint64_t wait_us = 10000;
   uint64_t dt = s_ullCPUFrequency * wait_us;
   while (t < m_ullSchedTime && m_ullSchedTime - t >= dt)
   {
#else
   while (t < m_ullSchedTime)
   {
      const uint64_t wait_us = (m_ullSchedTime - t) / s_ullCPUFrequency;
      // The while loop ensures that (t < m_ullSchedTime).
      // Division by frequency may lose precision.
      if (wait_us == 0)
         break;
#endif

      timeval now;
      gettimeofday(&now, 0);
      const uint64_t time_us = now.tv_sec * uint64_t(1000000) + now.tv_usec + wait_us;
      timespec timeout;
      timeout.tv_sec = time_us / 1000000;
      timeout.tv_nsec = (time_us % 1000000) * 1000;

      THREAD_PAUSED();
      pthread_mutex_lock(&m_TickLock);
      pthread_cond_timedwait(&m_TickCond, &m_TickLock, &timeout);
      pthread_mutex_unlock(&m_TickLock);
      THREAD_RESUMED();

      rdtsc(t);
   }

#if USE_BUSY_WAITING
   while (t < m_ullSchedTime)
   {
#ifdef IA32
      __asm__ volatile ("pause; rep; nop; nop; nop; nop; nop;");
#elif IA64
      __asm__ volatile ("nop 0; nop 0; nop 0; nop 0; nop 0;");
#elif AMD64
      __asm__ volatile ("nop; nop; nop; nop; nop;");
#elif defined(_WIN32) && !defined(__MINGW__)
      __nop ();
      __nop ();
      __nop ();
      __nop ();
      __nop ();
#endif

      rdtsc(t);
   }
#endif
}

@maxsharabayko
Collaborator

I would rather wait till m_ullSchedTime - 10 ms then.

@oviano
Contributor

oviano commented Aug 27, 2019

Yes that would be even better, you are right.

@oviano
Contributor

oviano commented Aug 27, 2019

So, a refined version: now it will only do the pthread sleep up until t - 10 ms (or 1 ms for non-Windows).

void CTimer::sleepto(uint64_t nexttime)
{
   // Use class member such that the method can be interrupted by others
   m_ullSchedTime = nexttime;

   uint64_t t;
   rdtsc(t);

#if USE_BUSY_WAITING
#if defined(_WIN32)
   const uint64_t threshold = 10000;
#else
   const uint64_t threshold = 1000;
#endif
#endif

   while (t < m_ullSchedTime)
   {
#if USE_BUSY_WAITING
      uint64_t wait_us = (m_ullSchedTime - t) / s_ullCPUFrequency;
      // Sleep only up to (m_ullSchedTime - threshold); busy-wait the final stretch.
      if (wait_us <= threshold)
         break;
      wait_us -= threshold;
#else
      const uint64_t wait_us = (m_ullSchedTime - t) / s_ullCPUFrequency;
      if (wait_us == 0)
         break;
#endif

      timeval now;
      gettimeofday(&now, 0);
      const uint64_t time_us = now.tv_sec * uint64_t(1000000) + now.tv_usec + wait_us;
      timespec timeout;
      timeout.tv_sec = time_us / 1000000;
      timeout.tv_nsec = (time_us % 1000000) * 1000;

      THREAD_PAUSED();
      pthread_mutex_lock(&m_TickLock);
      pthread_cond_timedwait(&m_TickCond, &m_TickLock, &timeout);
      pthread_mutex_unlock(&m_TickLock);
      THREAD_RESUMED();

      rdtsc(t);
   }

#if USE_BUSY_WAITING
   while (t < m_ullSchedTime)
   {
#ifdef IA32
      __asm__ volatile ("pause; rep; nop; nop; nop; nop; nop;");
#elif IA64
      __asm__ volatile ("nop 0; nop 0; nop 0; nop 0; nop 0;");
#elif AMD64
      __asm__ volatile ("nop; nop; nop; nop; nop;");
#elif defined(_WIN32) && !defined(__MINGW__)
      __nop ();
      __nop ();
      __nop ();
      __nop ();
      __nop ();
#endif

      rdtsc(t);
   }
#endif

}
