Why txhashset download response is so resource heavy #2787

Closed
garyyu opened this issue Apr 30, 2019 · 6 comments

Comments

@garyyu
Contributor

garyyu commented Apr 30, 2019

Looking at this picture, the CPU load on the server side during the txhashset download is quite high. We need to check whether this is reasonable and whether there is any room for optimization.

(Screenshot: server CPU load during txhashset download, 2019-04-30)

@garyyu
Contributor Author

garyyu commented May 22, 2019

This could be optimized by #2813, which reduces how often the txhashset packaging operation has to run.

@bladedoyle
Contributor

@garyyu In addition to CPU and network usage, there are also disk space issues.

I routinely see my grin server generate many gigabytes of txhashset zip files per hour. Those files are left behind and fill up the disk. I have resorted to adding a cron job to clean them up, e.g.:

*/5 * * * * find /data/grin/.grin -maxdepth 1 -name txhashset_\* -cmin +15 -exec rm -rf {} \;

@hashmap
Contributor

hashmap commented May 29, 2019

@bladedoyle this is fixed in 1.1

@garyyu
Contributor Author

garyyu commented Jun 7, 2019

Now that #2813 has been merged, I can investigate this issue again.

And here is a very interesting new graph of CPU usage vs. network traffic:

(Graph: CPU usage vs. network traffic)

This time there is no CPU load from creating and packaging the txhashset (thanks to @cadmuspeverell's 12-hour txhashset_archive_interval). Please pay attention to the red circle I drew there; we can see an obvious pattern:

The lower the network traffic (i.e. the slower the remote node downloads the txhashset zip file), the higher the CPU load. That is quite strange and doesn't make sense at first glance.

Considering the recent findings about the Rust thread::sleep performance problem (for example #2773 and #2855), here is what I found:

In util/src/read_write.rs:

use std::io::{self, Write};
use std::thread;
use std::time::Duration;

/// Write the whole buffer to a (non-blocking) stream, retrying on
/// WouldBlock and sleeping between attempts until `timeout` is reached.
pub fn write_all(stream: &mut dyn Write, mut buf: &[u8], timeout: Duration) -> io::Result<()> {
	let sleep_time = Duration::from_micros(10);
	let mut count = Duration::new(0, 0);

	while !buf.is_empty() {
		match stream.write(buf) {
			Ok(0) => {
				return Err(io::Error::new(
					io::ErrorKind::WriteZero,
					"failed to write whole buffer",
				));
			}
			Ok(n) => buf = &buf[n..],
			Err(ref e) if e.kind() == io::ErrorKind::Interrupted => {}
			Err(ref e) if e.kind() == io::ErrorKind::WouldBlock => {}
			Err(e) => return Err(e),
		}
		if !buf.is_empty() {
			thread::sleep(sleep_time); // <<<< problem is here! 10us sleep
			count += sleep_time;
		} else {
			break;
		}
		if count > timeout {
			return Err(io::Error::new(io::ErrorKind::TimedOut, "writing to stream"));
		}
	}
	Ok(())
}
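To get a rough sense of why a slower downloader means a higher CPU load here (my own back-of-the-envelope illustration, not a measurement taken from the graph): while the remote peer's receive buffer is full, write() keeps returning WouldBlock, so the loop wakes up once per sleep_time and retries.

fn main() {
	// Approximate wake-ups per second while the socket would block,
	// for a few candidate sleep intervals (illustrative values only).
	for micros in [10u64, 1_000, 5_000] {
		println!("sleep {:>5} us -> ~{} wake-ups/s", micros, 1_000_000 / micros);
	}
}

At the current 10us sleep that is on the order of 100,000 failed write attempts and thread wake-ups per second for each slow connection, which fits the pattern circled above.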

So an easy "fix" is to change this 10us sleep to a 1ms sleep, since the resulting 8000 bytes / 1ms = 8 MB/s speed limit should cover 99% of cases.

Perhaps the perfect fix is to switch from a non-blocking stream to a blocking stream as in #2855, but that needs more discussion and testing. Until #2855 reaches a conclusion, I will write a two-line PR for this improvement.
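A sketch of what that small PR would change, assuming it only adjusts the sleep interval inside write_all (the 1ms value is the one proposed above; the actual PR may pick a different figure):

	// was: let sleep_time = Duration::from_micros(10);
	let sleep_time = Duration::from_millis(1); // caps throughput at ~8000 bytes / 1ms = 8 MB/s per connection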

@garyyu
Contributor Author

garyyu commented Jun 8, 2019

After a fix (changing the 10us sleep above to a 5ms sleep), the test result is quite good: there is no longer a huge CPU load increase when responding to a txhashset request:

(Screenshot: CPU load and network traffic after the fix, 2019-06-08)

Note: each sharp increase in network traffic corresponds to a txhashset request from a remote node.

Another benefit of this 5ms sleep is that it limits each txhashset request to a maximum of 8000 bytes / 5ms = 1.6 MB/s download speed, avoiding excessive load on the Grin server's network bandwidth. If we want to raise this bandwidth limit in the future, we can increase the 8000-byte chunk size in the caller of write_all.
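A quick sanity check of that cap (illustrative only; the 8000-byte chunk size is the value mentioned above for the caller of write_all, not a constant defined inside this function):

use std::time::Duration;

fn main() {
	let chunk_bytes = 8_000u64;            // bytes handed to write_all per chunk by the caller
	let sleep = Duration::from_millis(5);  // sleep between partial writes
	let max_rate = chunk_bytes as f64 / sleep.as_secs_f64();
	println!("max throughput ~ {:.2} MB/s", max_rate / 1_000_000.0); // prints ~1.60 MB/s
}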

@garyyu
Contributor Author

garyyu commented Aug 27, 2019

Closed by #2855

garyyu closed this as completed Aug 27, 2019