[Question]: About pre-fork model #641

Howar-sz · 2024-07-11T03:44:20Z

Hi, I'm a pyftpdlib user, and I'm looking to enhance the performance of my FTP server implementation. I came across the pre-fork model in the tutorial (https://github.com/giampaolo/pyftpdlib/blob/master/docs/tutorial.rst#pre-fork), but I'm having difficulty grasping how worker processes acquire connections. I attempted to integrate this model into unix_daemon.py, but it didn't yield any significant performance improvements.

# 50k files, 64k size, 1 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277063424 bytes transferred in 217 seconds (14.41M/s)
real	3m42.103s
user	1m25.694s
sys	0m15.360s

# 50k files, 64k size, 2 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277128960 bytes transferred in 518 seconds (6.03M/s)
real	5m0.634s
user	1m33.642s
sys	0m18.385s

# 50k files, 64k size, 4 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277260032 bytes transferred in 1123 seconds (2.78M/s)
real	5m0.588s
user	1m42.999s
sys	0m22.693s

# 50k files, 64k size, 8 parallel
Total: 5 directories, 50012 files, 0 symlinks
New: 50012 files, 0 symlinks
3277704304 bytes transferred in 1878 seconds (1.66M/s)
real	5m0.585s
user	1m26.395s
sys	0m19.545s
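
As a sanity check on the figures above, the rates can be recomputed from the byte counts. This is a minimal sketch that assumes lftp's "M/s" means MiB/s over the seconds it reports. The parallel runs report more seconds than `real` shows, so those seconds may be summed across connections (a guess, not confirmed). The wall-clock column is therefore shown only for comparison.

```python
MIB = 1024 * 1024


def mib_per_second(total_bytes, seconds):
    """Average throughput in MiB/s for total_bytes moved in the given seconds."""
    return total_bytes / seconds / MIB


# (parallel, bytes transferred, seconds reported by lftp, `real` seconds from time)
runs = [
    (1, 3277063424, 217, 3 * 60 + 42.103),
    (2, 3277128960, 518, 5 * 60 + 0.634),
    (4, 3277260032, 1123, 5 * 60 + 0.588),
    (8, 3277704304, 1878, 5 * 60 + 0.585),
]

for parallel, nbytes, reported_s, real_s in runs:
    print(
        f"P={parallel}: "
        f"{mib_per_second(nbytes, reported_s):.2f} MiB/s over reported seconds, "
        f"{mib_per_second(nbytes, real_s):.2f} MiB/s over wall-clock time"
    )
```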

Looking forward to hearing from you.

giampaolo · 2024-09-04T17:26:33Z

> I'm having difficulty grasping how worker processes acquire connections.

As far as I remember, the parent / master process "passes" every new connection to one of the workers, so this may make things slower compared to the 1 process async model. If this is true, you may have more luck changing your benchmark so that it downloads, say, 10 files of 1G each instead of 50k files of 64K each. But it's just a supposition.
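
For illustration, here is a minimal sketch of one common pre-fork arrangement, using only the standard library. It is a generic pattern and not pyftpdlib's code: the parent binds one listening socket, forks the workers, and each worker calls `accept()` on the inherited socket, so the kernel decides which worker gets each connection. Whether pyftpdlib works this way or dispatches through the parent is not confirmed here. The names `worker_loop` and `prefork_demo` are hypothetical, and the sketch assumes a Unix system where `os.fork` is available.

```python
import os
import signal
import socket


def worker_loop(listener):
    """Runs in a child process: accept on the inherited listening socket."""
    while True:
        conn, _ = listener.accept()  # blocks until the kernel hands us a connection
        with conn:
            conn.sendall(str(os.getpid()).encode())  # tell the client who served it


def prefork_demo(num_workers=2, num_clients=6):
    """Fork workers sharing one listener; return (worker_pids, serving_pids)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(16)
    addr = listener.getsockname()

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:  # child: serve until the parent terminates us
            try:
                worker_loop(listener)
            finally:
                os._exit(0)
        worker_pids.append(pid)

    serving_pids = []
    try:
        # The parent acts as the client here; it never calls accept() itself.
        for _ in range(num_clients):
            with socket.create_connection(addr, timeout=5) as client:
                data = b"".join(iter(lambda: client.recv(64), b""))
                serving_pids.append(int(data.decode()))
    finally:
        for pid in worker_pids:
            os.kill(pid, signal.SIGTERM)
            os.waitpid(pid, 0)
        listener.close()
    return worker_pids, serving_pids


if __name__ == "__main__":
    workers, served_by = prefork_demo()
    print("workers:", workers)
    print("served by:", served_by)
```

In this arrangement the parent does no per-connection work, so a busy worker does not block the others: any idle worker can pick up the next connection from the shared queue.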

Also, what are you using for your benchmarks? Is it a single client downloading the files serially, or are there multiple clients in parallel?

Note: I've never conducted benchmarks for the pre-fork model, so you're a pioneer in this sense. :)

giampaolo · 2024-09-04T17:27:15Z

PS: I see you're from Shenzhen. My wife is from there. :-)

Howar-sz · 2024-09-12T03:50:13Z

> As far as I remember, the parent / master process "passes" every new connection to one of the workers, so this may make things slower compared to the 1 process async model. If this is true, you may have more luck changing your benchmark so that it downloads, say, 10 files of 1G each instead of 50k files of 64K each. But it's just a supposition.

Does "passes" mean that every new FTP connection has to be allocated by the parent/master process? If a worker subprocess is busy, does the parent process wait for it?

> Also, what are you using for your benchmarks? Is it only one client downloading the file serially or there's multiple clients in parallel?

It's an upload test. I use lftp with the arguments -e "mirror -R -c -P <parallel>" as my benchmark tool. As far as I understand, lftp opens multiple FTP connections in parallel when the parallel argument is greater than one.
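
To make the benchmark easier to reproduce, the lftp invocation can be assembled like this. It is a sketch under assumptions: the host, port, and directory names are placeholders that do not come from this thread, the paths are assumed to contain no spaces or quotes, and `lftp_mirror_argv` is a hypothetical helper. The `mirror` flags are the ones quoted above. The trailing `quit` makes lftp exit once the mirror finishes.

```python
def lftp_mirror_argv(host, local_dir, remote_dir, parallel, port=21):
    """Build the argv for an lftp reverse mirror (upload) with N parallel transfers."""
    # mirror -R uploads local_dir, -c resumes partial transfers, -P sets parallelism.
    script = "mirror -R -c -P {} {} {}; quit".format(
        int(parallel), local_dir, remote_dir
    )
    return ["lftp", "-p", str(port), "-e", script, host]


if __name__ == "__main__":
    # Placeholder host, port, and paths; replace with the real test setup.
    for parallel in (1, 2, 4, 8):
        argv = lftp_mirror_argv("ftp.example.test", "./data", "/upload", parallel, port=2121)
        print(argv)
        # To run it for real: subprocess.run(argv, check=True)
```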
