Limits on Nursery Size #527

Miserlou · 2018-05-10T15:07:00Z

I'd like to use Trio for a simple scraping project. Let's say I have a million URLs, but I only have the resources on my box to handle 15 connections at a time. Is there a way to nursery to hold off on executing the rest of the jobs until the current 15 being executed have finished?

njsmith · 2018-05-10T16:10:32Z

Nurseries don't have this built in, no. Two ways that come to mind:

start 15 tasks, and have them each loop, pulling items off your list of 1,000,000 urls.
start 1,000,000 tasks, and have them all use a CapacityLimiter so that only 15 are actually making a request at any given time.

The second one unfortunately will use a lot more memory, because tasks in trio are pretty cheap but still do take some memory, more than a simpler string sitting in a list.

If you want to get fancier, it would also be possible to implement a reusable capacity-limited nursery class... We should think about adding that to docs as an example, or something. (I guess the semantic subtlety would be what to do with the queued tasks if there's an unhandled error – normally trio guarantees that after you call start_soon the task will run at least until the first await, even if it's cancelled, which is a useful guarantee because it gives with blocks and such a chance to run and clean up resources that might have been passed into the task. But then, maybe if you're using a special object whose whole point is to avoid allocating too many resources at once, though, then you shouldn't be allocating resources until the task actually starts.)

ric2b · 2018-05-16T10:15:52Z

I was also wondering about this.

I'm doing a performance benchmark of a web service, I want to measure the rate at which it accepts messages under sustained load.

I tried the second solution with the CapacityLimiter using the "asks" library but I easily run out of file descriptors (sockets, most likely) and the script crashes with errno 24: too many open files unless I set a really low limit like 4. (ulimit -n returns 1024 but even a limit of 8 is enough to cause the issue frequently)

Maybe I'm doing something wrong, my code is more or less this:

asks.init('trio')

async def send_message(limit):    
    async with limit:
        response = await asks.post(URL,  data=DATA)
    print(f'data: {DATA}, response status: {response.status_code}')

async def send_n_messages(n):
    limit = trio.CapacityLimiter(4)

    async with trio.open_nursery() as nursery:
        for i in range(n):
            nursery.start_soon(send_message, limit)       

trio.run(send_n_messages, 35000)

Because I want to make sure that the script isn't a bottleneck, I'd like to have a much larger number of concurrent connections, like a few hundreds.

Using the CapacityLimiter seems like the cleaner solution, so I'd prefer to keep it instead of adding a task queue, any ideas?

njsmith · 2018-05-16T11:05:27Z

Running out of sockets with a limit of 4 seems weird... Maybe you're keeping the socket open past the end of the CapacityLimiter block? Can you close the response object or something?

Another thing to watch out for is: https://asks.readthedocs.io/en/latest/a-look-at-sessions.html#important-connection-un-limiting
But I think that only matters if you're using an explicit Session object, which this code isn't.

ric2b · 2018-05-16T13:53:16Z

I think you're correct that the sockets are being kept open for longer than they should, but there seems to be no way to explicitly close them. They are probably closed/cleaned up automatically after each send_message terminates but not quickly enough to not cause problems.

I was able to solve it by using an asks Session with a few hundred connections, that way I don't even need to use a CapacityLimiter and I don't run out of file descriptors/sockets, so thanks for the link! :) (I just started experimenting with trio and asks yesterday, but so far I'm absolutely loving it!)

For anyone that ends up here with the same problem, this is what I did:

asks.init('trio')

async def send_message(session):    
    response = await session.post(URL,  data=DATA)
    print(f'data: {DATA}, response status: {response.status_code}')

async def send_n_messages(n):
    session = asks.Session(connections=200)

    async with trio.open_nursery() as nursery:
        for i in range(n):
            nursery.start_soon(send_message, session)       

trio.run(send_n_messages, 35000)

theelous3 · 2018-05-17T14:01:40Z

This is on asks' end, and a fix will be pushed this evening :)

@njsmith was correct. The base methods each create a Session. Currently the code relies on python to clean up the sockets, but evidently this may not be fast enough. I'll start force closing them. Using the Session is the correct way to go anyway, and doesn't have this issue :D Thanks for the feedback guys.

ric2b · 2018-05-17T15:09:21Z

That's great!
I'm really loving the trio + asks combination, it makes large amounts of concurrent http requests so effortless and readable!

Thanks to everyone contributing to them :) (hopefully I will too, eventually)

bronger · 2023-12-31T16:08:57Z

The following seems to work, but I would appreciate any confirmation: I use a CapacityLimiter in the task function, and start it in the nursery with nursery.start instead of nursery.start_soon. Moreover, the .started() method is called within the async with my_capacity_limiter: in the task function.

This way, I want to use the capacity limiter for both, limiting the number of concurrent tasks as well as the size of the nursery. Is this effective?

njsmith mentioned this issue May 10, 2018

Add collection of worked examples to tutorial #472

Open

ric2b mentioned this issue May 16, 2018

Clarify in the documentation that sessions should be used whenever doing large numbers of requests theelous3/asks#67

Closed

njsmith closed this as completed Jul 3, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Limits on Nursery Size #527

Limits on Nursery Size #527

Miserlou commented May 10, 2018

njsmith commented May 10, 2018

ric2b commented May 16, 2018 •

edited

Loading

njsmith commented May 16, 2018

ric2b commented May 16, 2018 •

edited

Loading

theelous3 commented May 17, 2018 •

edited

Loading

ric2b commented May 17, 2018

bronger commented Dec 31, 2023 •

edited

Loading

Limits on Nursery Size #527

Limits on Nursery Size #527

Comments

Miserlou commented May 10, 2018

njsmith commented May 10, 2018

ric2b commented May 16, 2018 • edited Loading

njsmith commented May 16, 2018

ric2b commented May 16, 2018 • edited Loading

theelous3 commented May 17, 2018 • edited Loading

ric2b commented May 17, 2018

bronger commented Dec 31, 2023 • edited Loading

ric2b commented May 16, 2018 •

edited

Loading

ric2b commented May 16, 2018 •

edited

Loading

theelous3 commented May 17, 2018 •

edited

Loading

bronger commented Dec 31, 2023 •

edited

Loading