[argobots-discuss] question about stack reuse on dedicated pools
Phil Carns
carns at mcs.anl.gov
Fri Feb 23 15:00:21 CST 2018
Thanks Halim. For now I've changed our library to set that value all
the way down to 8 in our most common initialization path.
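For illustration, a library could apply such a default roughly as follows. This is only a sketch and not necessarily how Margo does it: the helper name set_default_max_num_stacks is hypothetical, and it assumes Argobots reads ABT_MEM_MAX_NUM_STACKS from the environment during ABT_init(), so the helper would have to run before that call.

```c
#define _POSIX_C_SOURCE 200112L /* expose setenv() under strict C modes */
#include <stdlib.h>

/* Hypothetical helper (not part of Argobots or Margo): apply a library
 * default for ABT_MEM_MAX_NUM_STACKS without overriding a value that the
 * user has already exported. Returns 0 on success, -1 on failure. */
static int set_default_max_num_stacks(const char *value)
{
    /* overwrite == 0 leaves an existing setting untouched */
    return setenv("ABT_MEM_MAX_NUM_STACKS", value, 0);
}

/* Assumed call order in the library's init path:
 *     set_default_max_num_stacks("8");
 *     ABT_init(argc, argv);
 */
```

Passing 0 as the overwrite flag keeps a manual "export ABT_MEM_MAX_NUM_STACKS=64" in effect for anyone who wants to tune the value themselves.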
Let me ask a sanity check question though:
Once it became clear what was happening, I assumed we were probably
triggering the same scenario in our abt-io
(https://xgitlab.cels.anl.gov/sds/abt-io) library. It sets aside a pool
to service I/O operations that will likewise always be consuming work
and never producing it.
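The scenario described further down in this thread (stacks drawn on the creating ES, returned to a cache on the executing ES, overflowing to a global pool once a per-ES cap is hit) can be modeled with a few counters. This is a toy model under those assumptions, not Argobots code: the struct, the function names, and the "one ULT in flight at a time" simplification are all hypothetical.

```c
#include <stddef.h>

/* Toy model of per-ES stack caching; NOT the Argobots implementation. */
typedef struct {
    size_t local_cached[2]; /* cached stacks per ES: [0] producer, [1] consumer */
    size_t global_cached;   /* stacks that overflowed to the shared pool */
    size_t allocated;       /* stacks obtained fresh from the allocator */
    size_t max_local;       /* per-ES cap, in the role of ABT_MEM_MAX_NUM_STACKS */
} stack_model;

/* ES `es` creates a ULT: reuse a local stack, then a global one, else allocate. */
static void model_create(stack_model *m, int es)
{
    if (m->local_cached[es] > 0)
        m->local_cached[es]--;
    else if (m->global_cached > 0)
        m->global_cached--;
    else
        m->allocated++;
}

/* ES `es` finishes a ULT: cache the stack locally unless the cap is reached. */
static void model_free(stack_model *m, int es)
{
    if (m->local_cached[es] < m->max_local)
        m->local_cached[es]++;
    else
        m->global_cached++;
}

/* Run `n` ULTs that are created on ES 0 and executed (then freed) on ES 1. */
static stack_model model_run(size_t max_local, size_t n)
{
    stack_model m = { {0, 0}, 0, 0, max_local };
    for (size_t i = 0; i < n; i++) {
        model_create(&m, 0);
        model_free(&m, 1);
    }
    return m;
}
```

In this model, model_run(65536, 40000) allocates a fresh stack for every ULT and leaves all 40,000 parked on the consumer, while model_run(64, 40000) stops allocating once the consumer's cache is full and recycles stacks through the global pool from then on.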
I checked abt-io with a clear-cut test case that should have triggered
this, and it's fine, though. The memory usage is very low.
One difference in abt-io is that it is using tasklets rather than
threads in the consumer pool, because in that case we never need the
functions to yield. They always run to completion once started.
Are tasklets immune to this particular problem?
thanks,
-Phil
On 02/23/2018 02:35 PM, Halim Amer wrote:
> Phil,
>
> Yes, your understanding is correct. Single-producer-multiple-consumer
> cases trigger this issue. I had another user complain about this and I
> advised him to tune the ABT_MEM_MAX_NUM_STACKS for his needs. I was
> considering lowering the default value for a better performance vs.
> memory footprint tradeoff. Since this is the second time the issue is
> reported, I think it's time to raise its priority. We will do some
> performance testing on our end and update the default value.
>
> Thanks for reporting.
>
> Halim
> http://www.mcs.anl.gov/~aamer
>
> On 2/23/18 1:13 PM, Phil Carns wrote:
>> Ok, I think I'm on the trail now.
>>
>> The default num stacks (how many it will queue for reuse on an ES) in
>> Argobots is 65536. You can override this with "export
>> ABT_MEM_MAX_NUM_STACKS=64". If I do that, then my memory consumption
>> stays down and I see that the stacks are overflowing to a global pool
>> where they (presumably) are available for reuse by the ES that is
>> launching ULTs in my case.
>>
>> Anecdotally, the default limit of 65536 looks like it is wasting
>> around 500 MiB of memory *per ES* that we create in our usage
>> scenario, since the stacks can't be reused from the local pool of the
>> ESs that are executing our ULTs.
>>
>> Is the moral of the story that we should set ABT_MEM_MAX_NUM_STACKS
>> much lower in our use case? I'm going to see if I can do that
>> automatically from the library that we are using to set up this
>> scenario (Margo).
>>
>> thanks,
>> -Phil
>>
>> On 02/23/2018 02:01 PM, Phil Carns wrote:
>>> Is the issue that when a ULT is created it draws from the stack
>>> queue on the calling ES, but when the ULT is free'd it is put on the
>>> stack queue of the ES that executed it?
>>>
>>> In the example I have that is consuming more memory than expected,
>>> the ULTs are always dispatched by one specific ES (the one that is
>>> driving progress on the network transport) and executed on a
>>> different ES (the dedicated pool set aside for servicing I/O
>>> operations).
>>>
>>> thanks,
>>> -Phil
>>>
>>> On 02/23/2018 01:09 PM, Phil Carns wrote:
>>>> Hi all,
>>>>
>>>> I'm trying to debug a problem with a service that spawns a new ULT
>>>> for each incoming client request. The service works correctly, but
>>>> memory consumption is considerably higher over time if I create a
>>>> dedicated pool with its own execution streams to run the ULTs
>>>> instead of just running them on the main ES.
>>>>
>>>> There are some details here:
>>>> https://xgitlab.cels.anl.gov/sds/margo/issues/40
>>>>
>>>> That specific test example is creating a total of 40,000 ULTs over
>>>> the course of execution. It doesn't have that many active
>>>> concurrently, though. The client program is issuing no more than 4
>>>> concurrent requests. A ULT will slightly outlive the request
>>>> handler, but the number of active ULTs isn't growing indefinitely.
>>>> I can confirm that Argobots is freeing them throughout execution if
>>>> I turn on Argobots logging.
>>>>
>>>> I can see in the Argobots source code that when a ULT is freed it
>>>> doesn't literally free() the stack memory for that ULT; it is kept
>>>> on a queue:
>>>>
>>>> https://github.com/pmodels/argobots/blob/master/src/include/abti_mem.h#L287
>>>>
>>>>
>>>> If I print the value of num_stacks in Argobots at the above line, I
>>>> get the following:
>>>>
>>>> - when using the default/main ES: no higher than 512
>>>>
>>>> - when explicitly creating a new pool and ES: 40,000
>>>>
>>>> I haven't tracked down the logic to understand why there is a
>>>> difference here, but is this expected? It looks like Argobots is
>>>> keeping a much larger number of stacks around (without reusing
>>>> them?) if I create a new pool and ES to run my ULTs.
>>>>
>>>> thanks,
>>>>
>>>> -Phil
>>>>
>>>> _______________________________________________
>>>> discuss mailing list
>>>> discuss at lists.argobots.org
>>>> https://lists.argobots.org/mailman/listinfo/discuss