[argobots-discuss] question about stack reuse on dedicated pools

Fri Feb 23 13:13:21 CST 2018

Ok, I think I'm on the trail now.

The default maximum number of stacks (how many Argobots will queue for 
reuse on an ES) is 65536.  You can override this with "export 
ABT_MEM_MAX_NUM_STACKS=64".  If I do that, then my memory consumption 
stays down, and I see that the stacks are overflowing to a global pool 
where they (presumably) are available for reuse by the ES that is 
launching ULTs in my case.

Anecdotally, the default limit of 65536 looks like it is wasting around 
500 MiB of memory *per ES* that we create in our usage scenario, since 
the stacks can't be reused from the local pools of the ESs that are 
executing our ULTs.

Is the moral of the story that we should set ABT_MEM_MAX_NUM_STACKS much 
lower in our use case?  I'm going to see if I can do that automatically 
from the library that we are using to set up this scenario (Margo).

thanks,
-Phil

On 02/23/2018 02:01 PM, Phil Carns wrote:
> Is the issue that when a ULT is created it draws from the stack queue 
> on the calling ES, but when the ULT is free'd it is put on the stack 
> queue of the ES that executed it?
>
> In the example I have that is consuming more memory than expected, the 
> ULTs are always dispatched by one specific ES (the one that is driving 
> progress on the network transport) and executed on a different ES (the 
> dedicated pool set aside for servicing I/O operations).
>
> thanks,
> -Phil
>
> On 02/23/2018 01:09 PM, Phil Carns wrote:
>> Hi all,
>>
>> I'm trying to debug a problem with a service that spawns a new ULT for 
>> each incoming client request.  The service works correctly, but 
>> memory consumption is considerably higher over time if I create a 
>> dedicated pool with its own execution streams to run the ULTs 
>> instead of just running them on the main ES.
>>
>> There are some details here: 
>> https://xgitlab.cels.anl.gov/sds/margo/issues/40
>>
>> That specific test example is creating a total of 40,000 ULTs over 
>> the course of execution.  It doesn't have that many active 
>> concurrently, though.  The client program is issuing no more than 4 
>> concurrent requests.  A ULT will slightly outlive the request 
>> handler, but the number of active ULTs isn't growing indefinitely.  I 
>> can confirm that Argobots is freeing them throughout execution if I 
>> turn on Argobots logging.
>>
>> I can see in the Argobots source code that when a ULT is freed it 
>> doesn't literally free() the stack memory for that ULT; it is kept on 
>> a queue:
>>
>> https://github.com/pmodels/argobots/blob/master/src/include/abti_mem.h#L287 
>>
>>
>> If I print the value of num_stacks in Argobots at the above line, I 
>> get the following:
>>
>> - when using the default/main ES: no higher than 512
>>
>> - when explicitly creating a new pool and ES: 40,000
>>
>> I haven't tracked down the logic to understand why there is a 
>> difference here, but is this expected?  It looks like Argobots is 
>> keeping a much larger number of stacks around (without reusing them?) 
>> if I create a new pool and ES to run my ULTs.
>>
>> thanks,
>>
>> -Phil
>>
>> _______________________________________________
>> discuss mailing list
>> discuss at lists.argobots.org
>> https://lists.argobots.org/mailman/listinfo/discuss