[argobots-discuss] scheduling with priority for resumed ULTs

Carns, Philip H. carns at mcs.anl.gov
Wed Jul 8 07:59:22 CDT 2020

That's interesting.

For us the issue of how to block on two pools would be a problem.  I don't think we have any application-specific rules that would help; either pool could receive new work while the scheduler is blocked on a pop.

Something along the lines of #154, which would allow some control over what happens within the pool data structure, would be helpful, but it's not a high priority.

In the meantime (since our use case is so simple) I wonder if we could do something within the confines of the current pool interface.  The linked list pointers are not exposed to the caller (right?), so nothing is stopping a pool from maintaining multiple linked lists internally if it wants to.  Multiple work unit queues within a single pool could share a single internal condition/signalling mechanism for blocking pop calls.

For that to work, though, the pool implementation would need to be able to inspect the ABT_unit at push() time and tell whether it is a newly created thread or a resumed thread so that it could track them separately.

Is there any way to do that?  It might not be a great idea, from a software engineering perspective, for a pool to dig too deep into the unit or thread data structures, but if there were something in there that could indicate whether a thread had ever run, we could hack it as a proof of concept to see if it makes a performance difference before spending time on something more invasive.


From: Iwasaki, Shintaro <siwasaki at anl.gov>
Sent: Tuesday, July 7, 2020 6:44 PM
To: discuss at lists.argobots.org <discuss at lists.argobots.org>
Cc: Carns, Philip H. <carns at mcs.anl.gov>
Subject: Re: scheduling with priority for resumed ULTs

Hello, Phil,

Thank you for your excellent question.  The current Argobots does not provide a very straightforward way to do this.

1. The simplest idea

In my opinion, the easiest way is to use two pools, new_thread_pool and old_thread_pool.
New threads/tasklets are pushed to one of the new-thread-pools.  The user-defined scheduler looks like the following:

  while (1) {
    ABT_unit unit;
    ABT_pool_pop(old_thread_pool, &unit);
    if (unit != ABT_UNIT_NULL) {
      /* Prioritize resumed/yielded threads. */
      ABT_xstream_run_unit(unit, old_thread_pool);
    } else {
      ABT_pool_pop(new_thread_pool, &unit);
      if (unit != ABT_UNIT_NULL) {
        ABT_unit_set_associated_pool(unit, old_thread_pool);
        /* The thread is moved to old_thread_pool, so if this "unit" suspends
         * or yields, it is pushed to old_thread_pool, which is prioritized
         * over new threads. */
        ABT_xstream_run_unit(unit, old_thread_pool);
      }
    }
  }

However, this scheduler can deadlock given certain dependencies.  In the following example, thread2 is never scheduled because thread1 sits in old_thread_pool, which the scheduler always drains first.

g_flag = 0;
void thread1() {
  /* The newly created thread is pushed to new_thread_pool. */
  ABT_thread_create(thread2, ... new_thread_pool);
  while (g_flag == 0)
    ABT_thread_yield(); /* thread1 was associated with old_thread_pool when it
                         * was scheduled for the first time, so it yields back
                         * into old_thread_pool. */
}
void thread2() {
  g_flag = 1;
}

To avoid this, the scheduler can sometimes check and run threads in new_thread_pool (for example, every N iterations).

2. Does it work with ABT_pool_pop_timedwait() (i.e., ABT_POOL_FIFO_WAIT)?

ABT_pool_pop_timedwait() takes only a single pool; users cannot timed-wait on multiple pools.  Suppose we use ABT_pool_pop_timedwait() instead of ABT_pool_pop() in the scheduler above: in general, the scheduler can then timed-wait (i.e., sleep) on either old_thread_pool or new_thread_pool even though the other pool has threads.  If there is application-specific knowledge (e.g., old_thread_pool can be empty only when new_thread_pool is also empty), ABT_pool_pop_timedwait() plus the scheduling strategy above is a good idea, though.

For now, there is no general solution.  One idea is using more execution streams: some ESs are dedicated to new-thread-pool while the other ESs to old-thread-pool.  If they sleep in ABT_pool_pop_timedwait(), the performance penalty of oversubscription etc should be small.

Creating a customized pool is another way (e.g., marking a thread when it is scheduled for the first time and managing newly created threads and suspended threads separately within the pool), but it is complicated.

The fundamental solution is to allow different pool operations for yield/create/suspend/... (e.g., push to the head of the list on creation but to the tail on suspension: https://github.com/pmodels/argobots/issues/154), but that is under development.  If this option is the most promising, I will prioritize it.

If you have any questions, please let us know.


From: Carns, Philip H. via discuss <discuss at lists.argobots.org>
Sent: Tuesday, July 7, 2020 4:40 PM
To: discuss at lists.argobots.org <discuss at lists.argobots.org>
Cc: Carns, Philip H. <carns at mcs.anl.gov>
Subject: [argobots-discuss] scheduling with priority for resumed ULTs

Hi all,

I thought this question may be of general interest so I am asking on the mailing list.

My understanding is that the default pool/scheduler combination uses FIFO ordering.  Suppose we wanted to try a slight variation: FIFO ordering, but with resumed ULTs always taking priority over new ULTs that have not yet begun execution.

The use case for this would be for a data service to expedite requests that are already in progress (and were suspended while waiting on disk or network activity) to try to get them out of the system before starting to process new requests, assuming that there is work available in either category.  We create a new ULT for every incoming request.  Under heavy client load it is plausible that the final step(s) of servicing an existing request could be delayed behind newly incoming requests, but we haven't empirically confirmed this yet.

What would be the easiest way to accomplish this?  I think I can find a way to do it, but it probably would not be the cleverest solution 🙂

FWIW we usually use ABT_POOL_FIFO_WAIT and ABT_SCHED_BASIC_WAIT rather than the default pool and scheduler, but I don't think that should change anything.  They are based on the default pool and scheduler and only differ in terms of their idle behavior.
