[argobots-discuss] scheduling with priority for resumed ULTs

Iwasaki, Shintaro siwasaki at anl.gov
Wed Jul 8 09:20:05 CDT 2020

Hello Phil,

Thank you. I understand the situation better now. As far as I can tell, all of the following options can be implemented in the current Argobots. Some are less invasive, while others are easier to implement.

- Single execution stream + one pool + ABT_pool_pop_timedwait()
> For that to work, though, the pool implementation would need to be able to inspect the ABT_unit at push() time and tell whether it is a newly created thread or a resumed thread so that it could track them separately.

Without https://github.com/pmodels/argobots/issues/154, one can use a flag stored in a user-created descriptor corresponding to the thread to check whether that thread has already been executed.  A hash table is a general solution, but it would be heavy. In some applications, such a descriptor can be obtained via ABT_thread_get_arg().

Another way is to use a ULT-specific value (e.g., ABT_thread_get_specific()) to manage such a flag.  A quick hack is using `ABT_thread_set_arg()` and `ABT_thread_get_arg()` to manage an execution flag, which may be faster than ABT_thread_set_specific() and ABT_thread_get_specific() in the current Argobots implementation (related to https://github.com/pmodels/argobots/issues/159).

This idea is less invasive, but implementing a correct and reasonably scalable pool with flag management might not be an easy task.
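
To make the flag idea concrete, here is a minimal single-threaded model in plain C. It does not call Argobots at all: req_desc, prio_pool, and the function names are hypothetical, and in a real pool the descriptor would be looked up from the unit (for example via ABT_thread_get_arg(), as mentioned above). The sketch only shows the classification at push() time and the priority at pop() time; it omits the locking that a correct and scalable pool would need.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-request descriptor. In Argobots it could be the pointer
 * passed as the ULT argument; here it is just a plain struct. */
typedef struct req_desc {
    int has_run;            /* 0 until the unit is handed out for the first time */
    struct req_desc *next;  /* intrusive link used by the lists below */
} req_desc;

typedef struct { req_desc *head, *tail; } desc_fifo;

/* One pool that keeps two internal FIFO lists. */
typedef struct { desc_fifo resumed, fresh; } prio_pool;

static void desc_fifo_push(desc_fifo *q, req_desc *d) {
    d->next = NULL;
    if (q->tail) q->tail->next = d; else q->head = d;
    q->tail = d;
}

static req_desc *desc_fifo_pop(desc_fifo *q) {
    req_desc *d = q->head;
    if (d) {
        q->head = d->next;
        if (!q->head) q->tail = NULL;
        d->next = NULL;
    }
    return d;
}

/* push(): classify the unit by its flag. */
static void prio_pool_push(prio_pool *p, req_desc *d) {
    desc_fifo_push(d->has_run ? &p->resumed : &p->fresh, d);
}

/* pop(): resumed units first; mark the unit as executed when it is handed out,
 * so that a later push() (yield/resume) lands in the resumed list. */
static req_desc *prio_pool_pop(prio_pool *p) {
    req_desc *d = desc_fifo_pop(&p->resumed);
    if (!d) d = desc_fifo_pop(&p->fresh);
    if (d) d->has_run = 1;
    return d;
}
```

Within each list the order stays FIFO, so the only change from the default behavior is that a unit that has run at least once overtakes units that have never run.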

- Single execution stream + multiple pools + ABT_pool_pop_timedwait()

Even if you use two pools (for example, with the scheduler I suggested in the previous mail), it should work well if these two pools (old-thread-pool and new-thread-pool) share the same Pthreads mutex/condition variable. The change to the pool implementation can be minimal.
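
The shared mutex/condition variable idea can be sketched in plain C with Pthreads as follows. This is only a model under my own assumptions, not Argobots pool code: pool_pair, wnode, and the function names are hypothetical. The point is that a push to either queue signals the same condition variable, so a timed wait cannot sleep while the other queue has work.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

typedef struct wnode { struct wnode *next; } wnode;
typedef struct { wnode *head, *tail; } wq;

/* Two queues guarded by ONE mutex and ONE condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    wq old_q, new_q;
} pool_pair;

static void pool_pair_init(pool_pair *p) {
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->nonempty, NULL);
    p->old_q.head = p->old_q.tail = NULL;
    p->new_q.head = p->new_q.tail = NULL;
}

static wnode *wq_pop(wq *q) {
    wnode *n = q->head;
    if (n) {
        q->head = n->next;
        if (!q->head) q->tail = NULL;
    }
    return n;
}

/* is_old != 0 selects the queue for resumed/yielded work. */
static void pool_pair_push(pool_pair *p, wnode *n, int is_old) {
    wq *q = is_old ? &p->old_q : &p->new_q;
    pthread_mutex_lock(&p->lock);
    n->next = NULL;
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_cond_signal(&p->nonempty); /* wakes a waiter for either queue */
    pthread_mutex_unlock(&p->lock);
}

/* Wait up to timeout_ms for work in either queue; old_q has priority. */
static wnode *pool_pair_pop_timedwait(pool_pair *p, long timeout_ms) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }
    pthread_mutex_lock(&p->lock);
    while (!p->old_q.head && !p->new_q.head) {
        if (pthread_cond_timedwait(&p->nonempty, &p->lock, &deadline) == ETIMEDOUT)
            break;
    }
    wnode *n = wq_pop(&p->old_q);
    if (!n) n = wq_pop(&p->new_q);
    pthread_mutex_unlock(&p->lock);
    return n;
}
```

A single lock for both queues is the simplest way to keep the emptiness check and the wait atomic; it is probably not the most scalable design.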

- Multiple execution streams + each has one pool + ABT_pool_pop_timedwait()

The easiest way that does not change the pool implementation is to use multiple execution streams: some for newly created threads (these execution streams only check new-thread-pool) and the others for suspended threads (these execution streams only check old-thread-pool). The oversubscription cost should not be very high if these execution streams go to sleep immediately when no work is available. This does not require changing the pool implementation, but it is very invasive for the application.

I would also like to note that, at present, there is no good example of a custom pool implementation in Argobots, so it is hard to show how to implement one in a reasonably scalable way.  I will add a reasonable example in one to two days, which might be helpful.


From: Carns, Philip H. <carns at mcs.anl.gov>
Sent: Wednesday, July 8, 2020 7:59 AM
To: Iwasaki, Shintaro <siwasaki at anl.gov>; discuss at lists.argobots.org <discuss at lists.argobots.org>
Subject: Re: scheduling with priority for resumed ULTs

That's interesting.

For us the issue of how to block on two pools would be a problem.  I don't think we have any application-specific rules that would help; either pool could receive new work while the scheduler is blocked on a pop.

Something along the lines of #154 that would allow some control over what happens within the pool data structure would be helpful, but it's not a high priority.

In the meantime (since our use case is so simple) I wonder if we could do something within the confines of the current pool interface.  The linked list pointers are not exposed to the caller (right?), so nothing is stopping a pool from maintaining multiple linked lists internally if it wants to.  Multiple work unit queues within a single pool could share a single internal condition/signalling mechanism for blocking pop calls.

For that to work, though, the pool implementation would need to be able to inspect the ABT_unit at push() time and tell whether it is a newly created thread or a resumed thread so that it could track them separately.

Is there any way to do that?  It might not be a great idea from a software engineering perspective for a pool to dig too deep into the unit or thread data structures, but if there were something in there that could indicate if a thread had ever been run or not, then we could hack it as a proof of concept to see if it makes a performance difference before spending time on something more invasive.


From: Iwasaki, Shintaro <siwasaki at anl.gov>
Sent: Tuesday, July 7, 2020 6:44 PM
To: discuss at lists.argobots.org <discuss at lists.argobots.org>
Cc: Carns, Philip H. <carns at mcs.anl.gov>
Subject: Re: scheduling with priority for resumed ULTs

Hello, Phil,

Thank you for your excellent question.  The current Argobots does not provide a very straightforward way to do this.

1. The simplest idea

In my opinion, the easiest way is to use two pools, new-thread-pool and old-thread-pool.
New threads/tasklets are pushed to a new-thread-pool.  The user-defined scheduler looks like the following:

  while (1) {
    ABT_unit unit;
    ABT_pool_pop(old_thread_pool, &unit);
    if (unit != ABT_UNIT_NULL) {
      /* Prioritize resumed/yielded threads */
      ABT_xstream_run_unit(unit, old_thread_pool);
      continue;
    }
    ABT_pool_pop(new_thread_pool, &unit);
    if (unit != ABT_UNIT_NULL) {
      ABT_unit_set_associated_pool(unit, old_thread_pool);
      /* Threads are moved to old_thread_pool, so if this "unit" suspends or yields, it is
       * pushed to old_thread_pool, which will be prioritized over new threads. */
      ABT_xstream_run_unit(unit, old_thread_pool);
    }
  }

However, this scheduler may cause a deadlock with certain dependencies.  In the following example, thread2 is never scheduled because thread1 is always in old_thread_pool.

int g_flag = 0;
void thread1() {
  ABT_thread_create(thread2, ... new_thread_pool); /* newly created thread is pushed to new_thread_pool */
  while (g_flag == 0)
    ABT_thread_yield(); /* thread1 was associated with old_thread_pool when thread1 was scheduled for the first time. */
}
void thread2() {
  g_flag = 1;
}

To avoid this, the scheduler can sometimes check and run threads in new_thread_pool (for example, every N iterations).
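
One possible shape for the "every N iterations" rule is sketched below as a standalone decision function in plain C. The names (choose_pool, enum pick) and the parameters are hypothetical; the scheduler loop would call this to decide which pool to pop from first and then run the unit as in the loop above.

```c
#include <assert.h>

enum pick { PICK_NONE, PICK_OLD, PICK_NEW };

/* Decide which pool to serve on scheduler iteration `iter`.
 * old_ready / new_ready: nonzero if the corresponding pool has a unit.
 * n: every n-th iteration, new_thread_pool gets the first chance (n >= 1). */
static enum pick choose_pool(unsigned long iter, int old_ready, int new_ready,
                             unsigned long n) {
    int new_first = (iter % n) == n - 1;
    if (new_first && new_ready)
        return PICK_NEW;   /* periodic anti-starvation check */
    if (old_ready)
        return PICK_OLD;   /* normal case: resumed/yielded threads first */
    if (new_ready)
        return PICK_NEW;   /* old pool is empty, so start new work */
    return PICK_NONE;      /* nothing to run */
}
```

With n = 4, for example, iterations 3, 7, 11, ... try new_thread_pool first, so thread2 in the example above would eventually run even though thread1 keeps yielding. A larger n gives resumed threads stronger priority at the cost of a longer worst-case wait for new threads.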

2. Does it work with ABT_pool_pop_timedwait() (i.e., ABT_POOL_FIFO_WAIT)?

ABT_pool_pop_timedwait() only takes a single pool; users cannot timed-wait on multiple pools.  Consider using ABT_pool_pop_timedwait() instead of ABT_pool_pop() in the scheduler I mentioned above.  In general, the scheduler can end up timed-waiting (= sleeping) on either old_thread_pool or new_thread_pool even though the other pool has threads.  If there is application-specific knowledge (e.g., old_thread_pool can be empty only when new_thread_pool is empty), ABT_pool_pop_timedwait() + the scheduling strategy above is a good idea, though.

For now, there is no general solution.  One idea is using more execution streams: some ESs are dedicated to new-thread-pool while the other ESs to old-thread-pool.  If they sleep in ABT_pool_pop_timedwait(), the performance penalty of oversubscription etc should be small.

Creating a customized pool is another way (e.g., marking a thread when it is scheduled for the first time and managing newly created threads and suspended threads separately in a pool), but it is complicated.

The fundamental solution should be to allow different pool operations for yield/create/suspend/... (e.g., pushing to the head of the list on creation but to the tail of the list on suspension: https://github.com/pmodels/argobots/issues/154), but this is still under development.  If this option is the most promising, I will prioritize it.
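
To illustrate what such context-dependent pool operations could look like, here is a small model in plain C. It is not the design of issue #154 and not an existing Argobots interface: the push_reason enum, ctx_pool, and the function names are all hypothetical. It implements exactly the example policy above (head on creation, tail otherwise).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reason codes; not part of the current Argobots API. */
typedef enum { PUSH_ON_CREATE, PUSH_ON_YIELD, PUSH_ON_SUSPEND } push_reason;

typedef struct cunit { struct cunit *next; } cunit;
typedef struct { cunit *head, *tail; } ctx_pool;

static void push_head(ctx_pool *q, cunit *u) {
    u->next = q->head;
    q->head = u;
    if (!q->tail) q->tail = u;
}

static void push_tail(ctx_pool *q, cunit *u) {
    u->next = NULL;
    if (q->tail) q->tail->next = u; else q->head = u;
    q->tail = u;
}

/* The pool is told WHY the unit is being pushed and picks the end of the
 * list accordingly: head on creation, tail on yield/suspension. */
static void ctx_pool_push(ctx_pool *q, cunit *u, push_reason why) {
    if (why == PUSH_ON_CREATE)
        push_head(q, u);
    else
        push_tail(q, u);
}

/* Units are always taken from the head. */
static cunit *ctx_pool_pop(ctx_pool *q) {
    cunit *u = q->head;
    if (u) {
        q->head = u->next;
        if (!q->head) q->tail = NULL;
    }
    return u;
}
```

Swapping the two branches in ctx_pool_push() would give the opposite policy, so the same mechanism could also express priority for resumed threads.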

If you have any questions, please let us know.


From: Carns, Philip H. via discuss <discuss at lists.argobots.org>
Sent: Tuesday, July 7, 2020 4:40 PM
To: discuss at lists.argobots.org <discuss at lists.argobots.org>
Cc: Carns, Philip H. <carns at mcs.anl.gov>
Subject: [argobots-discuss] scheduling with priority for resumed ULTs

Hi all,

I thought this question may be of general interest so I am asking on the mailing list.

My understanding is that the default pool/scheduler combination uses FIFO ordering.  Suppose we wanted to try a slight variation: FIFO ordering, but with resumed ULTs always taking priority over new ULTs that have not yet begun execution.

The use case for this would be for a data service to expedite requests that are already in progress (and were suspended while waiting on disk or network activity) to try to get them out of the system before starting to process new requests, assuming that there is work available in either category.  We create a new ULT for every incoming request.  Under heavy client process load it is plausible that the final step(s) of servicing an existing request could be delayed behind newly incoming requests, but we haven't confirmed this empirically yet.

What would be the easiest way to accomplish this?  I think I can find a way to do it, but it probably would not be the cleverest solution 🙂

FWIW we usually use ABT_POOL_FIFO_WAIT and ABT_SCHED_BASIC_WAIT rather than the default pool and scheduler, but I don't think that should change anything.  They are based on the default pool and scheduler and only differ in terms of their idle behavior.
