[argobots-discuss] Argobots ABT_eventual_set too slow

Carns, Philip H. carns at mcs.anl.gov
Tue Apr 23 10:16:13 CDT 2019

Hi Houjun,

Which scheduler are you using with Argobots?

If you can reproduce this easily it might be worth toggling between ABT_SCHED_BASIC and ABT_SCHED_BASIC_WAIT (whichever one you are not using) to narrow down a possible issue there.  I don't expect a problem there, but something unusual must be going on.  In my experience eventual_set() is an inexpensive call.

From: Iwasaki, Shintaro via discuss <discuss at lists.argobots.org>
Sent: Monday, April 22, 2019 9:45 AM
To: Houjun Tang
Cc: Iwasaki, Shintaro; discuss at lists.argobots.org
Subject: Re: [argobots-discuss] Argobots ABT_eventual_set too slow

Hi, Houjun,

Thank you for your detailed explanation! There are two possible issues:

1. Because a tasklet (= task) is non-preemptive, the current Argobots-aware HDF5 code might fail to overlap communications. For example, ABT_eventual_set uses busy-wait based synchronization (https://github.com/pmodels/argobots/blob/master/src/eventual.c#L260) if a tasklet is used. If a ULT (= thread) is used, it does user-level context-switch based synchronization (https://github.com/pmodels/argobots/blob/master/src/eventual.c#L257). However, ultimately it depends on how the code is written. If this is the case, the problem should be addressed by changing the HDF5 implementation (e.g., by using ULTs properly).

2. Currently the Argobots eventual uses a simple linked list, which performs poorly if many threads and tasks are waiting. This performance should be improved by more advanced management of waiting threads in the Argobots runtime.
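
To illustrate the second point, here is a minimal sketch in plain C of why a set operation over a simple linked list of waiters costs time proportional to the number of waiters. This is a toy model, not the Argobots implementation: the names toy_eventual_t, toy_eventual_add_waiter, and toy_eventual_set are hypothetical, and freeing a node merely stands in for making a waiting thread or task runnable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy model; NOT the Argobots eventual implementation. */
typedef struct waiter {
    struct waiter *next;    /* next waiter blocked on the same eventual */
} waiter_t;

typedef struct {
    int is_set;             /* 1 once the eventual has been signaled */
    waiter_t *head;         /* first waiter in the list */
    waiter_t *tail;         /* last waiter, kept for O(1) append */
} toy_eventual_t;

void toy_eventual_init(toy_eventual_t *ev)
{
    assert(ev != NULL);
    ev->is_set = 0;
    ev->head = NULL;
    ev->tail = NULL;
}

/* Append one waiter. Returns 0 on success, -1 if allocation fails. */
int toy_eventual_add_waiter(toy_eventual_t *ev)
{
    assert(ev != NULL);
    waiter_t *w = malloc(sizeof *w);
    if (w == NULL)
        return -1;
    w->next = NULL;
    if (ev->tail != NULL)
        ev->tail->next = w;
    else
        ev->head = w;
    ev->tail = w;
    return 0;
}

/* Signal the eventual: every waiter is visited once, so the cost of a
 * single set call grows linearly with the number of waiters.
 * Returns the number of waiters visited. */
size_t toy_eventual_set(toy_eventual_t *ev)
{
    assert(ev != NULL);
    size_t visited = 0;
    waiter_t *w = ev->head;
    ev->is_set = 1;
    while (w != NULL) {
        waiter_t *next = w->next;
        free(w);            /* stand-in for waking the waiter */
        w = next;
        visited++;
    }
    ev->head = NULL;
    ev->tail = NULL;
    return visited;
}
```

With no waiters the set call returns immediately; with 1000 waiters it visits 1000 nodes, so any real slowdown from this effect would scale with the length of the list.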

From your explanation, I can hardly tell which problem is more significant; intuitively, this operation itself does not seem to consume 0.1 seconds or so (though it depends on how many tasks are in the linked list).
I am happy to examine your code if that is okay, but I would really appreciate it if you could give me a simplified code example that reproduces your performance issue.

Thank You,
Shintaro Iwasaki

On Fri, Apr 19, 2019 at 5:57 PM Houjun Tang <htang4 at lbl.gov> wrote:
Hi Shintaro,

Thanks for the quick reply, here is a brief summary of what I have been working on.
I'm adding an asynchronous I/O feature to the HDF5 library using Argobots as the background thread execution engine. So whenever there is an HDF5 I/O call, the main thread will create an Argobots task (with ABT_task_create) and have it executed by Argobots in the background. Only one Argobots pool is used, so consecutive tasks are executed sequentially. The application code linked to the async HDF5 library creates and writes a lot of HDF5 attributes. As seen in the figure I sent previously, the time taken by ABT_eventual_set varies greatly, from 2 us to 0.45 s, with an average of ~0.07 s.

Is there any additional information you would like to know? Do you want the code?

Houjun Tang

On Fri, Apr 19, 2019 at 2:11 PM Iwasaki, Shintaro via discuss <discuss at lists.argobots.org> wrote:
Hello Houjun,

Thank you for reporting a performance issue with data!
Unfortunately, I haven't experienced this issue. I checked the code, but it is hard to judge whether the implementation of ABT_eventual_set is bad or not. As far as I can tell from the implementation of ABT_eventual_set, this function does not look very optimized (I mean, it uses a naive spinlock), but does not seem very slow (I mean, it does not allocate memory every time).

In any case, this single operation should be finished within 1us or less (under no contention). I guess it might be caused by a scheduling issue or an affinity issue, but since the performance of this function has not been fully examined, the current implementation might have some performance bugs. I could diagnose this problem more if you would give me more details.

Thank You,
Shintaro Iwasaki

On Fri, Apr 19, 2019 at 2:56 PM Houjun Tang via discuss <discuss at lists.argobots.org> wrote:

I'm using Argobots as the engine for executing asynchronous I/O operations in the background of an HDF5 application, but found it to be slow in some operations. With profiling, the slowdown comes mostly from ABT_eventual_set. Below is a boxplot of the ABT_eventual_set time (measured by calling gettimeofday before and after it) from 385 operations, running with one process and one Argobots thread. The *_fn are different functions executed by Argobots. In most cases it's below 0.1s, but there are several cases that are taking more than 0.25 seconds. As these HDF5 operations take less than 0.1 seconds, the overhead of ABT_eventual_set becomes dominant.
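
As a point of reference for the measurement described above, the following is a minimal sketch of timing a single call in C. It is an assumption-laden illustration, not the code used for the boxplot: it uses clock_gettime with CLOCK_MONOTONIC rather than gettimeofday, because a monotonic clock is not affected by wall-clock adjustments, and the names time_call_sec, timespec_diff_sec, and example_work are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime on strict compilers */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: the operation whose duration is measured. */
typedef void (*timed_fn)(void *arg);

/* Elapsed seconds between two timestamps (end minus start). */
double timespec_diff_sec(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Run fn(arg) once and return how long it took, in seconds. */
double time_call_sec(timed_fn fn, void *arg)
{
    struct timespec t0, t1;
    assert(fn != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(arg);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return timespec_diff_sec(t0, t1);
}

/* Hypothetical stand-in for the call being profiled; in the setup
 * described here it would wrap the call to ABT_eventual_set. */
void example_work(void *arg)
{
    volatile unsigned long *counter = arg;
    for (unsigned long i = 0; i < 100000UL; i++)
        (*counter)++;
}
```

Calling time_call_sec(example_work, &counter) with an unsigned long counter returns the elapsed time of that one call. Timing the work function and the signaling call separately would help show which of the two dominates.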

Any idea what could have caused this?

Houjun Tang

discuss mailing list
discuss at lists.argobots.org