[argobots-discuss] Argobots ABT_eventual_set too slow
Houjun Tang
htang4 at lbl.gov
Thu Apr 25 15:19:48 CDT 2019
Hi all,
On Phil's earlier question: my code that uses Argobots is not compiled with
OpenMP; only the application code enables OpenMP by default.
I have tried different things as you suggested, and it seems that setting
OMP_PROC_BIND=false
(previously it was set to "spread") solves the problem. With that
variable set, ABT_eventual_set() takes about as long (under 0.00001 seconds)
as when OpenMP is disabled. So maybe the slowdown occurs because the Argobots
threads and OpenMP threads are executing on the same core?
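
For reference, the timings quoted in this thread were taken by calling
gettimeofday before and after ABT_eventual_set. A minimal sketch of that kind
of measurement is shown here. It is an illustration, not the code used in the
application: the helper names (elapsed_seconds, time_call, placeholder_work)
are made up for this example, and placeholder_work merely stands in for the
call being measured so that the sketch builds without Argobots.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Elapsed wall-clock time between two gettimeofday() samples, in seconds. */
static double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Time a single call to fn(arg) and return its duration in seconds.
 * fn is a hypothetical wrapper around whatever is being measured
 * (ABT_eventual_set in this thread). */
static double time_call(void (*fn)(void *), void *arg)
{
    struct timeval start, end;

    assert(fn != NULL);
    gettimeofday(&start, NULL);
    fn(arg);
    gettimeofday(&end, NULL);
    return elapsed_seconds(start, end);
}

/* Stand-in workload (an assumption for illustration only):
 * increments the counter that arg points to 1000 times. */
static void placeholder_work(void *arg)
{
    volatile long *counter = arg;

    for (long i = 0; i < 1000; i++)
        (*counter)++;
}
```

Calling time_call(placeholder_work, &counter) returns the duration of one call
in seconds, the same unit used for the numbers above. Note that gettimeofday
reports microseconds, so durations of a few microseconds are close to the limit
of what this method can resolve.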
Thanks,
Houjun Tang
On Thu, Apr 25, 2019 at 9:57 AM Balaji, Pavan via discuss <
discuss at lists.argobots.org> wrote:
> Hi Quincey,
>
> As I mentioned in my other email, maybe my guess wasn't correct. We
> should dig a bit deeper to figure it out.
>
> FYI, BOLT is an implementation of the Intel OpenMP runtime API on top of
> Argobots. It doesn't change the compiler, and is a simple LD_PRELOAD for
> your already compiled applications. You can still use ICC, Clang, GCC,
> etc. You should try it out.
>
> -- Pavan
>
> > On Apr 25, 2019, at 10:12 AM, Quincey Koziol <koziol at lbl.gov> wrote:
> >
> > Hi Pavan,
> > So the BOLT version of OpenMP uses lightweight threads? Is there
> > any way to quiesce the OpenMP threads, so that they don’t have this level
> > of impact when they aren’t executing code? Taking a core from OpenMP
> > also seems reasonable. Is that controllable at runtime (i.e., with an
> > API)?
> >
> > Quincey
> >
> >
> >
> >> On Apr 24, 2019, at 7:39 PM, Balaji, Pavan <balaji at anl.gov> wrote:
> >>
> >> Quincey,
> >>
> >> My guess is that you are running into the general pthreads + OpenMP
> >> interoperability problem. That is, OpenMP is likely creating as many
> >> pthreads as there are cores, leaving nothing for Argobots. You could
> >> also try reducing the number of cores available to OpenMP, but that
> >> seems like a suboptimal solution. Ideally, we want all layers to create
> >> as many lightweight threads as possible, irrespective of the number of
> >> cores available.
> >>
> >> -- Pavan
> >>
> >>> On Apr 24, 2019, at 6:31 PM, Quincey Koziol <koziol at lbl.gov> wrote:
> >>>
> >>> Hi Pavan,
> >>> That might not be a very good option for general use at
> >>> facilities. Any ideas about how Argobots & OpenMP are conflicting here?
> >>>
> >>> Quincey
> >>>
> >>>
> >>>> On Apr 24, 2019, at 1:51 PM, Balaji, Pavan via discuss <
> discuss at lists.argobots.org> wrote:
> >>>>
> >>>> Houjun,
> >>>>
> >>>> You could try BOLT, which lets OpenMP use Argobots internally as
> >>>> well, so the two won't conflict with each other.
> >>>>
> >>>> -- Pavan
> >>>>
> >>>>> On Apr 24, 2019, at 1:49 PM, Houjun Tang via discuss <
> discuss at lists.argobots.org> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I think I found what is causing the performance slowdown. My
> >>>>> previous experiments used the default application configuration, with
> >>>>> OpenMP enabled. I've just compiled the application without OpenMP,
> >>>>> and the performance is much better: ABT_eventual_set takes less than
> >>>>> 0.00002 seconds in all operations. So it looks like running Argobots
> >>>>> with OpenMP may sometimes cause a slowdown. Any idea how to resolve
> >>>>> this?
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Houjun Tang
> >>>>>
> >>>>> On Tue, Apr 23, 2019 at 4:43 PM Houjun Tang <htang4 at lbl.gov> wrote:
> >>>>> Hi Phil,
> >>>>>
> >>>>> Thanks for the suggestion. I'm using the basic scheduler, and have
> changed to ABT_SCHED_BASIC as you mentioned, but the issue remains.
> >>>>>
> >>>>> I talked to Shintaro earlier this afternoon and sent him my
> >>>>> code to run the experiments; hopefully we can figure out what is
> >>>>> going on.
> >>>>>
> >>>>>
> >>>>> Thanks,
> >>>>> Houjun Tang
> >>>>>
> >>>>> On Tue, Apr 23, 2019 at 8:16 AM Carns, Philip H. <carns at mcs.anl.gov>
> wrote:
> >>>>> Hi Houjun,
> >>>>>
> >>>>> Which scheduler are you using with Argobots?
> >>>>>
> >>>>> If you can reproduce this easily it might be worth toggling between
> ABT_SCHED_BASIC and ABT_SCHED_BASIC_WAIT (whichever one you are not using)
> to narrow down a possible issue there. I don't expect a problem there, but
> something unusual must be going on. In my experience eventual_set() is an
> inexpensive call.
> >>>>>
> >>>>> thanks,
> >>>>> -Phil
> >>>>> From: Iwasaki, Shintaro via discuss <discuss at lists.argobots.org>
> >>>>> Sent: Monday, April 22, 2019 9:45 AM
> >>>>> To: Houjun Tang
> >>>>> Cc: Iwasaki, Shintaro; discuss at lists.argobots.org
> >>>>> Subject: Re: [argobots-discuss] Argobots ABT_eventual_set too slow
> >>>>>
> >>>>> Hi, Houjun,
> >>>>>
> >>>>> Thank you for your detailed explanation! There are two possible
> >>>>> issues:
> >>>>>
> >>>>> 1. Because a tasklet (= task) is nonpreemptive, the current
> >>>>> Argobots-aware HDF5 code might fail to overlap communications. For
> >>>>> example, ABT_eventual_set uses busy-wait based synchronization (
> >>>>> https://github.com/pmodels/argobots/blob/master/src/eventual.c#L260)
> >>>>> if a tasklet is used. If a ULT (= thread) is used, it does user-level
> >>>>> context-switch based synchronization (
> >>>>> https://github.com/pmodels/argobots/blob/master/src/eventual.c#L257).
> >>>>> However, ultimately it depends on how the code is written. If this is
> >>>>> the case, the problem should be addressed by changing the HDF5
> >>>>> implementation (e.g., by using ULTs properly).
> >>>>>
> >>>>> 2. Currently the Argobots eventual uses a simple linked list, which
> >>>>> performs poorly if many threads and tasks are waiting. This could be
> >>>>> improved by more advanced management of waiting threads in the
> >>>>> Argobots runtime.
> >>>>>
> >>>>> From your explanation, I cannot tell which problem is more
> >>>>> significant; intuitively, this operation itself should not take
> >>>>> 0.1 seconds or so (though it depends on how many tasks are in the
> >>>>> linked list).
> >>>>> I am happy to examine your code if that is okay, but I would really
> >>>>> appreciate it if you could give me simplified code that reproduces
> >>>>> your performance issue.
> >>>>>
> >>>>> Thank You,
> >>>>> Shintaro Iwasaki
> >>>>>
> >>>>> On Fri, Apr 19, 2019 at 5:57 PM Houjun Tang <htang4 at lbl.gov> wrote:
> >>>>> Hi Shintaro,
> >>>>>
> >>>>> Thanks for the quick reply. Here is a brief summary of what I have
> >>>>> been working on.
> >>>>> I'm adding an asynchronous I/O feature to the HDF5 library, using
> >>>>> Argobots as the background thread execution engine. Whenever there is
> >>>>> an HDF5 I/O call, the main thread creates an Argobots task (with
> >>>>> ABT_task_create) and has it executed by Argobots in the background.
> >>>>> Only one Argobots pool is used, so consecutive tasks are executed
> >>>>> sequentially. The application code linked to the async HDF5 library
> >>>>> creates and writes a lot of HDF5 attributes. As seen in the figure I
> >>>>> sent previously, the time taken by ABT_eventual_set varies greatly,
> >>>>> from 2 us to 0.45 s, with an average of ~0.07 s.
> >>>>>
> >>>>> Is there any additional information you would like? Do you want the
> >>>>> code?
> >>>>>
> >>>>>
> >>>>> Regards,
> >>>>> Houjun Tang
> >>>>>
> >>>>> On Fri, Apr 19, 2019 at 2:11 PM Iwasaki, Shintaro via discuss <
> discuss at lists.argobots.org> wrote:
> >>>>> Hello Houjun,
> >>>>>
> >>>>> Thank you for reporting a performance issue with data!
> >>>>> Unfortunately, I haven't experienced this issue. I checked the code,
> >>>>> but it is hard to judge whether the implementation of ABT_eventual_set
> >>>>> is bad or not. As far as I can tell, this function does not look very
> >>>>> optimized (I mean, it uses a naive spinlock), but does not seem very
> >>>>> slow either (I mean, it does not allocate memory every time).
> >>>>> https://github.com/pmodels/argobots/blob/master/src/eventual.c#L229
> >>>>>
> >>>>> In any case, this single operation should finish within 1 us or
> >>>>> less (under no contention). I guess it might be caused by a scheduling
> >>>>> issue or an affinity issue, but since the performance of this function
> >>>>> has not been fully examined, the current implementation might have
> >>>>> some performance bugs. I could diagnose this problem further if you
> >>>>> gave me more details.
> >>>>>
> >>>>> Thank You,
> >>>>> Shintaro Iwasaki
> >>>>>
> >>>>> On Fri, Apr 19, 2019 at 2:56 PM Houjun Tang via discuss <
> discuss at lists.argobots.org> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> I'm using Argobots as the engine for executing asynchronous I/O
> >>>>> operations in the background of an HDF5 application, but found it to
> >>>>> be slow in some operations. Profiling shows that the slowdown comes
> >>>>> mostly from ABT_eventual_set. Below is a boxplot of the
> >>>>> ABT_eventual_set time (measured by calling gettimeofday before and
> >>>>> after it) from 385 operations, running with one process and one
> >>>>> Argobots thread. The *_fn are different functions executed by
> >>>>> Argobots. In most cases it's below 0.1 s, but several cases take more
> >>>>> than 0.25 seconds. As these HDF5 operations take less than
> >>>>> 0.1 seconds, the overhead of ABT_eventual_set becomes dominant.
> >>>>>
> >>>>> Any idea what could have caused this?
> >>>>>
> >>>>> Thanks,
> >>>>> Houjun Tang
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> discuss mailing list
> >>>>> discuss at lists.argobots.org
> >>>>> https://lists.argobots.org/mailman/listinfo/discuss