<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p>That's all fantastic, Shintaro. Thank you for the updates!</p>
<p><br>
</p>
<p>-Phil</p>
<p><br>
</p>
<div class="moz-cite-prefix">On 4/21/21 10:39 AM, Iwasaki, Shintaro
wrote:<br>
</div>
<blockquote type="cite" cite="mid:DM6PR09MB5750CFD47F79C1123BD9C74DD5479@DM6PR09MB5750.namprd09.prod.outlook.com">
<style type="text/css" style="display:none;">P {margin-top:0;margin-bottom:0;}</style>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
Hi Phil,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
You may already know this, but I would like to let you know
that:</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
- The Argobots Spack package supports several new options
including stack guard and libunwind settings (see <a href="https://github.com/spack/spack/pull/23133" moz-do-not-send="true">https://github.com/spack/spack/pull/23133</a>)</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
- Argobots now supports mprotect-based stack guard, which causes
SEGV when a ULT smashes a stack (see <a href="https://github.com/pmodels/argobots/pull/327" moz-do-not-send="true">https://github.com/pmodels/argobots/pull/327</a>)</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
- This mprotect-based mechanism should work on x86/64, ARM, and
POWER machines. I tried Linux (Debian/RedHat), FreeBSD, and
<span style="background-color:rgb(255, 255, 255);display:inline
!important">Intel-based </span>OSX (see
<a href="https://github.com/pmodels/argobots/pull/328" moz-do-not-send="true">https://github.com/pmodels/argobots/pull/328</a>).</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
If you have any requests, suggestions, or bug reports, please
let us know.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
(I am aware of the Spack issue related to Argobots. I plan to
write a quick patch tomorrow:
<a href="https://github.com/spack/spack/issues/23168" moz-do-not-send="true">https://github.com/spack/spack/issues/23168</a>)</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
Best,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif;
font-size: 12pt; color: rgb(0, 0, 0);">
Shintaro </div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Carns,
Philip H. <a class="moz-txt-link-rfc2396E" href="mailto:carns@mcs.anl.gov"><carns@mcs.anl.gov></a><br>
<b>Sent:</b> Wednesday, April 14, 2021 4:01 PM<br>
<b>To:</b> Iwasaki, Shintaro <a class="moz-txt-link-rfc2396E" href="mailto:siwasaki@anl.gov"><siwasaki@anl.gov></a>;
<a class="moz-txt-link-abbreviated" href="mailto:discuss@lists.argobots.org">discuss@lists.argobots.org</a> <a class="moz-txt-link-rfc2396E" href="mailto:discuss@lists.argobots.org"><discuss@lists.argobots.org></a><br>
<b>Subject:</b> Re: [argobots-discuss] modifying scheduler
event frequency?</font>
<div> </div>
</div>
<div>
<p><br>
</p>
<div class="x_moz-cite-prefix">On 4/14/21 4:55 PM, Iwasaki,
Shintaro wrote:<br>
</div>
<blockquote type="cite">
<style type="text/css" style="display:none">p
{margin-top:0;
margin-bottom:0}</style>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
Hi Phil,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
Thanks. I now understand the bigger picture.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<span style="color:rgb(0,0,0);
font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<span style="color:rgb(0,0,0);
font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt">> ABT_info_</span><span style="color:rgb(0,0,0);
font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt">print_thread_stacks_in_pool()</span><br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
I hope it works. <span style="color:rgb(0,0,0);
font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt">Note that </span><span style="color:rgb(0,0,0);
font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt">print_thread_stacks_in_pool() is not
async-signal safe</span><span style="color:rgb(0,0,0);
font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt"> (ABT_info_trigger_print_all_thread_stacks()
is an exception), so please don't call it in a signal
handler.</span></div>
</blockquote>
<p><br>
</p>
<p>Ok, no problem. We don't do much via signals in Mochi
(almost all of our control capabilities are triggered via RPCs
that launch ULTs to do the work).<br>
</p>
<p><br>
</p>
<blockquote type="cite">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
> We use argobots almost exclusively with spack at this
point.</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
Many HPC users use Spack to build dependent libraries. <span style="color:rgb(0,0,0);
font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt">I will add some debug options (including
libunwind, stack guard, ...) as well as other major
options to the Spack Argobots package. We are also
implementing an mprotect-based stack guard option (which
is not in Argobots 1.1, though).</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<span style="color:rgb(0,0,0);
font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt"><br>
</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<span style="color:rgb(0,0,0);
font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt">Overall, p<span style="background-color:rgb(255,255,255);
display:inline!important">lease give us a week or so in
total.</span></span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
There is a lot of room for improvement in t<span style="background-color:rgb(255,255,255);
display:inline!important">he debugging/profiling
capabilities.</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<span style="background-color:rgb(255,255,255);
display:inline!important">If you have any questions,
requests, and/or suggestions, please feel free to tell us.</span><br>
</div>
</blockquote>
<p><br>
</p>
<p>Sounds great! Debuggability seems to be the next frontier
for our project, so we'll probably be experimenting with more
of these capabilities as time goes on. It's taken a little
while for us to recognize which design/debugging patterns
would be most useful.
<br>
</p>
<p><br>
</p>
<p>thanks,</p>
<p>-Phil<br>
</p>
<blockquote type="cite">
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
Thanks,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
Shintaro</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Carns, Philip H.
<a class="x_moz-txt-link-rfc2396E" href="mailto:carns@mcs.anl.gov" moz-do-not-send="true"><carns@mcs.anl.gov></a><br>
<b>Sent:</b> Wednesday, April 14, 2021 3:42 PM<br>
<b>To:</b> Iwasaki, Shintaro <a class="x_moz-txt-link-rfc2396E" href="mailto:siwasaki@anl.gov" moz-do-not-send="true">
<siwasaki@anl.gov></a>; <a class="x_moz-txt-link-abbreviated" href="mailto:discuss@lists.argobots.org" moz-do-not-send="true">
discuss@lists.argobots.org</a> <a class="x_moz-txt-link-rfc2396E" href="mailto:discuss@lists.argobots.org" moz-do-not-send="true">
<discuss@lists.argobots.org></a><br>
<b>Subject:</b> Re: [argobots-discuss] modifying scheduler
event frequency?</font>
<div> </div>
</div>
<div>
<p>Ah, thanks for the thorough information as always
Shintaro :)</p>
<p><br>
</p>
<p>print_all_thread_stacks() was tempting because it would
potentially encompass more (in the Mochi use case, it
would pick up hypothetical pools created by higher level
components that we don't have a reference to). Based on
the information in this email thread, though, I think I'm
better off focusing on pools under our control so that I
can use print_thread_stacks_in_pool(). This should work
fine; I was just over-thinking the use case. The pools
are under our own control in the vast majority of
configurations.<br>
</p>
<p><br>
</p>
<p>In the big picture, I was exploring this because of a bug
report we have from one of our collaborators who is
getting a nonsensical hang in a complex scenario that we
can't easily reproduce or attach a debugger to. I would
like to be able to send an RPC to a process at an
arbitrary point in time and dump what it is up to so that
we can understand why it didn't complete something it was
trying to do.<br>
</p>
<p><br>
</p>
<p>libunwind sounds great :) I probably would have been
asking about that next.</p>
<p><br>
</p>
<p>I guess I'll use this as an opportunity to
request/suggest that the libunwind capability be added as
a variant to the argobots spack package (along with a way
to enable future mprotect / stack canary checks).</p>
<p><br>
</p>
<p>We use argobots almost exclusively with spack at this
point. Not that argobots itself is hard to compile
manually, but it is often one of a large number of
dependencies that we need to build, so it's best to just
unify them in one packaging system. It would be
straightforward for us to set up an alternative
environment yaml with various argobots debugging
capabilities enabled for development/debugging purposes.<br>
</p>
<p><br>
</p>
<p>thanks!</p>
<p>-Phil<br>
</p>
<p><br>
</p>
<div class="x_x_moz-cite-prefix">On 4/14/21 3:57 PM,
Iwasaki, Shintaro wrote:<br>
</div>
<blockquote type="cite">
<style type="text/css" style="display:none">p
{margin-top:0;
margin-bottom:0}</style>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
Hi Phil,</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<div><br>
</div>
<div>Thanks for using Argobots! The following are my
answers to your questions, along with some tips.</div>
<div>We would appreciate it if you could share more
information about your workload and the purpose so
that we can give you more specific suggestions. Also,
we welcome any feature requests and bug reports.</div>
<div><br>
</div>
<div>1. How to change a scheduler's event frequency?</div>
<div>1.1. Predefined scheduler</div>
<div>First, there is no way to change the event frequency
dynamically, even if you hack ABT_sched or the pointer
obtained from ABT_sched_get_data(), because
event_freq is loaded into a local variable.</div>
<div><a class="x_x_moz-txt-link-freetext" href="https://github.com/pmodels/argobots/blob/main/src/sched/basic_wait.c#L102" moz-do-not-send="true">https://github.com/pmodels/argobots/blob/main/src/sched/basic_wait.c#L102</a></div>
<div>Currently, using a special ABT_sched_config when
you create a scheduler is the cleanest and the only
way to change the event frequency.</div>
<div>```</div>
<div>ABT_sched_config config;</div>
<div>int new_freq = 16; // The default value is 50 (<a class="x_x_moz-txt-link-freetext" href="https://github.com/pmodels/argobots/blob/main/src/arch/abtd_env.c#L13" moz-do-not-send="true">https://github.com/pmodels/argobots/blob/main/src/arch/abtd_env.c#L13</a>)</div>
<div>ABT_sched_config_create(&config,
ABT_sched_basic_freq, new_freq, ABT_sched_config_var_end);</div>
<div>```</div>
<div>1.2. Custom scheduler<br>
</div>
<div>You can call ABT_xstream_check_events() more
frequently after calling
ABT_info_trigger_print_all_thread_stacks() (e.g., when
a global flag is on, a scheduler calls
ABT_xstream_check_events() in every iteration).</div>
<div><br>
</div>
<div>2. ABT_info_trigger_print_all_thread_stacks()</div>
<div>ABT_info_trigger_print_all_thread_stacks() is
designed for deadlock/livelock detection, so if your
program is just (extremely) slow, it might not
be the right routine to try.</div>
<div><br>
</div>
<div>> The first example I tried appeared to
essentially defer dump until shutdown.</div>
<div>When one of your ULTs encounters a deadlock, the
scheduling loop might not run. You might want to
set a timeout for
ABT_info_trigger_print_all_thread_stacks(). For
example, the following test forcibly prints stacks
after 3.0 seconds even if some execution streams have
not reached ABT_xstream_check_events().</div>
<div><a class="x_x_moz-txt-link-freetext" href="https://github.com/pmodels/argobots/blob/main/test/basic/info_stackdump2.c#L30" moz-do-not-send="true">https://github.com/pmodels/argobots/blob/main/test/basic/info_stackdump2.c#L30</a></div>
<div>This is dangerous (it can dump the stack of a
running ULT), so Argobots does not guarantee anything,
but it can sometimes help in understanding a deadlock
issue.</div>
<div><br>
</div>
<div>===</div>
<div><br>
</div>
<div>3. Some tips</div>
<div>3.1. gdb</div>
<div>If gdb is available, I would use it to investigate a
deadlock/performance issue. For example, if a program
appears to hang, I would attach a debugger to that
process and see what's happening.</div>
<div>3.2. libunwind for
ABT_info_trigger_print_all_thread_stacks()</div>
<div>Unless you are an extremely skillful low-level
programmer, I would recommend enabling libunwind to
make the stacks easier to understand. By default,
ABT_info_trigger_print_all_thread_stacks() dumps raw
hex stack data.</div>
<div>3.3. "occasionally tied up in system calls"</div>
<div>I'm not sure if it's happening in the Argobots
runtime (Argobots now uses futex for synchronization
on external threads), but if you are calling
ABT_info_trigger_print_all_thread_stacks() in a signal
handler, please be aware that blocking calls
(e.g., futex, poll, or pthread_cond_wait) can return
early if a signal hits the process.</div>
<div>(The Argobots synchronization implementation is aware
of this and should not be affected by an external
signal. This property is thoroughly tested:
<a class="x_x_moz-txt-link-freetext" href="https://github.com/pmodels/argobots/blob/main/test/util/abttest.c#L245-L287" moz-do-not-send="true">
https://github.com/pmodels/argobots/blob/main/test/util/abttest.c#L245-L287</a>)</div>
<div>Note that the user can also call
ABT_info_trigger_print_all_thread_stacks() on a normal
thread without any problem; it is simply implemented in
an async-signal-safe manner.</div>
<div>3.4. Stack dump</div>
<div>ABT_info_print_thread_stacks_in_pool() is a less
invasive way to print stacks, especially if you know the
list of pools. It prints stacks immediately.
Basically, ABT_info_trigger_print_all_thread_stacks()
sets a flag to call
ABT_info_print_thread_stacks_in_pool() for all pools
after all the execution streams stop in
ABT_xstream_check_events().</div>
<div><br>
</div>
<div>Thanks,</div>
<div>Shintaro</div>
</div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;
font-size:12pt; color:rgb(0,0,0)">
<br>
</div>
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="x_x_divRplyFwdMsg" dir="ltr"><font style="font-size:11pt" face="Calibri, sans-serif" color="#000000"><b>From:</b> Phil Carns via discuss
<a class="x_x_moz-txt-link-rfc2396E" href="mailto:discuss@lists.argobots.org" moz-do-not-send="true"><discuss@lists.argobots.org></a><br>
<b>Sent:</b> Wednesday, April 14, 2021 2:18 PM<br>
<b>To:</b> <a class="x_x_moz-txt-link-abbreviated" href="mailto:discuss@lists.argobots.org" moz-do-not-send="true">
discuss@lists.argobots.org</a> <a class="x_x_moz-txt-link-rfc2396E" href="mailto:discuss@lists.argobots.org" moz-do-not-send="true">
<discuss@lists.argobots.org></a><br>
<b>Cc:</b> Carns, Philip H. <a class="x_x_moz-txt-link-rfc2396E" href="mailto:carns@mcs.anl.gov" moz-do-not-send="true">
<carns@mcs.anl.gov></a><br>
<b>Subject:</b> [argobots-discuss] modifying scheduler
event frequency?</font>
<div> </div>
</div>
<div>
<p>Hi all,</p>
<p>Is there a clean way to change a scheduler's event
frequency on the fly?</p>
<p>Browsing the API, I see two possibilities:</p>
<ul>
<li>set it when the scheduler is first created (using
ABT_sched_basic_freq?)</li>
<li>set it dynamically by manipulating the
ABT_sched_get_data() pointer, but this seems
especially dangerous since the sched data struct
definition isn't public (i.e. it could cause memory
corruption if the internal struct def changed)<br>
</li>
</ul>
<p>For some context (in case there is a different way to
go about this entirely), I'm trying to figure out how
to get ABT_info_trigger_print_all_thread_stacks() to
print information more quickly, which IIUC relies on
getting the active schedulers to call get_events()
sooner.</p>
<p>I'm happy to add some explicit ABT_thread_yield()
shortly after the
ABT_info_trigger_print_all_thread_stacks() to at least
get the calling ES to execute its scheduler loop
immediately, but I think that won't matter much if it
doesn't trip the frequency counter when I do it.</p>
<p>Without this (at least with the _wait scheduler and
threads that are occasionally tied up in system calls)
I think the stack dump is likely to trigger too late
to display what I'm hoping to capture when I call it.
The first example I tried appeared to essentially
defer dump until shutdown.<br>
</p>
<p>thanks!</p>
<p>-Phil<br>
</p>
</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
</body>
</html>