<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">
<div dir="ltr">Hello, Phil,
<div><br>
</div>
<div>Thank you for your report. We found a bug.</div>
<div>The problem you reported seems to happen when `ABT_THREAD_STACKSIZE` is equal to `ABT_SCHED_STACKSIZE` (by default 4MB).</div>
<div>For now, it can be avoided by setting a different value (e.g., `ABT_THREAD_STACKSIZE=$((4 * 1024 * 1024 + 64))`).</div>
<div>If the problem persists, please let us know.</div>
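<div>A minimal shell sketch of this workaround, assuming the 4MB scheduler default mentioned above. The helper name `pick_thread_stacksize` is hypothetical and not part of Argobots; it only nudges the requested ULT stack size by 64 bytes when it collides with the scheduler stack size.</div>

```shell
# Hypothetical helper (not an Argobots tool): print a ULT stack size
# that is guaranteed to differ from the scheduler stack size.
pick_thread_stacksize() {
    requested="$1"
    sched="$2"
    if [ "$requested" -eq "$sched" ]; then
        # Same offset as the workaround suggested above.
        echo $((requested + 64))
    else
        echo "$requested"
    fi
}

# Assumption: ABT_SCHED_STACKSIZE defaults to 4MB when it is not set.
sched_size="${ABT_SCHED_STACKSIZE:-$((4 * 1024 * 1024))}"
wanted=$((4 * 1024 * 1024))

ABT_THREAD_STACKSIZE="$(pick_thread_stacksize "$wanted" "$sched_size")"
export ABT_THREAD_STACKSIZE
echo "ABT_THREAD_STACKSIZE=$ABT_THREAD_STACKSIZE"
```

<div>Run this in the shell that launches the application, or source it, so that the exported value reaches the process.</div>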
<div><br>
</div>
<div>Please see the GitHub issue (<a href="https://github.com/pmodels/argobots/issues/94" target="_blank">https://github.com/pmodels/argobots/issues/94</a>) and the PR (<a href="https://github.com/pmodels/argobots/pull/95" target="_blank">https://github.com/pmodels/argobots/pull/95</a>)
for details.<br>
</div>
<div><br>
</div>
<div>
<div>I note that an error might occur if a very large thread stack size is requested (e.g., `ABT_THREAD_STACKSIZE` > `ABTD_MEM_STACK_PAGE_SIZE`, which is by default 8MB).</div>
<div>If you really need a stack that large, please increase `ABTD_MEM_STACK_PAGE_SIZE` as well, though I believe 2MB to 4MB should be enough in most cases.</div>
<div>The latter issue will be investigated and fixed in the future.<br>
</div>
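<div>As a rough pre-flight check, a launch script could compare the requested size against that limit before starting the application. This is only a sketch: the function name `stacksize_within_limit` is hypothetical, and the 8MB value is simply the default quoted above, so pass a different limit if your build uses another one.</div>

```shell
# Hypothetical pre-flight check (not an Argobots tool): succeeds when the
# requested ULT stack size does not exceed the given limit.
stacksize_within_limit() {
    requested="$1"
    # Assumption: 8MB, the default ABTD_MEM_STACK_PAGE_SIZE quoted above.
    limit="${2:-$((8 * 1024 * 1024))}"
    [ "$requested" -le "$limit" ]
}

if stacksize_within_limit $((2 * 1024 * 1024)); then
    echo "2MB is within the limit"
fi

if ! stacksize_within_limit $((16 * 1024 * 1024)); then
    echo "16MB exceeds the limit; ABTD_MEM_STACK_PAGE_SIZE would need to grow too"
fi
```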
</div>
<div>Best Regards,</div>
<div>Shintaro Iwasaki</div>
</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Feb 22, 2019 at 10:34 AM Iwasaki, Shintaro via discuss &lt;<a href="mailto:discuss@lists.argobots.org" target="_blank">discuss@lists.argobots.org</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div>
<div dir="ltr">
<div dir="ltr">
<div>Hello,</div>
<div><br>
</div>
<div dir="ltr">Thank you, Phil. I will create a GitHub issue about the stack overflow detection once we understand all the problems.
<div><br>
</div>
<div>I'm happy to hear that margo worked with 1MB stack size. However, Argobots should work with 4MB stack size</div>
<div>(which is the default Pthreads stack size on some machines) or larger as well; there should be no upper bound.</div>
<div><br>
</div>
<div>A scheduler has its own stack size (tunable via ABT_SCHED_STACKSIZE) and I suspect it is related since the bug happens in ABT_xstream_set_main_sched.</div>
<div>
<div>I will check this issue soon after some urgent tasks.</div>
</div>
<div>(Note that scheduler stack size is independent of ULT stack size by design since it virtually affects the stack size of every tasklet execution)</div>
<div><br>
</div>
<div>Thank You,</div>
<div>Shintaro Iwasaki</div>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, Feb 22, 2019 at 9:57 AM Carns, Philip H. &lt;<a href="mailto:carns@mcs.anl.gov" target="_blank">carns@mcs.anl.gov</a>&gt; wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<div style="font-family:'Segoe UI',Frutiger,'Frutiger Linotype','Dejavu Sans','Helvetica Neue',Arial,sans-serif;font-size:14px">
<div class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040hiri-body-wrapper">
Thanks Shintaro. It seems like at a minimum we will have to use 1. We can't control our component scope enough to get away with 0 generally. I'm interested in following up on 2, or techniques like it, later on, maybe for regression testing. I'll have to circle
back to it a little later, though.<br>
<br>
Fortunately for our use case we have one particular library that is used in most of our builds where we can centralize some of these parameters. I'm setting this (and another parameter we have been using) in this subroutine:<br>
<br>
<a href="https://xgitlab.cels.anl.gov/sds/margo/blob/master/src/margo.c#L180" target="_blank">https://xgitlab.cels.anl.gov/sds/margo/blob/master/src/margo.c#L180</a><br>
<br>
This works fine (setting stack size to 1M). My first guess was to set it to 4M to be even more conservative until we understand the issue better, but that caused a different crash almost immediately:<br>
<br>
(gdb) where
<div>#0 __GI___libc_free (mem=0x7ffff72d9000) at malloc.c:3085</div>
<div>#1 0x00007ffff7fb16d8 in ABTI_sched_free (</div>
<div> p_sched=p_sched@entry=0x55555555bbb0) at ../../src/sched/sched.c:747</div>
<div>#2 0x00007ffff7fa5795 in ABTI_sched_discard_and_free (p_sched=0x55555555bbb0)</div>
<div> at ../../src/include/abti_sched.h:51</div>
<div>#3 ABTI_xstream_set_main_sched (p_xstream=p_xstream@entry=0x55555555ba10, </div>
<div> p_sched=0x55555555bfb0) at ../../src/stream.c:1721</div>
<div>#4 0x00007ffff7fa5f09 in ABT_xstream_set_main_sched (xstream=0x55555555ba10, </div>
<div> sched=&lt;optimized out&gt;) at ../../src/stream.c:811</div>
<div>#5 0x00007ffff7fc1ee0 in margo_init_opt (addr_str=0x7fffffffc90a "na+sm://", </div>
<div> mode=1, hg_init_info=0x0, use_progress_thread=0, rpc_thread_count=-1)</div>
<div> at ../src/margo.c:248</div>
<div>#6 0x0000555555556aad in main (argc=2, argv=0x7fffffffc488)</div>
<div> at ../examples/margo-example-server.c:39</div>
<div><br>
I suspect that line number just happened to get caught in the crossfire of a memory corruption but I haven't investigated. At any rate, is there a practical (and lower than 4M) limit to how big we can set the stack size? Or is there another tunable that must
be boosted as well to allow for larger stacks?</div>
<br>
thanks!<br>
-Phil</div>
<div class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040hiri-extra">
<p>On 2019-02-21 17:55:56-05:00 Iwasaki, Shintaro via discuss wrote:</p>
<blockquote style="padding-left:10px;border-left:1px solid rgb(204,204,204);margin:0px">
<div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<span style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255);display:inline">Hello All,</span></div>
<div style="font-family:Calibri,Arial,Helvetica,sans-serif;font-size:12pt;color:rgb(0,0,0)">
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
Thank you for your reports!</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
As one of the developers, I would like to summarize the current status of Argobots.</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
As pointed out, Argobots, by default, uses 16KB for ULTs.</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
At present, these three approaches are reasonable ways to work around, find, or solve this issue:</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
0. Use the knowledge that, if Argobots is used alone, it should "typically" happen in ABT_finalize. The stack size of each thread is set individually by ABT_thread_attr_set_stacksize.</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
1. Use a larger stack size (e.g., ABT_THREAD_STACKSIZE=$((4 * 1024 * 1024))) and see what happens.</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
2. [Uncertain] Use Valgrind with --enable-valgrind (although it is extremely slow, so not practical for large applications)</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
For now, I think workaround 1 (using a larger stack size by default) is the best among them.</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
I haven't tried other tools, but I strongly believe that Argobots-unaware tools won't detect this problem; only Valgrind can detect it.</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
---</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
There are several issues:</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
1. Too small default stack size</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
It might be too small to drive large system libraries (e.g., a ULT as a progress thread)</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
I'm not sure how much it should be increased, or first of all, whether we should increase it.</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
This does not solve the problem of the "silent stack corruption", though.</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
In other words, if Argobots can detect stack overflow, users can change the value</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
by increasing the default stack size <span style="color:rgb(34,34,34);font-family:Arial,Helvetica,sans-serif;font-size:small">or the stack size of a specific thread requiring a large amount of stack.</span></div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
2. Lack of stack overflow detection</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
<div style="margin:0px">For example, the following two techniques are often used:</div>
<div style="margin:0px">- Stack canaries (lazy but cheap)</div>
<div style="margin:0px">- mprotect (eager but expensive)</div>
<div style="margin:0px">I will create a GitHub issue for further details if detection is preferable.</div>
<div style="margin:0px"></div>
<div style="margin:0px">3. Check if Valgrind works for this issue</div>
<div style="margin:0px">If --enable-valgrind is set, Argobots registers <span style="font-family:Arial,Helvetica,sans-serif;font-size:small;background-color:rgb(255,255,255);display:inline">
ULT's stacks to </span>Valgrind.</div>
<div style="margin:0px">It should work but I haven't tested it yet.</div>
<div style="margin:0px"><span style="font-family:Arial,Helvetica,sans-serif;font-size:small;background-color:rgb(255,255,255);display:inline">Another problem is that --enable-valgrind degrades the performance of Argobots even when it is not run under Valgrind</span></div>
<div style="margin:0px"><span style="font-family:Arial,Helvetica,sans-serif;font-size:small;background-color:rgb(255,255,255);display:inline">(see
<a href="https://github.com/pmodels/argobots/issues/78" target="_blank">https://github.com/pmodels/argobots/issues/78</a>).</span></div>
<div style="margin:0px"></div>
<div style="margin:0px">We would appreciate any feedback.</div>
<div style="margin:0px"></div>
</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
Thank You,</div>
<div style="margin:0px;font-size:small;font-family:Arial,Helvetica,sans-serif;color:rgb(34,34,34);background-color:rgb(255,255,255)">
Shintaro Iwasaki</div>
</div>
<div id="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040appendonsend">
</div>
<hr style="display:inline-block;width:98%">
<div id="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040divRplyFwdMsg" dir="ltr">
<font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Lombardi, Johann via discuss &lt;<a href="mailto:discuss@lists.argobots.org" target="_blank">discuss@lists.argobots.org</a>&gt;<br>
<b>Sent:</b> Thursday, February 21, 2019 4:30 PM<br>
<b>To:</b> <a href="mailto:discuss@lists.argobots.org" target="_blank">discuss@lists.argobots.org</a><br>
<b>Cc:</b> Lombardi, Johann; Liu, Xuezhao; Wang, Di<br>
<b>Subject:</b> Re: [argobots-discuss] how to debug a stack overrun in Argobots</font>
<div></div>
</div>
<div lang="FR">
<div class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_WordSection1">
<p class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_MsoNormal">
<span lang="EN-US">Hi Phil,</span></p>
<p class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_MsoNormal">
<span lang="EN-US"> </span></p>
<p class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_MsoNormal">
<span lang="EN-US">I think we hit the same issue recently on the DAOS side and had to bump the stack size as well. Wangdi & Xuezhao should know more.</span></p>
<p class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_MsoNormal">
<span lang="EN-US">Maybe a regression in ABT?<br>
<br>
Cheers,</span></p>
<p class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_MsoNormal">
<span lang="EN-US">Johann</span></p>
<p class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_MsoNormal">
<span lang="EN-US"> </span></p>
<div style="border-right:none;border-bottom:none;border-left:none;border-top:1pt solid rgb(181,196,223);padding:3pt 0cm 0cm">
<p class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_MsoNormal" style="margin-left:36pt">
<b><span style="font-size:12pt;color:black">From: </span></b><span style="font-size:12pt;color:black">"Carns, Philip H. via discuss" &lt;<a href="mailto:discuss@lists.argobots.org" target="_blank">discuss@lists.argobots.org</a>&gt;<br>
<b>Reply-To: </b>"<a href="mailto:discuss@lists.argobots.org" target="_blank">discuss@lists.argobots.org</a>" &lt;<a href="mailto:discuss@lists.argobots.org" target="_blank">discuss@lists.argobots.org</a>&gt;<br>
<b>Date: </b>Thursday, 21 February 2019 at 15:50<br>
<b>To: </b>"<a href="mailto:discuss@lists.argobots.org" target="_blank">discuss@lists.argobots.org</a>" &lt;<a href="mailto:discuss@lists.argobots.org" target="_blank">discuss@lists.argobots.org</a>&gt;<br>
<b>Cc: </b>"Carns, Philip H." &lt;<a href="mailto:carns@mcs.anl.gov" target="_blank">carns@mcs.anl.gov</a>&gt;<br>
<b>Subject: </b>Re: [argobots-discuss] how to debug a stack overrun in Argobots</span></p>
</div>
<div></div>
<div>
<div>
<p class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_MsoNormal" style="margin-left:36pt">
Just to follow up a little bit; I realized from looking at README.envvar just now that the default value of ABT_THREAD_STACKSIZE is 16K. That's almost certainly too low for us because we have ULTs that make calls into a variety of system libraries (including
fairly big things like libfabric) that are beyond our control.<br>
<br>
It seems likely that we will have to run with a larger stack size, but I would still like to have a better understanding of where the problem paths are, and how much head room we really need, if anyone has suggestions.<br>
<br>
thanks!<br>
-Phil</p>
</div>
</div>
<div>
<p style="margin-left:36pt">On 2019-02-21 15:31:53-05:00 Carns, Philip H. via discuss wrote:</p>
<blockquote style="border-top:none;border-right:none;border-bottom:none;border-left:1pt solid rgb(204,204,204);padding:0cm 0cm 0cm 8pt;margin-left:0cm;margin-right:0cm">
<div>
<div>
<div>
<p class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_MsoNormal" style="margin-left:36pt">
Hi all, <br>
<br>
There is a little bit of back story on <a href="https://github.com/pmodels/argobots/issues/93" target="_blank">https://github.com/pmodels/argobots/issues/93</a>, but to make a long story short, we have realized that we have some code that is overflowing the stack
in Argobots. Many thanks to Shintaro for his help and insight, or we may never have figured this out. We can work around the problem with <code><span style="font-size:10pt">export ABT_THREAD_STACKSIZE=$((1024 * 1024))</span></code>. This not only fixes a Power8 test
case for us, but also appears to solve a different frustrating, nonsensical segmentation fault that we've been chasing with a different code permutation on x86_64.<br>
<br>
Any suggestions on how to track down what's triggering this in our code or get a better idea of how much stack we need? We are using a considerable
number of libraries, many of which are not maintained by us, so I don't even know where to start looking yet. My usual go-to tool for this would be ASan in gcc or clang, but I don't think that will work correctly with Argobots, and maybe there is a
better solution anyway. </p>
</div>
<div>
<p class="m_-806029887442346753gmail-m_-9028257069011153826gmail-m_1550809522785091040x_MsoNormal" style="margin-left:36pt">
thanks,<br>
-Phil</p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</body>
</html>