[argobots-discuss] ABT segfault in MacOS
siwasaki at anl.gov
Fri Jun 25 15:36:29 CDT 2021
Thank you for checking it.
> No, I'm not using an M1 chip and yes, it seemed to be deterministic.
> Setting the ABT_THREAD_STACKSIZE=100000 seems to have fixed the issue.
If this is the case, I believe the stack size is too small for your workload.
It is basically a stack overflow: Pthreads typically use stacks of a few megabytes, but Argobots sets 4KB by default because Argobots applications tend to create many ULTs. There is no quick patch that fundamentally solves this.
(We have had many discussions about it; one proposed solution is https://github.com/pmodels/argobots/pull/327)
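To see why the default is kept small, here is a back-of-the-envelope sketch. The memory reserved for ULT stacks grows linearly with the number of ULTs, so the per-ULT stack size is a trade-off between headroom and total footprint. `total_stack_bytes` is a hypothetical helper for illustration, not an Argobots function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Argobots): total bytes reserved for the
 * stacks of n_ults user-level threads, each given stack_bytes of stack.
 * Returns SIZE_MAX instead of wrapping around if the product overflows. */
size_t total_stack_bytes(size_t n_ults, size_t stack_bytes)
{
    if (stack_bytes != 0 && n_ults > SIZE_MAX / stack_bytes)
        return SIZE_MAX; /* saturate on overflow */
    return n_ults * stack_bytes;
}
```

For example, 10,000 ULTs with 4096-byte stacks reserve 40,960,000 bytes (about 41 MB), while the same 10,000 ULTs with 100000-byte stacks reserve 1,000,000,000 bytes (about 1 GB).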
Please set ABT_THREAD_STACKSIZE yourself, or ask the application developer to set it before initialization (presumably by calling setenv() before ABT_init()).
If you have any questions or find other issues, please feel free to let us know!
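A minimal sketch of the setenv() approach, assuming the application controls its own `main()`. `request_abt_stacksize` is a hypothetical helper name; passing `overwrite = 0` to setenv() means a value the user already exported (for example ABT_THREAD_STACKSIZE=100000) is left untouched.

```c
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: request a default ULT stack size (in bytes) through
 * the ABT_THREAD_STACKSIZE environment variable. Call it before ABT_init(),
 * since Argobots reads the variable at initialization.
 * overwrite = 0 keeps any value the user has already exported.
 * Returns 0 on success, -1 on failure. */
int request_abt_stacksize(size_t bytes)
{
    char buf[32];
    int n = snprintf(buf, sizeof buf, "%zu", bytes);
    if (n < 0 || (size_t)n >= sizeof buf)
        return -1; /* formatting failed or was truncated */
    return setenv("ABT_THREAD_STACKSIZE", buf, 0);
}
```

In the application, `request_abt_stacksize(100000)` would be called at the top of `main()`, before `ABT_init(argc, argv)`. As an alternative to the environment variable, Argobots also provides `ABT_thread_attr_set_stacksize()` to set the stack size of individual ULTs through a thread attribute passed to `ABT_thread_create()`.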
From: Jean Luca Bez <jlbez at lbl.gov>
Sent: Friday, June 25, 2021 2:09 PM
To: Iwasaki, Shintaro <siwasaki at anl.gov>
Cc: discuss at argobots.org <discuss at argobots.org>; discuss at lists.argobots.org <discuss at lists.argobots.org>
Subject: Re: [argobots-discuss] ABT segfault in MacOS
Thank you for the quick response!
No, I'm not using an M1 chip and yes, it seemed to be deterministic.
Setting the ABT_THREAD_STACKSIZE=100000 seems to have fixed the issue.
On Fri, Jun 25, 2021 at 12:02 PM Iwasaki, Shintaro <siwasaki at anl.gov> wrote:
Thank you for reporting the issue. A few quick questions:
0. Just in case, are you using an M1 chip?
1. Can you increase the ULT stack size (4KB by default; setting ABT_THREAD_STACKSIZE=100000 increases it to about 100KB) and see whether the problem still happens?
2. Is it deterministic? Does it happen at the same place every time?
3. It may also be a different issue. I would truly appreciate it if you could send me a program or script that reproduces it.
From: Jean Luca Bez via discuss <discuss at lists.argobots.org>
Sent: Friday, June 25, 2021 1:45 PM
To: discuss at argobots.org <discuss at argobots.org>
Cc: Jean Luca Bez <jlbez at lbl.gov>
Subject: [argobots-discuss] ABT segfault in MacOS
I compiled the latest git version of Argobots on macOS, ran the tests, and they all passed. However, when running an application I am getting a strange segfault that appears to come from Argobots (ABT). Has anyone faced similar issues?
*** Process received signal ***
Signal: Segmentation fault: 11 (11)
Signal code: (0)
Failing at address: 0x0
[ 0] 0 libsystem_platform.dylib 0x00007fff20428d7d _sigtramp + 29
[ 1] 0 ??? 0x0000000000000000 0x0 + 0
[ 2] 0 libabt.1.dylib 0x0000000105bdbdc0 ABT_thread_create + 128
[ 3] 0 libh5async.dylib 0x00000001064bde1f push_task_to_abt_pool + 559
[ 4] 0 libh5async.dylib 0x00000001064e6a02 async_group_create + 1890
[ 5] 0 libh5async.dylib 0x00000001064c4061 H5VL_async_group_create + 321
[ 6] 0 libhdf5.1000.dylib 0x0000000105f6f794 H5VL__group_create + 180
[ 7] 0 libhdf5.1000.dylib 0x0000000105f6f569 H5VL_group_create + 217
[ 8] 0 libhdf5.1000.dylib 0x0000000105d48a04 H5G__create_api_common + 660
[ 9] 0 libhdf5.1000.dylib 0x0000000105d485f5 H5Gcreate2 + 325
[10] 0 async_test_parallel.exe 0x0000000105bbcb43 main + 739
[11] 0 libdyld.dylib 0x00007fff203fef5d start + 1
[12] 0 ??? 0x0000000000000001 0x0 + 1