[argobots-discuss] ABT segfault in MacOS

Iwasaki, Shintaro siwasaki at anl.gov
Fri Jun 25 15:36:29 CDT 2021


Hi Luca,

Thank you for checking it.

> No, I'm not using an M1 chip and yes, it seemed to be deterministic.
> Setting the ABT_THREAD_STACKSIZE=100000 seems to have fixed the issue.
If that is the case, I believe the ULT stack size is too small for your workload.
It's basically a stack overflow (Pthreads typically use a stack of a few megabytes, but Argobots uses 4KB by default because Argobots applications tend to create many ULTs). There's no quick patch that fundamentally solves it.
(We have had many discussions about this; one solution is https://github.com/pmodels/argobots/pull/327.)
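To make the trade-off above concrete, here is a small C sketch that multiplies the number of threads by the per-thread stack size. The 10,000-ULT workload and the 2 MiB "Pthread-like" stack are hypothetical figures chosen for illustration (the message above only says "a few megabytes"); 4KB and 100000 bytes are the values discussed in this thread. The numbers are reserved stack sizes, not measured memory use.

```c
#include <stdint.h>
#include <stdio.h>

/* Bytes of stack reserved for n_threads threads with stack_bytes each.
 * Ignores guard pages, allocator overhead, and the fact that the OS may
 * only commit the stack pages that are actually touched. */
uint64_t total_stack_bytes(uint64_t n_threads, uint64_t stack_bytes) {
    return n_threads * stack_bytes;
}

/* Print one row of the comparison, converted to MiB. */
void report(const char *label, uint64_t n_threads, uint64_t stack_bytes) {
    double mib = (double)total_stack_bytes(n_threads, stack_bytes) / (1024.0 * 1024.0);
    printf("%-28s %llu x %llu B = %.1f MiB\n", label,
           (unsigned long long)n_threads, (unsigned long long)stack_bytes, mib);
}

/* Hypothetical workload of 10,000 ULTs:
 *   report("Argobots default (4KB)", 10000, 4096);         about 39.1 MiB
 *   report("ABT_THREAD_STACKSIZE=100000", 10000, 100000);  about 953.7 MiB
 *   report("Pthread-like stack (2 MiB)", 10000, 2097152);  about 20000.0 MiB
 */
```

The comparison shows why megabyte-sized stacks do not scale to large ULT counts, and also that the ABT_THREAD_STACKSIZE=100000 workaround costs roughly 24 times the default per ULT.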

Please set ABT_THREAD_STACKSIZE manually, or ask the application developer to set it before initialization (presumably with setenv() before ABT_init()).
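A minimal sketch of the setenv()-before-ABT_init() approach, assuming the application controls its own main(). The helper name ensure_abt_stacksize() is hypothetical (it is not part of Argobots); it only uses POSIX getenv()/setenv(), and it keeps a larger value if the user has already exported one.

```c
#define _POSIX_C_SOURCE 200112L /* expose setenv() under strict ISO C modes */
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Argobots: ensure ABT_THREAD_STACKSIZE
 * requests at least min_bytes before ABT_init() reads it.
 * Returns 0 on success, -1 if setenv() fails.
 * setenv() is not thread-safe, so call this early in main(), before any
 * other threads exist. */
int ensure_abt_stacksize(size_t min_bytes) {
    const char *cur = getenv("ABT_THREAD_STACKSIZE");

    /* Keep an existing value if it is a plain decimal number >= min_bytes. */
    if (cur != NULL && cur[0] >= '0' && cur[0] <= '9') {
        char *end = NULL;
        unsigned long long v = strtoull(cur, &end, 10);
        if (*end == '\0' && v >= (unsigned long long)min_bytes)
            return 0;
    }

    /* Otherwise overwrite it with min_bytes (third argument 1 = overwrite). */
    char buf[32];
    snprintf(buf, sizeof buf, "%zu", min_bytes);
    return setenv("ABT_THREAD_STACKSIZE", buf, 1);
}
```

In the application, this would be called as ensure_abt_stacksize(100000) before ABT_init(argc, argv); setting the variable after initialization would presumably have no effect. If only some ULTs need a bigger stack, Argobots also offers a per-ULT attribute (ABT_thread_attr_set_stacksize() on an ABT_thread_attr passed to ABT_thread_create()), which avoids enlarging every stack.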

If you have any questions or find other issues, please feel free to let us know!

Thanks,
Shintaro

________________________________
From: Jean Luca Bez <jlbez at lbl.gov>
Sent: Friday, June 25, 2021 2:09 PM
To: Iwasaki, Shintaro <siwasaki at anl.gov>
Cc: discuss at argobots.org <discuss at argobots.org>; discuss at lists.argobots.org <discuss at lists.argobots.org>
Subject: Re: [argobots-discuss] ABT segfault in MacOS

Hi Shintaro,

Thank you for the quick response!

No, I'm not using an M1 chip and yes, it seemed to be deterministic.

Setting the ABT_THREAD_STACKSIZE=100000 seems to have fixed the issue.

Best,
Jean Luca

On Fri, Jun 25, 2021 at 12:02 PM Iwasaki, Shintaro <siwasaki at anl.gov> wrote:
Hi Luca,

Thank you for reporting this issue. Just a few quick questions:
0. Just in case, are you using an M1 chip?
1. Can you increase the ULT stack size (4KB by default; the following command increases it to 100KB) and see if the problem still happens?
ABT_THREAD_STACKSIZE=100000 ./your_app.exe
2. Is it deterministic? Does it happen at the same place every time?
3. It may also be hitting a different issue. I'd truly appreciate it if you could send me a program or script that reproduces it.

Best,
Shintaro

________________________________
From: Jean Luca Bez via discuss <discuss at lists.argobots.org>
Sent: Friday, June 25, 2021 1:45 PM
To: discuss at argobots.org
Cc: Jean Luca Bez <jlbez at lbl.gov>
Subject: [argobots-discuss] ABT segfault in MacOS

Hi,

I compiled the latest git version of Argobots on macOS, ran the tests, and they all passed. However, when running an application, I am getting a strange segfault that appears to come from ABT. Has anyone faced similar issues?


 *** Process received signal ***
 Signal: Segmentation fault: 11 (11)
 Signal code:  (0)
 Failing at address: 0x0
 [ 0] 0   libsystem_platform.dylib            0x00007fff20428d7d _sigtramp + 29
 [ 1] 0   ???                                 0x0000000000000000 0x0 + 0
 [ 2] 0   libabt.1.dylib                      0x0000000105bdbdc0 ABT_thread_create + 128
 [ 3] 0   libh5async.dylib                    0x00000001064bde1f push_task_to_abt_pool + 559
 [ 4] 0   libh5async.dylib                    0x00000001064e6a02 async_group_create + 1890
 [ 5] 0   libh5async.dylib                    0x00000001064c4061 H5VL_async_group_create + 321
 [ 6] 0   libhdf5.1000.dylib                  0x0000000105f6f794 H5VL__group_create + 180
 [ 7] 0   libhdf5.1000.dylib                  0x0000000105f6f569 H5VL_group_create + 217
 [ 8] 0   libhdf5.1000.dylib                  0x0000000105d48a04 H5G__create_api_common + 660
 [ 9] 0   libhdf5.1000.dylib                  0x0000000105d485f5 H5Gcreate2 + 325
 [10] 0   async_test_parallel.exe             0x0000000105bbcb43 main + 739
 [11] 0   libdyld.dylib                       0x00007fff203fef5d start + 1
 [12] 0   ???                                 0x0000000000000001 0x0 + 1


Thank you,

Jean Luca