[argobots-discuss] Program aborting in ABT_finalize (ABTI_pool_release failed)

Dorier, Matthieu mdorier at anl.gov
Tue Mar 23 09:20:05 CDT 2021


I’m creating the pool using ABT_pool_create_basic and “automatic” set to false, and I do indeed call ABT_pool_free on it.
I guess Argobots isn’t checking whether a pool is in use before freeing it (or whether it’s used by the first ES)?

My code is fairly complex, so I tried to write a reproducer, but I don’t see the error with it. Maybe I’m corrupting memory somehow, and it just happens to be fine in small code that doesn’t reuse the memory after the ABT_pool_free… (though I ran AddressSanitizer and didn’t see any invalid access).

I’ll see what happens if I remove the ABT_pool_free.

Thanks,

Matthieu

From: "Iwasaki, Shintaro" <siwasaki at anl.gov>
Date: Tuesday, 23 March 2021 at 14:01
To: "discuss at argobots.org" <discuss at argobots.org>, "discuss at lists.argobots.org" <discuss at lists.argobots.org>
Cc: "Dorier, Matthieu" <mdorier at anl.gov>
Subject: Re: Program aborting in ABT_finalize (ABTI_pool_release failed)

Hi Matthieu,

I don't know when such an error typically happens. I'd be happy if you could share your code with me so that I can closely look at it.

This is just a guess, but perhaps ABT_pool_free() is being called on that pool; in Argobots, the primary ES's pools and schedulers must not be freed before ABT_finalize().
If you free that fifo_wait MPMC pool before ABT_finalize(), ABT_finalize() accesses an already-freed pool, which may cause the error you reported.
(It is related to https://github.com/pmodels/argobots/issues/58).
I attached sample code to explain this issue. Please read the comments in the code if you are interested.

Thanks,
Shintaro

________________________________
From: Dorier, Matthieu via discuss <discuss at lists.argobots.org>
Sent: Tuesday, March 23, 2021 6:56 AM
To: discuss at argobots.org <discuss at argobots.org>
Cc: Dorier, Matthieu <mdorier at anl.gov>
Subject: [argobots-discuss] Program aborting in ABT_finalize (ABTI_pool_release failed)


Hi,



I’m getting this error in a program, when calling ABT_finalize:



../src/include/abti_pool.h:238: ABTI_pool_release: Assertion `ABTD_atomic_acquire_load_int32(&p_pool->num_scheds) > 0' failed.



The program only uses the primary ES; it does not create any additional ES. However, it does replace the primary ES’s pool and scheduler by creating a fifo_wait MPMC pool and passing it to ABT_xstream_set_main_sched_basic, with ABT_SCHED_BASIC_WAIT as sched_predef.



I’m using Argobots 1.0. I haven’t tested with other versions of Argobots. I’m sure I must be doing something wrong somewhere in my code. Could you tell me what I could look for that would lead to such an error?



For additional information, here is the stack trace when the error happens:



#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f56797fe535 in __GI_abort () at abort.c:79
#2  0x00007f56797fe40f in __assert_fail_base (fmt=0x7f5679960ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x7f5679d0e4c8 "ABTD_atomic_acquire_load_int32(&p_pool->num_scheds) > 0", file=0x7f5679d0e44b "../src/include/abti_pool.h", line=238,
    function=<optimized out>) at assert.c:92
#3  0x00007f567980c102 in __GI___assert_fail (assertion=0x7f5679d0e4c8 "ABTD_atomic_acquire_load_int32(&p_pool->num_scheds) > 0",
    file=0x7f5679d0e44b "../src/include/abti_pool.h", line=238, function=0x7f5679d0e660 <__PRETTY_FUNCTION__.6315> "ABTI_pool_release")
    at assert.c:101
#4  0x00007f5679d09e5e in ABTI_sched_free.part.4 ()
   from /projects/spack/opt/spack/linux-debian10-sandybridge/gcc-8.3.0/argobots-1.0-e4x7h6mgt7igmocc4ajro2g743ej322m/lib/libabt.so.0
#5  0x00007f5679cfe2c5 in ABTI_xstream_free ()
   from /projects/spack/opt/spack/linux-debian10-sandybridge/gcc-8.3.0/argobots-1.0-e4x7h6mgt7igmocc4ajro2g743ej322m/lib/libabt.so.0
#6  0x00007f5679cfa080 in ABT_finalize ()



Thanks,



Matthieu