[argobots-discuss] Program aborting in ABT_finalize (ABTI_pool_release failed)

Dorier, Matthieu mdorier at anl.gov
Tue Mar 23 10:33:25 CDT 2021


I see, thanks!
Matthieu

From: "Iwasaki, Shintaro" <siwasaki at anl.gov>
Date: Tuesday, 23 March 2021 at 15:28
To: "Dorier, Matthieu" <mdorier at anl.gov>, "discuss at argobots.org" <discuss at argobots.org>
Subject: Re: Program aborting in ABT_finalize (ABTI_pool_release failed)

Hi Matthieu,

Thanks for your quick update.

If a pool created by ABT_pool_create() (which is therefore non-automatic, i.e., automatic = false) is associated with the primary execution stream, you do not need to free it explicitly. ABT_finalize() will free everything that is associated with the primary stream. With Argobots 1.0, however, leaving a non-automatic pool for ABT_finalize() to free might cause a memory leak.

> I think it would be good if Argobots allowed pools created by ABT_pool_create to have an “automatic” flag as well, and be freed automatically. Otherwise I’m not even sure how we are supposed to free the pool associated with the primary ES if that pool happens to be a custom one.

Yes, I fully agree.  If I could redesign all functions from scratch, ABT_pool_create() would have an automatic argument. Unfortunately, Argobots 1.0 and 1.1 do not have such a configuration option, so please maintain flags externally to selectively free only pools that are created by ABT_pool_create() if necessary. We will extend ABT_pool_config to support automatic configuration in the future.
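
As a minimal sketch of the external bookkeeping suggested above: the application records, for each pool, whether it was created as automatic and whether it is attached to the primary ES, and frees only the pools Argobots will not free itself. The tracked_pool type and both function names are hypothetical (they are not part of Argobots), and the void * handle merely stands in for an ABT_pool so that the sketch compiles without abt.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-side record; Argobots does not provide this. */
typedef struct {
    void *pool;      /* stands in for an ABT_pool handle */
    bool automatic;  /* true if created with automatic = ABT_TRUE */
    bool on_primary; /* true if attached to the primary ES's scheduler */
} tracked_pool;

/* A pool needs an explicit ABT_pool_free() only if nothing else frees it:
 * it is non-automatic and it is not the primary ES's pool (which must be
 * left to ABT_finalize()). */
static bool needs_explicit_free(const tracked_pool *p)
{
    return !p->automatic && !p->on_primary;
}

/* Copies into out[] the handles the application must free itself and
 * returns how many were copied. out must have room for n entries. */
static size_t collect_pools_to_free(const tracked_pool *pools, size_t n,
                                    void **out)
{
    assert(pools != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (needs_explicit_free(&pools[i])) {
            out[count++] = pools[i].pool;
        }
    }
    return count;
}
```

The caller would then pass each returned handle to ABT_pool_free() before calling ABT_finalize(), presumably once the execution streams using those pools have been freed.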

---

If your program uses a user-defined pool, please check https://github.com/pmodels/argobots/issues/316.
This issue should be fixed somehow before Argobots 1.1, which is planned to be released soon (within one or two weeks).

Thanks,
Shintaro

________________________________
From: Dorier, Matthieu <mdorier at anl.gov>
Sent: Tuesday, March 23, 2021 9:55 AM
To: Iwasaki, Shintaro <siwasaki at anl.gov>; discuss at argobots.org <discuss at argobots.org>
Subject: Re: Program aborting in ABT_finalize (ABTI_pool_release failed)


Hi Shintaro,



I think I’m narrowing down the problem (I’m testing right now). We initially had all our pools created with automatic=TRUE in Margo, and weren’t calling ABT_pool_free anywhere. But a recent commit from Phil enabled the use of a custom pool implementation, created with ABT_pool_create. The problem with these pools is that there is no “automatic” flag, so he switched the basic pools to automatic=FALSE in order to call ABT_pool_free on all the pools, but without tracking or checking whether a pool was associated with the primary ES.



I think it would be good if Argobots allowed pools created by ABT_pool_create to have an “automatic” flag as well, and be freed automatically. Otherwise I’m not even sure how we are supposed to free the pool associated with the primary ES if that pool happens to be a custom one.



Thanks,



Matthieu





From: "Iwasaki, Shintaro" <siwasaki at anl.gov>
Date: Tuesday, 23 March 2021 at 14:44
To: "Dorier, Matthieu" <mdorier at anl.gov>
Subject: Re: Program aborting in ABT_finalize (ABTI_pool_release failed)



Hi Matthieu,



Even if automatic is false, please do not call ABT_pool_free() for the main ES's pool. It must be left for ABT_finalize() to free automatically.

Argobots 1.0 "works", though because of a bug it leaks a few hundred bytes per pool in ABT_finalize() (as far as I checked with Valgrind).

Argobots 1.1 (or the Argobots main branch) has fixed this memory leak issue, so I would recommend you use the latest one if possible.



> I guess Argobots isn’t checking whether a pool is in use before freeing it (or whether it’s used by the first ES)?

Currently Argobots checks neither, but such checks should be added to ease debugging.

I created an issue: https://github.com/pmodels/argobots/issues/315



Thanks,

Shintaro







________________________________

From: Dorier, Matthieu <mdorier at anl.gov>
Sent: Tuesday, March 23, 2021 9:20 AM
To: Iwasaki, Shintaro <siwasaki at anl.gov>; discuss at argobots.org <discuss at argobots.org>; discuss at lists.argobots.org <discuss at lists.argobots.org>
Subject: Re: Program aborting in ABT_finalize (ABTI_pool_release failed)



I’m creating the pool using ABT_pool_create_basic and “automatic” set to false, and I do indeed call ABT_pool_free on it.

I guess Argobots isn’t checking whether a pool is in use before freeing it (or whether it’s used by the first ES)?



My code is pretty complex, so I tried to write a reproducer, but I don’t see the error with it. Maybe I’m corrupting memory somehow, and it just happens to be fine in small code that doesn’t reuse the memory after the ABT_pool_free… (though I ran address sanitizer and didn’t see any invalid access).



I’ll see what happens if I remove the ABT_pool_free.



Thanks,



Matthieu



From: "Iwasaki, Shintaro" <siwasaki at anl.gov>
Date: Tuesday, 23 March 2021 at 14:01
To: "discuss at argobots.org" <discuss at argobots.org>, "discuss at lists.argobots.org" <discuss at lists.argobots.org>
Cc: "Dorier, Matthieu" <mdorier at anl.gov>
Subject: Re: Program aborting in ABT_finalize (ABTI_pool_release failed)



Hi Matthieu,



I don't know when such an error typically happens. I'd be happy if you could share your code with me so that I can closely look at it.



This is just my guess, but perhaps ABT_pool_free() is used; in Argobots, the primary ES's pools and schedulers must not be freed before ABT_finalize().

If you free that fifo_wait MPMC pool before ABT_finalize(), an already freed pool is accessed in ABT_finalize(), which may cause the error you reported.

(It is related to https://github.com/pmodels/argobots/issues/58).

I attached sample code to explain this issue. Please read the comments in the code if you are interested.



Thanks,

Shintaro



________________________________

From: Dorier, Matthieu via discuss <discuss at lists.argobots.org>
Sent: Tuesday, March 23, 2021 6:56 AM
To: discuss at argobots.org <discuss at argobots.org>
Cc: Dorier, Matthieu <mdorier at anl.gov>
Subject: [argobots-discuss] Program aborting in ABT_finalize (ABTI_pool_release failed)



Hi,



I’m getting this error in a program, when calling ABT_finalize:



../src/include/abti_pool.h:238: ABTI_pool_release: Assertion `ABTD_atomic_acquire_load_int32(&p_pool->num_scheds) > 0' failed.



The program only uses the primary ES; it does not create additional ESs. However, it does replace the primary ES’s pool and scheduler by creating a fifo_wait MPMC pool and passing it to ABT_xstream_set_main_sched_basic, with ABT_SCHED_BASIC_WAIT as sched_predef.



I’m using Argobots 1.0. I haven’t tested with other versions of Argobots. I’m sure I must be doing something wrong somewhere in my code. Could you tell me what I could look for that would lead to such an error?



For additional information, here is the stack trace when the error happens:



#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f56797fe535 in __GI_abort () at abort.c:79
#2  0x00007f56797fe40f in __assert_fail_base (fmt=0x7f5679960ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x7f5679d0e4c8 "ABTD_atomic_acquire_load_int32(&p_pool->num_scheds) > 0", file=0x7f5679d0e44b "../src/include/abti_pool.h", line=238,
    function=<optimized out>) at assert.c:92
#3  0x00007f567980c102 in __GI___assert_fail (assertion=0x7f5679d0e4c8 "ABTD_atomic_acquire_load_int32(&p_pool->num_scheds) > 0",
    file=0x7f5679d0e44b "../src/include/abti_pool.h", line=238, function=0x7f5679d0e660 <__PRETTY_FUNCTION__.6315> "ABTI_pool_release")
    at assert.c:101
#4  0x00007f5679d09e5e in ABTI_sched_free.part.4 ()
   from /projects/spack/opt/spack/linux-debian10-sandybridge/gcc-8.3.0/argobots-1.0-e4x7h6mgt7igmocc4ajro2g743ej322m/lib/libabt.so.0
#5  0x00007f5679cfe2c5 in ABTI_xstream_free ()
   from /projects/spack/opt/spack/linux-debian10-sandybridge/gcc-8.3.0/argobots-1.0-e4x7h6mgt7igmocc4ajro2g743ej322m/lib/libabt.so.0
#6  0x00007f5679cfa080 in ABT_finalize ()



Thanks,



Matthieu