[argobots-discuss] Possible issue

POLYKARPOS THOMADAKIS pthom001 at odu.edu
Thu Nov 1 19:00:04 CDT 2018


Hello, Shintaro,
Thank you for your reply. That's exactly what I'm doing and how I fixed the
problem on my side. I just wanted to see whether there is something else
that I am missing.
Best,
Polykarpos

On Thu, Nov 1, 2018 at 7:49 PM Iwasaki, Shintaro <siwasaki at anl.gov> wrote:

> Hello, Polykarpos,
>
>
> I am Shintaro Iwasaki, and I am currently working on Argobots.
>
> Thank you for reporting this issue!
>
>
> (0. Bug)
>
>
> ES1 pushes ULT1 to Pool1 <= (under the SPMC rule, ES1 is the only
> allowed producer for Pool1)
>
> ES1 pops ULT1 from Pool1
>
> ES1 suspends ULT1 to wait for something (e.g., ABT_self_suspend,
> ABT_thread_join, or ABT_mutex_lock)
>
> ES2 resumes ULT1 <= In this operation, ES2 pushes ULT1 to Pool1 (breaking
> the SPMC rule)
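The four steps above can be replayed with a small model. This is only a sketch: `spmc_pool`, `pool_push`, `pool_pop`, and `replay_bug` are hypothetical names, not Argobots APIs, and the model is single-threaded. It assumes the pool records its owner's rank and rejects a push from any other rank, which is roughly what a producer check would do.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_CAP 16

/* Hypothetical SPMC pool model (not an Argobots type). */
typedef struct {
    int owner_rank;        /* the only ES allowed to push */
    int units[POOL_CAP];   /* ULT ids currently in the pool */
    size_t count;
} spmc_pool;

/* Returns 0 on success, -1 if the caller is not the single producer,
 * -2 if the pool is full. */
static int pool_push(spmc_pool *p, int caller_rank, int ult_id)
{
    if (caller_rank != p->owner_rank)
        return -1;                      /* SPMC rule violated */
    if (p->count == POOL_CAP)
        return -2;
    p->units[p->count++] = ult_id;
    return 0;
}

/* Any ES may pop (multiple consumers). Returns -1 when empty. */
static int pool_pop(spmc_pool *p)
{
    return p->count ? p->units[--p->count] : -1;
}

/* Replays the sequence; returns the result of ES2's push on resume. */
static int replay_bug(void)
{
    spmc_pool pool1 = { .owner_rank = 1, .count = 0 };
    const int ult1 = 1;

    int rc = pool_push(&pool1, 1, ult1);   /* ES1 pushes ULT1 */
    assert(rc == 0);
    int popped = pool_pop(&pool1);         /* ES1 pops ULT1 */
    assert(popped == ult1);
    (void)rc; (void)popped;
    /* ES1 suspends ULT1 here; the pool is empty while it waits. */
    return pool_push(&pool1, 2, ult1);     /* ES2 resumes ULT1 */
}
```

In this model `replay_bug()` returns -1: the resume path is the only step where a rank other than the owner pushes.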
>
>
> 1. Is this behavior by design?
>
>
> For now, this is a known problem, and it is the user's responsibility to
> deal with it (i.e., using SPMC correctly is challenging).
>
> It can happen even without using a custom pool if the producer/consumer
> error check is enabled.
>
>
> 2. Workarounds?
>
>
> 2A. Use ABT_unit_set_associated_pool somewhere
>
>
> There's no good place to put it, since everything happens in Argobots.
>
> Even if you use ABT_self_suspend, don't change the associated pool of the
> suspended ULT (though I haven't confirmed it).
>
>
> 2B. Disable the suspend-resume optimization in ABT_thread_join and
> ABT_mutex_lock
>
>
> For mutex: --enable-simple-mutex might help.
>
> For join: no way to avoid it.
>
> Adding "goto yield_based;" at thread.c:426 disables it (just FYI).
>
>
> 2C. Use an MPMC pool.
>
>
> 3. Workaround in your case
>
>
> The following is based on my guess about your use case.
>
> If you are implementing a scalable (maybe Cilk-like) work-stealing queue,
> you can create a single MPMC pool that internally contains multiple SPMC
> queues (one per execution stream).
>
> You can identify the caller by its execution stream rank
> (ABT_xstream_self_rank) so that each caller always pushes into its own
> local queue.
>
> I think this is the cleanest workaround at present.
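A rough sketch of that layout, under stated assumptions: `ws_pool`, `ws_push`, and `ws_pop` are hypothetical names, `NUM_ES` and `QCAP` are made-up sizes, and `caller_rank` stands in for the value a real pool would obtain from ABT_xstream_self_rank. The model is single-threaded and uses plain arrays, so it shows only the routing, not the lock-free synchronization a real deque needs.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_ES  4     /* assumed number of execution streams */
#define QCAP    64    /* assumed per-queue capacity (slots are not reused) */
#define NO_UNIT (-1)

typedef struct {
    int items[QCAP];
    size_t top;       /* oldest unit; other streams steal from here */
    size_t bottom;    /* one past the newest unit; owner pushes/pops here */
} local_queue;

/* One pool as seen from outside, one queue per ES inside. */
typedef struct {
    local_queue q[NUM_ES];
} ws_pool;

static void ws_init(ws_pool *p)
{
    for (int i = 0; i < NUM_ES; i++)
        p->q[i].top = p->q[i].bottom = 0;
}

/* Always push into the caller's own queue, whichever ES is calling. */
static int ws_push(ws_pool *p, int caller_rank, int unit)
{
    assert(caller_rank >= 0 && caller_rank < NUM_ES);
    local_queue *q = &p->q[caller_rank];
    if (q->bottom == QCAP)
        return -1;
    q->items[q->bottom++] = unit;
    return 0;
}

/* Pop from the caller's bottom; if empty, steal from another top. */
static int ws_pop(ws_pool *p, int caller_rank)
{
    assert(caller_rank >= 0 && caller_rank < NUM_ES);
    local_queue *q = &p->q[caller_rank];
    if (q->bottom > q->top)
        return q->items[--q->bottom];
    for (int i = 1; i < NUM_ES; i++) {
        local_queue *v = &p->q[(caller_rank + i) % NUM_ES];
        if (v->bottom > v->top)
            return v->items[v->top++];
    }
    return NO_UNIT;
}
```

Because every push lands in the caller's queue, a resume performed by another ES never touches the queue of the ES that suspended the ULT, so each inner queue keeps a single producer.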
>
>
> 4. Misc
>
>
> A natural extension of Argobots to address this issue would be to
> call different push/pop functions depending on the context.
>
>
> Typically, work-stealing queues need to differentiate the following:
>
> - push (when creating a thread), push (when suspending a thread, e.g.,
> yield (*)), and push (when pushing back to the pool, e.g., set_ready)
>
> - pop (locally) and pop (remotely)
>
> (*) About push (suspend): In general, work stealing queues do not work
> if you limit local push and pop only to/from the bottom.
>
> For example, ES will reschedule the same ULT after ABT_thread_yield().
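One way to picture the distinction is a push that takes a context tag. This is a sketch under assumptions, not how Argobots works: `push_ctx`, `dq_push`, and `dq_pop_bottom` are hypothetical names, and the policy of sending a yielded ULT to the top is just one possible choice for illustration. The model is single-threaded.

```c
#include <assert.h>
#include <stddef.h>

#define DQ_CAP   32
#define DQ_EMPTY (-1)

/* Hypothetical push contexts, following the list above. */
typedef enum { PUSH_CREATE, PUSH_YIELD, PUSH_READY } push_ctx;

/* Circular deque: head is the top, head + len - 1 is the bottom. */
typedef struct {
    int items[DQ_CAP];
    size_t head;
    size_t len;
} deque;

/* Assumed policy: a yielded ULT goes to the top, everything else to
 * the bottom. Returns 0 on success, -1 if the deque is full. */
static int dq_push(deque *d, push_ctx ctx, int unit)
{
    if (d->len == DQ_CAP)
        return -1;
    if (ctx == PUSH_YIELD) {
        d->head = (d->head + DQ_CAP - 1) % DQ_CAP;   /* grow at the top */
        d->items[d->head] = unit;
    } else {
        d->items[(d->head + d->len) % DQ_CAP] = unit; /* grow at bottom */
    }
    d->len++;
    return 0;
}

/* Local pop from the bottom. Returns DQ_EMPTY when nothing is left. */
static int dq_pop_bottom(deque *d)
{
    if (d->len == 0)
        return DQ_EMPTY;
    d->len--;
    return d->items[(d->head + d->len) % DQ_CAP];
}
```

With a bottom-only push, a ULT that yields would be the next unit popped by the same ES. With the tagged push, the yielded ULT ends up behind the other units in the local queue, so the ES picks a different ULT first.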
>
>
> Currently, Argobots does not distinguish them. We are happy to discuss
> this further.
>
>
> If I am misunderstanding and/or you have any questions, please feel free
> to send an e-mail (or post a github issue).
>
>
> Best Regards,
>
> Shintaro Iwasaki
>
>
> ------------------------------
> *From:* POLYKARPOS THOMADAKIS via discuss <discuss at lists.argobots.org>
> *Sent:* Thursday, November 1, 2018 10:25:17 AM
> *To:* discuss at argobots.org
> *Cc:* POLYKARPOS THOMADAKIS
> *Subject:* [argobots-discuss] Possible issue
>
> Hello there,
>
> I am experiencing the following issue using Argobots, and I'm not sure
> whether it is a bug or an implicit assumption made by Argobots.
> Here is the issue:
>
> I have a set of custom pools, one per Execution Stream.
>
> In each pool, only the associated execution stream can push, while
> other streams can safely grab a unit from it.
> Those are the characteristics of an SPMC pool, so that's the type I
> specify when creating them.
>
> The underlying data structure is a lock-free cyclic deque. Pushing is
> done at the bottom, popping (where the owner stream of the pool grabs a
> unit) is also done at the bottom, and stealing (where a stream grabs a
> unit from the pool of another stream) is done from the top.
>
> In this way, when a ULT is created, it is always pushed to the pool of the
> creating stream and can be stolen safely by other streams. The application
> works perfectly when it only creates and executes ULTs. The problem occurs
> when I need the join functionality. The workflow is as follows:
>
> 1) ULT 0 spawns its child ULTs in ES 0
> 2) ES 1 steals one (or more) of the ULTs from ES 0
> 3) ULT 0 on ES 0 joins one of its children -> Argobots suspends its
> execution
> 4) ES 1 finishes executing the stolen ULT. This causes Argobots to try
> to awaken the blocked ULT 0 by pushing it back to the pool of its last
> stream, which is ES 0.
>
> And here is the problem: ES 1 awakens ULT 0 and pushes it to the pool of
> ES 0, thus breaking the rule that only the associated stream of a pool is
> allowed to push to it.
> Since the user has no API to define which pool a unit should be pushed to
> in such situations, I believed that by setting the pool type to SPMC,
> Argobots would take care of this.
>
> The lines that produce this issue:
>
> arch/abtd_thread.c:88 -> Terminating thread awakens blocked joiner
> thread.c:2017 -> Joiner is pushed to the last pool it ran before blocking,
> causing one stream to push to the pools of another
>
> My question is whether this behavior is expected. In other words, is the
> user expected to take this behavior into account when designing custom
> pools?
>
> Sorry for the long email and thank you for your time.
>
> Best,
> Polykarpos
>