Does the Intel C compiler support __sync_fetch_and_add?

In any case, if you access a valid memory location, the operation is valid from the CPU's point of view. It all comes down to CPU operations: read the value, modify it and write it back (the so-called read-modify-write, or RMW, operation), either atomically or not.

Does the Intel C compiler support __sync_fetch_and_add for free

However, keep in mind that atomic operations are not free. They are implemented through the CPU cache coherence protocol, so a simple variable increment (like i++) is faster than an atomic increment. Use atomic operations carefully in heavily loaded code.

Does the Intel C compiler support __sync_fetch_and_add update

atomic_inc() in the kernel, or __sync_fetch_and_add() in userspace (the latter is a GCC built-in described in the GCC documentation), allows multiple processors to update a variable atomically. It seems you didn't use atomic_inc() (or atomic_add()) to update the counter in your driver.

The strace output shows that STM in libitm is implemented via the futex() system call, like a common mutex. (GCC implements STM as the libitm library, which you can see in the ldd output.)

Before going deeper into libitm internals, let's look at the transaction code more carefully and split it into basic read and write operations. We have three memory locations, variables a, b and c, on which we perform read and write operations:

- The first operation, ++a, reads a value from memory, updates it and writes it back. That is two operations: one read and one write.
- Next, b += 2 is exactly the same: read the value, add 2 and write it back.
- The last one, c = a + b, is two reads (a and b) and one write to c.

Moreover, all these operations are inside a transaction, so we also have to start and commit a transaction.

To understand what's going on inside thr_func(), let's simplify it. The compiled code then contains four calls to _ITM_* functions: transaction begin, transaction commit, and a pair of read (RU4) and write (WU4) operations. As explained in info libitm, GCC follows Intel's Draft Specification of Transactional Language Constructs for C++ (v1.1) in its implementation of transactions, so the _ITM_ prefix is just Intel's naming convention.

_ITM_beginTransaction() saves the machine state (for x86 see libitm/config/x86/sjlj.S). It then calls GTM::gtm_thread::begin_transaction() (see libitm/), which initializes the transaction data, checks transaction nesting and performs other preparation steps. _ITM_commitTransaction() is defined in libitm/ and tries to commit the transaction by calling GTM::gtm_thread::trycommit(); if that fails, it restarts the transaction. GTM::gtm_thread::trycommit() is where all the threads sleep in futex() (which we saw in the strace output) to write all modified data. That makes it the heaviest part of a transaction.

The most interesting work happens in the read and write operations. _ITM_RU4 and _ITM_WU4 are just a sequence of jumps that lead (in this particular case) to ml_wt_dispatch::load() and ml_wt_dispatch::store() respectively. The first one accepts only the variable address; the second one accepts the variable address and the value to store.

load() reads a memory region at the specified address. Before that, it calls ml_wt_dispatch::pre_load(), which checks that the memory location is neither locked nor recent and otherwise restarts the transaction. This service data is taken from a global table indexed by a hash function over the address.

store(), in turn, calls ml_wt_dispatch::pre_write(), which locks the memory location and updates its release (version) before the write. All service data for the memory location is taken from the same global table, and the release version is what pre_load() checks as "recent".