boost/libs/lockfree/doc/lockfree.qbk - nest-cam/4320010/boost - Git at Google

 [library Boost.Lockfree
     [quickbook 1.4]
     [authors [Blechmann, Tim]]
     [copyright 2008-2011 Tim Blechmann]
     [category algorithms]
     [purpose
         lockfree concurrent data structures
     ]
     [id lockfree]
     [dirname lockfree]
     [license
         Distributed under the Boost Software License, Version 1.0.
         (See accompanying file LICENSE_1_0.txt or copy at
         [@http://www.boost.org/LICENSE_1_0.txt])
     ]
 ]

 [c++]


 [/  Images   ]

 [def _note_                         [$images/note.png]]
 [def _alert_                        [$images/caution.png]]
 [def _detail_                       [$images/note.png]]
 [def _tip_                          [$images/tip.png]]

 [/  Links   ]

 [def _lockfree_                      [^boost.lockfree]]

 [section Introduction & Motivation]

 [h2 Introduction & Terminology]

 The term *non-blocking* denotes concurrent data structures, which do not use traditional synchronization primitives like
 guards to ensure thread-safety. Maurice Herlihy and Nir Shavit (compare [@http://books.google.com/books?id=pFSwuqtJgxYC
 "The Art of Multiprocessor Programming"]) distinguish between 3 types of non-blocking data structures, each having different
 properties:

 * data structures are *wait-free*, if every concurrent operation is guaranteed to be finished in a finite number of
   steps. It is therefore possible to give worst-case guarantees for the number of operations.

 * data structures are *lock-free*, if some concurrent operations are guaranteed to be finished in a finite number of
   steps. While it is in theory possible that some operations never make any progress, it is very unlikely to happen in
   practical applications.

 * data structures are *obstruction-free*, if a concurrent operation is guaranteed to be finished in a finite number of
   steps, unless another concurrent operation interferes.


 Some data structures can only be implemented in a lock-free manner, if they are used under certain restrictions. The
 relevant aspects for the implementation of _lockfree_ are the number of producer and consumer threads. *Single-producer*
 (*sp*) or *multiple producer* (*mp*) means that only a single thread or multiple concurrent threads are allowed to add
 data to a data structure.  *Single-consumer* (*sc*) or *Multiple-consumer* (*mc*) denote the equivalent for the removal
 of data from the data structure.


 [h2 Properties of Non-Blocking Data Structures]

 Non-blocking data structures do not rely on locks and mutexes to ensure thread-safety. The synchronization is done completely in
 user-space without any direct interaction with the operating system [footnote Spinlocks do not
 directly interact with the operating system either. However it is possible that the owning thread is preempted by the
 operating system, which violates the lock-free property.]. This implies that they are not prone to issues like priority
 inversion (a low-priority thread needs to wait for a high-priority thread).

 Instead of relying on guards, non-blocking data structures require *atomic operations* (specific CPU instructions executed
 without interruption). This means that any thread either sees the state before or after the operation, but no
 intermediate state can be observed.  Not all hardware supports the same set of atomic instructions. If it is not
 available in hardware, it can be emulated in software using guards. However this has the obvious drawback of losing the
 lock-free property.


 [h2 Performance of Non-Blocking Data Structures]

 When discussing the performance of non-blocking data structures, one has to distinguish between *amortized* and
 *worst-case* costs. The definition of 'lock-free' and 'wait-free' only mention the upper bound of an operation. Therefore
 lock-free data structures are not necessarily the best choice for every use case. In order to maximise the throughput of an
 application one should consider high-performance concurrent data structures [footnote
 [@http://threadingbuildingblocks.org/ Intel's Thread Building Blocks library] provides many efficient concurrent data structures,
 which are not necessarily lock-free.].

 Lock-free data structures will be a better choice in order to optimize the latency of a system or to avoid priority inversion,
 which may be necessary in real-time applications. In general we advise to consider if lock-free data structures are necessary or if
 concurrent data structures are sufficient. In any case we advice to perform benchmarks with different data structures for a
 specific workload.


 [h2 Sources of Blocking Behavior]

 Apart from locks and mutexes (which we are not using in _lockfree_ anyway), there are three other aspects, that could violate
 lock-freedom:

 [variablelist
     [[Atomic Operations]
      [Some architectures do not provide the necessary atomic operations in natively in hardware. If this is not
       the case, they are emulated in software using spinlocks, which by itself is blocking.
      ]
     ]

     [[Memory Allocations]
      [Allocating memory from the operating system is not lock-free. This makes it impossible to implement true
       dynamically-sized non-blocking data structures. The node-based data structures of _lockfree_ use a memory pool to allocate the
       internal nodes. If this memory pool is exhausted, memory for new nodes has to be allocated from the operating system. However
       all data structures of _lockfree_ can be configured to avoid memory allocations (instead the specific calls will fail).
       This is especially useful for real-time systems that require lock-free memory allocations.
      ]
     ]

     [[Exception Handling]
      [The C++ exception handling does not give any guarantees about its real-time behavior. We therefore do
       not encourage the use of exceptions and exception handling in lock-free code.]
     ]
 ]

 [h2 Data Structures]

 _lockfree_ implements three lock-free data structures:

 [variablelist
     [[[classref boost::lockfree::queue]]
      [a lock-free multi-produced/multi-consumer queue]
     ]

     [[[classref boost::lockfree::stack]]
      [a lock-free multi-produced/multi-consumer stack]
     ]

     [[[classref boost::lockfree::spsc_queue]]
      [a wait-free single-producer/single-consumer queue (commonly known as ringbuffer)]
     ]
 ]

 [h3 Data Structure Configuration]

 The data structures can be configured with [@boost:/libs/parameter/doc/html/index.html Boost.Parameter]-style templates:

 [variablelist
     [[[classref boost::lockfree::fixed_sized]]
      [Configures the data structure as *fixed sized*. The internal nodes are stored inside an array and they are addressed by
       array indexing. This limits the possible size of the queue to the number of elements that can be addressed by the index
       type (usually 2**16-2), but on platforms that lack double-width compare-and-exchange instructions, this is the best way
       to achieve lock-freedom.
      ]
     ]

     [[[classref boost::lockfree::capacity]]
      [Sets the *capacity* of a data structure at compile-time. This implies that a data structure is fixed-sized.
      ]
     ]

     [[[classref boost::lockfree::allocator]]
      [Defines the allocator. _lockfree_ supports stateful allocator and is compatible with [@boost:/libs/interprocess/index.html Boost.Interprocess] allocators.]
     ]
 ]


 [endsect]

 [section Examples]

 [h2 Queue]

 The [classref boost::lockfree::queue boost::lockfree::queue] class implements a multi-writer/multi-reader queue. The
 following example shows how integer values are produced and consumed by 4 threads each:

 [import ../examples/queue.cpp]
 [queue_example]

 The program output is:

 [pre
 produced 40000000 objects.
 consumed 40000000 objects.
 ]


 [h2 Stack]

 The [classref boost::lockfree::stack boost::lockfree::stack] class implements a multi-writer/multi-reader stack. The
 following example shows how integer values are produced and consumed by 4 threads each:

 [import ../examples/stack.cpp]
 [stack_example]


 The program output is:

 [pre
 produced 4000000 objects.
 consumed 4000000 objects.
 ]

 [h2 Waitfree Single-Producer/Single-Consumer Queue]

 The [classref boost::lockfree::spsc_queue boost::lockfree::spsc_queue] class implements a wait-free single-producer/single-consumer queue. The
 following example shows how integer values are produced and consumed by 2 separate threads:

 [import ../examples/spsc_queue.cpp]
 [spsc_queue_example]


 The program output is:

 [pre
 produced 10000000 objects.
 consumed 10000000 objects.
 ]

 [endsect]


 [section Rationale]

 [section Data Structures]

 The implementations are implementations of well-known data structures. The queue is based on
 [@http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Michael Scott and Maged Michael],
 the stack is based on [@http://books.google.com/books?id=YQg3HAAACAAJ Systems programming: coping with parallelism by R. K. Treiber]
 and the spsc_queue is considered as 'folklore' and is implemented in several open-source projects including the linux kernel. All
 data structures are discussed in detail in [@http://books.google.com/books?id=pFSwuqtJgxYC "The Art of Multiprocessor Programming" by Herlihy & Shavit].

 [endsect]

 [section Memory Management]

 The lock-free [classref boost::lockfree::queue] and [classref boost::lockfree::stack] classes are node-based data structures,
 based on a linked list. Memory management of lock-free data structures is a non-trivial problem, because we need to avoid that
 one thread frees an internal node, while another thread still uses it. _lockfree_ uses a simple approach not returning any memory
 to the operating system. Instead they maintain a *free-list* in order to reuse them later. This is done for two reasons:
 first, depending on the implementation of the memory allocator freeing the memory may block (so the implementation would not
 be lock-free anymore), and second, most memory reclamation algorithms are patented.

 [endsect]

 [section ABA Prevention]

 The ABA problem is a common problem when implementing lock-free data structures. The problem occurs when updating an atomic
 variable using a =compare_exchange= operation: if the value A was read, thread 1 changes it to say C and tries to update
 the variable, it uses =compare_exchange= to write C, only if the current value is A. This might be a problem if in the meanwhile
 thread 2 changes the value from A to B and back to A, because thread 1 does not observe the change of the state. The common way to
 avoid the ABA problem is to associate a version counter with the value and change both atomically.

 _lockfree_ uses a =tagged_ptr= helper class which associates a pointer with an integer tag. This usually requires a double-width
 =compare_exchange=, which is not available on all platforms. IA32 did not provide the =cmpxchg8b= opcode before the pentium
 processor and it is also lacking on many RISC architectures like PPC. Early X86-64 processors also did not provide a =cmpxchg16b=
 instruction.
 On 64bit platforms one can work around this issue, because often not the full 64bit address space is used. On X86_64 for example,
 only 48bit are used for the address, so we can use the remaining 16bit for the ABA prevention tag. For details please consult the
 implementation of the =boost::lockfree::detail::tagged_ptr= class.

 For lock-free operations on 32bit platforms without double-width =compare_exchange=, we support a third approach: by using a
 fixed-sized array to store the internal nodes we can avoid the use of 32bit pointers, but instead 16bit indices into the array
 are sufficient. However this is only possible for fixed-sized data structures, that have an upper bound of internal nodes.

 [endsect]

 [section Interprocess Support]

 The _lockfree_ data structures have basic support for [@boost:/libs/interprocess/index.html Boost.Interprocess]. The only
 problem is the blocking emulation of lock-free atomics, which in the current implementation is not guaranteed to be interprocess-safe.

 [endsect]

 [endsect]

 [xinclude autodoc.xml]

 [section Appendices]

 [section Supported Platforms & Compilers]

 _lockfree_ has been tested on the following platforms:

 * g++ 4.4, 4.5 and 4.6, linux, x86 & x86_64
 * clang++ 3.0, linux, x86 & x86_64

 [endsect]

 [section Future Developments]

 * More data structures (set, hash table, dequeue)
 * Backoff schemes (exponential backoff or elimination)

 [endsect]

 [section References]

 # [@http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Michael Scott and Maged Michael],
 In Symposium on Principles of Distributed Computing, pages 267–275, 1996.
 # [@http://books.google.com/books?id=pFSwuqtJgxYC M. Herlihy & Nir Shavit. The Art of Multiprocessor Programming], Morgan Kaufmann Publishers, 2008

 [endsect]

 [endsect]
	[library Boost.Lockfree
	[quickbook 1.4]
	[authors [Blechmann, Tim]]
	[copyright 2008-2011 Tim Blechmann]
	[category algorithms]
	[purpose
	lockfree concurrent data structures
	]
	[id lockfree]
	[dirname lockfree]
	[license
	Distributed under the Boost Software License, Version 1.0.
	(See accompanying file LICENSE_1_0.txt or copy at
	[@http://www.boost.org/LICENSE_1_0.txt])
	]
	]

	[c++]


	[/ Images ]

	[def _note_ [$images/note.png]]
	[def _alert_ [$images/caution.png]]
	[def _detail_ [$images/note.png]]
	[def _tip_ [$images/tip.png]]

	[/ Links ]

	[def _lockfree_ [^boost.lockfree]]

	[section Introduction & Motivation]

	[h2 Introduction & Terminology]

	The term non-blocking denotes concurrent data structures, which do not use traditional synchronization primitives like
	guards to ensure thread-safety. Maurice Herlihy and Nir Shavit (compare [@http://books.google.com/books?id=pFSwuqtJgxYC
	"The Art of Multiprocessor Programming"]) distinguish between 3 types of non-blocking data structures, each having different
	properties:

	* data structures are wait-free, if every concurrent operation is guaranteed to be finished in a finite number of
	steps. It is therefore possible to give worst-case guarantees for the number of operations.

	* data structures are lock-free, if some concurrent operations are guaranteed to be finished in a finite number of
	steps. While it is in theory possible that some operations never make any progress, it is very unlikely to happen in
	practical applications.

	* data structures are obstruction-free, if a concurrent operation is guaranteed to be finished in a finite number of
	steps, unless another concurrent operation interferes.


	Some data structures can only be implemented in a lock-free manner, if they are used under certain restrictions. The
	relevant aspects for the implementation of _lockfree_ are the number of producer and consumer threads. Single-producer
	(sp) or multiple producer (mp) means that only a single thread or multiple concurrent threads are allowed to add
	data to a data structure. Single-consumer (sc) or Multiple-consumer (mc) denote the equivalent for the removal
	of data from the data structure.


	[h2 Properties of Non-Blocking Data Structures]

	Non-blocking data structures do not rely on locks and mutexes to ensure thread-safety. The synchronization is done completely in
	user-space without any direct interaction with the operating system [footnote Spinlocks do not
	directly interact with the operating system either. However it is possible that the owning thread is preempted by the
	operating system, which violates the lock-free property.]. This implies that they are not prone to issues like priority
	inversion (a low-priority thread needs to wait for a high-priority thread).

	Instead of relying on guards, non-blocking data structures require atomic operations (specific CPU instructions executed
	without interruption). This means that any thread either sees the state before or after the operation, but no
	intermediate state can be observed. Not all hardware supports the same set of atomic instructions. If it is not
	available in hardware, it can be emulated in software using guards. However this has the obvious drawback of losing the
	lock-free property.


	[h2 Performance of Non-Blocking Data Structures]

	When discussing the performance of non-blocking data structures, one has to distinguish between amortized and
	worst-case costs. The definition of 'lock-free' and 'wait-free' only mention the upper bound of an operation. Therefore
	lock-free data structures are not necessarily the best choice for every use case. In order to maximise the throughput of an
	application one should consider high-performance concurrent data structures [footnote
	[@http://threadingbuildingblocks.org/ Intel's Thread Building Blocks library] provides many efficient concurrent data structures,
	which are not necessarily lock-free.].

	Lock-free data structures will be a better choice in order to optimize the latency of a system or to avoid priority inversion,
	which may be necessary in real-time applications. In general we advise to consider if lock-free data structures are necessary or if
	concurrent data structures are sufficient. In any case we advice to perform benchmarks with different data structures for a
	specific workload.


	[h2 Sources of Blocking Behavior]

	Apart from locks and mutexes (which we are not using in _lockfree_ anyway), there are three other aspects, that could violate
	lock-freedom:

	[variablelist
	[[Atomic Operations]
	[Some architectures do not provide the necessary atomic operations in natively in hardware. If this is not
	the case, they are emulated in software using spinlocks, which by itself is blocking.
	]
	]

	[[Memory Allocations]
	[Allocating memory from the operating system is not lock-free. This makes it impossible to implement true
	dynamically-sized non-blocking data structures. The node-based data structures of _lockfree_ use a memory pool to allocate the
	internal nodes. If this memory pool is exhausted, memory for new nodes has to be allocated from the operating system. However
	all data structures of _lockfree_ can be configured to avoid memory allocations (instead the specific calls will fail).
	This is especially useful for real-time systems that require lock-free memory allocations.
	]
	]

	[[Exception Handling]
	[The C++ exception handling does not give any guarantees about its real-time behavior. We therefore do
	not encourage the use of exceptions and exception handling in lock-free code.]
	]
	]

	[h2 Data Structures]

	_lockfree_ implements three lock-free data structures:

	[variablelist
	[[[classref boost::lockfree::queue]]
	[a lock-free multi-produced/multi-consumer queue]
	]

	[[[classref boost::lockfree::stack]]
	[a lock-free multi-produced/multi-consumer stack]
	]

	[[[classref boost::lockfree::spsc_queue]]
	[a wait-free single-producer/single-consumer queue (commonly known as ringbuffer)]
	]
	]

	[h3 Data Structure Configuration]

	The data structures can be configured with [@boost:/libs/parameter/doc/html/index.html Boost.Parameter]-style templates:

	[variablelist
	[[[classref boost::lockfree::fixed_sized]]
	[Configures the data structure as fixed sized. The internal nodes are stored inside an array and they are addressed by
	array indexing. This limits the possible size of the queue to the number of elements that can be addressed by the index
	type (usually 2**16-2), but on platforms that lack double-width compare-and-exchange instructions, this is the best way
	to achieve lock-freedom.
	]
	]

	[[[classref boost::lockfree::capacity]]
	[Sets the capacity of a data structure at compile-time. This implies that a data structure is fixed-sized.
	]
	]

	[[[classref boost::lockfree::allocator]]
	[Defines the allocator. _lockfree_ supports stateful allocator and is compatible with [@boost:/libs/interprocess/index.html Boost.Interprocess] allocators.]
	]
	]


	[endsect]

	[section Examples]

	[h2 Queue]

	The [classref boost::lockfree::queue boost::lockfree::queue] class implements a multi-writer/multi-reader queue. The
	following example shows how integer values are produced and consumed by 4 threads each:

	[import ../examples/queue.cpp]
	[queue_example]

	The program output is:

	[pre
	produced 40000000 objects.
	consumed 40000000 objects.
	]


	[h2 Stack]

	The [classref boost::lockfree::stack boost::lockfree::stack] class implements a multi-writer/multi-reader stack. The
	following example shows how integer values are produced and consumed by 4 threads each:

	[import ../examples/stack.cpp]
	[stack_example]


	The program output is:

	[pre
	produced 4000000 objects.
	consumed 4000000 objects.
	]

	[h2 Waitfree Single-Producer/Single-Consumer Queue]

	The [classref boost::lockfree::spsc_queue boost::lockfree::spsc_queue] class implements a wait-free single-producer/single-consumer queue. The
	following example shows how integer values are produced and consumed by 2 separate threads:

	[import ../examples/spsc_queue.cpp]
	[spsc_queue_example]


	The program output is:

	[pre
	produced 10000000 objects.
	consumed 10000000 objects.
	]

	[endsect]


	[section Rationale]

	[section Data Structures]

	The implementations are implementations of well-known data structures. The queue is based on
	[@http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Michael Scott and Maged Michael],
	the stack is based on [@http://books.google.com/books?id=YQg3HAAACAAJ Systems programming: coping with parallelism by R. K. Treiber]
	and the spsc_queue is considered as 'folklore' and is implemented in several open-source projects including the linux kernel. All
	data structures are discussed in detail in [@http://books.google.com/books?id=pFSwuqtJgxYC "The Art of Multiprocessor Programming" by Herlihy & Shavit].

	[endsect]

	[section Memory Management]

	The lock-free [classref boost::lockfree::queue] and [classref boost::lockfree::stack] classes are node-based data structures,
	based on a linked list. Memory management of lock-free data structures is a non-trivial problem, because we need to avoid that
	one thread frees an internal node, while another thread still uses it. _lockfree_ uses a simple approach not returning any memory
	to the operating system. Instead they maintain a free-list in order to reuse them later. This is done for two reasons:
	first, depending on the implementation of the memory allocator freeing the memory may block (so the implementation would not
	be lock-free anymore), and second, most memory reclamation algorithms are patented.

	[endsect]

	[section ABA Prevention]

	The ABA problem is a common problem when implementing lock-free data structures. The problem occurs when updating an atomic
	variable using a =compare_exchange= operation: if the value A was read, thread 1 changes it to say C and tries to update
	the variable, it uses =compare_exchange= to write C, only if the current value is A. This might be a problem if in the meanwhile
	thread 2 changes the value from A to B and back to A, because thread 1 does not observe the change of the state. The common way to
	avoid the ABA problem is to associate a version counter with the value and change both atomically.

	_lockfree_ uses a =tagged_ptr= helper class which associates a pointer with an integer tag. This usually requires a double-width
	=compare_exchange=, which is not available on all platforms. IA32 did not provide the =cmpxchg8b= opcode before the pentium
	processor and it is also lacking on many RISC architectures like PPC. Early X86-64 processors also did not provide a =cmpxchg16b=
	instruction.
	On 64bit platforms one can work around this issue, because often not the full 64bit address space is used. On X86_64 for example,
	only 48bit are used for the address, so we can use the remaining 16bit for the ABA prevention tag. For details please consult the
	implementation of the =boost::lockfree::detail::tagged_ptr= class.

	For lock-free operations on 32bit platforms without double-width =compare_exchange=, we support a third approach: by using a
	fixed-sized array to store the internal nodes we can avoid the use of 32bit pointers, but instead 16bit indices into the array
	are sufficient. However this is only possible for fixed-sized data structures, that have an upper bound of internal nodes.

	[endsect]

	[section Interprocess Support]

	The _lockfree_ data structures have basic support for [@boost:/libs/interprocess/index.html Boost.Interprocess]. The only
	problem is the blocking emulation of lock-free atomics, which in the current implementation is not guaranteed to be interprocess-safe.

	[endsect]

	[endsect]

	[xinclude autodoc.xml]

	[section Appendices]

	[section Supported Platforms & Compilers]

	_lockfree_ has been tested on the following platforms:

	* g++ 4.4, 4.5 and 4.6, linux, x86 & x86_64
	* clang++ 3.0, linux, x86 & x86_64

	[endsect]

	[section Future Developments]

	* More data structures (set, hash table, dequeue)
	* Backoff schemes (exponential backoff or elimination)

	[endsect]

	[section References]

	# [@http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.3574 Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms by Michael Scott and Maged Michael],
	In Symposium on Principles of Distributed Computing, pages 267–275, 1996.
	# [@http://books.google.com/books?id=pFSwuqtJgxYC M. Herlihy & Nir Shavit. The Art of Multiprocessor Programming], Morgan Kaufmann Publishers, 2008

	[endsect]

	[endsect]