Concurrency Behaviors
General concurrency behaviors
Prior to the general availability of multi-core hardware
architectures, there was very little to be gained from concurrent or
parallel program design. The overhead of thread switching on a single processor
almost always resulted in slower performance than could be achieved with a
traditional serial design.
Now that the majority of our computing systems, whether
personal computers, laptops, tablets, mainframes, supercomputers, or cell
phones, run on hardware with multiple processor cores, there is a real need
to build an understanding of concurrent or parallel processing design.
Concurrent program design promises high efficiency, provided we
can master the complexity of sharing and combining data from concurrent,
and therefore largely asynchronous, execution paths in our programs. While one
can assign work to separate cores, if that work is performed synchronously,
as with traditional Unix/Linux pipes and filters, very
little advantage is gained from the additional cores. As a result, newer
concurrent architectures organize processing into networks and hierarchies.
Those networks and hierarchies may be localized to one computer or distributed
across many computers.
At the heart of these new architectures are the concepts of
producers of data or messages, and consumers of data or messages. In a
hierarchical architecture it is common for a particular task to consume data or
messages from one level in the hierarchy, process that data, and produce data
or messages for another level of the hierarchy. Communication between producers
and consumers has to be as asynchronous as possible to achieve maximum
processing efficiency. Synchronization produces wait states in program
execution. Wait states are simply periods during which an execution module
(i.e., a task or thread) does nothing
while waiting for some other execution module to release a lock or provide some
kind of handshake to coordinate data communication.
In Object Oriented Programming terms, we need to
reduce the coupling between execution modules and increase the cohesion of
execution modules. Coupling occurs whenever execution modules must share or
pass data or messages from a producer to a consumer. In a fully synchronous
architecture the producer and consumer must directly exchange the data, much
like runners in a relay race passing a baton from one runner to the next. This
direct passing is often called the rendezvous model. Using the rendezvous model,
the consumer must wait for the producer in an assigned communication pair. Just
as the runner carrying the baton can only run his or her section of the relay,
and must transfer the baton to the next runner within a specified region of the
track, the rendezvous model relies upon close coordination, or synchronization,
between the producer and the consumer. A more asynchronous design employs
message buffers between producers and consumers.
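As an illustration of the rendezvous model, the following minimal sketch shows two Ada tasks meeting directly at an entry call. The task and entry names (Baton_Producer, Baton_Consumer, Hand_Off) are illustrative assumptions, not part of the example program presented later in this post.

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Rendezvous_Sketch is

      -- The consumer declares an entry; the producer calls it directly.
      task Baton_Consumer is
         entry Hand_Off (Value : in Integer);
      end Baton_Consumer;

      task body Baton_Consumer is
         Received : Integer;
      begin
         for I in 1 .. 5 loop
            accept Hand_Off (Value : in Integer) do
               -- Both tasks are held here; the producer resumes only
               -- when this accept body completes.
               Received := Value;
            end Hand_Off;
            Put_Line ("Consumed:" & Integer'Image (Received));
         end loop;
      end Baton_Consumer;

      task Baton_Producer;

      task body Baton_Producer is
      begin
         for I in 1 .. 5 loop
            -- The producer waits here until the consumer reaches its
            -- matching accept statement (the rendezvous).
            Baton_Consumer.Hand_Off (I);
         end loop;
      end Baton_Producer;

   begin
      null; -- The main procedure simply waits for both tasks to finish.
   end Rendezvous_Sketch;

Neither task can get ahead of the other: every Hand_Off is a synchronization point, which is exactly the coupling a buffered design tries to relax.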
Message buffers provide some unique challenges. A producer writes
to the message buffer and a consumer reads from the message buffer. If the
producer and consumer execute in parallel, then we must assume that sometimes
the producer will be trying to write to the buffer while the consumer is trying
to read from the buffer. This can cause a race condition. The producer can be
changing the contents of the message while the consumer is reading the message.
The result can be corrupted data.
Furthermore, the consumer cannot read from an empty buffer.
If the buffer has a fixed maximum size, the producer cannot write to the buffer
when it is full, or it will overwrite data that has not yet been read by
the consumer. If the buffer is an unbounded message queue, the producer
never has to wait for the consumer; additional queue elements are added as
needed during program execution.
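A bounded buffer of the kind just described is commonly written in Ada as a protected object with guarded entries. The sketch below is only an illustration under assumed names (Bounded_Buffer_Sketch, Put, Get) and an assumed capacity of 16; it is not part of the example program later in this post. The entry guards make the producer wait while the buffer is full and the consumer wait while it is empty.

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Bounded_Buffer_Sketch is

      Capacity : constant := 16;          -- assumed fixed maximum size
      type Index is mod Capacity;
      type Buffer_Array is array (Index) of Integer;

      protected Buffer is
         entry Put (Item : in Integer);   -- blocks while the buffer is full
         entry Get (Item : out Integer);  -- blocks while the buffer is empty
      private
         Data  : Buffer_Array;
         Head  : Index   := 0;            -- next element to read
         Tail  : Index   := 0;            -- next slot to write
         Count : Natural := 0;            -- number of unread elements
      end Buffer;

      protected body Buffer is

         entry Put (Item : in Integer) when Count < Capacity is
         begin
            Data (Tail) := Item;
            Tail  := Tail + 1;            -- modular type wraps automatically
            Count := Count + 1;
         end Put;

         entry Get (Item : out Integer) when Count > 0 is
         begin
            Item  := Data (Head);
            Head  := Head + 1;            -- modular type wraps automatically
            Count := Count - 1;
         end Get;

      end Buffer;

      task Producer;
      task body Producer is
      begin
         for I in 1 .. 100 loop
            Buffer.Put (I);               -- waits whenever the buffer is full
         end loop;
      end Producer;

      Value : Integer;

   begin
      for I in 1 .. 100 loop
         Buffer.Get (Value);              -- waits whenever the buffer is empty
      end loop;
      Put_Line ("Last value read:" & Integer'Image (Value));
   end Bounded_Buffer_Sketch;

The protected object also serves as the lock: Put and Get execute with mutual exclusion, so the head, tail, and count can never be corrupted by simultaneous access.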
In a concurrent design we cannot predict which consuming
execution module will process a particular message from a message queue unless
there is only one consumer. This means the consumer execution module cannot
rely upon any particular relationship between the current message being
processed and any previous or subsequent message. Each message must be treated
as an independent event in the system.
Single element message queues have specific uses. They are
very useful for a consumer that only samples data from the producer. For instance, if the producer writes sensor
readings from a very fast hardware sensor, the consumer may not be able to
process all the data as fast as the sensor produces the data. The consumer may
only need to sample a subset of the sensor data to identify trends in the data.
In this case a single element message queue can be used: the producer simply
overwrites the data as fast as it produces it, and the consumer reads whatever
is currently in the message queue whenever it can. This approach is useful
when there is no need to ensure that the consumer processes everything written by
the producer.
This simple single element message queue provides a good
example of the race conditions between the producer and consumer. Even though
we may not care if the consumer reads all the data produced, it must read valid
data and not corrupted data. Thus the message queue must provide a mechanism to
prevent the consumer from reading data while it is being written, and to
prevent the producer from writing data while it is being read. The most common
mechanism for preventing race conditions between the producer and the consumer
is a lock or semaphore. Only the execution module in possession of the lock or
semaphore has permission to read from or write to the message queue.
Use of locking mechanisms on the message queue fixes the
race condition problem, but causes the execution module waiting for the lock or
semaphore to be in a wait state. In other words, it causes a degree of
synchronization between the execution modules.
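Such a race-free single element queue is naturally expressed in Ada as a protected object, which supplies the locking automatically: calls on its operations are mutually exclusive. The sketch below is illustrative only; the names (Sample_Buffer, Write, Read, Sensor_Reading) and the sampling pattern are assumptions, not part of the example program that follows.

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Sampling_Sketch is

      type Sensor_Reading is record
         Sequence : Natural := 0;
         Value    : Float   := 0.0;
      end record;

      -- A single-element "sampling" buffer. The protected object acts as
      -- the lock: Write and Read are mutually exclusive, so the consumer
      -- never sees a partially updated reading, but the producer is free
      -- to overwrite readings the consumer never looks at.
      protected Sample_Buffer is
         procedure Write (Item : in Sensor_Reading);
         entry Read (Item : out Sensor_Reading);
      private
         Current : Sensor_Reading;
         Fresh   : Boolean := False;  -- at least one reading has been written
      end Sample_Buffer;

      protected body Sample_Buffer is

         procedure Write (Item : in Sensor_Reading) is
         begin
            Current := Item;          -- overwrite whatever was there before
            Fresh   := True;
         end Write;

         entry Read (Item : out Sensor_Reading) when Fresh is
         begin
            Item := Current;          -- the consumer samples the latest value
         end Read;

      end Sample_Buffer;

      task Producer;
      task body Producer is
         Reading : Sensor_Reading;
      begin
         for I in 1 .. 10_000 loop
            Reading := (Sequence => I, Value => Float (I) * 0.5);
            Sample_Buffer.Write (Reading);  -- never waits for a reader
         end loop;
      end Producer;

      Sample : Sensor_Reading;

   begin
      -- The consumer samples only a few of the readings.
      for I in 1 .. 5 loop
         Sample_Buffer.Read (Sample);
         Put_Line ("Sampled sequence" & Natural'Image (Sample.Sequence));
      end loop;
   end Sampling_Sketch;

Because Write is a protected procedure rather than an entry, the producer is never blocked waiting for the consumer to read; the only waiting left is the brief mutual exclusion inside the protected object, which is exactly the residual synchronization described above.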
Producer / Consumer examples
The following Ada code implements a simplified set of
producers and consumers communicating through a single unbounded message queue.
The message queue implicitly implements all the locking mechanisms needed to prevent
race conditions between the producers and the consumers. The example uses
constants defining the number of producers and the number of consumers,
allowing experimentation with different numbers of each.
------------------------------------------------------------------
--        Multiple Producer/Multiple Consumer Example          --
------------------------------------------------------------------
with Ada.Containers.Synchronized_Queue_Interfaces;
with Ada.Containers.Unbounded_Synchronized_Queues;
with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

use Ada.Containers;

procedure PC_v4 is

   package Integer_Interface is new
     Synchronized_Queue_Interfaces (Element_Type => Integer);

   package Unbounded_Integer_Queues is new
     Unbounded_Synchronized_Queues (Queue_Interfaces => Integer_Interface);

   My_Queue : Unbounded_Integer_Queues.Queue;

   Num_Producers : constant := 3;
   Num_Consumers : constant := 1;
   Max_Produced  : constant := 500_000;

   Start_Time : Time := Clock;

   -- The Counter protected object below is used to count the number of
   -- completed producers. This allows the consumers to know when all the
   -- data has been processed.
   ---------------------------------------------------------------------
   protected Counter is
      procedure Task_Done;
      function All_Done return Boolean;
   private
      Count : Natural := 0;
   end Counter;

   protected body Counter is

      procedure Task_Done is
      begin
         Count := Count + 1;
      end Task_Done;

      function All_Done return Boolean is
      begin
         return Count = Num_Producers;
      end All_Done;

   end Counter;

   protected Consumer_Counter is
      procedure Task_Done;
      entry All_Done;
   private
      Count : Natural := 0;
   end Consumer_Counter;

   protected body Consumer_Counter is

      procedure Task_Done is
      begin
         Count := Count + 1;
      end Task_Done;

      entry All_Done when Count = Num_Consumers is
      begin
         null;
      end All_Done;

   end Consumer_Counter;

   -- Define the producer task type.
   -- Producer is being defined as a task type to allow multiple instances
   -- of the producer to be easily created.
   ------------------------------------------------------------------------
   task type Producer;

   task body Producer is
      Value      : Positive := 1;
      Finis_Time : Time;
   begin
      loop
         My_Queue.Enqueue (Value);
         Value := Value + 1;
         if Value > Max_Produced then
            Counter.Task_Done;
            Finis_Time := Clock;
            Put_Line ("Producer completed in" &
                      Duration'Image (Finis_Time - Start_Time) &
                      " seconds");
            exit; -- exit the loop within the Producer task
         end if;
      end loop;
   end Producer;

   task type Consumer;

   task body Consumer is
      Empty_Queue   : constant Count_Type := 0;
      Read_Value    : Integer;
      Consumer_Done : Time;
   begin
      loop
         exit when Counter.All_Done and then
                   My_Queue.Current_Use = Empty_Queue;
         select
            My_Queue.Dequeue (Read_Value);
         or
            delay 0.0;
         end select;
      end loop;
      Consumer_Counter.Task_Done;
      Consumer_Done := Clock;
      Put_Line ("Consumer Done in" &
                Duration'Image (Consumer_Done - Start_Time) &
                " Seconds");
      Put_Line ("Last Value consumed:" & Integer'Image (Read_Value));
   end Consumer;

   -- Create an array of producers. There are Num_Producers in this
   -- array. The Producer tasks start executing as soon as they are
   -- instantiated in the array.
   ----------------------------------------------------------------
   The_Producers : array (1 .. Num_Producers) of Producer;

   -- Create an array of consumers. There are Num_Consumers in this array.
   ----------------------------------------------------------------------
   The_Consumers : array (1 .. Num_Consumers) of Consumer;

   Processing_Done : Time;

begin
   -- wait until all consumer processing is done
   Consumer_Counter.All_Done;
   Processing_Done := Clock;

   -- print out the execution statistics
   Put_Line ("Queue element peak use:" &
             Count_Type'Image (My_Queue.Peak_Use));
   Put_Line ("Elapsed time (seconds):" &
             Duration'Image (Processing_Done - Start_Time));
end PC_v4;
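Assuming the GNAT toolchain is available, the example can be compiled and run with, for instance, the following commands (the file name pc_v4.adb is an assumption; any Ada 2012 compiler and build setup will do):

   gnatmake pc_v4.adb
   ./pc_v4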
All of the program output shown below was produced on a quad-core
AMD A10-5700 APU (3.4 GHz) with 16 gigabytes of memory.
Run 1: Num_Producers = 3, Num_Consumers = 1 (integers produced: 1,500,000)
Program output:
Producer completed in 1.295064077 seconds
Producer completed in 1.396555610 seconds
Producer completed in 1.413318047 seconds
Consumer Done in 1.966321856 Seconds
Last Value consumed: 500000
Queue element peak use: 1303519
Elapsed time (seconds): 1.966384311

Run 2: Num_Producers = 2, Num_Consumers = 2 (integers produced: 1,000,000)
Program output:
Producer completed in 0.933923591 seconds
Producer completed in 1.159170583 seconds
Consumer Done in 1.904148143 Seconds
Queue element peak use: 599280
Last Value consumed: 500000
Elapsed time (seconds): 1.904157798
Consumer Done in 1.904152669 Seconds
Last Value consumed: 499993

Run 3: Num_Producers = 1, Num_Consumers = 3 (integers produced: 500,000)
Program output:
Producer completed in 0.813527739 seconds
Consumer Done in 0.997459250 Seconds
Last Value consumed: 500000
Consumer Done in 0.997521403 Seconds
Last Value consumed: 495508
Consumer Done in 0.997531360 Seconds
Last Value consumed: 495504
Queue element peak use: 155777
Elapsed time (seconds): 0.997587781

Run 4: Num_Producers = 1, Num_Consumers = 1 (integers produced: 500,000)
Program output:
Producer completed in 0.529284707 seconds
Consumer Done in 0.720979672 Seconds
Queue element peak use: 408146
Last Value consumed: 500000
Elapsed time (seconds): 0.720982991
In the first run we see that each of the three producers
took about 1.3 to 1.4 seconds to write out 500,000 integers. Had this work
been performed serially, the overall production time would have been on the
order of four seconds. The consumer task finished its work in approximately 1.97
seconds. Clearly the four tasks were executing in parallel for the overall
processing to finish in 1.966 seconds.
In the second run we see that the two producers took
0.934 and 1.159 seconds to complete, while the two consumers each took about 1.904
seconds to complete. The entire processing time was only 1.904 seconds, so once
again the four tasks were clearly executing in parallel.
In the third run we see that the producer took 0.814 seconds
to execute and the three consumers took 0.997459250, 0.997521403, and
0.997531360 seconds to complete, with a total elapsed time of 0.997587781
seconds. Again, it is very clear that the four tasks were executing in
parallel.
In the fourth run we see that the single producer took
0.529284707 seconds and the single consumer took 0.720979672 seconds. Taken
together, these runs clearly show the impact of the locking overhead and the
overhead of adding elements to the unbounded queue.
Conclusion
In all of the examples above we can see a high degree of
asynchronous behavior between the producer and consumer tasks. We can also see
that the design is not fully asynchronous, due to the overhead of manipulating
the shared message queue.