Concurrency Behaviors
General concurrency behaviors
Prior to the general availability of multi-core hardware
architectures, there was very little to be gained from concurrent or
parallel program design. The overhead of thread switching on a single processor
almost always resulted in slower performance than could be achieved with a
traditional serial design.
Now that the majority of our computing systems, whether
personal computers, laptops, tablets, mainframes, supercomputers, or cell
phones, run on hardware with multiple processor cores, there is a real need
to build an understanding of concurrent or parallel processing design.
Concurrent program design promises high efficiency, provided we
can master the complexity of sharing and combining data from concurrent,
and therefore largely asynchronous, execution paths in our programs. While one
can assign work to separate cores, if that work is performed synchronously,
as with traditional Unix/Linux pipes and filters, very
little advantage is gained from the additional cores. As a result, newer
concurrent architectures organize processing into networks and hierarchies.
Those networks and hierarchies may be localized to one computer or distributed
across many computers.
At the heart of these new architectures are the concepts of
producers of data or messages, and consumers of data or messages. In a
hierarchical architecture it is common for a particular task to consume data or
messages from one level in the hierarchy, process that data, and produce data
or messages for another level of the hierarchy. Communication between producers
and consumers has to be as asynchronous as possible to achieve maximum
processing efficiency. Synchronization produces wait states in program
execution. Wait states are simply periods during which an execution module
(i.e., a task or thread) does nothing
while waiting for some other execution module to release a lock or provide some
kind of handshake to coordinate data communication.
In Object Oriented Programming terms, we need to
reduce the coupling between execution modules and increase the cohesion of
execution modules. Coupling occurs whenever execution modules must share or
pass data or messages from a producer to a consumer. In a fully synchronous
architecture the producer and consumer must directly exchange the data, much
like runners in a relay race passing a baton from one runner to the next. This
direct passing is often called the rendezvous model. Using the rendezvous model,
the consumer must wait for the producer in an assigned communication pair. Just
as the runner carrying the baton can only run his or her section of the relay,
and must transfer the baton to the next runner within a specified region of the
track, the rendezvous model relies upon close coordination, or synchronization,
between the producer and the consumer. A more asynchronous design employs
message buffers between producers and consumers.
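As an illustration of the rendezvous model, the following minimal sketch shows two Ada tasks meeting directly at an entry call. The task and entry names (Baton_Producer, Baton_Consumer, Hand_Off) are illustrative assumptions, not part of the example program presented later in this post.

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Rendezvous_Sketch is

      -- The consumer declares an entry; the producer calls it directly.
      task Baton_Consumer is
         entry Hand_Off (Value : in Integer);
      end Baton_Consumer;

      task body Baton_Consumer is
         Received : Integer;
      begin
         for I in 1 .. 5 loop
            accept Hand_Off (Value : in Integer) do
               -- Both tasks are held here; the producer resumes only
               -- when this accept body completes.
               Received := Value;
            end Hand_Off;
            Put_Line ("Consumed:" & Integer'Image (Received));
         end loop;
      end Baton_Consumer;

      task Baton_Producer;

      task body Baton_Producer is
      begin
         for I in 1 .. 5 loop
            -- The producer waits here until the consumer reaches its
            -- matching accept statement (the rendezvous).
            Baton_Consumer.Hand_Off (I);
         end loop;
      end Baton_Producer;

   begin
      null; -- The main procedure simply waits for both tasks to finish.
   end Rendezvous_Sketch;

Neither task can get ahead of the other: every Hand_Off is a synchronization point, which is exactly the coupling a buffered design tries to relax.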
Message buffers provide some unique challenges. A producer writes
to the message buffer and a consumer reads from the message buffer. If the
producer and consumer execute in parallel, then we must assume that sometimes
the producer will be trying to write to the buffer while the consumer is trying
to read from the buffer. This can cause a race condition. The producer can be
changing the contents of the message while the consumer is reading the message.
The result can be corrupted data.
Furthermore, the consumer cannot read from an empty buffer.
If the buffer has a fixed maximum size, the producer cannot write to the buffer
when it is full, or it will overwrite data that has not yet been read by
the consumer. If the buffer is an unbounded message queue, the producer
never has to wait for the consumer; additional queue elements are added as
needed during program execution.
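A bounded buffer of the kind just described is commonly written in Ada as a protected object with guarded entries. The sketch below is only an illustration under assumed names (Bounded_Buffer_Sketch, Put, Get) and an assumed capacity of 16; it is not part of the example program later in this post. The entry guards make the producer wait while the buffer is full and the consumer wait while it is empty.

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Bounded_Buffer_Sketch is

      Capacity : constant := 16;          -- assumed fixed maximum size
      type Index is mod Capacity;
      type Buffer_Array is array (Index) of Integer;

      protected Buffer is
         entry Put (Item : in Integer);   -- blocks while the buffer is full
         entry Get (Item : out Integer);  -- blocks while the buffer is empty
      private
         Data  : Buffer_Array;
         Head  : Index   := 0;            -- next element to read
         Tail  : Index   := 0;            -- next slot to write
         Count : Natural := 0;            -- number of unread elements
      end Buffer;

      protected body Buffer is

         entry Put (Item : in Integer) when Count < Capacity is
         begin
            Data (Tail) := Item;
            Tail  := Tail + 1;            -- modular type wraps automatically
            Count := Count + 1;
         end Put;

         entry Get (Item : out Integer) when Count > 0 is
         begin
            Item  := Data (Head);
            Head  := Head + 1;            -- modular type wraps automatically
            Count := Count - 1;
         end Get;

      end Buffer;

      task Producer;
      task body Producer is
      begin
         for I in 1 .. 100 loop
            Buffer.Put (I);               -- waits whenever the buffer is full
         end loop;
      end Producer;

      Value : Integer;

   begin
      for I in 1 .. 100 loop
         Buffer.Get (Value);              -- waits whenever the buffer is empty
      end loop;
      Put_Line ("Last value read:" & Integer'Image (Value));
   end Bounded_Buffer_Sketch;

The protected object also serves as the lock: Put and Get execute with mutual exclusion, so the head, tail, and count can never be corrupted by simultaneous access.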
In a concurrent design we cannot predict which consuming
execution module will process a particular message from a message queue unless
there is only one consumer. This means the consumer execution module cannot
rely upon any particular relationship between the current message being
processed and any previous or subsequent message. Each message must be treated
as an independent event in the system.
Single element message queues have specific uses. They are
very useful for a consumer that only samples data from the producer. For instance, if the producer writes sensor
readings from a very fast hardware sensor, the consumer may not be able to
process all the data as fast as the sensor produces the data. The consumer may
only need to sample a subset of the sensor data to identify trends in the data.
In this case a single element message queue can be used: the producer simply
overwrites the data as fast as it produces it, and the consumer reads whatever
is currently in the message queue whenever it can. This approach is useful
when there is no need to ensure that the consumer processes everything written by
the producer.
This simple single element message queue provides a good
example of the race conditions between the producer and consumer. Even though
we may not care if the consumer reads all the data produced, it must read valid
data and not corrupted data. Thus the message queue must provide a mechanism to
prevent the consumer from reading data while it is being written, and to
prevent the producer from writing data while it is being read. The most common
mechanism for preventing race conditions between the producer and the consumer
is a lock or semaphore. Only the execution module in possession of the lock or
semaphore has permission to read from or write to the message queue.
Use of locking mechanisms on the message queue fixes the
race condition problem, but causes the execution module waiting for the lock or
semaphore to be in a wait state. In other words, it causes a degree of
synchronization between the execution modules.
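Such a race-free single element queue is naturally expressed in Ada as a protected object, which supplies the locking automatically: calls on its operations are mutually exclusive. The sketch below is illustrative only; the names (Sample_Buffer, Write, Read, Sensor_Reading) and the sampling pattern are assumptions, not part of the example program that follows.

   with Ada.Text_IO; use Ada.Text_IO;

   procedure Sampling_Sketch is

      type Sensor_Reading is record
         Sequence : Natural := 0;
         Value    : Float   := 0.0;
      end record;

      -- A single-element "sampling" buffer. The protected object acts as
      -- the lock: Write and Read are mutually exclusive, so the consumer
      -- never sees a partially updated reading, but the producer is free
      -- to overwrite readings the consumer never looks at.
      protected Sample_Buffer is
         procedure Write (Item : in Sensor_Reading);
         entry Read (Item : out Sensor_Reading);
      private
         Current : Sensor_Reading;
         Fresh   : Boolean := False;  -- at least one reading has been written
      end Sample_Buffer;

      protected body Sample_Buffer is

         procedure Write (Item : in Sensor_Reading) is
         begin
            Current := Item;          -- overwrite whatever was there before
            Fresh   := True;
         end Write;

         entry Read (Item : out Sensor_Reading) when Fresh is
         begin
            Item := Current;          -- the consumer samples the latest value
         end Read;

      end Sample_Buffer;

      task Producer;
      task body Producer is
         Reading : Sensor_Reading;
      begin
         for I in 1 .. 10_000 loop
            Reading := (Sequence => I, Value => Float (I) * 0.5);
            Sample_Buffer.Write (Reading);  -- never waits for a reader
         end loop;
      end Producer;

      Sample : Sensor_Reading;

   begin
      -- The consumer samples only a few of the readings.
      for I in 1 .. 5 loop
         Sample_Buffer.Read (Sample);
         Put_Line ("Sampled sequence" & Natural'Image (Sample.Sequence));
      end loop;
   end Sampling_Sketch;

Because Write is a protected procedure rather than an entry, the producer is never blocked waiting for the consumer to read; the only waiting left is the brief mutual exclusion inside the protected object, which is exactly the residual synchronization described above.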
Producer / Consumer examples
The following Ada code implements a simplified set of
producers and consumers communicating through a single unbounded message queue.
The message queue implicitly implements all the locking mechanisms needed to prevent
race conditions between the producers and the consumers. The example uses
constants defining the number of producers and the number of consumers,
allowing experimentation with different numbers of each.
------------------------------------------------------------------
--        Multiple Producer/Multiple Consumer Example          --
------------------------------------------------------------------
with Ada.Containers.Synchronized_Queue_Interfaces;
with Ada.Containers.Unbounded_Synchronized_Queues;
with Ada.Text_IO;  use Ada.Text_IO;
with Ada.Calendar; use Ada.Calendar;

use Ada.Containers;

procedure PC_v4 is

   package Integer_Interface is new
     Synchronized_Queue_Interfaces (Element_Type => Integer);

   package Unbounded_Integer_Queues is new
     Unbounded_Synchronized_Queues (Queue_Interfaces => Integer_Interface);

   My_Queue : Unbounded_Integer_Queues.Queue;

   Num_Producers : constant := 3;
   Num_Consumers : constant := 1;
   Max_Produced  : constant := 500_000;

   Start_Time : Time := Clock;

   -- The Counter protected object below is used to count the number of
   -- completed producers. This allows the consumers to know when all the
   -- data has been processed.
   ---------------------------------------------------------------------
   protected Counter is
      procedure Task_Done;
      function All_Done return Boolean;
   private
      Count : Natural := 0;
   end Counter;

   protected body Counter is

      procedure Task_Done is
      begin
         Count := Count + 1;
      end Task_Done;

      function All_Done return Boolean is
      begin
         return Count = Num_Producers;
      end All_Done;

   end Counter;

   protected Consumer_Counter is
      procedure Task_Done;
      entry All_Done;
   private
      Count : Natural := 0;
   end Consumer_Counter;

   protected body Consumer_Counter is

      procedure Task_Done is
      begin
         Count := Count + 1;
      end Task_Done;

      entry All_Done when Count = Num_Consumers is
      begin
         null;
      end All_Done;

   end Consumer_Counter;

   -- Define the producer task type.
   -- Producer is being defined as a task type to allow multiple instances
   -- of the producer to be easily created.
   ------------------------------------------------------------------------
   task type Producer;

   task body Producer is
      Value      : Positive := 1;
      Finis_Time : Time;
   begin
      loop
         My_Queue.Enqueue (Value);
         Value := Value + 1;
         if Value > Max_Produced then
            Counter.Task_Done;
            Finis_Time := Clock;
            Put_Line ("Producer completed in" &
                      Duration'Image (Finis_Time - Start_Time) &
                      " seconds");
            exit; -- exit the loop within the Producer task
         end if;
      end loop;
   end Producer;

   task type Consumer;

   task body Consumer is
      Empty_Queue   : constant Count_Type := 0;
      Read_Value    : Integer;
      Consumer_Done : Time;
   begin
      loop
         exit when Counter.All_Done and then
                   My_Queue.Current_Use = Empty_Queue;
         select
            My_Queue.Dequeue (Read_Value);
         or
            delay 0.0;
         end select;
      end loop;
      Consumer_Counter.Task_Done;
      Consumer_Done := Clock;
      Put_Line ("Consumer Done in" &
                Duration'Image (Consumer_Done - Start_Time) &
                " Seconds");
      Put_Line ("Last Value consumed:" & Integer'Image (Read_Value));
   end Consumer;

   -- Create an array of producers. There are Num_Producers in this
   -- array. The Producer tasks start executing as soon as they are
   -- instantiated in the array.
   ----------------------------------------------------------------
   The_Producers : array (1 .. Num_Producers) of Producer;

   -- Create an array of consumers. There are Num_Consumers in this array.
   ----------------------------------------------------------------------
   The_Consumers : array (1 .. Num_Consumers) of Consumer;

   Processing_Done : Time;

begin
   -- wait until all consumer processing is done
   Consumer_Counter.All_Done;
   Processing_Done := Clock;

   -- print out the execution statistics
   Put_Line ("Queue element peak use:" &
             Count_Type'Image (My_Queue.Peak_Use));
   Put_Line ("Elapsed time (seconds):" &
             Duration'Image (Processing_Done - Start_Time));
end PC_v4;
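Assuming the GNAT toolchain is available, the example can be compiled and run with, for instance, the following commands (the file name pc_v4.adb is an assumption; any Ada 2012 compiler and build setup will do):

   gnatmake pc_v4.adb
   ./pc_v4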
All of the program output shown below was produced on a quad-core
AMD A10-5700 APU (3.4 GHz) with 16 gigabytes of memory.
Run 1: Num_Producers = 3, Num_Consumers = 1 (integers produced: 1,500,000)
Program output:
Producer completed in 1.295064077 seconds
Producer completed in 1.396555610 seconds
Producer completed in 1.413318047 seconds
Consumer Done in 1.966321856 Seconds
Last Value consumed: 500000
Queue element peak use: 1303519
Elapsed time (seconds): 1.966384311

Run 2: Num_Producers = 2, Num_Consumers = 2 (integers produced: 1,000,000)
Program output:
Producer completed in 0.933923591 seconds
Producer completed in 1.159170583 seconds
Consumer Done in 1.904148143 Seconds
Queue element peak use: 599280
Last Value consumed: 500000
Elapsed time (seconds): 1.904157798
Consumer Done in 1.904152669 Seconds
Last Value consumed: 499993

Run 3: Num_Producers = 1, Num_Consumers = 3 (integers produced: 500,000)
Program output:
Producer completed in 0.813527739 seconds
Consumer Done in 0.997459250 Seconds
Last Value consumed: 500000
Consumer Done in 0.997521403 Seconds
Last Value consumed: 495508
Consumer Done in 0.997531360 Seconds
Last Value consumed: 495504
Queue element peak use: 155777
Elapsed time (seconds): 0.997587781

Run 4: Num_Producers = 1, Num_Consumers = 1 (integers produced: 500,000)
Program output:
Producer completed in 0.529284707 seconds
Consumer Done in 0.720979672 Seconds
Queue element peak use: 408146
Last Value consumed: 500000
Elapsed time (seconds): 0.720982991
In the first run we see that each of the three producers
took about 1.3 to 1.4 seconds to write out 500,000 integers. Had this work
been performed serially, the overall production time would have been on the
order of four seconds. The consumer task finished its work in approximately 1.97
seconds. Clearly the four tasks were executing in parallel for the overall
processing to finish in 1.966 seconds.
In the second run we see that the two producers took
0.934 and 1.159 seconds to complete, while the two consumers each took about 1.904
seconds to complete. The entire processing time was only 1.904 seconds, so once
again the four tasks were clearly executing in parallel.
In the third run we see that the producer took 0.814 seconds
to execute and the three consumers took 0.997459250, 0.997521403, and
0.997531360 seconds to complete, with a total elapsed time of 0.997587781
seconds. Again, it is very clear that the four tasks were executing in
parallel.
In the fourth run we see that the single producer took
0.529284707 seconds and the single consumer took 0.720979672 seconds. Taken
together, these runs clearly show the impact of the locking overhead and the
overhead of adding elements to the unbounded queue.
Conclusion
In all of the examples above we can see a high degree of
asynchronous behavior between the producer and consumer tasks. We can also see
that the design is not fully asynchronous, due to the overhead of manipulating
the shared message queue.