Monday, August 7, 2023

Variant Records used as a return type from search functions

 When searching for a value in an array a programmer is faced with two possibilities.

1.       The value searched for is found and the index of the value in the array is returned.

2.       The value is not found and no array index is returned.

It is common practice to return a scalar value from functions using the C programming language. This tradition goes back to the earliest days of the C language when unspecified function return types were given the int type by default by the compiler.

The C language does not have a concept of exceptions or exception handlers. Instead, the value returned by a function either indicates success or failure.

It is common for positive return values to indicate success and negative or zero values to indicate failure. This pattern was established by the Unix operating system. C was invented to write the Unix operating system.

A search function is often designed to return the index of the array element containing the value corresponding to the target value for which one is searching. This programming pattern is also commonly found in the C++ language.

The following function prototype is written using C++;

int binary_search(string names[], int len, const string target);

The binary_search function returns an int. The first parameter is an array of string elements. The second parameter is the number of elements in the array of strings. The third element is the string value to search for in the array of strings.

The binary_search function returns the index value of the array element equal to the target string or it returns -1 if no array element corresponds to the target string. This behavior creates an abstraction with a strong vulnerability of confusion. The return value serves two purposes. It identifies whether or not the target value was found in the array and it identifies the index value of the array element equaling the target value.

The complete binary_search function is defined as:

int binary_search(string names[], int len, const string target)
{
  int low = 0;
  int high = len - 1;
  int mid = 0;\

  
string item;
  while (low <= high) {
    mid = (low + high) / 2;
    item = names[mid];
    if (item == target) {
      return mid;
    }
    if (strcmp(target.c_str(), item.c_str()) < 0) {
      high = mid - 1;
    } else
      low = mid + 1;
  }
  return -1;
}

 

Logically whether or not the value was found is separate from the index value of the found element. There should be no index value for a found element when no element is found. The function allows the programmer to make the mistake of trying to reference an array element before the beginning of the array. In C and C++ doing this is undefined behavior, and is always wrong.

Avoiding undefined behavior

The C and C++ languages provide no assistance in the effort to avoid undefined behaviors. The compilers are not required to check for undefined behavior because the programmer is burdened with the responsibility of avoiding undefined behavior.

The Ada programming language provides a useful data structure known as a variant record, which C and C++ programmers may think is similar to a union. C and C++ unions provide different views of a common memory area so that, for instance, a number might be viewed as an int or a float. Ada variant records are different.

Every Ada variant record has a publicly visible discriminant field, even if the type is opaque. That discriminant field controls the contents, not just the view, of the record. The following example is used as a return type for a search function.

   type Result_T (found : Boolean) is record
      case found is
         when True =>
            Index : Natural;
         when False =>
            null;
      end case;
   end record;

The discriminant field in this example is named “found”. Found is a Boolean value. Every instance of Result_T contains a “found” field. The case statement within the record definition declares a second field named Index, which is of type Natural when and only when found is True. When found is False the record type contains no additional field. It only contains the found field.

This data structure separates the responsibility of identifying whether or not the target value was found from identifying the index value of the element equaling the target value.

The complete Ada version of the binary_search function is

   function binary_search
    
(Names : in str_arr; Target : Unbounded_String) return            Result_T is
      low  : Natural := Names'First;
      High : Natural := Names'Last;
      Mid  : Natural;
      Item : Unbounded_String;
   begin
      while low <= High loop
         Mid  := (High - low) / 2 + low;
         Item := Names (Mid);
         if Item = Target then
            return (found => True, Index => Mid);
         end if;
         if Target < Item then
            High := Mid - 1;
         else
            low := Mid + 1;
         end if;
      end loop;
      return (found => False);
   end binary_search;

If the item corresponding to Target is found the return value contains both a found and an Index field.

            return (found => True, Index => Mid);

If no item corresponding to Target is found the return value only contains the found field.

      return (found => False);

Since no Index value exists when found equals False, no undefined behavior can occur. The Ada runtime raises the pre-defined exception CONSTRAINT_ERROR along with the message “discriminant check failed” if the programmer attempts to reference the Index field when found equals False.

The proper way to deal with a variant record such as this returned by a function is:

   Result : Result_T := binary_search (Names,                                                      To_Unbounded_String ("ali2"));
begin
 if Result.found then
      Put_Line ("The given index is " & Result.Index'Image);
      Put_Line ("The found name is " & To_String (Names                                                      (Result.Index)));
 else
    Put_Line ("String not found!");
 end if;

This is similar to the proper way to respond to the return value from the C or C++ function. The difference is that when the return value is not evaluated the C or C++ program will not raise an exception. Instead it will simply exhibit an undefined behavior.

No comments:

Post a Comment

Comparison of three algorithms for summing an array of integers

This article shares a comparison of the time taken to sum an array of integers. This article tests three algorithms for summing the array. •...