Wednesday, November 3, 2021

Create a Reversed Copy of a String

 A recent question on Stack Overflow concerned creation of a C program to create and print a reverse copy of a string. Thus, for example, the string “hello” would be copied to another string and would contain “olleh”.

C versions

While this is clearly a beginner level problem, the difficulties encountered by the person submitting the question illustrate how learning C can be a struggle.

The source code posted by the author of the question is:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#define ARR_SIZE 50

int main()
{
 char string[ARR_SIZE];
 printf("Enter char array!\n");
 fgets(string, ARR_SIZE, stdin);

 string[strlen(string) - 1] = '\0';
 int length = (strlen(string) - 1);
 int length2 = (strlen(string) - 1);

 printf("%s\t%d\n", string, length);

 for (int i = 0; i <= length; i++)
 {
 printf("INDEX = %d CHAR = %c\n", i, string[i]);
 }
 
 printf("%d", length2);
 char copy[ARR_SIZE];
 
 for (int i = 0; i <= length2; i++)
 {
 copy[i] = string[length];
 length--;
 }

 


 printf("\n%s", copy);
}

 

There are a number of beginner mistakes in this code example. One answer attempted to correct the most obvious errors with the following result:

#include <stdio.h>
#include <string.h>
// remove unneeded headers

#define ARR_SIZE 50

int main(void)
{
  char string[ARR_SIZE];
  printf("Enter char array!\n");
  fgets(string, ARR_SIZE, stdin);

  string[strlen(string) - 1] = '\0';
  // remove the -1 on the string length calculation, the NUL terminator is not
  // included in strlen's return value
  int length = strlen(string);
  // no sense in calling strlen twice
  int length2 = length;

  // fixed `length` now prints the correct length
  printf("%s\t%d\n", string, length);

  // change from <= to <. The array indices where the characters live are
  // [0, length-1].
  for (int i = 0; i < length; i++)
  {
    printf("INDEX = %d CHAR = %c\n", i, string[i]);
  }
 
  // fixed `length2` now prints the correct length
  printf("%d", length2);
  char copy[ARR_SIZE];
 
  for (int i = 0; i < length2; i++)
  {
    // last character in `string` lives at the `length`-1 index
    copy[i] = string[length-1];
    length--;
  }

  // `length2` is the index after the last char in `copy`, this needs
  // to be NUL terminated.
  copy[length2] = '\0';

  // prints the reversed string
  printf("\n%s", copy);
}

 

These suggestions do produce a working program, but it has some weaknesses inherent in the C language handling of strings. The biggest weakness is the effort needed to deal with determining how long the string is. C strings are terminated with a null character. The example above uses the strlen function to find the position of the null character.

A second answer tried to offer a more compact solution, although it does omit the necessary header files. The author of this example suggested that a proper solution should define and use a function. Note that this function is highly compact, uncommented, and extremely terse. In other words it contains all the elements commonly associated with C programming.

char *copyreverse(char *dest, const char *src)
{
  size_t len = strlen(src);
  const char *end = src + len - !!len;
  char *wrk = dest;
  while(len--)
    *wrk++ = *end--;
  *wrk = 0;
  return dest;
}
int main()
{
  char dest[10];
  char *src = "hello";
  printf("`%s` reversed `%s`\n", src, copyreverse(dest, src));
}

 

In this case we see a function named copyreverse and a function named main in the same file. In C this places the two functions in the same file scope, allowing copyreverse to be visible to the implementation of main. C does not allow a function to be implemented within the scope of another function.

This terse solution has typical C problems. The copyreverse function takes two parameters, a pointer to character named dest and a pointer to character named src. Only the programmer know that these pointers should point to the start of an array of characters and not a simple character. There is no check in the program to ensure that the dest parameter points to an array large enough to hold all the characters contained in the array pointed to by the parameter src. This failure to check sizes opens the opportunity for a buffer overflow.

Another part of the simplification of this program is its lack of ability to read the string from standard input, thus making the program unnecessarily compact.

Ada versions

Next I offer a couple of solutions created in the Ada programming language.

The first Ada example creates a function using the Ada.Strings.Unbounded package which provides an append procedure for unbounded strings. Use of unbounded strings prevents any buffer overflow problems.

with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Main is
  Result : Unbounded_String := Null_Unbounded_String;
  Input : String (1 .. 80);
  Length : Natural;
begin
  Put ("Enter a string: ");
  Get_Line (Item => Input, Last => Length);
  Put_Line (Input (1 .. Length));
  for C of reverse Input (1 .. Length) loop
    Append (Result, C);
  end loop;
  Put_Line (To_String (Result));
end Main;

 

The Get_Line procedure reads the string from standard input and places it into a string named Input. The Get_Line function passes out the modified string as well as a parameter indicating the index of the last character of the useful information in the string. The Put_Line procedure simply writes the input data to standard output. The data to be output is in the slice of the Input variable which was filled by Put_Line.  The for loop scans through the slice of Input in reverse order, appending each value to the Unbounded_String named Result. When the loop finishes Result will contain the data in Input (1 .. Length), but in reverse order.

The final Put_Line procedure call simply converts the Unbounded_String Result to a corresponding String value and outputs that String value.

A second Ada example avoids the use of the package Ada.Strings.Unbounded and does the string copy directly to another Ada string.

with Ada.Text_IO; use Ada.Text_IO;

procedure main_2 is
   function reverse_str (src : in String) return String is
     Result : String (src'Range);
     r_idx : Positive := Result'First;
   begin
     for value of reverse src loop
       Result (r_idx) := value;
       r_idx := r_idx + 1;
     end loop;
   return Result;
  end reverse_str;

  Input : String (1 .. 80);
  Length : Natural;
begin
  Put ("enter a string: ");
  Get_Line (Input, Length);
  Put_Line (Input (1 .. Length));
  Put_Line (reverse_str (Input (1 .. Length)));
end main_2;

 

The function named reverse_str is defined within the procedure main_2. Reverse_str takes one String parameter named src and returns a String. Reverse_str creates a String variable named Result. Result is created to have the same index range as src, and therefore has exactly the same size as src, eliminating any possibility of overflow. The variable r_idx is an instance of the subtype Positive and is initialized to the first index value of Result. The for loop iterates through src in reverse assigning each value of src to the element in Result indexed by the variable r_idx. The variable r_idx is then incremented and the loop continues until all elements of src have been processed. Result is then returned by the function reverse_str.

The body of main_2 prompts for a string to be input from standard input, reads up to 80 characters from standard input and places the data in the variable Input. The value of the string Input (1 .. Length) is output and on the next output line the value of the reverse of Input (1 .. Length) is output.

While not as terse as the second C solution the second Ada version is highly readable and does not require the programmer to provide a destination string parameter of possibly dubious size as is required in the C program. The Ada version also does not require the src string to be scanned to determine its size followed by an obscure expression seen in the C example to deal with a possible empty string. Instead the Ada range syntax defines a range where the first index is greater than the last index to be an empty range. For instance, if the users passes an empty string to the Ada program the notation Input (1 .. Length) resolves to Input ( 1 .. 0), which is an empty string. The for loop then does not iterate through the values because the first index exceeds the last index and Result, which is created with the expression Result : String := (src’Range) is defined as an empty string. Result is returned without being modified in the for loop and no buffer overflow occurs.

The Ada versions are both simpler because of the differences between Ada strings and C strings and also because Ada arrays, of which strings are an example, can easily be sliced.

The Ada versions both avoid the possibility of array overflows while the second C version does not.

The Ada versions avoid any pointer manipulation syntax while the second C example performs all array element processing using explicit pointer manipulations.

 

1 comment:

  1. This is another reason why Ada is more secure than C. Thanks for sharing your work.

    ReplyDelete

Comparison of three algorithms for summing an array of integers

This article shares a comparison of the time taken to sum an array of integers. This article tests three algorithms for summing the array. •...