
7. Arrays and strings


int main() {
    int number_of_rows, rowlength;

    cout << "Number of rows: "; cin >> number_of_rows;
    cout << "The length of rows: "; cin >> rowlength;

    // number_of_rows rows, each of them a vector<int> with rowlength elements
    vector< vector<int> > m(number_of_rows, vector<int>(rowlength));

    // Accessing the array
    for (int i = 0; i < number_of_rows; i++)
        for (int j = 0; j < rowlength; j++)
            m[i][j] = i + j;
}

7.4. Handling C-style strings

The primitive types of C/C++ do not include a string type suitable for storing character sequences. Since storing and processing strings is indispensable in C/C++ programs, strings are stored in one-dimensional character arrays. For string processing to work, the meaningful characters must always be terminated by a byte with value 0. There are no operators for managing such strings; however, many functions serve that purpose (declared in the cstring header file).

Figure I.21 - String constant in memory

In program code, we often use text enclosed in quotation marks (string literals), which the compiler stores among the initialized data, closed by a 0 byte as described above.

cout << "C++ language";

When the statement above is compiled, the string literal is placed in memory (Figure I.21 - String constant in memory), and the right operand of the << operator becomes an address of type const char *. When the program is executed, the cout object prints the content of that memory block character by character until it reaches the byte with value 0.

Strings composed of wide characters are also stored in that way but in this case the type of the elements of the array is wchar_t.

wcout << L"C++ language";

In C++, the types string and wstring can also be used to process texts, so we give an overview of these types, too.

7.4.1. Strings in one-dimensional arrays

When memory is allocated for a character string, the byte marking the end of the string must also be counted. If a text of at most 80 characters is to be stored in the array named line, its size should be 80+1=81:

char line[81];

In programming tasks, we often use strings that have an initial value. To provide an initial value, we can use the solutions already presented in connection with arrays; however, we must not forget the closing '\0' character:

char st1[10] = { 'O', 'm', 'e', 'g', 'a', '\0' };

wchar_t wst1[10] = { L'O', L'm', L'e', L'g', L'a', L'\0' };

char st2[] = { 'O', 'm', 'e', 'g', 'a', '\0' };

wchar_t wst2[] = { L'O', L'm', L'e', L'g', L'a', L'\0' };

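The compiler allocates 10 bytes of memory for the string st1 and copies the given characters into the first 6 bytes; the remaining elements are set to 0. The size of st2, however, equals the number of characters provided in the initialization list, that is, 6 bytes.

For the wide character strings wst1 and wst2, the compiler allocates sizeof(wchar_t) bytes for each element, so they occupy several times as much memory (in bytes) as the previous ones. The size of wchar_t depends on the platform; it is typically 2 bytes on Windows and 4 bytes on Linux.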

It is much safer to initialize character arrays with string literals (string constants):

char st1[10] = "Omega"; wchar_t wst1[10] = L"Omega";

char st2[] = "Omega"; wchar_t wst2[] = L"Omega";

Although the initialization has the same result in both cases (with characters and with string constants), string constants are easier to read. In addition, the compiler itself places the 0 byte that closes the string in memory.

Because strings are stored in arrays, C++ has no operators for them (value assignment, comparison etc.). However, there are many library functions for processing character sequences.

Let's see some basic functions processing character sequences.

Operation | Function (char) | Function (wchar_t)
reading text from the keyboard | cin >>, cin.get(), cin.getline() | wcin >>, wcin.get(), wcin.getline()
printing out a text | cout << | wcout <<
value assignment | strcpy(), strncpy() | wcscpy(), wcsncpy()
concatenation | strcat(), strncat() | wcscat(), wcsncat()
getting the length of a string | strlen() | wcslen()
comparison of strings | strcmp(), strncmp() | wcscmp(), wcsncmp()
searching for a character in a string | strchr() | wcschr()

To manage strings of char type characters, we need to include the iostream and cstring header files, whereas the wide character functions are declared in cwchar.

The following example code transforms the text read from the keyboard and prints out all its characters in capital letters and in reverse order. It can clearly be seen from the example that managing strings efficiently requires both library functions and array notation.

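#include <iostream>
#include <cstring>
#include <cctype>
using namespace std;

int main() {
    char s[80];
    cout << "Type in a text: ";
    cin.get(s, 80);
    cout << "The read text: " << s << endl;

    // Print the characters in reverse order, converted to capital letters.
    // toupper() expects a value in the unsigned char range, hence the cast.
    for (int i = (int)strlen(s) - 1; i >= 0; i--)
        cout << (char)toupper((unsigned char)s[i]);
    cout << endl;
}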

The wide character version of the previous solution is:

#include <iostream>
#include <cwchar>
#include <cwctype>
using namespace std;

int main() {
    wchar_t s[80];
    wcout << L"Type a text: ";
    wcin.get(s, 80);
    wcout << L"The read text: " << s << endl;

    for (int i = (int)wcslen(s) - 1; i >= 0; i--)
        wcout << (wchar_t)towupper(s[i]);
    wcout << endl;
}

In both examples, we used the safe cin.get() function to read text from the input. The function reads characters until <Enter> is pressed, but it stores at most size-1 characters in the array, where size is the value passed as its second argument.

7.4.2. Strings and pointers

Character arrays and character pointers can both be used to manage strings but pointers should be used more carefully. Let's see the following frequent definitions.

char str[16] = "alfa";

const char *pstr = "gamma";   // string literals are constant, so the pointer is const char *

In the first case, the compiler creates the array str of 16 elements and then it copies in the characters of the provided initial value and the byte with value 0. In the second case, the compiler stores the initial text in the area reserved for string literals, then it initializes the pointer pstr to the beginning address of the string.

The value of the pointer pstr can be modified later (after which the string "gamma" can no longer be reached through it):

pstr = "iota";

A pointer assignment takes place here: pstr now holds the address of the new string literal. By contrast, assigning a value to the array name str results in a compile-time error, because an array name cannot appear on the left side of an assignment:

str = "iota"; // error! ↯

If a string has to be processed character by character, we can choose between the array and the pointer approach. In the following example code, the character sequence read from the input is first encrypted with the exclusive-or operation and then restored to its original content. (During encryption the string is treated as an array, and during decryption the pointer approach is used.) In both cases, the loop ends when the zero byte closing the string is reached.

#include <iostream>
using namespace std;

const unsigned char key = 0xCD;

int main() {
    char s[80], *p;
    cout << "Type in a text: ";
    cin.get(s, 80);

    for (int i = 0; s[i]; i++)   // encryption
        s[i] ^= key;
    cout << "The encrypted text: " << s << endl;

    p = s;
    while (*p)                   // decryption
        *p++ ^= key;
    cout << "The original text: " << s << endl;
}

In the following example, the increment and indirection operators are used together, which requires more care. The pointer sp points to a dynamically allocated character sequence. (It should be noted that string literals must not be modified; attempting to do so results in undefined behavior.)

char *sp = new char [33];

strcpy(sp, "C++");

cout << ++*sp << endl;   // D
cout << sp << endl;      // D++
cout << *sp++ << endl;   // D
cout << sp << endl;      // ++

In the first case (++*sp), the indirection operator is applied first, and then the referenced character is incremented. In the second case (*sp++), the increment operator applies to the pointer, but since it is a post-increment operator, the pointer is stepped to the next character only after the expression has been evaluated. The value of the expression is therefore the character the pointer referred to before the step.

7.4.3. Using string arrays

Most C++ programs contain texts (e.g. error messages) that are printed on the basis of a certain index (error code). The simplest solution for storing such texts is defining string arrays.

When string arrays are planned, it should be decided whether they will be two-dimensional or pointer arrays.

For beginners in C++ programming, it is often difficult to differentiate between the two. Let's see the following two definitions.

int a[4][6];

int* b[4];

a is a "real" two-dimensional array for which the compiler allocates a contiguous memory block for 24 (4x6) elements of type int. By contrast, b is a pointer array of 4 elements. Based on this definition, the compiler allocates space for only four pointers; the rest of the initialization has to be done later in the code. Let's initialize the pointer array so that it can store 4x6 integer elements.

int s1[6], s2[6], s3[6], s4[6];

int* b[4] = { s1, s2, s3, s4 };

It is clear that besides the memory block needed for 24 int elements, further memory space was also used (for the pointers). At this point, it would be logical to ask what the advantages of using pointer arrays are. The answer lies in the length of the rows. While in a two-dimensional array every row contains the same number of elements, the size of each row can be different in pointer arrays.

int s1[1], s2[3], s3[2], s4[6];

int* b[4] = { s1, s2, s3, s4 };

The other advantage of pointer arrays is that their structure is in line with the possibilities of dynamic memory allocation, thus it has an important role when dynamically allocated arrays are to be created.

After this short introduction, let's turn to the subject of the present subchapter: the creation of string arrays. In general, string arrays are defined together with their initial values. In the first example, a two-dimensional array is defined with the following statement:

static char names1[][10] = { "Ivan", "Olesya", "Anna", "Adrienn" };

This definition results in the creation of a 4x10 character array: the number of rows is determined by the compiler on the basis of the initialization list. The rows of the two-dimensional character array are placed one after the other in memory (Figure I.22 - String array stored in a two-dimensional array).

Figure I.22 - String array stored in a two-dimensional array

In the second case, a pointer array is used to store the addresses of the names:

static const char* names2[] = { "Ivan", "Olesya", "Anna", "Adrienn" };

The compiler allocates four blocks of different sizes in memory, as shown in Figure I.23 - Optimally stored string array:

Figure I.23 - Optimally stored string array

It is worth comparing the two solutions with respect to definition and memory access.

cout << names1[0] << endl;      // Ivan
cout << names1[1][4] << endl;   // y
cout << names2[0] << endl;      // Ivan
cout << names2[1][4] << endl;   // y

In document Mechatronic Systems Programming in C++ (Pages 85-90)