Special mark at the beginning of the comment - comment ends at the end of the line

In this case the program code is more robust, because it is protected against the effects of the mistake mentioned above. That is why C++ also introduced this form of comments: the special tag marks the beginning of the comment, which ends at the end of the same program line.

In Ada and in Eiffel, program reliability is of primary importance, which is why these languages allow only comments that run from the special tag to the end of the same program line.

In many modern programming languages, there are tools to support internal documentation. These can produce an extract of the source code which contains the method and function specifications and the contents of those remarks that are tagged with specific symbols.

In Java, comments are the same as in C++, but there is also the possibility of documentation comments:

/** documentation comment */

This form of documentation comment can be extracted by the javadoc utility.

In Eiffel, the development environment supports internal documentation: as the documentation of a class, we can query all the comments after the specification line of the attributes, the pre- and postconditions, and also the class invariant.{This is called the short form of a class; for more information see [Meyer, 1991].}

In C#, the comment forms of C++ (/* */, //) can be used, and also the /// tag for one-line documentation comments. These documentation comments should be well-formed XML, so that documentation generator utilities can process them properly.

2.6 Summary

In this chapter we examined the symbol sets from which lexical units can be built, how this process is standardized for each of the programming languages, how the identifiers of these languages are constructed, and which numeric, character and text literals are allowed.

2.7 Exercises

Exercise 2.1. Which of the following identifiers is the most readable, which would you prefer to use in your programs? Explain!

MaxNumberOfEmployees max_number_of_employees MAXNUMBEROFEMPLOYEES MAX_NUMBER_OF_EMPLOYEES mnoe m_n_o_e

u32MaxEmployees maxEmployees

Exercise 2.2. Mention a programming language, in which nested comments are allowed! What could be the reason for the low number of such languages?

Exercise 2.3. In which cases is it justified to use a base other than decimal for integer and real numbers?

2.8 Useful tips

Tip 2.1. The usable syntax of the identifiers is determined by the given programming language. For example, early languages strictly regulated the case, the length and even the allowed character set of identifiers, while modern languages allow more possibilities. However, the naming styles and conventions for identifiers should be followed in every language. Compliant names should always be as informative as possible.

Tip 2.2. The lexical parser normally just discards everything after a comment start until a comment end sequence. If nested comments are allowed, the content of the comment cannot simply be discarded: it must be examined, even on a higher semantic level, just to be able to distinguish between the nesting levels.

Consider a C++ style comment notation, and inspect the following nested comment situations:

/* commented out

string Str = "aslfkalfksnflkn*/*aslkfasnflkn";

Thus, the above string constant should be ignored, and

should not cause any problems related to comment delimiters!

/* another comment starts here

// this nesting works, it is a different kind of comment /*/

// so is this comment still nested or not?

Of course different comment constructs can usually be nested since the different comment ends will not be intermixed during lexical parsing.
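The difference between the two treatments of comments can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name strip_block_comments and its nested flag are hypothetical, only the /* */ form is handled, and string literals and // comments are deliberately ignored.

```python
def strip_block_comments(source: str, nested: bool) -> str:
    """Remove /* ... */ comments from source.

    With nested=False the first comment end closes the comment (C++ style).
    With nested=True a depth counter tracks the nesting level.
    """
    kept = []
    depth = 0  # current comment nesting level; 0 means "outside any comment"
    i = 0
    while i < len(source):
        pair = source[i:i + 2]
        if pair == "/*" and (depth == 0 or nested):
            depth += 1          # a comment starts (or nests one level deeper)
            i += 2
        elif pair == "*/" and depth > 0:
            depth -= 1          # a comment level ends
            i += 2
        else:
            if depth == 0:
                kept.append(source[i])  # ordinary program text is kept
            i += 1
    return "".join(kept)


text = "a /* x /* y */ z */ b"
print(repr(strip_block_comments(text, nested=False)))
print(repr(strip_block_comments(text, nested=True)))
```

Without nesting, the comment ends at the first comment end sequence and the rest of the line leaks back into the program text; with nesting, the whole region is removed.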

Tip 2.3. Think of specifying bitmask data, like Unix style file permissions!

2.9 Solutions

Solution 2.1. The naming styles and conventions for identifiers should be followed for every language.

Compliant names should always be as informative as possible. Multi-word identifiers are usually written in CamelCase, the practice of writing compound words so that each part begins with a capital letter. For better grouping and separation of the different parts, an underscore or a hyphen may be used. A good example of a more informative naming convention is the so-called "Hungarian notation", in which additional prefixes of the identifier indicate its type or intended use. There are actually two types of this notation (see the last two examples in the exercise), which differ in whether the prefix encodes the actual physical data type (such as u32 for an unsigned 32-bit integer) or the logical data type or purpose (such as max for an upper limit).

Solution 2.2. The lexical parser normally just discards everything after a comment start until a comment end sequence. If nested comments are allowed, the content of the comment cannot simply be discarded: it must be examined, even on a higher semantic level, just to be able to distinguish between the nesting levels.

This needs an additional and very tolerant semantic analyzer, since the contents of the comments may even be syntactically incorrect, yet the nesting levels of the comments must still be balanced to avoid unwanted effects such as commenting out larger portions of the working code. This is usually too much effort for established languages, since introducing nested comment support could break backward compatibility. On the other hand, if the goal is exactly to comment out large portions of code together with their own comments, other language features should be used (such as #if 0 and #endif of the C preprocessor).

This is the reason why nested comments are usually supported by specialized or newer languages that do not need to be backward compatible, such as Rexx, Modula-2, Modula-3, Oberon, Haskell, Frege, Newspeak, D or OCaml.

Solution 2.3. Since machine code uses a binary representation, numerical bases other than decimal are much more suitable for accessing bitwise data. In particular, powers of 2 are usually used as the base for this kind of representation: the binary format to specify individual bitmasks, octal numbers to group bits by 3 as in Unix style file permissions, or hexadecimal numbers to group bits by 4 and so address half bytes.
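As a small illustration, the same Unix style permission value can be written in binary, octal and hexadecimal. The sketch below uses Python literals; the helper name describe is hypothetical and exists only for this example.

```python
# The permission set rwxr-xr-x written in three different bases.
perm_binary = 0b111_101_101   # one digit per permission bit
perm_octal = 0o755            # one digit per group of 3 bits (owner, group, others)
perm_hex = 0x1ED              # one digit per group of 4 bits (half byte)


def describe(mode: int) -> str:
    """Render the lowest 9 bits of mode in the usual rwx notation."""
    flags = "rwxrwxrwx"
    # Bit 8 is the owner's read permission, bit 0 is execute for others.
    return "".join(ch if mode & (1 << (8 - i)) else "-" for i, ch in enumerate(flags))


print(perm_binary == perm_octal == perm_hex)
print(describe(perm_octal))
```

The octal form is the easiest to read here because each digit corresponds to exactly one group of three permission bits.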

3 Control structures, statements (Balázs Csizmazia, Attila Kispitye, Judit Nyéky-Gaizler)

In imperative programming languages, statements describe the basic steps of the programs. The programmer issues them to implement the state space change of the program. Execution of the program is a sequence of these statements. Programming languages provide a variety of features to change this execution order. In this chapter, flow control structures of the imperative languages will be reviewed through their development progress, outlining the historical background and the calculation model behind them.

In this chapter we discuss flow control structures and basic statements in imperative programming languages.

Statements describe the fundamental steps of programs, while control structures, often realized as statements, allow controlling the execution order of statements. To understand flow control, we must know how microprocessors in von Neumann computers work: they execute the machine code instructions stored in memory in their order of occurrence. This is called sequential control, or sequential execution order. This sequential execution order can be modified with four fundamental control transfer statements. These modifiers are the following:

• Unconditional transfer of control;

• Conditional transfer of control;

• Subroutine call: procedure and function call (in object-oriented languages: method call). In some programming languages recursion{For program code, the term self-invoking is also used as a synonym for recursion. Recursion is a broader concept than simple self-invocation, so we will use this term throughout the book.} is becoming ever more popular;

• Return from the subroutine.

In addition to these there are other control structures, which are usually language dependent, but can be expressed with the four basic modifiers listed above. An example of that is multiway branching: expressed by the case statement in some languages and by the switch statement in others.

3.1 The job of a programmer

Programmers are often asked what they do at work. A detailed, technical answer is more or less meaningful depending on who has raised the question. Someone without any background in IT will perhaps just nod and ask themselves why society needs this. If we think about it carefully, an IT expert solves real-life problems.

This answer will make sense to everyone and will explain why society desperately needs their service. The job of a programmer is problem solving: to reach a desired target state - the solution - by changing the initial state.

Finding the solution is controlled by some fixed rules. Rules describe how to get from one state to the other. The method of solving a given problem or more precisely, a problem class is called an algorithm.

An algorithm must fulfill some basic requirements:

• It must be described with clearly defined steps;

• It must be executable step by step;

• It must be finite (both its description and execution);

• Every description must be precise: the computer is only capable of executing precisely described steps;

• The algorithm always starts from a well defined state described by input data, and reaches a well defined end state.

Various tools can be used to describe an algorithm, depending on the needs and resources available. These include sentence-like descriptions, flow diagrams, D-diagrams, block diagrams and structograms (also called Nassi-Shneiderman diagrams), and perhaps the most important method from the point of view of this book: textual description using programming languages.

Next we provide an overview of the basic application of these tools. With the help of the Euclidean algorithm, the greatest common divisor of two natural numbers will be determined. The basic idea behind this algorithm is this: the greatest common divisor of two numbers is also a divisor of their difference.

3.1.1 Sentence-like description

Sentence-like description describes the steps of an algorithm using common phrases and sentences. Of all the methods, this one is the least concerned with whether the words of a sentence are meaningful for a computer, although precision is an important requirement here too. The sentence-like description of the Euclidean algorithm may be formulated as follows (two numbers are given, their greatest common divisor must be determined):

1. Compare the two numbers.

2. If they are equal, the result is at hand: both numbers give the greatest common divisor of the original two numbers.

3. If they differ, the smaller must be subtracted from the greater.

4. Continue from step 1.

The description above defines with sufficient precision the method needed to compute the greatest common divisor, but it would be hard to have a computer execute steps given in a natural language. The above algorithm may be modified to use the modulo operation (proving the equivalence of these two variants of the algorithm is left to the Reader as an exercise):

1. Divide the two numbers p and q, with p being the greater number. The remainder r will be between 0 and q - 1.

2. If r (the remainder) is 0, q is the greatest common divisor. Otherwise move the former q into p, and r into q.

3. Continue from step 1.
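Both sentence-like variants can be written down almost word for word in a programming language. The following Python sketch is one possible transcription for positive integers; the function names gcd_subtract and gcd_modulo and the variable names p, q and r are chosen here for illustration only.

```python
def gcd_subtract(p: int, q: int) -> int:
    """Subtraction variant: assumes p and q are positive integers."""
    while p != q:           # step 1: compare the two numbers
        if p > q:
            p -= q          # step 3: subtract the smaller from the greater
        else:
            q -= p
    return p                # step 2: equal numbers give the result


def gcd_modulo(p: int, q: int) -> int:
    """Modulo variant: assumes p and q are positive integers."""
    if p < q:
        p, q = q, p         # make p the greater number
    while True:
        r = p % q           # step 1: the remainder lies between 0 and q - 1
        if r == 0:
            return q        # step 2: q is the greatest common divisor
        p, q = q, r         # otherwise move q into p and r into q


print(gcd_subtract(48, 18), gcd_modulo(48, 18))
```

Running both functions on the same inputs is a quick way to gain confidence in the equivalence of the two variants, although it is of course not a proof.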

3.1.2 Flow diagrams

When describing the algorithm on paper, we may use a two-dimensional representation. This ensures a clear description method, but this representation is still far from the concepts of programming languages.

In flow diagrams, the execution steps are written in rectangular boxes, and execution order is determined by arrows between the boxes. Conditional branches are represented by rhombus shaped boxes: the condition is written into the box, from which two arrows can point outwards: one arrow takes the execution if the condition in the box holds, the other if the condition does not hold. These arrows can be labeled by the branch they denote.

Because they tend to become tangled, flow diagrams are hard to understand and hard to implement in concrete programming languages. The resulting programs are also difficult to modify.

The Euclidean algorithm can be described with flow diagrams as shown in Figure 4.

3.1.3 D-diagrams

Dissatisfied with the tangled flow diagrams, Edsger W. Dijkstra introduced a reduced description set with elemental structures which have only one possible outward (following) execution path in every case (of course, the end of the program is an exception as here the execution stops). After Dijkstra, these flow diagram elements are known as D-diagrams.

1. A simple operation is a D-diagram.

2. If S1 and S2 are D-diagrams, then their sequential execution (first S1, afterwards S2) is also a D-diagram. If c is a condition, then "if condition c is true, then S1, otherwise S2", "if condition c is true, then S1", "as long as the condition c is true, execute S1 in a loop", and "execute S1, then unless the condition c becomes true, execute S1 in a loop" are also D-diagrams.

3. There are no other D-diagrams (requirement: at most 1 outward execution path from all constructions).

D-diagrams are shown in Figure 5. For loops there are two types, the entry and exit controlled (pre- and post-test) variants.
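The difference between the entry controlled and the exit controlled loop can be sketched in Python. Python has no built-in post-test loop, so the second function emulates one with an endless loop and a break at the end of the body; both function names are hypothetical.

```python
def countdown_pre_test(n: int) -> list:
    """Entry controlled loop: the condition is checked before every pass."""
    visited = []
    while n > 0:
        visited.append(n)
        n -= 1
    return visited


def countdown_post_test(n: int) -> list:
    """Exit controlled loop: the body runs once before the condition is checked."""
    visited = []
    while True:
        visited.append(n)
        n -= 1
        if n <= 0:          # exit condition tested after the pass
            break
    return visited


print(countdown_pre_test(0), countdown_post_test(0))
```

For a starting value of 0 the pre-test variant never executes its body, while the post-test variant executes it exactly once. For positive starting values the two behave identically.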

The restrictions imposed by D-diagrams ensure that the complexity and difficulty of a program grow only in proportion to its length. Because these patterns are widely known, this kind of description is also significantly easier to read than traditional flow diagrams. Implementing the control structures of programs in accordance with these principles, breaking programs down into subprograms, and declaring expressions, variable types and their operations are together known as structured programming.

Böhm and Jacopini [Böhm and Jacopini, 1966] proved that every algorithm - described with flow diagrams - may be described by using D-diagrams alone. This kind of programming style is called programming without goto (the goto statement is used by many programming languages for transferring execution control).

3.1.4 Block diagrams

A simple operation written in a (rectangular) box is a block diagram. Boxes drawn sequentially will be executed in a sequence. Branches specify two boxes and a condition. The condition controls which box will be executed.

For the false case of the condition, the box is not required to be given. The whole branch is boxed in, so it can be used anywhere where a box is allowed. In the case of loops an inner box holds the operations to be repeated (the loop body). This is surrounded by an outer box, which contains the loop condition.

The building blocks of block diagrams are shown in Figure 6.

3.1.5 Structograms

Structograms are another tool used for describing algorithms. This involves writing operations into boxes, and arranging program structures from these boxes. Boxes holding the operations can be easily nested; nesting can demonstrate the structure of the program (similarly to the block diagrams). Their advantage over block diagrams is their restricted form, which is generated more easily by programs or word processors.

The building blocks of the structograms are shown in Figure 7.

3.2 Implementation in assembly

Programming languages are often categorized as low or high level languages. Low level languages are usually created for a specific computer or architecture: their statements correspond to the instruction set of the microprocessor of the target architecture. These languages are called assembly languages. Assembly statements relate directly to machine code instructions. Translation from an assembly language to machine code is done by an assembler. The instruction set of a high level - algorithmic - language is independent of that of any given computer architecture: before execution, a compiler must create assembly and machine code from the source code.

Next we present implementations of the Euclidean algorithm in Pascal and in LMC assembly.

3.2.1 The solution in Pascal

Following is a Pascal implementation of the Euclidean algorithm (a basic knowledge of the Pascal programming language is a prerequisite for understanding the implementation to follow):

procedure gcd(p, q: integer; var result: integer);
begin
  while q > 0 do begin
    if p > q then p := p - q else q := q - p;
  end; (* while *)
  result := p
end; (* gcd *)

3.2.2 LMC

The Little Man Computer (LMC) is an instructional model of a von Neumann architecture computer, created by Dr. Stuart Madnick at MIT. Components of this simplified model are:

• The number system used for data consists of three decimal digits, representing integers in the range -500 to 499. Negative values are encoded in ten's complement, which is computed for negative numbers by adding 1000 to them and which leaves non-negative values unchanged.

• Mailboxes: this is the working memory of the model. The address range is limited to two-digit decimal numbers (00 - 99); each Mailbox can hold one unit of data or one instruction code.

• Instruction Location Counter: a two-digit display with the address of the next Mailbox to evaluate. Programs start at address 00 with the push of the Reset button. Leaving the end of the address range (99) causes abnormal program termination.

• In and Out Baskets: the input and output communication ports of the model. Data can only be read in from In, and can only be output to the Out Basket, one at a time. The handling of multiple subsequent data is implemented in a First In First Out (FIFO queue) fashion. Signed data is converted automatically to the internal ten's complement representation at Input and vice versa at Output.

• Calculator: temporary data storage for arithmetic operations. Its value range is the same as that of the Mailboxes (three decimal digits); the supported operations are addition and subtraction. Numerical underflow and overflow lead to abnormal program termination.

• Little Man: works inside the above defined architecture and performs the following operations rigorously:

1. Reads the current value of the Instruction Location Counter.

2. Goes to the mailbox with that number and reads its content.

3. Pushes the Counter incrementer button to advance (by one) the value of the Instruction Location Counter.

4. Interprets the last read mailbox content and executes its value as an operation code.

5. If not stopped by the operation before, continues with step 1.

Note that incrementing the Instruction Location Counter occurs before executing the current operation, so that branching can land at the desired location.

The above components are illustrated in Figure 8.

The above architecture resembles the functional organization defined by von Neumann, as there is a control unit (the Little Man and the Instruction Location Counter) to execute instructions, an arithmetic unit (the Calculator) to perform calculations, and a memory (the Mailboxes) to hold both programs and data (this is known as the stored program concept) in a linearly addressed (with a two digit sequential number) location space.
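The ten's complement encoding used by the number system can be sketched as follows. This is an illustration only: it assumes that three decimal digits represent the values -500 to 499 and that a negative value is stored by adding 1000 to it; the names MODULUS, encode and decode are hypothetical.

```python
MODULUS = 1000  # assumption: three decimal digits, so values wrap around at 1000


def encode(value: int) -> int:
    """Convert a signed value to its three-digit ten's complement form."""
    if not -500 <= value <= 499:
        raise OverflowError("value does not fit into three decimal digits")
    return value % MODULUS      # adds 1000 to negative values, keeps the others


def decode(stored: int) -> int:
    """Convert a three-digit ten's complement form back to a signed value."""
    return stored - MODULUS if stored >= 500 else stored


print(encode(-1), encode(42), decode(999))
```

Under these assumptions, -1 is stored as 999, while non-negative values such as 42 are stored unchanged.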

Execution of an LMC program needs the following preparations:

1. Instructions (machine code) must be loaded into the mailboxes, starting from address 00.

2. Input data must be placed into the In Basket in proper order.

3. By pressing the Reset button, the execution starts, the Little Man wakes up and performs his duty.

4. The result will appear in the Out Basket.

The instruction codes of the LMC tell the Little Man what to do. These codes are the machine code of this architecture and are stored as ordinary data within the Mailboxes. Each instruction therefore consists of 3 decimal digits: the first represents the command to perform, and the next two digits address the mailbox holding the operand of the command (this is called direct addressing).

Encountering an undefined instruction code leads to abnormal program termination. Because of this and the von Neumann stored program concept, special care must be taken to prevent program execution from reaching pure data within the memory.
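The work cycle of the Little Man and the three-digit instruction format can be tried out with a small simulator. The Python sketch below is only an illustration under stated assumptions: the operation codes 1 (add), 2 (subtract), 3 (store), 5 (load), 6 (branch always), 7 (branch if zero), 8 (branch if non-negative), 901 (input), 902 (output) and 000 (halt) follow the commonly used LMC convention; values are kept as ordinary Python integers instead of ten's complement digits; and overflow checks are omitted. The names run_lmc and GCD_PROGRAM are hypothetical, and the sample program was written for this sketch.

```python
def run_lmc(program, inputs):
    """Execute LMC machine code and return the contents of the Out Basket."""
    mailboxes = [0] * 100                   # addresses 00 - 99
    mailboxes[:len(program)] = program      # load the code from address 00
    in_basket = list(inputs)                # FIFO queue of input values
    out_basket = []
    counter = 0                             # Instruction Location Counter
    calculator = 0
    while True:
        if counter > 99:
            raise RuntimeError("execution left the address range")
        code = mailboxes[counter]           # steps 1-2: read the mailbox
        counter += 1                        # step 3: increment before executing
        op, addr = divmod(code, 100)        # step 4: first digit and address
        if code == 0:                       # halt
            return out_basket
        elif op == 1:                       # add
            calculator += mailboxes[addr]
        elif op == 2:                       # subtract
            calculator -= mailboxes[addr]
        elif op == 3:                       # store
            mailboxes[addr] = calculator
        elif op == 5:                       # load
            calculator = mailboxes[addr]
        elif op == 6:                       # branch always
            counter = addr
        elif op == 7:                       # branch if zero
            if calculator == 0:
                counter = addr
        elif op == 8:                       # branch if non-negative
            if calculator >= 0:
                counter = addr
        elif code == 901:                   # input
            calculator = in_basket.pop(0)
        elif code == 902:                   # output
            out_basket.append(calculator)
        else:
            raise RuntimeError(f"undefined instruction code {code:03d}")


# Subtraction-based GCD; p lives in mailbox 17 and q in mailbox 18.
GCD_PROGRAM = [
    901, 317,            # 00-01: input p, store p
    901, 318,            # 02-03: input q, store q
    517, 218,            # 04-05: loop: load p, subtract q
    714, 812,            # 06-07: zero -> end (14), non-negative -> 12
    518, 217, 318, 604,  # 08-11: q := q - p, back to loop
    317, 604,            # 12-13: p := p - q, back to loop
    517, 902, 0,         # 14-16: end: load p, output, halt
]

print(run_lmc(GCD_PROGRAM, [48, 18]))
```

Because the counter is incremented before the instruction is executed, a branch simply overwrites the counter with the target address, and the next cycle continues from there.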

As seen by the DAT instruction, LMC also supports assembly level programming. In such cases, the LMC assembly program is written in plain text source format, using only mnemonics for the instructions. Each line can have a label at the beginning, which can be used as target for other instructions. Comments are also allowed after the instruction operand at the end of each line. Compilation from LMC assembly to machine code is the task of an LMC assembler.

The LMC assembly implementation of the Euclidean algorithm takes the following form:

        INP         ; 00 901  input p
        STO p       ; 01 308  store p
        INP         ; 02 901  input q
        BRZ end     ; 03 705  while q > 0
        BRP loop    ; 04 810
end     LDA p       ; 05 508  result is p
        OUT         ; 06 902
        HLT         ; 07 000
p       DAT         ; 08
q       DAT         ; 09

In document Advanced Programming Languages (pages 53-102)