7. Improving Performance

This chapter presents several topics related to program performance. It first describes some of the tradeoffs that need to be considered and some of the techniques for making your program run faster. It then documents the gnatelim tool, which can reduce the size of program executables.

7.1 Performance Considerations

7.2 Reducing the Size of Ada Executables with gnatelim

7.1 Performance Considerations

The GNAT system provides a number of options that allow a trade-off between

performance of the generated code
speed of compilation
minimization of dependences and recompilation
the degree of run-time checking.

The defaults (if no options are selected) aim at improving the speed of compilation and minimizing dependences, at the expense of performance of the generated code:

no optimization
no inlining of subprogram calls
all run-time checks enabled except overflow and elaboration checks

These options are suitable for most program development purposes. This chapter describes how you can modify these choices, and also provides some guidelines on debugging optimized code.

7.1.1 Controlling Run-Time Checks

7.1.2 Use of Restrictions

7.1.3 Optimization Levels

7.1.4 Debugging Optimized Code

7.1.5 Inlining of Subprograms

7.1.6 Optimization and Strict Aliasing

7.1.1 Controlling Run-Time Checks

By default, GNAT generates all run-time checks, except arithmetic overflow checking for integer operations and checks for access before elaboration on subprogram calls. The latter are not required in default mode, because all necessary checking is done at compile time. Two gnat switches, `-gnatp' and `-gnato', allow this default to be modified. See section 3.2.6 Run-Time Checks.

Our experience is that the default is suitable for most development purposes.

We treat integer overflow checks specially because they are quite expensive and, in our experience, are not as important as other run-time checks in the development process. Note that division by zero is not considered an overflow check, and divide by zero checks are generated where required by default.

Elaboration checks are off by default, and also not needed by default, since GNAT uses a static elaboration analysis approach that avoids the need for run-time checking. This manual contains a full chapter discussing the issue of elaboration checks, and if the default is not satisfactory for your use, you should read this chapter.

For validity checks, the minimal checks required by the Ada Reference Manual (for case statements and assignments to array elements) are on by default. These can be suppressed by use of the `-gnatVn' switch. Note that in Ada 83, there were no validity checks, so if the Ada 83 mode is acceptable (or when comparing GNAT performance with an Ada 83 compiler), it may be reasonable to routinely use `-gnatVn'. Validity checks are also suppressed entirely if `-gnatp' is used.

Note that the setting of the switches controls the default setting of the checks. They may be modified using either pragma Suppress (to remove checks) or pragma Unsuppress (to add back suppressed checks) in the program source.
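As an illustration of adjusting checks in the source, here is a minimal sketch. The procedure name Checks_Demo and its data are hypothetical; only pragma Suppress, pragma Unsuppress and the check names come from the language and GNAT. Each pragma is placed in a declarative part, so it applies only to the enclosing block.

```ada
procedure Checks_Demo is
   type Sample_Array is array (1 .. 100) of Integer;

   Data  : constant Sample_Array := (others => 1);
   Total : Integer := 0;
begin
   declare
      --  Remove index checks for the statements of this block only.
      pragma Suppress (Index_Check);
   begin
      for J in Data'Range loop
         Total := Total + Data (J);
      end loop;
   end;

   declare
      --  Add overflow checks back for this block, whatever the
      --  command-line switches say.
      pragma Unsuppress (Overflow_Check);
   begin
      Total := Total * 2;
   end;
end Checks_Demo;
```

Keeping such pragmas local to a block or subprogram limits their effect to code that has been reviewed, while the switches continue to set the default everywhere else.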

7.1.2 Use of Restrictions

The use of pragma Restrictions allows you to control which features are permitted in your program. Apart from the obvious point that you save the cost of relatively expensive features like finalization when you avoid them (enforceable by the use of pragma Restrictions (No_Finalization)), the use of this pragma does not affect the generated code in most cases.

One notable exception to this rule is that the possibility of task abort results in some distributed overhead, particularly if finalization or exception handlers are used. The reason is that certain sections of code have to be marked as non-abortable.

If you use neither the abort statement, nor asynchronous transfer of control (select .. then abort), then this distributed overhead is removed, which may have a general positive effect in improving overall performance. Especially code involving frequent use of tasking constructs and controlled types will show much improved performance. The relevant restrictions pragmas are

pragma Restrictions (No_Abort_Statements);
pragma Restrictions (Max_Asynchronous_Select_Nesting => 0);

It is recommended that these restriction pragmas be used if possible. Note that this also means that you can write code without worrying about the possibility of an immediate abort at any point.

7.1.3 Optimization Levels

The default is optimization off. This results in the fastest compile times, but GNAT makes absolutely no attempt to optimize, and the generated programs are considerably larger and slower than when optimization is enabled. You can pass the `-On' switch to gcc, where n is an integer from 0 to 3, to control the optimization level:

`-O0': No optimization (the default); generates unoptimized code but has the fastest compilation time.
`-O1': Medium level optimization; optimizes reasonably well but does not degrade compilation time significantly.
`-O2': Full optimization; generates highly optimized code and has the slowest compilation time.
`-O3': Full optimization as in `-O2', and also attempts automatic inlining of small subprograms within a unit (see section 7.1.5 Inlining of Subprograms).

Higher optimization levels perform more global transformations on the program and apply more expensive analysis algorithms in order to generate faster and more compact code. The price in compilation time, and the resulting improvement in execution time, both depend on the particular application and the hardware environment. You should experiment to find the best level for your application.

Since the precise set of optimizations done at each level will vary from release to release (and sometimes from target to target), it is best to think of the optimization settings in general terms. The Using GNU GCC manual contains details about the `-O' settings and a number of `-f' options that individually enable or disable specific optimizations.

Unlike some other compilation systems, gcc has been tested extensively at all optimization levels. There are some bugs which appear only with optimization turned on, but there have also been bugs which show up only in unoptimized code. Selecting a lower level of optimization does not improve the reliability of the code generator, which in practice is highly reliable at all optimization levels.

Note regarding the use of `-O3': The use of this optimization level is generally discouraged with GNAT, since it often results in larger executables which run more slowly. See further discussion of this point in 7.1.5 Inlining of Subprograms.

7.1.4 Debugging Optimized Code

Although it is possible to do a reasonable amount of debugging at non-zero optimization levels, the higher the level the more likely that source-level constructs will have been eliminated by optimization. For example, if a loop is strength-reduced, the loop control variable may be completely eliminated and thus cannot be displayed in the debugger. This can only happen at `-O2' or `-O3'. Explicit temporary variables that you code might be eliminated at level `-O1' or higher.

The use of the `-g' switch, which is needed for source-level debugging, affects the size of the program executable on disk, and indeed the debugging information can be quite large. However, it has no effect on the generated code (and thus does not degrade performance).

Since the compiler generates debugging tables for a compilation unit before it performs optimizations, the optimizing transformations may invalidate some of the debugging data. You therefore need to anticipate certain anomalous situations that may arise while debugging optimized code. These are the most common cases:

The "hopping Program Counter": Repeated step or next commands show the PC bouncing back and forth in the code. This may result from any of the following optimizations:
- Common subexpression elimination: using a single instance of code for a quantity that the source computes several times. As a result you may not be able to stop on what looks like a statement.
- Invariant code motion: moving an expression that does not change within a loop, to the beginning of the loop.
- Instruction scheduling: moving instructions so as to overlap loads and stores (typically) with other code, or in general to move computations of values closer to their uses. Often this causes you to pass an assignment statement without the assignment happening and then later bounce back to the statement when the value is actually needed. Placing a breakpoint on a line of code and then stepping over it may, therefore, not always cause all the expected side-effects.
The "big leap": More commonly known as cross-jumping, in which two identical pieces of code are merged and the program counter suddenly jumps to a statement that is not supposed to be executed, simply because it (and the code following) translates to the same thing as the code that was supposed to be executed. This effect is typically seen in sequences that end in a jump, such as a goto, a return, or a break in a C switch statement.
The "roving variable": The symptom is an unexpected value in a variable. There are various reasons for this effect:
- In a subprogram prologue, a parameter may not yet have been moved to its "home".
- A variable may be dead, and its register re-used. This is probably the most common cause.
- As mentioned above, the assignment of a value to a variable may have been moved.
- A variable may be eliminated entirely by value propagation or other means. In this case, GCC may incorrectly generate debugging information for the variable.
In general, when an unexpected value appears for a local variable or parameter you should first ascertain if that value was actually computed by your program, as opposed to being incorrectly reported by the debugger. Record fields or array elements in an object designated by an access value are generally less of a problem, once you have ascertained that the access value is sensible. Typically, this means checking variables in the preceding code and in the calling subprogram to verify that the value observed is explainable from other values (one must apply the procedure recursively to those other values); or re-running the code and stopping a little earlier (perhaps before the call) and stepping to better see how the variable obtained the value in question; or continuing to step from the point of the strange value to see if code motion had simply moved the variable's assignments later.

In light of such anomalies, a recommended technique is to use `-O0' early in the software development cycle, when extensive debugging capabilities are most needed, and then move to `-O1' and later `-O2' as the debugger becomes less critical. Whether to use the `-g' switch in the release version is a release management issue. Note that if you use `-g' you can then use the strip program on the resulting executable, which removes both debugging information and global symbols.

7.1.5 Inlining of Subprograms

A call to a subprogram in the current unit is inlined if all the following conditions are met:

The optimization level is at least `-O1'.
The called subprogram is suitable for inlining: It must be small enough and not contain nested subprograms or anything else that gcc cannot support in inlined subprograms.
The call occurs after the definition of the body of the subprogram.
Either pragma Inline applies to the subprogram or it is small and automatic inlining (optimization level `-O3') is specified.
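The conditions above can be pictured with a small sketch. The package Inline_Demo and its subprograms are hypothetical names chosen for illustration. Square is small, has no nested subprograms, carries pragma Inline, and its body precedes the calls, so those calls are candidates for inlining when the unit is compiled at `-O1' or higher. Whether gcc actually inlines them remains its decision.

```ada
--  inline_demo.ads (hypothetical unit)
package Inline_Demo is
   function Sum_Of_Squares (A, B : Integer) return Integer;
end Inline_Demo;

--  inline_demo.adb
package body Inline_Demo is

   function Square (X : Integer) return Integer;
   pragma Inline (Square);

   --  The body is defined before any call to Square.
   function Square (X : Integer) return Integer is
   begin
      return X * X;
   end Square;

   function Sum_Of_Squares (A, B : Integer) return Integer is
   begin
      --  Both calls occur after the body of Square in the same unit.
      return Square (A) + Square (B);
   end Sum_Of_Squares;

end Inline_Demo;
```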

Calls to subprograms in with'ed units are normally not inlined. To achieve this level of inlining, the following conditions must all be true:

The optimization level is at least `-O1'.
The called subprogram is suitable for inlining: It must be small enough and not contain nested subprograms or anything else gcc cannot support in inlined subprograms.
The call appears in a body (not in a package spec).
There is a pragma Inline for the subprogram.
The `-gnatn' switch is used in the gcc command line.

Note that specifying the `-gnatn' switch causes additional compilation dependencies. Consider the following:

package R is
   procedure Q;
   pragma Inline (Q);
end R;

package body R is
   ...
end R;

with R;
procedure Main is
begin
   ...
   R.Q;
end Main;

With the default behavior (no `-gnatn' switch specified), the compilation of the Main procedure depends only on its own source, `main.adb', and the spec of the package in file `r.ads'. This means that editing the body of R does not require recompiling Main.

On the other hand, the call R.Q is not inlined under these circumstances. If the `-gnatn' switch is present when Main is compiled, the call will be inlined if the body of Q is small enough, but now Main depends on the body of R in `r.adb' as well as on the spec. This means that if this body is edited, the main program must be recompiled. Note that this extra dependency occurs whether or not the call is in fact inlined by gcc.

The use of front end inlining with `-gnatN' generates similar additional dependencies.

Note: The `-fno-inline' switch can be used to prevent all inlining. This switch overrides all other conditions and ensures that no inlining occurs. The extra dependences resulting from `-gnatn' will still be active, even if this switch is used to suppress the resulting inlining actions.

Note regarding the use of `-O3': There is no difference in inlining behavior between `-O2' and `-O3' for subprograms with an explicit pragma Inline assuming the use of `-gnatn' or `-gnatN' (the switches that activate inlining). If you have used pragma Inline in appropriate cases, then it is usually much better to use `-O2' and `-gnatn' and avoid the use of `-O3' which in this case only has the effect of inlining subprograms you did not think should be inlined. We often find that the use of `-O3' slows down code by performing excessive inlining, leading to increased instruction cache pressure from the increased code size. So the bottom line here is that you should not automatically assume that `-O3' is better than `-O2', and indeed you should use `-O3' only if tests show that it actually improves performance.

7.1.6 Optimization and Strict Aliasing

The strong typing capabilities of Ada allow an optimizer to generate efficient code in situations where other languages would be forced to make worst case assumptions preventing such optimizations. Consider the following example:

procedure R is
   type Int1 is new Integer;
   type Int2 is new Integer;
   type Int1A is access Int1;
   type Int2A is access Int2;
   Int1V : Int1A;
   Int2V : Int2A;
   ...

begin
   ...
   for J in Data'Range loop
      if Data (J) = Int1V.all then
         Int2V.all := Int2V.all + 1;
      end if;
   end loop;
   ...
end R;

In this example, since the variable Int1V can only access objects of type Int1, and Int2V can only access objects of type Int2, there is no possibility that the assignment to Int2V.all affects the value of Int1V.all. This means that the compiler optimizer can "know" that the value Int1V.all is constant for all iterations of the loop and avoid the extra memory reference required to dereference it each time through the loop.

This kind of optimization, called strict aliasing analysis, is triggered by specifying an optimization level of `-O2' or higher and allows GNAT to generate more efficient code when access values are involved.

However, although this optimization is always correct in terms of the formal semantics of the Ada Reference Manual, difficulties can arise if features like Unchecked_Conversion are used to break the typing system. Consider the following complete program example:

package p1 is
   type int1 is new integer;
   type int2 is new integer;
   type a1 is access int1;
   type a2 is access int2;
end p1;

with p1; use p1;
package p2 is
   function to_a2 (Input : a1) return a2;
end p2;

with Unchecked_Conversion;
package body p2 is
   function to_a2 (Input : a1) return a2 is
      function to_a2u is
        new Unchecked_Conversion (a1, a2);
   begin
      return to_a2u (Input);
   end to_a2;
end p2;

with p2; use p2;
with p1; use p1;
with Text_IO; use Text_IO;
procedure m is
   v1 : a1 := new int1;
   v2 : a2 := to_a2 (v1);
begin
   v1.all := 1;
   v2.all := 0;
   put_line (int1'image (v1.all));
end;

This program prints out 0 in -O0 or -O1 mode, but it prints out 1 in -O2 mode. That's because in strict aliasing mode, the compiler can and does assume that the assignment to v2.all could not affect the value of v1.all, since different types are involved.

This behavior is not a case of non-conformance with the standard, since the Ada RM specifies that an unchecked conversion where the resulting bit pattern is not a correct value of the target type can result in an abnormal value and attempting to reference an abnormal value makes the execution of a program erroneous. That's the case here since the result does not point to an object of type int2. This means that the effect is entirely unpredictable.

However, although that explanation may satisfy a language lawyer, in practice an applications programmer expects an unchecked conversion involving pointers to create true aliases and the behavior of printing 1 seems plain wrong. In this case, the strict aliasing optimization is unwelcome.

Indeed the compiler recognizes this possibility, and the unchecked conversion generates a warning:

p2.adb:5:07: warning: possible aliasing problem with type "a2"
p2.adb:5:07: warning: use -fno-strict-aliasing switch for references
p2.adb:5:07: warning:  or use "pragma No_Strict_Aliasing (a2);"

Unfortunately the problem is recognized when compiling the body of package p2, but the actual "bad" code is generated while compiling the body of m and this latter compilation does not see the suspicious Unchecked_Conversion.

As implied by the warning message, there are approaches you can use to avoid the unwanted strict aliasing optimization in a case like this.

One possibility is to simply avoid the use of -O2, but that is a bit drastic, since it throws away a number of useful optimizations that do not involve strict aliasing assumptions.

A less drastic approach is to compile the program using the option -fno-strict-aliasing. Actually it is only the unit containing the dereferencing of the suspicious pointer that needs to be compiled. So in this case, if we compile unit m with this switch, then we get the expected value of zero printed. Analyzing which units might need the switch can be painful, so a more reasonable approach is to compile the entire program with options -O2 and -fno-strict-aliasing. If the performance is satisfactory with this combination of options, then the advantage is that the entire issue of possible "wrong" optimization due to strict aliasing is avoided.

To avoid the use of compiler switches, the configuration pragma No_Strict_Aliasing with no parameters may be used to specify that for all access types, the strict aliasing optimization should be suppressed.
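A minimal sketch of that configuration, assuming the pragma is placed in the `gnat.adc' file in the directory where the program is compiled:

```ada
--  gnat.adc
--  With no argument, the pragma applies to every access type in the
--  program, so strict aliasing assumptions are dropped throughout.
pragma No_Strict_Aliasing;
```

As with any change to the configuration file, the units of the program need to be recompiled for the pragma to take effect.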

However, these approaches are still overkill, in that they cause all manipulations of all access values to be deoptimized. A more refined approach is to concentrate attention on the specific access type identified as problematic.

First, if a careful analysis of uses of the pointer shows that there are no possible problematic references, then the warning can be suppressed by bracketing the instantiation of Unchecked_Conversion to turn the warning off:

pragma Warnings (Off);
function to_a2u is
  new Unchecked_Conversion (a1, a2);
pragma Warnings (On);

Of course that approach is not appropriate for this particular example, since indeed there is a problematic reference. In this case we can take one of two other approaches.

The first possibility is to move the instantiation of unchecked conversion to the unit in which the type is declared. In this example, we would move the instantiation of Unchecked_Conversion from the body of package p2 to the spec of package p1. Now the warning disappears. That's because any use of the access type knows there is a suspicious unchecked conversion, and the strict aliasing optimization is automatically suppressed for the type.
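A sketch of what the spec of p1 might look like after this move, using the names from the example program. The exact placement of the instantiation within the spec is an assumption; what matters is that it appears in the unit that declares a2.

```ada
with Unchecked_Conversion;
package p1 is
   type int1 is new integer;
   type int2 is new integer;
   type a1 is access int1;
   type a2 is access int2;

   --  The instantiation is now visible wherever a2 is used, so the
   --  compiler can take it into account for every reference to a2.
   function to_a2u is
     new Unchecked_Conversion (a1, a2);
end p1;
```

The body of p2 would then call p1.to_a2u rather than declaring its own instantiation.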

If it is not practical to move the unchecked conversion to the same unit in which the destination access type is declared (perhaps because the source type is not visible in that unit), you may use pragma No_Strict_Aliasing for the type. This pragma must occur in the same declarative sequence as the declaration of the access type:

type a2 is access int2;
pragma No_Strict_Aliasing (a2);

Here again, the compiler now knows that the strict aliasing optimization should be suppressed for any reference to type a2 and the expected behavior is obtained.

Finally, note that although the compiler can generate warnings for simple cases of unchecked conversions, there are trickier and more indirect ways of creating type incorrect aliases which the compiler cannot detect. Examples are the use of address overlays and unchecked conversions involving composite types containing access types as components. In such cases, no warnings are generated, but there can still be aliasing problems. One safe coding practice is to forbid the use of address clauses for type overlaying, and to allow unchecked conversion only for primitive types. This is not really a significant restriction since any possible desired effect can be achieved by unchecked conversion of access values.

7.2 Reducing the Size of Ada Executables with `gnatelim`

This section describes gnatelim, a tool which detects unused subprograms and helps the compiler to create a smaller executable for your program.

7.2.1 About gnatelim

7.2.2 Running gnatelim

7.2.3 Correcting the List of Eliminate Pragmas

7.2.4 Making Your Executables Smaller

7.2.5 Summary of the gnatelim Usage Cycle

7.2.1 About `gnatelim`

When a program shares a set of Ada packages with other programs, it may happen that this program uses only a fraction of the subprograms defined in these packages. The code created for these unused subprograms increases the size of the executable.

gnatelim tracks unused subprograms in an Ada program and outputs a list of GNAT-specific pragmas Eliminate marking all the subprograms that are declared but never called. By placing the list of Eliminate pragmas in the GNAT configuration file `gnat.adc' and recompiling your program, you may decrease the size of its executable, because the compiler will not generate the code for 'eliminated' subprograms. See GNAT Reference Manual for more information about this pragma.
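For orientation, a `gnat.adc' file produced this way might contain entries along the following lines. The unit and subprogram names are hypothetical, and the exact argument form of pragma Eliminate depends on the GNAT version, so treat this as a sketch and consult the GNAT Reference Manual for the precise syntax.

```ada
--  gnat.adc (illustrative only; all names below are hypothetical)
--  Each pragma names a unit and a subprogram declared in it that the
--  program never calls.
pragma Eliminate (Utilities, Unused_Helper);
pragma Eliminate (Utilities.Strings, Pad_Left);
```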

gnatelim needs as its input data the name of the main subprogram and a bind file for a main subprogram.

To create a bind file for gnatelim, run gnatbind for the main subprogram. gnatelim can work with both Ada and C bind files; when both are present, it uses the Ada bind file. The following commands will build the program and create the bind file:

$ gnatmake -c Main_Prog
$ gnatbind main_prog

Note that gnatelim needs neither object nor ALI files.

7.2.2 Running `gnatelim`

gnatelim has the following command-line interface:

$ gnatelim [options] name

name should be the name of a source file that contains the main subprogram of a program (partition).

gnatelim has the following switches:

`-q': Quiet mode: by default gnatelim outputs to the standard error stream the number of program units left to be processed. This option turns this trace off.
`-v': Verbose mode: gnatelim version information is printed as Ada comments to the standard output stream. Also, in addition to the number of program units left gnatelim will output the name of the current unit being processed.
`-a': Also look for subprograms from the GNAT run time that can be eliminated. Note that when `gnat.adc' is produced using this switch, the entire program must be recompiled with switch `-a' to gnatmake.
`-Idir': When looking for source files also look in directory dir. Specifying `-I-' instructs gnatelim not to look for sources in the current directory.
`-bbind_file': Specifies bind_file as the bind file to process. If not set, the name of the bind file is computed from the full expanded Ada name of a main subprogram.
`-Cconfig_file': Specifies a file config_file that contains configuration pragmas. The file must be specified with full path.
`--GCC=compiler_name': Instructs gnatelim to use a specific gcc compiler instead of the one available on the path.
`--GNATMAKE=gnatmake_name': Instructs gnatelim to use a specific gnatmake instead of the one available on the path.

gnatelim sends its output to the standard output stream, and all the tracing and debug information is sent to the standard error stream. In order to produce a proper GNAT configuration file `gnat.adc', redirection must be used:

$ gnatelim main_prog.adb > gnat.adc

or

$ gnatelim main_prog.adb >> gnat.adc

in order to append the gnatelim output to the existing contents of `gnat.adc'.

7.2.3 Correcting the List of Eliminate Pragmas

In some rare cases gnatelim may try to eliminate subprograms that are actually called in the program. In this case, the compiler will generate an error message of the form:

file.adb:106:07: cannot call eliminated subprogram "My_Prog"

You will need to manually remove the wrong Eliminate pragmas from the `gnat.adc' file. You should recompile your program from scratch after that, because you need a consistent `gnat.adc' file during the entire compilation.

7.2.4 Making Your Executables Smaller

In order to get a smaller executable for your program you now have to recompile the program completely with the new `gnat.adc' file created by gnatelim in your current directory:

$ gnatmake -f main_prog

(Use the `-f' option for gnatmake to recompile everything with the set of pragmas Eliminate that you have obtained with gnatelim).

Be aware that the set of Eliminate pragmas is specific to each program. It is not recommended to merge sets of Eliminate pragmas created for different programs in one `gnat.adc' file.

7.2.5 Summary of the gnatelim Usage Cycle

Here is a quick summary of the steps to be taken in order to reduce the size of your executables with gnatelim. You may use other GNAT options to control the optimization level, to produce the debugging information, to set search path, etc.

1. Produce a bind file

$ gnatmake -c main_prog
$ gnatbind main_prog

2. Generate a list of Eliminate pragmas

$ gnatelim main_prog >[>] gnat.adc

3. Recompile the application

$ gnatmake -f main_prog

This document was generated by Mail Server on June, 15 2005 using texi2html
