# Program design and analysis for dedicated systems

Lecture 07 on Dedicated systems

Teacher: Giuseppe Scollo

University of Catania, Department of Mathematics and Computer Science

Graduate Course in Computer Science, 2018-19

DMI - Graduate Course in Computer Science, Copyleft © 2019 Giuseppe Scollo

## Table of Contents

1. Program design and analysis for dedicated systems
2. lecture topics
3. microprocessors, toolchain
4. from C to (ARM) assembly: an example
5. object code analysis
6. data type representation
7. variables in the memory hierarchy
8. function calls: an example
9. stack frame construction
10. program layout in memory
11. references

## microprocessors, toolchain

microprocessor: the most successful programmable component over the past decades... why?

- separation of software from hardware through the definition of an instruction set
- wide availability of software tools to support program development, also in high-level languages
- highly efficient options for the reuse of components and for interoperability with other components, both hardware (standard bus) and software (libraries)
- high scalability, e.g. word lengths from 4 bits up to 64 bits, use of a microprocessor as a coordination component in a complex SoC architecture, etc.
Schaumont, Figure 7.1 - Standard design flow of software source code to processor instruction

## object code analysis

the example just seen was developed with the GNU cross-compiler arm-linux-gcc, available as a Debian package from the Gezel repository: rijndael.ece.vt.edu/gezel2repo/pool/main/a/arm-linux-gcc

the symbolic assembly code is obtained from the C source by the command:

`/usr/local/arm/bin/arm-linux-gcc -c -S -O2 gcd.c -o gcd.s`

the command to generate the ARM ELF executable is:

`/usr/local/arm/bin/arm-linux-gcc -O2 gcd.c -o gcd`

the symbolic code can also be obtained from the ELF executable by means of a disassembler, in this example with the following command:

`/usr/local/arm/bin/arm-linux-objdump -d gcd`

the disassembler output also shows the binary encoding of each symbolic instruction and the address of each label

the use of this tool, and of the other utilities that come with the compiler, for executable code analysis will be explored further in the lab tutorials

## data type representation

efficient hardware/software codesign requires a simultaneous understanding of both system architecture and software

data type representation is a good starting point; compilers are aware of differences in:

- memory size
- low-level implementation of operations

table 7.1 shows how C maps to the native data types supported by 32-bit processors

| C data type | Representation |
|-------------|----------------|
| char        | 8-bit          |
| short       | signed 16-bit  |
| int         | signed 32-bit  |
| long        | signed 32-bit  |
| long long   | signed 64-bit  |

Schaumont, Table 7.1 - Compiler data types

Schaumont, Figure 7.7 (a) - Alignment of data types

Schaumont, Figure 7.7 (b) - Little-endian and Big-endian storage order

word-based memory organization requires alignment to word boundaries, to
perform a word transfer with a single memory access; the compiler generates directives for this purpose

byte ordering, and in some cases even bit ordering, is relevant to hardware/software codesign in the transition of software to hardware and back

## variables in the memory hierarchy

another relevant aspect of data representation is the kind of physical memory that variables are assigned to

Schaumont, Figure 7.8 - Memory hierarchy

the memory hierarchy is transparent to high-level programs, e.g. written in C, yet low-level control over it affects performance; here is an example:

```
void accumulate(int *c, int a[10]) {
  int i;
  *c = 0;
  for (i=0; i<10; i++)
    *c += a[i];
}
```

the command `/usr/local/arm/bin/arm-linux-gcc -O2 -c -S accumulate.c` generates the following code in accumulate.s:

```
        mov     r3, #0
        str     r3, [r0, #0]
        mov     ip, r3
.L6:
        ldr     r2, [r1, ip, asl #2]    ; r2 ← a[i]
        ldr     r3, [r0, #0]            ; r3 ← *c (memory)
        add     ip, ip, #1              ; increment loop ctr
        add     r3, r3, r2
        cmp     ip, #9
        str     r3, [r0, #0]            ; r3 → *c (memory)
        movgt   pc, lr
        b       .L6
```

in the example, the *value* of the accumulator variable travels up and down the memory hierarchy: in every loop iteration, *c is loaded from memory into r3 and stored back

C offers limited control over this through storage class specifiers and type qualifiers:

| Storage specifier | Type qualifier |
|-------------------|----------------|
| register          | const          |
| static            | volatile       |
| extern            |                |

## function calls: an example

function calls are the fundamental structure of the behavioural hierarchy of programs; here is an example of their translation to machine language:

```
int accumulate(int a[10]) {
  int i;
  int c = 0;
  for (i=0; i<10; i++)
    c += a[i];
  return c;
}

int a[10];
int one = 1;

int main() {
  return one + accumulate(a);
}
```

Schaumont, Listing 7.4 - Sample program

compiling this program without optimization shows the creation of the activation frame within the stack, which is dynamically associated with the function execution to host local variables and saved registers

in this case, the function parameter and the return value are passed in register r0; when more parameters must be passed than fit in the argument registers (r0-r3 under the ARM procedure call standard), the activation frame is used for the rest

the use of the frame pointer (FP) register enables call nesting and recursion

```
accumulate:
        mov     ip, sp
        stmfd   sp!, {fp, ip, lr, pc}
        sub     fp, ip, #4
        sub     sp, sp, #12
        str     r0, [fp, #-16]  ; base address a
        mov     r3, #0
        str     r3, [fp, #-24]  ; c
        mov     r3, #0
        str     r3, [fp, #-20]  ; i
.L2:
        ldr     r3, [fp, #-20]
        cmp     r3, #9          ; i<10 ?
        ble     .L5
        b       .L3
.L5:
        ldr     r3, [fp, #-20]
        mov     r2, r3, asl #2  ; i * 4
        ldr     r3, [fp, #-16]
        add     r3, r2, r3      ; *a + 4 * i
        ldr     r2, [fp, #-24]
        ldr     r3, [r3, #0]
        add     r3, r2, r3      ; c = c + a[i]
        str     r3, [fp, #-24]  ; update c
        ldr     r3, [fp, #-20]
        add     r3, r3, #1
        str     r3, [fp, #-20]  ; i = i + 1
        b       .L2
.L3:
        ldr     r3, [fp, #-24]  ; return arg
        mov     r0, r3
        ldmea   fp, {fp, sp, pc}
```

Schaumont, Listing 7.6 - Accumulate without compiler optimizations

## stack frame construction

figure 7.9 shows an assumed construction of the activation frame in the stack

the SP register points to the full top of the stack, which grows downwards; these conventions are reflected in the fd (*full, descending*) suffix of the multiple transfer instruction stmfd, which saves registers in the stack frame

Schaumont, Figure 7.9 - Stack frame construction

restoring the saved registers and returning take place in just one multiple transfer instruction; in this case the converse suffix ea (*empty, ascending*) is appropriate, noting that FP, rather than SP, is the base register for the transfer start address ...
however, figure 7.9 does not correctly reflect the use of these instructions, which conforms to the ARM specifications for multiple transfer instructions

the analysis of this problem is deferred to the forthcoming lab tutorial

## program layout in memory

for the physical representation of the program and its data structures in the memory hierarchy, a distinction must be made between:

- static program layout: organization of the compiler+linker output in an ELF file (or ROM)
- dynamic program layout: memory organization of an executable program during execution

Schaumont, Figure 7.10 - Static and dynamic program layout

- the loader may assign different sections of the ELF program to different kinds of storage
- in the dynamic layout, sections appear that are not present in the ELF file, for the storage of dynamic data (stack, heap, etc.)

## references

### recommended readings:

- Schaumont, Ch. 7, Sect. 7.1, 7.3

### for experimentation:

- installation of the arm-linux-gcc cross-compiler

### for further consultation:

- Schaumont, Ch. 7, Sect. 7.2
- Introduction to the ARM® Processor Using Intel FPGA Toolchain - For Quartus Prime 16.1, Intel Corp. - FPGA University Program, November 2016
- VisUAL - A highly visual ARM emulator, by Salman Arif, Imperial College London (2015)