Code Analysis
Overview
• Introduction
• Existing solutions
• Run time errors
• Design
• Implementation
• Future Work
Code Analysis
  The difference between project success and failure.


• If there's going to be a program, there has to be
  construction.
• Code is often the only accurate description of the
  software available.
• Code must follow coding standards and code
  conventions.
Source code Conventions
• 80% of the lifetime cost of a piece of software goes to
  maintenance.
• Hardly any software is maintained for its whole life by
  the original author.
• Code conventions improve the readability of the
  software.
• Source code, like any other product, should be well
  packaged.
Code optimization based analysis
• Code verification and run-time error prediction at
  compile time using syntax-directed translation.
• Predicts run-time errors without program execution or
  test cases.
• Uses intermediate code.
Existing Solutions
Possible Run time Errors
1) Detecting uninitialized Variables

   Using a variable before the program has initialized it can
 cause unpredictable results.
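
A minimal sketch of how a use-before-initialization check could work on straight-line code. It assumes the program has already been reduced to a list of "define" and "use" events per variable; the names Event, EV_DEF, EV_USE and first_uninitialized_use are hypothetical and not part of the analyzer described in these slides. Branches and loops would need data flow analysis on top of this.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_VARS 32

/* Hypothetical event kinds for a straight-line program. */
typedef enum { EV_DEF, EV_USE } EventKind;

typedef struct {
    EventKind kind;
    int var;              /* variable index, 0 <= var < MAX_VARS */
} Event;

/* Returns the index of the first event that reads a variable with no
   earlier definition, or -1 if every use follows a definition. */
int first_uninitialized_use(const Event *events, size_t count)
{
    bool defined[MAX_VARS] = { false };

    for (size_t i = 0; i < count; i++) {
        int v = events[i].var;
        if (v < 0 || v >= MAX_VARS)
            continue;                     /* ignore malformed entries */
        if (events[i].kind == EV_DEF)
            defined[v] = true;
        else if (!defined[v])
            return (int)i;                /* use before any definition */
    }
    return -1;
}
```

For example, the sequence "define v0, use v1, define v1" is reported at index 1, because v1 is read before it is written.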



2) Detecting Overflows, Underflows, and Divide by Zeros
Consider the pseudo-code:

                   X=X/(X-Y)

    Possible causes of error in this operation:

• X and Y may not be initialized

• X-Y may overflow or underflow

• X and Y may be equal and cause a division by zero

• X/(X-Y) may overflow or underflow

 
All possible values of x & y in program p

[Figure: the set of all possible (x, y) values in program p, crossed by a black line.]

If the values of x & y fall on the black line (where x = y), there is a
divide by zero error.
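
The failure modes of X/(X-Y) can be made explicit with guard checks. The sketch below assumes X and Y are of type int and covers overflow of the subtraction, division by zero, and overflow of the quotient; it does not detect uninitialized operands. The names DivStatus and checked_div_diff are illustrative only.

```c
#include <limits.h>

typedef enum {
    DIV_OK,
    DIV_SUB_OVERFLOW,     /* x - y does not fit in an int */
    DIV_BY_ZERO,          /* x == y */
    DIV_QUOT_OVERFLOW     /* INT_MIN / -1 does not fit in an int */
} DivStatus;

/* Computes x / (x - y) into *result only when every step is safe. */
DivStatus checked_div_diff(int x, int y, int *result)
{
    /* Test the subtraction without performing it. */
    if ((y < 0 && x > INT_MAX + y) || (y > 0 && x < INT_MIN + y))
        return DIV_SUB_OVERFLOW;

    int d = x - y;
    if (d == 0)
        return DIV_BY_ZERO;
    if (x == INT_MIN && d == -1)
        return DIV_QUOT_OVERFLOW;

    *result = x / d;
    return DIV_OK;
}
```

A static analyzer aims to prove, for every reachable pair of values, which of these branches can be taken, rather than testing them at run time as this function does.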
3) Detecting incorrect argument data types and incorrect
    number of arguments

 

• Checking of arguments for type and for the correct order of
    occurrence.

• Requires both the calling program and the called program
    to be compiled with a special compiler option.

• Checks can be made to determine if the number and types
    of arguments in function (and subroutine) calls are consistent
    with the actual function definitions.
4) Detecting errors with strings at run-time

• A string must have a null terminator at the end of the
  meaningful data in the string. A common mistake is to not
  allocate room for this extra character.

   This can also be a problem with dynamic allocation.

       char * copy_str = malloc( strlen(orig_str) + 1);

                    strcpy(copy_str, orig_str);

• The strlen() function returns a count of the data characters,
  which does not include the null terminator.

• With dynamic allocation, writing past the end of an undersized
  buffer can corrupt the heap.
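
A hedged sketch of the allocation pattern described above, wrapped in a helper that also checks the result of malloc(). The function name copy_string is an assumption for illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of src, or NULL if src is NULL or the
   allocation fails. The caller owns the result and must free() it. */
char *copy_string(const char *src)
{
    if (src == NULL)
        return NULL;

    size_t len = strlen(src);        /* data characters only */
    char *dst = malloc(len + 1);     /* +1 leaves room for the terminator */
    if (dst == NULL)
        return NULL;

    memcpy(dst, src, len + 1);       /* copies the terminator as well */
    return dst;
}
```

Allocating only strlen(src) bytes instead would make the copy write one byte past the block, which is the heap corruption case mentioned above.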
 

a. Detecting out-of-bounds indexing of statically and
     dynamically allocated arrays

     A common run-time error is the reading and writing of arrays
     outside of their declared bounds.



b. Detecting Out-of-Bounds Pointer References

     A common run-time error for C and C++ programs occurs
     when a pointer points to memory outside its associated
     memory block.
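Pseudo code for out-of-bounds
               references
int A[5], i, a;        /* assumed declarations: A holds 5 elements */
int *p;

for (i = 0; i < 5; i++)
    A[i] = i;          /* valid: indices 0..4 */

p = A;

for (i = 0; i <= 5; i++)
    p++;               /* six increments: p ends up past the end of A */

a = *p;                /* out-of-bounds reading using pointers */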
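
One way to make the out-of-bounds read above detectable is to carry the block length with the array and address elements by index rather than by raw pointer. This is a run-time guard sketched under that assumption, not the static prediction performed by the analyzer; checked_read is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reads base[index] into *out only if index lies inside [0, length).
   Returns false, leaving *out untouched, for any out-of-bounds index. */
bool checked_read(const int *base, size_t length, size_t index, int *out)
{
    if (base == NULL || out == NULL || index >= length)
        return false;
    *out = base[index];
    return true;
}
```

With a 5-element array, index 4 is accepted, while index 6 (the position reached after six increments in the pseudo code) is rejected.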
5) Detecting Memory Allocation and Deallocation Errors

• A memory deallocation error occurs when a portion of
  memory is deallocated more than once.

• Another common source of errors in C and C++ programs is
  an attempt to use a dangling pointer. A dangling pointer is a
  pointer to storage that is no longer allocated.
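
A small sketch of a common defensive pattern against double deallocation: the owner resets its pointer after freeing, so a second release is detected instead of reaching free() again. IntBuffer and its functions are illustrative names. The pattern only protects accesses made through the same handle; other copies of the pointer still dangle.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct {
    int *data;        /* NULL once the buffer has been released */
    size_t length;
} IntBuffer;

/* Allocates a zero-filled buffer; returns false on allocation failure. */
bool int_buffer_init(IntBuffer *b, size_t length)
{
    b->data = calloc(length, sizeof *b->data);
    b->length = (b->data != NULL) ? length : 0;
    return b->data != NULL;
}

/* Frees the buffer once. A second call returns false instead of
   deallocating the same memory again. */
bool int_buffer_release(IntBuffer *b)
{
    if (b->data == NULL)
        return false;             /* would have been a double free */
    free(b->data);
    b->data = NULL;               /* no dangling pointer left in the handle */
    b->length = 0;
    return true;
}
```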
6) Detecting Memory Leaks

• A program has a memory leak if, during execution, it loses
  its ability to address a portion of memory because of a
  programming error. Two typical cases:

• A pointer points to a location in memory and then all the
  pointers pointing to this location are set to point somewhere
  else.

• A function/subroutine is called, memory is allocated during
  execution of the function/subroutine, and then the memory
  is not deallocated upon exit and all pointers to this memory
  are destroyed.
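
Leaks of this kind can be observed at run time by counting live allocations. The wrapper below is a simplified sketch with hypothetical names (tracked_malloc, tracked_free, tracked_live_blocks); it is not thread-safe and is not the compile-time approach these slides propose.

```c
#include <stdlib.h>

static long live_blocks = 0;      /* allocations not yet freed */

/* malloc() wrapper that counts successful allocations. */
void *tracked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
        live_blocks++;
    return p;
}

/* free() wrapper that keeps the counter in step. */
void tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* A non-zero value at program exit indicates leaked blocks. */
long tracked_live_blocks(void)
{
    return live_blocks;
}
```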
Source code analyzer predicates

• Reliable: Proven free of run-time errors under all
  operating conditions within the scope.
• Faulty: Proven faulty each time the operation is
  executed.
• Dead: Proven unreachable (may indicate a functional
  issue).
• Unproven: Unproven code section, or beyond the scope of
  the analyzer.
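
To illustrate how the four verdicts might be assigned, the sketch below classifies a single division with respect to divide-by-zero only, given an interval of values the divisor may take. The interval representation, the convention that an empty interval means unreachable code, and the names Verdict, Interval and classify_division are all assumptions made for this example.

```c
typedef enum {
    VERDICT_RELIABLE,
    VERDICT_FAULTY,
    VERDICT_DEAD,
    VERDICT_UNPROVEN
} Verdict;

/* Closed range [lo, hi] of possible divisor values.
   lo > hi encodes the empty set: the operation is never reached. */
typedef struct {
    long lo;
    long hi;
} Interval;

/* Classifies a division for divide-by-zero only (overflow is ignored). */
Verdict classify_division(Interval divisor)
{
    if (divisor.lo > divisor.hi)
        return VERDICT_DEAD;          /* no value ever reaches this point */
    if (divisor.lo == 0 && divisor.hi == 0)
        return VERDICT_FAULTY;        /* divisor is always zero */
    if (divisor.lo > 0 || divisor.hi < 0)
        return VERDICT_RELIABLE;      /* zero is excluded from the range */
    return VERDICT_UNPROVEN;          /* zero is possible but not certain */
}
```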
Specifications

• Why Java for developing the analyzer?
• Why C/C++ as the input language?
Design for Code Analyzer

     Input program (C file)
              |
              v
       Lexical Analyzer  <-->  Symbol Table
              |
              v
           Parser        <-->  Symbol Table
              |
              v
      IC (SDT) Generation
              |
              v
   Run-Time Error Predictions
Analysis of Code

Input Program

Lexical Analysis: Stream Tokenizer

Parser:
Condition  = "(" Expression ("=="|"!="|">"|"<"|">="|"<=") Expression ")"
Expression = Term {("+"|"-") Term}
Term       = Factor {("*"|"/") Factor}
Factor     = number |
             identifier |

Intermediate code generation: Postfix Evaluation
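
The Expression, Term and Factor rules map directly onto a recursive-descent parser that emits postfix. The sketch below is written in C for consistency with the other examples and is not the analyzer's implementation. It handles only the two Factor alternatives visible in the grammar (number and identifier), so parentheses are rejected. The Parser struct and to_postfix are hypothetical names.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *in;   /* remaining input */
    char *out;        /* output buffer */
    size_t cap;       /* capacity of out, including the terminator */
    size_t len;       /* characters written so far */
    bool ok;          /* false after a syntax error or buffer overflow */
} Parser;

static void skip_spaces(Parser *p)
{
    while (isspace((unsigned char)*p->in))
        p->in++;
}

static void emit_char(Parser *p, char c)
{
    if (p->len + 1 < p->cap)
        p->out[p->len++] = c;
    else
        p->ok = false;
}

/* Factor = number | identifier */
static void factor(Parser *p)
{
    skip_spaces(p);
    if (isdigit((unsigned char)*p->in)) {
        while (isdigit((unsigned char)*p->in))
            emit_char(p, *p->in++);
    } else if (isalpha((unsigned char)*p->in)) {
        while (isalnum((unsigned char)*p->in))
            emit_char(p, *p->in++);
    } else {
        p->ok = false;
        return;
    }
    emit_char(p, ' ');
}

/* Term = Factor {("*"|"/") Factor} */
static void term(Parser *p)
{
    factor(p);
    for (;;) {
        skip_spaces(p);
        char op = *p->in;
        if (!p->ok || (op != '*' && op != '/'))
            return;
        p->in++;
        factor(p);
        emit_char(p, op);         /* operator follows both operands */
        emit_char(p, ' ');
    }
}

/* Expression = Term {("+"|"-") Term} */
static void expression(Parser *p)
{
    term(p);
    for (;;) {
        skip_spaces(p);
        char op = *p->in;
        if (!p->ok || (op != '+' && op != '-'))
            return;
        p->in++;
        term(p);
        emit_char(p, op);
        emit_char(p, ' ');
    }
}

/* Writes the space-separated postfix form of infix into out.
   Returns false on a syntax error or if out is too small. */
bool to_postfix(const char *infix, char *out, size_t cap)
{
    if (cap == 0)
        return false;
    Parser p = { infix, out, cap, 0, true };
    expression(&p);
    skip_spaces(&p);
    if (*p.in != '\0')
        p.ok = false;             /* unconsumed input */
    if (p.len > 0 && out[p.len - 1] == ' ')
        p.len--;                  /* trim the trailing separator */
    out[p.len] = '\0';
    return p.ok;
}
```

Because Term is parsed inside Expression, "a+b*c" yields "a b c * +": the multiplication is emitted before the addition, which is the precedence the grammar encodes.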
3 address code generation

Target Source File:

Test(n){
int b,a,n,j;
if(j<n)
{
a=a+b;}
}

Generated three-address code:

argument  operator  operand 1  operand 2  result
0         <         j          n
1         if        0                     goto l0
2         +         a          b
3         =         a          2
l0:
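
A table of this shape can be held as an array of quadruples in which an operand may name the row that produced it. The sketch below rebuilds the four rows shown for the Test(n) example. The fixed field width, the string encoding of row references, and the names Quad, QuadTable, quad_emit and build_example are assumptions for illustration.

```c
#include <stdio.h>

#define MAX_QUADS 64
#define FIELD_LEN 16

typedef struct {
    char op[FIELD_LEN];
    char arg1[FIELD_LEN];
    char arg2[FIELD_LEN];
    char result[FIELD_LEN];
} Quad;

typedef struct {
    Quad rows[MAX_QUADS];
    int count;
} QuadTable;

/* Appends one row and returns its index, or -1 if the table is full. */
int quad_emit(QuadTable *t, const char *op, const char *arg1,
              const char *arg2, const char *result)
{
    if (t->count >= MAX_QUADS)
        return -1;
    Quad *q = &t->rows[t->count];
    snprintf(q->op, FIELD_LEN, "%s", op);
    snprintf(q->arg1, FIELD_LEN, "%s", arg1);
    snprintf(q->arg2, FIELD_LEN, "%s", arg2);
    snprintf(q->result, FIELD_LEN, "%s", result);
    return t->count++;
}

/* Rebuilds the rows listed for: if (j < n) { a = a + b; }
   Returns the number of rows emitted. */
int build_example(QuadTable *t)
{
    char ref[FIELD_LEN];
    t->count = 0;

    int cmp = quad_emit(t, "<", "j", "n", "");
    snprintf(ref, sizeof ref, "%d", cmp);      /* refer to row 0 */
    quad_emit(t, "if", ref, "", "goto l0");

    int sum = quad_emit(t, "+", "a", "b", "");
    snprintf(ref, sizeof ref, "%d", sum);      /* refer to row 2 */
    quad_emit(t, "=", "a", ref, "");

    return t->count;
}
```

Referring to earlier rows by index avoids inventing temporary variable names, at the cost of making rows harder to reorder during later optimization.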
Work Done:
Intermediate Code
Further Work


• Evaluation of intermediate code for performing data
  flow and control flow analysis.
• Prediction of run time errors using intermediate code.
• Using code optimization techniques such as constant
  folding to predict code behavior

Code Analysis-run time error prediction
