Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static code analysis #911

Open
gabibguti opened this issue Sep 21, 2023 · 8 comments
Open

Static code analysis #911

gabibguti opened this issue Sep 21, 2023 · 8 comments

Comments

@gabibguti
Copy link
Contributor

Hey! Do you already use a static code analysis tool? Also known as SAST.

SAST is used to identify security vulnerabilities in your source code. Vulnerabilities such as buffer overflow where attackers can modify the application execution by writing to memory. Different than Fuzzing, where you have to setup your test cases, SAST tools have their own set of test cases that they'll check against your code.

So, adding SAST helps keep your code safe from vulnerabilities, but I understand it comes along additional work to handle the reports.

Referencing here some SAST options I found for Fortran:

Additional Context

Hi! I'm Gabriela and I work on behalf of Google and the OpenSSF suggesting supply-chain security changes :)

@weslleyspereira
Copy link
Collaborator

Hi @gabibguti. Thanks for the information and for doing the research on SAST options for Fortran. I believe we don't use any SAST currently.

@ACSimon33
Copy link
Contributor

@weslleyspereira I looked at i-CodeCNES and it seems it suggests a lot of changes which are probably a bit unnecessary, e.g. avoiding multiple RETURN statements in routines (see output below). I'll let it run on the entire codebase (it's pretty slow) and collect all types of warnings it produces. Then we can discuss which of those if any really matter.

On the topic of code analysis, do we use Gfortran's sanitizers somewhere (ASAN, UBSAN, MSAN, TSAN)? If not, we (NAG) have some CMake modules which I could add to LAPACK's CMake system and enable those checks in the pipelines.

icode output on caxpy.f:

<?xml version="1.0" encoding="UTF-8"?>
<analysisProject analysisProjectName="undefined" analysisProjectVersion="undefined">
  <analysisInformations analysisConfigurationId="undefined" analysisDate="2023-09-29" author="i-Code CNES Analyzer" icodeVersion="4.1.2" />
  <analysisFile language="fr.cnes.icode.fortran77" fileName="lapack-3.11/BLAS/SRC/caxpy.f" />
  <analysisRule analysisRuleId="COM.DATA.FloatCompare">
    <result resultId="1" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="111" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>It's not allowed to compare float variables (SCABS1) with equality.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="COM.DATA.Invariant">
    <result resultId="2" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="107" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>The variable SCABS1 must be defined as constant.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="COM.FLOW.Exit">
    <result resultId="3" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="111" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>There is more than one exit in the function.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="COM.FLOW.Exit">
    <result resultId="4" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="135" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>There is more than one exit in the function.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="COM.INST.Brace">
    <result resultId="5" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="112" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>Parentheses are needed for readability.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="COM.INST.Brace">
    <result resultId="6" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="117" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>Parentheses are needed for readability.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="COM.INST.Brace">
    <result resultId="7" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="126" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>Parentheses are needed for readability.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="COM.INST.Brace">
    <result resultId="8" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="126" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>Parentheses are needed for readability.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="COM.INST.Brace">
    <result resultId="9" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="127" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>Parentheses are needed for readability.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="COM.INST.Brace">
    <result resultId="10" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="127" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>Parentheses are needed for readability.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="COM.INST.Brace">
    <result resultId="11" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="129" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>Parentheses are needed for readability.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="F77.INST.Return">
    <result resultId="12" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="110" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>The instruction RETURN(i) is not allowed.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="F77.INST.Return">
    <result resultId="13" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="111" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>The instruction RETURN(i) is not allowed.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="F77.INST.Return">
    <result resultId="14" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="135" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY">
      <resultMessage>The instruction RETURN(i) is not allowed.</resultMessage>
    </result>
  </analysisRule>
  <analysisRule analysisRuleId="F77.MET.ComplexitySimplified">
    <result resultId="15" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="87" resultTypePlace="method" resultNamePlace="SUBROUTINE  CAXPY" resultValue="8.0" />
  </analysisRule>
  <analysisRule analysisRuleId="F77.MET.ComplexitySimplified">
    <result resultId="16" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="0" resultTypePlace="class" resultValue="NaN" />
  </analysisRule>
  <analysisRule analysisRuleId="F77.MET.Nesting">
    <result resultId="17" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="87" resultTypePlace="method" resultNamePlace="SUBROUTINE  CAXPY" resultValue="2.0" />
  </analysisRule>
  <analysisRule analysisRuleId="F77.MET.Nesting">
    <result resultId="18" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="0" resultTypePlace="class" resultValue="NaN" />
  </analysisRule>
  <analysisRule analysisRuleId="F77.MET.LineOfCode">
    <result resultId="19" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="87" resultTypePlace="method" resultNamePlace="SUBROUTINE  CAXPY" resultValue="26.0" />
  </analysisRule>
  <analysisRule analysisRuleId="F77.MET.LineOfCode">
    <result resultId="20" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="0" resultTypePlace="class" resultValue="26.0" />
  </analysisRule>
  <analysisRule analysisRuleId="F77.MET.RatioComment">
    <result resultId="21" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="87" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY" resultValue="52.941177" />
  </analysisRule>
  <analysisRule analysisRuleId="F77.MET.RatioComment">
    <result resultId="22" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="0" resultTypePlace="class" resultValue="NaN" />
  </analysisRule>
  <analysisRule analysisRuleId="F77.MET.LineOfComment">
    <result resultId="23" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="87" resultTypePlace="method" resultNamePlace="SUBROUTINE CAXPY" resultValue="27.0" />
  </analysisRule>
  <analysisRule analysisRuleId="F77.MET.LineOfComment">
    <result resultId="24" fileName="lapack-3.11/BLAS/SRC/caxpy.f" resultLine="0" resultTypePlace="class" resultValue="113.0" />
  </analysisRule>
</analysisProject>

@weslleyspereira
Copy link
Collaborator

This is all great @ACSimon33. Thanks a lot for the help on this topic!

@weslleyspereira I looked at i-CodeCNES and it seems it suggests a lot of changes which are probably a bit unnecessary, e.g. avoiding multiple RETURN statements in routines (see output below). I'll let it run on the entire codebase (it's pretty slow) and collect all types of warnings it produces. Then we can discuss which of those if any really matter.

I was already expecting these tools would suggest changes that would involve too much work on our side. Thanks for digging through it. It would be great to have a list of those suggestions. Then we could decide what we want to apply.

On the topic of code analysis, do we use Gfortran's sanitizers somewhere (ASAN, UBSAN, MSAN, TSAN)? If not, we (NAG) have some CMake modules which I could add to LAPACK's CMake system and enable those checks in the pipelines.

I believe we don't use any code sanitizer either. Please, feel welcomed to try them on! I recall we had a discussion about the usage of Valgrind tool with -fsanitize=address in #551. Related PR #844.

icode output on caxpy.f:

[...]

Oh! This is a lot of information for a single file. I can imagine how slow the complete process can be. Thanks for sharing!

@ACSimon33
Copy link
Contributor

@weslleyspereira I tried some of the static analysis tools now and here are some of the results.

i-CodeCNES

Pro: Free, opensource, easy to install and use
Con: no summary of results

Warnings (only in Reference BLAS)

  • "It's not allowed to compare float variables (...) with equality." (526)
  • "The variable ... must be defined as constant." (256)
  • "There is more than one exit in the function." (304)
  • "Parentheses are needed for readability." (1595)
  • "The IF instruction shall finish with an ELSE after the last ELSE IF." (155)
  • "Error in the following parameters: '...' variable belongs to parameter types forbidden when calling a function: a constant, an expression to be evaluated, a call to another function." (96)
  • "The instruction RETURN(i) is not allowed." (403)
  • "Mixed type COMPLEX with REAL/DOUBLE" (102)
  • "Using more than five conditions in an expression is not allowed." (32)
  • "Commented code is not allowed. It shall be suppressed." (1)
  • "It should be used the generic name of the intrinsic function instead of [ICHAR, DABS]" (7)
  • "Return code used in arithmetical statement." (72)
  • "COMPLEX*16 is not a basic type. Basic types are INTEGER, REAL, DOUBLE PRECISION, COMPLEX, LOGICAL and CHARACTER." (126)
  • "Mixed type COMPLEX with COMPLEX" (64)
  • "The code is not indented" (78)
  • "The use of labels is not allowed except with the instructions FORMAT and CONTINUE" (15)
  • "The use of * with logical units is not allowed." (1)
  • "The keyword STOP is not allowed." (1)

In the LAPACK sources there are mostly the same warnings but approx. 30 times as much. In my opinion there is nothing in here that is really a serious issue.

plusFORT

Pro: Easy to install and use, summary of results, free for non-commercial projects
Con: Problems with f90 files, bit picky about the amount of source files passed to it (crashes if I try to analyze the entire repo at once)

Warnings in BLAS sources

382 occurrences of Unreliable test for equality of reals

Warnings in LAPACK sources

    4 occurrences of Unreachable or redundant statement
 1698 occurrences of Unreliable test for equality of reals
   17 occurrences of Argument is not used
  216 occurrences of PARAMETER is not used
   49 occurrences of Variable assigned a value but not used
  227 occurrences of INTRINSIC functn name used for variable

Warnings in complex LAPACK sources

  1090 occurrences of Unreliable test for equality of reals
     3 occurrences of Subprogram argument inconsistency
     2 occurrences of Variable declared but not used
     4 occurrences of Argument is not used
   270 occurrences of PARAMETER is not used
    43 occurrences of Variable assigned a value but not used
   202 occurrences of INTRINSIC functn name used for variable

Warnings in BLAS TESTING sources

   112 occurrences of Unreliable test for equality of reals
    32 occurrences of Argument is not used
   106 occurrences of INTRINSIC functn name used for variable

Warnings in LAPACK TESTING sources

  1100 occurrences of Unreliable test for equality of reals
    26 occurrences of Variable declared but not used
   120 occurrences of Argument is not used
    40 occurrences of COMMON block is not used
    29 occurrences of PARAMETER is not used
   131 occurrences of Variable assigned a value but not used
   114 occurrences of INTRINSIC functn name used for variable

I think the interesting warnings here are those about Unreachable or redundant statement, Subprogram argument inconsistency and INTRINSIC functn name used for variable. The others are again very superficial and not really a security issue.

fpt

Pro: Nice summaries, a lot of features
Con: Commercial, very complex and difficult to use

Warnings in BLAS and LAPACK

Notes
=====

 No.  Description                                                 Count Contrib

1271 Non-standard Fortran intrinsic(s) used as local identifier(s)      2     0
1273 Fortran auxiliary keyword used as identifier name.               152     0
1281 Fortran intrinsic name used as an array name                      13     0
2323 Duplicate source code for the same sub-program name.              26     0
2447 No main program encountered                                        1     0
2493 INTENT declared OUT but argument is read:                          4     0
4409 Unable to verify read access status of this argument               4     0


Warnings
========

 No.  Description                                                 Count Contrib

1269 Fortran keyword used as local identifier name.                    58    58
2161 File renamed to avoid name conflict.                              18    18
2345 Data type of PARAMETER does not match that of the expression      20    20
2517 ANSI FORTRAN 77 intrinsic used as a local identifier              27    27
3047 Fortran 90 intrinsic name used as an identifier                  418   418
3085 Objects of .EQ., .NE. .GT. etc. are of different data types      174   174
3087 REAL or COMPLEX quantity tested for exact equality/inequality   3880  3880
3089 Comparison of REAL or COMPLEX values which differ in precision     1     1
3295 Parameter will not have the expected precision                    18    18
3437 Mixed real or complex sizes in expression - loss of precision      2     2


Errors
======

 No.  Description                                                 Count Contrib

4415 INTENT(OUT/INOUT) argument - routine may write to an expression   14   140

Warnings in TESTING sources

Notes
=====

 No.  Description                                                 Count Contrib

1271 Non-standard Fortran intrinsic(s) used as local identifier(s)     50     0
1273 Fortran auxiliary keyword used as identifier name.              1208     0
1281 Fortran intrinsic name used as an array name                      22     0
1867 Duplicate main program                                            17     0
2241 References are made to sub-programs which have not been read.      1     0
2323 Duplicate source code for the same sub-program name.             145     0
2449 Unused sub-programs encountered                                    1     0
2451 Unused primary files encountered                                   1     0
3921 This symbol is already declared to be SAVE or STATIC               5     0


Warnings
========

 No.  Description                                                 Count Contrib

1269 Fortran keyword used as local identifier name.                   261   261
1711 Named COMMON block varies in size between sub-programs            16    16
1901 Input line longer than 72 characters. Line truncated.              1     1
2161 File renamed to avoid name conflict.                              21    21
2185 White space was removed from identifier name or keyword.           1     1
2345 Data type of PARAMETER does not match that of the expression      11    11
2517 ANSI FORTRAN 77 intrinsic used as a local identifier              76    76
3047 Fortran 90 intrinsic name used as an identifier                  102   102
3085 Objects of .EQ., .NE. .GT. etc. are of different data types      106   106
3087 REAL or COMPLEX quantity tested for exact equality/inequality   1480  1480
3089 Comparison of REAL or COMPLEX values which differ in precision     6     6
3295 Parameter will not have the expected precision                    37    37
3437 Mixed real or complex sizes in expression - loss of precision     22    22
3493 Declarations follow executable statements                          8     8


Errors
======

 No.  Description                                                 Count Contrib

1823 Error in use of '.' in number, operator or structure.              2    20
2025 Unable to interpret use of '.' or '%' character                    2    20
3039 Data type of this function conflicts with prior use               32   320

Here again, the Fortran keyword used as local identifier name might be problematic as well as Parameter will not have the expected precision and Mixed real or complex sizes in expression - loss of precision.

Overall I think we can ignore most of these warnings, especially those in the testing frameworks, but I'd like to here other opinions.

@gabibguti
Copy link
Contributor Author

Sharing my point of view on i-CodeCNES:

This tool implements a set of coding rules defined by CNES and are mostly generic coding "good-practice rules". What we want to know is if rule XYZ implies in a security vulnerability or not. But, these rules cannot be directly mapped to known vulnerability systems, such as those defined by CWE, so it's harder to understand the security implications of violating a rule. I will try to give a few examples given the warnings found.

So, all of these rules are "good-practice rules" and there's no security harm in violating them if the problem does not reach a relevant operation in a relevant context, such as doing a wrong calculation when performing an authorization or entering a memory buffer overflow when manipulating user data. But, if is a relevant operation in a relevant context then the rule is important.

This is meant to explain how the warnings can represent serious issues, but really depends on the context the warning is being raised.

@christoph-conrads
Copy link
Contributor

christoph-conrads commented Oct 22, 2023

Overall I think we can ignore most of these warnings, especially those in the testing frameworks, but I'd like to here other opinions.

Based on icode_BLAS.log posted earlier, I took a look at the function ZHERK because the warnings for this function are printed first starting in line 148. The routine computes C = α A A^* + β C for caller-provided scalars α, β and matrices A, B

Diagnostic message Line numbers Comment
It's not allowed to compare float variables (X) with equality. 246, 248, 263, 289, 293, 313, 317, 347, 357, 369, 379 These warnings refer to comparisons of α and β with zero or one, respectively, for the obvious optimizations.
It's not allowed to compare float variables (X) with equality. 302, 326 Entries of the matrix A are compared with zero to avoid unnecessary operations.
The variable must be defined as constant. 180, 181 gfortran 12 does not complain and my Fortran is not good enough to see how the variables LDA, LDC, TRANSPOSE can be defined as parameters.
Using more than five conditions in an expression is not allowed. 241-242 Code with line break removed: IF ((N.EQ.0) .OR. ((ALPHA.EQ.ZERO).OR.(K.EQ.0)).AND. (BETA.EQ.ONE))) RETURN
Return code used in arithmetical statement. 345, 355, 367, 377 The return value of the intrinsic DCONJG is used in the expression TEMP = TEMP + DCONJG(A(L, I)) * A(L, J). i-CodeCNES thinks the return value should be checked by default (see the description of COM.FLOW.CheckCodeReturn in its documentation).
There is more than one exit in the function. 242 Early return when there is nothing to do beyond input validation.
There is more than one exit in the function. 278 Early return for α = 0.
There is more than one exit in the function. 389 The return statement in the last line of executable code.
Parentheses are needed for readability. 305, 330, 345, 360, 372, 377, 382 These lines contain statements of the form TEMP = TEMP + DCONJG(A(L, I)) * A(L, J) or C(I, J) = x * y + beta * f(c(j, i)).
Mixed type COMPLEX with COMPLEX 257, 295, 303, 320, 327, 348, 350, 380, 382 Is this a i-CodeCNES bug? Its documentation says about COM.TYPE.Expression: "in a expression [...] all variables should have the same type".
Mixed type COMPLEX with DOUBLE PRECISION 259, 271, 273, 297, 307, 318, 328, 360, 370, 372 IMO a false positive for LAPACK.
The IF instruction shall finish with an ELSE after the last ELSE IF. 322 This warning concerns the if block checking the validity of each argument and setting INFO to the index of the first invalid argument. That is, I do not see the need for an ELSE branch here.
Error in the following parameters: 'ZHERK ' variable belongs to parameter types forbidden when calling a function: a constant, an expression to be evaluated, a call to another function 235 Code: XERBLA('ZHERK ',INFO)
The instruction RETURN(i) is not allowed. 236, 242, 278, 389 The statement in each of those lines is RETURN.
COMPLEX*16 is not a basic type. Basic types are INTEGER, REAL, DOUBLE PRECISION, COMPLEX, LOGICAL and CHARACTER. 184, 200 This warning must be disabled for double-precision complex code.

The i-CodeCNES documentation lists the different warnings but does not always clarify why they exist. For example, why is passing constants to a function (line 235) being warned about? Without this reasoning I cannot identify a single useful warning for ZHERK.

@christoph-conrads
Copy link
Contributor

Overall I think we can ignore most of these warnings, especially those in the testing frameworks, but I'd like to here other opinions.

The i-CodeCNES documentation mentions three Fortran-specific warnings that are disabled by default but they seem useless for non-testing code since LAPACK neither opens files nor allocates memory.

Rule Description
F77.BLOC.File File access whould be done using OPEN and CLOSE instructions.
F90.DESIGN.Free Allocated memory should be free in the same conceptual level.
F90.INST.Nullify After deallocate, the use of NULLIFY into the same logical unit is mandatory DEALLOCATE (C, stat = iom ) NULLIFY ( C )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants