Flaky analysis of intrinsic functions to ValIntrinsic #190

RaoulHC · 2021-11-05T12:49:36Z

Had discussed this with @raehik before, but this is a stripped down minimal working case of the analysis transformations
for ExpFunctionCall name nodes to ValIntrinsics being flaky and seems to be effected by completely unrelated lines.

The following code:

import qualified Data.ByteString.Char8         as BC
import           Text.PrettyPrint.GenericPretty
import           Language.Fortran.Analysis
import           Language.Fortran.Analysis.Types
import           Language.Fortran.ParserMonad
import           Language.Fortran.Parser.Fortran77

main :: IO ()
main = do
  let path = "file.f"
  contents <- BC.readFile path
  pf       <-
    fromRight
    .   fromParseResult
    <$> legacy77ParserWithIncludes ["./"] contents path
  pp $ fst . analyseTypes $ initAnalysis pf

Used to parse and print the following Fortran:

C     file.f

C     Removing any line in this file causes the int to be marked as an intrinsic
C     This includes all lines in the include file too 
      SUBROUTINE func()
      include 'include.inc'
      INTEGER ICLEN
      INTEGER STR(7)
      CHARACTER CSTR*28
      INTEGER I,J,K,IN
      character inc4(4)
      CHARACTER*(*) RCSid

      iclen = len(ichar2)

      IDATE(1)=IN-INT(z'0030', 4)

      END

C     include.inc

C This is a real life header that seems in combination with the file causes the
C intrinsic int to be parsed an ValVariable instead of ValIntrinsic
      integer*2 int2
      LOGICAL BTESST
      INTEGER*4 ILBYTE
      character*12 itoc
      automatic int201,int202,int203,int204,x_01,x_02

Will return the INT node as:

...
(ExpFunctionCall (idType: IDType {idVType = Nothing, idCType = Nothing})
                 (14:19)-(14:33)
                 (ExpValue (idType: IDType {idVType = Nothing, idCType = Just CTIntrinsic})
                 (14:19)-(14:21)
                 (ValVariable "int"))
...

Deleting any line in the include file or fortran file will result in it being parsed as:

...
(ExpFunctionCall (idType: IDType {idVType = Just (TInteger 4), idCType = Nothing})
                 (14:19)-(14:33)
                 (ExpValue (idType: IDType {idVType = Nothing, idCType = Just CTIntrinsic})
                 (14:19)-(14:21)
                 (ValIntrinsic "int"))
...

Reproduced this with b322458. Don't think this issue was present prior to 0.5.0 but not sure exactly where it was introduced.

The text was updated successfully, but these errors were encountered:

dorchard · 2021-11-08T11:49:16Z

heats up git bisect. @raehik on the case!

raehik · 2021-11-19T13:59:57Z

I appear to get the same behaviour on 0.4.0, 74c7bb0 (also tested on 0.4.3). Note that the type analysis was updated in 0.5.0 .

The rules for getting it to result in a ValIntrinsic instead of a ValVariable seem very arbitrary. Most non-executable statement changes I tried "fixed" the problem, returning a ValIntrinsic. Executable statement changes didn't make a difference. I did observe that adding another INT() statement would have it share the same behaviour as the problematic statement.

Swapping the order of the two executable statements in file.f also fixes the issue. (The len() intrinsic call always stays a ValIntrinsic.) What is going on??

raehik · 2021-11-19T16:43:36Z

The problem appears to be specific to the int intrinsic. I can add a new intrinsic e.g. intt to Intrinsics.fortran77intrinsics and append a copy of the final statement IDATE(1)=IN-INT(z'0030', 4) using IN-INTT instead, and it prints ValIntrinsic for intt, still ValVariable for int.

raehik · 2021-11-24T18:18:52Z

ValVariables are rewritten to ValIntrinsics in a transformation Transformation.Disambiguation.Intrinsic.disambiguateIntrinsic. It maps over every ExpValue ValVariable (every variable), rewrapping them with ValIntrinsic iff its recorded IDType is CTIntrinsic. Upon testing, I see it only matches for len(), not INT(), until you make a change. However, the printed output seems to always show CTIntrinsic. So I assume the initial "implicit" analysis the transformations use gets something wrong somehow, and the explicit one we ask for correctly infers INT() as an intrinsic. *That doesn't propagate to the AST, because the transformations only run before the explicit type analysis.

Moving the include anywhere fixes the issue (while all other non-exec statement moves do nothing). Wonder if it's related to include inlining? (Yet to find any sort of useful pattern, though.)

RaoulHC · 2021-11-24T18:35:16Z

I can't see how the include inlining would effect it in any way, it's pretty simple (defined here) and won't effect any statement that isn't StInclude. The only way I can see it affected this would be if it did its own definition of int in one of them.

It does sound like the this initial analysis is where the problem lies. If it's not, the only other place I can think of is if this is a bug in uniplate's transformBi that's somehow causing expressions to be occasionally missed.

raehik · 2021-11-24T21:45:38Z

I've done more tests. I don't know why, but the disambiguator receives INT() tagged as CTVariable. I tried matching my type analysis to how it gets called in the transformer, but it doesn't have the same issue (it always gets tagged correctly as CTIntrinsic). Something isn't working as expected during the initial analysis or the transformations.

raehik · 2021-11-24T21:54:22Z

Huh. Changing integer*2 int2 to integer*8 int2 also changes what gets recorded for INT()'s IDType - it erroneously gets a VType stored, which corresponds to int2's type. Thinking this is to do with naming/renaming...

raehik · 2021-11-24T22:19:56Z

Here's a MWE:

      subroutine main
      integer*1 x, int1, a1, a2, a3, a4, a5, a6, a7, a8, a9
      x = INT(int1, 2)
      end

This (incorrectly) parses the INT(int1, 2) call as ValVariable "int". Changing the number of declarations following int2 causes it to (correctly) parse as ValIntrinsic "int".

raehik · 2021-11-24T22:43:31Z

The problem is that during renaming, int1 is the 3rd binder processed, so gets uniqueName: main_int13. After all the other binders, INT ends up being the 13th binder processed. So it gets uniqueName: main_int13. Issues no doubt arise from that. I'm not sure why this shadowing isn't discovered by the renamer though.

raehik · 2021-11-25T12:47:10Z

Analysis.Renaming.renameExp is defined as:

-- Rename an ExpValue variable, assuming that it is to be treated as a
-- reference to a previous declaration, possibly in an outer scope.
renameExp :: Data a => RenamerFunc (Expression (Analysis a))
renameExp e@(ExpValue _ _ (ValVariable v))  = maybe e (`setUniqueName` setSourceName v e) `fmap` getFromEnvs v
-- Intrinsics get unique names for each use.
renameExp e@(ExpValue _ _ (ValIntrinsic v)) = flip setUniqueName (setSourceName v e) `fmap` addUnique v NTIntrinsic
renameExp e                                 = return e

At the point the renamer runs, the intrinsic transformation pass hasn't yet run, so we only need to look at the ValVariable case. The variable gets its unique name set via getFromEnvs, which doesn't find it because INT hasn't been declared. It then calls getIntrinsicReturnType, which finds that INT is an intrinsic. The renamer then creates a fresh binder in addUnique. The shadowing occurs here: addUnique doesn't check that its unique name is truly unique. It turns out in this case it isn't, and this causes problems down the road where unique names are expected to fully identify a variable.

We should be able to solve this with a check in addUnique to continually generate names until it finds one that isn't yet present.

RaoulHC · 2021-11-25T14:30:38Z

Good debugging!

I would favour a solution of changing the format of unique names in a way that avoids this altogether as opposed to checking whether it's present or not. There might be something I've not considered but I think putting an underscore between the symbol and the unique number would be sufficient. Aiui this is only an issue for symbols that end in a number.

raehik · 2021-11-25T15:23:23Z

There might be something I've not considered but I think putting an underscore between the symbol and the unique number would be sufficient. Aiui this is only an issue for symbols that end in a number.

Good point, agreed. That should rule out this class of ambiguity bug where two different symbols generate the same unique name. On first thought I wondered if it wouldn't work for intrinsics like INT_() but 1) on a proper look it seems safe and 2) no intrinsics look like that. The only downside is slightly noisier binders, and I note that other compilers and debuggers often go the number appending route. But definitely preferred over a (possibly expensive) uniqueness check.

I'll make a PR updating the unique name schema & check it over with Dominic on Monday. Amusing bug, cheers @RaoulHC for the report and help!

Previously, symbols like int and int1 could have the same unique name generated for them, depending on the fresh number used e.g. n=12: int -> int12; n=2: int1 -> int12. This can break things further up that expect unique names to be, well, unique. Adding a separator between the variable and the fresh number prevents this type of edge case from occurring. Also see the GitHub discussion for further info: #190

raehik · 2021-11-29T12:17:39Z

Discussed & above fix applied.

raehik mentioned this issue Nov 25, 2021

fix renaming edge case ambiguity #195

Merged

raehik closed this as completed Nov 29, 2021

RaoulHC mentioned this issue Nov 21, 2023

Use '.' to separate scope from variable name instead of '_'. #280

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Flaky analysis of intrinsic functions to ValIntrinsic #190

Flaky analysis of intrinsic functions to ValIntrinsic #190

RaoulHC commented Nov 5, 2021

dorchard commented Nov 8, 2021 •

edited

Loading

raehik commented Nov 19, 2021

raehik commented Nov 19, 2021

raehik commented Nov 24, 2021 •

edited

Loading

RaoulHC commented Nov 24, 2021

raehik commented Nov 24, 2021

raehik commented Nov 24, 2021 •

edited

Loading

raehik commented Nov 24, 2021

raehik commented Nov 24, 2021

raehik commented Nov 25, 2021

RaoulHC commented Nov 25, 2021 •

edited

Loading

raehik commented Nov 25, 2021

raehik commented Nov 29, 2021

Flaky analysis of intrinsic functions to ValIntrinsic #190

Flaky analysis of intrinsic functions to ValIntrinsic #190

Comments

RaoulHC commented Nov 5, 2021

dorchard commented Nov 8, 2021 • edited Loading

raehik commented Nov 19, 2021

raehik commented Nov 19, 2021

raehik commented Nov 24, 2021 • edited Loading

RaoulHC commented Nov 24, 2021

raehik commented Nov 24, 2021

raehik commented Nov 24, 2021 • edited Loading

raehik commented Nov 24, 2021

raehik commented Nov 24, 2021

raehik commented Nov 25, 2021

RaoulHC commented Nov 25, 2021 • edited Loading

raehik commented Nov 25, 2021

raehik commented Nov 29, 2021

dorchard commented Nov 8, 2021 •

edited

Loading

raehik commented Nov 24, 2021 •

edited

Loading

raehik commented Nov 24, 2021 •

edited

Loading

RaoulHC commented Nov 25, 2021 •

edited

Loading