Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky analysis of intrinsic functions to ValIntrinsic #190

Closed
RaoulHC opened this issue Nov 5, 2021 · 13 comments
Closed

Flaky analysis of intrinsic functions to ValIntrinsic #190

RaoulHC opened this issue Nov 5, 2021 · 13 comments

Comments

@RaoulHC
Copy link
Collaborator

RaoulHC commented Nov 5, 2021

Had discussed this with @raehik before, but this is a stripped down minimal working case of the analysis transformations
for ExpFunctionCall name nodes to ValIntrinsics being flaky and seems to be effected by completely unrelated lines.

The following code:

import qualified Data.ByteString.Char8         as BC
import           Text.PrettyPrint.GenericPretty
import           Language.Fortran.Analysis
import           Language.Fortran.Analysis.Types
import           Language.Fortran.ParserMonad
import           Language.Fortran.Parser.Fortran77

main :: IO ()
main = do
  let path = "file.f"
  contents <- BC.readFile path
  pf       <-
    fromRight
    .   fromParseResult
    <$> legacy77ParserWithIncludes ["./"] contents path
  pp $ fst . analyseTypes $ initAnalysis pf

Used to parse and print the following Fortran:

C     file.f

C     Removing any line in this file causes the int to be marked as an intrinsic
C     This includes all lines in the include file too 
      SUBROUTINE func()
      include 'include.inc'
      INTEGER ICLEN
      INTEGER STR(7)
      CHARACTER CSTR*28
      INTEGER I,J,K,IN
      character inc4(4)
      CHARACTER*(*) RCSid

      iclen = len(ichar2)

      IDATE(1)=IN-INT(z'0030', 4)

      END
C     include.inc

C This is a real life header that seems in combination with the file causes the
C intrinsic int to be parsed an ValVariable instead of ValIntrinsic
      integer*2 int2
      LOGICAL BTESST
      INTEGER*4 ILBYTE
      character*12 itoc
      automatic int201,int202,int203,int204,x_01,x_02

Will return the INT node as:

...
(ExpFunctionCall (idType: IDType {idVType = Nothing, idCType = Nothing})
                 (14:19)-(14:33)
                 (ExpValue (idType: IDType {idVType = Nothing, idCType = Just CTIntrinsic})
                 (14:19)-(14:21)
                 (ValVariable "int"))
...

Deleting any line in the include file or fortran file will result in it being parsed as:

...
(ExpFunctionCall (idType: IDType {idVType = Just (TInteger 4), idCType = Nothing})
                 (14:19)-(14:33)
                 (ExpValue (idType: IDType {idVType = Nothing, idCType = Just CTIntrinsic})
                 (14:19)-(14:21)
                 (ValIntrinsic "int"))
...

Reproduced this with b322458. Don't think this issue was present prior to 0.5.0 but not sure exactly where it was introduced.

@dorchard
Copy link
Member

dorchard commented Nov 8, 2021

heats up git bisect. @raehik on the case!

@raehik
Copy link
Collaborator

raehik commented Nov 19, 2021

I appear to get the same behaviour on 0.4.0, 74c7bb0 (also tested on 0.4.3). Note that the type analysis was updated in 0.5.0 .

The rules for getting it to result in a ValIntrinsic instead of a ValVariable seem very arbitrary. Most non-executable statement changes I tried "fixed" the problem, returning a ValIntrinsic. Executable statement changes didn't make a difference. I did observe that adding another INT() statement would have it share the same behaviour as the problematic statement.

Swapping the order of the two executable statements in file.f also fixes the issue. (The len() intrinsic call always stays a ValIntrinsic.) What is going on??

@raehik
Copy link
Collaborator

raehik commented Nov 19, 2021

The problem appears to be specific to the int intrinsic. I can add a new intrinsic e.g. intt to Intrinsics.fortran77intrinsics and append a copy of the final statement IDATE(1)=IN-INT(z'0030', 4) using IN-INTT instead, and it prints ValIntrinsic for intt, still ValVariable for int.

@raehik
Copy link
Collaborator

raehik commented Nov 24, 2021

ValVariables are rewritten to ValIntrinsics in a transformation Transformation.Disambiguation.Intrinsic.disambiguateIntrinsic. It maps over every ExpValue ValVariable (every variable), rewrapping them with ValIntrinsic iff its recorded IDType is CTIntrinsic. Upon testing, I see it only matches for len(), not INT(), until you make a change. However, the printed output seems to always show CTIntrinsic. So I assume the initial "implicit" analysis the transformations use gets something wrong somehow, and the explicit one we ask for correctly infers INT() as an intrinsic. *That doesn't propagate to the AST, because the transformations only run before the explicit type analysis.

Moving the include anywhere fixes the issue (while all other non-exec statement moves do nothing). Wonder if it's related to include inlining? (Yet to find any sort of useful pattern, though.)

@RaoulHC
Copy link
Collaborator Author

RaoulHC commented Nov 24, 2021

I can't see how the include inlining would effect it in any way, it's pretty simple (defined here) and won't effect any statement that isn't StInclude. The only way I can see it affected this would be if it did its own definition of int in one of them.

It does sound like the this initial analysis is where the problem lies. If it's not, the only other place I can think of is if this is a bug in uniplate's transformBi that's somehow causing expressions to be occasionally missed.

@raehik
Copy link
Collaborator

raehik commented Nov 24, 2021

I've done more tests. I don't know why, but the disambiguator receives INT() tagged as CTVariable. I tried matching my type analysis to how it gets called in the transformer, but it doesn't have the same issue (it always gets tagged correctly as CTIntrinsic). Something isn't working as expected during the initial analysis or the transformations.

@raehik
Copy link
Collaborator

raehik commented Nov 24, 2021

Huh. Changing integer*2 int2 to integer*8 int2 also changes what gets recorded for INT()'s IDType - it erroneously gets a VType stored, which corresponds to int2's type. Thinking this is to do with naming/renaming...

@raehik
Copy link
Collaborator

raehik commented Nov 24, 2021

Here's a MWE:

      subroutine main
      integer*1 x, int1, a1, a2, a3, a4, a5, a6, a7, a8, a9
      x = INT(int1, 2)
      end

This (incorrectly) parses the INT(int1, 2) call as ValVariable "int". Changing the number of declarations following int2 causes it to (correctly) parse as ValIntrinsic "int".

@raehik
Copy link
Collaborator

raehik commented Nov 24, 2021

The problem is that during renaming, int1 is the 3rd binder processed, so gets uniqueName: main_int13. After all the other binders, INT ends up being the 13th binder processed. So it gets uniqueName: main_int13. Issues no doubt arise from that. I'm not sure why this shadowing isn't discovered by the renamer though.

@raehik
Copy link
Collaborator

raehik commented Nov 25, 2021

Analysis.Renaming.renameExp is defined as:

-- Rename an ExpValue variable, assuming that it is to be treated as a
-- reference to a previous declaration, possibly in an outer scope.
renameExp :: Data a => RenamerFunc (Expression (Analysis a))
renameExp e@(ExpValue _ _ (ValVariable v))  = maybe e (`setUniqueName` setSourceName v e) `fmap` getFromEnvs v
-- Intrinsics get unique names for each use.
renameExp e@(ExpValue _ _ (ValIntrinsic v)) = flip setUniqueName (setSourceName v e) `fmap` addUnique v NTIntrinsic
renameExp e                                 = return e

At the point the renamer runs, the intrinsic transformation pass hasn't yet run, so we only need to look at the ValVariable case. The variable gets its unique name set via getFromEnvs, which doesn't find it because INT hasn't been declared. It then calls getIntrinsicReturnType, which finds that INT is an intrinsic. The renamer then creates a fresh binder in addUnique. The shadowing occurs here: addUnique doesn't check that its unique name is truly unique. It turns out in this case it isn't, and this causes problems down the road where unique names are expected to fully identify a variable.

We should be able to solve this with a check in addUnique to continually generate names until it finds one that isn't yet present.

@RaoulHC
Copy link
Collaborator Author

RaoulHC commented Nov 25, 2021

Good debugging!

I would favour a solution of changing the format of unique names in a way that avoids this altogether as opposed to checking whether it's present or not. There might be something I've not considered but I think putting an underscore between the symbol and the unique number would be sufficient. Aiui this is only an issue for symbols that end in a number.

@raehik
Copy link
Collaborator

raehik commented Nov 25, 2021

There might be something I've not considered but I think putting an underscore between the symbol and the unique number would be sufficient. Aiui this is only an issue for symbols that end in a number.

Good point, agreed. That should rule out this class of ambiguity bug where two different symbols generate the same unique name. On first thought I wondered if it wouldn't work for intrinsics like INT_() but 1) on a proper look it seems safe and 2) no intrinsics look like that. The only downside is slightly noisier binders, and I note that other compilers and debuggers often go the number appending route. But definitely preferred over a (possibly expensive) uniqueness check.

I'll make a PR updating the unique name schema & check it over with Dominic on Monday. Amusing bug, cheers @RaoulHC for the report and help!

raehik added a commit that referenced this issue Nov 25, 2021
Previously, symbols like int and int1 could have the same unique name
generated for them, depending on the fresh number used e.g. n=12: int ->
int12; n=2: int1 -> int12. This can break things further up that expect
unique names to be, well, unique. Adding a separator between the
variable and the fresh number prevents this type of edge case from
occurring.

Also see the GitHub discussion for further info:
#190
raehik added a commit that referenced this issue Nov 29, 2021
Previously, symbols like int and int1 could have the same unique name
generated for them, depending on the fresh number used e.g. n=12: int ->
int12; n=2: int1 -> int12. This can break things further up that expect
unique names to be, well, unique. Adding a separator between the
variable and the fresh number prevents this type of edge case from
occurring.

Also see the GitHub discussion for further info:
#190
@raehik
Copy link
Collaborator

raehik commented Nov 29, 2021

Discussed & above fix applied.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants