Fortran: Improve handling of case insensitivity #3668

Closed
RaoulHC opened this issue Mar 15, 2023 · 14 comments · Fixed by #3671

@RaoulHC

RaoulHC commented Mar 15, 2023

Fortran is a case insensitive language and as a result it would be great if ctags took this into account.

For example the following code defines a subroutine as FOO but can happily be called by foo, and similarly uses myvar in multiple ways, but the tags file generated from it is not as useful as I'd like.

      subroutine FOO()
      integer myVar
      myvar = 3
      print *, "myVar is ", MyVar
      end subroutine

      program my_program
      call foo()
      end program my_program

$ ./ctags --kinds-Fortran=+L -o - test.f
FOO     test.f  /^      subroutine FOO(/;"      s
myVar   test.f  /^      integer myVar$/;"       L       subroutine:FOO  file:
my_program      test.f  /^      program my_program$/;"  p

Additionally, most compilers append an underscore to Fortran symbols, so in a multi-language codebase, if I want to jump to the definition of a Fortran function that is called from C as foo_, I won't find it either.

I think it makes sense for ctags to lowercase symbols, and potentially to append an underscore to subroutine and function names. However, because the latter may not play well with other tooling (my use case is OpenGrok), it might make the most sense to generate two tags, with and without the underscore, probably gated behind a flag.

Looks like it should be easy enough to implement this in the Fortran parser but as I'm not all that familiar with ctags I'd appreciate some input to this problem before raising a PR.

@masatake
Member

Should ctags generate the following tags for the former request?

FOO     test.f  /^      subroutine FOO(/;"      s
FOo     test.f  /^      subroutine FOO(/;"      s
FoO     test.f  /^      subroutine FOO(/;"      s
Foo     test.f  /^      subroutine FOO(/;"      s
fOO     test.f  /^      subroutine FOO(/;"      s
fOo     test.f  /^      subroutine FOO(/;"      s
foO     test.f  /^      subroutine FOO(/;"      s
foo     test.f  /^      subroutine FOO(/;"      s

masatake changed the title from "Improve handling of Fortran case insensitivity" to "Fortran: Improve handling of case insensitivity" Mar 15, 2023

@RaoulHC
Author

RaoulHC commented Mar 16, 2023

No, I don't think that would be a good approach: for a symbol with n letters you would need 2^n combinations, and tag files would quickly become rather large.
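
To make the growth concrete, here is a minimal sketch in C that counts the case variants of an identifier. The helper name `case_variant_count` is hypothetical and not part of ctags; it assumes only ASCII letters vary in case and that the name has fewer than 64 letters, so the count fits in an `unsigned long long`.

```c
#include <ctype.h>

/* Number of distinct upper/lower-case spellings of an identifier.
 * Each ASCII letter doubles the count; digits and '_' do not.
 * Hypothetical helper for illustration only, not ctags code.
 * Assumes fewer than 64 letters so the result does not overflow. */
unsigned long long case_variant_count(const char *name)
{
    unsigned long long count = 1;
    for (const char *p = name; *p != '\0'; p++) {
        if (isalpha((unsigned char)*p))
            count *= 2;
    }
    return count;
}
```

For the three-letter name FOO this gives 8 tags, and for my_program, with nine letters, it already gives 512.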

I think just having the lowercase option makes the most sense, i.e.:

foo     test.f  /^      subroutine FOO(/;"      s

Consumers of tag files will in general need to take the case insensitivity of the language into account regardless (I'm probably going to need to make some slight changes to OpenGrok).

If we have the underscore appended symbol too it would be:

foo     test.f  /^      subroutine FOO(/;"      s
foo_     test.f  /^      subroutine FOO(/;"      s

@masatake
Member

If ctags records foo in a tags file, searching for FOO with a tool that consumes the tags file may not work, and that would be an incompatible, destructive change for tools that already work with ctags.
Unacceptable. These things should be solved in the consumers of tag files. I'm negative.

foo_ is a bit interesting. What should we call an identifier like foo_? ELF name? Link name?

@masatake
Member

https://stackoverflow.com/questions/1569887/is-it-possible-to-perform-a-case-sensitive-search-in-opengrok

It seems that case sensitivity can be handled on the OpenGrok side.

@RaoulHC
Author

RaoulHC commented Mar 16, 2023

> If ctags records foo in a tags file, searching for FOO with a tool that consumes the tags file may not work, and that would be an incompatible, destructive change for tools that already work with ctags. Unacceptable. These things should be solved in the consumers of tag files. I'm negative.

Perhaps this should be gated by a flag? As it stands, if a function is defined as MyFunction but called as myfunction, a user who searches for the latter will not find it, which is not ideal behaviour.

> foo_ is a bit interesting. What should we call an identifier like foo_? ELF name? Link name?

I think link name makes sense.

> It seems that case sensitivity can be handled on the OpenGrok side.

Note that there the full search is case insensitive, but the search for definitions is case sensitive. This is what I think would need to be changed in OpenGrok for Fortran specifically.

@masatake
Member

> Perhaps this should be gated by a flag? As it stands, if a function is defined as MyFunction but called as myfunction, a user who searches for the latter will not find it, which is not ideal behaviour.

I think such a flag should be part of the tools that consume the tags file.
readtags, a tool that consumes tags files, can search case-insensitively.
I wonder what the OpenGrok people would say.

> Note that there the full search is case insensitive, but the search for definitions is case sensitive. This is what I think would need to be changed in OpenGrok for Fortran specifically.

So I think extending OpenGrok is the way to go instead of extending ctags.
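
As a rough illustration of handling this on the consumer side, the sketch below compares tag names while ignoring ASCII case. It is not how readtags or OpenGrok are implemented; `fortran_names_equal` and `find_tag_ci` are hypothetical names, and the tag names are assumed to be already loaded into an array.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Compare two identifiers ignoring ASCII case, as a Fortran-aware
 * tags consumer might. Hypothetical helper, not readtags code. */
bool fortran_names_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == '\0' && *b == '\0';
}

/* Return the index of the first tag name matching query, or -1.
 * A linear scan keeps the sketch short; a real consumer would
 * more likely use a case-folded index. */
int find_tag_ci(const char *const names[], size_t count, const char *query)
{
    for (size_t i = 0; i < count; i++) {
        if (fortran_names_equal(names[i], query))
            return (int)i;
    }
    return -1;
}
```

With this kind of lookup the tags file keeps the spelling from the source (FOO), and a search for foo still finds it.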

> foo_ is a bit interesting. What should we call an identifier like foo_? ELF name? Link name?

> I think link name makes sense.

Could you point me to documentation or a specification of the rule for synthesizing a name like foo_?

$   ./ctags --list-kinds-full=Fortran
#LETTER NAME       ENABLED REFONLY NROLES MASTER DESCRIPTION
E       enum       yes     no      0      NONE   enumerations
L       local      no      no      0      NONE   local, common block, and namelist variables
M       method     yes     no      0      NONE   type bound procedures
N       enumerator yes     no      0      NONE   enumeration values
P       prototype  no      no      0      NONE   subprogram prototypes
S       submodule  yes     no      0      NONE   submodules
b       blockData  yes     no      0      NONE   block data
c       common     yes     no      0      NONE   common blocks
e       entry      yes     no      0      NONE   entry points
f       function   yes     no      0      NONE   functions
i       interface  yes     no      0      NONE   interface contents, generic names, and operators
k       component  yes     no      0      NONE   type and structure components
l       label      yes     no      0      NONE   labels
m       module     yes     no      0      NONE   modules
n       namelist   yes     no      0      NONE   namelists
p       program    yes     no      0      NONE   programs
s       subroutine yes     no      0      NONE   subroutines
t       type       yes     no      0      NONE   derived types and structures
v       variable   yes     no      0      NONE   program (global) and module variables

Which kinds may have such a link name?
I don't know Fortran.
However, I guess a language object of the type kind has no link name.
Tell me the criteria.

@masatake
Member

$ cat /tmp/foo.f
      subroutine FOO()
      integer myVar
      myvar = 3
      print *, "myVar is ", MyVar
      end subroutine

      program my_program
      call foo()
      end program my_program
$ ./ctags -o - /tmp/foo.f
FOO	/tmp/foo.f	/^      subroutine FOO(/;"	s
my_program	/tmp/foo.f	/^      program my_program$/;"	p
$ ./ctags -o - --extras-Fortran='{linkName}' /tmp/foo.f
FOO	/tmp/foo.f	/^      subroutine FOO(/;"	s
FOO_	/tmp/foo.f	/^      subroutine FOO()$/;"	s
my_program	/tmp/foo.f	/^      program my_program$/;"	p
my_program_	/tmp/foo.f	/^      program my_program$/;"	p
$ git diff |cat
diff --git a/parsers/fortran.c b/parsers/fortran.c
index 1d9ec910a..29a3629d1 100644
--- a/parsers/fortran.c
+++ b/parsers/fortran.c
@@ -368,6 +368,18 @@ static const keywordTable FortranKeywordTable [] = {
 	{ "while",          KEYWORD_while        }
 };
 
+typedef enum {
+	X_LINK_NAME,
+} fortranXtag;
+
+static xtagDefinition FortranXtagTable [] = {
+	{
+		.enabled = false,
+		.name    = "linkName",
+		.description = "Name used in foreign languages",
+	},
+};
+
 static struct {
 	unsigned int count;
 	unsigned int max;
@@ -600,6 +612,16 @@ static void makeFortranTag (tokenInfo *const token, tagType tag)
 			 token->tag == TAG_PROTOTYPE))
 			e.extensionFields.signature = vStringValue (token->signature);
 		makeTagEntry (&e);
+		if (isXtagEnabled (FortranXtagTable[X_LINK_NAME].xtype))
+		{
+			vString *linkName_name = vStringNewInit (e.name);
+			vStringPut(linkName_name, '_');
+			tagEntryInfo linkName_e = e;
+			linkName_e.name = vStringValue (linkName_name);
+			markTagExtraBit (&linkName_e, FortranXtagTable[X_LINK_NAME].xtype);
+			makeTagEntry (&linkName_e);
+			vStringDelete (linkName_name);
+		}
 	}
 }
 
@@ -2724,5 +2746,7 @@ extern parserDefinition* FortranParser (void)
 	def->initialize = initialize;
 	def->keywordTable = FortranKeywordTable;
 	def->keywordCount = ARRAY_SIZE (FortranKeywordTable);
+	def->xtagTable     = FortranXtagTable;
+	def->xtagCount     = ARRAY_SIZE(FortranXtagTable);
 	return def;
 }
$ 

@RaoulHC
Author

RaoulHC commented Mar 16, 2023

> Could you point me to documentation or a specification of the rule for synthesizing a name like foo_?

It's not part of the standard, but it is the default behaviour of most compilers, for example gfortran and Sun Studio. XLF does not do it by default, but it can be turned on with -qextname.

> Which kinds may have such a link name?

I'll double check, but off the top of my head I believe the following kinds have an underscore appended: common, entry, function, interface, and subroutine. I'm unsure about some of the module features and block data, but things like variables and types don't have any linkage significance.

@RaoulHC
Author

RaoulHC commented Mar 16, 2023

I had raised an issue on the OpenGrok repo and it looks like someone had tried to implement the case insensitive search on their side a couple of years ago! It never got merged in the end so I'll try and update that and get it over the line.

As for which symbols have an underscore appended, I wrote a small program to try to cover most of those kinds (see here), so the following should be covered: function, subroutine, block data, common and entry. The other kinds either don't have symbols or are mangled more heavily, including functions, subroutines and entries within a module, so we won't want extra link names for them.
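
The criteria above could be sketched as a small predicate in C. This is only an illustration of the rule as stated in this comment, not the code that went into the Fortran parser. The function name `kind_has_link_name` is hypothetical, the kind strings are the ones printed by `--list-kinds-full=Fortran`, and applying the module exception only to procedures is an assumption, since the comment mentions it only for functions, subroutines and entries.

```c
#include <stdbool.h>
#include <string.h>

/* Decide whether a tag of the given ctags kind should also get a
 * "name_" link-name tag. Hypothetical helper for illustration. */
bool kind_has_link_name(const char *kind, bool inside_module)
{
    /* Procedures get a plain "name_" symbol only outside a module;
     * inside a module they are mangled more heavily. */
    static const char *const procedures[] = {
        "function", "subroutine", "entry",
    };
    /* Kinds for which the thread mentions no module exception
     * (assumption: treated as always having a link name). */
    static const char *const others[] = { "blockData", "common" };

    for (size_t i = 0; i < sizeof procedures / sizeof procedures[0]; i++) {
        if (strcmp(kind, procedures[i]) == 0)
            return !inside_module;
    }
    for (size_t i = 0; i < sizeof others / sizeof others[0]; i++) {
        if (strcmp(kind, others[i]) == 0)
            return true;
    }
    /* Variables, types, modules, labels and so on: no link name. */
    return false;
}
```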

I think the approach in the diff you posted makes sense, so I would be happy with that output, though given that the symbol will actually be made all lowercase, it might make sense to make the link name lowercase too.

Is there any significance to the link name's pattern also catching the closing parenthesis? I guess it wouldn't make much difference to finding definitions.

@masatake
Member

As far as I can tell from the GNU Fortran man page you kindly pointed me to, the compiler appends two underscores to a link name whose original name includes an underscore.

e.g.
my_var__ is for my_var.
foo_ is for foo.

Do you think ctags should do the same?

@RaoulHC
Author

RaoulHC commented Mar 20, 2023

Hmmm, I think the GFortran docs are actually slightly misleading: they describe what happens with the non-default -fsecond-underscore option under the -fno-underscoring option.

Testing it on the following example:

      program main
      call subr()
      call my_subr()
      end program main

Compiling it with the following options and inspecting the symbols in the resulting object file I get:

  • -funderscoring (the default): subr_, my_subr_
  • -fno-underscoring: subr, my_subr
  • -fsecond-underscore: subr_, my_subr__

I'll raise an issue with GCC folks to improve that, but I think adding a single underscore regardless of how many underscores there are in the name makes sense.
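
The three behaviours observed above can be summarised in a short C sketch. It reproduces only the results listed in this comment plus the lowercasing mentioned earlier in the thread; it is not gfortran's or ctags' actual code, and the enum and function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* The three gfortran modes tested above (names are illustrative). */
enum underscore_mode { UNDERSCORING, NO_UNDERSCORING, SECOND_UNDERSCORE };

/* Write the lowercased link name for a Fortran identifier into out.
 * Returns 0 on success, -1 if out is too small. */
int fortran_link_name(const char *name, enum underscore_mode mode,
                      char *out, size_t out_size)
{
    size_t len = strlen(name);
    size_t extra = 0;

    if (mode == UNDERSCORING)
        extra = 1;                 /* subr_, my_subr_ */
    else if (mode == SECOND_UNDERSCORE)
        /* A second underscore only if the name already has one. */
        extra = (strchr(name, '_') != NULL) ? 2 : 1;
    /* NO_UNDERSCORING leaves extra at 0: subr, my_subr */

    if (len + extra + 1 > out_size)
        return -1;

    for (size_t i = 0; i < len; i++)
        out[i] = (char)tolower((unsigned char)name[i]);
    for (size_t i = 0; i < extra; i++)
        out[len + i] = '_';
    out[len + extra] = '\0';
    return 0;
}
```

Always appending a single underscore, as proposed here, corresponds to the `UNDERSCORING` case, which is the compiler default.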

masatake added a commit to masatake/ctags that referenced this issue Mar 20, 2023
Close universal-ctags#3668.

The test input is taken from
https://gist.github.com/RaoulHC/a241fb714f191b5fc3a7c790d7b23523.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
@masatake
Member

@RaoulHC Thank you. Based on your advice I made a pull request. See #3671.

@RaoulHC
Author

RaoulHC commented Mar 20, 2023

Excellent. It looks like some tests need updating, but otherwise that looks like it'll really help our OpenGrok instance find Fortran definitions.

Thanks so much for this!

@masatake
Member

> Excellent. It looks like some tests need updating, ...

Please leave your comments on the pull request. I will not merge it until I get your ack.
