Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix heuristic for extension .yy (JSON vs Yacc) #6976

Merged
merged 7 commits into from
Aug 29, 2024

Conversation

DecimalTurn
Copy link
Contributor

@DecimalTurn DecimalTurn commented Jul 31, 2024

Description

Closes #6517
496k+ files affected

Implements fix suggested by @lildude and backed by @RobQuistNL in #6517 (comment).

See this comment for the approach used: #6976 (comment)

Checklist:

FIx .yy heuristic to account for changes in property name in GMStudio 2.3
@DecimalTurn DecimalTurn requested a review from a team as a code owner July 31, 2024 04:38
Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a new sample and updated heuristic test, if necessary, for this too please.

@DecimalTurn
Copy link
Contributor Author

DecimalTurn commented Jul 31, 2024

Can we have a new sample and updated heuristic test, if necessary, for this too please.

Sure. I've added a sample JSON file with the .yy extension.

The test doesn't need any changes:

def test_yy_by_heuristics
assert_heuristics({
"JSON" => all_fixtures("JSON", "*.yy"),
"Yacc" => all_fixtures("Yacc", "*.yy")
})
end
end

@lildude
Copy link
Member

lildude commented Jul 31, 2024

I was referring to adding a new GMStudio sample that matches the heuristic change.

@DecimalTurn
Copy link
Contributor Author

DecimalTurn commented Jul 31, 2024

Those .yy files are produced by GMStudio, but they are (should be) classified as JSON by Linguist. That's why I've added the sample in the JSON folder.

The file added matches the heuristic added with the following line:
"resourceType":"GMScript",

@lildude
Copy link
Member

lildude commented Jul 31, 2024

🤦 Whoops. I missed that.

lildude
lildude previously approved these changes Jul 31, 2024
Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

Note: this PR will not be merged until close to when the next release is made. See here for more details.

@DecimalTurn
Copy link
Contributor Author

🤦 Whoops. I missed that.

No worries. Let me know if you want me to fix the generated detection as well or if we can keep that for another PR.

# Internal: Is this a generated Game Maker Studio (2) metadata file?
#
# All Game Maker Studio 2 generated files will be JSON, .yy or .yyp, and have
# a part that looks like "modelName: GMname" on the 3rd line
#
# Return true or false
def generated_gamemakerstudio?
return false unless ['.yy', '.yyp'].include? extname
return false unless lines.count > 3
return lines[2].match(/\"modelName\"\:\s*\"GM/) ||
lines[0] =~ /^\d\.\d\.\d.+\|\{/
end

@lildude
Copy link
Member

lildude commented Jul 31, 2024

Yeah, probably best to fix that too. Lets do it in this PR.

@DecimalTurn
Copy link
Contributor Author

Should be all done now.
Note that I've had to relax the constraint that the property has to be on the 3rd line. There seems to be 50k+ files of GMStudio generated JSON where the old modelName property appears after the 3rd line, so even for GMStudio before v2.3, that constraint was too restrictive.

@DecimalTurn DecimalTurn changed the title Fix heuristic for extension .yy to account for change in property name in GMStudio 2.3 Fix heuristic for extension .yy (JSON vs Yacc) Aug 1, 2024
@DecimalTurn
Copy link
Contributor Author

DecimalTurn commented Aug 1, 2024

After looking a little more at the syntax for Yacc, I realize that a Yacc file cannot start with a { or [. It has to start with a declaration which begins with a %. This means that we could simply use the heuristic that we used for JSON from here (checking that the first non-whitespace character is either { or [):

- language: JSON
pattern: '\A\s*[{\[]'

And for the generated condition inside generated.rb , we could simply have something like:

      return lines.first(3).join('').match?(/^\s*[{\[]/) ||
             lines[0] =~ /^\d\.\d\.\d.+\|\{/

Thoughts?


Corresponding search query: Still 600K files.

@RobQuistNL
Copy link
Contributor

That is actually a much better approach in my opinion :+) Good find! I've seen one other .yy syntax that I don't think is either GM or Yacc, but since this setup only has these 2 included I think this might be a very clean and good solution.

JSON + .yy extension = GM
No JSON + .yy extension = Yacc

@DecimalTurn
Copy link
Contributor Author

DecimalTurn commented Aug 2, 2024

We should be all good now with the latest changes bb34e54 (#6976).

I've seen one other .yy syntax that I don't think is either GM or Yacc

Good catch! This actually seems like an unusual Yacc file that skips the directives section going against the convention/documentation, so it would still be correct to categorize them as Yacc technically.

However, for this to be a problem, those unconventional Yacc files would need to start with a {, but GitHub Search tells me the number of files where this happens is likely in the single digits as it's struggling to find more than 3.

Copy link
Contributor

@RobQuistNL RobQuistNL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks.

Note: this PR will not be merged until close to when the next release is made. See here for more details.

@lildude lildude added this pull request to the merge queue Aug 29, 2024
Merged via the queue into github-linguist:master with commit cb03400 Aug 29, 2024
5 checks passed
@DecimalTurn DecimalTurn deleted the json-vs-yacc branch August 29, 2024 15:18
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Dec 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Game Maker Language file detection for those generated by GM Studio 2.3
4 participants