Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect .a86 files as assembly language #6778

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

TheBrokenPipe
Copy link

@TheBrokenPipe TheBrokenPipe commented Mar 29, 2024

Added .A86 to the list of extensions for assembly language. .A86 was an extension used by 8086 assembly language source files in the late 70s and early 80s.

Description

This pull request makes linguist detect .A86 files as x86 assembly language source code. People used to cross assemble 8086 code from 8080/Z80 machines, and .ASM was already reserved for the native assembly language of those machines, so they used .A86 instead. CP/M-86's ASM-86 and Seattle Computer Products' 8086 Cross Assembler, two of the earliest 8086 assemblers, both used the .A86 extension. GitHub currently does not detect .A86 files as assembly language.

Checklist:

Added .A86 to the list of extensions for assembly language. .A86 was an extension used by 8086 assembly language source files.
@TheBrokenPipe TheBrokenPipe requested a review from a team as a code owner March 29, 2024 12:03
Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add the sample you reference to the PR 😉

@TheBrokenPipe
Copy link
Author

TheBrokenPipe commented Mar 29, 2024

You need to add the sample you reference to the PR 😉

With all due respect, I do not get why a sample is needed. There is no difference between a .ASM and a .A86 file, the only difference is the 3-letter extension. However, I have added a sample as requested.

@lildude
Copy link
Member

lildude commented Mar 29, 2024

You need to add the sample you reference to the PR 😉

With all due respect, I do not get why a sample is needed. There is no difference between a .ASM and a .A86 file, the only difference is the 3-letter extension. However, I have added a sample as requested.

Whilst this language is the only user of this extension now and there are other extensions for the same language, it may not be the case in future in which case we may need to fall back to to the classifier which is trained off these samples. So whilst it seems pointless now, we can’t say that in future so it’s best to preempt things and prepare for the future now whilst this is fresh in people’s minds.

@TheBrokenPipe
Copy link
Author

TheBrokenPipe commented Apr 3, 2024

I looked around and saw that a lot of other pull requests included multiple samples, so I've added in another .A86 file.

Whilst this language is the only user of this extension now and there are other extensions for the same language, it may not be the case in future in which case we may need to fall back to to the classifier which is trained off these samples.

I see now, thanks!

By the way, I have a few questions:

  1. What is the "Pending Popularity" tag? My understanding is that the language is not popular enough for the PR to be merged, is this correct? If so, how many repos are required before this PR can be considered? I read in the contribution guideline that the number is 200, however we have about 1.4K .A86 files on GitHub (which should be more than 200 repos).
  2. Do I have to tick all boxes of the checklist? I feel like the last one is sort if irrelevant considering that this extension is currently only used by one language.

Sincerely,
Pig

@lildude
Copy link
Member

lildude commented Apr 3, 2024

1. What is the "Pending Popularity" tag? My understanding is that the language is not popular enough for the PR to be merged, is this correct? If so, how many repos are required before this PR can be considered? I read in the contribution guideline that the number is 200, however we have about 1.4K .A86 files on GitHub (which should be more than 200 repos).

Your understanding is correct, but it applies per extension too. #5756 as referenced in the CONTRIBUTING.md file details how we're determining popularity at the moment. We re-assess whenever we're close to making a new release which happens approximately every 3-4 months inline with the GitHub Enterprise Server release cycle.

2. Do I have to tick all boxes of the checklist? I feel like the last one is sort if irrelevant considering that this extension is currently only used by one language.

Nope. Only those that apply. This last point doesn't apply as the extension is unique.

Copy link
Member

@lildude lildude left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

samples/Assembly/zip.a86 is too big (as it's suppressed in the diff by default). Please replace it with a smaller sample.

Added 2 sample .A86 files, which are small enough.
@TheBrokenPipe
Copy link
Author

Thanks for the fast reply! I have replaced the big file with a smaller one, hopefully it should be fine now and no more changes are required.

Sincerely,
Pig

@lildude
Copy link
Member

lildude commented Apr 3, 2024

Looks good. The only thing needed now is for the license to be specified for each sample in the template. We can only accept open-source licensed files and the license needs to be specified in the template.

@TheBrokenPipe
Copy link
Author

Looks good. The only thing needed now is for the license to be specified for each sample in the template. We can only accept open-source licensed files and the license needs to be specified in the template.

Bruh. Ok, well, how do I do that? I looked at a few other PRs and can't find where they've included the license. I'm thinking about writing my own .A86 samples at this point.

Sincerely,
Pig

@lildude
Copy link
Member

lildude commented Apr 3, 2024

You don't need to include the license in the PR. We only need them to be stated inline with the links or on the "Sample license(s):" line in the template.

@Alhadis Alhadis changed the title Detect .A86 Files As Assembly Language Detect .a86 files as assembly language Apr 3, 2024
@@ -412,6 +412,7 @@ Assembly:
- nasm
extensions:
- ".asm"
- ".a86"
- ".a51"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linguist requires extension lists to be sorted case-sensitively (sans the first entry, which is considered the “primary” extension), which means .a86 should follow .a51, not precede it.

   extensions:
   - ".asm"
   - ".a51"
+  - ".a86"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see that, sorry. Will change it later. However, afaik, .A51 (Intel 8051 I think) is not the same "language" as x86 assembly (8086, i286, i386 and co.). I see that Motorola 68K has its own thing, so maybe Intel 8051 should also have its own definition/whatever you call it? I can clearly see that GitHub is trying to apply the x86 syntax highlighting to the sample .A51 file. Maybe I should open an issue for that?

Again, sorry for not reading the contribution guidelines properly before opening this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that Motorola 68K has its own thing

Motorola 68K is grouped under Assembly, which means M68K falls under the Assembly umbrella for the purposes of language classification.

Motorola 68K Assembly:
type: programming
color: "#005daa"
group: Assembly
aliases:
- m68k
extensions:
- ".asm"
- ".i"
- ".inc"
- ".s"
- ".x68"
tm_scope: source.m68k
ace_mode: assembly_x86
language_id: 477582706

The reason for making this distinction in the first place is to employ more accurate syntax highlighting (note the tm_scope field above). E.g., XML property lists use a specialised grammar for highlighting the specialised XML subset used by plist(5), even though there's nothing wrong with XML's usual grammar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants