Accurate gitignore generator #296

Open
3 of 21 tasks
KaKi87 opened this issue Oct 7, 2021 · 22 comments
Labels
Developer tooling: Help your fellow developers out by making their job a bit more enjoyable with good tooling.
Intermediate: Projects that require a medium level of understanding. Doesn't require much prior knowledge.
Medium work: This project takes little time to complete. (ETA week or two)

Comments

@KaKi87

KaKi87 commented Oct 7, 2021

Project description

Existing .gitignore generators output every potentially applicable rule for a project's language or platform, whether or not those rules will actually be used.

An accurate generator would include only the rules that are useful for the project.

At first glance, here are two ways of doing this.

  • Dirty: take the rules output by existing generators and filter out those that won't have any effect (i.e. the path does not exist);

  • Clean: output rules based on smart detection of dev tools and dependencies.
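The "dirty" approach can be sketched roughly as follows. This is a minimal illustration, not an existing tool: the function name `filter_effective_rules` is made up, negated rules are skipped rather than evaluated, and `Path.rglob` only approximates real gitignore matching.

```python
from pathlib import Path


def filter_effective_rules(rules, project_root):
    """Keep only the rules that match at least one existing path in the project."""
    root = Path(project_root)
    kept = []
    for rule in rules:
        pattern = rule.strip()
        # Skip blanks, comments and negations: this sketch does not evaluate them.
        if not pattern or pattern.startswith("#") or pattern.startswith("!"):
            continue
        # Slashes at either end only anchor the rule or mark a directory; drop them for globbing.
        glob = pattern.strip("/")
        if not glob:
            continue
        try:
            # rglob searches the whole tree, approximating an unanchored gitignore pattern.
            found = next(root.rglob(glob), None) is not None
        except ValueError:
            # pathlib rejects some patterns that gitignore accepts; treat them as no match.
            found = False
        if found:
            kept.append(pattern)
    return kept
```

For a project that contains only `node_modules/` and `app.log`, passing the rules `node_modules/`, `*.log` and `dist` would keep the first two and drop `dist`.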

Relevant Technology

The generator must use a language or platform that can run:

  • on any operating system;

  • alongside any other installed language, platform or dev tool;

  • ideally without any dependency.

The generator would ideally support all languages and platforms that could benefit from it.

Complexity and required time

Complexity

  • Beginner - This project requires no or little prior knowledge of the technolog(y|ies) specified to contribute to the project
  • Intermediate - The user should have some prior knowledge of the technolog(y|ies) to the point where they know how to use it, but not necessarily all the nooks and crannies of the technology
  • Advanced - The project requires the user to have a good understanding of all components of the project to contribute

Required time (ETA)

  • Little work - A couple of days
  • Medium work - A week or two
  • Much work - The project will take more than a couple of weeks and serious planning is required

Categories

  • Mobile app
  • IoT
  • Web app
  • Frontend/UI
  • AI/ML
  • APIs/Backend
  • Voice Assistant
  • Developer Tooling
  • Extension/Plugin/Add-On
  • Design/UX
  • AR/VR
  • Bots
  • Security
  • Blockchain
  • Futuristic Tech/Something Unique
FredrikAugust added the Developer tooling, Intermediate and Medium work labels Oct 7, 2021
@yasinatesim

yasinatesim commented Oct 26, 2021

What is the scope of this task? Is it to build something like the tool in this link?

When I saw this idea, I thought it was a project that automatically creates a .gitignore file via the command line.
For example, fetching the technologies from files in this repository and creating a .gitignore file.

@KaKi87
Author

KaKi87 commented Oct 28, 2021

The projects you're referring to are the ones which I call inaccurate, although a dirty implementation of my proposal could be based on those.

It could be a CLI, and/or an IDE plugin.

@Idrinth

Idrinth commented Oct 31, 2021

I'm not quite sure how that detection should work. For example, detecting an IDE does not mean a specific project is opened with it.
Otherwise interesting.

@yasinatesim

yasinatesim commented Oct 31, 2021

@Idrinth there are solutions that are considered dirty, for example

For the clean solution, it seems to me that this can be solved with project-based plugins. For example, if it is a Next.js project, only the lines related to Next.js should be taken from the fetched Node .gitignore. This should be done for every language and technology, and the plugins should be built through community contributions. 🤔

@KaKi87
Author

KaKi87 commented Nov 1, 2021

detecting an IDE does not mean a specific project is opened with it

Opening a project in an IntelliJ IDE creates a .idea directory at the project's root, and opening a project in Visual Studio Code creates a .vscode directory at the project's root.
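A minimal sketch of this kind of detection, assuming the two marker directories named above. The `TOOL_MARKERS` table and the function name are hypothetical, and a real tool would need many more entries.

```python
from pathlib import Path

# Hypothetical table: marker directory at the project root -> rule to emit.
TOOL_MARKERS = {
    ".idea": ".idea/",
    ".vscode": ".vscode/",
}


def detect_tool_rules(project_root):
    """Return ignore rules for dev tools whose marker directory exists in the project."""
    root = Path(project_root)
    return [rule for marker, rule in TOOL_MARKERS.items() if (root / marker).is_dir()]
```

A project opened only in an editor that leaves no marker directory would simply get an empty list.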

@Idrinth

Idrinth commented Nov 1, 2021

And half the time (OK, often, but maybe not that often) someone has committed that directory and the project is opened with something else, let's say Sublime Text.

@KaKi87
Author

KaKi87 commented Nov 2, 2021

What's the issue then? If the developer uses Sublime Text, nothing related to that choice requires a .gitignore update.

@joshburnsxyz

joshburnsxyz commented Nov 25, 2021

There is a boiled-down concept here that actually makes a lot of sense: have a CLI that communicates with the files in GitHub's template repo. Have the user select a template from a menu, which generates the "dirty" ignore file. After that, simply read the new file line by line and test whether each path exists; if it doesn't, cut that line out of the file. I'd be happy to work on this project if anyone else is keen.

EDIT: Just to extend this idea: that loop would also use something like a lookup table that matches the string to a "handler" function, which would in turn implement the rules, actions or checks to be performed should that string be in the ignore file.
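The loop and lookup table described above might look something like this. It is only a sketch: `HANDLERS`, `has_package_json` and `prune_ignore_lines` are hypothetical names, the default check treats each rule as a literal path (no glob support), and the `node_modules/` handler is an invented example of a custom check.

```python
from pathlib import Path


def path_exists(root, pattern):
    """Default check: treat the rule as a literal path relative to the project root."""
    return (root / pattern.strip("/")).exists()


def has_package_json(root, pattern):
    """Hypothetical custom check: keep the rule whenever the project declares npm dependencies."""
    return (root / "package.json").is_file()


# Lookup table from rule string to handler; anything not listed uses path_exists.
HANDLERS = {"node_modules/": has_package_json}


def prune_ignore_lines(lines, project_root):
    """Drop rule lines whose handler reports that they have no effect."""
    root = Path(project_root)
    kept = []
    for line in lines:
        pattern = line.strip()
        if not pattern or pattern.startswith("#"):
            kept.append(line)  # comments and blank lines pass through untouched
            continue
        handler = HANDLERS.get(pattern, path_exists)
        if handler(root, pattern):
            kept.append(line)
    return kept
```

The table keeps the special cases in one place, so adding a check for another rule means adding one entry rather than another branch in the loop.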

@KaKi87
Author

KaKi87 commented Nov 26, 2021

Suggestion: before asking the user to select which gitignore files to use, filter out of the list those that don't produce any match.

@joshburnsxyz

Suggestion: before asking the user to select which gitignore files to use, filter out of the list those that don't produce any match.

My only concern there is CPU cycles. There would need to be a very efficient algorithm to sort and check potentially millions of lines of text against a recursive scan of the current working directory. It could become a very memory-heavy process very quickly, especially with larger projects such as a Ruby on Rails or JHipster project, which have rather large codebases (many files), because every line in the gitignore needs to be checked against each individual file.

@KaKi87
Author

KaKi87 commented Nov 29, 2021

Not if you put the working directory content in an array first, then check it against the gitignore files.

@joshburnsxyz

Not if you put the working directory content in an array first, then check it against the gitignore files.

That's actually the exact process I had in my head when I wrote that. Remember that for each item in the array (which represents a file in the working directory tree) we need to call a function that either magically has every line from every file in this repo stored and sorted, so we know which matches are relevant to which theme/framework/whatever, or makes an HTTP call to read each of the files and see if we get matches that way. And each match needs to be stored not only as a match, but as a match from Foo.gitignore (for example).

And we need to repeat that for every file in the current project. You end up with either a very large install size because of the misc data we may need, or high bandwidth usage because of the HTTP calls.

If you have a different solution, I'm all ears haha.

@KaKi87
Author

KaKi87 commented Nov 29, 2021

  • The app would clone the gitignore repo to a fixed location outside the project (for example ~/.cache/gitignore) and pull it on each run
  • The app would create an array of rules for each gitignore file, excluding negative rules and subdirectory rules
  • The app would only store the name of a gitignore file in an array when it matches, not the matching rule itself
  • The app would stop checking the rules in a file as soon as one matches, and move on to the next file
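The matching part of these steps could be sketched as below. Several things are assumptions: the cache directory is taken to be an already cloned copy of the templates (the clone and pull step is left out), templates are assumed to be named `*.gitignore`, and rules are compared with `fnmatch` against bare file and directory names, which is looser than real gitignore semantics. All function names are hypothetical.

```python
import fnmatch
import os
from pathlib import Path


def load_rules(template_path):
    """Return a template's simple rules, skipping comments, negations and subdirectory rules."""
    rules = []
    for line in Path(template_path).read_text(encoding="utf-8").splitlines():
        rule = line.strip()
        if not rule or rule.startswith("#") or rule.startswith("!"):
            continue
        rule = rule.rstrip("/")
        if not rule or "/" in rule:  # a remaining slash means a subdirectory or anchored rule
            continue
        rules.append(rule)
    return rules


def list_project_names(project_root):
    """Walk the project once and collect every file and directory name."""
    names = set()
    for _dirpath, dirnames, filenames in os.walk(project_root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # do not descend into .git
        names.update(dirnames)
        names.update(filenames)
    return names


def matching_templates(cache_dir, project_root):
    """Return the names of the templates that have at least one rule matching the project."""
    names = list_project_names(project_root)
    matched = []
    for template in sorted(Path(cache_dir).glob("*.gitignore")):
        # any() stops at the first matching rule; only the template name is stored.
        if any(fnmatch.filter(names, rule) for rule in load_rules(template)):
            matched.append(template.stem)
    return matched
```

Because the project tree is walked only once, the cost per template is the number of its rules times the number of distinct names, and no network access is needed after the cache is populated.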

@Allyedge

Allyedge commented Apr 19, 2022

I actually implemented something similar in Go.

It is just a fun project, so I didn't add many features: it checks for a .gitignore file in the directory where the tool is run and deletes the lines that aren't necessary.

@KaKi87
Author

KaKi87 commented Apr 20, 2022

Well, it's not entirely what I suggested, but it's a step :)

@chenasraf

chenasraf commented May 18, 2022

I have a good initial version working! There might be more to do, but mostly I only need a few more language deterministic patterns before I'm ready to call it v1.

Latest release (GitHub) | Repository (GitHub)

VS Code Extension (VS Code Marketplace) | VS Code Extension Repository (GitHub)

Binary verified working on:

  • macOS
    • M1
    • Intel
  • Windows
    • x386
  • Linux
    • x86_64

How it works

The general flow is this:

  • Populate the cache from gitignore (or update if needed)
  • Template matching:
    • Algorithm 1: Deterministic:
      • Iterate through a map of glob patterns for file types common to each kind of project. If a file glob directly associated with a template is found (e.g. *.{ts,js}x? for Node), that template is selected
    • Algorithm 2: Process of elimination, used if Algorithm 1 did not find matches:
      • Iterate through all languages in the template list, eliminating any glob patterns that are not found in the project. If a language has one or more patterns that DO match, it is selected.
    • In both methods, multiple templates can be matched
  • With the resulting templates:
    • If there is only one candidate, auto select it
    • If there are multiple, trigger multi select
  • Ask if user wants to clean up unused ignore lines
    • The cleanup process is as follows:
      • Iterate over each line, skipping comment lines
      • When a line's pattern matches a file in the project, go back and collect the group of comment lines that precedes it
      • These lines become the output
      • Output all the remaining lines together
  • Prepend a section comment with the language name for each language file in the output (if there is more than one)
  • Ask to overwrite/append/skip file if already exists
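The cleanup step in the flow above can be illustrated with a short sketch. This is not gi_gen's actual code: it assumes that a comment group ends at a blank line or when a new comment follows a rule, and it matches rules with `fnmatch` against a set of bare file and directory names.

```python
import fnmatch


def clean_template(lines, project_names):
    """Keep rules that match the project, each preceded by its group of comment lines."""
    output = []
    comments = []          # comment group currently in effect
    emitted = False        # whether that group has already been written out
    last_was_rule = False
    for line in lines:
        text = line.strip()
        if not text:
            # A blank line ends the current comment group.
            comments, emitted, last_was_rule = [], False, False
        elif text.startswith("#"):
            if last_was_rule:
                # A comment right after a rule starts a new group.
                comments, emitted = [], False
            comments.append(text)
            last_was_rule = False
        else:
            last_was_rule = True
            if fnmatch.filter(project_names, text.strip("/")):
                if not emitted:
                    output.extend(comments)
                    emitted = True
                output.append(text)
    return output
```

Sections whose rules match nothing disappear together with their comments, so the output stays readable without orphaned headings.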

Any suggestions welcome.

@chenasraf

Would love some help with static file language detection: if you know of any project files that are usually consistent per language/project, I would love to hear about them :) For example, package.json for Node, __init__.py for Python, etc.
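As a rough illustration of static detection by marker files, here is a sketch that uses only the two examples mentioned above. The `MARKERS` table and `detect_templates` are hypothetical names, and searching the whole tree is a simplification (a vendored `package.json` inside a dependency directory would also count).

```python
from pathlib import Path

# Hypothetical table: marker file name -> template it implies.
MARKERS = {
    "package.json": "Node",
    "__init__.py": "Python",
}


def detect_templates(project_root):
    """Return the templates whose marker file appears anywhere in the project tree."""
    root = Path(project_root)
    found = []
    for marker, template in MARKERS.items():
        if template not in found and next(root.rglob(marker), None) is not None:
            found.append(template)
    return found
```

Several markers can map to the same template, which is why the result is deduplicated.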

I am keeping track of all the languages I statically check for right here: chenasraf/gi_gen#2

Feel free to add to the list so I can implement 🙏🏼

@chenasraf

chenasraf commented May 26, 2022

Also created a VS Code extension that uses the above program :)

@chenasraf

@KaKi87, I hate to make many comments in a row, but I am really looking forward to my submission being reviewed. If you feel it is appropriate, I would appreciate you closing this issue, or letting me know what can be changed to make that happen.

Thank you for the idea, I loved working on this.

@KaKi87
Author

KaKi87 commented Jun 2, 2022

I ran gi_gen on an existing project using an IntelliJ+Node+Electron stack.

  • Only Node was detected, which means the generated file in overwrite mode will always be incomplete ;
  • Duplicate entries may be generated in append mode ;
  • Universally ignorable files & directories (e.g. dist, build, config.js, etc.) are either detected as language-specific or undetected.

Here's the initial content :

.idea
node_modules
config.js
.parcel-cache
dist
build

Here's the resulting diff in overwrite mode :

diff --git a/.gitignore b/.gitignore
index 38f2865..82925f5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,6 +1,4 @@
-.idea
-node_modules
-config.js
-.parcel-cache
-dist
-build
\ No newline at end of file
+# Dependency directories
+node_modules/
+# Nuxt.js build / generate output
+dist
\ No newline at end of file

Here's the resulting diff in append mode :

diff --git a/.gitignore b/.gitignore
index 38f2865..69c094d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,4 +3,8 @@ node_modules
 config.js
 .parcel-cache
 dist
-build
\ No newline at end of file
+build
+# Dependency directories
+node_modules/
+# Nuxt.js build / generate output
+dist
\ No newline at end of file

Additionally, when using this particular mode, I would suggest not removing matching lines even if no template contains it, or asking the user's permission to do so.

Thank you for your interest in my idea.

@chenasraf

Thanks for the reply @KaKi87 :)

Only Node was detected, which means the generated file in overwrite mode will always be incomplete

Yes, I need more examples of files to test against for more template types, I am keeping track of what's possible right now via this issue. I will add IDEs and more

Duplicate entries may be generated in append mode ;

True, I am not modifying the existing contents of the gitignore file, only the lines that GI Gen generates, which then get appended or replace the file. It's definitely a point to improve upon. I guess we would generally want both the cleanup logic and the dedupe logic to run on the final output file, not only on the generated output before it is appended.
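One way the dedupe step for append mode could work is sketched below. It rests on an assumption worth stating: `node_modules` and `node_modules/` are treated as the same rule, although strictly the second form matches directories only. The function name is hypothetical.

```python
def append_without_duplicates(existing_lines, generated_lines):
    """Append generated rules that the existing file lacks, with their comment lines."""
    def key(line):
        # Assumption: a trailing slash does not make a rule distinct.
        return line.strip().rstrip("/")

    seen = {
        key(line)
        for line in existing_lines
        if line.strip() and not line.strip().startswith("#")
    }
    merged = [line.rstrip("\n") for line in existing_lines]
    pending = []           # comments waiting for a rule that is actually appended
    last_was_rule = False
    for line in generated_lines:
        text = line.strip()
        if not text:
            pending, last_was_rule = [], False
        elif text.startswith("#"):
            if last_was_rule:
                pending = []  # the previous comment group is finished
            pending.append(text)
            last_was_rule = False
        else:
            last_was_rule = True
            if key(text) in seen:
                continue
            merged.extend(pending)
            pending = []
            merged.append(text)
            seen.add(key(text))
    return merged
```

With the file and generated lines from the append-mode diff quoted earlier in this thread, nothing would be appended, since both generated rules are already covered.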

Universally ignorable files & directories (e.g. dist, build, config.js, etc.) are either detected as language-specific or undetected.

Do you have a suggestion on what to do there? Should I ignore some specific examples when matching? A build directory is probably a rule in a lot of templates, I can't think of a way without blacklisting that line specifically

Additionally, when using this particular mode, I would suggest not removing matching lines even if no template contains it, or asking the user's permission to do so.

Is the prompt to clean unused lines not what you mean? Can you elaborate?

@KaKi87
Author

KaKi87 commented Jun 3, 2022

Do you have a suggestion on what to do there?

I asked myself about this before posting, and knew that you'd ask as well; unfortunately, I'm as clueless as you. 😅

Not only do those directories exist in many templates, as you know, but the developer might not even use such a directory for output: they might store build scripts or tools in it and use a differently named directory for generated builds.

Is the prompt to clean unused lines not what you mean?

No, that one works fine.

Can you elaborate?

gi_gen's overwrite mode removed .idea, config.js, .parcel-cache and build from the file because those lines didn't exist in the selected template, although they all matched something in the project.
In that case, the tool should preserve those lines, or ask the user.
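The "preserve" behaviour suggested here might be sketched as follows. This is a hypothetical helper, not part of gi_gen: it keeps every existing rule that the generated output lacks but that still matches a file or directory name in the project, using `fnmatch` on bare names as a stand-in for real gitignore matching.

```python
import fnmatch


def overwrite_preserving(existing_lines, generated_lines, project_names):
    """Return the generated lines plus existing rules that still match the project."""
    def key(line):
        return line.strip().rstrip("/")

    generated_keys = {key(line) for line in generated_lines}
    preserved = []
    for line in existing_lines:
        rule = line.strip()
        if not rule or rule.startswith("#") or key(rule) in generated_keys:
            continue
        # Keep a rule the template lacks only if it matches something in the project.
        if fnmatch.filter(project_names, key(rule)):
            preserved.append(rule)
    return list(generated_lines) + preserved
```

A tool could also show the `preserved` list to the user and ask for confirmation instead of keeping it silently.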


7 participants