Git-dumper doesn't work in some cases when the git output has an HTML content type #25

Open
DEMON1A opened this issue May 15, 2021 · 8 comments

@DEMON1A

DEMON1A commented May 15, 2021

  • I found a public git folder on a website, but while using git-dumper to dump the code from that git folder I got these errors:
[-] Testing https://example.com/.git/HEAD [200]
[-] https://example.com//.git/HEAD responded with HTML
  • I checked the website manually and I can clearly see that the git folder's content is leaked, but git-dumper refuses to dump it because the data coming from it is served with an HTML content type. This prevents git-dumper from dumping the repository in some cases.
@arthaud
Owner

arthaud commented May 15, 2021

I think I was originally only checking whether the content contains "<html>", but people had issues with that; see #13.
@DashLt, do you know what the issue with the original check was?
In the meantime, you can replace line 33 of git_dumper.py with return False.

@DEMON1A
Author

DEMON1A commented May 15, 2021

Yeah, I had already edited that line, but the issue was still there. Then I noticed a second layer of validation on line 73 that does the same thing as line 33. I edited it too, and now it's working for me.

@DashLt

DashLt commented May 15, 2021

Not every site has a <html> tag verbatim. Many have attributes inside the tag, e.g.:

<html class="rwd geo-override no-js vis no-rtl headerfooter-menu3 " lang="en">

It's weird that the web server behind the site you're attacking isn't using the application/octet-stream content type, but such servers exist, so it's definitely an edge case that has to be handled. As a quick and dirty fix you could check for the existence of "<html", but even then that tag isn't necessarily present, since HTML doesn't require it. I think some sort of HEAD file validation may be in order.
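One possible reading of "HEAD file validation" is a strict format check on .git/HEAD, which normally holds either a symbolic ref such as "ref: refs/heads/master" or a detached commit id. Below is a minimal sketch under those assumptions; looks_like_git_head and the 256-byte cap are hypothetical and not part of git-dumper.

```python
import re

# Hypothetical cap: a real HEAD file is a single short line, so anything
# larger is rejected before any regex runs on it.
MAX_HEAD_BYTES = 256

# Either a symbolic ref ("ref: refs/...") or a detached object id
# (40 hex digits for SHA-1 repositories, 64 for SHA-256 ones).
_HEAD_RE = re.compile(r"ref: refs/\S+|[0-9a-f]{40}|[0-9a-f]{64}")


def looks_like_git_head(content: bytes) -> bool:
    """Return True if `content` plausibly is a .git/HEAD file (sketch)."""
    if len(content) > MAX_HEAD_BYTES:
        return False
    try:
        text = content.decode("ascii")
    except UnicodeDecodeError:
        return False
    # fullmatch ensures nothing else (e.g. HTML markup) surrounds the value
    return _HEAD_RE.fullmatch(text.strip()) is not None


if __name__ == "__main__":
    print(looks_like_git_head(b"ref: refs/heads/master\n"))  # True
    print(looks_like_git_head(b'<html lang="en"><body>...</body></html>'))  # False
```

Because the input is capped and the pattern must match the whole stripped file, the check stays cheap and ignores the Content-Type header entirely.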

@arthaud
Owner

arthaud commented May 16, 2021

That's also my conclusion. We would need a reference syntax checker, or we could just skip the verification on that file and fail later when we parse the object files (which must be zlib-compressed, so that rules out HTML).
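The "fail later" option relies on loose objects under .git/objects/ being zlib-compressed and, once inflated, starting with a header of the form "<type> <size>" followed by a NUL byte. A minimal sketch of such a check, assuming loose (not packed) objects; looks_like_loose_object is a hypothetical helper name.

```python
import zlib

# Object types git stores as loose objects.
_OBJECT_TYPES = (b"blob", b"tree", b"commit", b"tag")


def looks_like_loose_object(data: bytes) -> bool:
    """Return True if `data` inflates to a plausible loose-object header (sketch)."""
    try:
        # Produce at most 64 bytes of output: enough for "<type> <size>\0".
        head = zlib.decompressobj().decompress(data, 64)
    except zlib.error:
        # HTML or other plain text fails zlib's header check here
        return False
    nul = head.find(b"\x00")
    if nul == -1:
        return False
    parts = head[:nul].split(b" ")
    return len(parts) == 2 and parts[0] in _OBJECT_TYPES and parts[1].isdigit()


if __name__ == "__main__":
    print(looks_like_loose_object(zlib.compress(b"blob 5\x00hello")))  # True
    print(looks_like_loose_object(b"<html><body>Not Found</body></html>"))  # False
```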

@DEMON1A
Author

DEMON1A commented May 16, 2021

> Not every site has a <html> tag verbatim. Many have attributes inside the tag, e.g.:

You can solve this with a regex. Pattern: \<html(|.*)\>
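For concreteness, this is roughly how the suggested pattern would be applied in Python; contains_html_tag is a hypothetical name. The pattern is case-sensitive, and because `.*` does not cross newlines it only matches when a `>` follows `<html` on the same line.

```python
import re

# The pattern proposed in this thread: "<html", optionally followed by
# anything on the same line, then ">".
HTML_TAG = re.compile(r"\<html(|.*)\>")


def contains_html_tag(text: str) -> bool:
    """Return True if the pattern finds an <html ...> tag anywhere in `text`."""
    return HTML_TAG.search(text) is not None


if __name__ == "__main__":
    print(contains_html_tag("<html>"))  # True
    print(contains_html_tag('<html class="rwd geo-override no-js" lang="en">'))  # True
    print(contains_html_tag("ref: refs/heads/master"))  # False
    print(contains_html_tag("<HTML>"))  # False: the pattern is case-sensitive
```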

@DEMON1A
Author

DEMON1A commented May 16, 2021

If you're going to accept the regex solution, I can make the fixes in a PR if you'd like.

@DashLt

DashLt commented May 16, 2021

> You can solve this with a regex. Pattern: \<html(|.*)\>

https://stackoverflow.com/a/1732454

(In all seriousness, running a regex that matches that much could cause serious slowdowns on pages that can easily reach hundreds of KB or even several MB. A server could also send git-dumper a very large page and make it hang. In general, it's just a very hacky solution.)
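One way to keep the quick-and-dirty "<html" check cheap regardless of page size is to inspect only a fixed-size prefix with plain substring tests instead of a regex. A minimal sketch; looks_like_html and the 1024-byte limit are assumptions rather than git-dumper code, and a page whose markup starts after the limit would be missed.

```python
# Hypothetical limit: HTML pages normally declare <!DOCTYPE html> or <html>
# near the top, so only the first KB is inspected.
SNIFF_LIMIT = 1024


def looks_like_html(body: bytes, limit: int = SNIFF_LIMIT) -> bool:
    """Heuristic HTML sniff over a bounded prefix of the response body (sketch)."""
    # Slicing caps the work at `limit` bytes no matter how large `body` is.
    prefix = body[:limit].lower()
    # The <html> tag itself is optional in HTML, so also accept a doctype.
    return b"<html" in prefix or b"<!doctype html" in prefix


if __name__ == "__main__":
    print(looks_like_html(b'<html class="rwd" lang="en"><body></body></html>'))  # True
    print(looks_like_html(b"<!DOCTYPE html><title>404</title>"))  # True
    print(looks_like_html(b"ref: refs/heads/master\n"))  # False
```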

@DEMON1A
Author

DEMON1A commented May 16, 2021

You seem to be right, but I guess in this case we don't really need the HTML content-type validation if we already know the response contains content from the .git folder. For example, checking for a known string in /.git/config would be enough to keep fetching everything else without caring about the content type.
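The /.git/config idea could look something like the sketch below: if the fetched file contains a [core] section with a repositoryformatversion key (which git init normally writes), the server could be trusted and later responses fetched without the content-type check. Only the content check is shown; looks_like_git_config and the 64 KB cap are hypothetical.

```python
import re

# Hypothetical cap: a repository's .git/config is normally a few hundred bytes.
MAX_CONFIG_BYTES = 64 * 1024

# git treats section and key names case-insensitively; [ \t\r]* keeps each
# match on a single line and tolerates CRLF line endings.
_CORE_SECTION = re.compile(r"^[ \t]*\[core\][ \t\r]*$", re.IGNORECASE | re.MULTILINE)
_FORMAT_VERSION = re.compile(
    r"^[ \t]*repositoryformatversion[ \t]*=[ \t]*\d+[ \t\r]*$",
    re.IGNORECASE | re.MULTILINE,
)


def looks_like_git_config(body: bytes) -> bool:
    """Return True if `body` looks like a .git/config file (sketch)."""
    text = body[:MAX_CONFIG_BYTES].decode("utf-8", errors="replace")
    return bool(_CORE_SECTION.search(text)) and bool(_FORMAT_VERSION.search(text))


if __name__ == "__main__":
    sample = b"[core]\n\trepositoryformatversion = 0\n\tfilemode = true\n\tbare = false\n"
    print(looks_like_git_config(sample))  # True
    print(looks_like_git_config(b"<html><body>Index of /.git</body></html>"))  # False
```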
