Add feature to exclude the same http response #758

Closed
0xAwali opened this issue Sep 8, 2022 · 8 comments · Fixed by #1951

Comments

@0xAwali

0xAwali commented Sep 8, 2022

While scanning a wide scope, many URLs share the same code base and therefore return the same HTTP response. It would be useful for httpx to exclude these URLs by comparing each HTTP response with the others.

@0xAwali 0xAwali added the Type: Enhancement Most issues will probably ask for additions or changes. label Sep 8, 2022
@forgedhallpass forgedhallpass changed the title it's time to add feature to exclude the same http response Add feature to exclude the same http response Sep 29, 2022
@ibitebyt3s

ibitebyt3s commented Oct 13, 2022

The anew tool might be a good fit for this, but only as a standalone program.

I wanted to suggest using its code in this project at first, though I can't remember 100% whether that was the case, since it happened about 2 years ago.

When using httpx with the -tech-detect option, a technology is sometimes not detected on a host where it was detected before, which makes anew useless in this case.

Let's say the first run detected [php, Wordpress] and the second run detected only [php]. Using anew to check for duplicates will still keep duplicate hosts if their detected technologies are different. The same could happen with other CLI options like -ip.
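
To make the problem above concrete, here is a minimal sketch in Python contrasting a whole-line comparison (the behavior described for anew) with a comparison keyed on the host only. The input lines are a simplified, hypothetical rendering of httpx output, and the function names `dedupe_lines` and `dedupe_by_host` are made up for illustration; this is not code from anew or httpx.

```python
from urllib.parse import urlsplit


def dedupe_lines(lines):
    """Keep the first occurrence of each exact line (whole-line comparison)."""
    seen = set()
    kept = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return kept


def dedupe_by_host(lines):
    """Keep the first line seen for each host, ignoring trailing annotations."""
    seen = set()
    kept = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: the first whitespace-separated field is the URL.
        host = urlsplit(fields[0]).netloc.lower()
        if host not in seen:
            seen.add(host)
            kept.append(line)
    return kept


# Hypothetical output of two runs against the same host.
runs = [
    "https://example.com [php,Wordpress]",
    "https://example.com [php]",
]

print(dedupe_lines(runs))    # both lines survive, since the text differs
print(dedupe_by_host(runs))  # only the first line for the host survives
```

Keying on the host removes the duplicate, but it also discards the second run's annotations, so which key to use depends on what the output is needed for.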

@ehsandeep
Member

ehsandeep commented Dec 8, 2022

More ideas - https://twitter.com/har1sec/status/1600445181115109377

When -path is used, make a request to / to get the number of lines and words, and use this as a baseline to filter responses (a quick custom 404 filter).
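
A rough sketch of that baseline idea in Python, assuming the response bodies have already been fetched. The helper names `body_stats` and `filter_paths` and the sample bodies are hypothetical, and this is not how httpx implements anything; it only shows the line/word comparison.

```python
def body_stats(body):
    """Return (line count, word count) for a response body."""
    return len(body.splitlines()), len(body.split())


def filter_paths(baseline_body, responses):
    """Drop responses whose line and word counts match the baseline.

    baseline_body: body returned for "/" (assumed to be fetched already).
    responses: dict mapping a requested path to its response body.
    """
    baseline = body_stats(baseline_body)
    return {
        path: body
        for path, body in responses.items()
        if body_stats(body) != baseline
    }


# Hypothetical bodies: "/admin" returns the same page as "/".
baseline_body = "<html>\n<h1>Not found</h1>\n</html>"
responses = {
    "/admin": "<html>\n<h1>Not found</h1>\n</html>",
    "/login": "<html>\n<form>login form here</form>\n</html>",
}

print(sorted(filter_paths(baseline_body, responses)))
```

Matching on counts alone is cheap but coarse: two different pages can share the same line and word counts, so a real filter would probably want a tolerance or an additional signal.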

@dogancanbakir
Member

How about something based on vision clusters? This could deduplicate the records within a cluster based on a threshold.

@Mzack9999 fyi

@dogancanbakir dogancanbakir self-assigned this Sep 19, 2024
@Mzack9999 Mzack9999 linked a pull request Oct 16, 2024 that will close this issue
@ehsandeep ehsandeep added the Status: Completed Nothing further to be done with this issue. Awaiting to be closed. label Oct 20, 2024
@ehsandeep ehsandeep added this to the httpx v1.6.9 milestone Oct 20, 2024
@Xitro01

Xitro01 commented Nov 26, 2024

Just saw that this is getting included in httpx. The only thing I'm afraid of is that it is going to exclude a lot of potential targets.
How are these duplicates checked? Currently I compare word counts but also check whether it's the same IP; that way the chances of missing something are much slimmer.

@dogancanbakir
Member

@Xitro01 We use simhash to do deduplication, and it's optional. See: https://github.com/projectdiscovery/httpx/pull/1951/files
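
For readers unfamiliar with simhash, the general technique can be sketched as follows in Python. This is a generic, simplified illustration and not the code from the linked pull request: the tokenization, the use of MD5 as the per-token hash, the 64-bit width, and the threshold of 3 are all assumptions chosen for the example.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumption for this sketch)


def simhash(text):
    """Compute a 64-bit simhash fingerprint over the words in text."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(body_a, body_b, threshold=3):
    """Treat two bodies as duplicates if their fingerprints are close.

    The threshold is a hypothetical value, not one taken from httpx.
    """
    return hamming_distance(simhash(body_a), simhash(body_b)) <= threshold


page = "<html><body>Welcome to the default landing page</body></html>"
print(is_near_duplicate(page, page))
```

Similar documents share most tokens, so most bit votes agree and the fingerprints end up a small Hamming distance apart. A lower threshold is stricter and treats fewer responses as duplicates.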

@Xitro01

Xitro01 commented Nov 27, 2024

> @Xitro01 We use simhash to do deduplication, and it's optional. See: https://github.com/projectdiscovery/httpx/pull/1951/files

Thanks for the response. Yes, this is probably going to skip a lot of stuff that might be interesting.
IMO the IP addresses should be compared first, before doing the response/page comparison.
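
One way to read this suggestion, sketched in Python: only treat two results as duplicates when they resolve to the same IP and also return the same body. The record shape (dicts with `url`, `ip`, and `body` keys) and the function name `dedupe_ip_then_body` are hypothetical, and an exact SHA-256 hash stands in for whatever similarity measure is actually used.

```python
import hashlib


def dedupe_ip_then_body(results):
    """Keep one result per (IP, body) pair; the same body on another IP is kept.

    results: iterable of dicts with "url", "ip" and "body" keys
    (a hypothetical record shape for this sketch).
    """
    seen_by_ip = {}  # ip -> set of body hashes already seen on that IP
    kept = []
    for result in results:
        body_hash = hashlib.sha256(result["body"].encode("utf-8")).hexdigest()
        seen = seen_by_ip.setdefault(result["ip"], set())
        if body_hash in seen:
            continue  # same IP and same body: treat as a duplicate
        seen.add(body_hash)
        kept.append(result)
    return kept


# Hypothetical results: a.example and c.example share an IP and a body.
results = [
    {"url": "https://a.example", "ip": "192.0.2.1", "body": "default page"},
    {"url": "https://b.example", "ip": "192.0.2.2", "body": "default page"},
    {"url": "https://c.example", "ip": "192.0.2.1", "body": "default page"},
]

print([r["url"] for r in dedupe_ip_then_body(results)])
```

With this ordering, an identical page served from a different IP is still reported, which addresses the concern about losing potential targets at the cost of keeping more output.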

@dogancanbakir
Member

@Xitro01 Thanks for the suggestion. Would you be willing to create a new issue to share your ideas? It could be a good addition to what we added.

@Xitro01

Xitro01 commented Nov 27, 2024

> @Xitro01 Thanks for the suggestion. Would you be willing to create a new issue to share your ideas? It could be a good addition to what we added.

#2014

5 participants