survey: binary search string results + api + usecases #13
I think showing the offset of the string by section is a nice feature and sets it apart from
I'd also like to see searching executable code via hex strings (not any disassembly, that's probably beyond the scope of this project) with wild cards like
Just a thought for what could make bingrep even more useful. Oh, and counting matches, versus just finding offsets, would be nice too!
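To make the hex-string idea concrete, here is a minimal std-only sketch of wildcard matching that also counts hits. The `??` wildcard token and the function names `parse_hex_pattern` and `find_all` are assumptions made for illustration; none of this is bingrep's actual API.

```rust
/// Parse a pattern such as "e8 ?? ?? c3" into bytes to match.
/// `None` means "any byte". The "??" token is an assumed syntax.
fn parse_hex_pattern(pattern: &str) -> Option<Vec<Option<u8>>> {
    pattern
        .split_whitespace()
        .map(|tok| {
            if tok == "??" {
                Some(None)
            } else if tok.len() == 2 && tok.bytes().all(|b| b.is_ascii_hexdigit()) {
                u8::from_str_radix(tok, 16).ok().map(Some)
            } else {
                None // malformed token, reject the whole pattern
            }
        })
        .collect()
}

/// Return every offset at which `pattern` matches `haystack`.
fn find_all(haystack: &[u8], pattern: &[Option<u8>]) -> Vec<usize> {
    // `windows(0)` panics, so handle the empty pattern up front.
    if pattern.is_empty() || pattern.len() > haystack.len() {
        return Vec::new();
    }
    haystack
        .windows(pattern.len())
        .enumerate()
        .filter(|(_, window)| {
            window
                .iter()
                .zip(pattern)
                .all(|(byte, want)| want.map_or(true, |w| w == *byte))
        })
        .map(|(offset, _)| offset)
        .collect()
}

fn main() {
    let data: [u8; 9] = [0x90, 0xe8, 0x01, 0x02, 0xc3, 0xe8, 0xff, 0x00, 0xc3];
    let pattern = parse_hex_pattern("e8 ?? ?? c3").expect("valid pattern");
    let offsets = find_all(&data, &pattern);
    // The match count falls out of the offset list for free.
    println!("{} matches at {:x?}", offsets.len(), offsets);
}
```

This is a naive scan that compares every window; it is fine for small binaries, but a real implementation would probably want something faster.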
What would be nice, is if the search option also displays the context around a match. Searching for "chr" would show 'strrchr' and 'strchr' as match:
In case of grepping for a word in a long string, it would be useful to get the full line of text.
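One way the context display could work, sketched with std only: once a match is found, widen it to the enclosing run of printable bytes, so a search for "chr" reports the whole string it sits in. The names `is_printable`, `enclosing_string` and `find_with_context` are hypothetical, and "printable ASCII" as the definition of a string is an assumption.

```rust
/// Assumption: a "string" is a run of printable ASCII (space through '~').
fn is_printable(b: u8) -> bool {
    (0x20..=0x7e).contains(&b)
}

/// Widen the match at `start..start + len` to the enclosing printable run.
/// Requires `start + len <= data.len()`.
fn enclosing_string(data: &[u8], start: usize, len: usize) -> &[u8] {
    let mut lo = start;
    while lo > 0 && is_printable(data[lo - 1]) {
        lo -= 1;
    }
    let mut hi = start + len;
    while hi < data.len() && is_printable(data[hi]) {
        hi += 1;
    }
    &data[lo..hi]
}

/// Find every occurrence of `needle` and pair its offset with its context.
fn find_with_context<'a>(data: &'a [u8], needle: &[u8]) -> Vec<(usize, &'a [u8])> {
    if needle.is_empty() || needle.len() > data.len() {
        return Vec::new();
    }
    data.windows(needle.len())
        .enumerate()
        .filter(|(_, w)| *w == needle)
        .map(|(off, _)| (off, enclosing_string(data, off, needle.len())))
        .collect()
}

fn main() {
    // A toy string table: NUL-separated names.
    let data = b"\0strrchr\0strchr\0puts\0";
    for (off, ctx) in find_with_context(data, b"chr") {
        println!("{:#x}: {}", off, String::from_utf8_lossy(ctx));
    }
}
```

For the long-string case, the same widening step yields the full line of text, as long as the line contains only printable bytes.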
I had a usecase for the context displaying search option recently. By using the regex crate, it should be possible to do case insensitive matching and searching for hex strings:
Hi @ghuls, sorry for the delay! I like your idea of using the regex crate for better searching. Unfortunately I won't have the time to implement this, but if you felt inclined to submit a PR implementing this functionality I would be very likely to merge.
Prolegomena
Hullo!
So I've begun adding search functionality, and I already find it very useful. In particular, this is a usecase I find myself having:
There are a number of issues at hand here. It's in the beginning stages, so I'd like to ask for everyone's (anyone's) input about:
Again, there's a lot going on here, so I'll open up with a particular example which addresses the uses I usually have, but I'd really like to know what other people want!
Grepping for a static string
I'm debugging/analyzing a binary. I want to see if "hello world" is somewhere in the binary. So I run:
I want to know a few things:
It might look something like this:
Which is trying to say that hello was found at offset 0x724 in the binary; it is normalized to 0x724 in the PT_LOAD program header (for ELF) and to the .rodata section in the section headers, and here is that section header.
Similarly, it was also found in a strtab section header, where it normalizes to offset 0x9f starting from 0x1668.
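A rough sketch of what "normalizing" an offset could mean: find the region (program header or section) that contains the file offset and report the offset relative to that region's start. The `Region` type and `normalize` function are hypothetical, and the region sizes and the `.rodata` start below are made-up values; only 0x724, 0x9f and 0x1668 come from the example above.

```rust
/// A contiguous range of the file, e.g. a section or a PT_LOAD segment.
#[derive(Debug, PartialEq)]
struct Region {
    name: &'static str,
    offset: u64, // file offset where the region starts
    size: u64,   // length in bytes
}

/// Find the first region containing `file_offset` and return it together
/// with the offset relative to the region start.
fn normalize(regions: &[Region], file_offset: u64) -> Option<(&Region, u64)> {
    regions
        .iter()
        .find(|r| file_offset >= r.offset && file_offset - r.offset < r.size)
        .map(|r| (r, file_offset - r.offset))
}

fn main() {
    // Sizes and the .rodata start are illustrative assumptions.
    let segments = [Region { name: "PT_LOAD", offset: 0x0, size: 0x1000 }];
    let sections = [
        Region { name: ".rodata", offset: 0x700, size: 0x100 },
        Region { name: ".strtab", offset: 0x1668, size: 0x200 },
    ];

    // A hit at file offset 0x724 stays 0x724 relative to a segment at 0.
    println!("{:x?}", normalize(&segments, 0x724));
    // A hit at 0x1668 + 0x9f = 0x1707 lands 0x9f bytes into .strtab.
    println!("{:x?}", normalize(&sections, 0x1707));
}
```

Running the same match offset against both the program header table and the section header table gives the two views described above.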
Grepping for a symbol
Similarly, suppose we're looking simply for whether puts is called by our binary, and if so, what are the details of the symbol, and if possible, where is it called. Perhaps using the same api, we search for:
and this returns to us a couple of hits, which are semantically quite different:
Goal
What I'd like in both of these cases, if possible, is a unified api for querying the contents of a binary for a search string, and very importantly:
I don't want it to be busy; I want similar color coding techniques to highlight the information I need; and I want the output to be semantically relevant, e.g., the search string is used against symbol names in the symbol table, etc.
Ideally, this is presented finally to the user as some kind of tabular structure, or a summary of a group of tabular structures, each tailored to the semantic content the string matched against, perhaps in different categories, like:
etc., for any various number of different kinds of matches, and categories.
Implementation Details
I'm not a big text search aficionado, so if anyone wants to help with the actual search string api, e.g., regexes, case insensitive, etc., as well as efficiency concerns, that would be great - I'm all ears - or in the case of PRs, very grateful!
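As a starting point for the case-insensitive part, here is a small std-only sketch. It only folds ASCII case and scans naively, so it says nothing about the efficiency question; the function name `find_ignore_ascii_case` is hypothetical, and a regex-based search would be a separate design.

```rust
/// Return every offset where `needle` occurs in `haystack`, ignoring
/// ASCII case. Non-ASCII bytes must match exactly.
fn find_ignore_ascii_case(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    // `windows(0)` panics, so handle the empty needle up front.
    if needle.is_empty() || needle.len() > haystack.len() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        .filter(|(_, w)| w.eq_ignore_ascii_case(needle))
        .map(|(off, _)| off)
        .collect()
}

fn main() {
    let data = b"Hello world\0HELLO\0hello";
    let offsets = find_ignore_ascii_case(data, b"hello");
    println!("{} matches at {:x?}", offsets.len(), offsets);
}
```

Working on raw bytes rather than `str` matters here, since a binary is generally not valid UTF-8.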
Conclusion
If you have a usecase, or an idea of how to present this information usefully, I'm interested in your feedback.
The master branch right now contains a very, very prototypical implementation invoked via:
It is case sensitive, but also accidentally works with prefixes.
It currently dumps the regular print, then scans the binary and pushes all matches, then normalizes the string against the program and section headers. I've started experimenting with other "semantic" output, and there's definitely a lot of potential, hence this issue :)
Output is like: