improved regexes #52

Merged
merged 6 commits into from
Mar 21, 2018
82 changes: 58 additions & 24 deletions README.md
@@ -18,7 +18,34 @@ For the impatient ones, grab the download on the [releases page](https://github.
*: note that currently only apk files are supported, but ipa files will follow very shortly.
</p>

An example report can be found here: [example report](resources/example-report.zip)
An example report can be found here: [example report](https://github.com/vincentcox/StaCoAn/blob/master/resources/example-report.zip)

## Table of Contents
<!-- TOC depthFrom:2 depthTo:6 withLinks:1 updateOnSave:1 orderedList:0 -->

- [Table of Contents](#table-of-contents)
- [Features](#features)
- [Looting concept](#looting-concept)
- [Wordlists](#wordlists)
- [Filetypes](#filetypes)
- [Responsive Design](#responsive-design)
- [Limitations](#limitations)
- [Getting Started](#getting-started)
- [From the releases](#from-the-releases)
- [Docker](#docker)
- [From source](#from-source)
- [Building the executable](#building-the-executable)
- [Windows](#windows)
- [mac](#mac)
- [Linux](#linux)
- [Contributing](#contributing)
- [Roadmap](#roadmap)
- [Authors & Contributors](#authors-contributors)
- [Top contributors](#top-contributors)
- [License](#license)
- [Acknowledgments](#acknowledgments)

<!-- /TOC -->

## Features
The concept is that you drag and drop your mobile application file (an .apk or .ipa file) on the StaCoAn application and it will generate a visual and portable report for you. You can tweak the settings and wordlists to get a customized experience.
@@ -48,7 +75,7 @@ In the `exclusion_list.txt` you can define exclusions (if you have for some reas
```

### Filetypes
Any source file will be processed. This contains '.java', '.js', '.html', '.xml',... files.
Any source file will be processed. This contains `'.java', '.js', '.html', '.xml',...` files.

Database-files are also searched for keywords. The database also has a table viewer.

@@ -63,6 +90,7 @@ The reports are made to fit on all screens.
This tool will have trouble with [obfuscated](https://en.wikibooks.org/wiki/Introduction_to_Software_Engineering/Tools/Obfuscation) code. If you are a developer try to compile without obfuscation turned on before running this tool. If you are on the offensive side, good luck bro.

## Getting Started
### From the releases
If you want to get started as soon as possible, head over to the [releases page](https://github.com/vincentcox/StaCoAn/releases) and download the executable or archive which corresponds to your operating system.

If you have downloaded the release zip file, extract this.
@@ -71,13 +99,34 @@ On Windows you can just double click the executable. It will open in server mode

![Windows 1 click](resources/windows-1-click.gif)

On Mac and Linux you can just run it from the terminal without arguments.

On Mac and Linux you can just run it from the terminal without arguments for the server-mode.
```
./stacoan
```
Drag and drop this file onto the executable.

Or you can specify an apk-file to run it without the server-mode:
```
./stacoan -p test-apk.apk
```
The report will be put inside a folder with a name corresponding to the apk.

### Docker

```
cd docker
```

Drag and drop this file onto the executable. The report will now be generated in the `report` folder.
```
docker build . -t stacoan
```
_Make sure that your application is at the location `/yourappsfolder`._

```
docker run -e JAVA_OPTS="-Xms2048m -Xmx2048m" -p 8000:8000 -p 8080:8080 -i -t stacoan
```

Drag and drop your application via: http://127.0.0.1:8000.

### From source
```
@@ -162,23 +211,6 @@ Build stacoan:
python3 -m PyInstaller stacoan.py --onefile --icon icon.ico --name stacoan --clean
```

### Running the Docker container

```
cd docker
```

```
docker build . -t stacoan
```
_Make sure that your application is at the location `/yourappsfolder`._

```
docker run -e JAVA_OPTS="-Xms2048m -Xmx2048m" -p 8000:8000 -p 8080:8080 -i -t stacoan
```

Drag and drop your application via: http://127.0.0.1:8000.

## Contributing
This entire program's value is depending on the wordlists it is using. In the end, the final result is what matters. It is easy to build a wordlist (in comparison to writing actual code), but it has the biggest impact on the end result. You can help the community the most with making wordlists.

@@ -191,7 +223,9 @@ If the contribution is high enough, you will be mentioned in the `authors` secti
### Roadmap
- [ ] Make IPA files also work with this program
- [ ] Make DB matches loot-able
- [x] Use server to upload files (apk's, ipa's) and process them (https://gist.github.com/touilleMan/eb02ea40b93e52604938)
- [x] Better logging (cross platform)
- [x] Docker optimalisation
- [x] Use server to upload files (apk's, ipa's) and process them
- [x] Exception list for ignoring findings in certain folders. For example ignoring `http` in `res/layout` and in general `http://schemas.android.com/apk/res/android`
- [x] Make a cleaner file structure of this project

@@ -255,4 +289,4 @@ Also have a look at his course ["Advanced Android and iOS Hands-on Exploitation"
* [c4b3rw0lf](https://twitter.com/c4b3rw0lf): The awesome dude behind the [VulnOS series](https://www.vulnhub.com/series/vulnos,36/).
* [MacJu89](https://twitter.com/MacJu89): infra & XSS senpai

Many more should be listed here, but this readme file would be TL;DR which is the worst what can happen to a readme file.
Many more should be listed here, but I can't list them all.
20 changes: 6 additions & 14 deletions src/config/db_search_words.txt
@@ -1,18 +1,10 @@
password|||10|||triggers unwanted classes like password reset, hence the low score
privatekey|||80
private_key|||80
apikey|||75
http:|||10
https:|||7
database_secret|||80
database_password|||80
databasepassword|||80
databasesecret|||80
(https|http):\/\/.*api.*|||60||| This regex matches any URL containing 'api'
(https|http):\/\/.*test.*|||60||| This regex matches any URL containing 'test'
(https|http):\/\/.*uat.*|||60||| This regex matches any URL containing 'uat'
passw(d|ord)?|||10|||triggers unwanted classes like password reset, hence the low score

Add support for pwd? Handle camel case to detect variables that might hold interesting information at some point? Is passw on its own actually used in applications, enough to justify the ? quantifier?

(P|p)(wd|assw(d|ord))

Maybe the case is handled, didn't check in the code.

Owner

Thanks for your comment. In an hour I will push a new commit to the development branch, which also includes updated wordlists.

"Maybe the case is handled, didn't check in the code." -> The code checks all regexes case-insensitively.
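
To illustrate the point about case handling, here is a minimal sketch that applies the suggested `p(wd|assw(d|ord)?)` pattern case-insensitively. It assumes Python's standard `re` module; the helper name `find_password_like` and the sample string are hypothetical and not taken from the StaCoAn code base.

```python
import re

# Pattern suggested in the review, compiled with IGNORECASE to mirror the
# statement that all regexes are checked case-insensitively (assumption:
# the tool relies on Python's re module for this).
PASSWORD_PATTERN = re.compile(r"p(wd|assw(d|ord)?)", re.IGNORECASE)


def find_password_like(text: str) -> list[str]:
    """Return every substring of `text` matched by PASSWORD_PATTERN."""
    return [m.group(0) for m in PASSWORD_PATTERN.finditer(text)]


# Hypothetical snippet of decompiled source code.
samples = "String userPwd; String PASSWORD; String passwd; String passw;"
matches = find_password_like(samples)
print(matches)
```

With the flag set, camel-case names such as `userPwd` are caught without spelling out `(P|p)` in the pattern.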

(private|secret|api|aws)[_-]?key|||80
https?:|||7
(db|database)[_-]?(passw(d|ord)?|secret)|||80

Same note on pwd:

(db|database)[_-]?(p(wd|assw(d|ord)?)|secret)

https?:\/\/.*(uat|test|api).*|||60||| This regex matches any URL containing 'api|uat|test'
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$|||40||| Matching IP adresses
^[a-f0-9]{32}$|||70||| MD5 hash
\b([a-f0-9]{40})\b|||70||| SHA1 hash
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$|||70||| base64 string
Authorization: Basic|||95||| Basic authentication
Authorization: Basic|||95||| Basic authentication
4 changes: 2 additions & 2 deletions src/config/exclusion_list.txt
@@ -1,3 +1,3 @@
http:|||"res","layout"||| Suggested by Adi
(https|http):\/\/.*api.*|||"res","layout"||| Suggested by Adi
http:\/\/schemas\.android\.com\/apk\/res\/android||||||
https?:\/\/.*api.*|||"res","layout"||| Suggested by Adi
http:\/\/schemas\.android\.com\/apk\/res\/android||||||
20 changes: 6 additions & 14 deletions src/config/src_search_words.txt
@@ -1,20 +1,12 @@
password|||10|||triggers unwanted classes like password reset, hence the low score
privatekey|||80
private_key|||80
apikey|||75
http:|||10
https:|||7
database_secret|||80
database_password|||80
databasepassword|||80
databasesecret|||80
(https|http):\/\/.*api.*|||60||| This regex matches any URL containing 'api'
(https|http):\/\/.*test.*|||60||| This regex matches any URL containing 'test'
(https|http):\/\/.*uat.*|||60||| This regex matches any URL containing 'uat'
passw(d|ord)?|||10|||triggers unwanted classes like password reset, hence the low score

pwd again?

p(wd|assw(d|ord)?)

(private|secret|api|aws)[_-]?key|||80
https?:|||7
(db|database)[_-]?(passw(d|ord)?|secret)|||80
https?:\/\/.*(uat|test|api).*|||60||| This regex matches any URL containing 'api|uat|test'
^((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$|||40||| Matching IP adresses
^[a-f0-9]{32}$|||70||| MD5 hash
\b([a-f0-9]{40})\b|||70||| SHA1 hash
^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=|[A-Za-z0-9+/]{4})$|||70||| base64 string
Authorization: Basic|||95||| Basic authentication
SELECT \* FROM|||40||| Intersting SQL transaction
INSERT INTO .* VALUES|||40||| Intersting SQL transaction
INSERT INTO .* VALUES|||40||| Intersting SQL transaction
@Ayowel Ayowel Mar 24, 2018

Caution with spaces: you might also want to handle newlines and tabs (and non-breaking spaces, and all the annoying stuff that can be put in there).
You might also want to ensure that case is not taken into account, as SQL does not consider it either.

INSERT +INTO +.* +VALUES

Maybe also try to detect cases where an SQL query is built by inserting variables directly into the string, as that also provides an access point (there WILL be problems with false positives, newline/tab handling, ...; there is only so much one can do with a type 3 grammar, and this was quickly thrown together and is by no means a proper regex for this usage):

(SELECT|UPDATE|CREATE|CALL|PREPARE) +.*['"] *\+

I don't see why there would be update or create in a client, but you never know.
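
The whitespace concern can be shown with a short sketch comparing the committed pattern, which uses literal single spaces, against a variant using `\s+`. It assumes Python's `re` module; the two sample queries are made up for illustration, and `\s+` is used here instead of the ` +` from the comment so that tabs are covered as well.

```python
import re

# Committed pattern: exactly one literal space between keywords.
LITERAL_SPACES = re.compile(r"INSERT INTO .* VALUES", re.IGNORECASE)
# Variant tolerating any run of whitespace (spaces, tabs, newlines).
FLEXIBLE_SPACES = re.compile(r"INSERT\s+INTO\s+.*\s+VALUES", re.IGNORECASE)

# Hypothetical query strings as they might appear in decompiled code.
queries = [
    "INSERT INTO users (name) VALUES (?)",
    "insert  into users (name)\tvalues (?)",  # double space and a tab
]

literal_hits = [bool(LITERAL_SPACES.search(q)) for q in queries]
flexible_hits = [bool(FLEXIBLE_SPACES.search(q)) for q in queries]
print(literal_hits, flexible_hits)
```

The second query is missed by the literal-space pattern and found by the flexible one.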

Owner

Good point!

So your suggestion is to use (SELECT|UPDATE|CREATE|CALL|PREPARE) +.*['"] *\+ ?

Owner

I will go for this one: https://regex101.com/r/cYSPbB/1

Many thanks for your idea, it drove me in the right direction!

Owner

Commit is pushed. I will discuss the "pwd" keyword at work, I'm also unsure if it would add any value.

@Ayowel Ayowel Mar 25, 2018

The code checks all regex's case insensitive.

great

I will discuss the "pwd" keyword at work, I'm also unsure if it would add any value.

Not the kind of keyword that would produce a lot of noise, and pwd is not all that rare a var name to hold sensitive data.

Mind if I ask where/what is your work ?

I will go for this one: https://regex101.com/r/cYSPbB/1

Not sure which field you're talking about. Is it this one ?

(SELECT\s[\w\*\)\(\,\s]+\sFROM\s[\w]+)| (UPDATE\s[\w]+\sSET\s[\w\,\'\=]+)| (INSERT\sINTO\s[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+)| (DELETE\sFROM\s[\d\w\'\=]+)

In this case, you might want to put them all on their own lines; chaining them like that makes it hard to read and doesn't make it faster (and that will avoid the random spaces after the '|' 😏)

SELECT\s[\w\*\)\(\,\s]+\sFROM\s[\w]+
UPDATE\s[\w]+\sSET\s[\w\,\'\=]+
INSERT\sINTO\s[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+
DELETE\sFROM\s[\d\w\'\=]+

And as I said, there might be more than one space (typo, newline and tabs for readability, ...), so you actually want to have all your \s with a +:

SELECT\s+[\w\*\)\(\,\s]+\s+FROM\s+[\w]+
UPDATE\s+[\w]+\s+SET\s[\w\,\'\=]+
INSERT\s+INTO\s+[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+
DELETE\s+FROM\s+[\d\w\'\=]+

Plus, it is not uncommon to name tables and columns with characters such as _ (which \w does cover) or others that your current character classes do not support. Overall, it is possible to use pretty much any character in a table name (which does not mean it is recommended, but that's another point), so using .* or .+ instead of a complex rule would probably help (and considering that a . usually does not match newlines, you might want to write (.|\n)+):

SELECT\s+(.|\n)+\s+FROM\s+[\w]+
UPDATE\s+.+\s+SET\s[\w\,\'\=]+
INSERT\s+INTO\s+[\d\w]+[\s\w\d\)\(\,]*\sVALUES\s\([\d\w\'\,\)]+
DELETE\s+FROM\s+[\d\w\'\=]+
  • Note: [.] matches . and not a, no need to write [\.]
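
The newline behaviour of `.` can be checked with a small sketch. It assumes Python's `re` module; the multi-line query is invented for illustration. Besides the `(.|\n)+` form from the comment, Python also offers the `re.DOTALL` flag, which is shown as an alternative rather than as what StaCoAn does.

```python
import re

# Hypothetical query formatted across several lines.
query = "SELECT id,\n       name\nFROM users"

# Plain '.' stops at newlines, so the column list cannot be spanned.
single_line = re.search(r"SELECT\s+.+\s+FROM\s+\w+", query)
# The alternation from the comment lets the group cross line breaks.
alternation = re.search(r"SELECT\s+(.|\n)+\s+FROM\s+\w+", query)
# Same effect using the DOTALL flag, where '.' also matches '\n'.
dotall = re.search(r"SELECT\s+.+\s+FROM\s+\w+", query, re.DOTALL)

print(single_line, alternation is not None, dotall is not None)
```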

Overall, there is no need to be too selective with those regexes: the simple fact that the requests are made from a client is already proof that there is a problem. Your goal with those is only to know:

  1. Which requests can be done:
    • If I know that I can make a SELECT, I can perform extraction
    • If I know that I can make an INSERT, I can corrupt the DB
    • If I know that I can make an UPDATE/DELETE/DROP, I hope that you've saved your DB recently
    • and so on...
  2. If an SQL injection is possible in a field as a regular user

So what you want to know is just where those are and their content. I'd personally go for:

SELECT\s(.|\s)+\sFROM\s.+
UPDATE\s+(.|\s)+\s+SET\s.+
INSERT\s+INTO\s+(.|\s)*\sVALUES\s.+
DELETE\s+FROM\s.+

To me, false negatives are much worse than false positives in this situation. Depends on what you want, but knowing that requests are performed from the client's side means that I have a (maybe limited) direct access to the database at some point. It's huge.

Handling multiple lines is going to make the pattern matching very slow, so you might want to group all those in a single item to lower the number of tests. That will increase the number of false positives, though, so check that the regexes attempt to match the shortest pattern first, or it's going to be a mess:

(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)?

Owner

Mind if I ask where/what is your work ?

I currently work for Zionsecurity. My professional profile can be found here: https://vincentcox.com/linkedin.

In this case, you might want to put them all on their own lines; chaining them like that makes it hard to read and doesn't make it faster (and that will avoid the random spaces after the '|' 😏)

I'm afraid that the parser reads line by line, which would cause this to break. I do agree that the readability of the current search regex is bad.

So what you want to know is just where those are and their content. I'd personally go for:
...

Will add it to the roadmap so I don't forget:

(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)?

That would be the final regex, am I correct?

To me, false negatives are much worse than false positives in this situation. Depends on what you want, but knowing that requests are performed from the client's side means that I have a (maybe limited) direct access to the database at some point. It's huge.

You are 100% right, I do agree with you!

Thanks for taking (a lot of) time to write your response and motivate your suggestions. I will add you to the top contributors, I really appreciate it!


I'm afraid that the parser reads line by line, which would cause this to break. I do agree that the readability of the current search regex is bad.

Well, I was actually talking about turning the regex into 4 independent regexes, so that should be fine.
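
To show why independent regexes are compatible with a line-by-line reader, here is a rough sketch of parsing wordlist entries in the `regex|||score|||description` layout seen in the diff. This is not StaCoAn's actual parser: `SearchWord`, `parse_wordlist_line`, and the sample entries (including their scores and descriptions) are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SearchWord(NamedTuple):
    pattern: re.Pattern
    score: int
    description: str


def parse_wordlist_line(line: str) -> Optional[SearchWord]:
    """Parse one 'regex|||score|||description' entry; None for blank lines."""
    line = line.rstrip("\n")
    if not line.strip():
        return None
    fields = line.split("|||")
    # The description field is optional in the wordlists shown in the diff.
    description = fields[2].strip() if len(fields) > 2 else ""
    return SearchWord(re.compile(fields[0], re.IGNORECASE), int(fields[1]), description)


# Hypothetical entries: one independent regex per line.
WORDLIST_LINES = [
    r"[;'\"\s()]SELECT\s(.|\s)+\sFROM\s|||40||| SQL select",
    r"[;'\"\s()]UPDATE\s(.|\s)+\sSET\s|||40||| SQL update",
    r"[;'\"\s()]DELETE\s+FROM\s(.|\s)+|||40||| SQL delete",
]

entries = [parse_wordlist_line(line) for line in WORDLIST_LINES]
print([entry.description for entry in entries])
```

Each line is compiled on its own, so splitting one chained pattern into several entries does not require any multi-line handling in the reader.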

Soooo... I hadn't actually tested the regexes and had written them on the fly, as they were intended more as examples than as actual implementations.

This (SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)? is probably not a good idea (first of all because it doesn't parse, but also because it will produce way too many false positives on render, making the field effectively useless).

A working variant could be good if it was to run on strings extracted from the source code, but running it on the code itself is probably going to produce so many false positives each time that no one is going to pay attention to the field's results (so maybe in a later version, but right now I don't think such a feature is supported). You'd need to test it though, as it's mostly assumptions based on how I think the regex is going to behave. You could have a good surprise. Working version :

(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+\s((FROM|SET|VALUES)\s)?
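
The parse failure of the earlier pattern and the validity of the working version can be confirmed with a quick sketch, assuming Python's `re` module. The helper name `compiles` is hypothetical.

```python
import re

# Earlier suggestion from the thread: contains an unbalanced ')'.
BROKEN = r"(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+)\s(FROM|SET|VALUES)\s.+)?"
# Working version posted in the follow-up comment.
WORKING = r"(SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+\s((FROM|SET|VALUES)\s)?"


def compiles(pattern: str) -> bool:
    """Return True if `pattern` is a syntactically valid regex."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles(BROKEN), compiles(WORKING))
```

A check like this could be run over every wordlist entry before a scan, so a malformed line is reported up front instead of failing in the middle of the analysis.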

Added starting delimiters detection (to run against raw code and avoid detection of single vars whose name ends with a regex' start). This might actually provoke some false negatives, but I think most cases are handled here:

[;'\"\s()](SELECT|UPDATE|((INSERT|DELETE)\s+(INTO|FROM)))\s(.|\s)+\s((FROM|SET|VALUES)\s)?
  • ; If SQL requests are chained in a single string
  • ' & " string start & end detection
  • \s spaces & friends
  • () might occur in case of nested SQL Query
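
The effect of the leading delimiter class can be sketched as follows, assuming Python's `re` module and case-insensitive matching. Both source snippets are invented for illustration.

```python
import re

# Without a leading delimiter, 'select' inside an identifier can start a match.
BARE = re.compile(r"SELECT\s(.|\s)+\sFROM\s", re.IGNORECASE)
# With the delimiter class, SELECT must follow ; ' " whitespace ( or ).
DELIMITED = re.compile(r"[;'\"\s()]SELECT\s(.|\s)+\sFROM\s", re.IGNORECASE)

# Hypothetical code with an identifier ending in 'select' and no SQL at all.
false_positive_source = "int preselect = a; x = y from z;"
# Hypothetical code containing an actual query string.
real_query_source = 'db.rawQuery("SELECT id FROM users", null);'

bare_on_identifier = BARE.search(false_positive_source)
delimited_on_identifier = DELIMITED.search(false_positive_source)
delimited_on_query = DELIMITED.search(real_query_source)
print(bare_on_identifier is not None, delimited_on_identifier, delimited_on_query.group(0))
```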

Or, for more specifics, you could split them up in 4 regexes (but as said in my previous post, you usually want to avoid multiple-lines matches, they make the overall pattern matching much slower):

[;'\"\s()]SELECT\s(.|\s)+\sFROM\s
[;'\"\s()]UPDATE\s(.|\s)+\sSET\s
[;'\"\s()]INSERT\s+INTO\s(.|\s)+\sVALUES\s
[;'\"\s()]INSERT\s+INTO\s(.|\s)+\sSET\s
[;'\"\s()]DELETE\s+FROM\s(.|\s)+

And yes, that's pretty much the first suggestion I made. If you want to display the matches until the end of the line, you could add a .*, but that's just a cosmetic change (try to display the whole matched pattern).


Quick explanation on the + positions:

you could place them after pretty much everything ([;'\"\s()], \s and (.|\s)), but you usually want to avoid any ambiguity on what matches what (because any ambiguity might cost you extra steps to solve them depending on your implementations), so there is no reason to put a + behind a \s when the following/previous match also has a + and uses the same char ((.|\s)+).
With [;'\"\s()], it's just that we don't really care what is matched before one character, so no need to generate a result bigger than necessary.


If the website you linked matches the same way stacoan does, you might want to look if there is a way to ask for the shortest match instead of the longest (it's different algorithms, I don't know if both are implemented) and give a way to ask to use one or the other. Here, it would really change a lot on the output and be much more interesting in many cases.
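
On shortest versus longest matches: in Python's `re` module (an assumption about the engine in use), this is controlled per quantifier, with `+?` being the lazy counterpart of `+`. A minimal sketch on an invented source line:

```python
import re

# Hypothetical line holding two separate query strings.
source = 'a = "SELECT id FROM users"; b = "SELECT name FROM roles";'

# Greedy '+' extends to the last ' FROM ' it can reach.
greedy = re.search(r"SELECT\s(.|\s)+\sFROM\s", source)
# Lazy '+?' stops at the first ' FROM ' after the keyword.
lazy = re.search(r"SELECT\s(.|\s)+?\sFROM\s", source)

print(repr(greedy.group(0)))
print(repr(lazy.group(0)))
```

The greedy form swallows both queries in one match, while the lazy form reports only the first one, which is usually the more readable finding in a report.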

Owner

I optimized your regex here: I replaced \s with [ \t] because \s also matches newlines, which made everything match. After that replacement it works perfectly:
https://regex101.com/r/xOWvH4/1
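
The linked regex101 pattern is not reproduced here. As a rough sketch of the same idea, the substitution of `\s` by `[ \t]` is applied to the SELECT variant suggested earlier in the thread, assuming Python's `re` module; the code snippets are invented.

```python
import re

# Variant using \s, which also matches newlines.
WITH_S = re.compile(r"[;'\"\s()]SELECT\s(.|\s)+\sFROM\s", re.IGNORECASE)
# Same shape with \s replaced by [ \t], keeping a match on one line.
WITH_BLANKS = re.compile(r"[;'\" \t()]SELECT[ \t](.|[ \t])+[ \t]FROM[ \t]", re.IGNORECASE)

# Hypothetical code with unrelated 'select' and 'from' identifiers on different lines.
unrelated_code = 'String select = "x";\nint n = 1;\nString from = "y";\n'
# Hypothetical single-line query.
one_line_query = 'c = db.rawQuery("SELECT id FROM users", null);'

s_on_unrelated = WITH_S.search(unrelated_code)
blanks_on_unrelated = WITH_BLANKS.search(unrelated_code)
blanks_on_query = WITH_BLANKS.search(one_line_query)
print(s_on_unrelated is not None, blanks_on_unrelated, blanks_on_query.group(0))
```

The trade-off is that queries deliberately formatted across several lines are no longer matched by the `[ \t]` variant.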

If you give your go-ahead, I will add it to the list 👍

About the separate queries:
I tried to do the same: https://regex101.com/r/vZSRys/1, but it doesn't seem to work.

I really appreciate all the effort you put into explaining how you came to these regexes, so I added you to the top contributors list: 19f956f

The develop branch will be merged into master this weekend if all goes well.

Every day I am amazed at how great the INFOSEC community is 🎉.