[Idea] Import arbitrary data into Timesketch #967

Closed
jaegeral opened this issue Aug 21, 2019 · 15 comments

@jaegeral
Collaborator

Maybe it would be a cool thing to add the ability to import pcap files directly.

This is the outcome of a brainstorming session I had with some friends. I'm not sure how useful it is, and I will try to do some research on how it could be accomplished.

@kiddinn
Contributor

kiddinn commented Oct 3, 2019

This would also mean parsing the data. Isn't that better done in plaso or some other tool, with the results then exported directly into Timesketch?

@jaegeral
Collaborator Author

jaegeral commented Oct 7, 2019

Hm, that is actually a great idea.

It might lead to a more generic question: do we want the ability to upload data directly to Timesketch, or should it always go plaso --> Timesketch?

@joachimmetz
Member

> This is the outcome of a brainstorming session I had with some friends, not sure how useful it is and I will try to do some research how that could be accomplished.

Plaso had a pcap parser before, but it was not maintained and suffered from high memory usage, so we removed it.

@joachimmetz
Member

The question is: what information do you want to extract from the PCAP, and will it be the same every time?

@kiddinn
Contributor

kiddinn commented Oct 22, 2019

And also whether you intend this to be a "simple parser" that emits one line per network packet, or whether you are doing stream reassembly and emitting a single line per "session", or per TCP stream (in the case of TCP), and then parsing the content to add to that line.

There is also the option of using Turbinia and/or dfTimewolf, which can run a parser and then automatically upload the data to Timesketch.
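A minimal sketch of the difference between the two options, assuming packets have already been reduced to plain dicts (the field names and sample values are hypothetical). The per-packet variant emits one row per packet; the per-session variant groups packets by a direction-independent 5-tuple (protocol plus both endpoints) so that both directions of a TCP connection land in one row. It does no real stream reassembly or payload parsing.

```python
from typing import Dict, List, Tuple

# Hypothetical packet summaries; a real script would build these from a pcap reader.
PACKETS: List[Dict] = [
    {"time": 1.0, "proto": "TCP", "src_ip": "10.0.0.1", "src_port": 51000,
     "dst_ip": "10.0.0.2", "dst_port": 80, "length": 60},
    {"time": 1.1, "proto": "TCP", "src_ip": "10.0.0.2", "src_port": 80,
     "dst_ip": "10.0.0.1", "dst_port": 51000, "length": 60},
    {"time": 1.2, "proto": "TCP", "src_ip": "10.0.0.1", "src_port": 51000,
     "dst_ip": "10.0.0.2", "dst_port": 80, "length": 500},
    {"time": 2.0, "proto": "UDP", "src_ip": "10.0.0.1", "src_port": 53000,
     "dst_ip": "8.8.8.8", "dst_port": 53, "length": 80},
]

FlowKey = Tuple[str, str, int, str, int]


def flow_key(pkt: Dict) -> FlowKey:
    """Sort the two endpoints so both directions of a flow share one key."""
    first, second = sorted([(pkt["src_ip"], pkt["src_port"]),
                            (pkt["dst_ip"], pkt["dst_port"])])
    return (pkt["proto"], first[0], first[1], second[0], second[1])


def per_packet_rows(packets: List[Dict]) -> List[Dict]:
    """'Simple parser': one row per packet."""
    return [dict(pkt) for pkt in packets]


def per_session_rows(packets: List[Dict]) -> List[Dict]:
    """One row per flow, with first/last time, packet count and byte count."""
    sessions: Dict[FlowKey, Dict] = {}
    for pkt in packets:
        key = flow_key(pkt)
        row = sessions.get(key)
        if row is None:
            sessions[key] = {
                "time": pkt["time"], "last_seen": pkt["time"], "proto": key[0],
                "endpoint_a": f"{key[1]}:{key[2]}",
                "endpoint_b": f"{key[3]}:{key[4]}",
                "packets": 1, "bytes": pkt["length"],
            }
        else:
            row["time"] = min(row["time"], pkt["time"])
            row["last_seen"] = max(row["last_seen"], pkt["time"])
            row["packets"] += 1
            row["bytes"] += pkt["length"]
    # Order sessions by when they were first seen.
    return sorted(sessions.values(), key=lambda r: r["time"])
```

With the sample data, the per-packet view yields four rows while the per-session view collapses the three TCP packets into a single row.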

@jaegeral
Collaborator Author

Yeah, my first idea was packet by packet. Of course you can take every idea to the next level, but as you stated, other options exist and would make more sense. So I would close the issue, or rephrase it if it makes sense to have a documentation section on "how to get data xyz into Timesketch", where xyz is a format that Timesketch does not import natively.

Thoughts?

@kiddinn
Contributor

kiddinn commented Oct 23, 2019

I'm wondering whether it makes sense to do something as simple as:

  • Add some documentation about how to "manually" import data into TS
  • Add a helper function to the client API to import "non-standard" data, for instance to take a DataFrame and upload it to TS
  • Add a "magic" that can be used in Jupyter/Colab notebooks.

Regarding the second point, it is very easy to add a function that takes a data frame and uploads it to TS. We could write some documentation and a demo notebook to show how to use that helper function, or how to get your data into a data frame, since I'm not sure how familiar people are with that data structure. There are plenty of "native" ways to create data frames (reading from SQL databases, JSON data, Excel, etc.), and other simple manual methods as well.

WDYT about that? We could even demonstrate how to easily parse network packet data and convert that into a DataFrame as an example of how to do this.
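A sketch of the kind of demo described here, using pandas to turn packet summaries into a DataFrame. It assumes the packet data is already available as JSON lines (the `RAW` sample and its field names are hypothetical) and that Timesketch expects `message`, `datetime` and `timestamp_desc` columns; `load_packets` is an illustrative helper, not part of any Timesketch API.

```python
import io

import pandas as pd

# Hypothetical JSON-lines export: one packet summary per line.
RAW = """\
{"ts": 1571900000.5, "src_ip": "10.0.0.1", "src_port": 51000, "dst_ip": "10.0.0.2", "dst_port": 80, "url": "http://example.com/a"}
{"ts": 1571900001.0, "src_ip": "10.0.0.1", "src_port": 51001, "dst_ip": "10.0.0.3", "dst_port": 443, "url": "https://example.org/b"}
"""


def load_packets(text: str) -> pd.DataFrame:
    """Read JSON lines into a DataFrame and add Timesketch-style columns."""
    df = pd.read_json(io.StringIO(text), lines=True)
    # Epoch seconds -> timezone-aware UTC datetimes.
    df["datetime"] = pd.to_datetime(df["ts"], unit="s", utc=True)
    df["timestamp_desc"] = "Network Log"
    # Build a one-line message per packet from the other columns.
    df["message"] = (
        df["src_ip"] + ":" + df["src_port"].astype(str) + "->"
        + df["dst_ip"] + ":" + df["dst_port"].astype(str) + " = " + df["url"]
    )
    return df
```

The same pattern applies to other sources: swapping `pd.read_json` for `pd.read_sql` or `pd.read_excel` changes only how the rows are loaded.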

@kiddinn
Contributor

kiddinn commented Oct 23, 2019

What this does is that, instead of implementing a parser in Timesketch (which I don't really want to do, since Timesketch is not about parsing), we simply add a better importer to make it easier to import data. You can then rely on all the other parsers out there and write a small, simple script that uses one of them and dumps the data into TS.
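The "small, simple script" route could look roughly like this sketch, which converts records from some external parser into JSON lines for a manual import. The input shape (`epoch`, `summary`) is hypothetical, and the output fields (`message`, `datetime`, `timestamp` in microseconds, `timestamp_desc`) are an assumption about what the Timesketch importer expects; `to_timesketch_jsonl` is an illustrative name.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Iterable


def to_timesketch_jsonl(records: Iterable[Dict], timestamp_desc: str) -> str:
    """Turn parser output into JSON lines with Timesketch-style fields.

    Each record is assumed to carry `epoch` (seconds since the epoch, UTC)
    and `summary` (a one-line description); other keys become attributes.
    """
    lines = []
    for record in records:
        when = datetime.fromtimestamp(record["epoch"], tz=timezone.utc)
        event = {
            "message": record["summary"],
            "datetime": when.isoformat(),
            "timestamp": round(record["epoch"] * 1_000_000),  # microseconds
            "timestamp_desc": timestamp_desc,
        }
        # Keep any remaining parser fields as extra event attributes.
        for key, value in record.items():
            if key not in ("epoch", "summary"):
                event[key] = value
        lines.append(json.dumps(event, sort_keys=True))
    return "".join(line + "\n" for line in lines)
```

The returned string could then be written to a `.jsonl` file and uploaded like any other supported file.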

@kiddinn
Contributor

kiddinn commented Oct 24, 2019

See #1004 for at least the initial version of the importer...

This could be used like so:

```python
my_sketch.upload_data_frame(data, 'pcap_test', '{src_ip:s}:{src_port:d}->{dst_ip:s}:{dst_port:d} = {url:s}')
```

That works for a data frame; if you don't have one, you can do something like:

```python
import datetime
...
from scapy import all as scapy_all
...

packets = scapy_all.rdpcap(fh)

with client.UploadStreamer() as streamer:
    streamer.set_sketch(my_sketch)
    streamer.set_timestamp_description('Network Log')
    streamer.set_timeline_name('pcap_test_log')
    streamer.set_message_format_string(
        '{src_ip:s}:{src_port:d}->{dst_ip:s}:{dst_port:d} = {url:s}')

    for packet in packets:
        # do something here: the elided code is expected to set `layer`
        # (the TCP/UDP layer providing sport/dport) and `data` (the layer
        # whose fields are searched for URLs), and to skip non-IP packets.
        ...
        # float() because packet.time may be a Decimal subclass in newer scapy.
        timestamp = datetime.datetime.utcfromtimestamp(float(packet.time))
        for value in data.fields.values():
            # URL_RE is a compiled regular expression defined elsewhere.
            for url in URL_RE.findall(str(value)):
                url = url.strip()
                streamer.add_dict({
                    'time': timestamp,
                    'src_ip': packet.getlayer('IP').src,
                    'dst_ip': packet.getlayer('IP').dst,
                    'src_port': layer.sport,
                    'dst_port': layer.dport,
                    'url': url})
```

And this will add the URLs found in the PCAP file to Timesketch, one event per URL.
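The loop above depends on `URL_RE`, `layer` and `data`, which are not shown. This sketch isolates the URL-to-event step so it can be tried without a pcap. The `URL_RE` pattern and the `url_events`/`render_message` helpers are hypothetical, and rendering with `str.format` is an assumption suggested by the `:s`/`:d` specifiers in the format string. Two details matter: bytes payloads (such as scapy `Raw` loads) are decoded before searching, because `str()` on bytes yields their repr (`b'...'` with escaped `\r\n`), and ports must be integers, since `{src_port:d}` raises `ValueError` for a string.

```python
import datetime
import re
from typing import Dict, Iterable, List

# Hypothetical URL pattern; the thread does not show the real URL_RE.
URL_RE = re.compile(r"https?://[^\s'\"<>]+")

MESSAGE_FORMAT = "{src_ip:s}:{src_port:d}->{dst_ip:s}:{dst_port:d} = {url:s}"


def url_events(epoch: float, src_ip: str, dst_ip: str, sport: int, dport: int,
               field_values: Iterable) -> List[Dict]:
    """Build one event dict per URL found in the given field values."""
    timestamp = datetime.datetime.fromtimestamp(
        float(epoch), tz=datetime.timezone.utc)
    events = []
    for value in field_values:
        # Decode bytes; str(b"...") would search the repr instead of the payload.
        text = value.decode("latin-1") if isinstance(value, bytes) else str(value)
        for url in URL_RE.findall(text):
            events.append({
                "time": timestamp,
                "src_ip": src_ip,
                "dst_ip": dst_ip,
                "src_port": int(sport),  # {src_port:d} needs an int
                "dst_port": int(dport),
                "url": url.strip(),
            })
    return events


def render_message(event: Dict) -> str:
    """Render an event with str.format (assumed, not confirmed, behavior)."""
    return MESSAGE_FORMAT.format(**event)
```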

kiddinn changed the title from "[Idea] Import for pcap" to "[Idea] Import arbitrary data into Timesketch" on Oct 24, 2019
@kiddinn
Contributor

kiddinn commented Oct 24, 2019

@deralexxx what do you think about this approach?

@jaegeral
Collaborator Author

Wow, this is both general enough to cover a lot of different cases and simple enough for starters. I like it a lot, and it would also reduce the need to introduce importers for $format.

The other thing that came up during this issue is that putting some more love into the documentation would also raise awareness that, e.g., plaso should be the go-to tool if you are looking for a specific importer and are not eager to develop something yourself (something I was missing, as plaso has not been part of my workflow so far). It's like not re-implementing in awk what grep already does when you can simply pipe the two together, if that makes sense. I am happy to think about ways to do that, e.g. by writing a few sentences for the "import data" portion of the Timesketch documentation.

@kiddinn
Contributor

kiddinn commented Oct 24, 2019

yes, documentation is lacking ;)

and yes, we would love someone to fix that for us ;)

I will add some basic documentation for the upload streamer alongside #1004. But yes, we definitely need more documentation.
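As a rough picture of what such a streamer does with the rows, here is a hypothetical stand-in (not the timesketch_api_client code) for the pattern the example above relies on: configuration setters, `add_dict` buffering rows, and a context manager that flushes the remaining buffer when the `with` block exits. The `send` callback and `batch_size` parameter are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional


class BatchingStreamer:
    """Hypothetical stand-in for an upload streamer that sends rows in batches."""

    def __init__(self, send: Callable[[Optional[str], List[Dict]], None],
                 batch_size: int = 1000):
        self._send = send  # called with (timeline_name, rows) for each batch
        self._batch_size = batch_size
        self._buffer: List[Dict] = []
        self._timeline_name: Optional[str] = None
        self._timestamp_desc: Optional[str] = None

    def set_timeline_name(self, name: str) -> None:
        self._timeline_name = name

    def set_timestamp_description(self, description: str) -> None:
        self._timestamp_desc = description

    def add_dict(self, entry: Dict) -> None:
        row = dict(entry)
        # Fill in the description unless the row already carries one.
        row.setdefault("timestamp_desc", self._timestamp_desc)
        self._buffer.append(row)
        if len(self._buffer) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._send(self._timeline_name, self._buffer)
            self._buffer = []

    def __enter__(self) -> "BatchingStreamer":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Send whatever is still buffered, then let any exception propagate.
        self.flush()
        return False
```

Batching keeps memory bounded for large captures, which matters given the memory problems mentioned earlier for plaso's old pcap parser.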

@jaegeral
Collaborator Author

> yes, documentation is lacking ;)
>
> and yes, we would love someone to fix that for us ;)

on it

@kiddinn
Contributor

kiddinn commented Oct 29, 2019

This is now submitted and ready. See the documentation here: https://github.com/google/timesketch/blob/master/docs/UploadDataViaAPI.md

kiddinn closed this as completed on Oct 29, 2019
kiddinn self-assigned this on Oct 29, 2019