DataHarmonization Document improvements #185
Comments
Thx, nice catch. What you saw was an intermediate version/commit on GitHub. Will update it.
|
True. Regarding reported keys, IMHO it's something to remove, because in the new version (https://github.com/certtools/intelmq/tree/v1.0-beta) we have a raw field where we will specify the raw event. I think it's easy to agree with this one. Feedback? |
@SYNchroACK, still not read. Trying to focus on that one :( |
I need your help @aaronkaplan and @SYNchroACK: When working with postgres (see my branch) I saw that the Data-Harmonization document uses underscores to subclass values, but the code uses dots. Replacing all underscores with dots does not make sense in all cases, e.g. with Additionally, some fields have other names in code and documentation, as with EDIT:
|
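A minimal sketch of why a blanket underscore-to-dot replacement breaks, and what an explicit mapping could look like. The legacy underscore spellings below are assumptions for illustration; only `source.abuse_contact` and `time.source` are named in this thread.

```python
# Hypothetical legacy (underscore) names mapped to dotted names.
# The left-hand spellings are assumed, not taken from the document.
LEGACY_TO_DOTTED = {
    "source_abuse_contact": "source.abuse_contact",
    "time_source": "time.source",
}


def naive_rename(key: str) -> str:
    """Replace every underscore with a dot (the approach that breaks)."""
    return key.replace("_", ".")


def mapped_rename(key: str) -> str:
    """Look the key up in an explicit mapping; unknown keys pass through."""
    return LEGACY_TO_DOTTED.get(key, key)


print(naive_rename("source_abuse_contact"))   # source.abuse.contact
print(mapped_rename("source_abuse_contact"))  # source.abuse_contact
```

The naive version splits `abuse_contact` into two levels, which is why a per-field mapping seems unavoidable.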
Got it. Well, the underscores come from abusehelper compatibility. I don't think we need that anymore, since we'll need a mapping anyway by now. So I would also be fine with camelCase or some other renaming. BUT! Let's be very careful about this! It means we will have to refactor everything and every bot. @SYNchroACK: what do you say?
That seems like a bug then.
|
On 08/20/2015 12:44 PM, AaronK wrote:
|
The current fields are: time.source is correct. The idea is to have levels:
..etc... We should create a script to get the json from the file above and generate a harmonization.md document. |
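A rough sketch of such a generator script. It assumes the JSON file has the layout `{"event": {"<field>": {"description": ..., "type": ...}}}`; that layout, the sample descriptions, and the type names are assumptions for illustration, not copied from the actual file.

```python
import json

# Assumed layout and sample content; replace with the real file's contents.
SAMPLE_CONF = """
{
  "event": {
    "time.source": {"description": "Time reported by the source.", "type": "DateTime"},
    "source.abuse_contact": {"description": "Abuse contact for the source.", "type": "String"}
  }
}
"""


def harmonization_to_markdown(conf: dict) -> str:
    """Render one markdown table per top-level section, fields sorted by name."""
    lines = ["# Harmonization fields", ""]
    for section, fields in sorted(conf.items()):
        lines.append(f"## {section}")
        lines.append("")
        lines.append("| Field | Type | Description |")
        lines.append("| --- | --- | --- |")
        for name, spec in sorted(fields.items()):
            lines.append(
                f"| {name} | {spec.get('type', '')} | {spec.get('description', '')} |"
            )
        lines.append("")
    return "\n".join(lines)


markdown = harmonization_to_markdown(json.loads(SAMPLE_CONF))
print(markdown)
```

Writing `markdown` to `harmonization.md` would keep the document in sync with the configuration instead of maintaining both by hand.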
On Aug 23, 2015, at 5:45 PM, Tomás Lima notifications@github.com wrote:
Tomas, did you see my other mail concerning the document? |
which email? subject and timestamp? |
On 08/23/2015 05:45 PM, Tomás Lima wrote:
|
Another issue that just came up: In geolocation sections we have BTW: I would like to rename |
@sebix, cymru_cc and geoip_cc don't exist. All current fields are here: I don't see the need to have multiple fields for that... in the pipeline we can put the cymru and maxmind bots in whatever order we want... cymru -> maxmind means that if maxmind has a value, it will overwrite cymru's. From my perspective, we should minimize the number of fields that do the same thing... but @aaronkaplan and @sebix, what's your feedback? |
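The overwrite behaviour described for a cymru -> maxmind ordering can be sketched as follows. The field names `source.ip` and `source.geolocation.cc`, the lookup tables, and the helper names are all assumptions for illustration, not the real bots.

```python
from typing import Callable, Dict, Iterable

Event = Dict[str, str]
CC_FIELD = "source.geolocation.cc"  # assumed name of the single shared field

# Hypothetical lookup tables standing in for the two data sources.
CYMRU_CC = {"192.0.2.1": "US", "198.51.100.7": "DE"}
MAXMIND_CC = {"192.0.2.1": "NL"}


def make_cc_bot(table: Dict[str, str]) -> Callable[[Event], Event]:
    """Build a bot that sets CC_FIELD only when its table has a value."""
    def bot(event: Event) -> Event:
        enriched = dict(event)
        cc = table.get(event.get("source.ip", ""))
        if cc is not None:
            enriched[CC_FIELD] = cc  # overwrites whatever an earlier bot set
        return enriched
    return bot


def run_pipeline(event: Event, bots: Iterable[Callable[[Event], Event]]) -> Event:
    """Apply the bots in order; later bots win on conflicting fields."""
    for bot in bots:
        event = bot(event)
    return event


cymru = make_cc_bot(CYMRU_CC)
maxmind = make_cc_bot(MAXMIND_CC)

both = run_pipeline({"source.ip": "192.0.2.1"}, [cymru, maxmind])
only_cymru = run_pipeline({"source.ip": "198.51.100.7"}, [cymru, maxmind])
print(both[CC_FIELD])        # NL (maxmind overwrote cymru)
print(only_cymru[CC_FIELD])  # DE (maxmind had no value)
```

With one shared field the operator's preference is expressed by bot order alone; the cost is that the earlier value is lost whenever both sources answer.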
On 08/24/2015 12:08 PM, Tomás Lima wrote:
|
On Mon, Aug 24, 2015 at 11:57 AM, Sebastian notifications@github.com
Some fields are deprecated, others were just updated...well... like I
True, but having multiple fields related to localization seems to be
No, I'm just presenting the possibilities depending on your confidence in
No, I'm not suggesting putting them in 'description.' (old |
It's a tough decision. I'd say we keep geoip/maxmind_cc and cymru_cc. Why? Because these have different values for many IPs. Cymru_cc essentially is the RIR data and IP to country code mapping. Maxmind is more geographic, while the RIRs capture the organization owning the netblock and which country it is in. I am for leaving it and documenting when we recommend using which one.
|
Sebix and me did that on FR. Please contact him for details.
|
Current proposal is to have only one Current proposal of fields is here: https://github.com/sebix/intelmq/blob/postgres/docs/Harmonization-fields.md |
@sebix from my point of view, I think it's not the Abusix bot's responsibility to decide which "side".... the procedure that will send the emails should have the intelligence to understand the event and choose... so... it's not a problem... and source.abuse_contact and destination.abuse_contact should stay.... Do you agree @sebix ? |
The idea by @aaronkaplan was to have only one abuse_contact, as never both are relevant. Based on the classification, source or destination contact can be chosen. Aaron, your comment please. |
On Aug 25, 2015, at 3:15 PM, Sebastian notifications@github.com wrote:
ACK
Best,
|
Again, that intelligence should not be implemented in abusix, and I'm 100% sure this is the correct approach. Why? Imagine that tomorrow you have an AbuseContactDB bot and a RIPE AbuseContactDB bot, etc... and you will repeat the same intelligence in every bot. The bots should fill those keys (source.abuse_contact and destination.abuse_contact) with the abuse contacts associated with the corresponding IPs. At the end of the pipeline, you will have a script or a bot, or whatever, that is responsible for sending the events depending on the classification. In that procedure (script, bot, platform...etc..) you will build the intelligence that chooses whether to use source.abuse_contact or destination.abuse_contact. Advantages of this approach:
Disadvantage:
If you still disagree, let's schedule a confcall to discuss. :) |
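A minimal sketch of the end-of-pipeline selection argued for above: lookup bots fill both keys, and one final step picks a side from the classification. The `classification.type` field name, the type values, and the side mapping are hypothetical and not a recommendation.

```python
from typing import Dict, Optional

Event = Dict[str, str]

# Hypothetical policy: which side to notify for a given classification type.
CONTACT_SIDE_BY_TYPE = {
    "malware": "source",
    "ddos-target": "destination",
}


def select_abuse_contact(event: Event, default_side: str = "source") -> Optional[str]:
    """Pick source.abuse_contact or destination.abuse_contact for this event.

    Returns None when the chosen side has no contact filled in.
    """
    side = CONTACT_SIDE_BY_TYPE.get(event.get("classification.type"), default_side)
    return event.get(f"{side}.abuse_contact")


event = {
    "classification.type": "malware",
    "source.abuse_contact": "abuse@source.example",
    "destination.abuse_contact": "abuse@destination.example",
}
print(select_abuse_contact(event))  # abuse@source.example
```

Keeping the policy table in one place means a new contact lookup bot needs no knowledge of classifications.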
To document what @SYNchroACK and me just discussed, this issue was brought up by #298.
For the content my proposal is to use JSON. JSON is machine-readable, so the additional information can easily be extracted by existing parsers. Even in postgres it is very easy to query that data, as postgres supports JSON. |
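One way the JSON idea could look in practice, as a sketch only: keys the harmonization does not know are packed into a single JSON string. The `extra` key name, the feed keys, and the helper are assumptions for illustration.

```python
import json
from typing import Any, Dict, Set


def pack_extra(raw_record: Dict[str, Any], known_keys: Set[str]) -> str:
    """Serialize all keys that are not harmonized fields into one JSON string."""
    extra = {k: v for k, v in raw_record.items() if k not in known_keys}
    return json.dumps(extra, sort_keys=True)


# Hypothetical feed record: one harmonized field plus two feed-specific ones.
record = {"source.ip": "192.0.2.1", "sensor_id": "abc", "ttl": 64}
event = {
    "source.ip": record["source.ip"],
    "extra": pack_extra(record, {"source.ip"}),
}

print(event["extra"])                     # {"sensor_id": "abc", "ttl": 64}
print(json.loads(event["extra"])["ttl"])  # 64
```

Because the value is plain JSON, any consumer can read it back with a standard parser without knowing the feed.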
@sebix agreed. An 'extra' field should be added to harmonization.conf, so that if some feed has the 'zzz' and 'yyy' fields, intelmq supports the following harmonization:
|
I meant that |
Another open question affects For one IP or host there are always multiple abuse contacts possible: domain (probably more than one), whois, AS. But abuse_contact is currently only defined for one email address. Should it instead be a list of addresses? Should they be grouped by responsibility (e.g. domain, AS)? I can think of some scenarios where it is appropriate to contact the AS/ISP (malware), but in the case of a defacement or a vulnerable service, the domain/host owner should (also) be contacted. As with the source-destination discussion, should this be decided at the end of the pipeline? |
And next question: we have |
Note: the 'account' field will be used to store email accounts (e.g. a compromised email account) or website usernames (e.g. a compromised GitHub username). So we should keep 'source.account' and 'destination.account'. Proposal:
Regarding your proposal, I think 'victim' will not fit the needs, because some 'middle bot' would need the intelligence to decide whether the event is related to a victim or not... so let's keep it generic and leave that intelligence to another bot... |
On Aug 26, 2015, at 1:34 PM, Sebastian notifications@github.com wrote:
Sure, there is no reason why it can't be a comma-separated list.
Nah... mail gets delivered :) The mail server decides the order anyway.
Best, |
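If abuse_contact becomes a comma-separated list as suggested in this reply, reading and writing it could look roughly like this. Both helper names are hypothetical; the sketch strips whitespace and drops duplicates while keeping the original order.

```python
from typing import Iterable, List


def parse_contacts(value: str) -> List[str]:
    """Split a comma-separated abuse_contact value into unique addresses."""
    contacts: List[str] = []
    for part in value.split(","):
        address = part.strip()
        if address and address not in contacts:
            contacts.append(address)
    return contacts


def format_contacts(addresses: Iterable[str]) -> str:
    """Join addresses back into the comma-separated form."""
    return ",".join(addresses)


contacts = parse_contacts("abuse@example.com, noc@example.net,abuse@example.com")
print(contacts)                   # ['abuse@example.com', 'noc@example.net']
print(format_contacts(contacts))  # abuse@example.com,noc@example.net
```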
On Aug 25, 2015, at 4:40 PM, Tomás Lima notifications@github.com wrote:
Tomas I am not sure if I agree with source.abuse_contact and destination.abuse_contact. I went through all the cases manually with @sebix and it only makes sense to have "abuse_contact". My 2 cents,
|
well, I think that to follow your approach you need to give some solutions for the following problems:
IMHO, the solutions for these problems will not follow the KISS principle... |
On Aug 26, 2015, at 11:07 PM, Tomás Lima notifications@github.com wrote:
It is up to the individual implementor to choose the abuse contact lookup strategy. Did you take a look at ? I have the very strong feeling that some ideas are totally not aligned yet regarding the abuse contact lookups. Let's have a conf call please! okay? Thx,
I think we talk about different things...
|
I will try to summarize the current discussions on the harmonization by grouping them, with citations and explanations from my side. The citations are not literal; I adapted most of them.
|
Two discussions are resolved so far:
We have no further comments on:
|
My proposal:
|
On Sep 2, 2015, at 12:39 PM, Tomás Lima notifications@github.com wrote:
Why? If the abuse contact lookup bot does not do what you want it to do, copy & paste & modify it :) So: IMHO one JSON field with a list (comma separated) of email addresses. My 2 cents...
|
My new proposal:
|
Include also:
|
I believe the user_agent topic came up again at a recent discussion in our jour fixe meeting. So, the whole discussion about this particular point (user_agent in extra.) seems to miss the reality. |
Hi,
the DataHarmonization document needs some improvements: