Corrupt dump files when interrupted during writing #870
Comments
@wagner-certat please validate the following: does this situation happen in Parsers, Experts and Outputs? I think it only happens with Collectors, which I think is "ok", although if there is a possible fix, let's fix it. :)
There's no difference in dump file handling between bot types. This is the case for every kind of bot.
Correct me if I'm wrong, but if a KeyboardInterrupt happens, the message will still be on the queue (check this line and the lines before). So the message is only lost on Collectors, right?
Yes, but the dump file will still be corrupted.
Ok, so the dump file will always be corrupted in the scenario you describe, and data loss only happens on Collectors, which is something we are already aware of and accept. Cool. ;) Thank you for raising this. :)
Data loss happens together with the file corruption: when the write operation is interrupted, the data not yet written is lost.
Ok, after talking with @wagner-certat on IRC, I understood the problem: when a bot needs to dump new bad events, it first loads all previously dumped data, so every write operation is a full overwrite of the dump file. Therefore, if a KeyboardInterrupt happens during such a write, the dump file will contain only part of the information. Thank you @wagner-certat once more.
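The comment above describes a read-modify-overwrite cycle: every dump loads the whole file and writes it back from the start, so an interrupt in the middle of the write leaves a truncated file. A common way to avoid this is to write the complete new content to a temporary file and then swap it in with `os.replace`. The following is a minimal sketch under stated assumptions: the helper name `dump_events_atomic` is hypothetical, the dump format is simplified to one JSON object keyed by timestamp, and this is not IntelMQ's actual dump code.

```python
import contextlib
import json
import os
import tempfile


def dump_events_atomic(dump_path: str, new_entries: dict) -> None:
    """Merge new_entries into a JSON dump file without ever leaving a
    half-written file behind. Hypothetical helper, not part of IntelMQ."""
    # Read-modify step, as described above: load everything dumped so far.
    try:
        with open(dump_path, encoding="utf-8") as handle:
            data = json.load(handle)
    except FileNotFoundError:
        data = {}
    data.update(new_entries)

    # Write the complete new content to a temporary file in the same
    # directory, so the final rename does not cross filesystems.
    directory = os.path.dirname(os.path.abspath(dump_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".dump-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=4, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())  # data is on disk before the swap
        # Atomic on POSIX: readers see either the old or the new dump file.
        os.replace(tmp_path, dump_path)
    except BaseException:
        # BaseException also covers KeyboardInterrupt. Only the temporary
        # file is discarded; the previous dump file is left untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

With this approach an interrupt can still lose the events that were about to be added, but it can no longer destroy the events that were already dumped. Note that `tempfile.mkstemp` creates the file with mode `0600`, so a real implementation may want to copy the permissions of the existing dump file.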
2.1.2

### Core
- `__init__`: Resolve absolute path for `STATE_FILE_PATH` variable (resolves `..`).
- `intelmq.lib.utils`:
  - log: Do not raise an exception if logging to neither file nor syslog is requested.
  - logging StreamHandler: Colorize all warning and error messages red.
  - logging FileHandler: Strip all shell colorizations from the messages (#1436).
- `intelmq.lib.message`:
  - `Message.to_json`: Set `sort_keys=True` to get reproducible results.
- `drop_privileges`: Handle situations where the user or group `intelmq` does not exist.
- `intelmq.lib.pipeline`:
  - `Amqp._send` and `Amqp._acknowledge`: Log traceback in debug mode in case of errors and necessary re-connections.
  - `Amqp._acknowledge`: Reset delivery tag if acknowledge was successful.

### Bots
#### Collectors
- `intelmq.bots.collectors.misp.collector`:
  - Add compatibility with current pymisp versions and versions released after January 2020 (PR #1468).

#### Parsers
- `intelmq.bots.parsers.shadowserver.config`: Add some missing fields for the feed `accessible-rdp` (#1463).
- `intelmq.bots.parsers.shadowserver.parser`:
  - Feed-detection based on file names: The prefixed date is optional now.
  - Feed-detection based on file names: Re-detect feed for every report received (#1493).

#### Experts
- `intelmq.bots.experts.national_cert_contact_certat`: Handle empty responses from the server (#1467).
- `intelmq.bots.experts.maxmind_geoip`: The script `update-geoip-data` now requires a license key as second parameter because of upstream changes (#1484).

#### Outputs
- `intelmq.bots.outputs.restapi.output`: Fix logging of response body if response status code was not ok.

### Documentation
- Remove some hardcoded `/opt/intelmq/` paths from code comments and program outputs.

### Packaging
- debian/rules: Only replace `/opt/intelmq/` with LSB-paths in certain files, not the whole tree, avoiding wrong replacements.
- debian/rules and debian/intelmq.install: Do install the examples configuration directly instead of working around the abandoned examples directory.

### Tests
- `lib/test_utils`: Skip some tests on Python 3.4 because `contextlib.redirect_stdout` and `contextlib.redirect_stderr` are not supported on this version.
- Travis: Stop running tests with all optional dependencies on Python 3.4, as more and more libraries are dropping support for it. Tests on the core and code without non-optional requirements are not affected.
- `tests.bots.parsers.html_table`: Make tests independent of current year.

### Tools
- `intelmqctl upgrade-config`: Fix missing substitution in error message "State file %r is not writable.".

### Known issues
- bots trapped in endless loop if decoding of raw message fails (#1494)
- intelmqctl status of processes: need to check bot id too (#1492)
- MongoDB authentication: compatibility on different MongoDB and pymongo versions (#1439)
- ctl: shell colorizations are logged (#1436)
- http stream collector: retry on regular connection problems? (#1435)
- tests: capture logging with context manager (#1342)
- Bots started with IntelMQ-Manager stop when the webserver is restarted. (#952)
- n6 parser: mapping is modified within each run (#905)
- reverse DNS: Only first record is used (#877)
- Corrupt dump files when interrupted during writing (#870)
2.2.0 Feature release

Dropped support for Python 3.4.

### Core
- `__init__`: Changes to the path-handling, see [User Guide, section _/opt and LSB paths_](docs/User-Guide.md#opt-and-lsb-paths) for more information
  - The environment variable `INTELMQ_ROOT_DIR` can be used to set custom root directories instead of `/opt/intelmq/` (certtools#805) in case of non LSB-path installations.
  - The environment variable `ROOT_DIR` can be used to set custom root directories instead of `/` (certtools#805) in case of LSB-path installations.
- `intelmq.lib.exceptions`: Added `MissingDependencyError` to show error messages about a missing library and how to install it (certtools#1471).
  - Added optional parameter `installed` to show the installed version.
  - Added optional parameter `additional_text` to show arbitrary text.
- Adding more type annotations for core libraries.
- `intelmq.lib.pipeline.Pythonlist.sleep`: Drop deprecated method.
- `intelmq.lib.utils`: `write_configuration`: Append a newline at end of configuration/file to allow proper comparisons & diffs.
- `intelmq.lib.test`: `BotTestCase` drops privileges upon initialization (certtools#1489).
- `intelmq.lib.bot`:
  - New class `OutputBot`:
    - Method `export_event` to format/export events according to the parameters given by the user.
  - `ParserBot`: New methods `parse_json_stream` and `recover_line_json_stream`.
  - `ParserBot.recover_line_json`: Fix format by adding a list around the line data.
  - `Bot.send_message`: In debugging log level, the path to which the message is sent is now logged too.

### Bots
- Bots with dependencies: Use of `intelmq.lib.exceptions.MissingDependencyError`.

#### Collectors
- `intelmq.bots.collectors.misp.collector`: Deprecate parameter `misp_verify` in favor of generic parameter `http_verify_cert`.
- `intelmq.bots.collectors.tcp.collector`: Drop compatibility with Python 3.4.
- `intelmq.bots.collectors.stomp.collector`:
  - Check the stomp.py version and show an error message if it does not match.
  - For stomp.py versions `>= 5.0.0` redirect the `stomp.PrintingListener` output to debug logging.
- `intelmq.bots.collectors.microsoft.collector_azure`: Support current Python library `azure-storage-blob>= 12.0.0`, configuration is incompatible and needs manual change. See NEWS file and bot's documentation for more details.
- `intelmq.bots.collectors.amqp.collector_amqp`: Require `pika` minimum version 1.0.
- `intelmq.bots.collectors.github_api.collector_github_contents_api`: Added (PR#1481).

#### Parsers
- `intelmq.bots.parsers.autoshun.parser`: Drop compatibility with Python 3.4.
- `intelmq.bots.parsers.html_table.parser`: Drop compatibility with Python 3.4.
- `intelmq.bots.parsers.shadowserver.parser`: Add support for MQTT and Open-IPP feeds (PR#1512, PR#1544).
- `intelmq.bots.parsers.taichung.parser`:
  - Migrate to `ParserBot`.
  - Also parse geolocation information if available.
- `intelmq.bots.parsers.cymru.parser_full_bogons`:
  - Migrate to `ParserBot`.
  - Add last updated information in raw.
- `intelmq.bots.parsers.anubisnetworks.parser`: Add new parameter `use_malware_familiy_as_classification_identifier`.
- `intelmq.bots.parsers.microsoft.parser_ctip`: Compatibility for the new CTIP data format provided by the Azure interface.
- `intelmq.bots.parsers.cymru.parser_cap_program`: Support for `openresolver` type.
- `intelmq.bots.parsers.github_feed.parser`: Added (PR#1481).
- `intelmq.bots.parsers.urlvir.parser`: Removed, as the feed is discontinued (certtools#1537).

#### Experts
- `intelmq.bots.experts.csv_converter`: Added as converter to CSV.
- `intelmq.bots.experts.misp`: Added (PR#1475).
- `intelmq.bots.experts.modify`: New parameter `maximum_matches`.

#### Outputs
- `intelmq.bots.outputs.amqptopic`:
  - Use `OutputBot` and `export_event`.
  - Allow formatting the routing key with event data by the new parameter `format_routing_key` (boolean).
- `intelmq.bots.outputs.file`: Use `OutputBot` and `export_event`.
- `intelmq.bots.outputs.files`: Use `OutputBot` and `export_event`.
- `intelmq.bots.outputs.misp.output_feed`: Added, creates a MISP Feed (PR#1473).
- `intelmq.bots.outputs.misp.output_api`: Added, pushes to MISP via the API (PR#1506, PR#1536).
- `intelmq.bots.outputs.elasticsearch.output`: Dropped ElasticSearch version 5 compatibility, added version 7 compatibility (certtools#1513).

### Documentation
- Document usage of the `INTELMQ_ROOT_DIR` environment variable.
- Added document on MISP integration possibilities.
- Feeds:
  - Added "Full Bogons IPv6" feed.
  - Remove discontinued URLVir Feeds (certtools#1537).

### Packaging
- `setup.py`: Do not try to install any data to `/opt/intelmq/`, as the behavior is inconsistent on various systems and with `intelmqsetup` we have a tool to create the structure and files anyway.
- `debian/rules`:
  - Provide a blank state file in the package.
- Patches:
  - Updated `fix-intelmq-paths.patch`.

### Tests
- Travis: Use `intelmqsetup` here too.
  - Install required build dependencies for the Debian package build test.
  - This version is no longer automatically tested on Python `<` 3.5.
  - Also run the tests on Python 3.8.
  - Run the Debian packaging tests on Python 3.5 and the code-style test on 3.8.
- Added tests for the new bot `intelmq.bots.outputs.misp.output_feed` (certtools#1473).
- Added tests for the new bot `intelmq.bots.experts.misp.expert` (certtools#1473).
- Added tests for `intelmq.lib.exceptions`.
- Added tests for `intelmq.lib.bot.OutputBot` and `intelmq.lib.bot.OutputBot.export_event`.
- Added IPv6 tests for `intelmq.bots.parsers.cymru.parser_full_bogons`.
- Added tests for `intelmq.lib.bot.ParserBot`'s new methods `parse_json_stream` and `recover_line_json_stream`.
- `intelmq.tests.test_conf`: Set encoding to UTF-8 for reading the `feeds.yaml` file.

### Tools
- `intelmqctl`:
  - `upgrade-config`:
    - Allow setting the state file location with the `--state-file` parameter.
    - Do not require a second run anymore, if the state file is newly created (certtools#1491).
    - New parameter `no_backup`/`--no-backup` to skip creation of `.bak` files for state and configuration files.
  - Only require `psutil` for the `IntelMQProcessManager`, not for process manager independent calls like `upgrade-config` or `check`.
  - Add new command `debug` to output some information for debugging. Currently implemented:
    - paths
    - environment variables
  - `IntelMQController`: New argument `--no-file-logging` to disable logging to file.
  - If dropping privileges does not work, `intelmqctl` will now abort (certtools#1489).
- `intelmqsetup`:
  - Add argument parsing and an option to skip setting file ownership, possibly not requiring root permissions.
  - Call `intelmqctl upgrade-config` and add argument for the state file path (certtools#1491).
- `intelmq_generate_misp_objects_templates.py`: Tool to create a MISP object template (certtools#1470).
- `intelmqdump`: New parameter `-t` or `--truncate` to optionally give the maximum length of `raw` data to show, 0 for no truncating.

### Contrib
- Added `development-tools`.
- ElasticSearch: Dropped version 5 compatibility, added version 7 compatibility (certtools#1513).
- Malware Name Mapping Downloader:
  - New parameter `--mwnmp-ignore-adware`.
  - The parameter `--add-default` supports an optional parameter to define the default value.

### Known issues
- Bots started with IntelMQ-Manager stop when the webserver is restarted. (certtools#952).
- Corrupt dump files when interrupted during writing (certtools#870).
If a bot receives a KeyboardInterrupt while it is dumping a message, the write operation is interrupted, which leaves a corrupt dump file and causes data loss. The bigger the file, the higher the probability that this happens and the greater the data loss.
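One possible mitigation for the problem described above is to postpone the KeyboardInterrupt until the dump write has finished. The sketch below is an assumption-laden illustration, not IntelMQ's actual code: the name `defer_sigint` is hypothetical, it assumes the bot runs in the main thread (Python only allows installing signal handlers there), and it only covers SIGINT.

```python
import signal
import threading
from contextlib import contextmanager


@contextmanager
def defer_sigint():
    """Hold back SIGINT (KeyboardInterrupt) until the with-block is done.
    Hypothetical helper, not part of IntelMQ."""
    if threading.current_thread() is not threading.main_thread():
        # signal.signal() only works in the main thread; do nothing here.
        yield
        return

    pending = []

    def remember(signum, frame):
        # Record the interrupt instead of raising inside the write.
        pending.append((signum, frame))

    previous = signal.signal(signal.SIGINT, remember)
    try:
        yield
    finally:
        # Restore the old handler (SIG_DFL if it was not set from Python).
        signal.signal(signal.SIGINT,
                      previous if previous is not None else signal.SIG_DFL)
        if pending and callable(previous):
            # Re-deliver the interrupt now that the file is complete; with
            # Python's default handler this raises KeyboardInterrupt here.
            previous(*pending[-1])
```

A bot would wrap only the dump write in `with defer_sigint():`, so that Ctrl+C still stops the bot, just after the file has been written completely. This does not help against SIGKILL, crashes, or power loss; writing to a temporary file and renaming it over the old dump also covers those cases.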