Remove old-style config for exclude-entropy-patterns #282

Merged 9 commits on Nov 19, 2021
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -6,6 +6,10 @@ Bug fixes:
* [#284](https://github.com/godaddy/tartufo/pull/284) - Fix handling of first
commit during local scans; an exception was raised instead of processing the
commit.

Misc:

* [#282](https://github.com/godaddy/tartufo/pull/282) - Remove old-style config for `exclude-entropy-patterns`

Features:

@@ -51,7 +55,7 @@ Features:

Misc:

* [#255](https://github.com/godaddy/tartufo/issues/255) -- Removed deprecated flags
* [#255](https://github.com/godaddy/tartufo/issues/255) - Removed deprecated flags
--include-paths and --exclude-paths

v2.10.0 - 3 November 2021
40 changes: 13 additions & 27 deletions docs/source/features.rst
@@ -277,34 +277,12 @@ Entropy Limiting

Entropy scans can produce a high number of false positives such as git SHAs or md5
digests. To avoid these false positives, enable ``exclude-entropy-patterns``. Exclusions
apply to any strings flagged by entropy checks.
apply to any strings flagged by entropy checks. This option is not available on the command line,
and must be specified in your config file.

For example, if ``docs/README.md`` contains a git SHA, this would be flagged by entropy.
To exclude this, add ``docs/.*\.md$::^[a-zA-Z0-9]{40}$`` to ``exclude-entropy-patterns``.

.. code-block:: sh

> tartufo ... --exclude-entropy-patterns "docs/.*\.md$::^[a-zA-Z0-9]{40}$"

.. code-block:: toml

[tool.tartufo]
exclude-entropy-patterns = [
# format: "{file regex}::{entropy pattern}"
"docs/.*\.md$::^[a-zA-Z0-9]{40}$", # exclude all git SHAs in the docs directory
]

.. warning::
.. versionchanged:: 2.9.0
As of version 2.9.0, the above specification style has been deprecated, and
will be removed in version 3.0. The new style uses a TOML `array of tables`_
as shown below.

Note that this new syntax is not available on the command line, and must be
specified in your config file.

Here is an example of how you might exclude SHA hashes in your docs, as well as
hashes for GitHub Actions in your workflows:
For example, if ``docs/README.md`` contains a git SHA and ``.github/workflows/*.yml`` contains pinned git SHAs,
these would be flagged by entropy.
To exclude them, add the following entries to ``exclude-entropy-patterns`` in the config file.

.. code-block:: toml

@@ -313,6 +291,14 @@ hashes for GitHub Actions in your workflows:
{path-pattern = 'docs/.*\.md$', pattern = '^[a-zA-Z0-9]{40}$', reason = 'exclude all git SHAs in the docs'},
{path-pattern = '\.github/workflows/.*\.yml', pattern = 'uses: .*@[a-zA-Z0-9]{40}', reason = 'GitHub Actions'}
]
.. note::
``match-type`` selects the regex operation, either ``search`` or ``match``. ``search`` looks for the regex
anywhere in the selected scope, while ``match`` requires the regex to match at the beginning of the selected scope.
Defaults to ``search``.

``scope`` specifies whether the regex operation (search or match) is applied by ``word`` or by ``line``.
``word`` means exactly the high-entropy string of characters, while ``line`` searches the entire input line
containing the high-entropy string. Defaults to ``line``.
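
The four combinations of ``match-type`` and ``scope`` described in the note can be illustrated with plain ``re`` calls. This is a minimal sketch, not tartufo's own code: the sample line, the 40-character string, and the ``excluded`` helper are all assumptions made up for the illustration.

```python
import re

# Hypothetical inputs: the high-entropy "word" and the source line containing it.
word = "a1b2c3d4e5" * 4  # 40 characters
line = "uses: actions/checkout@" + word

pattern = re.compile(r"[a-z0-9]{40}")


def excluded(match_type: str, scope: str) -> bool:
    """Apply the pattern as the note describes (illustrative helper only)."""
    # scope picks the text to test: the bare string or the whole line.
    target = word if scope == "word" else line
    # match-type picks the regex operation: anchored at the start, or anywhere.
    operation = pattern.match if match_type == "match" else pattern.search
    return operation(target) is not None


results = {
    (match_type, scope): excluded(match_type, scope)
    for match_type in ("search", "match")
    for scope in ("word", "line")
}
```

Only the ``match``/``line`` combination fails to exclude here, because the line begins with ``uses:`` and not with the 40-character string.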

Thanks to the magic of TOML, you could also split these out into their own tables
in the config if you wanted. So the following would be 100% equivalent to what
9 changes: 5 additions & 4 deletions tartufo/cli.py
@@ -120,10 +120,11 @@ def get_command(self, ctx: click.Context, cmd_name: str) -> Optional[click.Comma
"-xe",
"--exclude-entropy-patterns",
multiple=True,
help="""Specify a regular expression which matches entropy strings to
exclude from the scan. This option can be specified multiple times to
exclude multiple patterns. If not provided (default), no entropy strings
will be excluded ({path regex}::{pattern regex}).""",
hidden=True,
help="""Specify a regular expression which matches entropy strings to exclude from the scan. This option can be
specified multiple times to exclude multiple patterns. If not provided (default), no entropy strings will be
excluded. ({"path-pattern": {path regex}, "pattern": {pattern regex}, "match-type": "match"|"search",
"scope": "word"|"line"}).""",
)
@click.option(
"-e",
86 changes: 35 additions & 51 deletions tartufo/config.py
@@ -2,7 +2,6 @@
import pathlib
import re
import shutil
import warnings
from typing import (
Any,
Dict,
@@ -20,11 +19,12 @@
import tomlkit

from tartufo import types, util
from tartufo.types import ConfigException, Rule
from tartufo.types import ConfigException, Rule, MatchType, Scope

OptionTypes = Union[str, int, bool, None, TextIO, Tuple[TextIO, ...]]

DEFAULT_PATTERN_FILE = pathlib.Path(__file__).parent / "data" / "default_regexes.json"
EMPTY_PATTERN = re.compile("")


def load_config_from_path(
@@ -217,15 +217,19 @@ def load_rules_from_file(rules_file: TextIO) -> Dict[str, Rule]:
rule = Rule(
name=rule_name,
pattern=re.compile(rule_definition["pattern"]),
path_pattern=re.compile(path_pattern) if path_pattern else None,
re_match_type="match",
path_pattern=re.compile(path_pattern)
if path_pattern
else EMPTY_PATTERN,
re_match_type=MatchType.Match,
re_match_scope=None,
)
except AttributeError:
rule = Rule(
name=rule_name,
pattern=re.compile(rule_definition),
path_pattern=None,
re_match_type="match",
re_match_type=MatchType.Match,
re_match_scope=None,
)
rules[rule_name] = rule
return rules
@@ -246,58 +250,38 @@ def compile_path_rules(patterns: Iterable[str]) -> List[Pattern]:
]


def compile_rule(pattern: str) -> Rule:
"""
Compile pattern string to Rule.

:param pattern: Rule pattern with {path_pattern}::{pattern}
:return Rule: Rule object with pattern and path_pattern
"""
try:
path, pattern = pattern.split("::", 1)
except ValueError: # Raised when the split separator is not found
path = ".*"
return Rule(
name=None,
pattern=re.compile(pattern),
path_pattern=re.compile(path),
re_match_type="match",
)


def compile_rules(patterns: Iterable[Union[str, Dict[str, str]]]) -> List[Rule]:
def compile_rules(patterns: Iterable[Dict[str, str]]) -> List[Rule]:
"""Take a list of regex strings with paths and compile them into a list of Rule objects.

Any line starting with `#` will be ignored.

:param patterns: The list of patterns to be compiled
:return: List of Rule objects
"""
try:
return list(
{
rules: List[Rule] = []
for pattern in patterns:
try:
match_type = MatchType(pattern.get("match-type", MatchType.Search.value))
except ValueError as exc:
raise ConfigException(
f"Invalid value for match-type: {pattern.get('match-type')}"
) from exc
try:
scope = Scope(pattern.get("scope", Scope.Line.value))
except ValueError as exc:
raise ConfigException(
f"Invalid value for scope: {pattern.get('scope')}"
) from exc
try:
rules.append(
Rule(
name=pattern.get("reason", None), # type: ignore[union-attr]
pattern=re.compile(pattern["pattern"]), # type: ignore[index]
path_pattern=re.compile(pattern.get("path-pattern", ".*")), # type: ignore[union-attr]
re_match_type="search",
path_pattern=re.compile(pattern.get("path-pattern", "")), # type: ignore[union-attr]
re_match_type=match_type,
re_match_scope=scope,
)
for pattern in patterns
}
)
except KeyError as exc:
raise ConfigException(
f"Malformed exclude-entropy-patterns: {patterns}"
) from exc
except AttributeError:
warnings.warn(
"Using old-style exclude-entropy-patterns; this behavior will be removed in v3.0",
DeprecationWarning,
)
stripped = (p.strip() for p in patterns) # type: ignore[union-attr]
rules = [
compile_rule(pattern)
for pattern in stripped
if pattern and not pattern.startswith("#")
]
return rules
)
except KeyError as exc:
raise ConfigException(
f"Invalid exclude-entropy-patterns: {patterns}"
) from exc
return rules
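
The validation flow in the new `compile_rules` can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the project's implementation: `parse_entry` is a hypothetical helper, and the `ConfigException`, `MatchType`, and `Scope` classes below are local stand-ins that mirror the names in the diff.

```python
import enum
import re
from typing import Dict, Tuple


class ConfigException(Exception):
    """Local stand-in for tartufo.types.ConfigException."""


class MatchType(enum.Enum):
    Match = "match"
    Search = "search"


class Scope(enum.Enum):
    Word = "word"
    Line = "line"


def parse_entry(entry: Dict[str, str]) -> Tuple[re.Pattern, re.Pattern, MatchType, Scope]:
    """Hypothetical helper: resolve one exclude-entropy-patterns table."""
    try:
        # Missing keys fall back to the documented defaults.
        match_type = MatchType(entry.get("match-type", MatchType.Search.value))
        scope = Scope(entry.get("scope", Scope.Line.value))
    except ValueError as exc:
        raise ConfigException(f"Invalid match-type or scope: {entry}") from exc
    try:
        pattern = re.compile(entry["pattern"])  # "pattern" is the only required key
    except KeyError as exc:
        raise ConfigException(f"Invalid exclude-entropy-patterns: {entry}") from exc
    # An empty path pattern matches every path.
    return re.compile(entry.get("path-pattern", "")), pattern, match_type, scope


path_re, pattern_re, match_type, scope = parse_entry({"pattern": "^[a-f0-9]{40}$"})
```

Converting the raw strings through the enum constructors is what turns a typo such as `scope = "paragraph"` into a configuration error at load time instead of a silent mismatch during the scan.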
28 changes: 22 additions & 6 deletions tartufo/scanner.py
@@ -28,7 +28,13 @@
import pygit2

from tartufo import config, types, util
from tartufo.types import BranchNotFoundException, Rule, TartufoException
from tartufo.types import (
BranchNotFoundException,
Rule,
TartufoException,
MatchType,
Scope,
)

BASE64_REGEX = re.compile(r"[A-Z0-9+/_-]+={,2}", re.IGNORECASE)
HEX_REGEX = re.compile(r"[0-9A-F]+", re.IGNORECASE)
@@ -351,18 +357,25 @@ def rule_matches(rule: Rule, string: str, line: str, path: str) -> bool:

:param rule: Rule to perform match
:param string: string to match against rule pattern
:param line: Source line containing string of interest
:param path: path to match against rule path_pattern
:return: True if string and path matched, False otherwise.
"""
match = False
if rule.re_match_type == "match":
if rule.re_match_scope == Scope.Word:
scope = string
elif rule.re_match_scope == Scope.Line:
scope = line
else:
raise TartufoException(f"Invalid value for scope: {rule.re_match_scope}")
if rule.re_match_type == MatchType.Match:
if rule.pattern:
match = rule.pattern.match(string) is not None
match = rule.pattern.match(scope) is not None
if rule.path_pattern:
match = match and rule.path_pattern.match(path) is not None
elif rule.re_match_type == "search":
elif rule.re_match_type == MatchType.Search:
if rule.pattern:
match = rule.pattern.search(line) is not None
match = rule.pattern.search(scope) is not None
if rule.path_pattern:
match = match and rule.path_pattern.search(path) is not None

@@ -372,6 +385,7 @@ def entropy_string_is_excluded(self, string: str, line: str, path: str) -> bool:
"""Find whether the signature of some data has been excluded in configuration.

:param string: String to check against rule pattern
:param line: Source line containing string of interest
:param path: Path to check against rule path pattern
:return: True if excluded, False otherwise
"""
@@ -651,7 +665,9 @@ def load_repo(self, repo_path: str) -> pygit2.Repository:
self.global_options.exclude_signatures = tuple(
set(self.global_options.exclude_signatures + tuple(signatures))
)

entropy_patterns = data.get("exclude_entropy_patterns", None)
if entropy_patterns:
self.global_options.exclude_entropy_patterns += tuple(entropy_patterns)
include_patterns = list(data.get("include_path_patterns", ()))
repo_include_file = data.get("include_paths", None)
if repo_include_file:
55 changes: 33 additions & 22 deletions tartufo/types.py
@@ -1,7 +1,35 @@
# pylint: disable=too-many-instance-attributes
import enum
from dataclasses import dataclass
from typing import Any, Dict, Optional, TextIO, Tuple, Pattern
from typing import Any, Dict, Optional, TextIO, Tuple, Pattern, Union


class IssueType(enum.Enum):
Entropy = "High Entropy" # pylint: disable=invalid-name
RegEx = "Regular Expression Match" # pylint: disable=invalid-name


class MatchType(enum.Enum):
Match = "match" # pylint: disable=invalid-name
Search = "search" # pylint: disable=invalid-name


class Scope(enum.Enum):
Word = "word" # pylint: disable=invalid-name
Line = "line" # pylint: disable=invalid-name


class LogLevel(enum.IntEnum):
ERROR = 0
WARNING = 1
INFO = 2
DEBUG = 3


class OutputFormat(enum.Enum):
Text = "text" # pylint: disable=invalid-name
Json = "json" # pylint: disable=invalid-name
Compact = "compact" # pylint: disable=invalid-name


@dataclass
@@ -35,7 +63,7 @@ class GlobalOptions:
scan_filenames: bool
include_path_patterns: Tuple[str, ...]
exclude_path_patterns: Tuple[str, ...]
exclude_entropy_patterns: Tuple[str, ...]
exclude_entropy_patterns: Tuple[Dict[str, str], ...]
exclude_signatures: Tuple[str, ...]
output_dir: Optional[str]
git_rules_repo: Optional[str]
@@ -59,11 +87,6 @@ class GitOptions:
include_submodules: bool


class IssueType(enum.Enum):
Entropy = "High Entropy" # pylint: disable=invalid-name
RegEx = "Regular Expression Match" # pylint: disable=invalid-name


@dataclass
class Chunk:
__slots__ = ("contents", "file_path", "metadata")
@@ -74,31 +97,19 @@ class Chunk:

@dataclass
class Rule:
__slots__ = ("name", "pattern", "path_pattern", "re_match_type")
__slots__ = ("name", "pattern", "path_pattern", "re_match_type", "re_match_scope")
name: Optional[str]
pattern: Pattern
path_pattern: Optional[Pattern]
re_match_type: str
re_match_type: Union[str, MatchType]
re_match_scope: Optional[Union[str, Scope]]

def __hash__(self) -> int:
if self.path_pattern:
return hash(f"{self.pattern.pattern}::{self.path_pattern.pattern}")
return hash(self.pattern.pattern)
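
A short sketch of what the custom `__hash__` makes possible: rules built from the same pattern strings collapse to one entry in a set. The `Rule` below is a reduced stand-in with only three of the fields from the diff, and the sample patterns are assumptions chosen for the illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    """Reduced stand-in for tartufo.types.Rule (match type and scope omitted)."""

    name: Optional[str]
    pattern: re.Pattern
    path_pattern: Optional[re.Pattern]

    def __hash__(self) -> int:
        # Hash on the pattern source strings, not on the compiled objects.
        if self.path_pattern:
            return hash(f"{self.pattern.pattern}::{self.path_pattern.pattern}")
        return hash(self.pattern.pattern)


first = Rule("git SHA", re.compile(r"^[a-f0-9]{40}$"), re.compile(r"docs/.*\.md$"))
duplicate = Rule("git SHA", re.compile(r"^[a-f0-9]{40}$"), re.compile(r"docs/.*\.md$"))

# Equal fields plus equal hashes mean the set keeps a single rule.
unique_rules = {first, duplicate}
```

Because `@dataclass` leaves an explicitly defined `__hash__` in place, instances stay usable in sets and as dictionary keys even though the class is not frozen.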


class LogLevel(enum.IntEnum):
ERROR = 0
WARNING = 1
INFO = 2
DEBUG = 3


class OutputFormat(enum.Enum):
Text = "text" # pylint: disable=invalid-name
Json = "json" # pylint: disable=invalid-name
Compact = "compact" # pylint: disable=invalid-name


class TartufoException(Exception):
"""Base class for all package exceptions"""
