
Add a "remove duplicate lines" filter.
Sveder committed Oct 6, 2021
1 parent edf687c commit 38344a3
Showing 3 changed files with 27 additions and 0 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -17,6 +17,7 @@ The format mostly follows [Keep a Changelog](http://keepachangelog.com/en/1.0.0/

- Migrated CI pipeline from Travis CI to Github Actions
- `user_visible_url` can now be specified for all job types (#654, by kongomongo)
- Added a `remove-duplicate-lines` filter.

## [2.23] -- 2021-04-10

1 change: 1 addition & 0 deletions docs/source/filters.rst
@@ -69,6 +69,7 @@ At the moment, the following filters are built-in:
- **sha1sum**: Calculate the SHA-1 checksum of the content
- **shellpipe**: Filter using a shell command
- **sort**: Sort input items
- **remove-duplicate-lines**: Remove duplicate lines (case sensitive)
- **strip**: Strip leading and trailing whitespace
- **xpath**: Filter XML/HTML using XPath expressions
- **jq**: Filter, transform and extract values from JSON
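The docs entry says the comparison is case sensitive, and the code defaults the separator to a newline. A minimal sketch of that behavior, assuming Python 3.7+ (where dicts preserve insertion order). `dedupe` is a hypothetical helper that uses `dict.fromkeys`, a swapped-in technique rather than the generator the commit uses, but it keeps the same first-occurrence order:

```python
def dedupe(data: str, separator: str = '\n') -> str:
    """Hypothetical helper: drop repeated items, keeping the first occurrence."""
    # dict keys keep insertion order (Python 3.7+), so the first occurrence wins;
    # keys are compared exactly, so "Apple" and "apple" are distinct items
    return separator.join(dict.fromkeys(data.split(separator)))


print(repr(dedupe("Apple\napple\nApple")))       # 'Apple\napple'
print(repr(dedupe("x,y,x,z", separator=",")))    # 'x,y,z'
```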
25 changes: 25 additions & 0 deletions lib/urlwatch/filters.py
@@ -809,6 +809,31 @@ def filter(self, data, subfilter):
        return separator.join(sorted(data.split(separator), key=str.casefold, reverse=reverse))


class RemoveDuplicateLinesFilter(FilterBase):
    """Remove duplicate lines"""

    __kind__ = 'remove-duplicate-lines'

    __supported_subfilters__ = {
        'separator': 'Item separator (default: newline)',
    }

    __default_subfilter__ = 'separator'

    def filter(self, data, subfilter):
        separator = subfilter.get('separator', '\n')
        data_lines = data.split(separator)

        def get_unique_lines(lines):
            seen = set()
            for line in lines:
                if line not in seen:
                    yield line
                    seen.add(line)

        return separator.join(get_unique_lines(data_lines))


class ReverseFilter(FilterBase):
    """Reverse input items"""

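The new filter keeps the first occurrence of each item and preserves input order. The sketch below lifts the commit's `get_unique_lines` generator out of the class so it can run without urlwatch. `remove_duplicate_lines` is a hypothetical wrapper standing in for `RemoveDuplicateLinesFilter.filter()` and its `subfilter` dict. It also shows two edge cases that follow from splitting on the separator:

```python
from typing import Iterable, Iterator


def get_unique_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each item the first time it is seen, in input order."""
    seen = set()
    for line in lines:
        if line not in seen:
            yield line
            seen.add(line)


def remove_duplicate_lines(data: str, separator: str = '\n') -> str:
    # Hypothetical wrapper mirroring RemoveDuplicateLinesFilter.filter(),
    # minus urlwatch's FilterBase/subfilter plumbing
    return separator.join(get_unique_lines(data.split(separator)))


# Trailing separator: the empty last item is kept once, so the trailing newline survives
print(repr(remove_duplicate_lines("a\nb\na\n")))   # 'a\nb\n'
# Blank lines are items too: only the first empty one is kept
print(repr(remove_duplicate_lines("a\n\nb\n\n")))  # 'a\n\nb'
```

Because `seen` is a set, each membership check is an average O(1) hash lookup, so one pass over the input is linear in the number of items. The trade-off is that every distinct item is held in memory until the join finishes.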