Filter include/exclude #103

Closed
Anian-igor opened this issue Dec 23, 2022 · 12 comments

Comments

@Anian-igor

Is it possible to apply an include/exclude filter to an RSS feed, the same way as for a YT feed?

@umputun
Owner

umputun commented Dec 23, 2022

Not sure what you mean exactly; however, the filters apply to feeds, not to the source (like YT). Pls explain your use case (better with a config example) and give me more details about what exactly you are trying to filter out and what you mean by "include".

@Anian-igor
Author

I have a really huge feed from a local radio broadcaster: https://nv.ua/rss/podcasts/viyna-v-ukraini.xml
I want to keep only the items with a few specific names in their titles.
I tried to filter by title, but without luck:

    filter:
      title: Сич

I have a regexp, but I don't know how to apply it, and I didn't find an example:

    /Фурса|Сич|Михайлов|Шабунін|Горєвой|Портников|Яковина|Тимошенко|Денисенко/i

@umputun
Owner

umputun commented Dec 23, 2022

Well, we do have an example, see https://github.com/umputun/feed-master/blob/master/_example/etc/fm.yml#L86

Generally, the filter is there to exclude things, not to include them. I.e., the filter doesn't act as "gimme those items only", but rather "gimme all the items except some". However, you can try some reversed/inverted regex, like ^(?:(?!first|second).)*$. This particular one won't work because Go's regex engine doesn't support lookaround (e.g. the ?! negative lookahead operator). I'm not sure if there is an alternative way to achieve this in regex; I'm not a big expert in regex magic.
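For reference, a minimal Go sketch of the lookaround limitation mentioned above. It only exercises the standard `regexp` package; the helper names `compiles` and `matches` are made up for illustration and are not part of feed-master. It also shows that the `/.../i` notation from the earlier comment is not Go syntax, while the `(?i)` flag is the usual Go way to ask for case-insensitive matching.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
// (Hypothetical helper, for illustration only.)
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// matches reports whether s contains a match for the pattern.
// It panics on an invalid pattern, so use it only with known-good ones.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// Negative lookahead is Perl syntax that Go's RE2-based engine rejects.
	fmt.Println(compiles(`^(?:(?!first|second).)*$`)) // false

	// The (?i) flag makes the match case-insensitive, including for Cyrillic.
	fmt.Println(matches(`(?i)фурса|сич`, "Сергій Фурса, Віталій Сич")) // true
}
```

Whether feed-master passes the `title` value straight to `regexp` is an assumption here; the thread does not show that part of the code.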

Alternatively, a small change in the code (adding a configuration parameter "inversed" or smth like this and using it to flip the result of the check in this function) can be a much easier way to achieve the desired result. Feel free to submit a PR for this or ping the author of the original implementation of regex filters.
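A rough sketch of what "flip the result of the check" could look like. This is not feed-master's actual code: the `titleFilter` type, the `newTitleFilter` constructor, and the `skip` method are hypothetical names chosen for illustration, and the sketch assumes the filter is a regex matched against the item title.

```go
package main

import (
	"fmt"
	"regexp"
)

// titleFilter is a hypothetical stand-in, not feed-master's real type.
type titleFilter struct {
	re     *regexp.Regexp
	invert bool
}

// newTitleFilter compiles the pattern and panics if it is invalid.
func newTitleFilter(pattern string, invert bool) titleFilter {
	return titleFilter{re: regexp.MustCompile(pattern), invert: invert}
}

// skip reports whether an item with this title should be dropped.
func (f titleFilter) skip(title string) bool {
	matched := f.re.MatchString(title)
	if f.invert {
		return !matched // inverted: drop everything that does NOT match
	}
	return matched // default: drop everything that matches
}

func main() {
	f := newTitleFilter(`first|second`, true)
	for _, t := range []string{"first episode", "third episode"} {
		fmt.Printf("%q skipped: %v\n", t, f.skip(t))
	}
}
```

With the inverted flag set, the filter behaves as an include list: only titles matching the pattern survive.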

@Anian-igor
Author

Yes, I saw those examples, but I thought I was missing an include filter.
Thanks.

@umputun
Owner

umputun commented Dec 23, 2022

I have added the "invert" parameter to the filter. In your case, it will look like this:

    filter:
      title: (Фурса|Сич|Михайлов|Шабунін|Горєвой|Портников|Яковина|Тимошенко|Денисенко)
      invert: true

Pls test it and let me know. You need to use the :master docker image for those tests.

@Anian-igor
Author

I updated the master image, and invert still doesn't work:
feed-master | 2022/12/23 14:30:19.849 [DEBUG] {proc/processor.go:52} refresh started
feed-master | 2022/12/23 14:30:19.853 [DEBUG] {api/server.go:90} loading templates from webapp/templates/*
feed-master | 2022/12/23 14:30:20.267 [INFO] {proc/processor.go:92} filtered 17703 (Fri, 23 Dec 2022 19:38:53 +0000), radio-nv Війна в Україні: Чому Путін сере в секретну валізу — Сергій Фурса, Віталій Сич
feed-master | 2022/12/23 14:30:20.275 [INFO] {proc/store.go:54} save 1671824333-9da26a141ac244b5ddda211a7aa15a1ac3df94e8 - radio-nv - Війна в Україні: Чому Путін сере в секретну валізу — Сергій Фурса, Віталій Сич - 17703
feed-master | 2022/12/23 14:30:20.306 [INFO] {proc/store.go:54} save 1671794536-fa2b37434ccdbb6d86e21eb7db38a4f311ad2248 - radio-nv - Війна в Україні: Чи Баканов умисно не чіпав Московскьий патріархат? – Євстратій Зоря, ПЦУ - 17696

@umputun
Owner

umputun commented Dec 23, 2022

pls show the first line it prints to the log (with version info) and the filter part of your config

@Anian-igor
Author

  radio-nv:
    title: Радіо НВ - вибране
    description: НВ вибране
    link: https://podcasts.nv.ua
    language: "uk-ua"
    image: images/lavel_nv.png
    filter:
      title: (Фурса|Сич|Михайлов|Шабунін|Горєвой|Портников|Яковина|Тимошенко|Денисенко)
      invert: true
    sources:
      - name: Війна в Україні
        url: https://nv.ua/rss/podcasts/viyna-v-ukraini.xml

feed-master  | init container
feed-master  | set timezone America/Chicago (Fri Dec 23 14:41:38 CST 2022)
feed-master  | custom APP_UID not defined, using default uid=1001
feed-master  | chown: /srv/etc/fm.yml: Read-only file system
feed-master  | execute /srv/feed-master
feed-master  | feed-master master-04b9d0a-20221223T14:32:28

@umputun
Owner

umputun commented Dec 23, 2022

Confused. Your output actually indicates that the entry was properly filtered:

2022/12/23 14:30:20.267 [INFO] {proc/processor.go:92} filtered 17703 (Fri, 23 Dec 2022 19:38:53 +0000), radio-nv Війна в Україні: Чому Путін сере в секретну валізу — Сергій Фурса, Віталій Сич

Those filtered suckers are stored in the internal db with a special "junk" flag, and this is why you see the next message, "save ...". However, they are not populated to the feed, or at least they are not supposed to be. Do you actually see them in the resulting feed?

@Anian-igor
Author

I see that I posted the log for the non-inverted filter. Here is the correct one:

feed-master  | 2022/12/23 15:00:47.795 [DEBUG] {api/server.go:90} loading templates from webapp/templates/*
feed-master  | 2022/12/23 15:00:48.190 [INFO]  {proc/store.go:54} save 1671824333-9da26a141ac244b5ddda211a7aa15a1ac3df94e8 - radio-nv - Війна в Україні: Чому Путін сере в секретну валізу — Сергій Фурса, Віталій Сич - 17703
feed-master  | 2022/12/23 15:00:48.198 [INFO]  {proc/store.go:54} save 1671794536-fa2b37434ccdbb6d86e21eb7db38a4f311ad2248 - radio-nv - Війна в Україні: Чи Баканов умисно не чіпав Московскьий патріархат? – Євстратій Зоря, ПЦУ - 17696
feed-master  | 2022/12/23 15:00:48.202 [INFO]  {proc/store.go:54} save 1671794340-c60c9a4906d590e27567a831da0054f1e4d5a29e - radio-nv - Війна в Україні: США: Північна Корея передала зброю ПВК Вагнер. Яку саме? — Олексій Їжак - 17695
feed-master  | 2022/12/23 15:00:48.209 [INFO]  {proc/store.go:54} save 1671786292-b809f71eb5be0b98635a60978f63d03cc0f9d90a - radio-nv - Війна в Україні: Російські чмобіки – це армія заробітчан — Олексій Кошель - 17693
feed-master  | 2022/12/23 15:00:48.212 [INFO]  {proc/store.go:54} save 1671786094-3ad034fbcda34a022a3286b44f118c851aebbf8c - radio-nv - Війна в Україні: Спротив в Каховці. 150 рашистів знищено, колаборанта підірвано — Денис Попович - 17692
feed-master  | 2022/12/23 15:00:48.215 [INFO]  {proc/store.go:54} save 1671745727-539cf36247c5e144bf702e1204d5a9d60effbf27 - radio-nv - Війна в Україні: Є ідеї, щоб Україна будувала 50 заводів під землею — Вадим Черниш - 17691

@umputun
Owner

umputun commented Dec 23, 2022

Reproduced the issue, and it should be fixed by now.
Pls pull the fresh master and give it another try.

@Anian-igor
Author

Works as intended. Thanks a lot.
