[FR]: few more simple standard options for not remarking articles UnRead #1279

Owyn · 2024-01-22T13:57:34Z

Brief description of the feature request

I think a few more options here, in addition to the outlined one, would be appropriate and very handy for users:

We have "Ignore changes in article body", so why not have:

1) Ignore changes in article title
2) Ignore changes in article date

?

Because right now I have to apply a "No duplicates by URL" filter (msg.isAlreadyInDatabase(MessageObject.SameUrl)) to basically every feed I have, and whenever I add a new feed I have the chore of going to the filter list and adding the feed to that filter. Twitter feeds tend to change titles a lot: either authors edit them, or it's a technical thing done by Twitter itself (I think) with some special characters like #.

And for the date: scrapers generate article dates from relative dates (e.g. "yesterday" or "two hours ago"), so article dates in feeds fluctuate, re-marking the articles as "unread" each and every time. I see no reason to re-mark something as unread just because its date changed. Related: Owyn/CSS2RSS#4

There shouldn't be a need to use filters for such standard things (and using filters might scare an average user),
and there shouldn't be a need to use a filter for literally every feed (because it is bothersome to re-apply it to every newly added feed).


martinrotter · 2024-01-22T14:11:52Z

An automatic feature was added which considers the "time" unchanged if it differs from the stored time by at most 120 seconds. If you believe this threshold should be bigger, let me know.

As for your suggestions: yes, these would make some sense to implement. That said, my TODO list is HUGE, sadly.
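A minimal sketch of the tolerance check described above, assuming it boils down to comparing the absolute difference of two timestamps against 120 seconds. The names `TOLERANCE` and `is_same_time` are hypothetical and are not taken from the RSS Guard code base.

```python
from datetime import datetime, timedelta

# Assumed tolerance, taken from the 120 seconds mentioned in the thread.
TOLERANCE = timedelta(seconds=120)


def is_same_time(stored: datetime, fetched: datetime,
                 tolerance: timedelta = TOLERANCE) -> bool:
    """Return True when the two timestamps differ by no more than the tolerance."""
    return abs(fetched - stored) <= tolerance


stored = datetime(2024, 1, 22, 14, 0, 0)
print(is_same_time(stored, stored + timedelta(seconds=90)))  # inside the tolerance
print(is_same_time(stored, stored + timedelta(minutes=15)))  # outside the tolerance
```

With a check like this, small jitter in a feed's timestamps would not count as an update, while larger shifts still would.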

martinrotter · 2024-01-22T14:17:36Z

And for the date: scrapers generate article dates from relative dates (e.g. "yesterday" or "two hours ago"), so article dates in feeds fluctuate, re-marking the articles as "unread" each and every time. I see no reason to re-mark something as unread just because its date changed. Related: Owyn/CSS2RSS#4

As for scrapers: I have some scraped websites which post an absolute date/time in some element, and I would very much like to parse these. In that case, everything works great. I even have custom scripts which convert relative timestamps to absolute ones.

And like I said, this is partly mitigated by the upcoming feature that considers two dates which differ by up to 120 seconds as the "same".

Moreover, do you feel that when an article is "updated" it should NEVER be re-marked as unread by RSS Guard, and its original state should be kept all the time? If I make such a change, will you be able to promptly test it for me?

Owyn · 2024-01-22T14:23:07Z

Moreover, do you feel that when article is "updated" it should NEVER be re-marked as unread

For my use-cases - yes,
but perhaps people would still want to be notified when something was changed - that would make sense for when the body is changed - but there's already an option for that

and I can do a swift test by using a portable version with no problems

max 120 seconds

that would not make a difference, look at the following relative dates:

two hours ago
yesterday
two weeks ago
last year

e.g. "two weeks ago" would change into "three weeks ago" only after a... week, not after 120 seconds, so on every fetch (every 15 min) the resolved date will shift by 15 minutes

martinrotter · 2024-01-22T14:26:55Z

two hours ago

yesterday

two weeks ago

last year

Like I said, I even think that the CSS2RSS script should/could automagically resolve these dates. There is an excellent Python library for this called dateparser:

https://dateparser.readthedocs.io/en/latest/

It can parse relative dates too, with one line of Python. Look it up. It is genius. If that was integrated into css2rss, it would be a killer feature. The user simply picks, with an argument, an element (whatever) containing the absolute/relative date/time string, and boom, parsed. I already use it:

https://github.com/martinrotter/rssguard/blob/master/resources/scripts/scrapers/hudebnibazar.py#L53

martinrotter · 2024-01-22T14:28:16Z

ADDENDUM: Yes, for GREATLY relative times like "a week ago", it really is a problem. Luckily, my use case does not involve that much "relativity"; it is always relative with per-hour precision, like "1 hour ago" or "2 hours ago", so I always round date/times to whole hours.

Owyn · 2024-01-22T14:32:48Z

Like I said, I even think that the CSS2RSS script should/could automagically resolve these dates. There is an excellent Python library for this called

it does, it uses the maya library, which in turn uses the dateparser you mentioned, to do it in one line / one function call

Luckily, my use-case does not do that much "relativity", it is always relative with per-hour precision like "1 hour ago, 2 hour ago" so I always round date/times to whole hours.

Test scenario:
you fetch an article dated "1 hour ago",
15 minutes passes,
you fetch it again and it still says "1 hour ago" - the date on the article would now be +15 minutes from what it was last time 15 minutes ago cuz it still says "1 hour ago" and would still say so for the next 45 minutes

result:
the article will be remarked as unread cuz 15 minutes is bigger than 120 seconds
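The test scenario above can be reproduced with a small sketch. It assumes the scraper resolves "N hours ago" against the current fetch time; `resolve_hours_ago` is a hypothetical stand-in for a real relative-date parser, not a function from CSS2RSS or RSS Guard.

```python
from datetime import datetime, timedelta


def resolve_hours_ago(hours: int, now: datetime) -> datetime:
    """Hypothetical stand-in for a relative-date parser: 'N hours ago' relative to `now`."""
    return now - timedelta(hours=hours)


first_fetch = datetime(2024, 1, 22, 15, 0)
second_fetch = first_fetch + timedelta(minutes=15)

# The page still says "1 hour ago" on both fetches.
first_date = resolve_hours_ago(1, first_fetch)
second_date = resolve_hours_ago(1, second_fetch)

drift = second_date - first_date
print(drift.total_seconds())           # 900.0
print(drift > timedelta(seconds=120))  # True: the drift exceeds the 120-second tolerance
```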

martinrotter · 2024-01-22T14:41:38Z

Like I said, I even think that the CSS2RSS script should/could automagically resolve these dates. There is an excellent Python library for this called

it does, it uses the maya library, which in turn uses the dateparser you mentioned, to do it in one line / one function call

it is always relative with per-hour precision like "1 hour ago, 2 hour ago" so I always round date/times to whole hours.

Test scenario: you fetch an article dated "1 hour ago", 15 minutes passes, you fetch it again and it still says "1 hour ago" - the date on the article would now be +15 minutes from what it was last time 15 minutes ago cuz it still says "1 hour ago" and would still say so for the next 45 minutes

That is why I talked about precision. For some use cases, like one of mine, what I did was the following. Let's say it is 15:18 and the article says "1 hour ago".

What I do: call dateparser(1 hour ago) -> 14:18 -> trim all minutes -> 14:00.

Let's say some time passes. My use case is "nice": while the site says "1 hour ago", it likely still knows the exact time internally, so even when the time is 15:59, then

call dateparser(1 hour ago) -> 14:59 -> trim all minutes -> 14:00.

Then it is 16:02 and the site returns "2 hours ago":

call dateparser(2 hours ago) -> 14:02 -> trim all minutes -> 14:00.

So when the source uses relative time with good enough "precision", some techniques can be used to mitigate any problems.

But I agree this is only a specific use case.
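The trimming steps above can be sketched with the standard library alone. This is an illustration under assumptions: `resolve_to_whole_hour` is a hypothetical helper that replaces the dateparser call with plain subtraction, and the fetch times are the ones from the example.

```python
from datetime import datetime, timedelta


def resolve_to_whole_hour(hours_ago: int, now: datetime) -> datetime:
    """Resolve 'N hours ago' against `now`, then drop minutes and seconds."""
    resolved = now - timedelta(hours=hours_ago)
    return resolved.replace(minute=0, second=0, microsecond=0)


# (hours shown by the site, fetch time) pairs from the example in the thread
fetches = [
    (1, datetime(2024, 1, 22, 15, 18)),
    (1, datetime(2024, 1, 22, 15, 59)),
    (2, datetime(2024, 1, 22, 16, 2)),
]

results = [resolve_to_whole_hour(hours, now) for hours, now in fetches]
for result in results:
    print(result.strftime("%H:%M"))  # 14:00 for each of the three fetches
```

All three fetches resolve to the same timestamp, so the stored date never changes. This only holds while the source updates its relative text at least as precisely as the rounding step.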

I pushed cda0f79, can you wait for the build (or build it yourself) and test? Articles should now NOT be re-marked as unread when they are updated; the existing read state is honored instead.

Owyn · 2024-01-22T16:15:07Z

I have tested it - changing the date now doesn't seem to re-mark items as unread, or to change their date at all

but changing titles is not recognized as a change at all - rather, articles with changed titles are considered brand new articles

martinrotter · 2024-01-22T16:16:24Z

Does that particular article have its own "ID" provided by the feed or script?

Owyn · 2024-01-22T16:21:53Z

Does that particular article have its own "ID" provided by the feed or script?

it's a feed generated by my CSS2RSS scraper, so I guess there's no id

martinrotter · 2024-01-23T06:20:02Z

RSS Guard has very robust logic for determining the "uniqueness" of an article, meaning finding out whether some version of the article is already stored in the DB.

Mainly, most feeds do provide a "guid" for each of their entries. If such an ID is provided by the feed, it is used as the only key for identifying articles. In scripts, it is good practice to use the URL of the article as the "guid". If there is no GUID, then the author/title/url triplet is used to identify messages, and all three fields must match to mark the message as being "the one".

Check lines 1574-1600 of this file.
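The identification rule described above can be illustrated with a short sketch. This is not RSS Guard's actual implementation; the `Article` class and `identity_key` function are hypothetical and only model the stated rule: use the guid when present, otherwise the author/title/url triplet.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Article:
    title: str
    url: str
    author: str = ""
    guid: Optional[str] = None


def identity_key(article: Article) -> Tuple[str, ...]:
    """Use the guid as the only key when present, else the author/title/url triplet."""
    if article.guid:
        return ("guid", article.guid)
    return ("triplet", article.author, article.title, article.url)


no_guid = Article(title="Old title", url="https://example.com/a")
with_guid = replace(no_guid, guid="https://example.com/a")

# A title change alters the key only when no guid is available.
print(identity_key(no_guid) == identity_key(replace(no_guid, title="New title")))      # False
print(identity_key(with_guid) == identity_key(replace(with_guid, title="New title")))  # True
```

Under this rule, a scraper that sets the article URL as the guid would keep a retitled article from being stored as a new one.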

martinrotter · 2024-01-23T06:27:24Z

Addendum: I just tested article "updating" with a local file feed and everything works as expected. When only the date/time is updated, the change is visible in RSS Guard (regardless of whether the article has a guid or not) and no duplicate article is created.

If I changed the title of an article which did not have a guid, then yes, a new article is created in RSS Guard; this is expected behavior at this point.

If the article has a guid, I tested changing date/time, title, and author: all changes are visible in RSS Guard and, correctly, no duplicate article is created.

martinrotter · 2024-01-23T06:28:06Z

Closing this for now, as "re-marking as unread" itself is now solved; for other issues, create separate tickets. :)

Owyn added the Type-Enhancement ("This is request for brand new feature.") label Jan 22, 2024

Owyn assigned martinrotter Jan 22, 2024

martinrotter closed this as completed Jan 23, 2024

martinrotter added the Component-DB and Status-Fixed ("Ticket is resolved.") labels Jan 23, 2024

martinrotter added this to the 4.6.4 milestone Jan 23, 2024
