[FR]: few more simple standard options for not remarking articles UnRead #1279
Comments
An automatic feature was added which treats the "time" as unchanged if it differs from the stored time by at most 120 seconds. If you believe this threshold should be bigger, let me know. As for your suggestions: yes, these would make some sense to implement. That said, my TODO list is HUGE, sadly. |
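To make the threshold idea concrete, here is a minimal Python sketch of such a comparison. It is not RSS Guard's actual code; the function name `is_same_time` and the constant `SAME_TIME_THRESHOLD` are hypothetical, and only the 120-second value comes from the comment above.

```python
from datetime import datetime, timedelta

# Assumption: the 120-second tolerance mentioned in the comment above.
SAME_TIME_THRESHOLD = timedelta(seconds=120)


def is_same_time(stored: datetime, fetched: datetime,
                 threshold: timedelta = SAME_TIME_THRESHOLD) -> bool:
    """Treat two timestamps as unchanged when they differ by at most `threshold`."""
    return abs(fetched - stored) <= threshold


stored = datetime(2024, 1, 1, 12, 0, 0)
# A 90-second difference falls inside the tolerance.
print(is_same_time(stored, stored + timedelta(seconds=90)))
# A 15-minute difference does not.
print(is_same_time(stored, stored + timedelta(minutes=15)))
```

Under this sketch, any drift larger than the tolerance still counts as a changed date, so the threshold only helps with small jitter.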
As for scrapers: I have some scraped websites which post an absolute date/time in some element, and I would greatly like to parse these. In that case, everything will work well. I even have custom scripts which convert relative timestamps to absolute ones. And like I said, this is partly mitigated by the upcoming feature that considers two dates which differ by up to 120 seconds as the "same". Moreover, do you feel that when an article is "updated" it should NEVER be re-marked as unread by RSS Guard, and that the original state should be kept all the time? If I make such a change, will you be able to promptly test it for me? |
For my use-cases - yes, and I can do a swift test by using a portable version with no problems
that would not make a difference, look at the following relative dates:
e.g. "two weeks ago" would change into "three weeks ago" only in a... week of time, not after 120 seconds, so every fetch (every 15 min) the date will be shifted by 15 minutes |
Like I said, I even think that the CSS2RSS script should/could automagically resolve these dates. There is an excellent Python library for this called dateparser (https://dateparser.readthedocs.io/en/latest/); it can parse relative dates too, with one line of Python. Look it up. It is genius. If that were integrated into CSS2RSS, it would be a killer feature. The user simply uses an argument to pick an element (whatever) containing the absolute/relative date/time string, and boom, parsed. I already use it. |
ADDENDUM: Yes, for VERY relative times like "a week ago", it really is a problem. Luckily, my use-case does not involve that much "relativity"; it is always relative with per-hour precision, like "1 hour ago" or "2 hours ago", so I always round date/times to whole hours. |
it does, it uses
Test scenario: result: |
That is why I talked about precision. For some use cases, like one of mine, what I did was the following. Let's say it is 15:18 and the article says "1 hour ago". I call dateparser(1 hour ago) -> 14:18 -> trim all minutes -> 14:00. Let's say 15 minutes pass. My use case is "nice": while the source says "1 hour ago", it likely still knows the exact time internally, so even when the time is 15:59, dateparser(1 hour ago) -> 14:59 -> trim all minutes -> 14:00. Then it is 16:02 and the source returns "2 hours ago": dateparser(2 hours ago) -> 14:02 -> trim all minutes -> 14:00. So when the source uses relative time with good enough "precision", some techniques can be used to mitigate any problems. But I agree this is only a specific use case. I pushed cda0f79, can you wait for the build (or build it yourself) and test? Articles should now NOT be re-marked as unread when they are updated; the existing read state is honored instead. |
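The hour-trimming trick described above can be sketched in Python roughly as follows. This is only an illustration: `resolve_relative_hours` is a hypothetical stand-in that understands just "N hour(s) ago" (a real script would presumably call a proper parser such as dateparser instead), and `trim_to_hour` and `stable_timestamp` are made-up names.

```python
import re
from datetime import datetime, timedelta


def resolve_relative_hours(text: str, now: datetime) -> datetime:
    """Hypothetical stand-in for a relative-date parser; handles only 'N hour(s) ago'."""
    match = re.fullmatch(r"\s*(\d+)\s+hours?\s+ago\s*", text)
    if match is None:
        raise ValueError(f"unsupported relative time: {text!r}")
    return now - timedelta(hours=int(match.group(1)))


def trim_to_hour(moment: datetime) -> datetime:
    """Drop minutes, seconds and microseconds so small drifts collapse to one value."""
    return moment.replace(minute=0, second=0, microsecond=0)


def stable_timestamp(text: str, now: datetime) -> datetime:
    """Resolve a relative time against `now`, then trim it to the whole hour."""
    return trim_to_hour(resolve_relative_hours(text, now))


# The three fetches from the example: 15:18, 15:59 and 16:02 on the same day.
day = datetime(2024, 1, 1)
print(stable_timestamp("1 hour ago", day.replace(hour=15, minute=18)))
print(stable_timestamp("1 hour ago", day.replace(hour=15, minute=59)))
print(stable_timestamp("2 hours ago", day.replace(hour=16, minute=2)))
```

All three calls resolve to the same 14:00 timestamp, which is why the date stops fluctuating between fetches as long as the source's relative text has hour-level precision.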
Does that particular article have its own "ID" provided by the feed or script? |
it's a feed generated by my CSS2RSS scraper so I guess - there's no id |
RSS Guard has very robust logic for determining the "uniqueness" of an article, meaning finding out whether some version of the article is already stored in the DB. Most feeds provide a "guid" for each of their entries. If such an ID is provided by the feed, it is used as the only key for identifying articles. In scripts, it is good practice to use the URL of the article as the "guid". If there is no GUID, then the author/title/url triplet is used to identify messages, and all three fields must match to mark the message as being "the one". Check lines 1574-1600 of this file. |
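The matching rule described above can be approximated with a small Python sketch. It is a simplification under stated assumptions, not RSS Guard's real implementation: the `Article` type and the `identity_key` and `is_same_article` helpers are hypothetical, and the sketch ignores any extra conditions the real database lookup may apply.

```python
from typing import NamedTuple, Optional, Tuple


class Article(NamedTuple):
    """Hypothetical minimal article record used only for this illustration."""
    title: str
    author: str
    url: str
    guid: Optional[str] = None


def identity_key(article: Article) -> Tuple[str, ...]:
    """Use the guid alone when present, otherwise the author/title/url triplet."""
    if article.guid:
        return ("guid", article.guid)
    return ("triplet", article.author, article.title, article.url)


def is_same_article(stored: Article, fetched: Article) -> bool:
    """Two versions count as one article when their identity keys match."""
    return identity_key(stored) == identity_key(fetched)


with_guid = Article("Old title", "Ann", "https://example.com/a", guid="a-1")
retitled = with_guid._replace(title="New title")
print(is_same_article(with_guid, retitled))  # guid unchanged, so same article

no_guid = Article("Old title", "Ann", "https://example.com/a")
print(is_same_article(no_guid, no_guid._replace(title="New title")))  # triplet differs
```

In this sketch a title edit only produces a "new" article when no guid is available, which is one reason using the article URL as the guid in scripts is attractive.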
Addendum: I just tested article "updating" with a local file feed and everything works as expected. When only the date/time is updated, the change is visible in RSS Guard (regardless of whether the article has a guid) and no duplicate article is created. If I change the title of an article which does not have a guid, then yes, a new article is created in RSS Guard; this is expected behavior at this point. If the article has a guid, I tested changing the date/time, title and author: all changes are visible in RSS Guard and, correctly, no duplicate article is created. |
Closing this for now, as "re-marking as unread" itself is now solved. For other issues, create separate tickets. :) |
Brief description of the feature request
I think a few more options here, in addition to the outlined one, would be appropriate and very handy for users:
We have "Ignore changes in article body", so why not have:
1) Ignore changes in article title
2) Ignore changes in article date
?
Because now I have to apply the "No duplicates by URL" filter (msg.isAlreadyInDatabase(MessageObject.SameUrl)) to basically every feed I have, and when I add a new feed I have the chore of going to the filter list and adding it to that filter. Twitter feeds tend to change titles a lot: either authors edit them or it's a technical thing done by Twitter itself (I think) with some special characters like #. As for the date: scrapers generate article dates based on relative dates (e.g. "yesterday" or "two hours ago"), so article dates in feeds fluctuate, re-marking them as "unread" each and every time. I see no reason to re-mark something as unread just because its date changed. Related: Owyn/CSS2RSS#4
There shouldn't be a need to use filters for such standard things (and using filters might scare an average user)
and there shouldn't be a need to use a filter for literally every feed (because it is bothersome to re-apply it every newly added feed)