Add a parameter to be parsed as item's datetime stamp #4
Comments
Would that be useful? 🤔 |
Yes. It would make it possible to use article filters to limit how many articles CSS2RSS-based feeds fetch on the first fetch after feed creation (or the first fetch after the database is cleaned & optimized), and to properly limit the number of articles in general with this upcoming feature. Right now the article filter I use successfully limits the initial number of articles fetched for non-CSS2RSS feeds, but it cannot do the same for CSS2RSS feeds because all their articles have the same timestamp. The feature also behaves unexpectedly, for example when limiting a CSS2RSS feed to:
|
Well, that can of course be done as yet another parameter, but my worry is: will people actually bother to use it and match one more element with a new CSS selector in every feed for this? (even if they'd want to limit the number of fetched articles with the mentioned feature) |
Well, I myself would use that feature. It would be nice if the provided element were somehow magically parsed into a date/time object; there is an excellent Python package, "dateparser", which provides such an automagic string -> date/time conversion function. |
I would, yes. |
@martinrotter articles keep getting re-marked as unread when their date changes even a little :-( so with this feature they will be forever unread (cuz many dates are relative). 2024-01-21.16-47-14.mp4 Tried in 4.6.3: same result |
New date selector works relatively well, thanks a lot :) I couldn't get it to work for these pages, even though their items do have dates: https://jwfan.com/?cat=21 |
Calling the script like this fails:
I only need to specify the "item" selector and the date selector (in this case with the "date" class)
|
I think you need to use ~ instead of "" to skip selectors you don't need. |
OK, I am really a beginner when it comes to CSS selectors. |
both "" and ~ should work to skip elements, I'll fix it 👍 |
Tilde works. Perhaps someone can help me. I have a website with one huge <p> element which contains <a> elements (which represent titles) and <span> elements (which contain dates/times), like this: <p>
<a href="....." class="x">Title 1</a>
<span class="y">2024-01-07</span>
<a href="....." class="x">Title 2</a>
<span class="y">2023-07-29</span>
<a href="....." class="x">Title 3</a>
<span class="y">2022-01-07</span>
</p> @Owyn @RetroAbstract Can any of you experts tell me how to instruct css2rss to produce a feed from this correctly, now with dates? |
date selector should point at this span element |
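To make the structure in question concrete: each date `<span class="y">` is a sibling that follows its title `<a class="x">`, with nothing wrapping the pair. The sketch below is not how css2rss works internally; it is a minimal standard-library illustration of pairing such flat siblings, and the names `PairCollector` and `collect_items` are made up for this example.

```python
from html.parser import HTMLParser


class PairCollector(HTMLParser):
    """Pair each <a class="x"> title with the <span class="y"> date that follows it."""

    def __init__(self):
        super().__init__()
        self.items = []      # one dict per title link, in document order
        self._field = None   # field that the text currently being read belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "x" in classes:
            # Every title link starts a new item.
            self.items.append({"title": "", "url": attrs.get("href"), "date": ""})
            self._field = "title"
        elif tag == "span" and "y" in classes and self.items:
            # The span after a title carries that item's date.
            self._field = "date"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        if self._field and self.items:
            self.items[-1][self._field] += data.strip()


def collect_items(html):
    parser = PairCollector()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<p>
<a href="/one" class="x">Title 1</a>
<span class="y">2024-01-07</span>
<a href="/two" class="x">Title 2</a>
<span class="y">2023-07-29</span>
</p>
"""

for item in collect_items(SAMPLE):
    print(item["date"], item["title"], item["url"])
```

Because the pairs are not wrapped individually, any selector-based tool has to rely on document order like this, or on a root element that contains both the title and the date.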
as for the 2nd link I'm not sure what you're trying to match there... update history? there isn't a separate date element there |
Would this also work if the date element is cluttered with auxiliary text, like this? <p>
<a href="....." class="x">Title 1</a>
<span class="y">2024-01-07 SOMEMORETEXT</span>
<a href="....." class="x">Title 2</a>
<span class="y">2023-07-29 SOMEMORETEXT</span>
<a href="....." class="x">Title 3</a>
<span class="y">2022-01-07 SOMEMORETEXT</span>
</p> |
Date parsing errors are redirected into the article's body, check it :-) Erroring out a whole feed just because an auxiliary feature like date parsing broke would've been too much |
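The idea of keeping the item and reporting a date problem in its body, rather than failing the whole feed, can be sketched roughly as follows. This is only an illustration under assumptions, not the script's actual code: `build_item` is a hypothetical helper, the `content_html` key is assumed (only `date_published` appears in this thread), and the regular expression only handles ISO-style dates such as `2024-01-07 SOMEMORETEXT`.

```python
import re
from datetime import datetime

# Assumption: dates look like YYYY-MM-DD, possibly surrounded by other text.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_item(title, raw_date, body=""):
    """Return an item dict; on a date failure, note it in the body instead of raising."""
    item = {"title": title, "content_html": body}
    match = ISO_DATE.search(raw_date)
    try:
        if match is None:
            raise ValueError(f"no date found in {raw_date!r}")
        parsed = datetime.strptime(match.group(), "%Y-%m-%d")
        item["date_published"] = parsed.isoformat()
    except ValueError as err:
        # Hypothetical behaviour: keep the item and surface the problem to the reader.
        item["content_html"] += f"<p>Date parsing error: {err}</p>"
    return item


print(build_item("Title 1", "2024-01-07 SOMEMORETEXT"))
print(build_item("Title 2", "SOMEMORETEXT"))
```

The first call yields an item with a date; the second keeps the item but carries the error text in its body, so the rest of the feed is unaffected.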
Yes, I can create feeds for both sites with CSS2RSS, but I can't get the date selector to work with them. For https://jwfan.com/?cat=21, the "time" element does not work, nor do "published", "entry-meta", "entry-meta > published", etc.; the articles get the moment RSS Guard fetches the feed as their timestamp. For https://support.microsoft.com/en-us/topic/windows-10-update-history-8127c2c6-6edf-4fdf-8b9f-0f7be1ef3562, you're right, there isn't a separate date element, my mistake. Would the script be able to take only the date of each update into consideration and discard the rest? For example, with the "January 9, 2024—KB5034122 (OS Builds 19044.3930 and 19045.3930)" update: keep the date, use it for the date selector, and discard everything after it? |
Well, what I am trying to parse is this URL: https://antivirus.22web.org/clanky.htm Here is the raw input: https://www.pastebin.cz/en/p/VVFQEN9 Here is how I call it
Sadly, the "date_published" attribute in the output is empty, and no error message shows up in the article contents either. |
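For a heading like the one quoted above, where the date and the rest of the title share one element, splitting at the dash and parsing only the first part is one possible approach. The sketch below is an assumption about how that could be done by hand, not a description of what css2rss does; `split_dated_title` is a hypothetical name, and `%B` relies on English month names (the default "C" locale).

```python
from datetime import datetime


def split_dated_title(text):
    """Split 'Month D, YYYY<em dash>rest' into (date, rest)."""
    # "\u2014" is the em dash that separates the date from the KB number.
    date_part, _, rest = text.partition("\u2014")
    parsed = datetime.strptime(date_part.strip(), "%B %d, %Y")
    return parsed.date(), rest.strip()


heading = "January 9, 2024\u2014KB5034122 (OS Builds 19044.3930 and 19045.3930)"
date, rest = split_dated_title(heading)
print(date.isoformat(), "|", rest)
```

A heading without a parsable date before the dash raises `ValueError`, which the caller would need to handle.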
It works for me 🤔 have you tried the script line from my screenshot?
it should be at the very bottom of it in |
|
Works now, thanks 😄 |
Check again after 4fe2c89: it should now report when the date element wasn't found at all (that was probably the case here, cuz then it doesn't write any parsing errors, since there's nothing to parse) |
that's because the date element isn't inside .clanek element but next to it... |
Yes, I know. How do I tell it that the date is next to it, then? Please |
Don't tell it that it's next, just choose a root element (1st argument) a level higher |
I would need some help getting the dates for items on eBay. For example, for the search term "nintendo switch" sorted by newly listed, the listing time of each item is found in:
When I use any of the above ( Any help with this would be much appreciated. 🙂 |
That's because there are no dates on the eBay ads that are disguised as listed items; that's why it says so
Also, date parsing now uses your local timezone, not UTC, when none is specified (I've just noticed it was wrong here), and there's now a more detailed message saying exactly how many items didn't get their date parsed or found, so you'll see when it's just a few and not all of them failing |
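The two behaviours described here (treating dates without a timezone as local time, and reporting how many items failed) could look roughly like this. It is a sketch under assumptions, not the script's code: `localize` and `summarize_failures` are hypothetical names, the message wording is invented, and only one fixed date format is tried.

```python
from datetime import datetime


def localize(dt):
    """Treat a naive datetime as local time; leave timezone-aware ones untouched."""
    return dt.astimezone() if dt.tzinfo is None else dt


def summarize_failures(raw_dates, fmt="%Y-%m-%d"):
    """Parse what can be parsed and describe how many entries failed."""
    parsed, failed = [], 0
    for raw in raw_dates:
        try:
            parsed.append(localize(datetime.strptime(raw, fmt)))
        except ValueError:
            failed += 1
    message = f"{failed} of {len(raw_dates)} items had no parsable date" if failed else ""
    return parsed, message


dates, message = summarize_failures(["2024-01-07", "not a date", "2023-07-29"])
print([d.isoformat() for d in dates])
print(message)
```

Counting failures instead of stopping at the first one makes it obvious whether a selector is wrong for every item or only a few items lack a date.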
Thanks. I tried Maya and Beautiful Soup are installed. What URL are you using? Update: The user agent I had set to launch with RSS Guard was the cause of me not getting the same results. Changed it and now it works fine. 🙂 |
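Since the user agent turned out to change what the site returns, one way to compare results outside RSS Guard is to build requests with different `User-Agent` headers. This is a minimal standard-library sketch; the helper name `build_request` and both user-agent strings are placeholders, and no request is actually sent here.

```python
from urllib.request import Request


def build_request(url, user_agent):
    """Create a request object that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


# Placeholder values for illustration only.
default_like = build_request("https://example.com/", "Python-urllib/3")
browser_like = build_request("https://example.com/", "Mozilla/5.0 (placeholder)")

for req in (default_like, browser_like):
    # urllib stores header names capitalized as "User-agent".
    print(req.full_url, "->", req.get_header("User-agent"))
```

Passing each request to `urllib.request.urlopen` and comparing the returned HTML would show whether the site serves different markup per user agent.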
100% sure it's the same line you wrote, copied and pasted: I also updated css2rss to your latest commit.
I thought this too, tried the UK and CA eBays but same results. |
Perhaps RSSGuard version also matters, I'm still using newer versions should have an option to switch between full and lite browser and also some alternative ways to fetch websites for parsing as I've heard |
Tried on 4.3.3 No Web Engine and indeed, I now get the results you do.
4.7.2 No Web Engine is my daily driver. I tried toggling Use legacy article formatting in Feeds & Articles > Articles on and off, but it made no difference.
I think those are related to the Discover feeds feature which to my knowledge cannot be used with CSS2RSS. |
BTW, the lightweight variant of the browser is still available in the latest RSS Guard, you just have to enable it in settings. |
Feel free to report the "visited link color" if it bugs you, could be a bug. |