The link attribute in a feed might not be unique, relying only on that to identify entries is not enough #111

Closed
StayPirate opened this issue Jul 6, 2023 · 4 comments · Fixed by #120
Assignees
Labels
bug Something isn't working enhancement New feature or request

Comments

@StayPirate
Contributor

I found that the GitHub ReadME podcast feed sets the same link for most of its items. Since rss2email tracks read items via their link, this causes any new item to be considered already processed, and therefore ignored.

StayPirate added a commit to StayPirate/sieve-disroot that referenced this issue Jul 6, 2023
StayPirate added a commit to StayPirate/apu2e4-containers that referenced this issue Jul 6, 2023
@StayPirate
Contributor Author

Hi @skx, today I found another feed which behaves the same way. As you can see, all the entries use the same link:

<item>
<title>ZDI-CAN-22550: Lexmark</title>
<guid isPermaLink="false">ZDI-CAN-22550</guid>
<link>
http://www.zerodayinitiative.com/advisories/upcoming/     <-----------------------------------
</link>
<description>
A CVSS score 6.3 <a href="http://nvd.nist.gov/cvss.cfm?calculator&version=3.0&vector=(AV:A/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L)">(AV:A/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:L)</a> severity vulnerability discovered by 'Foundry Zero' was reported to the affected vendor on: 2023-12-01, 4 days ago. The vendor is given until 2024-03-30 to publish a fix or workaround. Once the vendor has created and tested a patch we will coordinate the release of a public advisory.
</description>
<pubDate>Fri, 01 Dec 2023 00:00:00 -0600</pubDate>
</item>
<item>
<title>ZDI-CAN-22454: Hewlett Packard Enterprise</title>
<guid isPermaLink="false">ZDI-CAN-22454</guid>
<link>
http://www.zerodayinitiative.com/advisories/upcoming/     <-----------------------------------
</link>
<description>
A CVSS score 7.8 <a href="http://nvd.nist.gov/cvss.cfm?calculator&version=3.0&vector=(AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H)">(AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H)</a> severity vulnerability discovered by 'Anonymous' was reported to the affected vendor on: 2023-12-01, 4 days ago. The vendor is given until 2024-03-30 to publish a fix or workaround. Once the vendor has created and tested a patch we will coordinate the release of a public advisory.
</description>
<pubDate>Fri, 01 Dec 2023 00:00:00 -0600</pubDate>
</item>
<item>

Because of that, rss2email considers every entry as already seen, so the feed cannot be used with it.

Since this is not the first feed I have seen that provides the same link for multiple entries, maybe a more reliable way to uniquely identify entries in rss2email should be considered. What do you think?

@StayPirate StayPirate changed the title Feed with multiple items with the same links are ignored The link attribute in a feed might not be unique, relying only on that to identify entries is not enough Dec 5, 2023
@skx
Owner

skx commented Dec 5, 2023

I guess that's something we could work around:

  • If there are duplicates, then we can use the guid value.
    • Perhaps something like {{.link}}#{{.guid}}?

I think that really the feed is a bit horrid and broken, but it should be something we can cope with.
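
As a rough illustration of the idea above, here is a minimal Go sketch. It is not the code from rss2email or from any pull request: the `Item` type and the `uniqueKeys` function are hypothetical names invented for this example. It assumes each item exposes its link and guid, and it only appends the guid when a link is shared by more than one item in the same feed.

```go
package main

import "fmt"

// Item is a hypothetical, minimal view of a feed entry (not rss2email's type).
type Item struct {
	Title string
	Link  string
	GUID  string
}

// uniqueKeys returns one tracking key per item, in input order.
// Items whose link is shared with another item get "link#guid";
// items with a unique link (or no guid) keep the plain link.
func uniqueKeys(items []Item) []string {
	// First pass: count how often each link occurs in the feed.
	counts := make(map[string]int, len(items))
	for _, it := range items {
		counts[it.Link]++
	}

	// Second pass: build the key, disambiguating duplicates with the guid.
	keys := make([]string, 0, len(items))
	for _, it := range items {
		key := it.Link
		if counts[it.Link] > 1 && it.GUID != "" {
			key = it.Link + "#" + it.GUID
		}
		keys = append(keys, key)
	}
	return keys
}

func main() {
	const upcoming = "http://www.zerodayinitiative.com/advisories/upcoming/"
	items := []Item{
		{Title: "ZDI-CAN-22550: Lexmark", Link: upcoming, GUID: "ZDI-CAN-22550"},
		{Title: "ZDI-CAN-22454: Hewlett Packard Enterprise", Link: upcoming, GUID: "ZDI-CAN-22454"},
	}
	for _, k := range uniqueKeys(items) {
		fmt.Println(k)
	}
}
```

With the two items from the feed quoted earlier, each entry ends up with a distinct key because the guid is appended after the shared link.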

@StayPirate
Contributor Author

Would such a change trigger all entries once again? I guess an initial run with -send=false should mitigate that problem.

If your proposal is implemented, could it affect how the link is shown in the email client once the email is received? Or would that still be a clean X-RSS-Link: {{.Link}}?

Another solution I can think of is to calculate a CRC or a quick hash of the entry and track that in rss2email's seen-database. But that could be slower (because of the hash calculation), and it would trigger new emails when old items change over time: for example, an updated description would trigger the email notification once again.
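
To make the trade-off concrete, here is a small Go sketch of the hashing idea. It is only an illustration under assumptions: the `Entry` type and the `entryHash` function are hypothetical, SHA-256 is used in place of a CRC simply because it is in the standard library, and nothing here reflects how rss2email actually stores seen items.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Entry is a hypothetical, minimal view of a feed entry.
type Entry struct {
	Title       string
	Link        string
	GUID        string
	Description string
}

// entryHash returns a hex-encoded SHA-256 digest over all fields of the entry.
// A zero byte is written after each field so that adjacent fields cannot
// run together and produce the same input for different entries.
func entryHash(e Entry) string {
	h := sha256.New()
	for _, field := range []string{e.Title, e.Link, e.GUID, e.Description} {
		h.Write([]byte(field))
		h.Write([]byte{0})
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	original := Entry{
		Title:       "ZDI-CAN-22550: Lexmark",
		Link:        "http://www.zerodayinitiative.com/advisories/upcoming/",
		GUID:        "ZDI-CAN-22550",
		Description: "reported to the affected vendor 4 days ago",
	}
	// Same entry, but the description has changed since the last fetch.
	updated := original
	updated.Description = "reported to the affected vendor 5 days ago"

	fmt.Println(entryHash(original) == entryHash(updated)) // the hashes differ
}
```

Because the description is part of the hashed input, the edited entry gets a different hash and would be treated as new, which is exactly the repeated-notification drawback described above.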

skx added a commit that referenced this issue Dec 5, 2023
This pull-request should close #111 once complete, but I admit
I haven't tested it yet.
@skx
Owner

skx commented Dec 5, 2023

Would such a change trigger all entries once again?

Yes

I guess an initial run with -send=false should mitigate that problem.

Yes

If your proposal is implemented, could it affect how the link is shown in the email client once the email is received? Or would that still be a clean X-RSS-Link: {{.Link}}?

Yes, the link would be literally updated.

Another solution I can think of is to calculate a CRC or a quick hash of the entry and track that in rss2email's seen-database. But that could be slower (because of the hash calculation), and it would trigger new emails when old items change over time: for example, an updated description would trigger the email notification once again.

I think I'd probably not love that approach, but I guess it is an option.

Does #120 work for you, or is that just a doomed approach?

@skx skx self-assigned this Dec 6, 2023
@skx skx added bug Something isn't working enhancement New feature or request labels Dec 6, 2023
@skx skx closed this as completed in #120 Dec 15, 2023
skx added a commit that referenced this issue Dec 15, 2023
This pull-request closes #111, by attempting to make feed items which share URLs unique, via appending the UID of the item with a "#"-mark.

This will fail if a mark already exists, but that risk is worthwhile given the nature of the feeds that will be affected.