Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v11 BUG: RTE parser too permissive for old file links syntax #111

Closed
YKWeyer opened this issue Dec 13, 2023 · 3 comments
Closed

v11 BUG: RTE parser too permissive for old file links syntax #111

YKWeyer opened this issue Dec 13, 2023 · 3 comments
Assignees

Comments

@YKWeyer
Copy link
Contributor

YKWeyer commented Dec 13, 2023

We discovered than using file:(int) in RTE fields caused the ContentPublisher to add an association to the file with the id, even if it was just within a regular text, or as a part of an external URL.

For example, entering the following text in a RTE field would create two relations to file 123 and 456 (whether they exist or not 🫤) although there is no need for it:

<p>profile:123</p>
<p><a href="https://www.google.com/search?hl=en&q=profile:456">Google search for profile</a></p>

It seems the DefaultRecordFinder::fetchRelatedRecordsByRte is too permissive on detecting old typo3 link syntax:

if (strpos($bodyText, 'file:') !== false) {
preg_match_all('~file:(\d+)~', $bodyText, $matches);
if (!empty($matches[1])) {
$matches = $matches[1];
}
$matches = array_filter($matches);
if (
count($matches) > 0
&& !in_array('sys_file', $excludedTableNames, true)
) {
foreach ($matches as $match) {
$relatedRecords[] = $this->findByIdentifier((int)$match, 'sys_file');
}
}
}

Maybe this method should use HtmlParser::splitIntoBlock to only target href attributes of <a> tags? (Or should other tags such as link also be considered?). Happy to open a pull request for it if you see fit.

ℹ️ At least, this doesn't seem to be the case in v12 🎉

@vertexvaar vertexvaar self-assigned this Dec 18, 2023
@vertexvaar
Copy link
Collaborator

I checked in every TYPO3 version back to v10 how TYPO3 stores links. It is always a t3 URN. (t3://<classification>=<identifier>) I think we can drop the support for non-t3-URNs completely in TYPO3 v11. Editing content with legacy file URIs will convert them to the new t3 URNs.
@YKWeyer can you please check if there are still links with the legacy syntax used in your system?

@YKWeyer
Copy link
Contributor Author

YKWeyer commented Dec 18, 2023

I already removed all legacy syntax from our system. In fact, all occurrences were links pointing to an old TYPO3 instance (< 8 LTS) that is now gone.

I think we can drop the support for non-t3-URNs completely in TYPO3 v11. Editing content with legacy file URIs will convert them to the new t3 URNs.

I don't know how many instances might still use the old syntax though. They would have had to upgrade from an old version (< 8.3) to the most recent ones without editing their content in between. It's probably unlikely, but the fact that the TYPO3 core still contains the logic to translate this old syntax more than 4 major releases after might be a hint that they suspect it might affect more instances that we think. (Either that, or they simply forgot about it)

But I would assume it's fair to remove support for legacy in the ContentPublisher, as long as this is announced as a potentially breaking change for older instances that might still have them somewhere…

@vertexvaar
Copy link
Collaborator

Thank you for your research, it has shed some light on the old file link syntax for me. It's clear that this kind of link will only ocurr in an anchor tag href attribute. Therefore i can change the regex accordingly. Dropping the support is not possible since we promise to support everything that works in TYPO3 (excluding 3rd party extensions obviously).
Bugfix incoming!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants