Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix folder_contents being broken with VHM and Multilingual #18

Merged
merged 10 commits into from
Sep 8, 2022
Merged

Fix folder_contents being broken with VHM and Multilingual #18

merged 10 commits into from
Sep 8, 2022

Conversation

Rudd-O
Copy link
Contributor

@Rudd-O Rudd-O commented Sep 7, 2022

When Multilingual is enabled and VHM is enabled too, serving any folderish's folder_contents under a language folder as a top level domain name is currently broken (if en.site.com/archives is VHMed to PloneSite/en/archives then @@qsOptions URL is /archives/@@qsOptions instead of being /@@qsOptions as it should be.

The fundamental problem is that the function get_top_site_from_url(...) is failing to retrieve the actual top site because it can't find a site in the traversal structure, so it picks the last folderish as the site, which is obvs broken because /@@qsOptions only works on site roots, not on folderish content.

This code makes it so that any site found in the structure is returned as the top site from the URL, but if no top site is found in that traversal, the actual site from getSite() is returned instead, which is the correct behavior based on how the function is named, and actually fixes the /@@qsOptions conundrum.

Reproducer

Odd intersection of circumstances to reproduce bug:

  1. Create classic Plone site.
  2. Enable Multilingual with en-US or some other language.
  3. Verify visiting /Plone redirects to Plone/en-US or your language folder of choice.
  4. VHM your site so that /Plone/en-US is accessible as / (a standard, supported Multilingual rewrite configuration typically used by Ploners for language-code subdomain hosting). Plone/en-US/VirtualHostRoot is the relevant VHM rewrite.
  5. Visit the site. Verify that site is accessible as /.
  6. Create folder xxx on that location.
  7. Visit folder xxx on that location.
  8. Click on Folder contents in the editor toolbar.
  9. Verify that the folder contents screen for xxx loads, but the actual listing remains empty.
  10. Verify that there is an HTTP 404 for @@qsOptions in the network panel of your browser inspector.

Rationale

This code makes the method sometimes return a content object in some cases, rather than an actual site root.

Why?

All callsites (see below) need a "root object" from the perspective of the browser to compute a root and generate various URLs for their own purposes, because all their purposes are XHR browser stuff (fill lists of objects and search results, chiefly). This method I'm fixing determines the "root" used for that tree which is used in related items, image insertion, and folder contents browsing, and if that "root" is incorrect / does not correspond to what the browser sees as "root", nothing works.

Needless to say, this method should never be used to traverse anything server-side.

Quality control

Callsites

My omelette is big. There are not that many uses, and all its uses currently broken are fixed by this patch. I count two uses:

  1. plone.app.content.browser.contents:ContentsBaseAction / get_options / ContextInfo.call
  2. plone.app.widgets:get_relateditems_options

Unit and regression testing

All current unit tests pass now. I have added a test case to cover the "root URL object is not actually a site" case, which was never tested before. In addition to that, I have tested this with a local build (most up to date code) and here is what I can verify:

  • Related items works now!
  • Image insertion works as well.
  • Browsing folders in folder contents works too.

So pretty much everything that was broken before, is no longer broken thanks to this modest change. And that is the complete manual regression test of all users of this code.

When Multilingual is enabled and VHM is enabled too, serving any folderish's `folder_contents` under a language folder as a top level domain name is currently broken (if `en.site.com/archives` is VHMed to `PloneSite/en/archives` then `@@qsOptions` URL is `/archives/@@qsOptions` instead of being `/@@qsOptions` as it should be.

The fundamental problem is that the function `get_top_site_from_url(...)` is failing to retrieve the actual top site because it can't find a site in the traversal structure, so it picks the last folderish as the site, which is obvs broken because `/@@qsOptions` only works on site roots, not on folderish content.

This code makes it so that any site found in the structure is returned as the top site from the URL, but if no top site is found in that traversal, the actual site from `getSite()` is returned instead, which is the correct behavior based on how the function is named, and actually fixes the `/@@qsOptions` conundrum.
@mister-roboto
Copy link

@Rudd-O thanks for creating this Pull Request and helping to improve Plone!

TL;DR: Finish pushing changes, pass all other checks, then paste a comment:

@jenkins-plone-org please run jobs

To ensure that these changes do not break other parts of Plone, the Plone test suite matrix needs to pass, but it takes 30-60 min. Other CI checks are usually much faster and the Plone Jenkins resources are limited, so when done pushing changes and all other checks pass either start all Jenkins PR jobs yourself, or simply add the comment above in this PR to start all the jobs automatically.

Happy hacking!

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

@jenkins-plone-org please run jobs

@Rudd-O Rudd-O requested review from jensens, thet and flipmcf September 7, 2022 16:55
@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

Hold the merge for this, even if the tests pass. I have discovered an issue with the fix that needs to be addressed.

EDIT: false alarm. This is good.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

@jenkins-plone-org please run jobs

1 similar comment
@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

@jenkins-plone-org please run jobs

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

I have excellent news!

Another thing that was legendarily broken in Multilingual sites (exactly because of this get_top_site_from_url(...) breakage) was the media browser in the TinyMCE editor. You could not browse for files to link, embed or insert as images.

This code fixes it!

OMG I have been five years without being able to insert things on my site unless I use an admin domain without VHM... and now I can! Pure joy.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

@jenkins-plone-org please run jobs

@flipmcf
Copy link

flipmcf commented Sep 7, 2022

  1. Take a look at the docstring for get_top_site_from_url() in your final commit. It looks like it's duplicated.

Questions:
How did you arrive at ISite as the interface marker? I think there are more, like ISiteRoot and INavigationRoot that are also good candidates.

(already answered: am preserving the behavior of the code, just including the topmost accessible object (rather than the bottommost only) in the list of objects to be considered as "site roots" to return to the caller.)

Docs say use ISiteRoot https://docs.plone.org/develop/plone/serving/traversing.html#traversing-back-to-the-site-root
---But I don't recommend making this change here. I'd rather see another ticket specifically calling that out as a bug, if it even is one.

I don't use VHM and Multilingual, so I don't have direct experience, and I'm not sure if I can deploy this into my env and actually test it.

Will come back and edit this with more.


More: God, I hate this code. That Try/Except block is huge.... I want to re-write this.... calm down....


@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

From @flipmcf :

So, let me get this straight.

The "canonical way" (bigger than 'best practice') to get a site root is to traverse upwards until you find an object that implements ISite.

I honestly do not know :-)

You are proposing that we add a utility to do just this. And use that utility.

We have a get_top_site_from_url function in plone.base which AFAIK is may not be new code but if it isn't, it definitely used to live somewhere else before plone.base was conceived. Do not take my word for it — I have not tried this in Plone 5.

This function does not traverse from leaf to root. It traverses from root to leaf, starting with the rootmost element visible through the Web (VHM) and falling back to the actual Plone site root discovered through getSite(). Unfortunately, as of main right this very instant, and for many years now, the traversal gets to the leaf object and falls back to that, which is wrong, since @@qsOptions is not gonna work there (also, under no circumstance are most leaf objects a "top site").

This code fixes the traversal so that we traverse from root to leaf, collecting any legit ISite()s along the way (and also the rootmost visible object according to VHM), and picks the rootmost object to return (if there are none, we simply pick the actual site root as before). So the change is:

  • from returning the leafmost object, and if not found, the root ISite, to
  • returning the leafmost ISite, then if not found the rootmost visible object, then if not found the actual root ISite.

Questions:

@Rudd-O How did you arrive at ISite as the interface marker? I think there are more, like ISiteRoot and INavigationRoute that are also good candidates.

Was already there. I did not change that.

Docs say use ISiteRoot https://docs.plone.org/develop/plone/serving/traversing.html#traversing-back-to-the-site-root

I am wary of changing this.

Does this play well with zope.component.hooks.getSite? Should it? Does it even matter?

Doesn't affect getSite at all. It is only dealing with URL traversal.

I concur, you are on the right track here. Just throwing out some ideas.

I appreciate them!

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

There really were a bunch of things that were utterly broken by the behavior of get_top_site_from_url before. This function is never used for traversal proper, but it is used all over the place to generate all those helpful JSONlets (@@qsOptions and @@fc-*) that let the client-side JS in Plone do magic like browse folders and embed media. Obviously, if this function is broken, all those helpful JSONlets become helpless.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

@jenkins-plone-org please run jobs

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

The CI is dead folks. I will merge upon approval. I have the fix live running on locally and it works correctly with production VHM enabled.

@flipmcf
Copy link

flipmcf commented Sep 7, 2022

Should do some manual regressions on affected code - especially pre-release. As you say, get_top_site_from_url is used all over the place to do UI magic.

The problem is, who is depending on the bad behavior of get_top_site_from_url and by fixing it, what breaks.

https://github.com/search?q=org%3Aplone+get_top_site_from_url&type=code

I'm feeling that the safest (but not best) change is to repair plone.app.content.browser.contents.__init__.py to use a better method of getting the site url, rather than modifying lower-level code 'get_top_site_from_url' to return something that 'folder listing' likes. This ticket is "Fix folder_contents' not 'fix get_top_site_from_url()'

If there is really a bug in the utility, then all code that depends on this utility needs to be notified, tested, and updated. This is indeed the correct path if @Rudd-O analyis is correct. I'm not able to verify that this analysis is correct (not saying it's incorrect, but that I simply can't verify)

If we change this utility, I'm confident we'll be back here with someone else saying "Hey, this behavior changed that I was relying on". Maybe that's where we want to be, distributing the 'correct' work across the dependent code through finding and fixing failures, but in that case, we must be CERTAIN that get_top_site_from_url() is broken.

I want another opinion on the hypotheses that get_top_site_from_url() is broken. --> @thet

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

The problem is, who is depending on the bad behavior of get_top_site_from_url and by fixing it, what breaks.

Callsites

My omelette is big. There are not that many uses, and all its uses currently broken are fixed by this patch. I count two uses:

  1. plone.app.content.browser.contents:ContentsBaseAction / get_options / ContextInfo.call
  2. plone.app.widgets:get_relateditems_options

Why this must return a content object in some cases, rather than an actual site root

These two cases all need a "root object" from the perspective of the browser to compute a root for their own purposes, because all their purposes are browser stuff (fill lists of objects, chiefly). This method I'm fixing determines the "root" used for that tree which is used in related items, image insertion, and folder contents browsing, and if that "root" is incorrect / does not correspond to what the browser sees as "root", nothing works.

Unit and regression testing

All current unit tests pass now. I have added a test case to cover the "root URL object is not actually a site" case, which was never tested before. In addition to that, I have tested this with a local build (most up to date code) and here is what I can verify:

  • Related items works now!
  • Image insertion works as well.
  • Browsing folders in folder contents works too.

So pretty much everything that was broken before, is no longer broken thanks to this modest change. And that is the complete manual regression test of all users of this code.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 7, 2022

I've added a new test case covering this exact issue. All tests pass.

Now that tests are passing and the functionality of the site is working perfectly, I give myself some time to give a better rationale for this change as comments, for future me and future Plonistas.
Copy link
Member

@jensens jensens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! And AFAICT a perfect solution. Bonus points for the doc-string improvement ;-)

@MrTango
Copy link
Contributor

MrTango commented Sep 8, 2022

@plone/framework-team should have a look here too i guess. Specially that nothing for restapi/Volto breaks here.

@flipmcf
Copy link

flipmcf commented Sep 8, 2022

So pretty much everything that was broken before, is no longer broken thanks to this modest change

This is so satisfying when a low-level fix happens. Congrats! I'm still going to wait on the complaints from those who may have relied on the incorrect behavior, but the analysis seems spot-on. Who would rely on something being broken in VHM environments? (we will find out!)

@Rudd-O analysis is a good defense to anyone who finds this change offensive to them personally.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 8, 2022

A profound thanks to @flipmcf and @jensens for both the encouragement and the review. Here goes.

@Rudd-O Rudd-O merged commit 1199d72 into plone:main Sep 8, 2022
@ale-rt
Copy link
Member

ale-rt commented Sep 9, 2022

@plone/framework-team should have a look here too i guess. Specially that nothing for restapi/Volto breaks here.

Skimming the code it looks sane.
@Rudd-O does this also fix plone/plone.subrequest#17?

@mauritsvanrees
Copy link
Member

This causes test failures. See Jenkins

For now, I will remove plone.base from checkouts.cfg, that should fix it. But please have a look at the affected packages.

Can't blame anyone for not seeing the test failures earlier, since PRs on Jenkins are broken currently. :-/ At least the standard jobs still work, so we see this after merge.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 9, 2022

I read the tests and they (edit) do seem to be related to my fix.

How can I run only the offending tests locally? I am using buildout and I want to bisect this.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 9, 2022

Figured it out thanks to @thet ! Please stand by for fixes...

This package really should have CI.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 9, 2022

@mauritsvanrees I have replicated the test environment locally with my code, but basing my tests on plone.base version 1.0.0b2, and my code change passes tests.

The problem began appearing with 1.0.0b3 and my code.

Will continue the bisect here.

EDIT: somehow when I use mainline my fix does cause issues. So weird.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 9, 2022

I will work on this throughout the weekend. Please don't revert this fix just yet.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 9, 2022

I think I have a fix. Stand by for patch. Validating with the other test failures now.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 9, 2022

Broken tests: https://jenkins.plone.org/job/plone-6.0-python-3.10/763/

Fix: #19

The tests that were broken were retested locally using my own buildout, and they passed.

@thet
Copy link
Member

thet commented Sep 9, 2022

@Rudd-O great job! that must have been some deep diving. your analysis about how get_top_site_from_url is totally correct.

@Rudd-O
Copy link
Contributor Author

Rudd-O commented Sep 11, 2022

Thank you but I still broke the tests, and that is because I could not run them locally (I am now setup to run them here, so it's fine). A fix is incoming in #20 .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants