Creating a Repository Item w/ an Alias Does Weird Things #1095
Comments
My tepid take: ignore aliases and use /node/foo as the node's URI. Aliases are for presentation, not keeping track of stuff. |
We should consistently use /node/1 unless we want trouble every time a user changes an alias. Trying to understand the situation fully, here. So everything stays the same except schema:sameAs, which changes to use the new alias? For the most part we're using |
@dannylamb, the only practical difference between those two examples is the addition of the The URL object is very insistent that it use the alias in the URL. The only function that gives us the internal path (/node/1) is Url::getInternalPath but it doesn't consider setAbsolute (to include the domain) or setRouteParameter (to include _format=jsonld). We would have to write our own helper function to get consistent external URLs using the internal path consistently. As far as Drupal is concerned, the internal path should stay internal. |
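The helper described above could look roughly like the following. This is only a language-neutral sketch in Python, not Drupal code: the function name `internal_entity_url`, its parameters, and the `example.org` host are all made up for illustration. It builds the absolute URL from the internal path (`/node/1`) and optionally appends `_format`, so the alias never enters the picture.

```python
from urllib.parse import urlencode, urlunsplit


def internal_entity_url(host, entity_type, entity_id, fmt=None, scheme="https"):
    """Build an absolute URL from the internal path, never from an alias.

    Hypothetical helper: the name and signature are assumptions, not a Drupal API.
    """
    path = f"/{entity_type}/{entity_id}"                # e.g. /node/1
    query = urlencode({"_format": fmt}) if fmt else ""  # e.g. _format=jsonld
    return urlunsplit((scheme, host, path, query, ""))


# example.org is a placeholder host.
print(internal_entity_url("example.org", "node", 1))
print(internal_entity_url("example.org", "node", 1, fmt="jsonld"))
```

Keeping this in one function means every caller produces the same URL for a given entity, whatever aliases exist.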
@seth-shaw-unlv++ I was not aware those were the defaults. Good example, eh? ^_^ To confirm my fears, I tried curl
give it an alias and...
So yeah, shared utility function for sure, if Drupal's not gonna give it to us willingly. Those URLs are essentially IDs, so we can't have them changing on us. It does raise the question of whether we want to capture the alias in RDF, and how that alias plays along with the Fedora vs. Drupal URL thing we've got going on. |
Yeah, rel=canonical is a flag to search engines saying "use this URL" in your index for this page/content (instead of other URLs for the same page/content that exist because of GET parameters that don't affect content, redirects, mobile versions, etc.). Drupal assumes that if you give a node an alias, THAT is the URL search engines should direct users to. See also Yoast's "the ultimate guide." I think we need to keep the alias in Fedora and the triplestore. For one, if we ever need to rebuild Drupal from Fedora, we will want the alias stored there. For another, if we do let people query the triplestore, the alias is the URI they will expect to use (unless we start displaying the internal path URLs prominently), so we at least need to include a relationship with our internal path URI so people will know which URI we are actually using. |
As far as I can tell, Drupal 8 (like 7 before it) allows you to access /node/x at /node/x as well as at /my-fancy-alias. It's just internally, when creating a link to /node/x, it'll replace it with /my-fancy-alias. Like in 7, I'd bet there's a module to force /node/x to switch over to its alias. I'm really ... scared ... of treating Drupal aliases as "URIs". They're changeable and human-readable and follow no set schema. This is off topic but kinda related... if we want to expose a triplestore based on Drupal, should we maybe pop off the ?_format=jsonld? Because like seth-shaw-unlv pointed out, it's not exactly what someone expects to query for or see. The URI of an object can be different from the URL of the document that describes it *puts down can opener and admires the pretty worms* ;) |
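To illustrate the "pop off the ?_format=jsonld" idea, here is a minimal Python sketch. The function name `resource_uri` and the localhost URLs are assumptions for illustration, not anything in the Islandora codebase. It derives the URI of the object from the URL of the JSON-LD document that describes it by dropping only the `_format` parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def resource_uri(document_url):
    """Return document_url without its _format query parameter.

    Hypothetical helper. Other query parameters are kept; any fragment is dropped.
    """
    parts = urlsplit(document_url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "_format"
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Placeholder URL for a local Drupal site.
print(resource_uri("http://localhost:8000/node/1?_format=jsonld"))
```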
You could always put the alias in a local identifier property, right? Since aliases can pretty much change in Drupal without notifying the actual node (Pathauto), they are dangerous creatures to assume constant or persistent. Also, you would have to dump their data (and they are not configuration entities, right?) when moving to a new site or restoring, and I'm pretty sure you won't be allowed to pre-create aliases before the actual nodes are ingested, so it's an egg-dinosaur-egg-extinction dilemma. |
@DiegoPino, actually, aliases are the only thing you could keep when migrating from one Drupal to another. You can give Drupal content, including an alias, at creation, but then Drupal decides what your internal path is. So, if you theoretically lost your Drupal site and had to rebuild from Fedora, your Gemini database and triplestore of Drupal URIs would be useless unless you repopulated the new Drupal with your content in the exact same order (including dummy content for anything that was deleted, to account for node IDs no longer in use). If you really want a URI to persist from one Drupal to another AND throughout time... create an alias and then somehow lock it down so no one can change it. |
@seth-shaw-unlv true. My experience varies there, but maybe it's my interpretation of your answer and the fact that you or I could have Pathauto, which is in fact a different way of aliasing. When you say
Yes, I'm not saying you should not restore them or keep them around. My fault about the pre-creation: you can't pre-create aliases, but creating them at the same time as the node is possible. Still, probably all your references and links inside the ecosystem happen via UUID and, sadly, many times via the uid, so you need to keep your uids anyway. Even Views building depends on the uid/UUID, not your path alias. So re-ingesting a set of nodes related to each other requires a lot of mangling, such as waiting for the node id to come back from the first request (files!), if you are thinking about restoring from scratch. There are ways, yes, but they are still only partial. I feel you need uid, UUID, and alias: the whole package. On the other hand, JSON:API lets you define your entity UUID, and that also lets you avoid overwriting nodes, but not your alias. True too, the node id (uid) cannot be guaranteed (unless you migrate the full table, or in your case many tables), and JSON:API disallows setting it. But the UUID can be persisted on export, re-import, via JSON:API, etc. So how you build your alias is also an issue. And then you also have language-based aliases. How do you handle that? Also, you can have many aliases, and when requesting one you will always get the most recent one. (I even remember killing paths on 7.x because a shortcut became the alias in use and then the title of an object got stuck forever...) I think I mentioned this some time ago (like years ago), but it seems a better approach (and there I agree with you) that, if the alias is the only thing you can control and persist and it's important for you all, every object/node gets an alias automatically and you all agree on that one being the real thing (remember islandora/pid). On my side, I do that by generating one automatically, based on the UUID, that becomes my "purl".
Maybe even simpler: you can agree that API interaction always happens on that single PURL and that the response includes all the aliases (many, many), so your Fedora, triplestore, etc. always contain that one. And then, well, the UI, etc. can do whatever it wants. This is a good code read (pointing to 8.7 just in case), and kinda needed to understand how aliases are handled in storage, i.e. CRUD: https://git.drupalcode.org/project/drupal/blob/8.7.x/core/lib/Drupal/Core/Path/AliasStorage.php |
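A UUID-based "purl" alias of the kind described above might be derived as in this Python sketch. The function name `purl_alias` and the `/purl` prefix are assumptions for illustration; the comment does not say what the actual alias pattern looks like.

```python
import uuid


def purl_alias(entity_uuid, prefix="/purl"):
    """Derive a stable alias from an entity UUID.

    Hypothetical helper: the /purl prefix is an assumed pattern.
    Raises ValueError if entity_uuid is not a valid UUID.
    """
    canonical = str(uuid.UUID(str(entity_uuid)))  # lowercase, hyphenated form
    return f"{prefix}/{canonical}"


# Made-up UUID, used only as sample input.
print(purl_alias("9A1B2C3D-4E5F-4A6B-8C7D-0123456789AB"))
```

Because the alias depends only on the UUID, re-importing the same entity on a rebuilt site yields the same alias even if the node id differs.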
Oh, also, Aliases are going to change in the future |
Haha @DiegoPino you convinced me a UUID-based "purl"-like alias was the thing to do before you said that was your solution. If only there were fields on aliases so that we could mark these ones as "special". (I didn't read the entire Drupal issue you pointed to, but if the "5 months ago" stuff is anything like the "5 years ago" stuff, that is a serious possibility). Restoring from scratch, with lots of stuff related to each other, is going to require a chain of migrations and migration lookups... if it's true that you can't set the UID (i.e. unique id, i.e. nid, tid, or uid?) during migrate, then you're gonna do lookups and I don't think the aliases (or lack thereof) make it any harder or easier. |
Looking like we won't land this before release, but we can hide the alias fields in the form and document in the meantime. |
@dannylamb, yeah, I suppose that will work for now. When we do get to it, as it looks like we are going to rely on the internal path, I would like the JSON-LD serialization to use the internal path for the |
We might want to raise this with the Drupal community to see if there is a consistent way to get the actual node id instead of the canonical one. |
I refactored every instance where we generate URLs into some basic utility functions in IslandoraUtils and just call those instead. Now that we're consistent, it turns out this sorts itself out! PR pending once I touch up tests. |
@whikloj Calling I'm relying on that behaviour for now, but at least when it changes we only have to update the code in one spot. |
@seth-shaw-unlv I don't think this is a thing anymore since we've 'standardized' how we generate URLs that go into headers and the events and such. OK to close? |
@dannylamb, my only hesitation about closing this is that we don't push the alias to Fedora in any form anymore. If someone wanted to rebuild an Islandora site just from their Fedora repo, they would lose any aliases they have. But, yeah, the root issue here (aliases breaking things) is resolved. |
@seth-shaw-unlv I made a ticket for publishing URL aliases in RDF and am closing this one. |
Funny what pops up during testing sprints that you didn't expect:
If I create a Repository Item without an alias, the item gets indexed in Gemini, the triplestore, and Fedora just fine. I can update the item and all updates are persisted to the triplestore and Fedora. If I add an alias, Gemini keeps the old node URL and the new alias replaces the node URL in Fedora's schema:sameAs. The triplestore, however, starts using the new alias as the URI for the node, and all the old metadata for the original node URL just sits there.
If I create a Repository Item with an alias from the start, the alias is used in Gemini, in Fedora, and in the triplestore. If I ever change the alias, the Gemini lookups start failing and Fedora will fail to update. The triplestore, as in the other case, simply starts using the new alias and leaves the old data sitting around.
What do we do about this, if anything? 🤷‍♂️ Thoughts, @Islandora-CLAW/committers?
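The stale-data behaviour described above can be reproduced with a toy model. The Python sketch below is not how the triplestore indexer is actually implemented. It assumes only that indexing replaces the triples stored under whatever subject URI it is handed, and the URLs and the `ex:alias` predicate are made up.

```python
def index_node(store, subject_uri, fields):
    """Toy indexer: replace the triples stored under subject_uri only."""
    store[subject_uri] = dict(fields)


# Hypothetical URLs; example.org stands in for a real site.
INTERNAL = "http://example.org/node/1"
ALIAS = "http://example.org/my-fancy-alias"

# Case 1: the subject URI follows whatever URL gets generated for the node.
by_generated_url = {}
index_node(by_generated_url, INTERNAL, {"dc:title": "Original title"})
index_node(by_generated_url, ALIAS, {"dc:title": "Updated title"})  # alias added later

# Case 2: the subject URI is always the internal path; the alias is just data.
by_internal_path = {}
index_node(by_internal_path, INTERNAL, {"dc:title": "Original title"})
index_node(by_internal_path, INTERNAL, {"dc:title": "Updated title", "ex:alias": ALIAS})

print(len(by_generated_url), "subjects when the alias becomes the URI")
print(len(by_internal_path), "subject when the internal path is the URI")
```

In the first case the original subject keeps its outdated title because nothing ever addresses it again; in the second there is only one subject to update.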