Islandora link headers grow unbounded #1764

Open
bseeger opened this issue Feb 17, 2021 · 10 comments
Labels
Type: enhancement Identifies work on an enhancement to the Islandora codebase

Comments

@bseeger
Member

bseeger commented Feb 17, 2021

Every time we add an entity reference to a node, we get another link header (Subjects, Resource Type, Genre, Contributor -- anything referencing a taxonomy term). This overall makes sense, but means the link headers can grow unbounded and potentially run into external limits. Once we had our metadata application profile in place and started testing nodes, we ran into header buffer size limits in NGINX and started getting HTTP 502 Bad Gateway errors. The limits are easily upped, but the headers can still grow.

Link: <http://purl.org/coar/resource_type/c_c513>; rel="tag"; title="Image"
Link: <http://future.islandora.ca/taxonomy/term/5>; rel="tag"; title="Image"
Link: <http://future.islandora.ca/taxonomy/term/27>; rel="tag"; title="Cats"
Link: <http://future.islandora.ca/taxonomy/term/28>; rel="tag"; title="Dogs"
Link: <http://future.islandora.ca/taxonomy/term/28>; rel="tag"; title="Dogs"
Link: <http://future.islandora.ca/taxonomy/term/27>; rel="tag"; title="Cats"

Islandora itself doesn't really concern itself with how it's deployed, so this ticket is really asking: should we have all these link headers? Do they add value? Can we de-dup them? How much control over them do we have in the first place?

The system functions overall, but nodes with enough entity references to hit this limit return 502s. It is a nasty little bug to run into when you suddenly start getting 502s for one node and not another. I'm not sure if there is an easy fix here, so this is more of a conversation starter and a way to make folks aware of these headers.
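
One way to see why one node fails and another does not is to measure how many bytes its Link headers take up. The sketch below is illustrative only: `ASSUMED_BUDGET_BYTES` is a hypothetical figure (the real limit depends on how the proxy in front of Drupal is configured, and it applies to all response headers, not just Link), and the function names are made up for this example.

```python
from typing import Iterable, Tuple

# Hypothetical budget; the actual value depends on the proxy configuration.
ASSUMED_BUDGET_BYTES = 4096


def link_header_bytes(headers: Iterable[Tuple[str, str]]) -> int:
    """Return the on-the-wire size of all Link headers, including CRLF."""
    total = 0
    for name, value in headers:
        if name.lower() == "link":
            total += len(f"{name}: {value}\r\n".encode("latin-1"))
    return total


def exceeds_budget(headers: Iterable[Tuple[str, str]],
                   budget: int = ASSUMED_BUDGET_BYTES) -> bool:
    """True if the Link headers alone are larger than the assumed budget."""
    return link_header_bytes(headers) > budget


sample_headers = [
    ("Link", "<a>"),
    ("Content-Type", "text/html"),
    ("link", "<b>"),
]
sample_size = link_header_bytes(sample_headers)
```

Running this against the (name, value) pairs of a failing node and a working node should show which one is close to the limit.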

@dannylamb
Contributor

Hey @bseeger,

I think @kayakr has stumbled into this before as well.

So we do that to try and be as "RESTful" as possible while still conforming to web standards, but I dunno how many people (er... how much client software) are making use of it. We use some link headers in the backend to get things into Fedora, but certainly not all. The ones generated by your standard entity references (member of, media of, tags, etc...) don't come into play at all as far as Islandora is concerned.

I'm happy to either

  1. Eliminate them entirely (you know how much I love 🔥)
  2. Keep them, but push them into the message body using json:api and https://www.drupal.org/project/jsonapi_hypermedia

I'm open to other suggestions, too. At the very least, or maybe just as a stop-gap measure, we can document the issue and suggest workarounds for nginx/apache.

@birkland

Right now, it's unclear to me:

  • what information these links are intending to convey, and/or what is the use case for including them?
  • who (which module) is generating these links
  • where these links are configurable, if at all.

Right now, the information these link headers convey seems limited: they all use the tag relation, which says nothing about, say, the RDF predicate that relates the object to the entity. The nature of the linked resource is not apparent to the consumer either; it could be any other entity or taxonomy term (name, subject, copyright, type, etc.). The title can also be confusing, particularly when the same title is used for different resources, as with Image in Bethany's example.
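
The ambiguity is easy to demonstrate from a client's point of view. The following is a rough sketch, not anything Islandora ships: `parse_link` and `targets_by_title` are hypothetical helpers, and the parser only handles the quoted-parameter form seen in the example headers in this issue (the Link header grammar allows more than that).

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

LINK_TARGET = re.compile(r"<([^>]*)>")
LINK_PARAM = re.compile(r';\s*(\w+)="([^"]*)"')


def parse_link(value: str) -> Dict[str, str]:
    """Split one Link header value into its target URL and quoted parameters."""
    target = LINK_TARGET.match(value.strip())
    if target is None:
        raise ValueError(f"not a Link header value: {value!r}")
    link = {"url": target.group(1)}
    link.update(dict(LINK_PARAM.findall(value)))
    return link


def targets_by_title(values: List[str]) -> Dict[str, Set[str]]:
    """Group link targets by their title parameter."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for value in values:
        link = parse_link(value)
        grouped[link.get("title", "")].add(link["url"])
    return dict(grouped)


example = [
    '<http://purl.org/coar/resource_type/c_c513>; rel="tag"; title="Image"',
    '<http://future.islandora.ca/taxonomy/term/5>; rel="tag"; title="Image"',
    '<http://future.islandora.ca/taxonomy/term/27>; rel="tag"; title="Cats"',
]

# Titles that point at more than one distinct resource.
ambiguous = {
    title: urls
    for title, urls in targets_by_title(example).items()
    if len(urls) > 1
}
```

Here "Image" maps to two different URIs, and nothing in the header tells the consumer which one is the resource type and which is a local taxonomy term.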

It would seem ideal if they were configurable somehow. Maybe we want to simply un-check a checkbox to turn them off entirely. Maybe others might want to specify which taxonomies they wanted to include, or use a different rel (related, maybe), I'm not sure.

@dannylamb
Contributor

dannylamb commented Feb 17, 2021

@bseeger I'm curious as to how they're getting duplicated. Are you tagging twice, or do we just have a bug there?

@birkland It's provided by the main islandora module as an attempt to provide links to relevant items in message headers. The idea is that you'd be able to navigate the repository using just HEAD requests until you find what you want. I don't know who's actually using it, though. And our own backend, which I'd consider the main client / user of this feature, doesn't really use it much at all.

I think in terms of concrete steps forward, we can definitely

  1. Investigate the duplicates and de-dupe them. It's pretty wasteful / silly to keep them.
  2. Figure out how to limit / restrict them
    1. With little effort, we can emit the headers only for REST API requests and not when a user views the page in the browser
    2. With a bit more effort, we can toggle the feature with config
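
As a rough illustration of the de-dupe step, here is a minimal sketch in Python. It is not the Islandora module's code (that lives in Drupal/PHP); `dedupe_link_headers` is a hypothetical helper that drops repeated Link values while keeping the first occurrence in order, using the example headers from the top of this issue.

```python
from typing import Iterable, List


def dedupe_link_headers(values: Iterable[str]) -> List[str]:
    """Drop repeated Link header values, keeping first occurrences in order."""
    seen = set()
    unique = []
    for value in values:
        # Collapse whitespace so trivially different spacing still matches.
        key = " ".join(value.split())
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique


reported = [
    '<http://purl.org/coar/resource_type/c_c513>; rel="tag"; title="Image"',
    '<http://future.islandora.ca/taxonomy/term/5>; rel="tag"; title="Image"',
    '<http://future.islandora.ca/taxonomy/term/27>; rel="tag"; title="Cats"',
    '<http://future.islandora.ca/taxonomy/term/28>; rel="tag"; title="Dogs"',
    '<http://future.islandora.ca/taxonomy/term/28>; rel="tag"; title="Dogs"',
    '<http://future.islandora.ca/taxonomy/term/27>; rel="tag"; title="Cats"',
]
deduped = dedupe_link_headers(reported)
```

De-duping reduces the six reported headers to four, but it only shrinks the problem: a node with many distinct references still produces one header per reference.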

But if no one really is using this at all and it's more of a nuisance than anything else, we can totally deprecate and remove it. If we make it toggleable, and no one uses that toggle and everyone just sets it to off and walks away.... then we don't actually need to maintain that code at all.

@kayakr
Contributor

kayakr commented Feb 17, 2021

I can see why link headers are potentially useful, but it was a tricky issue to diagnose when we encountered it for the first time, and nginx has quite a low allowance by default (64 I think). See previously #1519 Islandora generates Link headers for non-repository content

@bseeger
Member Author

bseeger commented Feb 17, 2021

@dannylamb - I totally double tagged things (on purpose just to have a number of links in there). Here's the record: http://future.islandora.ca/node/40

[Screenshot: Screen Shot 2021-02-17 at 2 21 03 PM]

@mjordan
Contributor

mjordan commented Feb 17, 2021

@dannylamb points out that

The idea is that you'd be able to navigate the repository using just HEAD requests until you find what you want.

If that's the case, couldn't a REST client use the JSON-LD for a node to do the same thing, using GET requests? If so, would we need all those link headers at all?
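
To make the GET-based alternative concrete, here is a small sketch of a client collecting every referenced URI from a JSON-LD document instead of reading Link headers. The `sample` document is a hypothetical stand-in, not actual Islandora output, and `collect_ids` is an illustrative helper that simply walks the structure looking for `@id` values.

```python
import json
from typing import Any, List, Optional


def collect_ids(node: Any, found: Optional[List[str]] = None) -> List[str]:
    """Walk a parsed JSON-LD structure and collect distinct "@id" values in order."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@id" and isinstance(value, str):
                if value not in found:
                    found.append(value)
            else:
                collect_ids(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_ids(item, found)
    return found


# Hypothetical document shape, for illustration only.
sample = json.loads("""
{
  "@graph": [
    {
      "@id": "http://example.org/node/40",
      "http://purl.org/dc/terms/subject": [
        {"@id": "http://example.org/taxonomy/term/27"},
        {"@id": "http://example.org/taxonomy/term/28"}
      ]
    }
  ]
}
""")
referenced = collect_ids(sample)
```

Unlike the Link headers, each reference here sits under a predicate, so a client that wants more than a flat list can also see how the node relates to each target.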

@dannylamb
Contributor

@bseeger Good to know it's not a bug, just a use case that was never considered. Didn't plan on folks tagging twice.

@mjordan Good point. Everything we're exposing is in the jsonld already. The advantage would be that you don't have to pull down the whole record and can get by with just HEAD requests, which would be faster. But considering this has inconvenienced more people than those who have taken advantage of the feature, frankly I don't think it's even worth it at this point.

@mjordan
Contributor

mjordan commented Feb 18, 2021

I completely agree. I'd gladly sacrifice the link headers for more reliability, especially when we have similar functionality that doesn't have nasty side effects. Would be happy to hear alternative points of view though.

@antbrown

antbrown commented Jul 6, 2021

I have also recently run into this nginx header limit being exceeded by Link headers added by an entity_reference field.

I think in the short term I'm going to ask for the http_max_hdr limit to be increased on the server. Long term, I suggest removing Link headers for non-Islandora objects or allowing site administrators to configure which entity types/fields are used to generate Link headers.

@mjordan
Contributor

mjordan commented Nov 29, 2021

The workaround implemented in Islandora Workbench is to raise the maximum number of response headers that Requests will accept to 10,000 (from the default of 100 headers).
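
For anyone hitting the same limit in their own Python client, the sketch below shows the general idea. It assumes the limit comes from the standard library's `http.client` module, which Requests sits on top of, and it overrides the private `_MAXHEADERS` attribute, so it may change between Python versions. The helper names and the generated headers are hypothetical; no network access is needed because the header block is parsed from memory.

```python
import http.client
import io


def build_link_headers(count: int) -> bytes:
    """Build a hypothetical raw header block containing `count` Link headers."""
    lines = [
        f'Link: <http://example.org/taxonomy/term/{i}>; rel="tag"; title="Term {i}"\r\n'
        for i in range(count)
    ]
    return "".join(lines).encode("ascii") + b"\r\n"


def parse_raw_headers(raw: bytes) -> http.client.HTTPMessage:
    """Parse a raw header block the way http.client parses a response's headers."""
    return http.client.parse_headers(io.BytesIO(raw))


raw = build_link_headers(150)

# With the default limit, 150 headers are rejected.
try:
    parse_raw_headers(raw)
    rejected_by_default = False
except http.client.HTTPException:
    rejected_by_default = True

# Raise the process-wide limit (private attribute, so treat this as a workaround).
http.client._MAXHEADERS = 10000

parsed = parse_raw_headers(raw)
link_count = len(parsed.get_all("Link"))
```

Because the override is process-wide, it has to run before any request whose response might carry more than 100 headers.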

7 participants