Long metadata author list (172 authors) capped at 69 authors #2061

Closed
rukayaj opened this issue Jul 11, 2023 · 42 comments

@rukayaj
Contributor

rukayaj commented Jul 11, 2023

GBIF URL: https://www.gbif.org/dataset/36914742-56c5-4d54-a18a-6ab1e41b9240#contacts
IPT URL: https://ukraine.ipt.gbif.no/resource?r=alienspeciesua1
IPT version 2.7.3

One of our Ukrainian data providers has contacted me because they are unable to enter the full list of authors for this dataset. We can add up to 69, but when we try to save the 70th, the IPT shows the "successfully saved" message, yet going back to the Basic Metadata page shows only the original 69. It is still possible to edit any of authors 1-69, and those changes are persisted.

I believe that previously it was possible to have all 172 authors included in the metadata, but I just have to confirm that with the data provider.

@rukayaj
Contributor Author

rukayaj commented Jul 11, 2023

Some more info:

On the latest EML file (eml 1.5) when I search for the string it occurs 72 times. So I see two in the Resource Contacts section, and the others must be in the Resource Creators section.

On the 1.3 file there are 172 instances of the string. I think we might have updated the IPT at some point recently, so perhaps there's something in the new update?
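
A structural count may be more reliable than a string search when comparing the two EML versions. The following is a minimal sketch, assuming the EML files keep their agents as `creator`, `metadataProvider`, `associatedParty` and `contact` elements directly under `dataset`; the helper names `local_name` and `count_agents` are made up for this illustration and are not part of the IPT.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Agent element names used by EML for people and organisations.
AGENT_TAGS = ("creator", "metadataProvider", "associatedParty", "contact")


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def count_agents(eml_xml) -> dict:
    """Count agent elements that are direct children of <dataset>.

    `eml_xml` is the EML document as str or bytes.
    """
    root = ET.fromstring(eml_xml)
    # Find the first <dataset> element regardless of namespace.
    dataset = next(
        (el for el in root.iter() if local_name(el.tag) == "dataset"), None
    )
    if dataset is None:
        raise ValueError("no <dataset> element found")
    counts = Counter(local_name(child.tag) for child in dataset)
    return {tag: counts.get(tag, 0) for tag in AGENT_TAGS}
```

Running `count_agents` on the bytes of the 1.3 and 1.5 EML files and comparing the `creator` counts would show directly how many creators each version holds.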

@mike-podolskiy90 mike-podolskiy90 self-assigned this Jul 17, 2023
@mike-podolskiy90
Contributor

@rukayaj Thank you for reporting the issue

@mike-podolskiy90
Contributor

I was not able to reproduce the bug. I've tried the current development version (2.7.4-SNAPSHOT) and the same version 2.7.3 at different environments.
We need more information to get it fixed. Are there any errors in the logs? Could you send me the log file please?

@rukayaj
Contributor Author

rukayaj commented Jul 17, 2023

Sure, here's the log file.
ukraine.ipt.gbif.no_admin_logfile.do_log=debug.txt

@mike-podolskiy90
Contributor

Thanks. I've tried the IPT, and indeed it does not save more than 69 agents. However, I don't see any related errors, either in the logs or in the browser. The IPT sends the data correctly and reports that it is saved. I have no idea for now what the issue could be; I've never seen anything like this in the IPT.
If possible, let's archive all the resource files from your server, and I'll try to reproduce it with the exact files you have.

@rukayaj
Contributor Author

rukayaj commented Jul 17, 2023

Hmm strange. I wonder what the problem is... Anyway, I emailed you with details of how to access the backup we made of the IPT.

@mike-podolskiy90
Contributor

IPT sends a pretty large request to save the data there, so it's possible that the large URL encoded request is hitting some limitations or restrictions, causing the truncation of data and resulting in only part of the data being saved.

Tomcat may have limitations on the length of URLs. When the URL length exceeds the server's limit, it may truncate the request data. You can check the server configuration to determine the maximum URL length supported and we can compare it with the size of the URL encoded request you are sending.

Also if you are using Apache HTTP Server as a front-end proxy for Tomcat, it may have its own configuration for limiting URL length. Ensure that the Apache HTTP Server configuration allows for a large enough URL length to handle your request data.
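
To compare the request against server limits, it helps to know roughly how the form grows with each added agent. The following is a rough sketch only: the field names in `AGENT_FIELDS` and the `creator[i].field` naming are hypothetical stand-ins, not the IPT's real form fields, so the numbers are only meaningful once the real field list from the browser's network tab is substituted. Tomcat's HTTP connector documents `maxPostSize` and `maxParameterCount` attributes that such figures could be checked against; whether either is involved here is not established in this thread.

```python
from urllib.parse import urlencode

# Hypothetical per-agent form fields; the IPT's real field names may differ.
AGENT_FIELDS = ("firstName", "lastName", "position", "organisation", "email")


def build_form(n_agents: int) -> list:
    """Build (name, value) pairs for n_agents placeholder agents."""
    pairs = []
    for index in range(n_agents):
        for field in AGENT_FIELDS:
            pairs.append((f"creator[{index}].{field}", f"value-{index}"))
    return pairs


def request_stats(pairs: list) -> dict:
    """Return the parameter count and URL-encoded body size in bytes."""
    body = urlencode(pairs)
    return {"parameters": len(pairs), "body_bytes": len(body.encode("ascii"))}


if __name__ == "__main__":
    for n in (69, 70, 172):
        print(n, request_stats(build_form(n)))
```

Both figures grow linearly with the number of agents, so a fixed server-side limit on either one would show up as a fixed cap on how many agents can be saved.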

@MattBlissett
Member

I've only seen URL length limits for GET requests, but I assume this one is a POST. Some sort of web application firewall could be interfering with a request though.

@MichalTorma
Contributor

We have our IPT(s) deployed in k8s cluster with standard nginx ingress - so I also don't think the issue is there. Also if it was just cutting off parameters, I'd assume there would be an error because of an invalid request?

@mike-podolskiy90
Contributor

When did the issue first happen? And which IPT version was in use the last time it worked?
Would it be possible to revert to that version and double-check that it works there?

@rukayaj
Contributor Author

rukayaj commented Aug 23, 2023

Hmm we just applied your latest update and it doesn't work on that one. We're not sure which version it was actually working on unfortunately... Just that it was working at some time in the past. Are there any changes you've made to the IPT in the past year or so which might have caused an issue?

@mike-podolskiy90
Contributor

There were many things. The biggest one is #1325, might be because of it, but I just can't reproduce it anywhere

@rukayaj
Contributor Author

rukayaj commented Mar 13, 2024

Our user is asking about this again:

> I am writing again about the issue that occurred in 2023 and still exists.
> The problem is that after some recent updates on GBIF there appears to be some sort of limit on the number of authors. We have some datasets with more than 100 co-authors (representing records from large conference proceedings or collections).
> The problem is that at some point (after some number of authors) the system stops reading and showing them, deleting them from the IPT and everywhere else.
>
> Here are the problematic datasets.
> https://www.gbif.org/dataset/36914742-56c5-4d54-a18a-6ab1e41b9240
> https://www.gbif.org/dataset/b4f04ac9-5449-4dd8-90f9-f66fc942b781
>
> An interesting thing is that some earlier datasets with a larger number of authors still exist without problems. But once I add or change something in the metadata and re-publish the dataset, the same problem occurs immediately.
> Today I made this mistake with the second dataset, thinking that this had been solved in the new version of the system.
>
> Can this be fixed somehow? Can you please tell me if there has been any progress in solving this problem?
>
> I have several authors who keep asking me why they are not in the authors list. They are quite angry and I don't know what can be done about this now. I badly need GBIF's help...

So I think we probably need to take a look at this again if possible. @MichalTorma and I did update the IPT to the latest version and the problem still seems to be there.

@rukayaj
Contributor Author

rukayaj commented Mar 13, 2024

Oleksii adds: It seems that the limit is set for 31 authors and not more.

@mike-podolskiy90
Contributor

mike-podolskiy90 commented Mar 15, 2024

@rukayaj Thank you.

Last time I checked, I couldn't fix the issue because I couldn't reproduce it on any available IPT (cloud or local).
I'll try again; maybe I'll find some clues.

@mike-podolskiy90
Contributor

@rukayaj May I ask you to send me this resource https://ukraine.ipt.gbif.no/manage/resource.do?r=alienspeciesua1 data directory archive? ( <data_directory>/resources/alienspeciesua1 )

@rukayaj
Contributor Author

rukayaj commented Mar 15, 2024

So you added this ukraine dataset to one of your cloud ipts and it was working when you tried to add authors? When I tried it just now it seems like when you import from these archives (I tried https://ukraine.ipt.gbif.no/archive.do?r=redbookua2022) it doesn't actually import all the creators. Here is what I did:

  1. Downloaded https://ukraine.ipt.gbif.no/archive.do?r=redbookua2022
  2. Logged in to our test IPT, which is hosted and managed in exactly the same way as the Ukraine IPT - https://test.ipt.gbif.no/
  3. Created https://test.ipt.gbif.no/manage/metadata-basic.do?r=ukraine-test using Create New > Import from an archived resource
  4. Published a public version, registered it with GBIF
  5. Went into the metadata, scrolled down to the last created Resource Creator (Kateryna Kuzhel)
  6. Tried to save a new author - it saved fine
  7. Checked in the citation list of https://www.gbif.org/dataset/b4f04ac9-5449-4dd8-90f9-f66fc942b781 and noticed Kuzhel was definitely not the last creator
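
The comparison in steps 5 and 7 can also be done directly on the archives rather than by scrolling. The following is a minimal sketch, assuming each archive is a zip with the metadata in a member named `eml.xml` and that each creator has a `surName` element; `creator_surnames` and `missing_creators` are illustrative names, not IPT functions, and creators listed without a surname (for example organisations) come back as empty strings.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def creator_surnames(archive_bytes: bytes, eml_member: str = "eml.xml") -> list:
    """Return creator surnames, in document order, from an archive's EML."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        root = ET.fromstring(archive.read(eml_member))
    surnames = []
    for element in root.iter():
        if local_name(element.tag) != "creator":
            continue
        surname = next(
            (
                child.text
                for child in element.iter()
                if local_name(child.tag) == "surName"
            ),
            None,
        )
        surnames.append((surname or "").strip())
    return surnames


def missing_creators(source: list, imported: list) -> list:
    """List surnames present in `source` but absent from `imported`."""
    seen = set(imported)
    return [name for name in source if name not in seen]
```

Calling `creator_surnames` on the archive downloaded from the Ukraine IPT and on the one published by the test IPT, then passing both lists to `missing_creators`, would show which creators were lost in the import.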

Maybe it's worth having a call (and perhaps including Oleksii) so we can go through it step by step?

@rukayaj
Contributor Author

rukayaj commented Mar 15, 2024

> @rukayaj May I ask you to send me this resource https://ukraine.ipt.gbif.no/manage/resource.do?r=alienspeciesua1 data directory archive? ( <data_directory>/resources/alienspeciesua1 )

This is the one from our back up from yesterday which should be the same as what is on the server:
alienspeciesua1.zip

@mike-podolskiy90
Contributor

mike-podolskiy90 commented Mar 15, 2024

Thank you. Looking into it

@mike-podolskiy90
Contributor

Yes, I still don't experience anything like that either at the local IPT or https://ipt.gbif-uat.org (I can create you an account there)

If you're able to reproduce the issue at your test IPT, would it be possible to start it in a debug mode so I can connect remotely?

@mike-podolskiy90
Contributor

Another thing - as you also mentioned previously, it used to work before. So we can try to downgrade the test IPT to try to locate what version causes the issues?

@rukayaj
Contributor Author

rukayaj commented Mar 15, 2024

Hmm for downgrading I'd have to work out which version of the IPT we were using when the eml 1.3 file was created, I can take a look on Mon. I put the test ipt into debug mode using https://test.ipt.gbif.no/admin/config.do, is that what you meant?

If you email me (rukayasj@uio.no) an acc for https://ipt.gbif-uat.org/ I will test it the way I did our test IPT.

@mike-podolskiy90
Contributor

I've sent credentials

@mike-podolskiy90
Contributor

No, I think you need some manual configs to run Tomcat in debug.
https://stackoverflow.com/questions/16689274/how-to-start-debug-mode-from-command-prompt-for-apache-tomcat-server

@rukayaj
Contributor Author

rukayaj commented Mar 15, 2024

> I've sent credentials

Logged in, tried it, you're right, the dataset imports correctly with all the authors, and adding new authors isn't an issue. I wonder now whether there could be some limit in our deployment which stops tomcat creating larger file sizes? But then I would have thought it would cut off mid xml tag or something you know? All the xml is perfectly formed, it's just it isn't saving. So weird. And actually it can't be that because writing more info into the eml works, it's just literally adding more creators that doesn't.

Just be aware we deploy to a k8s cluster using this helm chart https://github.com/gbif-norway/ipt-s3/tree/main/helm/ipt-s3, and in case more info is helpful:

So I can log into the test IPT pod and change Tomcat to run in debug, but then I guess you need some kind of port to be opened to connect to as well? It might just be easier for us to temporarily make you a user acc so you can kubectl exec into our test IPT pod. Maybe this is not something for Friday evening though, let's pick it up on Mon if you have time. Thanks for your help, have a good weekend!

@mike-podolskiy90
Contributor

Thank you. Yes, let's try to figure it out on Monday

@rukayaj
Contributor Author

rukayaj commented Mar 15, 2024

https://chat.openai.com/share/1cd41752-1c51-4206-80f9-50dedea49318 chatgpt suggests a few things, the one that jumps out to me is number 2 upping the pod resource limits - I want to try that because we're not being super generous with it right now. But I'll try on Monday, just adding this as a comment so I don't forget :)

@mike-podolskiy90
Contributor

@rukayaj Did you manage to find out anything?

@rukayaj
Contributor Author

rukayaj commented Mar 20, 2024

I had something urgent I had to do this week, I still haven't tried upping the resources yet. I will try and do something this evening or tomorrow...

@rukayaj
Contributor Author

rukayaj commented Mar 21, 2024

Doubling the resources (see referenced issue) doesn't seem to have done the trick, unfortunately.
Before:

        resources:
          limits:
            memory: 1024Mi
            cpu: "1"
          requests:
            memory: 512Mi
            cpu: "0.5"

After:

        resources:
          limits:
            memory: 2048Mi
            cpu: "2"
          requests:
            memory: 1024Mi
            cpu: "1"

Verified with kubectl describe pod:

Containers:
  test-ipt:
    Container ID:   containerd://271afcd5746f02bc8427a87c8ba21ff50c8ff324955d3c0900016f064e6ba32c
    Image:          gbifnorway/ipt-s3:latest
    Image ID:       docker.io/gbifnorway/ipt-s3@sha256:0e20199b7cac127ba7de8b2dc06c108c724ace0b2cbd0e5820949ae57c027e8a
    Port:           8080/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Thu, 21 Mar 2024 08:24:53 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  2Gi
    Requests:
      cpu:        1
      memory:     1Gi

These were my steps:

  1. Created a new resource from the dwca-redbookua2022-v1.9 archive
  2. Saved organisation as something, checked to make sure it saved properly
  3. Scrolled down and verified the last creator in the EML was imported correctly
  4. Added a new creator "test" in First Name, Last Name, Position and Organization
  5. Saved, received the "Basic Metadata successfully saved" message
  6. Navigated back, scrolled to the bottom, 'test' contact was not there

I also tried adding multiple creators at once at step 4. Then I repeated the whole process 3 times for that archive, and then I tried it with dwca-alienspeciesua1-v1.5.zip (freshly downloaded from the ukraine IPT), same result. I downloaded a new one because the old zip I had from last time didn't seem to work, not sure what that was about.

I'm emailing you a username+pass so you can access our test IPT, can't remember if I did it before.

@rukayaj
Contributor Author

rukayaj commented Mar 21, 2024

So I suppose the next debugging step would be to try to run Tomcat in debug and open it so you can connect? If we can't figure this out, one thing we could do is move the Ukraine IPT over to your cloud. Now that you have individual hosted IPTs for countries, I think they would be happy with e.g. https://cloud.gbif.org/ua/. I am curious about what could possibly be causing it now though. How are you deploying there, also using kubernetes and helm?

@rukayaj
Contributor Author

rukayaj commented Mar 21, 2024

I'm going to try 2x the resource allocation again just to make sure... Edit: Nope, still not. Def not a resource problem then :(

@mike-podolskiy90
Contributor

> So I suppose the next debugging step would be to try to run Tomcat in debug and open it so you can connect? If we can't figure this out, one thing we could do is move the Ukraine IPT over to your cloud. Now that you have individual hosted IPTs for countries, I think they would be happy with e.g. https://cloud.gbif.org/ua/. I am curious about what could possibly be causing it now though. How are you deploying there, also using kubernetes and helm?

Yes, I think I need to try debugging. We will also likely need to analyze Tomcat's HTTP traffic to see what IPT is receiving.

I don't think we host dedicated IPTs for non-participant countries, but they can surely publish on https://cloud.gbif.org/eca for example. I'll need to clarify.

We don't have kubernetes here for IPTs, we have a simple custom script that replaces war files in Tomcat.

@rukayaj
Contributor Author

rukayaj commented Mar 21, 2024

> Yes, I think I need to try debugging. We will also likely need to analyze Tomcat's HTTP traffic to see what IPT is receiving.

Ok, I need to finish something else today but I'll try to open it for you tomorrow or later today when I have time.

> I don't think we host dedicated IPTs for non-participant countries, but they can surely publish on https://cloud.gbif.org/eca for example. I'll need to clarify.

Hmm, they were on there to begin with as far as I remember, and decided it was better to have their own space as they have so many datasets.

> We don't have kubernetes here for IPTs, we have a simple custom script that replaces war files in Tomcat.

👍 simpler is often better

@rukayaj
Contributor Author

rukayaj commented Mar 22, 2024

I'm working on giving you access now, but I think realistically I should probably only actually do it on Monday so I don't leave the port open all weekend - sorry this keeps getting delayed, and thanks for staying engaged and for your help. I will post again on Mon.

@rukayaj
Contributor Author

rukayaj commented Mar 25, 2024

Sent you an email with connection info, @mike-podolskiy90

@rukayaj
Contributor Author

rukayaj commented Mar 26, 2024

Ok I'm struggling with this a bit 😅 but I'm guessing you're going on leave soon for Easter right? I'm off from tomorrow, so maybe it'll be best to pick it up again after.

@mike-podolskiy90
Contributor

I will be working tomorrow, then we have 5 days of holiday here in Denmark.
Yes, we can get back to it next week.

@rukayaj
Contributor Author

rukayaj commented Apr 2, 2024

I have a ticket open with digital ocean (our service provider) about this by the way, it's a networking issue (I think) and it's still unresolved. I'll keep this thread updated.

@rukayaj
Contributor Author

rukayaj commented Apr 9, 2024

I'll give this one more day with digital ocean support and if they haven't come up with some way of fixing it I'll give up and we'll try something else.

@rukayaj
Contributor Author

rukayaj commented Apr 10, 2024

Nice work @mike-podolskiy90 and @MichalTorma :) And thank you very much for your patience with the networking issues @mike-podolskiy90. I think we can probably close this now?

@rukayaj rukayaj closed this as completed Apr 10, 2024
@mike-podolskiy90
Contributor

Glad this is solved!
