
Looking for an efficient way to monitor for download folder changes #349

Open
okurz opened this issue Jan 30, 2023 · 15 comments
Labels
enhancement New feature or request todo

Comments

@okurz
Member

okurz commented Jan 30, 2023

Motivation

I reported https://progress.opensuse.org/issues/123797 about a pipeline monitoring the URL http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next* that now always reports changes even though no new files were showing up in that folder.

The content always changes because the generated HTML page contains a "csrf-token" that changes on each call. I found this by calling

diff <(curl -sS "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*") <(curl -sS "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*")

which yields:

<       <meta name="csrf-token" content="3a72819665c9adb750ad4d5e8054961c5b1c3efc" />
---
>       <meta name="csrf-token" content="0b17cbf73c393a1baa4daee989158038caeafc7c" />
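Until the token is dropped from the page, a client-side workaround is to filter it out before comparing. The following is a minimal sketch, assuming the token always sits on its own `<meta name="csrf-token" ...>` line as in the diff output; the function name `stable_listing_hash` is hypothetical, and any other dynamic part of the page would still cause false positives.

```shell
#!/usr/bin/env bash
# Sketch: hash a directory listing page while ignoring the per-request
# csrf-token <meta> line, so two fetches of an unchanged folder compare equal.
# Assumes the token is on its own line, as in the diff output.

# Read HTML on stdin, drop the csrf-token line, print the sha256 of the rest.
stable_listing_hash() {
  grep -v '<meta name="csrf-token"' | sha256sum | cut -d' ' -f1
}

# Example (network access required):
# curl -sS "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*" \
#   | stable_listing_hash
```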

There is also no "last-modified" header in the HEAD response for those pages, nor for the files themselves, e.g.

curl --head "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso.sha256"

I wonder what the best approach is to look for changes in a folder and trigger external services accordingly. Right now the best approach I have found is to periodically download the checksum file http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso.sha256 and check its content for changes.
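The checksum-polling approach can be sketched as a small shell function that compares the freshly downloaded `.sha256` content with the copy saved on the previous run. The function name `checksum_changed` and the state-file path in the usage comment are hypothetical, and the sketch assumes the job's state file persists between runs.

```shell
#!/usr/bin/env bash
# Sketch of the checksum-polling approach: compare the freshly fetched
# .sha256 content (read on stdin) with the copy saved on the previous run.
#
# Usage: checksum_changed STATE_FILE < new_checksum_content
# Exit status: 0 = changed or first run (state updated),
#              1 = unchanged,
#              2 = empty input, e.g. a failed download (state left untouched).
checksum_changed() {
  local state_file=$1 new
  new=$(cat)
  [ -n "$new" ] || return 2
  if [ -f "$state_file" ] && [ "$(cat "$state_file")" = "$new" ]; then
    return 1
  fi
  printf '%s\n' "$new" > "$state_file"
}

# Example job step (hypothetical state-file path; network access required):
# curl -sSf "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso.sha256" \
#   | checksum_changed /var/tmp/GNOME_Next.x86_64.iso.sha256.last \
#   && echo "checksum changed: trigger downstream build"
```

Returning a distinct status for empty input keeps a failed download from being mistaken for a new build.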

Suggestions

  • Maybe the always-changing "csrf-token" can be removed from those generated pages. I don't think it needs to be delivered to users.
  • Would it be possible to add a "last-modified" header to the page response?
@andrii-suse
Collaborator

Maybe using the JSON output would resolve the issue?
https://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*&json

@okurz
Member Author

okurz commented Jan 31, 2023

I wasn't aware of how to find JSON routes. This looks helpful; however, the content is bigger than the checksum file, so fetching it and comparing its content would be less efficient. The "last-modified" approach would also be more efficient.

@andrii-suse
Collaborator

So what should last-modified refer to when using a pattern or regex for the file name? The max mtime of the files that match the pattern/regex?

@okurz
Member Author

okurz commented Jan 31, 2023

> So what should last-modified refer to when using a pattern or regex for the file name? The max mtime of the files that match the pattern/regex?

I guess it could simply be the same "last-modified" of the complete page regardless of the filtering. So it would be like the "minimum mtime". The filtering would be applied on top.

@andrii-suse
Collaborator

But then it would not be usable for monitoring changes with a file filter, because a change in the mtime may be caused by other files. Do I understand that correctly?
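The difference between the two proposals can be illustrated on a local directory: the newest mtime over the whole folder moves whenever any file changes, while the newest mtime over only the files matching the filter moves only when a matching file changes. This is a sketch using GNU `find -printf`, not how MirrorCache computes its headers; `newest_mtime` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: newest mtime of the regular files directly inside DIR whose names
# match the glob PATTERN (default: every file). Prints seconds since the
# epoch, or nothing if no file matches. Requires GNU find for -printf.
newest_mtime() {
  local dir=$1 pattern="${2:-*}"
  find "$dir" -maxdepth 1 -type f -name "$pattern" -printf '%T@\n' \
    | cut -d. -f1 | sort -n | tail -n 1
}
```

With a matching file at mtime 1000 and an unrelated file at mtime 2000, `newest_mtime DIR` prints 2000 while `newest_mtime DIR 'GNOME_Next*'` prints 1000, which is exactly the false-positive case for an unfiltered timestamp.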

@andrii-suse
Collaborator

Another way may be to check the response headers. For now this works only for mirrorcache.o.o, but I probably need to fix download.o.o the same way:

curl -Is https://mirrorcache.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso | grep location
location: /repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.2-Build23.9.iso

@andrii-suse
Collaborator

andrii-suse commented Jan 31, 2023

So far it works only for *Current.iso, but I can add the same logic for GNOME*.iso. For example:

curl -sI https://download.opensuse.org/tumbleweed/iso/openSUSE-Tumbleweed-DVD-x86_64-Current.iso | grep -i location
location: https://download.opensuse.org/tumbleweed/iso/openSUSE-Tumbleweed-DVD-x86_64-Snapshot20230129-Media.iso

andrii-suse added a commit that referenced this issue Feb 7, 2023
andrii-suse added a commit that referenced this issue Feb 9, 2023
@andrii-suse
Collaborator

FYI, GNOME_Next now redirects to a particular build; this may be a good way to track changes:

curl -Is https://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso | grep -i location
location: /repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64-43.2-Build23.34.iso

We can close the ticket, or let it wait until providing something like an mtime in the response headers becomes a priority.
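Tracking the redirect target needs only the response headers. The sketch below extracts the `location` header from `curl -sI` output so a job can compare it with the value seen on its previous run; `redirect_target` and the `last_location` state file are hypothetical names, and the sketch assumes the server keeps redirecting the stable name to a build-specific file name.

```shell
#!/usr/bin/env bash
# Sketch: print the redirect target from `curl -sI` output read on stdin.
# The header name is matched case-insensitively and the trailing CR of each
# HTTP header line is removed.
redirect_target() {
  tr -d '\r' | grep -i '^location:' | head -n 1 | sed 's/^[^:]*:[[:space:]]*//'
}

# Example job step (hypothetical state file "last_location"; network required):
# new=$(curl -sI "https://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso" | redirect_target)
# if [ -n "$new" ] && [ "$new" != "$(cat last_location 2>/dev/null)" ]; then
#   printf '%s\n' "$new" > last_location
#   echo "new build: $new"
# fi
```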

@okurz
Copy link
Member Author

okurz commented Feb 9, 2023

That looks good. But I don't think I can instruct Jenkins to read just the headers, and the always-changing "csrf-token" from the original description still seems to be a problem.

@andrii-suse
Collaborator

I was working on a similar issue and investigating adding etag and x-media-version headers:

# curl -I "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next.x86_64*Build*.iso"
HTTP/1.1 200 OK
etag: 1-664C81BE
x-media-version: 29.64

Unfortunately, it doesn't work properly yet if many files match the mask, but that should be easy to fix.
But the question is: will such (properly working) headers be enough, or do you still prefer to have last-modified?

@andrii-suse
Collaborator

Also:

curl -Is https://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso
HTTP/2 302 
etag: 664C81BE-64310000
x-media-version: 29.64

@andrii-suse
Collaborator

Also, now:

# curl -I "http://download.opensuse.org/repositories/GNOME:/Medias/images/iso/?P=GNOME_Next*"
etag: 9-664C9DC0
x-media-version: 29.65
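The same header-only approach works for the `etag` and `x-media-version` headers. This sketch extracts named header values from `curl -I` output and combines the two into a single change key; the helper names `header_value` and `media_key` are hypothetical. curl's own `--etag-save`/`--etag-compare` options could be an alternative, but whether the server answers such conditional requests with 304 is not established in this thread.

```shell
#!/usr/bin/env bash
# Sketch: print the value of header NAME (matched case-insensitively) from
# `curl -I` output read on stdin, without the trailing CR.
header_value() {
  tr -d '\r' | grep -i "^$1:" | head -n 1 | sed 's/^[^:]*:[[:space:]]*//'
}

# Combine etag and x-media-version into one change key. A job would store
# this key and trigger a build when it differs from the previous run's key.
media_key() {
  local headers
  headers=$(cat)
  printf '%s %s\n' \
    "$(printf '%s\n' "$headers" | header_value etag)" \
    "$(printf '%s\n' "$headers" | header_value x-media-version)"
}

# Example (network access required):
# curl -sI "https://download.opensuse.org/repositories/GNOME:/Medias/images/iso/GNOME_Next.x86_64.iso" | media_key
```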

@okurz
Member Author

okurz commented May 29, 2024

> But the question is: will such (properly working) headers be enough, or do you still prefer to have last-modified?

As I stated, the goal is for Jenkins to have a way to monitor download folders and decide whether builds should be triggered.

@andrii-suse
Collaborator

As I understand it, Jenkins can monitor the etag or x-media-version response headers, so I am closing the issue.

@andrii-suse
Collaborator

andrii-suse commented Oct 18, 2024

On second thought, both the last-modified header and the csrf-token change are needed as well, so I will turn this into a feature request.

@andrii-suse andrii-suse reopened this Oct 18, 2024
@andrii-suse andrii-suse added enhancement New feature or request todo labels Oct 18, 2024
Development

No branches or pull requests

2 participants