Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-1635: Try downloading orc-format from dlcdn.apache.org before archive.apache.org #1830

Closed
wants to merge 1 commit into from

Conversation

progval
Copy link
Contributor

@progval progval commented Feb 28, 2024

What changes were proposed in this pull request?

Try downloading orc-format from dlcdn.apache.org before archive.apache.org

This replaces #1820 which required dlcdn to have the current version.

Why are the changes needed?

https://archive.apache.org/ discourages heavy use, and its rate limits can cause CI systems building Apache ORC to be banned.

How was this patch tested?

It builds from a clean repo

Was this patch authored or co-authored using generative AI tooling?

no

…e.org

https://archive.apache.org/ discourages heavy use, and its rate limits
can cause CI systems building Apache ORC to be banned.
@@ -72,6 +72,7 @@ endif ()
# ----------------------------------------------------------------------
# ORC Format
ExternalProject_Add (orc-format_ep
URL "https://dlcdn.apache.org/orc/orc-format-${ORC_FORMAT_VERSION}/orc-format-${ORC_FORMAT_VERSION}.tar.gz"
URL "https://archive.apache.org/dist/orc/orc-format-${ORC_FORMAT_VERSION}/orc-format-${ORC_FORMAT_VERSION}.tar.gz"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to https://cmake.org/cmake/help/latest/module/ExternalProject.html#url , this looks wrong to me. Have you tested this PR, @progval ?

URL <url1> [<url2>...]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the behavior is the same either way. I checked by replacing the first URL with a non-existing one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thank you for confirming, @progval .

@dongjoon-hyun dongjoon-hyun added this to the 2.1.0 milestone Feb 29, 2024
Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM (Pending CIs) for Apache ORC 2.1.0.

@dongjoon-hyun
Copy link
Member

Merged to main for Apache ORC 2.1.0. Thank you, @progval .

@douardda
Copy link
Contributor

douardda commented Mar 1, 2024

+1, LGTM (Pending CIs) for Apache ORC 2.1.0.
Merged to main for Apache ORC 2.1.0. Thank you, @progval .

Thanks, but why not merging to all active branches? (at least 1.9 which is advertised as the current release and 2.0)? In a context of this being a fix mainly for CI workloads, makes sense to apply it on the current release version.

@progval
Copy link
Contributor Author

progval commented Mar 1, 2024

v1.9 does not download orc-format so it's not applicable to that branch. But I agree it would be nice to have it on v2.0

@dongjoon-hyun
Copy link
Member

I fully understand why you (@douardda and @progval) are thinking in that way. However, this is categorized as Improvement instead of Bug, isn't it? Technically, dlcdn.apache.org is a caching layer without any guarantee of file existence.

Screenshot 2024-03-01 at 08 33 10

From my perspective, main branch is the most active branch (for developing and receiving pull requests) which contributing the download traffic to ASF site. It's good to have because it will be in sync always with orc-format repository and its latest version. However, for the release branches, we don't think that's true.

In general, as one of the open source community,

  1. Apache ORC community follows Semantic Versioning officially.
  2. For all release branches, we don't backport the Improvement with one exception which tools module. There was an official decision about tools module exception in the dev mailing list.
  3. FYI, the feature freeze of Apache ORC v2.0 happens already when we cut a branch, branch-2.0. After that, we are focusing QA activity(testing, bug fixing and polishing) instead of core changes.

In short, this improvement contribution is simply late to be a part of 2.0. It will be 2.1.

@progval
Copy link
Contributor Author

progval commented Mar 2, 2024

I opened the JIRA ticket as Bug instead of Improvement because I see this as a fixing a regression: v1.x didn't download the file so it builds fine from anywhere, v2.0 downloads the file from archive.apache.org so it can't be built on CIs

@dongjoon-hyun
Copy link
Member

I believe that we already agreed that your original claim was wrong here (#1820 (comment))

I opened the JIRA ticket as Bug instead of Improvement because I see this as a fixing a regression: v1.x didn't download the file so it builds fine from anywhere, v2.0 downloads the file from archive.apache.org so it can't be built on CIs

I'm wondering if you are still claiming that "it can't be built on CIs". What are you unable to build on CIs?

@progval
Copy link
Contributor Author

progval commented Mar 5, 2024

I'm wondering if you are still claiming that "it can't be built on CIs".

I can now build the main branch, but not the v2.0 branch.

What are you unable to build on CIs?

I can't build the C++ code in the v2.0 branch of this repository on my CI because I get this error:

13:54:56    -- Downloading...
13:54:56       dst='/var/lib/jenkins/workspace/DGRPH/gitlab-builds/target/debug/build/orcxx-8d1ca2e7d12cd415/orc/orc-format_ep-prefix/src/orc-format-1.0.0.tar.gz'
13:54:56       timeout='none'
13:54:56       inactivity timeout='none'
13:54:56    -- Using src='https://archive.apache.org/dist/orc/orc-format-1.0.0/orc-format-1.0.0.tar.gz'
13:54:56  
13:54:56    --- stderr
13:54:56    CMake Error at orc-format_ep-stamp/download-orc-format_ep.cmake:170 (message):
13:54:56      Each download failed!
13:54:56  
13:54:56        error: downloading 'https://archive.apache.org/dist/orc/orc-format-1.0.0/orc-format-1.0.0.tar.gz' failed
13:54:56              status_code: 28
13:54:56              status_string: "Timeout was reached"
13:54:56              log:
13:54:56              --- LOG BEGIN ---
13:54:56                Trying 65.108.204.189:443...
13:54:56        Trying [2a01:4f9:1a:a084::2]:443...
13:54:56  
13:54:56      Immediate connect fail for 2a01:4f9:1a:a084::2: Cannot assign requested
13:54:56      address
13:54:56  
13:54:56      connect to 65.108.204.189 port 443 failed: Connection timed out
13:54:56  
13:54:56      Failed to connect to archive.apache.org port 443 after 129277 ms: Couldn't
13:54:56      connect to server
13:54:56  
13:54:56      Closing connection 0

after being blocked by archive.apache.org

@dongjoon-hyun
Copy link
Member

To @progval , it's your network environment network issue which doesn't follow the ASF policy correctly.

after being blocked by archive.apache.org

As I mentioned before, when Apache ORC community release Apache ORC Format 1.0.1, your claim will be broken again because your network environment will block you to access Apache ORC Format 1.0.0. I'd like to recommend to fix your network permanently to allow the archive link.

@pitrou
Copy link
Member

pitrou commented Dec 9, 2024

For the record, on Apache Arrow we are getting similar download failures on public GitHub CI.

@dongjoon-hyun
Copy link
Member

Ack. Thank you for reporting @pitrou .

@kou
Copy link
Member

kou commented Dec 9, 2024

FYI: We must use https://www.apache.org/dyn/closer.lua/orc/orc-format-${ORC_FORMAT_VERSION}/orc-format-${ORC_FORMAT_VERSION}.tar.gz?action=download instead. The dlcdn.apache.org URL doesn't work when orc-format-2.0.0 is released and orc-format-1.0.0 is removed from https://dist.apache.org/repos/dist/release/orc/ . If we use the closer.lua URL, we don't need to list dlcdn.apache.org URL (for the latest release) and archive.apache.org URL (for old releases). closer.lua redirects to dlcdn.apache.org or something for the latest release and archive.apache.org for old releases automatically.

See also:

@dongjoon-hyun
Copy link
Member

Sure, I'll do the follow-up in Apache ORC side to use https://www.apache.org/dyn/closer.lua, @kou . Thanks!

dongjoon-hyun added a commit that referenced this pull request Dec 10, 2024
### What changes were proposed in this pull request?

This PR aims to use the ASF-recommended `closer.lua` URL instead of the direct `dlcdn` link or `archive` link.
- https://infra.apache.org/release-download-pages.html#download-page

    >  you can generate a direct download link using the following syntax:
    > http://www.apache.org/dyn/closer.lua/bar/foo/foo-5.5.1.zip?action=download

### Why are the changes needed?

To use the recommended download link in order to stabilize CIs.

This is suggested from Arrow community.
- #1830 (comment)
- apache/arrow#44977

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #2081 from dongjoon-hyun/ORC-1811.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
dongjoon-hyun added a commit that referenced this pull request Dec 10, 2024
This PR aims to use the ASF-recommended `closer.lua` URL instead of the direct `dlcdn` link or `archive` link.
- https://infra.apache.org/release-download-pages.html#download-page

    >  you can generate a direct download link using the following syntax:
    > http://www.apache.org/dyn/closer.lua/bar/foo/foo-5.5.1.zip?action=download

To use the recommended download link in order to stabilize CIs.

This is suggested from Arrow community.
- #1830 (comment)
- apache/arrow#44977

Pass the CIs.

No.

Closes #2081 from dongjoon-hyun/ORC-1811.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 28fe676)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants