builds using specific .nar file from cache.nixos.org fail with 503 #207

Closed
busti opened this issue Feb 20, 2022 · 25 comments

Comments

@busti

busti commented Feb 20, 2022

Is your feature request related to a problem? Please describe.
I tried rebuilding my system today, first updating the channel and then running nixos-rebuild switch.
This resulted in the following error message:

unable to download 'https://cache.nixos.org/nar/0g3mvx4rg81g9fdcjc5822v14vf73lnr84fcbxa8jdgciqa1m3qk.nar.xz': HTTP error 503
cannot build derivation '/nix/store/y2qkr4rzwxkghvs2ld9p4gsapknnknj8-cudatoolkit-10.2.89.drv': 1 dependencies couldn't be built
cannot build derivation '/nix/store/987vs37cvg7ryjps6z3vchv02rzfdbjn-nvtop-1.2.2.drv': 1 dependencies couldn't be built

Following the link in a browser reveals the following HTTP server error message:

Error 503 Response object too large

Response object too large

Guru Mediation:
Details: cache-hhn4082-HHN 1645394518 540221339

Varnish cache server

Describe the solution you'd like
Disappearance of the error message and successful download of the file.

Describe alternatives you've considered
Downgrading my system channel or switching to unstable. I tried both and neither works; the same error appears.
I also tried searching for mirrors of cache.nixos.org, but there seem to be no up-to-date ones.
Many other package managers have mirrors, e.g. apt has lots of mirrors hosted by universities, but this does not seem to be a thing with NixOS.

Additional context
I ran the scripts provided on the landing page of cache.nixos.org.
Below is their output:

domain=cache.nixos.org
> dig -t A cache.nixos.org
    
    ; <<>> DiG 9.16.25 <<>> -t A cache.nixos.org
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39869
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 1
    
    ;; OPT PSEUDOSECTION:
    ; EDNS: version: 0, flags:; udp: 1232
    ;; QUESTION SECTION:
    ;cache.nixos.org.		IN	A
    
    ;; ANSWER SECTION:
    cache.nixos.org.	1509	IN	CNAME	dualstack.v2.shared.global.fastly.net.
    dualstack.v2.shared.global.fastly.net. 27 IN A	151.101.114.217
    
    ;; Query time: 0 msec
    ;; SERVER: 10.11.2.1#53(10.11.2.1)
    ;; WHEN: Sun Feb 20 22:57:25 CET 2022
    ;; MSG SIZE  rcvd: 111
    
Exit: 0


> ping -c1 cache.nixos.org
    PING dualstack.v2.shared.global.fastly.net (151.101.114.217) 56(84) bytes of data.
    64 bytes from 151.101.114.217 (151.101.114.217): icmp_seq=1 ttl=59 time=15.1 ms
    
    --- dualstack.v2.shared.global.fastly.net ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 15.063/15.063/15.063/0.000 ms
Exit: 0


> ping -4 -c1 cache.nixos.org
    PING dualstack.v2.shared.global.fastly.net (151.101.114.217) 56(84) bytes of data.
    64 bytes from 151.101.114.217 (151.101.114.217): icmp_seq=1 ttl=59 time=16.7 ms
    
    --- dualstack.v2.shared.global.fastly.net ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 16.702/16.702/16.702/0.000 ms
Exit: 0


> ping -6 -c1 cache.nixos.org
    ping: connect: Network is unreachable
Exit: 0


> mtr -c 20 -w -r cache.nixos.org
    Start: 2022-02-20T22:57:25+0100
    HOST: traal                         Loss%   Snt   Last   Avg  Best  Wrst StDev
      1.|-- _gateway                       0.0%    20    0.9   0.9   0.5   1.2   0.2
      2.|-- 192.168.178.1                  0.0%    20    2.5   2.5   1.6   4.0   0.7
      3.|-- p3e9bf353.dip0.t-ipconnect.de  0.0%    20    9.8  13.0   9.5  55.2  10.0
      4.|-- f-ed11-i.F.DE.NET.DTAG.DE      0.0%    20   19.0  16.3  14.5  19.4   1.4
      5.|-- 80.150.170.243                 0.0%    20   17.5  17.1  13.5  38.5   5.4
      6.|-- 151.101.114.217                0.0%    20   14.0  16.0  13.9  33.7   4.6
Exit: 0


> curl_test -4 http://cache.nixos.org/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 151.101.114.217:80...
    * Connected to cache.nixos.org (151.101.114.217) port 80 (#0)
    > GET / HTTP/1.1
    > Host: cache.nixos.org
    > User-Agent: curl/7.79.1
    > Accept: */*
    > 
    * Mark bundle as not supporting multiuse
    < HTTP/1.1 200 OK
    < Last-Modified: Fri, 03 Sep 2021 08:49:25 GMT
    < ETag: "5acf2b960cbd1b6e6ab71dd118e204e0"
    < Content-Type: text/html
    < Server: AmazonS3
    < Via: 1.1 varnish, 1.1 varnish
    < Content-Length: 2326
    < Accept-Ranges: bytes
    < Date: Sun, 20 Feb 2022 21:57:50 GMT
    < Age: 36976
    < Connection: keep-alive
    < X-Served-By: cache-iad-kcgs7200141-IAD, cache-hhn4036-HHN
    < X-Cache: HIT, HIT
    < X-Cache-Hits: 1, 1
    < X-Timer: S1645394271.829984,VS0,VE1
    < access-control-allow-origin: *
    < 
    { [897 bytes data]
    
100  2326  100  2326    0     0  73572      0 --:--:-- --:--:-- --:--:-- 75032
    * Connection #0 to host cache.nixos.org left intact
    
    time_namelookup:    0.000471
    time_connect:       0.015695
    time_appconnect:    0.000000
    time_pretransfer:   0.015745
    time_redirect:      0.000000
    time_starttransfer: 0.030416
    time_total:         0.031615
Exit: 0


> curl_test -6 http://cache.nixos.org/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 2a04:4e42:1b::729:80...
    * Immediate connect fail for 2a04:4e42:1b::729: Network is unreachable
    * Closing connection 0
    curl: (7) Couldn't connect to server
    
    time_namelookup:    0.000394
    time_connect:       0.000000
    time_appconnect:    0.000000
    time_pretransfer:   0.000000
    time_redirect:      0.000000
    time_starttransfer: 0.000000
    time_total:         0.000045
Exit: 0


> curl_test -4 https://cache.nixos.org/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 151.101.114.217:443...
    * Connected to cache.nixos.org (151.101.114.217) port 443 (#0)
    * ALPN, offering h2
    * ALPN, offering http/1.1
    } [5 bytes data]
    * TLSv1.3 (OUT), TLS handshake, Client hello (1):
    } [512 bytes data]
    * TLSv1.3 (IN), TLS handshake, Server hello (2):
    { [106 bytes data]
    * TLSv1.2 (IN), TLS handshake, Certificate (11):
    { [2875 bytes data]
    * TLSv1.2 (IN), TLS handshake, Server key exchange (12):
    { [300 bytes data]
    * TLSv1.2 (IN), TLS handshake, Server finished (14):
    { [4 bytes data]
    * TLSv1.2 (OUT), TLS handshake, Client key exchange (16):
    } [37 bytes data]
    * TLSv1.2 (OUT), TLS change cipher, Change cipher spec (1):
    } [1 bytes data]
    * TLSv1.2 (OUT), TLS handshake, Finished (20):
    } [16 bytes data]
    * TLSv1.2 (IN), TLS handshake, Finished (20):
    { [16 bytes data]
    * SSL connection using TLSv1.2 / ECDHE-RSA-AES128-GCM-SHA256
    * ALPN, server accepted to use h2
    * Server certificate:
    *  subject: CN=cache.nixos.org
    *  start date: Oct 15 08:42:06 2021 GMT
    *  expire date: Nov 16 08:42:05 2022 GMT
    *  subjectAltName: host "cache.nixos.org" matched cert's "cache.nixos.org"
    *  issuer: C=BE; O=GlobalSign nv-sa; CN=GlobalSign Atlas R3 DV TLS CA H2 2021
    *  SSL certificate verify ok.
    * Using HTTP2, server supports multiplexing
    * Connection state changed (HTTP/2 confirmed)
    * Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
    } [5 bytes data]
    * Using Stream ID: 1 (easy handle 0x23fb9f0)
    } [5 bytes data]
    > GET / HTTP/2
    > Host: cache.nixos.org
    > user-agent: curl/7.79.1
    > accept: */*
    > 
    { [5 bytes data]
    < HTTP/2 200 
    < last-modified: Fri, 03 Sep 2021 08:49:25 GMT
    < etag: "5acf2b960cbd1b6e6ab71dd118e204e0"
    < content-type: text/html
    < server: AmazonS3
    < via: 1.1 varnish, 1.1 varnish
    < accept-ranges: bytes
    < date: Sun, 20 Feb 2022 21:57:50 GMT
    < age: 36976
    < x-served-by: cache-iad-kcgs7200141-IAD, cache-hhn4021-HHN
    < x-cache: HIT, HIT
    < x-cache-hits: 1, 2
    < x-timer: S1645394271.918198,VS0,VE0
    < access-control-allow-origin: *
    < content-length: 2326
    < 
    { [1116 bytes data]
    
100  2326  100  2326    0     0  30196      0 --:--:-- --:--:-- --:--:-- 30605
    * Connection #0 to host cache.nixos.org left intact
    
    time_namelookup:    0.000639
    time_connect:       0.015143
    time_appconnect:    0.061348
    time_pretransfer:   0.061563
    time_redirect:      0.000000
    time_starttransfer: 0.076903
    time_total:         0.077028
Exit: 0


> curl_test -6 https://cache.nixos.org/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 2a04:4e42:1b::729:443...
    * Immediate connect fail for 2a04:4e42:1b::729: Network is unreachable
    * Closing connection 0
    curl: (7) Couldn't connect to server
    
    time_namelookup:    0.000408
    time_connect:       0.000000
    time_appconnect:    0.000000
    time_pretransfer:   0.000000
    time_redirect:      0.000000
    time_starttransfer: 0.000000
    time_total:         0.000044
Exit: 0


> curl -I -4 https://cache.nixos.org/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0  2326    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    HTTP/2 200 
    last-modified: Fri, 03 Sep 2021 08:49:25 GMT
    etag: "5acf2b960cbd1b6e6ab71dd118e204e0"
    content-type: text/html
    server: AmazonS3
    via: 1.1 varnish, 1.1 varnish
    accept-ranges: bytes
    date: Sun, 20 Feb 2022 21:57:51 GMT
    age: 36976
    x-served-by: cache-iad-kcgs7200141-IAD, cache-hhn4030-HHN
    x-cache: HIT, HIT
    x-cache-hits: 1, 1
    x-timer: S1645394271.012715,VS0,VE1
    access-control-allow-origin: *
    content-length: 2326
    
Exit: 0


> curl -I -4 https://cache.nixos.org/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0  2326    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    HTTP/2 200 
    last-modified: Fri, 03 Sep 2021 08:49:25 GMT
    etag: "5acf2b960cbd1b6e6ab71dd118e204e0"
    content-type: text/html
    server: AmazonS3
    via: 1.1 varnish, 1.1 varnish
    accept-ranges: bytes
    date: Sun, 20 Feb 2022 21:57:51 GMT
    age: 36976
    x-served-by: cache-iad-kcgs7200141-IAD, cache-hhn4042-HHN
    x-cache: HIT, HIT
    x-cache-hits: 1, 1
    x-timer: S1645394271.103633,VS0,VE1
    access-control-allow-origin: *
    content-length: 2326
    
Exit: 0


> curl -I -4 https://cache.nixos.org/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0  2326    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    HTTP/2 200 
    last-modified: Fri, 03 Sep 2021 08:49:25 GMT
    etag: "5acf2b960cbd1b6e6ab71dd118e204e0"
    content-type: text/html
    server: AmazonS3
    via: 1.1 varnish, 1.1 varnish
    accept-ranges: bytes
    date: Sun, 20 Feb 2022 21:57:51 GMT
    age: 36977
    x-served-by: cache-iad-kcgs7200141-IAD, cache-hhn4027-HHN
    x-cache: HIT, HIT
    x-cache-hits: 1, 1
    x-timer: S1645394271.198903,VS0,VE1
    access-control-allow-origin: *
    content-length: 2326
    
Exit: 0


> curl -I -6 https://cache.nixos.org/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Couldn't connect to server
Exit: 0


> curl -I -6 https://cache.nixos.org/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Couldn't connect to server
Exit: 0


> curl -I -6 https://cache.nixos.org/
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0curl: (7) Couldn't connect to server
Exit: 0


@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/cache-nixos-org-responds-with-503/17800/3

@yorickvP

yorickvP commented Mar 1, 2022

Probably caused by #206? Might be fixed by setting up Segmented Caching.

@edolstra
Member

edolstra commented Mar 1, 2022

We should probably revert to the 2 GB limit. We really shouldn't produce NARs that big.

@bobvanderlinden
Member

I made NixOS/nixpkgs#161070 as a follow-up from #206, but it wasn't a good solution either. I'm interested in what is a good solution and how to get there.

A cron job that generates ISO images separately from the Hydra nixos-* builds?

@samueldr
Member

samueldr commented Mar 1, 2022

The iso image has to be verifiably and trustably produced by the same build infra as the other packages.

The iso image shouldn't be this big. Maybe it's time to consider removing the build dependencies (e.g. gcc) from the iso images?

@bobvanderlinden
Member

The image size is quite normal compared to other distros: NixOS/nixpkgs#159612 (comment)

Making it smaller makes sense, but it will still be big. I'm afraid it will be harder to do so if desktop environments (gnome/kde) keep growing.

@samuela
Member

samuela commented Mar 8, 2022

I'm seeing the same thing every time I try to build something related to cudatoolkit. For example, in the course of trying to review NixOS/nixpkgs#153542 running

$ nix-build \
-A cudatoolkit_10_0 \
-A cudatoolkit_10_1 \
-A cudatoolkit_10_2 \
-A cudatoolkit_11_0 \
-A cudatoolkit_11_1 \
-A cudatoolkit_11_2 \
-A cudatoolkit_11_3 \
-A cudatoolkit_11_4 \
-A cudatoolkit_11_5

gives me a bunch of these 503 object too large errors.

@dsxmachina

I'm facing the same issue with opencv+cuda.
Does anyone have a quick fix or hack for this?
I use NixOS for my work environment and need cuda, but I would also like to update my system again :D

@samuela
Member

samuela commented Mar 11, 2022

@dsxmachina You can use the --option substituters "" option as NobbZ notes on discourse. Be warned however that it will take a while since that means you build everything locally.
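
For the rebuild in the original report, that workaround might look roughly like the sketch below. `--option substituters ""` overrides the substituter list for a single invocation only. The wrapper function and the `RUN` variable are purely illustrative and not part of Nix; by default the helper just prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nix): print, or with RUN=1 execute,
# a rebuild that ignores all binary caches and builds everything locally.
rebuild_without_cache() {
  if [ "${RUN:-0}" = "1" ]; then
    nixos-rebuild switch --option substituters ""
  else
    # Dry run: only show the command that would be executed.
    echo 'nixos-rebuild switch --option substituters ""'
  fi
}

rebuild_without_cache
```

Running it with `RUN=1` would start the actual rebuild, which can take a long time because nothing is fetched from cache.nixos.org.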

@samuela
Member

samuela commented Mar 15, 2022

Is there anything we can do to get unblocked on this? This issue is currently blocking users from using cudatoolkit_10. There are a bunch of people running into this on Discourse:

cc @NixOS/cuda-maintainers

@cfhammill

This is causing headaches for me as well. In my flake I tried setting:

nixConfig.substituters = [ ];

to try building cuda from source. This fails while building libunistring, because its output path is already in the Nix store from the cache. So something about changing the substituters has triggered a cache miss for some important C libraries, but for some reason Nix then tries to build into the path where the cache.nixos.org build artifact was already downloaded.

 libtool: install: /nix/store/p4s4jf7aq6v6z9iazll1aiqwb34aqxq9-bootstrap-tools/bin/install -c .libs/libunistring.so.2.1.0 /nix/store/8ckxc8biqqfdwyhr0w70jgrcb4h7a4y5-libunistring-0.9.10/lib/libunistring.so.2.1.0
       > /nix/store/p4s4jf7aq6v6z9iazll1aiqwb34aqxq9-bootstrap-tools/bin/install: cannot remove '/nix/store/8ckxc8biqqfdwyhr0w70jgrcb4h7a4y5-libunistring-0.9.10/lib/libunistring.so.2.1.0': Permission denied
       > make[3]: *** [Makefile:3620: install-libLTLIBRARIES] Error 1
       > make[3]: Leaving directory '/tmp/nix-build-libunistring-0.9.10.drv-0/libunistring-0.9.10/lib'
       > make[2]: *** [Makefile:4422: install-am] Error 2
       > make[2]: Leaving directory '/tmp/nix-build-libunistring-0.9.10.drv-0/libunistring-0.9.10/lib'
       > make[1]: *** [Makefile:4416: install] Error 2
       > make[1]: Leaving directory '/tmp/nix-build-libunistring-0.9.10.drv-0/libunistring-0.9.10/lib'
       > make: *** [Makefile:1559: install-recursive] Error 1
       For full logs, run 'nix log /nix/store/nq5zrwpzxs20qvl54ks3frj14qhfalqp-libunistring-0.9.10.drv'

Then I tried

nixConf.fallback = true;

This fails with:

error: substituter 'https://cache.nixos.org' is disabled
[0/135 built, 1/2/219 copied (1 failed) (1.8/3037.9 MiB), 0.7/585.3 MiB DL] terminate called after throwing an instance of 'nix::SubstituterDisabled'
  what():  error: substituter 'https://cache.nixos.org' is disabled
Aborted (core dumped)

@nixos-discourse

This issue has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/why-is-it-rebuilding-a-derivation-whose-output-path-is-present/13326/4

@zimbatm
Member

zimbatm commented Mar 17, 2022

I bet the issue is that Fastly refuses to cache those large objects. I don't know if there is a way to bump the limit on the Fastly side.

@zimbatm
Member

zimbatm commented Mar 17, 2022

2G seems to match this limit:
https://docs.fastly.com/en/guides/resource-limits#account-created-prior-to-june-17-2020

Cache file size (without streaming miss and without Segmented Caching enabled)
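
One way to check whether a given store path might run into this limit is to look at the `FileSize` field of its `.narinfo`, which is the size of the compressed NAR that actually gets downloaded. The sketch below assumes the limit is exactly 2 GiB (2147483648 bytes) and that it applies to the compressed file; the function name and the sample values are made up for illustration.

```shell
#!/bin/sh
# Assumed limit from the thread: 2 GB, taken here as 2 * 1024^3 bytes.
LIMIT_BYTES=2147483648

# Reads a .narinfo document on stdin and prints "over" or "under"
# depending on whether its FileSize exceeds LIMIT_BYTES.
# Prints "unknown" (exit status 2) if no FileSize field is present.
nar_over_limit() {
  awk -v limit="$LIMIT_BYTES" '
    $1 == "FileSize:" { size = $2; found = 1 }
    END {
      if (!found) { print "unknown"; exit 2 }
      if (size + 0 > limit + 0) print "over"; else print "under"
    }'
}

# Example with a made-up narinfo fragment (values are illustrative only):
printf 'URL: nar/example.nar.xz\nFileSize: 3000000000\n' | nar_over_limit
# prints: over
```

Against the real cache, the input would come from something like `curl -s https://cache.nixos.org/<store-path-hash>.narinfo | nar_over_limit`, where `<store-path-hash>` is a placeholder for the hash part of the store path.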

@zimbatm
Member

zimbatm commented Mar 17, 2022

Applied Segmented Caching, but I'm still hitting the error with https://cache.nixos.org/nar/0g3mvx4rg81g9fdcjc5822v14vf73lnr84fcbxa8jdgciqa1m3qk.nar.xz

@vcunat
Member

vcunat commented Mar 17, 2022

Looks resolved now to me. (and it wasn't a few minutes ago)

zimbatm added a commit that referenced this issue Mar 17, 2022
This should fix the issue we're seeing in #207
@zimbatm
Member

zimbatm commented Mar 17, 2022

This is fixed now!

@zimbatm zimbatm closed this as completed Mar 17, 2022
@samuela
Member

samuela commented Mar 17, 2022

Thank you so much @zimbatm! I really owe you one for this!

@SomeoneSerge

You can use the --option substituters "" option as NobbZ notes on discourse. Be warned however that it will take a while since that means you build everything locally.
@samuela

One more reason to have preferLocalBuild and allowSubstitutes=false for cudatoolkit 🤔

@vcunat
Member

vcunat commented Mar 17, 2022

If I heard right, this wasn't for the package itself but for its source code.

@samuela
Member

samuela commented Mar 17, 2022

If I heard right, this wasn't for the package itself but for its source code.

That's correct. Oddly enough, the source code is still being cached even though the derivation itself is not built by Hydra. Does anyone know why that is?

@SomeoneSerge

SomeoneSerge commented Mar 17, 2022

If I heard right, this wasn't for the package itself but for its source code.

I think these flags would imply that .run is also always downloaded by the local machine

(DISCLAIMER: I can be terribly wrong)

@samuela
Member

samuela commented Mar 17, 2022

I think these flags would imply that .run is also always downloaded by the local machine

If true, we ought to enable them for cudatoolkit.

@vcunat
Member

vcunat commented Mar 18, 2022

@zimbatm
Member

zimbatm commented Mar 19, 2022

thanks, fixed in f010edb
