
Can't access InRelease files #7373

Closed
NatoBoram opened this issue May 26, 2020 · 21 comments
Labels
kind/bug (A bug in existing code, including security flaws), need/triage (Needs initial labeling and prioritization)

Comments

@NatoBoram
Contributor

NatoBoram commented May 26, 2020

Version information:

go-ipfs version: 0.6.0-dev-413ab315b
Repo version: 9
System version: arm64/linux
Golang version: go1.14.3
OS: Ubuntu 20.04 LTS aarch64 
Host: Raspberry Pi 4 Model B Rev 1.2 
Kernel: 5.4.0-1011-raspi 
Uptime: 2 days, 43 mins 
Packages: 669 (dpkg), 6 (snap) 
Shell: bash 5.0.16 
Terminal: /dev/pts/0 
CPU: BCM2835 (4) @ 1.500GHz 
Memory: 885MiB / 3793MiB 

Description:

I'm trying to build a mirror of Ubuntu Archives on IPNS using a Raspberry Pi and a 2 TB external HDD. So far, things are going pretty well, but I think I've encountered a breaking bug.

deb http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu focal           main restricted universe multiverse # IPNS
deb http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu focal-updates   main restricted universe multiverse # IPNS
deb http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu focal-backports main restricted universe multiverse # IPNS
deb http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu focal-security  main restricted universe multiverse # IPNS
deb http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu focal-proposed  main restricted universe multiverse # IPNS
Err:11 http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu focal-updates InRelease
  Connection failed [IP: 127.0.0.1 8080]
Err:12 http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu focal-backports InRelease
  Connection failed [IP: 127.0.0.1 8080]
Err:13 http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu focal-security InRelease
  Connection failed [IP: 127.0.0.1 8080]
Err:14 http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu focal-proposed InRelease
  Connection failed [IP: 127.0.0.1 8080]
Fetched 265 kB in 4min 0s (1 102 B/s)
Reading package lists... Done
Building dependency tree       
Reading state information... Done
12 packages can be upgraded. Run 'apt list --upgradable' to see them.
W: Failed to fetch http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu/dists/focal-updates/InRelease  Connection failed [IP: 127.0.0.1 8080]
W: Failed to fetch http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu/dists/focal-backports/InRelease  Connection failed [IP: 127.0.0.1 8080]
W: Failed to fetch http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu/dists/focal-security/InRelease  Connection failed [IP: 127.0.0.1 8080]
W: Failed to fetch http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu/dists/focal-proposed/InRelease  Connection failed [IP: 127.0.0.1 8080]
W: Some index files failed to download. They have been ignored, or old ones used instead.

According to those logs, the problem occurs at http://localhost:8080/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu/dists.

I'm using this to query multiple public gateways and check whether they can access the file.

To speed up discovery, ipfs swarm connect /p2p/QmV8TePNsdZiXUpq62739hp5MJLSk8SdpSWcpLxaqhRQdR.
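Roughly, that gateway check boils down to something like this (a sketch; the gateway list and path are just examples, not the actual tool linked above):

path="/ipns/QmSbCLwYuqBGQYTG4PBHaFunsKcpLLn97ApNn1wf6cV8jd/ubuntu/dists"
for gw in https://ipfs.io https://dweb.link https://cloudflare-ipfs.com; do
  # print the HTTP status each public gateway returns for the same path
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 60 "$gw$path")
  echo "$gw -> HTTP $code"
done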

NatoBoram added the kind/bug and need/triage labels on May 26, 2020
@Stebalien
Member

Is it a symlink?

@NatoBoram
Contributor Author

NatoBoram commented May 27, 2020

There's a good chance it is; there are way too many links in there. I noticed that some of them were just downloaded as regular files, and it looks like some others just aren't reachable.

@Stebalien
Member

Could you give me your full multiaddr? I can't find your node.

@Stebalien
Member

(but yeah, we need to follow symlinks on the gateway)

@Stebalien
Member

Stebalien commented May 27, 2020 via email

@NatoBoram
Contributor Author

NatoBoram commented May 27, 2020

Oh. I think we found the problem.

ipfs get bafybeihocm6ufvyz44kde6fewu2wsj4qfiecfzbjubbekvcnw3hr7u3smq/ubuntu/dists/focal-updates/
Saving file(s) to focal-updates
 311.33 MiB / 311.33 MiB [==================================================================================] 100.00% 1s
Error: data in file did not match. mirrors/ubuntu/dists/focal-updates/InRelease offset 0

Because rsync re-syncs the mirror in about 10 minutes while re-adding it to IPFS takes multiple hours, the InRelease file on disk has already changed by the time it's read, so there's no way it can match.
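As a side note, go-ipfs also has an ipfs filestore verify command that should report which filestore blocks no longer match their backing files (a rough check; the exact output format may differ between versions):

ipfs filestore verify > verify.log
grep -v '^ok' verify.log | head   # assumes intact blocks are reported with an "ok" status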

Is there a way to make the adding process faster? Right now, the command I'm using is ipfs add --recursive --hidden --quieter --wrap-with-directory --chunker=rabin --nocopy --fscache --cid-version=1.

I saw in ipfs-inactive/package-managers#18 that removing --nocopy yielded huge improvements, but that's hard to do when the Ubuntu Archives are 1.24 TB and I only have 2 TB available 🤔

@Stebalien
Member

Removing --fscache may help. Other than that, which datastore are you using? Could you post the output of ipfs config show?

@NatoBoram
Contributor Author

NatoBoram commented Jun 4, 2020

ipfs config show
{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/127.0.0.1/tcp/5001",
    "Announce": [],
    "Gateway": "/ip4/0.0.0.0/tcp/8080",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4001",
      "/ip6/::/tcp/4001",
      "/ip4/0.0.0.0/udp/4001/quic",
      "/ip6/::/udp/4001/quic"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb"
  ],
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "child": {
        "path": "badgerds",
        "syncWrites": false,
        "truncate": true,
        "type": "badgerds"
      },
      "prefix": "badger.datastore",
      "type": "measure"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Experimental": {
    "FilestoreEnabled": true,
    "GraphsyncEnabled": true,
    "Libp2pStreamMounting": true,
    "P2pHttpProxy": true,
    "ShardingEnabled": true,
    "StrategicProviding": true,
    "UrlstoreEnabled": true
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "QmV8TePNsdZiXUpq62739hp5MJLSk8SdpSWcpLxaqhRQdR"
  },
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Router": ""
  },
  "Reprovider": {
    "Interval": "12h",
    "Strategy": "all"
  },
  "Routing": {
    "Type": "dht"
  },
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {
      "GracePeriod": "20s",
      "HighWater": 900,
      "LowWater": 600,
      "Type": "basic"
    },
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "DisableRelay": false,
    "EnableAutoRelay": true,
    "EnableRelayHop": true
  }
}

Since I got the "data in file did not match" error, I removed the --nocopy option, but now I need 2.48 TB of storage and I only have 1.80 TB. I think this project will sink for me ^^

Right now, I'm using Btrfs and duperemove to save on duplication, but it looks like not much of the Badger datastore can be deduplicated. If I could deduplicate just enough to stay within my 1.8 TB budget, I would be able to publish this mirror and actually use it.
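A typical duperemove pass over the two trees would look roughly like this (paths are placeholders, not my exact setup):

# dedupe shared extents between the mirror and the IPFS repo on the same Btrfs filesystem;
# the hashfile lets repeated runs skip data that hasn't changed
duperemove -dr --hashfile=/var/tmp/duperemove.hash /mnt/mirror /mnt/ipfs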

apt show duperemove
Package: duperemove
Version: 0.11.1-3
Priority: optional
Section: universe/admin
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Original-Maintainer: Peter Záhradník <peter.zahradnik@znik.sk>
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 260 kB
Depends: libc6 (>= 2.14), libglib2.0-0 (>= 2.31.8), libsqlite3-0 (>= 3.7.15)
Enhances: btrfs-progs
Homepage: https://markfasheh.github.io/duperemove/
Download-Size: 70.6 kB
APT-Sources: http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages
Description: extent-based deduplicator for file systems
 Duperemove is a tool for finding duplicated extents and submitting them for
 deduplication.  When given a list of files it will hash their contents on a
 block by block basis and compare those hashes to each other, finding and
 categorizing extents that match each other.
 .
 On BTRFS and, experimentally, XFS, it can then reflink such extents in a
 race-free way.  Unlike hardlink-based solutions, affected files appear
 independent in any way other than reduced disk space used.

@Stebalien
Copy link
Member

Got it. I wanted to make sure you were using badger without sync writes enabled.

I'm not sure why removing --nocopy helps and I'm not entirely sure that that's still true after some optimizations we've made.

Note: I'd consider using snapshots to decouple these. That is, you can:

  1. Rsync in one loop every 10? minutes.
  2. In a separate loop:
    1. Take a btrfs snapshot (use flock to make sure an rsync run isn't running?).
    2. Add this btrfs snapshot to IPFS with nocopy.

That will mean that the IPFS mirror will always be a bit behind but you'll never have to stall the HTTP mirror to wait on the IPFS mirror. This will also ensure that you never modify files after adding them to IPFS.
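A rough sketch of those two loops (paths, intervals, and the rsync source are placeholders, and error handling is omitted):

# Loop 1: keep the plain HTTP mirror fresh.
while true; do
  flock /var/lock/ubuntu-mirror.lock \
    rsync -a --delete rsync://your.mirror.example/ubuntu/ /mnt/mirror/ubuntu/
  sleep 600
done

# Loop 2, in a separate process: snapshot, then add the snapshot with --nocopy.
# /mnt/mirror must itself be a Btrfs subvolume for the snapshot to work.
while true; do
  snap="/mnt/snapshots/ubuntu-$(date +%Y%m%d-%H%M)"
  flock /var/lock/ubuntu-mirror.lock \
    btrfs subvolume snapshot -r /mnt/mirror "$snap"
  ipfs add --recursive --hidden --quieter --nocopy --cid-version=1 "$snap"
  # (then publish the resulting root CID over IPNS with ipfs name publish)
  sleep 21600
done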

@NatoBoram
Contributor Author

Oh, that's very interesting. For the --nocopy option to work, new files have to be in a different path than the old files, and unchanged files mustn't be removed. That means I'll end up with an ever-growing number of snapshots, roughly one per ipfs add.

Is there a way to clean up the snapshots? What happens if I add a file with --nocopy that already exists elsewhere?

@Stebalien
Member

That means I'll end up with an ever-growing number of snapshots, roughly one per ipfs add.

Yes, but the snapshots should dedup.

Is there a way to clean up the snapshots? What happens if I add a file with --nocopy that already exists elsewhere?

Unfortunately, I don't think it's possible to override old files with new files. I believe for performance reasons, we don't bother replacing old "filestore no copy" records with ones pointing to new files.

Honestly, I think the best approach here would be to create a new repo, add a new snapshot, then delete the old repos and the old snapshots (once every few days). I assume the repos (with --nocopy) aren't too large, right?

Otherwise, we may be able to find a way to bypass the "do I already have this block check" by adding yet another flag (but I'd prefer not to if possible).
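A sketch of that rotation, assuming each snapshot gets its own repo selected via the IPFS_PATH environment variable (all names are placeholders):

export IPFS_PATH=/mnt/repos/ipfs-$(date +%Y%m%d)
ipfs init --profile badgerds
# the filestore has to be enabled in each fresh repo before adding with --nocopy
ipfs config --json Experimental.FilestoreEnabled true
ipfs add --recursive --hidden --nocopy --cid-version=1 /mnt/snapshots/current
# once the new repo is serving and published, delete the previous repo directory
# and the snapshot it referenced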

@NatoBoram
Contributor Author

Otherwise, we may be able to find a way to bypass the "do I already have this block check" by adding yet another flag (but I'd prefer not to if possible).

This seems very useful. In fact, it's confusing that it isn't already the case: if I add a new file using --nocopy, I expect the unpinned old records to be replaced. Another approach could be to allow multiple sources for --nocopy files, but I'm not sure how useful that would be. I think I'd prefer simply overriding the previous link.

I believe the benefits are real. Should I raise an issue for that?

@Stebalien
Member

It deserves an issue, but I'm not sure about the best approach. A really nice property of the current blockstore is that it's idempotent. This change would break that.

@ivan386
Contributor

ivan386 commented Jun 6, 2020

@Stebalien
1. Check whether we already have the block.
2. Validate it.
3. Replace it if the old block is bad.

@Stebalien
Member

I'm closing this as it's not really a bug. Removing/changing a file on disk after adding it to go-ipfs with the --nocopy flag isn't allowed.

@NatoBoram
Contributor Author

NatoBoram commented Jun 18, 2020

Hey! I just wanted to add that I've updated my script to manage snapshots as you suggested.


I had to create a Btrfs subvolume and move the mirror over, but with that done overnight, I'm now adding it back to IPFS using a fresh badgerds. It seems to take a very long time.

The problem with the program I made is that it's now dependent on Btrfs. While I do love Btrfs, I'm not sure it's a great idea for my ipfs-mirror-manager to be tied to a specific filesystem. Moreover, the --nocopy option makes it mandatory for the node to run from the same drive as the mirror itself. It would be nice to be able to separate them.

Nonetheless, successfully pulling off an IPFS mirror of the Ubuntu archive on a Raspberry Pi would be very impressive, and I'm extremely proud that IPFS has come this far.

At this time, the .ipfs folder is only 1.3G. The total disk usage is 1.3T.

@Stebalien
Member

So, my ideal solution here would be to just not use the go-ipfs daemon, but instead write a custom Dropbox-like IPFS service by cobbling together bitswap, libp2p, a datastore, and the DHT. It would:

  1. Monitor a directory for changes.
  2. When a file is added, it would chunk, hash, and index (but not copy) the file. You could even store the results in an SQL database instead of using a datastore.
  3. When a file is removed/changed, it would remove references to the file.

The database schema would be:

  • Table: files
    • filename (primary key)
    • modtime
  • Table: blocks
    • id (primary key)
    • cid (indexed)
    • filename (indexed)
    • offset

On start:

  • Scan for changed files, comparing with the mod times in the database.

On add/update:

  • Add the file to the files table.
  • Run DELETE FROM blocks where filename=filename (just in case).
  • Chunk the file, adding each block to the blocks table.

On remove:

  • Run DELETE FROM blocks where filename=filename (just in case).
  • Remove the file from the files table.
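If that index lived in SQLite, the schema could be written along these lines (a sketch; the table and column names just mirror the outline above):

sqlite3 mirror-index.db <<'SQL'
CREATE TABLE files (
  filename TEXT PRIMARY KEY,
  modtime  INTEGER NOT NULL
);
CREATE TABLE blocks (
  id       INTEGER PRIMARY KEY,
  cid      TEXT NOT NULL,
  filename TEXT NOT NULL REFERENCES files(filename),
  "offset" INTEGER NOT NULL
);
CREATE INDEX blocks_cid ON blocks(cid);
CREATE INDEX blocks_filename ON blocks(filename);
SQL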

@rpodgorny


Perhaps you should create a new issue to track the development of this idea.

@Stebalien
Member

Stebalien commented Jun 19, 2020 via email

@NatoBoram
Contributor Author

I can't really afford the time it would take to build a custom IPFS daemon, so I have to make do with what I have. And right now, what I have is a mirror that takes around 2 days per update. I posted it on Reddit.

In the meantime, is there any way to optimize it?

Right now, the command I'm using is ipfs add --recursive --hidden --quieter --progress --chunker=rabin --nocopy --cid-version=1.

CPU usage is about 40% and HDD read speeds are at about 15-30 Mbps.

@Stebalien
Member

Don't use --chunker=rabin. Our rabin implementation is terrible. For now, I recommend --chunker=buzhash. You could also try passing --inline to inline small (<=32 bytes) files into directory entries.
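Applied to the command above, that would be roughly (the target path is a placeholder):

ipfs add --recursive --hidden --quieter --progress \
  --chunker=buzhash --inline --nocopy --cid-version=1 /mnt/snapshots/current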
