This repository has been archived by the owner on Apr 16, 2020. It is now read-only.

Generate a breakdown of average linux package manager directory structure and sizes #79

Open
andrew opened this issue Jul 17, 2019 · 5 comments

andrew commented Jul 17, 2019

For @dirkmc's work on #77, it'd be very helpful to get an idea of the sizes and shapes of the directory structures of various Linux package manager repositories.

This likely involves rsyncing copies of the repositories listed in #75 and inspecting them (du may be useful here), producing a histogram of file sizes for each one and possibly an average across all of them.

This will allow us to build some scripts that can generate repo-like directory structures without needing to download 1TB+ of real data.


andrew commented Jul 23, 2019

If the directory you'd like to analyze is called arch, then here is a collection of handy commands for generating a breakdown of sizes and structures:

breakdown of file sizes

cmd: find arch -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s: %6d\n", a[1],substr("kMGTPEZY",a[2]+1,1),$2) }'

result:

  1k:  11110
  2k:     60
  4k:    359
  8k:   1042
 16k:   1235
 32k:   1374
 64k:   1229
128k:   1260
256k:   1247
512k:    916
  1M:    705
  2M:    631
  4M:    374
  8M:    218
 16M:    197
 32M:    114
 64M:     69
128M:     33
256M:     17
512M:     10
  1G:      2
  2G:      1
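
In case the one-liner is hard to follow: each label is the lower bound of a power-of-two bucket, and anything under 1k (including empty files) is folded into the 1k row. Here is a rough sketch of just the bucketing step as a shell function. The name bucket_sizes is made up for illustration, and it swaps the log() calculation for a doubling loop so the bucket boundaries don't depend on floating-point rounding; it is not the command used to produce the numbers above.

```shell
# Hypothetical helper (not from the original command): reads one file size
# in bytes per line on stdin and prints "<bucket lower bound> <count>".
# Sizes below 1024 are clamped into the 1024 bucket, like the n<10 check above.
bucket_sizes() {
  awk '
    NF == 0 { next }                      # ignore blank lines
    {
      n = 10; b = 1024                    # start at 2^10 = 1k
      while (b * 2 <= $1) { b *= 2; n++ } # largest power of two <= size
      count[n]++
    }
    END { for (n in count) printf "%.0f %d\n", 2 ^ n, count[n] }
  ' | sort -n
}

# Possible usage, reusing the listing step from the command above:
#   find arch -type f -print0 | xargs -0 ls -l | awk '{ print $5 }' | bucket_sizes
```

The output is the raw "bytes count" form, so it can be fed into the same human-readable formatting step as before.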

breakdown of file sizes as .csv file

cmd: find arch -type f -print0 | xargs -0 ls -l | awk '{ n=int(log($5)/log(2)); if (n<10) { n=10; } size[n]++ } END { for (i in size) printf("%d %d\n", 2^i, size[i]) }' | sort -n | awk 'function human(x) { x[1]/=1024; if (x[1]>=1024) { x[2]++; human(x) } } { a[1]=$1; a[2]=0; human(a); printf("%3d%s,%6d\n", a[1],substr("kMGTPEZY",a[2]+1,1),$2) }' > sizes.csv

result: a file called sizes.csv, which you can visualize in the terminal with: perl -lane 'print $F[0], "\t", "=" x ($F[1] / 200)' sizes.csv (one = per 200 files; adjust the divisor to suit your data)

1k,	=======================================================
2k,	
4k,	=
8k,	=====
16k,	======
32k,	======
64k,	======
128k,	======
256k,	======
512k,	====
1M,	===
2M,	===
4M,	=
8M,	=
16M,	
32M,	
64M,	
128M,	
256M,	
512M,	
1G,	
2G,
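
Since the point of this issue is to generate repo-like structures without downloading the real data, a sizes.csv like the one above could drive a small generator. This is only a sketch under a few assumptions: the function name make_fake_files and the file naming are made up, every file gets exactly the lower-bound size of its bucket, everything lands in one flat directory, and the files are sparse (created with dd by seeking without writing), so they take almost no disk space.

```shell
# Hypothetical helper: make_fake_files <sizes.csv> <output-dir>
# For each "<size><unit>,<count>" row, create <count> sparse files whose
# length is the bucket's lower bound (e.g. "2M" -> 2 * 1024 * 1024 bytes).
make_fake_files() {
  csv=$1
  out=$2
  mkdir -p "$out" || return 1
  # Drop the padding spaces left by the awk printf, then split on ",".
  tr -d ' \t' < "$csv" | while IFS=, read -r label count; do
    [ -n "$label" ] && [ -n "$count" ] || continue
    num=${label%[kMG]}
    case $label in
      *k) bytes=$((num * 1024)) ;;
      *M) bytes=$((num * 1024 * 1024)) ;;
      *G) bytes=$((num * 1024 * 1024 * 1024)) ;;
      *)  continue ;;   # units this sketch does not handle are skipped
    esac
    i=0
    while [ "$i" -lt "$count" ]; do
      # count=0 writes nothing; seek sets the file length, leaving a hole
      dd if=/dev/zero of="$out/$label-$i.bin" bs=1 count=0 seek="$bytes" 2>/dev/null
      i=$((i + 1))
    done
  done
}

# Possible usage:
#   make_fake_files sizes.csv fake-arch
```

A more realistic generator would also spread sizes within each bucket and reproduce the directory layout, which this sketch ignores.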

breakdown of file extensions

cmd: find arch -type f | sed -n 's/..*\.//p' | sort | uniq -c | sort -rn

result:

11021 sig
11002 xz
  51 txt
  46 gz
  24 old
  17 img
   9 torrent
   9 iso
   6 LICENSE
   3 sha512
   3 sfs
   2 01/arch/boot/x86_64/vmlinuz
   1 12/boot/vmlinuz_x86_64
   1 12/boot/vmlinuz_i686
   1 08/boot/vmlinuz_x86_64
   1 08/boot/vmlinuz_i686
   1 06/boot/vmlinuz_x86_64
   1 02/arch/boot/x86_64/vmlinuz
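
The path-like rows at the bottom (e.g. 01/arch/boot/x86_64/vmlinuz) show up because the sed expression strips everything up to the last dot anywhere in the path, so an extension-less file under a dotted directory such as iso/2019.07.01 reports the rest of its path as the "extension". If that noise matters, one option is to match only within the final path component. A sketch, with ext_breakdown as a made-up name:

```shell
# Hypothetical helper: ext_breakdown <dir>
# Counts file extensions, looking only at the file name itself so that dots
# in directory names are ignored. Files with no extension (and dotfiles
# such as ".hidden") are left out of the count.
ext_breakdown() {
  find "$1" -type f |
    sed -n 's|.*/[^/][^/]*\.\([^/.][^/.]*\)$|\1|p' |
    sort | uniq -c | sort -rn
}

# Possible usage:
#   ext_breakdown arch
```

With this variant the vmlinuz files would simply drop out of the list rather than appear as odd rows.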

note: For the next two commands, you may need to run brew install tree on macOS.

total directories and files

cmd: tree arch/ | tail -1

result: 64 directories, 22203 files
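
If tree isn't available, roughly the same totals can be had from find alone. This is a sketch, with count_tree as a made-up name. It counts every non-directory as a file and does not skip hidden entries, so the numbers may differ slightly from tree's when a repo contains dotfiles or symlinks.

```shell
# Hypothetical helper: count_tree <dir>
# Prints "<N> directories, <M> files" for everything below <dir>.
count_tree() {
  # find lists <dir> itself as a directory, so subtract one to count
  # only the directories underneath it
  dirs=$(( $(find "$1" -type d | wc -l) - 1 ))
  files=$(( $(find "$1" ! -type d | wc -l) ))
  printf '%d directories, %d files\n' "$dirs" "$files"
}

# Possible usage:
#   count_tree arch
```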

simplified tree view

cmd: tree -d arch/

result:

arch/
├── community
│   └── os
│       └── x86_64
│           └── local
├── community-staging
│   └── os
│       └── x86_64
├── community-testing
│   └── os
│       └── x86_64
├── core
│   └── os
│       └── x86_64
├── extra
│   └── os
│       └── x86_64
├── gnome-unstable
│   └── os
│       └── x86_64
├── iso
│   ├── 2019.05.02
│   │   └── arch
│   │       ├── boot
│   │       │   └── x86_64
│   │       └── x86_64
│   ├── 2019.06.01
│   │   └── arch
│   │       ├── boot
│   │       │   └── x86_64
│   │       └── x86_64
│   ├── 2019.07.01
│   │   └── arch
│   │       ├── boot
│   │       │   └── x86_64
│   │       └── x86_64
│   └── archboot
│       ├── 2016.08
│       │   └── boot
│       ├── 2016.12
│       │   └── boot
│       ├── 2018.06
│       │   └── boot
│       └── history
├── kde-unstable
│   └── os
│       └── x86_64
├── multilib
│   └── os
│       └── x86_64
├── multilib-staging
│   └── os
│       └── x86_64
├── multilib-testing
│   └── os
│       └── x86_64
├── pool
│   ├── community
│   └── packages
├── staging
│   └── os
│       └── x86_64
└── testing
    └── os
        └── x86_64

64 directories

dutree is also a very nice tool for visualizing the relative sizes of nested directories in the terminal. If you have Rust installed, grab it with cargo install dutree

tree view with size breakdown

cmd: dutree -d2 arch

result:

[ arch 47.50 GiB ]
├─ pool                 │              █████████████████████████████████████████████████│  80%     38.45 GiB
│  ├─ community         │              ░░░░░░░░░░░░█████████████████████████████████████│  76%     29.55 GiB
│  └─ packages          │              ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░██████████│  23%      8.89 GiB
├─ iso                  │                                                     ██████████│  18%      8.97 GiB
│  ├─ archboot          │                                                     ░░░░░█████│  57%      5.16 GiB
│  ├─ 2019.07.01        │                                                     ░░░░░░░░░░│  14%      1.28 GiB
│  ├─ 2019.06.01        │                                                     ░░░░░░░░░░│  14%      1.27 GiB
│  └─ 2019.05.02        │                                                     ░░░░░░░░░░│  14%      1.26 GiB
├─ community            │                                                               │   0%     51.92 MiB
│  └─ os                │                                                               │  99%     51.92 MiB
├─ extra                │                                                               │   0%     22.69 MiB
│  └─ os                │                                                               │  99%     22.69 MiB
├─ core                 │                                                               │   0%      2.41 MiB
│  └─ os                │                                                               │  99%      2.41 MiB
├─ kde-unstable         │                                                               │   0%      1.06 MiB
│  └─ os                │                                                               │  99%      1.06 MiB
├─ multilib             │                                                               │   0%      1.05 MiB
│  └─ os                │                                                               │  99%      1.05 MiB
├─ staging              │                                                               │   0%    449.09 KiB
│  └─ os                │                                                               │  99%    449.00 KiB
├─ testing              │                                                               │   0%    350.96 KiB
│  └─ os                │                                                               │  99%    350.86 KiB
├─ community-testing    │                                                               │   0%    270.26 KiB
│  └─ os                │                                                               │  99%    270.16 KiB
├─ gnome-unstable       │                                                               │   0%    183.86 KiB
│  └─ os                │                                                               │  99%    183.76 KiB
├─ community-staging    │                                                               │   0%     59.60 KiB
│  └─ os                │                                                               │  99%     59.51 KiB
├─ multilib-testing     │                                                               │   0%     10.89 KiB
│  └─ os                │                                                               │  99%     10.79 KiB
├─ multilib-staging     │                                                               │   0%      6.62 KiB
│  └─ os                │                                                               │  98%      6.52 KiB
├─ lastsync             │                                                               │   0%          11 B
└─ lastupdate           │                                                               │   0%          11 B
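
Without Rust installed, plain du gets part of the way there. A sketch, with du_breakdown as a made-up name: it lists directories down to a given depth, largest first, with each one's share of the total. Unlike dutree, the percentages are relative to the whole tree rather than to the parent directory, there are no bars, and files sitting directly in the root (like lastsync) are not listed separately.

```shell
# Hypothetical helper: du_breakdown <dir> [depth]
# Prints "<size> KiB <percent>%  <path>" for each directory down to [depth]
# levels (default 2), sorted largest first.
du_breakdown() {
  du -k -d "${2:-2}" "$1" | sort -rn |
    awk '
      NR == 1 { total = $1 }   # the root directory has the largest total
      {
        size = $1
        pct = (total > 0) ? 100 * size / total : 0
        sub(/^[0-9]+[ \t]+/, "")   # keep the path intact, even with spaces
        printf "%10.0f KiB %5.1f%%  %s\n", size, pct, $0
      }'
}

# Possible usage:
#   du_breakdown arch 2
```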

@andrew andrew closed this as completed Jul 23, 2019
@andrew andrew reopened this Jul 23, 2019

andrew commented Jul 23, 2019

You can also import that CSV into Excel/Numbers and make some pretty graphs:

image


dirkmc commented Jul 23, 2019

Super helpful, thanks 👍


andrew commented Jul 23, 2019

Updated with real data now that rsync has finished. I also uploaded the CSV file here: https://gist.github.com/andrew/3ca196c9aa464a9a35d23e669d6e70bd

@meiqimichelle
Contributor

This is ready to close, but MHz is going to think about whether it makes sense to document this somewhere else as well, and/or whether to turn this into (or open a follow-on task for) running these commands on the other repos (see #75 for the list).
