Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement dense bitset strategy for a compact csr representation #3

Open
daniel-j-h opened this issue Jan 13, 2023 · 1 comment
Open

Comments

@daniel-j-h
Copy link
Member

With a graph

(0 -> 1)
(0 -> 2)
(1 -> 0)
(1 -> 2)
(2 -> 1)

the TinyGraphIO interchange file format stores the compressed sparse row representation

offsets: [0, 2, 4, 5]
targets: [1, 2, 0, 2, 1]

where edges(v) = targets[offsets[v]] - targets[offsets[v + 1]]

Varints of Deltas

For a compact representation we use the following techniques

  1. The offsets array is monotonically increasing. We use delta encoding to transform the offsets array into small positive integers.
  2. We store the offsets deltas as varints so that small deltas can be represented in as little space as a single byte.
  3. We store the targets array as varints so that small targets can be represented in as little space as a single byte.

With delta encoded varints for offsets and varints for targets, the compressed sparse row graph can be compactly represented as

offsets = [0x0, 0x2, 0x2, 0x1]  # four bytes total
targets = [0x1, 0x2, 0x0, 0x2, 0x1]  # five bytes total

Dense Bitsets

An alternative is representing offsets as a dense bitset indicating when a new sub-range in targets starts

offsets = [1, 0, 1, 0, 1]  # 5 bits total, stored as 1 byte
targets = [1, 2, 0, 2, 1]  # 5 bytes total

This alternative representation is more space efficient for graphs with a relatively low number of edges compared to number of nodes.

Open Questions

Should we support both approaches and leave it to the user to decide? Or can we be smart and select the approach based on what the user graph looks like?

@daniel-j-h
Copy link
Member Author

With graph chunks #7 we could have a tag for dense/sparse encoding and take that decision on a graph chunk by chunk level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

1 participant