PeerDAS: Multiple improvements #14467

Merged: 6 commits into peerDAS from peerdas-misc on Sep 23, 2024
Conversation

nalepae (Contributor) commented on Sep 20, 2024

Please read all the commit messages; they contain the needed information.

  • Improve logging to help with debugging
  • Use a cache to keep track of received/stored data columns

nalepae requested a review from a team as a code owner on September 20, 2024 at 12:36
nalepae requested reviews from kasey, terencechain, and james-prysm and removed the request for a team on September 20, 2024 at 12:36
`reconstructDataColumns`: Stop looking into the DB to know if we have some columns.

Before this commit:
Each time we receive a column, we look into the filesystem for all the columns we already store.
==> For 128 columns, that amounts to 1 + 2 + 3 + ... + 128 = 128(128+1)/2 = 8,256 file lookups.

Also, as soon as a column is saved to the filesystem, we assume that looking at the filesystem again right afterwards will show the column as available (strict consistency). This is not always the case.

==> Because of this lack of strict filesystem consistency, we sometimes reconstruct and reseed columns more than once.

After this commit:
We use a (strictly consistent) cache to determine whether we have received a column.
==> No more consistency issues, and less stress on the filesystem.
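
As an illustration, here is a minimal sketch of such a strictly consistent cache; the package, type, and method names are hypothetical, not Prysm's actual implementation:

```go
package columns

import "sync"

// columnTracker is an illustrative, strictly consistent in-memory record of
// which data columns have been received for each block root. A read that
// follows a write always observes it, unlike a check against the filesystem.
type columnTracker struct {
	mu   sync.RWMutex
	seen map[[32]byte]map[uint64]bool // block root -> column index -> received
}

func newColumnTracker() *columnTracker {
	return &columnTracker{seen: make(map[[32]byte]map[uint64]bool)}
}

// markReceived records that the column with the given index was received for root.
func (t *columnTracker) markReceived(root [32]byte, index uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.seen[root] == nil {
		t.seen[root] = make(map[uint64]bool)
	}
	t.seen[root][index] = true
}

// receivedCount returns how many distinct columns were received for root, so a
// caller can decide whether reconstruction is possible without any file lookups.
func (t *columnTracker) receivedCount(root [32]byte) int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return len(t.seen[root])
}
```

Because every update goes through the same lock-protected in-memory map, a column marked as received is immediately visible to the reconstruction path, which is exactly the property the filesystem check lacked.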
Before this commit, logged values assumed that all requested columns correspond to
the same block root, which is not always the case.

After this commit, we know which columns are requested for which root.
This is useful to debug "lost data columns" in devnet.
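
As a hedged illustration of the per-root logging (the helper, its parameter shape, and the log message are hypothetical; only the use of logrus reflects Prysm's logging library):

```go
package columns

import (
	"fmt"

	"github.com/sirupsen/logrus"
)

// logRequestedColumns groups the requested column indices by block root and
// emits one log line per root, so the log no longer implies that every
// requested index belongs to a single block root.
func logRequestedColumns(requested map[[32]byte][]uint64) {
	for root, indices := range requested {
		logrus.WithFields(logrus.Fields{
			"blockRoot": fmt.Sprintf("%#x", root),
			"columns":   indices,
		}).Debug("Data column sidecars requested by root")
	}
}
```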

nisdas (Member) left a comment:

Maybe I am missing context: why are we using the data columns cache here in sync? This seems to duplicate existing functionality.


return nil
}

func (s *Service) internalBroadcastDataColumn(
ctx context.Context,
slot primitives.Slot,

Review comment (Member):

Do we need the slot and slotStartTime if we can already get this information from the sidecar object?

nalepae (Contributor, PR author) replied:

Addressed here: 1dc8225

Additional question: we could also remove the root argument from internalBroadcastDataColumn, since the hash tree root can be obtained from the data column sidecar object with:

dataColumnSidecar.SignedBlockHeader.GetHeader().HashTreeRoot()

However, is the cost of recomputing HashTreeRoot worth removing the root argument from the function?

Review comment (Member):

I think passing the root is a better idea; hashing the header 128 times will waste CPU cycles unnecessarily.
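
To make the trade-off concrete, here is a hedged sketch in which the caller computes the block root once and reuses it for every column; the sidecar type and the broadcastColumn callback are simplified stand-ins, not the PR's actual signatures:

```go
package columns

import "context"

// sidecar is a simplified stand-in for a data column sidecar.
type sidecar struct {
	ColumnIndex uint64
}

// broadcastAll reuses a root computed once by the caller for every column,
// avoiding a per-column HashTreeRoot over the same signed block header
// (up to 128 redundant hashes per block).
func broadcastAll(
	ctx context.Context,
	root [32]byte,
	sidecars []*sidecar,
	broadcastColumn func(context.Context, [32]byte, *sidecar) error,
) error {
	for _, sc := range sidecars {
		if err := broadcastColumn(ctx, root, sc); err != nil {
			return err
		}
	}
	return nil
}
```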

dataColumnsWithholdCount := features.Get().DataColumnsWithholdCount

// Get the time corresponding to the start of the slot.
genesisTime := uint64(vs.TimeFetcher.GenesisTime().Unix())
slotStartTime, err := slots.ToTime(genesisTime, slot)
Review comment (Member):

As mentioned earlier, I think this can be done in the broadcast method itself; we do not need to provide it here.
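
A hedged sketch of that suggestion follows; the helper is illustrative, the import paths assume Prysm v5's module layout, and only slots.ToTime and primitives.Slot appear in the snippet above. The idea is that the broadcast method reads the slot from the sidecar's signed block header and derives the slot start time itself:

```go
package columns

import (
	"time"

	"github.com/prysmaticlabs/prysm/v5/consensus-types/primitives"
	"github.com/prysmaticlabs/prysm/v5/time/slots"
)

// slotStartTime derives the start time of a slot from the genesis time.
// In the broadcast method, the slot itself could be read from the sidecar's
// signed block header instead of being passed in by the caller.
func slotStartTime(genesis time.Time, slot primitives.Slot) (time.Time, error) {
	genesisTime := uint64(genesis.Unix())
	return slots.ToTime(genesisTime, slot)
}
```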

nalepae (Contributor, PR author) replied:

Addressed in the other comment.

nalepae (Contributor, PR author) commented on Sep 23, 2024:

> Maybe I am missing context: why are we using the data columns cache here in sync? This seems to duplicate existing functionality.

The context is explained here.

nalepae merged commit 5f896aa into peerDAS on Sep 23, 2024
14 of 15 checks passed
nalepae deleted the peerdas-misc branch on September 23, 2024 at 09:09
nalepae added a commit that referenced this pull request on Oct 7, 2024
* `scheduleReconstructedDataColumnsBroadcast`: Really minor refactor.

* `receivedDataColumnsFromRootLock` -> `dataColumnsFromRootLock`

* `reconstructDataColumns`: Stop looking into the DB to know if we have some columns.

Before this commit:
Each time we receive a column, we look into the filesystem for all the columns we already store.
==> For 128 columns, that amounts to 1 + 2 + 3 + ... + 128 = 128(128+1)/2 = 8,256 file lookups.

Also, as soon as a column is saved to the filesystem, we assume that looking at the filesystem again right afterwards will show the column as available (strict consistency). This is not always the case.

==> Because of this lack of strict filesystem consistency, we sometimes reconstruct and reseed columns more than once.

After this commit:
We use a (strictly consistent) cache to determine whether we have received a column.
==> No more consistency issues, and less stress on the filesystem.

* `dataColumnSidecarByRootRPCHandler`: Improve logging.

Before this commit, logged values assumed that all requested columns correspond to the same block root, which is not always the case.

After this commit, we know which columns are requested for which root.

* Add a log when broadcasting a data column.

This is useful to debug "lost data columns" in devnet.

* Address Nishant's comment
nalepae added eight further commits referencing this pull request (three more on Oct 7, one on Oct 8, three on Oct 23, and one on Oct 24, 2024), each carrying the same commit message as above.