Save parsed events into database; Design Event schema #2

Closed
QianSwirlds opened this issue Jul 30, 2019 · 18 comments
Labels
enhancement Type: New feature P3 parser Area: File parsing

Comments
@QianSwirlds
Contributor

QianSwirlds commented Jul 30, 2019

t_events Table

| Type | Column Name | Description | Not Null |
| --- | --- | --- | --- |
| bigint | id | Primary key, auto-increment | NOT NULL |
| bigint | consensus_order | Order in history (0 first) | NOT NULL |
| bigint | creator_node_id | nodeID of this event's creator | NOT NULL |
| bigint | creator_seq | Sequence number for this event by its creator (0 is first) | NOT NULL |
| bigint | other_node_id | ID of the otherParent's creator | |
| bigint | other_seq | Sequence number for the otherParent event (by its creator) | |
| bytea | signature | Creator's signature for this event | NOT NULL |
| bytea | hash | Hash of this event | NOT NULL |
| bigint | self_parent_id | ID of the self parent | |
| bigint | other_parent_id | ID of the other parent | |
| bytea | self_parent_hash | Hash of the self parent | |
| bytea | other_parent_hash | Hash of the other parent | |
| bigint | self_parent_generation | Generation of the self parent | |
| bigint | other_parent_generation | Generation of the other parent | |
| bigint | generation | Generation (1 plus the max of the parents' generations) | NOT NULL |
| bigint | created_timestamp_ns | seconds * (10 ^ 9) + nanos of the creation time, as claimed by its creator | NOT NULL |
| bigint | consensus_timestamp_ns | seconds * (10 ^ 9) + nanos of the community's consensus timestamp for this event | NOT NULL |
| bigint | latency_ns | consensus_timestamp_ns - created_timestamp_ns | NOT NULL |
| integer | txs_bytes_count | Number of bytes of transactions in this event | NOT NULL |
| integer | platform_tx_count | Number of platform transactions in this event | NOT NULL |
| integer | app_tx_count | Number of application transactions in this event | NOT NULL |
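The table above, written out as a sketch of PostgreSQL DDL (column names and types are taken from the table; `BIGSERIAL` for the auto-increment id and the exact constraint placement are assumptions):

```sql
-- Sketch of the t_events schema above (BIGSERIAL and constraint
-- details are assumptions, not the final migration).
CREATE TABLE t_events (
    id                       BIGSERIAL PRIMARY KEY,
    consensus_order          BIGINT  NOT NULL,
    creator_node_id          BIGINT  NOT NULL,
    creator_seq              BIGINT  NOT NULL,
    other_node_id            BIGINT,
    other_seq                BIGINT,
    signature                BYTEA   NOT NULL,
    hash                     BYTEA   NOT NULL,
    self_parent_id           BIGINT,
    other_parent_id          BIGINT,
    self_parent_hash         BYTEA,
    other_parent_hash        BYTEA,
    self_parent_generation   BIGINT,
    other_parent_generation  BIGINT,
    generation               BIGINT  NOT NULL,
    created_timestamp_ns     BIGINT  NOT NULL,
    consensus_timestamp_ns   BIGINT  NOT NULL,
    latency_ns               BIGINT  NOT NULL,
    txs_bytes_count          INTEGER NOT NULL,
    platform_tx_count        INTEGER NOT NULL,
    app_tx_count             INTEGER NOT NULL
);
```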
@QianSwirlds
Contributor Author

Hi @gregscullard, I am working on creating the event and transaction tables, and inserting data into them while parsing eventStream files. I will send you a pull request once I finish. Thanks.

@gregscullard
Contributor

Can you call the tables t_events and t_event_transactions, please?

@QianSwirlds
Contributor Author

@gregscullard Sure. Will do. Thanks

QianSwirlds added a commit that referenced this issue Jul 31, 2019
QianSwirlds added a commit that referenced this issue Jul 31, 2019
QianSwirlds added a commit that referenced this issue Jul 31, 2019
…eam, accountBalance can use the same connection object; #2
@gregscullard
Contributor

@QianSwirlds,
- Overall design: shouldn't additional NOT NULL constraints be added on columns we expect to always contain data? This enforces consistency of the data, helps avoid bugs, etc.

- Overall design: should there be self-referencing foreign keys between "otherId" and the event it refers to (and any other such relationship)? Although this would require the NOT NULL constraint (above) to be removed so that the first event can be inserted. For example:

create table test (
  id serial primary key,
  parent integer, -- nullable, so the first row can be inserted
  foreign key (parent) references test(id)
);

- Is "creatorId" the ID of the parent event? If not, is the ID of the parent event missing?

- timeCreated and consensusTimestamp should not be specified WITH TIME ZONE.

- Timestamps in Postgres have a resolution of microseconds; if the timestamp includes nanoseconds, it may be wise to split timeCreated into timeCreatedSecs and timeCreatedNanos. Likewise for consensusTimestamp.
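A sketch of the split-column variant suggested here (column names are assumptions; later in the thread the pair is merged back into a single bigint nanosecond column):

```sql
-- Split-timestamp columns (names assumed), one pair per timestamp,
-- so nanosecond precision survives Postgres's microsecond timestamps:
created_timestamp_secs    BIGINT  NOT NULL,  -- whole seconds since epoch
created_timestamp_nanos   INTEGER NOT NULL,  -- 0..999999999
consensus_timestamp_secs  BIGINT  NOT NULL,
consensus_timestamp_nanos INTEGER NOT NULL,
```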

@QianSwirlds
Contributor Author

@gregscullard
I just added NOT NULL to the columns.

otherId is the ID of this event's other parent's creator, not the ID of an event. Currently, the IDs of the parent events are not serialized in the EventStream files, because we call the Event.writeEvent() method in the platform SDK to serialize events, and that method doesn't write the IDs of parent events.

I just split timeCreated into timeCreatedSecs and timeCreatedNanos, and did the same for consensusTimestamp. Please let me know if anything needs to be modified. Thanks!

@gregscullard
Contributor

OK, so the IDs are in fact node numbers/IDs (0, 1, 2, 3)? Should we name them accordingly?

@gregscullard
Contributor

Also, if hashes are what join events, shouldn't they be stored in a separate table for efficiency? A table of hashes with an ID, where the ID is used in the t_events table to refer to the hash in question.

@QianSwirlds
Contributor Author

Yes, the IDs are node IDs. Each node's ID is the sequence of the node in the AddressBook (starting from 0). I think it's OK to save the node ID, because when we load the node information from the file 0.0.102, the information we get is: nodeId, ipAddress, portno, memo (node AccountID), RSA_PubKey. If we want to search for events created by a node AccountID or public key, we can get its nodeID from a map and then search in the database.

message NodeAddress {
    bytes ipAddress = 1; // The ip address of the Node with separator & octets
    int32 portno = 2; // The port number of the grpc server for the node
    bytes memo = 3; // The memo field of the node
    string RSA_PubKey = 4; // The RSA public key of the node.
}
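The lookup described above could also live in the database. A hypothetical table mirroring the NodeAddress message (table and column names here are assumptions, not part of the agreed schema):

```sql
-- Hypothetical lookup table for AddressBook entries; node_id is the
-- entry's position in the AddressBook, starting from 0.
CREATE TABLE t_nodes (
    node_id     BIGINT PRIMARY KEY,
    ip_address  BYTEA,
    portno      INTEGER,
    memo        BYTEA,   -- node AccountID
    rsa_pub_key TEXT
);

-- Find events created by a given node account via its node_id.
SELECT e.*
FROM t_events e
JOIN t_nodes n ON n.node_id = e.creator_node_id
WHERE n.memo = '0.0.3'::bytea;
```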

@gregscullard
Contributor

Each hash is 48 bytes, I think; each hash appears twice (parent + self), so 96 bytes in total. If we stored them in a separate table, we'd have 48 bytes + 8 for the bigint ID, plus 2x 8 bytes for the foreign key IDs in the t_events table (72 bytes in total). Not a huge saving, but it all counts at the rate we're storing data.
It will have an impact on processing, making it slightly slower.
However, when we query (event -> other -> other -> other), matching bigints will be more efficient than matching whole byte arrays (if that's even possible).
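A sketch of the normalization being proposed (table and column names here are assumptions): store each 48-byte hash once, and reference it by a bigint ID from t_events instead of repeating the bytea.

```sql
-- Hash lookup table: each distinct hash stored once.
CREATE TABLE t_event_hashes (
    id    BIGSERIAL PRIMARY KEY,
    hash  BYTEA NOT NULL UNIQUE
);

-- t_events would then carry bigint foreign keys (e.g.
-- self_parent_hash_id, other_parent_hash_id), and parent
-- traversal joins on bigints rather than byte arrays:
SELECT p.*
FROM t_events e
JOIN t_events p ON p.id = e.other_parent_id;
```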

@gregscullard
Contributor

Can we call those nodeIds "creatorNodeId" and "otherNodeId" then? This will make more sense to someone who's not familiar with the hashgraph algorithm.

@QianSwirlds
Contributor Author

> Can we call those nodeIds "creatorNodeId" and "otherNodeId" then? This will make more sense to someone who's not familiar with the hashgraph algorithm.

Sure, the names were the same as the field names of Event. I will modify them as you suggest.

@QianSwirlds
Contributor Author

QianSwirlds commented Aug 1, 2019

> Each hash is 48 bytes I think, each hash appears twice (parent + self) so 96 bytes in total. If we stored in a separate table, we'd have 48 bytes + 8 for the bigint ID. And 2x 8 bytes for the foreign key ID in the t_events table (72 bytes). Not a huge saving, but it all counts at the speed we're storing data.
> It will have an impact on processing, making it slightly slower.
> However, when we query (event->other->other->other), matching bigints will be more efficient than matching whole byte arrays (if that's even possible).

Do you mean we add an EventHash table containing EventConsensusOrder and Hash columns, and when we are storing an event, we query its parent events' hashes in the EventHash table, get its two parents' consensusOrder (which is the identifier), and save those into the Event table as selfParentID and otherParentID instead of selfParentHash and otherParentHash?

But it is possible that two different events have the same hash, isn't it?

@gregscullard
Contributor

I guess a hash should be unique, but there is a small possibility of duplicates (maybe Leemon can comment). If the hashes are the same, we have the same issue regardless: we won't be able to follow the event history (rebuild the graph) in the future from the data, no?

@gregscullard
Contributor

gregscullard commented Aug 1, 2019

Looks good. One last thing: talking to Mike Burrage, he suggested storing seconds + nanos in a single bigint, which is large enough for a few years to come. This saves us 16 bytes per row.
Apologies for asking you to split them from the timestamp earlier, although that was also a good thing to do.

@QianSwirlds
Contributor Author

> Looks good, one last thing, talking to Mike Burrage, he suggested storing seconds+nanos in a single bigInt which is large enough for a few years to come. This saves us 16 bytes per row.
> Apologies for asking you to split them from timestamp earlier, although this was also a good thing to do.

No worries, I agree that splitting them was a good thing to do.
If we store seconds * (10 ^ 9) + nanos in a single bigint, the maximum bigint value represents the timestamp 2262-04-11T23:47:16.854775807Z. I think that is sufficient. I have modified the schema. Thanks.
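That upper bound can be sanity-checked directly in Postgres. Note that `to_timestamp` takes seconds and Postgres timestamps have only microsecond resolution, so the fractional part is rounded relative to the exact ...854775807 nanoseconds:

```sql
-- Largest signed 64-bit value, interpreted as nanoseconds since the
-- epoch; divide by 1e9 to get (fractional) seconds.
SELECT to_timestamp(9223372036854775807 / 1e9);
-- ≈ 2262-04-11 23:47:16.854776 UTC (display depends on session timezone)
```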

@gregscullard
Contributor

I'm ok with the schema. Suggest we close this issue.

QianSwirlds added a commit that referenced this issue Aug 2, 2019
QianSwirlds added a commit that referenced this issue Aug 6, 2019
@steven-sheehy steven-sheehy added this to the 0.1.0 milestone Aug 27, 2019
steven-sheehy added a commit that referenced this issue Sep 4, 2019
Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
steven-sheehy added a commit that referenced this issue Nov 12, 2019
Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
steven-sheehy added a commit that referenced this issue Nov 26, 2019
* Initial HCS design document

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Minor tweaks

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Review feedback

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Meeting feedback

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Small tweaks

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Review feedback #2

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Use reactive-grpc & split consensus service

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Change realm column to realm_num

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
@steven-sheehy steven-sheehy added enhancement Type: New feature P3 parser Area: File parsing labels Nov 26, 2019
steven-sheehy added a commit that referenced this issue Dec 30, 2019
* Initial HCS design document

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Minor tweaks

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Review feedback

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Meeting feedback

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Small tweaks

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Review feedback #2

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Use reactive-grpc & split consensus service

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>

* Change realm column to realm_num

Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
apeksharma added a commit that referenced this issue Mar 1, 2020
- Move topic messages, contract result, file data, and live hashes from
  RecordFileLogger to PostgresWritingRecordParsedItemHandler
- Logic for Transaction and batching will be moved together in followup

Signed-off-by: Apekshit Sharma <apekshit.sharma@hedera.com>
apeksharma added a commit that referenced this issue Mar 2, 2020
- Move topic messages, contract result, file data, and live hashes from
  RecordFileLogger to PostgresWritingRecordParsedItemHandler
- Logic for Transaction and batching will be moved together in followup

Signed-off-by: Apekshit Sharma <apekshit.sharma@hedera.com>
apeksharma added a commit that referenced this issue Mar 2, 2020
- Move topic messages, contract result, file data, and live hashes from
  RecordFileLogger to PostgresWritingRecordParsedItemHandler
- Logic for Transaction and batching will be moved together in followup

Partially fixes #566 

Signed-off-by: Apekshit Sharma <apekshit.sharma@hedera.com>
steven-sheehy added a commit that referenced this issue Apr 1, 2020
Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
steven-sheehy added a commit that referenced this issue Apr 1, 2020
Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
steven-sheehy added a commit that referenced this issue Sep 2, 2021
Signed-off-by: Steven Sheehy <steven.sheehy@hedera.com>
5 participants