SnapshotEngine: New metadata format and schema validation #530

Merged
41 commits
35f8c0a
change tezos_version information in metadata
orcutt989 Jan 9, 2023
4272f3b
new metadata format version compare
orcutt989 Jan 24, 2023
f5977b5
schema validation
orcutt989 Feb 1, 2023
196e4ee
add pip package and better log for json schema validate
orcutt989 Feb 2, 2023
6b3e151
black formatting
orcutt989 Feb 2, 2023
4c3bef4
typos and better function error handling
orcutt989 Feb 2, 2023
3dddabd
check-jsonschema needs python headers for ruamel
orcutt989 Feb 2, 2023
25115f4
fixes:
orcutt989 Feb 3, 2023
5432d7f
rm --in memory it was retired
orcutt989 Feb 3, 2023
7f8a9bb
rm network version
orcutt989 Feb 3, 2023
2d3bf14
strings to ints
orcutt989 Feb 3, 2023
e099901
skip old metadata style
orcutt989 Feb 3, 2023
bf2cb01
typos
orcutt989 Feb 3, 2023
98f8a5b
clean up metadata
orcutt989 Feb 3, 2023
b714bf0
typo
orcutt989 Feb 3, 2023
2654371
move snapshot header cat
orcutt989 Feb 3, 2023
bc02fdb
conditional --json for v16 and v15
orcutt989 Feb 3, 2023
2d9a1f5
nicolas version correction
orcutt989 Feb 3, 2023
4195954
fix sh var quoting
orcutt989 Feb 3, 2023
15b9f99
Get TEZOS_VERSION before you check it
orcutt989 Feb 3, 2023
667748d
Fix version assignment
orcutt989 Feb 4, 2023
4752632
fix assignments
orcutt989 Feb 4, 2023
8b28766
new version string
orcutt989 Feb 4, 2023
3629424
more graceful handling of pre-schema artifacts
nicolasochem Feb 4, 2023
43b920b
fix UI bug - display more recent artifact instead of oldest
nicolasochem Feb 10, 2023
c640023
Rm erroneous context_elements using commit_hash
orcutt989 Feb 13, 2023
a0b7d43
Move rpc info out of each type since its universal
orcutt989 Feb 14, 2023
fc47de8
Overwrite release object in metadata
orcutt989 Feb 14, 2023
6a4403a
Don't upload empty metadata files
orcutt989 Feb 14, 2023
93056af
Rm context elements and add --json to snapshot info
orcutt989 Feb 15, 2023
7c012bb
Validate tezos-snapshots.json isntead of individual files
orcutt989 Feb 16, 2023
6460b3b
add default schema url
orcutt989 Feb 16, 2023
dae3e89
validation deps and failure if no validation
orcutt989 Feb 16, 2023
3fe128a
replace version with $schema url
orcutt989 Feb 16, 2023
2a63377
cleanup
orcutt989 Feb 16, 2023
2c4aab0
Fix imports
orcutt989 Feb 16, 2023
832b2d6
Need another level for new metadata format
orcutt989 Feb 16, 2023
2193532
update default value per nicolas request
orcutt989 Feb 16, 2023
9ba701f
config generator needs to know where snapshots are
orcutt989 Feb 16, 2023
7b7ae5f
Merge branch 'master' into 527-use-new-machine-readable-octez-node-sn…
orcutt989 Feb 17, 2023
8b763a4
https default snapshot engine schemaurl
orcutt989 Feb 17, 2023
1 change: 1 addition & 0 deletions charts/snapshotEngine/templates/configmap.yaml
@@ -12,6 +12,7 @@ data:
ALL_SUBDOMAINS: {{ $.Values.allSubdomains }}
ARCHIVE_SLEEP_DELAY: {{ $.Values.artifactDelay.archive }}
ROLLING_SLEEP_DELAY: {{ $.Values.artifactDelay.rolling }}
SCHEMA_URL: {{ $.Values.schemaUrl }}
kind: ConfigMap
metadata:
name: snapshot-configmap
3 changes: 3 additions & 0 deletions charts/snapshotEngine/values.yaml
@@ -96,3 +96,6 @@ allSubdomains: ""
artifactDelay:
rolling: 0m
archive: 0m

# URL to schema.json file to validate generated metadata against
schemaUrl: "https://oxheadalpha.com/tezos-snapshot-metadata.schema.1.0.json"
4 changes: 2 additions & 2 deletions snapshotEngine/Dockerfile
@@ -43,10 +43,10 @@ RUN apk --no-cache add \
&& apk --no-cache del \
binutils \
&& rm -rf /var/cache/apk/* \
&& apk add --update --no-cache python3 && ln -sf python3 /usr/bin/python \
&& apk add --update --no-cache python3-dev && ln -sf python3 /usr/bin/python \
&& python3 -m ensurepip \
&& pip3 install --no-cache-dir --upgrade pip && \
pip3 install --no-cache-dir setuptools boto3 datefinder datetime pytz
pip3 install --no-cache-dir setuptools boto3 datefinder datetime pytz jsonschema

RUN chown jekyll:jekyll -R /usr/gem

41 changes: 35 additions & 6 deletions snapshotEngine/getAllSnapshotMetadata.py
@@ -1,16 +1,20 @@
from genericpath import exists
import os
import urllib, json
import urllib.request
from jsonschema import validate
from datetime import datetime

schemaURL = os.environ["SCHEMA_URL"]
allSubDomains = os.environ["ALL_SUBDOMAINS"].split(",")
snapshotWebsiteBaseDomain = os.environ["SNAPSHOT_WEBSITE_DOMAIN_NAME"]

filename = "tezos-snapshots.json"

# Write empty top-level array to initialize json
json_object = []
artifact_metadata = []

urllib.request.urlretrieve(schemaURL, "schema.json")

print("Assembling global metadata file for all subdomains:")
print(allSubDomains)

# Get each subdomain's base.json and combine all artifacts into 1 metadata file
@@ -22,11 +26,36 @@
with urllib.request.urlopen(baseJsonUrl) as url:
data = json.loads(url.read().decode())
for entry in data:
json_object.append(entry)
artifact_metadata.append(entry)
except urllib.error.HTTPError:
continue

# Use UTC so that the trailing "Z" in the timestamp is accurate.
now = datetime.utcnow()

# Matches the octez block_timestamp format:
# ISO 8601 in UTC, using the "Z" (Zulu) designator.
dt_string = now.strftime('%Y-%m-%dT%H:%M:%SZ')

# Meta document that includes the list of storage artifacts among some other useful keys.
metadata_document = json.dumps({
"date_generated": dt_string,
"org": "Oxhead Alpha",
"$schema": schemaURL,
"data": artifact_metadata,
}, indent=4)

with open("schema.json","r") as f:
schema = f.read()

# jsonschema.validate() returns None on success and raises ValidationError on failure,
# so no return-value check is needed. Validate the parsed document, not the JSON string.
validate(json.loads(metadata_document), json.loads(schema))
print("Metadata successfully validated against schema!")


# Write to file
with open(filename, "w") as json_file:
json_string = json.dumps(json_object, indent=4)
json_file.write(json_string)
json_file.write(metadata_document)

print(f"Done assembling global metadata file {filename}")
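
Note on the metadata envelope: the following is a minimal sketch, not the merged code, of how the top-level document could be assembled so that the "Z" suffix is always UTC and the dict is kept separate from its serialized form. The function name build_metadata_document, the sample artifact entry, and the example.com schema URL are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone


def build_metadata_document(artifacts, schema_url, now=None):
    """Wrap a list of artifact entries in the top-level metadata envelope."""
    # Normalize to UTC so the trailing "Z" in the timestamp is accurate.
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return {
        "date_generated": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "org": "Oxhead Alpha",
        "$schema": schema_url,
        "data": list(artifacts),
    }


# Hypothetical artifact entry and schema URL, for illustration only.
document = build_metadata_document(
    [{"chain_name": "mainnet", "history_mode": "rolling"}],
    "https://example.com/schema.json",
    now=datetime(2023, 2, 17, 12, 0, 0, tzinfo=timezone.utc),
)

# Keep the dict for schema validation; serialize only when writing the file.
serialized = json.dumps(document, indent=4)
print(serialized)
```

Keeping the dict around matters because jsonschema.validate() expects the parsed instance; passing the output of json.dumps() would validate a plain string instead of the object.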
87 changes: 28 additions & 59 deletions snapshotEngine/getLatestSnapshotMetadata.py
@@ -1,16 +1,12 @@
import datetime
import json
import urllib
import urllib.request
from pathlib import Path
import pprint
from datetime import datetime

import datefinder
import pytz
import random
from genericpath import exists

filename = "tezos-snapshots.json"
import pprint
pp = pprint.PrettyPrinter(indent=4)

filename='tezos-snapshots.json'

if exists(filename):
print("SUCCESS tezos-snapshots.json exists locally!")
@@ -29,7 +25,7 @@

all_snapshots = [{"name": "example", "all_snapshots": {}}]

for snapshot in snapshots:
for snapshot in snapshots['data']:
network = snapshot["chain_name"]
if network not in snapshots_per_network:
snapshots_per_network[network] = []
@@ -39,53 +35,28 @@
network_latest_snapshots = {}
network_snapshots = {}

# Initialize date to now, and then update with build date as we iterate
last_tezos_build_datetime = datetime.now().replace(tzinfo=pytz.UTC)
for type, mode, path in [
("tarball", "rolling", "rolling-tarball"),
("tarball", "archive", "archive-tarball"),
("tezos-snapshot", "rolling", "rolling"),
]:
# Parses date from tezos build of each artifact, compares to last date, updates if older, otherwise its newer
for snapshot in snapshots:
matches = datefinder.find_dates(snapshot["tezos_version"])
tezos_build_datetime = list(matches)[0]
if tezos_build_datetime < last_tezos_build_datetime:
latest_tezos_build_version = [
src
for time, src in datefinder.find_dates(
snapshot["tezos_version"], source=True
)
][1]
last_tezos_build_datetime = tezos_build_datetime

# Snapshots of type (tarball/snapshot) and history mode
typed_snapshots = [
s
for s in snapshots
if s["artifact_type"] == type and s["history_mode"] == mode
]
typed_snapshots.sort(key=lambda x: int(x["block_height"]), reverse=True)

try:
# Keep list of all snapshots
network_snapshots[path] = typed_snapshots
except IndexError:
# Find a lowest version available for a given network, artifact_type, and history_mode
for (artifact_type, history_mode, path) in [("tarball", "rolling", "rolling-tarball"), ("tarball", "archive", "archive-tarball"), ("tezos-snapshot", "rolling", "rolling")]:
# List of snapshot metadata for this particular artifact type and history mode
typed_snapshots = [s for s in snapshots if s["artifact_type"] == artifact_type and s["history_mode"] == history_mode]

# Lowest version is the top item (int) of a sorted unique list of all the versions for this particular artifact type and history mode
octez_versions = sorted(list(set([ s['tezos_version']['version']['major'] for s in typed_snapshots if 'version' in s['tezos_version'] ])))
if octez_versions:
lowest_octez_version = octez_versions[0]
else:
# no metadata yet for this namespace, ignoring
continue

# Latest should only show oldest supported build so let's filter by the oldest supported version we found above
typed_snapshots = [
t
for t in typed_snapshots
if latest_tezos_build_version in t["tezos_version"]
]
network_snapshots[path] = typed_snapshots

try:
# Latest snapshot of type is the first item in typed_snapshots which we just filtered by the latest supported tezos build
network_latest_snapshots[path] = typed_snapshots[0]
except IndexError:
continue
# Latest offered should only show oldest supported build so let's filter by the oldest supported version we found above
typed_snapshots = [d for d in typed_snapshots if 'version' in d['tezos_version'] and d['tezos_version']['version']['major'] == lowest_octez_version ]

# Latest snapshot of type is the last item in typed_snapshots, which we just filtered by the lowest supported Octez version
network_latest_snapshots[path] = typed_snapshots[-1]

# This becomes the list of snapshots
latest_snapshots.append(
{
"name": network,
@@ -102,9 +73,7 @@
)

Path("_data").mkdir(parents=True, exist_ok=True)
with open(f"_data/snapshot_jekyll_data.json", "w") as f:
json.dump(
{"latest_snapshots": latest_snapshots, "all_snapshots": all_snapshots},
f,
indent=2,
)
filename = "_data/snapshot_jekyll_data.json"
with open(filename, 'w') as f:
json.dump({"latest_snapshots": latest_snapshots, "all_snapshots": all_snapshots}, f, indent=2)
print(f"Done writing structured list of snapshots for Jekyll to render webpage: {filename}")
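
Note on the selection logic: the sketch below restates the "lowest major version, then latest artifact" rule from this file as a standalone function. It is an illustration under stated assumptions rather than the merged code: the function name latest_for_lowest_version and the sample entries are hypothetical, and "latest" is taken to mean the highest block_height, whereas the diff above picks the last list element and so relies on input order.

```python
def latest_for_lowest_version(snapshots, artifact_type, history_mode):
    """Return the newest artifact built with the lowest Octez major version, or None."""
    typed = [
        s for s in snapshots
        if s["artifact_type"] == artifact_type
        and s["history_mode"] == history_mode
        # Pre-schema artifacts store tezos_version as a plain string; skip them.
        and isinstance(s["tezos_version"], dict)
        and "version" in s["tezos_version"]
    ]
    if not typed:
        # No new-style metadata yet for this combination.
        return None
    lowest_major = min(s["tezos_version"]["version"]["major"] for s in typed)
    candidates = [
        s for s in typed
        if s["tezos_version"]["version"]["major"] == lowest_major
    ]
    # Assumption: "latest" means the highest block height.
    return max(candidates, key=lambda s: int(s["block_height"]))


# Hypothetical sample entries, for illustration only.
snapshots = [
    {"artifact_type": "tarball", "history_mode": "rolling", "block_height": 100,
     "tezos_version": {"version": {"major": 15, "minor": 1}}},
    {"artifact_type": "tarball", "history_mode": "rolling", "block_height": 120,
     "tezos_version": {"version": {"major": 15, "minor": 1}}},
    {"artifact_type": "tarball", "history_mode": "rolling", "block_height": 130,
     "tezos_version": {"version": {"major": 16, "minor": 0}}},
    {"artifact_type": "tarball", "history_mode": "rolling", "block_height": 90,
     "tezos_version": "763259c5 (2022-12-01 10:20:58 +0000) (15.1)"},
]

latest = latest_for_lowest_version(snapshots, "tarball", "rolling")
print(latest["block_height"])
```

Selecting by block height instead of list position keeps the result stable even if the order of entries in tezos-snapshots.json changes.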
38 changes: 23 additions & 15 deletions snapshotEngine/mainJob.yaml
@@ -34,7 +34,7 @@ spec:
# Set up config for headless RPC using new restored storage
octez-node config init \
--config-file /home/tezos/.tezos-node/config.json \
--network "${NETWORK}" \
--network "${CHAIN_NAME}" \
--data-dir /var/tezos/node/data

# Run headless tezos node to validate storage on restored volume
@@ -117,16 +117,20 @@ spec:
fi

# Get BLOCK_TIMESTAMP from RPC
wget -qO- http://localhost:8732/chains/main/blocks/head/header | sed -E 's/.*"timestamp":"?([^,"]*)"?.*/\1/' > /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_TIMESTAMP
wget -qO- http://localhost:8732/chains/main/blocks/head/header | sed -E 's/.*"timestamp":"?([^,"]*)"?.*/\1/' > /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_TIMESTAMP

# Get Tezos Version from octez-node command
# Old version string
/usr/local/bin/octez-node --version > /"${HISTORY_MODE}"-snapshot-cache-volume/TEZOS_VERSION

# Get new version object from RPC
wget -qO- http://localhost:8732/version > /"${HISTORY_MODE}"-snapshot-cache-volume/TEZOS_RPC_VERSION_INFO

# Print variables for debug
printf "%s BLOCK_HASH is...$(cat /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_HASH)\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
printf "%s BLOCK_HEIGHT is...$(cat /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_HEIGHT)\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
printf "%s BLOCK_TIMESTAMP is...$(cat /"${HISTORY_MODE}"-snapshot-cache-volume/BLOCK_TIMESTAMP)\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
printf "%s TEZOS_VERSION is...$(cat /"${HISTORY_MODE}"-snapshot-cache-volume/TEZOS_VERSION)\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
printf "%s TEZOS_RPC_VERSION_INFO is...$(cat /"${HISTORY_MODE}"-snapshot-cache-volume/TEZOS_RPC_VERSION_INFO)\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"

# Blow open permissions for next job to write to volume
sudo chmod -R 755 /"${HISTORY_MODE}"-snapshot-cache-volume
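
Note on the timestamp extraction: the sed expression above pulls the "timestamp" field out of the block header with a regular expression. As a hedged alternative sketch, the same field can be read with a JSON parser, which does not depend on key order or spacing. The function name extract_timestamp and the sample header are assumptions made for illustration; a real header contains more fields.

```python
import json


def extract_timestamp(header_json):
    """Return the "timestamp" field from a block header JSON string."""
    header = json.loads(header_json)
    return header["timestamp"]


# Hypothetical, heavily trimmed block header for illustration only.
sample_header = '{"level": 3000000, "timestamp": "2023-02-03T10:20:58Z", "proto": 15}'

print(extract_timestamp(sample_header))
```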
@@ -142,6 +146,8 @@ spec:
envFrom:
- configMapRef:
name: snapshot-configmap
- configMapRef:
name: tezos-config
containers:
- name: create-tezos-rolling-snapshot
image: ""
@@ -164,7 +170,7 @@ spec:

octez-node config init \
--config-file /home/tezos/.tezos-node/config.json \
--network "${NETWORK}" \
--network "${CHAIN_NAME}" \
--data-dir /var/tezos/node/data

if [ "${HISTORY_MODE}" = rolling ]; then
@@ -180,11 +186,18 @@ spec:

octez-node snapshot import \
/"${HISTORY_MODE}"-snapshot-cache-volume/"${ROLLING_SNAPSHOT_NAME}".rolling \
--in-memory \
--block "${BLOCK_HASH}" \
--config-file /home/tezos/.tezos-node/config.json \
--data-dir /rolling-tarball-restore/var/tezos/node/data

# Version check for v15 vs v16 conditional
TEZOS_VERSION=$(cat /"${HISTORY_MODE}"-snapshot-cache-volume/TEZOS_VERSION)

# Get the snapshot header (including the octez snapshot version) as JSON
if [[ "$TEZOS_VERSION" != '763259c5 (2022-12-01 10:20:58 +0000) (15.1)' ]]; then
/usr/local/bin/octez-node snapshot info --json /"${HISTORY_MODE}"-snapshot-cache-volume/"${ROLLING_SNAPSHOT_NAME}".rolling > /"${HISTORY_MODE}"-snapshot-cache-volume/SNAPSHOT_HEADER
fi

rm /rolling-tarball-restore/snapshot-import-in-progress
else
printf "%s Skipping rolling snapshot import since this job is for an archive node.\n" "$(date "+%Y-%m-%d %H:%M:%S" "$@")"
@@ -199,16 +212,11 @@ spec:
env:
- name: HISTORY_MODE
value: ""
- name: NAMESPACE
valueFrom:
configMapKeyRef:
name: snapshot-configmap
key: NAMESPACE
- name: NETWORK_OVERRIDE
valueFrom:
configMapKeyRef:
name: snapshot-configmap
key: NETWORK_OVERRIDE
envFrom:
- configMapRef:
name: snapshot-configmap
- configMapRef:
name: tezos-config
- name: zip-and-upload
image: ""
imagePullPolicy: Always
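
Note on the version check: the job above compares TEZOS_VERSION against one exact v15.1 build string, so any other v15 build would take the --json path. The sketch below shows one way to compare on the major version instead. It is an illustration, not the merged code: the function names are hypothetical, the v16 sample string is made up, and the assumption that --json is available from v16 onward is drawn only from the commit title "conditional --json for v16 and v15".

```python
import re

# Matches strings shaped like "763259c5 (2022-12-01 10:20:58 +0000) (15.1)".
VERSION_RE = re.compile(
    r"^(?P<commit>[0-9a-f]+) \((?P<date>[^)]*)\) \((?P<major>\d+)\.(?P<minor>\d+)"
)


def parse_major_minor(version_string):
    """Extract (major, minor) from the output of `octez-node --version`."""
    match = VERSION_RE.match(version_string.strip())
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    return int(match["major"]), int(match["minor"])


def supports_json_snapshot_info(version_string):
    # Assumption: `snapshot info --json` is available from v16 onward.
    major, _minor = parse_major_minor(version_string)
    return major >= 16


v15 = "763259c5 (2022-12-01 10:20:58 +0000) (15.1)"
# Hypothetical v16 build string, for illustration only.
v16 = "abcdef12 (2023-02-01 00:00:00 +0000) (16.0)"

print(supports_json_snapshot_info(v15), supports_json_snapshot_info(v16))
```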