Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

deleting and recreating ledger with different schema errors, still holds old data #98

Open
filipesilva opened this issue Oct 11, 2021 · 4 comments

Comments

@filipesilva
Copy link

Hi there 👋

I think there might be a problem when recreating a deleted a ledger with a different schema. I've uploaded a repro you can run by following these steps:

  • git clone https://github.com/filipesilva/fluree-ledger-create-delete-create
  • cd fluree-ledger-create-delete-create
  • docker compose up fluree
  • start a clojure repl and run the commands in the repro comment form

The repro ns is as follows:

(ns repro
  (:require [fluree.db.api :as fdb]))

(comment

  (def ledger "events/log")
  (def conn (fdb/connect "http://localhost:8090"))

  (when-not (empty? @(fdb/ledger-info conn ledger))
    @(fdb/delete-ledger conn ledger))

  ;; Create ledger
  @(fdb/new-ledger conn ledger)
  (fdb/wait-for-ledger-ready conn ledger)

  ;; Transact schema
  @(fdb/transact conn ledger [{:_id              :_collection
                               :_collection/name :event
                               :_collection/doc  "Athens semantic events."}
                              {:_id               :_predicate
                               :_predicate/name   :event/id
                               :_predicate/doc    "A globally unique event id."
                               :_predicate/unique true
                               :_predicate/type   :string}
                              {:_id             :_predicate
                               :_predicate/name :event/data
                               :_predicate/doc  "Event data serialized as an EDN string."
                               :_predicate/type :string}])

  ;; Add some sample events.
  @(fdb/transact conn ledger [{:_id        :event
                               :event/id   "uuid-1"
                               :event/data "[1 2 3]"}
                              {:_id        :event
                               :event/id   "uuid-2"
                               :event/data "[4 5 6]"}
                              {:_id        :event
                               :event/id   "uuid-3"
                               :event/data "[7 8 9]"}])

  ;; Delete the ledger and all data in it.
  @(fdb/delete-ledger conn ledger)

  ;; Recreate the ledger with a different schema.
  @(fdb/new-ledger conn ledger)
  (fdb/wait-for-ledger-ready conn ledger)
  @(fdb/transact conn ledger [{:_id              :_collection
                               :_collection/name :event
                               :_collection/doc  "Athens semantic events."}
                              {:_id               :_predicate
                               :_predicate/name   :event/id
                               :_predicate/doc    "A globally unique event id."
                               :_predicate/unique true
                               :_predicate/type   :string}
                              {:_id             :_predicate
                               :_predicate/name :event/dataaa ;; changed name
                               :_predicate/doc  "Event data serialized as an EDN string."
                               :_predicate/type :string}])
  ;; fluree on docker errors here, with log
  ;; f.db.ledger.transact - Unexpected consensus error proposing new block - clojure.lang.ExceptionInfo:  --------------- BLOCK REJECTED! Blocks out of order
  ;; sometimes I see a different log
  ;; f.d.l.txgroup.monitor - Error processing new block. Exiting tx monitor loop. - java.lang.NullPointerException: null\n	at org.apache.avro.io.DecoderFactory.binaryDecoder(DecoderFactory.java:238)\n

  ;; Query still shows old data from the deleted ledger there.
  @(fdb/query (fdb/db conn ledger) {:select [:*],
                                    :from "event"})

  (fdb/close conn))

The most significant parts are ;; fluree on docker errors here, with log and ;; Query still shows old data from the deleted ledger there..

@ldw1007
Copy link
Collaborator

ldw1007 commented Oct 11, 2021

For tracking purposes:

@filipesilva
Copy link
Author

filipesilva commented Dec 1, 2021

Saw this again today with the same repro steps with ledger version 1 and db version 1.0.0-rc33, except that it also happens when the schema does not change. I see this error on the ledger container logs when transacting the schema onto the recreated ledger:
fluree_1 | 2021-12-01 13:58:43,889 ERROR f.d.l.txgroup.monitor - Error processing new block. Ignoring last block and transaction(s):clojure.lang.LazySeq@16c5bae5 - java.lang.NullPointerException: null\n at org.apache.avro.io.DecoderFactory.binaryDecoder(DecoderFactory.java:238)\n at abracad.avro$binary_decoder.invokeStatic(avro.clj:170)\n at abracad.avro$binary_decoder.invoke(avro.clj:160)\n at abracad.avro$decode.invokeStatic(avro.clj:192)\n at abracad.avro$decode.invoke(avro.clj:186)\n at fluree.db.serde.avro.Serializer._deserialize_branch(avro.clj:296)\n at fluree.db.serde.protocol$fn__32159$G__32138__32162.invoke(protocol.cljc:6)\n at fluree.db.serde.protocol$fn__32159$G__32137__32166.invoke(protocol.cljc:6)\n at fluree.db.storage.core$read_branch$fn__33917$state_machine__5237__auto____33922$fn__33925.invoke(core.cljc:349)\n at fluree.db.storage.core$read_branch$fn__33917$state_machine__5237__auto____33922.invoke(core.cljc:349)\n at clojure.core.async.impl.ioc_macros$run_state_machine.invokeStatic(ioc_macros.clj:978)\n at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:977)\n at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invokeStatic(ioc_macros.clj:982)\n at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:980)\n at clojure.core.async.impl.ioc_macros$take_BANG_$fn__5255.invoke(ioc_macros.clj:991)\n at clojure.core.async.impl.channels.ManyToManyChannel$fn__797.invoke(channels.clj:265)\n at clojure.lang.AFn.run(AFn.java:22)\n at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n at clojure.core.async.impl.concurrent$counted_thread_factory$reify__635$fn__636.invoke(concurrent.clj:29)\n at clojure.lang.AFn.run(AFn.java:22)\n at java.base/java.lang.Thread.run(Thread.java:829)\n

We are trying to delete a ledger and recreate it from data we have, but it looks like ledgers can't be recreated via the programmatic API right now. Our workaround is to shut down the fluree container and delete the volume data for it.

@ldw1007
Copy link
Collaborator

ldw1007 commented Dec 1, 2021

Hi @filipesilva

I'll update our internal tracking with this information.

If you shut down the Fluree container and then restart it, without physical deletion of the volume data, does Fluree recognize that the ledger is deleted & allow you to re-create the ledger?

@filipesilva
Copy link
Author

filipesilva commented Dec 1, 2021

@ldw1007 if I restart the ledger container right after @(fdb/delete-ledger conn ledger), it seems to delete properly and I can recreate the ledger without issue.

If I restart the ledger after the delete, recreate, and schema transact (so after seeing an error), the ledger fails to start with the following message:

fluree_1  | 2021-12-01 14:33:50,163 INFO  fluree.db.server - JVM arguments:  {:jvm "OpenJDK 64-Bit Server VM", :input ["-Xmx2g" "-Xms1g" "-XX:+UseG1GC" "-XX:MaxGCPauseMillis=50" "-Dfdb-storage-file-root=/var/lib/fluree/" "-Dfdb-group-log-directory=/var/lib/fluree/group/" "-Dfdb.properties.file=./fluree_sample.properties" "-Dfdb.log.ansi" "-Dlogback.configurationFile=./logback.xml"]} -
fluree_1  | 2021-12-01 14:33:50,166 INFO  fluree.db.server - Memory Info:  {:used 0.2 GB, :committed 1.2 GB, :max 2.0 GB, :init 1.0 GB, :time 2021-12-01T14:33:50.164606Z} -
fluree_1  | 2021-12-01 14:33:50,247 INFO  f.d.l.indexing.full-text - Starting Full Text Indexer -
fluree_1  | 2021-12-01 14:33:50,249 INFO  fluree.db.peer.http-api - Starting web server on port: 8090 with an open API. -
fluree_1  | 2021-12-01 14:33:50,249 INFO  fluree.db.peer.http-api -  -
fluree_1  | 2021-12-01 14:33:50,249 INFO  fluree.db.peer.http-api - http://localhost:8090 -
fluree_1  | 2021-12-01 14:33:50,249 INFO  fluree.db.peer.http-api -  -
fluree_1  | 2021-12-01 14:33:50,278 INFO  f.d.l.consensus.dbsync2 - Block 2 missing for ledger: events/log. Attempting to retrieve from other ledger server -
fluree_1  | 2021-12-01 14:33:53,318 ERROR f.d.l.consensus.dbsync2 - Fatal error synchronizing ledger files. - clojure.lang.ExceptionInfo: Unable to retrieve file: events_log_block_000000000000002 after attempting servers: []\n	at fluree.db.ledger.util$throw_if_exception.invokeStatic(util.clj:15)\n	at fluree.db.ledger.util$throw_if_exception.invoke(util.clj:9)\n	at fluree.db.ledger.consensus.dbsync2$consistency_full_check$fn__68227$state_machine__5237__auto____68248$fn__68251.invoke(dbsync2.clj:300)\n	at fluree.db.ledger.consensus.dbsync2$consistency_full_check$fn__68227$state_machine__5237__auto____68248.invoke(dbsync2.clj:300)\n	at clojure.core.async.impl.ioc_macros$run_state_machine.invokeStatic(ioc_macros.clj:978)\n	at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:977)\n	at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invokeStatic(ioc_macros.clj:982)\n	at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:980)\n	at clojure.core.async.impl.ioc_macros$take_BANG_$fn__5255.invoke(ioc_macros.clj:991)\n	at clojure.core.async.impl.channels.ManyToManyChannel$fn__748.invoke(channels.clj:135)\n	at clojure.lang.AFn.run(AFn.java:22)\n	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n	at clojure.core.async.impl.concurrent$counted_thread_factory$reify__635$fn__636.invoke(concurrent.clj:29)\n	at clojure.lang.AFn.run(AFn.java:22)\n	at java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: clojure.lang.ExceptionInfo: Unable to retrieve file: events_log_block_000000000000002 after attempting servers: []\n	at fluree.db.ledger.consensus.dbsync2$remote_copy_fn_STAR_$fn__66969$raise__66973.invoke(dbsync2.clj:80)\n	at fluree.db.ledger.consensus.dbsync2$remote_copy_fn_STAR_$fn__66969$fn__67160$state_machine__5237__auto____67199$fn__67201.invoke(dbsync2.clj:111)\n	at fluree.db.ledger.consensus.dbsync2$remote_copy_fn_STAR_$fn__66969$fn__67160$state_machine__5237__auto____67199.invoke(dbsync2.clj:94)\n	at clojure.core.async.impl.ioc_macros$run_state_machine.invokeStatic(ioc_macros.clj:978)\n	at clojure.core.async.impl.ioc_macros$run_state_machine.invoke(ioc_macros.clj:977)\n	at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invokeStatic(ioc_macros.clj:982)\n	at clojure.core.async.impl.ioc_macros$run_state_machine_wrapped.invoke(ioc_macros.clj:980)\n	at clojure.core.async$ioc_alts_BANG_$fn__5466.invoke(async.clj:421)\n	at clojure.core.async$do_alts$fn__5405$fn__5408.invoke(async.clj:288)\n	at clojure.core.async.impl.channels.ManyToManyChannel$fn__797.invoke(channels.clj:265)\n	... 6 common frames omitted\n
fluree_1  | 2021-12-01 14:33:53,320 INFO  fluree.raft - Shutting down raft -
fluree_1  | 2021-12-01 14:33:53,323 INFO  f.d.l.consensus.raft - Ledger group snapshot write triggered for index: 200 -
fluree_1  | 2021-12-01 14:33:53,335 INFO  f.d.l.consensus.raft - Ledger group snapshot completed for index 200 in 12 milliseconds. -
fluree_1  | 2021-12-01 14:33:53,336 INFO  fluree.raft - Wrote final snapshot at for index 200 -
fluree_1  | 2021-12-01 14:33:53,341 INFO  fluree.db.util.log - :conn-closed -
fluree_1  | 2021-12-01 14:33:53,342 INFO  fluree.db.server - SHUTDOWN Start -
fluree_1  | 2021-12-01 14:33:53,346 INFO  fluree.raft - Shutting down raft -

So it seems like shutting down the ledger container will effect the delete, but recreating without a shutdown will leave the ledger container with data that breaks it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants