As a part of the production release, RMap needs to be able to (re)populate its Solr index from the triplestore. The need to re-index could arise for any number of reasons, but two common scenarios are:
recovery from a corrupt index
change in index schema, which would require updating existing documents in the index
The re-indexer could be implemented as a Spring Boot application.
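A minimal skeleton, just to illustrate the shape of such an application (the class name is made up, and nothing below is existing RMap code):

```java
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.WebApplicationType;
import org.springframework.boot.autoconfigure.SpringBootApplication;

// Hypothetical class; a one-shot command-line re-indexer rather than a web service.
@SpringBootApplication
public class ReindexApplication implements CommandLineRunner {

    public static void main(String[] args) {
        SpringApplication app = new SpringApplication(ReindexApplication.class);
        app.setWebApplicationType(WebApplicationType.NONE);
        app.run(args);
    }

    @Override
    public void run(String... args) {
        // 1. read events from the triplestore
        // 2. publish them for indexing (Option 1) or index them directly (Option 2)
    }
}
```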
Option 1
At a high level, re-indexing from the triplestore would involve:
1. retrieving the RMap events from the triplestore
2. publishing those events to Kafka
3. letting the indexer consume those events normally
Note that step 3 could re-use the existing code path currently used by the indexer. It is steps 1 and 2 that differ from the normal code path (normally ORMapEventMgr produces events in response to user actions; in a re-indexing scenario, separate logic would be used to produce events). A sketch of steps 1 and 2 follows.
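A rough sketch of steps 1 and 2, assuming a hypothetical EventSource abstraction over the triplestore and Spring Kafka's KafkaTemplate; the topic, types, and page size are illustrative:

```java
import java.util.List;
import org.springframework.kafka.core.KafkaTemplate;

public class TriplestoreReindexer {

    /** Hypothetical abstraction over the triplestore; not an existing RMap interface. */
    public interface EventSource {
        /** Returns up to {@code limit} serialized events, ordered by event IRI. */
        List<SerializedEvent> events(int offset, int limit);
    }

    /** Simple holder for an event IRI and its serialized (e.g. RDF) form. */
    public static class SerializedEvent {
        public final String iri;
        public final String body;
        public SerializedEvent(String iri, String body) { this.iri = iri; this.body = body; }
    }

    private final EventSource eventSource;
    private final KafkaTemplate<String, String> kafkaTemplate;
    private final String topic;

    public TriplestoreReindexer(EventSource eventSource,
                                KafkaTemplate<String, String> kafkaTemplate, String topic) {
        this.eventSource = eventSource;
        this.kafkaTemplate = kafkaTemplate;
        this.topic = topic;
    }

    public void reindex() {
        final int pageSize = 500;   // keep triplestore result sets small (see Considerations)
        int offset = 0;
        List<SerializedEvent> page;
        while (!(page = eventSource.events(offset, pageSize)).isEmpty()) {
            // Step 2: publish to the indexer's topic; step 3 (consumption) is unchanged
            page.forEach(e -> kafkaTemplate.send(topic, e.iri, e.body));
            offset += page.size();
        }
        kafkaTemplate.flush();      // ensure everything reaches the broker before exiting
    }
}
```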
Option 2
An alternative workflow, sketched below, could be:
1. retrieve the RMap events from the triplestore
2. create Solr documents for the events
3. deposit the documents directly to Solr (bypassing Kafka)
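A minimal sketch of the direct-to-Solr step using SolrJ; the Solr URL and field names are placeholders, and real documents would have to match the schema the existing indexer writes:

```java
import java.io.IOException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class DirectSolrIndexer {

    private final SolrClient solr;

    public DirectSolrIndexer(String solrUrl) {
        this.solr = new HttpSolrClient.Builder(solrUrl).build();
    }

    /** Field names are placeholders; the real document must match the indexer's schema. */
    public void index(String eventIri, String discoIri, String lineageIri)
            throws SolrServerException, IOException {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("event_uri", eventIri);
        doc.addField("disco_uri", discoIri);
        doc.addField("lineage_uri", lineageIri);
        solr.add(doc);
    }

    public void finish() throws SolrServerException, IOException {
        solr.commit();   // single commit at the end rather than per document
        solr.close();
    }
}
```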
Considerations
The triplestore, as @karenhanson has warned, can exhibit performance issues when dealing with large result sets, and response times may be slow; it may even fall over entirely, for all I know. The re-indexer should be prepared to deal with a triplestore that may (appear to) become unavailable during the course of re-indexing, and should be able to resume indexing from a particular event id (or timestamp).
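One possible shape for that, sketched with hypothetical EventPager and Checkpoint interfaces (the real versions would wrap the triplestore query and a persistent record of the last event id or timestamp indexed):

```java
import java.util.List;
import java.util.function.Consumer;

public class ResumableReindex {

    /** Hypothetical helpers; real implementations would wrap the triplestore and a persistent checkpoint. */
    public interface EventPager { List<String> eventsAfter(String lastKey, int limit); }
    public interface Checkpoint { String load(); void save(String key); }

    public void run(EventPager pager, Checkpoint checkpoint, Consumer<String> indexFn) {
        String lastKey = checkpoint.load();                    // resume point: last event id (or timestamp) indexed
        while (true) {
            List<String> batch;
            try {
                batch = pager.eventsAfter(lastKey, 200);       // small batches to avoid large result sets
            } catch (RuntimeException triplestoreUnavailable) {
                sleepQuietly(30_000);                          // back off, then retry the same batch
                continue;
            }
            if (batch.isEmpty()) {
                break;                                         // caught up with the triplestore
            }
            for (String eventKey : batch) {
                indexFn.accept(eventKey);                      // Option 1: publish to Kafka; Option 2: write to Solr
                lastKey = eventKey;
            }
            checkpoint.save(lastKey);                          // persist progress after each batch
        }
    }

    private static void sleepQuietly(long millis) {
        try { Thread.sleep(millis); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```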
The indexer stores Kafka metadata with Solr documents. This allows the indexer to resume reading the event stream without receiving duplicate events or missing events (i.e., KafkaMetadata is part of the implementation of exactly-once messaging). If Option 2 is implemented, we would need to be sure the indexer can properly resume consuming the event stream. This should be fine as long as the DEFAULT_SEEK_BEHAVIOR is Seek.EARLIEST in SaveOffsetOnRebalance.
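For illustration only, the seek behavior we depend on looks roughly like this: use an offset recorded in Solr when one exists, otherwise fall back to the earliest offset. This is not the actual SaveOffsetOnRebalance code, and the Solr offset lookup below is a placeholder:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.OptionalLong;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.common.TopicPartition;

public class SeekEarliestOnRebalance implements ConsumerRebalanceListener {

    private final Consumer<?, ?> consumer;

    public SeekEarliestOnRebalance(Consumer<?, ?> consumer) {
        this.consumer = consumer;
    }

    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        // offsets for indexed documents already live in Solr; nothing to do here
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        for (TopicPartition tp : partitions) {
            OptionalLong saved = lookupOffsetFromSolr(tp);     // hypothetical: read KafkaMetadata from Solr
            if (saved.isPresent()) {
                consumer.seek(tp, saved.getAsLong() + 1);      // resume after the last indexed event
            } else {
                consumer.seekToBeginning(Collections.singleton(tp));  // the Seek.EARLIEST fallback
            }
        }
    }

    private OptionalLong lookupOffsetFromSolr(TopicPartition tp) {
        return OptionalLong.empty();                           // placeholder for the Solr query
    }
}
```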
If RMap can be placed in a "read-only" mode while re-indexing is taking place, then the RMap application and the indexer don't have to be concerned with handling indexing events coming from the RMap API/UI and from the re-indexing process at the same time. If we want RMap to be live (i.e. in read/write mode) while re-indexing takes place, then we'll need to get fancier, and I would lean towards Option 1 for that kind of implementation. For example, there could be two topics: one for receiving events from the re-indexing application, and one for receiving events from RMap. Consumption from the RMap topic could be paused (Kafka supports this) while the indexer consumes from the re-indexing topic instead; when the re-index is finished, consumption from the RMap topic can be resumed.
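A sketch of that pause/resume idea using the Kafka consumer API; the topic names, and the assumption that a single consumer is subscribed to both topics, are illustrative rather than part of the current design:

```java
import java.util.Set;
import java.util.stream.Collectors;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

public class TopicSwitcher {

    private final Consumer<?, ?> consumer;   // the indexer's consumer, subscribed to both topics

    public TopicSwitcher(Consumer<?, ?> consumer) {
        this.consumer = consumer;
    }

    /** Pause the live RMap topic while the re-indexing topic is drained. */
    public void startReindex(String liveTopic) {
        consumer.pause(partitionsOf(liveTopic));
    }

    /** Resume the live RMap topic once the re-index is finished. */
    public void finishReindex(String liveTopic) {
        consumer.resume(partitionsOf(liveTopic));
    }

    private Set<TopicPartition> partitionsOf(String topic) {
        return consumer.assignment().stream()
                .filter(tp -> tp.topic().equals(topic))
                .collect(Collectors.toSet());
    }
}
```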