-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement MongoDBArtifactStore #4963
Implement MongoDBArtifactStore #4963
Conversation
This is great! |
Will go through this sometime this week! Thanks! |
import scala.util.Try | ||
import scala.util.Failure | ||
|
||
class AsyncStreamSource(stream: AsyncInputStream, chunkSize: Int)(implicit ec: ExecutionContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make the naming for the akka graph stages more specific so it's clear what they're used for? Something like ActionAttachmentByteStreamSource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
// MongoDB doesn't support using `$` as the first char of field name, so below two fields needs to be encoded first | ||
private val fieldsNeedEncode = Seq("annotations", "parameters") | ||
|
||
override protected[database] def put(d: DocumentAbstraction)(implicit transid: TransactionId): Future[DocInfo] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I'm unsure how this variable is generally named.
Looking at called methods (e.g. failed
, finished
), could we rename this to transaction
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean the transid
here? such name are also used in other places
|
||
val f = collection | ||
.find(Filters.eq("_id", doc.id.id)) // method deserialize will check whether the _rev matched | ||
.toFuture() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: any thoughts on .toSingle().toFutureOption()
? Your result would be a Future[Option[A]]
which looks to match your check for an empty collection or single-item collection below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry, I didn't find which object have the toSingle
method?
//Work done perform close | ||
//Using async "blessed" callback does not work at this stage so | ||
// need to invoke as normal callback | ||
//TODO Revisit this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need an open issue here to track? I can see this TODO get lost
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not related to current master code, so I think it is unfit to record as an issue?
* @param collName the name of the collection to operate on | ||
* @param documentHandler helper class help to simulate the designDoc of CouchDB | ||
* @param viewMapper helper class help to simulate the designDoc of CouchDB | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't tend to see scala docs on your other files. Are they expected like here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for mention! I will add such comment doc to other places
I think we should go ahead with merging this to provide a total alternative to couchdb for people. Are you all using this in production? |
@bdoyle0182 We are not yet using this but will use it at some point as we can't keep up with CouchDB. |
Yea that's why I suggested. We also can't keep up with couchdb. I think we should offer it, but it does seem weird to advertise it without any production cases. We could potentially attempt to try it, but will need to either have dual write / reads or a safe way to migrate the data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nit about the readme. I scanned this but did not go through it thoroughly. +1 to advance the PR and merge.
ansible/README.md
Outdated
@@ -191,6 +191,42 @@ ansible-playbook -i environments/<environment> routemgmt.yml | |||
- To use the API Gateway, you'll need to run `apigateway.yml` and `routemgmt.yml`. | |||
- Use `ansible-playbook -i environments/<environment> openwhisk.yml` to avoid wiping the data store. This is useful to start OpenWhisk after restarting your Operating System. | |||
|
|||
### Deploying Using MongoDB | |||
|
|||
You can use MongoDB as the database backend, and there is an ansible task to deploy a single node MongoDB server for testing and developing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not clear reading this part of the doc if mongo is for entities, activations, or both.
Co-authored-by: Chetan Mehrotra <chetanm@apache.org>
97387f8
to
68120f2
Compare
ansible/group_vars/all
Outdated
commonEnv: | ||
CONFIG_whisk_mongodb_uri: "{{ db.mongodb.connect_string }}" | ||
CONFIG_whisk_mongodb_database: "{{ db.mongodb.database }}" | ||
CONFIG_whisk_spi_ArtifactStoreProvider: "org.apache.openwhisk.core.database.mongodb.MongoDBArtifactStoreProvider" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to follow the convention just like the others?
openwhisk/ansible/group_vars/all
Line 224 in 68120f2
spi: "{{ userLogs_spi | default('org.apache.openwhisk.core.containerpool.logging.DockerToActivationLogStoreProvider') }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can follow elasticsearch activation store style?
openwhisk/ansible/roles/invoker/tasks/deploy.yml
Lines 294 to 308 in 68120f2
- name: setup elasticsearch activation store env | |
set_fact: | |
elastic_env: | |
"CONFIG_whisk_activationStore_elasticsearch_protocol": "{{ db.elasticsearch.protocol}}" | |
"CONFIG_whisk_activationStore_elasticsearch_hosts": "{{ elasticsearch_connect_string }}" | |
"CONFIG_whisk_activationStore_elasticsearch_indexPattern": "{{ db.elasticsearch.index_pattern }}" | |
"CONFIG_whisk_activationStore_elasticsearch_username": "{{ db.elasticsearch.auth.admin.username }}" | |
"CONFIG_whisk_activationStore_elasticsearch_password": "{{ db.elasticsearch.auth.admin.password }}" | |
"CONFIG_whisk_spi_ActivationStoreProvider": "org.apache.openwhisk.core.database.elasticsearch.ElasticSearchActivationStoreProvider" | |
when: db.activation_store.backend == "ElasticSearch" | |
- name: merge elasticsearch activation store env | |
set_fact: | |
env: "{{ env | combine(elastic_env) }}" | |
when: db.activation_store.backend == "ElasticSearch" |
module: mongodb | ||
short_description: A module which support some simple operations on MongoDB. | ||
description: | ||
- Including add user/insert document/create indexes in MongoDB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this tool does something similar to wskadmin, should we put this under tools?
https://github.com/apache/openwhisk/blob/68120f2170dc9f9b53361ab0cb51c4e9458dbe29/tools/admin/wskadmin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is like an ansible library
state: absent | ||
keep_volumes: False | ||
|
||
- name: remove MongoDB data volume |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose even if a user wants to clean up his container, he may still want to keep his data.
I`d rather keep volumes or at least make it configurable to keep volumes when cleaning up the container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make sense, I will update
# This role will run a MongoDB server on the db group, this is only for test, please use | ||
# shared cluster for production env | ||
|
||
- name: (re)start mongodb |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I think this can be handled in the subsequent PR, but just asking, should we include some steps to set up a MongoDB cluster rather than a standalone one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the standalone one is just for user to check the feature, we can submit a subsequent PR to setup a MongoDB cluster
import scala.reflect.ClassTag | ||
|
||
case class MongoDBConfig(uri: String, database: String) { | ||
assume(Set(database, uri).forall(_.nonEmpty), "At least one expected property is missing") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering if it's possible for one of these two parameters to be missing.
I suppose even if an empty string is passed, we can't check that with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, do you mean one parameter is an empty string ""
or " "
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, Set(database, uri)
would be always nonEmpty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the _.nonEmpty
will check against database
and url
in the Set
instead of Set
itself since there is a forall
scala> Set("", "asd").forall(_.nonEmpty)
res0: Boolean = false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is this.
scala> Set("", "").forall(_.nonEmpty)
res0: Boolean = false
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so in this case Set("", "")
, assume
will raise an error, this is expected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok now I get it.
|
||
// calculate the revision manually, to be compatible with couchdb's _rev field | ||
private def revisionCalculate(doc: JsObject): (String, String) = { | ||
val md: MessageDigest = MessageDigest.getInstance("MD5") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: I think we can use store util instead?
https://github.com/apache/openwhisk/blob/master/common/scala/src/main/scala/org/apache/openwhisk/core/database/StoreUtils.scala#L101
Should the underlying algorithm be MD5?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great, I will update
if (rev.startsWith("1-")) { | ||
// for new document, we should get no matched document and insert new one | ||
// if there is a matched document, that one with no _rev filed will be replaced | ||
// if there is a document with the same id but has an _rev field, will return en E11000(conflict) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for MongoDB to include _rev
field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I didn't get it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I got confused.
According to the description in this PR, MongoDB doesn't have this field.
The data scheme in MongoDB is just like CouchDB except that MongoDB doesn't have a _rev field, below is the data of whisk.system/invokerHealthTestAction0 in CouchDB and MongoDB:
So I thought we can't use a field name _rev
for some reason, but it seems not.
While CouchDB manages _rev
value, this will manually manage revision using revisionCalculator.
https://github.com/apache/openwhisk/pull/4963/files/68120f2170dc9f9b53361ab0cb51c4e9458dbe29#diff-f5da2a3c01f4ecedf28b8c10af6bb42c71ce4ad7f66424497f81523c17e33469R572
So the field _rev
is also stored in MongoDB along with other fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, CouchDb will generate _rev
automatically, while MongoDB should set this field manually
DocInfo(DocId(id), DocRevision(rev)) | ||
} | ||
.recover { | ||
case t: MongoException if t.getCode == 11000 => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to add some comment to describe what 11000(conflict) stands for or suggesting to use a constant for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
val attachmentScheme: String = attachmentStore.map(_.scheme).getOrElse(mongodbScheme) | ||
|
||
private val database = client.getDatabase(dbName) | ||
private val collection = getCollectionAndCreateIndexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
getCollectionAndCreateIndexes
-> getCollectionAndCreateIndexes()
to indicate there is a side effect according to the scala convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
* @param documentHandler helper class help to simulate the designDoc of CouchDB | ||
* @param viewMapper helper class help to simulate the designDoc of CouchDB | ||
*/ | ||
class MongoDBArtifactStore[DocumentAbstraction <: DocumentSerializer](client: MongoClient, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need comprehensive documents describing the internals of this component and source/sink.
That would be helpful for those who have less background in MongoDB like me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
create a document on here: https://cwiki.apache.org/confluence/display/OPENWHISK/MongoDB+Artifact+Store
Codecov Report
@@ Coverage Diff @@
## master #4963 +/- ##
==========================================
- Coverage 81.46% 73.11% -8.36%
==========================================
Files 209 223 +14
Lines 10344 11115 +771
Branches 450 467 +17
==========================================
- Hits 8427 8127 -300
- Misses 1917 2988 +1071
Continue to review full report at Codecov.
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4963 +/- ##
===========================================
- Coverage 81.46% 70.11% -11.36%
===========================================
Files 209 231 +22
Lines 10344 12272 +1928
Branches 450 499 +49
===========================================
+ Hits 8427 8604 +177
- Misses 1917 3668 +1751 ☔ View full report in Codecov by Sentry. |
"CONFIG_whisk_mongodb_uri": "{{ db.mongodb.connect_string }}" | ||
"CONFIG_whisk_mongodb_database": "{{ db.mongodb.database }}" | ||
"CONFIG_whisk_spi_ArtifactStoreProvider": "org.apache.openwhisk.core.database.mongodb.MongoDBArtifactStoreProvider" | ||
when: db.backend == "MongoDB" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this is defined along with db.activation_store.backend == "ElasticSearch"
?
I feel we need to consolidate DB related configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should work well with ES as activation backend while mongodb store other entities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant we need a consolidated way to configure multiple DBs for each entity store as we have more options for them.
For example, we might consider the following option:
- Use the database for all entities if only
db.backend
is defined. - Use a different database for activation if
db.activation_store.backend
is defined along withdb.backend
. - Use a different database for whisk entities if
db.artifact_store.backend
is defined along withdb.backend
.
.
.
.
if no specific backend store is defined, we can fall back to db.backend
by default.
We can have fine granularity but I feel like having two configurations for activations and the others would be enough for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think activation
is a kind of artifact
? so db.artifact_store.backend
should refer to all kind of entities including activation
, which is just like what db.backend
do here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can name them accordingly as long as we can selectively configure activation stores and the other stores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so use db.artifact_store.backend
for all entities, and db.activation_store.backend
for activations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a good starting point.
I believe we can add more options for subjects, whisk entities in the future if required.
cd <openwhisk_home> | ||
./gradlew distDocker | ||
cd ansible | ||
ansible-playbook -i environments/<environment> initMongodb.yml -e mongodb_connect_string="mongodb://172.17.0.1:27017" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the relation with db_local.ini
?
Getting this when deploying this change.
TASK [prepare db_local.ini] *********************************************************************************************************************************************************
Monday 24 May 2021 15:07:04 +0900 (0:00:00.883) 0:00:02.373 ************
fatal: [localhost -> localhost]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: 'dict object' has no attribute 'db'"}
AnsibleUndefinedVariable: 'dict object' has no attribute 'db'
PLAY RECAP **************************************************************************************************************************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems pymongo
needs to be installed.
then it should be documented somewhere.
TASK [create necessary auth keys] ******************************************************************************************************************************************************************************************************************************************************************************************************************************************
Monday 24 May 2021 15:57:29 +0900 (0:00:01.872) 0:00:02.229 ************
failed: [ansible] (item=guest) => {"changed": false, "item": "guest", "msg": "the python pymongo module is required"}
failed: [ansible] (item=whisk.system) => {"changed": false, "item": "whisk.system", "msg": "the python pymongo module is required"}
All items completed
PLAY RECAP *****************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
ansible : ok=1 changed=0 unreachable=0 failed=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure the reason yet but I am unable to deploy this change with this line.
scheduler:
dataManagementService:
retryInterval: "{{ scheduler_dataManagementService_retryInterval | default(1 second) }}"
TASK [kafka : add kafka default env vars] **********************************************************************************************************************************************************************************************************************************************************************************************************************************
Monday 24 May 2021 16:07:39 +0900 (0:00:00.054) 0:00:13.372 ************
fatal: [kafka0]: FAILED! => {"msg": "An unhandled exception occurred while templating '{% set ret = [] %}{% for host in groups['zookeepers'] %}{{ ret.append( hostvars[host].ansible_host + ':' + ((zookeeper.port+loop.index-1)|string) ) }}{% endfor %}{{ ret | join(',') }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: template error while templating string: expected token ',', got 'second'. String: {{ scheduler_dataManagementService_retryInterval | default(1 second) }}"}
An unhandled exception occurred while templating '{% set ret = [] %}{% for
host in groups['zookeepers'] %}{{ ret.append( hostvars[host].ansible_host + ':'
+ ((zookeeper.port+loop.index-1)|string) ) }}{% endfor %}{{ ret | join(',') }}'.
Error was a <class 'ansible.errors.AnsibleError'>, original message: template
error while templating string: expected token ',', got 'second'. String: {{
scheduler_dataManagementService_retryInterval | default(1 second) }}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove db_local.ini relation
seems group_vars/all
needs a db_local.ini
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, db_local.ini
is being used to specify the target DB, can we apply the same for MongoDB as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for 2.5.2
we need to add ' ' around string value, like default('1 second')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ok, then that should be updated as we guide users to use ansible=2.5.2
.
https://github.com/apache/openwhisk/tree/master/ansible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I will create a new PR to fix this
it's already fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could move to a more modern version of ansible (in separate PRs of course). We don't have to stay on 2.5.2 forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could move to a more modern version of ansible (in separate PRs of course). We don't have to stay on 2.5.2 forever.
I'm installing OW on a clean new machine and I'm updating the versions that worked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cd <openwhisk_home> | ||
./gradlew distDocker | ||
cd ansible | ||
ansible-playbook -i environments/<environment> initMongodb.yml -e mongodb_connect_string="mongodb://172.17.0.1:27017" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure the reason yet but I am unable to deploy this change with this line.
scheduler:
dataManagementService:
retryInterval: "{{ scheduler_dataManagementService_retryInterval | default(1 second) }}"
TASK [kafka : add kafka default env vars] **********************************************************************************************************************************************************************************************************************************************************************************************************************************
Monday 24 May 2021 16:07:39 +0900 (0:00:00.054) 0:00:13.372 ************
fatal: [kafka0]: FAILED! => {"msg": "An unhandled exception occurred while templating '{% set ret = [] %}{% for host in groups['zookeepers'] %}{{ ret.append( hostvars[host].ansible_host + ':' + ((zookeeper.port+loop.index-1)|string) ) }}{% endfor %}{{ ret | join(',') }}'. Error was a <class 'ansible.errors.AnsibleError'>, original message: template error while templating string: expected token ',', got 'second'. String: {{ scheduler_dataManagementService_retryInterval | default(1 second) }}"}
An unhandled exception occurred while templating '{% set ret = [] %}{% for
host in groups['zookeepers'] %}{{ ret.append( hostvars[host].ansible_host + ':'
+ ((zookeeper.port+loop.index-1)|string) ) }}{% endfor %}{{ ret | join(',') }}'.
Error was a <class 'ansible.errors.AnsibleError'>, original message: template
error while templating string: expected token ',', got 'second'. String: {{
scheduler_dataManagementService_retryInterval | default(1 second) }}
cd ansible | ||
ansible-playbook -i environments/<environment> initMongodb.yml -e mongodb_connect_string="mongodb://172.17.0.1:27017" | ||
ansible-playbook -i environments/<environment> apigateway.yml -e mongodb_connect_string="mongodb://172.17.0.1:27017" | ||
ansible-playbook -i environments/<environment> openwhisk.yml -e mongodb_connect_string="mongodb://172.17.0.1:27017" -e db_artifact_backend="MongoDB" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope we can add some description about the relationship between db_artifact_backend
and db_activation_backend
.
They can coexist and if no db_activation_backend
is specified, the system will use db_artifact_backend
by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also needs to be handled with this #4963 (comment)
as currently, users can't configure each DB interchangeably.
- name: merge mongodb artifact store env | ||
set_fact: | ||
env: "{{ env | combine(mongodb_env) }}" | ||
when: db.artifact_store.backend == "MongoDB" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this can we use MongoDB as an activation store and CouchDB as an artifact store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope, this is for artifact
, which means all kind of entities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What users can do if they want to use the CouchDB artifact store with the MongoDB activation store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I havn't test such case yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok I think that can be handled in a different PR.
🎉 👏 nice to finally merge this capability! Great work. |
Playing around with this and got it working on a test deployment, but the one thing I haven't figured out is updating the limits for a namespace. Combing through the code and I think that may have been missed having support for since the limits doc lookup in couchdb was done through the edit: nvm looks like I potentially got bit by some caching, seems to be working. passing all of my validations now this is awesome |
This PR provides a MongoDBArtifactStore implementation to enable using MongoDB for storing subjects, whisks and activation in MongoDB, and mostly comes from old PR: #3570.
Since lots of codes are inspired by https://github.com/chetanmeh/openwhisk-mongo, so I added @chetanmeh as co-author
Description
Some users may want to use other database backends instead of CouchDB to store entities, this PR provides MongoDB as an alternative choice for these users
A simple document here: https://cwiki.apache.org/confluence/display/OPENWHISK/MongoDB+Artifact+Store
MongoDB driver
MongoDB provides official drivers for many languages, in this case, we pick the mongo-scala-driver, this driver provides an idiomatic Scala API that is built on top of the MongoDB Async Java driver.
Design Considerations
Data Model
The data scheme in MongoDB is just like CouchDB except that MongoDB doesn't have a
_rev
field, below is the data ofwhisk.system/invokerHealthTestAction0
in CouchDB and MongoDB:annotations
andparameters
fields may using arbitrary strcut and MongoDB doesn't support$
as the first char for field name, so it will convert these two fields to string, and convert it back when get them via openwhisk, this is transparent to users_rev
field will not be generated automatically in MongoDB, so it is calculted and inserted in code explicitlyAttachment
MongoDB use GridFS to store and retrieve files that exceed the BSON-document size limit of 16 MB.
Attachment in MongoDB is stored in a separate collection with a independent
_id
, this PR use thedoc._id + doc.file_name
as the attachment's_id
field, then we can find the relative attachment easily.Progress
Finished Work
FurtherWork
My changes affect the following components
Types of changes
Checklist: