Added WriteConcern as a param for dataset #4696
01eadcf
to
8e83944
Compare
Actionable comments posted: 0
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (2)
- fiftyone/core/collections.py (4 hunks)
- fiftyone/core/dataset.py (4 hunks)
Additional comments not posted (10)
fiftyone/core/dataset.py (5)
24-31: Import of `WriteConcern` is appropriate.
The import of `WriteConcern` from `pymongo` is necessary for the new functionality related to controlling MongoDB write operations.
333-333: Addition of `_write_concern` attribute is appropriate.
The `_write_concern` attribute is added to manage MongoDB write concerns, initialized to `None` to allow for optional configuration.
1184-1188: Inclusion of `write_concern` in `_sample_collstats` is beneficial.
This change ensures that the configured write concern is respected when retrieving collection statistics, enhancing control over MongoDB operations.
1195-1199: Inclusion of `write_concern` in `_frame_collstats` is beneficial.
This change ensures that the configured write concern is respected when retrieving frame collection statistics, enhancing control over MongoDB operations.
7045-7064: Enhancement of collection methods with `write_concern` is appropriate.
These methods now accept a `write_concern` parameter, providing flexibility in configuring write operations for both sample and frame collections.
fiftyone/core/collections.py (5)
22-22: Import `WriteConcern`.
The `WriteConcern` class is imported from `pymongo`. Ensure that it's used correctly in the context of MongoDB operations.
9129-9130: Update function signature to include `acknowledged` parameter.
The `create_index` method now includes an `acknowledged` parameter, allowing control over whether index creation should be acknowledged by the server. This is a useful addition for performance tuning in scenarios where acknowledgment is not necessary.
9165-9166: Clarify `acknowledged` parameter behavior.
The docstring correctly explains that setting `acknowledged` to `False` results in `w=0` for the `WriteConcern`, meaning the operation does not require acknowledgment from the server. Ensure this behavior is consistent with the intended use cases.
9245-9246: Set write concern based on acknowledgment.
The code correctly sets the `write_concern` to `WriteConcern(w=0)` when `acknowledged` is `False`, which aligns with the intended behavior of non-acknowledged operations. This is a good implementation for scenarios requiring faster operations without server acknowledgment.
9249-9255: Use `write_concern` when getting collections.
The `_get_frame_collection` and `_get_sample_collection` methods are called with `write_concern` to ensure the correct acknowledgment behavior is applied. This ensures that the collection operations respect the `acknowledged` parameter.
We can set a global write concern argument too :) https://www.mongodb.com/developer/products/mongodb/global-read-write-concerns/
@minhtuev just confirming: you've tested `acknowledged=False` on large index creation, right? And it indeed has the desired effect of returning roughly immediately, before the index is fully constructed?
fiftyone/core/collections.py
Outdated
@@ -9126,7 +9126,9 @@ def get_index_information(self, include_stats=False):

         return index_info

-    def create_index(self, field_or_spec, unique=False, **kwargs):
+    def create_index(
+        self, field_or_spec, unique=False, acknowledged=True, **kwargs
What do you think about calling this parameter `wait=True` rather than `acknowledged=True`? We have one precedent of something like this: `session.wait()`
The name is fine with me. Usually `wait` means we are waiting for a purpose, so in this case we are waiting for the index to finish building.
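As a rough illustration of the naming discussion, a boolean `wait` flag could be translated into write-concern options along these lines. `write_concern_options` is a hypothetical helper for this sketch, not part of FiftyOne or pymongo; the only behavior it relies on is that `w=0` requests an unacknowledged write, as in the diff under review.

```python
def write_concern_options(wait=True):
    """Translate a boolean ``wait`` flag into write-concern keyword options.

    Hypothetical helper for illustration only.

    ``{"w": 0}`` requests an unacknowledged write, so the client call can
    return before the server has finished the operation. An empty dict
    means "keep the default write concern".
    """
    return {} if wait else {"w": 0}


# With pymongo available, the non-waiting case could be applied roughly as
# follows (assumption: `coll` is an existing pymongo Collection; not run here):
#
#   from pymongo import WriteConcern
#   coll = coll.with_options(
#       write_concern=WriteConcern(**write_concern_options(wait=False))
#   )
```

Keeping the flag as a per-call argument, rather than state on the dataset, means only the caller that explicitly opts out of waiting gets the fire-and-forget behavior.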
fiftyone/core/collections.py
Outdated
@@ -9238,10 +9242,17 @@ def create_index(self, field_or_spec, unique=False, **kwargs):
             # Satisfactory index already exists
             return index_name

+        # Setting `w=0` sets `acknowledged=False` in pymongo
+        write_concern = WriteConcern(w=0) if not acknowledged else None
Can we support this for `drop_index()` as well?
We don't need this for drop index - it terminates immediately!
fiftyone/core/dataset.py
Outdated
-        return foo.get_db_conn()[self._sample_collection_name]
+        return self._get_sample_collection(write_concern=self._write_concern)
+
+    def _get_sample_collection(self, write_concern=None) -> Collection:
Would prefer not to add type hints here for consistency with the rest of this module not having them.
fiftyone/core/dataset.py
Outdated
@@ -322,6 +330,7 @@ def __init__(
         self._run_cache = cachetools.LRUCache(5)

         self._deleted = False
+        self._write_concern = None
The idea of globally configuring a dataset's write concern is creative, but it's scary: builtin methods and user code alike could start breaking if one were to globally set `w=0`, because certain routines do a sequence of things where some steps implicitly rely on previous steps having fully completed (e.g., populating a new field and then immediately computing something about it).
I'd suggest that we not add `Dataset._write_concern` at this time.
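The ordering hazard described here can be shown with a toy model. Everything below is an assumption made for illustration: `ToyCollection` is an in-memory stand-in, not pymongo's or FiftyOne's API, and it deliberately exaggerates the delay so that an unacknowledged write is never visible until `flush()` is called.

```python
class ToyCollection:
    """Toy in-memory stand-in for a collection (not a real MongoDB client)."""

    def __init__(self):
        self.docs = {}
        self._pending = []  # writes the simulated server has not applied yet

    def set_field(self, key, value, acknowledged=True):
        self._pending.append((key, value))
        if acknowledged:
            # Acknowledged write: return only once the write has been applied
            self.flush()

    def flush(self):
        """Simulate the server catching up on all queued writes."""
        for key, value in self._pending:
            self.docs[key] = value
        self._pending.clear()


def populate_then_count(coll, acknowledged):
    """Step 1 writes a field; step 2 assumes step 1 has already completed."""
    coll.set_field("new_field", 1, acknowledged=acknowledged)
    return len(coll.docs)


print(populate_then_count(ToyCollection(), acknowledged=True))   # 1
print(populate_then_count(ToyCollection(), acknowledged=False))  # 0
```

In the unacknowledged case the second step runs against state that does not yet include the first step's write, which is the kind of silent breakage a dataset-wide `w=0` could introduce.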
Sure!
f6203a3
to
90b6fb9
Compare
Actionable comments posted: 1
Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Files selected for processing (2)
- fiftyone/core/collections.py (4 hunks)
- fiftyone/core/dataset.py (3 hunks)
Files skipped from review as they are similar to previous changes (1)
- fiftyone/core/collections.py
Additional context used
Ruff
fiftyone/core/dataset.py
30-30: `pymongo.WriteConcern` imported but unused
Remove unused import: `pymongo.WriteConcern` (F401)
32-32: `pymongo.collection.Collection` imported but unused
Remove unused import: `pymongo.collection.Collection` (F401)
Additional comments not posted (2)
fiftyone/core/dataset.py (2)
7042-7045: Verify `write_concern` parameter integration.
The `write_concern` parameter is added to the `_get_sample_collection` method. Ensure that this parameter is correctly passed and utilized in MongoDB operations where necessary.
7055-7061: Verify `write_concern` parameter integration.
The `write_concern` parameter is added to the `_get_frame_collection` method. Ensure that this parameter is correctly passed and utilized in MongoDB operations where necessary.
fiftyone/core/dataset.py
Outdated
    WriteConcern,
)
from pymongo.collection import Collection
Remove unused imports.
The imports `WriteConcern` and `Collection` are not used in the code and should be removed to clean up the codebase.
- WriteConcern,
- from pymongo.collection import Collection
90b6fb9
to
9a1654b
Compare
@brimoor : setting Normal index creation is slow for 10M dataset: Setting I also discovered certain operations, such as |
@minhtuev LGTM! Can you retarget this change at `develop` since it's fully functional as a standalone addition?
LGTM
What changes are proposed in this pull request?
For Mongo write operations, `WriteConcern` is an argument that controls, from the client side, how write operations are acknowledged. Setting `w=0` for `WriteConcern` lets the client return early without waiting for the server to acknowledge the write.
For more information on `WriteConcern`: https://www.mongodb.com/docs/manual/reference/write-concern/
Similarly related, `ReadConcern`: https://www.mongodb.com/docs/manual/reference/read-concern/
How is this patch tested? If it is not, please explain why.
Release Notes
Is this a user-facing change that should be mentioned in the release notes?
notes for FiftyOne users.
(Details in 1-2 sentences. You can just refer to another PR with a description
if this PR is part of a larger change.)
What areas of FiftyOne does this PR affect?
`fiftyone` Python library changes
Summary by CodeRabbit
New Features
Bug Fixes
Documentation