[Docs] Leaky splits #5189
Conversation
Walkthrough: The documentation for the FiftyOne Brain has been expanded to include a new section titled "Leaky Splits," which introduces the concept of identifying potential data leaks between dataset splits.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
docs/source/brain.rst (3)
1781-1808
: Fix typo in split_field example. The code examples effectively demonstrate different ways to define splits. However, there's a typo in the split_field example:

-split_field = ['split']  # holds split values e.g. 'train' or 'test'
+split_field = 'split'  # holds split values e.g. 'train' or 'test'

The split_field parameter should be a string, not a list, as it represents a single field name.
1809-1818
: Fix typos and formality in output description. The input/output documentation is comprehensive, but there are a few issues to address:

-This is not only visually appealing and fun to look at, but can also give you an
+This is not only visually informative but can also provide you with an

-visiual and semantic similarity
+visual and semantic similarity
1819-1847
: Add missing punctuation. The technical documentation is thorough and well-explained. Fix the missing period:

-process
+process.

Otherwise, the explanation of similarity backends, models, embeddings, and thresholds is clear and comprehensive.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
⛔ Files ignored due to path filters (1)
docs/source/images/brain/brain-leaky-splits.png
is excluded by !**/*.png
📒 Files selected for processing (1)
docs/source/brain.rst
(2 hunks)
🔇 Additional comments (2)
docs/source/brain.rst (2)
1769-1780
: Well-written introduction!
The introduction effectively explains the problem of data leaks between splits and their impact on model evaluation. The explanation is clear and technically accurate.
1848-1851
: LGTM!
The image is properly included with appropriate alt text and alignment.
Please make the requested changes before merging. On the whole, LGTM 👍
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (2)
docs/source/brain.rst (2)
1792-1793
: Add a concrete example dataset. As suggested in the past review comments, providing a real example like COCO would make this more practical and easier to follow.
1821-1825
: Fix typos and clarify model training statement. There are a few issues in this paragraph:
- "visiual" should be "visual"
- The statement about models not trained on the dataset needs clarification

-Leakiness find leaks by embedding samples with a powerful
-model and finding very close samples in different splits in this space. Large,
-powerful models that were *not* trained on a dataset can provide insight into
-visiual and semantic similarity between images, without creating further leaks
-in the process.
+Leakiness finds leaks by embedding samples with a powerful
+model and finding very close samples in different splits in this space. Large,
+powerful models that were *not* trained on the target dataset can provide insight into
+visual and semantic similarity between images, without creating further leaks
+in the process.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
docs/source/brain.rst
(2 hunks)
🔇 Additional comments (1)
docs/source/brain.rst (1)
77-83
: LGTM! Clear and consistent introduction.
The new bullet point follows the established pattern and effectively introduces the leaky splits feature.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (4)
docs/source/brain.rst (4)
1775-1780
: Improve clarity of the introduction. The introduction could be more precise about the consequences of data leaks. Consider mentioning specific impacts like inflated validation metrics or overconfidence in model performance.

-Despite our best efforts, duplicates and other forms of non-IID samples
-show up in our data. When these samples end up in different splits, this
-can have consequences when evaluating a model. It can often be easy to
-overestimate model capability due to this issue.
+Despite our best efforts, duplicates and other forms of non-IID samples
+can appear in our data. When these samples end up in different splits (e.g., train/test),
+they violate the independence assumption and can lead to:
+- Inflated validation/test metrics
+- Overestimation of model generalization capability
+- Unreliable model evaluation results
1794-1807
: Add docstrings to code examples. The code examples would be more helpful with brief comments explaining what each approach does and when to use it.

-    # splits via tags
+    # Use when splits are defined using dataset tags
+    # Example: samples tagged with 'train' or 'test'
     split_tags = ['train', 'test']
     index, leaks = fob.compute_leaky_splits(dataset, split_tags=split_tags)

-    # splits via field
+    # Use when splits are stored in a dataset field
+    # Example: a 'split' field containing values like 'train' or 'test'
     split_field = ['split']
     index, leaks = fob.compute_leaky_splits(dataset, split_field=split_field)

-    # splits via views
+    # Use when splits are defined as dataset views
+    # Example: filtered views for train and test data
     split_views = {
         'train' : some_view
         'test' : some_other_view
     }
1821-1825
: Clarify the model selection criteria. The explanation about model selection should be more specific about what constitutes a "powerful model" and why using models not trained on the dataset is important.

-**What to expect**: Leakiness find leaks by embedding samples with a powerful
-model and finding very close samples in different splits in this space. Large,
-powerful models that were *not* trained on a dataset can provide insight into
-visiual and semantic similarity between images, without creating further leaks
-in the process.
+**What to expect**: Leakiness finds leaks by embedding samples using a foundation
+model (like CLIP or ResNet) and identifying very close samples across different
+splits in the embedding space. Using pre-trained models that were *not* trained
+on your specific dataset is crucial because:
+1. They can capture general visual and semantic similarities
+2. They avoid introducing additional data leakage
+3. They provide unbiased similarity metrics
1841-1848
: Enhance threshold documentation. The threshold documentation should include guidance on how to choose appropriate values and examples of typical ranges.

-**Thresholds**: The leaky-splits module uses a threshold to decide what samples
-are 'too close' and mark them as potential leaks. This threshold can be changed
-either by passing a value to the `threshold` argument of the `compute_leaky_splits()`
-method, or by using the
-:meth:`set_threshold() <fiftyone.brain.internal.core.leaky_splits.SimilarityIndex.set_threshold>`
-method. The best value for your use-case may vary depending on your dataset, as well
-as the embeddings used. A threshold that's too big will have a lot of false positives,
-a threshold that's too small will have a lot of false negatives.
+**Thresholds**: The leaky-splits module uses a threshold to identify potential leaks
+by measuring embedding distances between samples. You can configure this threshold:
+
+- Via the `threshold` argument in `compute_leaky_splits()`
+- Using the :meth:`set_threshold() <fiftyone.brain.internal.core.leaky_splits.SimilarityIndex.set_threshold>` method
+
+Choosing the right threshold:
+- Typical range: 0.1 to 0.5 for normalized embeddings
+- Higher values (>0.3): More aggressive leak detection, higher false positives
+- Lower values (<0.2): More conservative, higher false negatives
+- Start with 0.2 and adjust based on manual inspection of results
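To make the false-positive/false-negative trade-off concrete, here is a minimal, library-free sketch of distance thresholding between two splits. All names and data here are hypothetical illustrations; FiftyOne's actual implementation operates on real embeddings and similarity indexes.

```python
import math

def cosine_distance(u, v):
    # 1 - cosine similarity; 0 means the vectors point in the same direction
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def find_leaks(train_embs, test_embs, threshold):
    # Flag (train_idx, test_idx) pairs whose embeddings are 'too close'
    leaks = []
    for i, u in enumerate(train_embs):
        for j, v in enumerate(test_embs):
            if cosine_distance(u, v) < threshold:
                leaks.append((i, j))
    return leaks

train = [[1.0, 0.0], [0.0, 1.0]]
test = [[0.99, 0.01], [0.5, 0.5]]  # first test sample is a near-duplicate of train[0]

print(find_leaks(train, test, threshold=0.01))  # [(0, 0)] — strict: only the near-duplicate
print(find_leaks(train, test, threshold=0.5))   # [(0, 0), (0, 1), (1, 1)] — loose: flags almost everything
```

The same data yields one flagged pair or three depending only on the threshold, which is why manual inspection of results matters when tuning it.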
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
docs/source/brain.rst (2)
1781-1808
: Add more context to the code examples. While the code examples show different ways to specify splits, they would benefit from additional explanation of when to use each approach.
Consider adding explanatory text before each example:
 # splits via tags
+# Use when your samples are tagged with split names
 split_tags = ['train', 'test']
 index, leaks = fob.compute_leaky_splits(dataset, split_tags=split_tags)

 # splits via field
+# Use when split information is stored in a sample field
 split_field = ['split']  # holds split values e.g. 'train' or 'test'
 index, leaks = fob.compute_leaky_splits(dataset, split_field=split_field)

 # splits via views
+# Use when splits are defined by dataset views
 split_views = {
     'train' : some_view
     'test' : some_other_view
 }
1809-1827
: Enhance the COCO example. The COCO example could be more educational with additional comments explaining each step.
Add more context:
+# Load the COCO test split
 coco = foz.load_zoo_dataset("coco-2017", split="test")

+# Clear existing tags to prepare for our split
 coco.untag_samples(coco.distinct("tags"))

+# Create a random 70/30 train/test split
 four.random_split(coco, {"train": 0.7, "test": 0.3})

+# Analyze the splits for potential leaks
 index, leaks = compute_leaky_splits(coco, split_tags=['train', 'test'])
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
docs/source/brain.rst
(2 hunks)
🔇 Additional comments (3)
docs/source/brain.rst (3)
1770-1773
: LGTM! Well-structured section header
The section is properly placed in the documentation hierarchy and follows the established format.
1775-1780
: LGTM! Clear introduction of the concept
The introduction effectively explains the problem of data leaks between splits and their impact on model evaluation.
1901-1904
:
Image file 'brain-leaky-splits.png' is 26MB - requires immediate compression
The image file size of 26MB is significantly large for version control. Large binary files in Git repositories:
- Increase clone and fetch times
- Consume excessive storage
- Impact repository performance
Please compress this image while maintaining acceptable visual quality. Recommended actions:
- Use image optimization tools (e.g., ImageOptim, TinyPNG)
- Reduce dimensions if unnecessarily large
- Consider converting to a more efficient format (e.g., WebP)
- Aim for a compressed size under 1MB
✅ Verification successful
Large image file confirmed - requires optimization to reduce 26MB size
The verification confirms the image file is 26MB, which is excessively large for version control. This will negatively impact repository performance and clone times.
Please take the following actions:
- Compress the image while maintaining acceptable visual quality using tools like ImageOptim or TinyPNG
- Consider reducing dimensions if they exceed documentation needs
- Convert to a more efficient format like WebP if supported by your documentation toolchain
- Target a compressed size under 1MB for optimal repository performance
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check the size of the image file
ls -lh docs/source/images/brain/brain-leaky-splits.png | awk '{print $5}'
Length of output: 79
Overall LGTM 👍, but please consider making the suggested enhancements before merging
docs/source/brain.rst (Outdated)
coco.untag_samples(coco.distinct("tags"))

four.random_split(coco, {"train": 0.7, "test": 0.3})
index, leaks = compute_leaky_splits(coco, split_tags=['train', 'test'])
Maybe print out the number of leaks? Right now it isn't 100% clear the complete value this provides. You're underselling it
I think the best way to convince the reader is either through having them run it and seeing the visual once they open the app, or having them read the blogpost. I think just the number doesn't tell the reader much, because how can they trust that the algorithm doesn't have a lot of false positives?
can be used to find potential leaks between dataset splits. These splits can
be misleading when evaluating a model, giving an overly optimistic measure
for the quality of training.

.. note::

    Check out the :ref:`tutorials page <tutorials>` for detailed examples
Are you adding a tutorial on data leakage to the tutorials section?
Should I? What would this look like?
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (3)
docs/source/brain.rst (3)
1775-1780
: Enhance the introduction with more specific details. Consider expanding the introduction to:
- Explain what "non-IID" means for readers unfamiliar with the term
- Provide concrete examples of how leaky splits can affect model evaluation
- Quantify the potential impact on model performance metrics
1781-1808
: Add explanatory comments to the code examples.The code examples would be more helpful with:
- Comments explaining when to use each split specification method
- More descriptive variable names (e.g., `some_view` → `train_subset_view`)
- Expected output format description
1820-1824
: Improve the example's clarity and add performance metrics. Consider enhancing the example by:
- Using `dataset` instead of `coco` for consistency with other examples
- Adding a print statement to show the number of leaks found
- Explaining why these specific split ratios were chosen
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
⛔ Files ignored due to path filters (1)
docs/source/images/brain/brain-leaky-splits.png
is excluded by !**/*.png
📒 Files selected for processing (1)
docs/source/brain.rst
(2 hunks)
🔇 Additional comments (1)
docs/source/brain.rst (1)
1901-1904
:
Compress the brain-leaky-splits.png image.
The image file is 26MB which is too large for version control. Please compress it to reduce repository bloat while maintaining acceptable visual quality.
Before evaluating your model on your test set, consider getting a version of it
with the leaks removed. This can be easily done with the built in method
:meth:`no_leaks_view() <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.no_leaks_view>`.
Fix method name inconsistency.
The documentation refers to no_leaks_view()
but the actual method name is get_no_leaks_view()
.
-:meth:`no_leaks_view() <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.no_leaks_view>`
+:meth:`get_no_leaks_view() <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.get_no_leaks_view>`
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Before:
    Before evaluating your model on your test set, consider getting a version of it
    with the leaks removed. This can be easily done with the built in method
    :meth:`no_leaks_view() <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.no_leaks_view>`.

After:
    Before evaluating your model on your test set, consider getting a version of it
    with the leaks removed. This can be easily done with the built in method
    :meth:`get_no_leaks_view() <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.get_no_leaks_view>`.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
docs/source/brain.rst (2)
1775-1780
: Enhance the introduction with concrete examples.Consider adding specific examples of what constitutes a "leak" between splits to make the concept more tangible for users. For example, mention cases like identical/near-identical images, same scene from different angles, or temporally adjacent frames from a video sequence.
1781-1811
: Add expected output and return value explanations to code examples.The code examples clearly show different ways to specify splits, but would benefit from:
- Comments showing example output format
- Explanation of what the returned `index` and `leaks` objects contain
- Example values for the split field/tags
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (1)
docs/source/brain.rst
(2 hunks)
🔇 Additional comments (1)
docs/source/brain.rst (1)
1841-1844
:
Fix method name inconsistency.
The method name `no_leaks_view()` is referenced here, but it's inconsistent with the actual method name in the codebase.
-:meth:`no_leaks_view() <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.no_leaks_view>`
+:meth:`get_no_leaks_view() <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.get_no_leaks_view>`
Likely invalid or redundant comment.
Here is a sample snippet to run this on the `COCO <https://cocodataset.org/#home>`_ dataset.
Try it for yourself and see what you may find.

.. code-block:: python
    :linenos:

    import fiftyone as fo
    import fiftyone.zoo as foz
    import fiftyone.utils.random as four
    from fiftyone.brain import compute_leaky_splits

    # load coco
    dataset = foz.load_zoo_dataset("coco-2017", split="test")

    # set up splits via tags
    dataset.untag_samples(dataset.distinct("tags"))
    four.random_split(dataset, {"train": 0.7, "test": 0.3})

    # compute leaks
    index, leaks = compute_leaky_splits(dataset, split_tags=['train', 'test'])
🛠️ Refactor suggestion
Expand the COCO example with real-world insights.
The COCO example would be more valuable with:
- Expected output format and typical findings
- Common types of leaks found in COCO
- Interpretation guidelines for the results
- Performance impact examples (e.g., accuracy difference with/without leaks)
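The 70/30 tagging step in the snippet above can be sketched without FiftyOne as a seeded shuffle-and-partition. This is a hypothetical helper for illustration; `four.random_split` handles this on real datasets.

```python
import random

def random_split(sample_ids, fractions, seed=51):
    # Shuffle once with a fixed seed, then slice consecutive chunks
    # according to the requested fractions
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    splits, start = {}, 0
    for name, frac in fractions.items():
        end = start + round(frac * len(ids))
        splits[name] = ids[start:end]
        start = end
    splits[name].extend(ids[start:])  # any rounding remainder goes to the last split
    return splits

splits = random_split(range(10), {"train": 0.7, "test": 0.3})
print(len(splits["train"]), len(splits["test"]))  # 7 3
```

Because the split is random, exact or near-duplicate samples can easily land on opposite sides of it, which is precisely the leakage scenario this module detects.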
Once you have these leaks, it is wise to look through them. You may gain some insight
into the source of the leaks.

.. code-block:: python
    :linenos:

    session = fo.launch_app(leaks)

Before evaluating your model on your test set, consider getting a version of it
with the leaks removed. This can be easily done with the built in method
:meth:`no_leaks_view() <fiftyone.brain.internal.core.leaky_splits.LeakySplitsIndex.no_leaks_view>`.

.. code-block:: python
    :linenos:

    # if you already have it
    test_set = some_view

    # can also be found with the variable `split_views` from the index
    # make sure to put in the right string based on the field/tag/key in view dict
    # passed when building the index
    test_set = index.split_views['test']

    test_set_no_leaks = index.no_leaks_view(test_set)  # return a view with leaks removed
    session.view = test_set_no_leaks

    # do evaluations on test_set_no_leaks rather than test_set

Performance on the clean test set can be closer to the performance of the
model in the wild. If you found some leaks in your dataset, consider comparing
performance on the base test set against the clean test set.
🛠️ Refactor suggestion
Add error handling and edge cases documentation.
The section should include:
- How to handle cases when no leaks are found
- Error scenarios (e.g., invalid split specifications)
- Performance considerations for large datasets
- Best practices for threshold selection
**Thresholds**: The leaky-splits module uses a threshold to decide what samples
are 'too close' and mark them as potential leaks. This threshold can be changed
either by passing a value to the `threshold` argument of the `compute_leaky_splits()`
method, or by using the
:meth:`set_threshold() <fiftyone.brain.internal.core.leaky_splits.SimilarityIndex.set_threshold>`
method. The best value for your use-case may vary depending on your dataset, as well
as the embeddings used. A threshold that's too big will have a lot of false positives,
a threshold that's too small will have a lot of false negatives.
🛠️ Refactor suggestion
Add threshold guidance and examples.
The threshold explanation would benefit from:
- Recommended threshold ranges for common use cases
- Examples of how different thresholds affect results
- A method to automatically determine optimal threshold values
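One way to automate threshold selection, as this suggestion asks for, is to set it from a low percentile of the observed cross-split distance distribution, so that only the closest few percent of pairs are treated as leaks. This is a heuristic sketch under that assumption, not FiftyOne's actual behavior.

```python
def auto_threshold(cross_split_distances, percentile=0.05):
    # Pick the threshold at a low percentile of the observed distances:
    # only pairs strictly below it will be flagged as potential leaks
    ranked = sorted(cross_split_distances)
    k = max(0, min(len(ranked) - 1, int(percentile * len(ranked))))
    return ranked[k]

# Hypothetical cross-split distances: one near-duplicate pair (0.02),
# the rest comfortably far apart
distances = [0.02, 0.31, 0.45, 0.47, 0.52, 0.55, 0.61, 0.70, 0.81, 0.90]
t = auto_threshold(distances, percentile=0.1)
print(t)  # 0.31 — only the 0.02 near-duplicate pair falls strictly below it
```

A percentile rule adapts to the embedding scale automatically, but it assumes leaks are rare; datasets with many duplicates would need a larger percentile or manual tuning.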
.. image:: /images/brain/brain-leaky-splits.png
    :alt: leaky-splits
    :align: center
💡 Codebase verification
Image file size is smaller than reported but still needs optimization
The actual size of 'brain-leaky-splits.png' is 7.6MB, not 26MB as initially stated. While this is better than previously thought, it's still larger than optimal for version control. The recommendation to compress remains valid but with less urgency:
- Current size: 7.6MB
- Target size: <1MB
- Use image optimization tools while maintaining visual quality
- Consider WebP format if browser support allows
🔗 Analysis chain
Image file 'brain-leaky-splits.png' is 26MB - requires immediate compression
The image file size of 26MB is significantly large for version control. Large binary files in Git repositories:
- Increase clone and fetch times
- Consume excessive storage
- Impact repository performance
Please compress this image while maintaining acceptable visual quality. Recommended actions:
- Use image optimization tools (e.g., ImageOptim, TinyPNG)
- Reduce dimensions if unnecessarily large
- Consider converting to a more efficient format (e.g., WebP)
- Aim for a compressed size under 1MB
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check the size of the image file
ls -lh docs/source/images/brain/brain-leaky-splits.png | awk '{print $5}'
Length of output: 80
LGTM, appreciate you adding this! 🥇
What changes are proposed in this pull request?
Added docs for new leaky-splits module
How is this patch tested? If it is not, please explain why.
Built docs and checked that they looked generally sane.
Release Notes
New brain method for detecting leaky splits! Try it out by importing
`from fiftyone.brain import compute_leaky_splits`. Documentation can be found in the brain docs.
Is this a user-facing change that should be mentioned in the release notes?
see voxel51/fiftyone-brain#203
What areas of FiftyOne does this PR affect?
fiftyone Python library changes

Summary by CodeRabbit
New Features
Documentation: details the `compute_leaky_splits()` method, input requirements, expected outputs, and leak detection mechanisms.