Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bounds for embedded lists in QP #5202

Merged
merged 10 commits into from
Dec 2, 2024
Merged

Conversation

benjaminpkane
Copy link
Contributor

@benjaminpkane benjaminpkane commented Dec 2, 2024

What changes are proposed in this pull request?

Fixes bounds computations for embedded lists for QP sidebar. != None is not a valid filtering approach for multikey searches

How is this patch tested? If it is not, please explain why.

Added lightning test coverage

What areas of FiftyOne does this PR affect?

  • App: FiftyOne application changes
  • Build: Build and test infrastructure changes
  • Core: Core fiftyone Python library changes
  • Documentation: FiftyOne documentation changes
  • Other

Summary by CodeRabbit

  • New Features

    • Enhanced numeral formatting to improve error handling for "NaN" outputs.
    • Updated RangeSlider to provide more informative feedback when no bounds are available.
  • Bug Fixes

    • Improved handling of special float values (inf, nan) in query responses.
    • Enhanced error handling and control flow in view loading and filtering processes.
  • Tests

    • Updated test cases to reflect changes in data handling and expected outcomes for various data types.

Copy link
Contributor

coderabbitai bot commented Dec 2, 2024

Warning

Rate limit exceeded

@benjaminpkane has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 24 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between f085b74 and 249f48e.

Walkthrough

The changes in this pull request focus on enhancing error handling and feedback mechanisms within the numeral formatting and filtering components of the application. Modifications include updates to the getFormatter function to handle "NaN" outputs more robustly, improvements to the RangeSlider component for better contextual messaging, and adjustments to the nonfinitesText selectorFamily for more informative data representation. Additionally, the Lightning server's query handling logic has been refined, along with corresponding updates to unit tests to ensure alignment with the new data structures and expected outcomes.

Changes

File Change Summary
app/packages/core/src/components/Common/utils.tsx Modified getFormatter function to enhance error handling for numeral formatting, specifically addressing "NaN" outputs.
app/packages/core/src/components/Filters/NumericFieldFilter/RangeSlider.tsx Introduced nonfinitesText state variable for improved conditional rendering in the RangeSlider component, enhancing user feedback when no bounds are available.
app/packages/core/src/components/Filters/NumericFieldFilter/state.ts Renamed hasNonfinites selectorFamily to nonfinitesText, altering its logic to return a comma-separated string of non-finite keys.
fiftyone/server/lightning.py Enhanced query handling in the Lightning server, including updates to the _do_async_query and _parse_result functions for improved clarity and functionality.
tests/unittests/lightning_tests.py Updated test cases to reflect changes in data handling for floating-point special values and modified GraphQL query structures accordingly.

Possibly related PRs

Suggested labels

enhancement, app, server, documentation

Suggested reviewers

  • minhtuev
  • CamronStaley

🐇 In the meadow where numbers play,
A formatter found a better way.
With "NaN" no longer causing fright,
It brings clarity, shining bright.
The sliders now tell tales so true,
With every bound, a clearer view! 🌼


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@benjaminpkane benjaminpkane changed the title Do not Fix bounds for embedded lists in QP Dec 2, 2024
@benjaminpkane benjaminpkane self-assigned this Dec 2, 2024
@benjaminpkane benjaminpkane marked this pull request as ready for review December 2, 2024 20:03
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
app/packages/core/src/components/Filters/NumericFieldFilter/state.ts (1)

42-49: Refactor to eliminate code duplication

The mapping and filtering logic is duplicated. Consider extracting it into a helper function for better maintainability.

+ const getNonfiniteKeys = (get, params) =>
+   FLOAT_NONFINITES.map((key) => [
+     key,
+     get(nonfiniteAtom({ key, ...params })),
+   ])
+     .filter(([_, v]) => v)
+     .map(([k]) => k);

  export const nonfinitesText = selectorFamily({
    key: "nonfinitesText",
    get:
      (params: { path: string; modal: boolean }) =>
      ({ get }) => {
-       const result = FLOAT_NONFINITES.map((key) => [
-         key,
-         get(nonfiniteAtom({ key, ...params })),
-       ])
-         .filter(([_, v]) => v)
-         .map(([k]) => k);
+       const result = getNonfiniteKeys(get, params);
        return result.length ? result.join(", ") : null;
      },
  });
app/packages/core/src/components/Common/utils.tsx (1)

71-78: Simplify format selection logic and improve NaN handling

The format selection logic could be simplified, and the NaN handling could be more explicit.

+ const getNumberFormat = (fieldType: string, bounds: [number, number]) => {
+   if ([INT_FIELD, FRAME_NUMBER_FIELD, FRAME_SUPPORT_FIELD].includes(fieldType)) {
+     return "0a";
+   }
+   return bounds[1] - bounds[0] < 0.1 ? "0.0000a" : "0.00a";
+ };

- const str = numeral(v).format(
-   [INT_FIELD, FRAME_NUMBER_FIELD, FRAME_SUPPORT_FIELD].includes(fieldType)
-     ? "0a"
-     : bounds[1] - bounds[0] < 0.1
-     ? "0.0000a"
-     : "0.00a"
- );
+ const format = getNumberFormat(fieldType, bounds);
+ const str = numeral(v).format(format);

- return str === "NaN" ? v.toString() : str;
+ // Explicitly handle NaN case
+ if (str === "NaN" || Number.isNaN(v)) {
+   return v.toString();
+ }
+ return str;
app/packages/core/src/components/Filters/NumericFieldFilter/RangeSlider.tsx (1)

38-38: Consider using a more flexible text formatting approach

The text concatenation could be made more flexible and maintainable.

  const nonfinitesText = useRecoilValue(state.nonfinitesText({ path, modal }));

+ const getBoxText = (nonfinitesText: string | null) => {
+   if (!nonfinitesText) {
+     return "No results";
+   }
+   return `${nonfinitesText} present`;
+ };

  if (!hasBounds) {
    return (
-     <Box text={nonfinitesText ? `${nonfinitesText} present` : "No results"} />
+     <Box text={getBoxText(nonfinitesText)} />
    );
  }

Also applies to: 41-43

fiftyone/server/lightning.py (2)

326-328: Consider optimizing query performance.

The current implementation adds sort before match, which could impact performance as MongoDB would sort the entire collection before filtering. Consider reordering to filter first:

-            query.insert(0, {"$match": {k: v}})
-            query.insert(0, {"$sort": {k: 1}})
+            query.insert(0, {"$sort": {k: 1}})
+            query.insert(0, {"$match": {k: v}})

509-515: LGTM: Enhanced result parsing with finite check.

The changes improve result parsing by properly handling non-finite float values. Consider adding debug logging for when non-finite values are encountered to help with troubleshooting.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7287351 and 750694b.

📒 Files selected for processing (5)
  • app/packages/core/src/components/Common/utils.tsx (1 hunks)
  • app/packages/core/src/components/Filters/NumericFieldFilter/RangeSlider.tsx (1 hunks)
  • app/packages/core/src/components/Filters/NumericFieldFilter/state.ts (1 hunks)
  • fiftyone/server/lightning.py (4 hunks)
  • tests/unittests/lightning_tests.py (27 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
app/packages/core/src/components/Common/utils.tsx (1)

Pattern **/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

app/packages/core/src/components/Filters/NumericFieldFilter/RangeSlider.tsx (1)

Pattern **/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

app/packages/core/src/components/Filters/NumericFieldFilter/state.ts (1)

Pattern **/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

🔇 Additional comments (5)
fiftyone/server/lightning.py (2)

12-12: LGTM: Required import for float handling.

The math module import is necessary for checking finite float values.


429-432: LGTM: Improved float handling logic.

The changes correctly handle float values by:

  1. Adding float-specific pipeline steps when needed
  2. Ensuring existence check is consistently applied
tests/unittests/lightning_tests.py (3)

538-538: LGTM: Comprehensive test coverage for float handling.

The test assertions have been properly updated to verify the new behavior of returning None for non-finite float values in various scenarios.

Also applies to: 572-572, 591-591, 609-609, 635-635, 644-644, 654-654, 663-663, 672-672, 681-681, 734-734, 753-753, 771-771, 797-797, 806-806, 816-816, 825-825, 834-834, 843-843, 932-932, 951-951, 969-969


1063-1067: LGTM: Added test cases for group dataset queries.

New test cases properly verify the handling of classifications with confidence values, including negative values.

Also applies to: 1076-1080


1092-1096: LGTM: Comprehensive group dataset test coverage.

The test cases thoroughly verify:

  1. Float field handling in classifications
  2. Proper slicing of group datasets
  3. Correct min/max computations

Also applies to: 1114-1120, 1128-1132, 1146-1152, 1160-1164

CamronStaley
CamronStaley previously approved these changes Dec 2, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
app/packages/core/src/components/Filters/NumericFieldFilter/state.ts (1)

28-33: Add type safety improvements

The selector could benefit from better type safety:

  1. Missing return type annotation
  2. No type checking for the data structure

Consider adding these type improvements:

 export const nonfinitesText = selectorFamily({
   key: "nonfinitesText",
   get:
     (params: { path: string; modal: boolean }) =>
-    ({ get }) => {
+    ({ get }): string | null => {
+      const data: Record<string, boolean> = get(nonfiniteData({ ...params, extended: false }));
fiftyone/server/lightning.py (1)

326-328: Consider performance impact of added sorting operation

While the addition of sorting ensures consistent ordering, consider the performance impact when dealing with large datasets.

Consider adding a conditional sort only when necessary:

 for k, v in filter.items():
     query.insert(0, {"$match": {k: v}})
-    query.insert(0, {"$sort": {k: 1}})
+    if needs_sorting(k):  # Add helper function to determine if sorting is needed
+        query.insert(0, {"$sort": {k: 1}})
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 750694b and f085b74.

📒 Files selected for processing (3)
  • app/packages/core/src/components/Filters/NumericFieldFilter/state.ts (2 hunks)
  • fiftyone/server/lightning.py (5 hunks)
  • fiftyone/server/view.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
app/packages/core/src/components/Filters/NumericFieldFilter/state.ts (1)

Pattern **/*.{ts,tsx}: Review the Typescript and React code for conformity with best practices in React, Recoil, Graphql, and Typescript. Highlight any deviations.

🔇 Additional comments (7)
fiftyone/server/lightning.py (4)

429-430: LGTM: Proper handling of non-finite float values

The addition of special float handling in the pipeline is correct and aligns with the PR objective.


432-432: LGTM: Added existence check

The addition of the existence check in the pipeline is a good defensive programming practice.


509-515: LGTM: Proper handling of non-finite float values

The addition of math.isfinite() check ensures proper handling of special float values (inf, -inf, nan).


143-147: LGTM: Improved filter initialization logic

The conditional initialization of filters based on group field and slice presence is a good improvement.

Let's verify the usage of group_field across the codebase:

✅ Verification successful

Based on the search results, I can see that the group_field is a core feature of FiftyOne datasets that allows organizing samples into groups. The filter initialization in the code under review is correctly using dataset.group_field to filter samples based on the group slice when present.

LGTM: Filter initialization correctly handles group field filtering

The conditional filter initialization using dataset.group_field and input.slice is properly implemented to filter samples by their group slice when applicable. This is consistent with FiftyOne's group field functionality as evidenced by extensive usage across the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for group_field usage to ensure consistent handling
rg "group_field" -A 3

Length of output: 66275

fiftyone/server/view.py (3)

617-619: LGTM: Added defensive range check

The addition of the range check before unpacking prevents potential KeyError exceptions.


644-647: LGTM: Proper datetime range handling

The addition of range check and proper timestamp conversion ensures robust datetime handling.


649-651: LGTM: Consistent range handling for numeric values

The addition of range check maintains consistency with other type handlers and prevents potential errors.

Comment on lines 34 to 37
const result = Object.fromEntries(
Object.entries(data).filter(([k, v]) => k !== "none" && Boolean(v))
);
return result.length ? result.join(", ") : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix incorrect Object manipulation logic

The current implementation has several issues:

  1. Object.fromEntries() creates an object, but then tries to use array methods (length, join)
  2. The filtering logic could be more efficient

Apply this fix:

-      const result = Object.fromEntries(
-        Object.entries(data).filter(([k, v]) => k !== "none" && Boolean(v))
-      );
-      return result.length ? result.join(", ") : null;
+      const filtered = Object.entries(data)
+        .filter(([k, v]) => k !== "none" && Boolean(v))
+        .map(([k]) => k);
+      return filtered.length ? filtered.join(", ") : null;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const result = Object.fromEntries(
Object.entries(data).filter(([k, v]) => k !== "none" && Boolean(v))
);
return result.length ? result.join(", ") : null;
const filtered = Object.entries(data)
.filter(([k, v]) => k !== "none" && Boolean(v))
.map(([k]) => k);
return filtered.length ? filtered.join(", ") : null;

@@ -317,7 +317,7 @@ async def _do_async_query(
filter: t.Optional[t.Mapping[str, str]],
):
if isinstance(query, DistinctQuery):
if query.has_list and not query.filters:
if query.has_list:
return await _do_distinct_query(collection, query, filter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure about mongo, but distinct scans in most dbs are really expensive and slow... have you confirmed that relying more on distinct improves performance?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A distinct scan with high cardinality is not good, yes. I am not aware of an aggregation pipeline that can return the first N sorted results using a multikey index, though. But we should investigate more

[INT_FIELD, FRAME_NUMBER_FIELD, FRAME_SUPPORT_FIELD].includes(fieldType)
? "0a"
: bounds[1] - bounds[0] < 0.1
? "0.0000a"
: "0.00a"
);
return str === "NaN" ? v.toString() : str;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if str is not a number (null?), we just show it as null?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The formatter (numeral().format) returns an NaN string when it can't meaningfully format given the precision provided ("0.00a"). Punting on a better solution for now, this at least shows the value as opposed to NaN

Copy link
Contributor

@minhtuev minhtuev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment, otherwise LGTM

@benjaminpkane benjaminpkane merged commit bf1d29d into release/v1.1.0 Dec 2, 2024
13 checks passed
@benjaminpkane benjaminpkane deleted the bugfix/ne-none branch December 2, 2024 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants