Adding support for jsonlines #35
Conversation
Walkthrough
This update includes versioning and debug logging adjustments, along with new functionality for data handling.
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #35      +/-   ##
==========================================
+ Coverage   87.74%   87.93%   +0.19%
==========================================
  Files          13       13
  Lines         563      572       +9
  Branches      141      141
==========================================
+ Hits          494      503       +9
  Misses         51       51
  Partials       18       18
Actionable comments posted: 1
Outside diff range and nitpick comments (5)
src/hckr/utils/DataUtils.py (1)
Line range hint 43-43: Rename unused loop variable.
The variable index is unused in the loop and can be renamed to _ to indicate it is intentionally unused.
- for index, row in df.head(count).iterrows():
+ for _, row in df.head(count).iterrows():
Tools
Ruff
8-8: json imported but unused (F401)
Remove unused import: json
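The rename suggested above can be sketched without pandas. This is a minimal illustration, assuming a hypothetical list of (index, row) pairs standing in for what df.head(count).iterrows() yields; the data is invented for the example.

```python
# Hypothetical (index, row) pairs standing in for what
# DataFrame.iterrows() yields; the values are illustrative only.
rows = [(0, {"name": "a"}), (1, {"name": "b"})]

names = []
for _, row in rows:  # "_" signals that the index is intentionally ignored
    names.append(row["name"])

print(names)
```

Using _ keeps the tuple unpacking intact while telling both readers and linters that the first element is not needed.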
tests/cli/data/test_peek.py (2)
Line range hint 3-5: Remove unused imports.
The imports for pyarrow and pyarrow.parquet are not used in this file and should be removed to clean up the code.
- import pyarrow as pa # type: ignore
- from pyarrow import parquet as pq # type: ignore
Tools
Ruff
63-63: f-string without any placeholders (F541)
Remove extraneous f prefix
Line range hint 30-30: Remove extraneous f-string prefixes.
Several f-string prefixes are used without placeholders; they are unnecessary and should be removed.
- print(f"Reading from file :{FILE}")
+ print("Reading from file :", FILE)
- print(f"=" * 50)
+ print("=" * 50)
- print(f"Running for {_format}")
+ print("Running for", _format)
Also applies to: 37-37, 52-52, 63-63
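A short sketch of what F541 is about, assuming a hypothetical FILE value. It also shows that passing a value as a second print argument inserts a separator space, so that form is not byte-for-byte equivalent to the f-string with a placeholder.

```python
import contextlib
import io

FILE = "example.json"  # hypothetical value, for illustration only

# No placeholders: the f prefix has no effect, which is what F541 reports.
plain = "=" * 50
with_prefix = f"=" * 50

# With a placeholder, the f prefix is doing real work.
message = f"Reading from file :{FILE}"

# print() with two arguments joins them with a space by default.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("Reading from file :", FILE)
printed = buffer.getvalue()
```

If identical output matters, keeping the f-string where a placeholder exists and dropping the prefix only where there is none is the safer change.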
Tools
Ruff
63-63: f-string without any placeholders (F541)
Remove extraneous f prefix
src/hckr/cli/data.py (2)
Line range hint 2-2: Remove unused imports.
The imports for logging and cron_descriptor.get_description are not used in this file and should be removed to clean up the code.
- import logging
- from cron_descriptor import get_description # type: ignore
Also applies to: 9-9
Line range hint 103-103: Remove unnecessary open mode parameters.
The open mode parameters are unnecessary and should be removed to adhere to best practices and avoid potential bugs.
- with open(output, "wb") as out:
+ with open(output) as out:
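Before applying a change like the one above, it is worth checking which mode the code actually needs. The sketch below, using a hypothetical temporary file, shows that open() defaults to read-only text mode, so only a read mode such as "r" is redundant; a write mode such as "wb" is not.

```python
import io
import os
import tempfile

# Hypothetical output path, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "out.bin")

# "wb" is needed to create the file and write bytes to it.
with open(path, "wb") as out:
    out.write(b"\x00\x01")

with open(path, "rb") as handle:
    data = handle.read()

# The default mode is read-only text, so writing through it fails.
try:
    with open(path) as out:
        out.write("x")
    write_ok = True
except io.UnsupportedOperation:
    write_ok = False
```

Dropping the mode is therefore only safe when the original call was opening the file for reading in text mode.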
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (5)
- src/hckr/about.py (1 hunks)
- src/hckr/cli/data.py (2 hunks)
- src/hckr/utils/DataUtils.py (4 hunks)
- tests/cli/data/test_peek.py (2 hunks)
- tests/cli/resources/data/peek/json-lines.json (1 hunks)
Files skipped from review due to trivial changes (1)
- src/hckr/about.py
Additional context used
Biome
tests/cli/resources/data/peek/json-lines.json
[error] 1-2: End of file expected (parse)
Use an array for a sequence of values:
[1, 2]
[error] 2-3: End of file expected (parse)
Use an array for a sequence of values:
[1, 2]
Ruff
src/hckr/utils/DataUtils.py
8-8: json imported but unused (F401)
Remove unused import: json
43-43: Loop control variable index not used within loop body (B007)
Rename unused index to _index
tests/cli/data/test_peek.py
3-3: pyarrow imported but unused (F401)
Remove unused import: pyarrow
5-5: pyarrow.parquet imported but unused (F401)
Remove unused import: pyarrow.parquet
30-30: f-string without any placeholders (F541)
Remove extraneous f prefix
37-37: f-string without any placeholders (F541)
Remove extraneous f prefix
52-52: f-string without any placeholders (F541)
Remove extraneous f prefix
63-63: f-string without any placeholders (F541)
Remove extraneous f prefix
src/hckr/cli/data.py
2-2: logging imported but unused (F401)
Remove unused import: logging
9-9: cron_descriptor.get_description imported but unused (F401)
Remove unused import: cron_descriptor.get_description
103-103: Unnecessary open mode parameters (UP015)
Remove open mode parameters
Additional comments not posted (1)
tests/cli/resources/data/peek/json-lines.json (1)
1-3: Consider wrapping JSON entries in an array for compatibility.
The static analysis tool suggests wrapping these individual JSON objects in an array. This would typically be necessary if the file is meant to be read as a single JSON array rather than JSON Lines. Please confirm the intended use.
If the intention is to use JSON Lines format, then the current format is correct and the tool's error can be ignored.
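The difference between the two readings can be sketched with the standard library alone. The content below is hypothetical, not the actual test resource: each line parses on its own as JSON Lines, while the whole text is rejected as a single JSON document, which matches the kind of parse error a plain-JSON tool reports.

```python
import json

# Hypothetical JSON Lines content, for illustration only.
jsonl_text = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n'

# JSON Lines: every non-empty line is its own JSON document.
records = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

# Read as one JSON document, the same text is invalid because
# extra data follows the first object.
try:
    json.loads(jsonl_text)
    parses_as_single_document = True
except json.JSONDecodeError:
    parses_as_single_document = False
```

A reader that is told to expect one record per line handles this layout; a strict JSON parser does not.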
Verification successful
JSON Lines format is correctly used.
The application code correctly uses the pd.read_json method with the lines=True parameter, indicating that the JSON Lines format is intended and properly handled.
src/hckr/utils/DataUtils.py: df = pd.read_json(_file, lines=True)
The current format of the json-lines.json file is correct for JSON Lines and does not need to be wrapped in an array.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify if JSON Lines are being parsed correctly in the application.
# Test: Check for JSON Lines parsing logic in the application code.
rg --type python $'pd.read_json(.*lines=True.*)'
Length of output: 80
Script:
#!/bin/bash
# Description: Verify if JSON Lines are being parsed correctly in the application.
# Test: Check for JSON Lines parsing logic in the application code.
rg --type py 'pd.read_json\(.*lines=True.*\)'
Length of output: 120
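One difference between the two patterns is the escaping of the parentheses. The sketch below uses Python's re module purely for illustration, on the assumption that grouping parentheses behave the same way in this pattern as in ripgrep's own regex engine; the lookalike string is invented for the example.

```python
import re

line = "df = pd.read_json(_file, lines=True)"
lookalike = "pd.read_json lines=True"  # hypothetical text with no call parentheses

# Unescaped parentheses form a capture group rather than matching "(" and ")".
unescaped = re.compile(r"pd.read_json(.*lines=True.*)")
# Escaped parentheses match the literal characters of the call.
escaped = re.compile(r"pd.read_json\(.*lines=True.*\)")

unescaped_hits = [bool(unescaped.search(text)) for text in (line, lookalike)]
escaped_hits = [bool(escaped.search(text)) for text in (line, lookalike)]
```

Both patterns find the real call, but only the escaped form rejects text that lacks the literal parentheses.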
import fastavro
import pandas as pd
import rich
from pyarrow import parquet as pq # type: ignore
from rich.table import Table
import json
Remove unused import.
The json import is not used in this file and should be removed to clean up the code.
- import json