Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix lazy list #617

Merged
merged 4 commits into from
Jan 2, 2025
Merged

Fix lazy list #617

merged 4 commits into from
Jan 2, 2025

Conversation

mdellabitta
Copy link
Contributor

@mdellabitta mdellabitta commented Jan 2, 2025

Important

Centralizes iter function for compressed file iteration and adds txtFilter for .txt files, removing redundant code across multiple harvesters.

  • Refactoring:
    • Centralized iter function for ZipInputStream and TarInputStream in FileHarvester.scala.
    • Removed individual iter implementations from CommunityWebsHarvester.scala, DlgFileHarvester.scala, DplaJsonlFileHarvester.scala, and 10 other files.
  • File Filters:
    • Added txtFilter to FileFilters.scala for filtering .txt files.
  • Miscellaneous:
    • Updated SiFileHarvester.scala to use FileFilters.txtFilter instead of a custom filter class.

This description was created by Ellipsis for 7f024b3. It will automatically update as commits are pushed.

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Reviewed everything up to c133be6 in 2 minutes and 12 seconds

More details
  • Looked at 889 lines of code in 16 files
  • Skipped 0 files when reviewing.
  • Skipped posting 2 drafted comments based on config settings.
1. src/main/scala/dpla/ingestion3/harvesters/file/FileHarvester.scala:86
  • Draft comment:
    Consider adding exception handling for IOExceptions that may occur when reading entries from the ZipInputStream or TarInputStream to prevent potential crashes.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable:
    The code does have some error handling with Try, but not for all IO operations. However, these methods return LazyList which means exceptions would propagate to the caller naturally. The caller (handleFile method) already takes a Try[Int] return type, suggesting there's likely error handling at that level. Without seeing the full error handling strategy across the codebase, suggesting local exception handling here could be redundant or conflict with the existing approach.
    I might be missing critical error cases that could crash the application. The existing error handling might not be sufficient for all failure scenarios.
    While error handling is important, suggesting local exception handling without understanding the broader error handling strategy could lead to redundant or conflicting approaches. The existing Try pattern suggests errors are handled at a higher level.
    Delete the comment as it makes assumptions about the error handling strategy without full context, and the code already shows signs of deliberate error handling patterns.
2. src/main/scala/dpla/ingestion3/harvesters/file/NorthwestHeritageFileHarvester.scala:42
  • Draft comment:
    The match statement in getInputStream is redundant since files are already filtered by zipFilter. Consider simplifying this method.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The getInputStream method in multiple files (e.g., NorthwestHeritageFileHarvester, OaiFileHarvester, etc.) uses a match statement to check the file extension. However, since the files are already filtered by the zipFilter, this match statement is redundant and can be simplified.

Workflow ID: wflow_2jJ4uqp5qqtm3ml2


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

Copy link

sonarqubecloud bot commented Jan 2, 2025

Quality Gate Failed Quality Gate failed

Failed conditions
35.7% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

Copy link

@ellipsis-dev ellipsis-dev bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks good to me! Incremental review on 7f024b3 in 1 minute and 4 seconds

More details
  • Looked at 987 lines of code in 17 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 drafted comments based on config settings.
1. src/main/scala/dpla/ingestion3/harvesters/oai/LocalOaiHarvester.scala:4
  • Draft comment:
    Consider moving ParsedResult to a more common package if it's used across different harvesters, not just file-based ones.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The writeOut method in LocalHarvester is being used in multiple places, but the ParsedResult class is imported from file package. This could lead to confusion or tight coupling between unrelated components.
2. src/main/scala/dpla/ingestion3/harvesters/file/CommunityWebsHarvester.scala:100
  • Draft comment:
    Consider moving ParsedResult to a more common package if it's used across different harvesters, not just file-based ones.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The writeOut method in LocalHarvester is being used in multiple places, but the ParsedResult class is imported from file package. This could lead to confusion or tight coupling between unrelated components.
3. src/main/scala/dpla/ingestion3/harvesters/file/DlgFileHarvester.scala:105
  • Draft comment:
    Consider moving ParsedResult to a more common package if it's used across different harvesters, not just file-based ones.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The writeOut method in LocalHarvester is being used in multiple places, but the ParsedResult class is imported from file package. This could lead to confusion or tight coupling between unrelated components.
4. src/main/scala/dpla/ingestion3/harvesters/file/DplaJsonlFileHarvester.scala:107
  • Draft comment:
    Consider moving ParsedResult to a more common package if it's used across different harvesters, not just file-based ones.
  • Reason this comment was not posted:
    Confidence changes required: 50%
    The writeOut method in LocalHarvester is being used in multiple places, but the ParsedResult class is imported from file package. This could lead to confusion or tight coupling between unrelated components.

Workflow ID: wflow_nh918qknGCNMeouE


You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet mode, and more.

@mdellabitta mdellabitta merged commit 7b10c08 into main Jan 2, 2025
3 checks passed
@mdellabitta mdellabitta deleted the fix-lazy-list branch January 2, 2025 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant