Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for StringIO #295

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Support for StringIO #295

wants to merge 1 commit into from

Conversation

awicenec
Copy link
Contributor

@awicenec awicenec commented Nov 6, 2024

JIRA Ticket

liu-418

Type

  • Feature (addition)
  • Bug fix
  • Refactor (change)
  • Documentation

Problem/Issue

Provide ability to read and write plain strings if requested by the user.

Solution

The basic solution is to use StringIO rather than BytesIO if requested by the user, but the IO implementation of DALiuGE is quite complex and requires a number of changes to make this happen. As a side-effect the functionality to provide data for a memory drop through the pydata field is now sort of type-safe and obeying the type specified in the pull-down menu. Moreover, when copying from one drop to another the original type of the data is preserved. As a side effect, when copying from one MemoryDrop to another the types of those drops need to match. This is a bit of a hidden feature right now and can lead to unexpected graph execution failures, since the way to specify the type of the target MemoryDrop is by using the pydata type pull-down menu, but leaving the pydata value empty. We might want to think abut a more explicit way of specifying the type of a MemoryDrop in the future.

Checklist

No specific documentation has been added.

  • Unittests added
    • The existing unit tests are covering a good part of this functionality already. We should probably add a specific graph for the more edge cases.
  • Documentation added
    • In particular the 'type checking' side effect should be documented, but since we should probably add an explicit type to DataDROPs more generally that will be the right time to document the whole story.

Summary by Sourcery

Add support for StringIO to enable reading and writing of plain strings, enhancing type safety and data type preservation during data operations.

New Features:

  • Introduce support for reading and writing plain strings using StringIO instead of BytesIO when specified by the user.

Enhancements:

  • Ensure type safety and preservation of data types when copying data between drops, with a focus on maintaining the original data type.

Copy link
Contributor

sourcery-ai bot commented Nov 6, 2024

Reviewer's Guide by Sourcery

This PR adds support for StringIO to handle string data alongside BytesIO in the DALiuGE engine. The implementation includes type-safe handling of pydata fields, preservation of data types when copying between drops, and UTF-8 string handling capabilities. The changes span across multiple components including data drops, IO handling, and data parsing.

Sequence diagram for data reading and writing with StringIO

sequenceDiagram
    participant User
    participant DataDROP
    participant DataIO
    participant StringIO

    User->>DataDROP: Request to read data
    DataDROP->>DataIO: getIO()
    DataIO->>StringIO: open(OpenMode.OPEN_READ)
    StringIO-->>DataIO: Return data
    DataIO-->>DataDROP: Return data
    DataDROP-->>User: Return data

    User->>DataDROP: Request to write data
    DataDROP->>DataIO: getIO()
    DataIO->>StringIO: open(OpenMode.OPEN_WRITE)
    StringIO-->>DataIO: Write data
    DataIO-->>DataDROP: Confirm write
    DataDROP-->>User: Confirm write
Loading

Class diagram for updated IO handling

classDiagram
    class DataIO {
        +open(mode: OpenMode)
        +read(count: int)
        +write(data)
        +close()
    }
    class MemoryIO {
        +_open()
        +_write(data)
        +exists()
        +delete()
    }
    class FileIO {
        +_read(count: int)
        +_write(data)
    }
    DataIO <|-- MemoryIO
    DataIO <|-- FileIO

    class DropParser {
        <<enumeration>>
        +PATH
        +DATAURL
        +UTF8
    }

    class DropLoader {
        +load_utf8(drop: DataDROP)
    }
Loading

File-Level Changes

Change Details Files
Added StringIO support in memory drops with type safety
  • Added string type handling in parse_pydata function
  • Modified initialize method to use StringIO based on pydata type
  • Added type checking when copying between MemoryDrops
  • Added handling for empty pydata values
daliuge-engine/dlg/data/drops/memory.py
Enhanced IO operations with string handling capabilities
  • Added string to bytes conversion for checksum calculation
  • Added StringIO support in _open method
  • Added UTF-8 encoding for string data in file operations
daliuge-engine/dlg/data/drops/data_base.py
daliuge-engine/dlg/data/io.py
Added UTF-8 parsing and loading functionality
  • Added UTF8 enum to DropParser
  • Implemented load_utf8 function for string data loading
  • Added UTF-8 handling in write_results method
daliuge-engine/dlg/named_port_utils.py
daliuge-engine/dlg/drop_loaders.py
daliuge-engine/dlg/apps/pyfunc.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time. You can also use
    this command to specify where the summary should be inserted.

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@awicenec awicenec requested a review from myxie November 6, 2024 04:36
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @awicenec - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Please add unit tests covering the new StringIO functionality and type checking behavior between MemoryDrops to prevent potential regressions.
  • The new type checking behavior between MemoryDrops needs to be documented to prevent unexpected runtime failures. Please add documentation explaining this functionality to users.
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

daliuge-engine/dlg/data/drops/memory.py Show resolved Hide resolved
if not isinstance(r, (bytes, memoryview, str)):
try:
r = str(r)
except: # not sure what the exception would be...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Avoid bare except clause

Bare except clauses can mask important errors. Consider catching specific exceptions like ValueError or TypeError.

Comment on lines +716 to +718
raise Exception(
"Can't write data of this type: ", type(r).__name__
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Raise a specific error instead of the general Exception or BaseException (raise-specific-error)

ExplanationIf a piece of code raises a specific exception type rather than the generic [`BaseException`](https://docs.python.org/3/library/exceptions.html#BaseException) or [`Exception`](https://docs.python.org/3/library/exceptions.html#Exception), the calling code can:
  • get more information about what type of error it is
  • define specific exception handling for it

This way, callers of the code can handle the error appropriately.

How can you solve this?

So instead of having code raising Exception or BaseException like

if incorrect_input(value):
    raise Exception("The input is incorrect")

you can have code raising a specific error like

if incorrect_input(value):
    raise ValueError("The input is incorrect")

or

class IncorrectInputError(Exception):
    pass


if incorrect_input(value):
    raise IncorrectInputError("The input is incorrect")

Comment on lines +725 to +727
raise Exception(
"Can't write data of this type: ", type(r).__name__
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (code-quality): Raise a specific error instead of the general Exception or BaseException (raise-specific-error)

ExplanationIf a piece of code raises a specific exception type rather than the generic [`BaseException`](https://docs.python.org/3/library/exceptions.html#BaseException) or [`Exception`](https://docs.python.org/3/library/exceptions.html#Exception), the calling code can:
  • get more information about what type of error it is
  • define specific exception handling for it

This way, callers of the code can handle the error appropriately.

How can you solve this?

So instead of having code raising Exception or BaseException like

if incorrect_input(value):
    raise Exception("The input is incorrect")

you can have code raising a specific error like

if incorrect_input(value):
    raise ValueError("The input is incorrect")

or

class IncorrectInputError(Exception):
    pass


if incorrect_input(value):
    raise IncorrectInputError("The input is incorrect")

@coveralls
Copy link

Coverage Status

coverage: 79.663% (-0.09%) from 79.752%
when pulling 208acd5 on liu-418
into c09b85e on master.

Copy link
Collaborator

@myxie myxie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @awicenec, I've added some suggestions for the Exceptions. Everything looks good.

if not isinstance(r, (bytes, memoryview, str)):
try:
r = str(r)
except: # not sure what the exception would be...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's quite unlikely that this lead to a exception/error, but my money would be a ValueError; this is the parent exception of UnicodeError which is probably the most likely error to be raised here.

A secondary, catch-all error I would raise here might be RuntimeError.

@@ -712,6 +712,19 @@ def write_results(self, result):
elif self.output_parser is DropParser.NPY:
drop_loaders.save_npy(o, r)
elif self.output_parser is DropParser.RAW:
if not isinstance(r, (bytes, memoryview, str)):
raise Exception(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these exceptions should be marked as TypeErrors.

r = str(r)
except: # not sure what the exception would be...
raise Exception(
"Can't write data of this type: ", type(r).__name__
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exception message is not technically correct - this exception is being raised because we can't convert r to type string.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we don't want to convert to string here, we just want to make sure that we can write what was supplied on that port and is has to be one of these three types, without conversion, that is the whole point of 'RAW'.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the case, doesn't that make the r = str(r) conversion redundant? Either we want to raise the error because it isn't one of the three types it has to be, or we are happy converting it to str.

Copy link
Collaborator

@myxie myxie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@awicenec the code is fine as it is if you'd like to get these changes merged. I think we can update the exception handling at a later date.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants