-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for StringIO #295
base: master
Are you sure you want to change the base?
Support for StringIO #295
Conversation
Reviewer's Guide by SourceryThis PR adds support for StringIO to handle string data alongside BytesIO in the DALiuGE engine. The implementation includes type-safe handling of pydata fields, preservation of data types when copying between drops, and UTF-8 string handling capabilities. The changes span across multiple components including data drops, IO handling, and data parsing. Sequence diagram for data reading and writing with StringIOsequenceDiagram
participant User
participant DataDROP
participant DataIO
participant StringIO
User->>DataDROP: Request to read data
DataDROP->>DataIO: getIO()
DataIO->>StringIO: open(OpenMode.OPEN_READ)
StringIO-->>DataIO: Return data
DataIO-->>DataDROP: Return data
DataDROP-->>User: Return data
User->>DataDROP: Request to write data
DataDROP->>DataIO: getIO()
DataIO->>StringIO: open(OpenMode.OPEN_WRITE)
StringIO-->>DataIO: Write data
DataIO-->>DataDROP: Confirm write
DataDROP-->>User: Confirm write
Class diagram for updated IO handlingclassDiagram
class DataIO {
+open(mode: OpenMode)
+read(count: int)
+write(data)
+close()
}
class MemoryIO {
+_open()
+_write(data)
+exists()
+delete()
}
class FileIO {
+_read(count: int)
+_write(data)
}
DataIO <|-- MemoryIO
DataIO <|-- FileIO
class DropParser {
<<enumeration>>
+PATH
+DATAURL
+UTF8
}
class DropLoader {
+load_utf8(drop: DataDROP)
}
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @awicenec - I've reviewed your changes - here's some feedback:
Overall Comments:
- Please add unit tests covering the new StringIO functionality and type checking behavior between MemoryDrops to prevent potential regressions.
- The new type checking behavior between MemoryDrops needs to be documented to prevent unexpected runtime failures. Please add documentation explaining this functionality to users.
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
if not isinstance(r, (bytes, memoryview, str)): | ||
try: | ||
r = str(r) | ||
except: # not sure what the exception would be... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (bug_risk): Avoid bare except clause
Bare except clauses can mask important errors. Consider catching specific exceptions like ValueError or TypeError.
raise Exception( | ||
"Can't write data of this type: ", type(r).__name__ | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): Raise a specific error instead of the general Exception
or BaseException
(raise-specific-error
)
Explanation
If a piece of code raises a specific exception type rather than the generic [`BaseException`](https://docs.python.org/3/library/exceptions.html#BaseException) or [`Exception`](https://docs.python.org/3/library/exceptions.html#Exception), the calling code can:- get more information about what type of error it is
- define specific exception handling for it
This way, callers of the code can handle the error appropriately.
How can you solve this?
- Use one of the built-in exceptions of the standard library.
- Define your own error class that subclasses
Exception
.
So instead of having code raising Exception
or BaseException
like
if incorrect_input(value):
raise Exception("The input is incorrect")
you can have code raising a specific error like
if incorrect_input(value):
raise ValueError("The input is incorrect")
or
class IncorrectInputError(Exception):
pass
if incorrect_input(value):
raise IncorrectInputError("The input is incorrect")
raise Exception( | ||
"Can't write data of this type: ", type(r).__name__ | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue (code-quality): Raise a specific error instead of the general Exception
or BaseException
(raise-specific-error
)
Explanation
If a piece of code raises a specific exception type rather than the generic [`BaseException`](https://docs.python.org/3/library/exceptions.html#BaseException) or [`Exception`](https://docs.python.org/3/library/exceptions.html#Exception), the calling code can:- get more information about what type of error it is
- define specific exception handling for it
This way, callers of the code can handle the error appropriately.
How can you solve this?
- Use one of the built-in exceptions of the standard library.
- Define your own error class that subclasses
Exception
.
So instead of having code raising Exception
or BaseException
like
if incorrect_input(value):
raise Exception("The input is incorrect")
you can have code raising a specific error like
if incorrect_input(value):
raise ValueError("The input is incorrect")
or
class IncorrectInputError(Exception):
pass
if incorrect_input(value):
raise IncorrectInputError("The input is incorrect")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @awicenec, I've added some suggestions for the Exceptions. Everything looks good.
if not isinstance(r, (bytes, memoryview, str)): | ||
try: | ||
r = str(r) | ||
except: # not sure what the exception would be... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's quite unlikely that this lead to a exception/error, but my money would be a ValueError
; this is the parent exception of UnicodeError
which is probably the most likely error to be raised here.
A secondary, catch-all error I would raise here might be RuntimeError
.
@@ -712,6 +712,19 @@ def write_results(self, result): | |||
elif self.output_parser is DropParser.NPY: | |||
drop_loaders.save_npy(o, r) | |||
elif self.output_parser is DropParser.RAW: | |||
if not isinstance(r, (bytes, memoryview, str)): | |||
raise Exception( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these exceptions should be marked as TypeErrors.
r = str(r) | ||
except: # not sure what the exception would be... | ||
raise Exception( | ||
"Can't write data of this type: ", type(r).__name__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This exception message is not technically correct - this exception is being raised because we can't convert r
to type string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we don't want to convert to string here, we just want to make sure that we can write what was supplied on that port and is has to be one of these three types, without conversion, that is the whole point of 'RAW'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that is the case, doesn't that make the r = str(r)
conversion redundant? Either we want to raise the error because it isn't one of the three types it has to be, or we are happy converting it to str.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@awicenec the code is fine as it is if you'd like to get these changes merged. I think we can update the exception handling at a later date.
JIRA Ticket
liu-418
Type
Problem/Issue
Provide ability to read and write plain strings if requested by the user.
Solution
The basic solution is to use StringIO rather than BytesIO if requested by the user, but the IO implementation of DALiuGE is quite complex and requires a number of changes to make this happen. As a side-effect the functionality to provide data for a memory drop through the pydata field is now sort of type-safe and obeying the type specified in the pull-down menu. Moreover, when copying from one drop to another the original type of the data is preserved. As a side effect, when copying from one MemoryDrop to another the types of those drops need to match. This is a bit of a hidden feature right now and can lead to unexpected graph execution failures, since the way to specify the type of the target MemoryDrop is by using the pydata type pull-down menu, but leaving the pydata value empty. We might want to think abut a more explicit way of specifying the type of a MemoryDrop in the future.
Checklist
No specific documentation has been added.
Summary by Sourcery
Add support for StringIO to enable reading and writing of plain strings, enhancing type safety and data type preservation during data operations.
New Features:
Enhancements: