-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Splitting up #2769 to diagnose CI issues [WIP] #2773
Conversation
This pull request has been linked to Shortcut Story #12282: Expand R-UDF validation coverage: make sure essential API calls work. |
b3d328a
to
3071939
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm all in favor of converting the dump function to C++ streams. I am far less in favor of having them implemented side by side with C I/O. We only have dump functions publicly visible for four of the schema types: array_schema, attribute, domain, dimension. The API functions themselves (in tiledb.cc) are the place to do the conversion; everything else can be done with streams. That means eliminating all dump
functions with FILE *
arguments in array schema classes; the ones in the API (obviously) have to stay.
fprintf(out, "%s", ss.str().c_str()); | ||
} | ||
|
||
void Dimension::dump_ss(std::stringstream& ss) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- If we were keeping the old
dump
function, there would be no need for a different function name. Argument-dependent resolution would work just fine. - There's no need to specialize for string streams.
std::ostream
will work just as well.
@@ -76,6 +76,9 @@ class Add1InPlace : public tiledb::sm::Filter { | |||
void dump(FILE* out) const override { | |||
(void)out; | |||
} | |||
void dump_ss(std::stringstream& ss) const override { | |||
(void)ss; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you simply omit the argument name ss
, leaving the type, you don't need to dummy out a value to avoid an unused argument warning.
@@ -222,23 +222,29 @@ const Range& Dimension::domain() const { | |||
void Dimension::dump(FILE* out) const { | |||
if (out == nullptr) | |||
out = stdout; | |||
std::stringstream ss; | |||
dump_ss(ss); | |||
fprintf(out, "%s", ss.str().c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- No need for
fprintf
. A call tofwrite
is all that's needed. - Should call
fflush
at the end, since this is the top-level call.
* @return `TILEDB_OK` for success and `TILEDB_ERR` for error. | ||
*/ | ||
TILEDB_EXPORT int32_t tiledb_array_schema_dump_str( | ||
tiledb_ctx_t* ctx, const tiledb_array_schema_t* array_schema, char** out); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API has a security defect. There's no maximum length specified.
Personally, I'd rather not have our C API writing directly in user-allocated buffers. That pattern has created lots of systemic mischief (as with user-allocated query buffers). Better would be to have an API for strings with an alloc/free model and have the library manage the memory. We don't have such library-allocated strings ready today, however. (We're needing one already for better error reporting and for delivering warnings, for example.)
We will shortly, with #2748, have experimental features available. This API call could be put into experimental/
and used immediately pending a library-allocated string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @eric-hughes-tiledb -- I should add a comment, it does not write into user-allocated buffers. It allocates. (I was, perhaps naively, following the patter of tiledb_stats_raw_dump
.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the deallocation handled?
Regardless of how it's handled, if there's an allocation here, the documentation for this function needs to (1) state that, and (2) say how to deallocate.
@eric-hughes-tiledb indeed. This draft PR is the outcome of a planning conversation with @ihnorton -- the problem at hand being, how to make Jupyter-notebook display of One choice is doing this re-work in core -- which, as you point out, is open-ended. There are more spots to convert from Another choice is that I abandon this PR, and imitate what Initially I had thought that the first option -- this PR -- was the more elegant. But perhaps I'm mistaken; perhaps I should go with option two in Thoughts? |
The bulk of what you've got here, to use C++ streams instead of FILE, is worth doing regardless. If you scope the work to only deal with array schema classes, that would be fine. We're concentrating on array schema right now for other reasons; it's fine to limit work to this boundary. The only question I've got is how we're returning internally-allocated objects through the C API. I've got a plan to keep a registry of all objects that we return this way (of any type, from any source), but that project is not quite ripe yet. (It's for lifespan management, detection of C client leaks, etc.) It's the reason I suggested putting the new API function into The much, much larger issue is the best way to implement this is with a generic schema walker that uses the visitor pattern. (I've got an independent reason for wanting this.) With such an interface available, you could implement your visitor completely externally and print out the schema however you wanted, all without hard-coding a particular dump format in the core library. |
I should also mention that actually implementing this would force awareness of something that's not on the surface of the whole issue of dumping schemas: the schema is version dependent. In general, each schema version needs a separate schema dump. Formats don't change rapidly, so it's possible to have a single piece of code that dumps multiple versions, but that's not the point. Older format versions have implicit defaults (such as for nullable attributes). The "schema" that's dumped is actually two slightly different things: (1) the versioned schema as it sits in storage, and (2) the schema in memory that's upgraded from what's in storage. Getting a schema dump may be for a purpose of either (1) or (2) or both, so if there's a single dump function, it should have both kinds of information in it. Perhaps-better is a dump facility that allows all three of these possibilities. |
This will be rolled in to other C API improvements, thanks for getting the ball rolling here. |
The array-schema dump method writes to
stdout
which is fine for many uses. In notebook contexts, though, it isn't fine --stdout
does not route to anything the notebook user can see. Notebook display ofarr.schema
works for Python because the Python library works around this by rolling its own array-schema dumper in its__repr__
method forArraySchema
here. Notebook display ofschema(arr)
does not work for R because the R library, in itsshow
generic forArraySchema
, directly invokes the core function which prints to stdout here. Rather than taking the roll-another-of-our-own approach inTileDB-R
(which could be done), on this PR we address the issue at the core level.On #2769 I am getting too many CI errors for me to understand. Here I'm starting the commits bottom-up: filters, then dimensions, then domains, then attributes, then array schema at
c_api
/cpp_api
, then array schema at topsm
level.TYPE: IMPROVEMENT
DESC: Create array-schema-to-string methods in support of notebook display of array schemas