Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue2798 - BOV Reader #2809

Merged
merged 30 commits into from
Jul 22, 2021
Merged

Issue2798 - BOV Reader #2809

merged 30 commits into from
Jul 22, 2021

Conversation

sgpearse
Copy link
Collaborator

@sgpearse sgpearse commented Jul 14, 2021

Fixes #2798 and fixes #2797 - bug6.bov and bug5.bov
Fixed by trimming white space after values are read, at lines 386, 420, and 463 in the _findToken template functions.

Fixes #2792 and #2790- bug4.bov and bug2.bov
Fixed by enforcing the DATA_ENDIAN value to be either BIG or LITTLE, and adding a sanity check that compares the grid size * data format size, to the size of the binary file being loaded.

@shaomeng The sanity check is in place, but I believe your bug report has the correctly sized data file for the header description. 128x512x256x8(double) = 134,217,728 bytes, which is the size of /glade/p/cisl/vast/vapor/Bugs/BOV_Bugs/Bx.bot.

#2791 Is not a bug - bug3.bov
This regards #2799, where the maximum timestep of DataStatus is not honored. FYI, the last values reported in a BOV file are registered as "true values". If you define TIME twice, the second one will overwrite the first one. This will be documented.

Fixes #2788 - bug1.bov
Fixed by correcting the count of bytes passed into _swapBytes()

Fixes #2789 and #2784 - Relative file paths
Fixed by prepending the BOV header file path to the data file path when realpath() could not be evaluated for the data file.

@sgpearse sgpearse marked this pull request as draft July 15, 2021 00:15
@sgpearse sgpearse marked this pull request as ready for review July 15, 2021 14:38
@sgpearse sgpearse requested review from shaomeng, StasJ and clyne and removed request for shaomeng July 15, 2021 14:54
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
@sgpearse sgpearse changed the title Issue2798 Issue2798 - BOV Reader Jul 19, 2021
lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@clyne clyne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still a few more things that need to be addressed.

@shaomeng
Copy link
Contributor

//! The following BOV tags are mandatory for Vapor to ingest data:
//! - DATA_FILE
//! - DATA_SIZE
//! - DATA_FORMAT
//!
//! The following BOV tags are optional:
//! - BRICK_ORIGIN
//! - BRICK_SIZE
//! - TIME
//! - VARIABLE

I suppose when the optional tags ain't provided, a default value will be assigned? If so, can you document what the default value is for each optional tag?

@sgpearse
Copy link
Collaborator Author

I suppose when the optional tags ain't provided, a default value will be assigned? If so, can you document what the default value is for each optional tag?

Sure, I've put it in the DCBOV's class description in the header file.

@sgpearse
Copy link
Collaborator Author

What about BYTE_OFFSET?

Added to the optional tag list.

Copy link
Contributor

@shaomeng shaomeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more Q in the file DCBOV.cpp:

  1. shouldn't the macro be WIN32 or _WIN32?

    #ifdef _WINDOWS

  2. When the initialize function called twice, will the previous BOVCollection object get lost?

    VAPOR/lib/vdc/DCBOV.cpp

    Lines 37 to 40 in e599b72

    int DCBOV::initialize(const vector<string> &paths, const std::vector<string> &options)
    {
    _bovCollection = new BOVCollection();
    int rc = _bovCollection->Initialize(paths);

lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
for (size_t k = min[2]; k <= max[2]; k++) {
size_t zOffset = _gridSize[0] * _gridSize[1] * k;
for (size_t j = min[1]; j <= max[1]; j++) {
size_t xOffset = min[0];
size_t yOffset = _gridSize[0] * j;
size_t offset = formatSize * (xOffset + yOffset + zOffset) + _byteOffset;

fseek(fp, offset, SEEK_SET);
size_t rc = fread(readBuffer, formatSize, count, fp);
size_t rc = fseek(fp, offset, SEEK_SET);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scott, the function fseek returns a type of signed int. Please, don't do a signed -> unsigned conversion, please.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after you fix this, please check other function return types, and make sure you don't make unintended conversions between signed and unsigned integers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Fixed. Pushing soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scott, 5 lines down, fread() will return a size_t type. Can you declare a separate variable with the proper type instead of re-using the same variable?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

include/vapor/DCBOV.h Outdated Show resolved Hide resolved
@sgpearse
Copy link
Collaborator Author

sgpearse commented Jul 21, 2021

@shaomeng regarding your two questions in DCBOV:

shouldn't the macro be WIN32 or _WIN32?

I'm not sure, but this is how the other DC classes are configuring themselves for windows.

When the initialize function called twice, will the previous BOVCollection object get lost?

@clyne may have some information on this, but my understanding is that each DC class (DCBOV, DCCF, DCWRF, etc) has a single "Collection" object that only gets initialized once.

@sgpearse
Copy link
Collaborator Author

I suppose when the optional tags ain't provided, a default value will be assigned? If so, can you document what the default value is for each optional tag?

I've added the defaults for the optional tokens in the DCBOV.h documentation.

@sgpearse sgpearse requested review from clyne and shaomeng July 21, 2021 21:14
@clyne
Copy link
Collaborator

clyne commented Jul 21, 2021

@shaomeng regarding your two questions in DCBOV:

shouldn't the macro be WIN32 or _WIN32?

I'm not sure, but this is how the other DC classes are configuring themselves for windows.

When the initialize function called twice, will the previous BOVCollection object get lost?

@clyne may have some information on this, but my understanding is that each DC class (DCBOV, DCCF, DCWRF, etc) has a single "Collection" object that only gets initialized once.

Initialize() will reset the state each time it is called. That is the intended behavior.

@clyne
Copy link
Collaborator

clyne commented Jul 21, 2021

@shaomeng have all of your concerns been addressed here so we can accept this?

@shaomeng
Copy link
Contributor

When the initialize function called twice, will the previous BOVCollection object get lost?

@clyne may have some information on this, but my understanding is that each DC class (DCBOV, DCCF, DCWRF, etc) has a single "Collection" object that only gets initialized once.

The answer is yes. When initialize is called the 2nd time, line 39 will create a new object and the old object is lost, resulting in a memory leak. Please do one of the following:

  1. properly delete the old object,
  2. simply wrap the pointer in a unique_ptr so you don't need to worry about it anymore.

@shaomeng
Copy link
Contributor

@shaomeng have all of your concerns been addressed here so we can accept this?

No, not addressed yet.

Copy link
Contributor

@shaomeng shaomeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q to Scott: I cannot understand the logic of using two variables to store the same token, and what problems that the logic tries to catch. Can you maybe use _tmpDataFormat and _dataFormat as an example to illustrate a problem that they will catch?

lib/vdc/BOVCollection.cpp Outdated Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Show resolved Hide resolved
lib/vdc/BOVCollection.cpp Show resolved Hide resolved
@sgpearse
Copy link
Collaborator Author

Q to Scott: I cannot understand the logic of using two variables to store the same token, and what problems that the logic tries to catch. Can you maybe use _tmpDataFormat and _dataFormat as an example to illustrate a problem that they will catch?

The _tmp* values need to be consistent across all BOV files. IE - We can't have a single dataset that has DATA_FORMAT headers of both FLOAT and DOUBLE. They need to be consistent among datasets.

When we read each BOV header, we assign a temporary variable (IE - _tmpDataFormat) to the current file being parsed. After parsing, we do validation. We check if this variable has already been assigned. If not, we assign the _tmpDataFormat to _dataFormat. If it has been assigned, and it disagrees with a previous file, we throw an error.

@shaomeng
Copy link
Contributor

Q to Scott: I cannot understand the logic of using two variables to store the same token, and what problems that the logic tries to catch. Can you maybe use _tmpDataFormat and _dataFormat as an example to illustrate a problem that they will catch?

The _tmp* values need to be consistent across all BOV files. IE - We can't have a single dataset that has DATA_FORMAT headers of both FLOAT and DOUBLE. They need to be consistent among datasets.

When we read each BOV header, we assign a temporary variable (IE - _tmpDataFormat) to the current file being parsed. After parsing, we do validation. We check if this variable has already been assigned. If not, we assign the _tmpDataFormat to _dataFormat. If it has been assigned, and it disagrees with a previous file, we throw an error.

Got it, thanks!

@sgpearse sgpearse merged commit 3ad55a6 into main Jul 22, 2021
@shaomeng shaomeng deleted the issue2798 branch July 22, 2021 04:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants