campaign issues #4061

Closed
pnorbert opened this issue Feb 29, 2024 · 6 comments

@pnorbert
Contributor

First, the first access to a campaign prints debug output that should not be in master:

$ bpls -la ~/.campaign-store/test_s3d_31.aca
hostname: LAP131864.ornl.gov
-- Retrieved from DB data of mmd.0 size = 599 compressed = 1 compressed size = 599 original size = 1432 blob = 0x55d3aecf0568
-- Retrieved from DB data of md.idx size = 11045 compressed = 1 compressed size = 11045 original size = 1844320 blob = 0x55d3aecf0df8
-- Retrieved from DB data of md.0 size = 27249181 compressed = 1 compressed size = 27249181 original size = 248832248 blob = 0x7fe0f629f018
-- Retrieved from DB data of profiling.json size = 303219 compressed = 1 compressed size = 303219 original size = 3458742 blob = 0x55d3aeef09a8
Errno was 111
  double   ../data/ptj.field.bp/pressure  31*{2560, 960, 3456} = 1.79769e+308 / 2.22507e-308
  double   ../data/ptj.field.bp/species   31*{19, 2560, 960, 3456} = 1.79769e+308 / 2.22507e-308
  double   ../data/ptj.field.bp/temp      31*{2560, 960, 3456} = 1.79769e+308 / 2.22507e-308
  double   ../data/ptj.field.bp/uvel      31*{2560, 960, 3456} = 1.79769e+308 / 2.22507e-308
  double   ../data/ptj.field.bp/vvel      31*{2560, 960, 3456} = 1.79769e+308 / 2.22507e-308
  double   ../data/ptj.field.bp/wvel      31*{2560, 960, 3456} = 1.79769e+308 / 2.22507e-308

Second, a strange "Errno was 111" line is printed, but it is not obvious from the output that an actual error occurred.

adios@LAP131864:~/test/demo.uk_meeting$ bpls -la ~/.campaign-store/test_s3d_31.aca
hostname: LAP131864.ornl.gov
Errno was 111
  double   ../data/ptj.field.bp/pressure  31*{2560, 960, 3456} = 1.79769e+308 / 2.22507e-308
  double   ../data/ptj.field.bp/species   31*{19, 2560, 960, 3456} = 1.79769e+308 / 2.22507e-308
  double   ../data/ptj.field.bp/temp      31*{2560, 960, 3456} = 1.79769e+308 / 2.22507e-308
  double   ../data/ptj.field.bp/uvel      31*{2560, 960, 3456} = 1.79769e+308 / 2.22507e-308
  double   ../data/ptj.field.bp/vvel      31*{2560, 960, 3456} = 1.79769e+308 / 2.22507e-308
  double   ../data/ptj.field.bp/wvel      31*{2560, 960, 3456} = 1.79769e+308 / 2.22507e-308

Third, adios2_campaign_manager.py has too many arguments: -p, -a and -s should be replaced by a single full name. Support for a full path to a campaign file would also be nice, so that it works like bpls in the example above.

@pnorbert pnorbert added this to the v2.10.0 milestone Feb 29, 2024
@pnorbert
Contributor Author

PR #4048 can be added to release 2.10 only if the first two issues are fixed.

@pnorbert
Contributor Author

pnorbert commented Mar 1, 2024

The second issue, about Errno 111, is that the remote server cannot be reached (I forgot to launch it). We need a good error message here.
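
For reference, errno 111 on common Linux platforms is ECONNREFUSED, which is consistent with the server not running. The following is a minimal sketch of how a bare errno value could be turned into a readable message. `describe_connect_failure` is a hypothetical helper for illustration only, not an ADIOS2 or EVPath function, and the host and port are placeholders.

```cpp
#include <cerrno>
#include <cstring>
#include <string>

// Hypothetical helper (not part of ADIOS2 or EVPath): builds a readable
// message from an errno value instead of printing only the bare number.
std::string describe_connect_failure(const std::string &host, int port, int err)
{
    return "connection to remote server " + host + ":" + std::to_string(port) +
           " failed: " + std::strerror(err) + " (errno " + std::to_string(err) + ")";
}
```

Calling `describe_connect_failure("localhost", 12345, ECONNREFUSED)` would produce a message that names the host, the port, the system's description of the error, and the numeric code, which is more useful than "Errno was 111" on its own.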

@dmitry-ganyushin
Contributor

> The second issue, about Errno 111, is that the remote server cannot be reached (I forgot to launch it). We need a good error message here.

This comes from evpath

	printf("Errno was %d\n", errno);

but the evpath library has very limited support.

@eisenhauer
Member

The nice thing about EVPath is that it can have all the support we need it to have... Yeah, the printf() needs to be removed and the lack of a connection caught at the ADIOS2 level. The errno printout snuck in there when I was debugging something relatively recently. The not throwing an error is the real bug. Will sort that soon. I'm not sure we can get (or need) a better error than "connection to remote server failed". About the only other thing we might detect with more support from EVPath would be if you specified a hostname that couldn't be translated via DNS or something like that.
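
A minimal sketch of the "catch it at the ADIOS2 level and throw" idea described above. It does not use the EVPath API: `FakeConnection` and `require_connection` are hypothetical names, and the assumption is that a null handle stands for a failed connection attempt.

```cpp
#include <stdexcept>
#include <string>

// Stand-in for an opaque connection handle (assumption: a null pointer
// means the connection attempt failed).
struct FakeConnection
{
};

// Hypothetical check: throws instead of returning silently when the
// connection handle is null. Host and port only decorate the message.
void require_connection(const FakeConnection *conn, const std::string &host, int port)
{
    if (conn == nullptr)
    {
        throw std::runtime_error("connection to remote server failed (" + host + ":" +
                                 std::to_string(port) + ")");
    }
}
```

With a check like this, a caller such as bpls could catch the exception and print one clear line rather than the raw errno.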

@dmitry-ganyushin
Contributor

The following code is pretty common:

void Remote::Open(const std::string hostname, const int32_t port, const std::string filename,
                  const Mode mode, bool RowMajorOrdering)
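{
    // ... (excerpt)
    m_conn = CMinitiate_conn(ev_state.cm, contact_list);
    if (!m_conn)
        return; // connection failed: returns without reporting an error
    // ... (excerpt, rest of Open() not shown)
}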

The result is the same whether the connection succeeds or fails.

@eisenhauer
Member

Well, not quite the same if you look at more than the first two lines... If we complete the Open() the m_Active member (which defaults to false) is set to true and the Remote's boolean operator returns the value of m_Active, so we know when the open succeeds. The boolean operator is used later to know if we can/should use the remote connection.
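
The pattern described above can be sketched roughly as follows. This is a simplified stand-in, not the actual ADIOS2 Remote class: `RemoteSketch` and the `reachable` flag are hypothetical, the connection attempt is faked, and making the operator `explicit` is an assumption.

```cpp
#include <string>

// Simplified stand-in for the Remote class discussed above; only the
// m_Active / boolean-operator pattern is modeled.
class RemoteSketch
{
public:
    // 'reachable' is a hypothetical flag that replaces the real connection attempt.
    void Open(const std::string &hostname, bool reachable)
    {
        m_Hostname = hostname;
        if (!reachable)
            return;      // early return: m_Active stays false
        m_Active = true; // reached only when Open() runs to completion
    }

    // Reports whether Open() completed, so callers can test the object directly.
    explicit operator bool() const { return m_Active; }

private:
    std::string m_Hostname;
    bool m_Active = false; // defaults to false
};
```

A caller would then write `if (remote) { ... }` after `Open()` to decide whether the remote connection can be used. The failure is detectable this way, but nothing is reported unless the caller checks.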

@pnorbert pnorbert closed this as completed Mar 5, 2024