Python bindings - Bug sending 1 dimensional arrays #3503

Closed
stefaniereuter opened this issue Feb 23, 2023 · 8 comments

@stefaniereuter

Describe the bug
Using the dataman engine to send different numpy arrays, I encountered several different behaviours:

  1. If the array is multidimensional, everything seems to work.
  2. If I try to send a one-dimensional array (e.g. [1 2]), the data I receive is random zeros: [4.9e-324 9.9e-323]
  3. Reshaping the one-dimensional array into a multidimensional array before sending results in two different behaviours:
  • If the original array is a random array (random.rand), I receive a correct result after reshaping.
  • If the array is an arange array (e.g. np.arange(1,20)) and I reshape it, I still get random zeros.
  4. If I try to send a single-row random array (np.random.rand(1,20)), I don't receive any data, and the values are still the pre-initialized data values.
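
The "random zeros" in item 2 have a recognisable shape. As a sketch of one possible reading, not a diagnosis confirmed in this thread: 4.9e-324 and 9.9e-323 are what NumPy prints when the int64 values 1 and 20 (for instance the first and last elements of np.arange(1,21)) are reinterpreted bit for bit as float64. The receiver in the script below always allocates a float64 buffer with np.ones, while np.arange and np.full with an integer fill value produce integer data. The snippet uses only NumPy, does not touch adios2, and does not explain the np.random.rand(1,20) case.

```python
import numpy as np

# Integer payload: the values 1 and 20 stored as 64-bit integers.
ints = np.array([1, 20], dtype=np.int64)

# Reinterpret the same 8-byte words as float64 without converting the values.
as_floats = ints.view(np.float64)

# The smallest positive subnormal double has the bit pattern of integer 1.
smallest = np.nextafter(0.0, 1.0)

print(as_floats)
```

If the received values always look like this for integer inputs, a dtype mismatch between the sent data and the receive buffer would be worth ruling out alongside the engine itself.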

To Reproduce

 
import numpy as np
from multiprocessing import Process, Pipe
import logging
import adios2

def thread_send(name:str):

    data = name.recv() #receive data from main thread
    shape = data.shape
    count = shape
    start = (0,)*len(shape)
    print(f"data on sender side \n{data!s}")

    adios_io = adios2.ADIOS()
    wan = adios_io.DeclareIO("Server")
    wan.SetEngine("Dataman")

    wan.SetParameters(
        {
            "IPAddress": "0.0.0.0",
            "Port": "12306",
            "Timeout": "5",
            "TransportMode": "reliable",
            "RendezvousReaderCount":"1",
        }
    )
    logging.info(f"Sender: initiating sending")
    writer = wan.Open("testdata_sender", adios2.Mode.Write)
    sendbuffer = wan.DefineVariable("np_data",data, shape, start, count, adios2.ConstantDims)
    if sendbuffer:
        writer.BeginStep()
        writer.Put(sendbuffer,data,adios2.Mode.Deferred)
        writer.EndStep()
    else:
        raise ValueError("DefineVariable failed")
    
    writer.Close()
  
    logging.info(f"Sender: sending finished")

def thread_receive(name:str):

    adios_io = adios2.ADIOS()
    wan = adios_io.DeclareIO("Client")
    wan.SetEngine("Dataman")
    wan.SetParameters(
        {
            "IPAddress": "0.0.0.0",
            "Port": "12306",
            "Timeout": "5",
            "TransportMode": "reliable",
            "RendezvousReaderCount":"1",
        }
    )   
    logging.info(f" Receiver: initiating receiving ")
    reader = wan.Open("testdata_receiver", adios2.Mode.Read)
    while True:
        stepStatus = reader.BeginStep()
        if stepStatus == adios2.StepStatus.OK:
            #inquire for variable
            recvar = wan.InquireVariable("np_data")
            if recvar:
                # determine the shape of the data that will be sent
                bufshape = recvar.Shape()
                # allocate buffer for now numpy
                data = np.ones(bufshape)
                #print(f"data before Get: \n{data!s}")
                reader.Get(recvar,data,adios2.Mode.Deferred)
                #print(f"data right after get This might be not right as data might not have been sent yet \n: {data!s}")
            else:
                raise ValueError(f"InquireVariable failed")
        elif stepStatus == adios2.StepStatus.EndOfStream:
            break
        else: 
            raise StopIteration(f"next step failed to initiate {stepStatus!s}")
        reader.EndStep()
        #print(f"After end step \n{data!s}")
    reader.Close()
    #print(f"after close \n {data!s}")
    logging.info(f" Receiver: finished receiving",)
    name.send(data)

"""
Different test data arrays. 
"""
data = np.random.rand(1,20) # fails: the pre-initialized receive buffer is not even overwritten, so the result is all 1s. Is no data received at all?
#data = np.arange(1,20) # receives random 0s
#data = np.arange(1,21).reshape(4,5) # compared to the random array reshape does not help here still receives random 0s
#data = np.full([4,5],7) # receives random 0s

#data = np.random.rand(1,20).reshape(4,5) #works
#data = np.ones([20,1])*7 #works
#data = np.ones([4,5]) #works
#data = np.random.rand(4,5) #works
format = "%(asctime)s: %(message)s"
logging.basicConfig(format=format, level=logging.INFO,
                    datefmt="%H:%M:%S")
master_proc, receiver_proc = Pipe()
sender_proc, master_proc2 = Pipe()
s = Process(target=thread_send, args=[sender_proc])
r = Process(target=thread_receive,args=[receiver_proc])
s.start()
r.start()
master_proc2.send(data)
data_r = master_proc.recv()
#data_r = None
print(f"data in master \n{data_r!s}")
r.join()
s.join()
assert np.array_equal(data, data_r)
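
The receiver above always allocates np.ones(bufshape), which is float64 regardless of what was sent. Below is a minimal sketch of choosing the buffer dtype from the variable's declared type instead. The helper names numpy_dtype_for and allocate_receive_buffer and the lookup table are hypothetical, and the sketch assumes the Python bindings' Variable.Type() returns C-style names such as "int64_t" or "double". Whether this changes the behaviour reported here was not tested in the thread.

```python
import numpy as np

# Hypothetical lookup from ADIOS2 type names (assumed spelling) to NumPy dtypes.
_ADIOS_TO_NUMPY = {
    "int8_t": np.int8,
    "int16_t": np.int16,
    "int32_t": np.int32,
    "int64_t": np.int64,
    "uint8_t": np.uint8,
    "uint16_t": np.uint16,
    "uint32_t": np.uint32,
    "uint64_t": np.uint64,
    "float": np.float32,
    "double": np.float64,
}


def numpy_dtype_for(adios_type: str) -> np.dtype:
    """Translate an ADIOS2 type name into a NumPy dtype, or raise TypeError."""
    try:
        return np.dtype(_ADIOS_TO_NUMPY[adios_type])
    except KeyError:
        raise TypeError(f"no NumPy dtype known for ADIOS2 type {adios_type!r}") from None


def allocate_receive_buffer(shape, adios_type: str) -> np.ndarray:
    """Allocate a zero-filled buffer with the given shape and matching dtype."""
    return np.zeros(tuple(shape), dtype=numpy_dtype_for(adios_type))


# Intended use inside the receive loop (assumption, not executed here):
#     data = allocate_receive_buffer(recvar.Shape(), recvar.Type())
```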

Expected behavior
To receive the data I sent

Desktop (please complete the following information):

  • OS/Platform: Ubuntu 22.04
  • Build: conda create -n <name> -c conda-forge adios2 numpy mpi4py
  • Numpy version: 1.23.5 and 1.24.2
  • adios2 version: 2.8.3
@eisenhauer
Member

Hmm. I'll see if I can reproduce. Should be able to switch to SST in this scenario for comparison and maybe get some more info as well.

@eisenhauer
Member

OK, tried with the python 3.10.8 on my laptop and get some variants of the error below (despite trying to add freeze_support() to fix things). I'll try on some other platform when I get a chance, but this has me wondering if there's some python oddness happening. (Most of the dataman tests are with single-dimensional arrays, so it seems unlikely that something that basic is broken in dataman. But I've been surprised before.)

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.    self._launch(process_obj)
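
The message above is what multiprocessing raises when child processes are started with the spawn start method (the default on macOS and Windows) from module-level code, which is how the reproduction script starts its sender and receiver. Below is a minimal sketch of the guard idiom the message asks for, with a hypothetical echo worker standing in for the adios2 sender and receiver:

```python
from multiprocessing import Pipe, Process


def echo(conn):
    """Hypothetical worker: send back whatever arrives on the connection."""
    payload = conn.recv()
    conn.send(payload)


def roundtrip(payload):
    """Start the worker in a child process and return its reply."""
    parent_end, child_end = Pipe()
    worker = Process(target=echo, args=(child_end,))
    worker.start()
    parent_end.send(payload)
    reply = parent_end.recv()
    worker.join()
    return reply


if __name__ == "__main__":
    # Process creation happens only when the file is run as a script,
    # not when a spawned child re-imports the module.
    print(roundtrip([1, 2, 3]))
```

Moving the Pipe and Process setup of the original script under the same guard should avoid this particular RuntimeError, independent of the adios2 issue being reported.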

@dmitry-ganyushin
Contributor

I can reproduce as reported with python 3.8

@stefaniereuter
Author

Sorry for the late reply. I'm currently running Python 3.11 in JupyterLab. Thanks for looking into it.

@eisenhauer
Member

@dmitry-ganyushin Since you can reproduce, can I ask you to poke at this a bit? I'd be interested to know what happens when you switch to sst from dataman. If you get basically the same thing, then I'd worry about issues with the python interface, and I've really not looked at that at all. (Honestly, with a bit of enforced serialization, one could try this basic code with BP4 or BP5 files. Just force the file write to happen before the read happens. Then we'd have the file left around to run bpls over. That would narrow down the problem to the read side or write side, implicate or eliminate the engine, etc.)

@stefaniereuter
Author

Hi @dmitry-ganyushin, were you able to find the problem? I have not tried SST yet but can do so if it would help. Or did you try that already?

@dmitry-ganyushin
Contributor

Thank you for reporting this issue. It is fixed, and the fix should be in release 2.9. We could make a patch for 2.8.3 if you cannot wait.

@eisenhauer
Member

> Thank you for reporting this issue. It is fixed, and the fix should be in release 2.9. We could make a patch for 2.8.3 if you cannot wait.

And the bug was specific to dataman, so using SST should be a working alternative.

vicentebolea pushed a commit to dmitry-ganyushin/ADIOS2 that referenced this issue Mar 10, 2023