Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dataman additional bug in 1dimensional arrays #3538

Closed
stefaniereuter opened this issue Mar 15, 2023 · 8 comments
Closed

Dataman additional bug in 1dimensional arrays #3538

stefaniereuter opened this issue Mar 15, 2023 · 8 comments

Comments

@stefaniereuter
Copy link

stefaniereuter commented Mar 15, 2023

Thanks @dmitry-ganyushin for having a look into the python 1 dimensional arrays. I had a play with your commit and found that part of the problem (#3503) still exists.
I can now successfully send 1 dimensional float arrays. It seems that there is still a problem if the data type is not np.float64.

Describe the bug

  • Sending numpy.random.rand(3) #works
  • Sending numpy.random.randint(2,size=3) # receives wrong data
  • Sending numpy.arange(4) # receives wrong data
  • Sending numpy.arange(4,dtype = numpy.float64) # works
  • Sending numpy.arange(4,dtype = numpy.float32) #receives wrong data
  • Sending numpy.arange(4,dtype = numpy.float128) #receives wrong data
  • Sending numpy.arange(4,dtype=int) #receives wrong data

To Reproduce

import numpy as np
from multiprocessing import Process, Pipe
import logging
import adios2

def thread_send(name:str):

    data = name.recv() #receive data from main thread
    shape = data.shape
    count = shape
    start = (0,)*len(shape)
    print(f"data on sender side \n{data!s}")

    adios_io = adios2.ADIOS()
    wan = adios_io.DeclareIO("Server")
    wan.SetEngine("Dataman")

    wan.SetParameters(
        {
            "IPAddress": "0.0.0.0",
            "Port": "12306",
            "Timeout": "5",
            "TransportMode": "reliable",
            "RendezvousReaderCount":"1",
        }
    )
    logging.info(f"Sender: initiating sending")
    writer = wan.Open("testdata_sender", adios2.Mode.Write)
    sendbuffer = wan.DefineVariable("np_data",data, shape, start, count, adios2.ConstantDims)
    if sendbuffer:
        writer.BeginStep()
        writer.Put(sendbuffer,data,adios2.Mode.Deferred)
        writer.EndStep()
    else:
        raise ValueError("DefineVariable failed")
    
    writer.Close()
  
    logging.info(f"Sender: sending finished")

def thread_receive(name:str):

    adios_io = adios2.ADIOS()
    wan = adios_io.DeclareIO("Client")
    wan.SetEngine("Dataman")
    wan.SetParameters(
        {
            "IPAddress": "0.0.0.0",
            "Port": "12306",
            "Timeout": "5",
            "TransportMode": "reliable",
            "RendezvousReaderCount":"1",
        }
    )   
    logging.info(f" Receiver: initiating receiving ")
    reader = wan.Open("testdata_receiver", adios2.Mode.Read)
    while True:
        stepStatus = reader.BeginStep()
        if stepStatus == adios2.StepStatus.OK:
            #inquire for variable
            recvar = wan.InquireVariable("np_data")
            if recvar:
                # determine the shape of the data that will be sent
                bufshape = recvar.Shape()
                # allocate buffer for now numpy
                data = np.ones(bufshape)
                #print(f"data before Get: \n{data!s}")
                reader.Get(recvar,data,adios2.Mode.Deferred)
                #print(f"data right after get This might be not right as data might not have been sent yet \n: {data!s}")
            else:
                raise ValueError(f"InquireVariable failed")
        elif stepStatus == adios2.StepStatus.EndOfStream:
            break
        else: 
            raise StopIteration(f"next step failed to initiate {stepStatus!s}")
        reader.EndStep()
        #print(f"After end step \n{data!s}")
    reader.Close()
    #print(f"after close \n {data!s}")
    logging.info(f" Receiver: finished receiving",)
    name.send(data)

"""
Different test data arrays. 
"""
data2 = np.arange(3) #receives wrong data
#data2 = np.arange(3,dtype=np.float32) # receives wrong data
#data2 = np.arange(3,dtype=np.float64) # works
#data2 = np.arange(3,dtype=np.float128) #receives wrong data
#data2 = np.arange(3,dtype=int) # receives wrong data
#data2 = np.random.randint(2, size=10) #receives wrong data

print(f"data dtypes: {data2.dtype!s}")

format = "%(asctime)s: %(message)s"
logging.basicConfig(format=format, level=logging.INFO,
                    datefmt="%H:%M:%S")
master_proc, receiver_proc = Pipe()
sender_proc, master_proc2 = Pipe()
s = Process(target=thread_send, args=[sender_proc])
r = Process(target=thread_receive,args=[receiver_proc])
s.start()
r.start()
master_proc2.send(data2)
data_r = master_proc.recv()
#data_r = None
print(f"data in master \n{data_r!s}")
r.join()
s.join()
assert np.array_equal(data2, data_r)

Expected behavior
Send all data types correctly

Desktop (please complete the following information):

  • OS/Platform: Ubuntu 22.04
  • Build: local build of release_29 branch with (the added fix from b3503-fix-release_29)
  • Python 3.10
  • numpy 1.23.4

Note
This might be related to problems that I had with sending strings. Although I have not yet looked into this more. So it might be completely unrelated

@eisenhauer
Copy link
Member

Hmm. There may be multiple bugs coming in to play here. Can I ask you to try this with the SST engine? We were pretty sure that this was a dataman-specific problem, but it would be good to be sure.

@stefaniereuter
Copy link
Author

stefaniereuter commented Mar 15, 2023 via email

@stefaniereuter
Copy link
Author

stefaniereuter commented Mar 15, 2023

Hi Greg I have the same problem with the SST engine.

import numpy as np
from multiprocessing import Process, Pipe
import logging
import adios2

def thread_send(name:str):

    data = name.recv() #receive data from main thread
    shape = data.shape
    count = shape
    start = (0,)*len(shape)
    print(f"data on sender side \n{data!s}")

    adios_io = adios2.ADIOS()
    sstIO = adios_io.DeclareIO("Server")
    sstIO.SetEngine("SST")

    logging.info(f"Sender: initiating sending")
    writer = sstIO.Open("testdatafile", adios2.Mode.Write)
    sendbuffer = sstIO.DefineVariable("np_data",data, shape, start, count, adios2.ConstantDims)
    if sendbuffer:
        writer.BeginStep()
        writer.Put(sendbuffer,data,adios2.Mode.Sync)
        writer.EndStep()
    else:
        raise ValueError("DefineVariable failed")
    
    writer.Close()
  
    logging.info(f"Sender: sending finished")

def thread_receive(name:str):

    adios_io = adios2.ADIOS()
    sstio = adios_io.DeclareIO("Client")
    sstio.SetEngine("SST")
 
    logging.info(f" Receiver: initiating receiving ")
    reader = sstio.Open("testdatafile", adios2.Mode.Read)
    while True:
        stepStatus = reader.BeginStep()
        if stepStatus == adios2.StepStatus.OK:
            #inquire for variable
            recvar = sstio.InquireVariable("np_data")
            if recvar:
                # determine the shape of the data that will be sent
                bufshape = recvar.Shape()
                # allocate buffer for new numpy
                data = np.ones(bufshape)
                reader.Get(recvar,data,adios2.Mode.Sync)
            else:
                raise ValueError(f"InquireVariable failed")
        elif stepStatus == adios2.StepStatus.EndOfStream:
            break
        else: 
            raise StopIteration(f"next step failed to initiate {stepStatus!s}")
        reader.EndStep()
    reader.Close()
    logging.info(f" Receiver: finished receiving",)
    name.send(data)

"""
Different test data arrays. 
"""

print("====================================")

#data2 = np.arange(3) #receives wrong data
#data2 = np.arange(3,dtype=np.float32) # receives wrong data
data2 = np.arange(3,dtype=np.float64) # works
#data2 = np.arange(3,dtype=np.float128) #receives wrong data
#data2 = np.arange(3,dtype=int) # receives wrong data
#data2 = np.random.randint(2, size=10) #receives wrong data
print(f"data dtypes: {data2.dtype!s}")
format = "%(asctime)s: %(message)s"
logging.basicConfig(format=format, level=logging.INFO,
                    datefmt="%H:%M:%S")
master_proc, receiver_proc = Pipe()
sender_proc, master_proc2 = Pipe()
s = Process(target=thread_send, args=[sender_proc])
r = Process(target=thread_receive,args=[receiver_proc])
s.start()
r.start()
master_proc2.send(data2)
data_r = master_proc.recv()
#data_r = None
print(f"data in master \n{data_r!s}")
r.join()
s.join()
assert np.array_equal(data2, data_r)

@eisenhauer
Copy link
Member

Thanks for checking. Dataman and SST don't share a lot at the lower levels, so it seems likely that something might be going on with the python bindings. Let me try again to see if I can kick an available python environment into cooperating enough that I can debug this...

@eisenhauer
Copy link
Member

OK, it took me a while to get Jupyter up and running ADIOS and to massage your example to where Python 3.10 was happy with it. The underlying problem is on the receiving side and is maybe conceptual with the python bindings. In particular this bit:

                data = np.ones(bufshape)
                reader.Get(recvar,data,adios2.Mode.Sync)

The problem is that 'np.ones(bufshape)' creates a numpy array of type float64. Then reader.Get() takes the address of that buffer and dumps in whatever datatype that the variable received happens to be. If that happens to be anything other than float64, you get what looks like bad data. If you actually control type of the receiving buffer to match the incoming data, this works fine. For example, if I switch to data2 = np.arange(3,dtype=np.float32) and then take care to make sure to specify the same type in the np.ones() call like this:

                typ = recvar.Type()
                print("Variable type is " + typ)
                # allocate buffer for new numpy
                data = np.ones(bufshape,dtype=np.float32)  
                print(f"received buffer  dtypes: {data.dtype!s}")     
                reader.Get(recvar,data,adios2.Mode.Sync)

I get reasonable output:

Variable type is float
received buffer  dtypes: float32
data in master 
[0. 1. 2.]

So, this should work properly for you if you make the types match using a priori knowledge of the datatypes on both sides. Or you could use the information from recvar.Type() to do this dynamically. I think the open question is : Could ADIOS do this better in Python? In C++, type safety prevents you from passing in a buffer of the wrong type because InquireVariable and Get are templated. Python types are obviously more malleable. Could our Python bindings detect this type mismatch and do something better? Presumably we could look at the type of the buffer passed in and the type of the data arriving and at least toss an exception. Could we instead transmorgify the original type of the buffer into something more appropriate? Should Get() in Python simply return a buffer instead of expecting one to be passed into it? I'm not enough of a Python guy to really know the answers to those questions, but it seems like something we should be looking at...

@eisenhauer
Copy link
Member

Stefanie, is this solved to your satisfaction? As of 2.9.0 we'll at least throw an exception when we've got a type mismatch. We can think about doing better and would be happy to hear recommendations on better semantics for Get() WRT buffer management.

@stefaniereuter
Copy link
Author

Hi Greg, yes thank you. I don't know if you got my last email so I'll copy that in here again:

thanks for taking the time. I'm sorry that the type mismatch in Python didn't even occur to me. I have now changed my code to use the information from recvar.Type(). The only problem that I had was for example that recvar.Type() for an int64 would return int64_t which is the c++ type but in python I had to generate a dictionary to convert it to a type that numpy understands and I can't just use .Type() directly.

To be honest I'm also not an python expert but from my experience a get method that would return a buffer instead of expecting on as an argument would be preferable. But an exception is really helpful too.

thank you

@eisenhauer
Copy link
Member

I don't think I saw an Email, but might have missed it. Yes, it occurred to me that we should have a recvar.PyType function that would return a python-compatible type that you could use directly. I've created a new issue suggesting that we reconsider the python bindings to be a bit easier to use. I'll go ahead and close this one. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants