
relu function in falcon is much bigger than in securenn #18

Open
lijin456 opened this issue Sep 25, 2021 · 12 comments


@lijin456

  • The CPU time and wall clock time of running the ReLU function in SecureNN 100 times are 0.007129 sec and 0.0573575 sec respectively.
  • The CPU time and wall clock time of running the ReLU function in Falcon 100 times are 0.120657 sec and 0.0899462 sec respectively.

This doesn't match the paper (Falcon), in which you mention that Falcon improves efficiency by about 2X.
Do you have any advice? Thanks very much.

@lijin456
Author

Should I give more details? I used the code in your repositories and didn't change the parameters. Could you help?

@snwagh
Owner

snwagh commented Sep 27, 2021

It is indeed puzzling; that shouldn't be the case. Are you running them on the same machine/set of machines? Are the parallelism parameters such as this the same across the machines? If you use too many cores on a machine with a small number of cores, it can cause a slowdown instead; that might be one reason. Another thing would be to check the control flow to ensure that you are running exactly what you think you are.

Also, the 2x efficiency is theoretical; I don't remember what the concrete numbers were, but I would assume they should be 2x or more.

@lijin456
Author

lijin456 commented Sep 28, 2021

Thanks for your advice. That was it: the no_cores values in Falcon and SecureNN were different. I ran the code (SecureNN, Falcon) on three 8 vCPU 16 GiB Alibaba Cloud Elastic Compute Service instances. The time decreases when I decrease the no_cores value.

But I'm still confused about the relationship between the no_cores value in the code and the machine's CPU core count. In Falcon, when I change no_cores from 8 to 4, the wall clock time drops from 0.12s to 0.068s. When I change no_cores in SecureNN from 8 to 4, the wall clock time drops from 0.077s to 0.058s. The ReLU time in Falcon is still greater. How do I find the right no_cores number? Thanks again.

@snwagh
Owner

snwagh commented Sep 28, 2021

no_cores manually parallelizes the bottleneck part of the code. I haven't found an automated way to find the appropriate number of cores, but usually a little less than half the machine's configuration is a good reference.

For instance, if you have an 8-core machine with 16 threads, using fewer than 8 is a good idea, so something like 6 is a good number in my experience. Note that this is for runs over LAN/WAN. If you're running over localhost, where all 3 executables (parties) run on the same machine, then you want those 6 cores split among the parties, so ideally I would set no_cores to 2.

@lijin456
Author

Sorry to bother you again. I'm confused about the parameters in Falcon because it seems that Falcon performs worse than SecureNN. I benchmarked some basic functions on 8 vCPU 16 GiB Alibaba Cloud Elastic Compute Service instances. I only changed the parameter no_cores=4; these are the results (I call each function 100 times):

| function | Falcon wall clock | Falcon CPU time | SecureNN wall clock | SecureNN CPU time |
| --- | --- | --- | --- | --- |
| relu | 0.0672796 sec | 0.089296 sec | 0.0579777 sec | 0.006221 sec |
| drelu | 0.0585398 sec | 0.075817 sec | 0.0551445 sec | 0.0061 sec |
| select share | 0.0110452 sec | 0.013957 sec | 0.00659282 sec | 0.006437 sec |
| debugDotProd | 0.00552658 sec | 0.006838 sec | 0.00623013 sec | 0.006237 sec |

here is how I call debug functions in main.cpp (securenn)

   if (!STANDALONE)
       initializeMPC();
   start_m();
   debugDotProd();
   end_m(whichNetwork);

here is how I call debug functions in main.cpp (falcon)

   start_m();
   runTest("Debug", "DotProd", network);
   end_m(network);

Example: I call the functions 100 times inside the debug functions.

   for(int i = 0; i < 100; i++) funcSelectShares(a, b, selection, size);

Are there any other parameters I should be careful about? Thanks very much!

@snwagh
Owner

snwagh commented Sep 30, 2021

No worries, feel free to create issues if you are unable to resolve it. It is a little hard to know what exactly might be causing the issue without taking a look at the code but here are a few thoughts:

I would look into what code is being run (trace the control flow). For instance, it seems that SecureNN does not really have a debug ReLU function, whereas Falcon does (in the original repo). So you want to double-check what you are comparing when you run ReLU on both codebases.

Secondly, if you're running the code as is from the repo, the debug function is vectorized over size 8 for SecureNN but only size 5 for Falcon. Finally, any reconstruct function calls would also affect performance.

@lijin456
Author

lijin456 commented Oct 6, 2021

Thanks for your advice. I already implemented a debug ReLU function in SecureNN and changed the size to 10 in the debug function in both repositories. I've pushed the code to GitHub. I hope you can take a look at the code if you have time. Thank you very much.
Here is the code I changed in SecureNN and the code I changed in Falcon.

@snwagh
Owner

snwagh commented Oct 6, 2021

I ran your above code on localhost and could reproduce a similar discrepancy; the SecureNN timings are indeed a bit lower. The reason is unclear to me. :( I tried tweaking a few things here and there, but the numbers don't agree with the theory. I can't think of a reason why this is happening.

@imtiyazuddin

The ReLU function seems to work only for integer values. When I tried to give it floating-point values, all I got was zeroes. I also tried converting the floating-point values to myType and running that, but the output contains some big values (I think they are my plaintext vector values multiplied by 2^scalefactor mod 2^32). Please correct me if I am wrong.

@snwagh
Owner

snwagh commented Nov 16, 2021

The functions are meant to work on fixed-point values (thus integers only). It is hard to say what is causing the issue, but you might be printing the plain integer values, which would explain why they're scaled. Try using some of the print functions provided to get human-readable output.

@ZJG0

ZJG0 commented Nov 17, 2021

> I ran your above code on localhost and could reproduce a similar discrepancy; the SecureNN timings are indeed a bit lower. The reason is unclear to me. :( I tried tweaking a few things here and there, but the numbers don't agree with the theory. I can't think of a reason why this is happening.

This result is inconsistent with the conclusion in the paper. How can this be explained, or how can it be resolved?

@snwagh
Owner

snwagh commented Nov 19, 2021

@ZJG0 Are you also having the same issue?

I am not very sure how to debug this issue. Maybe there's another library out there that implements both protocols; that could serve as another comparison point.
