Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance Julia vs. Matlab #3775

Closed
hadrix opened this issue Jul 20, 2013 · 19 comments
Closed

Performance Julia vs. Matlab #3775

hadrix opened this issue Jul 20, 2013 · 19 comments
Labels
performance Must go faster

Comments

@hadrix
Copy link

hadrix commented Jul 20, 2013

I have prepared two simple scripts for both Julia and Matlab that are intended to do the same, however they seem to perform very differently. I'm getting 90 second in Julia vs. 40 second in Matlab. The scripts don't do anything productive they just mimic the structures of most of my codes in Matlab. I wanted to test performance to see whether is worthwhile or not to migrate to Julia. How could the performance of Julia be improved?

Script for julia:

tic()
n = 10000;
m = 100;
l = 10;
A = zeros(n,m);
for i=1:n
    for j=1:m
        c = rand(10,10);
        A[i,j] = c[2,5];
    end
    h = find(A[i,:].>0.5);
    for k=1:l
        c = fft(A[:,l]);
        d = min(abs(c));
    end
end
toc()

Script for Matlab:

tic()
n = 10000;
m = 100;
l = 10;
A = zeros(n,m);
for i=1:n
    for j=1:m
        c = rand(10,10);
        A(i,j) = c(2,5);
    end
    h = find(A(i,:)>0.5);
    for k=1:l
        c = fft(A(:,l));
        d = min(abs(c));
    end
end
toc()
@johnmyleswhite
Copy link
Member

Have you read http://docs.julialang.org/en/latest/manual/performance-tips/? You might find it helpful, especially the comments about not running scripts in the global namespace.

@ViralBShah
Copy link
Member

Given that this code is largely doing vectorized computations, I suspect that it may not speed up much even when not running in the global namespace.

@stevengj
Copy link
Member

It's hard to optimize code that is doing nonsensical operations. e.g. why compute c = rand(10,10) if you only use c[2,5]? Why repeatedly compute h = find(...) when h is never used? And so on. You could delete almost all of the code without changing the result at the end of the loop!

Put another way, if your requirement is "do these nonsense operations", then by definition there is not much freedom to optimize. On the other hand, if you specify what result you actually want at the end of the loop, then many re-arrangements are often possible to improve performance.

@ViralBShah
Copy link
Member

Timing the code in a let block gives almost the same time, and shows that we allocate 3.3G of memory. That cost and then GC is most likely killing us.

julia> @time let 
                     n = 10000;
                     m = 100;
                     l = 10;
                     A = zeros(n,m);
                     for i=1:n
                         for j=1:m
                             c = rand(10,10);
                             A[i,j] = c[2,5];
                         end
                         h = find(A[i,:].>0.5);
                         for k=1:l
                             c = fft(A[:,l]);
                             d = min(abs(c));
                         end
                     end
       end    

elapsed time: 87.008487046 seconds (33289168252 bytes allocated)

@ViralBShah
Copy link
Member

@stevengj I take it based on the comment in the issue that this code mimics the structure of the real use case and is most likely a benchmark that @hadrix created for trying out julia. It suggests that the real code uses rand, find, and ffts, with vector indexing.

@stevengj
Copy link
Member

@ViralBShah, it doesn't matter. With the snippet given, we have no way to distinguish the essential from the inessential, and therefore only the most superficial optimization is possible.

Optimizing based on this fragment is simply not sensible.

@lindahua
Copy link
Contributor

I share the same confusion as to what this code intends to do. To me, it is more like a random bunch of computation that I simply can't make any sense out of it.

What about providing the real code and let us know what it wants to achieve? I am happy to help optimize it.

@hadrix
Copy link
Author

hadrix commented Jul 20, 2013

There is no point in giving you a real code. I use Matlab for prototyping so I don't want to spend time optimizing every code I do, basically because most of them are testing a stupid idea I won't implement. Once I know which code I want to use for real I just move to C, fortran to get good performance... I tried Julia to see whether prototyping is faster that Matlab. I understand the example I gave you is nonsense but I don't thing giving a real one would perform better than Matlab if I don't get any speedup in that dummy test. And again, I don't want to optimize the code, that's the point, I want fast coding to test 20 ideas every day.

Anyway that you everyone for your comments.

@ViralBShah
Copy link
Member

@hadrix For a fair comparison, you should probably use matlab with one computational thread. For your purposes of fast coding, you may not care about under the hood, but it certainly would help us figure out where we stand in comparison.

@hadrix
Copy link
Author

hadrix commented Jul 20, 2013

Yes you are right @ViralBShah, I was already running with one thread in Matlab, the elapsed time of 40 second is for one thread.

@JeffBezanson
Copy link
Member

Doesn't matter to me if the code is real or not; I treat everything as black boxes to optimize :)
I think this case is in line with our known main performance problems, and we plan to address it. Browsing the "performance" issue tag will reveal similar benchmarks.

@lindahua
Copy link
Contributor

I did a more detailed analysis of this code snippet.

The first part and second part actually perform quite differently. So I compare these in two parts:

MATLAB code of the first part:

function foo1()

n = 10000;
m = 100;
l = 10;
A = zeros(n,m);
for i=1:n
    for j=1:m
        c = rand(10,10);
        A(i,j) = c(2,5);
    end
    h = find(A(i,:)>0.5);
end
end

foo1();   % warm up
tic; foo1(); toc  % measure the time

My optimized version of Julia code of the first part:

function foo1()
    n = 10000
    m = 100
    l = 10
    A = zeros(n,m)

    # pre-allocate storage
    h = Array(Float64, 0)
    c = Array(Float64, 10, 10)   

    for i=1:n
        empty!(h)
        for j=1:m   # merge computation in one loop
            rand!(c)  # generate to a pre-allocated array
            a = c[2,5]
            A[i,j] = a
            if a > 0.5
                push!(h, a) 
            end
        end
    end
end

foo1()  # warm up
@time foo1()  # measure the time

For this part of the code, MATLAB takes 1.98 second, while Julia takes 0.37 second. So Julia is 5x faster than MATLAB.

Notes:

  • Run the code first and then measure the performance. JIT will be run at the first time to do compilation work.
  • Avoid allocating arrays in inner loop. Allocate arrays in advance, and use inplace functions within the inner loop.

@lindahua
Copy link
Contributor

Once you get A, the second part of the code does the following two things:

  • Compute column-wise FFT
  • Find minimum absolute value of the coefficients

This part of the code can be written as follows

function bar()
n = 10000;
l = 10;
for i=1:n
    A = rand(n, l);
    for k=1:l
       c = fft(A(:,l));
       d = min(abs(c));
    end
end

In Julia, this can be rewritten as

function bar()
    n = 10000
    l = 10
    A = Array(Float64, n, l)
    ac = Array(Float64, l)

    for i = 1:n
        rand!(A)
        c = fft(A, 1)  # column wise FFT
        for k = 1:l
            ac[k] = abs(c[k])
        end
        d = min(ac, 1) # column wise mean
    end
end

bar()
@time bar()

This part MATLAB takes 29 sec, while Julia takes 21 sec in my machine. Still not bad.

@lindahua
Copy link
Contributor

We have to acknowledge that Julia has not got the point that it can ideally execute any code in an optimal way. Straightforward translation of codes from MATLAB to Julia may not get you the best performance.

To enjoy the good performance of Julia, you may have to be careful in implementation, and avoid the practice that may kill performance (e.g. repeatedly creating new arrays in an inner loop).

@stevengj
Copy link
Member

@lindahua wrote "Julia has not got the point that it can ideally execute any code in an optimal way." This is not true of any language. In any language, there is a large difference (often orders of magnitude) between unoptimized and optimized code for any non-trivial task.

@hadrix wrote, "I don't want to spend time optimizing every code I do, basically because most of them are testing a stupid idea I won't implement. Once I know which code I want to use for real I just move to C, fortran to get good performance."

The whole point of Julia is that you don't need to move to C or Fortran to get good performance. If you care about performance you still have to optimize code to get the best out of any language, but optimizing Julia code is usually far easier that rewriting in a low-level language.

Of course, when you are prototyping, performance is not a priority, and you shouldn't be worrying about optimization. (Recall the well-known saying, "Premature optimization is the root of all evil.") But for unoptimized non-performance-critical code, factors of two in performance are hardly something to worry about—this is in the noise. (Even in the same language, two different unoptimized versions of the same algorithm will often vary by a factor of two or more.)

Certainly, Jeff should do his best to handle any random code that is thrown at Julia. But I maintain that it is a waste of time on this thread to try to come up with human-optimized versions of a nonsensical code snippet that lacks well-defined inputs and outputs, because the most important optimizations are impossible without knowing what code is supposed to do.

@JeffBezanson
Copy link
Member

I agree with steven there is no point in hand-optimizing this case.

@stevengj
Copy link
Member

Put another way, there are two kinds of performance questions in any language implementation:

  • "How can we modify the compiler etc. to make program X run faster without modification?" This is important, but it is equally important to keep expectations in line with reality: no compiler will achieve the theoretical maximum performance in all cases. Nor can we expect Julia to match C or Fortran speed for any random unoptimized code block.
  • "Is it possible to achieve close to optimal ("C") speed for problem X in our language?" These kinds of questions are partly about hand-optimization and coding style, and partly about whether compiler/language limitations are holding us back for any given problem. The goal in Julia is that the answer to this question is almost always "yes," but it will almost always require some hand optimization (especially devectorization) compared to the tersest and simplest Julia implementation.

It seems like this issue is about the first question, not the second. However, it sounds like this code snippet doesn't raise any performance issues that are not raised elsewhere, nor is the performance unreasonable (within a factor of 2 of matlab for mostly vectorized code), so I'm closing the issue. Feel free to re-open if I'm missing something.

@lindahua
Copy link
Contributor

@stevengj I agree with your comments.

What I intended to show in my codes above is that when you really want performance, it is possible to make the code (without a lot of efforts) run much faster when performance is critical, as well as some aspects that one should pay attention to when writing fast codes.

Whereas I don't know what @hadrix try to achieve with this code snippet, I do believe that some of the tricks that I use in optimizing this code would be helpful for him to optimize things where he really cares about the performance.

@hadrix
Copy link
Author

hadrix commented Jul 21, 2013

Thank you everyone for your comments, it has been quite helpful and I hope Julia keep improving.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

6 participants