Mathematica code execution speed testing












4












$begingroup$


I am testing the speed of Mathematica's latest version against a VB macro for simple loops to evaluate Pi.



So the VB macro is:



c = 0: n = 1000000: m = n * n

For i = 0 To n

For j = 0 To n

If (i * i + j * j) / m < 1 Then

c = c + 1

End If

Next

Next

Selection.Text = 4 * c / m


I know how to write the corresponding code in Mathematica, but I do not know the intricacies of Mathematica regarding faster program execution. Can you help please by giving me an example of best optimized code for this example?



There are approximately 4 trillion operations in this example and the VB macro takes less than a day to execute...



This program partitions a quarter of a circular area of radius = 1 to smaller squares of area $1/n^2$ then counts them and adds them together. Essentially, a Monte Carlo Method but with randomness removed.



With your help I have created this faster code which uses parallel computing:



arg = Range[ 0, 1000000]; calcCompiled = 
Compile[{i},
Module[{c = 0}, Do[If[(i*i + j*j)/1000000^2 < 1, ++c], {j, 1000000}];
c], CompilationTarget -> "C", RuntimeAttributes -> {Listable},
Parallelization -> True,
RuntimeOptions -> "Speed"];
AbsoluteTiming@N[4 Total[calcCompiled[arg]]/1000000^2, 20]


Any suggestions how to make this faster are welcome....










share|improve this question











$endgroup$








  • 1




    $begingroup$
    There is a buildt in benchmark function in Wolfram, see reference.wolfram.com/language/Benchmarking/ref/Benchmark.html. An alternative comparison would be to do the same calculations in VB.
    $endgroup$
    – FredrikD
    Dec 26 '18 at 15:23










  • $begingroup$
    "Now I know how to write the corresponding code in Mathematica" ... so include your Mathematica code in the question. Copy and paste in Raw InputForm and format as a code block.
    $endgroup$
    – Bob Hanlon
    Dec 26 '18 at 16:13










  • $begingroup$
    Ok Bob, sorry my mistake, I've been writing code for Mathematica since 2002 and Mathematica 4, what do you think? Mathematica 4 was waayyyyy slower then....
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 16:25












  • $begingroup$
    Erm. The result of the code seems to be c=1 and that takes an unmeasurable small amount of time to execute. That is eactly one integer operation. But I am not fluent in Visual Basic, so I could be wrong. In total, I do not understand your request.
    $endgroup$
    – Henrik Schumacher
    Dec 26 '18 at 16:32








  • 2




    $begingroup$
    @dimachaerus one issue with these kinds of comparisons is you're comparing apples to oranges. Procedural, loop-based code is not the kind of code that is fast in Mathematica (in the absence of Compile). Try a functional algorithm instead. But if VB compiles to low-level C anyway, this might still not quite be the right comparison. Speed tests are tricky if you need to get a truly meaningful result.
    $endgroup$
    – b3m2a1
    Dec 26 '18 at 19:20
















4












$begingroup$


I am testing the speed of Mathematica's latest version against a VB macro for simple loops to evaluate Pi.



So the VB macro is:



c = 0: n = 1000000: m = n * n

For i = 0 To n

For j = 0 To n

If (i * i + j * j) / m < 1 Then

c = c + 1

End If

Next

Next

Selection.Text = 4 * c / m


I know how to write the corresponding code in Mathematica, but I do not know the intricacies of Mathematica regarding faster program execution. Can you help please by giving me an example of best optimized code for this example?



There are approximately 4 trillion operations in this example and the VB macro takes less than a day to execute...



This program partitions a quarter of a circular area of radius = 1 to smaller squares of area $1/n^2$ then counts them and adds them together. Essentially, a Monte Carlo Method but with randomness removed.



With your help I have created this faster code which uses parallel computing:



arg = Range[ 0, 1000000]; calcCompiled = 
Compile[{i},
Module[{c = 0}, Do[If[(i*i + j*j)/1000000^2 < 1, ++c], {j, 1000000}];
c], CompilationTarget -> "C", RuntimeAttributes -> {Listable},
Parallelization -> True,
RuntimeOptions -> "Speed"];
AbsoluteTiming@N[4 Total[calcCompiled[arg]]/1000000^2, 20]


Any suggestions how to make this faster are welcome....










share|improve this question











$endgroup$








  • 1




    $begingroup$
    There is a buildt in benchmark function in Wolfram, see reference.wolfram.com/language/Benchmarking/ref/Benchmark.html. An alternative comparison would be to do the same calculations in VB.
    $endgroup$
    – FredrikD
    Dec 26 '18 at 15:23










  • $begingroup$
    "Now I know how to write the corresponding code in Mathematica" ... so include your Mathematica code in the question. Copy and paste in Raw InputForm and format as a code block.
    $endgroup$
    – Bob Hanlon
    Dec 26 '18 at 16:13










  • $begingroup$
    Ok Bob, sorry my mistake, I've been writing code for Mathematica since 2002 and Mathematica 4, what do you think? Mathematica 4 was waayyyyy slower then....
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 16:25












  • $begingroup$
    Erm. The result of the code seems to be c=1 and that takes an unmeasurable small amount of time to execute. That is eactly one integer operation. But I am not fluent in Visual Basic, so I could be wrong. In total, I do not understand your request.
    $endgroup$
    – Henrik Schumacher
    Dec 26 '18 at 16:32








  • 2




    $begingroup$
    @dimachaerus one issue with these kinds of comparisons is you're comparing apples to oranges. Procedural, loop-based code is not the kind of code that is fast in Mathematica (in the absence of Compile). Try a functional algorithm instead. But if VB compiles to low-level C anyway, this might still not quite be the right comparison. Speed tests are tricky if you need to get a truly meaningful result.
    $endgroup$
    – b3m2a1
    Dec 26 '18 at 19:20














4












4








4





$begingroup$


I am testing the speed of Mathematica's latest version against a VB macro for simple loops to evaluate Pi.



So the VB macro is:



c = 0: n = 1000000: m = n * n

For i = 0 To n

For j = 0 To n

If (i * i + j * j) / m < 1 Then

c = c + 1

End If

Next

Next

Selection.Text = 4 * c / m


I know how to write the corresponding code in Mathematica, but I do not know the intricacies of Mathematica regarding faster program execution. Can you help please by giving me an example of best optimized code for this example?



There are approximately 4 trillion operations in this example and the VB macro takes less than a day to execute...



This program partitions a quarter of a circular area of radius = 1 to smaller squares of area $1/n^2$ then counts them and adds them together. Essentially, a Monte Carlo Method but with randomness removed.



With your help I have created this faster code which uses parallel computing:



arg = Range[ 0, 1000000]; calcCompiled = 
Compile[{i},
Module[{c = 0}, Do[If[(i*i + j*j)/1000000^2 < 1, ++c], {j, 1000000}];
c], CompilationTarget -> "C", RuntimeAttributes -> {Listable},
Parallelization -> True,
RuntimeOptions -> "Speed"];
AbsoluteTiming@N[4 Total[calcCompiled[arg]]/1000000^2, 20]


Any suggestions how to make this faster are welcome....










share|improve this question











$endgroup$




I am testing the speed of Mathematica's latest version against a VB macro for simple loops to evaluate Pi.



So the VB macro is:



c = 0: n = 1000000: m = n * n

For i = 0 To n

For j = 0 To n

If (i * i + j * j) / m < 1 Then

c = c + 1

End If

Next

Next

Selection.Text = 4 * c / m


I know how to write the corresponding code in Mathematica, but I do not know the intricacies of Mathematica regarding faster program execution. Can you help please by giving me an example of best optimized code for this example?



There are approximately 4 trillion operations in this example and the VB macro takes less than a day to execute...



This program partitions a quarter of a circular area of radius = 1 to smaller squares of area $1/n^2$ then counts them and adds them together. Essentially, a Monte Carlo Method but with randomness removed.



With your help I have created this faster code which uses parallel computing:



arg = Range[ 0, 1000000]; calcCompiled = 
Compile[{i},
Module[{c = 0}, Do[If[(i*i + j*j)/1000000^2 < 1, ++c], {j, 1000000}];
c], CompilationTarget -> "C", RuntimeAttributes -> {Listable},
Parallelization -> True,
RuntimeOptions -> "Speed"];
AbsoluteTiming@N[4 Total[calcCompiled[arg]]/1000000^2, 20]


Any suggestions how to make this faster are welcome....







performance-tuning






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 27 '18 at 13:56







dimachaerus

















asked Dec 26 '18 at 14:14









dimachaerusdimachaerus

255




255








  • 1




    $begingroup$
    There is a buildt in benchmark function in Wolfram, see reference.wolfram.com/language/Benchmarking/ref/Benchmark.html. An alternative comparison would be to do the same calculations in VB.
    $endgroup$
    – FredrikD
    Dec 26 '18 at 15:23










  • $begingroup$
    "Now I know how to write the corresponding code in Mathematica" ... so include your Mathematica code in the question. Copy and paste in Raw InputForm and format as a code block.
    $endgroup$
    – Bob Hanlon
    Dec 26 '18 at 16:13










  • $begingroup$
    Ok Bob, sorry my mistake, I've been writing code for Mathematica since 2002 and Mathematica 4, what do you think? Mathematica 4 was waayyyyy slower then....
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 16:25












  • $begingroup$
    Erm. The result of the code seems to be c=1 and that takes an unmeasurable small amount of time to execute. That is eactly one integer operation. But I am not fluent in Visual Basic, so I could be wrong. In total, I do not understand your request.
    $endgroup$
    – Henrik Schumacher
    Dec 26 '18 at 16:32








  • 2




    $begingroup$
    @dimachaerus one issue with these kinds of comparisons is you're comparing apples to oranges. Procedural, loop-based code is not the kind of code that is fast in Mathematica (in the absence of Compile). Try a functional algorithm instead. But if VB compiles to low-level C anyway, this might still not quite be the right comparison. Speed tests are tricky if you need to get a truly meaningful result.
    $endgroup$
    – b3m2a1
    Dec 26 '18 at 19:20














  • 1




    $begingroup$
    There is a buildt in benchmark function in Wolfram, see reference.wolfram.com/language/Benchmarking/ref/Benchmark.html. An alternative comparison would be to do the same calculations in VB.
    $endgroup$
    – FredrikD
    Dec 26 '18 at 15:23










  • $begingroup$
    "Now I know how to write the corresponding code in Mathematica" ... so include your Mathematica code in the question. Copy and paste in Raw InputForm and format as a code block.
    $endgroup$
    – Bob Hanlon
    Dec 26 '18 at 16:13










  • $begingroup$
    Ok Bob, sorry my mistake, I've been writing code for Mathematica since 2002 and Mathematica 4, what do you think? Mathematica 4 was waayyyyy slower then....
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 16:25












  • $begingroup$
    Erm. The result of the code seems to be c=1 and that takes an unmeasurable small amount of time to execute. That is eactly one integer operation. But I am not fluent in Visual Basic, so I could be wrong. In total, I do not understand your request.
    $endgroup$
    – Henrik Schumacher
    Dec 26 '18 at 16:32








  • 2




    $begingroup$
    @dimachaerus one issue with these kinds of comparisons is you're comparing apples to oranges. Procedural, loop-based code is not the kind of code that is fast in Mathematica (in the absence of Compile). Try a functional algorithm instead. But if VB compiles to low-level C anyway, this might still not quite be the right comparison. Speed tests are tricky if you need to get a truly meaningful result.
    $endgroup$
    – b3m2a1
    Dec 26 '18 at 19:20








1




1




$begingroup$
There is a buildt in benchmark function in Wolfram, see reference.wolfram.com/language/Benchmarking/ref/Benchmark.html. An alternative comparison would be to do the same calculations in VB.
$endgroup$
– FredrikD
Dec 26 '18 at 15:23




$begingroup$
There is a buildt in benchmark function in Wolfram, see reference.wolfram.com/language/Benchmarking/ref/Benchmark.html. An alternative comparison would be to do the same calculations in VB.
$endgroup$
– FredrikD
Dec 26 '18 at 15:23












$begingroup$
"Now I know how to write the corresponding code in Mathematica" ... so include your Mathematica code in the question. Copy and paste in Raw InputForm and format as a code block.
$endgroup$
– Bob Hanlon
Dec 26 '18 at 16:13




$begingroup$
"Now I know how to write the corresponding code in Mathematica" ... so include your Mathematica code in the question. Copy and paste in Raw InputForm and format as a code block.
$endgroup$
– Bob Hanlon
Dec 26 '18 at 16:13












$begingroup$
Ok Bob, sorry my mistake, I've been writing code for Mathematica since 2002 and Mathematica 4, what do you think? Mathematica 4 was waayyyyy slower then....
$endgroup$
– dimachaerus
Dec 26 '18 at 16:25






$begingroup$
Ok Bob, sorry my mistake, I've been writing code for Mathematica since 2002 and Mathematica 4, what do you think? Mathematica 4 was waayyyyy slower then....
$endgroup$
– dimachaerus
Dec 26 '18 at 16:25














$begingroup$
Erm. The result of the code seems to be c=1 and that takes an unmeasurable small amount of time to execute. That is eactly one integer operation. But I am not fluent in Visual Basic, so I could be wrong. In total, I do not understand your request.
$endgroup$
– Henrik Schumacher
Dec 26 '18 at 16:32






$begingroup$
Erm. The result of the code seems to be c=1 and that takes an unmeasurable small amount of time to execute. That is eactly one integer operation. But I am not fluent in Visual Basic, so I could be wrong. In total, I do not understand your request.
$endgroup$
– Henrik Schumacher
Dec 26 '18 at 16:32






2




2




$begingroup$
@dimachaerus one issue with these kinds of comparisons is you're comparing apples to oranges. Procedural, loop-based code is not the kind of code that is fast in Mathematica (in the absence of Compile). Try a functional algorithm instead. But if VB compiles to low-level C anyway, this might still not quite be the right comparison. Speed tests are tricky if you need to get a truly meaningful result.
$endgroup$
– b3m2a1
Dec 26 '18 at 19:20




$begingroup$
@dimachaerus one issue with these kinds of comparisons is you're comparing apples to oranges. Procedural, loop-based code is not the kind of code that is fast in Mathematica (in the absence of Compile). Try a functional algorithm instead. But if VB compiles to low-level C anyway, this might still not quite be the right comparison. Speed tests are tricky if you need to get a truly meaningful result.
$endgroup$
– b3m2a1
Dec 26 '18 at 19:20










3 Answers
3






active

oldest

votes


















10












$begingroup$

This is an algorithm of complexity $O(n)$ instead of $O(n^2)$. It runs through in about 8 milliseconds.



n = 1000000;
piapprox =
Total[Floor[
Sqrt[Ramp[Subtract[N[n^2] - 1., Range[1., n]^2]]]]] (4./n^2); //
RepeatedTiming // First

piapprox - Pi



0.00812



-4.00401*10^-6




Moreover, you get more bang for your bucks with the trapezoidal rule as follows:



n = 1000000;
weights = ConstantArray[1./n, n + 1];
weights[[{1, -1}]] = 0.5/n;
piapprox2 = 4. (weights.Sqrt[1. - Subdivide[0., 1., n]^2]);
piapprox2 - Pi



-1.17598*10^-9




And this computes Pi with 16-digit precision by approximating the intersection of the unit disk with the first quadrant by $2^k$ congruent triangles; the area of a single triangle is computed (one half of the result of Det) and multiplied by $4, 2^k$.



RepeatedTiming[
iters = 25;
{p, q} = N[IdentityMatrix[2]];
piapprox = 2^(k + 1) Det[{p, Nest[x [Function] Normalize[p + x], q, iters]}];
][[1]]
piapprox - Pi



0.000048



4.44089*10^-16




This uncompiled versions needs about 0.04 milliseconds.






share|improve this answer











$endgroup$













  • $begingroup$
    Sorry, I am not looking for fast methods to evaluate Pi. I am testing the speed of Mathematica for the particular loop example against Visual Basic and I am finding Mathematica to be very slow compared.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 18:53






  • 1




    $begingroup$
    It is slower in Mathematica because you did it the wrong way. For is well-known to be very slow when not compiled. Try Do instead.
    $endgroup$
    – Henrik Schumacher
    Dec 26 '18 at 19:01










  • $begingroup$
    No, even simple: Do[ i * i + j * j, {i, 0, 10000}, {j, 0, 10000}] takes more time than VB.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 19:15












  • $begingroup$
    Then Compile it as it was shown by b3m2a1.
    $endgroup$
    – Henrik Schumacher
    Dec 27 '18 at 9:34



















10












$begingroup$

You're comparing a compiled language with an interpreted one. Or, at least, you are using the Wolfram language in interpreted mode. You'll have better results if you use a Compiled function.



Example without Compile — directly translated code:



calcNaive = Function[{n},
Module[{c = 0., m = N[n^2]},
Do[
Do[
If[(i*i + j*j)/m < 1.,
++c
]
, {i, 0., n}]
, {j, 0., n}];
4. c/m]];
AbsoluteTiming@calcNaive[10000]



{264.352184, 3.14199016}




First value in the output is timing in seconds, second is the function return value.



Note here the N[n^2] instead of simple n^2. This way we avoid using arbitrary-length integers in further code, which would result in a 17% longer computation for no real benefit. In general, if you don't need advanced precision, you'd better enter all your numbers as machine-precision numbers, e.g. 1000.0 instead of 1000.



Now the code using Compile (which always uses machine-precision numbers, in addition to C-language-compiler optimizations):



calcCompiled = Compile[{n},
Module[{c = 0, m = n^2},
Do[
Do[
If[(i*i + j*j)/m < 1,
++c
]
, {i, 0, n}]
, {j, 0, n}];
4 c/m]
, CompilationTarget -> "C"
, RuntimeOptions -> "Speed"];
AbsoluteTiming@calcCompiled[10000]



{0.918117, 3.14199016}




Notice more than two orders of magnitude faster calculation.






share|improve this answer











$endgroup$













  • $begingroup$
    Thanks. The right answer I was looking for.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 19:56



















6












$begingroup$

Here's another comparison. First let's look at the raw, inefficient version that someone who comes to Mathematica from another language might try:



badForMathematica[n_] :=
Module[{i, j, c = 0., m = n^2},
Do[
If[(i^2 + j^2)/m < 1,
c = c + 1
],
{i, 0, n - 1},
{j, 0, n - 1}
];
(4*c)/m
]

badForMathematica[1000] // AbsoluteTiming

{3.07878, 3.14552}


Eeesh. But this is the dumb way. Here's a new approach. First we take the your algorithm and realize you're just subdividing and counting hits inside the circle. We can do this much faster using vectorized operations. Here's a way to get all the hits:



countHits[{i1_, i2_}, {j1_, j2_}, m_] :=

With[{r1 = Range[i1, i2]^2, r2 = Range[j1, j2]^2},
Total@UnitStep[m - Join @@ Outer[Plus, r1, r2]]
];


We generate the Outer product you're really generating, then totaling the counts. Simple as that. Now we cook this into a larger algorithm where we work in chunks because I don't have enough memory to work all at once:



goodForMathematica[n_, chunkSize_: 1000] :=
Module[{m, c = 0, chunks},
m = n^2;
chunks =
If[chunkSize >= n,
{{0, n}},
If[#[[-1]] =!= n,
Append[#, n],
#
] &@Range[0, n, chunkSize]
];
Do[c += countHits[c1, c2, m], {c1, chunks}, {c2, chunks}];
(4.*c)/m
]


If you have lots of memory just do it straight out (but you probably don't have enough). Here's that test:



goodForMathematica[10000] // AbsoluteTiming

{3.37823, 3.14199}


We increased the count by an order of magnitude but had similar performance. Note that this algorithm has trash scaling, so we can expect that n = 20000 will take ~12s because we've got four blocks to work with:



1000000 will therefore take about ~35000s or ~10 hours. Not gonna actually do it.



Finally, here's probably the first thing to think of, which is to just use Compile:



compiledVersion =
Compile[{{n, _Integer}},
Module[{i, j, c = 0., m = n^2},
Do[
If[(i^2 + j^2)/m < 1,
c = c + 1
],
{i, 0, n - 1},
{j, 0, n - 1}
];
(4*c)/m
],
RuntimeOptions -> "Speed",
CompilationTarget -> "C"
];

compiledVersion[100000] // AbsoluteTiming

{14.8691, 3.14163}


And this blows everything else out of the water (notice I use 100000 per grid subdivision), but that makes sense, because we're really using C, not Mathematica.






share|improve this answer









$endgroup$













  • $begingroup$
    There's no need to put i and j into Module: Do already localizes their names.
    $endgroup$
    – Ruslan
    Dec 26 '18 at 20:04










  • $begingroup$
    @Ruslan ah yeah at first I was just directly copying the OPs For structure but then got annoyed with it. I'll prune it sometime later.
    $endgroup$
    – b3m2a1
    Dec 26 '18 at 20:06










  • $begingroup$
    Excellent answer! I noticed that the "i" outer loop can be further subdivided into n separate smaller loops to utilize the n different cores of my processor and kernels of Mathematica 11.3 for parallel processing. Any idea how to optimize code for this task?
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 21:40













Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "387"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f188423%2fmathematica-code-execution-speed-testing%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes









10












$begingroup$

This is an algorithm of complexity $O(n)$ instead of $O(n^2)$. It runs through in about 8 milliseconds.



n = 1000000;
piapprox =
Total[Floor[
Sqrt[Ramp[Subtract[N[n^2] - 1., Range[1., n]^2]]]]] (4./n^2); //
RepeatedTiming // First

piapprox - Pi



0.00812



-4.00401*10^-6




Moreover, you get more bang for your bucks with the trapezoidal rule as follows:



n = 1000000;
weights = ConstantArray[1./n, n + 1];
weights[[{1, -1}]] = 0.5/n;
piapprox2 = 4. (weights.Sqrt[1. - Subdivide[0., 1., n]^2]);
piapprox2 - Pi



-1.17598*10^-9




And this computes Pi with 16-digit precision by approximating the intersection of the unit disk with the first quadrant by $2^k$ congruent triangles; the area of a single triangle is computed (one half of the result of Det) and multiplied by $4, 2^k$.



RepeatedTiming[
iters = 25;
{p, q} = N[IdentityMatrix[2]];
piapprox = 2^(k + 1) Det[{p, Nest[x [Function] Normalize[p + x], q, iters]}];
][[1]]
piapprox - Pi



0.000048



4.44089*10^-16




This uncompiled versions needs about 0.04 milliseconds.






share|improve this answer











$endgroup$













  • $begingroup$
    Sorry, I am not looking for fast methods to evaluate Pi. I am testing the speed of Mathematica for the particular loop example against Visual Basic and I am finding Mathematica to be very slow compared.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 18:53






  • 1




    $begingroup$
    It is slower in Mathematica because you did it the wrong way. For is well-known to be very slow when not compiled. Try Do instead.
    $endgroup$
    – Henrik Schumacher
    Dec 26 '18 at 19:01










  • $begingroup$
    No, even simple: Do[ i * i + j * j, {i, 0, 10000}, {j, 0, 10000}] takes more time than VB.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 19:15












  • $begingroup$
    Then Compile it as it was shown by b3m2a1.
    $endgroup$
    – Henrik Schumacher
    Dec 27 '18 at 9:34
















10












$begingroup$

This is an algorithm of complexity $O(n)$ instead of $O(n^2)$. It runs through in about 8 milliseconds.



n = 1000000;
piapprox =
Total[Floor[
Sqrt[Ramp[Subtract[N[n^2] - 1., Range[1., n]^2]]]]] (4./n^2); //
RepeatedTiming // First

piapprox - Pi



0.00812



-4.00401*10^-6




Moreover, you get more bang for your bucks with the trapezoidal rule as follows:



n = 1000000;
weights = ConstantArray[1./n, n + 1];
weights[[{1, -1}]] = 0.5/n;
piapprox2 = 4. (weights.Sqrt[1. - Subdivide[0., 1., n]^2]);
piapprox2 - Pi



-1.17598*10^-9




And this computes Pi with 16-digit precision by approximating the intersection of the unit disk with the first quadrant by $2^k$ congruent triangles; the area of a single triangle is computed (one half of the result of Det) and multiplied by $4, 2^k$.



RepeatedTiming[
iters = 25;
{p, q} = N[IdentityMatrix[2]];
piapprox = 2^(k + 1) Det[{p, Nest[x [Function] Normalize[p + x], q, iters]}];
][[1]]
piapprox - Pi



0.000048



4.44089*10^-16




This uncompiled versions needs about 0.04 milliseconds.






share|improve this answer











$endgroup$













  • $begingroup$
    Sorry, I am not looking for fast methods to evaluate Pi. I am testing the speed of Mathematica for the particular loop example against Visual Basic and I am finding Mathematica to be very slow compared.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 18:53






  • 1




    $begingroup$
    It is slower in Mathematica because you did it the wrong way. For is well-known to be very slow when not compiled. Try Do instead.
    $endgroup$
    – Henrik Schumacher
    Dec 26 '18 at 19:01










  • $begingroup$
    No, even simple: Do[ i * i + j * j, {i, 0, 10000}, {j, 0, 10000}] takes more time than VB.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 19:15












  • $begingroup$
    Then Compile it as it was shown by b3m2a1.
    $endgroup$
    – Henrik Schumacher
    Dec 27 '18 at 9:34














10












10








10





$begingroup$

This is an algorithm of complexity $O(n)$ instead of $O(n^2)$. It runs through in about 8 milliseconds.



n = 1000000;
piapprox =
Total[Floor[
Sqrt[Ramp[Subtract[N[n^2] - 1., Range[1., n]^2]]]]] (4./n^2); //
RepeatedTiming // First

piapprox - Pi



0.00812



-4.00401*10^-6




Moreover, you get more bang for your bucks with the trapezoidal rule as follows:



n = 1000000;
weights = ConstantArray[1./n, n + 1];
weights[[{1, -1}]] = 0.5/n;
piapprox2 = 4. (weights.Sqrt[1. - Subdivide[0., 1., n]^2]);
piapprox2 - Pi



-1.17598*10^-9




And this computes Pi with 16-digit precision by approximating the intersection of the unit disk with the first quadrant by $2^k$ congruent triangles; the area of a single triangle is computed (one half of the result of Det) and multiplied by $4, 2^k$.



RepeatedTiming[
iters = 25;
{p, q} = N[IdentityMatrix[2]];
piapprox = 2^(k + 1) Det[{p, Nest[x [Function] Normalize[p + x], q, iters]}];
][[1]]
piapprox - Pi



0.000048



4.44089*10^-16




This uncompiled versions needs about 0.04 milliseconds.






share|improve this answer











$endgroup$



This is an algorithm of complexity $O(n)$ instead of $O(n^2)$. It runs through in about 8 milliseconds.



n = 1000000;
piapprox =
Total[Floor[
Sqrt[Ramp[Subtract[N[n^2] - 1., Range[1., n]^2]]]]] (4./n^2); //
RepeatedTiming // First

piapprox - Pi



0.00812



-4.00401*10^-6




Moreover, you get more bang for your bucks with the trapezoidal rule as follows:



n = 1000000;
weights = ConstantArray[1./n, n + 1];
weights[[{1, -1}]] = 0.5/n;
piapprox2 = 4. (weights.Sqrt[1. - Subdivide[0., 1., n]^2]);
piapprox2 - Pi



-1.17598*10^-9




And this computes Pi with 16-digit precision by approximating the intersection of the unit disk with the first quadrant by $2^k$ congruent triangles; the area of a single triangle is computed (one half of the result of Det) and multiplied by $4, 2^k$.



RepeatedTiming[
iters = 25;
{p, q} = N[IdentityMatrix[2]];
piapprox = 2^(k + 1) Det[{p, Nest[x [Function] Normalize[p + x], q, iters]}];
][[1]]
piapprox - Pi



0.000048



4.44089*10^-16




This uncompiled versions needs about 0.04 milliseconds.







share|improve this answer














share|improve this answer



share|improve this answer








edited Dec 26 '18 at 18:06

























answered Dec 26 '18 at 16:55









Henrik SchumacherHenrik Schumacher

53.4k471148




53.4k471148












  • $begingroup$
    Sorry, I am not looking for fast methods to evaluate Pi. I am testing the speed of Mathematica for the particular loop example against Visual Basic and I am finding Mathematica to be very slow compared.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 18:53






  • 1




    $begingroup$
    It is slower in Mathematica because you did it the wrong way. For is well-known to be very slow when not compiled. Try Do instead.
    $endgroup$
    – Henrik Schumacher
    Dec 26 '18 at 19:01










  • $begingroup$
    No, even simple: Do[ i * i + j * j, {i, 0, 10000}, {j, 0, 10000}] takes more time than VB.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 19:15












  • $begingroup$
    Then Compile it as it was shown by b3m2a1.
    $endgroup$
    – Henrik Schumacher
    Dec 27 '18 at 9:34


















  • $begingroup$
    Sorry, I am not looking for fast methods to evaluate Pi. I am testing the speed of Mathematica for the particular loop example against Visual Basic and I am finding Mathematica to be very slow compared.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 18:53






  • 1




    $begingroup$
    It is slower in Mathematica because you did it the wrong way. For is well-known to be very slow when not compiled. Try Do instead.
    $endgroup$
    – Henrik Schumacher
    Dec 26 '18 at 19:01










  • $begingroup$
    No, even simple: Do[ i * i + j * j, {i, 0, 10000}, {j, 0, 10000}] takes more time than VB.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 19:15












  • $begingroup$
    Then Compile it as it was shown by b3m2a1.
    $endgroup$
    – Henrik Schumacher
    Dec 27 '18 at 9:34
















$begingroup$
Sorry, I am not looking for fast methods to evaluate Pi. I am testing the speed of Mathematica for the particular loop example against Visual Basic and I am finding Mathematica to be very slow compared.
$endgroup$
– dimachaerus
Dec 26 '18 at 18:53




$begingroup$
Sorry, I am not looking for fast methods to evaluate Pi. I am testing the speed of Mathematica for the particular loop example against Visual Basic and I am finding Mathematica to be very slow compared.
$endgroup$
– dimachaerus
Dec 26 '18 at 18:53




1




1




$begingroup$
It is slower in Mathematica because you did it the wrong way. For is well-known to be very slow when not compiled. Try Do instead.
$endgroup$
– Henrik Schumacher
Dec 26 '18 at 19:01




$begingroup$
It is slower in Mathematica because you did it the wrong way. For is well-known to be very slow when not compiled. Try Do instead.
$endgroup$
– Henrik Schumacher
Dec 26 '18 at 19:01












$begingroup$
No, even simple: Do[ i * i + j * j, {i, 0, 10000}, {j, 0, 10000}] takes more time than VB.
$endgroup$
– dimachaerus
Dec 26 '18 at 19:15






$begingroup$
No, even simple: Do[ i * i + j * j, {i, 0, 10000}, {j, 0, 10000}] takes more time than VB.
$endgroup$
– dimachaerus
Dec 26 '18 at 19:15














$begingroup$
Then Compile it as it was shown by b3m2a1.
$endgroup$
– Henrik Schumacher
Dec 27 '18 at 9:34




$begingroup$
Then Compile it as it was shown by b3m2a1.
$endgroup$
– Henrik Schumacher
Dec 27 '18 at 9:34











10












$begingroup$

You're comparing a compiled language with an interpreted one. Or, at least, you are using the Wolfram language in interpreted mode. You'll have better results if you use a Compiled function.



Example without Compile — directly translated code:



calcNaive = Function[{n},
Module[{c = 0., m = N[n^2]},
Do[
Do[
If[(i*i + j*j)/m < 1.,
++c
]
, {i, 0., n}]
, {j, 0., n}];
4. c/m]];
AbsoluteTiming@calcNaive[10000]



{264.352184, 3.14199016}




First value in the output is timing in seconds, second is the function return value.



Note here the N[n^2] instead of simple n^2. This way we avoid using arbitrary-length integers in further code, which would result in a 17% longer computation for no real benefit. In general, if you don't need advanced precision, you'd better enter all your numbers as machine-precision numbers, e.g. 1000.0 instead of 1000.



Now the code using Compile (which always uses machine-precision numbers, in addition to C-language-compiler optimizations):



calcCompiled = Compile[{n},
Module[{c = 0, m = n^2},
Do[
Do[
If[(i*i + j*j)/m < 1,
++c
]
, {i, 0, n}]
, {j, 0, n}];
4 c/m]
, CompilationTarget -> "C"
, RuntimeOptions -> "Speed"];
AbsoluteTiming@calcCompiled[10000]



{0.918117, 3.14199016}




Notice more than two orders of magnitude faster calculation.






share|improve this answer











$endgroup$













  • $begingroup$
    Thanks. The right answer I was looking for.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 19:56
















10












$begingroup$

You're comparing a compiled language with an interpreted one. Or, at least, you are using the Wolfram language in interpreted mode. You'll have better results if you use a Compiled function.



Example without Compile — directly translated code:



calcNaive = Function[{n},
Module[{c = 0., m = N[n^2]},
Do[
Do[
If[(i*i + j*j)/m < 1.,
++c
]
, {i, 0., n}]
, {j, 0., n}];
4. c/m]];
AbsoluteTiming@calcNaive[10000]



{264.352184, 3.14199016}




First value in the output is timing in seconds, second is the function return value.



Note here the N[n^2] instead of simple n^2. This way we avoid using arbitrary-length integers in further code, which would result in a 17% longer computation for no real benefit. In general, if you don't need advanced precision, you'd better enter all your numbers as machine-precision numbers, e.g. 1000.0 instead of 1000.



Now the code using Compile (which always uses machine-precision numbers, in addition to C-language-compiler optimizations):



calcCompiled = Compile[{n},
Module[{c = 0, m = n^2},
Do[
Do[
If[(i*i + j*j)/m < 1,
++c
]
, {i, 0, n}]
, {j, 0, n}];
4 c/m]
, CompilationTarget -> "C"
, RuntimeOptions -> "Speed"];
AbsoluteTiming@calcCompiled[10000]



{0.918117, 3.14199016}




Notice more than two orders of magnitude faster calculation.






share|improve this answer











$endgroup$













  • $begingroup$
    Thanks. The right answer I was looking for.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 19:56














10












10








10





$begingroup$

You're comparing a compiled language with an interpreted one. Or, at least, you are using the Wolfram language in interpreted mode. You'll have better results if you use a Compiled function.



Example without Compile — directly translated code:



calcNaive = Function[{n},
Module[{c = 0., m = N[n^2]},
Do[
Do[
If[(i*i + j*j)/m < 1.,
++c
]
, {i, 0., n}]
, {j, 0., n}];
4. c/m]];
AbsoluteTiming@calcNaive[10000]



{264.352184, 3.14199016}




First value in the output is timing in seconds, second is the function return value.



Note here the N[n^2] instead of simple n^2. This way we avoid using arbitrary-length integers in further code, which would result in a 17% longer computation for no real benefit. In general, if you don't need advanced precision, you'd better enter all your numbers as machine-precision numbers, e.g. 1000.0 instead of 1000.



Now the code using Compile (which always uses machine-precision numbers, in addition to C-language-compiler optimizations):



calcCompiled = Compile[{n},
Module[{c = 0, m = n^2},
Do[
Do[
If[(i*i + j*j)/m < 1,
++c
]
, {i, 0, n}]
, {j, 0, n}];
4 c/m]
, CompilationTarget -> "C"
, RuntimeOptions -> "Speed"];
AbsoluteTiming@calcCompiled[10000]



{0.918117, 3.14199016}




Notice more than two orders of magnitude faster calculation.






share|improve this answer











$endgroup$



You're comparing a compiled language with an interpreted one. Or, at least, you are using the Wolfram language in interpreted mode. You'll have better results if you use a Compiled function.



Example without Compile — directly translated code:



calcNaive = Function[{n},
Module[{c = 0., m = N[n^2]},
Do[
Do[
If[(i*i + j*j)/m < 1.,
++c
]
, {i, 0., n}]
, {j, 0., n}];
4. c/m]];
AbsoluteTiming@calcNaive[10000]



{264.352184, 3.14199016}




First value in the output is timing in seconds, second is the function return value.



Note here the N[n^2] instead of simple n^2. This way we avoid using arbitrary-length integers in further code, which would result in a 17% longer computation for no real benefit. In general, if you don't need advanced precision, you'd better enter all your numbers as machine-precision numbers, e.g. 1000.0 instead of 1000.



Now the code using Compile (which always uses machine-precision numbers, in addition to C-language-compiler optimizations):



calcCompiled = Compile[{n},
Module[{c = 0, m = n^2},
Do[
Do[
If[(i*i + j*j)/m < 1,
++c
]
, {i, 0, n}]
, {j, 0, n}];
4 c/m]
, CompilationTarget -> "C"
, RuntimeOptions -> "Speed"];
AbsoluteTiming@calcCompiled[10000]



{0.918117, 3.14199016}




Notice more than two orders of magnitude faster calculation.







share|improve this answer














share|improve this answer



share|improve this answer








edited Dec 27 '18 at 7:21

























answered Dec 26 '18 at 19:47









RuslanRuslan

3,38911444




3,38911444












  • $begingroup$
    Thanks. The right answer I was looking for.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 19:56


















  • $begingroup$
    Thanks. The right answer I was looking for.
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 19:56
















$begingroup$
Thanks. The right answer I was looking for.
$endgroup$
– dimachaerus
Dec 26 '18 at 19:56




$begingroup$
Thanks. The right answer I was looking for.
$endgroup$
– dimachaerus
Dec 26 '18 at 19:56











6












$begingroup$

Here's another comparison. First let's look at the raw, inefficient version that someone who comes to Mathematica from another language might try:



badForMathematica[n_] :=
Module[{i, j, c = 0., m = n^2},
Do[
If[(i^2 + j^2)/m < 1,
c = c + 1
],
{i, 0, n - 1},
{j, 0, n - 1}
];
(4*c)/m
]

badForMathematica[1000] // AbsoluteTiming

{3.07878, 3.14552}


Eeesh. But this is the dumb way. Here's a new approach. First we take the your algorithm and realize you're just subdividing and counting hits inside the circle. We can do this much faster using vectorized operations. Here's a way to get all the hits:



countHits[{i1_, i2_}, {j1_, j2_}, m_] :=

With[{r1 = Range[i1, i2]^2, r2 = Range[j1, j2]^2},
Total@UnitStep[m - Join @@ Outer[Plus, r1, r2]]
];


We generate the Outer product you're really generating, then totaling the counts. Simple as that. Now we cook this into a larger algorithm where we work in chunks because I don't have enough memory to work all at once:



goodForMathematica[n_, chunkSize_: 1000] :=
Module[{m, c = 0, chunks},
m = n^2;
chunks =
If[chunkSize >= n,
{{0, n}},
If[#[[-1]] =!= n,
Append[#, n],
#
] &@Range[0, n, chunkSize]
];
Do[c += countHits[c1, c2, m], {c1, chunks}, {c2, chunks}];
(4.*c)/m
]


If you have lots of memory just do it straight out (but you probably don't have enough). Here's that test:



goodForMathematica[10000] // AbsoluteTiming

{3.37823, 3.14199}


We increased the count by an order of magnitude but had similar performance. Note that this algorithm has trash scaling, so we can expect that n = 20000 will take ~12s because we've got four blocks to work with:



1000000 will therefore take about ~35000s or ~10 hours. Not gonna actually do it.



Finally, here's probably the first thing to think of, which is to just use Compile:



compiledVersion =
Compile[{{n, _Integer}},
Module[{i, j, c = 0., m = n^2},
Do[
If[(i^2 + j^2)/m < 1,
c = c + 1
],
{i, 0, n - 1},
{j, 0, n - 1}
];
(4*c)/m
],
RuntimeOptions -> "Speed",
CompilationTarget -> "C"
];

compiledVersion[100000] // AbsoluteTiming

{14.8691, 3.14163}


And this blows everything else out of the water (notice I use 100000 per grid subdivision), but that makes sense, because we're really using C, not Mathematica.






share|improve this answer









$endgroup$













  • $begingroup$
    There's no need to put i and j into Module: Do already localizes their names.
    $endgroup$
    – Ruslan
    Dec 26 '18 at 20:04










  • $begingroup$
    @Ruslan ah yeah at first I was just directly copying the OPs For structure but then got annoyed with it. I'll prune it sometime later.
    $endgroup$
    – b3m2a1
    Dec 26 '18 at 20:06










  • $begingroup$
    Excellent answer! I noticed that the "i" outer loop can be further subdivided into n separate smaller loops to utilize the n different cores of my processor and kernels of Mathematica 11.3 for parallel processing. Any idea how to optimize code for this task?
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 21:40


















6












$begingroup$

Here's another comparison. First let's look at the raw, inefficient version that someone who comes to Mathematica from another language might try:



badForMathematica[n_] :=
Module[{i, j, c = 0., m = n^2},
Do[
If[(i^2 + j^2)/m < 1,
c = c + 1
],
{i, 0, n - 1},
{j, 0, n - 1}
];
(4*c)/m
]

badForMathematica[1000] // AbsoluteTiming

{3.07878, 3.14552}


Eeesh. But this is the dumb way. Here's a new approach. First we take the your algorithm and realize you're just subdividing and counting hits inside the circle. We can do this much faster using vectorized operations. Here's a way to get all the hits:



countHits[{i1_, i2_}, {j1_, j2_}, m_] :=

With[{r1 = Range[i1, i2]^2, r2 = Range[j1, j2]^2},
Total@UnitStep[m - Join @@ Outer[Plus, r1, r2]]
];


We generate the Outer product you're really generating, then totaling the counts. Simple as that. Now we cook this into a larger algorithm where we work in chunks because I don't have enough memory to work all at once:



goodForMathematica[n_, chunkSize_: 1000] :=
Module[{m, c = 0, chunks},
m = n^2;
chunks =
If[chunkSize >= n,
{{0, n}},
If[#[[-1]] =!= n,
Append[#, n],
#
] &@Range[0, n, chunkSize]
];
Do[c += countHits[c1, c2, m], {c1, chunks}, {c2, chunks}];
(4.*c)/m
]


If you have lots of memory just do it straight out (but you probably don't have enough). Here's that test:



goodForMathematica[10000] // AbsoluteTiming

{3.37823, 3.14199}


We increased the count by an order of magnitude but had similar performance. Note that this algorithm has trash scaling, so we can expect that n = 20000 will take ~12s because we've got four blocks to work with:



1000000 will therefore take about ~35000s or ~10 hours. Not gonna actually do it.



Finally, here's probably the first thing to think of, which is to just use Compile:



compiledVersion =
Compile[{{n, _Integer}},
Module[{i, j, c = 0., m = n^2},
Do[
If[(i^2 + j^2)/m < 1,
c = c + 1
],
{i, 0, n - 1},
{j, 0, n - 1}
];
(4*c)/m
],
RuntimeOptions -> "Speed",
CompilationTarget -> "C"
];

compiledVersion[100000] // AbsoluteTiming

{14.8691, 3.14163}


And this blows everything else out of the water (notice I use 100000 per grid subdivision), but that makes sense, because we're really using C, not Mathematica.






share|improve this answer









$endgroup$













  • $begingroup$
    There's no need to put i and j into Module: Do already localizes their names.
    $endgroup$
    – Ruslan
    Dec 26 '18 at 20:04










  • $begingroup$
    @Ruslan ah yeah at first I was just directly copying the OPs For structure but then got annoyed with it. I'll prune it sometime later.
    $endgroup$
    – b3m2a1
    Dec 26 '18 at 20:06










  • $begingroup$
    Excellent answer! I noticed that the "i" outer loop can be further subdivided into n separate smaller loops to utilize the n different cores of my processor and kernels of Mathematica 11.3 for parallel processing. Any idea how to optimize code for this task?
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 21:40
















6












6








6





$begingroup$

Here's another comparison. First let's look at the raw, inefficient version that someone who comes to Mathematica from another language might try:



badForMathematica[n_] :=
Module[{i, j, c = 0., m = n^2},
Do[
If[(i^2 + j^2)/m < 1,
c = c + 1
],
{i, 0, n - 1},
{j, 0, n - 1}
];
(4*c)/m
]

badForMathematica[1000] // AbsoluteTiming

{3.07878, 3.14552}


Eeesh. But this is the dumb way. Here's a new approach. First we take the your algorithm and realize you're just subdividing and counting hits inside the circle. We can do this much faster using vectorized operations. Here's a way to get all the hits:



countHits[{i1_, i2_}, {j1_, j2_}, m_] :=

With[{r1 = Range[i1, i2]^2, r2 = Range[j1, j2]^2},
Total@UnitStep[m - Join @@ Outer[Plus, r1, r2]]
];


We generate the Outer product you're really generating, then totaling the counts. Simple as that. Now we cook this into a larger algorithm where we work in chunks because I don't have enough memory to work all at once:



goodForMathematica[n_, chunkSize_: 1000] :=
Module[{m, c = 0, chunks},
m = n^2;
chunks =
If[chunkSize >= n,
{{0, n}},
If[#[[-1]] =!= n,
Append[#, n],
#
] &@Range[0, n, chunkSize]
];
Do[c += countHits[c1, c2, m], {c1, chunks}, {c2, chunks}];
(4.*c)/m
]


If you have lots of memory just do it straight out (but you probably don't have enough). Here's that test:



goodForMathematica[10000] // AbsoluteTiming

{3.37823, 3.14199}


We increased the count by an order of magnitude but had similar performance. Note that this algorithm has trash scaling, so we can expect that n = 20000 will take ~12s because we've got four blocks to work with:



1000000 will therefore take about ~35000s or ~10 hours. Not gonna actually do it.



Finally, here's probably the first thing to think of, which is to just use Compile:



compiledVersion =
Compile[{{n, _Integer}},
Module[{i, j, c = 0., m = n^2},
Do[
If[(i^2 + j^2)/m < 1,
c = c + 1
],
{i, 0, n - 1},
{j, 0, n - 1}
];
(4*c)/m
],
RuntimeOptions -> "Speed",
CompilationTarget -> "C"
];

compiledVersion[100000] // AbsoluteTiming

{14.8691, 3.14163}


And this blows everything else out of the water (notice I use 100000 per grid subdivision), but that makes sense, because we're really using C, not Mathematica.






share|improve this answer









$endgroup$



Here's another comparison. First let's look at the raw, inefficient version that someone who comes to Mathematica from another language might try:



badForMathematica[n_] :=
Module[{i, j, c = 0., m = n^2},
Do[
If[(i^2 + j^2)/m < 1,
c = c + 1
],
{i, 0, n - 1},
{j, 0, n - 1}
];
(4*c)/m
]

badForMathematica[1000] // AbsoluteTiming

{3.07878, 3.14552}


Eeesh. But this is the dumb way. Here's a new approach. First we take the your algorithm and realize you're just subdividing and counting hits inside the circle. We can do this much faster using vectorized operations. Here's a way to get all the hits:



countHits[{i1_, i2_}, {j1_, j2_}, m_] :=

With[{r1 = Range[i1, i2]^2, r2 = Range[j1, j2]^2},
Total@UnitStep[m - Join @@ Outer[Plus, r1, r2]]
];


We generate the Outer product you're really generating, then totaling the counts. Simple as that. Now we cook this into a larger algorithm where we work in chunks because I don't have enough memory to work all at once:



goodForMathematica[n_, chunkSize_: 1000] :=
Module[{m, c = 0, chunks},
m = n^2;
chunks =
If[chunkSize >= n,
{{0, n}},
If[#[[-1]] =!= n,
Append[#, n],
#
] &@Range[0, n, chunkSize]
];
Do[c += countHits[c1, c2, m], {c1, chunks}, {c2, chunks}];
(4.*c)/m
]


If you have lots of memory just do it straight out (but you probably don't have enough). Here's that test:



goodForMathematica[10000] // AbsoluteTiming

{3.37823, 3.14199}


We increased the count by an order of magnitude but had similar performance. Note that this algorithm has trash scaling, so we can expect that n = 20000 will take ~12s because we've got four blocks to work with:



1000000 will therefore take about ~35000s or ~10 hours. Not gonna actually do it.



Finally, here's probably the first thing to think of, which is to just use Compile:



compiledVersion =
Compile[{{n, _Integer}},
Module[{i, j, c = 0., m = n^2},
Do[
If[(i^2 + j^2)/m < 1,
c = c + 1
],
{i, 0, n - 1},
{j, 0, n - 1}
];
(4*c)/m
],
RuntimeOptions -> "Speed",
CompilationTarget -> "C"
];

compiledVersion[100000] // AbsoluteTiming

{14.8691, 3.14163}


And this blows everything else out of the water (notice I use 100000 per grid subdivision), but that makes sense, because we're really using C, not Mathematica.







share|improve this answer












share|improve this answer



share|improve this answer










answered Dec 26 '18 at 20:03









b3m2a1b3m2a1

27.8k357161




27.8k357161












  • $begingroup$
    There's no need to put i and j into Module: Do already localizes their names.
    $endgroup$
    – Ruslan
    Dec 26 '18 at 20:04










  • $begingroup$
    @Ruslan ah yeah at first I was just directly copying the OPs For structure but then got annoyed with it. I'll prune it sometime later.
    $endgroup$
    – b3m2a1
    Dec 26 '18 at 20:06










  • $begingroup$
    Excellent answer! I noticed that the "i" outer loop can be further subdivided into n separate smaller loops to utilize the n different cores of my processor and kernels of Mathematica 11.3 for parallel processing. Any idea how to optimize code for this task?
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 21:40




















  • $begingroup$
    There's no need to put i and j into Module: Do already localizes their names.
    $endgroup$
    – Ruslan
    Dec 26 '18 at 20:04










  • $begingroup$
    @Ruslan ah yeah at first I was just directly copying the OPs For structure but then got annoyed with it. I'll prune it sometime later.
    $endgroup$
    – b3m2a1
    Dec 26 '18 at 20:06










  • $begingroup$
    Excellent answer! I noticed that the "i" outer loop can be further subdivided into n separate smaller loops to utilize the n different cores of my processor and kernels of Mathematica 11.3 for parallel processing. Any idea how to optimize code for this task?
    $endgroup$
    – dimachaerus
    Dec 26 '18 at 21:40


















$begingroup$
There's no need to put i and j into Module: Do already localizes their names.
$endgroup$
– Ruslan
Dec 26 '18 at 20:04




$begingroup$
There's no need to put i and j into Module: Do already localizes their names.
$endgroup$
– Ruslan
Dec 26 '18 at 20:04












$begingroup$
@Ruslan ah yeah at first I was just directly copying the OPs For structure but then got annoyed with it. I'll prune it sometime later.
$endgroup$
– b3m2a1
Dec 26 '18 at 20:06




$begingroup$
@Ruslan ah yeah at first I was just directly copying the OPs For structure but then got annoyed with it. I'll prune it sometime later.
$endgroup$
– b3m2a1
Dec 26 '18 at 20:06












$begingroup$
Excellent answer! I noticed that the "i" outer loop can be further subdivided into n separate smaller loops to utilize the n different cores of my processor and kernels of Mathematica 11.3 for parallel processing. Any idea how to optimize code for this task?
$endgroup$
– dimachaerus
Dec 26 '18 at 21:40






$begingroup$
Excellent answer! I noticed that the "i" outer loop can be further subdivided into n separate smaller loops to utilize the n different cores of my processor and kernels of Mathematica 11.3 for parallel processing. Any idea how to optimize code for this task?
$endgroup$
– dimachaerus
Dec 26 '18 at 21:40




















draft saved

draft discarded




















































Thanks for contributing an answer to Mathematica Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathematica.stackexchange.com%2fquestions%2f188423%2fmathematica-code-execution-speed-testing%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Bressuire

Cabo Verde

Gyllenstierna