Increase performance creating Mandelbrot set in pythonImproving the speed of creation for three Perlin Noise Maps in Python?Mean shift image processing algorithm for color segmentationFastest way to count non-zero pixels using Python and PillowComparing pixels against RGB value in NumPyStreaming floating point data in Python2Genetic Sequence Visualizer - Generating large imagesImage template matching algorithm from scratchDetecting a certain amount of violet in an image - follow-upDrawing the Mandelbrot with multiple threads in rustApplying a filter to a large array with elements that are not regularly spacedASCII Mandelbrot

How do I extract a value from a time formatted value in excel?

What is paid subscription needed for in Mortal Kombat 11?

How to check is there any negative term in a large list?

Trouble understanding the speech of overseas colleagues

What can we do to stop prior company from asking us questions?

Gears on left are inverse to gears on right?

Integer addition + constant, is it a group?

Lay out the Carpet

Two monoidal structures and copowering

Are student evaluations of teaching assistants read by others in the faculty?

Do all network devices need to make routing decisions, regardless of communication across networks or within a network?

Balance Issues for a Custom Sorcerer Variant

Is this apparent Class Action settlement a spam message?

Proof of work - lottery approach

Short story about space worker geeks who zone out by 'listening' to radiation from stars

Did Dumbledore lie to Harry about how long he had James Potter's invisibility cloak when he was examining it? If so, why?

How to pronounce the slash sign

How can we prove that any integral in the set of non-elementary integrals cannot be expressed in the form of elementary functions?

How to run a prison with the smallest amount of guards?

How does it work when somebody invests in my business?

Why escape if the_content isnt?

Why didn't Theresa May consult with Parliament before negotiating a deal with the EU?

How to draw lines on a tikz-cd diagram

Failed to fetch jessie backports repository



Increase performance creating Mandelbrot set in python


Improving the speed of creation for three Perlin Noise Maps in Python?Mean shift image processing algorithm for color segmentationFastest way to count non-zero pixels using Python and PillowComparing pixels against RGB value in NumPyStreaming floating point data in Python2Genetic Sequence Visualizer - Generating large imagesImage template matching algorithm from scratchDetecting a certain amount of violet in an image - follow-upDrawing the Mandelbrot with multiple threads in rustApplying a filter to a large array with elements that are not regularly spacedASCII Mandelbrot













26












$begingroup$


I created a program in python that generates an image of the mandelbrot set. The only problem I have is that the program is quite slow, it takes about a quarter of an hour to generate the following image of 2000 by 3000 pixels:



enter image description here



I first created a matrix of complex numbers using numpy according to amount of pixels. I also created an array for the image generation.



import numpy as np
from PIL import Image

z = 0

real_axis = np.linspace(-2,1,num=3000)
imaginary_axis = np.linspace(1,-1,num=2000)

complex_grid = [[complex(np.float64(a),np.float64(b)) for a in real_axis] for b in imaginary_axis]

pixel_grid = np.zeros((2000,3000,3),dtype=np.uint8)


Then I check whether each complex number is in the mandelbrot set or not and give it an RGB colour code accordingly.



for complex_list in complex_grid:
for complex_number in complex_list:
for iteration in range(255):
z = z**2 + complex_number
if (z.real**2+z.imag**2)**0.5 > 2:
pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
break
else:
continue
z = 0


Finally I generate the image using the PIL library.



mandelbrot = Image.fromarray(pixel_grid)
mandelbrot.save("mandelbrot.png")


I am using jupyter notebook and python 3. Hopefully some of you can help me improve the performance of this program or other aspects of it.










share|improve this question











$endgroup$







  • 1




    $begingroup$
    If you want to skip points that are known to be inside the Mandelbrot set, the main cardioid and period bulbs can be skipped. If you skip processing points contained within the main cardioid and the main disk, you can significantly speed up the program. See also this resource for a further analysis of the main cardioid and disk. This optimization becomes less effective as you zoom in on the edges and becomes completely ineffective if these regions are off-screen.
    $endgroup$
    – Cornstalks
    yesterday











  • $begingroup$
    Please put spaces around your operators and after commas.
    $endgroup$
    – Almo
    6 hours ago















26












$begingroup$


I created a program in python that generates an image of the mandelbrot set. The only problem I have is that the program is quite slow, it takes about a quarter of an hour to generate the following image of 2000 by 3000 pixels:



enter image description here



I first created a matrix of complex numbers using numpy according to amount of pixels. I also created an array for the image generation.



import numpy as np
from PIL import Image

z = 0

real_axis = np.linspace(-2,1,num=3000)
imaginary_axis = np.linspace(1,-1,num=2000)

complex_grid = [[complex(np.float64(a),np.float64(b)) for a in real_axis] for b in imaginary_axis]

pixel_grid = np.zeros((2000,3000,3),dtype=np.uint8)


Then I check whether each complex number is in the mandelbrot set or not and give it an RGB colour code accordingly.



for complex_list in complex_grid:
for complex_number in complex_list:
for iteration in range(255):
z = z**2 + complex_number
if (z.real**2+z.imag**2)**0.5 > 2:
pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
break
else:
continue
z = 0


Finally I generate the image using the PIL library.



mandelbrot = Image.fromarray(pixel_grid)
mandelbrot.save("mandelbrot.png")


I am using jupyter notebook and python 3. Hopefully some of you can help me improve the performance of this program or other aspects of it.










share|improve this question











$endgroup$







  • 1




    $begingroup$
    If you want to skip points that are known to be inside the Mandelbrot set, the main cardioid and period bulbs can be skipped. If you skip processing points contained within the main cardioid and the main disk, you can significantly speed up the program. See also this resource for a further analysis of the main cardioid and disk. This optimization becomes less effective as you zoom in on the edges and becomes completely ineffective if these regions are off-screen.
    $endgroup$
    – Cornstalks
    yesterday











  • $begingroup$
    Please put spaces around your operators and after commas.
    $endgroup$
    – Almo
    6 hours ago













26












26








26


3



$begingroup$


I created a program in python that generates an image of the mandelbrot set. The only problem I have is that the program is quite slow, it takes about a quarter of an hour to generate the following image of 2000 by 3000 pixels:



enter image description here



I first created a matrix of complex numbers using numpy according to amount of pixels. I also created an array for the image generation.



import numpy as np
from PIL import Image

z = 0

real_axis = np.linspace(-2,1,num=3000)
imaginary_axis = np.linspace(1,-1,num=2000)

complex_grid = [[complex(np.float64(a),np.float64(b)) for a in real_axis] for b in imaginary_axis]

pixel_grid = np.zeros((2000,3000,3),dtype=np.uint8)


Then I check whether each complex number is in the mandelbrot set or not and give it an RGB colour code accordingly.



for complex_list in complex_grid:
for complex_number in complex_list:
for iteration in range(255):
z = z**2 + complex_number
if (z.real**2+z.imag**2)**0.5 > 2:
pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
break
else:
continue
z = 0


Finally I generate the image using the PIL library.



mandelbrot = Image.fromarray(pixel_grid)
mandelbrot.save("mandelbrot.png")


I am using jupyter notebook and python 3. Hopefully some of you can help me improve the performance of this program or other aspects of it.










share|improve this question











$endgroup$




I created a program in python that generates an image of the mandelbrot set. The only problem I have is that the program is quite slow, it takes about a quarter of an hour to generate the following image of 2000 by 3000 pixels:



enter image description here



I first created a matrix of complex numbers using numpy according to amount of pixels. I also created an array for the image generation.



import numpy as np
from PIL import Image

z = 0

real_axis = np.linspace(-2,1,num=3000)
imaginary_axis = np.linspace(1,-1,num=2000)

complex_grid = [[complex(np.float64(a),np.float64(b)) for a in real_axis] for b in imaginary_axis]

pixel_grid = np.zeros((2000,3000,3),dtype=np.uint8)


Then I check whether each complex number is in the mandelbrot set or not and give it an RGB colour code accordingly.



for complex_list in complex_grid:
for complex_number in complex_list:
for iteration in range(255):
z = z**2 + complex_number
if (z.real**2+z.imag**2)**0.5 > 2:
pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
break
else:
continue
z = 0


Finally I generate the image using the PIL library.



mandelbrot = Image.fromarray(pixel_grid)
mandelbrot.save("mandelbrot.png")


I am using jupyter notebook and python 3. Hopefully some of you can help me improve the performance of this program or other aspects of it.







python performance python-3.x image fractals






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited 10 hours ago









Peilonrayz

26.3k338110




26.3k338110










asked yesterday









IanIan

18416




18416







  • 1




    $begingroup$
    If you want to skip points that are known to be inside the Mandelbrot set, the main cardioid and period bulbs can be skipped. If you skip processing points contained within the main cardioid and the main disk, you can significantly speed up the program. See also this resource for a further analysis of the main cardioid and disk. This optimization becomes less effective as you zoom in on the edges and becomes completely ineffective if these regions are off-screen.
    $endgroup$
    – Cornstalks
    yesterday











  • $begingroup$
    Please put spaces around your operators and after commas.
    $endgroup$
    – Almo
    6 hours ago












  • 1




    $begingroup$
    If you want to skip points that are known to be inside the Mandelbrot set, the main cardioid and period bulbs can be skipped. If you skip processing points contained within the main cardioid and the main disk, you can significantly speed up the program. See also this resource for a further analysis of the main cardioid and disk. This optimization becomes less effective as you zoom in on the edges and becomes completely ineffective if these regions are off-screen.
    $endgroup$
    – Cornstalks
    yesterday











  • $begingroup$
    Please put spaces around your operators and after commas.
    $endgroup$
    – Almo
    6 hours ago







1




1




$begingroup$
If you want to skip points that are known to be inside the Mandelbrot set, the main cardioid and period bulbs can be skipped. If you skip processing points contained within the main cardioid and the main disk, you can significantly speed up the program. See also this resource for a further analysis of the main cardioid and disk. This optimization becomes less effective as you zoom in on the edges and becomes completely ineffective if these regions are off-screen.
$endgroup$
– Cornstalks
yesterday





$begingroup$
If you want to skip points that are known to be inside the Mandelbrot set, the main cardioid and period bulbs can be skipped. If you skip processing points contained within the main cardioid and the main disk, you can significantly speed up the program. See also this resource for a further analysis of the main cardioid and disk. This optimization becomes less effective as you zoom in on the edges and becomes completely ineffective if these regions are off-screen.
$endgroup$
– Cornstalks
yesterday













$begingroup$
Please put spaces around your operators and after commas.
$endgroup$
– Almo
6 hours ago




$begingroup$
Please put spaces around your operators and after commas.
$endgroup$
– Almo
6 hours ago










5 Answers
5






active

oldest

votes


















33












$begingroup$

I'm going to reuse some parts of the answer I recently posted here on Code Review.



Losing your Loops




(Most) loops are damn slow in Python. Especially multiple nested loops.



NumPy can help to vectorize your code, i.e. in this case that more
of the looping is done in the C backend instead of in the Python
interpreter. I would highly recommend to have a listen to the talk
Losing your Loops: Fast Numerical Computing with NumPy by Jake
VanderPlas.




All those loops used to generate the complex grid followed by the nested loops used to iterate over the grid and the image are slow when left to the Python interpreter. Fortunately, NumPy can take quite a lot of this burden off of you.



For example



real_axis = np.linspace(-2, 1, num=3000)
imaginary_axis = np.linspace(1, -1, num=2000)
complex_grid = [[complex(np.float64(a),np.float64(b)) for a in real_axis] for b in imaginary_axis]


could become



n_rows, n_cols = 2000, 3000
complex_grid_np = np.zeros((n_rows, n_cols), dtype=np.complex)
real, imag = np.meshgrid(real_axis, imaginary_axis)
complex_grid_np.real = real
complex_grid_np.imag = imag


No loops, just plain simple NumPy.



Same goes for



for complex_list in complex_grid:
for complex_number in complex_list:
for iteration in range(255):
z = z**2 + complex_number
if (z.real**2+z.imag**2)**0.5 > 2:
pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
break
else:
continue
z = 0


can be transformed to



z_grid_np = np.zeros_like(complex_grid_np)
elements_todo = np.ones((n_rows, n_cols), dtype=bool)
for iteration in range(255):
z_grid_np[elements_todo] =
z_grid_np[elements_todo]**2 + complex_grid_np[elements_todo]
mask = np.logical_and(np.absolute(z_grid_np) > 2, elements_todo)
pixel_grid_np[mask, :] = (iteration, iteration, iteration)
elements_todo = np.logical_and(elements_todo, np.logical_not(mask))


which is just a single loop instead of three nested ones. Here, a little more trickery was needed to treat the break case the same way as you did. elements_todo is used to only compute updates on the z value if it has not been marked as done. There might also be a better solution without this.



I added the following lines



complex_grid_close = np.allclose(np.array(complex_grid), complex_grid_np)
pixel_grid_close = np.allclose(pixel_grid, pixel_grid_np)
print("Results were similar: ".format(all((complex_grid_close, pixel_grid_close))))


to validate my results against your reference implementation.



The vectorized code is about 9-10x faster on my machine for several n_rows/n_cols combinations I tested. E.g. for n_rows, n_cols = 1000, 1500:



Looped generation took 61.989842s
Vectorized generation took 6.656926s
Results were similar: True



Edit/Appendix: Further timings



Including the "no square root" optimization as suggested by trichoplax in his answer and Peter Cordes in the comments like so



mask = np.logical_and((z_grid_np.real**2+z_grid_np.imag**2) > 4, elements_todo)


will give you about another second and a half for n_rows, n_cols = 1000, 1500, i.e. about 12x the speed of the original solution



10 loops, best of 5: 4.98 s per loop


A quick implementation of Reinderien et al. hint towards the symmetry of the Mandelbrot set will again add a factor of about two to that (~24x the speed).



10 loops, best of 5: 2.54 s per loop


However, my quick hacking approach did not lead to an output that was within the tolerance of np.allclose compared to the original one. Funnily, it seems to be off by one at a single pixel, but visually still the same. Since this post is already quite long, I will leave the reimplementation as an exercise to the reader.






share|improve this answer











$endgroup$












  • $begingroup$
    I have some trouble understanding the numpy based code you posted, possibly because there are a few new commands, but I'll try to have a go at vectorized programming!
    $endgroup$
    – Ian
    18 hours ago










  • $begingroup$
    Could you narrow down what part of code/what commands are causing you headache? I would then add further details where needed.
    $endgroup$
    – Alex
    17 hours ago










  • $begingroup$
    np.absolute(z_grid_np) should be replaced with something that avoids the sqrt in the sqrt(real**2 + imag**2) formula it uses, unless Python can already do that optimization when comparing against a known-positive constant like 4. IDK if it's worth considering having separate real and imag arrays and doing the complex stuff manually. If it actually loops over your data for each step then that would reduce computational intensity (less work done per time your data is loaded / stored from memory).
    $endgroup$
    – Peter Cordes
    17 hours ago






  • 2




    $begingroup$
    Also worth looking at cache-blocking, if w * h times 16 bytes per complex-double is more than 32k (L1d size) or 256k (typical L2 cache size): loop over a fraction of the whole array repeatedly. (e.g. 1500 * 1000 * 16 = 24MB, which only fits on L3 cache on a big Xeon, or not at all on a normal desktop CPU.) 1500*16B = 24kB, so looping repeatedly over 1 row might be a win. (Or as other answers point out, different regions of the problem have different typical iteration counts, so working in square tiles might let you stop after a couple iterations when all pixels hit |m| > 4
    $endgroup$
    – Peter Cordes
    16 hours ago






  • 1




    $begingroup$
    This gets fairly involved and is IMHO clearly out of focus for this kind of code review and the experience the OP seems to have. Maybe we can continue the discussion somewhere else?
    $endgroup$
    – Alex
    16 hours ago


















15












$begingroup$

This will cover performance, as well as Python style.



Save constants in one place



You currently have the magic numbers 2000 and 3000, the resolution of your image. Save these to variables perhaps named X, Y or W, H.



Mention your requirements



You don't just rely on Python 3 and Jupyter - you rely on numpy and pillow. These should go in a requirements.txt if you don't already have one.



Don't save your complex grid



At all. complex_number should be formed dynamically in the loop based on range expressions.



Disclaimer: if you're vectorizing (which you should do), then the opposite applies - you would keep your complex grid, and lose some loops.



Don't use index lookups



You're using index to get your coordinates. Don't do this - form the coordinates in your loops, as well.



Mandelbrot is symmetrical



Notice that it's mirror-imaged. This means you can halve your computation time and save every pixel to the top and bottom half.



In a bit I'll show some example code accommodating all of the suggestions above. Just do (nearly) what @Alex says and I'd gotten halfway through implementing, with one difference: accommodate the symmetry optimization I described.






share|improve this answer











$endgroup$




















    7












    $begingroup$

    Mandelbrot-specific optimisations



    These can be combined with the Python-specific optimisations from the other answers.



    Avoid the redundant square root



    if (z.real**2+z.imag**2)**0.5 > 2:


    is equivalent to



    if z.real ** 2 + z.imag ** 2 > 4:


    (simply square both sides of the original comparison to get the optimised comparison)



    Avoid squaring unless you are using it



    Any points that get further than 2 from the origin will continue to escape towards infinity. So it isn't important whether you check that a point has gone outside a circle of radius 2, or that it has gone outside some other finite shape that fully contains that circle. For example, checking that the point is outside a square instead of a circle avoids having to square the real and imaginary parts. It also means you will need slightly more iterations, but very few and this should be outweighed by having each iteration faster.



    For example:



    if (z.real**2+z.imag**2)**0.5 > 2: # if z is outside the circle


    could be replaced by



    if not (-2 < z.real < 2 and -2 < z.imag < 2): # if z is outside the square


    The exception to this suggestion is if the circle is important to your output. If you simply plot points inside the set as black, and points outside the set as white, then the image will be identical with either approach. However, if you count the number of iterations a point takes to escape and use this to determine the colour of points outside the set, then the shape of the stripes of colour will be different with a square boundary than with a circular boundary. The interior of the set will be identical, but the colours outside will be arranged in different shapes.



    In your example image, not much is visible of the stripes of colour, with most of the exterior and interior being black. In this case I doubt there would be a significant difference in appearance using this optimisation. However, if you change to displaying wider stripes in future, this optimisation may need to be removed (depending on what appearance you want).



    Hard-code as much of the interior as you can



    The interior of the set takes far longer to calculate than the exterior. Each pixel in the interior takes a guaranteed 255 iterations (or more if you increase the maximum iterations for even higher quality images), whereas each pixel in the exterior takes less than this. The vast majority of the exterior pixels take only a few iterations.



    If you want the code to be adaptable for zooming in to arbitrary positions, then you won't know in advance which parts of the image are going to be interior points. However, if you only want this code to generate this one image of the whole set, then you can get a significant improvement in speed by avoiding calculating pixels you know are interior. For example, if you check whether the pixel is in the main cardioid or one of the large circles, you can assign all those pixels an iteration count of 255 without actually doing any iteration. The more you increase the maximum iterations, the more circles it will be worthwhile excluding in advance, as the difference in calculation time between the average exterior pixel and the average interior pixel will continue to diverge dramatically.



    I don't know the exact centres and radii of these circles, or an exact equation for the cardioid, but rough estimates that are chosen to not overlap the exterior will still make a big difference to the speed. Even excluding some squares chosen by eye that are entirely in the interior would help.






    share|improve this answer











    $endgroup$




















      3












      $begingroup$

      I'm not a python expert. I am pretty good with Mandlebrot generation (I've spent a lot of time on my custom Julia Set generator.)



      So I'll say this: optimize the heck out of stuff that will be running many iterations. Forget about clean-code or nice OOP principles. For lots-of-iterations stuff like this, you want as nitty gritty as possible.



      So let's take a look at your interior-most loop:



      z = z**2 + complex_number
      if (z.real**2+z.imag**2)**0.5 > 2:
      pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
      break
      else:
      continue


      Imagine what's happening behind the scenes in memory with just that very first line. You've got an instance of a complex number. You want to square it... so it has to create another instance of a complex object to hold the squared value. Then, you're adding another complex number to it - which means you're creating another instance of Complex to hold the result of the addition.



      You're creating object instances left and right, and you're doing it on an order of 3000 x 2000 x 255 times. Creating several class instances doesn't sound like much, but when you're doing it a billion times, it kinda drags things down.



      Compare that with pseudocode like:



      px = num.real
      py = num.imag
      while
      tmppx = px
      px = px * px - py * py + num.real
      py = 2 * tmppx * py + num.imag
      if condition-for-hitting-escape
      stuff
      if condition-for-hitting-max-iter
      moreStuff


      No objects are getting created and destroyed. It's boiled down to be as efficient as possible. It's not as nice looking... but when you're doing something a billion times, shaving off even a millionth of a second results in saving 15 minutes.



      And as someone else mentioned, you want to simplify the logic so that you don't have to do the square-root operation - and if you're okay with small variations in how the gradient is colored, changing the "magnitude" check with "are x or y within a bounding box".



      Aka, the more things you can remove out of that runs-a-billion-times loop, the better off you'll be.






      share|improve this answer









      $endgroup$




















        2












        $begingroup$

        There are a few tricks you can use to make a Mandelbrot renderer really fly.



        Detect cycles



        If a point lies inside the Mandelbrot set, successive iterations will cause it to decay into a cycle. The most economical way to detect this, I have found, is to do x iterations, test to see if it is the same as before, then increment x and repeat.



        Draw a half resolution version first



        That's a 1000x1500 image in your case. Calculate it such that each pixel represents a pixel in the real image. Then if a pixel is entirely surrounded by other pixels with the same iteration count, you can assume that it also has that iteration count and skip calculating it.



        This technique can miss fine strands, but it saves an enormous amount of time. You should also use a flood fill style algorithm whenever you calculate an unskippable pixel to find other pixels that may previously have been considered skippable but aren't. This should fix most of the problems.



        Also note that this is recursive. Before calculating the 1000x1500 version you should calculate a 500x750 version, before that a 250x375 version etc.



        The SuperFractalThing trick



        If you want to calculate deep fractals, you need to use high precision, which can be a huge drain on calculation time. However, strictly speaking you only need to use high precision for one pixel.



        We start from position $p_0$, and we follow the usual iterative formula:



        $p_x+1=p_x^2+p_0$



        We record all the values of $p_x$ as regular, double precision complex numbers. Now we calculate $q$, but we do it by calculating $d$, where $d_x=q_x-p_x$:



        $d_x+1 = 2d_xp_x + d_x^2 + (q_0-p_0)$



        This is a bit more complicated, but we only need to use double precision numbers, so it is much, much faster when deep zooming.



        One issue is that the $p$ sequence has to be at least as long as the $q$ sequence, and we cannot tell the best $p$ sequence in advance. In practice we often have to calculate new $p$ sequences using high precision arithmetic as we discover pixels with a longer escape time.



        Faster languages



        There is no getting around it, Python is slow. NumPy can do the heavy lifting, which can speed it up dramatically, but it's pretty awkward compared to the same code written in C. I suggest learning to use Ctypes and writing a small C library to follow the iterative formula.






        share|improve this answer











        $endgroup$












          Your Answer





          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "196"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader:
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          ,
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );













          draft saved

          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f216235%2fincrease-performance-creating-mandelbrot-set-in-python%23new-answer', 'question_page');

          );

          Post as a guest















          Required, but never shown

























          5 Answers
          5






          active

          oldest

          votes








          5 Answers
          5






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          33












          $begingroup$

          I'm going to reuse some parts of the answer I recently posted here on Code Review.



          Losing your Loops




          (Most) loops are damn slow in Python. Especially multiple nested loops.



          NumPy can help to vectorize your code, i.e. in this case that more
          of the looping is done in the C backend instead of in the Python
          interpreter. I would highly recommend to have a listen to the talk
          Losing your Loops: Fast Numerical Computing with NumPy by Jake
          VanderPlas.




          All those loops used to generate the complex grid followed by the nested loops used to iterate over the grid and the image are slow when left to the Python interpreter. Fortunately, NumPy can take quite a lot of this burden off of you.



          For example



          real_axis = np.linspace(-2, 1, num=3000)
          imaginary_axis = np.linspace(1, -1, num=2000)
          complex_grid = [[complex(np.float64(a),np.float64(b)) for a in real_axis] for b in imaginary_axis]


          could become



          n_rows, n_cols = 2000, 3000
          complex_grid_np = np.zeros((n_rows, n_cols), dtype=np.complex)
          real, imag = np.meshgrid(real_axis, imaginary_axis)
          complex_grid_np.real = real
          complex_grid_np.imag = imag


          No loops, just plain simple NumPy.



          Same goes for



          for complex_list in complex_grid:
          for complex_number in complex_list:
          for iteration in range(255):
          z = z**2 + complex_number
          if (z.real**2+z.imag**2)**0.5 > 2:
          pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
          break
          else:
          continue
          z = 0


          can be transformed to



          z_grid_np = np.zeros_like(complex_grid_np)
          elements_todo = np.ones((n_rows, n_cols), dtype=bool)
          for iteration in range(255):
          z_grid_np[elements_todo] =
          z_grid_np[elements_todo]**2 + complex_grid_np[elements_todo]
          mask = np.logical_and(np.absolute(z_grid_np) > 2, elements_todo)
          pixel_grid_np[mask, :] = (iteration, iteration, iteration)
          elements_todo = np.logical_and(elements_todo, np.logical_not(mask))


          which is just a single loop instead of three nested ones. Here, a little more trickery was needed to treat the break case the same way as you did. elements_todo is used to only compute updates on the z value if it has not been marked as done. There might also be a better solution without this.



          I added the following lines



          complex_grid_close = np.allclose(np.array(complex_grid), complex_grid_np)
          pixel_grid_close = np.allclose(pixel_grid, pixel_grid_np)
          print("Results were similar: ".format(all((complex_grid_close, pixel_grid_close))))


          to validate my results against your reference implementation.



          The vectorized code is about 9-10x faster on my machine for several n_rows/n_cols combinations I tested. E.g. for n_rows, n_cols = 1000, 1500:



          Looped generation took 61.989842s
          Vectorized generation took 6.656926s
          Results were similar: True



          Edit/Appendix: Further timings



          Including the "no square root" optimization as suggested by trichoplax in his answer and Peter Cordes in the comments like so



          mask = np.logical_and((z_grid_np.real**2+z_grid_np.imag**2) > 4, elements_todo)


          will give you about another second and a half for n_rows, n_cols = 1000, 1500, i.e. about 12x the speed of the original solution



          10 loops, best of 5: 4.98 s per loop


          A quick implementation of Reinderien et al. hint towards the symmetry of the Mandelbrot set will again add a factor of about two to that (~24x the speed).



          10 loops, best of 5: 2.54 s per loop


          However, my quick hacking approach did not lead to an output that was within the tolerance of np.allclose compared to the original one. Funnily, it seems to be off by one at a single pixel, but visually still the same. Since this post is already quite long, I will leave the reimplementation as an exercise to the reader.






          share|improve this answer











          $endgroup$












          • $begingroup$
            I have some trouble understanding the numpy based code you posted, possibly because there are a few new commands, but I'll try to have a go at vectorized programming!
            $endgroup$
            – Ian
            18 hours ago










          • $begingroup$
            Could you narrow down what part of code/what commands are causing you headache? I would then add further details where needed.
            $endgroup$
            – Alex
            17 hours ago










          • $begingroup$
            np.absolute(z_grid_np) should be replaced with something that avoids the sqrt in the sqrt(real**2 + imag**2) formula it uses, unless Python can already do that optimization when comparing against a known-positive constant like 4. IDK if it's worth considering having separate real and imag arrays and doing the complex stuff manually. If it actually loops over your data for each step then that would reduce computational intensity (less work done per time your data is loaded / stored from memory).
            $endgroup$
            – Peter Cordes
            17 hours ago






          • 2




            $begingroup$
            Also worth looking at cache-blocking, if w * h times 16 bytes per complex-double is more than 32k (L1d size) or 256k (typical L2 cache size): loop over a fraction of the whole array repeatedly. (e.g. 1500 * 1000 * 16 = 24MB, which only fits on L3 cache on a big Xeon, or not at all on a normal desktop CPU.) 1500*16B = 24kB, so looping repeatedly over 1 row might be a win. (Or as other answers point out, different regions of the problem have different typical iteration counts, so working in square tiles might let you stop after a couple iterations when all pixels hit |m| > 4
            $endgroup$
            – Peter Cordes
            16 hours ago






          • 1




            $begingroup$
            This gets fairly involved and is IMHO clearly out of focus for this kind of code review and the experience the OP seems to have. Maybe we can continue the discussion somewhere else?
            $endgroup$
            – Alex
            16 hours ago















          33












          $begingroup$

          I'm going to reuse some parts of the answer I recently posted here on Code Review.



          Losing your Loops




          (Most) loops are damn slow in Python. Especially multiple nested loops.



          NumPy can help to vectorize your code, i.e. in this case that more
          of the looping is done in the C backend instead of in the Python
          interpreter. I would highly recommend to have a listen to the talk
          Losing your Loops: Fast Numerical Computing with NumPy by Jake
          VanderPlas.




          All those loops used to generate the complex grid followed by the nested loops used to iterate over the grid and the image are slow when left to the Python interpreter. Fortunately, NumPy can take quite a lot of this burden off of you.



          For example



          real_axis = np.linspace(-2, 1, num=3000)
          imaginary_axis = np.linspace(1, -1, num=2000)
          complex_grid = [[complex(np.float64(a),np.float64(b)) for a in real_axis] for b in imaginary_axis]


          could become



          n_rows, n_cols = 2000, 3000
          complex_grid_np = np.zeros((n_rows, n_cols), dtype=np.complex)
          real, imag = np.meshgrid(real_axis, imaginary_axis)
          complex_grid_np.real = real
          complex_grid_np.imag = imag


          No loops, just plain simple NumPy.



          Same goes for



          for complex_list in complex_grid:
          for complex_number in complex_list:
          for iteration in range(255):
          z = z**2 + complex_number
          if (z.real**2+z.imag**2)**0.5 > 2:
          pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
          break
          else:
          continue
          z = 0


          can be transformed to



          z_grid_np = np.zeros_like(complex_grid_np)
          elements_todo = np.ones((n_rows, n_cols), dtype=bool)
          for iteration in range(255):
          z_grid_np[elements_todo] =
          z_grid_np[elements_todo]**2 + complex_grid_np[elements_todo]
          mask = np.logical_and(np.absolute(z_grid_np) > 2, elements_todo)
          pixel_grid_np[mask, :] = (iteration, iteration, iteration)
          elements_todo = np.logical_and(elements_todo, np.logical_not(mask))


          which is just a single loop instead of three nested ones. Here, a little more trickery was needed to treat the break case the same way as you did. elements_todo is used to only compute updates on the z value if it has not been marked as done. There might also be a better solution without this.



          I added the following lines



          complex_grid_close = np.allclose(np.array(complex_grid), complex_grid_np)
          pixel_grid_close = np.allclose(pixel_grid, pixel_grid_np)
          print("Results were similar: ".format(all((complex_grid_close, pixel_grid_close))))


          to validate my results against your reference implementation.



          The vectorized code is about 9-10x faster on my machine for several n_rows/n_cols combinations I tested. E.g. for n_rows, n_cols = 1000, 1500:



          Looped generation took 61.989842s
          Vectorized generation took 6.656926s
          Results were similar: True



          Edit/Appendix: Further timings



          Including the "no square root" optimization as suggested by trichoplax in his answer and Peter Cordes in the comments like so



          mask = np.logical_and((z_grid_np.real**2+z_grid_np.imag**2) > 4, elements_todo)


          will give you about another second and a half for n_rows, n_cols = 1000, 1500, i.e. about 12x the speed of the original solution



          10 loops, best of 5: 4.98 s per loop


          A quick implementation of Reinderien et al. hint towards the symmetry of the Mandelbrot set will again add a factor of about two to that (~24x the speed).



          10 loops, best of 5: 2.54 s per loop


          However, my quick hacking approach did not lead to an output that was within the tolerance of np.allclose compared to the original one. Funnily, it seems to be off by one at a single pixel, but visually still the same. Since this post is already quite long, I will leave the reimplementation as an exercise to the reader.






          share|improve this answer











          $endgroup$












          • $begingroup$
            I have some trouble understanding the numpy based code you posted, possibly because there are a few new commands, but I'll try to have a go at vectorized programming!
            $endgroup$
            – Ian
            18 hours ago










          • $begingroup$
            Could you narrow down what part of code/what commands are causing you headache? I would then add further details where needed.
            $endgroup$
            – Alex
            17 hours ago










          • $begingroup$
            np.absolute(z_grid_np) should be replaced with something that avoids the sqrt in the sqrt(real**2 + imag**2) formula it uses, unless Python can already do that optimization when comparing against a known-positive constant like 4. IDK if it's worth considering having separate real and imag arrays and doing the complex stuff manually. If it actually loops over your data for each step then that would reduce computational intensity (less work done per time your data is loaded / stored from memory).
            $endgroup$
            – Peter Cordes
            17 hours ago






          • 2




            $begingroup$
            Also worth looking at cache-blocking, if w * h times 16 bytes per complex-double is more than 32k (L1d size) or 256k (typical L2 cache size): loop over a fraction of the whole array repeatedly. (e.g. 1500 * 1000 * 16 = 24MB, which only fits on L3 cache on a big Xeon, or not at all on a normal desktop CPU.) 1500*16B = 24kB, so looping repeatedly over 1 row might be a win. (Or as other answers point out, different regions of the problem have different typical iteration counts, so working in square tiles might let you stop after a couple iterations when all pixels hit |m| > 4
            $endgroup$
            – Peter Cordes
            16 hours ago






          • 1




            $begingroup$
            This gets fairly involved and is IMHO clearly out of focus for this kind of code review and the experience the OP seems to have. Maybe we can continue the discussion somewhere else?
            $endgroup$
            – Alex
            16 hours ago













          33












          33








          33





          $begingroup$

          I'm going to reuse some parts of the answer I recently posted here on Code Review.



          Losing your Loops




          (Most) loops are damn slow in Python. Especially multiple nested loops.



          NumPy can help to vectorize your code, i.e. in this case that more
          of the looping is done in the C backend instead of in the Python
          interpreter. I would highly recommend to have a listen to the talk
          Losing your Loops: Fast Numerical Computing with NumPy by Jake
          VanderPlas.




          All those loops used to generate the complex grid followed by the nested loops used to iterate over the grid and the image are slow when left to the Python interpreter. Fortunately, NumPy can take quite a lot of this burden off of you.



          For example



          real_axis = np.linspace(-2, 1, num=3000)
          imaginary_axis = np.linspace(1, -1, num=2000)
          complex_grid = [[complex(np.float64(a),np.float64(b)) for a in real_axis] for b in imaginary_axis]


          could become



          n_rows, n_cols = 2000, 3000
          complex_grid_np = np.zeros((n_rows, n_cols), dtype=np.complex)
          real, imag = np.meshgrid(real_axis, imaginary_axis)
          complex_grid_np.real = real
          complex_grid_np.imag = imag


          No loops, just plain simple NumPy.



          Same goes for



          for complex_list in complex_grid:
          for complex_number in complex_list:
          for iteration in range(255):
          z = z**2 + complex_number
          if (z.real**2+z.imag**2)**0.5 > 2:
          pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
          break
          else:
          continue
          z = 0


          can be transformed to



          z_grid_np = np.zeros_like(complex_grid_np)
          elements_todo = np.ones((n_rows, n_cols), dtype=bool)
          for iteration in range(255):
          z_grid_np[elements_todo] =
          z_grid_np[elements_todo]**2 + complex_grid_np[elements_todo]
          mask = np.logical_and(np.absolute(z_grid_np) > 2, elements_todo)
          pixel_grid_np[mask, :] = (iteration, iteration, iteration)
          elements_todo = np.logical_and(elements_todo, np.logical_not(mask))


          which is just a single loop instead of three nested ones. Here, a little more trickery was needed to treat the break case the same way as you did. elements_todo is used to only compute updates on the z value if it has not been marked as done. There might also be a better solution without this.



          I added the following lines



          complex_grid_close = np.allclose(np.array(complex_grid), complex_grid_np)
          pixel_grid_close = np.allclose(pixel_grid, pixel_grid_np)
          print("Results were similar: ".format(all((complex_grid_close, pixel_grid_close))))


          to validate my results against your reference implementation.



          The vectorized code is about 9-10x faster on my machine for several n_rows/n_cols combinations I tested. E.g. for n_rows, n_cols = 1000, 1500:



          Looped generation took 61.989842s
          Vectorized generation took 6.656926s
          Results were similar: True



          Edit/Appendix: Further timings



          Including the "no square root" optimization as suggested by trichoplax in his answer and Peter Cordes in the comments like so



          mask = np.logical_and((z_grid_np.real**2+z_grid_np.imag**2) > 4, elements_todo)


          will give you about another second and a half for n_rows, n_cols = 1000, 1500, i.e. about 12x the speed of the original solution



          10 loops, best of 5: 4.98 s per loop


          A quick implementation of Reinderien et al. hint towards the symmetry of the Mandelbrot set will again add a factor of about two to that (~24x the speed).



          10 loops, best of 5: 2.54 s per loop


          However, my quick hacking approach did not lead to an output that was within the tolerance of np.allclose compared to the original one. Funnily, it seems to be off by one at a single pixel, but visually still the same. Since this post is already quite long, I will leave the reimplementation as an exercise to the reader.






          share|improve this answer











          $endgroup$



          I'm going to reuse some parts of the answer I recently posted here on Code Review.



          Losing your Loops




          (Most) loops are damn slow in Python. Especially multiple nested loops.



          NumPy can help to vectorize your code, i.e. in this case that more
          of the looping is done in the C backend instead of in the Python
          interpreter. I would highly recommend to have a listen to the talk
          Losing your Loops: Fast Numerical Computing with NumPy by Jake
          VanderPlas.




          All those loops used to generate the complex grid followed by the nested loops used to iterate over the grid and the image are slow when left to the Python interpreter. Fortunately, NumPy can take quite a lot of this burden off of you.



          For example



          real_axis = np.linspace(-2, 1, num=3000)
          imaginary_axis = np.linspace(1, -1, num=2000)
          complex_grid = [[complex(np.float64(a),np.float64(b)) for a in real_axis] for b in imaginary_axis]


          could become



          n_rows, n_cols = 2000, 3000
          complex_grid_np = np.zeros((n_rows, n_cols), dtype=np.complex)
          real, imag = np.meshgrid(real_axis, imaginary_axis)
          complex_grid_np.real = real
          complex_grid_np.imag = imag


          No loops, just plain simple NumPy.



          Same goes for



          for complex_list in complex_grid:
          for complex_number in complex_list:
          for iteration in range(255):
          z = z**2 + complex_number
          if (z.real**2+z.imag**2)**0.5 > 2:
          pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
          break
          else:
          continue
          z = 0


          can be transformed to



          z_grid_np = np.zeros_like(complex_grid_np)
          elements_todo = np.ones((n_rows, n_cols), dtype=bool)
          for iteration in range(255):
          z_grid_np[elements_todo] =
          z_grid_np[elements_todo]**2 + complex_grid_np[elements_todo]
          mask = np.logical_and(np.absolute(z_grid_np) > 2, elements_todo)
          pixel_grid_np[mask, :] = (iteration, iteration, iteration)
          elements_todo = np.logical_and(elements_todo, np.logical_not(mask))


          which is just a single loop instead of three nested ones. Here, a little more trickery was needed to treat the break case the same way as you did. elements_todo is used to only compute updates on the z value if it has not been marked as done. There might also be a better solution without this.



          I added the following lines



          complex_grid_close = np.allclose(np.array(complex_grid), complex_grid_np)
          pixel_grid_close = np.allclose(pixel_grid, pixel_grid_np)
          print("Results were similar: ".format(all((complex_grid_close, pixel_grid_close))))


          to validate my results against your reference implementation.



          The vectorized code is about 9-10x faster on my machine for several n_rows/n_cols combinations I tested. E.g. for n_rows, n_cols = 1000, 1500:



          Looped generation took 61.989842s
          Vectorized generation took 6.656926s
          Results were similar: True



          Edit/Appendix: Further timings



          Including the "no square root" optimization as suggested by trichoplax in his answer and Peter Cordes in the comments like so



          mask = np.logical_and((z_grid_np.real**2+z_grid_np.imag**2) > 4, elements_todo)


          will give you about another second and a half for n_rows, n_cols = 1000, 1500, i.e. about 12x the speed of the original solution



          10 loops, best of 5: 4.98 s per loop


          A quick implementation of Reinderien et al. hint towards the symmetry of the Mandelbrot set will again add a factor of about two to that (~24x the speed).



          10 loops, best of 5: 2.54 s per loop


          However, my quick hacking approach did not lead to an output that was within the tolerance of np.allclose compared to the original one. Funnily, it seems to be off by one at a single pixel, but visually still the same. Since this post is already quite long, I will leave the reimplementation as an exercise to the reader.







          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited 8 hours ago

























          answered yesterday









          AlexAlex

          1,007416




          1,007416











          • $begingroup$
            I have some trouble understanding the numpy based code you posted, possibly because there are a few new commands, but I'll try to have a go at vectorized programming!
            $endgroup$
            – Ian
            18 hours ago










          • $begingroup$
            Could you narrow down what part of code/what commands are causing you headache? I would then add further details where needed.
            $endgroup$
            – Alex
            17 hours ago










          • $begingroup$
            np.absolute(z_grid_np) should be replaced with something that avoids the sqrt in the sqrt(real**2 + imag**2) formula it uses, unless Python can already do that optimization when comparing against a known-positive constant like 4. IDK if it's worth considering having separate real and imag arrays and doing the complex stuff manually. If it actually loops over your data for each step then that would reduce computational intensity (less work done per time your data is loaded / stored from memory).
            $endgroup$
            – Peter Cordes
            17 hours ago






          • 2




            $begingroup$
            Also worth looking at cache-blocking, if w * h times 16 bytes per complex-double is more than 32k (L1d size) or 256k (typical L2 cache size): loop over a fraction of the whole array repeatedly. (e.g. 1500 * 1000 * 16 = 24MB, which only fits on L3 cache on a big Xeon, or not at all on a normal desktop CPU.) 1500*16B = 24kB, so looping repeatedly over 1 row might be a win. (Or as other answers point out, different regions of the problem have different typical iteration counts, so working in square tiles might let you stop after a couple iterations when all pixels hit |m| > 4
            $endgroup$
            – Peter Cordes
            16 hours ago






          • 1




            $begingroup$
            This gets fairly involved and is IMHO clearly out of focus for this kind of code review and the experience the OP seems to have. Maybe we can continue the discussion somewhere else?
            $endgroup$
            – Alex
            16 hours ago
















          • $begingroup$
            I have some trouble understanding the numpy based code you posted, possibly because there are a few new commands, but I'll try to have a go at vectorized programming!
            $endgroup$
            – Ian
            18 hours ago










          • $begingroup$
            Could you narrow down what part of code/what commands are causing you headache? I would then add further details where needed.
            $endgroup$
            – Alex
            17 hours ago










          • $begingroup$
            np.absolute(z_grid_np) should be replaced with something that avoids the sqrt in the sqrt(real**2 + imag**2) formula it uses, unless Python can already do that optimization when comparing against a known-positive constant like 4. IDK if it's worth considering having separate real and imag arrays and doing the complex stuff manually. If it actually loops over your data for each step then that would reduce computational intensity (less work done per time your data is loaded / stored from memory).
            $endgroup$
            – Peter Cordes
            17 hours ago






          • 2




            $begingroup$
            Also worth looking at cache-blocking, if w * h times 16 bytes per complex-double is more than 32k (L1d size) or 256k (typical L2 cache size): loop over a fraction of the whole array repeatedly. (e.g. 1500 * 1000 * 16 = 24MB, which only fits on L3 cache on a big Xeon, or not at all on a normal desktop CPU.) 1500*16B = 24kB, so looping repeatedly over 1 row might be a win. (Or as other answers point out, different regions of the problem have different typical iteration counts, so working in square tiles might let you stop after a couple iterations when all pixels hit |m| > 4
            $endgroup$
            – Peter Cordes
            16 hours ago






          • 1




            $begingroup$
            This gets fairly involved and is IMHO clearly out of focus for this kind of code review and the experience the OP seems to have. Maybe we can continue the discussion somewhere else?
            $endgroup$
            – Alex
            16 hours ago















          $begingroup$
          I have some trouble understanding the numpy based code you posted, possibly because there are a few new commands, but I'll try to have a go at vectorized programming!
          $endgroup$
          – Ian
          18 hours ago




          $begingroup$
          I have some trouble understanding the numpy based code you posted, possibly because there are a few new commands, but I'll try to have a go at vectorized programming!
          $endgroup$
          – Ian
          18 hours ago












          $begingroup$
          Could you narrow down what part of code/what commands are causing you headache? I would then add further details where needed.
          $endgroup$
          – Alex
          17 hours ago




          $begingroup$
          Could you narrow down what part of code/what commands are causing you headache? I would then add further details where needed.
          $endgroup$
          – Alex
          17 hours ago












          $begingroup$
          np.absolute(z_grid_np) should be replaced with something that avoids the sqrt in the sqrt(real**2 + imag**2) formula it uses, unless Python can already do that optimization when comparing against a known-positive constant like 4. IDK if it's worth considering having separate real and imag arrays and doing the complex stuff manually. If it actually loops over your data for each step then that would reduce computational intensity (less work done per time your data is loaded / stored from memory).
          $endgroup$
          – Peter Cordes
          17 hours ago




          $begingroup$
          np.absolute(z_grid_np) should be replaced with something that avoids the sqrt in the sqrt(real**2 + imag**2) formula it uses, unless Python can already do that optimization when comparing against a known-positive constant like 4. IDK if it's worth considering having separate real and imag arrays and doing the complex stuff manually. If it actually loops over your data for each step then that would reduce computational intensity (less work done per time your data is loaded / stored from memory).
          $endgroup$
          – Peter Cordes
          17 hours ago




          2




          2




          $begingroup$
          Also worth looking at cache-blocking, if w * h times 16 bytes per complex-double is more than 32k (L1d size) or 256k (typical L2 cache size): loop over a fraction of the whole array repeatedly. (e.g. 1500 * 1000 * 16 = 24MB, which only fits on L3 cache on a big Xeon, or not at all on a normal desktop CPU.) 1500*16B = 24kB, so looping repeatedly over 1 row might be a win. (Or as other answers point out, different regions of the problem have different typical iteration counts, so working in square tiles might let you stop after a couple iterations when all pixels hit |m| > 4
          $endgroup$
          – Peter Cordes
          16 hours ago




          $begingroup$
          Also worth looking at cache-blocking, if w * h times 16 bytes per complex-double is more than 32k (L1d size) or 256k (typical L2 cache size): loop over a fraction of the whole array repeatedly. (e.g. 1500 * 1000 * 16 = 24MB, which only fits on L3 cache on a big Xeon, or not at all on a normal desktop CPU.) 1500*16B = 24kB, so looping repeatedly over 1 row might be a win. (Or as other answers point out, different regions of the problem have different typical iteration counts, so working in square tiles might let you stop after a couple iterations when all pixels hit |m| > 4
          $endgroup$
          – Peter Cordes
          16 hours ago




          1




          1




          $begingroup$
          This gets fairly involved and is IMHO clearly out of focus for this kind of code review and the experience the OP seems to have. Maybe we can continue the discussion somewhere else?
          $endgroup$
          – Alex
          16 hours ago




          $begingroup$
          This gets fairly involved and is IMHO clearly out of focus for this kind of code review and the experience the OP seems to have. Maybe we can continue the discussion somewhere else?
          $endgroup$
          – Alex
          16 hours ago













          15












          $begingroup$

          This will cover performance, as well as Python style.



          Save constants in one place



          You currently have the magic numbers 2000 and 3000, the resolution of your image. Save these to variables perhaps named X, Y or W, H.



          Mention your requirements



          You don't just rely on Python 3 and Jupyter - you rely on numpy and pillow. These should go in a requirements.txt if you don't already have one.



          Don't save your complex grid



          At all. complex_number should be formed dynamically in the loop based on range expressions.



          Disclaimer: if you're vectorizing (which you should do), then the opposite applies - you would keep your complex grid, and lose some loops.



          Don't use index lookups



          You're using index to get your coordinates. Don't do this - form the coordinates in your loops, as well.



          Mandelbrot is symmetrical



          Notice that it's mirror-imaged. This means you can halve your computation time and save every pixel to the top and bottom half.



          In a bit I'll show some example code accommodating all of the suggestions above. Just do (nearly) what @Alex says and I'd gotten halfway through implementing, with one difference: accommodate the symmetry optimization I described.






          share|improve this answer











          $endgroup$

















            15












            $begingroup$

            This will cover performance, as well as Python style.



            Save constants in one place



            You currently have the magic numbers 2000 and 3000, the resolution of your image. Save these to variables perhaps named X, Y or W, H.



            Mention your requirements



            You don't just rely on Python 3 and Jupyter - you rely on numpy and pillow. These should go in a requirements.txt if you don't already have one.



            Don't save your complex grid



            At all. complex_number should be formed dynamically in the loop based on range expressions.



            Disclaimer: if you're vectorizing (which you should do), then the opposite applies - you would keep your complex grid, and lose some loops.



            Don't use index lookups



            You're using index to get your coordinates. Don't do this - form the coordinates in your loops, as well.



            Mandelbrot is symmetrical



            Notice that it's mirror-imaged. This means you can halve your computation time and save every pixel to the top and bottom half.



            In a bit I'll show some example code accommodating all of the suggestions above. Just do (nearly) what @Alex says and I'd gotten halfway through implementing, with one difference: accommodate the symmetry optimization I described.






            share|improve this answer











            $endgroup$















              15












              15








              15





              $begingroup$

              This will cover performance, as well as Python style.



              Save constants in one place



              You currently have the magic numbers 2000 and 3000, the resolution of your image. Save these to variables perhaps named X, Y or W, H.



              Mention your requirements



              You don't just rely on Python 3 and Jupyter - you rely on numpy and pillow. These should go in a requirements.txt if you don't already have one.



              Don't save your complex grid



              At all. complex_number should be formed dynamically in the loop based on range expressions.



              Disclaimer: if you're vectorizing (which you should do), then the opposite applies - you would keep your complex grid, and lose some loops.



              Don't use index lookups



              You're using index to get your coordinates. Don't do this - form the coordinates in your loops, as well.



              Mandelbrot is symmetrical



              Notice that it's mirror-imaged. This means you can halve your computation time and save every pixel to the top and bottom half.



              In a bit I'll show some example code accommodating all of the suggestions above. Just do (nearly) what @Alex says and I'd gotten halfway through implementing, with one difference: accommodate the symmetry optimization I described.






              share|improve this answer











              $endgroup$



              This will cover performance, as well as Python style.



              Save constants in one place



              You currently have the magic numbers 2000 and 3000, the resolution of your image. Save these to variables perhaps named X, Y or W, H.



              Mention your requirements



              You don't just rely on Python 3 and Jupyter - you rely on numpy and pillow. These should go in a requirements.txt if you don't already have one.



              Don't save your complex grid



              At all. complex_number should be formed dynamically in the loop based on range expressions.



              Disclaimer: if you're vectorizing (which you should do), then the opposite applies - you would keep your complex grid, and lose some loops.



              Don't use index lookups



              You're using index to get your coordinates. Don't do this - form the coordinates in your loops, as well.



              Mandelbrot is symmetrical



              Notice that it's mirror-imaged. This means you can halve your computation time and save every pixel to the top and bottom half.



              In a bit I'll show some example code accommodating all of the suggestions above. Just do (nearly) what @Alex says and I'd gotten halfway through implementing, with one difference: accommodate the symmetry optimization I described.







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited yesterday

























              answered yesterday









              ReinderienReinderien

              4,640823




              4,640823





















                  7












                  $begingroup$

                  Mandelbrot-specific optimisations



                  These can be combined with the Python-specific optimisations from the other answers.



                  Avoid the redundant square root



                  if (z.real**2+z.imag**2)**0.5 > 2:


                  is equivalent to



                  if z.real ** 2 + z.imag ** 2 > 4:


                  (simply square both sides of the original comparison to get the optimised comparison)



                  Avoid squaring unless you are using it



                  Any points that get further than 2 from the origin will continue to escape towards infinity. So it isn't important whether you check that a point has gone outside a circle of radius 2, or that it has gone outside some other finite shape that fully contains that circle. For example, checking that the point is outside a square instead of a circle avoids having to square the real and imaginary parts. It also means you will need slightly more iterations, but very few and this should be outweighed by having each iteration faster.



                  For example:



                  if (z.real**2+z.imag**2)**0.5 > 2: # if z is outside the circle


                  could be replaced by



                  if not (-2 < z.real < 2 and -2 < z.imag < 2): # if z is outside the square


                  The exception to this suggestion is if the circle is important to your output. If you simply plot points inside the set as black, and points outside the set as white, then the image will be identical with either approach. However, if you count the number of iterations a point takes to escape and use this to determine the colour of points outside the set, then the shape of the stripes of colour will be different with a square boundary than with a circular boundary. The interior of the set will be identical, but the colours outside will be arranged in different shapes.



                  In your example image, not much is visible of the stripes of colour, with most of the exterior and interior being black. In this case I doubt there would be a significant difference in appearance using this optimisation. However, if you change to displaying wider stripes in future, this optimisation may need to be removed (depending on what appearance you want).



                  Hard-code as much of the interior as you can



                  The interior of the set takes far longer to calculate than the exterior. Each pixel in the interior takes a guaranteed 255 iterations (or more if you increase the maximum iterations for even higher quality images), whereas each pixel in the exterior takes less than this. The vast majority of the exterior pixels take only a few iterations.



                  If you want the code to be adaptable for zooming in to arbitrary positions, then you won't know in advance which parts of the image are going to be interior points. However, if you only want this code to generate this one image of the whole set, then you can get a significant improvement in speed by avoiding calculating pixels you know are interior. For example, if you check whether the pixel is in the main cardioid or one of the large circles, you can assign all those pixels an iteration count of 255 without actually doing any iteration. The more you increase the maximum iterations, the more circles it will be worthwhile excluding in advance, as the difference in calculation time between the average exterior pixel and the average interior pixel will continue to diverge dramatically.



                  I don't know the exact centres and radii of these circles, or an exact equation for the cardioid, but rough estimates that are chosen to not overlap the exterior will still make a big difference to the speed. Even excluding some squares chosen by eye that are entirely in the interior would help.






                  share|improve this answer











                  $endgroup$

















                    7












                    $begingroup$

                    Mandelbrot-specific optimisations



                    These can be combined with the Python-specific optimisations from the other answers.



                    Avoid the redundant square root



                    if (z.real**2+z.imag**2)**0.5 > 2:


                    is equivalent to



                    if z.real ** 2 + z.imag ** 2 > 4:


                    (simply square both sides of the original comparison to get the optimised comparison)



                    Avoid squaring unless you are using it



                    Any points that get further than 2 from the origin will continue to escape towards infinity. So it isn't important whether you check that a point has gone outside a circle of radius 2, or that it has gone outside some other finite shape that fully contains that circle. For example, checking that the point is outside a square instead of a circle avoids having to square the real and imaginary parts. It also means you will need slightly more iterations, but very few and this should be outweighed by having each iteration faster.



                    For example:



                    if (z.real**2+z.imag**2)**0.5 > 2: # if z is outside the circle


                    could be replaced by



                    if not (-2 < z.real < 2 and -2 < z.imag < 2): # if z is outside the square


                    The exception to this suggestion is if the circle is important to your output. If you simply plot points inside the set as black, and points outside the set as white, then the image will be identical with either approach. However, if you count the number of iterations a point takes to escape and use this to determine the colour of points outside the set, then the shape of the stripes of colour will be different with a square boundary than with a circular boundary. The interior of the set will be identical, but the colours outside will be arranged in different shapes.



                    In your example image, not much is visible of the stripes of colour, with most of the exterior and interior being black. In this case I doubt there would be a significant difference in appearance using this optimisation. However, if you change to displaying wider stripes in future, this optimisation may need to be removed (depending on what appearance you want).



                    Hard-code as much of the interior as you can



                    The interior of the set takes far longer to calculate than the exterior. Each pixel in the interior takes a guaranteed 255 iterations (or more if you increase the maximum iterations for even higher quality images), whereas each pixel in the exterior takes less than this. The vast majority of the exterior pixels take only a few iterations.



                    If you want the code to be adaptable for zooming in to arbitrary positions, then you won't know in advance which parts of the image are going to be interior points. However, if you only want this code to generate this one image of the whole set, then you can get a significant improvement in speed by avoiding calculating pixels you know are interior. For example, if you check whether the pixel is in the main cardioid or one of the large circles, you can assign all those pixels an iteration count of 255 without actually doing any iteration. The more you increase the maximum iterations, the more circles it will be worthwhile excluding in advance, as the difference in calculation time between the average exterior pixel and the average interior pixel will continue to diverge dramatically.



                    I don't know the exact centres and radii of these circles, or an exact equation for the cardioid, but rough estimates that are chosen to not overlap the exterior will still make a big difference to the speed. Even excluding some squares chosen by eye that are entirely in the interior would help.






                    share|improve this answer











                    $endgroup$















                      7












                      7








                      7





                      $begingroup$

                      Mandelbrot-specific optimisations



                      These can be combined with the Python-specific optimisations from the other answers.



                      Avoid the redundant square root



                      if (z.real**2+z.imag**2)**0.5 > 2:


                      is equivalent to



                      if z.real ** 2 + z.imag ** 2 > 4:


                      (simply square both sides of the original comparison to get the optimised comparison)



                      Avoid squaring unless you are using it



                      Any points that get further than 2 from the origin will continue to escape towards infinity. So it isn't important whether you check that a point has gone outside a circle of radius 2, or that it has gone outside some other finite shape that fully contains that circle. For example, checking that the point is outside a square instead of a circle avoids having to square the real and imaginary parts. It also means you will need slightly more iterations, but very few and this should be outweighed by having each iteration faster.



                      For example:



                      if (z.real**2+z.imag**2)**0.5 > 2: # if z is outside the circle


                      could be replaced by



                      if not (-2 < z.real < 2 and -2 < z.imag < 2): # if z is outside the square


                      The exception to this suggestion is if the circle is important to your output. If you simply plot points inside the set as black, and points outside the set as white, then the image will be identical with either approach. However, if you count the number of iterations a point takes to escape and use this to determine the colour of points outside the set, then the shape of the stripes of colour will be different with a square boundary than with a circular boundary. The interior of the set will be identical, but the colours outside will be arranged in different shapes.



                      In your example image, not much is visible of the stripes of colour, with most of the exterior and interior being black. In this case I doubt there would be a significant difference in appearance using this optimisation. However, if you change to displaying wider stripes in future, this optimisation may need to be removed (depending on what appearance you want).



                      Hard-code as much of the interior as you can



                      The interior of the set takes far longer to calculate than the exterior. Each pixel in the interior takes a guaranteed 255 iterations (or more if you increase the maximum iterations for even higher quality images), whereas each pixel in the exterior takes less than this. The vast majority of the exterior pixels take only a few iterations.



                      If you want the code to be adaptable for zooming in to arbitrary positions, then you won't know in advance which parts of the image are going to be interior points. However, if you only want this code to generate this one image of the whole set, then you can get a significant improvement in speed by avoiding calculating pixels you know are interior. For example, if you check whether the pixel is in the main cardioid or one of the large circles, you can assign all those pixels an iteration count of 255 without actually doing any iteration. The more you increase the maximum iterations, the more circles it will be worthwhile excluding in advance, as the difference in calculation time between the average exterior pixel and the average interior pixel will continue to diverge dramatically.



                      I don't know the exact centres and radii of these circles, or an exact equation for the cardioid, but rough estimates that are chosen to not overlap the exterior will still make a big difference to the speed. Even excluding some squares chosen by eye that are entirely in the interior would help.






                      share|improve this answer











                      $endgroup$



                      Mandelbrot-specific optimisations



                      These can be combined with the Python-specific optimisations from the other answers.



                      Avoid the redundant square root



                      if (z.real**2+z.imag**2)**0.5 > 2:


                      is equivalent to



                      if z.real ** 2 + z.imag ** 2 > 4:


                      (simply square both sides of the original comparison to get the optimised comparison)



                      Avoid squaring unless you are using it



                      Any points that get further than 2 from the origin will continue to escape towards infinity. So it isn't important whether you check that a point has gone outside a circle of radius 2, or that it has gone outside some other finite shape that fully contains that circle. For example, checking that the point is outside a square instead of a circle avoids having to square the real and imaginary parts. It also means you will need slightly more iterations, but very few and this should be outweighed by having each iteration faster.



                      For example:



                      if (z.real**2+z.imag**2)**0.5 > 2: # if z is outside the circle


                      could be replaced by



                      if not (-2 < z.real < 2 and -2 < z.imag < 2): # if z is outside the square


                      The exception to this suggestion is if the circle is important to your output. If you simply plot points inside the set as black, and points outside the set as white, then the image will be identical with either approach. However, if you count the number of iterations a point takes to escape and use this to determine the colour of points outside the set, then the shape of the stripes of colour will be different with a square boundary than with a circular boundary. The interior of the set will be identical, but the colours outside will be arranged in different shapes.



                      In your example image, not much is visible of the stripes of colour, with most of the exterior and interior being black. In this case I doubt there would be a significant difference in appearance using this optimisation. However, if you change to displaying wider stripes in future, this optimisation may need to be removed (depending on what appearance you want).



                      Hard-code as much of the interior as you can



                      The interior of the set takes far longer to calculate than the exterior. Each pixel in the interior takes a guaranteed 255 iterations (or more if you increase the maximum iterations for even higher quality images), whereas each pixel in the exterior takes less than this. The vast majority of the exterior pixels take only a few iterations.



                      If you want the code to be adaptable for zooming in to arbitrary positions, then you won't know in advance which parts of the image are going to be interior points. However, if you only want this code to generate this one image of the whole set, then you can get a significant improvement in speed by avoiding calculating pixels you know are interior. For example, if you check whether the pixel is in the main cardioid or one of the large circles, you can assign all those pixels an iteration count of 255 without actually doing any iteration. The more you increase the maximum iterations, the more circles it will be worthwhile excluding in advance, as the difference in calculation time between the average exterior pixel and the average interior pixel will continue to diverge dramatically.



                      I don't know the exact centres and radii of these circles, or an exact equation for the cardioid, but rough estimates that are chosen to not overlap the exterior will still make a big difference to the speed. Even excluding some squares chosen by eye that are entirely in the interior would help.







                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited yesterday

























                      answered yesterday









                      trichoplaxtrichoplax

                      592316




                      592316





















                          3












                          $begingroup$

                          I'm not a python expert. I am pretty good with Mandlebrot generation (I've spent a lot of time on my custom Julia Set generator.)



                          So I'll say this: optimize the heck out of stuff that will be running many iterations. Forget about clean-code or nice OOP principles. For lots-of-iterations stuff like this, you want as nitty gritty as possible.



                          So let's take a look at your interior-most loop:



                          z = z**2 + complex_number
                          if (z.real**2+z.imag**2)**0.5 > 2:
                          pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
                          break
                          else:
                          continue


                          Imagine what's happening behind the scenes in memory with just that very first line. You've got an instance of a complex number. You want to square it... so it has to create another instance of a complex object to hold the squared value. Then, you're adding another complex number to it - which means you're creating another instance of Complex to hold the result of the addition.



                          You're creating object instances left and right, and you're doing it on an order of 3000 x 2000 x 255 times. Creating several class instances doesn't sound like much, but when you're doing it a billion times, it kinda drags things down.



                          Compare that with pseudocode like:



                          px = num.real
                          py = num.imag
                          while
                          tmppx = px
                          px = px * px - py * py + num.real
                          py = 2 * tmppx * py + num.imag
                          if condition-for-hitting-escape
                          stuff
                          if condition-for-hitting-max-iter
                          moreStuff


                          No objects are getting created and destroyed. It's boiled down to be as efficient as possible. It's not as nice looking... but when you're doing something a billion times, shaving off even a millionth of a second results in saving 15 minutes.



                          And as someone else mentioned, you want to simplify the logic so that you don't have to do the square-root operation - and if you're okay with small variations in how the gradient is colored, changing the "magnitude" check with "are x or y within a bounding box".



                          Aka, the more things you can remove out of that runs-a-billion-times loop, the better off you'll be.






                          share|improve this answer









                          $endgroup$

















                            3












                            $begingroup$

                            I'm not a python expert. I am pretty good with Mandlebrot generation (I've spent a lot of time on my custom Julia Set generator.)



                            So I'll say this: optimize the heck out of stuff that will be running many iterations. Forget about clean-code or nice OOP principles. For lots-of-iterations stuff like this, you want as nitty gritty as possible.



                            So let's take a look at your interior-most loop:



                            z = z**2 + complex_number
                            if (z.real**2+z.imag**2)**0.5 > 2:
                            pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
                            break
                            else:
                            continue


                            Imagine what's happening behind the scenes in memory with just that very first line. You've got an instance of a complex number. You want to square it... so it has to create another instance of a complex object to hold the squared value. Then, you're adding another complex number to it - which means you're creating another instance of Complex to hold the result of the addition.



                            You're creating object instances left and right, and you're doing it on an order of 3000 x 2000 x 255 times. Creating several class instances doesn't sound like much, but when you're doing it a billion times, it kinda drags things down.



                            Compare that with pseudocode like:



                            px = num.real
                            py = num.imag
                            while
                            tmppx = px
                            px = px * px - py * py + num.real
                            py = 2 * tmppx * py + num.imag
                            if condition-for-hitting-escape
                            stuff
                            if condition-for-hitting-max-iter
                            moreStuff


                            No objects are getting created and destroyed. It's boiled down to be as efficient as possible. It's not as nice looking... but when you're doing something a billion times, shaving off even a millionth of a second results in saving 15 minutes.



                            And as someone else mentioned, you want to simplify the logic so that you don't have to do the square-root operation - and if you're okay with small variations in how the gradient is colored, changing the "magnitude" check with "are x or y within a bounding box".



                            Aka, the more things you can remove out of that runs-a-billion-times loop, the better off you'll be.






                            share|improve this answer









                            $endgroup$















                              3












                              3








                              3





                              $begingroup$

                              I'm not a python expert. I am pretty good with Mandlebrot generation (I've spent a lot of time on my custom Julia Set generator.)



                              So I'll say this: optimize the heck out of stuff that will be running many iterations. Forget about clean-code or nice OOP principles. For lots-of-iterations stuff like this, you want as nitty gritty as possible.



                              So let's take a look at your interior-most loop:



                              z = z**2 + complex_number
                              if (z.real**2+z.imag**2)**0.5 > 2:
                              pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
                              break
                              else:
                              continue


                              Imagine what's happening behind the scenes in memory with just that very first line. You've got an instance of a complex number. You want to square it... so it has to create another instance of a complex object to hold the squared value. Then, you're adding another complex number to it - which means you're creating another instance of Complex to hold the result of the addition.



                              You're creating object instances left and right, and you're doing it on an order of 3000 x 2000 x 255 times. Creating several class instances doesn't sound like much, but when you're doing it a billion times, it kinda drags things down.



                              Compare that with pseudocode like:



                              px = num.real
                              py = num.imag
                              while
                              tmppx = px
                              px = px * px - py * py + num.real
                              py = 2 * tmppx * py + num.imag
                              if condition-for-hitting-escape
                              stuff
                              if condition-for-hitting-max-iter
                              moreStuff


                              No objects are getting created and destroyed. It's boiled down to be as efficient as possible. It's not as nice looking... but when you're doing something a billion times, shaving off even a millionth of a second results in saving 15 minutes.



                              And as someone else mentioned, you want to simplify the logic so that you don't have to do the square-root operation - and if you're okay with small variations in how the gradient is colored, changing the "magnitude" check with "are x or y within a bounding box".



                              Aka, the more things you can remove out of that runs-a-billion-times loop, the better off you'll be.






                              share|improve this answer









                              $endgroup$



                              I'm not a python expert. I am pretty good with Mandlebrot generation (I've spent a lot of time on my custom Julia Set generator.)



                              So I'll say this: optimize the heck out of stuff that will be running many iterations. Forget about clean-code or nice OOP principles. For lots-of-iterations stuff like this, you want as nitty gritty as possible.



                              So let's take a look at your interior-most loop:



                              z = z**2 + complex_number
                              if (z.real**2+z.imag**2)**0.5 > 2:
                              pixel_grid[complex_grid.index(complex_list),complex_list.index(complex_number)]=[iteration,iteration,iteration]
                              break
                              else:
                              continue


                              Imagine what's happening behind the scenes in memory with just that very first line. You've got an instance of a complex number. You want to square it... so it has to create another instance of a complex object to hold the squared value. Then, you're adding another complex number to it - which means you're creating another instance of Complex to hold the result of the addition.



                              You're creating object instances left and right, and you're doing it on an order of 3000 x 2000 x 255 times. Creating several class instances doesn't sound like much, but when you're doing it a billion times, it kinda drags things down.



                              Compare that with pseudocode like:



                              px = num.real
                              py = num.imag
                              while
                              tmppx = px
                              px = px * px - py * py + num.real
                              py = 2 * tmppx * py + num.imag
                              if condition-for-hitting-escape
                              stuff
                              if condition-for-hitting-max-iter
                              moreStuff


                              No objects are getting created and destroyed. It's boiled down to be as efficient as possible. It's not as nice looking... but when you're doing something a billion times, shaving off even a millionth of a second results in saving 15 minutes.



                              And as someone else mentioned, you want to simplify the logic so that you don't have to do the square-root operation - and if you're okay with small variations in how the gradient is colored, changing the "magnitude" check with "are x or y within a bounding box".



                              Aka, the more things you can remove out of that runs-a-billion-times loop, the better off you'll be.







                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered yesterday









                              KevinKevin

                              1713




                              1713





















                                  2












                                  $begingroup$

                                  There are a few tricks you can use to make a Mandelbrot renderer really fly.



                                  Detect cycles



                                  If a point lies inside the Mandelbrot set, successive iterations will cause it to decay into a cycle. The most economical way to detect this, I have found, is to do x iterations, test to see if it is the same as before, then increment x and repeat.



                                  Draw a half resolution version first



                                  That's a 1000x1500 image in your case. Calculate it such that each pixel represents a pixel in the real image. Then if a pixel is entirely surrounded by other pixels with the same iteration count, you can assume that it also has that iteration count and skip calculating it.



                                  This technique can miss fine strands, but it saves an enormous amount of time. You should also use a flood fill style algorithm whenever you calculate an unskippable pixel to find other pixels that may previously have been considered skippable but aren't. This should fix most of the problems.



                                  Also note that this is recursive. Before calculating the 1000x1500 version you should calculate a 500x750 version, before that a 250x375 version etc.



                                  The SuperFractalThing trick



                                  If you want to calculate deep fractals, you need to use high precision, which can be a huge drain on calculation time. However, strictly speaking you only need to use high precision for one pixel.



                                  We start from position $p_0$, and we follow the usual iterative formula:



                                  $p_x+1=p_x^2+p_0$



                                  We record all the values of $p_x$ as regular, double precision complex numbers. Now we calculate $q$, but we do it by calculating $d$, where $d_x=q_x-p_x$:



                                  $d_x+1 = 2d_xp_x + d_x^2 + (q_0-p_0)$



                                  This is a bit more complicated, but we only need to use double precision numbers, so it is much, much faster when deep zooming.



                                  One issue is that the $p$ sequence has to be at least as long as the $q$ sequence, and we cannot tell the best $p$ sequence in advance. In practice we often have to calculate new $p$ sequences using high precision arithmetic as we discover pixels with a longer escape time.



                                  Faster languages



                                  There is no getting around it, Python is slow. NumPy can do the heavy lifting, which can speed it up dramatically, but it's pretty awkward compared to the same code written in C. I suggest learning to use Ctypes and writing a small C library to follow the iterative formula.






                                  share|improve this answer











                                  $endgroup$

















                                    2












                                    $begingroup$

                                    There are a few tricks you can use to make a Mandelbrot renderer really fly.



                                    Detect cycles



                                    If a point lies inside the Mandelbrot set, successive iterations will cause it to decay into a cycle. The most economical way to detect this, I have found, is to do x iterations, test to see if it is the same as before, then increment x and repeat.



                                    Draw a half resolution version first



                                    That's a 1000x1500 image in your case. Calculate it such that each pixel represents a pixel in the real image. Then if a pixel is entirely surrounded by other pixels with the same iteration count, you can assume that it also has that iteration count and skip calculating it.



                                    This technique can miss fine strands, but it saves an enormous amount of time. You should also use a flood fill style algorithm whenever you calculate an unskippable pixel to find other pixels that may previously have been considered skippable but aren't. This should fix most of the problems.



                                    Also note that this is recursive. Before calculating the 1000x1500 version you should calculate a 500x750 version, before that a 250x375 version etc.



                                    The SuperFractalThing trick



                                    If you want to calculate deep fractals, you need to use high precision, which can be a huge drain on calculation time. However, strictly speaking you only need to use high precision for one pixel.



                                    We start from position $p_0$, and we follow the usual iterative formula:



                                    $p_x+1=p_x^2+p_0$



                                    We record all the values of $p_x$ as regular, double precision complex numbers. Now we calculate $q$, but we do it by calculating $d$, where $d_x=q_x-p_x$:



                                    $d_x+1 = 2d_xp_x + d_x^2 + (q_0-p_0)$



                                    This is a bit more complicated, but we only need to use double precision numbers, so it is much, much faster when deep zooming.



                                    One issue is that the $p$ sequence has to be at least as long as the $q$ sequence, and we cannot tell the best $p$ sequence in advance. In practice we often have to calculate new $p$ sequences using high precision arithmetic as we discover pixels with a longer escape time.



                                    Faster languages



                                    There is no getting around it, Python is slow. NumPy can do the heavy lifting, which can speed it up dramatically, but it's pretty awkward compared to the same code written in C. I suggest learning to use Ctypes and writing a small C library to follow the iterative formula.






                                    share|improve this answer











                                    $endgroup$















                                      2












                                      2








                                      2





                                      $begingroup$

                                      There are a few tricks you can use to make a Mandelbrot renderer really fly.



                                      Detect cycles



                                      If a point lies inside the Mandelbrot set, successive iterations will cause it to decay into a cycle. The most economical way to detect this, I have found, is to do x iterations, test to see if it is the same as before, then increment x and repeat.



                                      Draw a half resolution version first



                                      That's a 1000x1500 image in your case. Calculate it such that each pixel represents a pixel in the real image. Then if a pixel is entirely surrounded by other pixels with the same iteration count, you can assume that it also has that iteration count and skip calculating it.



                                      This technique can miss fine strands, but it saves an enormous amount of time. You should also use a flood fill style algorithm whenever you calculate an unskippable pixel to find other pixels that may previously have been considered skippable but aren't. This should fix most of the problems.



                                      Also note that this is recursive. Before calculating the 1000x1500 version you should calculate a 500x750 version, before that a 250x375 version etc.



                                      The SuperFractalThing trick



                                      If you want to calculate deep fractals, you need to use high precision, which can be a huge drain on calculation time. However, strictly speaking you only need to use high precision for one pixel.



                                      We start from position $p_0$, and we follow the usual iterative formula:



                                      $p_x+1=p_x^2+p_0$



                                      We record all the values of $p_x$ as regular, double precision complex numbers. Now we calculate $q$, but we do it by calculating $d$, where $d_x=q_x-p_x$:



                                      $d_x+1 = 2d_xp_x + d_x^2 + (q_0-p_0)$



                                      This is a bit more complicated, but we only need to use double precision numbers, so it is much, much faster when deep zooming.



                                      One issue is that the $p$ sequence has to be at least as long as the $q$ sequence, and we cannot tell the best $p$ sequence in advance. In practice we often have to calculate new $p$ sequences using high precision arithmetic as we discover pixels with a longer escape time.



                                      Faster languages



                                      There is no getting around it, Python is slow. NumPy can do the heavy lifting, which can speed it up dramatically, but it's pretty awkward compared to the same code written in C. I suggest learning to use Ctypes and writing a small C library to follow the iterative formula.






                                      share|improve this answer











                                      $endgroup$



                                      There are a few tricks you can use to make a Mandelbrot renderer really fly.



                                      Detect cycles



                                      If a point lies inside the Mandelbrot set, successive iterations will cause it to decay into a cycle. The most economical way to detect this, I have found, is to do x iterations, test to see if it is the same as before, then increment x and repeat.



                                      Draw a half resolution version first



                                      That's a 1000x1500 image in your case. Calculate it such that each pixel represents a pixel in the real image. Then if a pixel is entirely surrounded by other pixels with the same iteration count, you can assume that it also has that iteration count and skip calculating it.



                                      This technique can miss fine strands, but it saves an enormous amount of time. You should also use a flood fill style algorithm whenever you calculate an unskippable pixel to find other pixels that may previously have been considered skippable but aren't. This should fix most of the problems.



                                      Also note that this is recursive. Before calculating the 1000x1500 version you should calculate a 500x750 version, before that a 250x375 version etc.



                                      The SuperFractalThing trick



                                      If you want to calculate deep fractals, you need to use high precision, which can be a huge drain on calculation time. However, strictly speaking you only need to use high precision for one pixel.



                                      We start from position $p_0$, and we follow the usual iterative formula:



                                      $p_x+1=p_x^2+p_0$



                                      We record all the values of $p_x$ as regular, double precision complex numbers. Now we calculate $q$, but we do it by calculating $d$, where $d_x=q_x-p_x$:



                                      $d_x+1 = 2d_xp_x + d_x^2 + (q_0-p_0)$



                                      This is a bit more complicated, but we only need to use double precision numbers, so it is much, much faster when deep zooming.



                                      One issue is that the $p$ sequence has to be at least as long as the $q$ sequence, and we cannot tell the best $p$ sequence in advance. In practice we often have to calculate new $p$ sequences using high precision arithmetic as we discover pixels with a longer escape time.



                                      Faster languages



                                      There is no getting around it, Python is slow. NumPy can do the heavy lifting, which can speed it up dramatically, but it's pretty awkward compared to the same code written in C. I suggest learning to use Ctypes and writing a small C library to follow the iterative formula.







                                      share|improve this answer














                                      share|improve this answer



                                      share|improve this answer








                                      edited 10 hours ago

























                                      answered 10 hours ago









                                      James HollisJames Hollis

                                      1812




                                      1812



























                                          draft saved

                                          draft discarded
















































                                          Thanks for contributing an answer to Code Review Stack Exchange!


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid


                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.

                                          Use MathJax to format equations. MathJax reference.


                                          To learn more, see our tips on writing great answers.




                                          draft saved


                                          draft discarded














                                          StackExchange.ready(
                                          function ()
                                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f216235%2fincrease-performance-creating-mandelbrot-set-in-python%23new-answer', 'question_page');

                                          );

                                          Post as a guest















                                          Required, but never shown





















































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown

































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown







                                          -fractals, image, performance, python, python-3.x

                                          Popular posts from this blog

                                          Frič See also Navigation menuinternal link

                                          Identify plant with long narrow paired leaves and reddish stems Planned maintenance scheduled April 17/18, 2019 at 00:00UTC (8:00pm US/Eastern) Announcing the arrival of Valued Associate #679: Cesar Manara Unicorn Meta Zoo #1: Why another podcast?What is this plant with long sharp leaves? Is it a weed?What is this 3ft high, stalky plant, with mid sized narrow leaves?What is this young shrub with opposite ovate, crenate leaves and reddish stems?What is this plant with large broad serrated leaves?Identify this upright branching weed with long leaves and reddish stemsPlease help me identify this bulbous plant with long, broad leaves and white flowersWhat is this small annual with narrow gray/green leaves and rust colored daisy-type flowers?What is this chilli plant?Does anyone know what type of chilli plant this is?Help identify this plant

                                          fontconfig warning: “/etc/fonts/fonts.conf”, line 100: unknown “element blank” The 2019 Stack Overflow Developer Survey Results Are In“tar: unrecognized option --warning” during 'apt-get install'How to fix Fontconfig errorHow do I figure out which font file is chosen for a system generic font alias?Why are some apt-get-installed fonts being ignored by fc-list, xfontsel, etc?Reload settings in /etc/fonts/conf.dTaking 30 seconds longer to boot after upgrade from jessie to stretchHow to match multiple font names with a single <match> element?Adding a custom font to fontconfigRemoving fonts from fontconfig <match> resultsBroken fonts after upgrading Firefox ESR to latest Firefox