Posted on July 30th, 2010 by David Fearon
Core i7-980X PC versus eight-core Xeon workstation

While writing about photo-realistic 3D graphics rendering for issue 192 of the magazine, I’ve been getting myself back up to speed with the state of 3D graphics and looking into the best techniques for achieving realistic lighting. Along the way I’ve gained a new insight into the sheer speed of the latest CPUs.
Turns out the best 3D rendering algorithm is a hugely intensive method known as path tracing, which is sort of like ray tracing’s dad. The theory behind the method actually pre-dates ray tracing, but it’s only now that PCs are getting fast enough for experimental dabbling at home.
The good part is that, while it needs a heck of a lot of computing power to do, path tracing is actually a fairly simple technique to implement.
But where to get a path-tracing application to play with?
Well, Kevin Beason has written a beautiful example of minimalist programming with his path-tracing renderer, smallpt. It’s a complete functioning renderer, with a 3D scene (based on the research-standard Cornell box scene) embedded into the program.
Smallpt generates and saves to disk the fully rendered, near-photorealistic image you can see above. And it’s written in a ridiculously compact 99 lines of C++ code. That’s the entire renderer, including the scene itself.
Kevin provides only the source code on his site, but I fancied running smallpt. So I spent a couple of hours getting it to compile under Visual C++ Express 2008, which is completely free and which you can download from here.
The code assumes you’re using the open-source GCC compiler and includes some Linux/GCC programming tricks that don’t work under Windows, but a bit of tweaking later I had it rendering the Cornell-box scene.
It is awfully compute-intensive though, taking over 12 minutes to render a grainy 100-samples-per-pixel version on my Core 2 6300 everyday office PC:

Aha! This was a perfect opportunity to put my new quad-core Core 2 Q9400 system, which our lovely IT department built me a couple of weeks ago, through its paces. I added a few lines to the code of smallpt to get it to give me an overall time in seconds for the complete render, and set it going.
Straight away, render time came down to 252 seconds – just over four minutes.
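For anyone wanting to add the same sort of timing to their own build, here is a minimal sketch. It is not the code that was actually added to smallpt, which isn’t shown in this post. It assumes a C++11 compiler for std::chrono, and render_scene is a hypothetical stand-in for smallpt’s render loop.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for smallpt's render loop; it just burns a little CPU.
double render_scene(int samples) {
    assert(samples > 0);
    double acc = 0.0;
    for (int i = 0; i < samples * 1000; ++i) acc += 1.0 / (1.0 + i);
    return acc;
}

// Runs the render once and returns the elapsed wall-clock time in seconds.
// Wall-clock time, not CPU time, is the figure that matters for a
// multithreaded render: CPU time adds up the work of every thread.
double timed_render_seconds(int samples) {
    const auto start = std::chrono::steady_clock::now();
    volatile double sink = render_scene(samples);  // stop the call being optimised away
    (void)sink;
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Printing the returned value once the image has been saved gives a single figure to compare between machines.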
Then I remembered my dual-Xeon workstation muscle machine, originally a test ‘white box’ from Intel that, ahem, never found its way back to them. The only reason I don’t use it as an everyday machine is its excessively loud industrial-level cooling system. But with its dual, quad-core Xeon processors, which cost some frightening amount of money when new, this was the perfect job for the Beast.
I set up the machine in a corner of the PC Pro Labs (well away from complaints about the noise) and installed Windows 7 Ultimate x64, just to make the test fair since that’s what’s running on my other PCs.
Then I fired up smallpt.exe and postponed making my next cup of tea, knowing it would rip through the render before I could even rise from my chair.
Oh.
Turns out my once-mighty eight-core workstation, barely over three years of age, is now slower for raw compute speed, and by a heck of a margin, than my quad-core machine.
In fact its two Xeon X5340 CPUs took 493 seconds to churn through the smallpt render: getting on for twice as long as my quad-core.
Deflated, I switched off the machine, then wandered over to Mike Jennings in his own corner of the Labs, engrossed in a graphics-card group test for the next issue of PC Pro.
“What’s the CPU in your test rig, Mike?”
“Oh, it’s a Core i7 980X. Six cores. Really fast!”
“Ah. Fast you say? Um, mind if I use it when you’re done?”
“Sure.”
So I did.
It’s not often I class a computer as astonishingly fast, but hell’s teeth this one certainly is.
The render completed in 73 seconds. That’s almost three-and-a-half times faster than my nearly-new Q9400 machine, and nearly seven times faster than my not-exactly-old, dual-Xeon workstation that was worth a good four thousand pounds when it was new.
Let’s consider those results on a per-socket basis.
With this pure-CPU, highly multithreaded task, the latest generation of enthusiast-level Intel CPUs is over thirteen times faster per processor than the professional-level Xeon CPU of three-and-a-bit years ago. And about nine times faster per core.
I knew all this before, but seeing that machine chew through the render with such ferocious speed really brings home the level of engineering achievement that Intel continues to manage, year after year.
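The per-socket and per-core comparisons are simple ratios of render time multiplied by the number of units doing the work. A small sketch of that arithmetic follows; per_unit_speedup is a hypothetical helper name, and the sum assumes the work scales perfectly across sockets and cores, which is an idealisation.

```cpp
#include <cassert>

// How many times faster one unit (socket or core) of the new machine is than
// one unit of the old machine, assuming perfect scaling across units.
double per_unit_speedup(double old_seconds, int old_units,
                        double new_seconds, int new_units) {
    assert(old_seconds > 0 && new_seconds > 0 && old_units > 0 && new_units > 0);
    return (old_seconds * old_units) / (new_seconds * new_units);
}
```

With the times quoted above, per_unit_speedup(493, 2, 73, 1) comes to roughly 13.5 per socket, and per_unit_speedup(493, 8, 73, 6) comes to roughly 9.0 per core.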
Try it yourself
If you want to try the unofficial PC Pro smallpt render test on your machine, you can download my compiled version here.
But wait! The multithreading needs the Microsoft OpenMP support DLL, vcomp90.dll, and the program won’t work without it.
The free-but-faffy way to get it is to install the Microsoft Visual C++ 2008 Redistributable Package from here.
Once the redistributable is installed, search for vcomp90.dll – it should be hiding in a subfolder somewhere within C:\Windows\winsxs – and just copy it to the same folder as smallpt.exe.
Now double-click the smallpt.exe file and the renderer will open in a command-prompt box, churn away for a while and save the rendered image file to the same folder when the render is complete. It’ll also give you the time taken to render when it’s finished.

You can open the resulting .ppm image using GIMP for Windows.
Let us know your results, for machines both old and new.
Has anybody out there got a machine that will break the minute mark?
Update:
Check out the posts below and you’ll see that Intel itself has risen to the challenge. Read all about the superchilled Intel test rig.
Tags: 3D graphics, Core i7, intel, path tracing, ray-tracing, Xeon
197 Responses to “Core i7-980X PC versus eight-core Xeon workstation”
July 30th, 2010 at 2:46 pm
Is it possible to post your modified source as Norton 2010 took it upon itself to delete the binary as soon as it was downloaded?
July 30th, 2010 at 2:57 pm
Will post as soon as I’ve cleaned it up to make it less embarrassing Chris
July 30th, 2010 at 3:15 pm
I couldn’t get smallpt to run. I get an error: “The application was unable to start correctly (0xc000007b)”. I also found four versions of vcomp90.dll in various subfolders of the winsxs directory; I just copied the most recent one.
July 30th, 2010 at 3:15 pm
I sympathise completely with the deflated feeling you went through.
My younger brother is doing a degree in architecture, so he uses things like AutoCAD and 3D Studio Max every day. It became clear at Christmas that his machine was no longer up to the task. One of his projects was going to be late if he didn’t get the render going soon. The computer suites were full of people whose renders had been running for over 24 hours, machines were crashing and so on. He handed me his project, I ran it overnight on my Q6600, and it was done in just under six hours: deadline made.
I then rebuilt his machine on the cheap: an i5 760, 4GB DDR3 dual and a cheap GeForce GT 240 with a gig (it came to about £350, I think, although I’m not certain). I also popped in a non-stock cooler to give it a hope in hell of not overheating. And, here is the key, I installed the CUDA plugin for 3D Studio Max and re-ran the same render at the same settings.
9 mins – Didn’t even get warm.
July 30th, 2010 at 3:18 pm
I got the dll here: http://www.dll-files.com/dllindex/dll-files.shtml?vcomp90 worked perfectly.. 1.7MB or 27KB? hmmm
July 30th, 2010 at 3:30 pm
Interesting, my single socket Xeon 5345 did it in 331 seconds, by some margin quicker than your dual socket Xeon 5340. Surely these two CPUs are not that different?
July 30th, 2010 at 3:34 pm
My Sony Vaio laptop did it in 202 seconds (1.6Ghz Core i7-620QM, 8GB RAM)
July 30th, 2010 at 3:44 pm
@Laurent – that is indeed interesting! Maybe I’ll head back down to the Labs and poke around in the BIOS of the Xeon machine, see if I can get more out of it.
@Hooch – you need the DLL version that matches your machine architecture. Easiest thing is just to try each one to find one that works!
July 30th, 2010 at 3:51 pm
353 seconds on Opteron 1352 Quad Core 2.1GHz. Not earth-shattering but it’s a point on a graph.
July 30th, 2010 at 3:52 pm
My stock i7-930 did it in 120 seconds, which just goes to show how much faster the i7-980X really is!
July 30th, 2010 at 3:55 pm
287 seconds
Intel Xeon W3520 @ 2.67GHz
Windows 7 Professional 32-bit
2GB RAM
———————————–
660 seconds
Windows 7 Enterprise 32-bit
Intel Core 2 Duo E4500 @ 2.20GHz
3.5GB RAM
July 30th, 2010 at 4:08 pm
Oops, typo: that should be a Core i7-720QM. I also re-ran it with all other apps closed (I had done it with chat clients running and while doing a remote support session to install a scanner). That brought it down to 199 seconds.
July 30th, 2010 at 4:26 pm
190 seconds on a Windows 7, Dell Precision m6500 running a Quad Core Intel Core i7 820QM @ 1.73Ghz
July 30th, 2010 at 4:37 pm
Thanks David, now I have CPU envy!
My result on an AMD Phenom 9950 Quad at 2.6GHz with 4GB RAM on Win7 was 303 seconds.
July 30th, 2010 at 4:46 pm
303 seconds on E8400 @ 4Ghz 4Gb RAM
Win 7 Pro
July 30th, 2010 at 4:54 pm
Down to 291 seconds with Kaspersky off!
July 30th, 2010 at 5:05 pm
Do I see a new PC Pro review benchmark in the making?
July 30th, 2010 at 5:29 pm
I didn’t manage to break the 60s barrier. In fact, it was a slightly disappointing 1935s on my 1.6Ghz Atom. I have now shelved my plans to use it to render Toy Story 4..
July 30th, 2010 at 6:05 pm
XP Pro: i7-930 stock speed
111 seconds!
Wow!
July 30th, 2010 at 6:18 pm
114 seconds on my newly built Core i7 930 over-clocked to 4GHz!
Thanks for this – was looking for a way to max out all 8 threads!
Nice to see the old “Hit Enter to exit” joke making a new appearance. Almost as good as “Press start button to stop your computer”!
July 30th, 2010 at 6:23 pm
@Vic: Thanks for raining on my parade! For the record I’m on Windows 7, so there does seem to be a significant performance penalty.
July 30th, 2010 at 7:21 pm
Core i7, OC’d to 3570Mhz (170.0 x 21) = 94 seconds
July 30th, 2010 at 8:35 pm
Core 2 Duo E6600, just over 3.5 years old, Win7 Pro 32 bit:
652 seconds.
July 30th, 2010 at 8:36 pm
Oh dear: Core i5 @ 2.67GHz ended up at 185 secs and grainy. I shut down NIS 2010 and it took 4 secs longer. And I thought my graphics card was CUDA-enabled, sigh.
July 30th, 2010 at 9:01 pm
Core i7 920 @4.00ghz 89 seconds while watching a movie as well. I love this machine!!
July 30th, 2010 at 9:57 pm
Well mine did it in 225 sec on an AMD 1055T – 6 core ,O/C to 3.4ghz with 2gb of 800mhz ddr2 ram running on win7 – 64bit
July 30th, 2010 at 10:08 pm
Dell XPS M1530, 3 years old W7 Home Premium x64 4GB RAM
569 seconds
Core 2 Duo T7500 @ 2.20GHz
Self built desktop
225 seconds
Phenom II X4 965 3.4 Ghz 4 Core
W7 Home Premium x64 8GB RAM.
I wonder what the Phenom could do if it was overclocked????
July 30th, 2010 at 10:40 pm
Sorry, should have been more specific: Core i7 920 @ 4.00GHz, 6GB DDR3, EX58-UD5 motherboard, running Windows 7 Pro 64-bit. Just tried again and this time got an 88. Most impressed, as I bought this machine in Oct 09 and it still seems to be holding its own.
July 30th, 2010 at 11:03 pm
Dell Studio 15 core i5-520M, 4GB DDR3 RAM, windows 7 home premium took 233 seconds
July 30th, 2010 at 11:29 pm
237 seconds on a Core i5-520M, 4GB DDR3 Windows 7 Professional 32-bit in a Dell Latitude E4310.
July 30th, 2010 at 11:45 pm
I ran the render prog on my two notebooks. The first, an “oldy” with a T4400 CPU, took 547sec.
The second, a more recent one with an i3-310M CPU, took only 294sec. Not bad I think, even though hopelessly slow compared to some of the other machines listed in the comments.
The machines run Win7 Home Premium (the old one) and Win7 Pro, both 32-bit.
July 31st, 2010 at 12:17 am
I changed the OC setting on my i7-920 to (185×20) shut down all non-essential stuff and got the time down to 90 seconds. That’s with Vista x64.
July 31st, 2010 at 12:28 am
190×20 gives 87seconds but the temperatures were right up there. Definitely not a long term overclock!
July 31st, 2010 at 12:53 am
I also managed an 87 with nothing running except sophos but I don’t think I will do any better without overclocking more and I am not going to do that. I am more than happy with 87.
July 31st, 2010 at 1:23 am
79 seconds with i7 920 @ 4.20 GHz (Vista x64). That was as far as I dared to push my CPU. Running it at 3.00 GHz took 104 seconds, so simple maths dictate that 980X overclocked to 4.10 GHz should break one minute record.
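The “simple maths” here is presumably the assumption that render time is inversely proportional to clock speed. A quick sketch of that estimate follows; estimate_time_at_clock is a hypothetical helper name, and the model ignores memory speed, Turbo Boost and thermal throttling.

```cpp
#include <cassert>

// Estimated render time at a new clock speed, assuming time scales as
// 1 / frequency. This is a simplification, not a measured relationship.
double estimate_time_at_clock(double seconds, double from_ghz, double to_ghz) {
    assert(seconds > 0 && from_ghz > 0 && to_ghz > 0);
    return seconds * from_ghz / to_ghz;
}
```

Scaling the i7 920’s 104 seconds at 3.00GHz up to 4.20GHz predicts about 74 seconds, against the 79 actually measured, so real scaling is a little worse than linear. Scaling the article’s 73 seconds to 4.10GHz gives about 59 seconds, but only on the assumption that the 73-second run was at the 980X’s 3.33GHz base clock, which the article doesn’t state.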
July 31st, 2010 at 10:36 am
Now I’m thinking I may not actually be overclocking – my system is very similar to @Captainford except the CPU is a 930 and the MB a Gigabyte X58A-UDR3. This was bought as a “pre-overclocked” bundle of MB, CPU and memory. But the “system” panel in W7 shows “930 @ 2.80GHz 2.79 GHz”. Can any overclocker tell me: should the second frequency figure be the actual overclocked speed? (i.e. 4.0GHz in my case) .. or do I have to go into the BIOS to read the actual speed?
It’s all very confusing – I have Piriform Speccy – it shows “Stock Core Speed” at 2800MHz, but it then shows an individual “Core Speed” for each of the four cores at only 1619.2MHz. What is this about?
July 31st, 2010 at 11:39 am
@JohnAHind
The best utility I have for monitoring a system whilst doing an overclock is Core Temp. It’s a free download that shows actual speed, base clock speed, frequency multiplier, individual core loading and individual core temp (very useful, that one). Always keep your i7s below 100C and preferably below 90C.
Windows only ever reports the stock speeds of your chip irrespective of any OC settings you may have specified in the BIOS.
What happens with i7s is that when they aren’t being used the multiplier drops thus reducing the overall speed of the CPU and saving energy. This is where the 1619Mhz comes from.
You won’t see the full speed until you load up the CPU for which the benchmark in question is ideal.
If your machine isn’t OC’d to start off with, disable turbo boost and gradually increase the base clock in 5Mhz steps doing an intensive 10min test each time to ensure stability.
@Lomskij
That’s an impressive OC. I would be interested to hear how you did it – what BIOS settings you are using and what CPU cooler you’ve got.
July 31st, 2010 at 12:13 pm
On an old athlon X2 4400 win 7 and 2Gb, Nvidia 240, took 636 seconds. Not a good image either-grainy
July 31st, 2010 at 12:31 pm
@Alex
Actually it’s a very basic OC: i7 920 D0 @ 4.20 GHz, CPU Ratio = 21, BCLK frequency = 200, core 1.41V.
Mobo asus rampage II extreme, cooling: megahalems + 2x 120mm nexus fans (paste mx-2), case: lian-li pc-x2000 with front fascia removed.
Running chassis fans @ 1,000 rpm and cpu fans @ 1,600 rpm, cpu temperature sits around 40C idle and 50C load, which is pretty good.
July 31st, 2010 at 12:32 pm
I have just tried an interesting experiment. With 2 cores active on the AMD 1055T 6 core processor, render time was 389 secs. So you would think that with 6 cores active it should be 2/3rds better at 130 secs.
In fact it is just over 1/3rd better at 223 secs at best. Just goes to show what happens when you have to share the available on die level 1 & 2 cache between 2 cores then 6.
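The two-core versus six-core comparison can be put as a parallel-efficiency figure. This is a sketch with hypothetical helper names; it assumes both runs used the same clock speed and settings, which the comment doesn’t confirm.

```cpp
#include <cassert>

// Render time expected with perfect scaling when going from one core count
// to another.
double ideal_time(double seconds, int from_cores, int to_cores) {
    assert(seconds > 0 && from_cores > 0 && to_cores > 0);
    return seconds * from_cores / to_cores;
}

// Fraction of the ideal speed-up actually achieved (1.0 means perfect scaling).
double parallel_efficiency(double measured_seconds, double ideal_seconds) {
    assert(measured_seconds > 0 && ideal_seconds > 0);
    return ideal_seconds / measured_seconds;
}
```

For these figures, ideal_time(389, 2, 6) is about 130 seconds, and parallel_efficiency(223, 129.67) is about 0.58, meaning the six-core run delivers a little under 60% of the ideal scaling from the two-core result.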
July 31st, 2010 at 1:44 pm
AMD Phenom II with liquid cooling and 4GB RAM: at stock speed with 3 cores on, 393 seconds; OC to 3GHz from 2.6GHz and down to 283 seconds. Next, either more OC or stock speed with the 4th core unlocked.
July 31st, 2010 at 2:00 pm
Just overclocked CPU (Phenom II 710 2.6 Stock) to 3.5 ghz, 273 seconds
July 31st, 2010 at 3:26 pm
Booted XP in Safemode – and it ran 112 seconds. Hmmm, and I got the 111 seconds with Media player playing too. But I’m sure I can get it to the ninetys if I overclock. (Nice – this i7)
July 31st, 2010 at 4:00 pm
@Alex: Many thanks, Core Temp does give much more useful and credible information than Windows or Speccy! After actually applying the overclock (182×22 = 4GHz), I got a much more impressive 83 seconds, although it did max out two of the four cores at 100c during the smallpt run.
July 31st, 2010 at 4:21 pm
Got it working. 118s on Core i7 920 @ stock, 6GB DDR3, GA-EX58-UD5, W7. Thought my CPU was going to melt!
July 31st, 2010 at 8:11 pm
Ok, going extreme: 73 seconds!
Short time OC: i7 920 @ 4.5GHz (21 x 215, 1.5V), cpu reached lovely 72C during the test. I guess here my race ends, as trying to push the frequency any higher makes windows fall to BSOD :-/
July 31st, 2010 at 8:20 pm
Core i7920 at 4ghz with asus rampage 2 extreme. 83 seconds.
Core temp showed temperature rose to 69C. Interestingly, with hyper threading turned off time increased to 127s.
July 31st, 2010 at 11:39 pm
On my Dell Studio 1557 i7 Q720 laptop Win7 professional 64 bit with 4GB DDR3, no OC, I managed it in 195 sec – This seems slightly better than Dave Wright’s (#7 & 12) Sony (similar spec but he has twice the memory – not sure why)
August 1st, 2010 at 5:03 am
Q8300 @ 2.5GHz 4GB Win7 Pro
Norton 2010 off – 266 Sec
Norton 2010 on – 267 Sec
August 1st, 2010 at 9:29 am
@Sarcen possibly different back ground tasks running on our machines – mine is probably still clogged up with some Sony guff.
August 1st, 2010 at 10:40 am
So how well would it run if you implemented this article?
http://www.pcpro.co.uk/news/358027/gpu-compiler-could-turn-desktops-into-supercomputers
August 1st, 2010 at 11:33 am
It would be interesting if the source code was handed to Nvidia, ATI & Microsoft to see what times we could get if it was rewritten in CUDA (Nvidia) / DirectCompute (ATI) and DX11 (Microsoft) to take advantage of the GPU. I wonder what the times would be then?
August 1st, 2010 at 2:56 pm
@Rich: What cooling are you using? I have the same CPU and overclock as you and am getting the same benchmark result, but the “High” temperature is 100c. I am suspicious of this value as it is too much of a round number – I suspect it is the max of the measurement system not the actual maximum temperature, which is worrying (though no sign of instability). I have a Zalman sealed system water-cooler on the CPU so I would expect to get pretty optimal cooling performance and am disappointed my temperatures are so much higher than yours.
Interestingly Core Temp reports the multiplier increasing from x12 to x22. I would hope the multiplier would be automatically limited to keep the temperature safe, but Core Temp shows it sitting at x22 even after repeated runs with all four cores maxed at 100c.
The total power consumed by the system rose from 180w to 334w when running Smallpt!
August 1st, 2010 at 3:02 pm
@David Fearon: Could you maybe make a couple of improvements:
1. Trap the exception if the program cannot write the output file (for example if it is in Program Files and file protection prevents this). Most of us are not very interested in the output!
2. Make an auto-repeat option so the program can be used for long term burn-in tests.
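Both requests are small changes in principle. What follows is a rough sketch, not David’s code: write_ppm and run_burn_in are hypothetical helper names, and the plain-text P3 layout is assumed to match the PPM file that smallpt writes. Strictly speaking there is no exception to trap in C-style file I/O; the fix is to check whether the file opened before writing to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Writes a plain-text PPM (P3). Returns false, rather than crashing, if the
// file can't be opened, for example in a write-protected folder.
bool write_ppm(const std::string& path, int w, int h, const std::vector<int>& rgb) {
    assert(w > 0 && h > 0 && rgb.size() == static_cast<std::size_t>(w) * h * 3);
    std::FILE* f = std::fopen(path.c_str(), "w");
    if (!f) {
        std::fprintf(stderr, "Could not write %s; skipping image output.\n", path.c_str());
        return false;
    }
    std::fprintf(f, "P3\n%d %d\n%d\n", w, h, 255);
    for (std::size_t i = 0; i < rgb.size(); ++i) std::fprintf(f, "%d ", rgb[i]);
    return std::fclose(f) == 0;
}

// Calls render() the requested number of times and returns how many runs
// completed, as a simple burn-in loop.
int run_burn_in(int runs, void (*render)()) {
    int completed = 0;
    for (int i = 0; i < runs; ++i) {
        render();
        ++completed;
    }
    return completed;
}
```

With this shape the render time can still be reported when the image can’t be saved, and the repeat count could come from a command-line argument.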
August 1st, 2010 at 5:19 pm
Core i5-750@2.67Ghz, 4GB Ram, Win 7 x 64 HP:
194 seconds – Turbo Boost Off
183 seconds – Turbo boost On
Nice to know Turbo Boost makes some difference.
August 1st, 2010 at 9:39 pm
@JohnAHind: something is definitely wrong either with your cooling system or with the temperature sensors. Any single core on my i7 920 never exceeded 75C, and I run it at 21 x 200 @ 4.20GHz, 1.41v, on air. And your system doesn’t go into BSOD when reaching such ridiculous temperatures?!
By the way, what’s your voltage?
August 1st, 2010 at 11:58 pm
Hi, I bumped up my system to 190 x 21 (i thought I was running at 4ghz before but I wasn’t.) now getting a score of 82 which I am most pleased with. However during the test my core temps got to 93,93,90 and 89 respectively. This while within the limits of the processor seems very hot to me. I have got an antec 902 case with all fans on full and a Noctua CPU cooler which I am not sure of the model but it is enormous. Is this just the fact that some chips are better (cooler) than others or is there something wrong? Idle temps are 55-60. Any help would be most kindly appreciated.
August 2nd, 2010 at 8:37 am
@JohnAHind
Yeah, I have to agree with Lomskij, I think something is wrong with your cooling. I have a megahalems prolimatech cooler in an antec 1200 case. i7 920 idles at around 42-38 C and maxes out at 75C (at 4ghz) if I leave prime95 running for an hour. I think the i7 920 will not slow down until it reaches above 100C, so maybe you are right at the edge…
August 2nd, 2010 at 10:49 am
@Lomskij@Rich: I misreported some things, first I have a 930 CPU, not 920, second my cooler is a Corsair H50. The 182×22 overclock was in a profile supplied by OCUK and included a vcore boost to 1.3v and a DRAM boost to 1.64v. The overclock does eventually trigger a MB overtemp alarm set at 90c so I guess the measured values are not far wrong.
I took the overclock off and Core Temp now reports about 40c at no load rising to 70c when running Smallpt (with the overclock it was 57c at no load, maxing out at 100c as reported before).
Definitely looks like the Corsair is not doing its stuff – its pump and fan are definitely running and the radiator is not excessively hot, so either the circulation is too low or the contact with the chip not good enough – I will need to investigate when I get the time.
Thanks again for your help and to David for giving us this displacement activity!
August 2nd, 2010 at 11:38 am
@billynw10 – the executable is purely a CPU test, it won’t make use of any CUDA or GPU features. But for those who’ve wondered above how fast a GPU version of smallpt would be, check out http://davibu.interfree.it/opencl/smallptgpu/smallptGPU.html
August 2nd, 2010 at 4:24 pm
Intel Xeon E5530 4GB RAM 131 Seconds.
Will run it on our dual X5670 with 32GB RAM tomorrow and post results.
August 2nd, 2010 at 4:45 pm
Work PC: Xeon W3520 @ 2.67 Ghz with 12 GB RAM – 116 Seconds.
I would imagine overclocking would make this a touch better – not going to risk melting my work PC though…!
August 2nd, 2010 at 8:26 pm
@David Fearon: Thanks for the info, still unsure why it would be grainy. Going to check out your link cheers again
August 2nd, 2010 at 11:23 pm
Q6600 Quad Core OC to 3GHz, 4GB of RAM, Windows 7 64bit = 248 seconds. Fairly happy with that.
August 3rd, 2010 at 1:43 am
Assuming these tests were done at 100 samples per pixel then the Linux version seems to be dramatically quicker. I compiled from the original source not the modified version and mine completes 100 samples per pixel in 35 seconds. Core i7 980x not overclocked.
August 3rd, 2010 at 12:30 pm
Windows 7 Ultimate x64
Intel(R) Core(TM) i7 CPU Q 820 @ 1.73GHz
4GB RAM
August 3rd, 2010 at 4:18 pm
Just for the record in case I have put anyone off the Corsair H50, after reseating the pump/block on the chip I can now do 82s on my i7 930 @4GHz with the temp maxing at 80c.
@TrevorH: I think this is probably the compiler rather than Linux – better optimisation will work wonders on an app like this and David used the free version of Visual C++. If he publishes the source I will try a recompile on the commercial version.
@M Ahmed: Not much point giving us your spec unless you also give the benchmark time!
August 3rd, 2010 at 8:39 pm
Dell PowerEdge R710
Intel Xeon E5620 @ 2.40 GHz (x2)
36.0 GB RAM
Windows Server 2008 Enterprise x64
137 seconds
August 4th, 2010 at 11:21 am
Well, here at Intel we could not resist the challenge, particularly when in our IT department we have overclocking nut Steve “DaFidgie” Anderson. We gave him the challenge of beating 60s but restricted him to a single-socket setup. Last night he ran it on his rig. Result: 50 seconds to complete.
Setup details here:
http://www.flickr.com/photos/inteluknewsroom/4859279593/
August 4th, 2010 at 11:38 am
I just might have to do some testing on overclocked dual X5680s
August 4th, 2010 at 1:15 pm
App does not seem to work properly with dual CPUs. We tested on an EVGA SR2 with overclocked X5680s at 4.4ghz. 57s with one CPU enabled, 80s with both enabled, 133s with both enabled HT off. Exactly the opposite from what you would expect.
August 4th, 2010 at 2:53 pm
@Mark Laurence: interesting result that tallies with my disappointment at my own dual Xeon’s results. It’s possible that with such a tight loop there’s a thread-dispatch overhead or somesuch that’s negating the advantage. I’ll try tweaking the OpenMP pragma to make the threading slightly coarser (it’s line 79 in Kevin Beason’s code but I moved it down to the beginning of the x loop – line 82 – since that gave better performance on my Core 2 Duo) and see if that improves things on multiple socket systems.
It doesn’t surprise me too much that performance is degraded with HT on though; if all the threads are performing the same operations – as they are in this case – then the virtual cores are likely fighting each other for the same physical resources.
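To illustrate what coarser threading means here, the sketch below parallelises over whole image rows, so each thread picks up a complete row at a time. It is not the actual smallpt loop: shade_pixel is a hypothetical stand-in for the per-pixel path tracing, and the schedule clause is one reasonable choice rather than a tuned setting. OpenMP has to be switched on at compile time (-fopenmp for GCC, /openmp for Visual C++); without it the pragma is ignored and the loop runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-pixel path-tracing work.
double shade_pixel(int x, int y) {
    return static_cast<double>((x * 31 + y * 17) % 256) / 255.0;
}

// Renders a w-by-h image with one row per unit of work. Threads fetch a new
// row only when they finish the previous one, so the scheduling overhead is
// paid once per row rather than once per pixel.
std::vector<double> render_rows(int w, int h) {
    assert(w > 0 && h > 0);
    std::vector<double> image(static_cast<std::size_t>(w) * h);
    #pragma omp parallel for schedule(dynamic, 1)
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            image[static_cast<std::size_t>(y) * w + x] = shade_pixel(x, y);
        }
    }
    return image;
}
```

Moving the pragma to the inner x loop makes the work units finer, which balances load better on a few cores but means far more scheduling events. Whether that explains the dual-socket results would need testing.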
August 5th, 2010 at 8:04 am
Q9650
243s @ 3.0 GHz
180s @ 3.6 GHz
25.9% improvement in time with a 20% overclock
Interestingly 185s @ 3.7 GHz. I think it was overheating a bit too much and throttling CPU speed.
August 5th, 2010 at 3:36 pm
It helps if I posted the time…
168s
August 5th, 2010 at 10:27 pm
Another slow dual CPU result – dual Xeon X5472 with 16Gb RAM gave me 412 seconds…
August 6th, 2010 at 1:56 pm
Just to show you how far things have come in a relatively short time… I’ve recently decommissioned a couple of servers that have gone end-of-life as they’re around 5 years old. I replaced them with Dell R710s at pretty much exactly the same spec as Eric’s (above) so no need to repeat that result.
However, I thought I’d give it a go on one of the old boxes.
Xeon MP 2.2Ghz x 4 (quad socket, so 8 physical cores)
4GB RAM
Server 2003
710 Seconds. Those servers were the dog’s danglies when we shelled out £12,000 each for them back in the day. Now they’re only marginally faster than David’s E6300.
The march of progress eh?
August 6th, 2010 at 4:19 pm
@David Fearon: Au contraire re HT – it is definitely significantly faster with it on, as I think Mark Laurence said. I just tested it on my i7 930 @4GHz: 4core/8thread – 82secs, 4core/4thread – 124secs. Hyperthreading is actually more valuable here than overclocking. Interestingly this is also reflected in the temperatures – 12degrees hotter with HT on!
August 6th, 2010 at 5:18 pm
From the sublime to the ridiculous: MSI Wind U130 netbook (1.6GHz Atom) – 2334 sec! However, this is a bit of a cheat as I’ve upped the RAM to 1GB.
August 6th, 2010 at 8:51 pm
Ooo now there’s a challenge. Who can bring in the worst result? I’m sure there’s a manky old PII floating around somewhere in our ‘retired box’ area that we haven’t got round to skipping yet…
August 7th, 2010 at 1:10 am
leif@patroler:~/src/smallpt$ time ./smallpt 100
Rendering (100 spp) 100.00%
real 0m49.354s
August 7th, 2010 at 1:16 am
289 seconds – AMD Phenom II X3 Black Edition 2.6 GHz OC’d to 3.2, 4 GB RAM
August 7th, 2010 at 1:24 am
So very tempted to try this on my AMD K6/2. However, I’d have to compile it myself as the machine runs Linux right now and I’m not sure I can face the faff!
August 7th, 2010 at 1:58 am
Core i7 920 running at 3.6GHz and 12GB of RAM, got it rendered in 92 seconds. ACPI temp never got above 41C and the highest any core got was 56C. Running a custom liquid cooling setup with all Koolance parts, cooling the CPU and both HD 5870s.
August 7th, 2010 at 2:04 am
[xxxx@xxxx smallpt]$ time ./smallpt 100
Rendering (100 spp) 100.00%
real 0m29.127s
user 5m41.044s
sys 0m0.093s
Ha, 29 seconds
But of course I cheated – this server from my work has 32G RAM and 2 6-core CPUs (AMD 2427). OTOH, no overclocking of any kind.
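The real, user and sys lines printed by the time command also show how well a run kept the cores busy: total CPU time divided by wall-clock time gives the average number of hardware threads in use. A small sketch, with average_busy_threads as a hypothetical helper name:

```cpp
#include <cassert>

// Average number of hardware threads kept busy during a run:
// CPU time (user + sys) divided by wall-clock (real) time, all in seconds.
double average_busy_threads(double user_s, double sys_s, double real_s) {
    assert(user_s >= 0 && sys_s >= 0 && real_s > 0);
    return (user_s + sys_s) / real_s;
}
```

For the run above (user 5m41.044s, sys 0m0.093s, real 0m29.127s) this comes to about 11.7, so the render kept nearly all 12 cores busy throughout.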
August 7th, 2010 at 2:08 am
319 seconds to render on my aging Phenom II x4 980, using Wine.
62 seconds when compiled from source.
August 7th, 2010 at 2:13 am
Ran it on my stock 1055t – 6 cores 2gig ram. smallpt 100 completed in only 44 seconds!
haha. beaten intel’s liquid cooling using AMDs stock fan!
August 7th, 2010 at 2:32 am
42.5s – dual i7
% time ./smallpt 100
Rendering (100 spp) 100.00%./smallpt 100 325.30s user 0.16s system 766% cpu 42.462 total
August 7th, 2010 at 2:50 am
My 2 years old dual xeon dev server took 32 seconds to run it. There must have been something wrong on your setup
time ./smallpt 100
Rendering (100 spp) 100.00%
real 0m32.824s
user 4m17.610s
sys 0m0.060s
August 7th, 2010 at 3:28 am
475 seconds with AMD Athlon 64 X2 5600 @ 2.8GHZ, 2GB Ram, Windows 7 32 bit. Not that it matters for the render, ATI Radeon HD 5700. And yes, it does run Crysis.
August 7th, 2010 at 3:37 am
274 sec. on a 3-year-old Core 2 Quad Q9300 (2.5 GHz)
Gonna go try this on XP (second 1/2 of my dual-boot sys.) to see if I can improve on that a little.
August 7th, 2010 at 3:56 am
Oops! Doesn’t appear to work on the 64-bit edition of XP Pro.
Oh, well.
August 7th, 2010 at 4:08 am
159 secs
AMD Phenom II 1090T @ 4.2ghz on air
8gb RAM
Win 7 64 bits
Result #87 seems strange…
I had several versions of vcomp90.dll in the winsxs dir. Some of them prevented smallpt.exe from starting.
August 7th, 2010 at 4:17 am
Seems to run *much* faster on Linux than windows. 58s on a Phenom II 955 on Fedora 13.
August 7th, 2010 at 4:23 am
There is something seriously wrong with the optimizations in your windows binary…
Ran in 36 seconds on a 4 x 8224 SE AMD opteron IBM x-server running linux (8 total cores at 3.2GHz)
August 7th, 2010 at 4:23 am
Dell 910, Intel X7560 32 cores, 512GB, CentOS 5.5, GCC 4.4.4
time ~/smallpt 100
Rendering (100 spp) 100.00%
real 0m7.284s
user 6m42.990s
sys 0m0.107s
August 7th, 2010 at 4:33 am
Very disappointed with my result. 236 seconds running a Phenom II x4 965BE @ 3.6ghz, 4gb OCZ BlackEdition (7-7-7-30 @ 1300mhz) and Windows 7 Professional 64-bit. I was hoping <150 seconds.
I had to use a 32bit version of the dll for the application to work.
August 7th, 2010 at 4:46 am
Rendering with 100 samples per pixel: 100.00%
RENDER COMPLETE. Render time: 669 seconds.
4 year old dell walmart special AMD X2 2.1ghz AM2 socket
August 7th, 2010 at 4:46 am
Takes 22 seconds on a Quad-CPU, Quad-Core Xeon X5560 @ 2.80GHz, using 14 out of the 16 cores (was run inside a KVM instance with access to 14 of the cores).
# time ./smallpt 100
Rendering (100 spp) 100.00%
real 0m21.964s
user 5m0.981s
sys 0m0.231s
August 7th, 2010 at 5:04 am
Rendering (100 spp) 100.00%
real 0m31.440s
user 4m6.590s
sys 0m0.020s
August 7th, 2010 at 5:06 am
addendum: forgot specs =(
870 Lynnfield
August 7th, 2010 at 5:37 am
Gave this an honest try on an IBM BlueGene/P supercomputer, but I can’t get it to compile with automatic vector optimizations (mpixlcxx -o smallpt -O5 smallpt.cpp -qarch=450 segfaults on run)… and I don’t want to futz with the source code. Without the optimizations (mpixlcxx -o smallpt smallpt.cpp -qarch=450) DOES “run”, but 256 cpu’s took > 3 minutes to compute smallpt at only 4 samples/pixel (slower than an intel atom)
August 7th, 2010 at 5:37 am
The multi-processor results seem strange. My i7 980x did it in 62 seconds.
August 7th, 2010 at 5:49 am
Intel Core i5 650 @ 3.2ghz (stock)
4gb Dual Channel DDR3
ubuntu 10.04
time ./smallpt 100
Rendering (100 spp) 100.00%
real 1m11.334s
user 4m40.100s
sys 0m0.240s
August 7th, 2010 at 6:23 am
@ schatterjee
I guess we could post screenshots; here’s mine: http://img.photobucket.com/albums/v490/drphilngood/Screenshot-smallpt-revised.png
August 7th, 2010 at 6:44 am
time ./smallpt 100
Rendering (100 spp) 100.00%
real 17m55.874s
user 32m10.921s
sys 0m4.116s
Fuck yah! 1075s on an Intel(R) Atom(TM) CPU N280 @ 1.66GHz
running Ubuntu Netbook Remix 10.4
August 7th, 2010 at 6:59 am
Takes 4.2s on a IBM Power 780, 32 cores @ 4.1GHz
# time ./smallpt 100
Rendering (100 spp) 100.00%
real 0m4.244s
user 2m15.144s
sys 0m0.083s
August 7th, 2010 at 7:23 am
616 seconds
E6600 @ 2.4ghz
My system always runs hot, but I hit 101°C in core 1 and 99°C in core 2 at 99% complete. Guess it’s a good thing I don’t normally stress this system too much.
August 7th, 2010 at 7:46 am
I can beat that with 28 seconds
time ./smallpt 100
Rendering (100 spp) 100%
real 0m28.278s
user 3m43.530s
sys 0m0.270s
Running: Ubuntu 10.04 64bit
6Gig DDR3 RAM
Core i7 975 @ 4.15GHz (single socket, 4 Cores, 8 with HT)
Running the original smallpt app – I’m guessing the windows version / compiler isn’t as efficient as gcc-4.4
August 7th, 2010 at 7:51 am
101 seconds; core i7 940 @3.2, win7×64, 6GB RAM — all 8 threads pegged at 100%, wow. haven’t seen a cpu do that before!
August 7th, 2010 at 8:01 am
Eight core (plus hyperthreading) 2.26 GHz Mac Pro, Mac OS X 10.6.4:
g++ smallpt.cpp -o smallpt -O3 -fopenmp -ffast-math
time ./smallpt 100
Rendering (100 spp) 100.00%
391.575u 0.267s 0:25.43 1540.8% 0+0k 0+7io 0pf+0w
Yes, you read that properly – less than 26 seconds when running 16 threads.
Still waiting for the VAX to finish…
August 7th, 2010 at 8:54 am
616 Seconds on my HP 6910 Laptop (CPU T8100 @2.10GHz 4GB Memory Windows 7 x64)
August 7th, 2010 at 8:58 am
David can you post your source code for the windows rewrite? I would like to see how you got it to work in .net.
August 7th, 2010 at 8:59 am
Can you post the source code you compiled in .net? I want to see how you changed it.
August 7th, 2010 at 9:55 am
Mobile AMD Athlon 4 @ 1.2GHz. (circa 2004)
Ubuntu 10.04 Netbook Remix
$ time ./smallpt 100
Rendering (100 spp) 100.00%
real 34m49.818s
user 23m24.616s
sys 0m8.641s
Hmm. Maybe it would be worth replacing this lappy.
Predicted finish on Via C3 @ 600Mhz > 6000 seconds.
August 7th, 2010 at 11:37 am
332s on E7300@3.33GHz (and ~40C core temps) WinXP Pro 32bit
August 7th, 2010 at 12:18 pm
8567.931 seconds Via C3 @ 600Mhz running Debian Lenny.
Almost 9000!!!
August 7th, 2010 at 1:05 pm
Render Time: 342 seconds
ASUS P5Q Pro LGA 775 Intel P45 ATX Intel Motherboard
Intel Core 2 Duo E8500 Wolfdale 3.16GHz LGA 775
Mushkin Blackline 4GB (2 x 2GB) DDR2 1066 (PC2 8500) Dual Channel
Vista Ultimate SP1 64-bit
————————
Render Time with WINE: 1618 Seconds
Render Time compiled on box: 767 Seconds
BIOSTAR 945GC Micro 775 LGA Micro ATX Intel Motherboard
Intel Celeron 420 Conroe-L 1.6GHz LGA 775
CORSAIR 2GB (2 x 1GB) DDR2 SDRAM DDR2 667 (PC2 5300)
Gentoo Linux (Kernel 2.6.33)
WINE version 1.1.12
————————
Render Time: with WINE:1574 Seconds
Render Time compiled on box: 773 Seconds
Dell Inspiron 2200 Laptop (Intel Pentium M 1.7GHz & 512MB Ram)
Gentoo Linux (Kernel 2.6.27.12)
WINE version 1.1.44
August 7th, 2010 at 1:41 pm
On a dual-core i3 laptop I get:
275 sec in Win7-64,
98 sec in Ubuntu-64 and
157 sec with Ubuntu-32 as guest in Virtual Box (Win7-64 host)!
Conclusion: either Windows really sucks when it comes to computing or, alternatively, the adaptation to Windows is not as easy as the author thought.
August 7th, 2010 at 1:59 pm
ubuntu 10.04 64bit
intel i7 980x (not overclocked)
8gb
time ./smallpt 100
Rendering (100 spp) 100.00%
real 0m22.649s
user 4m25.870s
sys 0m0.020s
August 7th, 2010 at 7:18 pm
On a Pentium 133 (no MMX)
$ time ./smallpt 100
Rendering (100 spp) 100.00%
real 292m56.066s
user 291m55.199s
sys 0m7.108s
August 7th, 2010 at 7:24 pm
AMD users, fret not. There appears to be something wrong with the optimizations in the windows binary. I ran smallpt 100 in an ubuntu virtual box machine on my Phenom II X4 965 @3.4 Ghz and got this:
sean@sean-ubuntu-2:~/Desktop/smallpt$ time ./smallpt 100
Rendering (100 spp) 100.00%
real 1m22.444s
user 5m22.656s
sys 0m1.688s
August 7th, 2010 at 7:33 pm
106s
Almost 2-year old stock Core i7 940.
August 7th, 2010 at 8:36 pm
These timings are with NO overclocking, but some tinkering with gcc options.
With small toys like this program it can pay off with a 20% improvement
root# time nice -n -19 ./smallpt 100
Rendering (100 spp) 100.00%
real 0m57.522s
user 3m41.974s
sys 0m0.352s
g++ -O3 \
-m64 \
-march=nocona -pipe \
-ffast-math \
-ftree-parallelize-loops=8 \
-funroll-all-loops \
-fopenmp \
smallpt.cpp \
-o smallpt
August 7th, 2010 at 8:42 pm
0m36.904s
Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz
time nice -n -19 ./smallpt 100
Rendering (100 spp) 100.00%
real 0m36.904s
user 4m50.778s
sys 0m0.040s
g++ -O3 \
-m64 \
-march=core2 -pipe \
-ffast-math \
-ftree-parallelize-loops=8 \
-funroll-all-loops \
-fopenmp \
smallpt.cpp \
-o smallpt
August 7th, 2010 at 8:47 pm
88 seconds on my O/C’ed i7-920 d0 to 4.0 GHz running win7 x64 pro, 12 GB RAM. EVGA 3x SLI m/b.
August 7th, 2010 at 8:56 pm
Wait, it may not be so important to use those flags. I’ve tried again on the i7 950 with:
g++ -O3 -m64 -fopenmp smallpt.cpp -o smallpt
time nice -n -19 ./smallpt 100 ; sensors
Rendering (100 spp) 100.00%
real 0m38.208s
user 5m1.787s
sys 0m0.020s
cpu ended at a maximum of 71°C
so maybe gcc is getting better?
gcc (Gentoo 4.4.4-r1 p1.0, pie-0.4.5) 4.4.4
Why am I getting such low times compared to those of the majority of people here?
and yes, I’ve checked image.ppm and it’s good
???
August 7th, 2010 at 9:41 pm
Very old Sony Vaio PII 300 with 160MB Ram, 119 minutes 34 seconds running Ubuntu.
August 7th, 2010 at 10:30 pm
@Henry.
Boo, you spoilt all my fun. I thought we had some right old sheds to hand, but you win. I curse the decision we made last year to skip the 286 that we’d been using to cook EPROMs. With a bit of foresight, victory would have been mine!
For the record, it all bowled along plenty quick on my OC’ed 920, but I’m still staggered by how crap it ran on the quad socket server rig. Wow.
August 7th, 2010 at 10:38 pm
Intel I7 Core 950
3 Gig memory
Linux Fedora 11 Kernel 2.6.3.10
——–
real 0m36.860s
user 4m50.667s
sys 0m0.097s
______
He Shoots! He Scores!
August 7th, 2010 at 11:41 pm
Amiga 1200, 60 MHz m68060:
:
34727.345u 9979.182s 12:34:08.70
:
That’s 579 minutes or so. If a 16 thread Xeon 5500 series system can do it in 26 seconds, then that’s 1350 times faster. Divide that by 16 (the number of threads the Xeon system runs simultaneously) and again by 37.75 (the ratio of clock speed (2266/60)), and the Xeon is only 2.23 times faster than the m68060!
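The arithmetic in that comparison can be replayed in a few lines. This is only a sketch: the function name `per_thread_per_clock_ratio` is invented here, and the inputs are the figures quoted in this thread (26 seconds and 16 threads at 2266 MHz for the Mac Pro, 60 MHz for the 68060). Note that 579 minutes matches the reported user CPU time (34727 s); the elapsed time in the same listing, 12:34:08.70, is about 754 minutes, so both are shown.

```python
def per_thread_per_clock_ratio(slow_seconds, fast_seconds,
                               fast_threads, fast_mhz, slow_mhz):
    """How much faster the fast machine is per thread and per MHz."""
    raw_speedup = slow_seconds / fast_seconds   # overall speed-up
    per_thread = raw_speedup / fast_threads     # share per hardware thread
    return per_thread / (fast_mhz / slow_mhz)   # normalise for clock speed


amiga_user_seconds = 34727.345                  # "34727.345u" in the listing
amiga_wall_seconds = 12 * 3600 + 34 * 60 + 8.70  # "12:34:08.70" elapsed

for label, seconds in (("user time", amiga_user_seconds),
                       ("wall clock", amiga_wall_seconds)):
    ratio = per_thread_per_clock_ratio(seconds, 26.0, 16, 2266.0, 60.0)
    print(f"{label}: {ratio:.2f}x per thread per MHz")
```

Using the user time gives roughly 2.2, in line with the figure above; using the elapsed time gives closer to 2.9. Either way it treats hyperthreads as full cores, which flatters the 68060.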
August 8th, 2010 at 12:14 am
AMD Athlon(tm) 64 X2 Dual Core Processor 5600+
32bit 2.6.33.2
time -p ./smallpt 100
Rendering (100 spp) 100.00%
real 521.05
August 8th, 2010 at 2:13 am
127s on hp Z800 with single E5540 at a measly 2.53GHz on XP 64
August 8th, 2010 at 2:35 am
Intel i7 920 @ 4.0ghz w/ HT
Win 7 Pro x64
6GB RAM
83 seconds while doing other things in the background.
August 8th, 2010 at 4:33 am
Toshiba TE2100
WinXP SP3
512MB RAM
2596 Sec
August 8th, 2010 at 5:35 am
Could someone organize these, like the Hexus Pi-Fast Challenge did back in the day?
http://pifast.hexus.net/pifast.php
August 8th, 2010 at 7:14 am
root@eris:/tmp# time ./smallpt 5000
Rendering (5000 spp) 100.00%
real 6m3.586s
user 367m35.048s
sys 0m0.250s
Or for 100spp:
root@eris:/tmp# time ./smallpt 100
Rendering (100 spp) 100.00%
real 0m7.849s
user 7m37.749s
sys 0m0.160s
This is a quad socket Xeon X7560 system. 32 cores at 2.26ghz (With HT on, 64 threads)
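Those two runs also give a quick check of how render time scales with samples per pixel. The sketch below (plain Python, with a made-up helper name `seconds_per_sample`) uses only the `real` times from the listing above and compares the wall-clock cost per sample at 5000 spp with the cost at 100 spp.

```python
def seconds_per_sample(total_seconds, spp):
    """Wall-clock seconds spent per sample-per-pixel step."""
    return total_seconds / spp


short_run = seconds_per_sample(7.849, 100)            # real 0m7.849s at 100 spp
long_run = seconds_per_sample(6 * 60 + 3.586, 5000)   # real 6m3.586s at 5000 spp

# A ratio of 1.0 would mean perfectly linear scaling with spp.
ratio = long_run / short_run
print(f"cost per sample at 5000 spp is {ratio:.2f}x the cost at 100 spp")
```

The ratio lands a little under 1, so the time is close to linear in spp, with the long run slightly cheaper per sample; one plausible reading is that fixed start-up and file-writing overhead is spread over more work, though the thread does not confirm that.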
August 8th, 2010 at 10:13 am
Intel Pentium 4 (Northwood) 2,53GHz , Win XP 768MB ram.
Render time: 2367 seconds.
Maybe I should try on my linux pc (CEL @ 300MHz, 256MB ram)…
August 8th, 2010 at 10:35 am
640s on a Core2 E4500 2.2Ghz
Vista 32 4G ram
Actually, FinalRender for 3DSMAX is a lot faster than that little program. I can get much higher quality in a quarter the time.
August 8th, 2010 at 4:36 pm
This code is wrong.
It relies on linux’s erand function behavior.
But the compiled version uses different erand’s and that’s why this is so much slower than linux’s version.
Really, this is the code author’s fault. The computational complexity of the code depends on random values from the erand function. Way to go.
August 8th, 2010 at 5:03 pm
Processor: Intel(R) Core(TM) i7 CPU 920 @ 3.8GHz,
OS: ArchLinux x86_64,
Result: 100spp in 28.4 seconds.
August 8th, 2010 at 10:06 pm
21 SECONDS.
Dual AMD Opteron 12 Core (24 Cores total), with NO SPECIAL COOLING (just a basic Dynatron A6 fan for each CPU).
Here’s the catch: I had to run it in Ubuntu 9.04 x64. The Windows version distributed on your site was behaving erratically (it kept alternating between freezing for ~ 20 seconds and bursting, so it took 595 seconds to finish on Windows, which was unjustifiably slow). Any ideas why it’s bottlenecked on Windows?
Unless Intel’s rig was similarly bottlenecked on Windows, I think the AMD Opteron Magny Cours beats the Intel i7 on performance AND price (total machine cost less than $2500) hands down.
August 8th, 2010 at 10:13 pm
Running the process with high priority on Ubuntu, I got it down to LESS THAN 19.9 SECONDS, using Dual AMD Opteron 6168 Magny Cours CPUs.
time nice -n -20 ./smallpt 100
Rendering (100 spp) 100.00%
real 0m19.900s
user 7m33.940s
sys 0m0.110s
August 8th, 2010 at 10:15 pm
oops, I meant to say LESS THAN 20 SECONDS…
I will now remove one of the CPUs to see how a single CPU AMD Opteron 6168 compares to the i7 980x.
August 8th, 2010 at 10:42 pm
38.36 SECONDS –> SINGLE CHIP AMD OPTERON 6168 Magny Cours(12 Cores total).
I measured this after removing the second CPU from my ASUS KGPE-D16 board. Again, this was done with NO SPECIAL COOLING beyond a basic Dynatron A6 fan. Compared to Intel’s monstrosity, they should be embarrassed with 50 seconds.
time nice -n -20 ./smallpt 100
Rendering (100 spp) 100.00%
real 0m38.360s
user 7m30.590s
sys 0m0.080s
The cost for this CPU is only $750USD and the fan cost me 20 bucks (compared to $999 for the i7 980x + God knows how much for Intel’s freezer box). AMD, are you listening?
August 8th, 2010 at 10:57 pm
Ooh, you might be onto something there, thiscodeiswrong (#140).
I think it’s unfair to say it’s the author’s “fault”. They targeted the code to GCC, so it’s really an oversight of this blog’s writer. The behavioral difference in random function is an implementation detail.
For reference, I tried this on my x3650, 2-way E5420 (they don’t do HT). All stock, production server, RAM is irrelevant: 40sec for 100spp.
Buoyed by this rather pleasing speed, I’m running a 3840×2400 render at 10k spp. For more fun, I’ve tweaked the scene a little, making the red wall specular, added another small mirror-ball, and reduced transmission/reflection on the balls to make it look a bit realer.
Here’s one I prepared earlier:
http://furinkan.meidokon.net/img/smallpt_1920×1200_5000spp_hacked_layout_tweaked_walls.jpg
August 8th, 2010 at 11:17 pm
@ Ho Tuan
“Intel’s monstrosity”, is at a huge disadvantage to boxen, like yours, testing with Linux, and would beat your scores if running Linux while destroying them if not limited to one socket, as well. Not to mention the fact that there are much cheaper i7s that would also beat your score.
AMD make great CPUs, at a good price, but so does Intel.
August 9th, 2010 at 12:12 am
@ drphilngood
Point taken. I want to see how the Intel machine performs on Linux. I am genuinely curious. For standardization, let’s say Ubuntu. I couldn’t really do a fair test on a Windows machine because of issues with the version of smallpt Fearon distributed.
Regarding your point that cheaper i7’s beating out an AMD Opteron Magny Cours, it is totally unfounded. Perhaps you should instead criticize the fact that it is unfair for me to compare a 12 core (single-socket) and 24 core (dual-socket) system to Intel’s 6 core CPU, or that it is unfair to compare server grade CPUs to consumer CPUs. Beyond that, it should not be at all surprising that a 12 or 24 core system could beat out a 6 core system at a task that is inherently multithreaded (even if, per core, the AMD chip is way lacking).
That said, I think it is ridiculous that many of the Intel i7s are more expensive than the majority of AMD’s server-grade CPUs. Kudos to Intel for pushing the frontiers on per-core performance, but it is hard to justify the cost of that performance to customers when you’re playing a game of diminishing returns.
My company builds HPCs and servers around Harvard/MIT and it is ridiculous how expensive Intel chips are.
Let’s just say that there is a good reason why the world’s fastest supercomputer, the Cray XT5 uses only AMD Opterons. You simply can’t get that much computing power with Intel without breaking the bank.
August 9th, 2010 at 12:33 am
@ Ho Tuan
Ho Tuan said:
“Regarding your point that cheaper i7’s beating out an AMD Opteron Magny Cours, it is totally unfounded.”
___________________________________
See the scores above. My own i7 870(see comments 100 & 101, above), in a rig I built about a year ago, cost only around $500USD, then, which is cheaper than the $750USD, you claim to have paid for your AMD Opteron 6168 Magny Cours.
Here are the scores:
Yours -> “38.36 SECONDS –> SINGLE CHIP AMD OPTERON 6168 Magny Cours(12 Cores total).”
Mine -> 31.440s(see comment #100 & 101) Intel i7 870(4 Cores total)
…and there are many similar scores, above.
Now, how is my statement, “unfounded”?
August 9th, 2010 at 2:40 am
A version that uses Boost Random is posted here: http://pastebin.org/459419. Same performance on Linux. Should give much better performance on Windows. Tested with Boost 1.38 and 1.41.
August 9th, 2010 at 4:04 am
@drphilngood
I stand corrected. I didn’t notice your posts amidst the flurry of discussion.
Question: Your “rig I built about a year ago” somehow beat out a stock i7 980x on Linux, the prize machine in question. How the heck did you manage that?
————————
Post #66 TrevorH:
Assuming these tests were done at 100 samples per pixel then the Linux version seems to be dramatically quicker. I compiled from the original source not the modified version and mine completes 100 samples per pixel in 35 seconds. Core i7 980x not overclocked.
———————————
Ultimately, I’m guilty of making an unfair comparison between server and consumer chips. Server chips are underclocked and then priced higher in exchange for guaranteed reliability and cooler chips. The Opteron 6168 is, by design, underclocked to 1.9GHz (even though the die, if sold as a Phenom, could in theory run at ~3 GHz with stock coolers). My Opteron 6168 stayed at 45C at 100% utilization for the duration of the test on stock coolers. Can your i7 do that?
August 9th, 2010 at 4:31 am
@ Ho Tuan
Like many, if not most, of us here, I overclock, but you knew that, didn’t ya’? Didn’t you mention, earlier, that your rig beat “Intel’s monstrosity”, that’s “overclocked to nearly 5GHz.”
Hopefully, a screenshot will eliminate any lingering doubts you might have, though:
http://img.photobucket.com/albums/v490/drphilngood/Screenshot-smallpt-revised.png
August 9th, 2010 at 4:42 am
addendum:
Sorry, I thought the rest of your post was directed at TrevorH.
No, but each core idles at between 30C & 34C and the hottest core only reached 59C during the test. In addition, the cores jumped back to idle temps within a couple of seconds of completion.
August 9th, 2010 at 5:34 am
Ok. Here is a fast one:
Pentium II 450Mhz 256Meg ram
Linux Fedora 4 Kernel 2.6.11
74 Minutes 9 Seconds.
PS: Ho Tuan, is AMD paying you?
August 9th, 2010 at 6:10 am
84 Seconds
•i7 860 @ 4GHZ.
•8GB 1333 RAM.
•Asus P7P55D Deluxe
August 9th, 2010 at 6:15 am
Done!
486@500MHz (AMD Geode LX800, Alix1c board with 256Mb, ubuntu server 8.4, while running some extra things triggered by crond, stock kernel)
=178 minutes, 22 seconds
That’s over 10 700 seconds… beat that, Davek (#115)! Only Henry (#120) has a worse time; I’ll have to pull my 486@66 from the attic to beat him!
August 9th, 2010 at 6:48 am
@drphilngood
So let me get this straight…
A single, STOCK, AMD Opteron Magny Cours ($750), that is arguably way underclocked, finishes the job in 38 seconds, 3 seconds behind a single, STOCK, i7 980x ($1000). Double the AMD up and it finishes in 19.9 seconds (cost $1500). Last I checked, you can’t double up an i7, but you can double up the i7 980x’s server version, the Xeon 5670 (total cost $3200). Now the Xeon 5670 is like a slightly underclocked i7 980x, so it’s not surprising that it lags behind Magny Cours in multithread-application benchmarks (http://www.anandtech.com/show/2978/amd-s-12-core-magny-cours-opteron-6174-vs-intel-s-6-core-xeon).
Huh…
In my line of work, servers have to work reliably at low TDP without racking up our customer’s AC bill, so overclocks don’t count in the cost/performance comparison. Also, 59C is way too hot if I need to build HPC clusters capable of crunching numbers at 100% capacity 24/7 for weeks on end.
@Stan
No. AMD is not paying me. Unlike Intel, I doubt they have the extra cash to pay me.
August 9th, 2010 at 7:03 am
Hiiiiiiii everybody!
@Stan
Agreed–AMD should totally pay @Ho Tuan.
@DrFeelinGood
Is Intel paying *you*?
August 9th, 2010 at 7:09 am
@ Dr. Nick
Shhhhh, don’t tell or the IRS will put me in a higher tax bracket, again. :p
August 9th, 2010 at 7:40 am
@ Ho Tuan
That’s all fine and good but, if you remember, our conversation started because you thought you had beaten the score of “Intel’s monstrosity”, and stated that, “they should be embarrassed with 50 seconds.” So, I thought I’d do you a favor and give you the 411 on why there was such a huge discrepancy in the scores. After that, I have just been answering your questions.
Yes, you keep telling us that your Magny Cours’ are server chips; we all understand that. However, the OP finished the article by asking for “results, for machines both old and new.” So, if you feel that your machine is at a disadvantage, I’m sorry but it was YOUR decision to post YOUR results. Perhaps, later on, they’ll have some sort of server benchmark that you’ll do better in.
_________________________________________
To all:
I support both AMD and Intel; I just want a fast CPU and could care less where it comes from. *Looks over at server containing recycled AMD socket 939 4800+ that cost me $1K USD a half-dozen years ago* Since I realize that only competition will ensure that CPU performance keeps improving without prices skyrocketing, I hope both companies survive so their competition will continue.
August 9th, 2010 at 10:47 am
@thiscodeiswrong: The Windows-compiled version does in fact use erand48(), taken from the FreeBSD resource at http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/gen/erand48.c
Possibly the erand48() that Kevin Beason uses is quicker, or perhaps sqrt() is super-optimized by g++ (there’s a lot of sqrt in there).
@vivo and others compiling it with gcc/g++: yes, it is faster when compiled under g++, but I never said it wasn’t.
It was intended as a quick CPU test, not an exercise in code optimization. Nonetheless I’m interested to see where the speed-up comes from under g++. If anyone fancies profiling the code please feel free!
August 9th, 2010 at 2:46 pm
Running Ubuntu 10.04 on Core i7 980x without overclock.
root@localhost:/test# time ./smallpt 100
Rendering (100 spp) 100.00%
real 0m36.231s
user 6m31.300s
sys 0m0.110s
August 9th, 2010 at 3:10 pm
@drphilngood
.
No, you weren’t answering @Ho Tuan’s questions, you were failing to see his point. He was comparing the performance of Intel’s, high-priced, super-overclocked rig to a low price, standard AMD high-performance chip that seemed to behave just as well, maybe even better, to make a point on economics. You obviously missed that by throwing your overclocked i7 at him to refute his econ point.
It was premature for him to claim that he “beat” the Intel rig, but @Ho Tuan acknowledged in his first post, that we still don’t know how the Intel rig does in Linux, so it’s hard to tell. Based on everyone else’s score, he still had one of the best single chip scores (esp. for an unoverclocked, single-chip, stock CPU), and with two Opterons, the best score, with the exception of that Quad socket Xeon X7560(7.849s!), but those chips cost $4,000 a pop.
I’m an Intel guy myself, but even I will admit that AMD clearly wins on economics, and you’d be hard pressed to find someone who doesn’t agree. I just want to have the fastest chips, even if I go broke
If you want to continue your pissing contest, take it to the other forum (though it looks like there’s more AMD love over there).
August 9th, 2010 at 3:16 pm
E8400 @ 4.0GHZ 262 Seconds!!
August 9th, 2010 at 4:25 pm
@David Fearon: It isn’t GCC, it’s the Linux implementation of erand48() that’s at play. All of the other erand48() implementations I’ve seen appear to use a mutex. This affects real scalability. Linux doesn’t because it offers a re-entrant version, erand48_r(). The version used in the smallpt test isn’t really thread safe, contrary to the software author’s assertions.
The version posted above should be about as fast as the Linux implementations on all platforms. And it is thread safe without needing a mutex, since there is a separate PRNG assigned to each thread.
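The "separate PRNG per thread" idea can be sketched in a few lines of Python. This is not the Boost version linked earlier, nor smallpt's own code. It is a hypothetical illustration in which `render_row` and `render` are invented names, the "render" is just an average of random numbers, and each row gets its own private `random.Random` instance seeded from the row index, so no lock is needed around the generator. Python threads will not show the speed difference, only the design.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def render_row(row, samples):
    """Stand-in for rendering one image row with a private generator."""
    rng = random.Random(row)  # per-row state, never shared between threads
    return sum(rng.random() for _ in range(samples)) / samples


def render(rows, samples, workers):
    """Render all rows on a thread pool; no mutex guards the PRNG."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: render_row(r, samples), range(rows)))


single = render(rows=8, samples=100, workers=1)
multi = render(rows=8, samples=100, workers=4)
print(single == multi)  # same output regardless of the number of threads
```

Because no generator state is shared, the result does not depend on how the rows are scheduled across threads, which is also what makes the output reproducible from run to run.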
August 9th, 2010 at 4:33 pm
@ jackinthebox
I will respectfully disagree. Perhaps if you read more of our discussion, you will see things differently.
Yes, he had a great score with two CPUs, and a good one with one, but from his post in this thread and in this one:
http://www.pcpro.co.uk/blogs/2010/08/06/intels-own-superchilled-test-rig/#comments
that:
…it was obvious he didn’t understand that he was comparing his Linux score to the Intel rig’s Windows score and I told him so.
The OP asked us to post our smallpt scores, not to debate which CPU designer/manufacturer is better/cheaper, so I fail to see any pertinent point that you say I missed. Furthermore, I wasn’t “throwing” my “overclocked i7″, at him, I never mentioned it *to him* until he said, “Regarding your point that cheaper i7’s beating out an AMD Opteron Magny Cours, it is totally unfounded.” Perhaps you should read/reread our entire exchange.
Yes, as I’ve already acknowledged, I believe that AMD makes great chips and I support both with my purchases. I’m not sure I would say “that AMD clearly wins on economics”, but they very well might and I never said they didn’t, either.
Finally, your “pissing contest”, remark seems strange, and out of place, in a thread in which the OP asked for benchmark scores. However, I don’t feel that I’ve misbehaved, in any way, and I don’t think it is your place to tell me “take it to the other forum”, nor was it necessary.
August 9th, 2010 at 6:47 pm
What modifications did you make to get smallpt.cpp to compile in MSVC++ 2010? Can you make the code available?
August 9th, 2010 at 7:33 pm
My dual 2.66 GHz MacPro (X5550, total 8 cores/16 threads) nails this in 21.3 seconds. As pointed out by JohnAHind above, this is probably a result of better code (and perhaps better OpenMP support) from gcc 4.2 compared to the free version of Visual C++ rather than any difference in the operating system.
lyon:~ nicko$ time ./smallpt 100
Rendering (100 spp) 100.00%
real 0m21.332s
user 5m20.907s
sys 0m0.699s
August 9th, 2010 at 10:38 pm
Ouch, 239 seconds on Win7 with a Phenom II x4 955 and 4GB of DDR2 RAM.
@drphilngood
Everyone was contributing benchmark scores to this list, including @Ho Tuan, until you decided to open your mouth. The ratio of your ego to actual test score contribution on this list converges on infinity. Do us all a favor and muzzle yourself. It’s guys like you that take the fun out of this.
August 10th, 2010 at 4:48 am
213 seconds on my Core i3 530 2GB DDR3
August 10th, 2010 at 5:44 am
Just for laughs, I decided to try running it with Wine and my 870 Lynnfield managed 100 samples per pixel in 88 seconds. If anyone wants to also try it, but experiences problems, just lemme’ know and I’ll try to help. =)
Think I’ll try running it on the ole’ x2, now.
@jackinthebox
You are entitled to your opinion but please stop with the personal attacks. I have nothing else to say to you.
August 10th, 2010 at 7:11 am
@jackinthebox
I appreciate your support, though let’s try to keep it peaceful from now on.
Let’s all simma down now.
August 10th, 2010 at 2:23 pm
2xAMD8431 (12 cores total)
Linux RHEL5.5 64 bit
$ g++ -O3 -march=amdfam10 -fopenmp -ffast-math smallpt.cpp -o smallpt
$ time ./smallpt 100 2>/dev/null >/dev/null
real 0m30.271s
user 5m53.548s
sys 0m0.950s
August 10th, 2010 at 2:38 pm
@AMD12
Awesome share; duplicating your test knocked nearly two full seconds off my score. Kudos!!
real 0m29.650s
user 3m54.020s
sys 0m0.020s
{Lynnfield 870}
August 10th, 2010 at 5:10 pm
My last one – here’s a VAXstation 4000/60:
:
249192.062u 4637.635s 71:55:41.77
:
Seventy-one hours. Darn – I thought it’d have been a little faster…
August 13th, 2010 at 8:12 am
My Athlon 64 X2 workstation: 667s. Stupidly expensive 4×4 Xeon system from Dell: 815s, but the app was only using half the cores (why?) so let’s call it 407s.
August 17th, 2010 at 1:57 pm
@John Klos
Oh no! I’ve been beaten!
September 3rd, 2010 at 3:29 pm
Xeon E5430 @ 2.66Ghz
IBM ThinkStation.
Quad core I think, running Server 2008.
smallpt took 363 seconds! I’m pretty impressed!
September 12th, 2010 at 2:40 am
E4400 2gb ram with 64bit win7 took 745sec
September 19th, 2010 at 6:17 pm
My AMD Phenom II X4 955 Black Edition processor at stock speed of 3.2Ghz with 4 Gb of g-skill 1600mhz ddr3 ram running windows 7 pro 64 bit managed 240s! I’m so pleased!
September 21st, 2010 at 11:10 pm
Just a few….
AMD Phenom(tm) 9550 Quad-Core Processor
Rendering (100 spp) 100.00%
real 1m27.508s
user 5m46.590s
sys 0m0.270s
Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
Rendering (100 spp) 100.00%
real 2m41.740s
user 10m11.720s
sys 0m0.020s
Intel(R) Xeon(R) CPU E5506 @ 2.13GHz (4 core)
Rendering (100 spp) 100.00%
real 1m55.870s
user 7m41.781s
sys 0m0.084s
AMD Sempron(tm) Processor 2600+
Rendering (100 spp) 100.00%
real 14m27.438s
user 13m11.683s
sys 0m1.731s
November 7th, 2010 at 1:34 pm
Sony Vaio P Series Shameful Girlie/Handbag laptop that barely handles YouTube
1833 seconds – performance you can be truly proud of.
December 3rd, 2010 at 10:24 am
NEC Versa L1101
Pentium M 1.8GHz, 1GB DDRI RAM
Windows 7 Ultimate 32-bit
(Using X-OS Transformation Pack)
1781 seconds
February 28th, 2011 at 9:35 am
mine goes here: Intel i3 2.93GHz
32 seconds
March 14th, 2011 at 6:50 pm
288 seconds
2x Dual Core Intel Xeon 5160 3.0Ghz
4Gb ram
March 19th, 2011 at 12:15 am
My Dual Core AMD Fx60 2.61GHz with Twin Nvidia 7800GT (SLI cards) 539 seconds. I was quite pleased considering it is now 4 and a bit years old.
May 29th, 2011 at 8:15 am
Intel Core 2 DUO E8500 @ 3.8ghz
4gb ddr 2 ram
win 7 x64
312 seconds…..
June 23rd, 2011 at 2:24 am
Intel i7 2600K @ 4.1ghz
16gb hyperTX 1666
win7 x64
71 seconds…. end….
July 25th, 2011 at 6:48 am
Intel core i7 740QM @ 1.86Ghz
8GB RAM
win7 x64
3 runs: 175~177 seconds
August 28th, 2011 at 6:26 am
Intel Atom N 230 (single core, Ubuntu 10.04 (64 bit) command line only)
12 min, 18.466 s
Intel P4, 3.06 GHz, Ubuntu 11.04 32 bit, 12 min 42.878
Intel i5 460 (4GB Ram) Win 7 Pro 64 bit, 248 seconds
AMD Athlon 3500+ (2 GB RAM) Ubuntu 10.04 (64 bit) 366 s
September 10th, 2011 at 6:19 pm
Intel i7 2600k @ 4.5 Ghz
Corsair Vengeance 16GB
Win 7 x64
67 seconds
October 20th, 2011 at 6:08 am
Laptop with Intel i7 Q720 Quad Core @ 1.6 GHz – 364s [6m 4s]
December 20th, 2011 at 5:47 pm
time ./smallpt 100
Rendering (100 spp) 100.00%
real 0m32.854s
user 4m15.115s
sys 0m0.162s
This is on an early 2011 15″ Macbook Pro, core i7 2.3 Ghz (2820QM) running under OSX 10.7.2 with 16GB RAM.
December 20th, 2011 at 5:50 pm
I didn’t realise how long ago this article was written, but I wondered about the 1 minute challenge and just over a year later that doesn’t seem too hard to beat. It’s good to see that we are indeed seeing some progress with CPU technologies.
January 22nd, 2012 at 10:08 am
intel i7 3930K@4.6ghz, 16gb , windows ult 64, took 51 seconds