

During this month's briefing on the Mac Pro, Apple said, paraphrasing, that they tried to put a TITAN X in the 2013 Mac Pro, but it was too hot and caused the CPU to throttle. That was my interpretation at least.
Apple has a tendency to put underpowered GPUs in their desktops, so it's great to hear they're seeking to improve this for the next Mac Pro. They've gotten plenty of feedback that "pro" users want more powerful GPUs, such as NVIDIA's GTX 1080 or 1080 Ti.
Decades ago, CPUs added SIMD units (e.g. Intel's MMX) for number crunching. Despite techniques to improve use of the CPU, there's no denying a recent shift away from the CPU to the GPU for this kind of computation.
GPUs have become so powerful that we're using them more, leading to a rise in general-purpose computing on GPUs. It's a prominent time for GPUs: AMD will be competing at the high end again with its Vega series, and NVIDIA's Volta is coming soon after that.
"Most of the software out there that's been written to target [certain kinds of high-end cinema production tasks] doesn't know how to balance itself well across multiple GPUs but can scale across a single large GPU."
~ Craig Federighi
It's true that some apps don't take much advantage of multiple GPUs, but the real point is that old, weak GPUs won't compensate for a new powerful one, and we shouldn't generalize beyond that context. It's fairly likely that Apple will sell a Mac Pro configuration in 2019 containing two GPUs.
A few years ago, if you were to buy a new Mac Pro, you could spend more to upgrade the CPU ($3500) than the whole machine cost by itself ($3000).
Earlier this month, Apple updated their Mac Pro configurations and pricing to something much more reasonable by today's standards. A CPU upgrade from 6-core to 12-core is now merely $2000:
That cost matches the difference in Intel's list price between the 6-core ($580) and 12-core ($2600) processors, so Apple is simply passing along Intel's suggested pricing.
Note that the clock speed of the 6-core unit is 30% faster than the 12-core one, so you'd be upgrading to slower cores. Apps that rely upon single-core performance will actually run slower on the more expensive CPU.
Now consider that $700 will buy you an NVIDIA GTX 1080 Ti. Last month this was the most powerful GPU available to the public. For about the same cost as adding 6 CPU cores, you can have 3 of the most powerful consumer GPUs in the world.
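To make that comparison concrete, here's a quick back-of-the-envelope calculation, using only the list prices quoted above:

```python
# Rough cost comparison, using the list prices quoted above
cpu_upgrade = 2600 - 580   # Intel's 12-core minus 6-core list price: $2020
three_gpus = 3 * 700       # three GTX 1080 Ti cards: $2100
print(cpu_upgrade, three_gpus)  # about the same money either way
```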
It's a shame you won't be able to plug all three of those monsters into the next Mac Pro. That is, assuming Apple designs it with expansion constraints similar to those of the cheese grater tower, which was the most expandable Mac Pro they've ever made.
Apple supported dual GPUs before the 2013 trash can Mac Pro. In 2012, the cheese grater Mac Pro had a stock configuration that came with two AMD Radeon HD 5770 GPUs.
As with the GPUs in the subsequent 2013 Mac Pro, these were considered mid-range, not high-end. They are rated under 110 watts TDP, so they draw less power and generate less heat than high-end GPUs (250 watts TDP is typical nowadays) of the sort that you'd want in a workstation-class tower.
"But you could replace those cards with more powerful ones," one might say. While that's possible, the box wasn't designed to allow it.
It has a decent power supply, 980 watts. However, it provides only 150 watts to each video card, or a total of 300 watts. To support a pair of 250-watt GPUs, like the GTX 1080 Ti, you'd have to supply at least 200 watts more power somehow, possibly by wiring additional power cables from the drive bay ports or from the remaining two expansion slots.
Back in 2009–2012 when this Mac Pro model was being built, high-end GPUs tended to be 180–210 watts each, but even then you'd still be short at least 60 watts. In fact, Apple offered an alternative Mac Pro configuration with a single high-end AMD Radeon HD 5870, rated at 228 watts TDP. You couldn't insert another one without additional custom wiring. The cheese grater case was simply not designed for a pair of high-end GPUs.
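A quick check of those numbers, as a sketch of the arithmetic using the TDP figures above:

```python
# Power budget for the cheese grater's two GPU slots
slot_budget = 2 * 150         # 300 watts total provided to the video cards
print(2 * 250 - slot_budget)  # pair of modern 250 W cards: 200 watts short
print(2 * 180 - slot_budget)  # pair of 2009-era 180 W cards: 60 watts short
```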
On the other hand, this Mac Pro had a replaceable CPU module that allowed upgrading the system from single to dual CPUs. That's pretty cool: in most other computers, you'd have to replace the motherboard to upgrade to dual CPUs. But nowadays you'd want this kind of upgradeability with GPUs.
OK, so let's suppose Apple's next Mac Pro is modular and expandable, like the cheese grater tower, and fitted with the latest tech: PCIe 4.0, Oculink-2, Thunderbolt 3, USB-C 3.1, M.2 NVMe SSDs, and LGA-2066 or possibly LGA-3647 sockets for high-end CPUs with 44 lanes of PCIe connectivity.
That sounds promising, but how many high-end GPUs will it support? Maybe two, if you fiddle with it?
This question reminds me of those people last year who were saying no one legitimately needs more than 16 GB of RAM in a MacBook Pro.
Three GPUs in a computer might seem ridiculous. Amongst the single-digit percentage of Mac customers who buy a Mac Pro, an even smaller fraction would need more than two GPUs.
There'd never be a controversy about Apple supporting only two GPUs. But if one came up, a person might claim that no one legitimately needs more than two GPUs. Then a person would be wrong.
If you try to build an AI program that recognizes 1000 types of objects in a photo, one of the things you'd find out is that even with multiple GPUs it can take weeks or months to train a neural net.
We can train a model from scratch to its best performance on a desktop with 8 NVIDIA Tesla K40s in about 2 weeks.
~ Jon Shlens, Google Research
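For a sense of what spreading training across GPUs looks like in practice, here's a minimal data-parallel sketch in PyTorch (my choice of framework for illustration; the model is a stand-in). `DataParallel` replicates the model and splits each batch across every visible GPU:

```python
import torch
import torch.nn as nn

# a stand-in model; a real image classifier would go here
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1000))

if torch.cuda.device_count() > 1:
    # replicate the model on each GPU and shard every input batch across them
    model = nn.DataParallel(model)

model = model.to("cuda")
out = model(torch.randn(256, 1024, device="cuda"))  # the batch is split automatically
```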
The common advice to avoid this delay is to adapt one that's already been trained, using a technique called "transfer learning". But depending on what you want to do, you can't always be using someone else's pre-trained neural net; at some point you're stuck in a process where each iteration could take weeks. If at that point you're using only one GPU, adding a couple more would reduce the turnaround time significantly.
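As a sketch of what transfer learning looks like in code, using torchvision's pretrained-weights API (the 10-class head is a made-up example for illustration):

```python
import torch.nn as nn
from torchvision import models

# start from a network already trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

for p in model.parameters():
    p.requires_grad = False  # freeze the pretrained backbone

# swap in a final layer sized for your own task (10 classes here, hypothetically)
model.fc = nn.Linear(model.fc.in_features, 10)
# now train only model.fc on your data, far cheaper than training from scratch
```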
A common problem with neural nets created by "supervised training" is that they need a huge amount of training data, or else slight variations in input will fool them. That is, suppose you train a neural net to recognize cats, but your training data consists of photos where all the cats are sitting upright. It might not recognize a cat that's upside down. Your program will handle these kinds of differences better if you augment the training set by adding rotated and resized copies of the originals.
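A minimal version of that kind of augmentation, sketched with torchvision transforms (the specific ranges are arbitrary choices of mine):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=180),               # rotated copies
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),  # resized/cropped copies
    transforms.ToTensor(),
])
# applying `augment` to each photo at load time yields a fresh variant every epoch
```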
In a notable example, Baidu Research made a breakthrough in speech recognition, aided by layering noise over their initial data set.
Baidu gathered about 7,000 hours of data on people speaking conversationally, and then synthesized a total of roughly 100,000 hours by fusing those files with files containing background noise.
~ Derrick Harris, GigaOm
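The fusion step itself is conceptually simple. Here's a hedged NumPy sketch of additive noise mixing; the target signal-to-noise ratio is my own illustrative parameter, not Baidu's:

```python
import numpy as np

def mix_noise(speech, noise, snr_db=10.0):
    # scale the noise so the mixture hits the requested signal-to-noise ratio
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```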
But then they had 14 times more data, which takes that much longer to process. A few years ago Baidu used 8 GPUs. More recently they have reported working with as many as 40 or even 128 GPUs.
Recently some of the coolest results in AI are based on a technique where two neural nets compete against each other ("generative adversarial networks") to improve their ability to create and recognize types of data.
With a larger model and dataset, Ian [Goodfellow] needed to parallelize the model across multiple GPUs. Each job would push multiple machines to 90% CPU and GPU utilization, but even then the model took many days to train.
~ OpenAI Blog
That means you're training two neural nets, not just one. Again, you'll want as much computing power as you can get.
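For flavor, here's a minimal GAN training loop in PyTorch on toy 2-D data; everything here (the tiny architectures, the fake dataset) is illustrative:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))  # noise -> fake point
D = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 1))   # point -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(32, 2) * 0.5 + 2.0  # toy "real" data cluster
    fake = G(torch.randn(32, 64))
    # train the discriminator to separate real from fake
    d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake.detach()), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # train the generator to fool the discriminator
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```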
When I started doing deep learning work, I used GPU spot instances on Amazon cloud. After running up a small bill, I began looking into external GPU boxes, but ended up assembling a PC.
The GPUs on AWS are now rather slow (one GTX 1080 is four times faster than an AWS GPU) and prices have shot up dramatically in the last months. It now again seems much more sensible to buy your own GPU.
~ Tim Dettmers
I bought just one GPU. But knowing that I might need to add more, I made sure that the PC would be able to take up to four.
It'd been a long time since I'd built a computer from components, and way back then I wasn't paying attention to how many GPUs I could put in it. To my surprise, many computers cannot support two GPUs at full bandwidth, and most cannot support more than two. There are only a few expensive motherboards that can support 4 GPUs simultaneously with x16 lanes of PCIe 3.0 connectivity.
The ability to add 3 more GPUs cost me about $600. That's about one-third of what the base system (without GPUs) would cost if built with standard components. This amount represents the higher prices for a quad-GPU-capable motherboard, a big 1500-watt power supply, and a CPU with 40 PCIe lanes (as opposed to 28). I literally paid extra for expandability itself.
The Mac Pro is positioned as Apple's most powerful computer, but computing power is coming in various forms. High-speed expansion slots provide the ability to upgrade computing power in whatever form it takes in the future. We're not just talking about GPUs here; it might be FPGAs, or ASICs like Google's TPU. Or fast storage.
Computer manufacturers generally do not sell a single line of desktop computers with different cases. For example, you don't usually see something like this:
Instead, the models of a computer line are positioned according to CPU speed, storage, and/or screen size. So this is what's typical:
But doesn't it make sense to sell optional expandability? It has value. You can put a price on it.
I know what you're thinking: "There's no way Apple would do this. It's not worthwhile because there aren't enough customers for the bigger Mac Pro."
You're probably right.
And besides, it'll have Thunderbolt 3, and maybe Oculink-2, which is even faster, so people can connect external GPUs if they need them… you know, like this:
Anyway, I'm looking forward to seeing how Apple rethinks the Mac Pro.
When we hit three GPUs, we are technically in a niche category… From talks with ASUS, despite the fact that a product may be geared towards a niche market, that product may sell well to the standard market if it is perceived to be good.
~ Ian Cutress, AnandTech (review of multi-GPU boards)
Asking the tough questions.
Yes, but cryptocurrency algorithms can run on a GPU without a lot of data transfer, so miners could use PCIe riser cables or a PCIe splitter, and connect each GPU using just one PCIe lane. That means they didn't need a 40-lane CPU and a quad-x16-slot motherboard.
Probably; at least Nervana and Groq are working on that. In the meantime, GPUs have a versatility that makes them useful for the foreseeable future.
Maybe, but then you might just get more ASICs on each expansion card.
Oh yeah, those too I guess.
It's true that games show no significant difference, but that's probably because they're tuned to run well on GPUs at x8. Games don't saturate a x16 PCIe 3.0 connection, but other apps, like deep learning programs, can.
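The rough bandwidth math, per direction (128b/130b is PCIe 3.0's encoding overhead; this is a sketch of the arithmetic, not a measured benchmark):

```python
# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding, per direction
per_lane = 8 * (128 / 130) / 8           # ~0.985 GB/s per lane
print(f"x8:  {8 * per_lane:.1f} GB/s")   # ~7.9 GB/s
print(f"x16: {16 * per_lane:.1f} GB/s")  # ~15.8 GB/s
```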
Right, 4×16 = 64, which is more than 40. The motherboard has two PCIe switches, each of which can multiplex 16 lanes of signal from the CPU into 32 lanes for two slots. So the 4 slots use only 32 lanes from the CPU.
Apple used a similar kind of PCIe switch for the Thunderbolt 2 controllers in the 2013 Mac Pro.
Some people say the latency is so bad that it's better to just use 8 lanes, but that's a myth. Apparently this idea spread due to game testing that showed slightly higher frame rates on x8 GPUs than on switched x16 GPUs. But games are not good for this kind of benchmark because they don't exceed x8 bandwidth; you can get similar results on an unswitched x16 slot.
I haven't seen any good tests, but an NVIDIA benchmark showed very little latency or loss in bandwidth across a PLX (Avago/Broadcom) switch.
Here it is next to the cheese grater:
OK, so I have to vent a little about motherboard and case design. Cool desktop computer designs are totally thwarted by the way uATX, mITX, and ATX boards are standardized. This sector has so much room for innovation with non-standard motherboards and backplanes.
A lot of people are calling for Apple to just put the next Mac Pro in a big box case, but I think it'd be a shame if that's exactly what Apple does. They went to extremes with the 2013 Mac Pro, but reverting to a standard tower box isn't so appealing either.
I wanted a smallish form factor, but the motherboard is so big (extended ATX) that it would have to go in a fairly large case, and I didn't like anything I saw. So I experimented with MakerBeam rails instead of imitating NVIDIA's DevBox with a Carbide Air 540 case. It ended up being taller than I intended, but slim in other dimensions.
A single-fan liquid cooler has been completely adequate for the CPU. The GPU is connected with a x16 riser cable and can slide all the way over the CPU, since the CPU doesn't have a huge heat sink on top of it. That means multiple GPUs can be spaced apart with wide air gaps.
It's surprisingly quiet with two 140mm fans, even under sustained heavy load. What did make a huge amount of noise was a 5TB hard drive that was hammering away while feeding images to a convnet. After a couple of annoying days of that, I switched to an SSD. It has no drive bays; there's an M.2 drive, and additional SSDs can be velcroed to a side rail.