
The Mac Pro: A Case for Expansion


Tom Park (@tompark)

Some things I learned while building a GPU rig for deep learning

During this month’s briefing on the Mac Pro, Apple said (to paraphrase) that they had tried to put a TITAN X in the 2013 Mac Pro, but it ran too hot and caused the CPU to throttle. That was my interpretation, at least.

Apple has a tendency to put underpowered GPUs in their desktops, so it’s great to hear they’re seeking to improve this for the next Mac Pro. They’ve gotten plenty of feedback that “pro” users want more powerful GPUs, such as NVIDIA’s GTX 1080 or 1080 Ti.

2017 Mac Pro concept by Pascal Eggert ([Source])

Decades ago, CPUs added SIMD units (e.g. Intel’s MMX) for number crunching. Despite techniques to improve use of the CPU, there’s no denying a recent shift away from the CPU to the GPU for this kind of computation.

GPUs have become so powerful that we’re using them for more and more, leading to a rise in general-purpose computing on GPUs. It’s a prominent time for GPUs: AMD will be competing at the high end again with its Vega series, and NVIDIA’s Volta is coming soon after that.

“Most of the software out there that’s been written to target [certain kinds of high-end cinema production tasks] doesn’t know how to balance itself well across multiple GPUs but can scale across a single large GPU.”
~ Craig Federighi

It’s true that some apps don’t take much advantage of multiple GPUs, but the real point is that two older, weaker GPUs won’t make up for a single new, powerful one, and we shouldn’t generalize beyond that context. It’s fairly likely that Apple will sell a Mac Pro configuration in 2019 containing two GPUs.

So you need a fast computer

A few years ago, if you were to buy a new Mac Pro, you could spend more on the CPU upgrade ($3500) than the whole base machine cost by itself ($3000).

[Source]

Earlier this month, Apple updated their Mac Pro configurations and pricing to something much more reasonable by today’s standards. A CPU upgrade from 6-core to 12-core is now merely $2000:

[Source]

That cost matches the difference in Intel’s list price between the 6-core ($580) and 12-core ($2600) processors, so Apple is simply passing along Intel’s suggested pricing.

Note that the clock speed of the 6-core unit is 30% faster than the 12-core one, so you’d be upgrading to slower cores. Apps that rely upon single-core performance will actually run slower on the more expensive CPU.

Now consider that $700 will buy you an NVIDIA GTX 1080 Ti. Last month this was the most powerful GPU available to the public. For about the same cost as adding 6 CPU cores, you can have 3 of the most powerful consumer GPUs in the world.

It’s a shame you won’t be able to plug all three of those monsters into the next Mac Pro. Assuming, that is, Apple designs it with similar expansion constraints as the cheese grater tower, which was the most expandable Mac Pro they’ve ever made.

Two mid-range GPUs. Or a single high-end one.

Apple supported dual GPUs before the 2013 trash can Mac Pro. In 2012, the cheese grater Mac Pro had a stock configuration that came with two AMD Radeon HD 5770 GPUs.

A dual-GPU, modular Mac Pro (mid-2012). Photo: Tom Park / CC BY-SA

As with the GPUs in the subsequent 2013 Mac Pro, these were considered mid-range, not high-end. They are rated under 110 watts TDP, so they draw less power and generate less heat than the high-end GPUs (250 watts TDP is typical nowadays) of the sort that you’d want in a workstation-class tower.

“But you could replace those cards with more powerful ones,” one might say. While that’s possible, the box wasn’t designed to allow it.

It has a decent power supply, 980 watts. However, it provides only 150 watts to each video card, or a total of 300 watts. To support a pair of 250-watt GPUs, like the GTX 1080 Ti, you’d have to supply at least 200 watts more power somehow, possibly by wiring additional power cables from the drive bay ports or from the remaining two expansion slots.

Back in 2009–2012 when this Mac Pro model was being built, high-end GPUs tended to be 180–210 watts each, but even then you’d still be short at least 60 watts. In fact, Apple offered an alternative Mac Pro configuration with a single high-end AMD Radeon HD 5870, rated at 228 watts TDP. You couldn’t insert another one without additional custom wiring. The cheese grater case was simply not designed for a pair of high-end GPUs.

On the other hand, this Mac Pro had a replaceable CPU module that allowed upgrading the system from single to dual CPUs. That’s pretty cool: in most other computers, you’d have to replace the motherboard to upgrade to dual CPUs. But nowadays you’d want this kind of upgradeability for GPUs.

OK, so let’s suppose Apple’s next Mac Pro is modular and expandable, like the cheese grater tower, and fitted with the latest tech: PCIe 4.0, Oculink-2, Thunderbolt 3, USB-C 3.1, M.2 NVMe SSDs, and LGA-2066 or possibly LGA-3647 sockets for high-end CPUs with 44 lanes of PCIe connectivity.

That sounds promising, but how many high-end GPUs will it support? Maybe two, if you fiddle with it?

Pfft, how many GPUs do you really need?

This question reminds me of those people last year who were saying no one legitimately needs more than 16 GB of RAM in a MacBook Pro.

“Own the world’s greatest gaming computer and convince everyone it’s for your research” (http://imgur.com/wiLsGqA)

Three GPUs in a computer might seem ridiculous. Amongst the single digit percentage of Mac customers who buy a Mac Pro, an even smaller fraction would need more than two GPUs.

There’d never be a controversy about Apple supporting only two GPUs. But if one came up, a person might claim that no one legitimately needs more than two GPUs. Then a person would be wrong.

Two weeks, two weeks, two weeks

If you try to make an AI program that recognizes 1000 types of objects in a photo, one of the things you’d find out is that even with multiple GPUs it can take weeks or months to train a neural net.

We can train a model from scratch to its best performance on a desktop with 8 NVIDIA Tesla K40s in about 2 weeks.
~ Jon Shlens, Google Research

The common advice for avoiding this delay is to adapt a network that’s already been trained, using a technique called “transfer learning”. But depending on what you want to do, you can’t always use someone else’s pre-trained neural net: at some point you’re stuck in a process where each iteration could take weeks. If at that point you’re using only one GPU, adding a couple more would reduce the turnaround time significantly.
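
To make the idea concrete, here is a minimal transfer-learning sketch in PyTorch (my choice of framework; nothing above specifies one). You reuse a network someone else already trained on ImageNet and retrain only its final layer for your own task, so the weeks-long part of the work has effectively been done for you. The class count is a placeholder.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a ResNet-50 whose weights were already trained on ImageNet.
model = models.resnet50(pretrained=True)

# Freeze the pretrained layers so they are not updated during training.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with a new head for your own (hypothetical) categories.
num_classes = 10
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head is optimized, so fine-tuning takes hours instead of weeks.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```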

Suddenly synthesized

A common problem with neural nets created by “supervised training” is that they need a huge amount of training data or else slight variations in input will fool them. That is, suppose you train a neural net to recognize cats, but your training data consists of photos where all the cats are sitting upright. It might not recognize a cat that’s upside down. Your program will handle these kinds of differences better if you augment the training set by adding rotated and resized copies of the originals.
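
For instance, here’s a tiny augmentation sketch using torchvision (an assumed library; any image toolkit would do) that generates rotated, rescaled, and flipped variants of each photo on the fly:

```python
from torchvision import transforms

# Each epoch, every original photo is seen in a randomly perturbed form,
# which multiplies the effective size of the training set.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=30),                 # cats at odd angles
    transforms.RandomResizedCrop(224, scale=(0.7, 1.0)),   # vary scale and framing
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
])
```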

In a notable example, Baidu Research made a breakthrough in speech recognition, aided by layering noise over their initial data set.

Baidu gathered about 7,000 hours of data on people speaking conversationally, and then synthesized a total of roughly 100,000 hours by fusing those files with files containing background noise.
~ Derrick Harris, GigaOm

But then they had 14 times more data, which takes that much longer to process. A few years ago Baidu used 8 GPUs. More recently they reported working with as many as 40 or 128 GPUs.
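
The synthesis step itself is conceptually simple. Here’s a rough sketch of the idea in NumPy (my own illustration, not Baidu’s actual pipeline): overlay a clean speech clip with a background-noise clip at a chosen signal-to-noise ratio, then repeat with many noise files and levels.

```python
import numpy as np

def mix_with_noise(speech, noise, snr_db=10.0):
    """Fuse a clean speech clip with background noise at the given SNR (in dB)."""
    noise = noise[:len(speech)]                 # trim noise to the speech length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12   # avoid division by zero
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

# Fusing each of the 7,000 hours of recordings with many different noise files
# and SNR levels is how a training set balloons toward 100,000 hours.
```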

Now you have two problems

Recently some of the coolest results in AI are based on a technique where two neural nets compete against each other (“generative adversarial networks”) to improve their ability to create and recognize types of data.

With a larger model and dataset, Ian [Goodfellow] needed to parallelize the model across multiple GPUs. Each job would push multiple machines to 90% CPU and GPU utilization, but even then the model took many days to train.
~ OpenAIย Blog

That means you’re training two neural nets, not just one. Again, you’ll want as much computing power as you can get.
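
To make the “two networks” point concrete, here is a bare-bones GAN training step in PyTorch (a generic sketch, not OpenAI’s code, with illustrative layer sizes): every iteration runs forward and backward passes through both a generator and a discriminator, which is why the workload grows so quickly.

```python
import torch
import torch.nn as nn

# Tiny generator and discriminator for flattened 28x28 images.
G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def train_step(real):                          # real: a batch of flattened images
    z = torch.randn(real.size(0), 64)
    fake = G(z)

    # Discriminator: learn to tell real images from generated ones.
    d_loss = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake.detach()), torch.zeros(real.size(0), 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: learn to fool the discriminator.
    g_loss = bce(D(fake), torch.ones(real.size(0), 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```

With more than one GPU, wrapping each network in torch.nn.DataParallel (or putting the generator and discriminator on separate cards) is the usual first step toward the kind of multi-GPU parallelism described in the quote above.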

Would you pay extra for expansion slots?

When I started doing deep learning work, I used GPU spot instances on Amazon cloud. After running up a small bill, I began looking into external GPU boxes, but ended up assembling a PC.

The GPUs on AWS are now rather slow (one GTX 1080 is four times faster than an AWS GPU) and prices have shot up dramatically in the last months. It now again seems much more sensible to buy your own GPU.
~ Tim Dettmers

I bought just one GPU. But knowing that I might need to add more, I made sure that the PC would be able to take up to four.

It’d been a long time since I’d built a computer from components, and way back then I wasn’t paying attention to how many GPUs I could put in it. To my surprise, many computers cannot support two GPUs at full bandwidth, and most cannot support more than two. There are only a few expensive motherboards that can support 4 GPUs simultaneously with x16 lanes of PCIe 3.0 connectivity.

The ability to add 3 more GPUs cost me about $600. That’s about one-third of what the base system (without GPUs) would cost if built with standard components. This amount represents the higher prices for a quad-GPU-capable motherboard, a big 1500-watt power supply, and a CPU with 40 PCIe lanes (as opposed to 28 lanes). I literally paid extra for expandability itself.

Would Apple sell additional expansion as an option?

The Mac Pro is positioned as Apple’s most powerful computer, but computing power is coming in various forms. High-speed expansion slots provide the ability to upgrade computing power in whatever form it takes in the future. We’re not just talking about GPUs here; it might be FPGAs, or ASICs like Google’s TPU. Or fast storage.

Computer manufacturers generally do not sell a single line of desktop computers with different cases. For example, you donโ€™t usually see something like this:

2013 Mac Pro concept by Scott Richardson ([Source])

Instead, the models of a computer line are positioned according to CPU speed, storage, and/or screen size. So this is what’s typical:

https://www.apple.com/mac-pro/specs/

But doesn’t it make sense to sell optional expandability? It has value. You can put a price on it.

C’mon, you have to admit it’d be pretty cool 😎

I know what you’re thinking: “There’s no way Apple would do this. It’s not worthwhile because there aren’t enough customers for the bigger Mac Pro.”

You’re probably right.

And besides, it’ll have Thunderbolt 3, and maybe Oculink-2 which is even faster, so people can connect external GPUs if they need them… you know, like this:

Photo: Peter Wiggins ([Source])

Anyway, I’m looking forward to seeing how Apple rethinks the Mac Pro.

When we hit three GPUs, we are technically in a niche category… From talks with ASUS, despite the fact that a product may be geared towards a niche market, that product may sell well to the standard market if it is perceived to be good.
~ Ian Cutress, AnandTech (review of multi-GPU boards)

Q&A

Asking the tough questions.

What’s the big deal about multiple GPUs? Weren’t bitcoin miners putting 6 or 8 GPUs on a PC in a milk crate?

Yes, but cryptocurrency algorithms can run on a GPU without a lot of data transfer, so miners could use PCIe riser cables or a PCIe splitter, and connect each GPU using just one PCIe lane. That means they didn’t need a 40-lane CPU and a motherboard with quad x16 slots.

In bitcoin mining, GPUs were overtaken by ASICs. Won’t that happen in AI too?

Probably; at least Nervana and Groq are working on that. In the meantime, GPUs have a versatility that makes them useful for the foreseeable future.

ASICs have lower power requirements than GPUs, so wouldn’t a big power supply be unnecessary for multiple ASICs?

Maybe, but then you might just get more ASICs on each expansion card.

You mentioned FPGAs and ASICs, but what about quantum processors?

Oh yeah, those too I guess.

Why do you need 16 lanes per GPU? Games don’t show any difference between GPUs running on x8 vs x16.

It’s true that games show no significant difference, but that’s probably because they’re tuned to run well on GPUs at x8. Games don’t saturate an x16 PCIe 3.0 connection, but other apps like deep learning programs can.
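
Some back-of-the-envelope arithmetic (my own numbers, not a benchmark) shows why: a single large training batch is far bigger than anything a game streams per frame.

```python
# PCIe 3.0 delivers roughly 0.985 GB/s of usable bandwidth per lane.
gb_per_lane = 0.985

# A batch of 256 ImageNet-sized images (3 x 224 x 224, float32), in gigabytes.
batch_gb = 256 * 3 * 224 * 224 * 4 / 1e9

for lanes in (8, 16):
    bandwidth = lanes * gb_per_lane            # GB/s
    print(f"x{lanes}: {batch_gb / bandwidth * 1000:.1f} ms per batch transfer")

# x8: ~19.6 ms per batch; x16: ~9.8 ms. A game moving a few megabytes per frame
# never notices the difference, but a training loop streaming batches
# back-to-back can.
```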

How does a CPU with only 40 PCIe lanes connect to 4 GPUs that each use 16 lanes?

Right, 4×16 = 64, which is more than 40. The motherboard has two PCIe switches, each of which multiplexes 16 lanes of signal from the CPU out to 32 lanes feeding two slots. So the 4 slots use only 32 lanes from the CPU.

Apple used a similar kind of PCIe switch for the Thunderbolt 2 controllers in the 2013 Mac Pro.

Don’t those PCIe switches introduce a lot of latency?

Some people say the latency is so bad that it’s better to just use 8 lanes, but that’s a myth. Apparently this idea spread due to game testing that showed slightly higher frame rates on x8 GPUs than on switched x16 GPUs. But games are not good for this kind of benchmark because they don’t exceed x8 bandwidth; you can get similar results on an unswitched x16 slot.

I haven’t seen any good tests, but an NVIDIA benchmark showed very little latency or loss in bandwidth across a PLX (Avago/Broadcom) switch.

Show us your rig. Pics or it didn’t happen.

Here it is next to the cheese grater:

An open-case rig next to a Mac Pro 5,1. Photo: Tom Park / CC BY-SA

What the… ugly. Why no case?

OK, so I have to vent a little about motherboard and case design. Cool desktop computer designs are totally thwarted by the way uATX, mITX, and ATX boards are standardized. This sector has so much room for innovation with non-standard motherboards and backplanes.

A lot of people are calling for Apple to just put the next Mac Pro in a big box case, but I think it’d be a shame if that’s exactly what Apple does. They went to extremes with the 2013 Mac Pro, but reverting to a standard tower box isn’t so appealing either.

I wanted a smallish form factor but the motherboard is so big (extended ATX) that it would have to go in a fairly large case, and I didn’t like anything I saw. So I experimented with MakerBeam rails instead of imitating NVIDIA’s DevBox with a Carbide Air 540 case. It ended up being taller than I intended, but slim in other dimensions.

A single-fan liquid cooler has been completely adequate for the CPU. The GPU is connected with an x16 riser cable and can slide all the way over the CPU, since the CPU doesn’t have a huge heat sink on top of it. That means multiple GPUs can be spaced apart with wide air gaps.

It’s surprisingly quiet with two 140mm fans, even under sustained heavy load. What did make a huge amount of noise was a 5TB hard drive that was hammering away while feeding images to a convnet. After a couple of annoying days of that, I switched to an SSD. It has no drive bays; there’s an M.2 drive, and additional SSDs can be velcroed to a side rail.
