This summer I visited the Texas Advanced Computing Center for a week-long Supercomputing Institute. The material presented during the institute was slanted towards the introductory, but it was a good trip. The MPI and optimization sessions were quite interesting, and Austin is a wonderful city. TACC’s marquee system is an NSF Cyberinfrastructure “Track 2” system named Ranger. Much like our group’s fastest cluster, it’s a quad-socket, quad-core AMD Opteron machine running CentOS (specifically, Rocks 4.2.1). Unlike our system, it has 62,976 CPU cores instead of 256. You’d think this would be a cabling nightmare. Nope! Check it out:
Pretty impressive. In the first picture, you see the two Sun Constellation Infiniband switches, back-to-back, with the longhorns on top. If not for each of those switches having over 3,000 ports, the system would require many, many more individual switches to link all the nodes together. In the background you can see the front side of individual racks, each of which contains 4 rows of 12 nodes each (with cooling stacks in between compute racks).
More interestingly, in the second picture you can see what the cabling looks like behind the racks. On the right in that photo is a typical 48-node compute rack. Only 4 Infiniband cables are required per 12 nodes, because each cable carries 3 connections (four cables times three connections gives twelve links, one per node). Nice. They don’t have any extra gig-E cables getting in the way, nor KVM cables lying around. The power cables route nicely into the sides. The rack on the left is a bit more crowded: it’s a disk storage rack. Still, the cables aren’t a mess. Check out all the hot-swappable power supplies in both racks!
Adding up the money spent on building construction, initial hardware, operations and support, Ranger’s expense is roughly $50,000 per day over an expected 4 year lifespan. The air conditioning alone eats over a megawatt of power. You know, an Alaskan supercomputing center with an innovative building design for natural A/C could be a Really Good Idea.
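A quick back-of-envelope check on that figure, using only the numbers quoted above (this is my own rough arithmetic, not TACC’s accounting):

```python
# Rough lifetime cost for Ranger from the ~$50,000/day figure
# quoted above and the expected 4-year lifespan.
COST_PER_DAY = 50_000       # dollars per day, as quoted
LIFESPAN_DAYS = 4 * 365     # 4-year expected lifespan, ignoring leap days

total = COST_PER_DAY * LIFESPAN_DAYS
print(f"~${total:,} over the machine's life")  # ~$73,000,000
```

So the quoted daily rate works out to roughly $73 million over the machine’s life.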
What would you do with tens of thousands of cores? The most popular applications are computational fluid dynamics (physics, engineering) and molecular dynamics (chemistry, computational biology). In a few years, writers of desktop applications are going to be facing this problem on a much smaller scale, as new machines are sold with 8-16 cores standard.
I think the most promising future applications are in optimization and artificial intelligence. They may have unusually productive weak scaling properties. In the meantime, the best way to get a grant for these machines is to have a problem (or rather, a solution!) which involves a strong scaling calculation. Fortunately, our simulation codes fit the bill.
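To make the strong/weak distinction concrete, here’s a sketch of the two classic scaling models: Amdahl’s law for strong scaling (fixed problem size, more cores) and Gustafson’s law for weak scaling (problem size grows with core count). The 95%-parallel fraction and 1024-core count are purely illustrative numbers, not measurements from our codes:

```python
def amdahl_speedup(p, n):
    """Strong scaling: fixed problem size on n cores.
    p is the parallelizable fraction of the work."""
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p, n):
    """Weak scaling: problem size grows in proportion to n cores."""
    return (1.0 - p) + p * n

# A 95%-parallel code on 1024 cores (illustrative numbers):
print(f"strong scaling speedup: {amdahl_speedup(0.95, 1024):.1f}x")    # ~19.6x
print(f"weak scaling speedup:   {gustafson_speedup(0.95, 1024):.1f}x")  # ~972.8x
```

The gap between those two numbers is why grant reviewers care which regime your problem sits in: under strong scaling, even a 95%-parallel code hits a wall long before tens of thousands of cores.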
I have to wonder what they will do with Ranger in ten years…or less. Decommissioning a machine of that size is a large undertaking. Would it be sold to recyclers?
The TACC staff are quite nice and energetic. It’s clear they’re seeking out people with good ideas for scientific problems that need ten-thousand-core solutions. If they ever host a conference or workshop you’re interested in, it’s worth the trip. Rent a car to see Austin, and avoid my folly of visiting the week it’s over 100 degrees outside every day! Hah.