Tech topic: historic change in computing performance growth

Sat, Dec 10, 2011 - 1:02am
Thieving Corp.

A little less wasted power: just "turn the cores off"

Nvidia has unveiled its long-anticipated Tegra 3 processor, the world’s first quad-core chip designed specifically for mobile devices.

The Tegra 3 is the world’s first quad-core ARM A9-based processor, and features a 12-core GeForce graphics unit. Nvidia says the Tegra 3 offers three times the performance of its previous dual-core Tegra generation, and boasts improved multitasking, better web browsing, and smoother app performance.

...

Naturally, with such a boost in power, our thoughts turn to battery life. After all, what good is a beefier processor if you have to plug your tablet in for recharging every few hours?

Well, Nvidia and Asus promise the Tegra 3 won’t suck down power as quickly as you think. In fact, Asus says we’ll be able to watch up to 12 hours of HD video with the Prime running on the Tegra 3.

So how do they do it? It’s all about Variable Symmetric Multiprocessing, or vSMP for short. Essentially, tucked inside the Tegra 3 is a fifth “companion” processor that’s designed to kick in during times of low processing loads. The four main processors are only used when peak performance is required, such as when you’re playing games, multitasking or actively browsing through Internet pages. But when you’re simply reading a static web page, or when your tablet is otherwise sitting idle, the Tegra 3’s four primary processing cores shut off completely so your battery life won’t be sapped.
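
As a toy illustration of the switching policy vSMP implies, here is a minimal sketch in Python. The thresholds and names are invented for illustration only; the real policy lives in Nvidia's silicon and kernel code:

```python
# Toy model of a vSMP-style governor: run on the low-power "companion" core at
# light load, wake the four main cores only when peak performance is needed.
# Thresholds and names are invented for illustration, not Nvidia's actual policy.

def choose_cores(load_percent: float, screen_active: bool) -> dict:
    """Pick a hypothetical core configuration for a given load estimate."""
    if not screen_active or load_percent < 15:
        return {"companion": True, "main_cores": 0}   # idle / static page
    if load_percent < 60:
        return {"companion": False, "main_cores": 2}  # light multitasking
    return {"companion": False, "main_cores": 4}      # games, heavy browsing

for load, active in [(5, False), (10, True), (45, True), (90, True)]:
    print(f"load={load:3d}% active={active}: {choose_cores(load, active)}")
```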

As we just shared in our coverage of leaked details concerning the HTC Edge, processor companies like Nvidia, Intel and AMD have long leveraged multi-core designs to mitigate diminishing gains in pure clock speed. So, for example, while it may be imprudent or technically infeasible to clock a mobile phone processor at, say, 2GHz for a variety of reasons, a company can roll out a multi-core chip at a much lower clock speed and still deliver faster performance (if not also improved power management, as vSMP shows us).

https://www.wired.com/gadgetlab/2011/11/nvidia-tegra-3-processor/

Fri, Dec 16, 2011 - 10:15pm
Thieving Corp.

AMD's Bulldozer server benchmarks are a catastrophe

The desktop benchmark scores for AMD's new Bulldozer architecture didn't make happy reading for fans of the chip company, with the new design sometimes failing to beat AMD's own predecessor architecture, let alone Intel's comparable offerings. Hope still persisted, however, that the processor's architecture might fare better when tasked with server workloads. With the release last week of AMD's first Bulldozer server processors, branded the Opteron 6200 series and codenamed "Interlagos," a host of such benchmarks have arrived from AMD and others.

One reason for the underwhelming performance on the desktop is that the Bulldozer architecture emphasizes multithreaded performance over single-threaded performance. For desktop applications, where single-threaded performance is still king, this is a problem. Server workloads, in contrast, typically have to handle multiple users, network connections, and virtual machines concurrently. This makes them a much better fit for processors that support lots of concurrent threads. Some commentators have even suggested that Bulldozer was, first and foremost, a server processor; relatively weak desktop performance was to be expected, but it would all come good in the server room.

Unfortunately for AMD, it looks as though the decisions that hurt Bulldozer on the desktop continue to hurt it in the server room. Although the server benchmarks don't show the same regressions as were found on the desktop, they do little to justify the design of the new architecture.

...

AnandTech found that the Xeons were a few percent faster than the Opteron 6200 system in its vApus FOS tests. Worse, the Opteron 6200 and 6100 systems were all but tied. AMD's new architecture offered essentially no advantage and used slightly more power.

In vApus II, the Opteron systems performed better, though still behind the Xeon, and once more there was nothing to choose between the 6200 and 6100. And again, the 6200 used more power to achieve the same performance.

...

The desktop Bulldozer benchmarks were a horror show for AMD. The newest and greatest architecture often failed to beat its predecessor, let alone the Intel competition. There were no such disasters when looking at server workloads. Much as expected, thread-heavy server workloads fare a lot better, with Interlagos matching or beating Magny-Cours almost across the board (though AnandTech did find a couple of exceptions).

However, the results fall far short of a resounding success for AMD. They are broadly split between "tied with Opteron 6100" and "33 percent or less faster than Opteron 6100." For a processor with 33 percent more cores, running highly scalable multithreaded workloads, that's a poor show. Best case, AMD has stood still in terms of per-thread performance. Worst case, the Bulldozer architecture is so much slower per thread than AMD's old design that it needs four more threads just to match it. AMD compromised single-threaded performance in order to allow Bulldozer to run more threads concurrently, and that trade-off simply hasn't been worth it.
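
To make the core-count arithmetic explicit, here is a quick back-of-the-envelope check using only the figures quoted above (16 Interlagos threads versus 12 for Magny-Cours). The speedup values fed in are illustrative, not AnandTech's measurements:

```python
# Back-of-the-envelope check of the core-count arithmetic above.
old_cores, new_cores = 12, 16                 # Magny-Cours vs. Interlagos
print(f"core advantage: {new_cores / old_cores - 1:.0%}")   # ~33%

def implied_per_thread_ratio(observed_speedup):
    """Per-thread performance ratio implied by an observed overall speedup,
    assuming the workload scales perfectly with thread count."""
    return observed_speedup * old_cores / new_cores

print(implied_per_thread_ratio(1.33))   # ~1.00 -> per-thread perf merely held flat
print(implied_per_thread_ratio(1.00))   # 0.75  -> per-thread perf regressed ~25%
```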

For workloads such as SAP, where performance has scaled, Opteron 6200 still represents a reasonable upgrade for existing 6100 customers—but it leaves us wondering what might have happened if AMD had simply extended its old architecture. Another four cores in a Magny-Cours processor would show close to the same 33 percent gain, and would do so without compromising single-threaded performance.

The situation up against Intel is even more dire. In AnandTech's benchmarks, the 6200 failed to beat Intel's Xeon processors, in spite of Intel's core and thread deficit. In other benchmarks, the 6200 pulled ahead, with a lead topping out at about 30 percent.

...

After the poor desktop performance, the possibility still existed that the Bulldozer architecture would start to make sense once we could see the server performance. Now that the benchmarks have arrived, AMD's perseverance with Bulldozer is bordering on the incomprehensible. There's just no upside to the decisions AMD has made. All of which raises a question: why did AMD go this route? The company must have known about the weak single-threaded performance and the detrimental effect this would have in real-world applications long before the product actually shipped, so why stick with it? Perhaps AMD's anticipation of high clock speeds caused the company to stick with the design, and there's still a possibility that it might one day attain those clock speeds—but we've seen AMD's arch-competitor, Intel, make a similar gamble with the Pentium 4, and for Intel, it never really paid off.

AMD is boasting that Opteron 6200 is the "first and only" 16-core x86 processor on the market. Not only is this not really true (equating threads and cores is playing fast and loose with the truth), it just doesn't matter. In its effort to add all those "cores," performance has been severely compromised. AMD faces an uphill struggle just to compete with its own old chips—let alone with Intel.

https://arstechnica.com/business/news/2011/11/bulldozer-server-benchmarks-are-here-and-theyre-a-catastrophe.ars/

Fri, Dec 16, 2011 - 10:29pm
Thieving Corp.

Nvidia CEO: Supercomputing gated by power

When it comes to imagining the future of computing, the biggest constraint is electrical power rather than raw computing horsepower.

During a keynote talk today at the SC11 conference on supercomputing in Seattle, Nvidia CEO Jen-Hsun Huang said that the graphics processor company now thinks in terms of "power limits" as it designs future products because power has become a limiting factor.

...

Huang said that increasing energy efficiency of high-end computers is needed to continue pushing the limits of what can be done. That includes realistic graphics for video games and animated movies but also more powerful specialized devices, such as portable medical devices or consumer robots.

"We think as a company in power envelopes now," Huang said during the talk, which was available via Webcast. "In order for us to deliver the best performance, we need to know the power limits we have."

...

"We know that in order to achieve the next level, power has become the imperative," Huang said, adding that GPUs are one of the technologies that can help maintain the current pace of progress.

https://news.cnet.com/8301-11128_3-57325222-54/nvidia-ceo-supercomputing-gated-by-power/

Fri, Dec 16, 2011 - 11:16pm
Thieving Corp.

"Many Cores" or GPUs?

Sometimes the best part of the article is the comments section. "Multicore heaters", LOL!

Intel claims MIC beats GPUs for parallelism

Sylvie Barak

11/22/2011 2:42 PM EST

MOUNTAIN VIEW, Calif.--While graphics processors, or GPUs, are certainly well known in the world of gaming and advanced graphics, the question of whether they can be used to accelerate computation within supercomputers is much more recent.

At the recent Supercomputing 2011 show in Seattle, keynote speaker Jen-Hsun Huang, CEO of Nvidia Corp., said GPU technology was an essential ingredient on the path to reaching exascale computing within a 20MW power envelope, but Intel has strongly disagreed.

Intel Corp. is pushing forward its own version of parallel architecture, in the form of Many Integrated Cores (MIC), which it says will be easier for programmers to use and for the industry to scale.

...

BobsUrUncle

11/23/2011 10:27 PM EST

Let's see - a bunch of 386 cores with no DMA or onboard I/O except for PCIe. No separate buses to connect the cores (just shared memory). No cost amortization from the graphics business. Yeah, sounds like a real winner.

I could care less about GPUs -- I'm a HW guy. If the FPGA vendors would drop their prices -- I could design cost competitive accelerators that would run circles around these multicore heaters.

...

KarlS

11/25/2011 10:23 AM EST

Well, I am a systems as well as a HW guy. Both MIC and GPU first bring raw data into memory, and it does not matter whether a core or a GPU processes it: it must be fetched from memory. The video refers to this data movement as a problem for GPUs only. This is typical sales hype. A better approach is to bring the raw data into LOCAL memory and do the processing in the GPU or some other PU, preferably one programmed in OpenCL. The only data movement is then the processed data into main memory. Yes, a work unit must be passed to a GPU along with the raw data, but since the same processing is applied to different data over and over, the code should reside in local memory, eliminating that memory transfer.
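
A toy cost model of the point KarlS is making: land the raw data in the accelerator's local memory, keep the kernel resident, and ship only results back to main memory. All of the sizes and the bus bandwidth below are invented for illustration:

```python
# Toy cost model: land raw data directly in the accelerator's local memory,
# keep the kernel resident there, and move only results to main memory.
# Every size and the bus bandwidth are invented for illustration.

BUS_BYTES_PER_S = 8e9                     # hypothetical host<->accelerator bus

def xfer_seconds(nbytes):
    return nbytes / BUS_BYTES_PER_S

raw, result, kernel = 4e9, 4e7, 2e6       # bytes per batch (made-up figures)
batches = 100

# (A) staging through main memory: raw data crosses the bus twice (into main
#     memory, then into the accelerator), and the kernel is re-sent each time.
staged = batches * (2 * xfer_seconds(raw) + xfer_seconds(kernel) + xfer_seconds(result))

# (B) raw data lands in local memory once, kernel stays resident, and only the
#     processed results move to main memory.
resident = xfer_seconds(kernel) + batches * (xfer_seconds(raw) + xfer_seconds(result))

print(f"staged: {staged:.1f} s   resident: {resident:.1f} s")
```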

https://www.eetimes.com/electronics-news/4230832/Intel-claims-MIC-beats-GPUs-for-parallelism

Sat, Dec 17, 2011 - 2:46am
bbacq

@Thievery: bbacq AWOL

Hey Thievery haven't been back since I posted last, sorry, but thanks for all the info! What to say about all this...

Von Neumann worked, compilers could be written, programmers could understand how to make them work, then semis shrank and kept it going and growing. Until now-ish. The real problem with computation is the time to move information. I think cpnscarlett was talking about getting a gig signal across a board, synchronization issues etc. Yup, I've lived it.

Light is one answer, one might think, and people chase optical computing. But I think it's silly. Bosons don't like to interact, but fermions do, so do telecommunications with bosons (like photons) and processing with fermions (electrons).

At the last company, I did Dense Wavelength Division Multiplexing (DWDM) and we used Erbium-Doped Fiber Amplifiers (EDFAs) to build Reconfigurable Optical Add/Drop Multiplexers (ROADMs), and while photons interact a bit (a bad thing in a ROADM) I can't imagine much computation being possible, and I think optical computing is out. A digression perhaps, but let's strike it off the list of candidates to save our CPU-hungry asses.

You have already covered the current chip scene pretty well. I remember I wrote a CPU prediction article in 1990 in which I forecast the need for water-cooled packaging in the 90's, but voltages and power dropped enough to delay its need. Maybe someone will come up with something clever, but truly, it'd just be another stop-gap. I've talked to people in Canada who think we should open data-centers in the arctic. Seriously. Yup, power is an issue.

I think, as I said earlier, we have about enough CPU already to saturate a human, and we should be able to get the power down with cleverness. I should have read your ARM stuff more carefully, might go back and scan it. I recall hearing of an asynchronous core architecture (bacq when I was in that game) that should have had the potential to be both fast and lean on power. Haven't kept up, though, I moved on to using them.

I think someone is eventually going to come up with an architecture that is taught, not programmed, and that is when we are going to see some interesting new developments.

Quantum computing looks pretty interesting. Pick a number between 1 and 1000. I will bet you a bazillion silver eagles that if you promise to answer ten questions truthfully, I can tell you the number. Is it more than 500? Yes. Less than 750? You get the idea. The tenth question has you trapped. Each answer is one bit of information and 2^10 is 1024, so I have you covered. But if you could answer a quantum computer in quantum-speak (qubits, not bits), it would know after only five or six Q+A sessions (don't quote me on the exact number, but you get the idea). It's weird and wonderful, and most of our public-key cryptography is toast if anyone figures out how to build a big enough one, as Shor showed.
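
The classical half of that game is just binary search; here is a minimal sketch (the quantum half is an analogy only, so no qubits here):

```python
# The classical version of the game: ten yes/no questions pin down any number
# from 1 to 1000, because 2**10 = 1024 >= 1000. Plain binary search, no qubits.
def guess(secret, lo=1, hi=1000):
    questions = 0
    while lo < hi:
        mid = (lo + hi) // 2
        questions += 1
        if secret > mid:      # "Is it more than mid?"
            lo = mid + 1
        else:
            hi = mid
    return lo, questions

print(guess(737))   # (737, 10) -- never more than ten questions for 1..1000
print(2 ** 10)      # 1024
```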

I want to ask the question: "Who benefits from even more compute power?" You pointed out that the only ones who really need more are those hunting for truth in the very large and the very small. I am a bit concerned that this pursuit is futile, that ultimately we have a Gödel loop on our hands and it's turtles all the way down. Ultimately, faith is required and we can't figure it all out. And things like Bitcoin, which might be the answer to silly fiat currencies that are much-discussed elsewhere on TF, fail in the presence of high asymmetries in compute power. Maybe I'm just a jaded computer geek from the 80's, but if the commercial software didn't suck so bad, I think I would have enough for my needs, and I don't want the NSA and CIA to have oodles when I don't.

Not that I am paranoid or anything. Or wait, I *must* be: "Only the paranoid survive" - Andy Grove.

And I am still kicking...

Thanks for the forum, Thievery,

bbacq

Sat, Dec 24, 2011 - 12:37pm
Thieving Corp.

@bbacq - The Great von Neumann Experiment

Hey bbacq, nice to see you bacq here. No worries about AWOL. I keep quite busy myself. I still have a small backlog of articles to post. I appreciate your comments, please stop by every now and then.

I'm glad to run into a hardware type here. I went to CS school in the 80's, started working as a programmer in 88. So maybe you can imagine how this topic is of personal interest. Roughly midway through my career this brick wall thing happens to the performance growth of the machines I use to make a living. The expectation gap is only going to grow from here. No one seems to know what to do about it. All the easy and obvious approaches to higher speed come with higher costs. Cloud means bringing the networking stack into the "processor" - there is no way it is cheaper, but it's one thing you can do with existing hardware. The new chips are just not going to have the same speedups as we saw in previous decades. We will be lucky if they are not slower than the preceding generation.

My take on all this: it is an architectural problem. von Neumann works, but that does not mean it is the only architecture that can work. There is a huge perception gap on this issue. It's all we've seen so it's all we know. (It reminds me of the monetary perception gap that readers of this site can appreciate.) It is very difficult to envision something different from what we have experienced directly. For most it is literally inconceivable. Especially when their livelihood depends on existing approaches and revenue sources. That's why I expect startups will lead the way, even with all the money recently handed to established companies for exascale.

It is unfortunate that von Neumann did not live long enough to fully explore his ideas on a 2.0 architecture. It seems that near the end of his life he was closing in on ideas to overcome the limitations of 1.0; but since it was too ambitious for that time, most dismissed it as "Johnny going off the rails again". I believe you are correct, and Johnny likely would have agreed with you, that the new architectures will be similar to the way Nature works. I fully agree that today's many-core is still "in the box" and will run into the same walls soon enough.

I only posted Patterson to show that this is an issue recognized by academics, and he is a prominent one in this area, but I don't think he has any solutions. He is only interesting because he is telling us that there is a problem; vs. most of the establishment who just seem to be terrified of the end of the bull run.

We are at the end of the performance growth of the great von Neumann 1.0 experiment. I believe it is inevitable that there will be new architectures that challenge vN1.0, and that at least some of them will turn out to be better. It is the only way out. (There will be advances in materials and devices, sure, but those will likely be one-time gains that do not affect the long term shape of the performance history chart on page 9 of the report.) I intend to be ready for this inevitability and prepare accordingly, as a programmer and system designer. I plan to be able to recognize better architectures as soon as they emerge, and train myself to be able to exploit them. Quantum seems to me to be far off. I believe architectural advances can be made using existing devices. We just need to be a little more open-minded than the average hardware practitioner.

I do think there is room for machine learning techniques; but I don't think programming is going to disappear for a while. We are far from "automatic programming" in vN1.0. It is possible the new architectures will make it easier, but I would not count on it. It certainly seems that at least in some aspects it will be very different from the programming we know today.

By the way, there was a recent article on the Casey Research site, "Brain vs. Computer":

https://www.caseyresearch.com/cdd/brain-vs-computer

Will our electronic creations ever exceed our innate capabilities? Almost certainly. Futurist Ray Kurzweil predicts that there will be cheap computers with the same capabilities as the brain by 2023. To us, that seems incredibly unlikely. But on a slightly longer time frame, given the exponential advances of the field, it is quite possible that there are humans alive today who will live to see the day.

The main stumbling block right now is that, as ever more powerful computers are built, there is a concurrent expansion of power, management, and structural issues.

You say exponential and I say see chart on page 9: https://www.tfmetalsreport.com/comment/520167#comment-520167. So I agree that Kurzweil is probably off timewise. Overall Casey did not do too badly: the power wall and memory wall are acknowledged, if only in a single sentence with no details provided, but at least the article reaches reasonable conclusions. It also mentions the exascale giveaway; I'm not sure that the ARPA pixie dust is going to make the problems magically disappear.

Fri, Jan 6, 2012 - 10:37pm
Thieving Corp.

Global race for exascale computing is on

November 22, 2011

Global race for exascale computing is on

Interview: Top computer scientist Peter Beckman details the steps the U.S. and others are taking in the competition to build an exascale supercomputer

By Patrick Thibodeau | Computerworld|

The international competition to build an exascale supercomputer is gaining steam, especially in China and Europe, according to Peter Beckman, a top computer scientist at the U.S. Department of Energy's Argonne National Laboratory.

An exascale system will require new approaches in software, hardware, and storage. It is why Europe and China, in particular, are marshaling scientists, research labs and government funding on exascale development. They see exascale systems as an opportunity to build homegrown technology industries, particularly in high-performance computing, according to Beckman.

...

An exascale system is measured in exaflops; an exaflop is 1 quintillion (or 1 million trillion) floating point operations per second. It is 1,000 times more powerful than the petaflop systems that are the fastest in use today.

The Department of Energy (DOE) is expected to deliver to Congress by Feb. 10 a report detailing this nation's plan to achieve exascale computing. The government recently received responses from 22 technology firms to its request for information (RFI) about the goal to develop an exascale system by 2019-2020 that uses no more than 20 megawatts (MWs) of power. To put that power usage in perspective, a 20-petaflop system being developed by IBM, which will likely be considered one of the most energy efficient in the world, will use seven to eight MWs.

https://www.infoworld.com/d/data-center/global-race-exascale-computing-179816
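
Working through the numbers quoted above gives a feel for the size of the efficiency gap. A quick back-of-the-envelope check (taking 7.5 MW as the midpoint of IBM's "seven to eight MWs"; the arithmetic is mine, not the article's):

```python
# Efficiency gap implied by the figures in the article: an exaflop in a 20 MW
# envelope versus IBM's 20-petaflop machine at roughly 7-8 MW (7.5 MW assumed).
exa_flops, exa_watts = 1e18, 20e6
ibm_flops, ibm_watts = 20e15, 7.5e6

target_eff = exa_flops / exa_watts      # flops per watt needed for exascale
ibm_eff = ibm_flops / ibm_watts         # flops per watt of the IBM system
print(f"exascale target: {target_eff / 1e9:.1f} GFLOPS/W")
print(f"IBM 20 PF:       {ibm_eff / 1e9:.1f} GFLOPS/W")
print(f"improvement needed: ~{target_eff / ibm_eff:.0f}x")
```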

Fri, Jan 6, 2012 - 10:54pm
Thieving Corp.

Intel pushes 50-core chip, mulls exascale computing

November 15, 2011

Intel pushes 50-core chip, mulls exascale computing

Intel aims for an exascale supercomputer by 2018 and is building more power-efficient server chips

By Agam Shah | IDG News Service

Intel is drumming up support for its latest 50-core Knights Corner and Xeon E5 server chips, which are key elements in the company's plans to scale performance while reducing power consumption moving toward an exascale supercomputer by 2018.

Intel showed for the first time at the SC11 supercomputing conference a chip code-named Knights Corner, which has more than 50 cores designed to handle high-performance computing workloads.
...

The Knights Corner chip mixes standard x86 CPU cores with specialized cores and works as an accelerator alongside the CPU to boost parallel application performance. The Knights Corner chip is an important component in Intel's aim to reach exascale computing by 2018, Curley said.

Exascale computing is a key landmark in the computing space to enable new medicine, defense, energy and science applications. Countries such as Japan, China and the U.S. are in a race to get to exaflop computing and beyond.

However, design constraints and high power consumption have limited the development of an exaflop supercomputer. Chip makers like Intel, Nvidia and AMD are developing accelerators to work alongside the CPU to boost supercomputing performance while reducing power consumption.

https://www.infoworld.com/d/computer-hardware/intel-pushes-50-core-chip-mulls-exascale-computing-179122

Fri, Jan 6, 2012 - 11:10pm
Thieving Corp.

Battery Breakthrough Could Improve Capacity And Reduce Charge Ti

Battery Breakthrough Could Improve Capacity And Reduce Charge Time By A Factor Of Ten Each

from TechCrunch by Devin Coldewey

It’s no secret that batteries are holding back mobile technology. It’s nothing against the battery companies, which are surely dedicating quite a lot of R&D to improving their technology, hoping to be the first out of the gate with a vastly improved AA or rechargeable device battery. But battery density has been improving very slowly over the last few years, and advances have had to be in processor and display efficiency, in order to better use that limited store of power.

Researchers at Northwestern University claim to have created an improved lithium ion battery that not only would hold ten times as much energy, but would charge ten times as quickly.

It’s probably safe to call it a breakthrough.

...

A possible downside is a faster degradation process; after 150 charges and discharges, the batteries showed only a 5x improvement to capacity and charge speed. Of course, those 150 charges would be the energy equivalent of 1500 charges of today’s batteries.

https://techcrunch.com/2011/11/14/battery-breakthrough-could-improve-capacity-and-reduce-charge-time-by-a-factor-of-ten-each/

Fri, Jan 6, 2012 - 11:18pm
Thieving Corp.

European Researcher Sees Path to Low-Power Quantum Computers

European Researcher Sees Path to Low-Power Quantum Computers


Electronics could be 100 times less energy-hungry thanks to a quantum phenomenon known as the tunnel effect - by 2017 in consumer electronics

Nov. 22 -- By 2017, quantum physics will help reduce the energy consumption of our computers and cellular phones by up to a factor of 100. For research and industry, the power consumption of transistors is a key issue. The next revolution will likely come from tunnel-FET, a technology that takes advantage of a phenomenon referred to as "quantum tunneling." At the EPFL, but also in the laboratories of IBM Zurich and the CEA-Leti in France, research is well underway. As part of a special issue of Nature devoted to silicon, Adrian Ionescu, an EPFL researcher, has written an article on the topic.

Transistors that exploit a quantum quirk

Today's computers have no fewer than a billion transistors in the CPU alone. These small switches that turn on and off provide the famous binary instructions, the 0s and 1s that let us send emails, watch videos, move the mouse pointer… and much more. The technology used in today's transistors is called "field effect," whereby a voltage induces an electron channel that activates the transistor. But field effect technology is approaching its limits, particularly in terms of power consumption.

Tunnel-FET technology is based on a fundamentally different principle. In the transistor, two chambers are separated by an energy barrier. In the first, a horde of electrons awaits while the transistor is deactivated. When voltage is applied, they cross the energy barrier and move into the second chamber, activating the transistor in so doing.

In the past, the tunnel effect was known to disrupt the operation of transistors. According to quantum theory, some electrons cross the barrier, even if they apparently don't have enough energy to do so. By reducing the width of this barrier, it becomes possible to amplify and take advantage of the quantum effect – the energy needed for the electrons to cross the barrier is drastically reduced, as is power consumption in standby mode.

Mass production is imminent

"By replacing the principle of the conventional field effect transistor by the tunnel effect, one can reduce the voltage of transistors from 1 volt to 0.2 volts," explains Ionescu. In practical terms, this decrease in voltage will reduce power consumption by up to a factor of 100. The new generation of microchips will combine conventional and tunnel-FET technology. "The current prototypes by IBM and the CEA-Leti have been developed in a pre-industrial setting. We can reasonably expect to see mass production by around 2017."

https://www.hpcwire.com/hpcwire/2011-11-23/european_researcher_sees_path_to_low-power_quantum_computers.html
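
A rough sanity check on that "factor of 100": dynamic switching energy scales roughly with the square of the supply voltage, so the drop from 1 V to 0.2 V accounts for about 25x on its own; the rest of the claimed savings presumably comes from the standby/leakage reductions mentioned in the article. A minimal sketch of the arithmetic (my numbers, not EPFL's):

```python
# Dynamic switching energy goes roughly as C * V**2, so the voltage reduction
# alone gives about a 25x saving; leakage/standby gains would make up the rest.
v_old, v_new = 1.0, 0.2
print(f"dynamic energy reduction from voltage alone: {(v_old / v_new) ** 2:.0f}x")
```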

Fri, Jan 6, 2012 - 11:31pm
Thieving Corp.

Supercomputers Turn Green in Race to Exascale Mountaintop

The world’s supercomputers are getting greener. But they better keep it up if they’re going to break the vaunted exascale barrier any time soon.

The latest ranking of the most efficient supercomputers on earth — the biannual Green500 — shows that the greenest supercomputers are getting greener at an increasingly faster rate, thanks in part to the rise of graphics processors in these massive server clusters. But the trend must continue if we’re to reach the widely held goal of building exascale supercomputers that consume a manageable 20 megawatts of power by the end of the decade.

An exascale cluster would be 1,000 times more powerful than today’s fastest supercomputers.

IBM snagged the top five spots in the Green500 with its custom Blue Gene/Q systems, up from the top two in June. But heterogeneous systems — which are made of off-the-shelf x86 CPUs and graphics processor accelerators — claimed a larger chunk near the top of the list. “The GPUs are continuing to dominate,” said Kirk Cameron, a Virginia Tech computer science professor and co-keeper of the Green500 list.

...

According to Cameron, we need the same kind of efficiency improvements that we’ve seen over the last three or four years to get to exascale at 20 megawatts. But this may not happen. The key question is whether the current efficiency gains result more from wringing inefficiencies out of the technology or from true innovation. “Are we just hitting the low hanging fruit and then those trends are going to stop, or are we going to continue to see these accelerating increases in the efficiency year-to-year?” said Cameron.

“I would say that we’re probably mostly low hanging fruit,” he added. And if that’s the case, he said, “we’re in a lot of trouble.” Reaching the exascale goal would then require “a serious paradigm shift.”

https://www.wired.com/wiredenterprise/2011/11/supercomputers-turn-green/

Fri, Jan 6, 2012 - 11:45pm
Thieving Corp.

IBM Talks Up Three Paths Toward New Chips

IBM Talks Up Three Paths Toward New Chips

from WSJ.com: Digits by Don Clark

Few experts believe the customary ways of improving computer chips will go on forever. IBM has been particularly vocal about the issue, and on Monday is disclosing what it believes are breakthroughs in three promising areas of research.

...problems emerged early in the last decade. Higher operating frequencies began to consume too much power and generate too much heat, especially as consumers began to gravitate toward battery-operated portable PCs. And once elements in transistors shrank to a certain size, they began to leak current, adding to the power-consumption problems.

Engineers responded by changing some key materials and designs for making conventional silicon transistors, which appears to be working fine for chips now just hitting the market. Intel, for example, is introducing a novel three-dimensional transistor design into its latest microprocessors, which have circuit dimensions measured at 22 nanometers, or billionths of a meter.

But none of those approaches will work, Meyerson says, once circuitry shrinks to around seven nanometers. “We can debate if it’s in five years or ten years but the game is over,” he says.

https://blogs.wsj.com/digits/2011/12/05/ibm-talks-up-three-paths-toward-new-chips/

https://gigaom.com/2011/12/05/ibms-3-big-chip-breakthroughs-explained/

Sat, Jan 14, 2012 - 12:11pm
Thieving Corp.

Video: Challenges on the path to Exascale computing

Video: Challenges on the path to Exascale computing

Sylvie Barak

12/6/2011 6:19 PM EST




SAN FRANCISCO--Like the race to put man on the moon back in the 1960's, the race to achieve exascale computing is becoming a pressing, global ambition.

...

While the goal is clear and the purpose of achieving exascale is underscored by the urgency of dealing with some of Earth's primary problems, the challenges and pitfalls on the path to exascale are numerous.

From limited power budgets, to floor space limitations, to the reliability of monster systems, the road to successfully achieving exascale is a long and difficult one.

https://www.eetimes.com/electronics-news/4231159/Video--Challenges-on-the-path-to-Exascale-computing

SC11 - Challenges on the Path to Exascale Computing - EE Times

Selected quotes and my comments:

"The big discussion with exascale ... is that, the technology, the way we're going to be building machines in the future is different."

"...from the hardware side, it's really about power consumption, and how do you drive down, both your idle power and your dynamic power that you're using while computing. And the idle power is particularly important because as we're looking at integrating more and more into a single transistor (sic) along with Moore's law, the leakage will shoot up, unless we do something, you know, dramatic, to take care of that."

If you have studied the sources cited here, you already knew that.

"Really one of the biggest challenges in exascale is about the power usage. We could build, if a customer had enough money, an exascale system today, but it would require a gigawatt of power. That's equivalent to all the output of Hoover Dam, just to run a supercomputer."

Hmmm, could there possibly be something wrong with the way we build systems today?

"As these systems are becoming bigger and bigger, we don't have the ability to give enough power to them. We're talking today, the largest systems consuming about ten megawatts, which is pretty much the power of a small town."

"As defined by the U.S. Dept. of Energy, their target is to build an exascale system for twenty megawatts of power."

So they are budgeting "just" two small towns worth of power. Lovely.

"... what becomes expensive, is the data movement. So in fact you have to think about the FLOPS becoming free, and the data movement is what you pay for."

"If you can bring the memory very close to the processor..."

At least someone has a clue. Very far from a cigar, though.
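
To see why the data movement dominates, it helps to look at rough energy-per-operation figures of the kind quoted in exascale talks of that era. The numbers below are my illustrative assumptions (order of magnitude only, not measurements); the ratios, not the digits, are the point:

```python
# Order-of-magnitude picture of "FLOPS are free, data movement is what you pay
# for". These energy figures are rough assumptions of the kind quoted in
# exascale talks of the era, not measurements; the ratios are the point.
PJ = 1e-12
energy_j = {
    "double-precision FLOP":             25 * PJ,
    "read 64 bits from on-chip cache":   10 * PJ,
    "move 64 bits across the chip":     100 * PJ,
    "read 64 bits from off-chip DRAM": 1500 * PJ,
}
flop = energy_j["double-precision FLOP"]
for op, joules in energy_j.items():
    print(f"{op:33s} ~{joules / PJ:6.0f} pJ  ({joules / flop:5.1f}x a FLOP)")
```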

"But perhaps just as daunting, if not more, is the challenge of parallelizing software. How do you build applications that can take advantage of this scale of resources? Billions of threads, millions of cores... How do you scale these applications? How do you write them? How do you debug? How do you make sure you have optimal performance?"

Please spare us the software guilt trip, in-the-box thinker. Do not scapegoat software for your inability to produce better hardware.

"While the challenges... are numerous and significant, the supercomputing industry is unanimous in its determination to reach its next milestone, sooner, rather than later."

Geez, good thing they have determination, otherwise they would have to admit defeat before spending all their (gov't provided) funding.

They offer so many challenges, and so few solutions. Do you believe it's likely these are the people who will bring us faster computing?

Sat, Jan 14, 2012 - 1:00pm
Thieving Corp.

Here's one to watch - but does it change the chart long-term?

Transistor Tech Startup Takes On Intel With Powerful New Chip Creation Technique

SuVolta, a new company based in Los Gatos and only in the public eye for six months, has created an alternative to a certain Intel chip-making technique that could improve system-on-a-chip production and significantly decrease power consumption. Their partner, Fujitsu, has just demonstrated the technology in a super-low-voltage SRAM chip, showing that the technique is very far from vaporware.

...

Their tech, in brief, is a new technique for producing transistors called Deeply Depleted Channel, and it's a different material stacking method that allows an even lower voltage to be used to reliably power the gate. In Fujitsu's demonstration, a small SRAM cell that would normally take 1V to power successfully ran with just 0.425V. Power savings of over 50% at such a low level are hugely significant.

https://techcrunch.com/2011/12/07/transistor-tech-startup-takes-on-intel-with-powerful-new-chip-creation-technique/

Power-Sipping Chip Details Disclosed

By Don Clark

Saving power in chips for mobile devices has become a pressing priority in the tech world. A startup called SuVolta recently made some impressive claims in the field without providing many details–until now.

The Silicon Valley company, along with initial partner Fujitsu, on Wednesday is specifying technology changes it says can reduce the power drawn by transistors on chips by roughly 50% without reducing their speed. SuVolta says the technology can work using less than half a volt to drive chip circuitry, compared to the 1 volt minimum required for most chips.

...

...a big change is afoot, Thompson says. The greatest sales volume in semiconductors is shifting to products that power many smartphones and tablet-style computers, multi-function creations known as systems-on-a-chip, or SoCs.

Thompson, who spent 12 years at Intel, says the trend poses problems for his former employer–and a radical new technology it was working on while he was there. The Silicon Valley giant has opted for its next generation of chips to create a novel kind of three-dimensional transistor design to boost performance and save power.

There’s one problem, Thompson says. The 3D design is fine for microprocessors that sell for hundreds of dollars a piece, but hard to combine on SoCs that sell for $10 or so and may have to exploit a variety of different transistors.

In Intel’s core market “they have a fine solution, Thompson says. “I think when they want to do lower-cost SoCs I think they will be at a disadvantage.”

https://blogs.wsj.com/digits/2011/12/07/power-sipping-chip-details-disclosed/?mod=WSJBlog

Breakthrough: Chip startup can cut CPU power use by 50 percent

December 7, 2011 | Dean Takahashi

Fujitsu revealed today that it has confirmed that startup SuVolta's PowerShrink technology can cut power consumption in a chip by 50 percent without hurting performance. ...

When coupled with other techniques for lowering voltage, the technology can reduce power consumption by 80 percent or more. That’s a fundamental breakthrough, and it’s a rare one, since most venture investments go into applications these days, not core technology.

It’s also an incredibly significant breakthrough that challenges the giants of microprocessor manufacturing, Intel, AMD and Nvidia. Reducing power consumption is the biggest challenge in electronics today, since it means mobile devices can last longer on a battery charge. Once it hits the market in 2012, the technology could enable much smaller, thinner and more powerful laptops, smartphones, and tablets.

Los Gatos, Calif.-based SuVolta, which came out of stealth mode in June, also disclosed the first details of how its low-power transistor technology, dubbed Deeply Depleted Channel (DDC), works. The technology will allow for better low-power chips for at least the next couple of generations through sub-20-nanometer production. (A nanometer is a billionth of a meter.)

The DDC transistor reduces threshold voltage variability and enables continued shrinkage of chip circuits. The structure works by forming a deeply depleted channel when a voltage is applied to the transistor. Fujitsu has been able to use the technology in a test memory chip known as a static random access memory (SRAM), which can operate below 500 millivolts with the SuVolta technology. In that test chip, voltage was reduced two-fold and the signal-to-noise ratio was improved two-fold.

If it works across all sorts of chips, it could extend battery life on one end of the computing spectrum and reduce the spiraling electrical costs for servers and supercomputers.

The DDC has different regions that allow for different levels of flow of electrical current. The design lowers the operating voltage by 30 percent and results in less "leakage," or the unintended loss of electrical energy. Overall, the result is that the transistor allows for multiple voltage settings, which is essential for today's low-power products, said Scott Thompson, chief technology officer, in an interview.

Techniques like this are needed because the manufacturing gains from shrinking chips (per Moore's Law, the observation that the number of transistors on a chip doubles roughly every couple of years) aren't reducing costs or providing performance gains the way they once did. Thompson said he believes that the transition between chip manufacturing generations will slow down, so chip makers will need a solution like SuVolta's to make continued advances.

https://venturebeat.com/2011/12/07/in-fundamental-breakthrough-fujitsu-confirms-suvolta-cuts-chip-power-use-by-50-percent/

Sat, Jan 14, 2012 - 1:44pm
Thieving Corp.

Revisiting Supercomputer Architectures

Revisiting Supercomputer Architectures


The chronology of high performance computing can be divided into "ages" based on the predominant systems architectures for the period. Starting in the late 1970s vector processors dominated HPC. By the end of the next decade massively parallel processors were able to make a play for market leader. For the last half of the 1990s, RISC based SMPs were the leading technology. And finally, clustered x86 based servers captured market priority in the early part of this century.

This architectural path was dictated by the technical and economic effect of Moore's Law. Specifically, the doubling of processor clock speed every 18 to 24 months meant that without doing anything, applications also roughly doubled in speed at the same rate. One effect of this "free ride" was to drive companies attempting to create new HPC architectures from the market. Development cycles for new technology simply could not outpace Moore's Law-driven gains in commodity technology, and product development costs for specialized systems could not compete against products sold to volume markets.

...

In the mid 2000s, Moore's Law went through a major course correction. While the number of transistors on a chip continued to double on schedule, the ability to increase clock speed hit a practical barrier -- "the power wall." The exponential increase in power required to keep shortening processor cycle times hit practical cost and design limits. The power wall led to clock speeds stabilizing at roughly 3GHz and multiple processor cores being placed on a single chip, with core counts now ranging from 2 to 16. This ended the free ride for HPC users based on ever faster single-core processors and is forcing them to rewrite applications for parallelism.

In addition to the power wall, the scale out strategy of adding capacity by simply racking and stacking more compute server nodes caused some users to hit other walls, specifically the computer room wall (or "wall wall") where facilities issues became a major problem. These include physical space, structural support for high density configurations, cooling, and getting enough electricity into the building.

The market is currently looking to a combination of four strategies to increase the performance of HPC systems and applications: parallel applications development; adding accelerators to standard commodity compute nodes; developing new purpose-built systems; and waiting for a technology breakthrough.

Waiting for a breakthrough, LOL! Now, let's see an example of architectural blindness:

Parallelism is like the "little girl with the curl," when parallelism is good it is very, very good, and when it is bad it is horrid. Very good parallel applications (aka embarrassingly parallel) fall into such categories as: signal processing, Monte Carlo analysis, image rendering, and the TOP500 benchmark. The success of these areas can obscure the difficulty in developing parallel applications in other areas. Embarrassingly parallel applications have a few characteristics in common:

  • The problem can be broken up into a large number of sub-problems.
  • These sub-problems are independent of one another; that is, they can be solved in any order and without requiring any data transfer to or from other sub-problems.
  • The sub-problems are small enough to be effectively solved on whatever the compute node du jour might be.

When these constraints break down, the programming problem first becomes interesting, then challenging, then maddening, then virtually impossible. The programmer must manage ever more complex data traffic patterns between sub-problems, plus control the order of operations of various tasks, plus attempt to find ways to break larger sub-problems into sub-sub-problems, and so on. If this were easy it would have been done long ago.
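
As a concrete instance of the "very, very good" case described above, here is a minimal embarrassingly parallel example, a Monte Carlo estimate of pi using Python's standard multiprocessing. Every detail here is mine for illustration, not from the article:

```python
# Minimal embarrassingly parallel example: a Monte Carlo estimate of pi.
# Each chunk is independent -- no data moves between sub-problems -- so it maps
# cleanly onto however many cores are available.
import random
from multiprocessing import Pool

def hits(n):
    """Count random points in the unit square that land inside the circle."""
    rng = random.Random()
    return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))

if __name__ == "__main__":
    chunks, per_chunk = 8, 250_000
    with Pool() as pool:                      # one worker per available core
        total = sum(pool.map(hits, [per_chunk] * chunks))
    print("pi is roughly", 4 * total / (chunks * per_chunk))
```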

It has not been done yet because no one has built a computer with the proper architecture. Everything is sequential stored program or some minor variation thereof.

...

Waiting for a technology breakthrough (or the "then a miracle happens" strategy) is always an alternative; it is also the path of least resistance, and one step short of despair. Today we are looking at such technologies as optical computing, quantum entanglement communications, and quantum computers for potential future breakthroughs.

The issue with relying on future technologies is that there is no way to tell, first, if a technology concept can be turned into a viable product -- there is many a slip between the lab and the loading dock. Second, even if it can be shown that a concept can be productized, it is virtually impossible to predict when the product will actually reach the market. Even products based on well understood production technologies can badly overrun schedules, sometimes bringing to grief those vendors and users who bet on new products.

...

The HPC market is at a point where the business climate will support greater levels of innovation at the architectural level, which should lead to new organizing principles for HPC systems. The goal here is to find new approaches that will effectively combine and optimize the various standard components into systems that can continue to grow performance across a broad range of applications.

Why does it have to be "standard components"? More inside-the-box thinking.

Of course we can always wait for a miracle to happen.

The only miracle we need is a change in the way we think about the problem.

https://www.hpcwire.com/hpcwire/2011-12-08/revisiting_supercomputer_architectures.html

Sat, Jan 14, 2012 - 3:04pm
Thieving Corp.

"Build me a computer worthy of Morgor"

This one is a special stinky treat for Turdville. In a way, it comes full circle to the relevance feedback that I got when I first started posting about this subject.

If you are not quite sure about my and others' comments regarding alternative computer architectures, consider this but one example of what can be done when we apply the "foul craft" of changing the computation architecture; and who is doing it, and for what purposes:

December 13, 2011, 9:05 AM

Maxeler Makes Waves With Dataflow Design

By Don Clark

Many companies are trying novel chip technologies to accelerate specialized kinds of computing jobs. But there are signs that something a bit more radical, backed by technology vendors such as Maxeler Technologies, is also winning converts.

The London-based company was formed to commercialize a technology known by the phrase dataflow, a concept discussed since the 1970s that represents a sharp break from the fundamental design used in most computers and microprocessor chips inside them.

That traditional approach–named after computing pioneer John von Neumann, and sometimes called a control-flow architecture–involves programs that tell the computer what to do, one step at a time. Those steps can happen in any order, and the result is a system that can do multiple kinds of tasks pretty well.

In the dataflow approach, the chip or computer is essentially tailored for a particular program, and works a bit like a factory floor. Data flows into the hardware and passes to specialized "workers"–circuitry designed to do one particular job, like a specific piece of arithmetic. Each of those components handles a different part of the total computing job, and many pieces of data are moving through the electronic production line at once.
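
Here is a toy software analogy of the two styles, just to make the contrast concrete. It is only an illustration of the concept; Maxeler's actual tooling compiles kernels down to FPGAs rather than anything like this:

```python
# Toy contrast between the two styles. Control flow: one program counter walks
# each data item through the steps. Dataflow: fixed "workers" are wired into a
# pipeline and the data streams through them. A software analogy only, not
# Maxeler's actual FPGA tool flow.

data = range(10)

# Control-flow style: explicit step-by-step instructions per item.
results = []
for x in data:
    y = x * x           # step 1: square
    y = y + 3           # step 2: add offset
    results.append(y)   # step 3: store

# Dataflow style: declare the pipeline once; items flow through the stages.
def pipeline(stream, *stages):
    for stage in stages:
        stream = map(stage, stream)   # each stage is a dedicated "worker"
    return stream

assert list(pipeline(data, lambda x: x * x, lambda y: y + 3)) == results
print(results)
```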

...

There’s new evidence of a payoff for certain kinds of customers from J.P. Morgan, which has been working since 2008 to adopt Maxeler’s hardware and software to help assess its trading risks. The companies say the approach has allowed the financial-services company to quickly examine tens of thousands of possible scenarios for how its investments might be affected by events in financial markets, reducing the time for running certain scenarios from hours to a few minutes.

J.P. Morgan’s use of the technology was judged the most “cutting-edge IT initiative” in the sector in the annual American Financial Technology Awards issued in early December. The bank was able to increase the speed of certain financial calculations by 155 to 284 times, while taking up one-thirtieth of the space in its computer room, says Rob Daly, an editor at Waters magazine involved in the awards.

https://blogs.wsj.com/digits/2011/12/13/maxeler-makes-waves-with-dataflow-design/?mod=WSJBlog

J.P. Morgan Deploys Maxeler Dataflow Supercomputer for Fixed Income Trading


LONDON, Dec. 15 -- Maxeler Technologies, the market leader in Maximum Performance Computing technology, today announced that it has gone live with a supercomputer solution for fixed income trading operations at J.P. Morgan. As a result, the analysis and profiling of certain intra-day trading risk within the investment bank will run on a Maxeler dataflow supercomputer.

Maxeler's approach to supercomputing will enable J.P. Morgan to assess tens of thousands of possible market scenarios, constantly examining the time path and structure of the associated risk. This means that complex scenarios can now be calculated in a few minutes rather than hours.

...

Peter Cherasia, Head of Markets Strategies at J.P. Morgan, commented: "With the new Maxeler technology, J.P. Morgan's trading businesses can now compute orders of magnitude more quickly, making it possible to improve our understanding and control of the profile of our complex trading risk."

As part of the deal, J.P. Morgan has ordered a second powerful supercomputer for use within other parts of the fixed income business at the bank. This new dataflow supercomputer will be equivalent to over 12,000 conventional x86 cores, providing 128 Teraflops of performance.

"This new level of performance comes at a fraction of the cost, power and space compared to using conventional computers", said Gerald Aigner, a senior advisor at Maxeler Technologies who previously had responsibility for datacenters at Google. "Maxeler dataflow computers bring a novel data-centric computing technology to the forefront of high performance computing. Where computation really matters, the Maxeler approach has the potential to make a significant difference."

In addition to compute performance, total cost of ownership is becoming a key efficiency indicator for data-center operations. As the new Maxeler machine at J.P. Morgan is 25 times smaller than equivalent x86 machines, this translates to 25 times less power consumption and 25 times less space requirement in the data-center per computation.

...

"Following our strategic investment in Maxeler Technologies earlier this year, we are excited to see further direct impact on our compute capabilities in less than six months," added Cherasia.

https://www.hpcwire.com/hpcwire/2011-12-15/j.p._morgan_deploys_maxeler_dataflow_supercomputer_for_fixed_income_trading.html

Thu, Jan 19, 2012 - 1:35pm
Thieving Corp.

Wall Street Shopping for Fast FPGAs

This is a paid subscriber-only article, but the summary tells us what's going on. The WSJ story linked above mentions that this dataflow architecture is implemented using FPGAs. https://en.wikipedia.org/wiki/Field-programmable_gate_array

Not many companies can afford to design computers for a single job. What is making the dataflow design more feasible is the use of a kind of chip called FPGAs, for field programmable gate arrays, whose circuitry can be reconfigured with electric signals once they have left the factory.

https://blogs.wsj.com/digits/2011/12/13/maxeler-makes-waves-with-dataflow-design/?mod=WSJBlog

Now we see this:

January 17, 2012

Wall Street Shopping for Fast FPGAs

Xilinx Virtex-7 and Altera Stratix V FPGAs are becoming hot commodities for financial trading systems trying to shave fractions of a second off the time to process a trade and reap millions in profits.
https://confidential.eetimes.com/news-updates/4234757/Wall-Street-Shopping-for-Fast-FPGAs

Notice what is going on here. The CPU blockheads are unable to build a faster processor/computer because their minds are trapped inside the box of the sequential stored program architecture and blinded by their existing revenue streams. The users with the need for speed are interpreting this as damage and routing around it, implementing alternative processor architectures using "prefab" programmable chips, and getting significant performance advantages.

Thu, Jan 19, 2012 - 5:33pm
Thieving Corp.

Designer of Microprocessor-Memory Chip Aims to Topple Walls

More of a clue here, a good improvement, still no cigar.

Designer of Microprocessor-Memory Chip Aims to Topple Memory and Power Walls

by Michael Feldman

Whether you're talking about high performance computers, enterprise servers, or mobile devices, the two biggest impediments to application performance in computing today are the memory wall and the power wall. Venray Technology is aiming to knock down those walls with a unique approach that puts CPU cores and DRAM on the same die. The company has been in semi-stealth mode since its inception seven years ago, but is now trying to get the word out about its technology as it searches for a commercial buyer.

https://www.hpcwire.com/hpcwire/2012-01-17/designer_of_microprocessor-memory_chip_aims_to_topple_memory_and_power_walls.html

Fri, Feb 3, 2012 - 10:14pm
Thieving Corp.

DARPA Program Attacks Power Wall For Embedded Computing

DARPA Program Attacks Power Wall For Embedded Computing


Jan. 26 -- Computational capability is an enabler for nearly every military system. But computational capability is increasingly limited by power requirements and the constraints on the ability to dissipate heat. One particular military computational need is found in intelligence, surveillance and reconnaissance systems where sensors collect more information than can be processed in real time. To continue to increase processing speed, new methods for controlling power constraints are required.

In the past, computing systems could rely on increasing computing performance with each processor generation. Following Moore’s Law, each generation brought with it double the number of transistors. And according to Dennard’s Scaling, clock speed could increase 40 percent each generation without increasing power density. This allowed increased performance without the penalty of increased power.
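
A quick check of the Dennard-scaling arithmetic behind that "40 percent per generation" figure, using the classic scaling factor of about 1.4 per generation (a sketch of the textbook relation, not something taken from the article):

```python
# Quick check of the Dennard-scaling arithmetic quoted above: shrink linear
# dimensions by ~0.7x per generation and you get 2x the transistors per unit
# area, ~1.4x the clock, and unchanged power density -- as long as voltage
# keeps scaling down too (which is exactly what stopped happening).
s = 2 ** 0.5                                   # classic scaling factor (~1.4)

transistors_per_area = s ** 2                  # 2x
clock = s                                      # +40% per generation
power_per_transistor = (1 / s) * (1 / s) ** 2 * s   # C * V^2 * f, each scaled
power_density = power_per_transistor * transistors_per_area

print(f"transistors/area: {transistors_per_area:.1f}x   clock: {clock:.2f}x")
print(f"power density: {power_density:.2f}x (flat while voltage could scale)")
```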

"That expected increase in processing performance is at an end," said DARPA Director Regina E. Dugan. "Clock speeds are being limited by power constraints. Power efficiency has become the Achilles heel of increased computational capability."

https://www.hpcwire.com/hpcwire/2012-01-30/darpa_program_attacks_power_wall_for_embedded_computing.html

Fri, Apr 27, 2012 - 12:51pm
bbacq

Thanks again..

For keeping us all current on these technology issues, TCorp!

Depressing, though, to see JPM pushing tech to do faster quant, which is inherently flawed. But it gives them an edge as they (unknowingly?) bring down the system...

Nassim Taleb is writing a new book in which he outlines his concept of "antifragility".

Interview with Taleb can be found here...

Things feel pretty fragile...

best

bbacq