Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2009-06-21 12:34:32


Windows HPC: GPGPU programming with CUDA


SC08: Nvidia rep discusses & demos the CUDA programming model



Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2009-06-25 21:22:29

nVidia's GT300 specifications revealed - it's a cGPU!


Even though it shares its first two letters with the GT200 architecture [GeForce Tesla], GT300 is the first truly new architecture since SIMD [Single-Instruction Multiple Data] units first appeared in graphics processors.

The GT300 architecture groups processing cores in sets of 32 - up from 24 in the GT200 architecture. But the bigger difference is that GT300 parts ways with the SIMD architecture that dominates today's GPUs. GT300 cores rely on MIMD-like functionality [Multiple-Instruction Multiple Data] - all the units work in MPMD mode, executing simple and complex shader and computing operations on the go. We're not exactly sure whether we should continue to use the terms "shader processor" or "shader core", as these units are now almost on equal terms with the FPUs inside the latest AMD and Intel CPUs.

GT300 itself packs 16 groups of 32 cores - yes, we're talking about 512 cores for the high-end part. This number alone raises the computing power of GT300 by more than 2x compared to the GT200 core. Before the chip tapes out, there is no way anybody can predict working clocks, but if the clocks remain the same as on GT200, we would have over double the amount of computing power.

If, for instance, nVidia gets a 2 GHz clock for the 512 MIMD cores, we are talking about no less than 3 TFLOPS of single-precision compute. Double-precision performance depends heavily on how efficient the MIMD-like units turn out to be, but you can count on a 6-15x improvement over GT200.
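As a rough sanity check of that figure (assuming, as the article's numbers imply, three single-precision FLOPs per core per clock, the same dual-issue counting used for GT200):

512 cores x 2.0 GHz x 3 FLOPs/clock ≈ 3.07 TFLOPS single-precision
GT200 (GTX 285) for comparison: 240 cores x 1.476 GHz x 3 ≈ 1.06 TFLOPS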


More . . .

Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2009-06-26 12:03:14

PGI to deliver CUDA GPU-accelerated Fortran compiler


PGI [The Portland Group] and nVidia announced that PGI will release the "Fortran language specification for CUDA GPUs". With availability planned for November of this year, this announcement represents the highest level of support yet for GPU-accelerated Fortran.

While we were surprised that nVidia remained silent on this CUDA development ahead of its own GPU Technology Conference at the end of September, nVidia is looking to expand the CUDA architecture from the world of SIMD to MIMD and attack CPUs in the HPC [High Performance Computing] space.

Long story short, arguably the best Fortran compiler out there will now have native support for nVidia GPUs, keeping CUDA relevant for years to come. The compiler enables developers to send general-purpose computational kernels to the GPU rather than running them on x64 processors. We're happy to see the GPGPU/GPU computing trend spreading its wings.
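For readers who haven't seen the pattern such a compiler automates, here is a minimal sketch of the same kernel-offload idea written by hand in CUDA C (the saxpy kernel and all names are illustrative, not taken from PGI's announcement):

#include <cuda_runtime.h>

// Each GPU thread handles one array element; the host copies data over,
// launches the kernel on the device, then copies the result back.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

void run_saxpy(int n, float a, const float *hx, float *hy)
{
    float *dx, *dy;
    cudaMalloc((void **)&dx, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    saxpy<<<(n + 255) / 256, 256>>>(n, a, dx, dy);   // runs on the GPU, not the x64 host

    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
}

The CUDA Fortran compiler is meant to generate this kind of device code and data movement from annotated Fortran source instead.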


More . . .

Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2009-08-17 18:39:33


Rumor: Nvidia Working on Dual GT300 Videocard to Replace GTX 295



You can finally find Nvidia's dual-GPU GTX 295 videocards in stock at pretty much any e-tailer who carries the part, but if you've waited this long, you might want to consider holding out a few more months. According to the latest rumblings, Nvidia plans to replace the flagship part with a dual GT300 card.

News and rumor site Fudzilla claims to have confirmed the rumor, but other details, including exactly when it will ship, remain sparse. If all goes to plan, Nvidia might have a demo ready in late Q4 2009 and start shipping in January 2010, but that remains to be seen.

The new card will apparently be DirectX 11 compatible and built to run parallel processing via CUDA, DirectX Compute, or OpenCL. It will also go toe-to-toe with AMD's upcoming dual RV870 card.


link

Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2009-10-01 10:49:17


Nvidia reveals Fermi GPU architecture


In what looks like a first wave of chip details, Nvidia has been praising Fermi's GPU-compute capabilities.

Fermi has 16 streaming multiprocessors and 512 discrete cores, more than double the number of CUDA cores of the GT200.

Nvidia's next generation GPU uses six 64-bit DRAM interfaces, which means that Fermi has a total path to memory that is 384 bits wide. This is narrower than the GT200's 512-bit interface, but Fermi more than makes up for that by delivering nearly twice the bandwidth per pin via support for GDDR5 memory.
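Put as arithmetic: dropping from a 512-bit bus to 384 bits is a 0.75x step in width, so roughly doubling the per-pin rate with GDDR5 still nets about 0.75 x 2 = 1.5x the total memory bandwidth of GT200. In absolute terms (the per-pin rates here are assumptions for illustration, since memory clocks were not announced at this point): a GTX 285 does 512 bits x ~2.5 Gbit/s per pin / 8 ≈ 159 GB/s, while a 384-bit GDDR5 board at ~4 Gbit/s per pin would reach 384 x 4 / 8 = 192 GB/s.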



More . . .

Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2009-10-03 11:27:20


Nvidia fires off Fermi, pledges radical new GPUs: Three billion-transistor HPC chip, anyone?


Nvidia last night introduced the new GPU design that will feed into its next-generation GeForce graphics chips and Tesla GPGPU offerings but which the company also hopes will drive it ever deeper into general number crunching.

While the new chip is dubbed 'Fermi', so is the architecture that connects a multitude of what Nvidia calls "Streaming Multiprocessors". The SM design the company outlined yesterday contains 32 basic Cuda cores - four times as many as in previous generations of SM - each comprising one integer and one floating-point maths unit. It is able to schedule two groups of 32 threads - a group Nvidia calls a "warp" - at once.
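From the CUDA programmer's side, that warp grouping is visible through the built-in warpSize variable; a minimal sketch (the kernel is made up for illustration):

// Each block of 256 threads is executed by an SM as 256 / 32 = 8 warps;
// the Fermi SM's dual scheduler can issue instructions from two warps at once.
__global__ void which_warp(int *warp_of_thread)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    warp_of_thread[tid] = threadIdx.x / warpSize;   // warpSize is 32 on current hardware
}

// launch example: which_warp<<<num_blocks, 256>>>(d_warp_of_thread);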

This is only the first Fermi GPU design. It's aimed at science and engineering GPGPU apps rather than game graphics, so future Fermi-based GeForce chips will likely sport less complex layouts. GT300, Nvidia's next GPU core, will be derived from Fermi, but don't expect it to show off all the superlatives Nvidia has been claiming for the Fermi chip.



More . . .

Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2009-11-20 17:38:39


Prof. Wu Feng of Virginia Tech talks about their use of GPU Computing @ SC09


Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2009-12-03 12:00:56


Andy Keane, General Manager for the Tesla GPU Computing Business Unit at NVIDIA discusses what NVIDIA is showing this week (November 2009) at SC09.


Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2010-01-12 11:48:48


Parallel programming in reality


Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2010-01-20 11:21:27


The University of Maryland is using GPUs to understand ionized plasma under thermonuclear conditions for physics and energy research, work that is aiding the design of new energy-generation devices.

The university is using large GPU clusters to provide much more science for the dollar, while reducing calculation times from a day to an hour.


Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2010-01-31 00:50:48



Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2010-02-06 07:36:49

Besides the regular NV70 and GT300 codenames [codenames for the GPU], nVidia insiders called the GPU architecture Fermi. Enrico Fermi was an Italian physicist credited with the invention of the nuclear reactor, which brings us to one of the codenames we heard for one of the GT300 boards itself - "reactor".
When it comes to the boards themselves, you can expect to see configurations with 1.5 GB, 3.0 GB and 6 GB of GDDR5 memory, but more on that a little later.

GPU specifications
This is the meaty part you always want to read first. So, here is how it goes:

* 3.0 billion transistors
* 40 nm TSMC process
* 384-bit memory interface
* 512 shader cores [renamed CUDA cores]
* 32 CUDA cores per Shader Cluster
* 1 MB L1 cache memory [divided into 16 KB cache - shared memory]
* 768 KB L2 unified cache memory
* Up to 6 GB of GDDR5 memory
* Half-speed IEEE 754 double precision


As you can read for yourself, GT300 packs three billion transistors of silicon real estate, with 16 Streaming Multiprocessors [the new name for the former Shader Cluster] on a single chip. Each of these sixteen multiprocessors packs 32 cores, and this part is very important - we have already disclosed future plans for this cluster in terms of future applications. What makes a single unit important is the fact that it can execute an integer or a floating-point instruction per clock, per thread.
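Once hardware like this is in a machine, most of the figures in that list can be read back at runtime through the CUDA runtime API. A minimal sketch (note that the memoryBusWidth and l2CacheSize fields appeared in a later CUDA toolkit than the one current when this was posted):

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp p;
    cudaGetDeviceProperties(&p, 0);   // query the first CUDA device
    printf("name:             %s\n", p.name);
    printf("multiprocessors:  %d\n", p.multiProcessorCount);   // 16 SMs on a full GF100
    printf("memory bus width: %d-bit\n", p.memoryBusWidth);
    printf("L2 cache:         %d KB\n", p.l2CacheSize / 1024);
    printf("global memory:    %zu MB\n", p.totalGlobalMem >> 20);
    printf("shared mem/block: %zu KB\n", p.sharedMemPerBlock >> 10);
    return 0;
}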



More . . .

Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2010-02-06 17:31:00


Nvidia Fermi [aka GF100]:



Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2010-02-07 15:06:32


AMD's Margaret Lewis and partners discuss the increasing importance of heterogeneous platforms for HPC, driven by AMD's ATI Stream technology.


Rakarin
 
BAM!ID: 1019
Joined: 2006-05-30
Posts: 92
Credits: 0
World-rank: 0

2010-02-07 16:44:51

I've wondered why ATI and Nvidia haven't made GPU cards where the "video card" parts are stripped out, perhaps even removing elements of the GPU. Basically, just make it a streaming-data chipset without the graphics capabilities. At the very least, it would let you use an ATI or Nvidia "processor" without having to attach a dummy loop-back VGA plug.
Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2010-02-07 17:00:01

Rakarin wrote:
I've wondered why ATI and Nvidia haven't made GPU cards where the "video card" parts are stripped out, perhaps even removing elements of the GPU. Basically, just make it a streaming-data chipset without the graphics capabilities. At the very least, it would let you use an ATI or Nvidia "processor" without having to attach a dummy loop-back VGA plug.



I would expect that enhancement is close to the horizon.

I'm really expecting Intel to get into this with their Larrabee architecture.

. . . imagine a real co-processor card instead of crunching on a video card. . . .

Guest

2010-02-07 17:35:53

Sid2 wrote:
I would expect that enhancement is close to the horizon.

I'm really expecting Intel to get into this with their Larrabee architecture.

. . . imagine a real co-processor card instead of crunching on a video card. . . .


you still do not understand what's going on..

every kind of co-processor we have seen (and there have been quite a few of them) will do nothing useful unless software gets written to use it.

starting with the 8088 - adding an 8087 (which cost you more than 1000 DM back then) helped nothing by itself - you also had to get the software that would use it. so some extra thousands went on the special AutoCAD version that would use it. but it could do maths up to 100 times faster than the CPU.

then cyrix and some other companies came up with alternatives. to use them to their full potential, you had to have software that used their special features, or they simply behaved like an ordinary x87.

then with the i486 the co-processor moved in with the main CPU - and still the same situation.

and it's still there - only some apps get tuned to use the extra capabilities, and some of them can hardly profit from it because they cannot be parallelized much, or the special features the coprocessor provides are simply useless for them.



Rakarin
 
BAM!ID: 1019
Joined: 2006-05-30
Posts: 92
Credits: 0
World-rank: 0

2010-02-07 18:23:11

frankhagen wrote:
Sid2 wrote:

. . . imagine a real co-processor card instead of crunching on a video card. . . .


you still do not understand what's going on..
every kind of co-processor we have seen (and there have been quite a few of them) will do nothing useful unless software gets written to use it.


I don't think it's entirely fair to assume that wasn't understood. Even most people with a basic understanding of computers understand that a graphics co-processor (video card) will not work unless you load drivers. Now, there is a lot of general ignorance out there, and every time I have to deal with "I *need* a monitor with *lots* of RAM!" and "I want one of *those* keyboards with lights and buttons so my computer will be as fast as hers", a little piece of my soul implodes.

Keep in mind the context we are discussing here is data crunching on GPUs for high performance computing and supercomputers. These are areas where the IT techs and project managers will hopefully know better. (Not high-level managers. "We can't have a green data center because corporate colors are blue and gray.")

In my post, I figured it would cut costs to make an nVidia "co-processor card" without video capabilities. If nothing else, it eliminates the need for video port loop-backs. Right now, I know there is a cost issue. The cards are being mass-produced, and it may not be cost-effective now to make a new, specialized card. However, in the future, it makes sense. Look at SLI with Nvidia. Why not have an "SLI model" that cuts cost by removing the video port, and perhaps... how do I say this... "direct video rendering" capabilities? Just make an SLI co-card that's the same model, but only acts as a co-processor. To the PC and operating system, it's still a video card, just without displaying pretty pictures on a monitor. Similarly with heterogeneous supercomputers: why not just have cards that don't have DVI ports and don't need loop-backs? At that point, we would essentially have a co-processor card that looks like a video card to the PC.

Note that I'm still talking about a CUDA or OpenCL environment. The difference is more streamlined, and therefore potentially somewhat cheaper, hardware for the same chipset. No one is talking about duct-taping a CUDA or Cell processor to the motherboard and expecting the data gnomes to suddenly work faster.
Guest

2010-02-07 18:54:44

Rakarin wrote:
I don't think it's entirely fair to assume that wasn't understood. Even most people with a basic understanding of computers understand that a graphics co-processor (video card) will not work unless you load drivers. Now, there is a lot of general ignorance out there, and every time I have to deal with "I *need* a monitor with *lots* of RAM!" and "I want one of *those* keyboards with lights and buttons so my computer will be as fast as hers", a little piece of my soul implodes.


right - "general ignorance out there". of course installing some thing, loading it's drivers and hoping to see some benefit simply depends on ones ability to get a thing one can make use of.


Keep in mind the context we are discussing here is data crunching on GPUs for high performance computing and supercomputers. These are areas where the IT techs and project managers will hopefully know better. (Not high-level managers. "We can't have a green data center because corporate colors are blue and gray.")


are we? afaik W7, as a so-called mainstream OS, does use the PhysX extensions if they are available. do not ask me what the benefits are, i switched it off right away..

In my post, I figured it would cut costs to make an nVidia "co-processor card" without video capabilities. If nothing else, it eliminates the need for video port loop-backs. Right now, I know there is a cost issue. The cards are being mass-produced, and it may not be cost-effective now to make a new, specialized card. However, in the future, it makes sense. Look at SLI with Nvidia. Why not have an "SLI model" that cuts cost by removing the video port, and perhaps... how do I say this... "direct video rendering" capabilities? Just make an SLI co-card that's the same model, but only acts as a co-processor. To the PC and operating system, it's still a video card, just without displaying pretty pictures on a monitor. Similarly with heterogeneous supercomputers: why not just have cards that don't have DVI ports and don't need loop-backs? At that point, we would essentially have a co-processor card that looks like a video card to the PC.


that's mixing things up. first: those video ports are only a very, very small part of production costs, so what the heck. second: it's an OS thing (for windoze) that you need to have a monitor attached to a GPU to make use of it. and third - of course it would be possible to tunnel around that and use the card no matter what some silly OS thinks.

Note that I'm still talking about a CUDA or OpenCL environment. The difference is more streamlined, and therefore potentially somewhat cheaper, hardware for the same chipset. No one is talking about duct-taping a CUDA or Cell processor to the motherboard and expecting the data gnomes to suddenly work faster.


even if some kind of coprocessor (and those IGPs are exactly that) has entered the chipset - you've either got the software to make use of it, or you have a nice piece of sillycone..

to sum it up: intel, amd, nvidia or whoever else may come up with anything they like - for the mainstream it's either x86-compliant, or it's special and has to wait until software is available to use it.


Rakarin
 
BAM!ID: 1019
Joined: 2006-05-30
Posts: 92
Credits: 0
World-rank: 0

2010-02-07 20:35:28

frankhagen wrote:
right - "general ignorance out there". of course installing some thing, loading it's drivers and hoping to see some benefit simply depends on ones ability to get a thing one can make use of.


You have no idea how much I dread some of our VPs hearing about CUDA, then wanting $2K video cards so their Outlook runs faster. We already had one insist on an 8-core Nehalem Mac tower, running Parallels, because he couldn't tolerate how slow Outlook ran on his PC. Now, the explanation that he kept hundreds of megs of crap in his inbox, 500 MB+ in his calendar alone, several personal folders files (with lots of duplication, many of them well over 1 GB each), *and* was running Google Desktop to index his mailbox "because it searches faster than Outlook" (and breaks HIPAA rules) fell on deaf ears. Some of the people I find to be the most profoundly willfully ignorant are highly educated.


frankhagen wrote:
are we? afaik W7, as a so-called mainstream OS, does use the PhysX extensions if they are available. do not ask me what the benefits are, i switched it off right away..


Um... ah.... WHAT?!? Why? I can see native drivers. That makes sense. Or the ability to make calls from the OS or kernel. Ok... But what would the OS itself do with it? The only possible thing I can see is one of those experimental desktop managers where all desktop items are treated as photographs you can slosh around the desktop.

Or... It's possibly forward thinking for using the PhysX engine as a math co-processor. That's possible.
Guest

2010-02-07 20:52:09
last modified: 2010-02-07 20:55:18

Rakarin wrote:
You have no idea how much I dread some of our VPs hearing about CUDA, then wanting $2K video cards so their Outlook runs faster. We already had one insist on an 8-core Nehalem Mac tower, running Parallels, because he couldn't tolerate how slow Outlook ran on his PC.


you bet i do know!


Now, the explanation that he kept hundreds of megs of crap in his inbox, 500 MB+ in his calendar alone, several personal folders files (with lots of duplication, many of them well over 1 GB each), *and* was running Google Desktop to index his mailbox "because it searches faster than Outlook" (and breaks HIPAA rules) fell on deaf ears. Some of the people I find to be the most profoundly willfully ignorant are highly educated.


sorry, i have to go German now: der Grad der akademischen Bildung ist reziprok zur Alltagstauglichkeit [the degree of academic education is inversely proportional to everyday competence].

been there, seen that - there is no cure.


Rakarin wrote:
It's possibly forward thinking for using the PhysX engine as a math co-processor. That's possible.


yup - and then we are back to where apps have to be designed to make use of it...

.. so all that hype about GPUs comes down to whether the guys doing the coding make use of it or not.

some rare apps in the BOINC world prove that it can be done (where it makes sense) - but for now we are even still crunching those perl scripts..
Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2010-02-17 13:51:20


This video documents the building of Phase 2 of Atlas Folder over the weekend of May 8th-10th. Atlas Folder is a GPU "folding farm" that donates computer time to Stanford University's Folding@Home project. Prior to this addition, Atlas Folder consisted of 23 nVidia GTX 295s and one PS3.

This effort added a total of 32 nVidia 9800GX2 dual-GPU video cards to the machine bringing the total number of discrete GPUs to 110. These cards, formerly belonging to nitteo of team overclock.net, were jointly purchased by Jason Farque of atlasfolding.com and John Van Arnam of fold4life.com. Fold on!


Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2010-05-12 00:21:42


For years Moore's Law ruled the CPU but, at 45 years old, it has hit a brick wall. The big brains at Berkeley agree, and share their views in a paper titled Parallel Computing: A View From Berkeley.

This brick wall to increasing performance now stands in front of every type of computing - from consumer machines to supercomputing.

We have reached the limit of what is possible with one or more traditional, serial central processing units, or CPUs. It is past time for the computing industry--and everyone who relies on it for continued improvements in productivity, economic growth and social progress--to take the leap into parallel processing.


Intel and AMD recognize that parallel computing is driving the PC industry. That is the reason both companies are attempting to morph from CPU companies into visual computing ones. AMD is trying to do this through its purchase of ATI, and Intel is attempting it by trying to create its own discrete GPU.

NVIDIA's Chief Scientist Bill Dally says there is a new sheriff in town.

To continue scaling computer performance, it is essential that we build parallel machines using cores optimized for energy efficiency, not serial performance. Building a parallel computer by connecting two to 12 conventional CPUs optimized for serial performance, an approach often called multi-core, will not work. This approach is analogous to trying to build an airplane by putting wings on a train. Conventional serial CPUs are simply too heavy (consume too much energy per instruction) to fly on parallel programs and to continue historic scaling of performance.


The CPU will remain an essential part of the PC, but the benefits of investing in a high-performance GPU far outweigh the benefits of adding a powerful CPU. We believe that the most important processor for the 21st century is the GPU, and with that comes the new way to use Moore's Law.




More . . .

STE\/E
Tester
BAM!ID: 57534
Joined: 2008-08-27
Posts: 2078
Credits: 1,915,714,502,863
World-rank: 2

2010-05-12 09:54:02
last modified: 2010-05-12 09:59:49

The CPU will remain an essential part of the PC, but the benefits of investing in a high-performance GPU far outweigh the benefits of adding a powerful CPU. We believe that the most important processor for the 21st century is the GPU, and with that comes the new way to use Moore's Law


Seems to me I basically expressed the same opinion several times in the past but it was Scoffed at by some people and Dismissed as being Irrelevant to BOINC'ing ... But then I'm not a Big Brain from Berkeley so what would I know ...
Sid2
 
Forum moderator - BOINCstats SOFA member
BAM!ID: 28578
Joined: 2007-06-13
Posts: 7336
Credits: 593,088,993
World-rank: 3,482

2010-05-12 11:41:00

PoorBoy wrote:

Seems to me I basically expressed the same opinion several times in the past but it was Scoffed at by some people and Dismissed as being Irrelevant to BOINC'ing ... But then I'm not a Big Brain from Berkeley so what would I know ...



Two years ago, crunching on a PS3 was the hot setup for BOINC.

. . . who knows what the hot hardware will be in two years; my guess is coprocessors that will make the current crop of GPUs as obsolete as PS3s.
