The Cell [Archives] - Canardpc.com

beud

08/02/2005, 05h25

It deserves a little thread of its own, doesn't it?

I'll kick things off with some impressive slides from PC Watch

http://pc.watch.impress.co.jp/docs/2005/0208/kaigai153.htm

Damn, the little guy runs at 1 V!
Two Vt levels; I'd bet that's Transmeta tech inside...

http://pc.watch.impress.co.jp/docs/2005/0208/kaigaip022.jpg

beud

08/02/2005, 07h05

http://pc.watch.impress.co.jp/docs/2005/0208/kaigai001.jpg

beud

08/02/2005, 07h07

http://pc.watch.impress.co.jp/docs/2005/0208/kaigai005.jpg

Romuald

08/02/2005, 07h32

9 execution cores clocked at 4 GHz at 1.1 V, they shouldn't take us for idiots either... Does something like that even work?
It's a sort of multicore Opteron, with an integrated memory controller and an integrated bus interface controller too.
I still have plenty of doubts; it sounds like that old Intel news item predicting CPUs beyond 6 GHz... :whistle:

Doc TB

08/02/2005, 11h34

Doc TB

08/02/2005, 11h52

Used in tandem, it can execute 10 instruction sequences simultaneously, compared with the limitation of the x86 architecture, which can only execute 2 processes simultaneously.

That's nonsense; what is this rubbish now?

I suppose they mean that x86 CPUs are only at dual core.

The thread is over here (http://forum.x86-secret.com/viewtopic.php?t=3051) now.

We're at 2 for now and 4 in three months, well before the Cell is available. By the time it is, we'll surely be at 8.

That said, they misinterpreted it, because the claim was about instructions per cycle. And on that front, we're already at 4...

ludoschmitt

08/02/2005, 12h02

Yasko

08/02/2005, 12h10

Used in tandem, it can execute 10 instruction sequences simultaneously, compared with the limitation of the x86 architecture, which can only execute 2 processes simultaneously.

That's nonsense; what is this rubbish now?

I suppose they mean that x86 CPUs are only at dual core.

The thread is over here (http://forum.x86-secret.com/viewtopic.php?t=3051) now.

Quoted from the original thread:

Yes, but that isn't tied to the x86 architecture. You can add as many cores as you like; it won't change the architecture of an (execution) core.
Could it be related to the stack pointer register ESP and the instruction pointer register EIP?

Doc TB

08/02/2005, 12h22

I agree with you, but there has been no quad-core announcement as far as I know, whereas they did announce an eight-core Cell. Besides, Silicon does general news; they aren't hardware specialists.

And as Romuald says, HT isn't two instructions per cycle. If I understood correctly, it pays off when applications are optimized for it or when multitasking. Whereas with multi-core you can schedule all of that, I think. :??:

You're mixing up quite a few things here. HT means 2 threads executed simultaneously. So if we're talking about threads (processes), yes, the current P4 can execute 2 threads at the same time. And with dual-core + HT, it will be 4.

Now, if we're talking about instruction throughput, the P4 can output 2 instructions per cycle at most, but that has nothing to do with HT; it comes from the double-speed ALUs. On a dual-core P4, it will be 4.

Neo_13

08/02/2005, 12h24

As far as I'm concerned, the Cell is pure hype

The PS2 was supposed to blow everything away and in the end, meh

POWER was supposed to kill x86 20 years ago

In short, claims like this belong in the same drawer as "the Mac will kill the PC", etc.

And one Power core + 8 specialized DSPs = a G5 with more AltiVec... How do they reach 4 GHz when they can't manage 2.5 GHz (with watercooling)?

Doc TB

08/02/2005, 12h35

EPIC

08/02/2005, 12h37

Having looked through the docs, it's just a basic PowerPC-type CPU plus 8 DSPs that handle 128-bit SIMD only. I don't call that a multi-core CPU; a multi-core CPU is several CPU cores on the same die. Here there's only one CPU. Otherwise you'd have to call the GeForce 6 a multi-core CPU because it has several shader units...

And above all, I have serious doubts about it running at 4 GHz right now...

It will become multicore by force of circumstance once several Cells are coupled together as described in the patent. As presented, what we have is the Cell in its minimal version, which might allow it to reach 4 GHz. With 4 dies of 234 million transistors each, however, it's utopian to imagine the whole thing running at 4 GHz, or else brace for the power consumption and heat output, not to mention production-line yields... A 2 GHz clock would be far more realistic. And in the case of the future PS3, which is supposed to use it, you have to count 64 MB of eDRAM, which is almost 1 billion more transistors, for a final chip of around two billion transistors, like Intel's Montecito...

Otherwise, yes, it's probably a PowerPC heavily inspired by the PPC 970, since it includes VMX (IBM's trade name for its clone of Motorola's AltiVec). One may even wonder whether this chip will soon power a new generation of Macs...

EPIC

08/02/2005, 12h45

I've dissected the beast a bit:

- 56 (8*7) million transistors for the DSP cores

How did you get that figure for the DSP transistor count? For the caches, fine, and for the PowerPC core too, but for the DSP transistor count I don't follow...

Doc TB

08/02/2005, 12h47

You have the total transistor count, you have the die area, you have the cache sizes, you have a die photo. You work it out.

It's a rough estimate, within about 10%, but we're in the right ballpark.

EPIC

08/02/2005, 12h50

Yeah, indeed, I hadn't thought of the area...

Still, 7 million transistors seems a bit light to me, although I don't know how many transistors SSE or AltiVec take up... I would have estimated the whole thing at 20-30 million transistors, because there seem to be far more compute units than in AltiVec or SSE!

Doc TB

08/02/2005, 12h53

That said, there are quite a few parameters to take into account, such as cache density (much higher than that of the core) and the fact that the cache requires 54 bits per byte of memory.

EPIC

08/02/2005, 13h00

And one Power core + 8 specialized DSPs = a G5 with more AltiVec... How do they reach 4 GHz when they can't manage 2.5 GHz (with watercooling)?

Now that's the big mystery! It puzzles me too, especially since the POWER and PowerPC pipelines aren't very long (the deepest being the PPC 970's, with 16 stages). In short, I find it hard to believe... Maybe at 65 nm it could work! :??:

Doc TB

08/02/2005, 13h06

EPIC

08/02/2005, 13h34

Maybe only the DSPs run at 4 GHz...

Maybe, too, the CPU runs at a 400 MHz base clock and, since it can issue 10 instructions per cycle, it's a "4 GHz equivalent". That's very likely, and the technique has already been used by everyone, like Intel with its 800 MHz FSB!

A kind of equivalence, in a way. A bit like the principle of DDR memory...

That said, I'd lean more toward a Cell clocked at a real 1 GHz, which with 4 Cells (a quad-core, then) would give an equivalent frequency of 4 GHz... Anyway, for this kind of architecture I don't think frequency is the processor's main asset!

Doc TB

08/02/2005, 13h35

That's possible too

Yasko

08/02/2005, 14h41

http://www.matbe.com/images/biblio/cpu/000000002803.jpg
http://www.matbe.com/actualites/imprimer/8754/

ludoschmitt

08/02/2005, 14h46

I Googled (thanks Peggasuss) for more information on the Cell.

I found a write-up that looks pretty good:
http://www.blachford.info/computer/Cells/Cell0.html
I'm reading part 1 right now.

Yasko

08/02/2005, 14h59

Already posted in the "News en tout genre" thread.
These are deductions drawn from an analysis of the Cell patents (impressive work, if the guy didn't get it wrong).

Yasko

08/02/2005, 15h24

Rambus in the Cell @ Anandtech (http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2341)

EPIC

08/02/2005, 16h57

Already posted in the "News en tout genre" thread.
These are deductions drawn from an analysis of the Cell patents (impressive work, if the guy didn't get it wrong).

Absolutely. I did the same thing when I studied the patent two years ago! I even wrote a forty-page report on it...

Alexko

08/02/2005, 18h32

Alexko

08/02/2005, 18h35

Maybe only the DSPs run at 4 GHz...

Maybe, too, the CPU runs at a 400 MHz base clock and, since it can issue 10 instructions per cycle, it's a "4 GHz equivalent". That's very likely, and the technique has already been used by everyone, like Intel with its 800 MHz FSB!

That's what I think too. On the "SPUs" a very long pipeline is possible, so such a frequency should be achievable in 90 nm SOI (Intel does manage 7.6 GHz on its double-pumped ALUs). On the other hand, I really don't see how the PowerPC could reach 4 GHz...

EPIC

08/02/2005, 19h04

(Intel does manage 7.6 GHz on its double-pumped ALUs).

Yeah, well, that's a generally tolerated abuse of language. I believe it's rather that the 2 ALUs can, in some cases (that is, not all the time), execute two µops in 1 cycle, so by extension at 3.8 GHz that theoretically gives an ALU clocked at an equivalent frequency of 7.6 GHz...
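
The "equivalent frequency" arithmetic above can be written out as a minimal Python sketch. The function name is hypothetical, and the result is only a peak figure: it assumes the ALU really completes two µops on every cycle, which, as noted above, is not always the case.

```python
def equivalent_frequency_ghz(core_clock_ghz: float, ops_per_cycle: int) -> float:
    """Peak 'equivalent' frequency: real clock times operations per cycle."""
    return core_clock_ghz * ops_per_cycle


# Double-pumped ALU: 2 µops per core cycle at a 3.8 GHz core clock.
p4_alu_equivalent = equivalent_frequency_ghz(3.8, 2)
print(f"{p4_alu_equivalent:.1f} GHz equivalent")
```

The same multiplication underlies the "400 MHz x 10 instructions per cycle" guess made earlier in the thread; in both cases the real clock is unchanged and only the marketing number grows.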

On the other hand, I really don't see how the PowerPC could reach 4 GHz...

Same here, this thing is crazy! You can clearly see this processor was designed to parallelize as much computation as possible thanks to its 128-bit SIMD units...
The PowerPC (if that's what it is) is only there for a "more secondary" role: running the OS, distributing the load across the DSPs and, ultimately, ensuring compatibility with the rest of IBM's server range...

Yasko

08/02/2005, 19h38

Les unités SIMD du Cell @ Ars Technica (http://arstechnica.com/articles/paedia/cpu/cell-1.ars)

These DSP cores, which IBM calls "synergistic processing elements" (SPE), but I'm going to call "SIMD processing elements" (SPE) because "synergy" is a dumb word, are really the heart of the entire Cell concept.

EPIC

08/02/2005, 19h51

It's a massively vector processor... :love: :love: :love:

Yasko

08/02/2005, 19h56

Yes, and the 8 DSPs can be chained in series, with a scheme for sharing/duplicating their cache memory.
See Blachford's article.

Yasko

08/02/2005, 21h44

http://pc.watch.impress.co.jp/docs/2005/0208/kaigaip028.jpg

Inspektor_Gadget

08/02/2005, 22h40

I may be about to ask a very stupid question... but are the 8 DSPs reprogrammable? Or are they dedicated to a fixed task, frozen ad vitam aeternam in silicon (on insulator, admittedly)?

Yasko

08/02/2005, 23h22

In my opinion, the term DSP isn't a great fit; these 8 units are more like full-fledged CPUs.
Architecture of a PU:
http://arstechnica.com/images/cell/figure6.gif

beud

09/02/2005, 05h57

Everyone seems very skeptical about the frequency... but what would they gain by talking nonsense?

I can well imagine the SIMD cores at 4 GHz and the PowerPC controller at 2 GHz, for example.

Still, the bandwidth figures are staggering:
Rambus 25.6 GB/s
FlexIO 76.8 GB/s

My Pentium M is at 3 GB/s and my 9800 XT card at 23 GB/s; I don't dare imagine what a flight simulator ported to a Cell would look like if it saturates its I/O! (No reason it wouldn't.)

EPIC

09/02/2005, 08h45

In my opinion, the term DSP isn't a great fit; these 8 units are more like full-fledged CPUs.
Architecture of a PU:
http://arstechnica.com/images/cell/figure6.gif

I agree with you there: the SPEs do look an awful lot like CPUs with their own memory. Granted, they'll be far more specialized than a general-purpose CPU, but the resemblance is strong all the same...

EPIC

09/02/2005, 08h51

Still, the bandwidth figures are staggering:
Rambus 25.6 GB/s
FlexIO 76.8 GB/s

You haven't seen anything yet! XDR technology (formerly Yellowstone) is designed to reach 100 GB/s. It uses ODR (Octal Data Rate), so the base frequencies aren't 3.2 GHz or 6.4 GHz but rather 400-800 MHz; with dual channel and ODR you reach monstrous equivalent frequencies and bandwidths to match...
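
As a rough check of the figures above, here is a minimal Python sketch of how an octal data rate turns a 400 MHz base clock into a 3.2 GHz "equivalent" rate, and how that maps to bandwidth. The function names are hypothetical, and the 64-bit total bus width (two 32-bit channels) is an assumption not stated in this thread; with it, the result lines up with the 25.6 GB/s Rambus figure quoted above.

```python
def effective_rate_ghz(base_clock_mhz: float, transfers_per_clock: int = 8) -> float:
    """Equivalent data rate in GHz for a bus moving several transfers per clock."""
    return base_clock_mhz * transfers_per_clock / 1000


def bandwidth_gb_per_s(rate_ghz: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s: transfers per second times bytes per transfer."""
    return rate_ghz * bus_width_bits / 8


xdr_rate = effective_rate_ghz(400)  # ODR: 8 transfers per base clock
# Assumption: 64 bits of total width (dual channel, 32 bits each).
xdr_bandwidth = bandwidth_gb_per_s(xdr_rate, bus_width_bits=64)
print(f"{xdr_rate:.1f} GHz equivalent, {xdr_bandwidth:.1f} GB/s peak")
```

Both numbers are theoretical peaks; they say nothing about latency, which the thread discusses separately.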

Dandu

09/02/2005, 11h41

jihef

09/02/2005, 11h58

All these GFLOPS everywhere are very tempting, but the question I ask myself is: what about the software?

At a time when people are looking for ways to cut software development costs, they're offering a processor that is programmed almost directly on the hardware. On top of that, the 8 units are vector units, hence fairly specialized. It's surely going to be hard to program.

On one hand that's good, for me at least, because if the Cell catches on and really is hard to exploit in software, well, I'll have work.

On the other hand, if few people are willing to invest in software for it, it may stay confined to specific applications.

I think the Cell's success depends more on the goodwill of software publishers than on its enormous computing power. At least as far as the PC market goes.

Yasko

09/02/2005, 12h12

The difficulty will mostly be at the compiler level, I think.
On the other hand, the fact that a Cell's architecture is fixed (see Blachford's article) may offset the new difficulties arising from the design choices.

If there is a law in computing, Abstraction is it, it is an essential piece of today's computing technology, much of what we do would not be possible without it. Cell however, has abandoned it. The programming model for the Cell will be concrete, when you program an APU you will be programming what is in the APU itself, not some abstraction. You will be "hitting the hardware" so to speak.

While this may sound like sacrilege and there are reasons why it is a bad idea in general there is one big advantage: Performance. Every abstraction layer you add adds computaions and not by some small measure, an abstraction can decrease performance by a factor of ten fold. Consider that in any modern system there are multiple abstraction layers on top of one another and you'll begin to see why a 50MHz 486 may of seemed fast years ago but runs like a dog these days, you need a more modern processor to deal with the subsequently added abstractions.

The big disadvantage of removing abstractions is it will significantly add complexity for the developer and it limits how much the hardware designers can change the system. The latter has always been important and is essentially THE reason for abstraction but if you've noticed modern processors haven't really changed much in years. The Cell designers obviously don't expect their architecture to change significantly so have chosen to set it in stone from the beginning. That said there is some flexibility in the system so it can change at least partially.

The Cell approach does give some of the benefits of abstraction though. Java has achieved cross platform compatibility by abstracting the OS and hardware away, it provides a "virtual machine" which is the same across all platforms, the underlying hardware and OS can change but the virtual machine does not.

Cell provides something similar to Java but in a completely different way. Java provides a software based "virtual machine" which is the same on all platforms, Cell provides a machine as well - but they do it in hardware, the equivalent of Java's virtual machine is the Cells physical hardware. If I was to write Cell code on OS X the exact same Cell code would run on Windows, Linux or Zeta because in all cases it is the hardware Cells which execute it.

It should be pointed out that this does not mean you have to program the Cells in assembly, Cells will have compilers just like everything else. Java provides a virtual machine but you don't program it directly either.

The difficulty won't even be in the compiler (well, not all of it), but in the Cell's microcode (can we call it that?).

EPIC

10/02/2005, 09h02

and the latencies to match.

It's Rambus technology, you know. It has one big flaw (apart from price, which isn't really a flaw; it can be worked around): latency.

In raw throughput, dual-channel Rambus beats dual-channel DDR, but latency lags behind. And we can see what happens when latency goes up: a 2 MB cache that is no more effective than a 1 MB cache

Latencies will be high, that's undeniable. That said, for a vector architecture processing massive data streams, the drawback of high latency has to be put in perspective, because in the end bandwidth matters most for this kind of architecture...
That's also why a memory controller was integrated, not to mention the PPE, which will be there to handle the SPEs' memory access requests. So in the end latencies will be higher, certainly, but not necessarily to an extravagant degree; they shouldn't hurt the rest of the architecture much.

And I don't really see the point of 9 cores for games, but I suppose they have something in mind.

Look at the PS2: all the geometry work is handled by the Emotion Engine's 2 VPUs, while the Graphics Synthesizer and its 16 pipelines only handle the pixel side.
The Cell will probably do the same: it will take care of polygon handling and physics, while the graphics chip (designed in part by nVidia) may only handle pixel shaders, anti-aliasing and video compression... That would be fairly logical given the Cell's FPU power, and it would leave room for more pixel pipelines in the GPU (24 or even 32 would be conceivable).

Yasko

10/02/2005, 09h40

That's also why a memory controller was integrated, not to mention the PPE, which will be there to handle the SPEs' memory access requests

The SPEs should be able to address the DMAC directly without going through the PPE, shouldn't they?

http://www.blachford.info/computer/Cells/Cell_Arch.gif

EPIC

10/02/2005, 09h51

At a time when people are looking for ways to cut software development costs, they're offering a processor that is programmed almost directly on the hardware. On top of that, the 8 units are vector units, hence fairly specialized. It's surely going to be hard to program.

Programming the SPEs will certainly be tricky. That said, Sony also has experience in this area from the PS2 (5 years), since much of the criticism leveled at it was precisely about programming difficulty, notably because of the VPUs and their memory management...

On the other hand, if few people are willing to invest in software for it, it may stay confined to specific applications.
I think the Cell's success depends more on the goodwill of software publishers than on its enormous computing power. At least as far as the PC market goes.

Anything is possible. That said, IBM is part of the project, and we already know the processor will run Linux thanks to the POWER architecture. As for the rest, I don't doubt for a second that IBM will support developers the way Intel and AMD do...

EPIC

10/02/2005, 09h52

Dandu

10/02/2005, 11h46

Handing polygon processing to the CPU is quite a step backward, isn't it?

Even if the CPU has monstrous geometry power, it presumably won't match the performance of a dedicated GPU with hardwired functions.

Then again, Sony has always tried to hide the polygon count behind heavy lighting effects, and does the same on the PSP (in raw power it's fairly weak; in image processing it's powerful).

Anyway, if it's done well it doesn't matter much: the Kyro II came close to the GeForce 2 GTS in real-world conditions, without T&L and with much lower raw power, but with a well-designed system.

Still, regarding the Cell I'm doubtful: that much power and complexity in a CPU meant for a console surprises me.

In my opinion, the console version will be cut down, or else they've found a miracle trick to produce it cheaply. Because, well, you can't afford to put hundreds of euros into a console.

Romuald

10/02/2005, 12h03

Dandu

10/02/2005, 12h10

Well, the 65 nm process (surprising...) already cuts costs...
That said, the first PS1s cost over 1,400 francs, and the same goes for the PS2.

Maybe they also finally intend to make a console that can last?

Even with reduced costs, the processor is still huge, hence expensive to produce. On top of that, at such a frequency it's going to run hot, so fans, a big case, etc.

A console that lasts isn't viable. Consoles automatically get compared to PCs. Even if the console is ahead at launch, after 6 months it's level with the PC, and it stays there for about a year (you get more out of console hardware than out of PC hardware, thanks to the fixed architecture).
After that, the console no longer holds up when graphics are compared, and since your average show-off only looks at graphics, well, the rumor of a new console has to be started.

Average commercial lifespan: 3 years, sometimes a bit more, but not much.

Apart from the Japanese, who still buy Super NES games, of course.

Say what you like about the PS2, but graphically it no longer really holds up against a PC (or an Xbox, for that matter), even if in gameplay terms it beats the PC for quite a few games (flight simulators and sometimes FPS aside).

Romuald

10/02/2005, 12h25

Yasko

10/02/2005, 12h55

Dandu, you need to read the Cell article on Onversity (http://www.onversity.com/cgi-bin/progarti/art_aff.cgi?Eudo=bgteob&M=informat&O=touslesmots&P=a0804), or: the true nature of the macrophage cell out to devour big bad Intel. :sarcastic:

EPIC

10/02/2005, 13h22

Here's something that really makes me laugh: if we take this news item seriously (http://www.macdigit.com/index.php/weblog/more/the_cell/), Samuel will soon have to rename the site PPC-secret.com. Hilarious... :lol: :lol: :lol:

Minuteman

10/02/2005, 13h31

The transfer bus is fabulous too, with a transfer rate of 6.4 gigahertz between the RAM and the central core

Oh dear :sarcastic:

EPIC

10/02/2005, 13h40

Handing polygon processing to the CPU is quite a step backward, isn't it?

Well, I don't think so. The ideal would be to have every operation handled by the same chip; that way there's no more GPU <-> CPU traffic, and a bottleneck (not a minor one) is eliminated.

Even if the CPU has monstrous geometry power, it presumably won't match the performance of a dedicated GPU with hardwired functions.

By definition a hardwired GPU is fixed from the moment it's designed; a chip like the Cell can be programmed at will.

Still, regarding the Cell I'm doubtful: that much power and complexity in a CPU meant for a console surprises me.

In my opinion, the console version will be cut down, or else they've found a miracle trick to produce it cheaply. Because, well, you can't afford to put hundreds of euros into a console.

According to the patent, 4 Cells are planned, so we're far from a cut-down version for the PS3... Then again, economic imperatives being what they are...

Dandu

10/02/2005, 13h51

As for graphics, you also have to look at what's actually needed... To display on a standard TV at 1024 at most, you don't need as much as on a PC at 1280 or 1600.

And even if it's still borderline, the PS2 has plenty left to offer; when I see GT4, which I'm eagerly awaiting... It's pretty gorgeous.

For PS2 graphics, they rely on one big trick: the polygon count, which is fairly low (logically, since the console isn't super powerful), is hidden by pretty effects all over the place. It's quite an effective approach.

As for resolution, don't think in terms of current graphics, where consoles are limited to 640*480 (even the Xbox), but think of the future (which is the present in Japan): HDTV

so at minimum a resolution of 1280*720, i.e. 3x as many pixels as today (921,600 versus 307,200).

Not to mention that antialiasing is necessary on a TV, because the screens are much larger than a computer monitor at the same resolution.

You can do without AA on a 17" at 1280*1024, not on an 82 cm TV at the same resolution.

As for the console/PC gap: it's not a question of raw power but of technology. On PC, graphics technology changes about every year; on consoles, every 3 years at best.

An Xbox has a 733 MHz CPU and a graphics chip equivalent to a GeForce4 Ti (a bit less in raw power) and is therefore limited to Pixel Shader 1.4 (I think)

Even if the Xbox's power is exploited far more than an equivalent PC's, you're still limited by the on-board technology.

Onversity's analysis is interesting, but I don't really think he's right.

The console will remain a leisure device, and the x86-based PC will remain a work tool that can be repurposed.

It's like the Mac sites that think that if Mac OS X were ported to the PC, Apple would clean up. That's ridiculous, because the system's flaws (any system's, for that matter) would show. If Windows ran on only 3% of the market and on limited hardware, it would perform better, and there would be fewer viruses and problems.

The proof? The Xbox uses a modified Windows; it doesn't really crash, there's no real problem, and the console outperforms an equivalent PC.

jihef

10/02/2005, 15h43

Anything is possible. That said, IBM is part of the project, and we already know the processor will run Linux thanks to the POWER architecture. As for the rest, I don't doubt for a second that IBM will support developers the way Intel and AMD do...

Yes, but "runs on" doesn't necessarily mean "takes advantage of the enormous computing power". It will take good drivers and specifically optimized software, which will surely take time, and simply recompiling the sources won't be enough to use all the power.

Long live assembly!!

Childerik

10/02/2005, 15h55

http://www.macdigit.com/index.php/weblog/more/the_cell/

The perfect example of the Macintosh zealot, stupid and nasty :sarcastic:. It's a shame, because I wouldn't turn my nose up at a Mac laptop with a PPC G4, but when you see comments like that, well, it puts you off a bit...

ludoschmitt

10/02/2005, 17h05

Blase_888

10/02/2005, 17h51

Seriously, I find these processor developments awfully complicated. Why not go back to simpler architectures: fewer transistors, a long pipeline with good error correction, and a good frequency around 7-8 GHz? :??:

Dandu

10/02/2005, 17h55

The transistor count of a processor hasn't grown that much in recent years, you know. It's just an illusion caused by growing caches.

Between a Covington Celeron (no L2) and a Dothan Pentium M there's a huge gap in transistor count, but the core isn't all that different.

Besides, long pipelines on fast processors, we've seen how that turns out.

Doc TB

10/02/2005, 18h08

How does it turn out?

Dandu

10/02/2005, 18h22

Well, against a more efficient processor with a shorter pipeline, it doesn't really hold up.

A long pipeline is effective for ramping up frequency, but not so much for performance.

jihef

10/02/2005, 18h23

PeGGaaSuSS

10/02/2005, 18h23

Well, against a more efficient processor with a shorter pipeline, it doesn't really hold up.

A long pipeline is effective for ramping up frequency, but not so much for performance.

You think it helped that much?
In a year of Prescott we've gained 'only' 400 MHz.
And the 3600 and 3800 models are surely not the best sellers.

Yasko

10/02/2005, 18h48

Are there good compilers capable of vectorizing code efficiently?
Not easy with typical code, because there are often many dependencies between operations.

That aspect has to be taken into account when the software architecture is designed.
It often complicates things considerably, since you have to break the logical thread of the conceptual algorithm (the one with the dependencies) in favor of a kind of thematic grouping (several similar, consecutive, independent operations).
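
The contrast described above, between a computation with dependencies and a group of similar, independent operations, can be illustrated with a minimal Python sketch. This is not Cell code: the function names are hypothetical, and the 4-wide grouping simply assumes 32-bit values packed into a 128-bit SIMD register.

```python
def running_total(values):
    """Loop-carried dependency: each step needs the previous result,
    so the iterations cannot simply be executed side by side."""
    totals = []
    accumulator = 0
    for value in values:
        accumulator += value
        totals.append(accumulator)
    return totals


def scale_and_offset_by_lanes(values, scale, offset, lanes=4):
    """Independent per-element work, regrouped into blocks of `lanes`
    elements; each block is the kind of unit a 128-bit SIMD instruction
    could process at once (assuming 4 x 32-bit values)."""
    result = []
    for start in range(0, len(values), lanes):
        block = values[start:start + lanes]
        # No element of the block depends on another one.
        result.extend(x * scale + offset for x in block)
    return result


print(running_total([1, 2, 3, 4]))
print(scale_and_offset_by_lanes([1, 2, 3, 4, 5], scale=2, offset=1))
```

The second function is trivially vectorizable because the grouping is explicit; the first would need a different algorithm (for example a parallel prefix sum) before a compiler or a programmer could spread it across SIMD lanes.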

jihef

10/02/2005, 18h50

In short, a compiler capable of exploiting the Cell doesn't exist yet.

edit: it forces the Cell programmer to go against the "natural" way of writing things, since the compiler can't do that for them. Hello development costs.

Yasko

10/02/2005, 18h52

Well, the problem with parallelization/vectorization is that it's not really the natural/intuitive way of doing things, and the compiler can't work miracles either.
As for redoing all that architectural rework, I don't really believe in it; it seems awfully complicated to me...

Blase_888

10/02/2005, 19h08

So wouldn't a very well optimized long pipeline be possible in the future?

Dandu

10/02/2005, 19h12

For parallelism, there isn't much of a problem, given that the PS2 is already fairly hard to program. That's also one of the advantages of the Xbox (and of the late Dreamcast): it's based on DirectX, so it's easy for PC programmers to port to and program.

The long pipeline isn't specific to Prescott, it's NetBurst in general, and it works: it scales in frequency.

Now, whether Intel doesn't sell very fast Prescotts, or whether efficiency is the limit, is another problem.

As for parallelizing while coding: in computer science school 4 years ago we were already learning multi-threaded programming. It's a bit different, but not really more complicated if you know a bit about programming.

Sure, a coder who has only done PHP or BASIC will struggle a bit, but it's not insurmountable.

The real problem is that not all kinds of code can be multi-threaded.

And as for assembly touch-ups, they often concern only a small part of a program, such as big loops and/or heavy repetitive calculations, not everything.
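
To make the multi-threading point concrete, here is a minimal Python sketch of the easy case: a big repetitive calculation split into independent chunks, one per worker. The function names and the default of 8 workers are illustrative assumptions. Note also that in CPython, threads show the structure of the approach rather than a real speed-up for pure computation, because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """Work on one chunk; it needs nothing from the other chunks."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=8):
    """Split the data into at most `workers` chunks and combine the partial sums."""
    chunk_size = max(1, (len(data) + workers - 1) // workers)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


print(parallel_sum_of_squares(list(range(10))))
```

This only works because each chunk is independent; code where every step depends on the previous one is exactly the kind that is "not multi-threadable" without redesigning the algorithm.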

EPIC

10/02/2005, 19h13

Not easy with typical code, because there are often many dependencies between operations.

That aspect has to be taken into account when the software architecture is designed.
It often complicates things considerably, since you have to break the logical thread of the conceptual algorithm (the one with the dependencies) in favor of a kind of thematic grouping (several similar, consecutive, independent operations).

Hence the appeal of VLIW processors and of the Itanium's EPIC architecture...

ludoschmitt

17/02/2005, 11h46

In follow-up Cell news, Apple wants to enter the arena.

http://www.silicon.fr/getarticle.asp?ID=8570

A technology, described as multi processing, that prompted the American press to give the processor the sweet nickname 'supercomputer on a chip'. A nickname that, on the other hand, is ill-suited to Cell's positioning in the digital home market!

Reading that, one might think there will be some ambiguity about the positioning of this future processor. Would it be a chip aimed solely at consumers? Because I thought I'd heard they planned to put it in workstations?

Pascal_TTH

17/02/2005, 13h03

I've dissected the beast a bit:

234 million transistors:

- 112 (8*14) million transistors for the cache of each DSP
- 28 (1*28) million for the CPU's L2 cache
- 28 million transistors for the CPU core
- 56 (8*7) million transistors for the DSP cores
- 10 million transistors for the rest

Seriously, a DSP with 7 million transistors for the core doesn't exactly excite me...

Je me posais jutement la question et j'arrivais à décompte comparables. Bref, Cell c'est un Power PC et 8 petits "DSP". Pour le coup des Gflops, ça me fait trop rire : l'unité la plus pitoyable jamais inventée. Ca représente de la puissance FPU sans aucune standardisation... Bref, facile de faire un gros chiffre avec un DSP ! Curieux de savoir quel est la part du Power PC dans le score ! :lol:
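
As a quick check of the breakdown quoted in this post, here is a small Python sketch that sums the per-block figures (using the products given in parentheses) and prints each block's share. The dictionary labels are mine, not official block names.

```python
# Figures in millions of transistors, taken from the breakdown quoted above.
budget = {
    "DSP caches (8 x 14)": 8 * 14,
    "CPU L2 cache": 28,
    "CPU core": 28,
    "DSP cores (8 x 7)": 8 * 7,
    "everything else": 10,
}

total = sum(budget.values())

for name, millions in budget.items():
    share = 100 * millions / total
    print(f"{name:22s} {millions:4d} M  ({share:.0f}%)")
print(f"{'total':22s} {total:4d} M")
```

With these figures the blocks add up to the 234 million announced, and roughly half of the budget goes to the eight local memories.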

Yasko

17/02/2005, 15h16

In short, Cell is a PowerPC and 8 small "DSPs". As for the Gflops, that makes me laugh: the most pitiful unit ever invented. It represents FPU power with no standardization whatsoever... In short, it's easy to produce a big number with a DSP! I'm curious to know what share of the score comes from the PowerPC! :lol:
The 8 compute units aren't really DSPs, which are fairly rigid in what they can process. The Cell's "SPEs" seem as versatile in their computations as the ALU/INT and FPU units of an x86 CPU.
As for the PowerPC, it's the boss, so, well, it doesn't do much and hands the work to the others; that's its job... :whistle:

Doc TB

17/02/2005, 16h44

The Cell's "SPEs" seem as versatile in their computations as the ALU/INT and FPU units of an x86 CPU.

Where did you get that?

Yasko

17/02/2005, 17h03

From Blachford's article and from Ars Technica:
http://arstechnica.com/images/cell/figure3.gif.
But it's true that "seem as versatile in their computations as the ALU/INT and FPU units of an x86 CPU" may be pushing it a bit, especially given what we currently know about these SPEs (that is, not much...)

Doc TB

17/02/2005, 17h45

Well no, actually, you're not following here:

http://arstechnica.com/images/cell/figure7.png

At the top there is a VALU (vector ALU) for SIMD, an ALU for everything else, and an LSU (Load/Store Unit) for accesses to the outside: that's the CPU.

The 8 units below it have only a vector unit, an LSU and a cache. That's exactly what is called a DSP. These units can only do vector work; they are in no way complete CPUs. You can compare it to one P4 core and 8 GeForce 1 cores on the same die.

Yasko

17/02/2005, 18h22

Yes, I had read the rest, but I don't know, it's odd.
On the diagram you posted, the SPEs' processing units come down to a single VALU, whereas on the diagram from the previous day's article there are also 2 FPUs and 2 ALUs in each SPE's processing units.

Er... OK, I get it: that one is the diagram of the PPC, not of an SPE... :whistle:

But hey, it's the title's fault! ("Part I: the SIMD processing units"). :D

Dandu

17/02/2005, 19h11

What can you actually do with loads of SIMD units?

I can see video processing, effects on textures, and computing lots and lots of polygons for a game (the PlayStation's big weak point), but apart from that?

An AI or a physics engine isn't very predictable or easy to parallelize, it seems to me.

And for 3D processing, there are the NVIDIA/ATI GPUs, which do that very well.

Yasko

17/02/2005, 20h01

Well, you don't have to use the VALUs in vector mode. Whoever can do more can do less.
For integers, OK, but what surprises me then is that there are only 2 FPUs in the whole Cell, those of the PPC. Given the GFLOPS figure announced, that seems rather surprising...

Pascal_TTH

17/02/2005, 22h28

It can't do AI or a physics engine; those are overly conditional parts that need a branch prediction unit. Tridam gives vertex shader processing as an example. On top of that, the SPEs seem to have very limited precision... Frankly, it looks fishier and fishier. I wonder what the point of these SPEs is if they stick a VPU/GPU in the console. Not to mention that Cell is not 2x slower in double precision but, according to IBM, 10x slower.

Cell

4 GHz * 8 (nombre de SPE) * 2 (odd & even pipe) * 4 (4-way simd) = 256 GFlops.

None of the announced Gflops come from the PPC... Besides, Gflops or any other number, it's six of one and half a dozen of the other. You can make these Gflops say anything.

GeForce 6800

For graphics cards, for example a 6800U:
64 (64-way mimd) * 2 (2 alu) * 0.4 GHz = 51.2 GFlops

Now, if we count the modifiers and the mad as 2 instructions:

64 * 5 (instruction1 + modifier1 + mad + modifier2) * 0.4 GHz = 128 GFlops

2 6800Us in SLI = 256 GFlops
quotes from Tridam...

Nvidia rates the NV40 at 1 Tflops, while specifying that this is the sum of all units, some of which are not programmable.
quote from Xellos

This Cell looks more and more like a huge joke than like a processor... let's not even talk about a supercomputer, given the precision. :lol: The processor that runs fast but computes "roughly" :D
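
The peak figures quoted in this post all follow the same pattern: clock x number of units x operations per unit per cycle. A small Python sketch, with a helper name of my own choosing (`peak_gflops`), reproduces them from the factors given above. These are theoretical peaks, not measured performance.

```python
def peak_gflops(clock_ghz, units, ops_per_unit_per_cycle):
    """Theoretical peak: every unit issues its maximum every single cycle."""
    return clock_ghz * units * ops_per_unit_per_cycle


# Cell: 4 GHz, 8 SPEs, 2 pipes (odd & even), 4-way SIMD
cell = peak_gflops(4.0, 8, 2 * 4)

# GeForce 6800U: 0.4 GHz, 64-way, 2 ALUs
gf6800_basic = peak_gflops(0.4, 64, 2)

# Same card, counting instruction1 + modifier1 + mad (as 2) + modifier2
gf6800_generous = peak_gflops(0.4, 64, 5)

# Two cards in SLI
gf6800_sli = 2 * gf6800_generous

for label, value in [
    ("Cell", cell),
    ("6800U, 2 ops", gf6800_basic),
    ("6800U, 5 ops", gf6800_generous),
    ("2 x 6800U SLI", gf6800_sli),
]:
    print(f"{label:14s} {value:6.1f} GFlops")
```

Changing what counts as one "operation" moves the 6800U from 51.2 to 128 GFlops without touching the hardware, which is the post's point about how elastic the unit is.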

jihef

17/02/2005, 22h33

Source confirming the 10x performance loss in double precision:

http://www.realworldtech.com/page.cfm?ArticleID=RWT021005084318

See "floating point capabilities".

lechenejb

18/02/2005, 07h51

Well, in any case, they managed to make quite a racket with the announcement :lol:

Childerik

18/02/2005, 09h00

Well, in any case, they managed to make quite a racket with the announcement :lol:

Some people are going to be disillusioned when they learn it's very far from the expected x86 killer :lol:.

Neo_13

18/02/2005, 13h19

The hot-air blower of the year

paulez

20/02/2005, 02h49

Above all, it's designed for embedded and industrial computing, for example decoding HDTV in a living-room set-top box; its architecture seems to me better suited to that than to competing with Intel in desktop PCs...

Lissyx

20/02/2005, 03h07

paulez

20/02/2005, 13h14

I think so, yes. And wasn't the plan to release several versions with more or fewer DSPs depending on what a manufacturer wanted to do with the Cell?

Yasko

20/02/2005, 13h51

I believe it's the number of Cells that varies.
The PS3 would have 4.

Yasko

17/03/2005, 11h59

Understanding the Cell Microprocessor @ Anandtech (http://anandtech.com/cpuchipsets/showdoc.aspx?i=2379)

Yasko

19/03/2005, 00h28

Understanding the Cell Microprocessor @ Anandtech (http://anandtech.com/cpuchipsets/showdoc.aspx?i=2379)

The larger you make the instruction window, the more parallelism that can be extracted simply because the CPU is looking at a wider set of instructions from which to select independent ones. At the same time, the larger you make the instruction window, the lower your clock speed can be. (page 7)

Do you know why there is this dependency between the size of the instruction window and the maximum frequency?

The consequence that comes to mind, if you increase the number of instructions in this window, is that you potentially increase the number of instructions already executed (out of order) that stay stuck in the instruction window because an earlier instruction has not yet completed (since they must be retired in order).
But why the link with frequency?

JulioDup

19/03/2005, 12h01

I think it's just a matter of complexity: the larger the instruction window, the more inter-instruction dependencies there are to track; that takes more transistors, so the frequency drops.
Anyway, it's now clear that to make an x86 killer you'd need a PPE somewhat beefier than that (which AnandTech says too), because an in-order CPU with compilers that aren't supposed to do very aggressive reordering...
Speaking of which, I skimmed the AnandTech article and it doesn't really talk about the PPE's SMT. Might it be there just to compensate for the in-order approach? On a cache miss the core would normally have to wait for the data to arrive from memory. They may have killer XDR memory, but with a 4 GHz CPU that's a lot of latency cycles, so I was thinking their SMT could simply serve to switch threads during a cache miss, a bit like Sun's Niagara (another in-order CPU, by the way ;-)).
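
To put rough numbers on the idea above, here is a small Python sketch of an idealized model: each thread computes for a while, then stalls on a cache miss, and the core switches to another ready thread at no cost. The 100 ns memory latency and the 100 cycles of work between misses are assumptions chosen for illustration, not figures from IBM or from the article.

```python
def stall_cycles(latency_ns, clock_ghz):
    """Core cycles that elapse while one memory access is outstanding."""
    return latency_ns * clock_ghz


def utilization(compute_cycles, miss_cycles, threads):
    """Busy fraction of an in-order core that switches thread on every miss.

    Idealized: zero switch cost, every thread alternates between
    `compute_cycles` of work and one miss of `miss_cycles`.
    """
    busy = threads * compute_cycles
    return min(1.0, busy / (compute_cycles + miss_cycles))


if __name__ == "__main__":
    miss = stall_cycles(latency_ns=100, clock_ghz=4.0)  # assumed latency
    print(f"one miss costs about {miss:.0f} cycles at 4 GHz")
    for n in (1, 2):
        u = utilization(compute_cycles=100, miss_cycles=miss, threads=n)
        print(f"{n} thread(s): core busy {u:.0%} of the time")
```

Under these assumptions a second hardware thread doubles the busy fraction but still leaves the core idle most of the time, so thread switching would soften the in-order penalty rather than remove it.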

Oxygen3

23/03/2005, 05h53

You can compare it to one P4 core and 8 GeForce 1 cores on the same die.

For me the Cell has always been that :)

For comparison, an R420 die has around 180M transistors, i.e. roughly 16x11M.
Given that this includes cache, a complex memory controller and quite a few other things, you end up with something comparable to the 7M of the Cell's auxiliary DSPs.

krumtrash

23/03/2005, 10h23

What impresses me most isn't the speculation about the Cell! It's that IBM is producing the CPU for all the upcoming consoles!

With approaches that are, on the face of it, different:

PS3: PPC + lots of DSPs
XB2: PPC x3
Big N: ??? (single PPC)

Now that's real class :jap:

And in your opinion, will the XB2's CPU be more better than the Cell? :D

Doc TB

23/03/2005, 13h20

That's all just wanking. I remember the PS2, which was supposed to be 1000x more powerful than a PC, Emotion Engine, blah blah blah, wanking. In the end, when it arrived, meh.

Romuald

23/03/2005, 13h27

1000x more? Are you sure?

Because it was well above what existed at the time, sure, but I can't see them announcing 1000x more.

The_ED

23/03/2005, 13h37

1000x more? Are you sure?

Because it was well above what existed at the time, sure, but I can't see them announcing 1000x more.

I can confirm. Sony did the same thing again with the PS3: 2-3 years ago they announced that the PS3 would be 1000x more powerful than the PS2.

Romuald

23/03/2005, 13h43

They're not going at it with a limp hand, are they. :D

Yasko

23/03/2005, 14h09

They're not going at it with a limp hand, are they. :D

For wanking, no, that's for the best... :D

Dandu

23/03/2005, 14h45

The_ED

23/03/2005, 16h18

06/09/2002 12:12:00
The PlayStation 3 should indeed arrive in 2005. Sony confirms its intention to make a console 1000 times more powerful than the PlayStation 2. Small development teams could bear the brunt.

http://www.overgame.com/page/19560.htm

EDIT: the original source is a Financial Times article

PeGGaaSuSS

23/03/2005, 18h47

I don't think that's a problem: they can find something insignificant that they make 1000x more powerful, and that's enough to justify the "1000x more powerful" claim.
It in no way means the console will be 1000x better!

Alexko

24/03/2005, 20h15

johnnyholzeisen

24/03/2005, 20h36

Childerik

24/03/2005, 23h46

Is the retail price "something insignificant"? :whistle:

When I see the price of consoles (Xbox, PS2, GC) now, I tell myself you have to be very passionate (and fairly rich) to buy them before release, with pre-order fees.

If only it were just consoles.

At the risk of sounding old-fashioned, you can still have a blast on a GameCube now that its price is finally reasonable.

It's the same on PC: there are plenty of games released between 1999 and 2003 that are very playable on a GF4, or at a pinch on an R9700 Pro, without putting up with 20-25 fps as soon as you want maximum display quality. Those games, like those cards, are now affordable (second-hand).

What's wrong is that a graphics card can't last more than a year and a half before its production is discontinued. That's nice for small budgets :sarcastic:. The cards meant to replace them are expensive once again.

GutsBlack

30/06/2005, 12h39

There's something I can't understand. In the PS2 there was the EE, which of course had 2 vector units, VU0 and VU1; nevertheless it also had a MIPS III as an all-purpose "processor".

In the PS3 there is the nVidia GPU and the "Cell", which isn't really a processor but rather a big math/physics compute unit driven by a pseudo-PPC.

But can the Cell really be considered a processor? I rather have the impression that the "Cell" is closer to a "PPU" (as in AGEIA's PhysX).

What do you think?

Dandu

30/06/2005, 12h44

jihef

30/06/2005, 12h50

What I find surprising is that Linux has to be adapted (and not just a little, it seems) to run on a Cell. Yet Linux has run on PowerPC for a very long time, so it should work without trouble if the Cell is a PowerPC :heink:

After all, it seems to me that when a new x86 CPU comes out, the kernel doesn't need to be adapted; it's compatible, isn't it?

The CPU may well be PPC-compatible, but the architecture (the machine, not the CPU) must differ from a Mac's, so there is bound to be some work to do. There is also the management of the SPEs, although I think Linux is well enough designed that it shouldn't cause too many problems. For x86, sure, but there the architecture is that of the PC, so it just works.

GutsBlack

30/06/2005, 12h52

It has similar technology, but it's still quite different. Besides, look at x86: there are kernels for i386/686/k2/k7.

So all the differences that the "Cell" brings have been included in the kernel in order to support it.

An interesting question would be: how long did it take them to get it running on it?

Minuteman

30/06/2005, 12h56

It has similar technology, but it's still quite different. Besides, look at x86: there are kernels for i386/686/k2/k7.

That's not quite comparable; those are "just" optimizations. A Linux compiled for i386 will run on any x86 processor from the 386 onward without adaptation.

For the Cell, though, I don't know what they had to do.

GutsBlack

30/06/2005, 13h07

Apparently, given that IBM has submitted the patches and that they seem to be going into 2.6.13 (remember that we're at 2.6.12), it looks fairly quick. Then again, maybe they started writing the patches 4 years ago :)

Doc TB

30/06/2005, 13h13

What I find surprising is that Linux has to be adapted (and not just a little, it seems) to run on a Cell. Yet Linux has run on PowerPC for a very long time, so it should work without trouble if the Cell is a PowerPC :heink:

After all, it seems to me that when a new x86 CPU comes out, the kernel doesn't need to be adapted; it's compatible, isn't it?

Yes, well, the adaptation is super simple, since it's done in a single minor kernel revision. Cell = PowerPC. As for the famous DSPs, though, tough luck :D

GutsBlack

30/06/2005, 13h19

Well, in any case I think the "Cell" port wasn't done to run an OS on it, but rather to make it do specific things with a Linux kernel as the base.

fefe

30/06/2005, 15h18

Anand's 2 articles, including the leaks from the Sony and Microsoft guys, have been pulled from the Web; here is a mirror, copy-pasted :)

Slashdot: “Anandtech follows up their initial in-depth coverage of the Xbox 360 and PS3 CPU with the real truth about the next-gen consoles' Poor CPU Performance. From the article: "Speaking under conditions of anonymity with real world game developers who have had first hand experience writing code for both the Xbox 360 and PlayStation 3 hardware (and dev kits where applicable), we asked them for nothing more than their brutal honesty. What did they think of these new consoles? Are they really outfitted with the PC-eclipsing performance we've been led to believe they have? The answer is actually quite frequently found in history; as with anything, you get what you pay for."”

Article 2 -----------------------------------------------------------
In our last article we had a fairly open-ended discussion about many of the challenges facing both of the recently announced next-generation game consoles. We discussed misconceptions about the Cell processor and its ability to accelerate physics calculations, as well as touched on the GPUs of both platforms. In the end, both the Xbox 360 and the PlayStation 3 are much closer competitors than you would think based on first impressions.

The Xbox 360’s Xenon CPU features more general purpose cores than the PlayStation 3 (3 vs. 1), however game developers will most likely only be using one of those cores for the majority of their calculations, leveling the playing field considerably.

The Cell processor derives much of its power from its array of 7 SPEs (Synergistic Processing Elements), however as we discovered in our last article, their purpose is far more specialized than we had thought. Speaking with Epic Games’ head developer, Tim Sweeney, he provided a much more balanced view of what sorts of tasks could take advantage of the Cell’s SPE array.

The GPUs of the next-generation platforms also proved to be quite interesting. In Part I we speculated as to the true nature of NVIDIA’s RSX in the PS3, concluding that it’s quite likely little more than a higher clocked G70 GPU. We will expand on that discussion a bit more in this article. We also looked at Xenos, the Xbox 360’s GPU and characterized it as equivalent to a very flexible 24-pipe R420. Despite the inclusion of the 10MB of embedded DRAM, Xenos and RSX ended up being quite similar in our expectations for performance; and that pretty much summarized all of our findings - the two consoles, although implementing very different architectures, ended up being so very similar.

So we’ve concluded that the two platforms will probably end up performing very similarly, but there was one very important element excluded from the first article: a comparison to present-day PC architectures. The reason a comparison to PC architectures is important is because it provides an evaluation point to gauge the expected performance of these next-generation consoles. We’ve heard countless times that these new consoles would offer better gaming performance than anything we’ve had on the PC, or anything we would have for a matter of years. Now it’s time to actually put those claims to the test, and that’s exactly what we did.

Speaking under conditions of anonymity with real world game developers who have had first hand experience writing code for both the Xbox 360 and PlayStation 3 hardware (and dev kits where applicable), we asked them for nothing more than their brutal honesty. What did they think of these new consoles? Are they really outfitted with the PC-eclipsing performance we’ve been led to believe they have? The answer is actually quite frequently found in history; as with anything, you get what you pay for.

Learning from Generation X

The original Xbox console marked a very important step in the evolution of gaming consoles - it was the first console that was little more than a Windows PC.

The original Xbox was basically a PC

It featured a 733MHz Pentium III processor with a 128KB L2 cache, paired up with a modified version of NVIDIA's nForce chipset (modified to support Intel's Pentium III bus instead of the Athlon XP it was designed for). The nForce chipset featured an integrated GPU, codenamed the NV2A, offering performance very similar to that of a GeForce3. The system had a 5X PC DVD drive and an 8GB IDE hard drive, and all of the controllers interfaced to the console using USB cables with a proprietary connector.

For the most part, game developers were quite pleased with the original Xbox. It offered them a much more powerful CPU, GPU and overall platform than anything had before. But as time went on, there were definitely limitations that developers ran into with the first Xbox.

One of the biggest limitations ended up being the meager 64MB of memory that the system shipped with. Developers had asked for 128MB and the motherboard even had positions silk screened for an additional 64MB, but in an attempt to control costs the final console only shipped with 64MB of memory.

Developers wanted more memory, but the first Xbox only shipped with 64MB

The next problem is that the NV2A GPU ended up not having the fill rate and memory bandwidth necessary to drive high resolutions, which kept the Xbox from being used as a HD console.

Although Intel outfitted the original Xbox with a Pentium III/Celeron hybrid in order to improve performance yet maintain its low cost, at 733MHz that quickly became a performance bottleneck for more complex games after the console's introduction.

The combination of GPU and CPU limitations made 30 fps a frame rate target for many games, while simpler titles were able to run at 60 fps. Split screen play on Halo would even stutter below 30 fps depending on what was happening on screen, and that was just a first-generation title. More experience with the Xbox brought creative solutions to the limitations of the console, but clearly most game developers had a wish list of things they would have liked to have seen in the Xbox successor. Similar complaints were levied against the PlayStation 2, but in some cases they were more extreme (e.g. its 4MB frame buffer).

Given that consoles are generally evolutionary, taking lessons learned in previous generations and delivering what the game developers want in order to create the next-generation of titles, it isn't a surprise to see that a number of these problems are fixed in the Xbox 360 and PlayStation 3.

One of the most important changes with the new consoles is that system memory has been bumped from 64MB on the original Xbox to a whopping 512MB on both the Xbox 360 and the PlayStation 3. For the Xbox, that's a factor of 8 increase, and over 12x the total memory present on the PlayStation 2.

The other important improvement with the next-generation of consoles is that the GPUs have been improved tremendously. With 6 - 12 month product cycles, it's no surprise that in the past 4 years GPUs have become much more powerful. By far the biggest upgrade these new consoles will offer, from a graphics standpoint, is the ability to support HD resolutions.

There are obviously other, less-performance oriented improvements such as wireless controllers and more ubiquitous multi-channel sound support. And with Sony's PlayStation 3, disc capacity goes up thanks to their embracing the Blu-ray standard.

The Xbox 360: two parts evolution, one part mistake?

But then we come to the issue of the CPUs in these next-generation consoles, and the level of improvement they offer. Both the Xbox 360 and the PlayStation 3 offer multi-core CPUs to supposedly usher in a new era of improved game physics and reality. Unfortunately, as we have found out, the desire to bring multi-core CPUs to these consoles was made a reality at the expense of performance in a very big way.

-------------------------------------------------------------

Problems with the Architecture

At the heart of both the Xenon and Cell processors is IBM’s custom PowerPC based core. We’ve discussed this core in our previous articles, but it is best characterized as being quite simple. The core itself is a very narrow 2-issue in-order execution core, featuring a 64KB L1 cache (32K instruction/32K data) and either a 1MB or 512KB L2 cache (for Xenon or Cell, respectively). Supporting SMT, the core can execute two threads simultaneously similar to a Hyper Threading enabled Pentium 4. The Xenon CPU is made up of three of these cores, while Cell features just one.

Each individual core is extremely small, making the 3-core Xenon CPU in the Xbox 360 smaller than a single core 90nm Pentium 4. While we don’t have exact die sizes, we’ve heard that the number is around 1/2 the size of the 90nm Prescott die.

Cell's PPE is identical to a single core in Xenon. The die area of the Cell processor is 221 mm^2, note how little space is occupied by the PPE - it is a very simple core.

IBM’s pitch to Microsoft was based on the peak theoretical floating point performance-per-dollar that the Xenon CPU would offer, and given Microsoft’s focus on cost savings with the Xbox 360, they took the bait.

While Microsoft and Sony have been childishly playing this flops-war, comparing the 1 TFLOPs processing power of the Xenon CPU to the 2 TFLOPs processing power of the Cell, the real-world performance war has already been lost.

Right now, from what we’ve heard, the real-world performance of the Xenon CPU is about twice that of the 733MHz processor in the first Xbox. Considering that this CPU is supposed to power the Xbox 360 for the next 4 - 5 years, it’s nothing short of disappointing. To put it in perspective, floating point multiplies are apparently 1/3 as fast on Xenon as on a Pentium 4.

The reason for the poor performance? The very narrow 2-issue in-order core also happens to be very deeply pipelined, apparently with a branch predictor that’s not the best in the business. In the end, you get what you pay for, and with such a small core, it’s no surprise that performance isn’t anywhere near the Athlon 64 or Pentium 4 class.

The Cell processor doesn’t get off the hook just because it only uses a single one of these horribly slow cores; the SPE array ends up being fairly useless in the majority of situations, making it little more than a waste of die space.

We mentioned before that collision detection is able to be accelerated on the SPEs of Cell, despite being fairly branch heavy. The lack of a branch predictor in the SPEs apparently isn’t that big of a deal, since most collision detection branches are basically random and can’t be predicted even with the best branch predictor. So not having a branch predictor doesn’t hurt, what does hurt however is the very small amount of local memory available to each SPE. In order to access main memory, the SPE places a DMA request on the bus (or the PPE can initiate the DMA request) and waits for it to be fulfilled. From those that have had experience with the PS3 development kits, this access takes far too long to be used in many real world scenarios. It is the small amount of local memory that each SPE has access to that limits the SPEs from being able to work on more than a handful of tasks. While physics acceleration is an important one, there are many more tasks that can’t be accelerated by the SPEs because of the memory limitation.

The other point that has been made is that even if you can offload some of the physics calculations to the SPE array, the Cell’s PPE ends up being a pretty big bottleneck thanks to its overall lackluster performance. It’s akin to having an extremely fast GPU but without a fast CPU to pair it up with.

-------------------------------------------------

What About Multithreading?

We of course asked the obvious question: would game developers rather have 3 slow general purpose cores, or one of those cores paired with an array of specialized SPEs? The response was unanimous, everyone we have spoken to would rather take the general purpose core approach.

Citing everything from ease of programming to the limitations of the SPEs we mentioned previously, the Xbox 360 appears to be the more developer-friendly of the two platforms according to the cross-platform developers we've spoken to. Despite being more developer-friendly, the Xenon CPU is still not what developers wanted.

The most ironic bit of it all is that according to developers, if either manufacturer had decided to use an Athlon 64 or a Pentium D in their next-gen console, they would be significantly ahead of the competition in terms of CPU performance.

While the developers we've spoken to agree that heavily multithreaded game engines are the future, that future won't really take form for another 3 - 5 years. Even Microsoft admitted to us that all developers are focusing on having, at most, one or two threads of execution for the game engine itself - not the four or six threads that the Xbox 360 was designed for.

Even when games become more aggressive with their multithreading, targeting 2 - 4 threads, most of the work will still be done in a single thread. It won't be until the next step in multithreaded architectures where that single thread gets broken down even further, and by that time we'll be talking about Xbox 720 and PlayStation 4. In the end, the more multithreaded nature of these new console CPUs doesn't help paint much of a brighter performance picture - multithreaded or not, game developers are not pleased with the performance of these CPUs.

What about all those Flops?

The one statement that we heard over and over again was that Microsoft was sold on the peak theoretical performance of the Xenon CPU. Ever since the announcement of the Xbox 360 and PS3 hardware, people have been set on comparing Microsoft's figure of 1 trillion floating point operations per second to Sony's figure of 2 trillion floating point operations per second (TFLOPs). Any AnandTech reader should know for a fact that these numbers are meaningless, but just in case you need some reasoning for why, let's look at the facts.

First and foremost, a floating point operation can be anything; it can be adding two floating point numbers together, or it can be performing a dot product on two floating point numbers, it can even be just calculating the complement of a fp number. Anything that is executed on a FPU is fair game to be called a floating point operation.

Secondly, both floating point power numbers refer to the whole system, CPU and GPU. Obviously a GPU's floating point processing power doesn't mean anything if you're trying to run general purpose code on it and vice versa. As we've seen from the graphics market, characterizing GPU performance in terms of generic floating point operations per second is far from the full performance story.

Third, when a manufacturer is talking about peak floating point performance there are a few things that they aren't taking into account. Being able to process billions of operations per second depends on actually being able to have that many floating point operations to work on. That means that you have to have enough bandwidth to keep the FPUs fed, no mispredicted branches, no cache misses and the right structure of code to make sure that all of the FPUs can be fed at all times so they can execute at their peak rates. We already know that's not the case as game developers have already told us that the Xenon CPU isn't even in the same realm of performance as the Pentium 4 or Athlon 64. Not to mention that the requirements for hitting peak theoretical performance are always ridiculous; caches are only so big and thus there will come a time where a request to main memory is needed, and you can expect that request to be fulfilled in a few hundred clock cycles, where no floating point operations will be happening at all.

So while there may be some extreme cases where the Xenon CPU can hit its peak performance, it sure isn't happening in any real world code.

The Cell processor is no different; given that its PPE is identical to one of the PowerPC cores in Xenon, it must derive its floating point performance superiority from its array of SPEs. So what's the issue with the 218 GFLOPs number (2 TFLOPs for the whole system)? Well, from what we've heard, game developers are finding that they can't use the SPEs for a lot of tasks. So in the end, it doesn't matter what the peak theoretical performance of Cell's SPE array is, if those SPEs aren't being used all the time.

Don't stare directly at the flops, you may start believing that they matter.

Another way to look at this comparison of flops is to look at integer add latencies on the Pentium 4 vs. the Athlon 64. The Pentium 4 has two double pumped ALUs, each capable of performing two add operations per clock, that's a total of 4 add operations per clock; so we could say that a 3.8GHz Pentium 4 can perform 15.2 billion operations per second. The Athlon 64 has three ALUs each capable of executing an add every clock; so a 2.8GHz Athlon 64 can perform 8.4 billion operations per second. By this silly console marketing logic, the Pentium 4 would be almost twice as fast as the Athlon 64, and a multi-core Pentium 4 would be faster than a multi-core Athlon 64. Any AnandTech reader should know that's hardly the case. No code is composed entirely of add instructions, and even if it were, eventually the Pentium 4 and Athlon 64 will have to go out to main memory for data, and when they do, the Athlon 64 has a much lower latency access to memory than the P4. In the end, despite what these horribly concocted numbers may lead you to believe, they say absolutely nothing about performance. The exact same situation exists with the CPUs of the next-generation consoles; don't fall for it.
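
The arithmetic in the paragraph above can be reproduced with a few lines of Python. The helper name `peak_adds_per_second` is mine; the clocks and ALU counts are the ones quoted in the article, and the result is only the theoretical ceiling the article is warning against.

```python
def peak_adds_per_second(clock_ghz, alus, adds_per_alu_per_clock):
    """Billions of integer adds per second if every ALU were busy every cycle."""
    return clock_ghz * alus * adds_per_alu_per_clock


# Pentium 4: two double-pumped ALUs, i.e. 2 adds per ALU per clock
p4 = peak_adds_per_second(3.8, alus=2, adds_per_alu_per_clock=2)

# Athlon 64: three ALUs, 1 add per ALU per clock
a64 = peak_adds_per_second(2.8, alus=3, adds_per_alu_per_clock=1)

print(f"Pentium 4 @ 3.8 GHz: {p4:.1f} billion adds/s")
print(f"Athlon 64 @ 2.8 GHz: {a64:.1f} billion adds/s")
print(f"'marketing' ratio:   {p4 / a64:.2f}x")
```

The ratio comes out at about 1.8x in the Pentium 4's favor, which is the "almost twice as fast" conclusion the article calls silly, since nothing in the calculation accounts for memory latency or the instruction mix.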

------------------------------------------------------

Why did Sony/MS do it?

For Sony, it doesn't take much to see that the Cell processor is eerily similar to the Emotion Engine in the PlayStation 2, at least conceptually. Sony clearly has an idea of what direction they would like to go in, and it doesn't happen to be one that's aligned with much of the rest of the industry. Sony's past successes have really come, not because of the hardware, but because of the developers and their PSX/PS2 exclusive titles. A single hot title can ship millions of consoles, and by our count, Sony has had many more of those than Microsoft had with the first Xbox.

Sony shipped around 4 times as many PlayStation 2 consoles as Microsoft did Xboxes. Regardless of the hardware platform, a game developer won't turn down working with the PS2 - the install base is just that attractive. So for Sony, the Cell processor may be strange and even undesirable for game developers, but the developers will come regardless.

The real surprise was Microsoft; with the first Xbox, Microsoft listened very closely to the wants and desires of game developers. This time around, despite what has been said publicly, the Xbox 360's CPU architecture wasn't what game developers had asked for.

They wanted a multi-core CPU, but not such a significant step back in single threaded performance. When AMD and Intel moved to multi-core designs, they did so at the expense of a few hundred MHz in clock speed, not by taking a step back in architecture.

We suspect that a big part of Microsoft's decision to go with the Xenon core was because of its extremely small size. A smaller die means lower system costs, and if Microsoft indeed launches the Xbox 360 at $299 the Xenon CPU will be a big reason why that was made possible.

Another contributing factor may be the fact that Microsoft wanted to own the IP of the silicon that went into the Xbox 360. We seriously doubt that either AMD or Intel would be willing to grant them the right to make Pentium 4 or Athlon 64 CPUs, so it may have been that IBM was the only partner willing to work with Microsoft's terms and only with this one specific core.

Regardless of the reasoning, not a single developer we've spoken to thinks that it was the right decision.

---------------------------------------------------------

The Saving Grace: The GPUs

Although both manufacturers royally screwed up their CPUs, all developers have agreed that they are quite pleased with the GPU power of the next-generation consoles.

First, let's talk about NVIDIA's RSX in the PlayStation 3. We discussed the possibility of RSX offloading vertex processing onto the Cell processor, but more and more it seems that isn't the case. It looks like the RSX will basically be a 90nm G70 with Turbo Cache running at 550MHz, and the performance will be quite good.

One option we didn't discuss in the last article was that the G70 GPU may already feature a number of disabled shader pipes to improve yield. The move to 90nm may allow for those pipes to be enabled, thus allowing for another scenario where the RSX offers higher performance at the same transistor count as the present-day G70. Sony may be hesitant to reveal the actual number of pixel and vertex pipes in the RSX because, honestly, they won't know until a few months before mass production what their final yields will be.

Despite strong performance and support for 1080p, a large number of developers are targeting 720p for their PS3 titles and won't support 1080p. Those that are simply porting current-generation games over will have no problems running at 1080p, but anyone working on a truly next-generation title won't have the fill rate necessary to render at 1080p.
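To see why fill rate becomes the limiter, it helps to count pixels. The sketch below assumes 720p means 1280 x 720 and 1080p means 1920 x 1080, and it ignores overdraw, AA and shader length entirely; it only shows how many more pixels have to be filled per frame.

```python
def pixels_per_frame(width, height):
    """Number of pixels that must be written for one full frame."""
    return width * height


p720 = pixels_per_frame(1280, 720)
p1080 = pixels_per_frame(1920, 1080)

print(p720)          # 921600
print(p1080)         # 2073600
print(p1080 / p720)  # 2.25
```

All else being equal, a 1080p frame needs 2.25 times the pixel work of a 720p frame, so a title that is already fill rate bound at 720p has no headroom left for the higher resolution.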

Another interesting point is that although the RSX lacks the "free 4X AA" of the Xbox 360, in some cases it won't matter. Titles that use longer pixel shader programs end up being bound by pixel shader performance rather than memory bandwidth, so the performance difference between no AA and 2X/4X AA may end up being quite small. Not all titles will push the RSX to the limits however, and those titles will definitely see a performance drop with AA enabled. In the end, whether the RSX's lack of embedded DRAM matters will be entirely dependent on the game engine being developed for the platform. Games that make more extensive use of long pixel shaders will see less of an impact with AA enabled than those that are more texture bound. Game developers are all over the map on this one, so it wouldn't be fair to characterize all of the games as falling into one category or another.

ATI's Xenos GPU is also looking pretty good and most are expecting performance to be very similar to the RSX, but real world support for this won't be ready for another couple of months. Developers have just recently received more final Xbox 360 hardware, and gauging performance of the actual Xenos GPU compared to the R420 based solutions in the G5 development kits will take some time. Since the original dev kits offered significantly lower performance, developers will need a bit of time to figure out what realistic limits the Xenos GPU will have.

--------------------------------------------------------

Final Words

Just because these CPUs and GPUs are in a console doesn't mean that we should throw away years of knowledge from the PC industry - performance doesn't come out of thin air, and peak performance is almost never achieved. Clever marketing however, will always try to fool the consumer.

And that's what we have here today, with the Xbox 360 and PlayStation 3. Both consoles are marketed to be much more powerful than they actually are, and from talking to numerous game developers it seems that the real world performance of these platforms isn't anywhere near what it was supposed to be.

It looks like significant advancements in game physics won't happen on consoles for another 4 or 5 years, although it may happen with PC games much before that.

It's not all bad news however; the good news is that both GPUs are quite possibly the most promising part of the new consoles. With the performance that we have seen from NVIDIA's G70, we have very high expectations for the 360 and PS3. The ability to finally run at HD resolutions in all games will bring a much needed element to console gaming.

And let's not forget all of the other improvements to these next-generation game consoles. The CPUs, despite being relatively lackluster, will still be faster than their predecessors and increased system memory will give developers more breathing room. Then there are other improvements such as wireless controllers, better online play and updated game engines that will contribute to an overall better gaming experience.

In the end, performance could be better; the consoles aren't what they could have been had the powers that be made some different decisions. While they will bring better quality games to market and will be better than their predecessors, it doesn't look like they will be the end of PC gaming any more than the Xbox and PS2 were when they were launched. The two markets will continue to coexist, with consoles being much easier to deal with, and PCs offering some performance-derived advantages.

With much more powerful CPUs and, in the near future, more powerful GPUs, the PC paired with the right developers should be able to bring about that revolution in game physics and graphics we've been hoping for. Consoles will help accelerate the transition to multithreaded gaming, but it looks like it will take PC developers to bring about real change in things like game physics, AI and other non-visual elements of gaming.

Article 1 -----------------------------------------------------------

The initial coverage (pulled too)

The point of a gaming console is to play games. The PC user in all of us wants to benchmark, overclock and upgrade even the unreleased game consoles that were announced at E3, but we can't. And these sorts of limits are healthy, because they let us have a system that we don't tinker with, one that simply performs its function, and that is to play games.

The game developers are the ones that have to worry about which system is faster, whose hardware is better and what that means for the games they develop, but to us, the end users, whether the Xbox 360 has a faster GPU or the PlayStation 3's CPU is the best thing since sliced bread doesn't really matter. At the end of the day, it is the games and the overall experience that will sell both of these consoles. You can have the best hardware in the world, but if the games and the experience aren't there, it doesn't really matter.

Despite what we've just said, there is a desire to pick these new next-generation consoles apart. Of course if the games are all that matter, why even bother comparing specs, claims or anything about these next-generation consoles other than games? Unfortunately, the majority of that analysis seems to be done by the manufacturers of the consoles, and fed to the users in an attempt to win early support, and quite a bit of it is obviously tainted.

While we would've liked this to be an article on all three next-generation consoles, the Xbox 360, PlayStation 3 and Revolution, the fact of the matter is that Nintendo has not released any hardware details about their next-gen console, meaning that there's nothing to talk about at this point in time. That leaves us with two contenders: Microsoft's Xbox 360, due out by the end of this year, and Sony's PlayStation 3, due out in Spring 2006.

This article isn't here to crown a winner or to even begin to claim which platform will have better games; it is simply here to answer questions we all have had, as well as to discuss these new platforms in greater detail than we have before.

Before proceeding with this article, there's a bit of required reading to really get the most out of it. We strongly suggest reading through our Cell processor article [anandtech.com], as well as our launch coverage of the PlayStation 3 [anandtech.com]. We would also suggest reading through our Xbox [anandtech.com] 360 [anandtech.com] articles [anandtech.com] for background on Microsoft's console, as well as an earlier piece published on multi-threaded game development [anandtech.com]. Finally, be sure that you're fully up to date on the latest GPUs [anandtech.com], especially the recently announced NVIDIA GeForce 7800 GTX [anandtech.com] as it is very closely related to the graphics processor in the PS3.

This article isn't a successor to any of the aforementioned pieces; it just really helps to have an understanding of everything we've covered before - and since we don't want this article to be longer than it already is, we'll just point you back there to fill in the blanks if you find that there are any.

Now, on to the show...

A Prelude on Balance

The most important goal of any platform is balance on all levels. We've seen numerous examples of what architectural imbalances can do to performance: having too little cache or too narrow an FSB can starve high speed CPUs of the data they need to perform. GPUs without enough memory bandwidth can't perform anywhere near their peak fillrates, regardless of what they may be. Achieving a balanced overall platform is a very difficult thing on the PC, unless you have an unlimited budget and are able to purchase the fastest components. Skimping on your CPU while buying the most expensive graphics card may leave you with performance that's marginally better, or worse, than someone else with a more balanced system with a faster CPU and a somewhat slower GPU.

With consoles however, the entire platform is designed to be balanced out of the box, as best as the manufacturer can get it to be, while still remaining within the realm of affordability. The manufacturer is responsible for choosing bus widths, CPU architectures, memory bandwidths, GPUs, even down to the type of media that will be used by the system - and most importantly, they make sure that all elements of the system are as balanced as can be.

The reason this article starts with a prelude on balance is that you should not expect either console maker to have put together a horribly imbalanced machine. A company that is already losing money on every console sold will never put faster hardware in that console if it isn't going to be utilized thanks to an imbalance in the platform. So you won't see an overly powerful CPU paired with a fill-rate limited GPU, and you definitely won't see a lack of bandwidth to inhibit performance. What you will see is a collection of tools that Microsoft and Sony have each, independently, put together for the game developer. Each console has its strengths and its weaknesses, but as a whole, each console is individually very well balanced. So it would be wrong to say that the PlayStation 3's GPU is more powerful than the Xbox 360's GPU, because you can't isolate the two and compare them in a vacuum; how they interact with the CPU, with memory, etc. all influences the overall performance of the platform.

The Consoles and their CPUs

The CPUs at the heart of these two consoles are very different in architectural approach, despite sharing some common parts. The Xbox 360's CPU, codenamed Xenon, takes a general purpose approach to microprocessor design and implements three general purpose PowerPC cores, meaning they can execute any type of code and will do it relatively well.

The PlayStation 3's CPU, the Cell processor, pairs a general purpose PowerPC Processing Element (PPE, very similar to one core from Xenon) with 7 working Synergistic Processing Elements (SPEs) that are more specialized hardware designed to execute certain types of code.

So the comparison between Xenon and Cell really boils down to a comparison between a general purpose microprocessor, and a hybrid of general purpose and specialized hardware.

Despite what many have said, there is support for Sony's approach with Cell. We have discussed, in great detail, the architecture of the Cell processor already but there is industry support for a general purpose + specialized hardware CPU design. Take note of the following slide from Intel's Platform 2015 vision for their CPUs by the year 2015:

The use of one or two large general purpose cores combined with specialized hardware and multiple other smaller cores is in Intel's roadmap for the future, despite their harsh criticism of the Cell processor. The difference is that Cell appears to be far too early for its time. By 2015 CPUs may be manufactured on as small as a 32nm process, significantly smaller than today's 90nm process, meaning that a lot more hardware can be packed into the same amount of space. In going with a very forward-looking design, the Cell processor architects inevitably had to make sacrifices to deal with the fact that the chip they wanted to design is years ahead of its time for use in general computation.

Introducing the Xbox 360's Xenon CPU

The Xenon processor was designed from the ground up to be a 3-core CPU, so unlike Cell, there are no disabled cores on the Xenon chip itself in order to improve yield. The reason for choosing 3 cores is because it provides a good balance between thread execution power and die size. According to Microsoft's partners, the sweet spot for this generation of consoles will be between 4 and 6 execution threads, which is where the 3-core CPU came from.

The chip is built on a 90nm process, much like Cell, and will run at 3.2GHz - also like Cell. All of the cores are identical to one another, and they are very similar to the PPE used in the Cell microprocessor, with a few modifications.

The focus of Microsoft's additions to the core has been in the expansion of the VMX instruction set. In particular, Microsoft now includes a single cycle dot-product instruction as a part of the VMX-128 ISA that is implemented on each core. Microsoft has stated that there is nothing stopping IBM from incorporating this support into other chips, but as of yet we have not seen anyone from the Cell camp claim support for single cycle dot-products on the PPE.

The three cores share a meager 1MB L2 cache, which should be fine for single threaded games but as developers migrate more to multi-threaded engines, this small cache will definitely become a performance limiter. With each core being able to execute two threads simultaneously, you effectively have a worst case scenario of 6 threads splitting a 1MB L2 cache. As a comparison, the current dual core Pentium 4s have a 1MB L2 cache per core and that number is only expected to rise in the future.
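A quick back-of-the-envelope calculation shows how thin that cache gets when it is shared evenly. The sketch assumes 1MB means 1024KB and that the cache is split equally between cores or threads, which real workloads will not do exactly; the variable names are our own.

```python
XENON_L2_KB = 1024  # 1MB shared L2, per the figure above
XENON_CORES = 3
THREADS_PER_CORE = 2

per_core_kb = XENON_L2_KB / XENON_CORES
per_thread_kb = XENON_L2_KB / (XENON_CORES * THREADS_PER_CORE)

print(round(per_core_kb, 1))    # KB of L2 per core if shared evenly
print(round(per_thread_kb, 1))  # KB of L2 per thread in the 6-thread worst case
```

That works out to roughly 341KB per core and 171KB per thread, against the full 1024KB per core on the dual core Pentium 4 mentioned above.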

The most important selling point of the Xbox 360's Xenon core is the fact that all three cores are identical, and they are all general purpose microprocessors. The developer does not have to worry about multi-threading beyond the point of getting their code to be thread safe; once it is multi-threaded, it can easily be run on any of the cores. The other important thing to keep in mind here is that porting between multi-core PC platforms and the Xbox 360 will be fairly trivial. Anywhere any inline assembly is used there will obviously have to be changes, but with relatively minor code changes and some time optimizing, code portability between the PC and the Xbox 360 shouldn't be very difficult at all. For what it is worth, porting game code between the PC and the Xbox 360 will be a lot like Mac developers porting code between Mac OS X for Intel platforms and PowerPC platforms: there's an architecture switch, but the programming model doesn't change much.

The same cannot however be said for Cell and the PlayStation 3. The easiest way to port code from the Xbox 360 to the PS3 would be to run the code exclusively on the Cell's single PPE, which obviously wouldn't offer very good performance for heavily multi-threaded titles. But with some effort, the PlayStation 3 does have a lot of potential.

Xenon vs. Cell

The first public game demo on the PlayStation 3 was Epic Games' Unreal Engine 3 at Sony's PS3 press conference. Tim Sweeney, the founder of Epic and father of UE3, performed the demo and helped shed some light on how multi-threading can work on the PlayStation 3.

According to Tim, a lot of things aren't appropriate for SPE acceleration in UE3, mainly high-level game logic, artificial intelligence and scripting. But he adds that "Fortunately these comprise a small percentage of total CPU time on a traditional single-threaded architecture, so dedicating the CPU to those tasks is appropriate, while the SPE's and GPU do their thing."

So what does Tim Sweeney see the SPEs being used for in UE3? "With UE3, our focus on SPE acceleration is on physics, animation updates, particle systems, sound; a few other areas are possible but require more experimentation."

Tim's view on the PPE/SPE split in Cell is far more balanced than most we've encountered. There are many who see the SPEs as utterly useless for executing anything (we'll get to why in a moment), while there are others who have been talking about doing far too much on SPEs where the general purpose PPE would do much better.

For the most part, the areas that UE3 uses the Cell's SPEs for are fairly believable. For example, sound processing makes a lot of sense for the SPEs given their rather specialized architecture aimed at streaming tasks. But the one curious item is the focus on using SPEs to accelerate physics calculations, especially given how branch heavy physics calculations generally are.

Collision detection is a big part of what is commonly referred to as "game physics." As the name implies, collision detection simply refers to the game engine determining when two objects collide. Without collision detection, bullets would never hit your opponents and your character would be able to walk through walls, cars, etc... among other things.

One method of implementing collision detection in a game is through the use of a Binary Space Partitioning (BSP) tree. BSP trees are created by organizing lists of polygons into a binary tree. The structure of the tree itself doesn't matter to this discussion, but the important thing to keep in mind is that to traverse a BSP tree in order to test for a collision between some object and a polygon in the tree you have to perform a lot of comparisons. You first traverse the tree to find the polygon you want to test for a collision against. Then you have to perform a number of checks to see whether a collision has occurred between the object you're comparing and the polygon itself. This process involves a lot of conditional branching, code which likes to be run on a high performance OoO core with a very good branch predictor.
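To make the branching concrete, here is a minimal sketch of a BSP descent. The node layout (a splitting plane plus front and back children, with polygons stored at the leaves) and every name in it are our own illustrative assumptions, not code from any shipping engine; real trees are far deeper and are followed by per-polygon intersection tests.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BspNode:
    normal: Vec3 = (0.0, 0.0, 0.0)   # splitting plane normal (internal nodes)
    offset: float = 0.0              # plane equation: dot(normal, p) - offset = 0
    front: Optional["BspNode"] = None
    back: Optional["BspNode"] = None
    polygons: List[str] = field(default_factory=list)  # leaf payload

    def is_leaf(self) -> bool:
        return self.front is None and self.back is None


def find_leaf(root: BspNode, point: Vec3):
    """Walk down the tree; every level costs one data-dependent branch."""
    node, branches = root, 0
    while not node.is_leaf():
        nx, ny, nz = node.normal
        distance = nx * point[0] + ny * point[1] + nz * point[2] - node.offset
        branches += 1
        if distance >= 0.0:      # which way we go depends on the data
            node = node.front
        else:
            node = node.back
    return node, branches


# Tiny hypothetical tree: split on x = 0, then split the front half on y = 0.
leaf_a = BspNode(polygons=["wall_a"])
leaf_b = BspNode(polygons=["floor_b"])
leaf_c = BspNode(polygons=["crate_c"])
inner = BspNode(normal=(0.0, 1.0, 0.0), front=leaf_a, back=leaf_b)
root = BspNode(normal=(1.0, 0.0, 0.0), front=inner, back=leaf_c)

leaf, branches = find_leaf(root, (1.0, -2.0, 0.0))
print(leaf.polygons, branches)
```

Each step of the loop cannot start until the previous comparison has resolved, which is exactly the pattern that benefits from a good branch predictor.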

Unfortunately, the SPEs have no branch prediction, so BSP tree traversal will tie up an SPE for quite a bit of time while not performing very well, as each branch condition has to be evaluated before execution can continue. It is possible to structure collision detection for execution on the SPEs, but it would require a different approach to the collision detection algorithms than what would normally be implemented on a PC or Xbox 360.
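We don't know how any PS3 developer actually restructures this code, so the following is only a sketch of one generic technique: store a complete tree of splitting planes in a flat array and turn each comparison into index arithmetic instead of a jump. The array layout, the fixed depth and all names are assumptions made for illustration, and Python itself still branches internally; the point is the shape of the algorithm.

```python
def descend_fixed_depth(planes, depth, point):
    """Descend a complete binary tree stored in heap order.

    planes[i] is (nx, ny, nz, offset); the children of node i live at
    2*i + 1 (front side) and 2*i + 2 (back side). Returns the leaf number
    in the range 0 .. 2**depth - 1.
    """
    index = 0
    for _ in range(depth):  # fixed trip count, known ahead of time
        nx, ny, nz, offset = planes[index]
        distance = nx * point[0] + ny * point[1] + nz * point[2] - offset
        # The comparison result (0 or 1) is used as a number, not as a jump.
        index = 2 * index + 1 + int(distance < 0.0)
    return index - (2 ** depth - 1)


# Hypothetical depth-2 tree: split on x = 0, then both halves split on y = 0.
planes = [
    (1.0, 0.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0, 0.0),
]

print(descend_fixed_depth(planes, 2, (1.0, -2.0, 0.0)))
print(descend_fixed_depth(planes, 2, (-1.0, 5.0, 0.0)))
```

The trade-off is that the tree has to be complete and of known depth, which usually means padding it or flattening the world into a different structure altogether.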

We're still working on providing examples of how it is actually done, but it's tough getting access to detailed information at this stage given that a number of NDAs are still in place involving Cell development for the PS3. Regardless of how it is done, obviously the Epic team found the SPEs to be a good match for their physics code, if structured properly, meaning that the Cell processor isn't just one general purpose core with 7 others that go unused.

In fact, if properly structured and coded for SPE acceleration, physics code could very well run faster on the PlayStation 3 than on the Xbox 360 thanks to the more specialized nature of the SPE hardware. Not to mention that physics acceleration is particularly parallelizable, making it a perfect match for an array of 7 SPEs.
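As a rough illustration of what "parallelizable" means here, the sketch below splits a list of independent physics bodies into seven nearly equal slices, one per SPE. The function name and the body count are hypothetical, and it assumes the bodies do not interact within a step, which real physics code cannot always guarantee.

```python
def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous, nearly equal slices."""
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)  # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges


SPE_COUNT = 7
slices = partition(100, SPE_COUNT)  # 100 hypothetical rigid bodies
for spe, (lo, hi) in enumerate(slices):
    # On real hardware each slice would be handed to one SPE.
    print(f"SPE {spe}: bodies {lo}..{hi - 1}")
```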

Microsoft has referred to the Cell's array of SPEs as a bunch of DSPs useless to game developers. The fact that the next installment of the Unreal engine will be using the Cell's SPEs for physics, animation updates, particle systems as well as audio processing means that Microsoft's definition is a bit off. While not all developers will follow in Epic's footsteps, those that wish to remain competitive and get good performance out of the PS3 will have to.

The bottom line is that Sony would not foolishly spend over 75% of their CPU die budget on SPEs to use them for nothing more than fancy DSPs. Architecting a game engine around Cell and optimizing for SPE acceleration will take more effort than developing for the Xbox 360 or PC, but it can be done. The question then becomes, will developers do it?

In Johan's Quest for More Processing Power series he looked at the developmental limitations of multi-threading, especially as they applied to games. The end result is that multi-threaded game development takes between 2 and 3 times longer than conventional single-threaded game development. Adding additional time in order to restructure elements of your engine to get better performance on the PS3 isn't going to make the transition any easier on developers.

Why In-Order?

Ever since the Pentium Pro, desktop PC microprocessors have implemented Out of Order (OoO) execution architectures in order to improve performance. We've explained the idea in great detail before, but the idea is that an Out-of-Order microprocessor can reorganize its instruction stream in order to best utilize its execution resources. Despite the simplicity of its explanation, implementing support for OoO dramatically increases the complexity of a microprocessor, as well as drives up power consumption.

In a perfect world, you could group a bunch of OoO cores on a single die and offer both excellent single threaded performance, as well as great multi-threaded performance. However, the world isn't so perfect, and there are limitations to how big a processor's die can be. Intel and AMD can only fit two of their OoO cores on a 90nm die, yet the Xbox 360 and PlayStation 3 targeted 3 and 9 cores, respectively, on a 90nm die; clearly something has to give, and that something happened to be the complexity of each individual core.

Given a game console's 5 year expected lifespan, the decision was made (by both MS and Sony) to favor a multi-core platform over a faster single-core CPU in order to remain competitive towards the latter half of the consoles' lifetime.

So with the Xbox 360 Microsoft used three fairly simple IBM PowerPC cores, while Sony has the much publicized Cell processor in their PlayStation 3. Both will be much slower, in absolute terms, than even mainstream desktop processors in single threaded game code, but the majority of games these days are far more GPU bound than CPU bound, so the performance decrease isn't a huge deal. In the long run, with a bit of optimization and running multi-threaded game engines, these collections of simple in-order cores should be able to put out some fairly good performance.

Does In-Order Matter?

As we discussed in our Cell article [anandtech.com], in-order execution makes a lot of sense for the SPEs. With in-order execution as well as a small amount of high speed local memory, memory access becomes quite predictable and code is very easily scheduled by the compiler for the SPEs. However, for the PPE in Cell, and the PowerPC cores in Xenon, the in-order approach doesn't necessarily make a whole lot of sense. You don't have the advantage of a cacheless architecture, even though you do have the ability to force certain items to remain untouched by the cache. More than anything having an in-order general purpose core just works to simplify the core, at the expense of depending quite a bit on the compiler, and the programmer, to optimize performance.

Very little of a modern day game is written in assembly; most of it is written in a high level language like C or C++ and the compiler does the dirty work of optimizing the code and translating it into low level assembly. Compilers are horrendously difficult to write; getting a compiler to work is a pretty difficult job in itself, but getting one to work well, regardless of what the input code is, is nearly impossible.

However, with a properly designed ISA and a good compiler, having an in-order core to work on is not the end of the world. The performance you lose by not being able to extract the last bit of instruction level parallelism is made up by the fact that you can execute far more threads per clock thanks to the simplicity of the in-order cores allowing more to be packed on a die. Unfortunately, as we've already discussed, on day one that's not going to be much of an advantage.

The Cell processor's SPEs are even more of a challenge, as they are more specialized hardware only suitable to executing certain types of code. Keeping in mind that the SPEs are not well suited to running branch heavy code, loop unrolling will do a lot to improve performance as it can significantly reduce the number of branches that must be executed. In order to squeeze the absolute maximum amount of performance out of the SPEs, developers may be forced to hand code some routines as initial performance numbers for optimized, compiled SPE code appear to be far less than their peak throughput.
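For readers who haven't seen loop unrolling, here is a minimal sketch of the idea. On an SPE this would be done by the compiler or by hand in C or assembly; Python is used here only to show the control flow, and the iteration counter stands in for the loop branch that a real core would execute once per pass.

```python
def sum_rolled(values):
    """One element per pass: one loop branch per element."""
    total, passes = 0, 0
    for i in range(len(values)):
        total += values[i]
        passes += 1
    return total, passes


def sum_unrolled4(values):
    """Four elements per pass: roughly a quarter of the loop branches."""
    total, passes = 0, 0
    n = len(values)
    main_end = n - (n % 4)
    for i in range(0, main_end, 4):
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        passes += 1
    for i in range(main_end, n):  # leftover elements when n is not a multiple of 4
        total += values[i]
        passes += 1
    return total, passes


data = list(range(16))
print(sum_rolled(data))     # (120, 16)
print(sum_unrolled4(data))  # (120, 4)
```

Both versions produce the same sum, but the unrolled one takes 4 passes instead of 16, at the cost of larger code and a cleanup loop for the leftovers.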

While the move to in-order architectures won't cause game developers too much pain with good compilers at their disposal, the move to multi-threaded game development and optimizing for the Cell in general will be much more challenging.

How Many Threads?

Earlier this year we saw the beginning of a transition from very fast, single core microprocessors to slower, multi-core designs on the PC desktop. The full transition won't be complete for another couple of years, but just as it has begun on the desktop PC side, it also has begun in the next-generation of consoles.

Remember that consoles must have a lifespan of around 5 years, so even if the multithreaded transition isn't going to happen with games for another 2 years, it is necessary for these consoles to be built around multi-core processors to support the ecosystem when that transition occurs.

The problem is that today, all games are single threaded, meaning that in the case of the Xbox 360, only one out of its three cores would be utilized when running present day game engines. The PlayStation 3 would fare no better, as the Cell CPU has a very similar general purpose execution core to one of the Xbox 360 cores. The reason this is a problem is that the general purpose cores that make up the Xbox 360's Xenon CPU, and the single general purpose PPE in Cell, are extremely weak cores, far slower than a Pentium 4 or Athlon 64, even one running at a much lower clock speed.

Looking at the Xbox 360 and PlayStation 3, we wondered if game developers would begin their transition to multithreaded engines with consoles and eventually port them to PCs. While the majority of the PC installed base today still runs on single-core processors, the install base for both the Xbox 360 and PS3 will be guaranteed to be multi-core, so what better platform to introduce a multithreaded game engine than the new consoles where you can guarantee that all of your users will be able to take advantage of the multithreading.

On the other hand, looking at all of the early demos we've seen of Xbox 360 and PS3 games, not a single one appears to offer better physics or AI than the best single threaded games on the PC today. At best, we've seen examples of ragdoll physics similar to that of Half Life 2, but nothing that is particularly amazing, earth shattering or shocking. Definitely nothing that appears to be leveraging the power of a multicore processor.

In fact, all of the demos we've seen look like nothing more than examples of what you can do on the latest generation of GPUs - not showcases of multi-core CPU power. So we asked Microsoft, expecting to get a fluffy answer about how all developers would be exploiting the 6 hardware threads supported by Xenon; instead, we got a much more down to earth answer.

The majority of developers are doing things no differently than they have been on the PC. A single thread is used for all game code, physics and AI and in some cases, developers have split out physics into a separate thread, but for the most part you can expect all first generation and even some second generation titles to debut as basically single threaded games. The move to two hardware execution threads may in fact only be an attempt to bring performance up to par with what can be done on mid-range or high-end PCs today, since a single thread running on Xenon isn't going to be very competitive performance wise, especially executing code that is particularly well suited to OoO desktop processors.
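The "physics split out into a separate thread" arrangement can be sketched in a few lines. Everything here is a simplified assumption: the physics step is a bare position update, the names are ours, and CPython threads do not actually run CPU-bound code in parallel, so this shows only the structure of the hand-off, not a speedup.

```python
import queue
import threading


def physics_worker(requests, results):
    """Second thread: integrates positions whenever the game thread asks."""
    while True:
        job = requests.get()
        if job is None:  # sentinel: shut down
            break
        positions, velocities, dt = job
        results.put([p + v * dt for p, v in zip(positions, velocities)])


def run_frames(positions, velocities, dt, frames):
    requests, results = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=physics_worker, args=(requests, results))
    worker.start()
    for _ in range(frames):
        requests.put((positions, velocities, dt))
        # The main thread would run game logic and AI here while physics works.
        positions = results.get()  # sync point before rendering the frame
    requests.put(None)
    worker.join()
    return positions


print(run_frames([0.0, 10.0], [1.0, -2.0], 0.5, 4))
```

The once-per-frame synchronization point is where the difficulty lives: anything the game logic needs from physics mid-frame has to wait for it.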

With Microsoft themselves telling us not to expect more than one or two threads of execution to be dedicated to game code, will the remaining two cores of the Xenon go unused for the first year or two of the Xbox 360's existence? While the remaining cores won't directly be used for game performance acceleration, they won't remain idle - enter the Xbox 360's helper threads.

The first time we discussed helper threads on AnandTech was in reference to additional threads, generated at runtime, that could use idle execution resources to go out and prefetch data that the CPU would eventually need.

The Xbox 360 will use a few different types of helper threads to not only make the most out of the CPU's performance, but to also help balance the overall platform. Keep in mind that with the 360, Microsoft has not increased the size of the media that games will be stored on. The dual layer DVD-9 spec is still in effect, meaning that game developers shipping titles for the Xbox 360 in 2006 will have the same amount of storage space as they did back in 2001. Given that current Xbox titles generally use around 4.5GB of space, it's not a big deal, but by 2010 9GB may feel a bit tight.

Thanks to idle execution power in the 3-core Xenon, developers can now perform real-time decompression of game data in order to maximize storage space. Given that a big hunk of disc space is used by audio and video, being able to use more sophisticated compression algorithms for both types of data will also help maximize that 9GB of storage. Or, if space isn't as much of a concern, developers are now able to use more sophisticated encoding algorithms to encode audio/video to use the same amount of space as they are today, but achieve much higher quality audio and video. Microsoft has already stated that in game video will essentially use the WMV HD codec. The real time decompression of audio/video will be another use for the extra power of the system.
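As a rough illustration of the helper-thread idea described above, here is a minimal Python sketch in which a second thread decompresses game data while the main thread stays free for other work. The function names, the asset name and the use of zlib are all assumptions made for the example; none of this reflects the actual Xbox 360 SDK or the codecs Microsoft uses.

```python
import queue
import threading
import zlib


def decompress_worker(jobs, results):
    """Helper thread: pull compressed blobs off the queue and inflate them."""
    while True:
        item = jobs.get()
        if item is None:  # sentinel: no more work
            break
        name, blob = item
        results[name] = zlib.decompress(blob)


def load_assets(compressed_assets):
    """Hypothetical loader: hands decompression to a helper thread."""
    jobs = queue.Queue()
    results = {}
    helper = threading.Thread(target=decompress_worker, args=(jobs, results))
    helper.start()
    for name, blob in compressed_assets.items():
        jobs.put((name, blob))
    jobs.put(None)
    # The main (game) thread could keep running game code here.
    helper.join()
    return results


# Made-up asset standing in for data read from the DVD.
assets = {"level1": zlib.compress(b"terrain" * 1000)}
loaded = load_assets(assets)
print(len(assets["level1"]), "bytes on disc ->", len(loaded["level1"]), "bytes in memory")
```

The point is only the structure: the compressed form is what occupies disc space, and the cost of expanding it is paid by a thread that would otherwise sit idle.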

Another interesting use will be digital audio encoding; in the original Xbox Microsoft used a relatively expensive DSP featured in the nForce south bridge to perform real-time Dolby Digital Encoding. The feature allowed Microsoft to offer a single optical out on the Xbox's HD AV pack, definitely reducing cable clutter and bringing 5.1 channel surround sound to the game console. This time around, DD encoding can be done as a separate thread on the Xenon CPU - in real time. It reduces the need for Microsoft to purchase a specialized DSP from another company, and greatly simplifies the South Bridge in the Xbox 360.

But for the most part, on day 1, you shouldn't expect Xbox 360 games to be much more than the same type of single threaded titles we've had on the PC. In fact, the biggest draw to the new consoles will be the fact that for the first time, we will have the ability to run games rendered internally at 1280 x 720 on a game console. In other words, round one of the next generation of game consoles is going to be a GPU battle.

The importance of this fact is that Microsoft has been talking about the general purpose execution power of the Xbox 360 and how it is 3 times that of the PS3's Cell processor. With only 1 - 2 threads of execution being dedicated for game code, the advantage is pretty much lost at the start of the console battle.

Sony doesn't have the same constraints that Microsoft does, and thus there is less of a need to perform real time decompression of game content. Keep in mind that the PS3 will ship with a Blu-ray drive, with Sony's minimum disc spec being a hefty 23.3GB of storage for a single layer Blu-ray disc. The PS3 will also make use of H.264 encoding for all video content, the decoding of which is perfectly suited for the Cell's SPEs. Audio encoding will also be done on the SPEs, once again as there is little need to use any extra hardware to perform a task that is perfectly suited for the SPEs.

The Xbox 360 GPU: ATI's Xenos

On a purely hardware level, ATI's Xbox 360 GPU (codenamed Xenos) is quite interesting. The part itself is made up of two physically distinct silicon ICs. One IC is the GPU itself, which houses all the shader hardware and most of the processing power. The second IC (which ATI refers to as the "daughter die") is a 10MB block of embedded DRAM (eDRAM) combined with the hardware necessary for z and stencil operations, color and alpha processing, and anti aliasing. This daughter die is connected to the GPU proper via a 32GB/sec interconnect. Data sent over this bus will be compressed, so usable bandwidth will be higher than 32GB/sec. Inside the daughter die, between the processing hardware and the eDRAM itself, bandwidth is 256GB/sec.

At this point in time, much of the bandwidth generated by graphics hardware is required to handle color and z data moving to the framebuffer. ATI hopes to eliminate this as a bottleneck by moving this processing and the back framebuffer off the main memory bus. Main memory is 512MB of GDDR3 on a 128-bit bus running at 700MHz (which results in just over 22GB/sec of bandwidth). This is less bandwidth than current desktop graphics cards have available, but by offloading work and bandwidth for color and z to the daughter die, ATI saves themselves a good deal of bandwidth. The 22GB/sec is left for textures and the rest of the system (the Xbox implements a single pool of unified memory).
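The "just over 22GB/sec" figure can be checked with a few lines of arithmetic. The sketch below assumes that GDDR3 transfers data twice per clock (double data rate) and that 1GB means 10^9 bytes; the function name is made up for the example.

```python
def peak_bandwidth_gb_s(bus_width_bits, clock_mhz, transfers_per_clock=2):
    """Theoretical peak bandwidth of a memory bus, in GB/sec (10^9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


# 128-bit bus, 700MHz, assumed double data rate
main_memory = peak_bandwidth_gb_s(128, 700)
print(f"{main_memory:.1f} GB/sec")
```

Under those assumptions, 16 bytes per transfer times 1.4 billion transfers per second gives 22.4GB/sec, which agrees with the article's number.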

The GPU essentially acts as the Northbridge for the system, and sits in the middle of everything. From the graphics hardware, there is 10.8GB/sec of bandwidth up and down to the CPU itself. The rest of the system is hooked in with 500MB/sec of bandwidth up and down. The high bandwidth to the CPU is quite useful as the GPU is able to directly read from the L2 cache. In the console world, the CPU and GPU are quite tightly linked and the Xbox 360 stands to continue that tradition.

Weighing in at 332M transistors, the Xbox 360 GPU is quite a powerful part, but its architecture differs from that of current desktop graphics hardware. For years, vertex and pixel shader hardware have been implemented separately, but ATI has sought to combine their functionality in a unified shader architecture.

What's A Unified Shader Architecture?

The GPU in the Xbox 360 uses a different architecture than we are used to seeing. To be sure, vertex and pixel shader programs will run on the part, but not on separate segments of the hardware. Vertex and pixel processing differ in purpose, but there is quite a bit of overlap in the type of hardware needed to do both. The unified shader architecture that ATI chose to use in their Xbox 360 GPU runs both vertex and pixel shader programs on the same hardware, which allows them to pack more functionality onto fewer transistors, as less hardware needs to be duplicated for use in different parts of the chip.

There are 3 parallel groups of 16 shader units each. Each of the three groups can operate on either vertex or pixel data. Each shader unit is able to perform one 4 wide vector operation and 1 scalar operation per clock cycle. Current ATI hardware is able to perform two 3 wide vector and two scalar operations per cycle in the pixel pipe alone. The vertex pipeline of R420 is 6 wide and can do one vector 4 and one scalar op per cycle. If we look at straight up processing power, this gives R420 the ability to crunch 158 components per clock cycle (30 of which are 32bit and 128 of which are limited to 24bit precision). The Xbox GPU is able to crunch 240 32bit components in its shader units per clock cycle. While this is a 51% increase in the number of ops that can be done per cycle (as well as a general increase in precision), we can't expect these 48 pipelines to act like 3 sets of R420 pipelines. All things being equal, this increase (when only looking at ops/cycle) would be only as powerful as a 24 piped R420.
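The component counts in the paragraph above can be reproduced with a short calculation. One assumption is needed that the text does not spell out: R420 is taken to have 16 pixel pipelines, which is what makes the 128-component pixel figure come out. The variable names are only illustrative.

```python
# R420 (assumed 16 pixel pipes): two 3-wide vector ops + two scalar ops per pipe
r420_pixel = 16 * (2 * 3 + 2 * 1)   # 24-bit precision components
# R420 vertex engine: 6 units, one 4-wide vector op + one scalar op each
r420_vertex = 6 * (4 + 1)           # 32-bit precision components
r420_total = r420_pixel + r420_vertex

# Xenos: 3 groups of 16 unified units, one 4-wide vector op + one scalar op each
xenos_total = 3 * 16 * (4 + 1)

increase = xenos_total / r420_total - 1

print(r420_pixel, r420_vertex, r420_total, xenos_total)
print(f"increase: {increase:.1%}")
```

This gives 128 + 30 = 158 components per clock for R420 against 240 for Xenos, an increase of a little under 52%, in line with the roughly 51% quoted.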

What will make or break the difference between something like a 24 piped R420 and the unified shaders of the Xbox GPU is how well applications will lend themselves to the adaptive nature of the hardware. Current configurations don't have nearly the same vertex processing power as they do pixel processing power. This is quite logical when we consider the fact that games have many more pixels displayed than vertices. For each geometry primitive, there are likely a good number of pixels involved. Of course, not all titles will need the same ratio of geometry to pixel power. This means that all the ops per clock could be dedicated to geometry processing in truly polygon intense scenes. On the flip side (and more likely), any given clock cycle could see all 240 ops being used for pixel processing. If game designers realize this and code their shaders accordingly, we could see much more focused processing power dedicated to a single type of problem than on current hardware.

ATI is predicting that developers will use lots of very small triangles in Xbox 360 games. As engines like Epic's Unreal Engine 3 have shown incredible results using pixel shaders and normal maps to augment low geometric detail, we can't tell if ATI is trying to provide the chicken or the egg. In other words, will we see many small triangles on Xbox 360 because console developers are moving in that direction or because that is what will run well on ATI's hardware?

Regardless of the paths that lead to this road, it is obvious that the Xbox 360 will be a geometry power house. Not only are all 3 blocks of 16 shaders able to become vertex shaders, but ATI's GPU will be able to handle twice as many z operations if a z only pass is performed. The same is true of current ATI and NVIDIA hardware, but the fact that a geometry only pass can now make use of shader hardware to perform 48 vector and 48 scalar operations in any given clock cycle while doing twice the z operations is quite intriguing. This could allow some very geometrically complicated scenes.

Inside the Xenos GPU

As previously mentioned, the 48 shaders will be able to run either vertex or pixel shader programs in any given clock cycle. To clarify, each block of 16 shader units is able to run a shader program thread. These shader units will function at a feature level slightly higher than DX9.0c, but in order to take advantage of the technology, ATI and Microsoft will have to customize the API.

In order to get data into the shader units, textures are read from main memory. The eDRAM of the system is unable to assist with texturing. There are 16 bilinear filtered texture samplers. These units are able to read up to 16 textures per clock cycle. The scheduler will need to take great care to organize threads so that optimal use is made of the texture units. Another consideration to take into account is anisotropic filtering. In order to perform filtering at beyond bilinear levels, the texture will need to be run through the texture unit more than once (until the filtering is finished). If no filtering is required (i.e. if a shader program is simply reading stored data), the vertex fetch units can be used (either with a vertex or a pixel shader program).

In the PC space, we are seeing shifts to more and more complex pixel shaders. Larger and larger textures are being used in order to supply data, and some predict that texture processing will eclipse color and z bandwidth in the not so distant future. We will have to see if the console and desktop space continue to diverge in this area.

One of the key aspects of performance for the Xbox 360 will be in how well ATI manages threads on their GPU. With the shift to the unified shader architecture, it is even more imperative to make sure that everything is running at maximum efficiency. We don't have many details on ATI's ability to context switch between vertex and pixel shader programs on hardware, but suffice it to say that ATI cannot afford to have any difficulties in managing threads on any level. As making good use of current pixel shader technology requires swapping out threads on shaders, we expect that this will go fairly well in this department. Thread management is likely one of the most difficult things ATI had to work out to make this hardware feasible.

Those who paid close attention to the amount of eDRAM (10MB) will note that this is not enough memory to store the entire framebuffer for displays larger than standard television with 4xAA enabled. Apparently, ATI will store the front buffer in the UMA area, while the back buffer resides on the eDRAM. In order to manage large displays, the hardware will need to render the back buffer in parts. This indicates that they have implemented some sort of very large grained tiling system (with 2 to 4 tiles). Usually tile based renderers have many more tiles than this, but this is a special case.
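To see why 10MB runs out so quickly, here is a back-of-the-envelope sketch. It assumes 4 bytes of color plus 4 bytes of z/stencil per sample and treats 10MB as 10 x 1024 x 1024 bytes; neither figure comes from the article, and the real hardware may store data differently (compression, other formats).

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # assumed: 10MB counted as 10 MiB


def back_buffer_bytes(width, height, aa_samples, color_bytes=4, depth_bytes=4):
    """Size of a multisampled back buffer under the assumptions above."""
    return width * height * aa_samples * (color_bytes + depth_bytes)


def tiles_needed(width, height, aa_samples):
    """Number of eDRAM-sized pieces the back buffer must be split into."""
    return math.ceil(back_buffer_bytes(width, height, aa_samples) / EDRAM_BYTES)


for w, h, aa in [(640, 480, 4), (1280, 720, 2), (1280, 720, 4)]:
    size_mb = back_buffer_bytes(w, h, aa) / (1024 * 1024)
    print(f"{w}x{h} {aa}xAA: {size_mb:.1f} MB -> {tiles_needed(w, h, aa)} tile(s)")
```

Under these assumptions a 640 x 480 buffer with 4xAA just fits, while 1280 x 720 needs 2 tiles at 2xAA and 3 tiles at 4xAA, which is consistent with the "2 to 4 tiles" estimate.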

Performance of this hardware is a very difficult aspect to assess without testing the system. The potential is there for some nice gains over the current high end desktop part, but it is very difficult to know how easily software engineers will be able to functionally use the hardware before they fully understand it and have programmed for it for a while. Certainly, the learning curve won't be as steep as something like the PlayStation 2 was (DirectX is still the API), but knowing what works and what doesn't will take some time.

ATI's Modeling Engine

The adaptability of their hardware is something ATI is touting as well. Their Modeling Engine is really a name for a usage model ATI provides using their unified shaders. As each shader unit is more general purpose than current vertex and pixel shaders, ATI has built the hardware to easily allow the execution of general floating point math.

ATI's Modeling Engine concept is made practical through their vertex cache implementation. Data for general purpose floating point computations moves into the vertex cache in high volumes for processing. The implication here is that the vertex cache has enough storage space and bandwidth to accommodate all 48 shader units without starvation for an extended period of use. If the vertex cache were to be used solely for vertex data, it could be much less forgiving and still offer the same performance (considering common vertex processing loads in current and near term games). As we stated previously, pixel processing (for now) is going to be more resource intensive than vertex processing. Making it possible to fill up the shader units with data from the vertex cache (as opposed to the output of vertex shaders), and the capability of the hardware to dump shader output to main memory is what makes ATI's Modeling Engine possible.

But just pasting a name on general purpose floating point math execution doesn't make it useful. Programmers will have to take advantage of it, and ATI has offered a few ideas on different applications for which the Modeling Engine is suited. Global illumination is an intriguing suggestion, as is tone mapping. ATI also indicates that higher order surfaces could be operated on before tessellation, giving programmers the ability to more fluidly manipulate complex objects. It has even been suggested that physics processing could be done on this part. Of course, we can expect that Xbox 360 programmers will not implement physics engines on the Modeling Engine, but it could be interesting in future parts from ATI.

PlayStation 3's GPU: The NVIDIA RSX

We've mentioned countless times that the PlayStation 3 has the more PC-like GPU out of the two consoles we're talking about here today, and after this week's announcement, you now understand why.

The PlayStation 3's RSX GPU shares the same "parent architecture" as the G70 (GeForce 7800 GTX), much in the same way that the GeForce 6600GT shares the same parent architecture as the GeForce 6800 Ultra. Sony isn't ready to unveil exactly what is different between the RSX and the G70, but based on what's been introduced already, as well as our conversations with NVIDIA, we can gather a few items.

Despite the fact that the RSX comes from the same lineage as the G70, there are a number of changes to the core. The biggest change is that RSX supports rendering to both local and system memory, similar to NVIDIA's Turbo Cache enabled GPUs. Obviously rendering to/from local memory is going to be a lot lower latency than sending a request to the Cell's memory controller, so much of the architecture of the GP

eponge

30/06/2005, 23h15

with the photos:

http://www.nofrag.com/2005/jun/30/18062/

GutsBlack

25/08/2005, 21h21

This is a rather silly request, but since x86-secret's articles are always so thorough and precise, could someone dig some information out of the following docs: http://cell.scei.co.jp/index_e.html and give us a short summary of what they think?

thanks :)

Yasko

30/09/2005, 11h36

http://www.presence-pc.com/tests/Le-processeur-Cell-366/

Not a bad article.

Franxinator

30/09/2005, 14h25

Yeah, very interesting.

Dean Calver, a programmer on a PS3 game called Heavenly Sword, said he recommends treating the PPE, from a software point of view, as two separate processors running at half the clock frequency. Finally, IBM itself stated that thread execution is interleaved, without giving further details.

Maybe not quite the "deadly killer" it was announced to be, after all. Well, let's wait and see!

newbie06

08/06/2008, 15h09

Right then, I'm pinching this from Foudge, and thanks to braoru for the archaeological dig.

IBM PowerXCell 8i, the successor to the Cell BE: http://www.onversity.net/cgi-bin/progactu/actu_aff.cgi?Eudo=bgteob&P=00001079

Could a mod pin this topic? Thanks :)

Neo_13

09/06/2008, 10h45

Uh... If the point is to find out that as a general-purpose CPU it gets thrashed by a 6-year-old A64, and that as a DSP it gets flattened by a G92, I have a feeling the excitement won't last long. :durmaisjuste:

And in the end, everyone gets put in their place by Larrabee 3. :troll:

newbie06

09/06/2008, 11h01

/me pulls out his troll axe
Yeah, sure, the Cell is complete rubbish: http://www.nytimes.com/2008/06/09/technology/09petaflops.html?_r=1&adxnnl=1&oref=slogin&adxnnlx=1213002005-WD1GD2ht2z0n/Z+7NOztMA

Neo_13

09/06/2008, 11h40

And?

Because if that's the proof, it does barely 2x better than Blue Gene/L, which is based on PPC440s that have absolutely nothing under the hood (basically, any PPC G4 wipes the floor with them).

And what about the gap between peak and mean?

Besides, I didn't say it was complete rubbish; I said it isn't of much interest because it has been overtaken in performance, power consumption and scalability. In short, it's beaten at EVERYTHING.

Now, comparing 65536 PPC440s with 12xxx Cells and concluding that the Cell is 10x better... OK, the Cell is better than the PPC440.

newbie06

09/06/2008, 12h36

You rev up fast, don't you? Did you buy an Xbox 360 or something? :p

I was just answering your trolling with trolling.

Now, here's what I can say:
1. comparing the SPUs to DSPs, why not, but then to an advanced DSP
2. comparing GPU units to advanced DSPs, now that's a no; can they be programmed in C, C++, Fortran?
3. do you have links comparing the performance of a Cell with a G92 that aren't micro-benchmarks?
4. Blue Gene/L's power comes not from the PPC440s but from the FPUs attached to them (ref (http://www.research.ibm.com/journal/rd/492/gara.html)).
5. do you know many supercomputers that use G4s?
6. do you know the FLOP/W of the 65nm Cell and of the upcoming 45nm one?

Maybe the Cell is outdated, and I couldn't care less. What interests me far more is seeing where high-performance computing architectures are heading. And whether it's the Cell or the G200, I see things in common.

Neo_13

09/06/2008, 13h12

You rev up fast, don't you? Did you buy an Xbox 360 or something? :p
No, a DS...

I was just answering your trolling with trolling.
Yes, but I did it better first, and besides my username is orange, so that's twice as good, so there :p

Now, here's what I can say:
1. comparing the SPUs to DSPs, why not, but then to an advanced DSP
2. comparing GPU units to advanced DSPs, now that's a no; can they be programmed in C, C++, Fortran?
3. do you have links comparing the performance of a Cell with a G92 that aren't micro-benchmarks?
4. Blue Gene/L's power comes not from the PPC440s but from the FPUs attached to them (ref (http://www.research.ibm.com/journal/rd/492/gara.html)).
5. do you know many supercomputers that use G4s?
6. do you know the FLOP/W of the 65nm Cell and of the upcoming 45nm one?

Maybe the Cell is outdated, and I couldn't care less. What interests me far more is seeing where high-performance computing architectures are heading. And whether it's the Cell or the G200, I see things in common.
As for the rest: CUDA exists; I'm comparing "peak on paper" flops with "peak on paper" flops; there probably were G4 supercomputers under the IBM brand (I'd need to reread my article on supercomputers, because I'm drawing a blank); the attached FPUs are on-die, aren't they? Then they're part of the PPC440. And finally, peak-on-paper flops per idle-on-paper watt? I don't even know those, as it happens.

What's certain is that I deny that an architecture without an MMU is in any way easy to program (the SPEs don't have one, so it's up to the developer to slog through getting the data into the SPE's local cache by hand, and if the answer is to go through an API, then the GPU solution loses its drawback of requiring an API).

Moreover, it seems (if I'm going to troll, I may as well lay it on thick :p) that it isn't actually a Cell supercomputer, but an Opteron supercomputer that uses Cells as coprocessors (which, incidentally, is where the Cell belongs in my view, and that I approve of).

And if I find the time one day, I'll redo an inventory of the top500 to count the number of rubbish x86s vs the rest, to compare with what I found last time.

Dandu

09/06/2008, 14h01

There have been G5-based supercomputers, though: BigMac, which ranked second if I'm not mistaken (and which, on top of that, used "commercial" machines).

newbie06

09/06/2008, 14h14

As for the rest: CUDA exists
It's still a dedicated language. Then again, I don't know it very well, so I won't venture any further :)

the attached FPUs are on-die, aren't they? Then they're part of the PPC440.
No, the PPC440 and the FPU are on the same die but separate (you didn't look at my link, did you :p).

And finally, peak-on-paper flops per idle-on-paper watt? I don't even know those, as it happens.
http://www.green500.org for the peaks.

What's certain is that I deny that an architecture without an MMU is in any way easy to program (the SPEs don't have one, so it's up to the developer to slog through getting the data into the SPE's local cache by hand, and if the answer is to go through an API, then the GPU solution loses its drawback of requiring an API).
That's part of what I meant when I talked about a direction for (non-cluster) supercomputers: specific programming of dedicated units. Beyond that, this specific programming is more or less easy, and I'm willing to bet it's easier with an SPU than with a GPU.

Moreover, it seems (if I'm going to troll, I may as well lay it on thick :p) that it isn't actually a Cell supercomputer, but an Opteron supercomputer that uses Cells as coprocessors (which, incidentally, is where the Cell belongs in my view, and that I approve of).
Then the PPC440 is the same, so there!

And if I find the time one day, I'll redo an inventory of the top500 to count the number of rubbish x86s vs the rest, to compare with what I found last time.
http://www.top500.org/stats/list/30/procfam

Neo_13

09/06/2008, 15h06

412 x86/500

Compared with November 2004, that's a crushing result for everything that isn't "rubbish-x86-is-bad". (Roughly speaking, back then it was about a hundred fewer.)

Souped-up DSP-like designs may be the future, but off the top of my head, I see the future more on the x86 side (despite its inelegance and heaviness).

newbie06

09/06/2008, 15h15

Alexko

09/06/2008, 15h35

Neo_13

09/06/2008, 16h05

Ooooooh, such bad faith :)
You criticize the Cell's programming complexity and at the same time you point to all those x86 clusters in the top500 :p Yeah, well, OK, it's almost nothing but clusters in there

And coming back to flop/W efficiency, the top 26 are all BlueGenes, after all.

Right, back to my SPEC2k :|
Bad faith is my third name :p

As for flop/W, where will the Cell rank relative to its BG PPC440 siblings?

It's not necessarily incompatible, look at Larrabee :p
Is Larrabee a DSP? I wouldn't say so... but I've only looked at the beast from very, very far away so far...

(it's hard to troll, work and read up all at once... so I dropped the reading)

Foudge

09/06/2008, 16h19

http://www.top500.org/stats/list/30/procfam
In short, 80% x86.

The problem is with "hybrid" supercomputers. For example, GENCI's will reach 300 TFlops and will be made up of 1068 eight-core Nehalems and 48 Tesla modules (that is, 96 NVIDIA GT200s). In terms of compute power, 100 TFlops come from the CPUs and 200 from the GPUs.
And that will be counted as an x86 supercomputer?

newbie06

09/06/2008, 16h47

And that will be counted as an x86 supercomputer?
Apart from Neo_13, nobody would dare do that, right? :p

As for flop/W, where will the Cell rank relative to its BG PPC440 siblings?
I don't think anyone knows yet. But I believe the 65nm Cell is fairly efficient. Well, in any case, that green list counts the whole machine, not just the processors :)

Is Larrabee a DSP? I wouldn't say so... but I've only looked at the beast from very, very far away so far...
Nobody has seen it up close, I think, except the people working on it. But I bet the vector units will be very DSP-like, or SPU-like, or whatever you want to call it ^_^

(it's hard to troll, work and read up all at once... so I dropped the reading)
I keep doing all three, so I do them badly.

fefe

09/06/2008, 20h56

Right, back to my SPEC2k :|

You know SPEC2k6 has been out for a while now :p

Nobody has seen it up close, I think, except the people working on it. But I bet the vector units will be very DSP-like, or SPU-like, or whatever you want to call it ^_^

And I bet not :p; they'll just be wide SSE units, like what was announced for AVX.

newbie06

09/06/2008, 22h08

You know SPEC2k6 has been out for a while now :p
We should be getting it soon. But it's going to be tricky to simulate ^_^
I'm setting up an infrastructure with SimPoint and qemu; I'm having a blast, I'm twisted like that :p

And I bet not :p; they'll just be wide SSE units, like what was announced for AVX.
Yeah, I figured; I only put it that way so that Neo_13 would understand, since he calls the SPUs DSPs :ph34r:

newbie06

11/06/2008, 08h14

It seems Roadrunner achieves "376 million calculations per Watt".
Ref: http://linuxdevices.com/news/NS6440737610.html

That would put it at the top of the Green500.