So, to kick off this thread on the Atom's challenger, let's start with an interview with Centaur's president, set to good old rock music the way only the Taiwanese know how:
Odds and ends. The lineup:
- Release date: Q3 2008
- Bus: 200 MHz quad-pumped (QDR) FSB
- Process: Fujitsu 65 nm
- Die: 7.650 mm x 8.275 mm (63.3 mm²)
- Package: NanoBGA2 (21 mm x 21 mm), same as the C7
- ISA: x86, 64-bit, SSEx
- Caches: 64 KB L1I + 64 KB L1D, both 16-way; 1 MB exclusive L2, 16-way
- Dedicated 64-line cache space for prefetch (?)
- Branch prediction: eight predictors spread over two pipeline stages. The first fetch stage holds three conditional-branch predictors (each more accurate for a particular type of branch), a meta-predictor that picks which of the three to use, and a separate return predictor. The translate stage holds a second return predictor, a conditional-branch "overflow" predictor and a default predictor.
- Decode, issue, retire: decodes three full x86 instructions per clock, of any length, provided all three start within the same 16-byte address range. There is no restriction on instruction type or format: all three can be "complex". Generates three fused micro-ops per clock, then issues (speculatively and out of order) up to seven execution micro-ops per clock to seven execution ports: two integer, one load, one store-address, one store-data, one "media" and one multiply. A media micro-op can be a floating-point add, divide or square root, or a SIMD integer instruction. Retires three fused micro-ops per clock.
- Macro- and micro-fusion: specific pairs of x86 instructions are combined (fused) into a single executable micro-op (macro-fusion), and several micro-ops targeting different execution ports can be fused into a single fused micro-op (micro-fusion).
- Compute: four floating-point adds and four floating-point multiplies per clock. FP add: two clocks in any format (SP, DP, DE i.e. double extended, packed or scalar). FP multiply: three clocks in SP, four in DP and DE. The SIMD integer data path is 128 bits wide, and almost all SSEx instructions, including all shuffles, execute in a single clock.
- Power: adaptive transitions between performance/voltage states ("P-states") while the processor keeps running; automatic overclocking when the die temperature is low; the die can be held automatically at a user-specified temperature.
- Security: hardware for AES encryption, SHA-1 and SHA-256 secure hashing, random number generation, and a highly specialized "secure execution mode".
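A few of the figures above can be sanity-checked with simple arithmetic. The Python sketch below rests on assumptions the list does not state: an 8-byte (64-bit) FSB data path, 64-byte cache lines, and "four adds and four multiplies per clock" counted as eight FP operations per clock. The helper names are made up for the example.

```python
# Back-of-the-envelope checks on the spec list. Assumptions not stated in
# the post: 8-byte (64-bit) FSB data path, 64-byte cache lines, and
# "4 adds + 4 multiplies per clock" counted as 8 FP operations per clock.

def fsb_bandwidth_gb_s(bus_mhz: float, transfers_per_clock: int, width_bytes: int) -> float:
    """Peak bus bandwidth in GB/s (10**9 bytes/s)."""
    return bus_mhz * 1e6 * transfers_per_clock * width_bytes / 1e9


def die_area_mm2(width_mm: float, height_mm: float) -> float:
    """Die area from its two edge lengths."""
    return width_mm * height_mm


def cache_sets(size_bytes: int, ways: int, line_bytes: int) -> int:
    """Number of sets in a set-associative cache."""
    return size_bytes // (ways * line_bytes)


def peak_gflops(clock_ghz: float, fp_ops_per_clock: int) -> float:
    """Theoretical peak, ignoring every real-world bottleneck."""
    return clock_ghz * fp_ops_per_clock


if __name__ == "__main__":
    print(f"FSB 200 MHz QDR: {fsb_bandwidth_gb_s(200, 4, 8):.1f} GB/s")
    print(f"Die: {die_area_mm2(7.650, 8.275):.1f} mm^2")
    print(f"L1 sets (64 KB, 16-way): {cache_sets(64 * 1024, 16, 64)}")
    print(f"Peak FP @ 1.8 GHz (L2100): {peak_gflops(1.8, 4 + 4):.1f} GFLOPS")
```

Under these assumptions this gives 6.4 GB/s of peak bus bandwidth, 63.3 mm² of die (matching the figure above), 64 sets per L1 cache, and a theoretical 14.4 GFLOPS for the 1.8 GHz L2100.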
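The 16-byte condition on triple decode can be modelled as a grouping rule. This sketch assumes the 16-byte range is an aligned block (the spec above does not say) and ignores every other decode limit; `decode_groups` is a hypothetical helper, not a description of the actual hardware.

```python
def same_fetch_window(starts: list[int], window: int = 16) -> bool:
    """True if every instruction start address falls in the same
    aligned `window`-byte block (alignment is an assumption)."""
    return len({addr // window for addr in starts}) == 1


def decode_groups(starts: list[int], width: int = 3, window: int = 16) -> list[list[int]]:
    """Greedily pack up to `width` instructions per clock, closing a group
    when it is full or the next instruction starts in another window."""
    groups: list[list[int]] = []
    current: list[int] = []
    for addr in starts:
        if current and (len(current) == width or addr // window != current[0] // window):
            groups.append(current)
            current = []
        current.append(addr)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    starts = [0x1000, 0x1003, 0x100A, 0x100E, 0x1012]
    for clock, group in enumerate(decode_groups(starts)):
        print(clock, [hex(a) for a in group])
```

For instructions starting at 0x1000, 0x1003, 0x100A, 0x100E and 0x1012, the first three decode together; 0x100E waits because the group is full, and 0x1012 lands in the next 16-byte block, so the five instructions take three clocks under this model.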
Model   Clock    Idle power  TDP
L2100   1.8 GHz  500 mW      25 W
L2200   1.6 GHz  100 mW      17 W
U2300   1.3 GHz  100 mW      8 W
U2500   1.2 GHz  100 mW      6.8 W
U2400   1.0 GHz  100 mW      5 W
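To compare the models on one axis, clock per watt of TDP can be computed from the table. This is a crude metric (TDP is a thermal design ceiling, not measured consumption, and the idle figures are ignored), shown only as an illustration.

```python
# (model, clock in GHz, idle power in W, TDP in W), copied from the table above
LINEUP = [
    ("L2100", 1.8, 0.5, 25.0),
    ("L2200", 1.6, 0.1, 17.0),
    ("U2300", 1.3, 0.1, 8.0),
    ("U2500", 1.2, 0.1, 6.8),
    ("U2400", 1.0, 0.1, 5.0),
]


def mhz_per_tdp_watt(clock_ghz: float, tdp_w: float) -> float:
    """Clock speed divided by TDP: a rough efficiency indicator only."""
    return clock_ghz * 1000 / tdp_w


def ranked(lineup):
    """Models sorted from best to worst MHz per TDP watt."""
    return sorted(lineup, key=lambda row: mhz_per_tdp_watt(row[1], row[3]), reverse=True)


if __name__ == "__main__":
    for model, ghz, _idle_w, tdp_w in ranked(LINEUP):
        print(f"{model}: {mhz_per_tdp_watt(ghz, tdp_w):.1f} MHz/W")
```

By that measure the U2400 comes first at 200 MHz/W and the L2100 last at 72 MHz/W.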
Performance:
Warning: this is where things get speculative (with a high chance of prefetch errors...)
Based on VIA's figures, which put the Nano at twice the performance of the C7 at the same clock (2.4x, to be precise), and on the Doc's benchmarks comparing the C7 and the Atom, the Nano comes out at least 30% faster than an Atom at the same clock (between 1.1 and 1.4, not counting the Atom's 100 MHz clock advantage). So a Nano @ 1.2 GHz / 8 W could be roughly equivalent to an Atom @ 1.6 GHz / 2 W (TDP values, not actual power draw...)
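The reasoning above chains two per-clock ratios through the C7, then turns a per-clock advantage into an equivalent clock. A minimal sketch, assuming performance scales linearly with frequency (a strong simplification); `atom_vs_c7` stands for the Doc's measured ratio, which is not spelled out here, so no value is filled in for it.

```python
def nano_vs_atom_per_clock(nano_vs_c7: float, atom_vs_c7: float) -> float:
    """Nano/Atom per-clock ratio obtained by chaining both through the C7.

    nano_vs_c7: Nano performance relative to a C7 at the same clock
                (VIA's claim: 2.4).
    atom_vs_c7: Atom performance relative to a C7 at the same clock
                (to be taken from benchmarks; no value assumed here).
    """
    return nano_vs_c7 / atom_vs_c7


def atom_equivalent_clock(nano_ghz: float, nano_vs_atom: float) -> float:
    """Atom clock (GHz) needed to match a Nano running at nano_ghz,
    assuming performance scales linearly with frequency."""
    return nano_ghz * nano_vs_atom


def implied_per_clock_ratio(nano_ghz: float, atom_ghz: float) -> float:
    """Per-clock ratio implied by calling two parts equivalent."""
    return atom_ghz / nano_ghz


if __name__ == "__main__":
    for ratio in (1.3, 1.4):
        print(f"ratio {ratio}: Nano@1.2GHz ~ Atom@{atom_equivalent_clock(1.2, ratio):.2f}GHz")
    print(f"implied ratio for 1.2 vs 1.6 GHz: {implied_per_clock_ratio(1.2, 1.6):.2f}")
```

With a per-clock ratio of 1.3 (the "at least 30%"), a 1.2 GHz Nano maps to about 1.56 GHz of Atom; with 1.4, about 1.68 GHz. The 1.6 GHz equivalence falls inside that band, i.e. it implies a ratio of about 1.33.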