Multicore processors

Academic year: 2022

Dezső, Sima

Copyright © 2013 Typotex Kiadó

Abstract

One of the defining trends in computing is the rapid development of processors and processor architectures.

Since 2005-2006, multicore processors have become fully established and are now the typical processor type in today's notebooks, desktops, and servers. Intel plays the leading role in this processor category, with a worldwide market share of roughly 80%, while AMD is the second-ranked player in the field. Accordingly, this course material focuses on the basic architectures of Intel and AMD. It presents the evolution of the instruction sets, microarchitectures, power dissipation management techniques, and system architectures of the successive processor families, as well as the range of processors released. The central aim of the material is to highlight the developments in the field of multicore processors and to familiarize students with them.

Gacsal József, Intel Hungary, Business Development Director

Contents

Part I. Introduction ... 1

1. Introduction ... 3

1. Foreword ... 3

2. The mobile boom and its consequences to computer architectures ... 3

3. Consequences of the low power requirement of mobile devices for Intel and AMD .. 5

4. Foreseeable market situation ... 5

5. Intel’s response to the mobile challenge ... 5

6. Evolution of Intel’s basic architectures [2] ... 5

7. AMD’s response to the mobile challenge ... 6

8. Evolution of AMD’s basic architectures ... 6

9. Overview of Intel’s and AMD’s actual processor lines ... 6

10. Scope of these slides ... 7

11. Reasons for this decision ... 7

2. References ... 8

Part II. Intel’s Core 2-based processor lines ... 9

1. Introduction ... 15

1. The evolution of Intel’s basic microarchitectures ... 15

2. Intel’s Tick-Tock model ... 15

3. Basic architectures and their related shrinks ... 16

2. The Core 2 line ... 19

1. 2.1 Introduction ... 19

2. 2.2 Major innovations of the Core 2 line ... 20

2.1. 2.2.1 Wide execution ... 20

2.1.1. 4-wide core ... 20

2.1.2. Enhanced execution resources ... 22

2.1.3. Performance leadership changes between Intel and AMD ... 23

2.1.4. Example 1: DP web-server performance comparison (2003) ... 24

2.1.5. Example 2: Summary assessment of extensive benchmark tests contrasting dual Opterons vs dual Xeons (2003) [7] ... 24

2.1.6. Example: DP web-server performance comparison (2006) ... 25

2.2. 2.2.2 Smart L2 cache ... 25

2.2.1. Shared L2 cache ... 25

2.2.2. Benefits of shared caches ... 26

2.2.3. Drawbacks of shared caches ... 26

2.3. 2.2.3 Smart memory accesses ... 26

2.3.1. Hardware prefetchers [9] ... 26

2.3.2. Intensive use of hardware prefetchers [11] ... 27

2.3.3. Hardware prefetchers within the Core 2 microarchitecture ... 27

2.4. 2.2.4 Enhanced digital media support ... 27

2.4.1. Widening the width of FP/SSE Execution units from 64-bit to 128-bit ... 27

2.4.2. Overview of the x86 ISA extensions in Intel’s processor lines ... 28

2.4.3. Achieved performance boost in Core 2 for gaming apps ... 30

2.5. 2.2.5 Intelligent Power management ... 31

2.5.1. Ultra fine grained power control ... 31

2.5.2. Platform Thermal Control ... 31

2.5.3. Possible solution for the Platform Thermal Control Manager [88] .. 32

3. 2.3 Overview of Core 2 based processor lines ... 32

3. The Penryn line ... 34

1. 3.1 Introduction ... 34

1.1. Penryn ... 34

2. 3.2 Key enhancements of Penryn line ... 35

2.1. 3.2.2 More advanced power management ... 36

2.1.1. Deep Power Down technology (DPD) ... 36

2.1.2. Enhanced Dynamic Acceleration Technology (EDAT) (for mobiles) ... 38

2.1.3. Overall performance achievements with Penryn (1) ... 39

3. 3.3 Overview of Penryn based processor lines ... 40

4. The Nehalem line ... 42

1. 4.1 Introduction to the 1. generation Nehalem line (Bloomfield) ... 42

1.1. Die shot of the 1. generation Nehalem desktop processor (Bloomfield) [45] ... 43

2. 4.2 Major innovations of the 1. generation Nehalem line [54] ... 44

2.1. 4.2.1 Simultaneous Multithreading (SMT) ... 45

2.1.1. Performance gains of SMT ... 46

2.2. 4.2.2 New cache architecture ... 47

2.2.1. Distinguished features of Nehalem’s cache architecture ... 47

2.3. 4.2.3 Integrated memory controller ... 48

2.3.1. Main features ... 49

2.3.2. Benefit of integrated memory controllers ... 49

2.3.3. Drawback of integrated memory controllers ... 49

2.3.4. Non Uniform Memory Access (NUMA) ... 49

2.3.5. Memory latency comparison: Nehalem vs Penryn ... 50

2.4. 4.2.4 QuickPath Interconnect bus (QPI) ... 51

2.4.1. Signals of the QuickPath Interconnect bus (QPI bus) ... 52

2.4.2. QuickPath Interconnect bus (QPI) ... 52

2.4.3. QPI based DP and MP server system architectures ... 52

2.4.4. Comparison of the transfer rates of the QPI, FSB and HT buses .... 53

2.4.5. The notion of “Uncore” ... 53

2.5. 4.2.5 Enhanced power management ... 54

2.5.1. Nehalem’s Turbo Mode ... 55

2.5.2. ACPI states [26] ... 55

2.6. 4.2.6 New socket ... 58

3. 4.3 Major innovations of the 2. generation Nehalem line (Lynnfield) (1) [46] ... 58

3.1. Major innovations of the 2. generation Nehalem line (Lynnfield) (2) [46] ... 58

3.2. Evolution of providing PCIe lanes for graphics ... 59

3.3. Evolution of the topology and type of available PCIe lanes for graphics cards ... 59

3.4. Major innovations of the 2. generation Nehalem line (Lynnfield) (3) [46] ... 60

3.5. Die photos of the 1. and 2. gen. Nehalem desktop chips ... 60

5. The Nehalem-EX line ... 62

1. 5.1 Introduction ... 62

1.1. Overview of the Nehalem-EX based processor lines (based on [44]) ... 62

2. 5.2 Major innovations of the Nehalem-EX processors ... 62

2.1. 5.2.1 Overview ... 62

2.2. 5.2.2 Native 8 cores with 24 MB L3 cache (LLC) [55] ... 62

2.2.1. Die micrograph of the 8 core Nehalem-EX (Xeon 7500/Beckton) MP server [71], [72] ... 63

2.3. 5.2.3 On-die ring interconnect bus [56] ... 63

2.4. 5.2.4 Serial memory channels [55] ... 64

2.5. 5.2.5 Scalable platform configurations [55] ... 65

3. 5.3 Performance features of the 8-core Nehalem-EX based Xeon 7500 vs the Penryn based 6-core Xeon 7400 [67] ... 66

6. The Westmere line ... 67

1. 6.1 Introduction ... 67

1.1. Westmere 2-core and 6-core die plots [57] ... 67

2. 6.2 Key enhancements of the Westmere lines vs. the Nehalem lines [44] ... 68

2.1. Overview of the Westmere lines ... 68

3. 6.3 Dual-core Westmere-based mobile/desktop lines ... 68

3.1. 6.3.1 Overview ... 68

3.2. 6.3.2 Innovations and enhancements of the dual-core mobile/desktop lines ... 69

3.2.1. 6.3.2.1 Overview ... 69

3.2.2. 6.3.2.2 In-package integrated CPU/GPU for the 2 core mobile and desktop segments ... 69

3.2.3. 6.3.2.3 Enhanced Turbo Boost technology in the mobile Arrandale line [57] ... 73

4. 6.4 The six core Westmere-based desktop line ... 73

4.1. Platform and main features of the six core Westmere-based desktop line [93] ... 74

5. 6.5 Six core Westmere-EP server lines ... 74

5.1. Native 6 cores with 12 MB L3 cache (LLC) for UP/DP servers [58] ... 74

5.2. Overview of the models of the Westmere-EP based Xeon 5600 family [94] ... 74

5.3. Example Westmere-EP DP server platform [57] ... 75

7. The Westmere-EX line ... 76

1. 7.1 Introduction ... 76

2. 7.2 Key enhancement of the Westmere-EX line vs. the Nehalem-EX server line [95] ... 76

3. 7.3 Selected details of the Westmere-EX processors ... 77

3.1. 7.3.1 Native 10 cores with 30 MB L3 cache (LLC) [60] ... 77

3.2. 7.3.2 Basic building blocks of the Westmere-EX processor (10 cores/30 MB L3 cache) (LLC) [60] ... 77

3.3. 7.3.3 Interconnection of the basic building blocks of the Westmere-EX processors [60] ... 78

8. The Sandy Bridge line ... 79

1. 8.1 Introduction ... 79

1.1. Overview of the Sandy Bridge family ... 79

1.2. Overview of the Sandy Bridge based processor lines ... 79

1.3. Main functional units of Sandy Bridge [96] ... 80

2. 8.2 Major innovations of the Sandy Bridge line vs. the 1. generation Nehalem line [61] ... 81

2.1. 8.2.1 Overview ... 81

2.2. 8.2.2 Extension of the ISA (of the cores) by the AVX instruction set ... 82

2.3. 8.2.2 Extension of the ISA (of the cores) by the AVX instruction set (Based on [18]) ... 82

2.3.1. The AVX extension includes [97]: ... 83

2.3.2. Implementation of AVX ... 84

2.3.3. Subsequent evolution of AVX [97] ... 84

2.4. 8.2.3 New microarchitecture of the cores ... 85

2.5. 8.2.4 On die ring interconnect bus [66] ... 85

2.5.1. Main features of the on-die interconnect bus [64] ... 86

2.6. 8.2.5 On die graphics unit [99] ... 86

2.6.1. Support of both media and graphics processing by the graphics unit [99] ... 87

2.6.2. Main features of the on die graphics unit [99] ... 87

2.6.3. Specification data of the HD 2000 and HD 3000 graphics [100] .... 88

2.6.4. Performance comparison of the Sandy Bridge’s graphics: gaming [101] ... 88

2.7. 8.2.6 Enhanced Turbo Boost technology [64] ... 89

2.7.1. Intelligent power sharing between the cores and the integrated graphics [64] ... 90

3. 8.3 Example for a Sandy Bridge based desktop platform with the H67 chipset [102] ... 91

4. 8.4 The E3-1200 UP server line [103] ... 92

9. The Sandy Bridge-E line ... 93

1. 9.1 Introduction ... 93

1.1. Overview of the Sandy Bridge-E based processor lines ... 93

1.2. Comparison of die parameters of recent DT processors [77] ... 93

2. 9.2 Differences to the original Sandy Bridge line ... 93

2.1. 9.2.1 Overview ... 93

2.1.1. 9.2.2 6 cores, no integrated graphics ... 94

2.1.2. 9.2.3 4 parallel memory channels instead of 2 available in the Sandy Bridge lines ... 95

2.1.3. 9.2.4 40 PCIe 2. gen. lanes to connect multiple graphics cards to the processor ... 96

2.2. 9.2.2 LGA-2011 socket instead of the LGA-1155 used in the original Sandy Bridge line ... 98

2.2.1. Main features of the Sandy Bridge-E line vs the Sandy Bridge line [77] ... 98

2.2.2. Example for a Sandy Bridge-E/X79 based 4-way SLI multi graphics card configuration ... 99

10. The Sandy Bridge-EN/EP line ... 100

1. 10.1 Introduction ... 100

1.1. Overview of the Sandy Bridge-EN/EP lines ... 100

1.2. Improvements of the microarchitecture of the Sandy Bridge-EN/EP processors [107] ... 100

1.3. Die shot of the Xeon E5-2600 [107] ... 101

1.4. The interconnection ring connecting main units of the processor [107] ... 101

2. 10.2 Main enhancements of the Sandy Bridge-EN line over the previous Westmere-EP Xeon 5600 line [108] ... 102

3. 10.3 Main enhancements of the Sandy Bridge-EP line over the Sandy Bridge-EN line [108] ... 102

3.1. Feature comparison Westmere-EP 5600, Sandy Bridge-EN (E5-2400) and Sandy Bridge-EP (E5-2600) [108] ... 102

3.2. Comparison of the dual socket (DP) Sandy Bridge-EN and Sandy Bridge-EP platforms [109] ... 103

3.3. The dual socket (DP) Xeon E5-2600 (Sandy Bridge-EP) Romley platform [110] ... 104

3.4. The quad socket (MP) Xeon E5-2600 (Sandy Bridge-EP) Romley platform [110] ... 105

4. 10.4 Main features of selected E5-EP models [111] ... 105

5. 10.5 More details on the Romley server platform ... 106

5.1. The Patsburg (C600) chipset ... 106

6. 10.6 Performance comparison Sandy Bridge-EP vs. Westmere-EP X5680 [112] ... 107

6.1. Summary assessment of the performance comparison ... 108

6.2. Historical increase of the integer performance of 2 Socket (2S) configurations [113] ... 108

7. 10.7 Intel’s Xeon E5 family server roadmap [114] ... 109

11. The Ivy Bridge line ... 110

1. 11.1 Introduction ... 110

1.1. Overview of the Ivy Bridge family-1 ... 110

1.2. Overview of the Ivy Bridge family-2 ... 110

1.3. Contrasting the Sandy Bridge and Ivy Bridge dies [81] ... 112

1.4. Main implementation parameters of recent processors [81] ... 112

1.5. Overview of the Ivy Bridge based processor lines ... 112

2. 11.2 Major innovations of Ivy Bridge [80] ... 113

2.1. 11.2.1 Overview ... 113

2.2. 11.2.2 The 22 nm tri-gate process technology within Intel’s technology roadmap [82] ... 114

2.2.1. The traditional planar transistor [82] ... 114

2.2.2. The 22 nm Tri-Gate transistor-1 [82] ... 115

2.2.3. The 22 nm Tri-Gate transistor-2 [82] ... 116

2.2.4. Transistor characteristics [82] ... 117

2.2.5. Transistor gate delay [82] ... 118

2.2.6. Intel’s 22 nm manufacturing fabs [82] ... 119

2.2.7. Ivy Bridge chips on a 300 mm wafer [82] ... 120

2.3. 11.2.3 Supervisory Mode Execute Protection [83] ... 121

2.4. 11.2.4 Next generation processor graphics and media [81] ... 121

2.4.1. Overview of video interfaces of computing devices to external displays ... 122

3. 11.3 Main features of Ivy Bridge-based first introduced processors ... 122

3.1. 11.3.1 Main features of the first introduced Ivy Bridge-based desktop models [116] ... 122

3.2. 11.3.2 Main features of the first introduced Ivy Bridge-based mobile models [116] ... 123

4. 11.4 Ivy Bridge-based desktop platform [81] ... 123

5. 11.5 Performance assessment of the desktop models ... 124

5.1. 11.5.1 CPU performance of the highest clocked Ivy Bridge model Core i7-3770K [81] (Higher is better) ... 124

5.2. 11.5.2 Relative GPU performance (with games DX9/DX10/DX11) of the highest performance Ivy Bridge DT model Core i7-3770K (Resolutions 1440x900, 1680x1050) [81] (Higher is better) ... 125

5.2.1. Increasing performance of Intel’s integrated graphics [117] ... 125

6. 11.6 Main features of first introduced Ivy Bridge-based Xeon E3-12xx v2 models [118] ... 126

7. 11.7 The Ivy Bridge-based Xeon E3-1200 v2 platform (called the Bromolow refresh server platform) [119] ... 126

12. The Ivy Bridge-E line ... 128

1. 12.1 Introduction ... 128

1.1. Overview of the Ivy Bridge-E based processor lines ... 128

2. 12.2 Differences to the previous Sandy Bridge-E line [132] ... 128

2.1. Overview of providing PCIe lanes on Intel desktop processors ... 128

2.2. Die plot of an Ivy Bridge-E processor [133] ... 129

2.3. Main features of Ivy Bridge-E models [131] ... 129

3. 12.3 Example for an Ivy Bridge-E based desktop platform with the X79 chipset [134] ... 130

4. 12.4 Performance increase achieved by the Ivy Bridge-E line vs. the previous Sandy Bridge-E line [135] ... 130

13. The Ivy Bridge-EN/EP lines ... 132

1. 13.1 Introduction ... 132

1.1. Overview of the Ivy Bridge-EN/EP lines ... 132

1.2. Die layouts [137] ... 132

1.3. Die shot of the ten-core Ivy Bridge-EP processor [138] ... 133

2. 13.2 Main enhancements of the Ivy Bridge-EP-based Xeon E5-2600 v2 line vs. the Sandy Bridge-EP-based Xeon E5-2600 line [138] ... 134

2.1. Comparison of main features of the Ivy Bridge-EP-based Xeon E5-2600 v2 line vs. the Sandy Bridge-EP-based Xeon E5-2600 line [138] ... 134

3. 13.3 Main features of specific models of the Xeon E5-2600 v2 series [139] ... 135

4. 13.4 Main features of specific models of the Xeon E5-1600 v2 series [140] ... 135

5. 13.5 The Romley server platform [138] ... 136

5.1. Intel Xeon E5 family server roadmap [136] ... 136

14. The Ivy Bridge-EX line ... 138

1. 14.1 Introduction ... 138

1.1. Ivy Bridge-EX ... 138

2. 14.2 Main features of the Ivy Bridge-EX line [142] ... 138

15. The Haswell line ... 139

1. 15.1 Introduction ... 139

1.1. Overview of Haswell-based processor lines (Based on [120]) ... 139

1.2. Die plot of a Haswell processor [121] ... 139

1.3. Sub-families of Haswell [144] ... 140

2. 15.2 Key enhancements of the Haswell cores [80] ... 141

2.1. Buffer sizes of subsequent generations of Core processors [80] ... 141

2.2. Cache sizes, latencies and bandwidth values of subsequent Core generations [122] ... 142

2.3. Issue rate and execution unit enhancements of Haswell [80] ... 142

3. 15.3 ISA enhancements of the Haswell cores [80] ... 143

3.1. Evolution of the AVX ISA extension [97] ... 143

3.2. Enhancements of AVX2 [97] ... 144

3.3. FMA and peak FLOPs of Haswell [97] ... 144

4. 15.4 Main innovations of the Haswell processor ... 145

4.1. 15.4.1 Overview ... 145

4.2. 15.4.2 Enhanced graphics ... 145

4.2.1. Main enhancements of the Iris Pro and Iris graphics units [123] ... 145

4.2.2. Performance boost provided by the Iris Pro/Iris graphics vs. the previous generation [123] ... 146

4.2.3. Graphics performance increase of subsequent Core generations [117] ... 147

4.3. 15.4.3 On-package eDRAM cache [117] ... 147

4.3.1. Principle of operation [117] ... 148

4.3.2. Implemented on-package eDRAM [124] ... 148

4.3.3. Memory latency vs. access range in a memory system with eDRAM cache (L4) [117] ... 149

5. 15.5 Main features of the Haswell line of mobile and desktop processors ... 150

5.1. 15.5.1 Example 1: Main features of Haswell-based mobile Core i7 M-Series processors [125] ... 150

5.2. 15.5.2 Example 2: Main features of Haswell-based Core i7 desktop processors [126] ... 150

6. 15.6 Haswell-based desktop platform [145] ... 151

7. 15.7 Integer and FP performance of subsequent generations of Core processors [127] ... 152

8. 15.8 Graphics performance of subsequent generations of Core processors [127] ... 152

9. 15.9 Main features of Haswell-based Xeon E3-12xx v3 line of server processors [128] ... 153

9.1. Main features of subsequent generations of E3-1200 Xeon processors [129] ... 153

10. 15.10 Haswell-based Xeon E3-1200 v3 server platform [130] ... 154

16. The Haswell-E line ... 155

1. 16.1 Introduction ... 155

2. 16.2 Differences to the previous Ivy Bridge-E line [143] ... 155

2.1. 16.2.1 Overview ... 155

2.2. 16.2.2 The Haswell-E processor [143] ... 155

2.3. 16.2.3 The Wellsburg-X PCH [143] ... 156

2.4. 16.2.4 DDR4 memory [143] ... 156

2.5. 16.2.4 DDR4 memory [143] ... 157

17. 17. References ... 158

Part III. AMD’s high performance oriented Family 15h (Bulldozer-based) processor lines ... 166

1. Overview of AMD’s high performance oriented Family 15h (Bulldozer-based) processor lines ... 173

1. Overview of AMD’s high performance oriented Family 15h processor lines (based on [1]) ... 173

2. Performance increase of AMD’s DP servers up to the Bulldozer-based Interlagos [18] ... 174

3. AMD’s projection to increase performance in post Bulldozer architectures [19] ... 174

4. Recent roadmaps of AMD’s basic lines [2] ... 174

5. Introduction to the Family 15h lines of processors, designated also as the Bulldozer lines ... 175

6. The compute module of the Family 15h processors ... 176

7. Shared and dedicated components of the Bulldozer cores ... 178

8. Design philosophy of using compute modules in Bulldozer-based designs ... 178

8.1. Main design aspects-1 [3] ... 178

9. Design philosophy of using compute modules ... 179

9.1. Main design aspects-2 [3] ... 179

10. Example: Clock speed gain achieved by the 1. generation Bulldozer design vs. the previous K10.5 design-1 ... 179

11. a) Servers ... 179

12. Main operational parameters of AMD’s K10.5 Istanbul-based DP servers (Lisbon) [13] ... 180

13. Main operational parameters of AMD’s Family 15h-based DP servers (Valencia) [13] ... 180

14. b) Desktops ... 181

15. Main features of AMD’s K10.5-based Phenom™ II X6 desktop processors [14] ... 181

16. Main features of AMD’s 1. generation Bulldozer-based FX desktop processors [14] ... 181

17. Example: Clock speed gain achieved by the 1. generation Bulldozer design vs. the previous K10.5 design - Summary ... 182

18. The width of the Bulldozer cores ... 182

2. First generation Family 15h Bulldozer-based processor lines ... 183

1. 2.1 Overview of Family 15h Bulldozer-based processor lines [3] ... 183

1.1. AMD’s Bulldozer-based server and desktop lines – Overview-1 (based on [1]) ... 183

1.2. Brand names of AMD’s Bulldozer-based server and desktop lines ... 184

1.3. Positioning AMD’s Bulldozer-based server lines ... 184

1.4. Positioning AMD’s Bulldozer-based desktop lines ... 185

2. 2.2 The Bulldozer Compute Module ... 185

2.1. 2.2.1 Overview of the Bulldozer Compute Module ... 185

2.1.1. The Bulldozer Compute module ... 185

2.1.2. Principle of operation of a Bulldozer module [4] ... 186

2.2. 2.2.2 ISA extensions introduced in the Bulldozer design ... 186

2.2.1. New Bulldozer instructions and their possible use [15] ... 186

2.2.2. Introduction of ISA x86 extensions by Intel vs. AMD ... 187

2.2.3. Comparison of FP-capabilities of Bulldozer, Magny-Cours and Sandy Bridge [16] ... 188

2.2.4. Compiler support of Bulldozer’s new instructions [15] ... 189

2.3. 2.2.3 The microarchitecture of the Bulldozer Compute Module ... 189

2.3.1. AMD’s Bulldozer module contrasted with two cores of Magny-Cours [4] ... 190

2.3.2. The microarchitecture of a Bulldozer core [10] ... 190

2.3.3. Block diagram of Intel’s Core 2 microarchitecture [11] ... 191

2.3.4. Block diagram of AMD’s K8 microarchitecture [11] ... 191

2.3.5. The microarchitecture of a Bulldozer core [10] ... 192

2.3.6. The microarchitecture of Intel’s Sandy Bridge cores [17] ... 193

2.3.7. The microarchitecture of Intel’s Westmere cores [10] ... 194

2.4. 2.2.4 Assessing the performance potential of the Bulldozer module-1 [3] ... 195

2.4.1. Contrasting the execution resources of the Bulldozer core with previous designs ... 196

2.5. 2.2.4 Assessing the performance potential of the Bulldozer module-2 [3] ... 196

2.5.1. Contrasting the FP execution resources of the Bulldozer core with previous designs ... 197

2.5.2. Contrasting the FP execution resources of the Bulldozer core with previous designs ... 197

2.5.3. Comparing Bulldozer’s per module and Sandy Bridge’s per core available 256-bit execution resources-1 [17] ... 199

2.5.4. Comparing Bulldozer’s per module and Sandy Bridge’s per core available 256-bit execution resources-1 [17] ... 200

2.6. 2.2.4 Assessing the performance potential of the Bulldozer module-3 [3] ... 200

2.6.1. Cache/main memory latencies of K10/K10.5, Bulldozer and Sandy Bridge processors [3] ... 200

2.6.2. Cache sizes of K10/K10.5, Bulldozer and Sandy Bridge processors ... 201

2.6.3. AMD’s projection to increase performance in post Bulldozer architectures [19] ... 201

3. 2.3 The Orochi die ... 202

3.1. The floorplan of the Orochi die ... 202

3.2. The North Bridge of Orochi [21] ... 203

3.3. Block diagram of the Orochi die ... 204

4. 2.4 New power management features of the Bulldozer design ... 204

4.1. AMD’s power management techniques K8 – 1. gen. Family 15h (Bulldozer) (based on [4]) ... 204

4.2. New power management features of the Bulldozer design ... 205

4.3. TDP Power Cap [23] ... 205

4.4. Module C6 state [24], [6] ... 205

4.5. Module level VSS power gating ... 206

4.6. Benefit of module level power gating (C6) vs. C1E state [7] ... 207

4.7. Contrasting the Smart Fetch technique with entering the Module C6 state [7] ... 208

4.8. LV-DDR3 support ... 208

5. 2.5 Bulldozer-based server lines ... 209

5.1. 2.5.1 Overview of the Bulldozer-based server lines ... 209

5.1.1. Overview of the Bulldozer-based server lines-1 (Based on [1]) .... 209

5.1.2. Overview of the Bulldozer-based server lines-2 (Based on [1]) .... 210

5.2. 2.5.2 The Bulldozer-based Interlagos MP server line ... 211

5.2.1. Positioning the Bulldozer-based Interlagos MP server line ... 211

5.2.2. Block diagram of Interlagos [6] ... 211

5.2.3. Example: Interlagos-based MP system [6] ... 212

5.2.4. Performance increase of AMD’s MP servers up to the Bulldozer-based Interlagos [18] ... 213

5.2.5. Performance/Watt evolution of AMD’s server lines [2] ... 213

5.2.6. Main features of Bulldozer-based Interlagos MP server lines [13] ... 214

5.2.7. Comparing main features of Bulldozer-based lines with the previous generation [4] ... 214

5.2.8. Performance assessment of Family 15h Bulldozer-based MP servers [13] ... 214

5.2.9. Throughput results of the Open Source server workload runs [26] ... 215

5.2.10. Response time results of the Open Source server workload runs [26] ... 215

5.2.11. Power consumption results of the Open Source server workload runs [26] ... 216

5.2.12. Assessing the benchmark results gained for the Bulldozer-based Interlagos 6276 server ... 216

5.3. 2.5.3 The Turbo core technology of Bulldozer-based MP servers ... 216

5.3.1. Principle of operation [6] ... 217

5.3.2. Full and half load turbo frequencies of Family 15h Bulldozer-based Interlagos MP servers [13] ... 217

5.4. 2.5.4 Bulldozer-based DP (Valencia) and UP (Zurich) server lines ... 218

5.4.1. AMD’s 2012 – 2013 server roadmap [2] ... 218

5.4.2. The Family 15h Bulldozer-based DP system (Valencia) [6] ... 218

5.4.3. Example Family 15h Bulldozer-based DP system (Valencia) [6] .. 219

5.4.4. Main parameters of the Family 15h Bulldozer-based Valencia DP server line [13] ... 219

5.4.5. Main parameters of the Family 15h Bulldozer-based Zurich UP server line [13] ... 220

5.4.6. AMD’s 2012 – 2013 server roadmap [2] ... 220

5.4.7. Recent roadmaps of AMD’s basic lines [27] ... 221

6. 2.6 The Bulldozer-based Zambezi DT line ... 221

6.1. 2.6.1 Overview of the Bulldozer-based Zambezi high performance desktop line [1] ... 221

6.1.1. Brand name of the Bulldozer-based high performance Zambezi desktop line ... 222

6.1.2. Positioning the Bulldozer-based Zambezi high performance desktop line ... 222

6.1.3. The Family 15h Bulldozer-based high performance Zambezi desktop line [6] ... 223

6.1.4. Die plot of Zambezi [28] ... 223

6.1.5. Key parameters of the Family 15h Bulldozer-based Zambezi desktop line [29] ... 224

6.1.6. System example of a Zambezi desktop system (Scorpius platform) [30] ... 224

6.2. 2.6.2 The Turbo core technology of the Bulldozer-based Zambezi desktop line ... 225

6.2.1. Contrasting AMD’s 1. and 2. gen. Turbo core implementations [36] ... 225

6.2.2. AMD’s 2. generation Turbo core technology ... 226

6.2.3. Principle of operation [6] ... 226

6.2.4. Nominal, 8-core Turbo, and 4-core max. Turbo frequencies of the Zambezi DT [29] ... 227

6.2.5. Example for the operation of AMD’s 2. generation Turbo core technology [37] ... 227

6.2.6. Example: Running a single threaded workload on the 8150 Zambezi DT with Turbo core enabled [36] ... 228

6.2.7. Run time reduction achieved by enabling Turbo core for a single threaded workload running on an FX-8150 (Zambezi) [38] ... 228

6.2.8. Run time reduction achieved by enabling Turbo core for a multi-threaded workload running on an FX-8150 (Zambezi) [38] ... 229

6.2.9. Contrasting the operation of AMD’s 2. gen. Turbo core with that of Intel’s Turbo Boost technology, as implemented in Sandy Bridge-based desktops (i5-2500K) [36] ... 230

6.2.10. Principle of operation of Intel’s Deep Power Down technology [39] ... 231

6.2.11. a) Precursor of Intel’s Turbo Boost: EDAT-2 ... 231

6.2.12. b) Intel’s 1. gen. Turbo Boost ... 231

6.2.13. c) Intel’s enhanced 1. gen. Turbo Boost ... 232

6.2.14. Available Turbo Boost bins (133 MHz) for the 1. and 2. gen. Nehalem processors [38] ... 232

6.2.15. d) Intel’s 2. gen. (Next gen.) Turbo Boost (Dynamic Turbo Boost) ... 232

6.2.16. Contrasting the introduction of Intel’s and AMD’s Turbo and Power gating technologies ... 233

6.2.17. Evolution of Intel’s Turbo technology [34] ... 234

6.3. 2.6.3 Performance assessment of the Bulldozer-based Zambezi desktop line ... 234

6.3.1. Summary benchmark results including all tests excl. games [32] ... 234

6.3.2. Summary performance assessment of Zambezi-1 ... 235

6.3.3. Summary benchmark results including all tests excl. games [32] .. 235

6.3.4. Summary performance assessment of Zambezi-2 ... 236

6.3.5. Summary benchmark results including all tests excl. games [32] .. 236

6.3.6. Example: Impact of Windows 7’s scheduling policy to the activation of Max. Turbo mode [9] ... 237

6.3.7. Summary assessment of the benchmark results of the Zambezi FX 8150 line [32] ... 239

6.3.8. Summary assessment of all Bulldozer based designs ... 239

6.3.9. Remark – AMD’s reorganization after the Bulldozer disaster ... 239

3. Second generation Family 15h Piledriver-based processor lines ... 240

1. 3.1 Overview of the Piledriver-based processor lines (based on [1]) ... 240

1.1. Brand names of Piledriver-based processor lines ... 240

2. Piledriver-based processor lines ... 241

3. 3.2 The Piledriver Compute Module ... 241

3.1. 3.2.1 Overview of the Piledriver Compute Module ... 241

3.2. 3.2.1 Piledriver’s performance enhancements vs. Bulldozer [54] ... 242

3.2.1. Piledriver’s performance enhancements vs. the (Fam. 12h) Husky and Bulldozer cores [55] ... 242

3.3. 3.2.3 Piledriver’s power management enhancement vs. Bulldozer – The RCM technology [63] ... 243

3.3.1. 3.2.2.1 A brief introduction into clock distribution networks [57] ... 243

3.3.2. 3.2.3.2 Principle of the Resonant Clock Mesh (RCM) technology ... 247

3.3.3. 3.2.3.3 The evolution of implementing RCM ... 254

3.3.4. Main features of AMD’s Bulldozer- and Piledriver based Opteron server lines [65] ... 255

3.3.5. Plans to implement Cyclos’s RCM in ARM Cortex-A15 [66] ... 256

4. 3.3 Piledriver-based GPU-less processor lines ... 256

4.1. 3.3.1 Overview of the Piledriver-based GPU-less processor lines-1 ... 256

4.1.1. Comparing the Bulldozer-based and Piledriver-based 4-module (8 cores) dies [6], [54] ... 257

4.1.2. Main functional blocks of a Piledriver-based GPU-less processor die [54] ... 258

4.2. 3.3.2 The Abu Dhabi Opteron 6300 server line ... 258

4.2.1. Main functional blocks of the dual-chip Opteron 6300 (Abu Dhabi) 4P server processor [67] ... 259

4.2.2. Die plot of the dual-chip Opteron 6300 (Abu Dhabi) server processor [68] ... 260

4.2.3. Model numbers and main features of the Opteron 6300 (Abu Dhabi) 4P line [69] ... 260

4.2.4. Comparison of the Bulldozer-based Opteron 6200 and the Piledriver-based Opteron 6300 server lines [67] ... 261

4.3. 3.3.3 The Vishera high performance FX desktop line ... 261

4.3.1. Main functional blocks of the high performance Vishera FX desktop line [54] ... 262

4.3.2. Die plot of the high performance Vishera FX desktop line [54] ... 262

4.3.3. Model numbers and main features of the high performance Vishera FX desktop line [60] ... 263

4.3.4. Comparing main features of AMD’s Vishera and Zambezi FX desktop lines [49] ... 263

4.3.5. Main features of the 9-Series chipset supporting the high performance Vishera DT [70] ... 264

4.3.6. AMD’s high-performance processor roadmap from 10/2011 [44] ... 264

5. 3.4 Piledriver-based Trinity APU lines ... 265

5.1. 3.4.1 Overview of the Piledriver-based Trinity APU lines ... 265

5.1.1. Piledriver-based Trinity APU lines ... 265

5.2. 3.4.2 The Trinity APU die ... 265

5.2.1. AMD’s Trinity APU die [71] ... 266

5.2.2. Comparing die plots of AMD’s Llano and Trinity dies [72] ... 266

5.2.3. Improvements of the Piledriver APU family over the Llano APU family ... 267

5.2.4. a) Enhancements of the microarchitecture of the Trinity APU [73] ... 267

5.2.5. b) Improvement of the power management ... 267

5.2.6. The Turbo Core technology of the Llano APU [74], [75] ... 268

5.2.7. Illustration of the operation of the Turbo Core Technology 3.0 of the Trinity APU [77] ... 270

5.3. 3.4.3 The Trinity mainstream desktop APU line ... 271

5.3.1. Positioning the Trinity mainstream desktop APU line [51] ... 272

5.3.2. Main components of the Trinity mainstream desktop APU [78] .... 272

5.3.3. Model numbers and main features of the mainstream Trinity desktop APU line [78] (Virgo platform) ... 273

5.3.4. The new FM2 socket of the Trinity mainstream desktop APU line [78] ... 273

5.3.5. System architecture of the mainstream Trinity desktop APU with the A85X FCH [79] ... 274

5.3.6. Performance increase achieved over the previous A-Series Llano APU line [78] ... 274

5.4. 3.4.4 The Trinity mobile APU line ... 275

5.4.1. Positioning the Trinity mobile APU line-1 [51] ... 275

5.4.2. Positioning the Trinity mobile APU line-2 [52] ... 276

5.4.3. Model numbers and main features of the Trinity mobile APU line [80] (Comal platform) ... 276

5.4.4. The Comal mobile platform including the (Piledriver-based) Trinity APU and the A70M/A60M FCH [52] ... 277

6. 3.5 Piledriver-based Richland APU lines ... 277

6.1. 3.5.1 Overview of the Piledriver-based Trinity APU lines ... 277

6.1.1. Positioning the Trinity mainstream desktop and mobile APU lines [52] ... 278

6.1.2. Die shot of the Richland APU [81] ... 278

6.1.3. Key features of the Richland mobile APU line as exposed by AMD [82] ... 279

6.1.4. Major improvements of the Richland mobile APU line discussed [83], [84] ... 279

6.1.5. Principle of operation of the Temperature Smart Turbo Core (TSTC) technique-1 ... 280

6.1.6. Principle of operation of the Temperature Smart Turbo Core (TSTC) technique-2 [85] ... 280

6.1.7. Comparing clock frequencies of the Richland and the Trinity APU lines [86] ... 281

6.1.8. Principle of operation of the Temperature Smart Turbo Core (TSTC) technique-3 [85] ... 281

6.1.9. Introducing additional frequency/voltage operating points ... 281

6.1.10. An innovative suite of apps. available typically on the Richland A8 and A10 models [87] ... 282

6.1.11. AMD Face Login [88] ... 282

6.1.12. AMD Gesture Control [88] ... 283

6.1.13. AMD Screen Mirror [88] ... 283

6.1.14. AMD optimized games [88] ... 283

6.2. 3.5.2 The Richland mainstream desktop APU line ... 283

6.2.1. Overview of the Richland mainstream desktop APU line ... 283

6.2.2. Positioning the Richland mainstream desktop and mobile APU lines [52] ... 284

6.2.3. Model numbers and expected key features of the Richland desktop APU line [89] (Elite Performance platform) ... 284

6.3. 3.5.3 The Richland mobile APU line ... 285

6.3.1. Positioning the Richland mobile APU line [52] ... 285

6.3.2. Model numbers and expected main features of the Richland mobile APU line [84] (Elite performance APU platform) ... 286

6.3.3. AMD’s graphics performance figures of the Richland mobile APU line vs. Intel’s Ivy Bridge-based mobile processors [83] ... 286

4. Third generation Family 15h Steamroller-based processor lines ... 288

1. 4.1 Overview of Family 15h Steamroller-based processor lines (based on [1]) ... 288

1.1. Brand names of Family 15h Steamroller-based processor lines ... 288

1.2. Overview of AMD’s Family 15h Steamroller-based processor lines ... 289

2. 4.2 The Steamroller Compute Module ... 289

2.1. Planned introduction of the Steamroller compute module ... 289

2.2. Preview of the Steamroller compute module (CM) ... 290

2.3. Block diagram of the Steamroller compute module [45] ... 290

2.4. Improvements of the front-end part of the Steamroller compute module [45] ... 290

2.5. Improving integer scheduling, integer execution and reducing average load latency in the Steamroller compute module [45] ... 291

2.6. Improving the power efficiency (performance/Watt figure) of the Steamroller compute module [45] ... 291

2.7. Comparing the block diagrams of three generations of the Family 15h Bulldozer design-1 ... 292

2.8. Improvements made in the microarchitecture of the Steamroller compute module ... 292

3. 4.3 Steamroller-based Opteron server lines ... 293

3.1. Overview of AMD’s Family 15h Steamroller-based processor lines ... 294

3.2. 4.3.1 Overview of Steamroller-based server lines (based on [1]) ... 294

3.2.1. Bringing forward the introduction of the Steamroller based server line ... 294

3.2.2. AMD’s server roadmap from 2/2012 [27] ... 294

3.2.3. AMD’s indication of introducing the Steamroller based server line already in 2013 [50] ... 295

4. 4.4 Overview of Steamroller-based Kaveri desktop and mobile APU lines (based on [1]) ... 295

4.1. AMD’s Family 15h Steamroller-based mobile APU lines (based on [1]) ... 296

4.2. Positioning the Steamroller-based Kaveri APU line as mainstream desktop line [51] ... 297

4.3. Positioning the Steamroller-based Kaveri APU as performance/mainstream mobile line [51] ... 297

4.4. Revised positioning the Steamroller-based Kaveri APU line [52] ... 298

4.5. Overview of AMD’s Family 15h Steamroller-based APU lines ... 298

4.6. Main components of Kaveri APUs ... 299

4.7. Architectural integration of the CPU and the GPU in Kaveri APU lines .... 299

4.8. Evolution of HSA in subsequent mobile APU lines [48] ... 299

4.9. GPU co-processing without pointers and data sharing – Without HSA [91] ... 299

4.10. GPU co-processing with pointers and data sharing – With HSA [91] ... 300

4.11. Data transfers in the memory hierarchy of the Llano APU [53] ... 301

5. References ... 302

List of figures

1.1. ... 3

1.2. ... 3

1.3. ... 4

1.4. ... 4

1.5. ... 5

1.6. ... 5

1.7. ... 5

1.8. ... 6

1.9. ... 6

1.10. ... 7

1.1. Intel’s Tick-Tock development model (Based on [1]) ... 15

1.2. Overview of Intel’s Tick-Tock model (Based on [3]) ... 15

1.3. ... 16

1.4. Intel’s plan to develop their manufacturing technology and processor lines revealed at a shareholder’s meeting back in 4/2006 [74] ... 16

1.5. Intel’s plan to develop their manufacturing technology and processor lines revealed at the IDF Spring 2007 in 4/2007 [75] ... 17

1.6. Intel’s design principles for developing microprocessors, revealed at their shareholder’s meeting in 4/2006 [74] ... 18

2.1. ... 19

2.2. Key features of the Core 2 microarchitecture [16] ... 19

2.3. Block diagram of Intel’s Core 2 microarchitecture [4] ... 20

2.4. Block diagram of Intel’s Pentium 4 microarchitecture [5] ... 20

2.5. Block diagram of AMD’s K8 microarchitecture [4] ... 21

2.6. Issue ports and execution units of the Core 2 [4] ... 22

2.7. Issue ports and execution unit of the Pentium 4 [9] ... 22

2.8. Block diagram of AMD’s K8 microarchitecture [4] ... 23

2.9. DP web server performance comparison: AMD Opteron 248 vs. Intel Xeon 2.8 [6] ... 24

2.10. ... 24

2.11. DP web server performance comparison: AMD Opteron 275/280 vs. Intel Xeon 5160 [8] .... 25

2.12. Core’s shared L2 cache vs. Pentium 4’s private L2 caches ... 25

2.13. ... 26

2.14. Hardware prefetchers within the Core 2 microarchitecture [11] ... 27

2.15. Widening the FP/SSE Execution Units from 64-bit to 128-bit [12] ... 28

2.16. Intel’s x86 ISA extensions - the SIMD register space (based on [18]) BMA ... 28

2.17. SIMD execution resources in Intel’s basic processors (based on [18]) ... 28

2.18. Overview of Intel’s x86 ISA extensions (based on [18]) ... 29

2.19. Intel’s x86 ISA extensions - the operations introduced (based on [17]) ... 29

2.20. ... 30

2.21. Achieved performance boost in Core2 for gaming vs AMD’s Athlon 64 FX60 [13] ... 30

2.22. The operation of the Ultra fine grained power control – an example [11] ... 31

2.23. Principle of the Platform Thermal Control [11] , [20] ... 31

2.24. The aSC7621 hardware monitor with fan control and PECI from Andigilog ... 32

2.25. ... 32

3.1. Dynamic and static power dissipation trends in chips [21] ... 34

3.2. Structure of a high-k + metal transistor [23] ... 34

3.3. Benefits of high-k + metal gate transistors [23], [24] ... 35

3.4. The 45 nm Penryn is a shrink of the 65 nm Core 2 with a few enhancements [25] ... 35

3.5. Key enhancements introduced into Penryn’s microarchitecture vs. the Core (based on [25]) .. 35

3.6. Intel’s x86 ISA extensions - the operations introduced (based on [17]) ... 36

3.7. Intel’s Deep Power Down technology [26] ... 36

3.8. Operation of Intel’s Deep Power Down technology [27] ... 37

3.9. Power reduction achieved by the Deep Power Down Technology [27] ... 37

3.10. Principle of the Enhanced Dynamic Acceleration Technology [27] ... 38

3.11. Performance improvements of Penryn vs. Core at the same clock frequency [26] ... 39

3.12. ... 40

4.1. ... 42

4.2. Design objective of Nehalem [1] ... 42

4.3. ... 42

4.4. ... 43

4.5. Die photo of the Bloomfield/Gainestown chip ... 45

4.6. Simultaneous Multithreading (SMT) of Nehalem [1] ... 45

4.7. Performance gains achieved by Nehalem’s SMT [1] ... 46

4.8. The 3-level cache architecture of Nehalem (based on [1]) ... 47

4.9. ... 47

4.10. ... 48

4.11. Integrated memory controller of Nehalem [33] ... 48

4.12. ... 49

4.13. Non Uniform Memory Access (NUMA) in multi-socket servers [1] ... 49

4.14. Memory latency comparison: Nehalem vs. Penryn [1] ... 50

4.15. The low cost (<600 $) Timna PC [40] ... 51

4.16. Point of attaching memory ... 51

4.17. Signals of the QuickPath Interconnect bus (QPI-bus) [22] ... 52

4.18. QPI based DP and MP server system architectures [31], [33] ... 52

4.19. ... 53

4.20. Interpretation of the notion “Uncore” [1] ... 53

4.21. Use of integrated power gates [32] ... 54

4.22. Overview of the Power Control unit [32] ... 54

4.23. ... 55

4.24. Turbo mode uses the available power headroom in processor package power limits [52] ... 57

4.25. New LGA sockets ... 58

4.26. ... 58

4.27. ... 59

4.28. Main options of providing PCIe lanes on the processor for graphics cards in DT systems ... 59

4.29. ... 59

4.30. ... 60

4.31. ... 60

5.1. ... 62

5.2. ... 62

5.3. ... 63

5.4. ... 64

5.5. ... 64

5.6. ... 64

5.7. ... 65

5.8. ... 66

6.1. ... 67

6.2. ... 67

6.3. ... 68

6.4. ... 68

6.5. ... 69

6.6. Westmere-based dual-core mobile and desktop platform ... 70

6.7. ... 71

6.8. The Clarksdale processor with in-package integrated graphics along with the H57 chipset [91] ... 71

6.9. ... 72

6.10. ... 72

6.11. ... 73

6.12. ... 74

6.13. ... 74

6.14. ... 75

6.15. Westmere-EP 6-core DP server platform ... 75

7.1. ... 76

7.2. ... 76

7.3. ... 77

7.4. ... 78

8.1. Intel’s Tick-Tock development model (Based on [1]) ... 79

8.2. ... 79

8.3. ... 79

8.4. ... 80

8.5. ... 81

8.6. ... 82

8.7. ... 83

8.8. ... 83

8.9. ... 84

8.10. Microarchitecture of the cores of Sandy Bridge [64] ... 85

8.11. ... 85

8.12. ... 86

8.13. Evolution of graphics implementation from Westmere to Sandy Bridge [99] ... 86

8.14. ... 87

8.15. ... 87

8.16. ... 88

8.17. ... 88

8.18. ... 89

8.19. ... 89

8.20. ... 90

8.21. ... 91

8.22. ... 91

8.23. ... 92

9.1. ... 93

9.2. ... 93

9.3. ... 94

9.4. ... 94

9.5. ... 95

9.6. ... 95

9.7. The Sandy Bridge-E platform with the X79 chipset [78] ... 96

9.8. ... 97

9.9. ... 98

9.10. ... 98

9.11. ... 98

9.12. ... 99

10.1. ... 100

10.2. ... 100

10.3. ... 101

10.4. ... 101

10.5. ... 102

10.6. ... 103

10.7. The original Sandy Bridge processor [109] ... 104

10.8. ... 104

10.9. ... 105

10.10. ... 105

10.11. The Romley server platform [107] ... 106

10.12. Intel's Patsburg chipset diagram [107] ... 106

10.13. ... 107

10.14. ... 109

10.15. ... 109

11.1. ... 110

11.2. ... 110

11.3. ... 110

11.4. ... 112

11.5. ... 112

11.6. ... 112

11.7. ... 113

11.8. ... 114

11.9. ... 114

11.10. ... 115

11.11. ... 116

11.12. ... 117

11.13. ... 118

11.14. ... 119

11.15. ... 120

11.16. ... 121

11.17. ... 121

11.18. ... 122

11.19. ... 123

11.20. ... 123

11.21. ... 123

11.22. ... 124

11.23. ... 125

11.24. ... 125

11.25. ... 126

11.26. ... 126

12.1. ... 128

12.2. ... 128

12.3. ... 129

12.4. ... 130

12.5. ... 130

12.6. ... 131

13.1. ... 132

13.2. ... 132

13.3. ... 133

13.4. ... 134

13.5. ... 135

13.6. ... 135

13.7. ... 136

13.8. ... 136

14.1. ... 138

15.1. Intel’s Tick-Tock development model (Based on [1]) ... 139

15.2. ... 139

15.3. ... 139

15.4. ... 140

15.5. ... 141

15.6. ... 141

15.7. ... 142

15.8. ... 142

15.9. ... 143

15.10. ... 143

15.11. ... 144

15.12. ... 144

15.13. ... 145

15.14. ... 145

15.15. ... 146

15.16. ... 147

15.17. ... 148

15.18. ... 148

15.19. ... 149

15.20. ... 150

15.21. ... 150

15.22. ... 151

15.23. ... 152

15.24. ... 152

15.25. ... 153

15.26. ... 154

15.27. ... 154

16.1. ... 155

16.2. ... 155

16.3. ... 156

16.4. ... 156

16.5. ... 157

1.1. ... 173

1.2. ... 173

1.3. ... 174

1.4. ... 174

1.5. ... 175

1.6. ... 175

1.7. ... 176

1.8. ... 177

1.9. ... 178

1.10. ... 179

1.11. ... 180

1.12. ... 180

1.13. ... 181

1.14. ... 182

1.15. ... 182

2.1. ... 183

2.2. ... 183

2.3. ... 184

2.4. ... 184

2.5. ... 185

2.6. ... 186

2.7. ... 186

2.8. ... 186

2.9. ... 187

2.10. Overview of Intel’s x86 ISA extensions (based on [44]) ... 188

2.11. ... 188

2.12. ... 189

2.13. ... 190

2.14. ... 190

2.15. ... 191

2.16. ... 191

2.17. ... 192

2.18. ... 193

2.19. ... 194

2.20. ... 196

2.21. ... 197

2.22. ... 197

2.23. ... 198

2.24. ... 198

2.25. ... 199

2.26. ... 200

2.27. ... 201

2.28. ... 201

2.29. ... 202

2.30. ... 202

2.31. ... 203

2.32. ... 204

2.33. ... 204

2.34. ... 206

2.35. ... 207

2.36. ... 207

2.37. ... 208

2.38. ... 209

2.39. ... 209

2.40. ... 210

2.41. ... 211

2.42. ... 211

2.43. ... 212

2.44. ... 213

2.45. ... 213

2.46. ... 214

2.47. ... 214

2.48. ... 215

2.49. ... 215

2.50. ... 216

2.51. ... 217

2.52. ... 217

2.53. ... 218

2.54. ... 218

2.55. ... 219

2.56. ... 219

2.57. ... 220

2.58. ... 220

2.59. ... 221

2.60. ... 222

2.61. ... 222

2.62. ... 222

2.63. ... 223

2.64. ... 223

2.65. ... 224

2.66. ... 224

2.67. ... 225

2.68. ... 226

2.69. ... 227

2.70. ... 227

2.71. ... 228

2.72. ... 228

2.73. ... 229

2.74. ... 230

2.75. ... 231

2.76. ... 232

2.77. ... 232

2.78. ... 233

2.79. ... 234

2.80. ... 234

2.81. ... 235

2.82. ... 236

2.83. ... 237

2.84. ... 238

2.85. ... 239

3.1. ... 240

3.2. ... 240

3.3. ... 241

3.4. ... 242

3.5. ... 242

3.6. ... 242

3.7. ... 243

3.8. ... 244

3.9. ... 245

3.10. ... 245

3.11. Distribution of power consumption in a Bulldozer processor [60] ... 245

3.12. ... 246

3.13. Use of clock gating to switch off temporarily not used units in a grid-based clock distribution network [57] ... 247

3.14. ... 247

3.15. ... 248

3.16. ... 248

3.17. ... 249

3.18. ... 249

3.19. ... 250

3.20. ... 251

3.21. ... 251

3.22. ... 253

3.23. ... 253

3.24. ... 253

3.25. ... 254

3.26. ... 255

3.27. ... 256

3.28. ... 257

3.29. ... 258

3.30. Sub-families of the Opteron 6300 (Abu Dhabi) server line [51] ... 259

3.31. ... 259

3.32. ... 259

3.33. ... 260

3.34. ... 260

3.35. ... 261

3.36. ... 261

3.37. ... 262

3.38. ... 262

3.39. ... 263

3.40. ... 263

3.41. ... 264

3.42. ... 264

3.43. ... 265

3.44. ... 266

3.45. ... 266

3.46. Simplified layout of the digital power monitoring system of the Llano APU [75] ... 267

3.47. Simplified layout of the digital power monitoring system of the Trinity APU [76] ... 268

3.48. Example for the operation of the AMD Turbo Core Technology 3.0 [55] ... 269

3.49. ... 269

3.50. ... 270

3.51. ... 271

3.52. ... 272

3.53. ... 272

3.54. ... 273

3.55. ... 273

3.56. ... 274

3.57. ... 274

3.58. ... 275

3.59. ... 275

3.60. ... 276

3.61. ... 276

3.62. ... 277

3.63. ... 277

3.64. ... 278

3.65. ... 278

3.66. ... 279

3.67. ... 280

3.68. Additional frequency/voltage points (P points) introduced in the Richland APU [85] ... 281

3.69. ... 281

3.70. ... 282

3.71. ... 283

3.72. ... 284

3.73. ... 284

3.74. ... 285

3.75. ... 285

3.76. ... 286

3.77. ... 286

3.78. ... 287

4.1. ... 288

4.2. ... 288

4.3. ... 289

4.4. ... 289

4.5. ... 290

4.6. ... 290

4.7. ... 291

4.8. ... 291

4.9. ... 292

4.10. ... 294

4.11. ... 294

4.12. ... 294

4.13. ... 295

4.14. ... 296

4.15. ... 296

4.16. ... 297

4.17. ... 298

4.18. ... 298

4.19. ... 298

4.20. ... 299

4.21. ... 299

4.22. ... 300

4.23. ... 301

Part I - Introduction

Contents

1. Introduction ... 3

1. Foreword ... 3

2. The mobile boom and its consequences to computer architectures ... 3

3. Consequences of the low power requirement of mobile devices for Intel and AMD ... 5

4. Foreseeable market situation ... 5

5. Intel’s response to the mobile challenge ... 5

6. Evolution of Intel’s basic architectures [2] ... 5

7. AMD’s response to the mobile challenge ... 6

8. Evolution of AMD’s basic architectures ... 6

9. Overview of Intel’s and AMD’s actual processor lines ... 6

10. Scope of these slides ... 7

11. Reasons for this decision ... 7

2. References ... 8

Chapter 1 - Introduction

1. Foreword

The course "Multicore processors" presents the basic multicore architectures of the leading processor manufacturers, Intel and AMD, which are widely used in engineering.

It focuses mainly on the microarchitecture of the dominant multicore processor families, emphasizing the incentives for and the implications of the major steps in their evolution.

Note that a section on Intel’s many-core Xeon Phi family can be found in the Outlook chapter of the course "GPGPUs and their programming".

2. The mobile boom and its consequences to computer architectures

In the second half of the 2000s, mobile devices (smartphones, tablets) emerged very rapidly.

For mobile devices, however, low-power operation is the paramount design requirement.

This differs sharply from the design paradigm of conventional devices, such as desktops or servers, as depicted below.

Figure 1.1.

As far as low-power CPU microarchitectures (CPU cores) are concerned, low-power operation raises two basic requirements, briefly discussed next:

a) Low-power CPUs need to have “narrow” microarchitectures.

b) Low-power CPUs need to have relatively low basic clock frequencies.

a) Low-power CPUs need to have “narrow” microarchitectures (e.g. 2-wide microarchitectures).

Example: Microarchitectures of ARM CPUs underlying tablets and smartphones [1]

Figure 1.2.

By contrast, traditional processors typically have wide microarchitectures, as the next example shows.

Example: Basic layout of the microarchitecture of Intel’s Core 2 to Haswell processors underlying laptops, PCs and servers.

Figure 1.3.

b) Low-power CPUs need to have relatively low basic clock frequencies

Figure 1.4.

Here, take into account that the dissipation is $D = \mathrm{const} \cdot f_c \cdot V^2$, where $f_c$ is the clock frequency and $V$ the supply voltage; in addition, a higher $f_c$ requires a higher $V$, so the dissipation grows faster than linearly with the clock frequency.
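The relation D = const x fc x V2 can be made concrete with a small numerical sketch. The Python snippet below is an illustration only and is not taken from the original slides; the function name `relative_dissipation` and the sample scaling factors are assumptions chosen for the example. It computes how the dissipation changes relative to a baseline when the clock frequency and the supply voltage are scaled.

```python
def relative_dissipation(f_scale: float, v_scale: float) -> float:
    """Dissipation relative to a baseline, using D = const x fc x V^2.

    f_scale and v_scale are the clock frequency and the supply voltage
    expressed as fractions of their baseline values (hypothetical inputs).
    The constant cancels out because only the ratio is computed.
    """
    return f_scale * v_scale ** 2


# Illustrative operating points (made-up numbers, not measured data).
for f_scale, v_scale in [(1.0, 1.0), (0.5, 1.0), (0.5, 0.8)]:
    d = relative_dissipation(f_scale, v_scale)
    print(f"fc x{f_scale:.1f}, V x{v_scale:.1f} -> D x{d:.2f}")
```

Under these assumed numbers, halving the clock frequency alone halves the dissipation, whereas additionally lowering the voltage to 80% of the baseline brings it down to about a third (0.5 x 0.8^2 = 0.32). This suggests why a lower clock frequency, which permits a lower supply voltage, is so effective for low-power designs.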

3. Consequences of the low power requirement of mobile devices for Intel and AMD

Figure 1.5.

4. Foreseeable market situation

Figure 1.6.

5. Intel’s response to the mobile challenge

Introduction of the Atom line of processors in 4/2008.

6. Evolution of Intel’s basic architectures [2]

Figure 1.7.

7. AMD’s response to the mobile challenge

Introduction of the Bobcat line of processors in 1/2011.

8. Evolution of AMD’s basic architectures

Figure 1.8.

9. Overview of Intel’s and AMD’s actual processor lines

Figure 1.9.

10. Scope of these slides

Of all the processor lines above, these slides focus on two high-performance/power-oriented families, as indicated below.

Figure 1.10.

11. Reasons for this decision

a) Engineering disciplines currently rely mostly on laptops, desktops and servers.

These computers are usually built on Core 2-based or Bulldozer-based processors.

b) Intel’s Itanium processors target mission-critical servers; they are not widely used and are approaching their end of life, as Microsoft’s operating systems will no longer support them in the future.

Core 2-based processor lines are presented in Part II, whereas Bulldozer-based lines are presented in Part III.

Chapter 2 - References

[1.1] Goto H., ARM Cortex-A Family Architecture, 2010, http://pc.watch.impress.co.jp/video/pcw/docs/423/409/p1.pdf

[1.2] Smith S.L., Intel Strategy & Technology Update, Barclays Capital Global Technology Conf., Dec. 2011, http://files.shareholder.com/downloads/INTC/1576180143x0x526852/c9868a3a-494e-4506-bcc6-a631aca1fd75/Steve%20Smith%20Barclays%20Dec%202011.pdf

Part II - Intel’s Core 2-based processor lines

Contents

1. Introduction ... 15

1. The evolution of Intel’s basic microarchitectures ... 15

2. Intel’s Tick-Tock model ... 15

3. Basic architectures and their related shrinks ... 16

2. The Core 2 line ... 19

1. 2.1 Introduction ... 19

2. 2.2 Major innovations of the Core 2 line ... 20

2.1. 2.2.1 Wide execution ... 20

2.1.1. 4-wide core ... 20

2.1.2. Enhanced execution resources ... 22

2.1.3. Performance leadership changes between Intel and AMD ... 23

2.1.4. Example 1: DP web-server performance comparison (2003) ... 24

2.1.5. Example 2: Summary assessment of extensive benchmark tests contrasting dual Opterons vs dual Xeons (2003) [7] ... 24

2.1.6. Example: DP web-server performance comparison (2006) ... 25

2.2. 2.2.2 Smart L2 cache ... 25

2.2.1. Shared L2 cache ... 25

2.2.2. Benefits of shared caches ... 26

2.2.3. Drawbacks of shared caches ... 26

2.3. 2.2.3 Smart memory accesses ... 26

2.3.1. Hardware prefetchers [9] ... 26

2.3.2. Intensive use of hardware prefetchers [11] ... 27

2.3.3. Hardware prefetchers within the Core 2 microarchitecture ... 27

2.4. 2.2.4 Enhanced digital media support ... 27

2.4.1. Widening the width of FP/SSE Execution units from 64-bit to 128-bit ... 27

2.4.2. Overview of the x86 ISA extensions in Intel’s processor lines ... 28

2.4.3. Achieved performance boost in Core 2 for gaming apps ... 30

2.5. 2.2.5 Intelligent Power management ... 31

2.5.1. Ultra fine grained power control ... 31

2.5.2. Platform Thermal Control ... 31

2.5.3. Possible solution for the Platform Thermal Control Manager [88] ... 32

3. 2.3 Overview of Core 2 based processor lines ... 32

3. The Penryn line ... 34

1. 3.1 Introduction ... 34

1.1. Penryn ... 34

2. 3.2 Key enhancements of Penryn line ... 35

2.1. 3.2.2 More advanced power management ... 36

2.1.1. Deep Power Down technology (DPD) ... 36

2.1.2. Enhanced Dynamic Acceleration Technology (EDAT) (for mobiles) ... 38

2.1.3. Overall performance achievements with Penryn (1) ... 39

3. 3.3 Overview of Penryn based processor lines ... 40

4. The Nehalem line ... 42

1. 4.1 Introduction to the 1. generation Nehalem line (Bloomfield) ... 42

1.1. Die shot of the 1. generation Nehalem desktop processor (Bloomfield) [45] ... 43

2. 4.2 Major innovations of the 1. generation Nehalem line [54] ... 44

2.1. 4.2.1 Simultaneous Multithreading (SMT) ... 45

2.1.1. Performance gains of SMT ... 46

2.2. 4.2.2 New cache architecture ... 47

2.2.1. Distinguished features of Nehalem’s cache architecture ... 47

2.3. 4.2.3 Integrated memory controller ... 48

2.3.1. Main features ... 49

2.3.2. Benefit of integrated memory controllers ... 49

2.3.3. Drawback of integrated memory controllers ... 49

2.3.4. Non Uniform Memory Access (NUMA) ... 49

2.3.5. Memory latency comparison: Nehalem vs Penryn ... 50

2.4. 4.2.4 QuickPath Interconnect bus (QPI) ... 51

2.4.1. Signals of the QuickPath Interconnect bus (QPI bus) ... 52

2.4.2. QuickPath Interconnect bus (QPI) ... 52

2.4.3. QPI based DP and MP server system architectures ... 52

2.4.4. Comparison of the transfer rates of the QPI, FSB and HT buses ... 53

2.4.5. The notion of “Uncore” ... 53

2.5. 4.2.5 Enhanced power management ... 54

2.5.1. Nehalem’s Turbo Mode ... 55

2.5.2. ACPI states [26] ... 55

2.6. 4.2.6 New socket ... 58

3. 4.3 Major innovations of the 2. generation Nehalem line (Lynnfield) (1) [46] ... 58

3.1. Major innovations of the 2. generation Nehalem line (Lynnfield) (2) [46] ... 58

3.2. Evolution of providing PCIe lanes for graphics ... 59

3.3. Evolution of the topology and type of available PCIe lanes for graphics cards ... 59

3.4. Major innovations of the 2. generation Nehalem line (Lynnfield) (3) [46] ... 60

3.5. Die photos of the 1. and 2. gen. Nehalem desktop chips ... 60

5. The Nehalem-EX line ... 62

1. 5.1 Introduction ... 62

1.1. Overview of the Nehalem-EX based processor lines (based on [44]) ... 62

2. 5.2 Major innovations of the Nehalem-EX processors ... 62

2.1. 5.2.1 Overview ... 62

2.2. 5.2.2 Native 8 cores with 24 MB L3 cache (LLC) [55] ... 62

2.2.1. Die micrograph of the 8 core Nehalem-EX (Xeon 7500/Beckton) MP server [71], [72] ... 63

2.3. 5.2.3 On-die ring interconnect bus [56] ... 63

2.4. 5.2.4 Serial memory channels [55] ... 64

2.5. 5.2.5 Scalable platform configurations [55] ... 65

3. 5.3 Performance features of the 8-core Nehalem-EX based Xeon 7500 vs the Penryn based 6-core Xeon 7400 [67] ... 66

6. The Westmere line ... 67

1. 6.1 Introduction ... 67

1.1. Westmere 2-core and 6-core die plots [57] ... 67

2. 6.2 Key enhancements of the Westmere lines vs. the Nehalem lines [44] ... 68

2.1. Overview of the Westmere lines ... 68

3. 6.3 Dual-core Westmere-based mobile/desktop lines ... 68

3.1. 6.3.1 Overview ... 68

3.2. 6.3.2 Innovations and enhancements of the dual-core mobile/desktop lines ... 69

3.2.1. 6.3.2.1 Overview ... 69

3.2.2. 6.3.2.2 In-package integrated CPU/GPU for the 2 core mobile and desktop segments ... 69

3.2.3. 6.3.2.3 Enhanced Turbo Boost technology in the mobile Arrandale line [57] ... 73

4. 6.4 The six core Westmere-based desktop line ... 73

4.1. Platform and main features of the six core Westmere-based desktop line [93] ... 74

5. 6.5 Six core Westmere-EP server lines ... 74

5.1. Native 6 cores with 12 MB L3 cache (LLC) for UP/DP servers [58] ... 74

5.2. Overview of the models of the Westmere-EP based Xeon 5600 family [94] ... 74

5.3. Example Westmere-EP DP server platform [57] ... 75

7. The Westmere-EX line ... 76

1. 7.1 Introduction ... 76

2. 7.2 Key enhancement of the Westmere-EX line vs. the Nehalem-EX server line [95] ... 76

3. 7.3 Selected details of the Westmere-EX processors ... 77

3.1. 7.3.1 Native 10 cores with 30 MB L3 cache (LLC) [60] ... 77

3.2. 7.3.2 Basic building blocks of the Westmere-EX processor (10 cores/30 MB L3 cache) (LLC) [60] ... 77

3.3. 7.3.3 Interconnection of the basic building blocks of the Westmere-EX processors [60] ... 78

8. The Sandy Bridge line ... 79

1. 8.1 Introduction ... 79

1.1. Overview of the Sandy Bridge family ... 79

1.2. Overview of the Sandy Bridge based processor lines ... 79

1.3. Main functional units of Sandy Bridge [96] ... 80

2. 8.2 Major innovations of the Sandy Bridge line vs. the 1. generation Nehalem line [61] ... 81

2.1. 8.2.1 Overview ... 81

2.2. 8.2.2 Extension of the ISA (of the cores) by the AVX instruction set ... 82

2.3. 8.2.2 Extension of the ISA (of the cores) by the AVX instruction set (Based on [18]) ... 82

2.3.1. The AVX extension includes [97]: ... 83

2.3.2. Implementation of AVX ... 84

2.3.3. Subsequent evolution of AVX [97] ... 84

2.4. 8.2.3 New microarchitecture of the cores ... 85

2.5. 8.2.4 On die ring interconnect bus [66] ... 85

2.5.1. Main features of the on-die interconnect bus [64] ... 86

2.6. 8.2.5 On die graphics unit [99] ... 86

2.6.1. Support of both media and graphics processing by the graphics unit [99] ... 87

2.6.2. Main features of the on die graphics unit [99] ... 87

2.6.3. Specification data of the HD 2000 and HD 3000 graphics [100] ... 88

2.6.4. Performance comparison of the Sandy Bridge’s graphics: gaming [101] ... 88

2.7. 8.2.6 Enhanced Turbo Boost technology [64] ... 89

2.7.1. Intelligent power sharing between the cores and the integrated graphics [64] ... 90

3. 8.3 Example for a Sandy Bridge based desktop platform with the H67 chipset [102] ... 91

4. 8.4 The E3-1200 UP server line [103] ... 92

9. The Sandy Bridge-E line ... 93

1. 9.1 Introduction ... 93

1.1. Overview of the Sandy Bridge-E based processor lines ... 93

1.2. Comparison of die parameters of recent DT processors [77] ... 93

2. 9.2 Differences to the original Sandy Bridge line ... 93

2.1. 9.2.1 Overview ... 93

2.1.1. 9.2.2 6 cores, no integrated graphics ... 94

2.1.2. 9.2.3 4 parallel memory channels instead of 2 available in the Sandy Bridge lines ... 95

2.1.3. 9.2.4 40 PCIe 2. gen. lanes to connect multiple graphics cards to the processor ... 96

2.2. 9.2.2 LGA-2011 socket instead of the LGA-1155 used in the original Sandy Bridge line ... 98

2.2.1. Main features of the Sandy Bridge-E line vs the Sandy Bridge line [77] ... 98

2.2.2. Example for a Sandy Bridge-E/X79 based 4-way SLI multi graphics card configuration ... 99

10. The Sandy Bridge-EN/EP line ... 100

1. 10.1 Introduction ... 100

1.1. Overview of the Sandy Bridge-EN/EP lines ... 100

1.2. Improvements of the microarchitecture of the Sandy Bridge-EN/EP processors [107] ... 100

1.3. Die shot of the Xeon E5-2600 [107] ... 101

1.4. The interconnection ring connecting main units of the processor [107] ... 101

2. 10.2 Main enhancements of the Sandy Bridge-EN line over the previous Westmere-EP Xeon 5600 line [108] ... 102

3. 10.3 Main enhancements of the Sandy Bridge-EP line over the Sandy Bridge-EN line [108] ... 102

3.1. Feature comparison Westmere-EP 5600, Sandy Bridge-EN (E5-2400) and Sandy Bridge-EP (E5-2600) [108] ... 102

3.2. Comparison of the dual socket (DP) Sandy Bridge-EN and Sandy Bridge-EP platforms [109] ... 103

3.3. The dual socket (DP) Xeon E5-2600 (Sandy Bridge-EP) Romley platform [110] ... 104

3.4. The quad socket (MP) Xeon E5-2600 (Sandy Bridge-EP) Romley platform [110] ... 105

4. 10.4 Main features of selected E5-EP models [111] ... 105

5. 10.5 More details on the Romley server platform ... 106

5.1. The Patsburg (C600) chipset ... 106

6. 10.6 Performance comparison Sandy Bridge-EP vs. Westmere-EP X5680 [112] ... 107

6.1. Summary assessment of the performance comparison ... 108

6.2. Historical increase of the integer performance of 2 Socket (2S) configurations [113] ... 108

7. 10.7 Intel’s Xeon E5 family server roadmap [114] ... 109

11. The Ivy Bridge line ... 110

1. 11.1 Introduction ... 110

1.1. Overview of the Ivy Bridge family-1 ... 110

1.2. Overview of the Ivy Bridge family-2 ... 110

1.3. Contrasting the Sandy Bridge and Ivy Bridge dies [81] ... 112

1.4. Main implementation parameters of recent processors [81] ... 112

1.5. Overview of the Ivy Bridge based processor lines ... 112

2. 11.2 Major innovations of Ivy Bridge [80] ... 113

2.1. 11.2.1 Overview ... 113

2.2. 11.2.2 The 22 nm tri-gate process technology within Intel’s technology roadmap [82] ... 114

2.2.1. The traditional planar transistor [82] ... 114

2.2.2. The 22 nm Tri-Gate transistor-1 [82] ... 115

2.2.3. The 22 nm Tri-Gate transistor-2 [82] ... 116

2.2.4. Transistor characteristics [82] ... 117

2.2.5. Transistor gate delay [82] ... 118

2.2.6. Intel’s 22 nm manufacturing fabs [82] ... 119

2.2.7. Ivy Bridge chips on a 300 mm wafer [82] ... 120

2.3. 11.2.3 Supervisory Mode Execute Protection [83] ... 121

2.4. 11.2.4 Next generation processor graphics and media [81] ... 121

2.4.1. Overview of video interfaces of computing devices to external displays ... 122

3. 11.3 Main features of Ivy Bridge-based first introduced processors ... 122

3.1. 11.3.1 Main features of the first introduced Ivy Bridge-based desktop models [116] ... 122

3.2. 11.3.2 Main features of the first introduced Ivy Bridge-based mobile models [116] ... 123

4. 11.4 Ivy Bridge-based desktop platform [81] ... 123

5. 11.5 Performance assessment of the desktop models ... 124

5.1. 11.5.1 CPU performance of the highest clocked Ivy Bridge model Core i7-3770K [81] (Higher is better) ... 124

5.2. 11.5.2 Relative GPU performance (with games DX9/DX10/DX11) of the highest performance Ivy Bridge DT model Core i7-3770K (Resolutions 1440x900, 1680x1050) [81] (Higher is better) ... 125

5.2.1. Increasing performance of Intel’s integrated graphics [117] ... 125

6. 11.6 Main features of first introduced Ivy Bridge-based Xeon E3-12xx v2 models [118] ... 126

7. 11.7 The Ivy Bridge-based Xeon E3-1200 v2 platform (called the Bromolow refresh server platform) [119] ... 126

12. The Ivy Bridge-E line ... 128

1. 12.1 Introduction ... 128

1.1. Overview of the Ivy Bridge-E based processor lines ... 128

2. 12.2 Differences to the previous Sandy Bridge-E line [132] ... 128

2.1. Overview of providing PCIe lanes on Intel desktop processors ... 128

2.2. Die plot of an Ivy Bridge-E processor [133] ... 129

2.3. Main features of Ivy Bridge-E models [131] ... 129

3. 12.3 Example for an Ivy Bridge-E based desktop platform with the X79 chipset [134] ... 130

4. 12.4 Performance increase achieved by the Ivy Bridge-E line vs. the previous Sandy Bridge-E line [135] ... 130

13. The Ivy Bridge-EN/EP lines ... 132

1. 13.1 Introduction ... 132

1.1. Overview of the Ivy Bridge-EN/EP lines ... 132

1.2. Die layouts [137] ... 132

1.3. Die shot of the ten-core Ivy Bridge-EP processor [138] ... 133

2. 13.2 Main enhancements of the Ivy Bridge-EP-based Xeon E5-2600 v2 line vs. the Sandy Bridge-EP-based Xeon E5-2600 line [138] ... 134

2.1. Comparison of main features of the Ivy Bridge-EP-based Xeon E5-2600 v2 line vs. the Sandy Bridge-EP-based Xeon E5-2600 line [138] ... 134

3. 13.3 Main features of specific models of the Xeon E5-2600 v2 series [139] ... 135

4. 13.4 Main features of specific models of the Xeon E5-1600 v2 series [140] ... 135

5. 13.5 The Romley server platform [138] ... 136

5.1. Intel Xeon E5 family server roadmap [136] ... 136

14. The Ivy Bridge-EX line ... 138

1. 14.1 Introduction ... 138

1.1. Ivy Bridge-EX ... 138
