Hardware acceleration of 3D TLM Method with FPGA

(1)

Hardware acceleration of 3D TLM Method with FPGA

TÁMOP-4.2.2/B-10/1/2012-0014

PhD seminar, Budapest, 9 November 2012, László Füredi

(Supervisor: Péter Szolgay)

(2)

Overview

• Motivations, FPGAs
• Calculation of 3D TLM method
• Discretization of equations
• Implementation on FPGA
• Performance comparison
• Conclusions
• Future plans

(3)

Motivations



• High-performance computing in a multi-processor environment: insufficient memory bandwidth
• Frequency-dependent parallel transmission lines
• High latency
• Problem: the design of frequency-dependent transmission lines is not yet solved
• Solving the transmission-line equations requires computation-intensive calculation

(4)

Parallel transmission lines

(5)

Parallel transmission lines (many processors)

(6)

FPGA (Field-Programmable Gate Array)

• Configurable Logic Block (CLB)
  - Look-up table (LUT)
  - Register
  - Logic circuit
  - Adder
  - Multiplier
  - Memory
  - Microprocessor
• Input/Output Block (IOB)
• Programmable interconnect

(7)

DSP block



• DSP48E slice
• 25 × 18 two's-complement multiplier
• Optional adder, subtracter, and accumulator
• Optional bitwise logic functions and pipelining
• Dedicated cascade connections
• Integrated adder for complex-multiply or multiply-add operations
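The multiply-add behaviour listed above can be sketched in software. This is a minimal illustration, not the slice's actual logic: the function names `to_signed` and `dsp_mac` are hypothetical, and the 48-bit accumulator width is an assumption not stated on the slide.

```python
def to_signed(value, bits):
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def dsp_mac(a, b, acc=0, a_bits=25, b_bits=18, acc_bits=48):
    """Multiply a 25-bit by an 18-bit signed operand and add to an accumulator.

    The operand widths follow the slide; acc_bits=48 is an assumption.
    """
    a = to_signed(a, a_bits)
    b = to_signed(b, b_bits)
    return to_signed(acc + a * b, acc_bits)


# Accumulate a short dot product, one multiply-add per step.
acc = 0
for a, b in [(3, 4), (-5, 6), (7, -2)]:
    acc = dsp_mac(a, b, acc)
print(acc)
```

Operands that do not fit in 25 or 18 bits wrap around, which is why wider floating-point operators need several slices.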

(8)

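Electromagnetic field calculation (Maxwell's equations)

\[
\begin{aligned}
\varepsilon_0 \varepsilon_x \frac{\partial e_x}{\partial t} + \sigma_{ex}\, e_x &= \frac{\partial h_z}{\partial y} - \frac{\partial h_y}{\partial z} \\
\varepsilon_0 \varepsilon_y \frac{\partial e_y}{\partial t} + \sigma_{ey}\, e_y &= \frac{\partial h_x}{\partial z} - \frac{\partial h_z}{\partial x} \\
\varepsilon_0 \varepsilon_z \frac{\partial e_z}{\partial t} + \sigma_{ez}\, e_z &= \frac{\partial h_y}{\partial x} - \frac{\partial h_x}{\partial y} \\
\mu_0 \mu_x \frac{\partial h_x}{\partial t} + \sigma_{mx}\, h_x &= \frac{\partial e_y}{\partial z} - \frac{\partial e_z}{\partial y} \\
\mu_0 \mu_y \frac{\partial h_y}{\partial t} + \sigma_{my}\, h_y &= \frac{\partial e_z}{\partial x} - \frac{\partial e_x}{\partial z} \\
\mu_0 \mu_z \frac{\partial h_z}{\partial t} + \sigma_{mz}\, h_z &= \frac{\partial e_x}{\partial y} - \frac{\partial e_y}{\partial x}
\end{aligned}
\]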

(9)

TLM Method



3D space-volume divided into nodes



Each node is a 12-port transmission-line junction



Scattering at the nodes models coupling between E and H fields



Transient E and H fields are calculated from combinations of voltages and currents on the transmission lines



Spectrum found by FFT

\[
E_y = \frac{1}{2}\,\frac{V_3^{i} + V_4^{i} + V_7^{i} + V_8^{i}}{\Delta y}
\]

(10)

Calculation of 3D TLM method

• Fifteen linear equations
• Eight coefficients
• Seven input voltages
• One current source
• SCN (symmetrical condensed node)
• HSCN (hybrid symmetrical condensed node)
• GSCN (general symmetrical condensed node)
• HSCN is ideal for parallel computing
• Numbering is organized in a sequence of pairs

\[
V^{i}_{n+1} = V^{r}_{n}\,\Big|_{\text{adjacent node}}
\]
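The scatter and connect steps can be illustrated on a much simpler case. The sketch below is a lossless 1D line of two-port junctions, not the 12-port 3D node of the talk; the function names and the absorbing ends are assumptions made for the illustration. In `connect`, a pulse reflected from a node at step n becomes the incident pulse on the adjacent node at step n+1.

```python
def scatter(incident):
    """Scatter at each matched two-port junction.

    A pulse entering through one port leaves through the other, so the
    (left, right) incident pair becomes the (left, right) reflected pair
    with its entries swapped.
    """
    return [(v_right, v_left) for (v_left, v_right) in incident]


def connect(reflected):
    """Hand reflected pulses to the adjacent nodes as next-step incident pulses."""
    n = len(reflected)
    incident = [[0.0, 0.0] for _ in range(n)]
    for k, (out_left, out_right) in enumerate(reflected):
        if k > 0:
            incident[k - 1][1] = out_left    # arrives at the right port of node k-1
        if k < n - 1:
            incident[k + 1][0] = out_right   # arrives at the left port of node k+1
    # Pulses leaving the two ends are dropped (absorbing ends, an assumption).
    return [tuple(pair) for pair in incident]


def run(incident, steps):
    """Apply scatter followed by connect for the given number of time steps."""
    for _ in range(steps):
        incident = connect(scatter(incident))
    return incident


# A unit pulse enters the left port of node 0 and travels one node per step.
line = [(1.0, 0.0)] + [(0.0, 0.0)] * 4
print(run(line, 3))
```

Each node only needs its own incident voltages for scattering and its neighbours' reflected voltages for connecting, which is what makes the method suited to parallel hardware.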

(11)

Calculation of 3D TLM method II.

(12)

The scattering matrix (GSCN)

Column   1y  2y  3z  4z  5x  6x  7z  8z  9x  10x  11y  12y  13  14  15  16  17  18
Y_l      x   x   x   x   y   y   y   y   z   z    z    z
Y_t      z   z   y   y   z   z   x   x   y   y    x    x
Y_s      y   y   z   z   x   x   z   z   x   x    y    y    x   y   z   x   y   z

Capacitors: columns 13-15. Sources: columns 16-18.

R_t  G_s  Row   Nonzero entries, in column order
y    x    9x    dzx   -dzx  bzx   bzx   azx   czx   g  k
y    x    10x   -dzx  dzx   bzx   bzx   czx   azx   g  k
     x    13x   b     b     b     b     h     k
z    x    5x    dyx   -dyx  ayx   cyx   byx   byx   g  k
z    x    6x    -dyx  dyx   cyx   ayx   byx   byx   g  k
z    y    1y    axy   cxy   dxy   -dxy  bxy   bxy   g  k
z    y    2y    cxy   axy   -dxy  dxy   bxy   bxy   g  k
     y    14y   b     b     b     b     h     k
x    y    11y   bzy   bzy   dzy   -dzy  azy   czy   g  k
x    y    12y   bzy   bzy   -dzy  dzy   czy   azy   g  k
y    z    3z    axz   cxz   bxz   bxz   dxz   -dxz  g  k
y    z    4z    cxz   axz   bxz   bxz   -dxz  dxz   g  k
     z    15z   b     b     b     b     h     k
x    z    7z    byz   byz   ayz   cyz   dyz   -dyz  g  k
x    z    8z    byz   byz   cyz   ayz   -dyz  dyz   g  k

(13)

Discretization of equations

[Equations: the scattering coefficients a_pp, b_pq, c_pp, d_pq, g_pq, h_pq and k_pq expressed through the admittances Y_s, Y_l, Y_t and the loss terms G_s and R_t, together with the source voltages V_16, V_17, V_18 and the definitions of the loss and line parameters.]

R_t: magnetic losses; G_s: electric losses; C_pq: capacitance of the lines; L_pq: inductance of the lines; C_Op: capacitance of the open-line stubs.

(14)

Implementation on FPGA

(15)

Implementation of the c_pp equation

[Equation: c_pp expressed through Y_s, G_s, Y_l, Y_t and R_t.]

(16)

Parts of the equations are reusable

[Equations: the same coefficient equations and symbol legend as on slide (13).]

(17)

Calculations of the equations I.

(18)

Calculations of the equations II.

(19)

The scattering matrix (HSCN)

Column   1  2  3  4  5  6  7  8  9  10  11  12  13  14  15
Ymα      y  z  x  z  y  x  y  z  z  x   x   y   y   x   x
Ymβ      z  y  z  x  x  y  x  x  y  y   z   z   z   z   y
Ysγ      x  x  y  y  z  z  z  y  x  z   y   x   x   y   z

Row   Nonzero entries, in column order
1     a   b   d   b   -d  c   g
2     b   a   d   c   -d  b   g
3     d   a   b   b   c   -d  g
4     b   a   d   -d  c   b   g
5     d   a   b   c   -d  b   g
6     d   b   a   b   -d  c   g
7     -d  c   b   a   d   b   g
8     b   c   -d  d   a   b   g
9     b   c   -d  a   d   b   g
10    -d  b   c   b   d   a   g
11    -d  c   b   b   a   d   g
12    c   b   -d  b   d   a   g
13    b   b   b   b   h
14    b   b   b   b   h
15    b   b   b   b   h

(20)

Required resources

Resources per operator ((D): double precision, (S): single precision)

           Multiplier  Adder  Divider  Subtracter  Change sign  Division by 2
LUT (D)           279    813      279         813           15             24
FF (D)            413    719      413         719            0             64
DSP (D)            11      3       11           3            0              0
LUT (S)           103    281      103         281           15             24
FF (S)            111    302      111         302            0             64
DSP (S)             3      2        3           2            0              0

Number of operators per stage

                Multiplier  Adder  Divider  Subtracter  Change sign  Division by 2
Input                    3      0        0           0            0              0
S matrix                12     75       21           9           21             48
Matrix multip.          48     81        0           0            0              0
Output                  21     24        0           9            3              6
Sum                     84    180       21          18           24             54

Calculated with Virtex-5 SX240T and Virtex-6 SX475T
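As a quick consistency check, the "Sum" row can be recomputed from the four stage rows. The sketch below only re-adds the operator counts given on the slide; the dictionary keys and the function name `total_operators` are hypothetical.

```python
OPERATORS = ["multiplier", "adder", "divider", "subtracter", "change_sign", "div_by_2"]

# Operator counts per stage, copied from the slide.
STAGE_COUNTS = {
    "input":           [3, 0, 0, 0, 0, 0],
    "s_matrix":        [12, 75, 21, 9, 21, 48],
    "matrix_multiply": [48, 81, 0, 0, 0, 0],
    "output":          [21, 24, 0, 9, 3, 6],
}


def total_operators(stage_counts):
    """Add the per-stage counts column by column, one total per operator type."""
    return [sum(column) for column in zip(*stage_counts.values())]


totals = total_operators(STAGE_COUNTS)
for name, count in zip(OPERATORS, totals):
    print(f"{name:12s} {count}")
```

The totals reproduce the "Sum" row: 84 multipliers, 180 adders, 21 dividers, 18 subtracters, 24 sign changes and 54 divisions by 2.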

(21)

Required resources II.

Sum (D)                 Multiplier   Adder  Divider  Subtracter  Change sign  Division by 2     SUM  Virtex-5  Virtex-6  Kintex-7  Virtex-7
LUT                          30912  195660     7728       19566          360           1296  255522    149760    297600    254200    712000
FF                           35532  170280     8883       17028            0           3456  235179    149760    595200    508400   1424000
DSP                            840     540      210          54            0              0    1644      1056      2016      1540      3360
BRAM                             0       0        0           0            0              0      49  11664000  38304000  28620000  67680000
Number of cached cells                                                                            0    238041    781714    584082   1381224

Sum (S)                 Multiplier   Adder  Divider  Subtracter  Change sign  Division by 2     SUM  Virtex-5  Virtex-6  Kintex-7  Virtex-7
LUT                           8316   44100     2079        4410          360           1296   60561    149760    297600    254200    712000
FF                            9576   60660     2394        6066            0           3456   82152    149760    595200    508400   1424000
DSP                            252     360       63          36            0              0     711      1056      2016      1540      3360
BRAM                             0       0        0           0            0              0      25  11664000  38304000  28620000  67680000
Number of cached cells                                                                            0    466560   1532160   1144800   2707200
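The SUM column can be compared against each device's capacity, as in the sketch below. It rests on one reading of the slide: the "Number of cached cells" row appears to be the device BRAM figure divided by the BRAM entry in the SUM column (49 for double, 25 for single precision), rounded to the nearest integer. That reading, the key `BRAM_per_cell`, and the function names are assumptions.

```python
# Totals from the SUM column; BRAM_per_cell is the BRAM entry of that column.
REQUIRED = {
    "double": {"LUT": 255522, "FF": 235179, "DSP": 1644, "BRAM_per_cell": 49},
    "single": {"LUT": 60561, "FF": 82152, "DSP": 711, "BRAM_per_cell": 25},
}

# Device capacities as listed on the slide.
DEVICES = {
    "Virtex-5": {"LUT": 149760, "FF": 149760, "DSP": 1056, "BRAM": 11664000},
    "Virtex-6": {"LUT": 297600, "FF": 595200, "DSP": 2016, "BRAM": 38304000},
    "Kintex-7": {"LUT": 254200, "FF": 508400, "DSP": 1540, "BRAM": 28620000},
    "Virtex-7": {"LUT": 712000, "FF": 1424000, "DSP": 3360, "BRAM": 67680000},
}


def fits(required, device):
    """True if the LUT, FF and DSP totals all fit within the device."""
    return all(required[res] <= device[res] for res in ("LUT", "FF", "DSP"))


def cached_cells(required, device):
    """Cells that fit in block RAM under the per-cell assumption above."""
    return round(device["BRAM"] / required["BRAM_per_cell"])


for precision, required in REQUIRED.items():
    for name, device in DEVICES.items():
        print(precision, name, fits(required, device), cached_cells(required, device))
```

Under these figures the single-precision design fits on all four devices, while the double-precision design fits only on the Virtex-6 and Virtex-7 entries, with LUTs as the limiting resource elsewhere.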

(22)

Required resources (single precision)

(23)

Required resources (double precision)

(24)

Performance comparison

Implementations

                               Intel Q9300           Intel Xeon X5550      Nvidia GTX480    XC6VSX475T
Implementation type            Software (Intel IPP)  Software (Intel IPP)  Software (CUDA)  FPGA
Technology (nm)                45                    45                    65               40
Clock frequency (MHz)          2500                  2666 (3060)           1400             300
Number of processing elements  4 cores               4 cores               480 CUDA cores   1 PE
Power consumption (W)          95                    95                    450              ~50
Million cell iterations/s      10.5                  18.5                  180              472
Speedup                        1                     1.76                  17.11            44.95
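The speedup row is the throughput of each platform divided by that of the Intel Q9300 baseline. A minimal sketch of that arithmetic follows, using the rounded throughput values from the table; the function name `speedups` is hypothetical.

```python
# Throughput in million cell iterations per second, from the table.
THROUGHPUT = {
    "Intel Q9300": 10.5,
    "Intel Xeon X5550": 18.5,
    "Nvidia GTX480": 180.0,
    "XC6VSX475T": 472.0,
}


def speedups(throughput, baseline):
    """Return each platform's throughput relative to the baseline platform."""
    base = throughput[baseline]
    return {name: round(value / base, 2) for name, value in throughput.items()}


for name, factor in speedups(THROUGHPUT, "Intel Q9300").items():
    print(f"{name:18s} {factor}")
```

Recomputing from the rounded throughputs gives 17.14 for the GTX480 instead of the listed 17.11; the difference presumably comes from rounding of the throughput values shown in the table.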

(25)

Conclusion



The solution was optimized according to the special requirements of FPGA architectures for "one cycle"
