Hardware acceleration of 3D TLM Method with FPGA


(1)


TÁMOP-4.2.2/B-10/1/2012-0014

PhD seminar, Budapest, 9 November 2012. László Füredi

(Supervisor: Péter Szolgay)

(2)

Overview

- Motivations, FPGAs
- Calculation of 3D TLM method
- Discretization of equations
- Implementation on FPGA
- Performance comparison
- Conclusions
- Future plans

(3)

Motivations

High-performance computing: multi-processor environments, insufficient memory bandwidth

Frequency-dependent parallel transmission lines

High latency

Problem: the design of frequency-dependent transmission lines is not yet solved

Solving the transmission-line equations requires computation-intensive calculations

(4)

Parallel transmission lines

(5)

Parallel transmission lines (many processors)

(6)

FPGA (Field Programmable Gate Arrays)

- Configurable Logic Block (CLB)
  - Look-up table (LUT)
  - Register
  - Logic circuit
  - Adder
  - Multiplier
  - Memory
  - Microprocessor
- Input/Output Block (IOB)
- Programmable interconnect
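A k-input LUT stores the truth table of an arbitrary k-input Boolean function, and its inputs act as the read address. The Python sketch below models only that behaviour. The class name and the 3- and 6-input examples are illustrative assumptions; Virtex-5 and later families provide 6-input LUTs, but the slides do not say which LUT size the design relies on.

```python
from typing import Callable, Sequence


class LUT:
    """Behavioural model of a k-input look-up table (a truth table held in memory)."""

    def __init__(self, k: int, func: Callable[..., int]) -> None:
        self.k = k
        # "Configure" the LUT: store func's output bit for every input combination.
        # Bit i of the address is passed to func as its i-th argument.
        self.table = [
            func(*[(addr >> bit) & 1 for bit in range(k)]) & 1
            for addr in range(2 ** k)
        ]

    def __call__(self, inputs: Sequence[int]) -> int:
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs, got {len(inputs)}")
        # The inputs form the memory address; input i drives address bit i.
        addr = sum((b & 1) << i for i, b in enumerate(inputs))
        return self.table[addr]


# Example: a 3-input majority function and a 6-input parity function.
majority = LUT(3, lambda a, b, c: (a & b) | (a & c) | (b & c))
parity6 = LUT(6, lambda *bits: sum(bits) % 2)

print(majority([1, 0, 1]))          # 1
print(parity6([1, 1, 0, 1, 0, 0]))  # 1
```

Any function of k inputs costs the same single LUT, which is why logic depth rather than logic complexity tends to set the clock rate of an FPGA datapath.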

(7)

DSP block

DSP48E slice

25 × 18 two's-complement multiplication

Optional adder, subtracter and accumulator

Optional bitwise logic functions, pipelining

Dedicated cascade connections

Integrated adder for complex-multiply or multiply-add operations
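The multiply-accumulate mode listed above can be modelled behaviourally as P ← P + A·B, with A a 25-bit and B an 18-bit two's-complement operand. The sketch below assumes a 48-bit accumulator (the P-port width of the Virtex-5 DSP48E) that wraps on overflow. It ignores the pipeline registers, cascade paths and other opmodes, and the function names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def check_range(value: int, bits: int, name: str) -> None:
    """Reject operands that do not fit the given two's-complement port width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{name}={value} does not fit in {bits}-bit two's complement")


def dsp_mac(pairs, a_bits: int = 25, b_bits: int = 18, p_bits: int = 48) -> int:
    """Accumulate sum(a*b) as a 25x18 multiplier feeding a 48-bit accumulator would."""
    p = 0
    for a, b in pairs:
        check_range(a, a_bits, "A")
        check_range(b, b_bits, "B")
        product = a * b                      # a 25x18 signed product needs at most 43 bits
        p = to_signed(p + product, p_bits)   # the accumulator wraps at 48 bits
    return p


# Dot product of three coefficient/voltage pairs given as integer fixed-point values.
print(dsp_mac([(1000, -3), (-2000, 7), (5, 5)]))  # -3000 - 14000 + 25 = -16975
```

Because every term of a TLM scattering equation is a coefficient times a voltage, chains of such multiply-add units (linked through the dedicated cascade connections) map naturally onto the node update.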

(8)

Electromagnetic field calculation (Maxwell's equations)

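$$
\begin{aligned}
\varepsilon_0\varepsilon_x\frac{\partial e_x}{\partial t} &= \frac{\partial h_z}{\partial y} - \frac{\partial h_y}{\partial z} - \sigma_{ex}e_x\\
\varepsilon_0\varepsilon_y\frac{\partial e_y}{\partial t} &= \frac{\partial h_x}{\partial z} - \frac{\partial h_z}{\partial x} - \sigma_{ey}e_y\\
\varepsilon_0\varepsilon_z\frac{\partial e_z}{\partial t} &= \frac{\partial h_y}{\partial x} - \frac{\partial h_x}{\partial y} - \sigma_{ez}e_z\\
\mu_0\mu_x\frac{\partial h_x}{\partial t} &= \frac{\partial e_y}{\partial z} - \frac{\partial e_z}{\partial y} - \sigma_{mx}h_x\\
\mu_0\mu_y\frac{\partial h_y}{\partial t} &= \frac{\partial e_z}{\partial x} - \frac{\partial e_x}{\partial z} - \sigma_{my}h_y\\
\mu_0\mu_z\frac{\partial h_z}{\partial t} &= \frac{\partial e_x}{\partial y} - \frac{\partial e_y}{\partial x} - \sigma_{mz}h_z
\end{aligned}
$$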

(9)

TLM Method

The 3D volume is divided into nodes

Each node is a 12-port transmission-line junction

Scattering at the nodes models the coupling between the E and H fields

The transient E and H fields are calculated from combinations of the voltages and currents on the transmission lines

The spectrum is obtained by FFT

$$E_y = \frac{1}{2}\,\frac{V_3^i + V_4^i + V_7^i + V_8^i}{\Delta y}$$
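The last two bullets (fields from line voltages, spectrum by FFT) can be illustrated together. The sketch below assumes the relation E_y = (V3 + V4 + V7 + V8)/(2Δy) as read from the slide (sign conventions differ between SCN formulations), builds an E_y time series from a synthetic set of incident voltages, and takes its discrete Fourier transform. A plain O(N²) DFT stands in for the FFT so the example needs no libraries; the input signal and Δy are made-up values.

```python
import cmath
import math


def e_y(v3: float, v4: float, v7: float, v8: float, dy: float) -> float:
    """E_y at a node from its four incident voltages: (V3 + V4 + V7 + V8) / (2 * dy)."""
    return (v3 + v4 + v7 + v8) / (2.0 * dy)


def dft_magnitude(samples):
    """|X[k]| for k = 0..N/2 via a naive DFT (an FFT yields the same values faster)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


# Synthetic run: every incident voltage oscillates with 5 cycles per 64 time steps.
N, DY = 64, 1e-3
series = []
for step in range(N):
    v = 0.25 * math.cos(2 * math.pi * 5 * step / N)
    series.append(e_y(v, v, v, v, DY))

spectrum = dft_magnitude(series)
peak_bin = max(range(len(spectrum)), key=spectrum.__getitem__)
print(peak_bin)  # 5
```

In a real simulation the series would be sampled at an observation node over many time steps, and the bin index would be converted to frequency using the TLM time step.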

(10)

Calculation of 3D TLM method

- Fifteen linear equations
- Eight coefficients
- Seven input voltages
- One current source
- SCN (symmetrical condensed node)
- HSCN (hybrid symmetrical condensed node)
- GSCN (general symmetrical condensed node)
- HSCN is ideal for parallel computing
- Port numbering is organized in sequential pairs

$$V^{i}_{n+1} = V^{r}_{n}\Big|_{\text{adjacent node}}$$
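The connection rule (the voltage incident on a port at step n+1 is the voltage reflected at step n from the facing port of the adjacent node) is easiest to see in one dimension. The sketch below is a hypothetical 1D lossless TLM line with two ports per node and equal line impedances, not the 3D HSCN/GSCN of these slides. With matched lines the scattering step passes each pulse straight through the node, and the connection step hands it to the neighbour, so a pulse advances one node per time step.

```python
def tlm_1d(num_nodes: int, steps: int, source_node: int):
    """Hypothetical 1D lossless TLM line: scatter at every node, then connect."""
    left = [0.0] * num_nodes    # incident voltage on each node's left port
    right = [0.0] * num_nodes   # incident voltage on each node's right port
    left[source_node] = 1.0     # inject a unit pulse travelling to the right
    history = []
    for _ in range(steps):
        # Scatter: node voltage V = V_L + V_R; reflected V_p = V - V_p (equal lines).
        node_v = [l + r for l, r in zip(left, right)]
        refl_left = [v - l for v, l in zip(node_v, left)]
        refl_right = [v - r for v, r in zip(node_v, right)]
        history.append(node_v)
        # Connect: V^i_{n+1} on a port = V^r_n of the adjacent node's facing port.
        # Both ends are matched, so nothing re-enters the line from outside.
        left = [0.0] + refl_right[:-1]
        right = refl_left[1:] + [0.0]
    return history


for row in tlm_1d(num_nodes=6, steps=4, source_node=1):
    print(row)
```

The connection step is pure data movement with no arithmetic, so in hardware its cost is set by memory bandwidth rather than by multipliers.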

(11)

Calculation of 3D TLM method II.

(12)

The scattering matrix (GSCN)

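Columns: ports 1y, 2y, 3z, 4z, 5x, 6x, 7z, 8z, 9x, 10x, 11y, 12y, then 13, 14, 15 (capacitors) and 16, 17, 18 (sources)

Yl subscripts (ports 1y to 12y): x x x x y y y y z z z z

Yt subscripts (ports 1y to 12y): z z y y z z x x y y x x

Ys subscripts (ports 1y to 18): y y z z x x z z x x y y x y z x y z

Rows (Rt and Gs subscripts, then the non-zero entries in column order):

9x (Rt y, Gs x): dzx, -dzx, bzx, bzx, azx, czx, g, k

10x (Rt y, Gs x): -dzx, dzx, bzx, bzx, czx, azx, g, k

13x (Gs x): b, b, b, b, h, k

5x (Rt z, Gs x): dyx, -dyx, ayx, cyx, byx, byx, g, k

6x (Rt z, Gs x): -dyx, dyx, cyx, ayx, byx, byx, g, k

1y (Rt z, Gs y): axy, cxy, dxy, -dxy, bxy, bxy, g, k

2y (Rt z, Gs y): cxy, axy, -dxy, dxy, bxy, bxy, g, k

14y (Gs y): b, b, b, b, h, k

11y (Rt x, Gs y): bzy, bzy, dzy, -dzy, azy, czy, g, k

12y (Rt x, Gs y): bzy, bzy, -dzy, dzy, czy, azy, g, k

3z (Rt y, Gs z): axz, cxz, bxz, bxz, dxz, -dxz, g, k

4z (Rt y, Gs z): cxz, axz, bxz, bxz, -dxz, dxz, g, k

15z (Gs z): b, b, b, b, h, k

7z (Rt x, Gs z): byz, byz, ayz, cyz, dyz, -dyz, g, k

8z (Rt x, Gs z): byz, byz, cyz, ayz, -dyz, dyz, g, k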

(13)

Discretization of equations

[Equations: scattering coefficients a, b, c, d, g, h and k; source voltages V_16, V_17, V_18 (from j_x, j_y, j_z and Z_0)]

R_t: magnetic losses; G_s: electric losses; C_pq: capacitance of the lines; L_pq: inductance of the lines; C_Op: capacitance of the open line stubs.

(14)

Implementation on FPGA

(15)

Implementation of the c_pp equation

[Equation: scattering coefficient c]

(16)

Parts of the equations are reusable


(17)

Calculations of the equations I.

(18)

Calculations of the equations II.

(19)

The scattering matrix (HSCN)

Columns: ports 1 to 15

Ymα subscripts: y z x z y x y z z x x y y x x

Ymβ subscripts: z y z x x y x x y y z z z z y

Ysγ subscripts: x x y y z z z y x z y x x y z

Rows (non-zero entries in column order):

1: a, b, d, b, -d, c, g

2: b, a, d, c, -d, b, g

3: d, a, b, b, c, -d, g

4: b, a, d, -d, c, b, g

5: d, a, b, c, -d, b, g

6: d, b, a, b, -d, c, g

7: -d, c, b, a, d, b, g

8: b, c, -d, d, a, b, g

9: b, c, -d, a, d, b, g

10: -d, b, c, b, d, a, g

11: -d, c, b, b, a, d, g

12: c, b, -d, b, d, a, g

13: b, b, b, b, h

14: b, b, b, b, h

15: b, b, b, b, h
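Each of rows 1 to 12 above has seven non-zero entries, and some coefficients (b, and d with both signs) occur twice in the same row. The sketch below evaluates row 1 (a, b, d, b, -d, c, g) as a multiply-add over the seven incident voltages selected by its non-zero columns, once with one multiplication per entry and once with the voltages that share a coefficient added first, and counts the multiplications. The coefficient values, the input voltages and the factoring strategy are illustrative assumptions; the slides do not state how the rows were mapped to hardware.

```python
from collections import defaultdict

# Non-zero entries of HSCN row 1 in column order: a b d b -d c g.
# Each term is (coefficient name, sign).
ROW1 = [("a", 1), ("b", 1), ("d", 1), ("b", 1), ("d", -1), ("c", 1), ("g", 1)]


def row_direct(coeffs, row, v_in):
    """One multiplication per non-zero entry (the sign is just a negation)."""
    total, mults = 0.0, 0
    for (name, sign), v in zip(row, v_in):
        total += sign * coeffs[name] * v
        mults += 1
    return total, mults


def row_factored(coeffs, row, v_in):
    """Add/subtract voltages sharing a coefficient first, then multiply once per coefficient."""
    grouped = defaultdict(float)
    for (name, sign), v in zip(row, v_in):
        grouped[name] += sign * v
    total = sum(coeffs[name] * s for name, s in grouped.items())
    return total, len(grouped)


# Placeholder coefficient values and incident voltages (not from the slides).
coeffs = {"a": -0.1, "b": 0.4, "c": 0.05, "d": 0.5, "g": 0.3}
v_in = [1.0, 2.0, -1.0, 0.5, 3.0, 0.25, 1.5]
print(row_direct(coeffs, ROW1, v_in))
print(row_factored(coeffs, ROW1, v_in))
```

Both versions give the same result, but the factored one needs five multiplications instead of seven, trading two multipliers for adders and subtracters, which are cheaper in DSP slices.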

(20)

Required resources

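Cost per operator instance (D: double precision, S: single precision):

| | Multiplier | Adder | Divisor | Subtracter | Change sign | Division by 2 |
|---|---|---|---|---|---|---|
| LUT (D) | 279 | 813 | 279 | 813 | 15 | 24 |
| FF (D) | 413 | 719 | 413 | 719 | 0 | 64 |
| DSP (D) | 11 | 3 | 11 | 3 | 0 | 0 |
| LUT (S) | 103 | 281 | 103 | 281 | 15 | 24 |
| FF (S) | 111 | 302 | 111 | 302 | 0 | 64 |
| DSP (S) | 3 | 2 | 3 | 2 | 0 | 0 |

Number of operators per stage:

| | Multiplier | Adder | Divisor | Subtracter | Change sign | Division by 2 |
|---|---|---|---|---|---|---|
| Input | 3 | 0 | 0 | 0 | 0 | 0 |
| S matrix | 12 | 75 | 21 | 9 | 21 | 48 |
| Matrix multiplication | 48 | 81 | 0 | 0 | 0 | 0 |
| Output | 21 | 24 | 0 | 9 | 3 | 6 |
| Sum | 84 | 180 | 21 | 18 | 24 | 54 |

Calculated with the Virtex-5 SX240T and Virtex-6 SX475T.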
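The Sum row is the per-stage operator counts added column by column. The short check below re-derives it from the table and, as an example of combining the two halves of the table, multiplies the totals by the single-precision DSP cost per operator. The dictionary layout is only a convenient encoding of the table.

```python
OPERATORS = ["Multiplier", "Adder", "Divisor", "Subtracter", "Change sign", "Division by 2"]

# Operator counts per processing stage (rows of the table above).
STAGES = {
    "Input":                 [3, 0, 0, 0, 0, 0],
    "S matrix":              [12, 75, 21, 9, 21, 48],
    "Matrix multiplication": [48, 81, 0, 0, 0, 0],
    "Output":                [21, 24, 0, 9, 3, 6],
}

# DSP slices per operator instance, single precision (the DSP (S) row).
DSP_SINGLE = [3, 2, 3, 2, 0, 0]

# Column-wise sum over the stages gives the Sum row.
totals = [sum(column) for column in zip(*STAGES.values())]
dsp_single = sum(n * cost for n, cost in zip(totals, DSP_SINGLE))

print(dict(zip(OPERATORS, totals)))
print(dsp_single)  # 711
```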

(21)

Required resources II.

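Sum (D):

| | Multiplier | Adder | Divisor | Subtracter | Change sign | Division by 2 | SUM | Virtex-5 | Virtex-6 | Kintex-7 | Virtex-7 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LUT | 30912 | 195660 | 7728 | 19566 | 360 | 1296 | 255522 | 149760 | 297600 | 254200 | 712000 |
| FF | 35532 | 170280 | 8883 | 17028 | 0 | 3456 | 235179 | 149760 | 595200 | 508400 | 1424000 |
| DSP | 840 | 540 | 210 | 54 | 0 | 0 | 1644 | 1056 | 2016 | 1540 | 3360 |
| BRAM | 0 | 0 | 0 | 0 | 0 | 0 | 49 | 11664000 | 38304000 | 28620000 | 67680000 |
| Number of cached cells | | | | | | | 0 | 238041 | 781714 | 584082 | 1381224 |

Sum (S):

| | Multiplier | Adder | Divisor | Subtracter | Change sign | Division by 2 | SUM | Virtex-5 | Virtex-6 | Kintex-7 | Virtex-7 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LUT | 8316 | 44100 | 2079 | 4410 | 360 | 1296 | 60561 | 149760 | 297600 | 254200 | 712000 |
| FF | 9576 | 60660 | 2394 | 6066 | 0 | 3456 | 82152 | 149760 | 595200 | 508400 | 1424000 |
| DSP | 252 | 360 | 63 | 36 | 0 | 0 | 711 | 1056 | 2016 | 1540 | 3360 |
| BRAM | 0 | 0 | 0 | 0 | 0 | 0 | 25 | 11664000 | 38304000 | 28620000 | 67680000 |
| Number of cached cells | | | | | | | 0 | 466560 | 1532160 | 1144800 | 2707200 |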
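The device columns make it possible to check which part can hold one processing element. The sketch below divides each SUM by the device capacity and reproduces the "Number of cached cells" row by dividing the BRAM capacity by the BRAM SUM (49 for double, 25 for single precision) and rounding. Reading that SUM as the storage needed per cached cell, in the same unit as the device capacity, is an assumption inferred from the numbers; the fit test is a rough one that ignores routing and practical utilization limits.

```python
DEVICES = ["Virtex-5", "Virtex-6", "Kintex-7", "Virtex-7"]
CAPACITY = {
    "LUT":  [149760, 297600, 254200, 712000],
    "FF":   [149760, 595200, 508400, 1424000],
    "DSP":  [1056, 2016, 1540, 3360],
    "BRAM": [11664000, 38304000, 28620000, 67680000],
}
# SUM column of each table; "BRAM" is assumed to be the storage per cached cell.
REQUIRED = {
    "double": {"LUT": 255522, "FF": 235179, "DSP": 1644, "BRAM": 49},
    "single": {"LUT": 60561, "FF": 82152, "DSP": 711, "BRAM": 25},
}


def report(precision: str):
    """Yield (device, fits, cached cells, utilization per resource) for one precision."""
    need = REQUIRED[precision]
    for i, device in enumerate(DEVICES):
        usage = {r: need[r] / CAPACITY[r][i] for r in ("LUT", "FF", "DSP")}
        fits = all(u <= 1.0 for u in usage.values())  # logic resources only
        cells = round(CAPACITY["BRAM"][i] / need["BRAM"])
        yield device, fits, cells, usage


for precision in REQUIRED:
    for device, fits, cells, usage in report(precision):
        pct = ", ".join(f"{r} {u:.0%}" for r, u in usage.items())
        print(f"{precision:6} {device:9} fits={fits!s:5} cached cells={cells:>8}  {pct}")
```

The rounded quotients match the cached-cell counts in both tables, which supports (but does not prove) the per-cell reading of the BRAM SUM.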

(22)

Required resources (single precision)

(23)

Required resources (double precision)

(24)

Performance comparison

Implementations

| | Intel Q9300 | Intel Xeon X5550 | Nvidia GTX480 | XC6VSX475T |
|---|---|---|---|---|
| Implementation type | Software (Intel IPP) | Software (Intel IPP) | Software (CUDA) | FPGA |
| Technology (nm) | 45 | 45 | 40 | 40 |
| Clock frequency (MHz) | 2500 | 2666 (3060) | 1400 | 300 |
| Number of processing elements | 4 cores | 4 cores | 480 CUDA cores | 1 PE |
| Power consumption (W) | 95 | 95 | 450 | ~50 |
| Million cell iterations/s | 10.5 | 18.5 | 180 | 472 |
| Speedup | 1 | 1.76 | 17.11 | 44.95 |
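The speedup row is each platform's throughput divided by the Q9300 baseline of 10.5 million cell iterations per second. The sketch below recomputes it and adds throughput per watt from the listed power figures; the per-watt values are derived here rather than given on the slide, and the FPGA figure uses the approximate 50 W.

```python
# Million cell iterations per second and power (W), taken from the table above.
PLATFORMS = {
    "Intel Q9300":      (10.5, 95),
    "Intel Xeon X5550": (18.5, 95),
    "Nvidia GTX480":    (180.0, 450),
    "XC6VSX475T":       (472.0, 50),
}
BASELINE = PLATFORMS["Intel Q9300"][0]

for name, (mcells, watts) in PLATFORMS.items():
    speedup = mcells / BASELINE   # relative to the 4-core Q9300
    per_watt = mcells / watts     # million cell iterations per second per watt
    print(f"{name:17} speedup {speedup:6.2f}   {per_watt:6.2f} Mcell/s per W")
```

Recomputing gives 17.14 for the GTX480 rather than the 17.11 listed; the other speedups agree with the table to two decimals.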

(25)

Conclusion

The solution was optimized for the specific requirements of FPGA architectures to allow "one-cycle" computing.

Memory transfers were minimized.

A speedup of about 45 can be achieved with respect to a high-performance four-core Intel Q9300 processor.

(26)

Future plans

- Multi-processor implementation with two Virtex-6 SX475T devices
- Virtex-7 implementation (frequency, memory bandwidth)
- Implementation of the frequency-dependent model
- Use of the FPGA implementation in a real software environment
