Universidade do Minho, Escola de Engenharia. Fábio José Gonçalves Correia: Assessing the Hardness of SVP Algorithms on Multi-core CPUs

Universidade do Minho, Escola de Engenharia, Departamento de Informática
Fábio José Gonçalves Correia
Assessing the Hardness of SVP Algorithms on Multi-core CPUs
Master's Dissertation, Master's in Informatics Engineering
Supervised by Professor Alberto José Proença and Artur Miguel Matos Mariano
October 2014

Let $b_1, \dots, b_n \in \mathbb{R}^m$ ($n \le m$) be linearly independent vectors, forming the basis $B = (b_1, \dots, b_n) \in \mathbb{R}^{m \times n}$. The lattice generated by $B$ is $\mathcal{L}(B) = \{ x \in \mathbb{R}^m : x = \sum_{i=1}^{n} u_i b_i,\ u \in \mathbb{Z}^n \}$. The Euclidean norm of $a \in \mathbb{R}^n$ is $\|a\| = \sqrt{a^T a}$, i.e. $\|a\|^2 = \sum_{i=1}^{n} a_i^2$. The length of a shortest nonzero vector of $\mathcal{L}(B)$ is $\lambda_1(\mathcal{L}(B)) = \min_{u \in \mathbb{Z}^n \setminus \{0\}} \|Bu\|$. The Shortest Vector Problem (SVP) asks for a vector $Bu$, with $u \in \mathbb{Z}^n \setminus \{0\}$, of that length.

[Figure: two multi-core CPUs and two accelerators, each with a local memory hierarchy, connected through an interface]

Chapter 2. Target Platforms

Figure 2.2: Diagram of the Kepler architecture (from NVIDIA documentation).

[…] the way data is organized in memory, thread scheduling, and so on. There is also a portability risk: when the code is moved to a different system, its performance may drop. Heterogeneous development frameworks have been created to automate memory management, thread scheduling and portability. Examples are StarPU (Augonnet et al., 2011) and GAMA (Barbosa, Sept. 2012). However, the main aim of heterogeneous development frameworks is to allow a program to run simultaneously on multi-core CPUs and GPUs.

StarPU is currently one of the most widely used heterogeneous frameworks. It is a software tool that supports C extensions. It lets programmers exploit the available resources (multi-core CPUs and GPUs) without having to adapt their code to each target platform. The framework schedules tasks at runtime on the CPU and/or GPU implementations. It also automatically manages data transfers between the available CPUs and GPUs. As a result, the programmer does not have to handle scheduling issues or the technical details of these transfers.
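The definition of $\lambda_1(\mathcal{L}(B))$ as the length of a shortest nonzero lattice vector can be checked by hand on tiny examples. The Python sketch below does this by exhaustive search: it tries every coefficient vector $u$ in a small box $[-\text{bound}, \text{bound}]^n$, excluding $u = 0$, and keeps the shortest $Bu$. It rests on a few assumptions:

- Basis vectors are stored as rows, the transpose of the column convention $B \in \mathbb{R}^{m \times n}$.
- `bound` is a hypothetical parameter.
- Searching a box is not a correct SVP algorithm in general, because a shortest vector may need larger coefficients.

The search visits $(2 \cdot \text{bound} + 1)^n$ coefficient vectors, which grows exponentially with $n$.

```python
from itertools import product
from math import sqrt


def lattice_vector(basis, coeffs):
    """Return B u = sum_i coeffs[i] * basis[i]; basis vectors are stored as rows."""
    dim = len(basis[0])
    return [sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(dim)]


def norm(a):
    """Euclidean norm ||a|| = sqrt(a^T a)."""
    return sqrt(sum(x * x for x in a))


def shortest_vector_bruteforce(basis, bound=3):
    """Shortest B u over u in [-bound, bound]^n with u != 0 (illustration only).

    Assumption: a shortest vector has all its coefficients inside the box;
    this holds for the tiny example below but not for arbitrary bases.
    """
    n = len(basis)
    best, best_len = None, float("inf")
    for coeffs in product(range(-bound, bound + 1), repeat=n):
        if not any(coeffs):
            continue  # lambda_1 only considers nonzero lattice vectors
        v = lattice_vector(basis, coeffs)
        length = norm(v)
        if length < best_len:
            best, best_len = v, length
    return best, best_len


if __name__ == "__main__":
    B = [[1, 0], [5, 1]]  # hypothetical basis of Z^2, so lambda_1 = 1
    v, l1 = shortest_vector_bruteforce(B)
    print(v, l1)
```

Take the hypothetical basis with rows $(1, 0)$ and $(5, 1)$, which generates $\mathbb{Z}^2$. For it, the search returns a vector of length $1 = \lambda_1$.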
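Two of the most important data structures in StarPU are codelets and tasks:

- A codelet is a computational kernel that can be executed on distinct computational units, such as a multi-core CPU, a CUDA device or an OpenCL device.
- A task applies a codelet to a data set, on an architecture for which the codelet is implemented. It also specifies how each piece of data is accessed (read and/or write).

Tasks are asynchronous, so submitting a task is a non-blocking operation. A task can also define a priority, which the scheduler uses as a hint, and a callback function, which runs once the task completes. Tasks can have data dependencies between them, which force them to run sequentially. A task may be identified by a unique number, called a tag. Dependencies between tasks can be expressed by […]

A basis $B$ is LLL-reduced if two conditions hold:

- It is size-reduced, i.e. $|\mu_{i,j}| \le 1/2$ for $1 \le j < i \le n$.
- $\|b_i^*\|^2 \ge \|b_{i-1}^*\|^2 / 2$ for $i \in \{2, \dots, n\}$.

A basis is HKZ-reduced if $\|b_1\| = \lambda_1(\mathcal{L}(B))$ and the projections of $b_2, \dots, b_n$ onto the orthogonal complement of $b_1$ form an HKZ-reduced basis. Let $\pi_k$ denote the orthogonal projection onto the orthogonal complement of $\mathrm{span}(b_1, \dots, b_{k-1})$, so that $b_k^* = \pi_k(b_k)$.

    LLL (input: basis $b_1, \dots, b_n$; parameter $\delta \in (1/4, 1)$)
      compute the Gram-Schmidt vectors $b_i^*$ and coefficients $\mu_{i,j}$; $k = 2$
      while $k \le n$:
        for $i = k-1, \dots, 1$: $b_k = b_k - \lfloor \mu_{k,i} \rceil b_i$; $\mu_{k,j} = \mu_{k,j} - \lfloor \mu_{k,i} \rceil \mu_{i,j}$ for $j = 1, \dots, i$ (with $\mu_{i,i} = 1$)
        if $\delta \|b_{k-1}^*\|^2 \le \|b_k^*\|^2 + \mu_{k,k-1}^2 \|b_{k-1}^*\|^2$: $k = k + 1$
        else: swap $b_k$ and $b_{k-1}$, update the Gram-Schmidt data, $k = \max(k - 1, 2)$

BKZ generalizes LLL with a block size $\beta$. With $\beta = 2$ it behaves like LLL, and with $\beta = n$ it produces an HKZ-reduced basis.

    BKZ (input: basis $b_1, \dots, b_n$; block size $\beta \in \{2, \dots, n\}$; $\delta \in (1/4, 1)$)
      $z = 0$; $j = 0$; LLL($b_1, \dots, b_n$, $\delta$)
      while $z < n - 1$:
        $j = (j \bmod (n - 1)) + 1$; $k = \min(j + \beta - 1, n)$; $h = \min(k + 1, n)$
        $v = $ ENUM($\mu_{[j,k]}$, $\|b_j^*\|^2, \dots, \|b_k^*\|^2$)
        if $v \ne (1, 0, \dots, 0)$:
          $z = 0$; LLL($b_1, \dots, b_{j-1}, \sum_{i=j}^{k} v_i b_i, b_j, \dots, b_h$, $\delta$)
        else:
          $z = z + 1$; LLL($b_1, \dots, b_h$, $\delta$)

In each iteration, ENUM looks for a shortest vector $b^{new} = \sum_{i=j}^{k} v_i b_i$ in the projected block $[j, k]$.

- If $b^{new}$ improves $b_j^*$, it is inserted between $b_{j-1}$ and $b_j$. LLL is then called on $(b_1, \dots, b_{j-1}, b^{new}, b_j, \dots, b_h)$ to remove the resulting linear dependency.
- Otherwise, LLL is called on $(b_1, \dots, b_h)$.

The index $j$ cycles through $1, \dots, n - 1$. The algorithm stops once $z$ reaches $n - 1$, that is, after $n - 1$ consecutive blocks yield no improvement.

[Algorithm: ENUM, alternating "move down", "update best vector" and "move up" steps (pseudocode not recoverable)]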
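The LLL conditions and pseudocode above can be sketched in Python with exact rational arithmetic (`fractions.Fraction`). This way, the size-reduction test $|\mu_{i,j}| \le 1/2$ and the Lovász condition are evaluated without rounding error. This is a textbook sketch, not the implementation used in this work:

- It recomputes the Gram-Schmidt data after every basis change. This is simple, but much slower than the incremental updates of practical LLL codes.
- It stores basis vectors as rows and uses 0-based indices.
- The default $\delta = 3/4$ is an assumption.

```python
from fractions import Fraction


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gram_schmidt(basis):
    """Exact Gram-Schmidt of the rows of `basis`.

    Returns (bstar, mu) with bstar[i] = b_i - sum_{j<i} mu[i][j] * bstar[j]
    and mu[i][j] = <b_i, bstar_j> / <bstar_j, bstar_j>.
    """
    n = len(basis)
    bstar = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        v = [Fraction(x) for x in basis[i]]
        for j in range(i):
            mu[i][j] = Fraction(dot(basis[i], bstar[j])) / dot(bstar[j], bstar[j])
            v = [vi - mu[i][j] * bj for vi, bj in zip(v, bstar[j])]
        bstar.append(v)
    return bstar, mu


def lll(basis, delta=Fraction(3, 4)):
    """Textbook LLL reduction of the rows of an integer basis."""
    b = [list(row) for row in basis]
    n = len(b)
    bstar, mu = gram_schmidt(b)
    k = 1  # 0-based; corresponds to k = 2 in the 1-based pseudocode
    while k < n:
        # Size-reduce b_k against b_{k-1}, ..., b_1 so that |mu[k][j]| <= 1/2.
        for j in range(k - 1, -1, -1):
            q = round(mu[k][j])
            if q != 0:
                b[k] = [x - q * y for x, y in zip(b[k], b[j])]
                bstar, mu = gram_schmidt(b)
        # Lovász condition: delta*||b*_{k-1}||^2 <= ||b*_k||^2 + mu^2*||b*_{k-1}||^2.
        prev = dot(bstar[k - 1], bstar[k - 1])
        if delta * prev <= dot(bstar[k], bstar[k]) + mu[k][k - 1] ** 2 * prev:
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]
            bstar, mu = gram_schmidt(b)
            k = max(k - 1, 1)
    return b


if __name__ == "__main__":
    print(lll([[1, 1, 1], [-1, 0, 2], [3, 5, 6]]))
```

Rounding uses Python's `round` on a `Fraction`, which rounds halves to even. Either tie-breaking rule leaves $|\mu_{k,j}| \le 1/2$ after size reduction, so the returned basis satisfies both LLL conditions.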
For vectors $a$ and $b$, $\langle a, b \rangle$ denotes their inner product. The Gram-Schmidt coefficients are $\mu_{i,j} = \langle b_i, b_j^* \rangle / \langle b_j^*, b_j^* \rangle$.

    SE (Schnorr-Euchner enumeration; input: $n$, $\mu$, $\|b_1^*\|^2, \dots, \|b_n^*\|^2$; output: $u \in \mathbb{Z}^n$)
      $C = \|b_1^*\|^2$; $u = x = (1, 0, \dots, 0)$; $i = 1$; last_nonzero $= 1$; $c_j = \ell_j = 0$ for $j = 1, \dots, n$; $\ell_{n+1} = 0$
      loop:
        $\ell_i = \ell_{i+1} + (x_i - c_i)^2 \|b_i^*\|^2$
        if $\ell_i < C$:
          if $i > 1$: //move down
            $i = i - 1$
            $c_i = -\sum_{j=i+1}^{n} x_j \mu_{j,i}$ (computed incrementally from the partial sums $\sigma_{j,i} = \sigma_{j+1,i} + x_j \mu_{j,i}$)
            $x_i = \lfloor c_i \rceil$; $\Delta x_i = \Delta^2 x_i = -1$ if $c_i < x_i$, else $1$
          else: //update best vector
            $C = \ell_1$; $u = x$
        else:
          if $i = n$: return $u$
          //move up
          $i = i + 1$
          if $i \ge$ last_nonzero: last_nonzero $= i$; $x_i = x_i + 1$
          else: $x_i = x_i + \Delta x_i$; $\Delta^2 x_i = -\Delta^2 x_i$; $\Delta x_i = \Delta^2 x_i - \Delta x_i$

The Gaussian heuristic estimates $\lambda_1(\mathcal{L}(B))$ as

$$GH(\mathcal{L}(B)) = \frac{\Gamma(n/2 + 1)^{1/n}}{\sqrt{\pi}} \, \mathrm{vol}(\mathcal{L}(B))^{1/n},$$

where $\mathrm{vol}(\mathcal{L}(B))$ is the volume of $\mathcal{L}(B)$ and $\Gamma(x) = (x - 1)!$ for positive integers $x$.

For extreme pruning, the bounding function is the degree-8 polynomial $p(x) = \sum_{i=0}^{8} c_i x^i$. Its coefficients $c = (c_0, \dots, c_8)$ are not recoverable. For an $n$-dimensional lattice it is evaluated as $p(x \cdot 110/n)$, and the enumeration radius is $1.05 \cdot GH(\mathcal{L}(B))$.

[Figure: extreme pruning loop: randomization of the basis, BKZ reduction, extreme pruned enumeration call]

[Unrecoverable passage referring to the $2^n - 1$ elements of $\{0, 1/2\}^n \setminus \{0\}$ and to the quantity $2(2^n - 1)$]

The speedup on $p$ threads is $S_p = T_1 / T_p$, where $T_p$ is the execution time with $p$ threads. The efficiency is $E_p = S_p / p = T_1 / (p \, T_p)$.

[…]

[Figure: enumeration tree showing computed nodes and nodes avoided by the last nonzero optimization]

The variable last_nonzero stores the highest level $i$ with $x_i \ne 0$. When $x_j = 0$ for $j = i+1, \dots, n$, the coefficient vectors $x$ and $-x$ give lattice vectors of the same norm. Only positive values of $x_i$ therefore need to be enumerated at that level. The branches skipped in this way are the avoided nodes.

[Algorithm: ENUM with the last nonzero optimization (pseudocode not recoverable)]

[Figure: execution time (s) of BKZ + SE++, BKZ and SE++ as a function of the BKZ block size]

[Figure: execution time (s) of BKZ + ENUM, BKZ and ENUM as a function of the BKZ block size]

In BKZ, once $j \ge n - \beta + 1$, the block ends at $k = n$.

The Hermite factor of a basis is $HF = \|b_1\| / \mathrm{vol}(\mathcal{L}(B))^{1/n}$, where $\mathrm{vol}(\mathcal{L}(B)) = \det(\mathcal{L}(B))$.

[Figure: decomposition of the enumeration tree into tasks: a sequential part down to MAX_DEPTH, then Task 1 to Task 6, grouped by MAX_BREADTH]

[Definition of MAX_DEPTH as a function of $n$ and of $4 \times$ Threads (expression not recoverable)] […]

[Figure: master/worker scheme. Each working process repeats randomization of the basis, BKZ reduction and an extreme pruned enumeration call. It exchanges Request_Work, Send_Work and Send_Best_Vector messages with the master process]

[Figure: execution time (s) versus number of tasks for several dimensions, with 4 and 8 threads and different MAX_BREADTH values]

[Fragments: block size $\beta = 20$; dimension 90]

[Table: GPU parameters (SMXs, maximum blocks, maximum threads)]
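The Schnorr-Euchner enumeration described in this section can be sketched sequentially in Python. It uses the same steps: move down, update best vector, move up, zig-zag on $x_i$, and the last nonzero rule. The sketch makes the following assumptions, and its function names are hypothetical:

- Indices are 0-based and the Gram-Schmidt data are floating point.
- The initial radius is $C = \|b_1^*\|^2$, with $u = (1, 0, \dots, 0)$.
- Centers are recomputed directly as $c_i = -\sum_{j>i} x_j \mu_{j,i}$ instead of through cached partial sums.
- There is no pruning.

It returns the integer coefficients of a shortest nonzero lattice vector and its squared norm.

```python
def gram_schmidt_float(basis):
    """Floating-point Gram-Schmidt of the rows of `basis`.

    Returns (mu, bnorm2) with mu[i][j] = <b_i, b*_j> / ||b*_j||^2 for j < i
    and bnorm2[i] = ||b*_i||^2.
    """
    n = len(basis)
    bstar, mu = [], [[0.0] * n for _ in range(n)]
    for i in range(n):
        v = [float(t) for t in basis[i]]
        for j in range(i):
            norm2 = sum(t * t for t in bstar[j])
            mu[i][j] = sum(a * t for a, t in zip(basis[i], bstar[j])) / norm2
            v = [vi - mu[i][j] * t for vi, t in zip(v, bstar[j])]
        bstar.append(v)
    return mu, [sum(t * t for t in v) for v in bstar]


def se_enum(mu, bnorm2):
    """Schnorr-Euchner enumeration without pruning (0-based levels)."""
    n = len(bnorm2)
    C = bnorm2[0]              # squared radius of b_1, i.e. u = (1, 0, ..., 0)
    x = [0] * n
    x[0] = 1
    u = x[:]
    c = [0.0] * n              # centers c_i
    l = [0.0] * (n + 1)        # partial squared norms; l[n] stays 0
    dx = [0] * n               # zig-zag step
    ddx = [0] * n              # zig-zag direction
    last_nonzero = 0           # highest level with x_i != 0
    i = 0
    while True:
        l[i] = l[i + 1] + (x[i] - c[i]) ** 2 * bnorm2[i]
        if l[i] < C:
            if i > 0:                                   # move down
                i -= 1
                c[i] = -sum(x[j] * mu[j][i] for j in range(i + 1, n))
                x[i] = round(c[i])
                dx[i] = ddx[i] = -1 if c[i] < x[i] else 1
            else:                                       # update best vector
                C = l[0]
                u = x[:]
        else:
            if i == n - 1:
                return u, C
            i += 1                                      # move up
            if i >= last_nonzero:
                # All coordinates above level i are zero: x and -x have the
                # same norm, so only positive values of x_i are enumerated.
                last_nonzero = i
                x[i] += 1
            else:
                # Zig-zag: alternate sides of c_i with growing distance.
                x[i] += dx[i]
                ddx[i] = -ddx[i]
                dx[i] = ddx[i] - dx[i]


def shortest_vector(basis):
    mu, bnorm2 = gram_schmidt_float(basis)
    return se_enum(mu, bnorm2)


if __name__ == "__main__":
    print(shortest_vector([[3, 0], [1, 1]]))
```

For the hypothetical basis with rows $(3, 0)$ and $(1, 1)$, the sketch returns $u = (0, 1)$ with squared norm 2, i.e. the vector $(1, 1)$. The sketch is sequential. One way to parallelize it would be to hand disjoint subtrees (fixed values of the top coordinates) to different threads and share the best radius $C$ among them. This is in the spirit of the MAX_DEPTH/MAX_BREADTH task decomposition, but this code does not implement it.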