
Thrust Content from GTC 2012: Nathan Bell

SAME code, runs on a multicore CPU!!

We delegate the low-level implementation, or “mapping to the hardware”, to the library

Thrust backends

Thrust portability: cp miprog.cu miprog.cpp // .cpp extension for g++

● GPU (parallel): nvcc -o miproggpu miprog.cu

● CPU (parallel, more than one core): g++ -O2 -o miprogcpumulti miprog.cpp -fopenmp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -lgomp -I<path-to-thrust-headers>

● CPU (serial, one core): g++ -O2 -o miprogcpusingle miprog.cpp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP -I<path-to-thrust-headers>

You don't need the CUDA toolkit or a GPU to develop, compile, and run your codes (provided, of course, that they use only Thrust). Just copy the Thrust headers. You can compare the speedup of the parallel computation on different multicore CPUs, and vary the OMP_NUM_THREADS environment variable. When you get access to a GPU card, you can compare the speedup simply by recompiling with nvcc.

OpenACC portability

How to Compile OpenACC Applications for Multicore CPUs

Passing the flag -ta=multicore on the PGI compiler (pgcc, pgc++ or pgfortran) command line tells the compiler to generate parallel multicore code for OpenACC compute regions, instead of the default of generating parallel GPU kernels. The parallel multicore code will execute in much the same fashion as if you had used OpenMP omp parallel directives instead of OpenACC compute regions.
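A minimal sketch of what an OpenACC compute region looks like in C++ (a hypothetical saxpy kernel, not taken from the class code). Built with something like `pgc++ -acc -ta=multicore`, the loop is spread over the CPU cores; with the default GPU target it becomes a GPU kernel. Since C++ compilers ignore pragmas they do not recognize, the same source also builds and runs serially with a plain compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = a * x + y, parallelized with an OpenACC directive (hypothetical example).
// e.g. pgc++ -acc -ta=multicore -Minfo=accel saxpy.cpp   (file name is illustrative)
// A compiler without OpenACC support ignores the pragma and runs the loop serially.
void saxpy(std::size_t n, float a, const float* x, float* y) {
#pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (std::size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

// Convenience overload for std::vector (assumes x.size() == y.size()).
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    saxpy(x.size(), a, x.data(), y.data());
}
```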

Try it:

● cd /pathto/clase_08/jacobi/paso1

pgcc -I../common -acc -ta=multicore -Minfo=accel -o laplace2d_acc_cpu laplace2d.c
main:
    100, Loop is parallelizable
         Generating Multicore code
    100, #pragma acc loop gang
    102, Loop is parallelizable
    112, Loop is parallelizable
         Generating Multicore code
    112, #pragma acc loop gang
    114, Loop is parallelizable

Parallelize for “any” device with OpenACC

CUSP

Class 12

What is CUSP?

● Cusp is a library for sparse linear algebra and graph computations based on Thrust. Cusp provides a flexible, high-level interface for manipulating sparse matrices and solving sparse linear systems.

http://cusplibrary.github.io/
http://cusplibrary.github.io/md_quickstart.html

What are sparse matrices?

Nathan Bell, NVIDIA

Where do sparse matrices appear?

Connectivity or adjacency matrix

The adjacency matrix, sometimes also called the connection matrix, of a simple labeled graph is a matrix with rows and columns labeled by graph vertices, with a 1 or 0 in position (v_i, v_j) according to whether v_i and v_j are adjacent or not. For a simple graph with no self-loops, the adjacency matrix must have 0s on the diagonal. For an undirected graph, the adjacency matrix is symmetric.

http://mathworld.wolfram.com/AdjacencyMatrix.html

The matrix is essentially a very useful data structure for representing and manipulating graphs in computer programs.
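A minimal C++ sketch of that data structure, assuming vertices are numbered 0..n-1 and the graph is given as a list of edges (the function and type names are illustrative, not from CUSP). A dense n x n array is used only for clarity; for street maps almost every entry is 0, which is what motivates the sparse formats later in the class.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Edge = std::pair<std::size_t, std::size_t>;

// Dense n x n adjacency matrix, row-major: adj[i * n + j] == 1 iff there is an edge i -> j.
// If undirected is true, each edge is stored in both directions, so the matrix is symmetric.
std::vector<int> adjacency_matrix(std::size_t n, const std::vector<Edge>& edges, bool undirected) {
    std::vector<int> adj(n * n, 0);
    for (const Edge& e : edges) {
        // simple graph: valid vertices and no self-loops (zero diagonal)
        assert(e.first < n && e.second < n && e.first != e.second);
        adj[e.first * n + e.second] = 1;
        if (undirected) adj[e.second * n + e.first] = 1;
    }
    return adj;
}
```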

Exercise
● Mitre, Moreno, Elflein, Gallardo.
● Quaglia, Villegas, Rolando, Palacios.
● I start at (A): Elflein and Palacios.
● I arrive at (B): Mitre and Quaglia.
● Shortest path?
● How many paths of n blocks are there?
● How many paths of up to n blocks?
● On foot is not the same as by car!
● Etc.

[Figure: street map with start point A, end point B, and intersections numbered 1 to 16]

Connectivity matrix by car (2016!)

[Figure: street map with intersections numbered 1 to 16]

Connectivity matrix by car

Not symmetric

[Figure: street map with intersections numbered 1 to 16]

Connectivity matrix on foot?

Symmetric

[Figure: street map with intersections numbered 1 to 16]

Wake-up test...

Symmetric

Connectivity on foot...

?

[Figure: street map with intersections numbered 1 to 16]

?

[Figure: street map with intersections numbered 1 to 16]

Connectivity matrix on foot?

Symmetric
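A sketch of the symmetry check, under the simplifying assumption that the on-foot graph is the car graph with every street made two-way (that is, A OR A^T). Function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major n x n adjacency matrix. It is symmetric iff every edge i -> j
// has a matching edge j -> i. One-way streets (driving) break this symmetry.
bool is_symmetric(const std::vector<int>& adj, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            if (adj[i * n + j] != adj[j * n + i]) return false;
    return true;
}

// Assumed "on foot" graph: every street can be walked both ways, i.e. A OR A^T.
std::vector<int> symmetrize(const std::vector<int>& adj, std::size_t n) {
    std::vector<int> s(adj);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (adj[j * n + i] != 0) s[i * n + j] = 1;
    return s;
}
```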

Properties of the adjacency matrix
● The kth power of a graph G is a graph with the same set of vertices as G and an edge between two vertices iff there is a path of length at most k between them.

● Since a path of length two between vertices u and v exists for every vertex w such that {u,w} and {w,v} are edges in G, the square of the adjacency matrix of G counts the number of such paths.

● Similarly, the (u,v)th element of the kth power of the adjacency matrix of G gives the number of paths of length k between vertices u and v.

● The graph kth power is then defined as the graph whose adjacency matrix is given by the sum of the first k powers of the adjacency matrix,

$$\mathrm{adj}(G^k) = \sum_{i=1}^{k} [\mathrm{adj}(G)]^i,$$

which counts all paths of length up to k.
http://mathworld.wolfram.com/GraphPower.html
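A small dense C++ sketch of these formulas (illustrative names; fine for a handful of intersections). Strictly, the entries of A^k count walks, i.e. sequences of k edges that may revisit vertices, which is what "paths" means in the definitions above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<long long>>;

// C = A * B for square matrices (skips zero entries of A, since A is mostly zeros).
Matrix multiply(const Matrix& A, const Matrix& B) {
    const std::size_t n = A.size();
    Matrix C(n, std::vector<long long>(n, 0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            if (A[i][k] != 0)
                for (std::size_t j = 0; j < n; ++j)
                    C[i][j] += A[i][k] * B[k][j];
    return C;
}

// (A^k)[u][v] = number of walks with exactly k edges from u to v. Requires k >= 1.
Matrix power(const Matrix& A, unsigned k) {
    assert(k >= 1);
    Matrix P = A;
    for (unsigned i = 1; i < k; ++i) P = multiply(P, A);
    return P;
}

// S = A + A^2 + ... + A^k: number of walks with 1..k edges between each pair.
Matrix sum_of_powers(const Matrix& A, unsigned k) {
    const std::size_t n = A.size();
    Matrix S(n, std::vector<long long>(n, 0));
    Matrix P = A;
    for (unsigned i = 1; i <= k; ++i) {
        for (std::size_t r = 0; r < n; ++r)
            for (std::size_t c = 0; c < n; ++c) S[r][c] += P[r][c];
        if (i < k) P = multiply(P, A);
    }
    return S;
}
```

For the exercise of exactly 5 steps, the answer for each pair would be `power(A, 5)[u][v]`.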

[Figure: street map with intersections numbered 1 to 16]

Shortest paths between one or more points, the number of k-step paths between two points, etc., etc.

Symmetric and sparse
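When every block counts as one step, breadth-first search gives the shortest paths. The sketch below is plain C++ on a graph stored in CSR form, assumed here as: `row_offsets` of size n+1, and the neighbours of vertex v stored in `column_indices` at positions row_offsets[v] to row_offsets[v+1]-1. CUSP itself provides `cusp::graph::breadth_first_search` for this task; the function here is only illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Breadth-first search from src on a CSR graph. Returns, for each vertex,
// the number of edges (blocks) on a shortest path from src, or -1 if unreachable.
std::vector<int> bfs_distances(const std::vector<std::size_t>& row_offsets,
                               const std::vector<std::size_t>& column_indices,
                               std::size_t src) {
    const std::size_t n = row_offsets.size() - 1;
    std::vector<int> dist(n, -1);
    std::queue<std::size_t> frontier;
    dist[src] = 0;
    frontier.push(src);
    while (!frontier.empty()) {
        std::size_t v = frontier.front();
        frontier.pop();
        for (std::size_t p = row_offsets[v]; p < row_offsets[v + 1]; ++p) {
            std::size_t w = column_indices[p];
            if (dist[w] == -1) {  // first visit happens at the shortest distance
                dist[w] = dist[v] + 1;
                frontier.push(w);
            }
        }
    }
    return dist;
}
```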

Laplace equation
Diffusion equation
Heat equation
Elastic membrane
Electrostatics
...
Etc.

Jacobi method

In numerical linear algebra, every matrix is a vector

How many nonzero elements does A have?
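A minimal sketch of Jacobi for the 1D Laplace equation, a simplification of the 2D laplace2d example (names are illustrative). Each update uses only the two neighbours, which is the same as saying that each row of A has at most 3 nonzeros: for N unknowns, about 3N nonzeros out of N^2 entries.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Jacobi sweeps for u'' = 0 on a uniform 1D grid with fixed (Dirichlet) end values
// u[0] and u[n-1]. The interior update u_new[i] = (u[i-1] + u[i+1]) / 2 is row i of
// A u = b with A tridiagonal (2 on the diagonal, -1 off it): only 3 nonzeros per row.
std::vector<double> jacobi_laplace_1d(std::vector<double> u, int max_iters, double tol) {
    std::vector<double> u_new = u;  // same boundary values in both arrays
    for (int it = 0; it < max_iters; ++it) {
        double max_change = 0.0;
        for (std::size_t i = 1; i + 1 < u.size(); ++i) {
            u_new[i] = 0.5 * (u[i - 1] + u[i + 1]);  // uses only the OLD iterate
            max_change = std::fmax(max_change, std::fabs(u_new[i] - u[i]));
        }
        u.swap(u_new);  // the new iterate becomes the current one
        if (max_change < tol) break;
    }
    return u;
}
```

With u[0] = 0 and u[4] = 1 it converges to the straight line 0, 0.25, 0.5, 0.75, 1.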

Convolution or correlation

How many nonzero elements does H have if the filter is local?

The filter can represent a discretized differential operator.

In general, local differential operators yield a sparse H.

For example, in 1D
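As a 1D illustration, the sketch below assembles H for the local filter (1, -2, 1), the centered second difference (the second derivative up to a factor 1/h^2), in COO form: one (row, column, value) triplet per nonzero. The struct is a hypothetical stand-in, not `cusp::coo_matrix`. The count is 3n - 2 nonzeros, versus n^2 entries for the dense matrix.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// COO (coordinate) storage: one (row, column, value) triplet per nonzero entry.
struct CooMatrix {
    std::size_t num_rows = 0, num_cols = 0;
    std::vector<std::size_t> row_indices, column_indices;
    std::vector<double> values;

    void push(std::size_t i, std::size_t j, double v) {
        row_indices.push_back(i);
        column_indices.push_back(j);
        values.push_back(v);
    }
};

// H for the filter (1, -2, 1) on n grid points, with the stencil truncated at both ends.
// Row i only touches columns i-1, i, i+1, so H has 3n - 2 nonzeros instead of n*n.
CooMatrix second_difference_1d(std::size_t n) {
    CooMatrix H;
    H.num_rows = H.num_cols = n;
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0) H.push(i, i - 1, 1.0);
        H.push(i, i, -2.0);
        if (i + 1 < n) H.push(i, i + 1, 1.0);
    }
    return H;
}
```

For n = 10^6 grid points that is about 3 x 10^6 stored values instead of 10^12.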

Sparse linear systems


Modules
● Basic 1D containers (~thrust vector)
● Basic 2D containers
● Basic 1D and 2D containers that wrap existing data or iterators
● Sparse matrix containers represented in COO, CSR, DIA, ELL, HYB, and Permutation.

Sparse matrix formats
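The format figures are not reproduced here. As a reminder of the two simplest formats, this plain C++ sketch converts the 3x4 matrix of the densetocoo.cu example to CSR and then recovers the COO row indices by expanding the CSR row offsets. The struct and functions are illustrative, not CUSP's `cusp::csr_matrix`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CSR: row_offsets has num_rows + 1 entries; the nonzeros of row i occupy positions
// row_offsets[i] .. row_offsets[i+1]-1 of column_indices and values.
struct CsrMatrix {
    std::size_t num_rows = 0, num_cols = 0;
    std::vector<std::size_t> row_offsets, column_indices;
    std::vector<float> values;
};

// Builds CSR from a dense row-major matrix, keeping only the nonzero entries.
CsrMatrix dense_to_csr(const std::vector<float>& dense, std::size_t rows, std::size_t cols) {
    CsrMatrix A;
    A.num_rows = rows;
    A.num_cols = cols;
    A.row_offsets.push_back(0);
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            float v = dense[i * cols + j];
            if (v != 0.0f) {
                A.column_indices.push_back(j);
                A.values.push_back(v);
            }
        }
        A.row_offsets.push_back(A.values.size());  // end of row i
    }
    return A;
}

// COO stores one row index per nonzero; it is obtained by expanding row_offsets.
std::vector<std::size_t> csr_row_indices(const CsrMatrix& A) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < A.num_rows; ++i)
        for (std::size_t p = A.row_offsets[i]; p < A.row_offsets[i + 1]; ++p)
            rows.push_back(i);
    return rows;
}
```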

Which format to choose?

Random

Power law graph

Solving linear systems

The most important thing is to do this efficiently!

Sparse matrix-vector product

http://www.nvidia.com/docs/IO/77944/sc09-spmv-throughput.pdf
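The core of the sparse matrix-vector product in CSR, as a serial C++ sketch (illustrative function name). In the scalar CSR kernel discussed in the linked NVIDIA report, each GPU thread computes one row, i.e. one iteration of the outer loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y = A * x with A in CSR format (row_offsets, column_indices, values).
// y[i] = sum over the nonzeros of row i of value * x[column].
std::vector<double> csr_spmv(const std::vector<std::size_t>& row_offsets,
                             const std::vector<std::size_t>& column_indices,
                             const std::vector<double>& values,
                             const std::vector<double>& x) {
    const std::size_t num_rows = row_offsets.size() - 1;
    std::vector<double> y(num_rows, 0.0);
    for (std::size_t i = 0; i < num_rows; ++i) {  // on a GPU: one thread per row
        double sum = 0.0;
        for (std::size_t p = row_offsets[i]; p < row_offsets[i + 1]; ++p)
            sum += values[p] * x[column_indices[p]];
        y[i] = sum;
    }
    return y;
}
```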


Example:
nvcc densetocoo.cu
qsub submit.sh

#include <cusp/io/matrix_market.h>
#include <cusp/array2d.h>
#include <cusp/coo_matrix.h>
#include <cusp/print.h>

int main(void)
{
    // a simple example in standard (dense) 2D format, on the host
    cusp::array2d<float, cusp::host_memory> A(3, 4);
    A(0,0) = 10; A(0,1) =  0; A(0,2) = 20; A(0,3) =  0;
    A(1,0) =  0; A(1,1) = 30; A(1,2) =  0; A(1,3) = 40;
    A(2,0) = 50; A(2,1) = 60; A(2,2) = 70; A(2,3) = 80;

    // print it to the screen to check...
    cusp::print(A);

    // save it to disk in MatrixMarket format
    cusp::io::write_matrix_market_file(A, "A.mtx");

    // load A from disk as a coo_matrix on the device
    // (implies a host->device copy)
    cusp::coo_matrix<int, float, cusp::device_memory> B;
    cusp::io::read_matrix_market_file(B, "A.mtx");

    // print B in COO format (implies a device->host copy)
    cusp::print(B);

    return 0;
}


● cusp::convert(const SourceType &src, DestinationType &dst): Convert between matrix formats.
● cusp::copy(const SourceType &src, DestinationType &dst): Copy one array or matrix to another.
● cusp::elementwise(const MatrixType1 &A, const MatrixType2 &B, MatrixType3 &C, BinaryFunction op): Perform transform operation on two matrices.
● void cusp::add(const MatrixType1 &A, const MatrixType2 &B, MatrixType3 &C): Compute the sum of two matrices.
● cusp::subtract(const MatrixType1 &A, const MatrixType2 &B, MatrixType3 &C): Compute the difference of two matrices.

● void cusp::multiply(const LinearOperator &A, const MatrixOrVector1 &B, MatrixOrVector2 &C): Implements matrix-matrix and matrix-vector multiplication.
● void cusp::multiply(const LinearOperator &A, const MatrixOrVector1 &B, MatrixOrVector2 &C, UnaryFunction initialize, BinaryFunction1 combine, BinaryFunction2 reduce): Implements matrix-vector multiplication with custom combine and reduce functionality.
● void cusp::generalized_spmv(const LinearOperator &A, const Vector1 &x, const Vector2 &y, Vector3 &z, BinaryFunction1 combine, BinaryFunction2 reduce): Implements generalized matrix-vector multiplication.
● void cusp::transpose(const MatrixType1 &A, MatrixType2 &At): Transpose a matrix.
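The custom `combine`/`reduce` idea can be sketched in plain C++: replacing (×, +) by (+, min) turns the product into one relaxation step of shortest paths on a weighted graph. This illustrates the concept with hypothetical names only; it does not reproduce CUSP's signatures (for instance, here the initial value is a constant rather than an `initialize` function).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Generalized CSR product: y[i] = reduce over the nonzeros of row i of combine(a_ij, x_j),
// starting from init. combine = *, reduce = +, init = 0 gives the ordinary y = A * x.
template <typename T, typename Combine, typename Reduce>
std::vector<T> generalized_spmv(const std::vector<std::size_t>& row_offsets,
                                const std::vector<std::size_t>& column_indices,
                                const std::vector<T>& values, const std::vector<T>& x,
                                T init, Combine combine, Reduce reduce) {
    const std::size_t n = row_offsets.size() - 1;
    std::vector<T> y(n, init);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t p = row_offsets[i]; p < row_offsets[i + 1]; ++p)
            y[i] = reduce(y[i], combine(values[p], x[column_indices[p]]));
    return y;
}

// One shortest-path relaxation step in the (min, +) semiring:
//   d_new[i] = min(d[i], min over neighbours j of w_ij + d[j]).
// Assumes an undirected (symmetric) weighted graph; for a directed graph one would
// use the transpose, so that row i lists the edges j -> i.
std::vector<double> min_plus_relax(const std::vector<std::size_t>& row_offsets,
                                   const std::vector<std::size_t>& column_indices,
                                   const std::vector<double>& weights,
                                   const std::vector<double>& d) {
    const double inf = std::numeric_limits<double>::infinity();
    std::vector<double> y = generalized_spmv(
        row_offsets, column_indices, weights, d, inf,
        [](double w, double dj) { return w + dj; },           // combine
        [](double a, double b) { return std::min(a, b); });   // reduce
    for (std::size_t i = 0; i < y.size(); ++i) y[i] = std::min(y[i], d[i]);
    return y;
}
```

Repeating the relaxation until nothing changes gives the Bellman-Ford shortest distances from the source.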

Etc.

Algorithms for processing graphs represented in CSR and COO formats.

● void cusp::graph::breadth_first_search(const MatrixType &G, const typename MatrixType::index_type src, ArrayType &labels, bool mark_levels=true): Performs a breadth-first traversal of a graph starting from a given source vertex.
● cusp::graph::connected_components(const MatrixType &G, ArrayType &components): Computes the connected components of a graph.
● void cusp::graph::hilbert_curve(const Array2dType &coord, const size_t num_parts, ArrayType &parts): Partition a graph using a Hilbert curve.
● cusp::graph::maximal_independent_set(const MatrixType &G, ArrayType &stencil, size_t k=1): Compute a maximal independent set of a graph.
● cusp::graph::vertex_coloring(const MatrixType &G, ArrayType &colors): Performs a vertex coloring of a graph.
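A plain C++ sketch of connected components by repeated breadth-first search, on a CSR graph with a symmetric pattern. It is illustrative only; `cusp::graph::connected_components` is the CUSP routine for this, and its internal algorithm may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Connected components of an undirected graph in CSR form (symmetric pattern).
// components[v] receives the component number of v (0, 1, 2, ...); returns how many there are.
// Every still-unlabelled vertex starts a BFS that labels everything it can reach.
int connected_components(const std::vector<std::size_t>& row_offsets,
                         const std::vector<std::size_t>& column_indices,
                         std::vector<int>& components) {
    const std::size_t n = row_offsets.size() - 1;
    components.assign(n, -1);
    int count = 0;
    for (std::size_t s = 0; s < n; ++s) {
        if (components[s] != -1) continue;  // already part of an earlier component
        std::queue<std::size_t> frontier;
        components[s] = count;
        frontier.push(s);
        while (!frontier.empty()) {
            std::size_t v = frontier.front();
            frontier.pop();
            for (std::size_t p = row_offsets[v]; p < row_offsets[v + 1]; ++p) {
                std::size_t w = column_indices[p];
                if (components[w] == -1) {
                    components[w] = count;
                    frontier.push(w);
                }
            }
        }
        ++count;
    }
    return count;
}
```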

Etc.

Iterative Krylov methods for hermitian and non-hermitian linear systems.


Configurable convergence monitors for iterative solvers.

The monitor terminates iteration when the residual norm satisfies the condition ||b - A x|| <= absolute_tolerance + relative_tolerance * ||b|| or when the iteration limit is reached.
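The stopping rule written out in C++ (hypothetical helper names; this is not CUSP's monitor class, only the same condition).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean norm ||v||_2.
double norm2(const std::vector<double>& v) {
    double s = 0.0;
    for (double vi : v) s += vi * vi;
    return std::sqrt(s);
}

// Stop when ||b - A x|| <= absolute_tolerance + relative_tolerance * ||b||,
// or when the iteration limit has been reached.
bool should_stop(const std::vector<double>& residual,  // r = b - A x
                 const std::vector<double>& b,
                 std::size_t iteration, std::size_t iteration_limit,
                 double relative_tolerance, double absolute_tolerance) {
    if (iteration >= iteration_limit) return true;
    return norm2(residual) <= absolute_tolerance + relative_tolerance * norm2(b);
}
```

The absolute term matters when b is tiny or zero, where a purely relative test could never be met.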

Relaxation methods

Jacobi

http://cusplibrary.github.io/md_quickstart.html

● Sparse Matrices
● Format Conversions
● Iterative Solvers
● Preconditioners
● User-Defined Linear Operators:

– Sometimes it is useful to solve a linear system A * x = b without converting the matrix A into one of Cusp's formats. For this reason Cusp supports user-defined linear operators that take in a vector x and compute the result y = A * x. These black-box operators can be used to interface matrix-free methods with Cusp's iterative solvers.
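A sketch of the matrix-free idea in plain C++: the operator is just a function that computes y = A*x for the 1D Laplacian (2 on the diagonal, -1 on both neighbours) without ever storing A, and a textbook conjugate gradient (a Krylov method for symmetric positive definite systems) only calls it. All names are illustrative; CUSP's own solvers and operator types differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
// A black-box linear operator: anything that computes y = A * x.
using LinearOperator = std::function<void(const Vec& x, Vec& y)>;

// Matrix-free 1D Laplacian with zero boundaries: (A x)_i = 2 x_i - x_{i-1} - x_{i+1}.
void laplacian_1d(const Vec& x, Vec& y) {
    const std::size_t n = x.size();
    for (std::size_t i = 0; i < n; ++i) {
        double left = (i > 0) ? x[i - 1] : 0.0;
        double right = (i + 1 < n) ? x[i + 1] : 0.0;
        y[i] = 2.0 * x[i] - left - right;
    }
}

double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Conjugate gradient from x = 0. Stops when the (recursively updated) residual satisfies
// ||r|| <= abs_tol + rel_tol * ||b||, or after max_iters. Returns the iteration count.
int conjugate_gradient(const LinearOperator& A, const Vec& b, Vec& x,
                       int max_iters, double rel_tol, double abs_tol) {
    const std::size_t n = b.size();
    x.assign(n, 0.0);
    Vec r = b, p = b, Ap(n);
    double rr = dot(r, r);
    const double threshold = abs_tol + rel_tol * std::sqrt(dot(b, b));
    int it = 0;
    while (it < max_iters && std::sqrt(rr) > threshold) {
        A(p, Ap);  // the only access to the operator
        double alpha = rr / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * Ap[i]; }
        double rr_new = dot(r, r);
        double beta = rr_new / rr;
        for (std::size_t i = 0; i < n; ++i) p[i] = r[i] + beta * p[i];
        rr = rr_new;
        ++it;
    }
    return it;
}
```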

cufft

random123

cusp

Class examples

Codes: densetocoo.cu, sagan.cu, movie.gnu
● cp -r /share/apps/codigos/alumnos_icnpg2017/clase_12 .
● cd clase_12

densetocoo.cu
● nvcc densetocoo.cu
● qsub submit.sh (./a.out)
● Look at the code and the output

CUSP portability

● CUSP is portable, like Thrust:
– CUDA backend

● nvcc -o sagan sagan.cu

– OpenMP backend (nvcc not required)

● g++ -O2 -o sagan sagan.cpp -fopenmp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP -lgomp -I /usr/local/cuda-7.5/include

– Single-core CPU backend (nvcc not required)

● g++ -O2 -o sagan sagan.cpp -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CPP -I /usr/local/cuda-7.5/include

● CUSP is a header-only library: just download and unzip the latest zip, and you're done.
● One downside: compilation is slow... (as with Thrust)

https://www.youtube.com/watch?v=VKhOhHWP6Lc

Class examples

sagan.cu
● nvcc sagan.cu
● qsub submit.sh (./a.out)
● Generate the animation, experiment

2D elastic membrane

A small weight that we will move around...

Class examples

sagan.cu
● nvcc ecdif.cu
● qsub submit.sh (./a.out)
● Generate the animation (movie.gnu), experiment

Class examples

Exercise guide

● Adjacency matrix in CSR format.
● Number of paths of exactly 5 steps joining a vertex with any other.
● Number of components of the graph, and their member vertices.
● Color the vertices, as if they were a map, using the minimum number of colors.

Compute using CUSP

A bit of C++

Function Templates

Functors

Generic Algorithms
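A compact C++ illustration of the three ideas (illustrative names): a function template, a functor (a struct with `operator()`), and a generic algorithm that accepts any callable. This is the pattern Thrust and CUSP are built on, e.g. `thrust::transform(first1, last1, first2, result, op)`.

```cpp
#include <cassert>
#include <vector>

// Function template: one definition, instantiated for every type T it is used with.
template <typename T>
T square(T x) { return x * x; }

// Functor: an object that can be called like a function and can carry state (here, a).
struct saxpy_functor {
    float a;
    explicit saxpy_functor(float a_) : a(a_) {}
    float operator()(float x, float y) const { return a * x + y; }
};

// Generic algorithm: works with any iterators and any binary callable
// (function pointer, functor or lambda); the same shape as std::transform.
template <typename InputIt1, typename InputIt2, typename OutputIt, typename BinaryOp>
OutputIt my_transform(InputIt1 first1, InputIt1 last1, InputIt2 first2,
                      OutputIt out, BinaryOp op) {
    for (; first1 != last1; ++first1, ++first2, ++out)
        *out = op(*first1, *first2);
    return out;
}
```

Because `op` is a template parameter, the compiler can inline the functor call, which is one reason Thrust favours functors over function pointers.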