Designing Parallel Algorithms (from Foster, Chapter 2)
Partitioning
-- focus on the largest data structure or the most frequently accessed data structure
-- for example, our original decomposition of the parallel Jacobi method assigned a process to each element of the grid; parallel Simpson's rule for numerical integration assigned a thread per interval; and parallel matrix multiply assigned one thread per element of the resultant matrix
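The one-thread-per-result-element decomposition can be sketched as follows. This is a minimal illustration of the idea, not the course's actual code; the class and method names are made up:

```java
// One thread per element of the resultant matrix C = A * B.
// Fine-grained partitioning: each task owns exactly one C[i][j].
public class ElementwiseMatMul {
    public static double[][] multiply(double[][] a, double[][] b)
            throws InterruptedException {
        int n = a.length, m = b[0].length, k = b.length;
        double[][] c = new double[n][m];
        Thread[] tasks = new Thread[n * m];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) {
                final int row = i, col = j;
                tasks[i * m + j] = new Thread(() -> {
                    double sum = 0.0;           // each task writes only its own
                    for (int x = 0; x < k; x++) // cell, so no locking is needed
                        sum += a[row][x] * b[x][col];
                    c[row][col] = sum;
                });
                tasks[i * m + j].start();
            }
        }
        for (Thread t : tasks) t.join(); // wait for all n*m tasks
        return c;
    }
}
```

With one task per element there are n·m tasks, far more tasks than processors, which is exactly the flexibility the partitioning phase asks for.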
-- weather modeling might assign each particle to a thread (probably not)
-- for example, when we developed the full-duplex sliding window protocol, we isolated the control of packets and sequence numbering (the shared data) in the transmitter/receiver thread
-- example: layering of operating systems (a file system on top of a multithreaded kernel)
-- functional decomposition of a climate simulation model
-- aim for an order of magnitude more tasks than processors (otherwise there is little flexibility in later phases)
-- avoid redundant computation and storage requirements (otherwise the design may not scale to larger problems)
-- tasks should be about the same size (easy to allocate to processors)
-- the number of tasks must scale with problem size (a larger problem should mean more threads)
-- are there alternative partitions? (for example, the grid problem lends itself to both row partitions and subpartitions)
Communication
-- for example, in row-by-row Jacobi we send "signals" between row threads; in the partitioned version we send border elements between subpartitions
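The border-element exchange between subpartitions can be sketched with two threads swapping ghost values each sweep. This is an illustrative Java sketch, not the course code: `Exchanger` stands in for a message channel, and in shared memory the copy is strictly redundant, but it mirrors the distributed pattern:

```java
import java.util.concurrent.Exchanger;

// Two partition threads exchange their border elements, then each updates
// only the points it owns. One 1-D Jacobi sweep, fixed endpoints.
public class BorderExchange {
    public static double[] sweep(double[] u) throws InterruptedException {
        int n = u.length, mid = n / 2;
        double[] next = u.clone();
        Exchanger<Double> ex = new Exchanger<>();
        Thread left = new Thread(() -> {
            try {
                // send my rightmost value, receive the neighbor's leftmost
                double ghost = ex.exchange(u[mid - 1]);
                for (int i = 1; i < mid; i++) {
                    double right = (i == mid - 1) ? ghost : u[i + 1];
                    next[i] = (u[i - 1] + right) / 2.0;
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread right = new Thread(() -> {
            try {
                double ghost = ex.exchange(u[mid]);
                for (int i = mid; i < n - 1; i++) {
                    double leftVal = (i == mid) ? ghost : u[i - 1];
                    next[i] = (leftVal + u[i + 1]) / 2.0;
                }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        left.start(); right.start();
        left.join(); right.join();
        return next;
    }
}
```

Note that each thread reads only its own half of `u` plus the single ghost value it received, which is the property that makes the same code work when the halves live on different machines.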
-- in the data communications protocols, we send packets between the transmitter and receiver threads
-- in the shortest-path algorithm, we sent trial distances between threads
-- in the adaptive quadrature problem, we sent interval objects (height, width, area)
-- local vs. global: in the grid problem we talked only to our neighbors; in adaptive quadrature we placed intervals in a global pool; we also implemented a multiway rendezvous to build a global barrier (n-body problem, divide and conquer, replicated workers)
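The multiway-rendezvous-as-global-barrier idea can be sketched with `java.util.concurrent.CyclicBarrier`; this is an illustrative sketch, not the course's hand-built rendezvous, and the counter is only there to make the result observable:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Global barrier via a multiway rendezvous: a worker that finishes its
// step waits until every other worker has finished that step too.
public class GlobalBarrier {
    public static int countSteps(int nWorkers, int nSteps) throws InterruptedException {
        AtomicInteger work = new AtomicInteger(0);           // total step-units done
        CyclicBarrier barrier = new CyclicBarrier(nWorkers); // reusable rendezvous
        Thread[] ts = new Thread[nWorkers];
        for (int w = 0; w < nWorkers; w++) {
            ts[w] = new Thread(() -> {
                try {
                    for (int s = 0; s < nSteps; s++) {
                        work.incrementAndGet(); // "compute" this step
                        barrier.await();        // rendezvous with all workers
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            ts[w].start();
        }
        for (Thread t : ts) t.join();
        return work.get(); // nWorkers * nSteps when every step completed
    }
}
```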
-- structured communication, as in the grid problems
-- unstructured communication, as in the shortest-path algorithm
-- static communication architecture, as in the topology algorithm
-- dynamic communication architecture, as in the Java web-page server using TCP/IP
-- synchronous communication, as in the alternating bit protocol and the Jacobi relaxation algorithm
-- asynchronous communication, as in the resource allocation thread and the file client/server interaction
-- balance the computation within each task (scalability)
** grid point, row, or partition
-- keep the neighborhood of communication small
** communicate only with neighbors in the grid
-- concurrency of communications
** grid neighbors send and then receive (all in parallel)
-- concurrency of computation
** all grid computations are parallel within a step
Agglomeration
-- grid points to rows to partitions
-- matrix elements to rows to partitions
-- in the grid problem, we replicated the borders of partitions
-- communication costs include both setup and transmission
-- task creation costs
-- surface-to-volume effects (grid partitions vs. rows for various stencils)
-- have you increased locality and reduced communication costs?
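The surface-to-volume comparison is a short calculation: communication is proportional to a partition's surface, computation to its volume. The sketch below assumes a 5-point stencil on an n-by-n grid with p interior tasks, p a perfect square in the block case, and counts grid elements rather than bytes:

```java
// Surface-to-volume: per-task communication / computation ratio.
public class SurfaceToVolume {
    // Row partition: n/p rows per task, two border rows of n elements each.
    public static double rowRatio(int n, int p) {
        double comm = 2.0 * n;            // surface: top + bottom border rows
        double comp = (double) n * n / p; // volume: grid points owned
        return comm / comp;
    }
    // Square block partition: (n/sqrt(p))^2 points, four borders of n/sqrt(p).
    public static double blockRatio(int n, int p) {
        double side = n / Math.sqrt(p);
        return (4.0 * side) / (side * side); // surface / volume
    }
}
```

For n = 1024 and p = 64, the row partition's ratio is 2048/16384 = 0.125 while the block partition's is 4/128 = 0.03125, so blocks communicate four times less per unit of computation here; the gap widens as p grows.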
-- replicating data can sometimes interfere with scalability
-- are tasks of about the same granularity?
-- does the number of tasks scale with problem size?
-- did agglomeration reduce concurrency?
-- do you need to parallelize sequential code to get more concurrency?
Mapping
-- example: vector-matrix multiplication
-- intended to agglomerate one partition for each processor
-- Local Algorithms (process migration)
-- Probabilistic Methods: allocate tasks randomly
-- Cyclic Mappings: each processor is allocated every Pth task
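A cyclic mapping is one line of arithmetic; this sketch (illustrative names, not course code) returns the owner of each task, spreading heavy and light tasks evenly across processors:

```java
// Cyclic mapping: task i goes to processor i mod P, so any locality of
// load (e.g., expensive tasks clustered together) is spread out.
public class CyclicMapping {
    public static int[] assign(int nTasks, int p) {
        int[] owner = new int[nTasks];
        for (int i = 0; i < nTasks; i++)
            owner[i] = i % p; // every processor gets every Pth task
        return owner;
    }
}
```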
-- centralized or distributed task pools
** Manager/Worker: replicated workers under the control of a manager or monitor
** Decentralized Schemes: the task pool is a distributed data structure (the shortest-path algorithm or adaptive quadrature with a distributed pool)
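A centralized manager/worker pool can be sketched with a shared blocking queue; the poison-pill sentinel used here is one simple way for the manager to signal that the pool is done (the class name and the sum-of-squares "work" are invented for illustration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Manager/worker with a centralized task pool: the manager fills a shared
// queue; replicated workers repeatedly take a task, execute it, and stop
// when they see the poison-pill sentinel.
public class ManagerWorker {
    private static final int POISON = -1; // sentinel marking "no more work"

    public static int sumOfSquares(int nTasks, int nWorkers) throws InterruptedException {
        BlockingQueue<Integer> pool = new ArrayBlockingQueue<>(nTasks + nWorkers);
        AtomicInteger result = new AtomicInteger(0);
        Thread[] workers = new Thread[nWorkers];
        for (int w = 0; w < nWorkers; w++) {
            workers[w] = new Thread(() -> {
                try {
                    for (int task = pool.take(); task != POISON; task = pool.take())
                        result.addAndGet(task * task); // the "work" for task i
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[w].start();
        }
        // Manager: enqueue the real tasks, then one poison pill per worker.
        for (int i = 1; i <= nTasks; i++) pool.put(i);
        for (int w = 0; w < nWorkers; w++) pool.put(POISON);
        for (Thread t : workers) t.join();
        return result.get();
    }
}
```

Because workers pull from the pool at their own pace, fast workers automatically take more tasks, which is the load-balancing property that motivates the scheme.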
-- Termination Detection is necessary to determine when work pools are completed
-- make sure the manager is not a bottleneck
-- have you considered the implementation costs of dynamic load balancing?
-- have you considered the effectiveness of dynamic load balancing?