Medium 9781601322586

Parallel and Distributed Processing Techniques and Applications: The 2013 WorldComp International Conference Proceedings

Views: 4020
Ratings: (0)

New research by international contributors on Parallel and Distributed Processing Techniques and Applications

List price: $89.95

Your Price: $71.96

You Save: 20%

Remix
Remove
 

105 Slices

Format Buy Remix

Session - Resource Allocation, Scheduling, Energy-Aware Computing + Load-Balancing + Fault-Tolerant Systems

PDF

 

A Machine-by-Machine Analysis of a Bi-Objective Resource Allocation Problem

PDF

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

3

A Machine-by-Machine Analysis of a Bi-Objective Resource

Allocation Problem

Ryan Friese1 , Tyler Brinks1,2 , Curt Oliver1

Anthony A. Maciejewski1 , Howard Jay Siegel1,2 , and Sudeep Pasricha1,2

1 Department of Electrical and Computer Engineering

2 Department of Computer Science

Colorado State University

Fort Collins, CO, 80523

Abstract— As high performance computing systems continually become faster, the operating cost to run these systems has increased. A significant portion of the operating costs can be attributed to the amount of energy required for these systems to operate. To reduce these costs it is important for system administrators to operate these systems in an energyefficient manner. To help facilitate a transition to energyefficient computing, the trade-offs between system performance and system energy consumption must be analyzed and understood. We analyze these trade-offs through bi-objective resource allocation techniques, and in this paper we explore an analysis approach to help system administrators investigate these trade-offs. Additionally, we show how system administrators can perform “what-if” analyses to evaluate the effects of adding or removing machines from their high performance computing systems. We perform our study using three environments based on data collected from real machines and real applications. We show that by utilizing different resource allocations we are able to significantly change the performance and energy consumption of a given system, providing a system administrator with the means to examine these trade-offs to help make intelligent decisions regarding the scheduling and composition of their systems.

 

A Load Balancing Schema for Agent-based SPMD Applications

PDF

10

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

A Load Balancing Schema for Agent-based SPMD Applications

Claudio Márquez, Eduardo César, and Joan Sorribes

Computer Architecture and Operating Systems Department (CAOS),

Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain

Email: claudio.marquez@caos.uab.es, {eduardo.cesar,joan.sorribes}@uab.cat

Abstract— Agent based applications are used for large simulations of complex systems. When large number of agents and complex interaction rules are required, an HPC infrastructure can be helpful for executing such simulations in a reasonable time. However, complex interaction rules usually cause workload imbalances that negatively affect the simulation time. In this paper, we propose a load balancing schema that tries to find a reduced combination of exchanges to balance the computing time of the processes. The method adjusts the computational load within a certain range of tolerance, computing the global reconfiguration of the workload using computing time, and the number of agents.

 

Towards an Operating System Based Framework for Energy-Efficient Scheduling of Parallel Workloads

PDF

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

17

Towards an Operating System Based Framework for

Energy-Efficient Scheduling of Parallel Workloads

Shwetha Shankar1 , Dan Tamir1 , and Apan Qasem1

1 Department of Computer Science, Texas State University, San Marcos, TX, USA

Abstract— Power is a dominant obstacle for significant cost performance improvements of VLSI technology. Excessive and unbalanced power consumption affects device reliability, requires expensive packaging, and causes irreversible damage to semiconductor devices. Hence, power monitoring and thermal hot-spot elimination is a major concern of the semiconductor industry. This research addresses multicore power monitoring, management, and control via poweraware task scheduling and load balancing; supporting hotspot elimination as well as overall power balancing and reduction. The paper concentrates on the scheduling aspects of efficient power management schemes in the context of multicore systems. We utilize an in-house scheduling simulator and incorporate power and execution metrics in several classic scheduling algorithms. Results show a potential for scheduling with high level of power efficiency and tolerable degradation in execution time.

 

A Fault-Tolerant Approach to Distributed Applications

PDF

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

23

A Fault-Tolerant Approach to Distributed Applications

T. Nguyen1, J-A. Desideri2, and L. Trifan1

INRIA, Grenoble Rhône-Alpes, Montbonnot, Saint-Ismier, France

2

INRIA, Sophia-Antipolis Méditerranée, Sophia-Antipolis, France

1

Abstract - Distributed computing infrastructures support system and network fault-tolerance, e.g., grids and clouds.

They transparently repair and prevent communication and system software errors. They also allow duplication and migration of jobs and data to prevent hardware failures.

However, only limited work has been done so far on application resilience, i.e., the ability to resume normal execution after errors and abnormal executions in distributed environments. This paper addresses issues in application resilience, i.e., fault-tolerance to algorithmic errors and to resource allocation failures. It addresses solutions for error detection and management. It also overviews a platform used to deploy, execute, monitor, restart and resume distributed applications on grids and cloud infrastructures in case of unexpected behavior.

 

Two-Phase Atomic Commitment Protocol in Asynchronous Distributed Systems with Crash Failure

PDF

30

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

Two-Phase Atomic Commitment Protocol in Asynchronous Distributed

Systems with Crash Failure

Yong-Hwan Cho, Sung-Hoon Park and Seon-Hyong Lee

School of Electrical and Computer Engineering, Chungbuk National Unvi. Cheongju

ChungBuk 361-763

E-mail: [yhcho,spark]@chungbuk.ac.kr

Abstract

This paper defines the Non-Blocking Atomic

Commitment problem in a message-passing asynchronous system and determines a failure detector to solve the problem. This failure detector, which we call the modal failure detector star, and which we denote by M*, is strictly weaker than the perfect failure detector P but strictly stronger than the eventually perfect failure detector ୚P. The paper shows that at any environment, the problem is solvable with M*.

1. Introduction

1.1 Background

We address the fault-tolerant Non-Blocking Atomic

Commitment problem, simply NB-AC, in an asynchronous distributed system where the communication between a pair of processes is by a message-passing primitive, channels are reliable and processes can fail by crashing. In distributed systems, to ensure transaction failure atomicity in a distributed system, an agreement problem must be solved among a set of participating processes. This problem, called the

 

Load Balancing in Heterogeneous Distributed Computing Systems using Approximation Algorithm

PDF

36

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

Load Balancing in Heterogeneous Distributed

Computing Systems using Approximation Algorithm

Bibhudatta Sahoo

Sanjay Kumar Jena

Sudipta Mahapatra

Department of Computer Science and

Engineering

NIT Rourkela, Odisha, India

Email: bdsahu@nitrkl.ac.in

Department of Computer Science and

Engineering

NIT Rourkela, Odisha, India

Email: skjena@nitrkl.ac.in

Department of Electronics and

Electrical Communication

IIT Kharagpur, India

Email: sudipta@ece.iitkgp.ernet.in

Abstract—Approximation algorithms have been used to design polynomial time algorithms for intractable problems that provide solutions within the bounded proximity of the optimal solution.

Load balancing problem on Heterogeneous Distributed Computing System (HDCS) deals with allocation of tasks to computing nodes, so that computing nodes are evenly loaded. Load-balancing algorithms are attempts to compute the assignment with smallest possible makespan(i.e. the completion time at the maximum loaded computing node). Load balancing problem is a NP hard problem. This paper presents an analysis of approximation algorithms based on task and machine heterogeneity through

 

Session - Parallel and Distributed Algorithms and Applications

PDF

 

Parallel Algorithms for Hybrid Multi-core CPU-GPU Implementations of Component Labelling in Critical Phase Models

PDF

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

45

Parallel Algorithms for Hybrid Multi-core CPU-GPU Implementations of

Component Labelling in Critical Phase Models

K.A. Hawick and D.P. Playne

Computer Science, Massey University, North Shore 102-904, Auckland, New Zealand k.a.hawick@massey.ac.nz, d.p.playne@massey.ac.nz

Tel: +64 9 414 0800 Fax: +64 9 441 8181

April 2013

Abstract

Optimising the use of all the cores of a hybrid multi-core

CPU and its accelerating GPUs is becoming increasingly important as such combined systems become widely available.

We show how a complex interplay of cross-calling kernels and host components can be used to support good throughput performance on hybrid simulation tasks that have inherently serial analysis calculations that must be run alongside more easily parallelisable simulation time-stepping calculations. We present results for a cluster component-labelling analysis performed during simulation of a Potts lattice simulation model.

We discuss how these hybrid techniques can be more broadly applied to this class of numerical simulation experiments in computational science.

 

Parallel Asynchronous Modelization and Execution of Cholesky Algorithm using Petri Nets

PDF

52

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

Parallel Asynchronous Modelization and Execution of Cholesky Algorithm using Petri Nets

Gustavo Wolfmann

Armando De Giusti

Laboratorio de Computaci´on

Fac. Cs. Exactas F´ısicas y Naturales

Universidad Nacional de C´ordoba

Av. V´elez S´arsfield 1611 - C´ordoba - Argentina gwolfmann@efn.uncor.edu

III LIDI

Fac. Inform´atica

Universidad Nacional de La Plata

50 y 120 - La Plata - Argentina degiusti@lidi.info.unlp.edu.ar

Abstract—PDPTA 2013 - Parallelization of algorithms with hard data dependency has a need of task synchronization.

Synchronous parallel versions are simple to model and program, but inefficient in terms of scalability and processors use rate. The same problem for Asynchronous versions with elemental static task scheduling. Efficient Asynchronous algorithms implement out-of-order execution and are complex to model and execute.

In this paper we introduce Petri Nets as a tool for simplifying the modeling and execution of parallel asynchronous versions of this kind of algorithms, while using an efficient dynamic task scheduling implementation. The Cholesky factorization algorithm was used as testbed. Simulations were carried out as a proof of concept, based on real execution times on GPGPU’s, and have shown excellent performances.

 

Parallel Implementation of GRAph Aligner (GRAAL) Algorithm for Network Alignment

PDF

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

59

Parallel Implementation of GRAph Aligner (GRAAL)

Algorithm for Network Alignment

Si Li1, Shengai Jin1, Jonathan Z. Sun1, Chaoyang Zhang1

1

School of Computing, University of Southern Mississippi, Hattiesburg, MS, 39406, USA

Abstract - Network alignment is one of the most commonly used biological network comparison methods since determining protein functions shifted the focus from targeting specific proteins based solely on sequence homology to analyses of the whole proteome based on protein-protein interactions (PPI). Aligning

PPI networks of different species is of great importance when detecting evolutionary conserved pathways, or protein complexes. However, when it comes to large biological network data, the improved serial algorithms still take a long time. In this paper, a parallel algorithm for network alignment, which is based on the serial implementation of the GRAph Aligner Algorithm

(GRAAL), is designed to improve the efficiency of network alignment. This algorithm is implemented in parallel with C++ and the Message Passing Interface

 

A Massively Parallel Line Simplification Algorithm Implemented Using Chapel

PDF

64

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

A Massively Parallel Line Simplification Algorithm

Implemented Using Chapel

Michael Scherger

Department of Computer Science

Texas Christian University

Email: m.scherger@tcu.edu

Huy Tran

Department of Computing Sciences

Texas A&M University Corpus Christi

Email: htran@islander.tamucc.edu

Abstract - Line simplification is a process of reducing the number of line segments to represent a polyline. This reduction in the number of line segments and vertices can improve the performance of spatial analysis applications. The classic DouglasPeucker algorithm developed in 1973 has a complexity of O(mn), where n denotes the number of vertices and m the number of line segments. An enhanced version of this algorithm was developed in in 1992 and has a complexity of O(n log n). In this paper, we present a parallel line simplification algorithm and discuss the implementation results using only one instruction stream of the parallel

Multiple-instruction-stream Associative Computing model (MASC). The parallel algorithm is implemented in Chapel, a parallel programming language developed by Cray Inc., has parallel complexity of O(n) on n processors.

 

A Parallel Ford-Fulkerson Algorithm For Maximum Flow Problem

PDF

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

71

A Parallel Ford-Fulkerson Algorithm For Maximum Flow Problem

Zhipeng Jiang1 , Xiaodong Hu1 , and Suixiang Gao2

1 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China

2 School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China

Abstract— The maximum flow problem is one of the most fundamental problems in network flow theory and has been investigated extensively. The Ford-Fulkerson algorithm is a simple algorithm to solve the maximum flow problem based on the idea of augmenting path. But its time complexity is high and it’s a pseudo-polynomial time algorithm. In this paper, a parallel Ford-Fulkerson algorithm is given. The idea of this algorithm is not intuitive. All the arcs in the computed

flow network are processed simultaneously in the parallel steps in every iteration. We execute the algorithm in CUDA and the simulation result shows that this parallel algorithm has a good performance.

 

Model Checking Prioritized Token-Based Mutual Exclusion Algorithms

PDF

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

75

Model Checking Prioritized Token-Based Mutual Exclusion Algorithms

Mitchell L. Neilsen

Department of Computing and Information Sciences

Kansas State University

Manhattan, KS, USA

{neilsen@ksu.edu}

Abstract - Many existing token-based, distributed mutual exclusion algorithms can be generalized by a single algorithm. The number of messages required to provide mutual exclusion is dependent upon the logical topology imposed on the nodes and the policy used to forward requests.

This paper extends the generalized algorithm to support prioritized requests and presents models that can be used to analyze the performance and verify the correctness of the prioritized algorithm. Both safety and liveness properties are verified. Model checking can also be used to analyze performance. Using the best topology, the generalized algorithm attains the same worst-case performance as a centralized algorithm; that is, three messages per critical section. In the average case, the generalized algorithm performs better than a centralized one when the star topology is used.

 

A Parallel Implementation of the Modus Ponens Inference Rule in a DNA Strand-Displacement System

PDF

82

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

A Parallel Implementation of the Modus Ponens

Inference Rule in a DNA Strand-Displacement

System

Jack K. Horner

P.O. Box 266

Los Alamos NM 87544 USA

PDPTA 2013

Abstract

Computation implemented in DNA reactions promises to advance high-performance computing (HPC) for at least three reasons. It (1) is inherently Amdahl-scalable by reactor-volume, (2) has a power/operationsper-second(OPS) ratio that is potentially orders of magnitude smaller than that of silicon circuits, and (3) can provide a natural access-interface to DNA-based high-density information storage. In order to serve as general-purpose computing regime, DNA computing will have to support Boolean operations. Here, I describe an implementation of the modus ponens inference rule (commonly used in Boolean logic) in a

DNA strand-displacement (DSD) system.

Keywords: DNA computing, DNA strand displacement, modus ponens

1.0 Introduction

Computing implemented in DNA reactions promises to advance high-performance computing (HPC) for at least three reasons.

 

Hadoop-Collaborative Caching in Real Time HDFS

PDF

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

89

Hadoop-Collaborative Caching in Real Time HDFS

Meenakshi Shrivastava 1 , Dr. Hans-Peter Bischof 2

Computer Science, Rochester Institute of Technology, Rochester, NY, USA

Abstract— Massive amounts of data is being produced in everyday activities hence it is necessary to store and analyze such data. Hadoop is a popular distributed system used to store this data and MapReduce is used for performing analysis on it. Detail study and experiments led to conclusion that MapReduce job’s execution times can be lowered. A cost-effective mechanism known as collaborative caching has been proposed for efficient use of resources and system.

This mechanism helps in improving the performance, reducing access latency and increasing the throughput. A new architecture called Hadoop-Collaborative Caching is proposed in order to lower the execution times. It incorporates collaborative caching, reference caching and Modified-ARC algorithm. Each of the DataNodes have their own dedicated

 

Stochastic Assessment of Voltage Sags by Applying a Parallelized Method

PDF

96

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

Stochastic Assessment of Voltage Sags by Applying a

Parallelized Method

C. Ceja-Espinosa, A. Ramos-Paz, E. Espinosa-Juárez

Faculty of Electrical Engineering, Universidad Michoacana de San Nicolás de Hidalgo

Morelia, México

Abstract - In this paper, the parallelization of the Fault

Position Method for the stochastic assessment of voltage sags is presented. The parallelization of this stochastic method is made by using multi-thread programming in an algorithm written in C programming language, including functions defined by the POSIX threads header and library in order to perform the parallel calculations desired. All tests were performed using the GNU/Linux operating system. The proposed parallelized method is applied to two electrical systems: a small 5-bus test system and the IEEE 57-bus test system, in order to demonstrate its proper operation. A comparative analysis of the parallelized Fault Position

Method with respect to the traditional method is presented, and the reduction in computational time is shown.

 

Large scale 3D shape retrieval by exploiting multi-core and GPU

PDF

102

Int'l Conf. Par. and Dist. Proc. Tech. and Appl. | PDPTA'13 |

Large scale 3D shape retrieval by exploiting multi-core and GPU

M. BENJELLOUN1, E.W. DADI2, and E.M. DAOUDI2

University of Mons, Faculty of Engineering. Dept of Computer Science

Mons, BELGIUM

2

University of Mohammed First, Faculty of Sciences, LaRi Laboratory,

Oujda, Morocco

1

Abstract - this paper addresses the problem of 3D shape retrieval in large databases of 3D objects (large retrieval).

While this problem is emerging and interesting as the size of

3D object databases grows rapidly, the main two issues the community has to focus on are: computational efficiency of 3D object retrieval and the quality of retrieved results. In this work we deal with the first consideration, namely the computational efficiency of 3D object retrieval by exploiting new implementations based on parallel computing by exploiting multi-core and GPU architectures. Experimental results, show that the large scale retrieval can be achieved using the multi-core environment.

 

Load more


Details

Print Book
E-Books
Slices

Format name
PDF
Encrypted
No
Sku
B000000030941
Isbn
9781601322586
File size
33.5 MB
Printing
Allowed
Copying
Allowed
Read aloud
Allowed
Format name
PDF
Encrypted
No
Printing
Allowed
Copying
Allowed
Read aloud
Allowed
Sku
In metadata
Isbn
In metadata
File size
In metadata