2nd International Workshop on Middleware for Grid Computing

Co-located with Middleware 2004, Toronto, Ontario - Canada
Monday, October 18th 2004

Time    ID     Title / Authors
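8:30AM  Management and Scheduling for Data Applications - Chair: Bruno Schulze
        MGC01  Resource Allocation in a Middleware for Streaming Data
               L Chen, G Agrawal
        MGC02  IQ-Services: Network-Aware Middleware for Interactive Large-Data Applications
               Z Cai, G Eisenhauer, Q He, V Kumar, K Schwan, M Wolf
        MGC13  A Grid Service Broker for Scheduling Distributed Data-Oriented Applications on Global Grids
               S Venugopal, R Buyya, L Winton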
10:00AM  Coffee-Break
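10:30AM  Management and Scheduling - Chair: Emmanuel Cecchet
        MGC04  Architecture for Resource Allocation Services supporting Interactive Remote Desktop Sessions in Utility Grids
               V Talwar, B Agarwalla, S Basu, R Kumar, K Nahrstedt
        MGC08  Monitoring the Connectivity of a Grid
               A Ciuffoletti, S Andreozzi, C Vistoli, A Ghiselli
        MGC14  Autonomic WWW Server Management with Distributed Resources
               T Araki
        MGC07  A Tool for the Design and Evaluation of Hybrid Scheduling Algorithms for Computational Grids
               B Vianna, A Fonseca, N Moura, L Menezes, H Mendes, J da Silva, C Boeres, V Rebello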
12:30PM  Lunch
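1:30PM  Security and Data Discovery - Chair: Fabio Kon
        MGC03  GridBox: Securing Hosts from Malicious and Greedy Applications
               E Dodonov, J Quaini Sousa, H Guardia
        MGC09  Towards a Flexible Security Framework for Peer-to-Peer based Grid Computing
               A Detsch, L Gaspary, M Barcellos, G Cavalheiro
        MGC10  Grid Middleware Services for Virtual Data Discovery, Composition, and Integration
               Y Zhao, M Wilde, I Foster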
3:00PM  Coffee-Break and Posters
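3:30PM  Data Grid Middleware and Services - Chair: Radha Nandkumar
        MGC11  Data Pipelines: Enabling Large Scale Multi-Protocol Data Transfers
               T Kosar, G Kola, M Livny
        MGC12  Advanced Delivery Mechanisms in the GRelC Project
               G Aloisio, M Cafaro, S Fiore, M Mirto
        MGC05  CoDIMS-G: a Data and Program Integration Service for the Grid
               V Fontes, M Dutra, F Porto, B Schulze, A Barbosa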
5:00PM  Break 
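5:15PM  Tools, Programming and Environments - Chair: Cristiana Amza
        MGC06  Checkpointing-based Rollback Recovery for Parallel Applications on the InteGrade Grid Middleware
               R Camargo, A Goldchleger, F Kon, A Goldman
        MGC15  A Hierarchical Process Execution Support for Grid Computing
               F Cicerre, E Madeira, L E Buzato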
6:15PM  Summarisation Session (to be defined)
6:30PM  End of Workshop
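        Posters
        The Gridkit Resource Management
               G Coulson, W Cai, P Grace, G Blair, L Mathy, W K Yeung
        A Practical Grid Experiment Using the ISAM Architecture for Genetic Sequence Alignment
               A Filho, L Morais, R Real, L da Silva, A Yamin, I Augustin, P Vargas, C Geyer
        Measurement Middleware Service for Grid Computing
               A Ziviani
        JPie Interface: A Java Implementation Of The Pi-Calculus for Grid Computing
               V Yarmolenko, P Cockshott, E Borland, P Graham, L Mackenzie
        Agent-based Negotiation for Resource Allocation in Grid
               L Nassif, J M Nogueira, M Ahmed, R Impey, A Karmouch
        Resource-Driven Component Deployment in Enterprise Grids
               E Cecchet, V Quema, B Boutaleb
        A Dataflow Approach to Grid Computing
               D Biswas, N Pal
        Hiding Grid Resources Behind Brokers
               E Araújo, W Cirne, G Mendes
        Grid-Based Network and Systems Management Applications
               M Assunção, F Koch, C B Westphall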

MGC01 - Resource Allocation in a Middleware for Streaming Data
Liang Chen, Gagan Agrawal
  Abstract: Increasingly, a number of applications rely on, or can potentially benefit from, analysis and monitoring of data streams. To support processing of streaming data in a grid environment, we have been developing a middleware system called GATES (Grid-based AdapTive Execution on Streams). Our target applications are those involving high volume data streams and requiring distributed processing of data arising from a distributed set of sources. This paper addresses the problem of resource allocation in the GATES system. Though resource discovery and resource allocation have been active topics in the grid community, the pipelined processing and real-time constraints required by distributed streaming applications pose new challenges. We present a resource allocation algorithm that is based on minimal spanning trees. We evaluate the algorithm experimentally and demonstrate that it results in configurations that are very close to optimal, and significantly better than most other possible configurations.
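  Illustration (not from the paper): the abstract does not say how GATES builds or uses its spanning trees, so the following is only a generic sketch of the underlying technique, Kruskal's minimum spanning tree algorithm. The node names and link costs are hypothetical; in a streaming setting the cost could stand for something like latency between grid nodes.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, str, str]  # (cost, node_a, node_b)


def minimum_spanning_tree(nodes: List[str], edges: List[Edge]) -> List[Edge]:
    """Kruskal's algorithm with a small union-find structure."""
    parent: Dict[str, str] = {n: n for n in nodes}

    def find(n: str) -> str:
        # Follow parent links to the set representative, shortening the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree: List[Edge] = []
    for cost, a, b in sorted(edges):  # cheapest links first
        root_a, root_b = find(a), find(b)
        if root_a != root_b:          # link joins two separate components
            parent[root_a] = root_b
            tree.append((cost, a, b))
    return tree


# Hypothetical topology: two data sources, one relay, one sink.
nodes = ["source1", "source2", "relay", "sink"]
edges = [
    (4.0, "source1", "relay"),
    (2.0, "source2", "relay"),
    (7.0, "source1", "sink"),
    (3.0, "relay", "sink"),
    (9.0, "source2", "sink"),
]
tree = minimum_spanning_tree(nodes, edges)
total_cost = sum(cost for cost, _, _ in tree)
```

  With these made-up costs the tree keeps the three cheapest links that connect all four nodes, so both sources reach the sink through the relay.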
MGC02 - IQ-Services: Network-Aware Middleware for Interactive Large-Data Applications
Zhongtang Cai, Greg Eisenhauer, Qi He, Vibhore Kumar, Karsten Schwan, Matthew Wolf
  Abstract: IQ-Services are application-specific, resource-aware code modules executed by data transport middleware. They constitute a 'thin' layer between application components and the underlying computational and communication resources that implements the data manipulations necessary to permit wide-area collaborations to proceed smoothly, despite dynamic resource variations. IQ-Services interact with the application and resource layers via dynamic performance attributes, and end-to-end implementations of such attributes also permit clients to interact with data providers. The joint middleware/resource and provider/consumer interactions implemented with performance attributes may be used to realize effective methods for managing the data flows in the large-data, distributed Grid applications targeted by our research. Experimental results in this paper demonstrate substantial performance improvements attained by coordinating network-level with service-level adaptations of the data being transported and by permitting end users to dynamically deploy and use application-specific services for manipulating data in ways suitable for their current needs.
MGC03 - GridBox: Securing Hosts from Malicious and Greedy Applications
Evgueni Dodonov, Joelle Quaini Sousa, Hélio Crestana Guardia
  Abstract: Security is an important concern in providing the infrastructure for the implementation of general purpose computational grids. However, most grid implementations focus their security concerns on correctly authenticating users and hosts and on the communications among them. In most cases, application security is left to the underlying operating system. This can be a problem when a "malicious" application is executed. In this work, we introduce the GridBox architecture, which aims to provide additional security for GRID applications, using Access Control Lists and sandbox functionality for GRID tasks.
MGC04 - Architecture for Resource Allocation Services supporting Interactive Remote Desktop Sessions in Utility Grids
Vanish Talwar, Bikash Agarwalla (Georgia Tech), Sujoy Basu, Raj Kumar, Klara Nahrstedt
  Abstract: Emerging large scale utility computing systems like Grids promise computing and storage to be provided to end users as a utility. System management services deployed in the middleware are key to enabling this vision. Utility Grids provide a challenge in terms of scale, dynamism, and heterogeneity of resources and workloads. In this paper, we present a model-based architecture for resource allocation services for Utility Grids. The proposed service is built in the context of interactive remote desktop session workloads and takes application performance QoS models into consideration. The key design guidelines are hierarchical request structure, application performance models, remote desktop session performance models, site admission control, multi-variable resource assignment system, and runtime session admission control. We have also built a simulation toolkit that can handle mixed batch and remote desktop session requests, and have implemented our proposed resource allocation service in the toolkit. We present some results from experiments done using the toolkit. Our proposed architecture for resource allocation services addresses the needs of emerging utility computing systems and captures the key concepts and guidelines for building such services in these environments.
MGC05 - CoDIMS-G: a Data and Program Integration Service for the Grid
V Fontes, B Schulze, M Dutra, F Porto
  Abstract: Grid services provide an important abstract layer on top of heterogeneous components (hardware and software) that take part in a Grid environment. In this scenario, applications, like scientific visualization, require access to data of non-conventional data types, like fluid path geometry, and the evaluation of special user programs on these data. In order to support such applications we are developing CoDIMS-G, which is a data and program integration service for the Grid. CoDIMS-G provides users with transparent access to data and programs distributed on the Grid, as well as dynamic resource allocation and management. We conceived a new node scheduling algorithm and designed an adaptive distributed query engine for the grid environment.
MGC06 - Checkpointing-based Rollback Recovery for Parallel Applications on the InteGrade Grid Middleware
Raphael Y. de Camargo, Andrei Goldchleger, Fabio Kon, and Alfredo Goldman
  Abstract: InteGrade is a grid middleware infrastructure that enables the use of idle computing power from user workstations. One of its goals is to support the execution of long-running parallel applications that present a considerable amount of communication among application nodes. However, in an environment composed of shared user workstations spread across many different LANs, machines may fail, become inaccessible, or may switch from idle to busy very rapidly, compromising the execution of the parallel application in some of its nodes. Thus, providing some mechanism for fault tolerance becomes a major requirement for such a system. In this paper, we describe the support for checkpoint-based rollback recovery of parallel BSP applications running over the InteGrade middleware. This mechanism consists of periodically saving application state so that execution can be restarted from an intermediate execution point in case of failure. A precompiler automatically instruments the source code of a C/C++ application, adding code for saving and recovering application state. A failure detector monitors the application execution. In case of failure, the application is restarted from the last saved global checkpoint.
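  Illustration (not from the paper): a minimal, single-process sketch of checkpoint-based rollback recovery, written in Python only to show the idea. InteGrade's actual mechanism instruments C/C++ BSP applications and coordinates global checkpoints across nodes, which this sketch does not attempt. The function names, the state layout, and the simulated failure are all assumptions made for the example.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, state):
    """Write the state to a temporary file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)  # a crash mid-write never corrupts the last checkpoint


def load_checkpoint(path, initial_state):
    """Return the last saved state, or a copy of the initial state if none exists."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return dict(initial_state)


def run(path, total_steps, checkpoint_every, fail_at=None):
    """Sum the step indices, checkpointing every few steps."""
    state = load_checkpoint(path, {"step": 0, "acc": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated node failure")
        state["acc"] += state["step"]   # the "work" of one step
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    return state["acc"]


checkpoint_path = os.path.join(tempfile.mkdtemp(), "app.ckpt")
try:
    run(checkpoint_path, total_steps=10, checkpoint_every=3, fail_at=7)
except RuntimeError:
    pass  # a failure detector would notice this and trigger a restart

# Restart: execution resumes from the last checkpoint (step 6), not from step 0.
result = run(checkpoint_path, total_steps=10, checkpoint_every=3)
```

  The work done between the last checkpoint and the failure (here, step 6) is repeated after the restart; that repeated work is the cost traded against checkpointing more often.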
MGC07 - A Tool for the Design and Evaluation of Hybrid Scheduling Algorithms for Computational Grids
B.A. Vianna, A.A. Fonseca, N.T. Moura, L.T. Menezes, H.A. Mendes, J.A. Silva, C. Boeres and V.E.F. Rebello
  Abstract: One of the objectives of computational grids is to offer applications the collective computational power of distributed but typically shared heterogeneous resources. Unfortunately, efficiently harnessing the performance potential of such systems (i.e. how and where applications should execute on the grid) is a challenging endeavor due principally to the distributed, shared and heterogeneous nature of the resources involved. This paper presents a tool to aid the design and evaluation of scheduling policies suitable for efficient execution of parallel applications on computational grids.
MGC08 - Monitoring the Connectivity of a Grid
Sergio Andreozzi, Augusto Ciuffoletti, Antonia Ghiselli
  Abstract: Grid computing is a new paradigm that enables the distributed coordination of resources and services which are geographically dispersed, span multiple trust domains and are heterogeneous. Network infrastructure monitoring, while vital for activities such as service selection, exhibits inherent scalability problems: in principle, in a Grid composed of n resources, we need to keep record of n^2 end-to-end paths. We introduce an approach to network monitoring that takes into account scalability: a Grid is partitioned into domains, and network monitoring is limited to the measurement of domain-to-domain connectivity. However, partitions must be consistent with network performance, since we expect that an observed network performance between domains is representative of the performance between the Grid Services included in those domains.
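  Illustration (not from the paper): a small calculation of why partitioning helps. The figures of 1000 resources and 10 domains are hypothetical, and the count uses ordered pairs of distinct endpoints, n(n-1), which grows on the order of the n^2 quoted in the abstract.

```python
def directed_paths(endpoints: int) -> int:
    """Number of ordered pairs of distinct endpoints: n * (n - 1)."""
    return endpoints * (endpoints - 1)


# Hypothetical grid: 1000 resources partitioned into 10 domains.
resource_paths = directed_paths(1000)  # monitoring every resource pair
domain_paths = directed_paths(10)      # monitoring domain-to-domain only
reduction_factor = resource_paths // domain_paths

print(resource_paths, domain_paths, reduction_factor)
```

  Under these assumptions the number of monitored paths falls from 999000 to 90, at the price of treating one domain-to-domain measurement as representative of every resource pair it covers.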
MGC09 - Towards a Flexible Security Framework for Peer-to-Peer based Grid Computing
A Detsch, L P Gaspary, M P Barcellos, G G H Cavalheiro
  Abstract: The dynamic, multi-organization nature of large-scale grid computing introduces security issues that must be addressed before grid systems can become widely popular. This paper proposes P2PSLF (Peer-to-Peer Security Layer Framework), a flexible security framework for peer-to-peer based grid computing. P2PSLF provides a wide range of security mechanisms (e.g., authentication, confidentiality, integrity, authorization, and audit), and allows the creation of new ones. It is independent of the overlying application, which enables new systems to be implemented without having to deal with security issues within the application. In addition, the framework is modular and reconfigurable. The set of security requirements to be satisfied in communications is determined per peer, and can be changed without recompiling the application. The framework is exercised using OurGrid, a P2P-based middleware that enables the creation of a multi-organization grid computing environment for the execution of bag-of-tasks applications.
MGC10 - Grid Middleware Services for Virtual Data Discovery, Composition, and Integration
Y Zhao, M Wilde, Ian Foster, J Voeckler, T Jordan, E Quigg, J Dobson
  Abstract: We describe the services, architecture and application of the GriPhyN Virtual Data System, a suite of components and services that allow users to describe virtual data products in declarative terms, discover definitions and assemble workflows based on those definitions, and execute the resulting workflows on Grid resources. We show how these middleware-level services have been applied by specific communities to manage scientific data and workflows. In particular, we highlight and introduce Chiron, a portal facility that enables the interactive use of the virtual data system. Chiron has been used within the QuarkNet education project and as an online "educator" for virtual data applications. We also present applications from functional MRI-based neuroscience research.
MGC11 - Data Pipelines: Enabling Large Scale Multi-Protocol Data Transfers
Tevfik Kosar, George Kola and Miron Livny
  Abstract: Collaborating users need to move terabytes of data among their sites, often involving multiple protocols. This process is very fragile and requires considerable human involvement to deal with failures. In this work, we propose data pipelines, an automated system for transferring data among collaborating sites. It speaks multiple protocols, has sophisticated flow control and recovers automatically from network, storage system, software and hardware failures. We successfully used data pipelines to transfer three terabytes of DPOSS data from the SRB mass storage server at San Diego Supercomputing Center to UniTree mass storage at NCSA. The whole process did not require any human intervention and the data pipeline recovered automatically from various network, storage system, software and hardware failures.
MGC12 - Advanced Delivery Mechanisms in the GRelC Project
Giovanni Aloisio, Massimo Cafaro, Sandro Fiore
  Abstract: Today many Data Grid applications need to manage and process a very large amount of data distributed across multiple grid nodes. Several applications often access large databases (e.g. protein data banks, in the bioinformatics field) without any data access services taking into account characteristics of either applications or data types. Such applications could improve their performance and quality of results by using efficient, cross-DBMS, specialized and ad hoc implemented data access services. The Grid Relational Catalog Project (GRelC) developed at the CACT/ISUFI Laboratory of the University of Lecce provides a grid-enabled access service for relational and non-relational repositories. In this paper we propose some advanced delivery mechanisms developed within the GRelC project, and present experimental results from a European testbed.
MGC13 - A Grid Service Broker for Scheduling Distributed Data-Oriented Applications on Global Grids
Srikumar Venugopal, Rajkumar Buyya, Lyle Winton
  Abstract: Large communities of researchers distributed around the world are engaged in analyzing huge collections of data generated by scientific instruments and replicated on distributed resources. In such an environment, scientists need to have the ability to carry out their studies by transparently accessing distributed data and computational resources. In this paper, we propose and develop a Grid broker that mediates access to distributed resources by (a) discovering suitable data sources for a given analysis scenario, (b) discovering suitable computational resources, (c) optimally mapping analysis jobs to resources, (d) deploying and monitoring job execution on selected resources, (e) accessing data from local or remote data sources during job execution and (f) collating and presenting results. The broker supports a declarative and dynamic parametric programming model for creating grid applications. We have used this model in grid-enabling a high energy physics analysis application (Belle Analysis Software Framework) on a grid testbed having resources distributed across Australia.
MGC14 - Autonomic WWW Server Management with Distributed Resources
Takuya Araki
  Abstract: If many people access a Web server at one time, the server might not be able to respond within an acceptable time or even provide the service. Therefore, enough servers should be assigned to a service to guarantee quality of service. But reserving a lot of resources for peak access is not cost-effective, because these resources are idle most of the time. In order to solve this problem, technologies called utility computing or autonomic computing have been proposed and are under development. However, these technologies utilize resources only within one organization. In this paper, we present an autonomic system architecture that uses distributed resources leveraged by Grid technology. In our architecture, computing resources are rented from different organizations. Our architecture supports J2EE systems; hence, existing Web applications can be used without any modification. In addition, our architecture considers the location of the resources when redirecting a request to a server and allocating a new server, thereby leading to better performance. We adopted WS-Agreement as an interface for negotiating service level agreements. We have implemented and evaluated this system and confirmed the effectiveness of this architecture.
MGC15 - A Hierarchical Process Execution Support for Grid Computing
Fábio R. L. Cicerre, Edmundo R. M. Madeira, Luiz E. Buzato
  Abstract: Grid is an emerging infrastructure used to share resources among virtual organizations in a seamless manner and to provide breakthrough computing power at low cost. Nowadays there are dozens of academic and commercial products that allow execution of isolated tasks on grids, but few products support the enactment of long-running processes in a distributed fashion. To address this gap, this paper presents a programming model and an infrastructure that hierarchically schedules process activities using available nodes in a wide grid environment. Their advantages are automatic and structured distribution of activities and easy process monitoring and steering.
The Gridkit Resource Management
Geoff Coulson, Wei Cai, Paul Grace, Gordon Blair, Laurent Mathy, Wai Kit Yeung
  Abstract: Traditional resource discovery and management systems in Grid Computing tend to be coarse-grained, have fixed static policies and deal exclusively with concrete resource entities (e.g. CPUs, memory bytes). In this paper, we present the resource discovery and resource management frameworks that form part of our Gridkit middleware. These frameworks are underpinned by an overlay network-based communications infrastructure which allows sophisticated and dynamically changeable resource discovery. In addition, we describe how our resource frameworks manage both coarse-grained and fine-grained resources, and support abstract and pluggable task description to better support end-to-end quality of service.
A Practical Grid Experiment Using the ISAM Architecture for Genetic Sequence Alignment
Alberto Filho, Lincoln Morais, Rodrigo Real, Luciano da Silva, Adenauer Yamin, Iara Augustin
  Abstract: In this paper we present a practical experience using an application that solves the genetic sequence alignment problem, implemented in the grid environment provided by ISAM. The ISAM architecture aims to provide an integrated solution, from development to execution, for general-purpose pervasive applications, combining techniques from context-aware, mobile and grid computing. Here, we present an overview of the genetic sequence alignment problem and the solution developed for it, as well as the results obtained in a multi-institutional execution of the application.
Measurement Middleware Service for Grid Computing
Artur Ziviani and Bruno Schulze
  Abstract: Grid computing and Internet measurements are two areas that have taken off in recent years, both receiving a lot of attention from the research community. In this position paper, we argue that these two promising research areas have a strong synergy that brings mutual benefits. Based on such considerations, we propose a measurement middleware service for grid computing. By defining the architecture and the methods of this service, we show that a promising symbiosis may be envisaged by the use of the proposed measurement middleware service for grid computing.
JPie Interface: A Java Implementation Of The Pi-Calculus for Grid Computing
Viktor Yarmolenko, Paul Cockshott, Ewan Borland, Paul Graham, Lewis Mackenzie
  Abstract: This paper describes a new Java interface loosely modeled on the primitives of the pi-calculus to be used as a substratum for Grid based parallel computing. It allows the creation of processes and communications channels between processes. It also allows for the communications network between processes to be dynamically reconfigurable. The aim of the design is to achieve this with the minimum number of primitives and to integrate these primitives into the existing Java class framework.
Agent-based Negotiation for Resource Allocation in Grid
Lilian Nassif, José Marcos Nogueira, Mohamed Ahmed, Roger Impey, Ahmed Karmouch
  Abstract: Grid is an emergent technology that allows the sharing of resources within groups of individuals or organizations. An optimized job submission in grid demands the use of a middleware that is able to combine resource availability, access policies and application requirements. Most current solutions map job requirements to available resources. Nevertheless, this mapping alone is not enough to guarantee the service delivery for the user and to perform an intelligent resource allocation in grid. We present here a Multi-Agent System that effectively chooses the best place to run a grid job by making use of adaptable negotiation of job parameters concerning price, time to run the job, and quality of service, and by migrating data through the network. Our approach focuses on the multi-issue and chaining negotiation mechanisms that express the grid service delivery as Service Level Agreements.
Resource-Driven Component Deployment in Enterprise Grids
Emmanuel Cecchet, Vivien Quema, Btissam Boutaleb
  Abstract: Enterprise Grid Computing emerges as a new application domain for Grid environments. The multi-tiered applications commonly found in enterprise grids have complex dependencies and resource constraints. Enhanced deployment descriptions are required to allow the application programmer to express these constraints, and new services are required to dynamically resolve the constraints imposed by an application and to deploy it efficiently. In this paper, we propose an extended architecture description language that allows modeling complex multi-tiered applications, their dependencies and resource requirements. We present an enterprise grid application deployment service based on a distributed system cartography that allows for efficient deployment of such applications. We illustrate the proposed model and techniques with a use case featuring a real J2EE application on a grid testbed.
A Dataflow Approach to Grid Computing
Debmalya Biswas, Nilanjana Pal
  Abstract: Grid computing has attracted the attention of both the academic and industrial communities as the next big wave in distributed computing. However, several challenges, especially the challenge of finding sufficient concurrency within real life business oriented applications, need to be overcome before Grid computing can live up to its promise. We propose using the dataflow approach to resolve some of the issues. The dataflow approach, as an alternative to the conventional control flow model, provides a means for detecting and exploiting concurrency within programs without the programmer's help. We also show how adopting the dataflow approach enables failure resilience, efficient resource allocation, partitioning, migration and assembling results of concurrent parts of the program. Towards this end, we describe the proposed architecture of our prototype Dataflow-Grid.
Hiding Grid Resources Behind Brokers
Eliane Araújo, Walfredo Cirne, Gustavo Mendes
  Abstract: Grid computing is a relatively new and promising research area. Many problems regarding its deployment are still being studied and are not clearly solved yet. Scheduling on the grid is particularly challenging since new aspects must be taken into account, such as grid size (which can be huge), heterogeneity of resources and their ownership by different entities. Globus Toolkit, perhaps the most successful grid infrastructure, addresses the grid scheduling problem with a solution that involves resource managers and brokers. Brokers schedule applications to resources that can only be accessed by their resource managers. The GRAM module is a uniform interface to different resource managers. Resources of different natures must adhere to the GRAM interface in order to be part of the grid. However, we found that the GRAM interface is not enough to hide resource details from the GRAM client. Moreover, we found it extremely hard to place GRAM before intermittent resources (i.e. computational resources that appear and disappear with no previous notice, have unknown and varying power and may return incorrect results). Our proposal is to use the broker itself to hide heterogeneous resources. The broker communicates with the resources through a small set of methods necessary to perform its work. This approach is especially attractive when resources cannot cater for arbitrary applications (as with intermittent resources) and could largely simplify the deployment of the grid. We have used the MyGrid broker, a grid computing solution for Bag-of-Tasks applications (i.e. those parallel applications whose tasks are independent), to implement these ideas. This broker, exposed as a compliant OGSA/OGSI grid service, was able to access heterogeneous resources, including intermittent resources as well as GRAM/fork resources.
Grid-Based Network and Systems Management Applications
Marcos Assunção, Fernando Koch, Carlos Becker Westphall
  Abstract: Grid technologies are about the controlled and coordinated resource sharing among dynamic and multi-institutional virtual organizations (VO). A key concept is to provide mechanisms to negotiate resource-sharing arrangements among the parties that compose these virtual organizations with a given goal. Several technologies and middleware have been proposed to attain this scenario. Agents can play an important role in providing a way to develop complex systems while the grid fulfills the desired resource aggregation. The agents act as consumers and resource owners and interact, guaranteeing the coordinated use of the resources among the participants of a VO. In this paper we present our vision on the subject and a scenario where a grid of agents can be applied. We have been working on the application of grids for the management of large scale networks and we present some scenarios of grid-based management in this paper.