Research Group on Reliability and Robustness in Grid Computing Systems

This research group has been formed within the Open Grid Forum (OGF) to study reliability and robustness in Grid computing systems. The group will develop recommendations for improving Web Services and Grid specifications to enhance the reliability of deployed grid systems, and will formulate guidelines for deploying standards-based grid systems so that they achieve high reliability.

Our first event was the Workshop on Reliability and Robustness in Grid Computing Systems, held at GGF16 in Athens, Greece, February 13-16, 2006. A second workshop was held at OGF19 in Chapel Hill, NC, USA, January 29-February 2, 2007.

The next meeting of the research group will be held on October 17, 2007 at OGF21 in Seattle, WA, USA.

Research Group Goals

This research group is assembling a community of standards developers and researchers to examine how future grid computing systems based on emerging Web Services and Grid standards can achieve the levels of robustness and performance required by critical enterprise applications. Three considerations merit special attention. First, the scale of grid computing systems is expected to grow dramatically as grid technology moves into industrial use. Second, as operational grids scale, unplanned interactions among components may produce unpredicted emergent, and possibly chaotic, behaviors. Finally, grids are likely to be subjected to volatile and uncertain conditions caused by events such as accidental outages and external attacks. Together, these three factors could endanger or severely degrade the reliability and effectiveness of operational grids in everyday use.

Research Group Scope

The research group is addressing reliability and robustness issues in industrial and scientific grid systems developed on the basis of Web Services and Grid specifications. The research group is charged to:

  • Develop and publish recommendations for improving the reliability of Web Services and Grid specifications. A draft of a future OGF informational document, Reliability of Grid Computing Systems, has been produced.
  • Investigate mechanisms for enhancing grid system reliability and robustness, in particular the relationship of these mechanisms to grid specifications being developed within GGF and other organizations. These mechanisms include, but are not limited to:
    • Services that affect grid system reliability, such as GridFTP, Grid monitoring services, Grid replication services, checkpointing and recovery services, and autonomic computing services.
    • Techniques for maintaining consistent system and component states over time.
    • Languages, terminology, and tools for risk assessment and evaluation of grid reliability that are relevant for industrial use.
  • Support research on test methods and metrics for evaluating grid system reliability and robustness, including the ability of grid systems to detect and respond to failures of their components.
  • Foster the definition of minimum performance levels and thresholds for grid system reliability and robustness. For instance, it will be important to understand whether an increase in the size of a grid system (in terms of both number of nodes and workload) leads to unexpected behaviors that degrade reliability.
  • Address specific issues of interest to the grid standards community, such as evaluating the stability of grid and Web service interfaces when different versions of an interface coexist across a network and their interactions may result in unpredicted instability.

Contact

Chris Dabrowski cdabrowski@nist.gov

Geoff Fox gcf@indiana.edu

 


This web page is sponsored by the NIST Information Technology Laboratory (ITL), Software Diagnostics and Conformance Testing Division.


NIST is an agency of the U.S. Commerce Department's Technology Administration.
Created on November 8, 2005; last updated on November 10, 2005.