Research Group on Reliability and Robustness in Grid Computing Systems
This research group has been formed within the Open Grid Forum (OGF)
to study reliability and robustness in Grid computing systems.
The research group will develop recommendations for
improving Web Services and Grid specifications to enhance reliability of
deployed grid systems and formulate guidelines for how standards-based grid
systems should be deployed to achieve high reliability.
Our first event was the Workshop on Reliability and Robustness in Grid
Computing Systems, held at GGF16 in Athens, Greece, February 13-16, 2006.
A second workshop was held at OGF19 in Chapel Hill, NC, USA,
January 29-February 2, 2007.
The next meeting of the research group will be held on October 17, 2007
at OGF21 in Seattle, WA, USA.
Research Group Goals
This research group is assembling a community of standards
developers and researchers to examine how future
grid computing systems based on emerging Web Services and Grid standards can
achieve the levels of robustness and performance required for critical enterprise
applications. Three considerations merit special attention. First, the scale
of grid computing systems is expected to grow dramatically as grid technology
transitions to industrial use. Second, as operational grids scale, there is
the possibility that unplanned interactions occurring among components will
result in unpredicted emergent and possibly chaotic behaviors. Finally, grids are likely to be subjected
to volatile and uncertain conditions brought about by events such as
accidental outages and external attacks. These three factors can potentially
endanger or severely degrade the reliability and effectiveness of operational
grids in everyday use.
Research Group Scope
The research group is addressing reliability and robustness issues in
industrial and scientific grid systems developed on the basis of
Web Services and Grid specifications. The research group is charged to:
- Develop and publish recommendations for improving the reliability of
Web Services and Grid specifications. A draft of a future OGF
informational document, Reliability of Grid Computing Systems,
has been produced.
- Investigate mechanisms for enhancing grid system reliability and
robustness, in particular the relationship of these mechanisms to grid
specifications being developed within GGF and other organizations. These
mechanisms include, but are not limited to:
  - Services such as GridFTP, Grid monitoring services, Grid
  replication services, checkpointing and recovery services, and
  autonomic computing services impacting grid system reliability.
  - Techniques for maintaining consistent system and component states
  through time.
  - Languages, terminology, and tools for risk assessment and evaluation
  of grid reliability that are relevant for industrial use.
- Support research for developing test methods and metrics to evaluate
grid system reliability and robustness, including the ability of grid
systems to detect and respond to failures of components within grids.
- Foster the definition of minimum performance levels and thresholds for
grid system reliability and robustness. For instance, it will be important
to understand whether an increase in the size of a grid system (in terms of
both number of nodes and workload) leads to unexpected behaviors that
degrade reliability.
- Address specific issues of interest to the grid standards community,
such as evaluating the stability of service interface versions (for grid
and web services) that may differ across a network and whose interactions
may result in unpredicted instability.
Contact
Chris Dabrowski cdabrowski@nist.gov
Geoff Fox gcf@indiana.edu
This web page is sponsored by the NIST Information Technology Laboratory (ITL),
Software Diagnostics and Conformance Testing Division.
NIST is an agency of the U.S. Commerce Department's Technology
Administration.
Created on November 08, 2005. Last updated on November 10, 2005.