Abstract:
Structural bioinformatics applies computational methods to analyse and model three-dimensional molecular structures. These methods address data-intensive and compute-intensive problems, which demand high-performance computing (HPC) to complete data analysis in an acceptable time. Structural bioinformatics applications are therefore ideal candidates for grid and cloud computing infrastructures, so-called DCIs (Distributed Computing Infrastructures). DCIs provide access to HPC facilities and services across organisational boundaries. However, the usability of DCIs is limited, and applying the complex methods of structural bioinformatics requires considerable experience. In addition, users process and analyse data not only via single jobs but mainly via workflows.
Science gateways are one approach to offering easy and intuitive access to applications on DCIs. In general, a science gateway provides a single point of entry to a set of tools and data of a specific application domain while hiding the complex underlying infrastructure. Web-based science gateways additionally require nothing on the users' side but a computer connected to the Internet and an installed web browser. Developers of such gateways support the users with pre-configured user interfaces targeted at a specific application domain. The overall goal of creating science gateways is to increase the usability of applications.
This work focuses on workflow-enabled grid portals, which are web-based science gateways that support the management of workflows on DCIs. It addresses four major aspects of workflow-enabled grid portals: security in portals with underlying DCIs, job and workflow management, migration of workflows between diverse science gateways, and the automatic creation of portlets for workflow management. We have developed a granular security concept that encompasses all layers of the involved infrastructure: the user interface, the high-level middleware layer, the grid middleware layer, and the HPC facilities. We focus in particular on role-based user management tailored to the molecular simulation community and on credential management. The workflow-enabled grid portal WS-PGRADE (Web Services Parallel Grid Runtime and Developer Environment) has been extended to use SAML (Security Assertion Markup Language) for trust delegation. The resulting credential files form the basis for the authentication processes in the connected DCIs. We chose DCIs connected via the grid middleware UNICORE 6 because of UNICORE's scalable service-oriented architecture and its workflow engine. For the integration into WS-PGRADE, we implemented a plugin, a so-called submitter, which allows jobs to be invoked on UNICORE 6. The submitter additionally allows UNICORE workflows to be invoked via WS-PGRADE and thus provides workflow interoperability: users can seamlessly re-use existing UNICORE 6 workflows in WS-PGRADE. The Application Specific Module (ASM) has been developed to simplify the implementation of workflow-enabled portlets that extend WS-PGRADE.
The migration of workflows is addressed for Galaxy and WS-PGRADE. Galaxy is a workflow-enabled portal for cloud infrastructures and local HPC facilities. It is widely used but cannot connect to grid infrastructures. We therefore implemented an export of Galaxy workflows to WS-PGRADE workflows, which users can easily import into WS-PGRADE. Furthermore, this work presents a concept for automatically generating graphical user interfaces for a WS-PGRADE portal via the software framework Rapid. Rapid allows portlets to be created without programming in the traditional sense: developers can automatically convert XML files into a fully functional portlet for WS-PGRADE.
The concepts and implementations in this work are applied in the science gateway of the project MoSGrid (Molecular Simulation Grid). This gateway serves as a use case for a science gateway for structural bioinformatics and provides a complete solution for the molecular simulation community.