28–30 Jan 2019
CNR
Europe/Zurich timezone

CloudStor SWAN: Data Processing and Analysis Challenges in the Cloud

29 Jan 2019, 11:55
20m
CNR

National Research Council - Piazzale Aldo Moro 7, 00185 Roma, Italy
Presentation
Cloud infrastructure and software stacks for data science
Data science: applications and infrastructure

Speaker

Mr Michael D'Silva (AARNet)

Description

CloudStor SWAN (Service for Web based ANalysis) is AARNet’s first attempt at providing data processing and analysis in the cloud to the research community in Australia. This presentation will discuss AARNet’s experiences, the challenges encountered and the tools used to provide research data computing in the cloud.

SWAN helps users run scientific data processing and data analysis in the AARNet cloud quickly. One problem we face is that researchers upload data to CloudStor and then have to download it again before they can do any processing on it. Users find this undesirable because it causes issues and interrupts their natural workflow. For this reason, we have developed and deployed a modified version of the SWAN service (https://swan.web.cern.ch), which was developed by CERN and presented at earlier CS3 conferences. SWAN is an extended implementation of Jupyter Notebooks, and our modified version also integrates directly into ownCloud.

Out of the box, CERN’s SWAN requires CERNBox and EOS to provide authentication and storage, whereas AARNet’s CloudStor SWAN has been modified to interact with our ownCloud instance directly. When a user requests a SWAN notebook, the ownCloud SwanViewer App generates an ownCloud App Password and passes it on to SWAN for authentication. Once the user is authenticated, the ownCloud App Password is invalidated and removed from ownCloud. Users who access SWAN directly simply use their ownCloud sync client credentials.
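The single-use nature of that hand-off can be illustrated with a minimal sketch. It is not the SwanViewer App's code and does not call any ownCloud API: the class OneTimePasswordStore, its methods and the user name "alice" are hypothetical names introduced here, and an in-memory dictionary stands in for ownCloud's App Password storage.

```python
import hmac
import secrets


class OneTimePasswordStore:
    """Hypothetical in-memory stand-in for a single-use password hand-off."""

    def __init__(self):
        self._pending = {}  # username -> password issued but not yet redeemed

    def issue(self, username):
        # Generate an unguessable password for this user and remember it.
        password = secrets.token_urlsafe(32)
        self._pending[username] = password
        return password

    def redeem(self, username, password):
        # Remove the stored password first so it can never be accepted twice.
        expected = self._pending.pop(username, None)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking information through timing.
        return hmac.compare_digest(expected.encode(), password.encode())


store = OneTimePasswordStore()
token = store.issue("alice")
first_attempt = store.redeem("alice", token)   # accepted on first use
second_attempt = store.redeem("alice", token)  # rejected: already invalidated
```

Because the stored password is removed before the comparison, a failed attempt also invalidates it. That is a deliberate choice in this sketch, not something the abstract specifies.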

To use CloudStor storage in SWAN, a WebDAV connection is made to ownCloud using a second ownCloud App Password. This lets us keep our backend storage hidden from direct access, which improves security while still providing a seamless user experience.
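A WebDAV connection of this kind amounts to HTTP requests authenticated with the App Password. The following sketch, using only the Python standard library, builds a PROPFIND request that would list one folder level and parses a multistatus response. The host name, user name and password are placeholders, the sample response is hand-written for illustration, and the request is only constructed, never sent. How CloudStor SWAN actually mounts the storage is not described in the abstract.

```python
import base64
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder endpoint for illustration only; not AARNet's real service URL.
BASE_URL = "https://owncloud.example.org/remote.php/webdav/"


def build_propfind(url, username, app_password):
    """Build (but do not send) a WebDAV PROPFIND request for one folder level."""
    credentials = f"{username}:{app_password}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Depth": "1",  # the folder itself plus its direct children
    }
    return urllib.request.Request(url, method="PROPFIND", headers=headers)


def parse_hrefs(multistatus_xml):
    """Return the href of every resource in a WebDAV multistatus body."""
    root = ET.fromstring(multistatus_xml)
    return [element.text for element in root.iter("{DAV:}href")]


request = build_propfind(BASE_URL, "alice", "hypothetical-app-password")

# Hand-written sample of the response shape, not real server output.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<d:multistatus xmlns:d="DAV:">
  <d:response><d:href>/remote.php/webdav/</d:href></d:response>
  <d:response><d:href>/remote.php/webdav/data.csv</d:href></d:response>
</d:multistatus>"""
hrefs = parse_hrefs(SAMPLE_RESPONSE)
```

Passing the request to urllib.request.urlopen would send it to a real server. Since only the App Password travels with each request, the client never needs to reach the storage backend itself.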

In addition to greater security, we have customised SWAN to be generic so that, if required, we can deploy it at a remote site, even on a different network, which leaves room for future expansion. CloudStor SWAN only needs to be able to communicate with ownCloud itself, which makes it generic enough for any ownCloud operator to deploy.

To use SWAN, users upload data into CloudStor as usual and then start a notebook, where they can interact with data either in their home directory or on a shared group drive. Keeping data in the cloud and providing tools for processing and analysis alongside it can, in the right conditions, make users’ workflows more efficient. Used this way, Jupyter Notebooks let users keep their data analysis functions near the data, which encourages reproducible analytics.
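As a rough illustration of keeping the analysis next to the data, a notebook cell might read a file in place and summarise it. In the sketch below, the file name measurements.csv, its columns and the function column_mean are invented for the example, and a temporary directory stands in for the user's home directory.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def column_mean(csv_path, column):
    """Mean of one numeric column, read in place from a CSV file."""
    with open(csv_path, newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return statistics.mean(values)


# A temporary directory stands in for the user's home directory here.
with tempfile.TemporaryDirectory() as home:
    sample = Path(home) / "measurements.csv"
    sample.write_text("site,temperature\nA,20.0\nB,22.0\nC,24.0\n")
    mean_temperature = column_mean(sample, "temperature")
```

The function lives in the notebook beside the data it reads, so rerunning the cell repeats the analysis without any download step.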

Author

Mr Michael D'Silva (AARNet)
