Data analysis in High Energy Physics experiments requires processing large amounts of data. As the main objective is to find interesting events among those recorded by the detectors, the typical operations involve filtering data by applying cuts and producing histograms. The typical offline data analysis scenario for the TOTEM experiment at the LHC at CERN involves processing of 100s of ROOT...
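As a minimal illustration of the cut-and-histogram pattern described above, a Python/NumPy sketch could look as follows (the event variable, cut value and binning are invented for the example and are not TOTEM's actual selection):

```python
import numpy as np

# Hypothetical event sample: one kinematic variable per event.
events = np.random.exponential(scale=1.5, size=1_000_000)

# Apply a cut to keep only the "interesting" events (threshold chosen arbitrarily).
selected = events[events > 2.0]

# Produce a histogram of the selected events.
counts, edges = np.histogram(selected, bins=50, range=(0.0, 10.0))
print(f"{selected.size} events pass the cut; histogram holds {counts.sum()} entries")
```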
Rocket is the first attempt at handling one of the particular problems that other tools have failed to solve. This presentation will demonstrate AARNet’s experiences and the tools used for high-speed data transfers of different kinds of research data.
The research community in Australia is spread far and wide geographically, which in some cases leaves researchers physically far from one of our three...
Microservices are an approach to distributed systems that promotes the use of finely grained, collaborating services with their own lifecycles. The use of microservices facilitates embracing new technologies and architectural patterns. By adopting microservices, sync and share providers could increase modularity and facilitate the exchange of components and best practices.
In...
Container technologies are rapidly becoming the preferred way to distribute, deploy, and run services by developers and system administrators. They provide the means to create a light-weight virtualization environment, i.e., a container, which is cheap to create, manage, and destroy, requires a negligible amount of time to set up, and provides performance comparable to that of the...
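To illustrate how cheap containers are to create and destroy, here is a small sketch using the Python Docker SDK (it assumes a local Docker daemon and the docker-py package; the image and command are arbitrary):

```python
import docker

# Connect to the local Docker daemon using environment defaults.
client = docker.from_env()

# Create, run and remove a throw-away container in a single call.
output = client.containers.run("alpine:latest", "echo hello from a container", remove=True)
print(output.decode().strip())
```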
There is growing interest in self-hosted, scalable, fully controlled and secure file sync and share solutions among enterprises. ownCloud has found its share as a free-to-use, open-source solution, which can scale on-premise from a single commodity-class server to a cluster of enterprise-class machines, and serve from one to thousands of users and petabytes of data. Over the years, it has grown a...
Research Data Management (RDM) serves to improve the efficiency and transparency of the scientific process and to fulfil internal and external requirements. Three important goals of RDM are:
- long-term data preservation,
- scientific-process documentation,
- data publication.
One of the tasks in RDM is to define a workflow for data as part of the research process and data lifecycle. RDM...
We present our recent work [1] where we applied state of the art deep learning techniques for image recognition, automatic categorization, and labeling of nanoscience images obtained by scanning electron microscope (SEM). Roughly 20,000 SEM images were manually classified into 10 categories to form a labeled training set, which can be used as a reference set for future applications of deep...
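For readers unfamiliar with the approach, a minimal sketch of an image-classification setup of this kind is shown below (PyTorch/torchvision chosen only for illustration; the actual network, preprocessing and training procedure of [1] may differ):

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CATEGORIES = 10  # the 10 SEM image categories mentioned above

# Backbone CNN; in practice one would load ImageNet-pretrained weights here
# and fine-tune, which is the usual transfer-learning recipe.
model = models.resnet18()
model.fc = nn.Linear(model.fc.in_features, NUM_CATEGORIES)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch (a real run would iterate
# over a DataLoader of labeled SEM images).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, NUM_CATEGORIES, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```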
The Joint Research Centre (JRC) of the European Commission has set up the JRC Earth Observation Data and Processing Platform (JEODPP) as a pilot infrastructure to enable the knowledge production Units to process and analyze big geospatial data in support of EU policy needs. The very heterogeneous data domains and analysis workflows of the various JRC projects require a flexible set-up of the...
For over two years, the Data-Cloud team at DESY has provided a reliable ownCloud instance for a selected set of users. While the service is still officially in a pilot phase, it has the same support and priority level as any other production service provided by the IT group. However, before its “beta” status can be removed, some extra actions have to be taken: the instance must be fault tolerant and allow...
In the summer of 2017, I inherited SWITCHdrive, SWITCH's ownCloud-based filesharing system. SWITCHdrive is a fairly complex service including a set of Docker-based microservices. I will describe the continuing story of our experiences with running such an environment. We had some interesting developments in tuning our MariaDB/Galera database infrastructure, and we have also greatly...
Over the past year we were able to add a number of extra features to the SURFdrive service in order to make it more attractive to users and institutes, and there is more to come. We have also observed that several institutes and research groups need a version of SURFdrive more tailored to their needs. SURFdrive is fine as it is, but it is a one-size-fits-all solution....
CERNBox is a cloud synchronisation service for end-users: it allows synchronising and sharing files on all major desktop and mobile platforms (Linux, Windows, MacOSX, Android, iOS) aiming to provide universal access and offline availability to any data stored in the CERN EOS infrastructure.
With 12000 users registered in the system, CERNBox has responded to the high demand in our diverse...
The report focuses on the deployment of a CERN SWAN-like environment on top of existing EOS storage. Our setup consists of a local cluster with Kubernetes to run JupyterHub and single-user Jupyter notebooks, plus a dedicated server with CERNBox. The current setup is being tested by our colleagues in the Laboratory of Ultra-High Energy Physics at St. Petersburg State University, but there are...
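A minimal, illustrative fragment of such a JupyterHub-on-Kubernetes configuration is sketched below; the image name and mount paths are placeholders and not the exact settings of the setup described above:

```python
# jupyterhub_config.py (illustrative fragment)
c = get_config()  # provided by JupyterHub when the config file is loaded

# Spawn each user's notebook server as a Kubernetes pod.
c.JupyterHub.spawner_class = "kubespawner.KubeSpawner"

# Hypothetical single-user image with the analysis stack preinstalled.
c.KubeSpawner.image = "example.org/swan-like/notebook:latest"

# Expose the EOS-backed storage inside every single-user pod (placeholder paths).
c.KubeSpawner.volumes = [{"name": "eos", "hostPath": {"path": "/eos"}}]
c.KubeSpawner.volume_mounts = [{"name": "eos", "mountPath": "/eos"}]
```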
The revalidation, reuse and reinterpretation of data analyses requires having access to the original virtual environments, datasets, software, instructions and workflow steps which were used by the researcher to produce the original scientific results in the first place. The CERN Analysis Preservation pilot project is developing a set of tools that assist the particle...
Keeper is a central service for scientists of the Max Planck Society and their project partners for storing and archiving all relevant data of scientific projects. Keeper facilitates the storage and distribution of project data among the project members during or after a particular project phase and seamlessly integrates into the everyday work of scientists. The main goal of the Keeper service...
iRODS is open-source data management software that can be deployed seamlessly onto your existing infrastructure, creating a unified namespace and a metadata catalog of all the data objects, storage, and users on your system. iRODS provides access to distributed storage assets under the unified namespace and frees organizations from getting locked into single-vendor storage solutions. iRODS can...
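As an illustration of working against the unified namespace from code, a short sketch with the python-irodsclient is shown below (host, zone, paths and metadata values are placeholders):

```python
from irods.session import iRODSSession

# Connect to a (hypothetical) iRODS zone.
with iRODSSession(host="irods.example.org", port=1247,
                  user="alice", password="secret", zone="exampleZone") as session:
    # List the data objects in one collection of the unified namespace.
    coll = session.collections.get("/exampleZone/home/alice")
    for obj in coll.data_objects:
        print(obj.path)

    # Attach a metadata key/value to a data object in the catalog.
    obj = session.data_objects.get("/exampleZone/home/alice/results.csv")
    obj.metadata.add("experiment", "run-42")
```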
The purpose of the presentation we propose for the CS3 conference in Krakow is to highlight the technological features of Cynny Space’s cloud object storage solution and the results on performance and usability for a sync & share use case.
1) Software specifically designed for ARM® architecture
The object storage solution is specifically designed and developed on storage nodes composed...
For over two years, the Data-Cloud team at DESY has used dCache as the backend storage for its production ownCloud instance. As a highly scalable storage system, dCache is widely used by many sites to store hundreds of petabytes of scientific data. However, the cloud-backend usage scenarios have added new requirements, like high availability and downtime-less updates of any software or hardware...
We are heading into a world where the files of most users are hosted by four big companies. This is the case for most home users and companies, but also for education and research institutions. If we want to keep sovereignty over our data, protect our privacy and prevent vendor lock-in, then we need open-source, self-hosted and federated alternatives.
A new challenge is the increasing blending of...
“Sync and Share is Dead. Long Live Sync and Share.” discusses the increasing disinterest users have in simple file storage. Simple storage is a commodity service, with Google, Dropbox, and other big players able to legitimately resolve concerns about data centre security, legal control, administration and audit, and standards compliance. The competitive advantage for any given data storage...
On-premise EFSS is now an established market, and open-source solutions have been key players in the last couple of years. For many enterprises or labs, the need for privacy and the handling of large volumes of data are show-stoppers for using SaaS-based solutions. Still, for these users, the experience speaks for itself: even with good software, it is hard to deploy a scalable and reliable system...
Seafile is an open source file sync and share solution. Thanks to its high performance, scalability and reliability, it has been successfully used by many organizations in Europe, North America and China.
In this presentation, we'll provide a review of Seafile's development in 2017, and what we plan to accomplish in the future. We'll also present a site report from China with heavy usage,...
This talk covers the current state and functionality of Nextcloud. In particular, the new and innovative features of Nextcloud 12 and 13 are discussed and presented in detail; examples are end-to-end encryption, collaboration and communication features, and security and performance improvements. The second part of the talk presents the roadmap and strategic direction of Nextcloud for the coming...
ownCloud has been an excitingly successful service in the EFSS space since its breakthrough in 2013. Since customers deploy the solution in vastly different environments, such as public, private or hybrid clouds, and utilize different infrastructure components and identity providers, operational experience has shown challenges with previous design decisions.
This talk will reflect on the past...
Blockchain is currently one of the hot topics. Developed as part of the cryptocurrency Bitcoin as a web-based, decentralized, public and, most importantly, secure accounting system, this database principle could not only revolutionize the worldwide financial economy in the future; blockchain is already a topic in electromobility, health care and supply-chain management - just to name a...
Cubbit is a hybrid cloud infrastructure comprised of a network of p2p-interacting IoT devices (swarm) coordinated by a central optimization server. The storage architecture is designed to reverse the traditional paradigm of cloud storage from "one data center to rule them all" to "a small device in everyone’s house".
Any IoT device that supports a Unix-based OS can join the swarm and...
Over the past year we dropped the requirement that ownCloud should run on every PHP platform. This allows us to research architectural changes, like push notifications, microservices, dockerized deployments, HSM integration and storing metadata in POSIX or object storages. On the client side we are exploring E2EE, virtual filesystems and delta sync. Together with feedback from our community...
The talk will introduce the main concepts of Shibboleth, its advantages and disadvantages, and show the integration of Shibboleth with a sync and share service (a web application with its own session handling, not designed to use the Shibboleth session as the application session), with Seafile as an example.
Furthermore it will discuss the problems of Shibboleth federations and possible mitigations.
A special focus will...
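To make the integration pattern concrete: the Shibboleth SP typically authenticates the user in front of the web application and hands over attributes such as eppn or REMOTE_USER, from which the application creates its own session. The WSGI sketch below illustrates that hand-over in a generic way; it is not Seafile's actual code, and the attribute names depend on the SP's attribute mapping:

```python
# Minimal WSGI app behind a Shibboleth SP: the SP authenticates the user and
# injects attributes (e.g. eppn, REMOTE_USER) into the request environment.
def application(environ, start_response):
    # Which attributes appear here depends on the SP's attribute-map.xml.
    user = environ.get("eppn") or environ.get("REMOTE_USER")
    if not user:
        start_response("401 Unauthorized", [("Content-Type", "text/plain")])
        return [b"Not authenticated by Shibboleth\n"]

    # At this point the web app would create its *own* session for `user`
    # (e.g. issue a session cookie) instead of reusing the Shibboleth session.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"Created local session for {user}\n".encode()]
```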
The typical Nextcloud setup for large installations includes a storage and a database cluster attached to multiple application servers behind a load balancer. This allows organisations to scale Nextcloud for thousands of users. But at some point the shared components, like the storage, database and load balancer, become an expensive bottleneck. Therefore Nextcloud introduced "Global Scale", a new...
Managing the database where you store your application data is always an
interesting challenge. As the scale of your service grows, so does the
challenge of keeping a healthy database service. However, with just a few tools
and techniques it is possible to implement some serious performance
improvements with just a little bit of effort. Using the performance tools
included with MariaDB, at...
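As one example of the kind of tooling meant here, the statement digests collected in the performance_schema can be inspected from a short Python script; the connection details below are placeholders and the query is just one possible starting point:

```python
import pymysql

# Connect to the (hypothetical) database host.
conn = pymysql.connect(host="db.example.org", user="monitor",
                       password="secret", database="performance_schema")

with conn.cursor() as cur:
    # The slowest statement patterns by accumulated execution time.
    cur.execute("""
        SELECT DIGEST_TEXT, COUNT_STAR, SUM_TIMER_WAIT
        FROM events_statements_summary_by_digest
        ORDER BY SUM_TIMER_WAIT DESC
        LIMIT 5
    """)
    for digest, calls, total_wait in cur.fetchall():
        print(calls, total_wait, (digest or "")[:80])

conn.close()
```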
This presentation gives details and demonstrates the new SWAN sharing interface. See also: "SWAN: Service for Web-based Analysis" in "Cloud Infrastructure&Software Stacks for Data Science" session.
Current sharing in ownCloud does not allow seamless access to shared data. Media disruptions and inefficient communication methods reduce team productivity through a lack of information. Sharing 3 introduces a new bi-directional request-accept flow for streamlining collaboration within the ownCloud platform. This gives users further control over their data and allows them to request access...
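Purely to illustrate the shape of such a request-accept flow, a hypothetical Python model is sketched below; it is not ownCloud's actual Sharing 3 API:

```python
from dataclasses import dataclass

@dataclass
class ShareRequest:
    """Hypothetical model of a bi-directional request-accept share flow."""
    requester: str
    owner: str
    path: str
    state: str = "pending"  # pending -> accepted | declined

    def accept(self):
        # The owner grants the requested access; only then is a share created.
        self.state = "accepted"

    def decline(self):
        self.state = "declined"

# A user requests access to data they cannot yet see; the owner decides.
req = ShareRequest(requester="bob", owner="alice", path="/projects/report.docx")
req.accept()
print(req.state)  # "accepted"
```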
This panel discussion session will be focusing on the actual use cases that can drive the adoption and further development of the OCM protocol. Panellists will be requested to provide their views and vision for the future with regard to interoperability between private cloud domains.
In this contribution, the evolution of CERNBox as a collaborative
platform is presented.
Powered by EOS and ownCloud, CERNBox is now the reference storage
solution for the CERN user community, with an ever-growing user base
that is now beyond 12K users.
While offline sync is instrumental for such a widespread usage, online
applications are becoming more and more important for the...
Come and hear about what is Collabora Online and how it integrates into many File Sync&Share Products to create a powerful, secure, real-time document editing experience. Hear about the improvements over the last year, catch a glimpse of where we are going next, and hear how you can get it integrated into your product - if you haven't integrated it yet.
The global academic community shows growing interest in using cloud technologies for scientific data processing, driven by the need for quick, shared access to data.
This presentation will deal with the question of convenient and effective cloud editing of documents as the main form of storing and exchanging information.
ONLYOFFICE, a project by Latvian software...
The IME I/O acceleration layer is one of DDN's latest efforts to satisfy the never-ending performance needs of the HPC community. We propose to discuss some of the latest advancements of the IME product with respect to the larger evolution of Software Defined Storage as it is observed outside the HPC market.
The arrival of flash storage has pushed existing HPC file systems to their...
Handling 100s of terabytes of data at the speed of 10s of GB/s is nothing new in HPC. However, high performance and large capacity of storage systems rarely go together with ease of use. HPC storage systems are particularly difficult to access from outside the HPC cluster. While researchers and engineers tolerate the fact that they need to use rigid tools and applications such as...
To allow better scalability of ownCloud in large installations, we spent some time improving the ownCloud integration with S3-based object stores like Ceph and Scality.
At the ownCloud Conference 2017 we presented the vision of where to go.
At CS3 we will present the results!
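For context, the sketch below shows what talking to such an S3-compatible object store (e.g. a Ceph RadosGW endpoint) looks like from Python with boto3; endpoint, credentials, bucket and key names are placeholders, and ownCloud itself does this through its PHP objectstore backend:

```python
import boto3

# S3 client pointed at a (hypothetical) Ceph RadosGW / Scality endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.org",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# ownCloud's objectstore layout typically keys objects by internal file id
# (names and directory structure stay in the database); the key mimics that.
s3.put_object(Bucket="owncloud", Key="urn:oid:12345", Body=b"file contents")
print(s3.get_object(Bucket="owncloud", Key="urn:oid:12345")["Body"].read())
```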
The National Education and Research Network (RNP) is an organization that plans, designs, implements and operates the national network infrastructure under contract with the Ministry of Science, Technology, Innovation and Communications (MCTIC). A current government program includes five ministries - MCTI, Education (MEC), Culture (MinC), Health (MS) and Defense (MD) - and annually defines the...
SWITCH has been running cloud-based filesharing services since 2012, starting with an experiment where we hosted FileSender in the Amazon cloud. After this experience, we decided to build a cloud service for ourselves, SWITCHengines, which runs on an OpenStack infrastructure. The challenge with our SWITCHengines infrastructure and filesharing is the Ceph storage that we use for our user...
Nextcloud can be scaled from very small to very big installations. This talk gives an insiders look on how to deploy, run and scale Nextcloud in different scenarios. Discussed will be a very big installation in the research space, an installation in a global enterprise and the implementation of Nextcloud at one of the largest service providers in the world. The different infrastructural...
This talk covers a journey through fuzz-testing CERN's EOS file system with AFL, from compiling EOS with afl-gcc/afl-g++, to learning to use AFL, and finally, making sense of the results obtained.
Fuzzing is a software testing process that aims to find bugs, and subsequently potential security vulnerabilities, by attempting to trigger unexpected behaviour with random inputs. It is particularly...
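For readers new to the idea, the core loop of fuzzing can be sketched in a few lines of Python; AFL itself is far more sophisticated (coverage-guided input mutation of an instrumented binary), and the parser below is only a stand-in target:

```python
import random

def parse(data: bytes) -> int:
    """Stand-in for the code under test (e.g. a file-format parser)."""
    if data.startswith(b"\x7f") and len(data) < 4:
        raise ValueError("truncated header")  # a planted bug
    return len(data)

# Naive fuzz loop: throw random inputs at the parser and report unexpected errors.
for i in range(100_000):
    blob = bytes(random.getrandbits(8) for _ in range(random.randint(0, 16)))
    try:
        parse(blob)
    except Exception as exc:
        print(f"iteration {i}: input {blob!r} triggered {exc!r}")
        break
```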
We started looking at the site reports of CS3 with the goal of designing a large ownCloud/Nextcloud solution. We learned that the main products used for sync and share are Nextcloud and ownCloud, and looking at implementations with a large user base, we saw that most are large monolithic installs. The site reports also showed that these large installs have some weaknesses, like scaling...