CERN-Solid brainstorming meeting

Europe/Zurich
28/R-015 (CERN)

Maria Dimou (CERN)
Description

The aim of this first brainstorming meeting is to exchange technical information on the standardisation activities of the Solid project between representatives of Solid implementers and CERN developers of popular, highly visible applications such as Indico and Invenio.

Speakers: Mitzi László, Sarven Capadisli, Maria Dimou, Pedro Ferreira, Lars Nielsen, Jakub (Kuba) Moscicki, Hugo González Labrador

Group management members: José Benito Gonzales, Thomas Baron, Eduardo Alvarez Fernandez (for Andreas Wagner), Tim Smith

NYTimes article of 2019/11/24 and the Richard Dimbleby talk by Sir Tim Berners-Lee.

Photo from the CERN-Solid meeting of 20200207

 

Present

Mitzi, Maria, Sarven, Thomas, José, Pedro, Adrian, Eduardo, Lars, Hugo, Kuba and Tim (partially).

Apologies: Andreas, Michal.

Executive summary of the conclusions

a. The Solid project core idea of decoupling Identification from Storage from Apps is attractive for all Web applications. 

b. The CERN use cases are currently not covered by Solid. The dokieli example is good for end users but not for service providers. Work is therefore needed on the Solid side to complete the specifications and to make clear how developers should interact when the specifications are handed over and contributions are received.

c. Maria, as CERN-Solid collaboration manager, will maintain the interface between Solid and CERN developers as point b above becomes clear. This exchange/collaboration exercise will naturally need a period of discovery and adaptation. Once the content of another f2f meeting is defined, it will be scheduled, probably in Q3 2020.

Details on the presentations (slides linked from the agenda) and the discussions that followed are given below. All the links to the applications explained here are on the agenda and listed on the CERN-Solid category index page.

Problems of the web today (Mitzi)

Feedback from the participants on problems of the web today:

  • data silos with no sharing possible - Pedro
  • lack of control of one's own data - Eduardo
  • human nature - technology shows the same problems as the ones seen in society - Lars
  • data sovereignty and personal freedom to choose one's hw/sw - Hugo
  • cash in hundreds of billions ($961.3B for Apple) in the bank for GAFAMs - competition with them is impossible for small innovative companies - Mitzi

Solutions proposed and their challenges

  • separate storage from user identification and application.
  • user decides where to store his/her data.
  • inter-operability isn't happening because there are serious financial incentives for big companies to make it impossible.
  • there are already > 300.000 standards. The people sitting in the standards committees are paid by big companies, so their good intentions are limited by their employers' interests.
  • it is hard to persuade on a technical solution when there is no financial incentive.
  • there is a question about who is the data owner - the one who publishes or the one who accesses?
  • user freedom to choose where his/her data will sit is part of the policy of CERN (Hugo).
  • if there were a consensus on the infrastructure level, the applications would follow (Eduardo).
  • the computing experts are few and the great public just wants practical solutions.
  • convenience outweighs everything, including education. For big companies there is no incentive to change what they do now.
  • there should be a technical solution on search and data ownership.
  • only by making a solution that is convenient, and showing that you can deliver the functionality, do you stand a chance. (Tim)
  • every invention has good and bad applications. Only regulation can stop the bad side-effects of every invention. (Lars)
  • telecoms used to be closed circuits and they opened up because they became international. Big American companies would not agree to put their own (American) interests aside. (Pedro)
  • One should have control of who one shares one's data with: publish, and be able to choose the providers who can give back the service one needs. (Sarven)
  • At the end of the day, even if you have the freedom to select a provider, you can't see what happens with your data afterwards and behind the scenes, so one needs to be able to trust. (Maria)

Indico (Pedro)

Mitzi asks whether Indico complies with the Solid specs. The answer is 'no'. Sarven says the Indico implementation is very close anyway.

Research Data Management platform (Lars)

The platform covers archiving, dissemination and re-use of research data. A guarantee of data preservation is hard to achieve. The team is trying to facilitate uploads. CERN's Open Data is not 'open' from the beginning of data production, for several reasons: the risk of misinterpretation, the need to publish first and the high price of the experiments, hence the need to be the first to discover and write about a result. There is now competition on all fronts, e.g. big publishers such as Elsevier are getting into cloud storage solutions, usage statistics and data analytics. Google Dataset Search compiles data and makes them available for citation counts, university ratings etc.

The CERN team is good at running the infrastructure. The actual data, e.g. index of all species, is the responsibility of the data owners. Data sharing culture is a challenge. Data replication is needed because governments and institutes want to have what they own at their premises.

Researchers find uploading to Zenodo much more functional than the upload tools publishers offer. Publishers retain information until you pay. This goes totally against the principle that scientific information should flow freely and be open to testing. Science can be preserved when it is made available for trial and verification.

Mitzi points out that European health-related data are not actually hosted in Europe. When data in which enormous amounts of public health money have been invested are not available to their own countries, those countries are exposed to risk.

The issue of trust comes up in the technical and policy aspects of every project.

CS3 Mesh project (Kuba)

Metadata-awareness is very present in European policy-making bodies. EU funds were received for this project. Mitzi says that WeTransfer (a Dutch company) is now expanding its functionality beyond file transfer and that it does comply with standards. Maybe the project could invite them to the next event to find out what they are doing. They can't be part of the project because the partners are set and because WeTransfer is a company, not an Open Source solution. Still, they can be considered as a proof of concept. Similarly, Dropbox is used by many users because their organisation decided to adopt this solution. The CS3 Mesh project won't leave these users out. Future maintenance is also a concern.

Kuba asks what the Solid specs and standards can do for the upcoming CS3Mesh protocols and APIs.

Dokie.li as a Solid application (Sarven)

A client-side editor for decentralised article publishing, annotations and social interactions.

Solid is domain-agnostic. The user is allowed to choose his/her own identifier. Different data types are supported, e.g. http, RDF (Resource Description Framework) and various others for media data files. The WebID of a person, agent or company is a URI decoupled from the authentication mechanism. WebID was swapped with OIDC (OpenID Connect).

A discussion followed about who the community is and what their incentive is to join the Solid community. Developers may adhere because it is an open solution, but institutions don't see the incentive to join.

Decoupling Identification from Storage from Apps can be attractive, but is the method there yet?

Solid has a standard notification system that can be used by existing applications, e.g. Zenodo or Indico. For example, an RSVP could be sent to both applications, different as they are, using the same standard.
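The RSVP example above can be illustrated with a minimal sketch. It assumes that the notification is an ActivityStreams 2.0 activity delivered as JSON-LD to an inbox of each application, in the manner of W3C Linked Data Notifications; the minutes do not say which notification standard was presented. The inbox and event URLs, the function names and the `post` callback are hypothetical and used for illustration only. No network call is made: the sender is passed in as a function.

```python
import json

# ActivityStreams 2.0 JSON-LD context (assumed vocabulary for the RSVP).
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_rsvp(actor_webid, event_url, attending=True):
    """Build one RSVP notification that any receiving application can read."""
    return {
        "@context": AS2_CONTEXT,
        "type": "Accept" if attending else "Reject",
        "actor": actor_webid,
        "object": event_url,
    }


def deliver(notification, inboxes, post):
    """Send the same serialised notification to every inbox.

    `post` is a caller-supplied function (url, body, headers) -> status,
    so the sketch stays independent of any HTTP library.
    """
    body = json.dumps(notification)
    headers = {"Content-Type": "application/ld+json"}
    return [post(inbox, body, headers) for inbox in inboxes]


# Usage with a stand-in sender that only records what would be sent.
sent = []


def record_post(url, body, headers):
    sent.append((url, json.loads(body), headers))
    return 201


rsvp = build_rsvp("https://csarven.ca", "https://indico.example/event/1")
statuses = deliver(
    rsvp,
    ["https://indico.example/inbox/", "https://zenodo.example/inbox/"],  # hypothetical inboxes
    record_post,
)
```

The point of the sketch is that the payload is built once and is identical for both receivers; only the inbox URL differs.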

The user's WebID, e.g. https://csarven.ca, comes with a number of locations where the user can store data. When saving, s/he is then prompted with more than one option to choose from. One can choose to have a Hypothesis account or an ORCID iD.

If one annotates a part of a paper, the content of the annotation is stored in one's preferred storage and a notification is sent (optionally) to the article's site.
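The annotation flow described above can be sketched as follows. This is an illustration under stated assumptions, not dokieli's actual implementation: the annotation is assumed to follow the W3C Web Annotation data model, the optional notification is assumed to be an ActivityStreams 2.0 `Announce`, and all URLs, function names and the resource name are hypothetical. The function only plans the HTTP requests and sends nothing.

```python
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"          # W3C Web Annotation context
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"      # ActivityStreams 2.0 context


def make_annotation(creator, article_url, quote, comment):
    """Describe a comment on a quoted part of an article."""
    return {
        "@context": ANNO_CONTEXT,
        "type": "Annotation",
        "creator": creator,
        "body": {"type": "TextualBody", "value": comment},
        "target": {
            "source": article_url,
            "selector": {"type": "TextQuoteSelector", "exact": quote},
        },
    }


def plan_save(annotation, storage_url, name, article_inbox=None):
    """Return the requests needed: store first, then optionally notify.

    The annotation lives in the annotator's chosen storage; the article's
    site only receives a pointer to it, and only if an inbox is given.
    """
    annotation_url = storage_url.rstrip("/") + "/" + name
    steps = [("PUT", annotation_url, annotation)]
    if article_inbox is not None:
        notice = {
            "@context": AS2_CONTEXT,
            "type": "Announce",
            "actor": annotation["creator"],
            "object": annotation_url,
            "target": annotation["target"]["source"],
        }
        steps.append(("POST", article_inbox, notice))
    return steps


note = make_annotation(
    "https://csarven.ca",
    "https://journal.example/paper-1",      # hypothetical article
    "the quoted sentence",
    "This claim needs a reference.",
)
steps = plan_save(
    note,
    "https://storage.example/annotations/",  # hypothetical preferred storage
    "note-1",
    article_inbox="https://journal.example/inbox/",
)
```

Because the notification carries only the annotation's URL, deleting the stored annotation later leaves the article's site, and anyone who referenced it, with a dangling link.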

If an annotation is removed then this can be a problem for others who made a reference to it.

Solid processes

The Solid process on GitHub, with its panels and contributors, advances via chats between the panelists. The editor of each panel is appointed by Sir Tim. There is a possibility to vote. This hasn't been necessary so far because the panelists are few. For the moment all developers come from inrupt.com, so there is no interoperability problem. This US-based company is for-profit. Lars asks why, and who will justify the motives and the ethics. Sir Tim co-founded inrupt to show that a company can be ethical. Still, there are decisions which are opaque to its members.

Openlink is another company with many technical solutions e.g. RDF storage similar to Solid. (who said this? Mitzi?)

José says that having pods in Zenodo that move around gigabytes of data is hard to do. Also, for WebIDs the owners in our case are institutions rather than individual users.

Lars talks about the Next Generation Repositories (NGR) framework. Statistics on views and downloads are a must for Zenodo because the users require them.

Tim S. says that we would like to find out more about how we could collaborate.

Sarven says that one of the Solid objectives is for everyone to have a personal Facebook-like site and still be able to share events with others.

Lars says that moving data around between entities (institutions) can be interesting, but it is not clear that Solid makes it easier and faster. NGR takes care of inter-operability OIA/OIE standards ... José says that Solid could join the Confederation of Open Access Repositories (COAR).

Hugo says the use cases we have are not covered by Solid. The examples shown by Sarven are good for end users but not service providers. The end users of our applications are not going to type URLs and run their own servers.

What makes one Solid-compliant? Is the use of a WebID enough? Sarven says that, since it is possible to read/write anonymously, a WebID should not be a requirement.

About inter-operability

Mitzi shows datatransferproject.dev (repo https://github.com/google/data-transfer-project) where the giant companies participate. Maria says the question is what they do with the data once they agree to mutually transfer it, and why an owner of an application like Zenodo would care about this.

Lars says that he sees today's outcome as getting 'into the practice of looking at the Solid specs to see what can be relevant to our use cases.'

Concluding thoughts

Thomas: Nice concepts, e.g. decoupling AuthN/AuthZ from the app and from the data. We already have this concept, also for notifications. More Solid details will be very useful and necessary for feeding ideas into these on-going developments at CERN.

Mitzi asks for links to these existing on-going CERN initiatives. Maria will get relevant links from Thomas and forward. Here they are:

Pedro says there is a lot of potential in the idea of linking data with the semantic web. There is quite some way to go until the solutions are made available, performant and easy to use for the average end user. The meetulater app is very interesting. Mitzi will send the link to the site.

Eduardo asks for a clear set of the Solid components.

Maria thanked everyone for the day they devoted to this brainstorming. Understanding how busy CERN developers are, she, as CERN-Solid collaboration manager, will make available the Solid process tools, development repositories and chat fora and propose a functional way to channel Solid-CERN information. CERN IT developers will pick up, adapt, adopt, complete and comment, when appropriate. This exchange/collaboration exercise will naturally need a period of discovery and adaptation.

The cern-solid@cern.ch e-group was created for future exchanges, with today's CERN participants as members. Mitzi and Sarven are eligible to post to this e-group.

    • 10:00 10:10
      who is who introduction 10m

      Maria gives an introductory kick off, explaining the aim of the meeting and the agenda for the day. Participants around the table present themselves.

      Speaker: ALL
    • 10:10 10:40
      What Solid does 30m

      Mitzi gives a presentation followed by conversation about the problems the Solid project is trying to solve.

      Speaker: Mitzi László
    • 10:40 11:10
      Short description of Indico 30m

      The application, the development team, the other Indico instances, challenges around content ownership and more.

      Speaker: Pedro Ferreira (CERN)
    • 11:10 11:40
      Short description of Zenodo - InvenioRDM 30m

      Lars explains the application's purpose, structure, contributions by the community, challenges for integrating input, dissemination and influence on Zenodo and CDS.

      Speaker: Lars Holm Nielsen (CERN)
    • 11:40 12:40
      Lunch in Restaurant 2 1h
    • 12:40 12:50
      The CS3MESH project 10m

      In this presentation we will explain the objective of the CS3MESH4EOSC EU project. Europe is facing a fragmentation into different services that are very difficult to connect to each other as if you were using a centralized global service like Dropbox or Google Drive. The project aims at bringing the same user-friendly functionality already offered by global services and individual on-premise deployments to a pan-European cross-institution mesh that relies on the federation of different sites by using well-known APIs. The project involves defining and promoting a set of APIs that will make it possible to connect these isolated service islands while retaining user privacy and data sovereignty.

      Speakers: Hugo Gonzalez Labrador (University of Vigo (ES)), Jakub Moscicki (CERN)
    • 12:50 13:20
      Developer tools and implementation 30m

      Mitzi and Sarven explain what is in focus now for the solidproject.org: The community, the software repositories, the code review processes, the communication channels. They explain where implementation efforts take place, explain inrupt's role https://solid.inrupt.com/, as well as the work done at MIT and elsewhere.

      Sarven explains some Solid applications, e.g. https://dokie.li/, which showcases various disparate parts – Web standards, technologies and social ideals – coming together in a cohesive way to serve a future research communication ecosystem.

      Speakers: Mitzi László, Sarven Capadisli
    • 13:20 13:50
      Solid standardisation processes 30m

      Mitzi László and Sarven Capadisli explain the Solid process, structure of the developers' community, the way they interoperate and assign priorities.

      This joint presentation aims at reaching an understanding of the existing points and forms of collaboration before discussing a collaboration with CERN.

      Speakers: Mitzi László, Sarven Capadisli
    • 13:50 15:50
      Free format discussion on development collaboration 2h

      Everyone says what effort can be offered in a CERN-Solid collaboration and what mutual benefit can be expected.

      Speaker: ALL
    • 15:50 16:10
      Coffee in the Social Room upstairs 20m
    • 16:10 16:30
      Conclusions & what now? 20m

      Maria summarises and all conclude and confirm what the next step is.

      Speaker: Maria Dimou (CERN)
    • 16:30 16:40
      A.O.B. 10m

      Links on other CERN projects in the area of Authentication, Authorisation and Notifications, as discussed at the end of the meeting.