CERN-Solid brainstorming meeting

Europe/Zurich
28/R-015 (CERN)

Maria Dimou (CERN)
Description

The aim of this first brainstorming meeting is to exchange technical information on the standardisation activities of the Solid project between representatives of Solid implementors and CERN developers of popular and very visible applications, like Indico and Invenio.

Speakers: Mitzi László, Sarven Capadisli, Maria Dimou, Pedro Ferreira, Lars Nielsen, Jakub (Kuba) Moscicki, Hugo González Labrador

Group management members: José Benito Gonzales, Thomas Baron, Eduardo Alvarez Fernandez (for Andreas Wagner), Tim Smith

NYTimes article of 2019/11/24 and the Richard Dimbleby talk by Sir Tim Berners-Lee.

Photo from the CERN-Solid meeting of 20200207

 

Present

Mitzi, Maria, Sarven, Thomas, José, Pedro, Adrian, Eduardo and Lars; Hugo, Kuba and Tim (partially).

Apologies: Andreas, Michal.

Notes by Maria with comments by Sarven.

Executive summary of the conclusions

a. The Solid project's core idea of loosely coupling identity, identification, authentication, authorization, data and user interfaces (applications) is attractive for all Web applications.

b. The CERN use cases are currently not the focus of Solid, but they are within its scope. The dokieli example shows similar, even complementary, requirements. CERN developers felt that some time is needed for the Solid specifications to be completed, and for the method of interaction between developers to become clear, before specifications can be handed over and contributions received. Sarven suggests taking individual notions under the Solid umbrella that can be adopted by CERN projects, and evolving the specifications based on the problems encountered and documented. The area of notifications is worth exploring further.

c. Maria, as CERN-Solid collaboration manager, will maintain the interface between Solid and CERN developers as point b above becomes clearer. This exchange/collaboration exercise will naturally need a period of discovery and adaptation. Once the content of another face-to-face meeting is defined, it will be scheduled, probably in Q3 2020.

Details on the presentations (slides linked from the agenda) and the discussions that followed. All the links to the applications explained here are on the agenda and listed on the CERN-Solid category index page.

Problems of the web today (Mitzi)

Feedback from the participants on problems of the web today:

  • data silos with no sharing possible - Pedro
  • lack of control of one's own data - Eduardo
  • human nature - technology shows the same problems as the ones seen in society - Lars
  • data sovereignty and personal freedom to choose one's hw/sw - Hugo
  • cash in hundreds of billions ($961.3B for Apple) in the bank for GAFAMs - competition with them is impossible for small innovative companies - Mitzi

Solutions proposed and their challenges

  • separate storage from user identification and application.
  • user decides where to store his/her data.
  • inter-operability isn't happening because big companies have serious financial incentives to make it impossible.
  • there are already more than 300,000 standards. The people sitting on the standards committees are paid by big companies, so their good intentions are limited by their employers' interests.
  • it is hard to persuade on a technical solution when there is no financial incentive.
  • there is a question about who is the data owner - the one who publishes or the one who accesses?
  • user freedom to choose where his/her data will sit is part of the policy of CERN (Hugo).
  • if there were a consensus on the infrastructure level, the applications would follow (Eduardo).
  • the computing experts are few and the great public just wants practical solutions.
  • convenience outweighs everything, including education. Big companies have no incentive to change what they do now.
  • there should be a technical solution on search and data ownership.
  • only by making a solution that is convenient, and by showing that it can deliver the functionality, do you stand a chance. (Tim)
  • every invention has good and bad applications. Only regulation can stop the bad side-effects of every invention. (Lars)
  • telecoms used to be closed circuits and they opened up because they became international. Big American companies would not agree to put their own (American) interests aside. (Pedro)
  • One should have control over whom one shares data with: publish, and be able to choose the providers who can give back the service one needs. (Sarven)
  • At the end of the day, even if you have the freedom to select a provider, you can't see what happens with your data after and behind, so one needs to be able to trust. (Maria)

Indico (Pedro)

Mitzi asks whether Indico complies with the Solid specs. The answer is 'no'. Sarven says the Indico implementation is very close anyway.

Research Data Management platform (Lars)

Archiving, dissemination and re-use of research data. A guarantee of data preservation is hard to achieve. The team is trying to make uploads easier. CERN's Open Data is not 'open' from the beginning of data production, for several reasons: the risk of misinterpretation, the need to publish first, and the high price of the experiments, hence the need to be the first to discover and write about a result. Now there is competition on all fronts, e.g. big publishers such as Elsevier entering the market for cloud storage solutions, usage statistics and data analytics. Google Dataset Search compiles data and makes them available for citation counts, university ratings etc.

The CERN team is good at running the infrastructure. The actual data, e.g. index of all species, is the responsibility of the data owners. Data sharing culture is a challenge. Data replication is needed because governments and institutes want to have what they own at their premises.

Researchers find uploading to Zenodo much more functional than the upload tools publishers offer. Publishers withhold information until you pay. This goes totally against the principle that scientific information should flow freely and be put to the test. Science can be preserved when it is made available for trial and verification.

Mitzi points out that European health-related data is not actually hosted in Europe! When data in which enormous amounts of public health money have been invested are not available to their own countries, those countries are exposed to danger.

The issue of trust comes up in every technical and policy aspect of every project.

CS3 Mesh project (Kuba)

Metadata-awareness is very present in European policy-making bodies. EU funds were received for this project. Mitzi says that WeTransfer (a Dutch company) is now expanding in functionality beyond file transfer and does comply with standards. Maybe the project could invite them to the next event to find out what they are doing. They can't be part of the project because the partners are set and because WeTransfer is a company, not an Open Source solution. Still, they can be considered as a proof of concept. Similarly, Dropbox is used by many users because their organisation decided to adopt this solution. The CS3 Mesh project won't leave these users out. Future maintenance is also a concern.

Kuba asks what Solid specs and standards can do for the upcoming CS3Mesh protocols and APIs.

dokieli as a Solid application (Sarven)

A client-side editor for decentralised article publishing, annotations and social interactions.

Solid is domain-agnostic. Users are allowed to choose their own identifier. It supports different data types and content models, e.g. RDF (Resource Description Framework).
WebID is a dereferenceable HTTP URI denoting an agent (person, group, software).
OIDC (OpenID Connect) is an authentication mechanism. WebID and OIDC can be loosely coupled: WebID+OIDC. Ditto WebID+TLS. An ORCID iD can act as a WebID.
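
As a rough illustration of what "dereferenceable" means here, the sketch below builds (without sending) the HTTP request a client might use to fetch the profile document behind a WebID. The helper name profile_request and the choice of Turtle as the requested format are assumptions made for this example, not something stated at the meeting.

```python
from urllib.parse import urldefrag
from urllib.request import Request


def profile_request(webid: str) -> Request:
    """Build (but do not send) the GET request that dereferences a WebID.

    Hypothetical helper, for illustration only.
    """
    # The fragment (e.g. "#i") names the agent inside the profile document.
    # Fragments are not sent over HTTP, so strip it to get the document URL.
    document_url, _fragment = urldefrag(webid)
    # Asking for Turtle is an assumption; a server may offer other RDF syntaxes.
    return Request(document_url, headers={"Accept": "text/turtle"}, method="GET")


req = profile_request("https://csarven.ca/#i")
print(req.get_method(), req.full_url)
```

The agent is identified by the full URI including the fragment, while the document that describes the agent lives at the URL without it.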

Discussion followed around the incentive to join the Solid community. Developers join because it is an open solution. All contribute towards the "for everyone" mission of Solid, regardless of their background. The community creates specifications and other material, based on consensus, through open discussion and participation.

Decoupling identification from storage from apps can be attractive. We have to understand which parts of the method are ready to use.

Solid has a standard notification system that can be used by existing applications, e.g. Zenodo or Indico. For example, an RSVP could be sent to both applications, which are different, using the same standard. This would be a good way to demonstrate communication across systems.
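
A minimal sketch of the RSVP idea, assuming the notification mechanism is W3C Linked Data Notifications (a JSON-LD document POSTed to the receiver's inbox) and that the RSVP is expressed with the ActivityStreams 2.0 vocabulary. The inbox and event URLs are hypothetical, as are the helper names; neither Indico nor Zenodo is known to expose such an inbox today. Nothing is sent over the network.

```python
import json
from urllib.request import Request

AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def rsvp_notification(actor_webid: str, event_url: str) -> dict:
    """Describe 'this agent accepts the invitation to this event' as an AS2 activity."""
    return {
        "@context": AS2_CONTEXT,
        "type": "Accept",
        "actor": actor_webid,
        "object": event_url,
    }


def ldn_post(inbox_url: str, notification: dict) -> Request:
    """Build (but do not send) the POST that would deliver a notification to an inbox."""
    body = json.dumps(notification).encode("utf-8")
    return Request(
        inbox_url,
        data=body,
        headers={"Content-Type": "application/ld+json"},
        method="POST",
    )


# Hypothetical inboxes of two unrelated applications.
inboxes = ["https://indico.example/inbox/", "https://zenodo.example/inbox/"]
note = rsvp_notification("https://csarven.ca/#i", "https://indico.example/event/1")
deliveries = [ldn_post(inbox, note) for inbox in inboxes]
```

The point of the sketch is that the same payload goes to both receivers: the sender needs no application-specific code, only the inbox URL of each target.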

The user's WebID, e.g. https://csarven.ca/#i, refers to the user's preferred storage locations. When saving, the user is then prompted to choose among them if there is more than one.

If one annotates a part of a research article, the content of the annotation is stored in one's preferred storage and a notification is sent (optionally) to the article's site.
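
The flow just described can be sketched as a plan of two HTTP steps: store the annotation in the user's own storage, then optionally notify the article's site. The sketch assumes the annotation follows the W3C Web Annotation data model and that the notice is an ActivityStreams 2.0 "Announce"; the storage, article and inbox URLs and the function names are hypothetical. The code only plans the steps and sends nothing.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str, dict]  # (HTTP method, URL, JSON-LD payload)


def make_annotation(annotation_url: str, creator: str, article_url: str,
                    quote: str, comment: str) -> dict:
    """Build a Web Annotation that targets a quoted passage of an article."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": annotation_url,
        "type": "Annotation",
        "creator": creator,
        "body": {"type": "TextualBody", "value": comment},
        "target": {
            "source": article_url,
            "selector": {"type": "TextQuoteSelector", "exact": quote},
        },
    }


def plan_steps(annotation: dict, article_inbox: Optional[str] = None) -> List[Step]:
    """Return the requests a client would make; nothing is sent here."""
    # Step 1: the annotation itself is written to the annotator's own storage.
    steps: List[Step] = [("PUT", annotation["id"], annotation)]
    # Step 2 (optional): tell the article's site where the annotation lives.
    if article_inbox is not None:
        notice = {
            "@context": "https://www.w3.org/ns/activitystreams",
            "type": "Announce",
            "actor": annotation["creator"],
            "object": annotation["id"],
            "target": annotation["target"]["source"],
        }
        steps.append(("POST", article_inbox, notice))
    return steps


storage = "https://storage.example/alice/"  # hypothetical preferred storage
note = make_annotation(storage + "annotations/1", "https://csarven.ca/#i",
                       "https://article.example/paper", "a quoted passage",
                       "Is this result reproducible?")
local_only = plan_steps(note)
with_notice = plan_steps(note, article_inbox="https://article.example/inbox/")
```

Note that the article's site only ever receives a pointer to the annotation; the content stays in the storage chosen by the annotator.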

If an annotation is removed then this can be a problem for others who made a reference to it. However, this can be mitigated by linking to their (archived) persistent resources.

Solid processes

Work in the Solid process panels on GitHub advances via chats between the panelists and contributors. The editor of each panel is appointed by Sir Tim. There is a possibility to vote, which hasn't been necessary so far because the panelists are few. For the moment the 6 editors have 4 different affiliations. Three of them come from inrupt.com so there is no interoperability problem. This US-based company is for-profit. Lars asks why, and who will justify the motives and the ethics. Sir Tim Berners-Lee co-founded inrupt and acts as its benevolent director, to show that a company can be ethical. Still, there are decisions which are opaque to its members.

There are different companies participating in the Solid project, e.g. Openlink, offering technical solutions similar to Solid, e.g. RDF. All these participants work together without competing over people's data or privacy.

José says that having pods in Zenodo that move around gigabytes of data is hard to do. Also, the owners of our WebIDs would be institutions, not users. On this point Sarven commented that this is perfectly normal. We can have multiple WebIDs and we can choose to link any of them together (whether any are pseudo-anonymous or not) or keep them off the "graph" - be unlinkable. Quick summary here.

Lars talks about the Next Generation Repositories (NGR) framework. Statistics on views and downloads are a must for Zenodo because the users require them.

Tim S. says that we would like to find out more about how we could collaborate.

Sarven says that one of the Solid objectives is for everyone to have a personal online profile and still be able to share events with others.

Lars says that moving data around between entities (institutions) can be interesting but it is not clear that Solid makes it easier and faster. NGR takes care of inter-operability OIA/OIE standards ... José says that Solid could join the Confederation of Open Access Repositories (COAR). Sarven comments that the Solid approach is complementary.

Hugo says the use cases we have are not covered by Solid. The examples shown by Sarven are good for end users but not service providers. The end users of our applications are not going to type URLs and run their own servers. Sarven comments that other Solid authentication/authorisation workflows exist, that require just a click.

What makes one Solid-compliant? Is the use of WebID enough? Sarven says that, since reading and writing anonymously is possible, WebID should not be a requirement. He recalls that, because components such as identification, authentication, authorization and storage are loosely coupled in Solid, services and applications can be configured or arranged in different ways.

About inter-operability

Mitzi shows datatransferproject.dev (repo https://github.com/google/data-transfer-project) in which the giant companies participate. Maria says the question is what they do with the data once they agree to transfer it between themselves, and why the owner of an application like Zenodo would care about this.

Lars says that he sees today's outcome as 'getting into the practice of looking at the Solid specs to see what can be relevant to our use cases.'

Concluding thoughts

Thomas: Nice concepts, e.g. decoupling AuthN/AuthZ from the app and from the data. We already have this concept, also for notifications. More Solid details will be very useful and necessary for feeding ideas into these on-going developments at CERN.

Mitzi asks for links to these existing on-going CERN initiatives. Maria will get relevant links from Thomas and forward. Here they are:

Pedro says there is a lot of potential in the idea of linking data with the semantic web. There is quite some way to go until the solutions are made available, performant and easy to use for the average end user. The meetulator app is very interesting.

Eduardo asks for a clear set of the Solid components.

Maria thanks everyone for the day they devoted to this brainstorming. Understanding how busy CERN developers are, she, as CERN-Solid collaboration manager, will make available the Solid process tools, development repositories and chat fora and propose a functional way to channel Solid-CERN information. CERN IT developers will pick up, adapt, adopt, complete and comment when appropriate. This exchange/collaboration exercise will naturally need a period of discovery and adaptation.

Solid developer Michiel de Jong mentioned to Sarven after the meeting that it would be very interesting if the new CERN SSO became a WebID-OIDC provider for use by Indico. The CS3MESH project would also profit from WebID-OIDC.

The cern-solid@cern.ch e-group was created for future exchanges, with today's CERN participants as members. Mitzi and Sarven are eligible to post to this e-group.

 

 

 

 

 

 

 

 
