Yes, so thank you very much. Before we start, I would like to introduce, for the people who couldn't join our first, previous update in January, Jan Schill, who did a phenomenal amount of work developing these proofs of concept that join Solid concepts with the Indico application.
I would like to thank his professor, Philippe Bonnet, who joins us today despite his busy schedule. I would like to thank our department head, Frederic Hemmer, who joins as well, thank you very much; my supervisors, Thomas and Tim; the web experts Eduardo and Andreas; the CERNBox experts; and the Solid expert Michiel. So I think this will be a very nice opportunity to get insights and advice from all of you. Thank you.
I shall share my screen now and start with the presentation. I will not be able to see the chat, but Jan will shout if something is wrong with the slides or the sound.
So, I hope you can see the slides, the blue slides. Thank you.
So, as I just said, Jan and I work together. This particular development that we explain today started after Christmas, but references to his previous work, between mid-September and December, will appear with links in one of our slides.
What we will speak about today: a very short reminder of the Solid philosophy and its popular terms, for those of you who couldn't join the previous event, and then the specifics of this proof of concept, namely enhancing Indico events with comments, with authentication done via Indico and with the comments physically living in the Solid universe, and taking registration data for people joining Indico conferences from Solid pods. I mention pod data here; the definitions are coming up.
So, one slide about what Solid is. It stands for SOcial LInked Data and was born in 2016; Tim Berners-Lee initiated the project. The idea is that people's data live where the owner of the data wants them to be stored, and the owner sets access rights on chunks of that data for different viewers and exploiters. It combines existing standards from the World Wide Web Consortium and is really built on the existing web: you will see that every subsequent term corresponds to web pages.
The Solid pod: "pod" is an English word I didn't know until 2019, sorry; it stands for a protective container, and in fact the Solid pod is exactly that, a place on the web where one can store one's own data. This data can be anything: personal information, pictures, movies, documents and so on, and it is stored as Linked Data; Jan will explain the Linked Data syntax. You can see here a few examples of how a person who has a pod appears on the web with a WebID, namely the unique identifier of that person, which also leads to their pod. So you have here examples for Jan, for me and for Tim Berners-Lee, but by now Pedro, Thomas and Tim also have pods, and you are all welcome to get one; the recommendations on how to do that are in the links later on.
A Solid server is a web server where users' pods are stored and which provides an interface, with the appropriate logic, for you to decide: for this particular information, my pictures, I allow access to Jan Schill only; for my documents concerning my internal group, IT-CDA, I allow access to Thomas Baron and Tim Smith only; and so on. We went into some depth on the existing Solid server implementations on 25 January, and we have left that list in the appendix, just to demonstrate that, although the standard is young, there are already four or five implementations (and the number is increasing) up and running, with their weaknesses, but still very active.
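The per-resource decisions just described ("my pictures for Jan only, my IT-CDA documents for Thomas and Tim only") can be pictured as a small set of rules, each naming an agent, a resource and an access mode. The sketch below is a toy model loosely following the agent / resource / mode idea of Solid's Web Access Control; real servers read such rules from ACL documents written in Turtle and also handle inheritance, groups and public access, none of which is modelled here. The pod URL, the WebIDs and the `is_allowed` helper are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    READ = "Read"
    WRITE = "Write"


@dataclass(frozen=True)
class Authorization:
    agent: str        # WebID of the person being granted access
    container: str    # container URL; the rule covers everything beneath it
    modes: frozenset  # access modes granted by this rule


def is_allowed(rules, agent: str, resource: str, mode: Mode) -> bool:
    """Deny by default; allow only if some rule matches agent, resource and mode."""
    return any(
        rule.agent == agent
        and resource.startswith(rule.container)
        and mode in rule.modes
        for rule in rules
    )


# Hypothetical pod and WebIDs, mirroring the example in the talk.
POD = "https://maria.example.org/"
JAN = "https://jan.example.org/profile/card#me"
THOMAS = "https://thomas.example.org/profile/card#me"
TIM = "https://tim.example.org/profile/card#me"

RULES = [
    Authorization(JAN, POD + "pictures/", frozenset({Mode.READ})),
    Authorization(THOMAS, POD + "documents/it-cda/", frozenset({Mode.READ})),
    Authorization(TIM, POD + "documents/it-cda/", frozenset({Mode.READ})),
]
```

Denying by default and granting only through explicit rules is what lets the pod owner, rather than each application, decide who sees what.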
So, to summarise the previous story about the ownership of data and deciding on access to it: this nice picture shows exactly what Solid would like to stop, to avoid, to rescue us from, namely that we have user logins and passwords for every imaginable application, and that these applications may share our data without asking us, and so on. We are not there yet, because none of these popular applications says "please give me your Solid pod and I will not ask you anything further", but it shall happen if we believe in it.
Our project started in September, and it would not have been completed in this happy way had Jan not done this phenomenal amount of work. At the beginning we studied and tried to understand everything about the specifications. Michiel de Jong, connected here today, helped us a lot in interpreting the documents and the chats, which are thriving on Gitter but which we still needed some help to understand. Then we studied the implementations, namely what I just said about pod providers, Solid servers and so on, and then came the two modules, which Jan will explain in detail, for enriching Indico based on Solid principles. In his thesis, which is being completed tonight, he also makes recommendations, which we have discussed, about how to go on from here, and we shall touch upon them later in this presentation.
Everything concerning the Solid specifications and implementations was done in the autumn and presented in January; for those who couldn't be with us, you have the slides and a detailed report linked from here, which is really worth reading.
For the actual proof of concept I stop now. I will keep moving the slides, but Jan is going to continue. Thank you.
Thank you. Yes, next slide please.
So this is the first module that I developed. This is a screenshot of the user interface of an Indico event, and this is the comments section. Before, Indico did not have any functionality for comments; now, in the proof of concept only, there is. In the top right you can see an input field where you can enter your WebID and then press the log-in button. It connects you to your external identity provider, where you authenticate and give this module access to your data pod. This then allows you to post comments, and all the comments you can see here are not stored in Indico: they are stored in their authors' data pods. Indico only holds a reference to each comment, because otherwise Indico wouldn't know what to load, but the actual comment is stored decentralised, away from Indico.
If you go to the next slide, here you can actually see the user interface of the data pod. You can see a folder, or container, which contains another container, and then the actual files in .ttl, that is Turtle, the Linked Data format used in the Solid ecosystem to store and describe data.
Next slide.
Some details on the implementation of this particular module. It is developed completely client side, meaning that it runs in the browser: I browse to the Indico event page, the module is loaded and initialised, and everything happens in the browser. Apart from the actual storage and the extra endpoint needed to store the reference in Indico, everything is on the client side. It is a self-contained application, so in theory it could be reused: I tried to build it as self-contained as possible, without any Indico-specific design, so it could maybe even be used in applications that run on the Solid pod directly, in a chat or something like this.
As shown before, it stores one comment per file on the data pod. This is not particularly efficient, because every file needs to be fetched with its own request. Some performance improvements are described in the thesis; for example, all the comments could be stored in one file. But for the proof of concept we really wanted to make it as simple as possible, which is why it is designed this way.
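To make "one comment, one Turtle file" concrete, the sketch below serialises a single comment as a standalone Turtle document and builds the URL of the file in which it would live; that URL is the only thing Indico would need to keep as its reference. The schema.org terms (`schema:Comment`, `schema:author`, `schema:text`, `schema:dateCreated`), the container layout and the helper names are assumptions for illustration, not necessarily what the proof-of-concept module uses.

```python
from datetime import datetime, timezone

SCHEMA = "http://schema.org/"  # vocabulary choice is an assumption for this sketch


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes, quotes and line breaks."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def comment_to_turtle(author_webid: str, text: str, created: datetime) -> str:
    """Serialise one comment as a standalone Turtle document (one file per comment)."""
    return (
        f"@prefix schema: <{SCHEMA}> .\n"
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n\n"
        "<> a schema:Comment ;\n"
        f"    schema:author <{author_webid}> ;\n"
        f"    schema:text {turtle_literal(text)} ;\n"
        f'    schema:dateCreated "{created.isoformat()}"^^xsd:dateTime .\n'
    )


def comment_resource_url(container: str, comment_id: str) -> str:
    """URL of the .ttl file inside the author's (hypothetical) comments container."""
    return f"{container.rstrip('/')}/{comment_id}.ttl"
```

The `<>` subject refers to the document itself, so each file is self-describing; the price, as noted above, is one HTTP request per comment when reading.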
The module also communicates directly with the data pods. That means every client that browses to the Indico event page needs to connect to the data pods directly: one client will probably make ten requests for ten comments and connect to ten different Solid servers.
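The cost of this direct, client-side design can be sketched as follows: the page holds only the list of comment URLs, and the client fetches each one from whichever pod it lives on. The `fetch` callable stands in for the browser's HTTP request and is injected so the sketch runs without a network; the function name and URLs are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlsplit


def load_comments(
    refs: Iterable[str], fetch: Callable[[str], str]
) -> Tuple[List[str], int, int]:
    """Fetch each referenced comment directly from its pod, as the browser module does.

    Returns the comment bodies, the number of requests made, and the number of
    distinct pod servers contacted.
    """
    bodies: List[str] = []
    servers = set()
    for url in refs:
        bodies.append(fetch(url))          # one request per comment
        servers.add(urlsplit(url).netloc)  # host part of the URL identifies the server
    return bodies, len(bodies), len(servers)
```

With ten comments on ten different pods, each visitor makes ten requests to ten servers, which is the situation described above.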
It also needs an authenticated Indico session. We decided on this to mitigate spam, so that people cannot just come along and post comments without really being associated with Indico or with CERN.
And then, as I said, Indico holds the reference to each comment, which is just a URL that is publicly available in Indico; the module then browses to it and loads the comment. Next slide please.
The next module is a little bit smaller. This is the conference registration user interface; for those who don't know, Indico can also be used to facilitate conferences, and if I want to participate in a conference I need to register. The module is just the upper part, where there is another input field and a button. If I want to reuse some data that I have stored in my data pod, and I don't want to type my name and everything else in again, I can just provide the URL pointing to my WebID profile document. The WebID is, first of all, my globally unique identifier on the web; when I dereference that URL, a document is fetched that holds information I can provide, for example my name, my email address and my affiliation. So by providing the URL at the top and pressing auto-complete, the module fetches that document, which is in Linked Data format, maps it to these input fields and populates them. If two values are possible, for example I have already typed my name but my WebID profile contains a different name, it asks me which one I want to keep.
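The auto-complete behaviour just described (fill what is empty, ask when the form and the profile disagree) can be sketched as a small merge step. Field names and values are hypothetical, and the profile values are assumed to have already been extracted from the WebID document and mapped to form field names.

```python
from typing import Dict, Tuple


def autocomplete(
    form: Dict[str, str], profile_values: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, Tuple[str, str]]]:
    """Fill empty form fields from the pod; collect fields where the two sources differ."""
    filled = dict(form)
    conflicts: Dict[str, Tuple[str, str]] = {}
    for field, pod_value in profile_values.items():
        current = filled.get(field, "")
        if not current:
            filled[field] = pod_value                 # nothing typed yet: take the pod value
        elif current != pod_value:
            conflicts[field] = (current, pod_value)   # user must choose which one to keep
    return filled, conflicts
```

Keeping the typed value and reporting the conflict, instead of overwriting silently, leaves the final choice with the user, as in the module.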
Yes, next slide. Here is the management overview of an Indico conference, and it shows the users who tested the conference registration module and actually also pressed "register".
And this is a very brief overview of how the mapping looks. In Solid everything is Linked Data, which means data is described using subject-predicate-object triples. This allows interoperability: if I have one server implementation and another one, I can in theory migrate my data between the two, and applications can also cleverly make use of this data, as Indico did now. The subject is me; the predicate uses a public vocabulary, for example schema.org or others, and in this case the vCard vocabulary, with the predicate for the full name, not the first name; and the object is my name. On the Indico side, because the form doesn't use any Linked Data annotations, I cannot do a direct mapping. I had to look at the different fields and attributes in the HTML code. One useful attribute is the name attribute: it is always provided on an input field and often carries some indication of what data is expected. So in this case I could just use the first-name field and map it to the vocabulary that I have; I built a kind of dictionary and then do the mapping.
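The triple view and the dictionary-based mapping can be sketched like this: profile data as (subject, predicate, object) tuples, and a hypothetical dictionary from HTML `name` attributes to vCard predicates. The vCard namespace and the `given-name`, `family-name` and `organization-name` terms are real; the form's name attributes and the helper are assumptions, and a real module would first parse the Turtle profile into such triples with an RDF library.

```python
from typing import Dict, Iterable, List, Tuple

VCARD = "http://www.w3.org/2006/vcard/ns#"
Triple = Tuple[str, str, str]

# Hypothetical dictionary: HTML name attribute -> vCard predicate.
NAME_TO_PREDICATE = {
    "first_name": VCARD + "given-name",
    "last_name": VCARD + "family-name",
    "affiliation": VCARD + "organization-name",
}


def values_for_form(
    triples: Iterable[Triple], subject: str, field_names: List[str]
) -> Dict[str, str]:
    """Pick, for each known form field, the object of the matching predicate about `subject`."""
    about_subject = {p: o for s, p, o in triples if s == subject}
    result: Dict[str, str] = {}
    for name in field_names:
        predicate = NAME_TO_PREDICATE.get(name)
        if predicate is not None and predicate in about_subject:
            result[name] = about_subject[predicate]
    return result
```

Fields that have no dictionary entry, such as a free-text "comments" field, are simply left alone.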
This is not always possible, because Indico also allows dynamic fields: as a conference administrator I can create a registration form that is completely custom, completely made up. Gender, for example, is not a field that Indico, I think, has predefined, so I can add a text field and write in its label that this is the gender, because for some reason I want to know it; my module then looks at the label in the HTML and makes a guess about what kind of data it is. This is not ideal: if the language of the event is no longer English, this mapping wouldn't work. So I have different levels of mapping, and the last one is the label, but ideally we would want Linked Data directly in the HTML so that we can make better guesses.
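The levels of mapping can be sketched as a fallback chain: first a known `name` attribute, and only as a last resort a keyword search in the visible label. Both dictionaries, the `field_1234` style of dynamic field name and the helper are hypothetical; `foaf:gender` and the vCard terms are real vocabulary. The last test case shows the weakness mentioned above: a German label defeats an English keyword list.

```python
from typing import Optional, Tuple

VCARD = "http://www.w3.org/2006/vcard/ns#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Level 1 (hypothetical dictionary): well-known HTML name attributes.
NAME_TO_PREDICATE = {
    "first_name": VCARD + "given-name",
    "last_name": VCARD + "family-name",
    "affiliation": VCARD + "organization-name",
}

# Last level (hypothetical): English keywords searched for in the visible label.
LABEL_KEYWORDS = {
    "gender": FOAF + "gender",
    "affiliation": VCARD + "organization-name",
    "organisation": VCARD + "organization-name",
}


def guess_predicate(name_attr: str, label: str) -> Tuple[Optional[str], str]:
    """Return (predicate, level used); label matching only works for English labels."""
    if name_attr in NAME_TO_PREDICATE:
        return NAME_TO_PREDICATE[name_attr], "name-attribute"
    lowered = label.lower()
    for keyword, predicate in LABEL_KEYWORDS.items():
        if keyword in lowered:
            return predicate, "label-guess"
    return None, "unmapped"
```

Linked Data annotations in the form's HTML would make the first level authoritative and the label guess unnecessary.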
Next: some details on the implementation. The design of the module is to retrieve personal information, or any information that could be used for registering to an Indico conference, from the data pod. Because Solid is all about storing data decentralised, the original idea was the other way round: to take the data that the user provides in the registration form and put it on the data pod. This is much more interesting, I would say, but it was abandoned for this proof of concept for several reasons. One is sensitive payment details: they have not been introduced now, but they are definitely possible in conference registrations, and they are so sensitive that they need reliable data retrieval. This might not always be possible, because users are in control of the data and can change it whenever they want: they could sign up with one set of information and then change it to another, which makes things much more complicated, though also very interesting. Indico also allows the archival of events, so for archival reasons it needs at some point to have all the data saved. This might not be possible if a data pod is suddenly no longer available, or if the person changed or removed part of the data. There are ideas for this too, all presented in the thesis, but for these reasons we abandoned the original idea and went for the other one.
Another interesting aspect, which I only briefly mentioned for the comments: decentralised data needs to be fetched, and if all the users use different data pods in different locations, all this information has to be fetched individually. With a conference of two hundred people, I would in the worst case (and the worst case is always what we need to think about) make two hundred requests just to manage the conference and see who signed up. So for these reasons we abandoned the first idea and came up with the other one.
Thank you, thank you very much, Jan. In fact we discussed a lot and got advice from Adrian on Indico, from Philippe, his professor, on the implementation approach, and from Michiel on Solid insights; thanks a lot. The technical work of Jan has been phenomenal.
So I would now like to mention what the situation is: a lucid view of Solid today. It is true that few applications use Solid pods so far. The jungle of applications, including those that can abuse our data à la Facebook and the rest, don't use Solid pods: they don't recognise them and don't interface to them. Unfortunately, this is the situation today.
Also, we had to make a recommendation, which you can see in the policy document, about where to get a pod now for doing the experimentation with the proof of concept. Those of us who had to manipulate our pods a lot, like Jan and myself, were very disappointed by the existing, antiquated user interface, but we have opened several issues on GitHub for it to be improved. Throughout these months, attending the monthly webinars called Solid World (you will see all the links later on), we didn't see enough support for the open-source solutions; like everything else in open source, it relies on passionate and enthusiastic developers. Because Solid is a standard, and because various needs come up as it gets used more, there are adaptations of the specifications and of the approach towards access control; implementations start to deviate, and this has an impact on the test suite, which on 15 December was in a perfect state, with a success rate above 98 percent. Proprietary implementations then started to take an approach that matches their customers' needs, and we have discussions on Gitter concerning these issues. So these are, truly, honestly and exhaustively, the imperfections of Solid.
Nevertheless, despite these challenges, we see governments that embrace Solid and sign official agreements for hosting public data of their administrations and tax offices in Solid pods, for example in Belgium, or the National Health Service in the UK, and so on. This will be an incentive for the implementations to become more efficient and of better quality, and for the interfaces to become more modern. Every month at Solid World there are at least four companies, mostly startups (we are fighting there to get a slot to speak), that come to present with slogans like "Solid is the future, our company leads the way", and it would be a pity and a disappointment if CERN, the birthplace of the web, did not embark on this noble adventure early.
In the Gitter chat, as I said, not everybody is active, but there are thousands of members. And we have not allocated huge resources, in fact no resources: Jan has done this free of charge, for his own thesis. But it is strategically and ideologically important for CERN to be engaged with Solid; this is our opinion. Therefore, as a conclusion, for all these reasons: for the moment we have our pods on an NSS-flavour server, which is open source and the original implementation from MIT. We have a very promising server implementation coming up, the Community Solid Server, embraced and sponsored by Tim Berners-Lee and his company Inrupt; it comes from Ghent University in Flanders and it is open source. We could integrate it with CERN SSO, which would need development, and we could use all the fantastic expertise we have at CERN in Web Frameworks to write an enviable UI, because it comes without a fixed storage, identity provider or UI: you can plug in your own, or pick one of the two available in the open-source domain. Or we could investigate using CERNBox as a Solid server, through work that has been done between Solid and Nextcloud, whose chief developer, Michiel de Jong, a Solid expert, is with us today. To understand the alternatives and continue the debate (this is not conclusive, just recommendations on how to go forward), you can look at our policy document, which is linked from here.
After concluding on this, I just wanted to show you everything about the available Solid servers. NSS is what we use; CSS is what we recommend for the future; ESS, Inrupt's Enterprise Solid Server, is probably going to be very professional, but we don't recommend it because it is closed source, has US-based storage and so on, which we don't want. There are others that are very promising: PHP Solid Server will be integrated with SolidOS, on which Tim Berners-Lee actively does development, but this is to be followed up; as of this summer it is not in a shape that we can use.
So, having said that, all the names that appear here I have already mentioned: thanks very much to Jan for this brilliant development; to Adrian for advice; to Pedro for support and original suggestions; to Tim Berners-Lee, who was always available to give insight and advice; to Michiel, who explained the mysteries of Solid to us; to Ruben, the project leader for CSS, the server we want to migrate to; and to my leaders, Thomas and Tim, for approving this work. You have all the references here, please look them up: they are brief and full of content. Thank you very much.
Maybe I'll stop sharing so that I can see you. Thank you.
There are a few questions in the chat that I would like to address.
Yes please, Jan, go ahead, I haven't...
The first one: how do we identify users when trusting an external identity provider? That is two questions, or one. To answer the first one, as I mentioned briefly, Solid decouples authentication, data and application. This means that in theory, and as we also have in our recommendation, CERN could use its own authentication and authorisation service to implement a Solid solution at CERN, relying on the authentication service it already has, so there would be no need to trust any external identity provider. As of now, the Node Solid Server (NSS), for example, implements an identity provider through Solid-OIDC, which is just a flavour of OpenID Connect. Yes.
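One way to picture "no need to trust arbitrary external identity providers": in Solid-OIDC a WebID profile lists the issuers it accepts via the `solid:oidcIssuer` predicate, and a service such as a CERN-hosted Solid server could additionally keep its own allow-list. The sketch below checks both conditions against profile triples. The issuer URLs and helper names are hypothetical, and the actual token validation (signature, expiry, audience) is deliberately left out.

```python
from typing import Iterable, Set, Tuple

SOLID_OIDC_ISSUER = "http://www.w3.org/ns/solid/terms#oidcIssuer"

Triple = Tuple[str, str, str]


def declared_issuers(profile: Iterable[Triple], webid: str) -> Set[str]:
    """Issuers that the WebID profile document lists for this WebID."""
    return {o for s, p, o in profile if s == webid and p == SOLID_OIDC_ISSUER}


def accept_login(profile, webid: str, token_issuer: str, trusted: Set[str]) -> bool:
    """Accept only if the profile names the token's issuer AND the service trusts it.

    Signature, expiry and audience checks on the token itself are out of scope here.
    """
    return token_issuer in declared_issuers(profile, webid) and token_issuer in trusted
```

Restricting `trusted` to an institution's own identity provider is the situation described in the answer: authentication stays with an existing, trusted service.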
So in the second case the registration form data is stored in Indico, right, and Indico doesn't keep a reference to the profile card?
Sorry, you explained that on the second slide, yes, yes.
Yes, no problem; but this was definitely the original idea, that we wanted to store the registration data in the data pod. Thank you.
"A lot of browser extensions..." I think the question in the end is how to protect against fake pods tracking user activity. This is a really good question, and it is also addressed in the thesis. One solution could be to develop a proxy in Indico: instead of making all the requests in the browser on the client, the requests would happen in Indico on the server. So a server implementation would be needed, and the Indico instance would make all the requests to all the different pods. That would also mean a performance improvement, because we could cache all the responses we get and use cache warming, both to keep the cache up to date and to make the requests before clients even ask for the data; the data could then be provided with one request to Indico. In that sense Indico would shield its users and clients from fake pods and thus protect them. But yes, with the current solution it is definitely possible for a pod to track the IP addresses of the clients connecting to it.
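The proxy idea can be sketched as a small server-side cache: clients only ever talk to the proxy, which fetches from the pods, remembers responses for a time-to-live, and can be warmed ahead of client requests. The class name, the TTL policy and the injected `fetch` and `clock` callables are assumptions for illustration; a production version would also need bounded memory, error handling and concurrency.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


class PodProxy:
    """Server-side cache in front of the pods; pods never see client IP addresses."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}  # url -> (fetched_at, body)

    def get(self, url: str) -> str:
        """Serve from cache while fresh; otherwise fetch from the pod and store."""
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        body = self._fetch(url)
        self._cache[url] = (now, body)
        return body

    def get_many(self, urls: Iterable[str]) -> List[str]:
        """What one client request to the proxy would return for an event page."""
        return [self.get(url) for url in urls]

    def warm(self, urls: Iterable[str]) -> None:
        """Cache warming: refresh entries before any client asks for them."""
        now = self._clock()
        for url in urls:
            self._cache[url] = (now, self._fetch(url))
```

After warming, a page with many comments costs the client a single request to the proxy, and the pods only see the proxy's requests.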
Then: "I'm struggling to think of a use case where the cost of fetching multiple users' data is low enough for Solid pods to be worth it; any examples?" I would say the good examples are those that don't require much data from other people, only my own data. A very trivial example is a to-do application: I store my data in my pod, there is maybe only one request to fetch the resource, I get all my data with that one request, and the application works fast and performantly. And, as mentioned, the proxy could be a performance improvement that mitigates all the requests that have to be made.
Another question, on similar lines: can you always trust the data you get from the pod? You would probably need sanitisation mechanisms, which most servers don't have for data that is usually stored on the server side.
Yes, also a very good question, also addressed in the thesis. For example, if in the comments module an adversary writes a cross-site scripting attack, which is just JavaScript code that the browser would execute if it were rendered, sanitisation is definitely needed, and it is implemented in the comments module. But this is definitely something to think about when developing new Solid applications: data cannot always be trusted, and this needs very careful thought when the data is no longer at hand. Very good questions.
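The principle of treating pod data as untrusted can be shown with output escaping before rendering. The actual comments module runs in the browser; this Python sketch only illustrates the idea using the standard library's `html.escape`, and the markup and function name are hypothetical. Escaping like this covers HTML text content; values placed in attributes such as links need additional checks.

```python
import html


def render_comment(author_name: str, text: str) -> str:
    """Escape untrusted pod data so markup in it is displayed, not executed."""
    return (
        '<div class="comment">'
        f'<span class="author">{html.escape(author_name)}</span>'
        f"<p>{html.escape(text)}</p>"
        "</div>"
    )
```

A comment containing `<script>` is then shown as literal text (`&lt;script&gt;...`) instead of being run by the browser.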
Thank you very much, Jan.
Of course. Concerning Hannah's question, if I may add something, with a question mark. Hannah asked what the actual use case would be that is worth the effort. How about the whole CERN community, today tens of thousands of users, getting a pod in any of the proposed architectures (to be debated further), and all the documents that are not meant to be put, for example, in CERNBox, CERN's official Dropbox-like service, being stored in people's pods? That would be something like CERN offering its users a personal website, in Solid terms, with access permissions. For example, today my website cern.ch/maria lives in AFS, and I had to register a web server for it; maybe with pods this whole process would be easier, provided of course that pod management becomes easier, which is not the case today, from the experience we had with the existing UI. But I mean, does the idea make sense?
Thanks, that's much clearer. I think I was a bit thrown because the entire concept of Solid seems to be about removing the need for things like Facebook to store data, and those kinds of platforms are exactly the ones with thousands of users, which would necessitate thousands of fetch requests to all the different pods, so I don't see how it works in the Facebook kind of example. But I completely get that CERNBox, and also the NHS for medical records, that kind of thing, makes a lot of sense.
Yes, thank you for clarifying. Because, for example, in Jan's implementation of the Indico comments, each pod owner had to allow Indico to access the pod, and the advantage is that indeed you can put all kinds of things on your pod and you decide which other applications are allowed to use them.
I see Bob also joined, that's very nice.
Adrian also had a good point, which we also looked at for the case where we wanted to make storage of the conference registration data on the data pod possible: if we rely on data not changing, we take versions and sign them in Indico. We would hash the data with some kind of signature and then compare it with the data coming in. In that sense you could version your data and make sure it has not been updated once no more updates are allowed.
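The versioning-and-signing idea can be sketched as follows: when registration closes, Indico canonicalises the registration data, computes a keyed hash over it and stores that value; later, data fetched from the pod is accepted only if it still produces the same value. HMAC-SHA256 over sorted-key JSON is one possible choice swapped in for "some kind of signature", not the method described in the thesis; the key and field names are hypothetical. Since Indico keeps the value itself, even a plain SHA-256 digest would reveal changes; the key additionally prevents others from producing valid values.

```python
import hashlib
import hmac
import json
from typing import Dict


def canonical(data: Dict[str, str]) -> bytes:
    """Stable byte representation: sorted keys, no insignificant whitespace."""
    return json.dumps(data, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sign_snapshot(data: Dict[str, str], key: bytes) -> str:
    """Keyed hash of the snapshot, to be stored in Indico at signing time."""
    return hmac.new(key, canonical(data), hashlib.sha256).hexdigest()


def unchanged_since_signing(data: Dict[str, str], signature: str, key: bytes) -> bool:
    """True if the data now served by the pod matches the signed version."""
    return hmac.compare_digest(sign_snapshot(data, key), signature)
```

Canonicalising first means that merely reordering fields does not count as a change, while any edited value does.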
Right. Concerning the implementation further in the future, the policy document that I mentioned earlier discusses how to get a CERN in-house Solid server, with the various options. We have to take this offline, that's for sure, but now that all of you, experts in Web Frameworks, CERNBox and authentication/authorisation services, have an idea of the technicalities we discuss, would you continue advising us on the optimal solution, one that is secure, causes the least disruption to existing operational services, and is functional, to continue from now? It would be very nice to hold further brainstorming sessions offline on how to go on from here. If the management would like to make a closing note on whether the successful completion of the proof of concept is a wrap-up, or an assurance that it is worth remaining active in the Solid ecosystem, that would also be nice.
Oh, thanks for giving this opportunity, Maria. I would like to thank you very much, as well as Jan, for all the work you did: Maria for the coordination and all the discussions you set up among the Solid stakeholders and internally at CERN, and Jan for the development. I think we really reached one of the goals we set ourselves when starting the project, which was to better understand the Solid technology and what it could bring to our services, so thanks a lot for that.
As you mentioned in the presentation, it also raised some questions; you mentioned a few of them, and there were questions about them. One of the biggest hurdles when I tried the system was definitely managing the pod on the server, the solidcommunity.net server. So I was wondering what the prospects are for a better user experience when managing one's pod with the ongoing developments. Do you have timelines, do you have more visibility on that?
Who would like to answer?
I can, if you want. I can say something briefly, and then Jan will complete or correct me. For the Solid server we recommended for the PoC duration, the stable one, the Node Solid Server, namely solidcommunity.net, we have very limited hope that it will get a radically modern user interface. There are developers who wholeheartedly try to make fixes, but at a slow pace, and the UI is so antiquated that it is not going to get better; it needs a radical redesign. As for the Community Solid Server, the future Solid server we want to encourage CERN to use, we know nothing yet; it will probably most naturally inherit a UI like the one we have experience with, and that experience is unsatisfactory. This is why the participation of all of you here, Eduardo, Andreas, people with experience in web services and Web Frameworks, is very valuable: we have built React UIs very fast and very well, so we could make one of our own. So it is to be discussed among us. This is my input; I don't know about Jan. Michiel, also a Solid expert, maybe you have seen other UIs that we haven't used and that are enviably modern?
Well, yes, there are a few UIs to browse the data on the pod, but the main, important UIs are the apps that you would use. If you use Media Kraken to keep track of which movies you like, it will store data on your pod, and you never actually see the pod UI.
Right, yes. Media Kraken is one of the applications I mentioned that present themselves at the monthly Solid World; there is a lot of activity on Gitter, it is very popular, it has a good interface and it uses data living on a pod. But before we jump from one alternative to another, all of this requires yet another evaluation from us. We have done a first proof of concept to understand exactly the Solid internals, status, technology and ecosystem, and to implement something. We know this is possible, this worked, thanks to Jan and the Indico experts. Now we have to iterate on something else, evaluating for example the UI solutions and the proposals in the policy document for a CERN-based in-house Solid server, and which flavour. All of this is work to be done; nevertheless, we have to decide that we agree on the strategy to stay with Solid and do it, and then we shall see. I am very happy to lay down, step by step, the details of what is required to be done, when and how.
Thank you very much, Maria, Michiel and Jan, for the answers, and again for the work you did. Yes, definitely, I think a list proposing some use cases that would be very useful at CERN would offer support for the continuation of the project. Thank you very much.
Maria, you asked for some concluding words from me as well. So yes, I'd like to thank you both for the contributions you've made to this evaluation. I do think it's extremely important that we were involved. As you know, we had very early discussions, as Solid was being formed, about the need for it; we were active in the communities that discuss data sovereignty in general and how to address the need for more control over how data is used. Because we are active in all of these, and because Solid seems to be well supported as a possible solution to many of these problems in one go, it was important for us to be there at the beginning, to help steer it and to give feedback on how it might integrate with the open-source solutions we are developing. I think that has already been achieved: we have an open dialogue with Tim and the other developers, so what we wanted, we have achieved.
And we've seen the limitations. We also had the same questions Hannah put up, about how this could possibly work, and we worked with it to understand a little how it might work, though the implementations aren't necessarily all there yet. I think, again, Jan in his thesis describes many things that could be done a heck of a lot better. So I think we should stay connected with it, and I would like to propose that we start to invest a little bit more, perhaps in some of our applications, to make sure that we stay aligned with it, and possibly put up one of these servers, like I said, on top of our own cloud implementation. But this is a proposal; we have to do it in collaboration with the new management, in the new work plan, so that's what I will be bringing up as soon as I can. Thank you.
Thank you very much, everybody, that's very valuable input. Other comments?
Maria, if I may, just from the sidelines and from Copenhagen: a big thank you to CERN in general and to you in particular. I think it's a unique opportunity for Jan, a student from my team, to work with CERN and with you on this type of project. Jan did great work; he still has a thesis to defend, but we know it already contains a lot of good work, and we would like to stay in touch. My background is in database systems, and a set of issues came up in Jan's work about schema matching and, where you used the word "guessing", some machine-learning techniques that could be used to match the features from the different pods. These issues are secondary, not on the critical path now but possibly in the future, and as a university these are some of the things we can look at.
Fantastic! I very much look forward to further collaboration, Philippe. Thank you, that will be very nice, thank you.
OK, so with this I thank everybody; I'm really grateful to all of you for staying so long. Everything is definitely possible offline: the Gitter channel is in the references, so join and comment there. It would be very nice to keep this alive, and not to have only Jan and me communicating on it. Thank you very much, thank you.
Thank you.