-
10:00
Research Track – "Web Content Analysis"
-
Session Chair: Michael Granitzer (University of Passau, Germany)
-
10:01
"The Impact of Main Content Extraction on Near-Duplicate Detection"
-
Maik Fröbe (Martin-Luther-Universität Halle, Germany)
-
10:20
"FastWARC: Optimizing Large-Scale Web Archive Analytics"
-
Janek Bevendorff (Bauhaus-Universität Weimar)
-
10:40
"Creating a Dataset for Keyphrase Extraction in Physics Publications and Patents"
-
Andre Rattinger (ISDS, Graz University of Technology, Austria)
-
11:00
"Understanding Websites"
-
Ronny Lam (Skrodon)
(HNW.NU)