Speaker
Ms
Tamar Sadeh
(Ex Libris)
Description
Relevance ranking of search results has become a de facto standard for information systems’ display of result lists. The unparalleled success of Google has demonstrated that a relevance-ranking algorithm tailored to the inherent structure of the information (in this case, the Web) enables a search engine to sort results in the manner most suitable for end users. As a result of Google’s success, user behavior has changed, and today people around the globe focus on the first page of any result list regardless of the number of items in the list.
In the past, scholarly information providers were hesitant about relevance ranking because determining relevance, already a difficult task in the world of general Web searches, is even more daunting where academic searches are concerned. The relevance of materials depends on various factors, such as the searcher’s discipline, expertise, and goal. For example, academics might seek a basic article on a topic or the most recent information; they might be interested in a specific aspect of an event or a phenomenon; or they might want to avoid information that is too detailed. Nevertheless, to address users’ expectations, information providers and library system vendors have been implementing relevance-ranking algorithms for scholarly systems over the last few years.
Scholarly relevance-ranking algorithms take into account the “proximity” of each item on the result list to the query entered by the user, as well as various types of information about each item. Such information may include the item’s citation rate, popularity, and availability; for physical items, the number of copies that the library holds may also play a role. Much of the success of such algorithms therefore depends on whether comprehensive information is available for every item in the result list.
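As a rough illustration of the idea above, the following Python sketch combines a crude query-proximity measure with item-level signals (citation rate, popularity, availability, and number of copies held). It is not the algorithm of any particular vendor: the `Item` fields, the `damp` helper, the weights, and the caps are all assumptions chosen only to make the example concrete.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    # All fields are hypothetical stand-ins for the signals named in the text.
    title: str
    citations: int = 0      # citation rate
    loans: int = 0          # popularity proxy (assumed: number of loans or views)
    available: bool = True  # availability
    copies: int = 0         # physical copies held; 0 for electronic items


def query_proximity(query: str, title: str) -> float:
    """Fraction of query terms found in the title (a crude proximity stand-in)."""
    q_terms = set(query.lower().split())
    t_terms = set(title.lower().split())
    return len(q_terms & t_terms) / len(q_terms) if q_terms else 0.0


def damp(value: int, cap: int) -> float:
    """Log-scale a count into [0, 1] so very large counts do not dominate."""
    return min(1.0, math.log1p(value) / math.log1p(cap))


def score(query: str, item: Item) -> float:
    # Weights are illustrative assumptions, not tuned values.
    return (
        0.60 * query_proximity(query, item.title)
        + 0.20 * damp(item.citations, 1000)
        + 0.10 * damp(item.loans, 1000)
        + 0.05 * (1.0 if item.available else 0.0)
        + 0.05 * damp(item.copies, 10)
    )


def rank(query: str, items: list) -> list:
    """Return items sorted from highest to lowest score."""
    return sorted(items, key=lambda it: score(query, it), reverse=True)
```

When a signal such as the citation count is missing for an item, it contributes nothing to that item's score, which is one way to see why the completeness of item information matters so much for this kind of ranking.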
The emerging mega-aggregate indexes of global scholarly materials, such as Primo Central from Ex Libris, Summon from Serials Solutions, and EBSCO Discovery Service, pose an even greater challenge. Results that originate from a library collection (mostly physical items such as books, journals, videos, CDs, maps, and manuscripts) must be blended with results from these indexes, which cover mostly articles and e-books. This blending involves tasks such as comparing items that have only metadata with items that have full text; determining the proportion of items to display from a library’s relatively small collection (typically not more than a few million items) versus the global set of hundreds of millions of scholarly materials; and addressing library policies about how prominently the result list should display materials originating from the library’s own collections or from specific collections that it considers of great value.
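One simple way to picture the blending problem is sketched below in Python. It assumes each source returns its own list of (item id, raw score) pairs on incomparable scales, normalizes each list to its own top score, and applies a policy-driven boost to local items before merging. The names `blend`, `local_boost`, and `page_size` are hypothetical, and this is only one possible approach, not a description of how any of the products named above works.

```python
from typing import List, Tuple

Result = Tuple[str, float]  # (item id, raw score from its own index)


def normalize(results: List[Result]) -> List[Result]:
    """Rescale scores so the best item in each source scores 1.0."""
    top = max((s for _, s in results), default=0.0)
    if top <= 0:
        return [(item_id, 0.0) for item_id, _ in results]
    return [(item_id, s / top) for item_id, s in results]


def blend(
    local: List[Result],
    central: List[Result],
    local_boost: float = 1.2,  # assumed library policy: favor own holdings
    page_size: int = 10,
) -> List[Tuple[str, float, str]]:
    """Merge two result lists into one page of (item id, score, source)."""
    merged = [(i, s * local_boost, "local") for i, s in normalize(local)]
    merged += [(i, s, "central") for i, s in normalize(central)]
    # Stable sort: on ties, local items stay ahead of central ones.
    merged.sort(key=lambda r: r[1], reverse=True)
    return merged[:page_size]
```

Setting `local_boost` above 1.0 gives the library's own materials more prominence, while a value of 1.0 leaves the two sources on equal footing after normalization.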
The talk will describe the role and challenges of relevance ranking in the scholarly environment and will highlight several aspects of successful relevance-ranking algorithms.
Author
Ms
Tamar Sadeh
(Ex Libris)