Bluesky Post 2023-07-10
Using our vocabulary of primitives, we have created four algorithms for different types of storytelling from web archive collections. Each has a diffe...
Why not just use a search engine to select exemplars? One challenge with this approach is that many web archive collections have little to no metadata...
How well do these algorithms compare with a search engine for exemplar selection? We synthesized WARCs from 8 different Archive-It collections and loa...
From each exemplar, we applied query-generation techniques to simulate different visitors searching for these documents. We found that DSA1-DSA4 surfa...
Additionally, we found that DSA1-DSA4 surfaced documents that are never retrievable from the search engine, even when applying known-item queries. Thu...
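The thread does not say exactly how retrievability was scored. One common formulation from the IR literature is cumulative retrievability: r(d) counts the generated queries for which document d appears within the top c results. The sketch below assumes that formulation with equal query weights. The function name `retrievability`, the `cutoff` parameter, and the toy query/result data are hypothetical and serve only as illustration. A document scoring 0 is one that no simulated visitor would ever see in the top c results.

```python
def retrievability(ranked_results, documents, cutoff=10):
    """Cumulative retrievability (assumed formulation, equal query weights).

    ranked_results: dict mapping each generated query to its ranked list of doc IDs.
    documents: all doc IDs in the collection, so never-retrieved docs still get 0.
    cutoff: only results ranked within the top `cutoff` positions count.
    """
    scores = {doc: 0 for doc in documents}
    for ranking in ranked_results.values():
        # set() avoids double-counting a doc that appears twice in one ranking
        for doc in set(ranking[:cutoff]):
            if doc in scores:
                scores[doc] += 1
    return scores


# Hypothetical toy data: three generated queries against a four-document collection.
results = {
    "nasa twitter": ["a", "b"],
    "nasa facebook": ["b", "c"],
    "known item d": ["b", "a"],
}
scores = retrievability(results, ["a", "b", "c", "d"], cutoff=2)
zero_retrievability = [doc for doc, score in scores.items() if score == 0]
```

In this toy run, document "d" is never retrieved, even by the query aimed at it. That is the situation the thread describes for some of the documents that DSA1-DSA4 surfaced.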
The documents with zero retrievability were germane to their collections. The images here show documents from collections about NASA Social Media, Del...
Thus, the DSA algorithm model selects exemplars that are germane, but not discoverable using a state-of-the-art web archive search engine. Also, as se...
For storytelling we summarize each exemplar as a social card. Typically, page authors supply metadata for the title, description, and striking image f...
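Author-supplied card metadata usually takes the form of Open Graph (`og:*`) or Twitter Card (`twitter:*`) meta tags. The following is a minimal sketch, using only Python's standard `html.parser`, of reading those fields from an archived page. The fallback order (Open Graph, then Twitter Card, then the page `<title>` or plain `description` meta tag) is an assumption for illustration, not the method from the paper, and `social_card_fields` is a hypothetical name. A field that comes back as `None` marks a page whose author supplied nothing for that part of the card.

```python
from html.parser import HTMLParser


class CardMetadataParser(HTMLParser):
    """Collects <meta> name/property -> content pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._title_parts = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            content = attrs.get("content")
            if key and content:
                # keep the first occurrence of each key
                self.meta.setdefault(key.lower(), content.strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    @property
    def title(self):
        return "".join(self._title_parts).strip() or None


def social_card_fields(html):
    """Hypothetical helper: title/description/image with an assumed fallback order."""
    parser = CardMetadataParser()
    parser.feed(html)
    m = parser.meta
    return {
        "title": m.get("og:title") or m.get("twitter:title") or parser.title,
        "description": m.get("og:description") or m.get("twitter:description") or m.get("description"),
        "image": m.get("og:image") or m.get("twitter:image"),
    }
```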
In the new paper, we also augment results from our ACM WebSci '21 paper detailing how to apply Machine Learning to select striking images for social c...