Bluesky Post 2023-07-10
Remember those growth curves above? DSA1’s temporal cluster step does not consider them. DSA2 addresses this while also focusing on selecting exemplar...
But DSA2 is not the only possible web archive storytelling algorithm. DSA3 focuses on selecting exemplars that support the overall collection topic. I...
While DSA3 focuses on exemplars that support the collection’s overall topic, DSA4 selects *novel* exemplars that still fit the overall topic. DSA4 does not...
Using our vocabulary of primitives, we have created four algorithms for different types of storytelling from web archive collections. Each has a diffe...
Why not just use a search engine to select exemplars? One challenge with this approach is that many web archive collections have little to no metadata...
How well do these algorithms compare with a search engine for exemplar selection? We synthesized WARCs from 8 different Archive-It collections and loa...
From each exemplar, we applied query-generation techniques to simulate different visitors searching for these documents. We found that DSA1-DSA4 surfa...
Additionally, we found that DSA1-DSA4 surfaced documents that are never retrievable from the search engine, even when applying known-item queries. Thu...
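To illustrate what "zero retrievability" means here, the following is a minimal sketch that assumes the common cumulative form of the retrievability measure: a document scores one point for every query that returns it within a rank cutoff. The function names, the `cutoff` parameter, and the toy data are hypothetical and are not taken from the study; the actual evaluation may have used a different scoring function or cutoff.

```python
from collections import Counter


def retrievability(results_by_query, cutoff=10):
    """Count, per document, how many queries return it within the top `cutoff` ranks.

    `results_by_query` maps a query string to a ranked list of document IDs
    (a hypothetical stand-in for real search engine output).
    """
    scores = Counter()
    for ranked_docs in results_by_query.values():
        # Use a set so a document counts at most once per query.
        for doc_id in set(ranked_docs[:cutoff]):
            scores[doc_id] += 1
    return scores


def never_retrieved(exemplars, scores):
    """Return the exemplars that no query surfaced within the cutoff."""
    return [doc_id for doc_id in exemplars if scores[doc_id] == 0]


# Toy data: two simulated queries and their ranked results.
results = {"q1": ["a", "b"], "q2": ["a", "c"]}
scores = retrievability(results, cutoff=1)
unreachable = never_retrieved(["a", "b", "c"], scores)
```

With a cutoff of 1, only document `a` is ever surfaced, so `b` and `c` would count as having zero retrievability in this toy setup.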
The documents with zero retrievability were germane to their collections. The images here show documents from collections about NASA Social Media, Del...