What about that metadata that is present? Grusky et al. (https://doi.org/10.18653/v1/N18-1065 ) realized that, because page authors create that metadata, it can serve as ground truth to evaluate #Automatic #Summarization.
We analyzed pages from #WebArchiving and saw how this metadata evolved. By 2010 we saw a metadata explosion with the use of #Twitter Cards, Open Graph Protocol, #Facebook Tracking, and more. Things like Twitter cards created a metadata renaissance for HTML.