

The British Library to archive 100 terabytes of web data annually

Tim Gibbon (Social Media Portal (SMP)) - 09 April 2013

The British Library to archive 100 terabytes of data annually to capture the web

The British Library and five other libraries will archive 100 terabytes of data annually, including UK webpages, in a preservation initiative called ‘Capturing the Digital Universe’

The British Library has started to archive UK webpages in an effort to record digital content and stories. With the regulations known as ‘legal deposit’ extended to digital formats, libraries are free to collect, store and present information for future generations to access. The Department for Culture, Media and Sport developed the regulations in conjunction with the Joint Committee on Legal Deposit, which includes representatives from the Legal Deposit Libraries and different sectors of the publishing industry.

“Capturing our digital heritage for preservation and future research is essential. As publishers were among the first to embrace the opportunities of digital publishing, recognising advantages of dissemination beyond traditional outlets and the potential of technology to drive innovation, we welcome the extension of legal deposit to digital formats and web harvesting,” explained Angela Mills Wade, executive director of the European Publishers Council, chairman of the UK Publishers Content Forum and joint chairman of the Joint Committee on Legal Deposit.

The British Library, along with the Bodleian Libraries, Cambridge University Library, the National Library of Scotland, the National Library of Wales and Trinity College Dublin, has permission to archive the entire UK web, along with e-journals, e-books and other formats, as shown in the promotional video. The libraries will be able to archive UK electronic publications and content, including blogs, microblogging services, websites and social networks, as they have done with print publications such as books, magazines and newspapers.

“Legal deposit arrangements remain vitally important. Preserving and maintaining a record of everything that has been published provides a priceless resource for the researchers of today and the future,” echoed Culture Minister Ed Vaizey MP.

In a British Library blog post entitled '100 websites Capturing the digital universe', the organisation explains in more depth why it has chosen to undertake the initiative, the top 100 sites its curators have chosen and how to get involved.

The only limitations are the regulations that the libraries need to adhere to, which restrict them to archiving websites that end in .uk or that are created or published in the UK. This is to be extended at the end of this year and made accessible to researchers. In an interview, Lucie Burgess, head of content strategy at The British Library, explains what the organisation does and why it will be undertaking the collection of digital media. The libraries will collect 4 to 8 million websites, amounting to a petabyte over the next 10 years (equivalent to 100 terabytes of data a year).

The British Library is on Facebook and on Twitter at @britishlibrary. Keep abreast of the Capturing the Digital Universe debate, news and updates via the Twitter hashtag #digitaluniverse.

The Internet Archive, a non-profit digital library founded in 1996 and known for its Wayback Machine, is another web archiving service that has been archiving global websites for a number of years.

Read the full press release about the Digital Universe initiative and how to get involved.
