Remember that Tweet someone sent you a couple of weeks ago? That funny one? With the link? Yeah, you didn’t click the link at the time because you were busy, but now that you’ve got time, you can go back and see what all the fuss was about.
Well, good luck with that. Twitter’s archives have been a kind of ‘terra obscura’ for the last couple of years, and these days, if you haven’t saved a Tweet before it falls off your timeline, you can kiss it goodbye.
Although Twitter keeps copies of all messages sent over its system, it doesn’t let you see them without a URL. If you have a direct link to an old Tweet, it’s still there, but you won’t find it using Twitter’s search, either on its website or through its API.
Indeed, Twitter has become increasingly anti-search. Its advanced search page isn’t linked from the Twitter home page anymore, and it no longer allows you to specify dates to search between. If you want to do that, you have to use Twitter’s advanced search operators since: and until:.
But you still won’t get access to archived messages because anything older than a week is simply “unavailable”.
But hey, it’s ok because there’s always Google, right?
At the risk of sounding like a broken record, good luck with that too. Even when you restrict a Google search to Twitter’s domain with site:twitter.com, you won’t get a nice, neat list of Tweets in chronological order; you’ll get a random selection of Tweets that match your criteria from a variety of sources scattered over the web.
Google used to have an agreement with Twitter, licensing all its Tweets to provide the basis of Google Realtime, a decent if not brilliant way to access Twitter’s archives. But the two-year deal expired in July 2011 and wasn’t renewed, and Google shuttered Realtime.
Oh, but wait, we’ve still got the Library of Congress, right? In April 2010 the Library of Congress announced a deal with Twitter to acquire every public Tweet since Twitter launched in 2006. The technical challenge of archiving, transferring and then making billions of 140-character messages available via search is, of course, huge.
But the Library of Congress has had practice, as USA Today says:
The Library has gathered content from the web since it began harvesting congressional and presidential campaign websites in 2000. It stores more than 167 terabytes of web-based information, including legal blogs, websites of candidates for national office, and websites of Members of Congress. It also operates the National Digital Information Infrastructure and Preservation Program, which is collecting, preserving and making available significant digital content.
Since that announcement 20 months ago, we’ve heard very little about how the transfer is going, but now some more details have dribbled out. Today, the Library of Congress’ Bill Lefurgy spoke to Federal News Radio about the project, saying:
"We have an agreement with Twitter where they have a bunch of servers with their historic archive of tweets, everything that was sent out and declared to be public. They've had to do some pretty nifty experimentation and invention to develop the tools and a process to be able to move all of that data over to us."
That might sound exciting, as if progress is being made, but I’m afraid I have bad news for you. The likes of you and me won’t be able to search the archive when it’s ready: that privilege is reserved for researchers and academics.
In the Federal News Radio audio interview, Lefurgy says that they aren’t going to let people just search the archive but are instead looking at ways to provide researchers with blocks of data. Not very useful for the vast majority of Twitter users.
But why should you care? Surely Tweets are just ephemera, sent into the ether to amuse, inform or irritate and not meant to last? Well, ironically, Lefurgy himself summed it up:
"We were excited to be involved with acquiring the Twitter archives because it's a unique record of our time. It’s also a unique way of communication. It's not so much that people are going to be interested in what you or I had for lunch, which some people like to say on Twitter."
It’s not just institutional researchers and academics who’d like to be able to search Twitter further back than seven days. Journalists, educationalists, non-academic researchers, and just the downright curious might want to see into Twitter’s past. Hell, you might even want to just be able to go back and check that thing that person said last month.
I do tend to get a bit exercised about this sort of problem, primarily because I’m not just a journalist, I am also commissioned to do research. Web companies like Twitter, Facebook and Google are, ironically, making it very, very difficult for me to gather and analyse data.
Through our everyday use of social media, we as a global society are creating a vast and incredibly rich record of both our day-to-day lives and the events, mundane and extraordinary, that we experience. Yet all this valuable data is effectively being lost, not because it is being thrown away, but simply because we cannot access or process it in any meaningful way. Such an oversight makes no sense to me at all. For want of search, what insights will we lose?
Published Date: Dec 07, 2011 09:53 pm | Updated Date: Dec 07, 2011 09:53 pm