Conceived in 1977 and opened in 1986, IBM’s Almaden Research Centre is a commanding modernist bunker at the crest of a hill near San Jose. It faces away from the city and into the beautiful distance, which is as good a metaphor as any for the work that goes on there.
Almaden houses basic and applied research in science (computer, materials and physical), storage systems and services. I recently attended a private briefing on several of the projects underway there.
The first to be presented, by Dan Gruhl, was the semantic supercomputing project, which aims to move beyond current search practices into the analysis of concepts and relationships. It takes as its corpus the entire internet, stored and dynamically updated on a 64-rack system with 4,000 processors, and housed in Almaden’s basement. The system runs at 360 teraflops, enough processing power to check the web for duplicates inside five minutes, once the data is loaded.
The research is not geared exclusively to the public internet. The volume of data held behind firewalls is much greater — about 80 times greater — than that comprised by the web (porn and all) and analysis of private semantic data (emails, HTML pages, memos, instant messages) implies, as Gruhl put it, “unreasonable” computational demands. IBM would certainly like to help here. But it’s the work on wild, public data that is the most fascinating.
“We’re less interested in knowing what’s popular,” Gruhl explained. “We’re more interested in guessing what’s going to be popular next week.”
The customers for such information are those engaged in compiling intelligence for either public or private purposes and, according to Gruhl, they have different needs to ordinary search engine users. “These are people who have to live in an information space for a long time,” he says.
The heart of the challenge lies in quickly analysing unstructured, informal data. And that leads down some strange paths — irony, for example.
“Irony is really hard,” said Gruhl. “But our system can understand what is really meant when someone says, ‘The most successful product launch since Windows Vista’.”
This kind of trend analysis doesn’t come cheap, and your correspondent was still battling the screaming urge to ego-search when the free look at the big search box concluded. But there are some elements of the broader project that anyone can play with. Like, for example, an RSS feed for the entire web — “We decided that would be fun,” quipped Gruhl — called The Daily Delta, which is run as a “public alpha service” from the company’s Mountain View facility.
Fringe is essentially an experiment in MySpacing IBM’s internal contacts directory. Running parallel to the formal directory, it applies all the trappings of social software: tagging, friending, blogging and, soon, an equivalent to what Berkeley researcher Danah Boyd has dubbed “identity production”. This is the facility for individuals to define themselves by roping in and presenting external media as evidence of who they are. There’s even an internal del.icio.us-like social bookmarking feature, called Dogear.
“The fun activities are a kind of practice for real-world situations,” explained project leader Steve Farrell.
The aim of the project is to have the new directory present a “more coherent” picture of individuals within the company, and to expose and apply “evidence-driven social networks” so people who frequently co-occur can be assumed to be in groups, irrespective of their formal place in the company structure.
An employee frequently given a certain tag by others (about 1,000 people in the global workforce have tried tagging their fellows) might be assumed to be a good “hub” for that topic.
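For the curious, the “hub” idea can be sketched in a few lines of Python. This is only an illustration of the principle as described in the briefing, not IBM’s code: the function name `topic_hubs`, the `min_taggers` threshold, the decision to ignore self-tagging and the sample names are all assumptions made up for the example.

```python
from collections import defaultdict

def topic_hubs(tag_events, min_taggers=2):
    """Map each tag to the people given it by at least `min_taggers` distinct colleagues.

    `tag_events` is an iterable of (tagger, person, tag) tuples. The threshold
    and the choice to ignore self-tagging are illustrative assumptions.
    """
    taggers = defaultdict(set)  # (tag, person) -> colleagues who applied the tag
    for tagger, person, tag in tag_events:
        if tagger != person:  # a tag you give yourself is not evidence from others
            taggers[(tag, person)].add(tagger)

    hubs = defaultdict(list)
    for (tag, person), who in taggers.items():
        if len(who) >= min_taggers:
            hubs[tag].append((person, len(who)))

    # Strongest candidates first; ties broken alphabetically for stable output.
    for tag in hubs:
        hubs[tag].sort(key=lambda pair: (-pair[1], pair[0]))
    return dict(hubs)

# Hypothetical tagging activity: (who tagged, who was tagged, with what).
events = [
    ("ann", "raj", "storage"),
    ("bob", "raj", "storage"),
    ("raj", "raj", "storage"),
    ("ann", "bob", "search"),
    ("cy", "raj", "storage"),
    ("cy", "bob", "storage"),
]
hubs = topic_hubs(events)
print(hubs)
```

Counting distinct taggers rather than raw tag counts means one enthusiastic colleague cannot make someone a hub single-handedly, which seems closer to the “evidence-driven” spirit of the project.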
Farrell’s team is working on exposing the social information that emerges from the directory to other applications, and “unlocking” social network data from existing applications. Eventually, the system should be able to suggest a list of potential recipients for an email on a given theme.
Of course, IBM employees at work are not doing exactly the same thing with their time online as teens and twenty-somethings: it’s just the patterns that match.
“Instead of sharing pictures of your cat, you’re sharing PowerPoints,” says Farrell.
“What is the parallel to getting laid in a corporate environment? We talk about that all the time. It’s collaborating.”
It’s not the sort of thing you’d expect an IBM employee to say. But then you don’t expect an IBM employee to turn up for a presentation in a faded Green Day t-shirt, as Farrell did, either.
Among the multitude of ideas a visitor to Almaden might come away with is this one: it’s definitely not your Dad’s IBM any more.