IBM Races H5N1
By John Russell
Sept. 18, 2006 | Sometimes it’s good to be Goliath. Unlike most (perhaps all) of its IT brethren, IBM has the size and breadth of technology expertise to make waves in basic research beyond IT and to tackle global projects that enable Big Blue to do well by doing good. The Global Pandemic Initiative (GPI) formally launched in May is a perfect example of that capacity.
“[It’s] an attempt to do a group of projects we believe will basically help the world get ready for the possibility of a 1918-like influenza pandemic,” says Joseph Jasinski, program director, healthcare and life sciences, at IBM Thomas Watson Research Center, Hawthorne, N.Y. “The economic impact would be more devastating, we believe, than in 1918 when the world was a relatively isolated place.”
That’s a tall order but reflective of Big Blue ambition. IBM Research is the technology engine underpinning many of IBM’s eye-catching projects, and the Watson Lab, with its abundance of technological exotica and supercomputing resources, is the soul of IBM Research.
Consider the scope of the GPI. IBM assembled an advisory board of world health organizations, NGOs, and universities and “basically asked them what they thought IT could do or a company like IBM could do that wasn’t currently being done in helping to get the world ready,” says Jasinski. What emerged are three projects currently under way:
• Checkmate is an effort to computationally model the influenza virus family and to “anticipate” problematic mutations in flu viruses
• STEM (spatio temporal epidemiological monitor) is a nearly finished open-source modeling framework intended to help health officials worldwide build better predictive models and play more what-if games in terms of public planning scenarios
• Work with MECIDS (Middle East Consortium for Infectious Disease Surveillance) hopes to create an interoperability framework to share food- and waterborne disease information in real time.
Clearly, IBM hopes to leverage the resulting technology and public goodwill to create future business opportunities (and why not), but the projects also have clear public benefit and collectively represent a scale of undertaking that would deter smaller companies.
“[Checkmate is] a project between the Scripps Institute in Florida and IBM Research using their experimental biology and our Blue Gene supercomputer in the Watson lab, which is the second fastest machine in the world currently,” says Jasinski. “The name Checkmate comes from playing chess with the virus and ultimately figuring out where it’s going before it gets there.”
“We are looking at the phylogeny of all influenza viruses, H5N1 being a particularly nasty one,” says Ajay Royyuru, senior manager, computational biology center. “What if we could understand all the potential mutations that are available to influenza viruses and figure out ways in which we can recognize new variations, particularly harmful ones, before the virus has a chance to get into those evolutionary niches?”
Traditional approaches, says Royyuru, are often too slow: “You look at what exists today in bird and human populations; you characterize that strain; and you develop vaccines or antibodies or therapeutics against that particular known strain. I’m not dismissing that strategy. It’s quite useful, but there is a hit-or-miss attribute to this reactive strategy, and the problem is you do not get enough time to react.”
“What if the target is actually moving faster than the time it takes for you to react? Then you’re basically caught without an appropriate response strategy if the virus is evolving very fast. Which is the case when you have widespread infection in either animal or human population. The selection pressure on the virus is enormous at that point, and you will have a lot of escaped mutants occurring quite rapidly,” he says.
Large-scale computing is the key to tackling projects such as Checkmate, and Royyuru divides it into two branches: data driven and compute intensive.
“We get quantities of data that were unimaginable just a decade or 15 years ago. Data-driven computing is basically allowing you to take more data and make sense of it. And it’s not a huge amount of computing. It’s not a huge amount of flops. It’s just being able to put a lot of data together and draw arrows between points of data to conceptualize what the connectivity is and integrate the data,” he says.
The problem is, he says, you don’t really know “what happens when you throw the switch on. So it’s the dynamics and the interactions between the entities which you are trying to probe, not just the presence or absence of the entity or just the relationships between them.” Modeling and simulation is the compute-intensive part of large-scale computing and Blue Gene’s strength.
Checkmate requires both large data set handling and simulation horsepower. Other GPI projects, such as STEM, are less compute intensive but consume lots of data and will be effective only if many users can access them and contribute data, which makes them an ideal application for grid computing such as the World Community Grid developed by IBM.
STEM is currently available free for nonprofit use on IBM’s alpha site as a standalone application, and it will soon be passed to an open-source community, Eclipse, run by the Open Healthcare Framework. The idea is to build an open-source modeling community of experts around the world who might contribute their unique data sets. Large chunks of geographic and infrastructure data are already included.
“We have a few pieces to fill in [from less developed regions]. But hopefully if we can get enough interest from the community, you might find somebody for example who’s an expert in migratory bird pathways and has the world’s best data set on where birds go, particularly relevant in H5N1. From the poultry industry or university research on poultry you might get data sets on where all the chickens in the world are,” says Jasinski.
IBM envisions health organizations creating specific models for their particular use. “You know, 14 people showed up sick in NYC in Queens with this particular disease on Monday. How many cases will there be by Tuesday? What are the next cities to be impacted and so forth? If you can do that at a reasonably accurate level then you can start to try to develop rational response strategies, for example, should I close the airports? Will that do any good in this day and age? Where should I position antiviral drugs or vaccines?”
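To make the kind of question Jasinski describes concrete, here is a minimal sketch of a textbook SIR (susceptible, infected, recovered) model stepped one day at a time. It is not STEM's code or method: the function name, the population figure, and the transmission and recovery rates are all hypothetical values chosen for illustration, and a real planning model would add geography, travel, and calibrated parameters.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Step a basic SIR model one day at a time.

    beta  : assumed daily transmission rate (hypothetical value)
    gamma : assumed daily recovery rate (hypothetical value)
    Returns a list of (susceptible, infected, recovered) tuples,
    one entry per day, starting with day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # New infections scale with contacts between S and I;
        # cap at s so the susceptible pool never goes negative.
        new_infections = min(s, beta * s * i / population)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical scenario: 14 cases on Monday in a district of 2 million.
    baseline = simulate_sir(2_000_000, 14, beta=0.5, gamma=0.25, days=30)
    # What-if: an intervention that is assumed to cut transmission by 30%.
    reduced = simulate_sir(2_000_000, 14, beta=0.35, gamma=0.25, days=30)
    print(f"Infected on day 1 (baseline): {baseline[1][1]:.1f}")
    print(f"Infected on day 30 (baseline): {baseline[30][1]:.0f}")
    print(f"Infected on day 30 (reduced contact): {reduced[30][1]:.0f}")
```

Running the two scenarios side by side is the simplest form of the what-if comparison the article describes: the same starting cases, with one assumed parameter changed to represent a response strategy.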
There could even be commercial applications, says Jasinski. “We can imagine it being used as the front end to other commercial products. If I wanted to understand how my business would fare depending on where my locations are and what kind of business I’m in and what my cash flows are like and that kind of stuff. You can imagine using the epidemiological model as an input to that kind of an analysis.”
The MECIDS project is still in planning. Its membership consists of the ministries of health of Israel, Jordan, and the Palestinian Authority, and since the United States has embargoed Hamas, IBM is required to get an additional license, which it’s now doing.
One question is whether there is a shortage of biologists sufficiently comfortable with high-end computing to use tools such as Blue Gene to attack complex problems. Says Royyuru, “I think there is a small gap that still needs to be bridged between folks who do computation, computational biologists included, versus biology as a field or discipline as a whole. We’re beginning to make some progress and getting people to talk to each other and understand each other more, but a lot more needs to be done.”
Ever the optimist, Royyuru adds, “Do we have enough compute capability to answer all the complexity that we know exists in biology? Certainly not. [But] I think for simple molecular processes we are beginning to approach that point.”
Email John Russell at firstname.lastname@example.org.