Thursday, June 15, 2006

Enterprise Grids using Virtual Infrastructure


If you look at the progress of virtualization over the last few years, one thing you will realize is that not all the ideas are new. Many of them are borrowed from the mainframe era. What matters most is the ability to extend old ideas to solve new problems. For example, mainframe partitioning models do not address the problems of today's data centers. IT infrastructure needs to manage itself. The keywords are automated, self-healing and embarrassingly reliable.

Therefore the message from Palo Alto is to move beyond partitioning!

Grid computing has advanced considerably in the last few years, and the commercialization of many academic projects has sparked interest in new ways to find idle resources in the data center.

Even the grid computing model is borrowed from the mainframe era, specifically enterprise grid computing, which is different from the community-service desktop cycle-sharing projects. Enterprise grids are more focused: they are more or less the old-school mainframe *batch jobs* rejuvenated with present-day middleware.



A naive classification of grid jobs:

1) High Compute Jobs - Low Volume: Employees need to run prediction models for the next day. They submit jobs with the new data and some inputs through the company web interface.

2) High Compute Jobs - Parallel Jobs: MPI/PVM jobs such as video rendering. The applications are written against multiprocessor message-passing interfaces and can leverage parallelism. Some of these jobs are sensitive to interconnect speed.

3) Low Compute Jobs - Very High Volume: EDI jobs that convert transactions from data format A to data format B, at rates of a million transactions per minute. Also called high-throughput jobs.

So do present grid computing products like Platform LSF, United Devices Synergy, Sun SGE, etc. work on virtual end points?

A grid infrastructure usually consists of several components. For example, a typical grid middleware stack will have:

  • Gatekeeper/Management Server: manages which nodes and users are part of each Virtual Organization.

  • Resource Discovery and Monitoring: lets applications on the grid discover resources that suit their needs and then manage them. Also called match-making.

  • Job Management: lets users submit tasks (in the form of "jobs") to the grid, and provides results notification and the end-to-end glue.

  • Security, data management, etc.
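To make match-making concrete, here is a minimal Java sketch, assuming each end point advertises only its OS, free CPUs and free memory, and each job states the same three requirements. Node, JobRequest and Matchmaker are hypothetical names for illustration; they are not the API of LSF, Synergy or SGE, which describe resources in their own ways. Note that the matcher never asks whether a node is a physical box or a virtual machine. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical match-maker: not the API of any real grid product.
public class Matchmaker {

    // A grid end point; it may be a physical machine or a VM.
    record Node(String name, String os, int freeCpus, int freeMemMb) {}

    // What a submitted job says it needs.
    record JobRequest(String os, int cpus, int memMb) {}

    // Return every node whose advertised resources satisfy the request.
    static List<Node> match(JobRequest req, List<Node> nodes) {
        List<Node> out = new ArrayList<>();
        for (Node n : nodes) {
            if (n.os().equals(req.os())
                    && n.freeCpus() >= req.cpus()
                    && n.freeMemMb() >= req.memMb()) {
                out.add(n);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("phys-01", "linux", 4, 8192),
                new Node("vm-17", "linux", 2, 2048),
                new Node("vm-42", "windows", 8, 16384));
        JobRequest req = new JobRequest("linux", 2, 4096);
        for (Node n : match(req, nodes)) {
            System.out.println(n.name());
        }
    }
}
```

Running main prints phys-01: vm-17 runs Linux but has too little free memory, and vm-42 runs the wrong OS.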

Existing grid solutions should just work on virtual end points if every virtual machine in the grid is treated like a physical machine. One fundamental difference between present grids and future grids is the added value virtualization will bring to the table: any application, any operating system, better monitoring. The compute cluster is no longer a segregated part of the organization; it can co-exist with other services such as IT and production workloads. Think resource pools!

My friendly neighbour, a developer, also wants to know whether virtualization can change the way applications are written. Here is an attempt to show the possibilities:

High-throughput jobs like EDI do not work well or scale with push-based job scheduling. The monitoring that a push scheduler relies on in distributed systems usually works only when the monitoring interval is more than a minute, which is far too coarse when jobs arrive at a million per minute. A pull-based approach, where every node pulls its next job when it is ready, works better for high throughput.
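A minimal sketch of pull-based scheduling in Java, with threads standing in for grid nodes and a java.util.concurrent BlockingQueue standing in for the job source. The convert method is a hypothetical placeholder for an EDI format conversion. Each worker asks for its next job only when it is free, so no central scheduler needs up-to-date load information about it.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Pull-based scheduling: worker threads (standing in for grid nodes) pull
// their next job from a shared queue; nothing pushes jobs to them.
public class PullWorkers {

    // Hypothetical stand-in for converting a transaction from format A to B.
    static String convert(String txn) {
        return txn.toUpperCase();
    }

    // Queue `jobs` transactions, let `workers` threads drain the queue,
    // and return how many jobs were processed.
    static int run(int jobs, int workers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < jobs; i++) {
            queue.add("txn-" + i);
        }
        AtomicInteger done = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                String job;
                // A worker asks for work only when it is idle.
                while ((job = queue.poll()) != null) {
                    convert(job);
                    done.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(run(10_000, 4)); // prints 10000
    }
}
```

Here poll() returns null once the queue is empty, which ends the demo; a long-running node would call take() instead and block until new work arrives.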

So how about a new primitive for grid programming: VM-SPACE.
All nodes in the grid have access to a single centralized space provided by the virtualized infrastructure. Any application. Any OS. Any language!
High bandwidth on the local box, NUMA semantics on the local network, and an overlay topology over the WAN.

So the app writer uses something like :

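import com.vmware.VMSpace; // the proposed VM-SPACE primitive (hypothetical)

class WallStreetGridApp {
    // Init and the other voodoo goes here, including getting hold of the space.
    void run(VMSpace space) {
        while (!armageddon) {
            // The space abstraction has *any language* semantics:
            // each node pulls its next job from the shared space.
            IJob job = (IJob) getNextJob(space);
            processJob(job);
        }
    }
}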

I am not really suggesting anything new. If you were a server-side enthusiast in the last decade, you must have heard of Jini or JavaSpaces. GigaSpaces is still around. I am only suggesting lowering the abstraction from the JVM to the VM layer.
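The write/take idea behind JavaSpaces can be illustrated with a small in-process Java sketch: producers write entries into a shared space, and consumers take the first entry of the type they want, blocking until one exists. MiniSpace, PricingJob and ReportJob are hypothetical names. This is not the JavaSpaces API (which matches entries against templates and adds leases and transactions), nor the proposed VM-SPACE primitive; it only shows the semantics inside a single JVM.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-process "space"; not the JavaSpaces API.
public class MiniSpace {

    private final List<Object> entries = new ArrayList<>();

    // Add an entry and wake any consumer blocked in take().
    public synchronized void write(Object entry) {
        entries.add(entry);
        notifyAll();
    }

    // Remove and return the first entry of the given type, or null if none.
    public synchronized <T> T tryTake(Class<T> type) {
        Iterator<Object> it = entries.iterator();
        while (it.hasNext()) {
            Object e = it.next();
            if (type.isInstance(e)) {
                it.remove();
                return type.cast(e);
            }
        }
        return null;
    }

    // Like tryTake, but block until a matching entry has been written.
    public synchronized <T> T take(Class<T> type) throws InterruptedException {
        T found;
        while ((found = tryTake(type)) == null) {
            wait(); // releases the lock until write() calls notifyAll()
        }
        return found;
    }

    // Hypothetical job types for the demo.
    record PricingJob(String symbol) {}
    record ReportJob(String name) {}

    public static void main(String[] args) throws InterruptedException {
        MiniSpace space = new MiniSpace();
        space.write(new ReportJob("eod"));
        space.write(new PricingJob("IBM"));
        // The consumer asks for a PricingJob and skips the ReportJob.
        PricingJob job = space.take(PricingJob.class);
        System.out.println(job.symbol()); // prints IBM
    }
}
```

Matching by type is the simplest possible selection rule; it is what lets different kinds of workers share one space and each pull only the jobs it knows how to process.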

Which brings me to the question: how much of innovation is optimization?