 Publications of year 2011
 Books and proceedings
1. Carlos A. Varela, Nalini Venkatasubramanian, and Rajkumar Buyya, editors. 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2011, Newport Beach, CA, USA, May 23-26, 2011. IEEE. ISBN: 978-0-7695-4395-6. Keyword(s): distributed computing, cloud computing.
@proceedings{varela-ccgrid-2011,
editor = {Carlos A. Varela and Nalini Venkatasubramanian and Rajkumar Buyya},
title = {11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2011, Newport Beach, CA, USA, May 23-26, 2011},
booktitle = {CCGRID},
publisher = {IEEE},
year = {2011},
isbn = {978-0-7695-4395-6},
url = {http://www.informatik.uni-trier.de/~ley/db/conf/ccgrid/ccgrid2011.html},
keywords = {distributed computing, cloud computing},

}


 Articles in journal, book chapters
1. Marco A.S. Netto, Christian Vecchiola, Michael Kirley, Carlos A. Varela, and Rajkumar Buyya. Use of run time predictions for automatic co-allocation of multi-cluster resources for iterative parallel applications. Journal of Parallel and Distributed Computing, 71(10):1388-1399, 2011. ISSN: 0743-7315. Keyword(s): concurrent programming, middleware, grid computing.
Abstract:
 Metaschedulers co-allocate resources by requesting a fixed number of processors and usage time for each cluster. These static requests, defined by users, limit the initial scheduling and prevent rescheduling of applications to other resource sets. It is also difficult for users to estimate application execution times, especially on heterogeneous environments. To overcome these problems, metaschedulers can use performance predictions for automatic resource selection. This paper proposes a resource co-allocation technique with rescheduling support based on performance predictions for multi-cluster iterative parallel applications. Iterative applications have been used to solve a variety of problems in science and engineering, including large-scale computations based on the asynchronous model more recently. We performed experiments using an iterative parallel application, which consists of benchmark multiobjective problems, with both synchronous and asynchronous communication models on Grid'5000. The results show run time predictions with an average error of 7\% and prevention of up to 35\% and 57\% of run time overestimations to support rescheduling for synchronous and asynchronous models, respectively. The performance predictions require no application source code access. One of the main findings is that as the asynchronous model masks communication and computation, it requires no network information to predict execution times. By using our co-allocation technique, metaschedulers become responsible for run time predictions, process mapping, and application rescheduling; releasing the user from these burden tasks.

@article{netto-jpdc2011,
author = "Marco A.S. Netto and Christian Vecchiola and Michael Kirley and Carlos A. Varela and Rajkumar Buyya",
title = "Use of run time predictions for automatic co-allocation of multi-cluster resources for iterative parallel applications",
journal = "Journal of Parallel and Distributed Computing",
volume = "71",
number = "10",
pages = "1388--1399",
year = "2011",
issn = "0743-7315",
doi = "10.1016/j.jpdc.2011.05.007",
url = "http://www.sciencedirect.com/science/article/pii/S0743731511001031",
pdf = "http://wcl.cs.rpi.edu/papers/netto2011.pdf",
keywords = "concurrent programming, middleware, grid computing",
abstract = {Metaschedulers co-allocate resources by requesting a fixed number of processors and usage time for each cluster. These static requests, defined by users, limit the initial scheduling and prevent rescheduling of applications to other resource sets. It is also difficult for users to estimate application execution times, especially on heterogeneous environments. To overcome these problems, metaschedulers can use performance predictions for automatic resource selection. This paper proposes a resource co-allocation technique with rescheduling support based on performance predictions for multi-cluster iterative parallel applications. Iterative applications have been used to solve a variety of problems in science and engineering, including large-scale computations based on the asynchronous model more recently. We performed experiments using an iterative parallel application, which consists of benchmark multiobjective problems, with both synchronous and asynchronous communication models on Grid'5000. The results show run time predictions with an average error of 7\% and prevention of up to 35\% and 57\% of run time overestimations to support rescheduling for synchronous and asynchronous models, respectively. The performance predictions require no application source code access. One of the main findings is that as the asynchronous model masks communication and computation, it requires no network information to predict execution times. By using our co-allocation technique, metaschedulers become responsible for run time predictions, process mapping, and application rescheduling; releasing the user from these burden tasks.}
}


2. Gustavo A. Guevara S., Travis Desell, Jason Laporte, and Carlos A. Varela. Modular Visualization of Distributed Systems. CLEI Electronic Journal, 14:1-17, April 2011. Note: Best papers from CLEI 2010. Keyword(s): distributed computing, distributed systems visualization, network topologies.
Abstract:
 Effective visualization is critical to developing, analyzing, and optimizing distributed systems. We have developed OverView, a tool for online/offline distributed systems visualization, that enables modular layout mechanisms, so that different distributed system high-level programming abstractions such as actors or processes can be visualized in intuitive ways. OverView uses by default a hierarchical concentric layout that distinguishes entities from containers allowing migration patterns triggered by adaptive middleware to be visualized. In this paper, we develop a force-directed layout strategy that connects entities according to their communication patterns in order to directly exhibit the application communication topologies. In force-directed visualization, entities' locations are encoded with different colors to illustrate load balancing. We compare these layouts using quantitative metrics including communication to entity ratio, applied on common distributed application topologies. We conclude that modular visualization is necessary to effectively visualize distributed systems since no one layout is best for all applications.

@article{guevara-clei2011,
title = {Modular Visualization of Distributed Systems},
author = {Gustavo A. Guevara S. and Travis Desell and Jason Laporte and Carlos A. Varela},
journal = {CLEI Electronic Journal},
volume = 14,
pages = {1-17},
month = {April},
year = 2011,
pdf = "http://wcl.cs.rpi.edu/papers/guevara2011.pdf",
keywords = "distributed computing, distributed systems visualization, network topologies",
note = "Best papers from CLEI 2010",
abstract = {Effective visualization is critical to developing, analyzing, and optimizing distributed systems. We have developed OverView, a tool for online/offline distributed systems visualization, that enables modular layout mechanisms, so that different distributed system high-level programming abstractions such as actors or processes can be visualized in intuitive ways. OverView uses by default a hierarchical concentric layout that distinguishes entities from containers allowing migration patterns triggered by adaptive middleware to be visualized. In this paper, we develop a force-directed layout strategy that connects entities according to their communication patterns in order to directly exhibit the application communication topologies. In force-directed visualization, entities' locations are encoded with different colors to illustrate load balancing. We compare these layouts using quantitative metrics including communication to entity ratio, applied on common distributed application topologies. We conclude that modular visualization is necessary to effectively visualize distributed systems since no one layout is best for all applications.}
}


 Conference articles
1. Travis Desell, Malik Magdon-Ismail, Heidi Newberg, Lee A. Newberg, Boleslaw K. Szymanski, and Carlos A. Varela. A Robust Asynchronous Newton Method for Massive Scale Computing Systems. In International Conference on Computational Intelligence and Software Engineering (CiSE), Wuhan, China, December 2011. Keyword(s): distributed computing, distributed systems, scientific computing.
Abstract:
 Volunteer computing grids offer super-computing levels of computing power at the relatively low cost of operating a server. In previous work, the authors have shown that it is possible to take traditionally iterative evolutionary algorithms and execute them on volunteer computing grids by performing them asynchronously. The asynchronous implementations dramatically increase scalability and decrease the time taken to converge to a solution. Iterative and asynchronous optimization algorithms implemented using MPI on clusters and supercomputers, and BOINC on volunteer computing grids have been packaged together in a framework for generic distributed optimization (FGDO). This paper presents a new extension to FGDO for an asynchronous Newton method (ANM) for local optimization. ANM is resilient to heterogeneous, faulty and unreliable computing nodes and is extremely scalable. Preliminary results show that it can converge to a local optimum significantly faster than conjugate gradient descent does.

@InProceedings{desell-cise-2011,
author = {Travis Desell and Malik Magdon-Ismail and Heidi Newberg and Lee A. Newberg and Boleslaw K. Szymanski and Carlos A. Varela},
title = {A Robust Asynchronous Newton Method for Massive Scale Computing Systems},
booktitle = {International Conference on Computational Intelligence and Software Engineering (CiSE)},
year = 2011,
month = {December},
pdf = {http://wcl.cs.rpi.edu/papers/cise2011.pdf},
keywords = {distributed computing, distributed systems, scientific computing},
abstract = {Volunteer computing grids offer super-computing levels of computing power at the relatively low cost of operating a server. In previous work, the authors have shown that it is possible to take traditionally iterative evolutionary algorithms and execute them on volunteer computing grids by performing them asynchronously. The asynchronous implementations dramatically increase scalability and decrease the time taken to converge to a solution. Iterative and asynchronous optimization algorithms implemented using MPI on clusters and supercomputers, and BOINC on volunteer computing grids have been packaged together in a framework for generic distributed optimization (FGDO). This paper presents a new extension to FGDO for an asynchronous Newton method (ANM) for local optimization. ANM is resilient to heterogeneous, faulty and unreliable computing nodes and is extremely scalable. Preliminary results show that it can converge to a local optimum significantly faster than conjugate gradient descent does.}

}


2. Travis Desell, Benjamin A. Willet, Matthew Arsenault, Heidi Newberg, Malik Magdon-Ismail, Boleslaw Szymanski, and Carlos A. Varela. Evolving N-Body Simulations to Determine the Origin and Structure of the Milky Way Galaxy's Halo using Volunteer Computing. In IPDPS'11 Fifth Workshop on Desktop Grids and Volunteer Computing Systems (PCGrid 2011), 2011. Keyword(s): distributed computing, scientific computing.
Abstract:
 This work describes research done by the MilkyWay@Home project to use N-Body simulations to model the formation of the Milky Way Galaxy's halo. While there have been previous efforts to use N-Body simulations to perform astronomical modeling, to our knowledge this is the first to use evolutionary algorithms to discover the initial parameters to the N-Body simulations so that they accurately model astronomical data. Performing a single 32,000 body simulation can take up to 200 hours on a typical processor, with an average of 15 hours. As optimizing the input parameters to these N-Body simulations typically takes at least 30,000 or more simulations, this work is made possible by utilizing the computing power of the 35,000 volunteered hosts at the MilkyWay@Home project, which are currently providing around 800 teraFLOPS. This work also describes improvements to an open-source framework for generic distributed optimization (FGDO), which provide more efficient validation in performing these evolutionary algorithms in conjunction with the Berkeley Open Infrastructure for Network Computing (BOINC).

@inproceedings{desell-pcgrid2011,
title = {Evolving N-Body Simulations to Determine the Origin and Structure of the Milky Way Galaxy's Halo using Volunteer Computing},
author = {Travis Desell and Benjamin A. Willet and Matthew Arsenault and Heidi Newberg and Malik Magdon-Ismail and Boleslaw Szymanski and Carlos A. Varela},
booktitle = "IPDPS'11 Fifth Workshop on Desktop Grids and Volunteer Computing Systems (PCGrid 2011)",
year = 2011,
pdf = "http://wcl.cs.rpi.edu/papers/desell2011.pdf",
keywords = "distributed computing, scientific computing",
abstract = {This work describes research done by the MilkyWay@Home project to use N-Body simulations to model the formation of the Milky Way Galaxy's halo. While there have been previous efforts to use N-Body simulations to perform astronomical modeling, to our knowledge this is the first to use evolutionary algorithms to discover the initial parameters to the N-Body simulations so that they accurately model astronomical data. Performing a single 32,000 body simulation can take up to 200 hours on a typical processor, with an average of 15 hours. As optimizing the input parameters to these N-Body simulations typically takes at least 30,000 or more simulations, this work is made possible by utilizing the computing power of the 35,000 volunteered hosts at the MilkyWay@Home project, which are currently providing around 800 teraFLOPS. This work also describes improvements to an open-source framework for generic distributed optimization (FGDO), which provide more efficient validation in performing these evolutionary algorithms in conjunction with the Berkeley Open Infrastructure for Network Computing (BOINC).}
}


3. Shigeru Imai and Carlos A. Varela. Light-Weight Adaptive Task Offloading from Smartphones to Nearby Computational Resources. In Research in Applied Computation Symposium (RACS 2011), Miami, Florida, November 2011. Keyword(s): distributed computing, distributed systems.
@InProceedings{imai-varela-iphone-racs-2011,
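author = {Shigeru Imai and Carlos A. Varela},
title = {Light-Weight Adaptive Task Offloading from Smartphones to Nearby Computational Resources},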
booktitle = {Research in Applied Computation Symposium (RACS 2011)},
year = 2011,
month = {November},
pdf = {http://wcl.cs.rpi.edu/papers/racs2011.pdf},
keywords = {distributed computing, distributed systems},
}


4. Qingling Wang and Carlos A. Varela. Impact of Cloud Computing Virtualization Strategies on Workloads' Performance. In 4th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2011), Melbourne, Australia, December 2011. Keyword(s): distributed computing, distributed systems, cloud computing.
Abstract:
 Cloud computing brings significant benefits for service providers and users because of its characteristics: e.g., on demand, pay for use, scalable computing. Virtualization management is a critical task to accomplish effective sharing of physical resources and scalability. Existing research focuses on live Virtual Machine (VM) migration as a workload consolidation strategy. However, the impact of other virtual network configuration strategies, such as optimizing total number of VMs for a given workload, the number of virtual CPUs (vCPUs) per VM, and the memory size of each VM has been less studied. This paper presents specific performance patterns on different workloads for various virtual network configuration strategies. For loosely coupled CPU-intensive workloads, on an 8-CPU machine, with memory size varying from 512MB to 4096MB and vCPUs ranging from 1 to 16 per VM, 1, 2, 4, 8 and 16VMs configurations have similar running time. The prerequisite of this conclusion is that all 8 physical processors are occupied by vCPUs. For tightly coupled CPU-intensive workloads, the total number of VMs, vCPUs per VM, and memory allocated per VM, become critical for performance. We obtained the best performance when the ratio of the total number of vCPUs to processors is 2. Doubling the memory size on each VM, for example from 1024MB to 2048MB, gave us at most 15% improvement of performance when the ratio of total vCPUs to physical processors is 2. This research will help private cloud administrators decide how to configure virtual resources for given workloads to optimize performance. It will also help public cloud providers know where to place VMs and when to consolidate workloads to be able to turn on/off Physical Machines (PMs), thereby saving energy and associated cost. Finally it helps cloud service users decide what kind of and how many VM instances to allocate for a given workload and a given budget.

@InProceedings{wang-varela-ucc-2011,
author = {Qingling Wang and Carlos A. Varela},
title = {Impact of Cloud Computing Virtualization Strategies on Workloads' Performance},
booktitle = {4th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2011)},
year = 2011,
month = {December},
pdf = {http://wcl.cs.rpi.edu/papers/ucc2011.pdf},
keywords = {distributed computing, distributed systems, cloud computing},
abstract = {Cloud computing brings significant benefits for service providers and users because of its characteristics: \emph{e.g.}, on demand, pay for use, scalable computing. Virtualization management is a critical task to accomplish effective sharing of physical resources and scalability. Existing research focuses on live Virtual Machine (VM) migration as a workload consolidation strategy. However, the impact of other virtual network configuration strategies, such as optimizing total number of VMs for a given workload, the number of virtual CPUs (vCPUs) per VM, and the memory size of each VM has been less studied. This paper presents specific performance patterns on different workloads for various virtual network configuration strategies. For loosely coupled CPU-intensive workloads, on an 8-CPU machine, with memory size varying from 512MB to 4096MB and vCPUs ranging from 1 to 16 per VM, 1, 2, 4, 8 and 16VMs configurations have similar running time. The prerequisite of this conclusion is that all 8 physical processors are occupied by vCPUs. For tightly coupled CPU-intensive workloads, the total number of VMs, vCPUs per VM, and memory allocated per VM, become critical for performance. We obtained the best performance when the ratio of the total number of vCPUs to processors is 2. Doubling the memory size on each VM, for example from 1024MB to 2048MB, gave us at most 15\% improvement of performance when the ratio of total vCPUs to physical processors is 2. This research will help private cloud administrators decide how to configure virtual resources for given workloads to optimize performance. It will also help public cloud providers know where to place VMs and when to consolidate workloads to be able to turn on/off Physical Machines (PMs), thereby saving energy and associated cost. Finally it helps cloud service users decide what kind of and how many VM instances to allocate for a given workload and a given budget.}
}


 Miscellaneous
1. Qingling Wang. Middleware for Autonomous Reconfiguration of Virtual Machines. Master's thesis, Rensselaer Polytechnic Institute, August 2011. Keyword(s): distributed computing, cloud computing, middleware.
Abstract:
 Cloud computing brings significant benefits for service providers and service users because of its characteristics: e.g., on demand, pay for use, scalable computing. Virtualization management is a critical component to accomplish effective sharing of physical resources and scalability. Existing research focuses on live Virtual Machine (VM) migration as a VM consolidation strategy. However, the impact of other virtual network configuration strategies, such as optimizing total number of VMs for a given workload, the number of virtual CPUs (vCPUs) per VM, and the memory size of each VM has been less studied. This thesis presents specific performance patterns on different workloads for various virtual network configuration strategies. We conclude that, for loosely coupled CPU-intensive workloads, memory size and number of vCPUs per VM do not have significant performance effects. On an 8-CPU machine, with memory size varying from 512MB to 4096MB and vCPUs ranging from 1 to 16 per VM; 1, 2, 4, 8 and 16VM configurations have similar running time. The prerequisite of this conclusion is that all 8 physical processors be occupied by vCPUs. For tightly coupled CPU-intensive workloads, the total number of VMs, vCPUs per VM and memory allocated per VM become critical for performance. We obtained the best performance when the ratio of total number of vCPUs to processors is 2. Doubling memory size on each VM, for example from 1024MB to 2048MB, brings at most 15% improvement of performance when number of VMs is greater than 2. Based on the experimental results, we propose a framework and a threshold-based strategy set to dynamically refine virtualization configurations. The framework mainly contains three parts: resources monitor, virtual network configuration controller and scheduler, which are responsible for monitoring resource usage on both virtual and physical layers, controlling virtual resources distribution, and scheduling concrete reconfiguration steps respectively. 
Our reconfiguration approach consists of four strategies: VM migration and VM malleability strategies, which are at global level, vCPU tuning and memory ballooning, which are at local level. The strategies evaluate and trigger specific reconfiguration steps (for example, double the number of vCPUs on each VM) by comparing current allocated resources and corresponding utilizations with expected values. The evaluation experimental results of threshold-based strategy show that reconfiguration in global level works better for tightly coupled CPU-intensive workloads than for loosely coupled ones. Local reconfiguration including dynamically changing number of vCPUs and memory size allocated to VMs, improves the performance of initially sub-optimal virtual network configurations, even though it falls short of performing as well as the initially optimal virtual network configurations. This research will help private cloud administrators decide how to configure virtual resources for a given workload to optimize performance. It will also help service providers know where to place VMs and when to consolidate workloads to be able to turn on/off Physical Machines (PMs), thereby saving energy and associated costs. Finally, it lets service users know what kind of and how many VM instances to allocate in a public cloud for a given workload and budget.

@MastersThesis{wang-autonomousreconfiguration-2011,
author = {Qingling Wang},
title = {Middleware for Autonomous Reconfiguration of Virtual Machines},
school = {Rensselaer Polytechnic Institute},
year = 2011,
month = {August},
pdf = "http://wcl.cs.rpi.edu/theses/qinglingwang-master.pdf",
keywords = "distributed computing, cloud computing, middleware",
abstract = {Cloud computing brings significant benefits for service providers and service users because of its characteristics: e.g., on demand, pay for use, scalable computing. Virtualization management is a critical component to accomplish effective sharing of physical resources and scalability. Existing research focuses on live Virtual Machine (VM) migration as a VM consolidation strategy. However, the impact of other virtual network configuration strategies, such as optimizing total number of VMs for a given workload, the number of virtual CPUs (vCPUs) per VM, and the memory size of each VM has been less studied. This thesis presents specific performance patterns on different workloads for various virtual network configuration strategies. We conclude that, for loosely coupled CPU-intensive workloads, memory size and number of vCPUs per VM do not have significant performance effects. On an 8-CPU machine, with memory size varying from 512MB to 4096MB and vCPUs ranging from 1 to 16 per VM; 1, 2, 4, 8 and 16VM configurations have similar running time. The prerequisite of this conclusion is that all 8 physical processors be occupied by vCPUs. For tightly coupled CPU-intensive workloads, the total number of VMs, vCPUs per VM and memory allocated per VM become critical for performance. We obtained the best performance when the ratio of total number of vCPUs to processors is 2. Doubling memory size on each VM, for example from 1024MB to 2048MB, brings at most 15\% improvement of performance when number of VMs is greater than 2. Based on the experimental results, we propose a framework and a threshold-based strategy set to dynamically refine virtualization configurations.
The framework mainly contains three parts: resources monitor, virtual network configuration controller and scheduler, which are responsible for monitoring resource usage on both virtual and physical layers, controlling virtual resources distribution, and scheduling concrete reconfiguration steps respectively. Our reconfiguration approach consists of four strategies: VM migration and VM malleability strategies, which are at global level, vCPU tuning and memory ballooning, which are at local level. The strategies evaluate and trigger specific reconfiguration steps (for example, double the number of vCPUs on each VM) by comparing current allocated resources and corresponding utilizations with expected values. The evaluation experimental results of threshold-based strategy show that reconfiguration in global level works better for tightly coupled CPU-intensive workloads than for loosely coupled ones. Local reconfiguration including dynamically changing number of vCPUs and memory size allocated to VMs, improves the performance of initially sub-optimal virtual network configurations, even though it falls short of performing as well as the initially optimal virtual network configurations. This research will help private cloud administrators decide how to configure virtual resources for a given workload to optimize performance. It will also help service providers know where to place VMs and when to consolidate workloads to be able to turn on/off Physical Machines (PMs), thereby saving energy and associated costs. Finally, it lets service users know what kind of and how many VM instances to allocate in a public cloud for a given workload and budget.}
}


2. Carlos A. Varela. Flexible Software Technology for Scalable Cloud Computing. NSF Workshop on the Science of Cloud Computing, Arlington, Virginia, U.S.A., March 17-18, 2011. Keyword(s): distributed computing, cloud computing.
@Misc{varela-cloud-2011,
author = {Carlos A. Varela},
title = {Flexible Software Technology for Scalable Cloud Computing},
howpublished = {NSF Workshop on the Science of Cloud Computing, Arlington, Virginia, U.S.A., March 17-18, 2011},
month = {March},
year = 2011,
keywords = {distributed computing, cloud computing},
pdf = {http://nsfcloud2011.cs.ucsb.edu/papers/Varela_Paper.pdf}
}



Disclaimer:

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
