Publications of year 2017
Articles in journals and book chapters
Ubiquitous sensing is pervasive in society for such applications as biometrics for health care, smart grids for power delivery, and avionics for transportation safety. As society continues to rely ever more on sensors for various applications, there is a need to address the accuracy of sensor readings for health maintenance, signal identification, and control. While there have been advances in information fusion for avionics control and user warnings, there is still a need for further research in methods that allow fault detection and recovery techniques to be easily realized and implemented with minimal risk of software errors.
@article{imai-pilots-ieee-aesm-2017,
  title = {Airplane Flight Safety Using Error-Tolerant Data Stream Processing},
  author = {Shigeru Imai and Erik Blasch and Alessandro Galli and Wennan Zhu and Frederick Lee and Carlos A. Varela},
  journal = {IEEE Aerospace and Electronic Systems Magazine},
  volume = {32},
  number = {4},
  year = 2017,
  pages = {4-17},
  pdf = {http://wcl.cs.rpi.edu/papers/pilots-aesm.pdf},
  url = {http://www.brightcopy.net/allen/aesm/32-4/index.php#/6},
  keywords = {programming languages, cyber physical systems, data streaming},
  abstract = {Ubiquitous sensing is pervasive in society for such applications as biometrics for health care, smart grids for power delivery, and avionics for transportation safety. As society continues to rely ever more on sensors for various applications, there is a need to address the accuracy of sensor readings for health maintenance, signal identification, and control. While there have been advances in information fusion for avionics control and user warnings, there is still a need for further research in methods that allow fault detection and recovery techniques to be easily realized and implemented with minimal risk of software errors.}
}
In sensor-based systems, spatio-temporal data streams are often related in non-trivial ways. For example, in avionics, while the airspeed that an aircraft attains in cruise phase depends on the weight it carries, it also depends on many other factors such as engine inputs, angle of attack, and air density. It is therefore a challenge to develop failure models that can help recognize errors in the data, such as an incorrect fuel quantity or an incorrect airspeed. In this paper, we present a highly declarative programming framework that facilitates the development of self-healing avionics applications, which can detect and recover from data errors. Our programming framework enables specifying expert-created failure models using error signatures, as well as learning failure models from data. To account for unanticipated failure modes, we propose a new dynamic Bayes classifier that detects outliers and upgrades them to new modes when statistically significant. We evaluate error signatures and our dynamic Bayes classifier for accuracy, response time, and adaptability of error detection. While error signatures can be more accurate and responsive than dynamic Bayesian learning, the latter method adapts better due to its data-driven nature.
@Article{imai-chen-zhu-varela-clustercomp-2017,
  author = {Shigeru Imai and Sida Chen and Wennan Zhu and Carlos A. Varela},
  title = {Dynamic Data-Driven Learning for Self-Healing Avionics},
  journal = {Cluster Computing},
  year = {2017},
  month = {Nov},
  issn = {1573-7543},
  doi = {10.1007/s10586-017-1291-8},
  pdf = {http://wcl.cs.rpi.edu/papers/pilots-cluster.pdf},
  url = {http://rdcu.be/yJNh},
  keywords = {programming languages, cyber physical systems, data streaming},
  abstract = {In sensor-based systems, spatio-temporal data streams are often related in non-trivial ways. For example, in avionics, while the airspeed that an aircraft attains in cruise phase depends on the weight it carries, it also depends on many other factors such as engine inputs, angle of attack, and air density. It is therefore a challenge to develop failure models that can help recognize errors in the data, such as an incorrect fuel quantity or an incorrect airspeed. In this paper, we present a highly declarative programming framework that facilitates the development of self-healing avionics applications, which can detect and recover from data errors. Our programming framework enables specifying expert-created failure models using error signatures, as well as learning failure models from data. To account for unanticipated failure modes, we propose a new dynamic Bayes classifier that detects outliers and upgrades them to new modes when statistically significant. We evaluate error signatures and our dynamic Bayes classifier for accuracy, response time, and adaptability of error detection. While error signatures can be more accurate and responsive than dynamic Bayesian learning, the latter method adapts better due to its data-driven nature.}
}
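The dynamic Bayes classifier described in the abstract above can be pictured with a small sketch: score an observation against Gaussian models of the known failure modes, buffer observations that no mode explains, and promote the buffer to a new mode once enough outliers accumulate. This is only an illustrative approximation of the idea, not the paper's algorithm; the class name, the thresholds, and the naive per-feature Gaussian assumption are all hypothetical.

```python
import numpy as np

class DynamicBayesClassifier:
    """Illustrative sketch: Gaussian scoring over known failure modes,
    with outliers promoted to a new mode once enough accumulate.
    Not the paper's algorithm; names and thresholds are hypothetical."""

    def __init__(self, min_logpdf=-25.0, promote_after=30):
        self.modes = {}                # mode name -> (mean, var) per feature
        self.outliers = []             # observations no known mode explains
        self.min_logpdf = min_logpdf   # below this, an observation is an outlier
        self.promote_after = promote_after

    def fit_mode(self, name, samples):
        samples = np.asarray(samples, dtype=float)
        self.modes[name] = (samples.mean(axis=0), samples.var(axis=0) + 1e-9)

    def _log_likelihood(self, x, mean, var):
        # Sum of per-feature Gaussian log-densities (naive independence).
        return float(np.sum(-0.5 * np.log(2 * np.pi * var)
                            - (x - mean) ** 2 / (2 * var)))

    def classify(self, x):
        # Assumes at least one mode has been fitted already.
        x = np.asarray(x, dtype=float)
        scores = {name: self._log_likelihood(x, mean, var)
                  for name, (mean, var) in self.modes.items()}
        best, score = max(scores.items(), key=lambda kv: kv[1])
        if score >= self.min_logpdf:
            return best
        # Outlier: buffer it; once the buffer is large enough (here a
        # simple count stands in for a statistical significance test),
        # fit a new mode from the buffered observations.
        self.outliers.append(x)
        if len(self.outliers) >= self.promote_after:
            self.fit_mode("mode_%d" % len(self.modes), self.outliers)
            self.outliers = []
        return "outlier"
```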
Conference articles
Recently, a new concept called desktop cloud has emerged, developed to offer cloud computing services on non-dedicated resources. Like cloud computing, desktop clouds are based on virtualization and, like other computational systems, may experience faults at any time. As a consequence, reliability has become a concern for researchers. Fault-tolerance strategies focused on independent virtual machines include snapshots (checkpoints) to resume execution from a healthy state of a virtual machine on the same or another host, which is trivial because hypervisors provide this function. However, it is not trivial to obtain a global snapshot of a distributed system formed by applications that communicate with one another, because no global clock exists, so it cannot be guaranteed that the snapshots of each VM are taken at the same time. Some protocol is therefore needed to coordinate the participants to obtain a global snapshot. In this paper, we propose a global snapshot protocol, UnaCloud Snapshot, for desktop clouds over TCP/IP networks. It differs from other proposals, which use a virtual network to inspect and manipulate the traffic circulating among virtual machines, making them difficult to apply to more realistic environments. We obtain a consistent global snapshot of a general distributed system running on virtual machines that preserves the semantics of the system without modifying the applications running on the virtual machines or the hypervisors. A first prototype was developed, and the preliminary results of our evaluation are presented.
@InProceedings{gomez-sbacpad-2017,
  author = {Carlos Gomez and Harold Castro and Carlos Varela},
  title = {Global snapshot of a distributed system running on virtual machines},
  booktitle = {International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) 2017},
  year = 2017,
  address = {Campinas, Brazil},
  month = {October},
  pdf = {http://wcl.cs.rpi.edu/papers/sbacpad2017.pdf},
  keywords = {distributed computing, cloud computing, distributed systems},
  abstract = {Recently, a new concept called desktop cloud has emerged, developed to offer cloud computing services on non-dedicated resources. Like cloud computing, desktop clouds are based on virtualization and, like other computational systems, may experience faults at any time. As a consequence, reliability has become a concern for researchers. Fault-tolerance strategies focused on independent virtual machines include snapshots (checkpoints) to resume execution from a healthy state of a virtual machine on the same or another host, which is trivial because hypervisors provide this function. However, it is not trivial to obtain a global snapshot of a distributed system formed by applications that communicate with one another, because no global clock exists, so it cannot be guaranteed that the snapshots of each VM are taken at the same time. Some protocol is therefore needed to coordinate the participants to obtain a global snapshot. In this paper, we propose a global snapshot protocol, UnaCloud Snapshot, for desktop clouds over TCP/IP networks. It differs from other proposals, which use a virtual network to inspect and manipulate the traffic circulating among virtual machines, making them difficult to apply to more realistic environments. We obtain a consistent global snapshot of a general distributed system running on virtual machines that preserves the semantics of the system without modifying the applications running on the virtual machines or the hypervisors. A first prototype was developed, and the preliminary results of our evaluation are presented.}
}
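UnaCloud Snapshot itself is not reproduced here, but the coordination problem the abstract describes is the classic one solved by Chandy and Lamport's marker algorithm: each process records its own state, floods a marker on its outgoing channels, and logs the messages that arrive on a channel before that channel's marker, thereby capturing in-flight channel state without a global clock. The sketch below shows that textbook algorithm under simplified assumptions (reliable FIFO channels, a single snapshot run, placeholder state capture); it is background for the paper's setting, not its protocol.

```python
# Textbook Chandy-Lamport marker algorithm, shown as background for
# coordinated global snapshots; this is NOT the UnaCloud Snapshot
# protocol. Assumes reliable FIFO channels and a single snapshot run;
# capture_local_state() is a placeholder (e.g., a hypervisor VM snapshot).

class Process:
    def __init__(self, pid, in_channels, send_marker):
        self.pid = pid
        self.send_marker = send_marker   # floods MARKER on all out-channels
        self.local_state = None
        self.recording = False
        self.channel_state = {c: [] for c in in_channels}  # in-flight messages
        self.marker_seen = {c: False for c in in_channels}

    def start_snapshot(self):
        # Record local state first, then flood markers to all neighbors.
        self.local_state = self.capture_local_state()
        self.recording = True
        self.send_marker(self.pid)

    def on_message(self, channel, msg):
        if msg == "MARKER":
            if self.local_state is None:
                self.start_snapshot()    # first marker triggers recording
            self.marker_seen[channel] = True
            if all(self.marker_seen.values()):
                self.recording = False   # snapshot complete at this process
        elif self.recording and not self.marker_seen[channel]:
            # Message sent before the sender's marker: it belongs to the
            # recorded state of this channel.
            self.channel_state[channel].append(msg)

    def capture_local_state(self):
        return "state-of-%s" % self.pid  # placeholder
```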
In cloud-based stream processing services, the maximum sustainable throughput (MST) is defined as the maximum throughput that a system composed of a fixed number of virtual machines (VMs) can ingest indefinitely. If the incoming data rate exceeds the system's MST, unprocessed data accumulates, eventually making the system inoperable. Thus, it is important for the service provider to keep the MST always larger than the incoming data rate by dynamically changing the number of VMs used by the system. In this paper, we identify a common data processing environment used by modern data stream processing systems, and we propose MST prediction models for this environment. We train the models using linear regression with samples obtained from a few VMs and predict MST for a larger number of VMs. To minimize the time and cost for model training, we statistically determine a set of training samples using Intel's Storm benchmarks with representative resource usage patterns. Using typical use-case benchmarks on Amazon's EC2 public cloud, our experiments show that, training with up to 8 VMs, we can predict MST for streaming applications with less than 4% average prediction error for 12 VMs, 9% for 16 VMs, and 32% for 24 VMs. Further, we evaluate our prediction models with simulation-based elastic VM scheduling on a realistic workload. These simulation results show that with 10% over-provisioning, our proposed models' cost efficiency is on par with the cost of an optimal scaling policy without incurring any service level agreement violations.
@InProceedings{imai-patterson-varela-ccgrid-2017,
  author = {Shigeru Imai and Stacy Patterson and Carlos A. Varela},
  title = {Maximum Sustainable Throughput Prediction for Data Stream Processing over Public Clouds},
  booktitle = {17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2017)},
  year = 2017,
  address = {Madrid, Spain},
  month = {May},
  pdf = {http://wcl.cs.rpi.edu/papers/ccgrid2017.pdf},
  keywords = {distributed computing, cloud computing, stream processing},
  abstract = {In cloud-based stream processing services, the maximum sustainable throughput (MST) is defined as the maximum throughput that a system composed of a fixed number of virtual machines (VMs) can ingest indefinitely. If the incoming data rate exceeds the system's MST, unprocessed data accumulates, eventually making the system inoperable. Thus, it is important for the service provider to keep the MST always larger than the incoming data rate by dynamically changing the number of VMs used by the system. In this paper, we identify a common data processing environment used by modern data stream processing systems, and we propose MST prediction models for this environment. We train the models using linear regression with samples obtained from a few VMs and predict MST for a larger number of VMs. To minimize the time and cost for model training, we statistically determine a set of training samples using Intel's Storm benchmarks with representative resource usage patterns. Using typical use-case benchmarks on Amazon's EC2 public cloud, our experiments show that, training with up to 8 VMs, we can predict MST for streaming applications with less than 4% average prediction error for 12 VMs, 9% for 16 VMs, and 32% for 24 VMs. Further, we evaluate our prediction models with simulation-based elastic VM scheduling on a realistic workload. These simulation results show that with 10% over-provisioning, our proposed models' cost efficiency is on par with the cost of an optimal scaling policy without incurring any service level agreement violations.}
}
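The prediction step in the abstract above reduces, in its simplest form, to fitting a line through MST measurements taken on a few small configurations and extrapolating to larger ones. The sketch below shows that minimal version; the sample numbers are invented for illustration, and the paper's actual models are trained on benchmark measurements and may involve more than a single linear term in the VM count.

```python
import numpy as np

# Hypothetical MST measurements: (VM count, sustained tuples/sec).
# The numbers are invented; in the paper they come from benchmark runs.
samples = np.array([[2, 21000.0], [4, 39500.0], [6, 57000.0], [8, 74000.0]])

# Fit MST(n) ~= slope * n + intercept by least squares on the
# small-cluster samples (training with up to 8 VMs, as in the paper).
slope, intercept = np.polyfit(samples[:, 0], samples[:, 1], deg=1)

# Extrapolate to larger clusters, as the paper does for 12-24 VMs.
for n in (12, 16, 24):
    print("%2d VMs -> predicted MST ~ %.0f tuples/sec"
          % (n, slope * n + intercept))
```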
Loss of thrust emergencies, e.g., induced by bird/drone strikes or fuel exhaustion, create the need for dynamic data-driven flight trajectory planning to advise pilots or control UAVs. While total loss of thrust trajectories to nearby airports can be pre-computed for all initial points in a 3D flight plan, dynamic aspects such as partial power and airplane surface damage must be considered for accuracy. In this paper, we propose a new Dynamic Data-Driven Avionics Software (DDDAS) approach which, during flight, updates a damaged aircraft performance model, used in turn to generate plausible flight trajectories to a safe landing site. Our damaged aircraft model is parameterized on a baseline glide ratio for a clean aircraft configuration, assuming best gliding airspeed on straight flight. The model predicts purely geometric criteria for flight trajectory generation, namely, glide ratio and turn radius for different bank angles and drag configurations. Given actual aircraft performance data, we dynamically infer the baseline glide ratio to update the damaged aircraft model. Our new flight trajectory generation algorithm can thus significantly improve upon prior Dubins-based trajectory generation work by considering these data-driven geometric criteria. We further introduce a trajectory utility function to rank trajectories for safety. As a use case, we consider the Hudson River ditching of US Airways 1549 in January 2009, using a flight simulator to evaluate our trajectories and to obtain sensor data. In this case, a baseline glide ratio of 17.25:1 enabled us to generate trajectories up to 28 seconds after the bird strike, whereas a 19:1 baseline glide ratio enabled us to generate trajectories up to 36 seconds after the bird strike. DDDAS can significantly improve the accuracy of generated flight trajectories, thereby enabling better decision support systems for pilots in emergency conditions.
@InProceedings{paul-dddas-2017,
  author = {Saswata Paul and Frederick Hole and Alexandra Zytek and Carlos A. Varela},
  title = {Flight Trajectory Planning for Fixed Wing Aircraft in Loss of Thrust Emergencies},
  booktitle = {Dynamic Data-Driven Application Systems (InfoSymbiotics/DDDAS 2017)},
  year = 2017,
  address = {Cambridge, MA},
  month = {Aug},
  pdf = {http://wcl.cs.rpi.edu/papers/trajectory_tech_report_oct_17.pdf},
  url = {http://arxiv.org/abs/1711.00716},
  keywords = {dddas, cyber physical systems, data streaming, trajectory generation},
  abstract = {Loss of thrust emergencies, e.g., induced by bird/drone strikes or fuel exhaustion, create the need for dynamic data-driven flight trajectory planning to advise pilots or control UAVs. While total loss of thrust trajectories to nearby airports can be pre-computed for all initial points in a 3D flight plan, dynamic aspects such as partial power and airplane surface damage must be considered for accuracy. In this paper, we propose a new Dynamic Data-Driven Avionics Software (DDDAS) approach which, during flight, updates a damaged aircraft performance model, used in turn to generate plausible flight trajectories to a safe landing site. Our damaged aircraft model is parameterized on a baseline glide ratio for a clean aircraft configuration, assuming best gliding airspeed on straight flight. The model predicts purely geometric criteria for flight trajectory generation, namely, glide ratio and turn radius for different bank angles and drag configurations. Given actual aircraft performance data, we dynamically infer the baseline glide ratio to update the damaged aircraft model. Our new flight trajectory generation algorithm can thus significantly improve upon prior Dubins-based trajectory generation work by considering these data-driven geometric criteria. We further introduce a trajectory utility function to rank trajectories for safety. As a use case, we consider the Hudson River ditching of US Airways 1549 in January 2009, using a flight simulator to evaluate our trajectories and to obtain sensor data. In this case, a baseline glide ratio of 17.25:1 enabled us to generate trajectories up to 28 seconds after the bird strike, whereas a 19:1 baseline glide ratio enabled us to generate trajectories up to 36 seconds after the bird strike. DDDAS can significantly improve the accuracy of generated flight trajectories, thereby enabling better decision support systems for pilots in emergency conditions.}
}
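The geometric criteria mentioned in the abstract, glide ratio and turn radius as functions of bank angle, can be sketched from standard flight mechanics: a coordinated turn has radius r = v^2 / (g tan(bank)), and a common approximation scales the effective glide ratio by cos(bank) in a banked turn. The sketch below uses those textbook relations with invented numbers loosely inspired by the US Airways 1549 scenario; it is not the paper's damaged-aircraft model.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def turn_radius(airspeed_mps, bank_deg):
    # Coordinated-turn radius: r = v^2 / (g * tan(bank)).
    return airspeed_mps ** 2 / (G * math.tan(math.radians(bank_deg)))

def glide_distance(altitude_m, glide_ratio, bank_deg=0.0):
    # Common approximation: effective glide ratio scales with cos(bank)
    # in a banked turn (straight flight when bank_deg == 0).
    return altitude_m * glide_ratio * math.cos(math.radians(bank_deg))

# Invented numbers loosely inspired by US Airways 1549: ~850 m of usable
# altitude, ~105 m/s best-glide airspeed, 17.25:1 baseline glide ratio.
print("straight glide: %.1f km" % (glide_distance(850, 17.25) / 1000))
print("20-degree bank: %.1f km reach, %d m turn radius"
      % (glide_distance(850, 17.25, 20) / 1000, turn_radius(105, 20)))
```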
Internal reports
In cloud-based stream processing services, the maximum sustainable throughput (MST) is defined as the maximum throughput that a system composed of a fixed number of virtual machines (VMs) can ingest indefinitely. If the incoming data rate exceeds the system's MST, unprocessed data accumulates, eventually making the system inoperable. Thus, it is important for the service provider to keep the MST always larger than the incoming data rate by dynamically changing the number of VMs used by the system. In this paper, we identify a common data processing environment used by modern data stream processing systems, and we propose MST prediction models for this environment. We train the models using linear regression with samples obtained from a few VMs and predict MST for a larger number of VMs. To minimize the time and cost for model training, we statistically determine a set of training samples using Intel's Storm benchmarks with representative resource usage patterns. Using typical use-case benchmarks on Amazon's EC2 public cloud, our experiments show that, training with up to 8 VMs, we can predict MST for streaming applications with less than 4% average prediction error for 12 VMs, 9% for 16 VMs, and 32% for 24 VMs. Further, we evaluate our prediction models with simulation-based elastic VM scheduling on a realistic workload. These simulation results show that with 10% over-provisioning, our proposed models' cost efficiency is on par with the cost of an optimal scaling policy without incurring any service level agreement violations.
@TechReport{imai-patterson-varela-mst-tr-2017,
  author = {Shigeru Imai and Stacy Patterson and Carlos A. Varela},
  title = {Maximum Sustainable Throughput Prediction for Large-Scale Data Streaming Systems},
  institution = {Rensselaer Polytechnic Institute Department of Computer Science},
  year = 2017,
  month = {November},
  pdf = {http://wcl.cs.rpi.edu/papers/mst2017.pdf},
  keywords = {distributed computing, cloud computing, stream processing},
  abstract = {In cloud-based stream processing services, the maximum sustainable throughput (MST) is defined as the maximum throughput that a system composed of a fixed number of virtual machines (VMs) can ingest indefinitely. If the incoming data rate exceeds the system's MST, unprocessed data accumulates, eventually making the system inoperable. Thus, it is important for the service provider to keep the MST always larger than the incoming data rate by dynamically changing the number of VMs used by the system. In this paper, we identify a common data processing environment used by modern data stream processing systems, and we propose MST prediction models for this environment. We train the models using linear regression with samples obtained from a few VMs and predict MST for a larger number of VMs. To minimize the time and cost for model training, we statistically determine a set of training samples using Intel's Storm benchmarks with representative resource usage patterns. Using typical use-case benchmarks on Amazon's EC2 public cloud, our experiments show that, training with up to 8 VMs, we can predict MST for streaming applications with less than 4% average prediction error for 12 VMs, 9% for 16 VMs, and 32% for 24 VMs. Further, we evaluate our prediction models with simulation-based elastic VM scheduling on a realistic workload. These simulation results show that with 10% over-provisioning, our proposed models' cost efficiency is on par with the cost of an optimal scaling policy without incurring any service level agreement violations.}
}
Miscellaneous
The Cloud computing paradigm has revolutionised the computer science horizon during the past decade and has enabled the emergence of computing as the fifth utility. It has captured significant attention from academia, industry, and government bodies. It has now emerged as the backbone of the modern economy by offering subscription-based services anytime, anywhere, following a pay-as-you-go model. This has instigated (1) shorter establishment times for start-ups, (2) creation of scalable global enterprise applications, (3) better cost-to-value associativity for scientific and high-performance computing applications, and (4) different invocation/execution models for pervasive and ubiquitous applications. Recent technological developments and paradigms such as serverless computing, software-defined networking, the Internet of Things, and processing at the network edge are creating new opportunities for Cloud computing. However, they are also posing several new challenges and creating the need for new approaches and research strategies, as well as the re-evaluation of the models that were developed to address issues such as scalability, elasticity, reliability, security, sustainability, and application models. The proposed manifesto addresses them by identifying the major open challenges in Cloud computing, emerging trends, and impact areas. It then offers research directions for the next decade, thus helping in the realisation of Future Generation Cloud Computing.
@Misc{varela-manifesto-cloud-2017,
  author = {{Buyya}, R. and {Narayana Srirama}, S. and {Casale}, G. and {Calheiros}, R. and {Simmhan}, Y. and {Varghese}, B. and {Gelenbe}, E. and {Javadi}, B. and {Vaquero}, L.~M. and {Netto}, M.~A.~S. and {Nadjaran Toosi}, A. and {Rodriguez}, M.~A. and {Llorente}, I.~M. and {De Capitani di Vimercati}, S. and {Samarati}, P. and {Milojicic}, D. and {Varela}, C. and {Bahsoon}, R. and {Dias de Assuncao}, M. and {Rana}, O. and {Zhou}, W. and {Jin}, H. and {Gentzsch}, W. and {Zomaya}, A. and {Shen}, H.},
  title = {A Manifesto for Future Generation Cloud Computing: Research Directions for the Next Decade},
  journal = {ArXiv e-prints},
  archivePrefix = {arXiv},
  keywords = {distributed computing, cloud computing},
  year = 2017,
  month = {November},
  pdf = {https://arxiv.org/pdf/1711.09123.pdf},
  url = {https://arxiv.org/abs/1711.09123},
  abstract = {The Cloud computing paradigm has revolutionised the computer science horizon during the past decade and has enabled the emergence of computing as the fifth utility. It has captured significant attention from academia, industry, and government bodies. It has now emerged as the backbone of the modern economy by offering subscription-based services anytime, anywhere, following a pay-as-you-go model. This has instigated (1) shorter establishment times for start-ups, (2) creation of scalable global enterprise applications, (3) better cost-to-value associativity for scientific and high-performance computing applications, and (4) different invocation/execution models for pervasive and ubiquitous applications. Recent technological developments and paradigms such as serverless computing, software-defined networking, the Internet of Things, and processing at the network edge are creating new opportunities for Cloud computing. However, they are also posing several new challenges and creating the need for new approaches and research strategies, as well as the re-evaluation of the models that were developed to address issues such as scalability, elasticity, reliability, security, sustainability, and application models. The proposed manifesto addresses them by identifying the major open challenges in Cloud computing, emerging trends, and impact areas. It then offers research directions for the next decade, thus helping in the realisation of Future Generation Cloud Computing.}
}
This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
The documents contained in these directories are made available by the contributing authors to ensure the timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are retained by the authors and by the copyright holders, notwithstanding that they present their works here in electronic form. Persons copying this information must adhere to the terms and constraints covered by each author's copyright. These works may not be made available elsewhere without the explicit permission of the copyright holder.
This document was translated from BibTeX by bibtex2html