2015

Chhokra A, Abdelwahed S, Dubey A, Neema S and Karsai G (2015). From System Modeling to Formal Verification, In The 2015 Electronic System Level Synthesis Conference, San Francisco, 07/01/2015. ECSI.
Abstract: Due to increasing design complexity, modern systems are modeled at a high level of abstraction. SystemC is widely accepted as a system-level language for modeling complex embedded systems. Verification of these SystemC designs eliminates the chance of errors propagating down to the hardware. Due to the lack of formal semantics of SystemC, the verification of such designs is done mostly in an unsystematic manner. This paper provides a new modeling environment that enables the designer to simulate and formally verify designs by generating SystemC code. The generated SystemC code is automatically translated to timed automata for formal analysis.
BibTeX:
@inproceedings{4711,
  author = {Chhokra, Ajay and Abdelwahed, Sherif and Dubey, Abhishek and Neema, Sandeep and Karsai, Gabor},
  title = {From System Modeling to Formal Verification},
  booktitle = {The 2015 Electronic System Level Synthesis Conference},
  publisher = {ECSI},
  year = {2015},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Session3_Paper3.pdf}
}
Chhokra A, Dubey A, Mahadevan N and Karsai G (2015). A component-based approach for modeling failure propagations in power systems, In Modeling and Simulation of Cyber-Physical Energy Systems (MSCPES), 2015 Workshop on, April 2015, pp. 1-6.
Abstract: Resiliency and reliability are of paramount importance for energy cyber-physical systems. Electrical protection systems, including detection elements such as Distance Relays and actuation elements such as Breakers, are designed to protect the system from abnormal operations and arrest failure propagation by rapidly isolating the faulty components. However, failures in the protection devices themselves can and do lead to major system events and fault cascades, often leading to blackouts. This paper augments our past work on Temporal Causal Diagrams (TCD), a modeling formalism designed to help reason about failure progressions, by (a) describing a way to generate the TCD model from the system specification, and (b) understanding the system failure dynamics for TCD reasoners by configuring simulation models.
BibTeX:
@inproceedings{7115412,
  author = {Chhokra, A. and Dubey, A. and Mahadevan, N. and Karsai, G.},
  title = {A component-based approach for modeling failure propagations in power systems},
  booktitle = {Modeling and Simulation of Cyber-Physical Energy Systems (MSCPES), 2015 Workshop on},
  year = {2015},
  pages = {1-6},
  doi = {10.1109/MSCPES.2015.7115412}
}
Mahadevan N, Dubey A, Chhokra A, Guo H and Karsai G (2015). Using temporal causal models to isolate failures in power system protection devices. Instrumentation Measurement Magazine, IEEE, August 2015. Vol. 18(4), pp. 28-39.
BibTeX:
@article{7155770,
  author = {Mahadevan, N. and Dubey, A. and Chhokra, A. and Guo, H. and Karsai, G.},
  title = {Using temporal causal models to isolate failures in power system protection devices},
  journal = {Instrumentation Measurement Magazine, IEEE},
  year = {2015},
  volume = {18},
  number = {4},
  pages = {28-39},
  doi = {10.1109/MIM.2015.7155770}
}
Balasubramanian D, Dubey A, Otte W, Levendovszky T, Gokhale A, Kumar P, Emfinger W and Karsai G (2015). DREMS ML: A Wide Spectrum Architecture Design Language for Distributed Computing Platform. Sci. Comput. Program., Amsterdam, The Netherlands, August 2015. Vol. 106(C), pp. 3-29. Elsevier North-Holland, Inc.
Abstract: Complex sensing, processing and control applications running on distributed platforms are difficult to design, develop, analyze, integrate, deploy and operate, especially if resource constraints, fault tolerance and security issues are to be addressed. While technology exists today for engineering distributed, real-time component-based applications, many problems remain unsolved by existing tools. Model-driven development techniques are powerful, but there are very few existing and complete tool chains that offer an end-to-end solution to developers, from design to deployment. There is a need for an integrated model-driven development environment that addresses all phases of application lifecycle including design, development, verification, analysis, integration, deployment, operation and maintenance, with supporting automation in every phase. Arguably, a centerpiece of such a model-driven environment is the modeling language. To that end, this paper presents a wide-spectrum architecture design language called DREMS ML that itself is an integrated collection of individual domain-specific sub-languages. We claim that the language promotes “correct-by-construction” software development and integration by supporting each individual phase of the application lifecycle. Using a case study, we demonstrate how the design of DREMS ML impacts the development of embedded systems.
BibTeX:
@article{Balasubramanian:2015:DM:2798457.2798672,
  author = {Balasubramanian, Daniel and Dubey, Abhishek and Otte, William and Levendovszky, Tihamer and Gokhale, Aniruddha and Kumar, Pranav and Emfinger, William and Karsai, Gabor},
  title = {DREMS ML: A Wide Spectrum Architecture Design Language for Distributed Computing Platform},
  journal = {Sci. Comput. Program.},
  publisher = {Elsevier North-Holland, Inc.},
  year = {2015},
  volume = {106},
  number = {C},
  pages = {3--29},
  url = {http://dx.doi.org/10.1016/j.scico.2015.04.002},
  doi = {10.1016/j.scico.2015.04.002}
}
Dubey A, Sturm M, Lehofer M and Sztipanovits J (2015). Smart City Hubs: Opportunities for Integrating and Studying Human CPS at Scale. Workshop on Big Data Analytics in CPS: Enabling the Move from IoT to Real-Time Control.
BibTeX:
@article{BigData2015,
  author = {Dubey, Abhishek and Sturm, Monika and Lehofer, Martin and Sztipanovits, Janos },
  title = {Smart City Hubs: Opportunities for Integrating and Studying Human CPS at Scale},
  journal = {Workshop on Big Data Analytics in CPS: Enabling the Move from IoT to Real-Time Control},
  year = {2015},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/extendedAbstract.pdf}
}
Otte W, Lehofer M and Dubey A (2015). Challenges for Application Platforms for Integrated Cyber Physical Systems. Workshop on Big Data Analytics in CPS: Enabling the Move from IoT to Real-Time Control.
BibTeX:
@article{BigData2015-2,
  author = {Otte, William and Lehofer, Martin and Dubey, Abhishek},
  title = {Challenges for Application Platforms for Integrated Cyber Physical Systems},
  journal = {Workshop on Big Data Analytics in CPS: Enabling the Move from IoT to Real-Time Control},
  year = {2015},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/abstract.pdf}
}
Pradhan S, Dubey A, Gokhale A and Lehofer M (2015). CHARIOT: A Domain Specific Language for Extensible Cyber-Physical Systems, In The 15th Workshop on Domain-Specific Modeling, Pittsburgh, Pennsylvania, United States, October 2015.
Abstract: The wider adoption, availability and ubiquity of wireless networking technologies, integrated sensors, actuators, and edge computing devices are facilitating a paradigm shift by allowing us to transition from traditional statically configured vertical silos of Cyber-Physical Systems (CPS) to next generation CPS that are more open, dynamic and extensible. Fractionated spacecraft, smart city computing architectures, Unmanned Aerial Vehicle (UAV) clusters, and platoons of vehicles on highways are all examples of extensible CPS, wherein extensibility is implied by the dynamic aggregation of physical resources, the effect of physical dynamics on the availability of computing resources, and the various multi-domain applications hosted on these systems. However, the realization of extensible CPS requires resolving design-time and run-time challenges emanating from properties specific to these systems. In this paper, we first describe different properties of extensible CPS - dynamism, extensibility, remote deployment, security, heterogeneity and resilience. Then we identify different design-time challenges stemming from heterogeneity and resilience requirements. We particularly focus on software heterogeneity arising from the availability of various communication middleware. We then present appropriate solutions in the context of a novel domain-specific language, which can be used to design resilient systems while remaining agnostic to middleware heterogeneities. We also describe how this language and its features have evolved from our past work. We use a platform of fractionated spacecraft to describe our solution.
BibTeX:
@inproceedings{DSM2015,
  author = {Subhav Pradhan and Abhishek Dubey and Aniruddha Gokhale and Martin Lehofer},
  title = {CHARIOT: A Domain Specific Language for Extensible Cyber-Physical Systems},
  booktitle = {The 15th Workshop on Domain-Specific Modeling},
  year = {2015}
}
Nannapaneni S, Dubey A, Abdelwahed S, Mahadevan S, Neema S and Bapty T (2015). Mission-based reliability prediction in component-based systems. International Journal of Prognostics and Health Management.
BibTeX:
@article{IJPHM15,
  author = {Saideep Nannapaneni and Abhishek Dubey and Sherif Abdelwahed and Sankaran Mahadevan and Sandeep Neema and Ted Bapty},
  title = {Mission-based reliability prediction in component-based systems},
  journal = {International Journal of Prognostics and Health Management},
  year = {2015},
  note = {(Under review)}
}
Levendovszky T, Dubey A, Otte WR, Balasubramanian D, Emfinger W, Kumar P and Karsai G (2015). Achieving Resilience in Distributed Software Systems via Self-Reconfiguration. Elsevier Journal of Systems and Software.
BibTeX:
@article{JOSSIII,
  author = {Levendovszky, Tihamer and Dubey, Abhishek and Otte, William R. and Balasubramanian, Daniel and Emfinger, William and Kumar, Pranav and Karsai, Gabor},
  title = {Achieving Resilience in Distributed Software Systems via Self-Reconfiguration},
  journal = {Elsevier Journal of Systems and Software},
  year = {2015},
  note = {(Under review)}
}
Martins G, Moondra A, Dubey A, Bhattacharjee A and Koutsoukos X (2015). Computation and Communication Evaluation of an Authentication Mechanism for Time-Triggered Networked Control Systems: An Empirical Study. Computers and Security.
BibTeX:
@article{JOSSIV,
  author = {Martins, Goncalo and Moondra, Arul and Dubey, Abhishek and Bhattacharjee, Anirban and Koutsoukos, Xenofon},
  title = {Computation and Communication Evaluation of an Authentication Mechanism for Time-Triggered Networked Control Systems: An Empirical Study},
  journal = {Computers and Security},
  year = {2015},
  note = {(Under review)}
}
Jain R, Lukic S, Chhokra A, Mahadevan N, Dubey A and Karsai G (2015). An Improved Distance Relay model with Directional element, and Memory Polarization for TCD based Fault Propagation Studies, In North American Power Symposium (NAPS), October 2015, pp. 1-6.
Abstract: Modern power systems have evolved into very complex networks of multiple sources, lines, breakers, loads and other components. The performance of these interdependent components decides the reliability of the power system. A tool called “Reasoner” is being developed to deduce fault propagations using a Temporal Causal Diagram (TCD) approach; it translates the physical system into a cause-effect model. This work discusses the development of an advanced distance relay model, which monitors the system and challenges the operation of the reasoner for refinement. The process of generating a Fault and Discrepancy Mapping file from the test system is presented. This file is used by the reasoner to scrutinize the relays’ responses to active system faults and to hypothesize potential mis-operations (or cyber faults) with a confidence metric. The analyzer (relay model) is integrated with OpenDSS for fault analysis. Understanding the system interdependency (fault propagation behavior) using the reasoner can make the grid more robust against cascaded failures.
BibTeX:
@inproceedings{NAPS15,
  author = {Rishabh Jain and Srdjan Lukic and Ajay Chhokra and Nag Mahadevan and Abhishek Dubey and Gabor Karsai},
  title = {An Improved Distance Relay model with Directional element, and Memory Polarization for TCD based Fault Propagation Studies},
  booktitle = {North American Power Symposium (NAPS)},
  year = {2015},
  pages = {1--6}
}
Emfinger W, Kumar P, Dubey A and Karsai G (2015). Towards Assurances in Self-Adaptive, Dynamic, Distributed Real-time Embedded Systems, In Software Engineering for Self-Adaptive Systems III. Springer Berlin Heidelberg.
BibTeX:
@incollection{sefsasIII,
  author = {Emfinger, William and Kumar, Pranav and Dubey, Abhishek and Karsai, Gabor},
  title = {Towards Assurances in Self-Adaptive, Dynamic, Distributed Real-time Embedded Systems},
  booktitle = {Software Engineering for Self-Adaptive Systems III},
  publisher = {Springer Berlin Heidelberg},
  year = {2015},
  note = {(Under review)}
}
Pradhan S, Dubey A, Otte W, Karsai G and Gokhale A (2015). Towards a Product Line of Heterogeneous Distributed Applications, April 2015. (ISIS-15-117)
BibTeX:
@article{subhav15,
  author = {Pradhan, Subhav and Dubey, Abhishek and Otte, William and Karsai, Gabor and Gokhale, Aniruddha},
  title = {Towards a Product Line of Heterogeneous Distributed Applications},
  year = {2015},
  number = {ISIS-15-117},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Pradhan_SEAMS_TechReport.pdf}
}

2014

Karsai G, Balasubramanian D, Dubey A and Otte WR (2014). Distributed and Managed: Research Challenges and Opportunities of the Next Generation Cyber-Physical Systems, In 17th IEEE Symposium on Object/Component/Service-oriented Real-time Distributed Computing, June 2014.
Abstract: Fractionated spacecraft - clusters of simple, wirelessly connected satellites - perform high-resolution sensing functions by running distributed sensor fusion applications. Coordinated swarms of networked Unmanned Aerial Vehicles carry out data collection and damage assessment flights over large geographical areas affected by weather events. Fleets of Unmanned Underwater Vehicles collect climate change data from oceans with the help of sensor fusion and motion control applications. Smart data acquisition and control devices implement distributed sensing and control functions for the Smart Electric Grid.

Such ‘cyber-physical cloud computing platforms’ present novel challenges because the system is built from mobile embedded devices, is inherently distributed and typically has highly fluctuating connectivity among the modules. Architecting software for these systems raises many challenges not present in traditional cloud computing. Effective management of constrained resources and application isolation without adversely affecting performance are necessary. Autonomous fault management and real-time performance requirements must be met in a verifiable manner. It is also both critical and challenging to support multiple end-users whose diverse software applications have changing demands for computational and communication resources, while operating on different levels and in separate domains of security.

The solution presented in this paper is based on a layered architecture consisting of a novel operating system, a middleware layer, and component-structured applications. The component model facilitates the creation of software applications from modular and reusable components that are deployed in the distributed system and interact only through well-defined mechanisms. The complexity of creating applications and performing system integration is mitigated through the use of a domain-specific model-driven development process that relies on a domain-specific modeling language and its accompanying graphical modeling tools, software generators for synthesizing infrastructure code, and the extensive use of model-based analysis for verification and validation.

BibTeX:
@conference{4619,
  author = {Karsai, Gabor and Balasubramanian, Daniel and Dubey, Abhishek and Otte, William R},
  title = {Distributed and Managed: Research Challenges and Opportunities of the Next Generation Cyber-Physical Systems},
  booktitle = {17th IEEE Symposium on Object/Component/Service-oriented Real-time Distributed Computing},
  year = {2014},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/drems-isorc.pdf}
}
Pradhan S, Otte W, Dubey A, Gokhale A and Karsai G (2014). Key Considerations for a Resilient and Autonomous Deployment and Configuration Infrastructure for Cyber-Physical Systems, In 11th IEEE International Conference and Workshops on the Engineering of Autonomic and Autonomous Systems (EASe-2014), Laurel, MD, USA. IEEE.
Abstract: Multi-module Cyber-Physical Systems (CPSs), such as satellite clusters, swarms of Unmanned Aerial Vehicles (UAV), and fleets of Unmanned Underwater Vehicles (UUV), are examples of managed distributed real-time systems where mission-critical applications, such as sensor fusion or coordinated flight control, are hosted. These systems are dynamic and reconfigurable, and provide a “CPS cluster-as-a-service” for mission-specific scientific applications that can benefit from the elasticity of the cluster membership and heterogeneity of the cluster members. The distributed and remote nature of these systems often necessitates the use of Deployment and Configuration (D&C) services to manage the lifecycle of software applications. Fluctuating resources, volatile cluster membership and changing environmental conditions require resilient D&C services. However, the dynamic nature of the system often precludes human intervention during the D&C activities, which motivates the need for a self-adaptive D&C infrastructure that supports autonomous resilience. Such an infrastructure must have the ability to adapt existing applications on-the-fly in order to provide application resilience and must itself be able to adapt to account for changes in the system as well as tolerate failures. This paper makes two contributions towards addressing these needs. First, we identify the key challenges in achieving such a self-adaptive D&C infrastructure. Second, we present our ideas on resolving these challenges and realizing a self-adaptive D&C infrastructure.
BibTeX:
@conference{4677,
  author = {Subhav Pradhan and William Otte and Abhishek Dubey and Aniruddha Gokhale and Karsai, Gabor},
  title = {Key Considerations for a Resilient and Autonomous Deployment and Configuration Infrastructure for Cyber-Physical Systems},
  booktitle = {11th IEEE International Conference and Workshops on the Engineering of Autonomic and Autonomous Systems (EASe-2014)},
  publisher = {IEEE},
  year = {2014},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Pradhan_EASe-2014.pdf}
}
Levendovszky T, Dubey A, Otte WR, Balasubramanian D, Coglio A, Nyako S, Emfinger W, Kumar P, Gokhale A and Karsai G (2014). Distributed Real-Time Managed Systems: A Model-Driven Distributed Secure Information Architecture Platform for Managed Embedded Systems. Software, IEEE, March 2014. Vol. 31(2), pp. 62-69.
Abstract: Architecting software for a cloud computing platform built from mobile embedded devices incurs many challenges that aren't present in traditional cloud computing. Both effectively managing constrained resources and isolating applications without adverse performance effects are needed. A practical design- and runtime solution incorporates modern software development practices and technologies along with novel approaches to address these challenges. The patterns and principles manifested in this system can potentially serve as guidelines for current and future practitioners in this field.
BibTeX:
@article{6671577,
  author = {Levendovszky, Tihamer and Dubey, Abhishek and Otte, William R. and Balasubramanian, Daniel and Coglio, Alessandro and Nyako, Sandor and Emfinger, William and Kumar, Pranav and Gokhale, Aniruddha and Karsai, Gabor},
  title = {Distributed Real-Time Managed Systems: A Model-Driven Distributed Secure Information Architecture Platform for Managed Embedded Systems},
  journal = {Software, IEEE},
  year = {2014},
  volume = {31},
  number = {2},
  pages = {62-69},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/f6paper.pdf},
  doi = {10.1109/MS.2013.143}
}
Martins G, Bhattacharjee A, Dubey A and Koutsoukos X (2014). Performance evaluation of an authentication mechanism in time-triggered networked control systems, In Resilient Control Systems (ISRCS), 2014 7th International Symposium on, August 2014, pp. 1-6.
Abstract: An important challenge in networked control systems is to ensure the confidentiality and integrity of the message in order to secure the communication and prevent attackers or intruders from compromising the system. However, security mechanisms may jeopardize the temporal behavior of the network data communication because of the computation and communication overhead. In this paper, we study the effect of adding a Hash-based Message Authentication Code (HMAC) to a time-triggered networked control system. Time-Triggered Architectures (TTAs) provide a deterministic and predictable timing behavior that is used to ensure safety, reliability and fault tolerance properties. The paper analyzes the computation and communication overhead of adding HMAC and the impact on the performance of the time-triggered network. Experimental validation and performance evaluation results using a TTEthernet network are also presented.
BibTeX:
@inproceedings{6900098,
  author = {Martins, G. and Bhattacharjee, A. and Dubey, A. and Koutsoukos, X.D.},
  title = {Performance evaluation of an authentication mechanism in time-triggered networked control systems},
  booktitle = {Resilient Control Systems (ISRCS), 2014 7th International Symposium on},
  year = {2014},
  pages = {1-6},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Performance%20Evaluation%20of%20an%20Authentication%20Mechanism%20in%20Time-Triggered%20Networked%20Control%20Systems.pdf},
  doi = {10.1109/ISRCS.2014.6900098}
}
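As a rough illustration of the kind of per-message computation overhead the paper above evaluates, the short Python sketch below times HMAC-SHA256 over a small payload. It is a sketch only, not code from the cited study; the key size, payload size, and iteration count are hypothetical.

import hmac
import hashlib
import os
import time

key = os.urandom(32)          # hypothetical 32-byte shared key
payload = os.urandom(128)     # hypothetical 128-byte control-message payload
iterations = 10000

start = time.perf_counter()
for _ in range(iterations):
    tag = hmac.new(key, payload, hashlib.sha256).digest()
elapsed = time.perf_counter() - start

# Average per-message computation cost and the fixed size overhead of the appended tag.
print("avg HMAC-SHA256 cost: %.2f us per message" % (elapsed / iterations * 1e6))
print("added per-message bytes: %d" % len(tag))
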
Balasubramanian D, Levendovszky T, Dubey A and Karsai G (2014). Taming Multi-Paradigm Integration in a Software Architecture Description Language, In Proceedings of the 8th Workshop on Multi-Paradigm Modeling co-located with the 17th International Conference on Model Driven Engineering Languages and Systems, MPM@MODELS 2014, Valencia, Spain, pp. 67-76.
Abstract: Software architecture description languages offer a convenient way of describing the high-level structure of a software system. Such descriptions facilitate rapid prototyping, code generation and automated analysis. One of the big challenges facing the software community is the design of architecture description languages that are general enough to describe a wide range of systems, yet detailed enough to capture domain-specific properties and provide a high level of tool automation. This paper presents the multi-paradigm challenges we faced and solutions we built when creating a domain-specific modeling language for software architectures of distributed real-time systems.
BibTeX:
@inproceedings{DBLP:conf/models/BalasubramanianLDK14,
  author = {Daniel Balasubramanian and Tihamer Levendovszky and Abhishek Dubey and Gabor Karsai},
  title = {Taming Multi-Paradigm Integration in a Software Architecture Description Language},
  booktitle = {Proceedings of the 8th Workshop on Multi-Paradigm Modeling co-located with the 17th International Conference on Model Driven Engineering Languages and Systems, MPM@MODELS 2014, Valencia, Spain},
  year = {2014},
  pages = {67--76},
  url = {http://ceur-ws.org/Vol-1237/paper7.pdf}
}
Kumar PS, Dubey A and Karsai G (2014). Colored Petri Net-based Modeling and Formal Analysis of Component-based Applications, In Proceedings of the 11th Workshop on Model-Driven Engineering, Verification and Validation co-located with 17th International Conference on Model Driven Engineering Languages and Systems, MoDeVVa@MODELS 2014, Valencia, Spain, September 30, 2014, pp. 79-88.
BibTeX:
@inproceedings{DBLP:conf/models/KumarDK14,
  author = {Pranav Srinivas Kumar and Abhishek Dubey and Gabor Karsai},
  title = {Colored Petri Net-based Modeling and Formal Analysis of Component-based Applications},
  booktitle = {Proceedings of the 11th Workshop on Model-Driven Engineering, Verification and Validation co-located with 17th International Conference on Model Driven Engineering Languages and Systems, MoDeVVa@MODELS 2014, Valencia, Spain, September 30, 2014},
  year = {2014},
  pages = {79--88},
  url = {http://ceur-ws.org/Vol-1235/paper-10.pdf}
}
Balasubramanian D, Dubey A, Otte WR, Emfinger W, Kumar PS and Karsai G (2014). A Rapid Testing Framework for a Mobile Cloud, In 25th IEEE International Symposium on Rapid System Prototyping, RSP 2014, New Delhi, India, October 16-17, 2014, pp. 128-134.
Abstract: Mobile clouds such as network-connected vehicles and satellite clusters are an emerging class of systems that are extensions to traditional real-time embedded systems: they provide long-term mission platforms made up of dynamic clusters of heterogeneous hardware nodes communicating over ad hoc wireless networks. Besides the inherent complexities entailed by a distributed architecture, developing software and testing these systems is difficult due to a number of other reasons, including the mobile nature of such systems, which can require a model of the physical dynamics of the system for accurate simulation and testing. This paper describes a rapid development and testing framework for a distributed satellite system. Our solutions include a modeling language for configuring and specifying an application's interaction with the middleware layer, a physics simulator integrated with hardware in the loop to provide the system's physical dynamics and the integration of a network traffic tool to dynamically vary the network bandwidth based on the physical dynamics.
BibTeX:
@inproceedings{DBLP:conf/rsp/BalasubramanianDOEKK14,
  author = {Daniel Balasubramanian and Abhishek Dubey and William R. Otte and William Emfinger and Pranav Srinivas Kumar and Gabor Karsai},
  title = {A Rapid Testing Framework for a Mobile Cloud},
  booktitle = {25th IEEE International Symposium on Rapid System Prototyping, RSP 2014, New Delhi, India, October 16-17, 2014},
  year = {2014},
  pages = {128--134},
  url = {http://dx.doi.org/10.1109/RSP.2014.6966903},
  doi = {10.1109/RSP.2014.6966903}
}
Emfinger W, Karsai G, Dubey A and Gokhale A (2014). Analysis, Verification, and Management Toolsuite for Cyber-physical Applications on Time-varying Networks, In Proceedings of the 4th ACM SIGBED International Workshop on Design, Modeling, and Evaluation of Cyber-Physical Systems, New York, NY, USA, pp. 44-47. ACM.
BibTeX:
@inproceedings{Emfinger:2014:AVM:2593458.2593459,
  author = {Emfinger, William and Karsai, Gabor and Dubey, Abhishek and Gokhale, Aniruddha},
  title = {Analysis, Verification, and Management Toolsuite for Cyber-physical Applications on Time-varying Networks},
  booktitle = {Proceedings of the 4th ACM SIGBED International Workshop on Design, Modeling, and Evaluation of Cyber-Physical Systems},
  publisher = {ACM},
  year = {2014},
  pages = {44--47},
  url = {http://doi.acm.org/10.1145/2593458.2593459},
  doi = {10.1145/2593458.2593459}
}
Dubey A, Otte W and Karsai G (2014). An Information Architecture Platform for Mobile, Secure, and Resilient Distributed Systems, In High Confidence Software and Systems Conference.
BibTeX:
@inproceedings{HCSS2014,
  author = {Abhishek Dubey and William Otte and Gabor Karsai},
  title = {An Information Architecture Platform for Mobile, Secure, and Resilient Distributed Systems},
  booktitle = {High Confidence Software and Systems Conference},
  year = {2014},
  url = {http://cps-vo.org/file/12236/download/42058}
}
Mahadevan N, Dubey A, Guo H and Karsai G (2014). Using temporal causal models to isolate failures in Power System protection devices, In AUTOTESTCON, 2014 IEEE, pp. 270-279.
Abstract: Power transmission systems are instrumented with a network of fast-acting protection devices that detect anomalies and arrest the fault propagation, thereby isolating the faulty components to protect the remaining system. However, often a local protection action leads to cascading effects to other regions resulting in blackouts. Also, inherent faults in protection devices such as relays and breakers could lead to failed or misguided protection action. A fault model associated with such a system needs to capture the dynamic behavior of the system as well as protection scheme in both nominal and faulty modes of operation. With an effort to capture the temporal behavior and the fault propagation of the system and its autonomous protection mechanism, we propose a new modeling formalism - the Temporal Causal Diagram (TCD). In this paper, we will describe the TCD modeling formalism and apply it to create fault-models that capture the fault propagation and state evolution of the Power Systems and their autonomous protection units. Further, we showcase simulation models derived from the TCD models and use these to simulate single and multi-fault scenarios in power transmission systems.
BibTeX:
@inproceedings{mahadevan2014using,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Guo, Huangcheng and Karsai, Gabor},
  title = {Using temporal causal models to isolate failures in Power System protection devices},
  booktitle = {AUTOTESTCON, 2014 IEEE},
  year = {2014},
  pages = {270--279},
  doi = {10.1109/AUTEST.2014.6935156}
}
Mahadevan N, Dubey A, Karsai G, Srivastava A and Liu C-C (2014). Temporal causal diagrams for diagnosing failures in cyber-physical systems. Annual Conference of the Prognostics and Health Management Society. Prognostics and Health Management Society.
Abstract: Resilient and reliable operation of cyber-physical systems of societal importance, such as Smart Electric Grids, is one of the top national priorities. Due to their critical nature, these systems are equipped with fast-acting, local protection mechanisms. However, misguided protection actions, together with system dynamics, commonly lead to unintentional cascading effects. This paper describes ongoing work using Temporal Causal Diagrams (TCD), a refinement of Timed Failure Propagation Graphs (TFPG), to diagnose problems associated with power transmission lines protected by a combination of relays and breakers.

The TCD models represent the faults and their propagation as a TFPG, the nominal and faulty behavior of components (including local, discrete controllers and protection devices) as Timed Discrete Event Systems (TDES), and capture the cumulative and cascading effects of these interactions. The TCD diagnosis engine includes an extended TFPG-like reasoner which, in addition to observing the alarms and mode changes (as in the TFPG), monitors the event traces (that correspond to the behavioral aspects of the model) to generate hypotheses that consistently explain all the observations. In this paper, we show the results of applying the TCD to a segment of a power transmission system that is protected by distance relays and breakers.

BibTeX:
@article{NAGPHM2014,
  author = {Nagabhushan Mahadevan and Abhishek Dubey and Gabor Karsai and Anurag Srivastava and Chen-Ching Liu},
  title = {Temporal causal diagrams for diagnosing failures in cyber-physical systems},
  journal = {Annual Conference of the Prognostics and Health Management Society},
  publisher = {Prognostics and Health Management Society},
  year = {2014},
  url = {http://www.phmsociety.org/node/1439}
}
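The TCD work above layers behavioral models on top of Timed Failure Propagation Graphs. As rough intuition for the TFPG part only, the Python sketch below checks whether a hypothesized root fault can explain an observed alarm within cumulative propagation-delay windows. It is an illustrative toy, not the TCD reasoner from the paper; all node names and delay values are hypothetical.

# Edges of a toy failure propagation graph: node -> list of (successor, t_min, t_max).
edges = {
    "line_fault":   [("relay_trip", 0.01, 0.05)],
    "relay_trip":   [("breaker_open", 0.02, 0.08)],
    "breaker_open": [("bus_undervoltage_alarm", 0.0, 0.02)],
}

def explains(root, alarm, t_obs):
    # Depth-first search accumulating [lo, hi] delay windows along each path.
    stack = [(root, 0.0, 0.0)]
    while stack:
        node, lo, hi = stack.pop()
        if node == alarm and lo <= t_obs <= hi:
            return True
        for succ, t_min, t_max in edges.get(node, []):
            stack.append((succ, lo + t_min, hi + t_max))
    return False

# An alarm observed 0.10 s after the hypothesized fault lies in the window [0.03, 0.15].
print(explains("line_fault", "bus_undervoltage_alarm", 0.10))  # True
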
Nannapaneni S, Dubey A, Abdelwahed S, Mahadevan S and Neema S (2014). A Model-Based Approach for Reliability Assessment in Component-Based Systems. Annual Conference of the Prognostics and Health Management Society. Prognostics and Health Management Society.
Abstract: This paper describes a formal framework for reliability assessment of component-based systems with respect to specific missions. A mission comprises different timed mission stages, with each stage requiring a number of high-level functions. The work presented here describes a modeling language to capture the functional decomposition and missions of a system. The components and their alternatives are mapped to basic functions which are used to implement the system-level functions. Our contribution is the extraction of a mission-specific reliability block diagram from these high-level models of component assemblies. This is then used to compute the mission reliability using reliability information of the components. This framework can be used for real-time monitoring of system performance, where the reliability of the mission is computed over time as the mission is in progress. Other quantities of interest, such as mission feasibility and function availability, can also be computed using this framework. Mission feasibility answers the question of whether the mission can be accomplished given the current state of components in the system, and function availability provides information on whether a function will be available in the future given the current state of the system. The software used in this framework includes the Generic Modeling Environment (GME) and Python. GME is used for modeling the system and Python for the reliability computations. The proposed methodology is demonstrated using a radio-controlled (RC) car carrying out a simple surveillance mission.
BibTeX:
@article{SaideepPHM2014,
  author = {Saideep Nannapaneni and Abhishek Dubey and Sherif Abdelwahed and Sankaran Mahadevan and Sandeep Neema},
  title = {A Model-Based Approach for Reliability Assessment in Component-Based Systems},
  journal = {Annual Conference of the Prognostics and Health Management Society},
  publisher = {Prognostics and Health Management Society},
  year = {2014},
  url = {http://www.phmsociety.org/node/1439}
}
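Since the abstract notes that the reliability computations are done in Python, a minimal sketch of the underlying series/parallel reliability-block-diagram arithmetic is shown below. It assumes statistically independent components and uses hypothetical reliability values; it is not the framework from the paper.

def series(*reliabilities):
    # All blocks must work: R = product of the R_i.
    r = 1.0
    for ri in reliabilities:
        r *= ri
    return r

def parallel(*reliabilities):
    # At least one redundant block must work: R = 1 - product of (1 - R_i).
    q = 1.0
    for ri in reliabilities:
        q *= (1.0 - ri)
    return 1.0 - q

# Hypothetical mission stage: a sensor in series with two redundant processors and a radio.
stage_reliability = series(0.99, parallel(0.95, 0.95), 0.98)
print(round(stage_reliability, 4))  # 0.9678
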
Pradhan S, Emfinger W, Dubey A, Otte W, Balasubramanian D, Gokhale A, Karsai G and Coglio A (2014). Establishing Secure Interactions across Distributed Applications in Satellite Clusters, In Space Mission Challenges for Information Technology (SMC-IT), 2014 IEEE International Conference on, September 2014, pp. 67-74.
Abstract: Recent developments in small satellites have led to an increasing interest in building satellite clusters as open systems that provide a "cluster-as-a-service" in space. Since applications with different security classification levels must be supported in these open systems, the system must provide strict information partitioning such that only applications with matching security classifications interact with each other. The anonymous publish/subscribe communication pattern is a powerful interaction abstraction that has enjoyed great success in previous space software architectures, such as NASA's Core Flight Executive. However, the difficulty is that existing solutions that support anonymous publish/subscribe communication, such as the OMG Data Distribution Service (DDS), do not support information partitioning based on security classifications, which is a key requirement for some systems. This paper makes two contributions to address these limitations. First, we present a transport mechanism called Secure Transport that uses a lattice of labels to represent security classifications and enforces Multi-Level Security (MLS) policies to ensure strict information partitioning. Second, we present a novel discovery service that allows us to use an existing DDS implementation with our custom transport mechanism to realize a publish/subscribe middleware with information partitioning based on security classifications of applications. We also include an evaluation of our solution in the context of a use case scenario.
BibTeX:
@inproceedings{SMCIT14,
  author = {Pradhan, S. and Emfinger, W. and Dubey, A. and Otte, W.R. and Balasubramanian, D. and Gokhale, A. and Karsai, G. and Coglio, A.},
  title = {Establishing Secure Interactions across Distributed Applications in Satellite Clusters},
  booktitle = {Space Mission Challenges for Information Technology (SMC-IT), 2014 IEEE International Conference on},
  year = {2014},
  pages = {67-74},
  doi = {10.1109/SMC-IT.2014.17}
}
Otte WR, Dubey A and Karsai G (2014). A Resilient and Secure Software Platform and Architecture for Distributed Spacecraft, In SPIE Defense, Security, and Sensing.
Abstract: A distributed spacecraft is a cluster of independent satellite modules flying in formation that communicate via ad-hoc wireless networks. This system in space is a cloud platform that facilitates sharing sensors and other computing and communication resources across multiple applications, potentially developed and maintained by different organizations. Effectively, such an architecture can realize the functions of monolithic satellites at a reduced cost and with improved adaptivity and robustness.

The openness of these architectures poses special challenges because the distributed software platform has to support applications from different security domains and organizations, and information flows have to be carefully managed and compartmentalized. If the platform is used as a robust shared resource, its management, configuration, and resilience become challenges in themselves.

We have designed and prototyped a distributed software platform for such architectures. The core element of the platform is a new operating system whose services were designed to restrict access to the network and the file system, and to enforce resource management constraints for all non-privileged processes. Mixed-criticality applications operating at different security labels are deployed and controlled by a privileged management process that also pre-configures all information flows. This paper describes the design and objectives of this layer.

BibTeX:
@inproceedings{spieDSS2014,
  author = {William R. Otte and Abhishek Dubey and Gabor Karsai},
  title = {A Resilient and Secure Software Platform and Architecture for Distributed Spacecraft},
  booktitle = {SPIE Defense, Security, and Sensing},
  year = {2014}
}
Pradhan S, Otte W, Dubey A, Szabo C, Gokhale A and Karsai G (2014). Towards a Self-adaptive Deployment and Configuration Infrastructure for Cyber-Physical Systems, June 2014. (ISIS-13-101)
BibTeX:
@article{subhav14,
  author = {Pradhan, Subhav and Otte, William and Dubey, Abhishek and Szabo, Csanad and Gokhale, Aniruddha and Karsai, Gabor},
  title = {Towards a Self-adaptive Deployment and Configuration Infrastructure for Cyber-Physical Systems},
  year = {2014},
  number = {ISIS-13-101},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Pradhan_SEAMS_TechReport.pdf}
}

2013

Mahadevan N, Dubey A, Balasubramanian D and Karsai G (2013). Deliberative Reasoning in Software Health Management, April 2013. (ISIS-13-101)
BibTeX:
@article{4556,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Balasubramanian, Daniel and Karsai, Gabor},
  title = {Deliberative Reasoning in Software Health Management},
  year = {2013},
  number = {ISIS-13-101}
}
Shi J, Amgai R, Abdelwahed S, Dubey A, Humphreys J, Alattar M and Jia R (2013). Generic modeling and analysis framework for shipboard system design, In Electric Ship Technologies Symposium (ESTS), 2013 IEEE, April 2013, pp. 420-428.
Abstract: This paper proposes a novel modeling and simulation environment for ship design based on the principles of Model Integrated Computing (MIC). The proposed approach facilitates the design and analysis of shipboard power systems and similar systems that integrate components from different fields of expertise. Conventional simulation platforms such as Matlab®, Simulink®, PSCAD® and VTB® require the designers to have explicit knowledge of the syntactic and semantic information of the desired domain within the tools. This constraint, however, severely slows down the design and analysis process, and causes cross-domain or cross-platform operations to remain error-prone and expensive. Our approach focuses on the development of a modeling environment that provides generic support for a variety of applications across different domains by capturing modeling concepts, composition principles and operation constraints. As a preliminary demonstration of the modeling concept, in this paper we limit the scope of design to cross-platform implementations of the proposed environment by developing an application model of a simplified shipboard power system and using the Matlab engine and the VTB solver separately to evaluate the performance in different respects. In the case studies a fault scenario is pre-specified and tested on the system model. The corresponding time-domain bus voltage magnitude and angle profiles are generated by invoking the external solver, displayed to users and then saved for future analysis.
BibTeX:
@inproceedings{6523770,
  author = {Jian Shi and Amgai, R. and Abdelwahed, S. and Dubey, A. and Humphreys, J. and Alattar, M. and Jia, R.},
  title = {Generic modeling and analysis framework for shipboard system design},
  booktitle = {Electric Ship Technologies Symposium (ESTS), 2013 IEEE},
  year = {2013},
  pages = {420-428},
  doi = {10.1109/ESTS.2013.6523770}
}
Dubey A, Karsai G and Mahadevan N (2013). Fault-Adaptivity in Hard Real-Time Component-Based Software Systems, In Software Engineering for Self-Adaptive Systems II. Vol. 7475, pp. 294-323. Springer Berlin Heidelberg.
BibTeX:
@incollection{book1,
  author = {Dubey, Abhishek and Karsai, Gabor and Mahadevan, Nagabhushan},
  editor = {Lemos, Rogerio and Giese, Holger and Muller, Hausi A. and Shaw, Mary},
  title = {Fault-Adaptivity in Hard Real-Time Component-Based Software Systems},
  booktitle = {Software Engineering for Self-Adaptive Systems II},
  publisher = {Springer Berlin Heidelberg},
  year = {2013},
  volume = {7475},
  pages = {294-323},
  url = {http://dx.doi.org/10.1007/978-3-642-35813-5_12},
  doi = {10.1007/978-3-642-35813-5_12}
}
Balasubramanian D, Emfinger W, Kumar P, Otte W, Dubey A and Karsai G (2013). An Application Development and Deployment Platform for Satellite Clusters, In Workshop on Spacecraft Flight Software.
BibTeX:
@inproceedings{FSW2013,
  author = {Daniel Balasubramanian and William Emfinger and Pranav Kumar and William Otte and Abhishek Dubey and Gabor Karsai },
  title = {An Application Development and Deployment Platform for Satellite Clusters},
  booktitle = {Workshop on Spacecraft Flight Software},
  year = {2013},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/FSW.pdf}
}
Pradhan S, Otte W, Dubey A, Gokhale A and Karsai G (2013). Towards a Resilient Deployment and Configuration Infrastructure for Fractionated Spacecraft, In Proceedings of the 5th Workshop on Adaptive and Reconfigurable Embedded Systems (APRES '13), CPSWeek, Philadelphia, PA, USA, April 2013. IEEE.
BibTeX:
@inproceedings{ISIS_F6_DnC_APRES:13,
  author = {Subhav Pradhan and William Otte and Abhishek Dubey and Aniruddha Gokhale and Gabor Karsai},
  title = {Towards a Resilient Deployment and Configuration Infrastructure for Fractionated Spacecraft},
  booktitle = {Proceedings of the 5th Workshop on Adaptive and Reconfigurable Embedded Systems (APRES '13), CPSWeek},
  publisher = {IEEE},
  year = {2013},
  url = {http://sigbed.seas.upenn.edu/archives/2013-12/apres-4.pdf}
}
Dubey A, Gokhale A, Karsai G, Otte W and Willemsen J (2013). A Model-Driven Software Component Framework for Fractionated Spacecraft, In Proceedings of the 5th International Conference on Spacecraft Formation Flying Missions and Technologies (SFFMT), Munich, Germany, May 2013. IEEE.
Abstract: Fractionated spacecraft is a novel space architecture that uses a cluster of small spacecraft modules (with their own attitude control and propulsion systems) connected via wireless links to accomplish complex missions. Resources, such as sensors, persistent storage space, processing power, and downlink bandwidth can be shared among the members of the cluster thanks to the networking. Such spacecraft can serve as a cost effective, highly adaptable, and fault tolerant platform for running various distributed mission software applications that collect, process, and downlink data. Naturally, a key component in such a system is the software platform: the distributed operating system and software infrastructure that makes such applications possible. Existing operating systems are insufficient, and newer technologies like component frameworks do not address all the requirements of such flexible space architectures. The high degree of flexibility and the need for thorough planning and analysis of the resource management necessitates the use of advanced development techniques. This paper describes the core principles and design of a software component framework for fractionated spacecraft that is a special case of a distributed real-time embedded system. Additionally we describe how a model-driven development environment helps with the design and engineering of complex applications for this platform.
BibTeX:
@inproceedings{ISIS_F6_SFFMT:13,
  author = {Abhishek Dubey and Aniruddha Gokhale and Gabor Karsai and William Otte and Johnny Willemsen},
  title = {A Model-Driven Software Component Framework for Fractionated Spacecraft},
  booktitle = {Proceedings of the 5th International Conference on Spacecraft Formation Flying Missions and Technologies (SFFMT)},
  publisher = {IEEE},
  year = {2013},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/f6mdk.pdf}
}
Mahadevan N, Dubey A, Balasubramanian D and Karsai G (2013). Deliberative, search-based mitigation strategies for model-based software health management. Innovations in Systems and Software Engineering. Vol. 9(4), pp. 293-318. Springer London.
Abstract: Rising software complexity in aerospace systems makes them very difficult to analyze and prepare for all possible fault scenarios at design time; therefore, classical run-time fault tolerance techniques such as self-checking pairs and triple modular redundancy are used. However, several recent incidents have made it clear that existing software fault tolerance techniques alone are not sufficient. To improve system dependability, simpler, yet formally specified and verified run-time monitoring, diagnosis, and fault mitigation capabilities are needed. Such architectures are already in use for managing the health of vehicles and systems. Software health management is the application of these techniques to software systems. In this paper, we briefly describe the software health management techniques and architecture developed by our research group. The foundation of the architecture is a real-time component framework (built upon ARINC-653 platform services) that defines a model of computation for software components. Dedicated architectural elements: the Component Level Health Manager (CLHM) and System Level Health Manager (SLHM) provide the health management services: anomaly detection, fault source isolation, and fault mitigation. The SLHM includes a diagnosis engine that (1) uses a Timed Failure Propagation Graph (TFPG) model derived from the component assembly model, (2) reasons about cascading fault effects in the system, and (3) isolates the fault source component(s). Thereafter, the appropriate system-level mitigation action is taken. The main focus of this article is the description of the fault mitigation architecture that uses goal-based deliberative reasoning to determine the best mitigation actions for recovering the system from the identified failure mode.
BibTeX:
@article{mykey,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Balasubramanian, Daniel and Karsai, Gabor},
  title = {Deliberative, search-based mitigation strategies for model-based software health management},
  journal = {Innovations in Systems and Software Engineering},
  publisher = {Springer London},
  year = {2013},
  volume = {9},
  number = {4},
  pages = {293-318},
  url = {http://dx.doi.org/10.1007/s11334-013-0215-x},
  doi = {10.1007/s11334-013-0215-x}
}

2012

Dubey A, Mahadevan N and Karsai G (2012). The Inertial Measurement Unit Example: A Software Health Management Case Study, February 2012. (ISIS-12-101)
Abstract: This report captures in detail a two-level Software Health Management strategy on a real-life example of an Inertial Measurement Unit subsystem. We describe in detail the design of the component-level and system-level health management strategies. Results are expressed as relevant portions of the detailed logs that show the successful adaptation of the monitor/detect/diagnose/mitigate approach to Software Health Management.
BibTeX:
@article{4496,
  author = {Dubey, Abhishek and Mahadevan, Nagabhushan and Karsai, Gabor},
  title = {The Inertial Measurement Unit Example: A Software Health Management Case Study},
  year = {2012},
  number = {ISIS-12-101}
}
Dubey A, Karsai G and Mahadevan N (2012). Formalization of a Component Model for Real-time Systems, April 2012.
Abstract: Component-based software development for real-time systems necessitates a well-defined ‘component model’ that allows compositional analysis and reasoning about systems. Such a model defines what a component is, how it works, and how it interacts with other components. It is especially important for real-time systems to have such a component model, as many problems in these systems arise from poorly understood and analyzed component interactions. In this paper we describe a component model for hard real-time systems that relies on the services of an ARINC-653 compliant real-time operating system platform. The model provides high-level abstractions of component interactions, both for the synchronous and asynchronous case. We present a formalization of the component model in the form of timed transition traces. Such formalization is necessary to be able to derive interesting system-level properties such as fault propagation graphs from models of component assemblies. We provide a brief discussion about such system-level fault propagation templates for this component model.
BibTeX:
@article{4507,
  author = {Dubey, Abhishek and Karsai, Gabor and Mahadevan, Nagabhushan},
  title = {Formalization of a Component Model for Real-time Systems},
  year = {2012}
}
Monceaux WP, Evans DE, Rappold KN, Butler CD, Abdelwahed S, Mehrotra R and Dubey A (2012). Implementing Autonomic Computing Methods to Improve Attack Resilience in Web Services, pp. 422.
BibTeX:
@article{4574,
  author = {Monceaux, Weston P and Evans, Deland E and Rappold, Keith N and Butler, Cary D and Abdelwahed, Sherif and Mehrotra, Rajat and Dubey, Abhishek},
  title = {Implementing Autonomic Computing Methods to Improve Attack Resilience in Web Services},
  year = {2012},
  pages = {422}
}
Dubey A, Emfinger W, Gokhale A, Karsai G, Otte W, Parsons J, Szabo C, Coglio A, Smith E and Bose P (2012). A software platform for fractionated spacecraft, In Aerospace Conference, 2012 IEEE, March 2012, pp. 1-20.
Abstract: A fractionated spacecraft is a cluster of independent modules that interact wirelessly to maintain cluster flight and realize the functions usually performed by a monolithic satellite. This spacecraft architecture poses novel software challenges because the hardware platform is inherently distributed, with highly fluctuating connectivity among the modules. It is critical for mission success to support autonomous fault management and to satisfy real-time performance requirements. It is also both critical and challenging to support multiple organizations and users whose diverse software applications have changing demands for computational and communication resources, while operating on different levels and in separate domains of security. The solution proposed in this paper is based on a layered architecture consisting of a novel operating system, a middleware layer, and component-structured applications. The operating system provides primitives for concurrency, synchronization, and secure information flows; it also enforces application separation and resource management policies. The middleware provides higher-level services supporting request/response and publish/subscribe interactions for distributed software. The component model facilitates the creation of software applications from modular and reusable components that are deployed in the distributed system and interact only through well-defined mechanisms. Two cross-cutting aspects - multi-level security and multi-layered fault management - are addressed at all levels of the architecture. The complexity of creating applications and performing system integration is mitigated through the use of a domain-specific model-driven development process that relies on a dedicated modeling language and its accompanying graphical modeling tools, software generators for synthesizing infrastructure code, and the extensive use of model-based analysis for verification and validation.
BibTeX:
@inproceedings{6187334,
  author = {Dubey, A. and Emfinger, W. and Gokhale, A. and Karsai, G. and Otte, W.R. and Parsons, J. and Szabo, C. and Coglio, A. and Smith, E. and Bose, P.},
  title = {A software platform for fractionated spacecraft},
  booktitle = {Aerospace Conference, 2012 IEEE},
  year = {2012},
  pages = {1--20},
  doi = {10.1109/AERO.2012.6187334}
}
Mahadevan N, Dubey A and Karsai G (2012). Architecting Health Management into Software Component Assemblies: Lessons Learned from the ARINC-653 Component Model, In Object/Component/Service-Oriented Real-Time Distributed Computing (ISORC), 2012 IEEE 15th International Symposium on, April 2012, pp. 79-86.
Abstract: Complex real-time software systems require an active fault management capability. While testing, verification and validation schemes and their constant evolution help improve the dependability of these systems, an active fault management strategy is essential to potentially mitigate the unacceptable behaviors at run-time. In our work we have applied the experience gained from the field of Systems Health Management towards component-based software systems. The software components interact via well-defined concurrency patterns and are executed on a real-time component framework built upon ARINC-653 platform services. In this paper, we present the lessons learned in architecting and applying a two-level health management strategy to assemblies of software components.
BibTeX:
@inproceedings{6195864,
  author = {Mahadevan, Nag and Dubey, Abhishek and Karsai, Gabor},
  title = {Architecting Health Management into Software Component Assemblies: Lessons Learned from the ARINC-653 Component Model},
  booktitle = {Object/Component/Service-Oriented Real-Time Distributed Computing (ISORC), 2012 IEEE 15th International Symposium on},
  year = {2012},
  pages = {79-86},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Paper_9.pdf},
  doi = {10.1109/ISORC.2012.19}
}
Dabholkar A, Dubey A, Gokhale A, Karsai G and Mahadevan N (2012). Reliable Distributed Real-Time and Embedded Systems through Safe Middleware Adaptation, In Reliable Distributed Systems (SRDS), 2012 IEEE 31st Symposium on, October 2012, pp. 362-371.
Abstract: Distributed real-time and embedded (DRE) systems are a class of real-time systems formed through a composition of predominantly legacy, closed and statically scheduled real-time subsystems, which comprise over-provisioned resources to deal with worst-case failure scenarios. The formation of these systems of systems leads to a new range of faults that manifest at different granularities for which no statically defined fault tolerance scheme applies. Thus, dynamic and adaptive fault tolerance mechanisms are needed which must execute within the available resources without compromising the safety and timeliness of existing real-time tasks in the individual subsystems. To address these requirements, this paper describes a middleware solution called Safe Middleware Adaptation for Real-Time Fault Tolerance (SafeMAT), which opportunistically leverages the available slack in the over-provisioned resources of individual subsystems. SafeMAT comprises three primary artifacts: (1) a flexible and configurable distributed, runtime resource monitoring framework that can pinpoint in real-time the available slack in the system that is used in making dynamic and adaptive fault tolerance decisions; (2) a safe and resource-aware dynamic failure adaptation algorithm that enables efficient recovery from different granularities of failures within the available slack in the execution schedule while ensuring real-time constraints are not violated and resources are not overloaded; and (3) a framework that empirically validates the correctness of the dynamic mechanisms and the safety of the DRE system. Experimental results evaluating SafeMAT on an avionics application indicate that SafeMAT incurs only 9-15% runtime failover and 2-6% processor utilization overheads at runtime, thereby providing safe and predictable failure adaptability in real time.
BibTeX:
@inproceedings{6424876,
  author = {Dabholkar, A. and Dubey, A. and Gokhale, A. and Karsai, G. and Mahadevan, N.},
  title = {Reliable Distributed Real-Time and Embedded Systems through Safe Middleware Adaptation},
  booktitle = {Reliable Distributed Systems (SRDS), 2012 IEEE 31st Symposium on},
  year = {2012},
  pages = {362-371},
  doi = {10.1109/SRDS.2012.59}
}
Dubey A, Mahadevan N and Karsai G (2012). A Deliberative Reasoner for Model-Based Software Health Management, In The Eighth International Conference on Autonomic and Autonomous Systems. , pp. 86-92.
Abstract: While traditional design-time and off-line approaches to testing and verification contribute significantly to improving and ensuring high dependability of software, they may not cover all possible fault scenarios that a system could encounter at runtime. Thus, runtime ‘health management’ of complex embedded software systems is needed to improve their dependability. Our approach to Software Health Management uses concepts from the field of ‘Systems Health Management’: detection, diagnosis and mitigation. In earlier work we had shown how to use a reactive mitigation strategy specified using a timed state machine model for the system health manager. This paper describes the algorithm and key concepts for an alternative approach to system mitigation using a deliberative strategy, which relies on a function-allocation model to identify alternative component-assembly configurations that can restore the functions needed for the goals of the system.
BibTeX:
@inproceedings{Dubey2012,
  author = {Abhishek Dubey and Nagabhushan Mahadevan and Gabor Karsai},
  title = {A Deliberative Reasoner for Model-Based Software Health Management},
  booktitle = {The Eighth International Conference on Autonomic and Autonomous Systems},
  year = {2012},
  pages = {86--92},
  note = {Best Paper Award},
  url = {http://www.thinkmind.org/download.php?articleid=icas_2012_4_30_20079}
}
Mehrotra R, Dubey A, Abdelwahed S and Krisa R (2012). RFDMon: A Real-time and Fault-tolerant Distributed System Monitoring Approach, In The Eighth International Conference on Autonomic and Autonomous Systems. , pp. 57-63.
Abstract: One of the main requirements for building an autonomic system is to have a robust monitoring framework. In this paper, a systematic distributed event based (DEB) system monitoring framework, “RFDMon”, is presented for measuring system variables (CPU utilization, memory utilization, disk utilization, network utilization, etc.), system health (temperature and voltage of motherboard and CPU), application performance variables (application response time, queue size, and throughput), and scientific application data structures (PBS information and MPI variables) accurately with minimum latency at a specified rate and with controllable resource utilization. This framework is designed to be tolerant to faults in the monitoring framework itself, self-configuring (it can start and stop monitoring the nodes and configure monitors for threshold values/changes for publishing the measurements), aware of execution of the framework on multiple nodes through HEARTBEAT messages, extensive (it monitors multiple parameters through periodic and aperiodic sensors), resource constrainable (computational resources can be limited for monitors), and expandable for adding extra monitors on the fly. Since RFDMon uses a Data Distribution Service (DDS) middleware, it can be deployed in systems with heterogeneous nodes. Additionally, it provides functionality to limit the maximum cap on resources consumed by monitoring processes so that it reduces the effect on the availability of resources for the applications.
BibTeX:
@inproceedings{Dubey20122,
  author = {Rajat Mehrotra and Abhishek Dubey and Sherif Abdelwahed and Rowland Krisa},
  title = {RFDMon: A Real-time and Fault-tolerant Distributed System Monitoring Approach},
  booktitle = {The Eighth International Conference on Autonomic and Autonomous Systems},
  year = {2012},
  pages = {57--63},
  url = {http://www.thinkmind.org/download.php?articleid=icas_2012_3_10_20052}
}

2011

Mahadevan N, Dubey A and Karsai G (2011). A Case Study On The Application of Software Health Management Techniques Nashville, 01/2011, 2011. (ISIS-11-101)
Abstract: The ever-increasing complexity of software used in large-scale, safety-critical cyber-physical systems makes it increasingly difficult to expose and hence correct all potential bugs. There is a need to augment the existing fault tolerance methodologies with new approaches that address latent software bugs exposed at runtime. This paper describes an approach that borrows and adapts traditional ‘Systems Health Management’ techniques to improve software dependability through simple formal specification of runtime monitoring, diagnosis and mitigation strategies. The two-level approach of health management at the component and system level is demonstrated on a simulated case study of an Air Data Inertial Reference Unit (ADIRU). That subsystem was categorized as the primary failure source for the in-flight upset caused in the Malaysian Air flight 124 over Perth, Australia in August 2005.
BibTeX:
@article{4245,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Karsai, Gabor},
  title = {A Case Study On The Application of Software Health Management Techniques},
  year = {2011},
  number = {ISIS-11-101}
}
Saxena T and Dubey A (2011). Meta-Tools For Designing Scientific Workflow Management Systems: Part-I, Survey (ISIS-11-105)
BibTeX:
@article{4381,
  author = {Saxena, Tripti and Dubey, Abhishek},
  title = {Meta-Tools For Designing Scientific Workflow Management Systems: Part-I, Survey},
  year = {2011},
  number = {ISIS-11-105}
}
Mehrotra R, Dubey A, Kowalkowski J, Paterno M, Singh A, Herber R and Abdelwahed S (2011). RFDMon: A Real-Time and Fault-Tolerant Distributed System Monitoring Approach Nashville, 10/2011, 2011.
BibTeX:
@article{4477,
  author = {Rajat Mehrotra and Dubey, Abhishek and Jim Kowalkowski and Marc Paterno and Amitoj Singh and Randolph Herber and Abdelwahed, Sherif},
  title = {RFDMon: A Real-Time and Fault-Tolerant Distributed System Monitoring Approach},
  year = {2011}
}
Dubey A, Karsai G and Mahadevan N (2011). Model-based software health management for real-time systems, In Aerospace Conference, 2011 IEEE., March, 2011. , pp. 1-18.
Abstract: The complexity of software systems has reached the point where we need run-time mechanisms that can be used to provide fault management services. Testing and verification may not cover all possible scenarios that a system will encounter, hence a simpler, yet formally specified run-time monitoring, diagnosis, and fault mitigation architecture is needed to increase the software system's dependability. The approach described in this paper borrows concepts and principles from the field of “Systems Health Management” for complex systems and implements a two-level health management strategy that can be applied through a model-based software development process. The Component-level Health Manager (CLHM) for software components provides a localized and limited functionality for managing the health of a component locally. It also reports to the higher-level System Health Manager (SHM), which manages the health of the overall system. The SHM consists of a diagnosis engine that uses the timed fault propagation graph (TFPG) model based on the component assembly. It reasons about the anomalies reported by the CLHM and hypothesizes about the possible fault sources. Thereafter, necessary system-level mitigation action can be taken. System-level mitigation approaches are the subject of ongoing investigations and have not been included in this paper. We conclude the paper with a case study and discussion.
BibTeX:
@inproceedings{5747559,
  author = {Abhishek Dubey and Gabor Karsai and Nagabhushan Mahadevan},
  title = {Model-based software health management for real-time systems},
  booktitle = {Aerospace Conference, 2011 IEEE},
  year = {2011},
  pages = {1--18},
  doi = {10.1109/AERO.2011.5747559}
}
Mehrotra R, Dubey A, Abdelwahed S and Monceaux W (2011). Large Scale Monitoring and Online Analysis in a Distributed Virtualized Environment, In Engineering of Autonomic and Autonomous Systems (EASe), 2011 8th IEEE International Conference and Workshops on., April, 2011. , pp. 1-9.
Abstract: Due to the increase in the number and complexity of large-scale systems, performance monitoring and multidimensional quality of service (QoS) management has become a difficult and error-prone task for system administrators. Recently, the trend has been to use virtualization technology, which facilitates hosting of multiple distributed systems with minimum infrastructure cost via sharing of computational and memory resources among multiple instances, and allows dynamic creation of even bigger clusters. An effective monitoring technique should not only be fine-grained with respect to the measured variables, but should also give the administrator a high-level overview of the distributed system covering all variables that can affect the QoS requirements. At the same time, the technique should not add a performance burden to the system. Finally, it should be integrated with a control methodology that manages performance of the enterprise system. In this paper, a systematic distributed event based (DEB) performance monitoring approach is presented for distributed systems by measuring system variables (physical/virtual CPU utilization and memory utilization), application variables (application queue size, queue waiting time, and service time), and performance variables (response time, throughput, and power consumption) accurately with minimum latency at a specified rate. Furthermore, we show that the proposed monitoring approach can be utilized to provide input to an application monitoring utility to understand the underlying performance model of the system for successful on-line control of the distributed system to achieve predefined QoS parameters.
BibTeX:
@inproceedings{5946180,
  author = {Mehrotra, R. and Dubey, A. and Abdelwahed, S. and Monceaux, W.},
  title = {Large Scale Monitoring and Online Analysis in a Distributed Virtualized Environment},
  booktitle = {Engineering of Autonomic and Autonomous Systems (EASe), 2011 8th IEEE International Conference and Workshops on},
  year = {2011},
  pages = {1--9},
  doi = {10.1109/EASe.2011.17}
}
Roy N, Dubey A and Gokhale A (2011). Efficient Autoscaling in the Cloud Using Predictive Models for Workload Forecasting, In Cloud Computing (CLOUD), 2011 IEEE International Conference on., July, 2011. , pp. 500-507.
Abstract: Large-scale component-based enterprise applications that leverage Cloud resources expect Quality of Service (QoS) guarantees in accordance with service level agreements between the customer and service providers. In the context of Cloud computing, auto-scaling mechanisms hold the promise of assuring QoS properties to the applications while simultaneously making efficient use of resources and keeping operational costs low for the service providers. Despite the perceived advantages of auto-scaling, realizing its full potential is hard due to multiple challenges stemming from the need to precisely estimate resource usage in the face of significant variability in client workload patterns. This paper makes three contributions to overcome the general lack of effective techniques for workload forecasting and optimal resource allocation. First, it discusses the challenges involved in auto-scaling in the cloud. Second, it develops a model-predictive algorithm for workload forecasting that is used for resource auto-scaling. Finally, empirical results are provided that demonstrate that resources can be allocated and deallocated by our algorithm in a way that satisfies the application QoS while keeping operational costs low.
BibTeX:
@inproceedings{6008748,
  author = {Nilabja Roy and Abhishek Dubey and Aniruddha Gokhale},
  title = {Efficient Autoscaling in the Cloud Using Predictive Models for Workload Forecasting},
  booktitle = {Cloud Computing (CLOUD), 2011 IEEE International Conference on},
  year = {2011},
  pages = {500--507},
  doi = {10.1109/CLOUD.2011.42}
}
Dubey A, Karsai G and Mahadevan N (2011). A Component Model for Hard Real-time Systems: CCM with ARINC-653. Software: Practice and Experience. Vol. 41(12), pp. 1517-1550. John Wiley & Sons, Ltd.
Abstract: The size and complexity of software in safety-critical systems is increasing at a rapid pace. One technology that can be used to mitigate this complexity is component-based software development. However, in spite of the apparent benefits of a component-based approach to development, little work has been done in applying these concepts to hard real-time systems. This paper improves the state of the art by making three contributions: (1) we present a component model for hard real-time systems and define the semantics of different types of component interactions; (2) we present an implementation of a middleware that supports this component model; this middleware combines an open-source CORBA Component Model (CCM) implementation (MICO) with ARINC-653, a state-of-the-art real-time operating system (RTOS) standard; and (3) finally, we describe a modeling environment that enables design, analysis, and deployment of component assemblies. We conclude with a discussion of the lessons learned during this exercise. Our experiences point toward both extending the CCM and revising ARINC-653.
BibTeX:
@article{ACM_SPE:10,
  author = {Dubey, Abhishek and Karsai, Gabor and Mahadevan, Nagabhushan},
  title = {A Component Model for Hard Real-time Systems: CCM with ARINC-653},
  journal = {Software: Practice and Experience},
  publisher = {John Wiley & Sons, Ltd},
  year = {2011},
  volume = {41},
  number = {12},
  pages = {1517--1550},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Journal_0.pdf},
  doi = {10.1002/spe.1083}
}
Abdelwahed S, Dubey A, Karsai G and Mahadevan N (2011). Machine learning and knowledge discovery for engineering systems health management, November, 2011. Chapman and Hall/CRC Press.
Abstract: System-level detection, diagnosis, and mitigation of faults in complex systems that include physical components as well as software are essential to achieve high dependability. The paper introduces a model, referred to as the Timed Failure Propagation Graph (TFPG), that captures the causal propagation of observable fault effects in systems. Several algorithms based on this model have been developed, including consistency-based centralized and distributed algorithms for multiple-fault source isolation in real time, algorithms to calculate diagnosability metrics, and algorithms to prognosticate impending failures. The model and the associated algorithms are applicable to physical systems, but recently they have been applied to component-based software systems as well, where similar fault propagation can take place. The paper describes the modeling paradigm, the algorithms developed, and how they were applied in a system and software context.
BibTeX:
@inbook{book10,
  author = {Sherif Abdelwahed and Abhishek Dubey and Gabor Karsai and Nagabhushan Mahadevan},
  editor = {Ashok Srivastava and Jiawei Han},
  title = {Machine learning and knowledge discovery for engineering systems health management},
  publisher = {Chapman and Hall/CRC Press},
  year = {2011}
}
Mahadevan N, Dubey A and Karsai G (2011). Application of Software Health Management Techniques, In Proceedings of the 6th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. New York, NY, USA , pp. 1-10. ACM.
Abstract: The growing complexity of software used in large-scale, safety critical cyber-physical systems makes it increasingly difficult to expose and hence correct all potential defects. There is a need to augment the existing fault tolerance methodologies with new approaches that address latent software defects exposed at runtime. This paper describes an approach that borrows and adapts traditional 'System Health Management' techniques to improve software dependability through simple formal specification of runtime monitoring, diagnosis, and mitigation strategies. The two-level approach to health management at the component and system level is demonstrated on a simulated case study of an Air Data Inertial Reference Unit (ADIRU). An ADIRU was categorized as the primary failure source for the in-flight upset caused in the Malaysian Air flight 124 over Perth, Australia in 2005.
BibTeX:
@inproceedings{Mahadevan:2011:ASH:1988008.1988010,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Karsai, Gabor},
  title = {Application of Software Health Management Techniques},
  booktitle = {Proceedings of the 6th International Symposium on Software Engineering for Adaptive and Self-Managing Systems},
  publisher = {ACM},
  year = {2011},
  pages = {1--10},
  url = {http://doi.acm.org/10.1145/1988008.1988010},
  doi = {10.1145/1988008.1988010}
}
Mehrotra R, Dubey A, Abdelwahed S and Tantawi A (2011). A power-aware modeling and autonomic management framework for distributed computing systems. Handbook of Energy-Aware and Green Computing. Vol. 2
BibTeX:
@article{mehrotra2011power,
  author = {Mehrotra, Rajat and Dubey, Abhishek and Abdelwahed, Sherif and Tantawi, Asser},
  title = {A power-aware modeling and autonomic management framework for distributed computing systems},
  journal = {Handbook of Energy-Aware and Green Computing},
  year = {2011},
  volume = {2}
}
Nordstrom S, Dubey A, Keskinpala T, Neema S and Bapty T (2011). Autonomic Healing of Model-Based Systems. Journal of Aerospace Computing, Information, and Communication. Vol. 8(4), pp. 87-99.
BibTeX:
@article{nordstrom2011autonomic,
  author = {Nordstrom, Steve and Dubey, Abhishek and Keskinpala, Turker and Neema, Sandeep and Bapty, Theodore},
  title = {Autonomic Healing of Model-Based Systems},
  journal = {Journal of Aerospace Computing, Information, and Communication},
  year = {2011},
  volume = {8},
  number = {4},
  pages = {87--99},
  doi = {10.2514/1.31940}
}
Roy N, Dubey A, Gokhale A and Dowdy L (2011). A Capacity Planning Process for Performance Assurance of Component-based Distributed Systems, In Proceeding of the second joint WOSP/SIPEW international conference on Performance engineering. New York, NY, USA , pp. 259-270. ACM.
Abstract: For service providers of multi-tiered component-based applications, such as web portals, assuring high performance and availability to their customers without impacting revenue requires effective and careful capacity planning that aims at minimizing the number of resources and utilizing them efficiently while simultaneously supporting a large customer base and meeting their service level agreements. This paper presents a novel, hybrid capacity planning process that results from a systematic blending of 1) analytical modeling, where traditional modeling techniques are enhanced to overcome their limitations in providing accurate performance estimates; 2) profile-based techniques, which determine performance profiles of individual software components for use in resource allocation and balancing resource usage; and 3) allocation heuristics that determine the minimum number of resources to allocate to software components. Our results illustrate that using our technique, performance (i.e., bounded response time) can be assured while reducing operating costs by using 25% fewer resources and increasing revenues by handling 20% more clients compared to traditional approaches.
BibTeX:
@inproceedings{Roy:2011:CPP:1958746.1958784,
  author = {Roy, Nilabja and Dubey, Abhishek and Gokhale, Aniruddha and Dowdy, Larry},
  title = {A Capacity Planning Process for Performance Assurance of Component-based Distributed Systems},
  booktitle = {Proceeding of the second joint WOSP/SIPEW international conference on Performance engineering},
  publisher = {ACM},
  year = {2011},
  pages = {259--270},
  url = {http://www.dre.vanderbilt.edu/~gokhale/WWW/papers/ICPE11_MAQPRO.pdf},
  doi = {10.1145/1958746.1958784}
}

2010

Saxena T, Dubey A, Balasubramanian D and Karsai G (2010). Enabling Self-Management by Using Model-Based Design Space Exploration, In 7th IEEE International Workshop on Engineering of Autonomic & Autonomous Systems (EASe). Los Alamitos, CA, USA , pp. 137-144. IEEE Computer Society.
Abstract: Reconfiguration and self-management are important properties for systems that operate in hazardous and uncontrolled environments, such as inter-planetary space. These systems need a reconfiguration mechanism that provides recovery from individual component failures as well as the ability to dynamically adapt to evolving mission goals. One way to provide this functionality is to define a model of alternative system configurations and allow the system to choose the current configuration based on its current state, including environmental parameters and goals. The primary difficulties with this approach are (1) the state space of configurations can grow very large, which can make explicit enumeration infeasible, and (2) the component failures and evolving system goals must be somehow encoded in the system configuration model. This paper describes an online reconfiguration method based on model-based design-space exploration. We symbolically encode the set of valid system configurations and assert the current system state and goals as symbolic constraints. Our initial work indicates that this method scales and is capable of providing effective online dynamic reconfiguration.
BibTeX:
@inproceedings{10.1109/EASe.2010.22,
  author = {Tripti Saxena and Abhishek Dubey and Daniel Balasubramanian and Gabor Karsai},
  title = {Enabling Self-Management by Using Model-Based Design Space Exploration},
  booktitle = {7th IEEE International Workshop on Engineering of Autonomic & Autonomous Systems (EASe)},
  publisher = {IEEE Computer Society},
  year = {2010},
  pages = {137-144},
  doi = {10.1109/EASe.2010.22}
}
Dubey A, Karsai G, Kereskenyi R and Mahadevan N (2010). A Real-Time Component Framework: Experience with CCM and ARINC-653, In IEEE International Symposium on Object-Oriented Real-Time Distributed Computing. Los Alamitos, CA, USA , pp. 143-150. IEEE Computer Society.
Abstract: The complexity of software in systems like aerospace vehicles has reached the point where new techniques are needed to ensure system dependability while improving the productivity of developers. One possible approach is to use precisely defined software execution platforms that (1) enable the system to be composed from separate components, (2) restrict component interactions and prevent fault propagation, and (3) have well-known compositional properties. In this paper we describe the initial steps towards building a platform that combines component-based software construction with hard real-time operating system services. Specifically, the paper discusses how the CORBA Component Model (CCM) could be combined with the ARINC-653 platform services and the lessons learned from this experiment. The results point towards both extending the CCM as well as revising the ARINC-653.
BibTeX:
@inproceedings{10.1109/ISORC.2010.39,
  author = {Abhishek Dubey and Gabor Karsai and Robert Kereskenyi and Nagabhushan Mahadevan},
  title = {A Real-Time Component Framework: Experience with CCM and ARINC-653},
  booktitle = {IEEE International Symposium on Object-Oriented Real-Time Distributed Computing},
  publisher = {IEEE Computer Society},
  year = {2010},
  pages = {143--150},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Paper_2.pdf},
  doi = {10.1109/ISORC.2010.39}
}
Mehrotra R, Dubey A, Abdelwahed S and Tantawi A (2010). Integrated Monitoring and Control for Performance Management of Distributed Enterprise Systems, In International Symposium on Modeling, Analysis, and Simulation of Computer Systems. Los Alamitos, CA, USA , pp. 424-426. IEEE Computer Society.
Abstract: This paper describes an integrated monitoring and control framework for managing performance of distributed enterprise systems.
BibTeX:
@inproceedings{10.1109/MASCOTS.2010.57,
  author = {Rajat Mehrotra and Abhishek Dubey and Sherif Abdelwahed and Asser Tantawi},
  title = {Integrated Monitoring and Control for Performance Management of Distributed Enterprise Systems},
  booktitle = {International Symposium on Modeling, Analysis, and Simulation of Computer Systems},
  publisher = {IEEE Computer Society},
  year = {2010},
  pages = {424--426},
  doi = {10.1109/MASCOTS.2010.57}
}
Piccoli L, Dubey A, Simone JN and Kowalkowski JB (2010). LQCD workflow execution framework: Models, provenance and fault-tolerance. Journal of Physics: Conference Series. Vol. 219(7), pp. 072047.
Abstract: Large computing clusters used for scientific processing suffer from systemic failures when operated over long continuous periods for executing workflows. Diagnosing job problems and faults leading to eventual failures in this complex environment is difficult, specifically when the success of the whole workflow might be affected by a single job failure. In this paper, we introduce a model-based, hierarchical, reliable execution framework that encompasses workflow specification, data provenance, execution tracking and online monitoring of each workflow task, also referred to as a participant. The sequence of participants is described in an abstract parameterized view, which is translated into a concrete data-dependency-based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified rules. These rules specify conditions that must be true pre-execution, during execution and post-execution. Monitoring information for each participant is propagated upwards through the reflex and healing architecture, which consists of a hierarchical network of decentralized fault management entities, called reflex engines. They are instantiated as state machines or timed automata that change state and initiate reflexive mitigation action(s) upon the occurrence of certain faults. We describe how this cluster reliability framework is combined with the workflow execution framework using formal rules and actions specified within a structure of first-order predicate logic, enabling a dynamic management design that reduces manual administrative workload and increases cluster productivity. Preliminary results on a virtual setup with injected failures are shown.
BibTeX:
@article{1742-6596-219-7-072047,
  author = {Luciano Piccoli and Abhishek Dubey and James N Simone and James B Kowalkowski},
  title = {LQCD workflow execution framework: Models, provenance and fault-tolerance},
  journal = {Journal of Physics: Conference Series},
  year = {2010},
  volume = {219},
  number = {7},
  pages = {072047},
  url = {http://stacks.iop.org/1742-6596/219/i=7/a=072047}
}
Mehrotra R, Dubey A, Abdelwahed S and Tantawi A (2010). Model Identification for Performance Management of Distributed Enterprise Systems Nashville (ISIS-10-104)
BibTeX:
@article{4181,
  author = {Rajat Mehrotra and Dubey, Abhishek and Abdelwahed, Sherif and Asser Tantawi},
  title = {Model Identification for Performance Management of Distributed Enterprise Systems},
  year = {2010},
  number = {ISIS-10-104}
}
Dubey A, Karsai G and Mahadevan N (2010). Towards Model-based Software Health Management for Real-Time Systems
Abstract: The complexity of software systems has reached the point where we need run-time mechanisms that can be used to provide fault management services. Testing and verification may not cover all possible scenarios that a system can encounter, hence a simpler, yet formally specified run-time monitoring, diagnosis, and fault mitigation architecture is needed to increase the software system's dependability. The approach described in this paper borrows concepts and principles from the field of ‘Systems Health Management’ for complex systems. The paper introduces the fundamental ideas for software health management, and then illustrates how these can be implemented in a model-based software development process, including a case study and related work.
BibTeX:
@article{4196,
  author = {Dubey, Abhishek and Karsai, Gabor and Mahadevan, Nagabhushan},
  title = {Towards Model-based Software Health Management for Real-Time Systems},
  year = {2010}
}
Neema H, Dubey A and Karsai G (2010). A Report On Simulating External Applications With SOAMANET in the Loop Nashville, 08/2010, 2010. (ISIS-10-108)
BibTeX:
@article{4201,
  author = {Neema, Himanshu and Dubey, Abhishek and Karsai, Gabor},
  title = {A Report On Simulating External Applications With SOAMANET in the Loop},
  year = {2010},
  number = {ISIS-10-108}
}
Pan P, Dubey A and Piccoli L (2010). Dynamic Workflow Management and Monitoring Using DDS, In Engineering of Autonomic and Autonomous Systems (EASe), 2010 Seventh IEEE International Conference and Workshops on., March, 2010. , pp. 20-29.
Abstract: Large scientific computing data-centers require a distributed dependability subsystem that can provide fault isolation and recovery and is capable of learning and predicting failures to improve the reliability of scientific workflows. This paper extends our previous work on autonomic scientific workflow management systems by presenting a hierarchical dynamic workflow management system that tracks the state of job execution using timed state machines. Workflow monitoring is achieved using a reliable distributed monitoring framework, which employs publish-subscribe middleware built upon the OMG Data Distribution Service standard. Failure recovery is achieved by stopping and restarting the failed portions of the workflow directed acyclic graph.
BibTeX:
@inproceedings{5457822,
  author = {Pan Pan and Abhishek Dubey and Luciano Piccoli},
  title = {Dynamic Workflow Management and Monitoring Using DDS},
  booktitle = {Engineering of Autonomic and Autonomous Systems (EASe), 2010 Seventh IEEE International Conference and Workshops on},
  year = {2010},
  pages = {20-29},
  doi = {10.1109/EASe.2010.12}
}
Mahadevan N, Abdelwahed S, Dubey A and Karsai G (2010). Distributed diagnosis of complex systems using timed failure propagation graph models, In AUTOTESTCON, 2010 IEEE., Sept, 2010. , pp. 1-6.
Abstract: Timed failure propagation graph (TFPG) is a directed graph model that represents temporal progression of failure effects in physical systems. In this paper, a distributed diagnosis approach for complex systems is introduced based on the TFPG model settings. In this approach, the system is partitioned into a set of local subsystems each represented by a subgraph of the global system TFPG model. Information flow between subsystems is achieved through special input and output nodes. A high level diagnoser integrates the diagnosis results of the local subsystems using an abstract high level model to obtain a globally consistent diagnosis of the system.
BibTeX:
@inproceedings{5613575,
  author = {Mahadevan, N. and Abdelwahed, S. and Dubey, A. and Karsai, G.},
  title = {Distributed diagnosis of complex systems using timed failure propagation graph models},
  booktitle = {AUTOTESTCON, 2010 IEEE},
  year = {2010},
  pages = {1-6},
  doi = {10.1109/AUTEST.2010.5613575}
}
Balasubramanian J, Gokhale A, Dubey A, Wolf F, Lu C, Gill C and Schmidt D (2010). Middleware for Resource-Aware Deployment and Configuration of Fault-Tolerant Real-time Systems, In RTAS '10: Proceedings of the 2010 16th IEEE Real-Time and Embedded Technology and Applications Symposium. Washington, DC, USA , pp. 69-78. IEEE Computer Society.
BibTeX:
@inproceedings{JaiRtas2010,
  author = {Balasubramanian, Jaiganesh and Gokhale, Aniruddha and Dubey, Abhishek and Wolf, Friedhelm and Lu, Chenyang and Gill, Chris and Schmidt, Douglas},
  title = {Middleware for Resource-Aware Deployment and Configuration of Fault-Tolerant Real-time Systems},
  booktitle = {RTAS '10: Proceedings of the 2010 16th IEEE Real-Time and Embedded Technology and Applications Symposium},
  publisher = {IEEE Computer Society},
  year = {2010},
  pages = {69--78},
  url = {https://www.dre.vanderbilt.edu/~schmidt/PDF/RTAS-2010.pdf},
  doi = {10.1109/RTAS.2010.30}
}

2009

Dubey A, Riley D, Abdelwahed S and Bapty T (2009). Modeling and Analysis of Probabilistic Timed Systems, In IEEE International Conference on the Engineering of Computer-Based Systems. Los Alamitos, CA, USA , pp. 69-78. IEEE Computer Society.
Abstract: Probabilistic models are useful for analyzing systems which operate in the presence of uncertainty. In this paper, we present a technique for verifying safety and liveness properties for probabilistic timed automata. The proposed technique is an extension of a technique used to verify stochastic hybrid automata using an approximation with Markov Decision Processes. A case study of the CSMA/CD protocol is used to showcase the methodology used in our technique.
BibTeX:
@inproceedings{10.1109/ECBS.2009.44,
  author = {Abhishek Dubey and Derek Riley and Sherif Abdelwahed and Ted Bapty},
  title = {Modeling and Analysis of Probabilistic Timed Systems},
  booktitle = {IEEE International Conference on the Engineering of Computer-Based Systems},
  publisher = {IEEE Computer Society},
  year = {2009},
  pages = {69-78},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/PTAVerification_0.pdf},
  doi = {10.1109/ECBS.2009.44}
}
Dubey A, Karsai G and Abdelwahed S (2009). Compensating for Timing Jitter in Computing Systems with General-Purpose Operating Systems, In IEEE International Symposium on Object-Oriented Real-Time Distributed Computing. Los Alamitos, CA, USA , pp. 55-62. IEEE Computer Society.
Abstract: Fault-tolerant frameworks for large-scale computing clusters require sensor programs, which are executed periodically to facilitate performance and fault management. By construction, these clusters use general-purpose operating systems such as Linux that are built for best average-case performance and do not provide deterministic scheduling guarantees. Consequently, periodic applications show jitter in execution times relative to the expected execution time. Obtaining a deterministic schedule for periodic tasks in general-purpose operating systems is difficult without using kernel-level modifications such as RTAI and RTLinux. However, due to performance and administrative issues, kernel modification cannot be used in all scenarios. In this paper, we address the problem of jitter compensation for periodic tasks that cannot rely on modifying the operating system kernel. Towards that, (a) we present motivating examples; (b) we present a feedback-controller-based approach that runs in user space and actively compensates the periodic schedule based on past jitter; this approach is platform-agnostic, i.e., it can be used in different operating systems without modification; and (c) we show through analysis and experiments that this approach maintains a stable system with bounded total jitter.
BibTeX:
@inproceedings{10.1109/ISORC.2009.28,
  author = {Abhishek Dubey and Gabor Karsai and Sherif Abdelwahed},
  title = {Compensating for Timing Jitter in Computing Systems with General-Purpose Operating Systems},
  booktitle = {IEEE International Symposium on Object-Oriented Real-Time Distributed Computing},
  publisher = {IEEE Computer Society},
  year = {2009},
  pages = {55--62},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Isorc_0.pdf},
  doi = {10.1109/ISORC.2009.28}
}
Dubey A, Piccoli L, Kowalkowski JB, Simone JN, Sun X-H, Karsai G and Neema S (2009). Using Runtime Verification to Design a Reliable Execution Framework for Scientific Workflows, In EASE '09: Proceedings of the 2009 Sixth IEEE Conference and Workshops on Engineering of Autonomic and Autonomous Systems. Washington, DC, USA , pp. 87-96. IEEE Computer Society.
Abstract: In this paper, we describe the design of a scientific workflow execution framework that integrates runtime verification to monitor its execution and check it against formal specifications. For controlling workflow execution, this framework provides for data provenance, execution tracking and online monitoring of each workflow task, also referred to as a participant. The sequence of participants is described in an abstract parameterized view, which is used to generate a concrete data-dependency-based sequence of participants with defined arguments. As participants belonging to a workflow are mapped onto machines and executed, periodic and on-demand monitoring of vital health parameters on allocated nodes is enabled according to pre-specified invariant conditions, with actions to be taken upon violation of the invariants.
BibTeX:
@inproceedings{1545686,
  author = {Dubey, Abhishek and Piccoli, Luciano and Kowalkowski, James B. and Simone, James N. and Sun, Xian-He and Karsai, Gabor and Neema, Sandeep},
  title = {Using Runtime Verification to Design a Reliable Execution Framework for Scientific Workflows},
  booktitle = {EASE '09: Proceedings of the 2009 Sixth IEEE Conference and Workshops on Engineering of Autonomic and Autonomous Systems},
  publisher = {IEEE Computer Society},
  year = {2009},
  pages = {87--96},
  doi = {10.1109/EASe.2009.13}
}
Dubey A (2009). Algorithms for Synthesizing Safe Sets of Operation for Embedded Systems, In ECBS '09: Proceedings of the 2009 16th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems. Washington, DC, USA , pp. 149-155. IEEE Computer Society.
Abstract: A large number of embedded computing systems are modeled as hybrid systems with both discrete and continuous dynamics. In this paper, we present algorithms for analyzing nonlinear time-invariant continuous-time systems by employing reachability algorithms. We propose synthesis algorithms for finding sets of initial states for the continuous dynamical systems so that temporal properties, such as safety and liveness properties, are satisfied. The initial sets produced by the algorithms are related to some classical concepts for continuous dynamical systems, such as invariant sets and domains of attraction.
BibTeX:
@inproceedings{1545725,
  author = {Dubey, Abhishek},
  title = {Algorithms for Synthesizing Safe Sets of Operation for Embedded Systems},
  booktitle = {ECBS '09: Proceedings of the 2009 16th Annual IEEE International Conference and Workshop on the Engineering of Computer Based Systems},
  publisher = {IEEE Computer Society},
  year = {2009},
  pages = {149--155},
  doi = {10.1109/ECBS.2009.43}
}
Balasubramanian J, Gokhale A, Wolf F, Dubey A, Lu C, Gill C and Schmidt DC (2009). Resource-Aware Deployment and Configuration of Fault-tolerant Real-time Systems, 10/2009, 2009. (ISIS-09-109)
BibTeX:
@article{4121,
  author = {Balasubramanian, Jaiganesh and Aniruddha Gokhale and Friedhelm Wolf and Dubey, Abhishek and Lu, Chenyang and Gill, Chris and Schmidt, Douglas C.},
  title = {Resource-Aware Deployment and Configuration of Fault-tolerant Real-time Systems},
  year = {2009},
  number = {ISIS-09-109}
}
Dubey A, Mahadevan N and Kereskenyi R (2009). Reflex and Healing Architecture for Software Health Management, In International Workshop on Software Health Management, IEEE conference on Space Mission Challenges for Information Technology., 07/2009, 2009.
Abstract: This paper discusses the applicability of reflex and healing architecture for implementing software health management in complex ‘system of systems’, such as those used in interplanetary space missions.
BibTeX:
@conference{4122,
  author = {Dubey, Abhishek and Mahadevan, Nagabhushan and Kereskenyi, Robert},
  title = {Reflex and Healing Architecture for Software Health Management},
  booktitle = {International Workshop on Software Health Management, IEEE conference on Space Mission Challenges for Information Technology},
  year = {2009},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Paper_0.pdf}
}
Dubey A (2009). A Discussion on Supervisory Control Theory in Real-Time Discrete Event Systems, 11/2009, 2009. (ISIS-09-112)
BibTeX:
@article{4136,
  author = {Dubey, Abhishek},
  title = {A Discussion on Supervisory Control Theory in Real-Time Discrete Event Systems},
  year = {2009},
  number = {ISIS-09-112}
}
Dubey A, Karsai G, Kereskenyi R and Mahadevan N (2009). Towards a Real-time Component Framework for Software Health Management Nashville, 11/2009, 2009. (ISIS-09-111)
Abstract: The complexity of software in systems like aerospace vehicles has reached the point where new techniques are needed to ensure system dependability. Such techniques include a novel direction called ‘Software Health Management’ (SHM) that extends classic software fault tolerance with techniques borrowed from System Health Management. In this paper the initial steps towards building a SHM approach are described that combine component-based software construction with hard real-time operating system platforms. Specifically, the paper discusses how the CORBA Component Model could be combined with the ARINC-653 platform services and the lessons learned from this experiment. The results point towards both extending the CCM as well as revising the ARINC-653.
BibTeX:
@article{4137,
  author = {Dubey, Abhishek and Karsai, Gabor and Kereskenyi, Robert and Mahadevan, Nagabhushan},
  title = {Towards a Real-time Component Framework for Software Health Management},
  year = {2009},
  number = {ISIS-09-111}
}
Dubey A, Mehrotra R, Abdelwahed S and Tantawi A (2009). Performance Modeling of Distributed Multi-tier Enterprise Systems. SIGMETRICS Perform. Eval. Rev.. New York, NY, USA, October, 2009. Vol. 37(2), pp. 9-11. ACM.
BibTeX:
@article{Dubey:2009:PMD:1639562.1639566,
  author = {Dubey, Abhishek and Mehrotra, Rajat and Abdelwahed, Sherif and Tantawi, Asser},
  title = {Performance Modeling of Distributed Multi-tier Enterprise Systems},
  journal = {SIGMETRICS Perform. Eval. Rev.},
  publisher = {ACM},
  year = {2009},
  volume = {37},
  number = {2},
  pages = {9--11},
  url = {http://doi.acm.org/10.1145/1639562.1639566},
  doi = {10.1145/1639562.1639566}
}
Dubey A (2009). Towards Dynamic CPU Demand Estimation in Multi-Tiered Web Setup
BibTeX:
@article{SWHM31,
  author = {Abhishek Dubey},
  title = {Towards Dynamic CPU Demand Estimation in Multi-Tiered Web Setup},
  year = {2009},
  url = {https://wiki.isis.vanderbilt.edu/mbshm/images/3/3e/TechReport2009.pdf}
}

2008

Dubey A, Nordstrom S, Keskinpala T, Neema S, Bapty T and Karsai G (2008). Towards A Model-Based Autonomic Reliability Framework for Computing Clusters, In 5th IEEE International Workshop on Engineering of Autonomic & Autonomous Systems (EASe). , pp. 75-85.
Abstract: One of the primary problems with computing clusters is to ensure that they maintain a reliable working state most of the time to justify the economics of operation. In this paper, we introduce a model-based hierarchical reliability framework that enables periodic monitoring of vital health parameters across the cluster and provides for autonomic fault mitigation. We also discuss some of the challenges faced by autonomic reliability frameworks in cluster environments, such as non-determinism in task scheduling in standard operating systems such as Linux and the need for synchronized execution of monitoring sensors across the cluster. Additionally, we present a solution to these problems in the context of our framework, which utilizes a feedback-controller-based approach to compensate for the scheduling jitter in non-real-time operating systems. Finally, we present experimental data that illustrates the effectiveness of our approach.
BibTeX:
@inproceedings{Ease08,
  author = {Dubey, Abhishek and Nordstrom, Steve and Keskinpala, Turker and Neema, Sandeep and Bapty, Ted and Karsai, Gabor},
  title = {Towards A Model-Based Autonomic Reliability Framework for Computing Clusters},
  booktitle = {5th IEEE International Workshop on Engineering of Autonomic & Autonomous Systems (EASe)},
  year = {2008},
  pages = {75--85},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Dubey_A_4_0_2008_Towards_A_.pdf}
}
Dubey A, Neema S, Kowalkowski J and Singh A (2008). Scientific Computing Autonomic Reliability Framework, In ESCIENCE '08: Proceedings of the 2008 Fourth IEEE International Conference on eScience. Washington, DC, USA , pp. 352-353. IEEE Computer Society.
Abstract: Large scientific computing clusters require a distributed dependability subsystem that can provide fault isolation and recovery and is capable of learning and predicting failures, to improve the reliability of scientific workflows. In this paper, we outline the key ideas in the design of a Scientific Computing Autonomic Reliability Framework (SCARF) for large computing clusters used in the Lattice Quantum Chromodynamics project at Fermilab.
BibTeX:
@inproceedings{escience08,
  author = {Dubey, Abhishek and Neema, Sandeep and Kowalkowski, Jim and Singh, Amitoj},
  title = {Scientific Computing Autonomic Reliability Framework},
  booktitle = {ESCIENCE '08: Proceedings of the 2008 Fourth IEEE International Conference on eScience},
  publisher = {IEEE Computer Society},
  year = {2008},
  pages = {352--353},
  doi = {10.1109/eScience.2008.113}
}

2007

Nordstrom S, Dubey A, Keskinpala T, Datta R, Neema S and Bapty T (2007). Model Predictive Analysis for Autonomic Workflow Management in Large-scale Scientific Computing Environments, In 4th IEEE International Workshop on Engineering of Autonomic & Autonomous Systems (EASe). , pp. 37-42.
Abstract: In large-scale scientific computing, proper planning and management of computational resources lead to higher system utilization and increased scientific productivity. Scientists are increasingly leveraging business process management techniques and workflow management tools to balance the needs of the scientific analyses with the availability of computational resources. However, the advancements in productivity from execution of workflows in large-scale computing environments are often thwarted by runtime resource failures. This paper presents our initial work toward autonomic model-based fault analysis in workflow-based environments.
BibTeX:
@inproceedings{Ease07,
  author = {Steve Nordstrom and Abhishek Dubey and Turker Keskinpala and Rahul Datta and Sandeep Neema and Ted Bapty},
  title = {Model Predictive Analysis for Autonomic Workflow Management in Large-scale Scientific Computing Environments},
  booktitle = {4th IEEE International Workshop on Engineering of Autonomic & Autonomous Systems (EASe)},
  year = {2007},
  pages = {37--42},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/WorkflowML.pdf}
}
Dubey A, Nordstrom S, Keskinpala T, Neema S, Bapty T and Karsai G (2007). Towards a verifiable real-time, autonomic, fault mitigation framework for large scale real-time systems. Innovations in Systems and Software Engineering. Vol. 3(1), pp. 33-52. Springer-Verlag.
Abstract: Designing autonomic fault responses is difficult, particularly in large-scale systems, as there is no single ‘perfect’ fault mitigation response to a given failure. The design of appropriate mitigation actions depends upon the goals and state of the application and environment. Strict time deadlines in real-time systems further exacerbate this problem. Any autonomic behavior in such systems must not only be functionally correct but should also conform to properties of liveness, safety and bounded time responsiveness. This paper details a real-time fault-tolerant framework, which uses a reflex and healing architecture to provide fault mitigation capabilities for large-scale real-time systems. At the heart of this architecture is a real-time reflex engine, which has a state-based failure management logic that can respond to both event- and time-based triggers. We also present a semantic domain for verifying properties of systems which use this framework of real-time reflex engines. Lastly, a case study, which examines the details of such an approach, is presented.
BibTeX:
@article{secondkey,
  author = {Dubey, Abhishek and Nordstrom, Steve and Keskinpala, Turker and Neema, Sandeep and Bapty, Ted and Karsai, Gabor},
  title = {Towards a verifiable real-time, autonomic, fault mitigation framework for large scale real-time systems},
  journal = {Innovations in Systems and Software Engineering},
  publisher = {Springer-Verlag},
  year = {2007},
  volume = {3},
  number = {1},
  pages = {33-52},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Dubey_A_1_24_2007_Towards_a_.pdf},
  doi = {10.1007/s11334-006-0015-7}
}

2006

Dubey A, Nordstrom S, Keskinpala T, Neema S and Bapty T (2006). Verifying autonomic fault mitigation strategies in large scale real-time systems, In 3rd IEEE International Workshop on Engineering of Autonomic & Autonomous Systems (EASe). , pp. 129-140.
Abstract: In large-scale real-time systems, many problems associated with self-management are exacerbated by the addition of time deadlines. In these systems, any autonomic behavior must not only be functionally correct but must also not violate properties of liveness, safety and bounded time responsiveness. In this paper we present and analyze a real-time Reflex Engine for providing fault mitigation capability to large-scale real-time systems. We also present a semantic domain for analyzing and verifying the properties of such systems along with the framework of real-time reflex engines.
BibTeX:
@inproceedings{Ease06,
  author = {Abhishek Dubey and Steve Nordstrom and Turker Keskinpala and Sandeep Neema and Ted Bapty},
  title = {Verifying autonomic fault mitigation strategies in large scale real-time systems},
  booktitle = {3rd IEEE International Workshop on Engineering of Autonomic & Autonomous Systems (EASe)},
  year = {2006},
  pages = {129--140},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Dubey_A_3_0_2006_Verifying_.pdf}
}
Nordstrom S, Dubey A, Keskinpala T, Bapty T and Neema S (2006). Ghost: Guided healing and optimization search technique for healing large-scale embedded systems, In 3rd IEEE International Workshop on Engineering of Autonomic & Autonomous Systems (EASe). , pp. 129-140.
Abstract: Reflex and healing architectures have been shown to provide adequate user-defined initial failure mitigation behaviors in the presence of system faults. What is lacking, however, is a user-guided means of healing the system after the initial reflexes have been enacted. This process should be autonomic in the sense that new system configurations can be achieved by defining a priori only a small set of criteria to which the healed system should conform. What follows is an explanation of this technique for guided healing which allows system designers to direct the healing process from a higher level in such a way that the resulting system configurations satisfy their particular needs. A brief example outlining the application of this approach is given.
BibTeX:
@inproceedings{GhostEase06,
  author = {Steve Nordstrom and Abhishek Dubey and Turker Keskinpala and Ted Bapty and Sandeep Neema},
  title = {Ghost: Guided healing and optimization search technique for healing large-scale embedded systems},
  booktitle = {3rd IEEE International Workshop on Engineering of Autonomic & Autonomous Systems (EASe)},
  year = {2006},
  pages = {129--140},
  doi = {10.1109/EASE.2006.8}
}
Koo TJ, Wu X, Su H, Chen J and Dubey A (2006). ReachLab: Computation Platform for the Analysis of Hybrid Automata, In 9th International Workshop on Hybrid Systems: Computation and Control (HSCC'06).
BibTeX:
@inproceedings{HSCC06,
  author = {Takkuen John Koo and Xianbin Wu and Hang Su and Jie Chen and Abhishek Dubey},
  title = {ReachLab: Computation Platform for the Analysis of Hybrid Automata},
  booktitle = {9th International Workshop on Hybrid Systems: Computation and Control (HSCC'06)},
  year = {2006}
}
Nordstrom S, Bapty T, Neema S, Dubey A and Keskinpala T (2006). A guided explorative approach for autonomic healing of model based systems, In Second IEEE conference on Space Mission Challenges for Information Technology (SMC-IT)., July, 2006.
Abstract: Embedded computing is an area in which many of the Self-* properties of autonomic systems are desirable. Model based tools for designing embedded systems, while proven successful in many applications, are not yet applicable toward building autonomic or self-sustaining embedded systems. This paper reports on the progress made by our group in developing a model based toolset which specifically targets the creation of autonomic embedded systems.
BibTeX:
@inproceedings{SMCIT06,
  author = {Steve Nordstrom and Ted Bapty and Sandeep Neema and Abhishek Dubey and Turker Keskinpala},
  title = {A guided explorative approach for autonomic healing of model based systems},
  booktitle = {Second IEEE conference on Space Mission Challenges for Information Technology (SMC-IT)},
  year = {2006},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Nordstrom_SG_7_0_2006_A_Guided_E.pdf}
}
Keskinpala T, Dubey A, Nordstrom S, Bapty T and Neema S (2006). A model driven tool for automated system level testing of middleware, In Fourth System Testing and Validation Workshop (STV).
Abstract: This paper presents a contribution toward addressing the challenges of manually creating test configurations and deployments for high-performance distributed middleware frameworks. We present our testing tool, based on the Model Integrated Computing (MIC) paradigm, and describe and discuss its generative abilities, which can be used to generate many test configurations and deployment scenarios from high-level system specifications through model replication.
BibTeX:
@inproceedings{STV06,
  author = {Turker Keskinpala and Abhishek Dubey and Steve Nordstrom and Ted Bapty and Sandeep Neema},
  title = {A model driven tool for automated system level testing of middleware},
  booktitle = {Fourth System Testing and Validation Workshop (STV)},
  year = {2006},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/STV06Paper_Final_TK.pdf}
}

2005

Dubey A, Wu X, Su H and Koo TJ (2005). Computation Platform for Automatic Analysis of Embedded Software Systems Using Model Based Approach. Lecture Notes in Computer Science, In ATVA. Vol. 3707, pp. 114-128.
Abstract: In this paper, we describe a computation platform called ReachLab, which enables automatic analysis of embedded software systems that interact with a continuous environment. Algorithms are used to specify how the state space of the system model should be explored in order to perform analysis. In ReachLab, both system models and analysis algorithm models are specified in the same framework using the Hybrid System Analysis and Design Language (HADL), which is a meta-model based language. The platform allows the models of algorithms to be constructed hierarchically and promotes their reuse in constructing more complex algorithms. Moreover, the platform is designed in such a way that the concerns of design and implementation of analysis algorithms are separated. On one hand, the models of analysis algorithms are abstract and therefore the design of algorithms can be made independent of implementation details. On the other hand, translators are provided to automatically generate implementations from the models for computing analysis results based on computation kernels. Multiple computation kernels, which are based on specific computation tools such as d/dt and the Level Set Toolbox, are supported and can be chosen to enable hybrid state space exploration. An example is provided to illustrate the design and implementation process in ReachLab.
BibTeX:
@article{lncs05,
  author = {Abhishek Dubey and Xianbin Wu and Hang Su and Takkuen John Koo},
  title = {Computation Platform for Automatic Analysis of Embedded Software Systems Using Model Based Approach},
  booktitle = {ATVA},
  journal = {Lecture Notes in Computer Science},
  year = {2005},
  volume = {3707},
  pages = {114-128},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Dubey_A_10_4_2005_Computatio.pdf}
}