Extensible Cyber Physical Systems

My research focuses on design tools, programming abstractions, platforms, and analytical techniques for designing and managing low-latency, resilient, and extensible platforms for Cyber-Physical Systems (CPS). In particular, I am interested in applying this work to the Industrial Internet of Things (IIoT) and to smart city technologies. Extensibility and smartness are two pillars of these emerging paradigms. Extensibility is the flexibility to grow or shrink with respect to (1) the set of services supported by a group of smart applications and (2) the hardware resources used to provide these services, while satisfying the constraints imposed by the overall cyber-physical system. This flexibility is critical for an architecture that can remain self-reliant over long periods of continuous operation and can be shared by multiple CPS applications simultaneously. Smartness is the ability to provide services that improve over time as more data become available. Achieving this vision requires the ability to disseminate, process, and analyze data efficiently, scalably, and securely in real time. The benefits to society are considerable: correct and reliable smart applications, especially when deployed citywide, can improve areas such as traffic, health, and power management. Within smart city operations, I focus specifically on two domains: transportation and the smart grid. Overall, my current research falls into three related categories:

Failure Diagnostics

Understanding how failures propagate across a dynamic cyber-physical architecture is crucial. For example, in one project we are developing hierarchical failure propagation models that capture failure dynamics in the smart grid, and we use that information for online fault diagnostics and prognostics. What distinguishes this approach is that it avoids complex real-time computations over high-fidelity models; instead, it reasons with efficient graph algorithms over the anomalies observed in the system. If successful, such approaches to fault management will make failure isolation in large-scale systems such as smart electric grids more effective by identifying impending failure propagations and determining the time to critical failure, which can increase system reliability and reduce the losses caused by power failures.
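As an illustration of graph-based reasoning of this kind, the Python sketch below assumes a timed failure propagation graph whose edges carry a minimum and maximum propagation delay. The node names, delays, and functions (`build_graph`, `diagnose`, `earliest_critical_time`) are hypothetical and are not the project's actual models or code. Diagnosis keeps only those failure modes whose possible onset window is consistent with every alarm they can explain; prognosis bounds the earliest time a critical node could be reached with a shortest-path search over minimum delays.

```python
import heapq
from collections import defaultdict

# Hypothetical propagation graph for a small grid segment.
# Edge (src, dst) -> (min_delay, max_delay) in seconds. "FM_*" nodes are
# failure modes; "D_*" nodes are observable discrepancies (anomalies).
EDGES = {
    ("FM_breaker_stuck", "D_overcurrent"): (0.0, 1.0),
    ("FM_line_fault", "D_overcurrent"): (0.0, 0.5),
    ("FM_line_fault", "D_voltage_sag"): (0.0, 0.25),
    ("D_overcurrent", "D_overheat"): (2.0, 5.0),
    ("D_overheat", "D_transformer_trip"): (10.0, 30.0),
}
FAILURE_MODES = ["FM_breaker_stuck", "FM_line_fault"]
CRITICAL = "D_transformer_trip"


def build_graph(edges):
    graph = defaultdict(list)
    for (src, dst), (lo, hi) in edges.items():
        graph[src].append((dst, lo, hi))
    return graph


def min_delays(graph, source):
    """Dijkstra on min_delay: earliest possible arrival at each reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nxt, lo, _ in graph[node]:
            if d + lo < dist.get(nxt, float("inf")):
                dist[nxt] = d + lo
                heapq.heappush(heap, (d + lo, nxt))
    return dist


def max_delays(graph, source):
    """Longest path on max_delay (graph assumed acyclic): latest possible arrival."""
    order, seen = [], set()

    def visit(node):
        seen.add(node)
        for nxt, _, _ in graph[node]:
            if nxt not in seen:
                visit(nxt)
        order.append(node)  # post-order

    visit(source)
    dist = {source: 0.0}
    for node in reversed(order):  # topological order of the reachable sub-DAG
        for nxt, _, hi in graph[node]:
            dist[nxt] = max(dist.get(nxt, float("-inf")), dist[node] + hi)
    return dist


def diagnose(graph, failure_modes, alarms):
    """Rank failure modes whose timing is consistent with the observed alarms."""
    ranked = []
    for fm in failure_modes:
        dmin, dmax = min_delays(graph, fm), max_delays(graph, fm)
        explained = sorted(a for a in alarms if a in dmin)
        if not explained:
            continue
        # Onset t_f must satisfy t_a - dmax <= t_f <= t_a - dmin for every explained alarm.
        onset_lo = max(alarms[a] - dmax[a] for a in explained)
        onset_hi = min(alarms[a] - dmin[a] for a in explained)
        if onset_lo <= onset_hi:
            ranked.append((fm, onset_lo, onset_hi, explained))
    # Prefer hypotheses that explain more alarms.
    return sorted(ranked, key=lambda h: (-len(h[3]), h[0]))


def earliest_critical_time(graph, failure_mode, onset_lo, critical):
    """Earliest time the critical node could be reached (None if unreachable)."""
    dmin = min_delays(graph, failure_mode)
    return onset_lo + dmin[critical] if critical in dmin else None


graph = build_graph(EDGES)
alarms = {"D_overcurrent": 100.0, "D_voltage_sag": 100.0}
best = diagnose(graph, FAILURE_MODES, alarms)[0]
print(best)  # ('FM_line_fault', 99.75, 100.0, ['D_overcurrent', 'D_voltage_sag'])
print(earliest_critical_time(graph, best[0], best[1], CRITICAL))  # 111.75
```

Both steps use only linear-time or Dijkstra-style graph traversals, which is the kind of lightweight reasoning the approach favors over simulating high-fidelity models online.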

Autonomous Recovery Techniques

In this area, I am developing robust recovery techniques that mitigate the effects of failures. This is challenging for several reasons. First, the set of physical resources available to the platform can change over time, whether due to physical constraints, failures, or the addition or removal of resources. Second, the system is heterogeneous by design: different sensors and computation platforms provide different resources and capabilities. Third, the system must handle both internal faults and environmental changes while ensuring that the cyber-physical application's properties and requirements, including strict timing requirements, are met. Our work on this problem to date has led to a novel two-level mitigation approach called reflex and healing. The first level of mitigation is provided by preprogrammed distributed agents called reflex engines. A reflex engine uses a preprogrammed timed state machine to deliver reactive, and possibly preemptive, actions in response to timer expirations and/or system events. These engines are often arranged in a hierarchical management structure, wherein control moves to an upper layer if a fault cannot be arrested at a lower layer. The healing layer provides system-level mitigation. In a recent project, we developed techniques that use the system's goals and a function decomposition tree to dynamically search for alternative solutions subject to security and resource constraints, and then reconfigure the system accordingly within its temporal deadlines.
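The reflex level can be sketched in Python as a timed state machine that reacts either to events or to the expiry of a per-state timeout, and that hands the fault to a parent engine when it cannot be arrested locally. This is a minimal sketch assuming a simulated clock passed in explicitly and string-named events; the class `ReflexEngine`, the state names, and the actions `restart_process` and `migrate_to_spare` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ReflexEngine:
    """Hypothetical reflex engine: a timed state machine with an optional parent."""
    name: str
    state: str
    # (state, event) -> (next_state, action). The event "timeout" fires on expiry.
    transitions: Dict[Tuple[str, str], Tuple[str, str]]
    # state -> seconds the engine may stay in that state before "timeout" fires.
    timeouts: Dict[str, float] = field(default_factory=dict)
    parent: Optional["ReflexEngine"] = None
    log: List[str] = field(default_factory=list)
    entered_at: float = 0.0

    def handle(self, event: str, now: float) -> None:
        """React to an event; unknown (state, event) pairs are ignored."""
        key = (self.state, event)
        if key not in self.transitions:
            return
        next_state, action = self.transitions[key]
        self.log.append(f"t={now}: {self.state} --{event}--> {next_state} [{action}]")
        self.state, self.entered_at = next_state, now
        if action == "escalate" and self.parent is not None:
            # The fault could not be arrested here: hand it to the upper layer.
            self.parent.handle(f"escalation:{self.name}", now)

    def tick(self, now: float) -> None:
        """Fire the timeout transition if the current state's deadline has expired."""
        limit = self.timeouts.get(self.state)
        if limit is not None and now - self.entered_at >= limit:
            self.handle("timeout", now)


# Upper layer: migrates the workload when a node-level engine gives up.
cluster = ReflexEngine(
    name="cluster",
    state="Nominal",
    transitions={("Nominal", "escalation:node1"): ("Migrating", "migrate_to_spare")},
)
# Lower layer: tries a local restart and escalates if no heartbeat within 2 s.
node = ReflexEngine(
    name="node1",
    state="Nominal",
    transitions={
        ("Nominal", "process_crash"): ("Restarting", "restart_process"),
        ("Restarting", "heartbeat"): ("Nominal", "none"),
        ("Restarting", "timeout"): ("Failed", "escalate"),
    },
    timeouts={"Restarting": 2.0},
    parent=cluster,
)

node.handle("process_crash", now=0.0)
node.tick(now=1.0)  # still within the 2 s window: no transition
node.tick(now=2.5)  # deadline expired: node fails and escalates
print(node.state, cluster.state)  # Failed Migrating
```

Because each engine only consults its own transition table and timers, its reaction time is bounded and predictable; anything it cannot resolve is deferred upward rather than handled with open-ended computation.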

Resilient Platform and Robust Programming Abstractions

Several challenges must be overcome to transition to a scalable distributed architecture that can serve as the foundation for these next-generation extensible smart systems. A fundamental one is ensuring the correctness of applications as they operate in the physical environment. Given the dynamic nature of the system, we cannot rely on ad hoc development strategies. My research has therefore focused on robust design patterns and platform abstractions that support both the correct-by-construction design of individual applications and the correct integration and concurrent operation of multiple applications. Key contributions include a robust component model for building cyber-physical applications, and spatial and temporal separation among system components, which guarantees fault isolation. Another important challenge is ensuring privacy and security. To address it, we have developed mechanisms that enforce fine-grained privileges for controlling access to system services across a distributed and dynamic platform, as well as a novel Multi-Level Security (MLS) information-sharing policy across applications.
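The text does not spell out the rules of the MLS policy, so the Python sketch below assumes the classic lattice model: a label is a sensitivity level plus a set of compartments, and one label dominates another when its level is at least as high and its compartments form a superset. That check is combined with per-component privilege strings to gate message flow. The levels, topics, components, and the function `may_flow` are hypothetical illustrations, not the platform's actual policy.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import FrozenSet


class Level(IntEnum):
    # Hypothetical, totally ordered sensitivity levels.
    UNCLASSIFIED = 0
    CONFIDENTIAL = 1
    SECRET = 2


@dataclass(frozen=True)
class Label:
    level: Level
    compartments: FrozenSet[str] = frozenset()

    def dominates(self, other: "Label") -> bool:
        """Lattice order: higher-or-equal level and a superset of compartments."""
        return self.level >= other.level and self.compartments >= other.compartments


@dataclass(frozen=True)
class Component:
    name: str
    label: Label
    privileges: FrozenSet[str]  # fine-grained grants such as "publish:usage"


def may_flow(sender: Component, receiver: Component, topic: str) -> bool:
    """Permit a message only if both endpoints hold the topic privilege and the
    receiver's label dominates the sender's, so information never flows down."""
    return (
        f"publish:{topic}" in sender.privileges
        and f"subscribe:{topic}" in receiver.privileges
        and receiver.label.dominates(sender.label)
    )


meter = Component("meter", Label(Level.CONFIDENTIAL, frozenset({"grid"})),
                  frozenset({"publish:usage"}))
analytics = Component("analytics", Label(Level.SECRET, frozenset({"grid"})),
                      frozenset({"subscribe:usage", "publish:report"}))
dashboard = Component("dashboard", Label(Level.UNCLASSIFIED),
                      frozenset({"subscribe:report"}))

print(may_flow(meter, analytics, "usage"))       # True
print(may_flow(analytics, dashboard, "report"))  # False: SECRET -> UNCLASSIFIED
```

Under this model, publishing analytics results to the unclassified dashboard would require an explicit declassification step, which the sketch deliberately omits.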

An extension of this framework, currently under research and development, is the CHARIOT (Cyber-pHysical Application aRchItecture with Objective-based reconfiguraTion) framework. It provides a universal cyber-physical component model that allows distributed CPS applications to be constructed from software components and hardware devices without being tied to any specific platform or middleware. This addresses one of the key problems facing the IIoT: the integration of heterogeneous subsystems. A unique aspect of CHARIOT is that it allows the system to be described through a modular objective decomposition, wherein each objective is mapped to one or more data workflows. This objective-to-component association enables us to assess the impact of individual failures on the system objectives and to reconfigure the system at runtime.
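To make the objective-to-component association concrete, the Python sketch below assumes a simple objective tree: composite objectives require all of their children, and leaf objectives list alternative workflows of components, each pinned to a device with a CPU demand. Every name here (devices, components, capacities, and the functions `impacted` and `reconfigure`) is hypothetical and is not CHARIOT's actual model or API. Impact assessment walks the tree to find the objectives lost when devices fail; reconfiguration enumerates workflow choices and keeps a feasible one that changes as little as possible.

```python
from itertools import product

# Hypothetical deployment: component -> (device, cpu_units).
COMPONENTS = {
    "gps_reader": ("dev_a", 1),
    "localizer": ("dev_a", 2),
    "camera_reader": ("dev_b", 2),
    "localizer_backup": ("dev_c", 2),
    "planner": ("dev_c", 3),
}
CAPACITY = {"dev_a": 4, "dev_b": 4, "dev_c": 5}

# Objective tree: a composite objective needs all of its children; a leaf
# objective lists alternative data workflows, any one of which satisfies it.
TREE = {
    "navigate": {"children": ["localize", "plan_route"]},
    "localize": {"workflows": [["gps_reader", "localizer"],
                               ["camera_reader", "localizer_backup"]]},
    "plan_route": {"workflows": [["planner"]]},
}


def leaves(tree, root):
    node = tree[root]
    if "workflows" in node:
        return [root]
    return [leaf for child in node["children"] for leaf in leaves(tree, child)]


def satisfied(tree, root, choice, alive):
    """A leaf holds if its chosen workflow runs only on alive devices;
    a composite holds if all of its children hold."""
    node = tree[root]
    if "workflows" in node:
        return all(COMPONENTS[c][0] in alive for c in node["workflows"][choice[root]])
    return all(satisfied(tree, child, choice, alive) for child in node["children"])


def impacted(tree, root, choice, alive):
    """Objectives in the subtree that the given choice no longer satisfies."""
    lost = [] if satisfied(tree, root, choice, alive) else [root]
    for child in tree[root].get("children", []):
        lost += impacted(tree, child, choice, alive)
    return lost


def fits(tree, choice):
    """Check that the summed CPU demand on every device stays within capacity."""
    load = {}
    for leaf, idx in choice.items():
        for comp in tree[leaf]["workflows"][idx]:
            dev, cpu = COMPONENTS[comp]
            load[dev] = load.get(dev, 0) + cpu
    return all(used <= CAPACITY[dev] for dev, used in load.items())


def reconfigure(tree, root, current, alive):
    """Brute-force search over workflow choices; return a feasible choice with
    the fewest changes from `current`, or None if the root objective is lost."""
    names = leaves(tree, root)
    best = None
    for idxs in product(*(range(len(tree[n]["workflows"])) for n in names)):
        choice = dict(zip(names, idxs))
        if satisfied(tree, root, choice, alive) and fits(tree, choice):
            changes = sum(choice[n] != current[n] for n in names)
            if best is None or changes < best[0]:
                best = (changes, choice)
    return None if best is None else best[1]


current = {"localize": 0, "plan_route": 0}
alive = {"dev_b", "dev_c"}  # dev_a has failed
print(impacted(TREE, "navigate", current, alive))     # ['navigate', 'localize']
print(reconfigure(TREE, "navigate", current, alive))  # {'localize': 1, 'plan_route': 0}
```

Brute-force enumeration grows exponentially with the number of leaf objectives, so a larger deployment would more plausibly encode the same satisfaction and capacity constraints for a dedicated solver.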