Learning in Dynamic Environments

Two of the more common assumptions that applied machine learning researchers make when using an algorithm are that (1) the training and testing data are sampled from a fixed, albeit unknown, probability distribution, and (2) there are an equal number of samples from all classes. A violation of the former is referred to as concept drift (a.k.a. learning in non-stationary environments) when new data are presented over time, and a violation of the latter is known as class imbalance. Our group is developing novel incremental learning algorithms that explicitly address learning under concept drift and class imbalance, problems that have been largely understudied in the literature. Our current and future research includes developing novel neural-network-based algorithms for such learning scenarios.
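To make the two violated assumptions concrete, here is a minimal toy sketch in Python. It is not one of the group's algorithms. The nearest-mean classifier, the abrupt swap of class means halfway through the stream, the 10% minority fraction and every function name are assumptions chosen only for illustration. A model fit once on the first batch is compared with one that is refit on each new batch, and both are scored with balanced accuracy so that the minority class counts as much as the majority class.

```python
import random

def make_batch(rng, n, mean0, mean1, minority_fraction):
    """Draw n labelled 1-D points; class 1 is the (rare) minority class."""
    batch = []
    for _ in range(n):
        label = 1 if rng.random() < minority_fraction else 0
        mean = mean1 if label == 1 else mean0
        batch.append((rng.gauss(mean, 1.0), label))
    return batch

def fit_means(batch):
    """Estimate one mean per class (a nearest-mean classifier)."""
    means = {}
    for label in (0, 1):
        xs = [x for x, y in batch if y == label]
        means[label] = sum(xs) / len(xs)
    return means

def predict(means, x):
    """Assign x to the class whose estimated mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))

def balanced_accuracy(means, batch):
    """Average of per-class recalls, so the minority class counts equally."""
    recalls = []
    for label in (0, 1):
        xs = [x for x, y in batch if y == label]
        hits = sum(predict(means, x) == label for x in xs)
        recalls.append(hits / len(xs))
    return sum(recalls) / len(recalls)

def run_stream(seed=0, n_batches=10, batch_size=500):
    """Test-then-train over a stream whose class means swap halfway through."""
    rng = random.Random(seed)
    static_scores, adaptive_scores = [], []
    static = adaptive = None
    for t in range(n_batches):
        # Hypothetical abrupt drift: the two class means trade places.
        mean0, mean1 = (0.0, 4.0) if t < n_batches // 2 else (4.0, 0.0)
        batch = make_batch(rng, batch_size, mean0, mean1, minority_fraction=0.1)
        if t == 0:
            static = adaptive = fit_means(batch)  # both start from batch 0
            continue
        static_scores.append(balanced_accuracy(static, batch))
        adaptive_scores.append(balanced_accuracy(adaptive, batch))
        adaptive = fit_means(batch)  # only the adaptive model is updated
    return static_scores, adaptive_scores

if __name__ == "__main__":
    static_scores, adaptive_scores = run_stream()
    for t, (s, a) in enumerate(zip(static_scores, adaptive_scores), start=1):
        print(f"batch {t}: static={s:.2f} adaptive={a:.2f}")
```

In this toy setting the static model should stay accurate until the means swap and then fail, while the refit model should recover one batch after the change. Refitting from scratch on the latest batch is the crudest possible form of adaptation and discards everything learned earlier, which is part of what incremental learning algorithms aim to avoid.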

Selected Publications

Large Scale Feature Selection

An ever-increasing number of applications generate massive amounts of high-dimensional data: not only is the number of samples growing rapidly, but so is the number of features per sample. Such applications include the analysis of data generated by social networks, media networks, blogs, healthcare informatics and genomics, to name a few. Unfortunately, not all of the features in the data are informative or meaningful, and we typically do not know in advance which ones are. Therefore, we need algorithms that extract the meaningful and informative features while remaining scalable to massive data sets. Our research in scalable feature selection is developing new algorithms to cope with such data.
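As a rough illustration of what extracting informative features can mean in practice, the sketch below ranks discrete features by their empirical mutual information with the class label and keeps the top k. This is a generic filter-style baseline, not the group's method. The synthetic data (one informative feature plus 20 pure-noise features) and all function names are assumptions made for the example.

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

def top_k_features(rows, labels, k):
    """Score every feature column independently and return the k best indices."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores

def make_data(seed=0, n=1000, n_noise=20):
    """Synthetic data: feature 0 agrees with the label 90% of the time."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        informative = y if rng.random() < 0.9 else 1 - y
        noise = [rng.randint(0, 1) for _ in range(n_noise)]
        rows.append([informative] + noise)
        labels.append(y)
    return rows, labels

if __name__ == "__main__":
    rows, labels = make_data()
    selected, scores = top_k_features(rows, labels, k=1)
    print("selected feature indices:", selected)
```

Because each feature is scored independently, the scoring loop can be split across workers or machines, which is one reason filter methods are a common starting point at scale. The trade-off is that independent scoring ignores redundancy and interactions between features.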

Selected Publications

Machine Learning in Cybersecurity

Advances in multi-core computing systems, networking, mobile and smart devices, complex software and the Internet have enabled the development of revolutionary capabilities that have served many fields. However, along with these advances, vulnerabilities in computing systems that stem from a failure to enforce the semantics of computation have led to attacks that are ever increasing in number and sophistication, resulting in heavy financial losses. Our research group is developing online and adaptive algorithms, known as self-protective agents, that monitor activity on a network running many applications. Our research also focuses on adversarial machine learning, which has broader impacts in cybersecurity.

Selected Publications

Finding Insights in Life Sciences with Machine Learning

Our group's research applies machine learning algorithms developed in the lab to metagenomics and other areas of the life sciences. Metagenomics is the study of genetic material obtained directly from an environmental sample, which means that the genetic material of all of the organisms in the sample is sequenced. We have applied our feature selection expertise to 16S and metagenomic data to help microbial ecologists determine the protein families and microorganisms that best differentiate between multiple phenotypes within an environmental study.

We have also addressed the problem of inferring sparse time-varying networks from a set of under-sampled measurements. Specifically, we proposed the Approximate Kernel RecONstruction (AKRON) Kalman filter to reconstruct these time-varying networks from data collected from the different life stages of a fruit fly.

Selected Publications