Time series, understood as sequences of data points ordered in time, are a prevalent data structure in many application areas, including telecommunications, finance, biomedicine, transportation and energy. To extract valuable information from these data sources and obtain predictive insights, the community has devoted considerable effort to machine learning models specifically tailored to time series. Among the associated tasks, prediction, classification, clustering, motif discovery and anomaly detection have played a central role in data science. The JRL has joined this upsurge of research by developing distance-based classification and clustering approaches for time series data, in which the similarity measure between time series can be designed to boost the performance of the model by making the learning task insensitive to irrelevant characteristics of the series (e.g. non-linear distortions and lags over time). In particular, we focus on similarity measures for online settings, namely data-intensive scenarios where computational resources are severely limited and similarity measures must therefore be computed incrementally.
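As an illustration of a similarity measure that ignores lags, the sketch below compares the Euclidean distance with dynamic time warping (DTW), a classic elastic measure. DTW is used here only as an assumed example; the text above does not name the specific measures developed by the JRL, and the function names and toy series are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance; sensitive to shifts along the time axis."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Dynamic time warping with absolute difference as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again with b[j-1]'s predecessor
                cost[i][j - 1],      # b[j-1] matched again with a[i-1]'s predecessor
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


# Toy data (hypothetical): the same bump, delayed by one time step.
series = [0, 0, 1, 2, 1, 0, 0]
lagged = [0, 0, 0, 1, 2, 1, 0]

print(euclidean_distance(series, lagged))  # penalizes the lag
print(dtw_distance(series, lagged))        # warps the time axis to absorb it
```

This batch version fills the whole cost matrix, so it is not suited to the online setting as written. An incremental variant would have to update the matrix as each new sample arrives instead of recomputing it.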
The term Big Data has gained momentum over the last decade, owing to the feasibility of collecting data from almost any source and analyzing it to obtain data-based insights that enable cost and time reductions, new product developments, optimized offerings and smart decision making, among other benefits. In these Big Data scenarios, several characteristics play a relevant role: it is not feasible to store the whole dataset, traditional algorithms cannot handle data produced at high rates, and changes in the data distribution may occur during the learning process. An increasing number of applications rely on training data that arrive continuously (stream learning) and are deployed in real scenarios such as mobile phones, sensor networks, industrial process control and intelligent user interfaces. Some of these applications produce non-stationary data streams, which are becoming increasingly prevalent and in which the process generating the data may change over time, altering the patterns to be modeled (concept drift). As a result, predictive models trained on such streaming data become obsolete and fail to adapt to the new distribution. Especially in online learning scenarios, where only a single sample is provided to the learning algorithm at each time instant, there is a pressing need for new algorithms that adapt to these changes as quickly as possible while maintaining good performance. Online learning in the presence of concept drift has been a very active topic in recent years, and it remains under active debate in the community because of its numerous open challenges.
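To make concept drift concrete, the following sketch monitors a stream of per-sample prediction errors with the Page-Hinkley test, a standard change detector. It is an assumed example; the text above does not say which detectors or learners the JRL uses. The class name, the parameter values `delta` and `threshold`, and the synthetic error stream are all hypothetical.

```python
class PageHinkley:
    """Minimal Page-Hinkley detector for an upward shift in a stream's mean."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated magnitude of change
        self.threshold = threshold  # alarm level for the test statistic
        self.n = 0
        self.mean = 0.0             # running mean of the observed values
        self.cum = 0.0              # cumulative deviation from the running mean
        self.min_cum = 0.0          # smallest cumulative deviation seen so far

    def update(self, x):
        """Consume one sample; return True if drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


# Synthetic stream (hypothetical): the model is always right for 100 samples,
# then the concept changes at t = 100 and it is always wrong.
errors = [0.0] * 100 + [1.0] * 100

detector = PageHinkley()
alarms = [t for t, e in enumerate(errors) if detector.update(e)]
print("first alarm at t =", alarms[0])  # a few samples after the change
```

The detector processes one sample at a time and keeps only four numbers of state, which matches the single-sample, limited-memory setting described above. In practice an alarm would trigger some adaptation step, such as resetting or retraining the model, which this sketch leaves out.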
Optimization problems receive much attention in the current scientific community. Several kinds of optimization can be distinguished, such as continuous, linear, combinatorial or numerical. Solving the problems that arise in this field usually demands great intellectual and computational effort. Additionally, many optimization problems are readily applicable to, or directly drawn from, real-world situations. For these reasons, many different methods have been proposed to date to tackle these problems.
The team that composes the JRL has wide experience in the design and development of both heuristics and metaheuristics, specifically those that fall within the bio-inspired paradigm. Bio-inspired, or nature-inspired, methods enjoy great popularity in the current scientific community and are the focus of many research contributions in the literature every year. The momentum acquired by this broad family of methods stems from the outstanding performance they have shown across hundreds of research fields and problem instances. Many different sources of inspiration can be found for these solvers, such as the behavioral patterns of bats, fireflies, corals, bees or cuckoos, as well as the mechanisms behind genetic inheritance, musical harmony composition or bacterial foraging.
To date, the JRL has addressed several application fields from the perspective of bio-inspired optimization, including transportation, logistics, graph mining, traffic forecasting and robotics.
Lifelong Machine Learning (LML), also known as Lifelong Learning, Continual Learning or Continuous Learning, is an emerging paradigm that has gained momentum in recent years for two main reasons. On the one hand, LML is closer to the human learning process than traditional Machine Learning. While the latter carries out “isolated learning”, generating a model from a given dataset without considering previously learned knowledge or retaining it for future use, LML retains the knowledge learned in the past and uses it to help future learning, accumulating and exploiting this past knowledge as humans do. On the other hand, LML considers the world in a more realistic way: the world is often too complex and composed of too many tasks, labeling real training data is usually infeasible because it is very labor-intensive and time-consuming, the world is constantly evolving, and real applications may generate huge volumes of data that are difficult to handle. Under these circumstances, accumulating past knowledge and being able to exploit it is a key part of the LML process, not only for helping future learning, but also for collecting and labeling training data (self-supervision) and for discovering new tasks to learn (self-motivation), so as to achieve autonomy in learning. Humans are very good at learning based on their prior knowledge, and this is what LML tries to imitate. Besides, forgetting is also necessary. Just as humans show the remarkable ability to learn different tasks without any of them negatively interfering with the others, LML approaches should retain past knowledge without overwriting it or degrading the performance of the model.
Adversarial Machine Learning (AML) is an emerging research field that blends machine learning and cybersecurity. Specifically, AML aims at designing robust machine learning algorithms capable of handling data instances (examples) that have been intelligently manipulated to mislead the algorithms so that the attacker obtains some benefit. Machine learning algorithms are increasingly incorporated into our everyday lives, as exemplified by automated medical diagnosis, virtual personal assistants, fraud detection and self-driving cars, among other scenarios. In these applications, sensitive data are generated and fed to machine learning models, which are prone to adversarial attacks. To prevent models from being fooled, new learning algorithms must be carefully designed to minimize their vulnerability. To this end, the JRL is currently conducting research aimed at proposing new learning procedures that are robust against adversarial inputs.
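The kind of manipulation described above can be sketched with the fast gradient sign method (FGSM) applied to a tiny logistic regression model. This is an assumed textbook attack chosen for illustration, not a description of the JRL's own procedures. The weights, the input and the budget `eps` are hypothetical toy values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Move each feature by eps in the direction that increases the loss."""
    p = predict_proba(w, b, x)
    # Gradient of the logistic loss with respect to the input x.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Toy model and input (hypothetical values).
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.2], 1

x_adv = fgsm(w, b, x, y, eps=0.2)
print(predict_proba(w, b, x) > 0.5)      # clean input: classified as class 1
print(predict_proba(w, b, x_adv) > 0.5)  # perturbed input: decision flips
```

A change of at most `eps` per feature is enough to flip the decision of this model. Robust learning procedures aim to reduce exactly this sensitivity.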