KEG Research

Domains of interest

Data Engineering

Practical Approaches:

Data Mining application prototypes for both structured data (forecasting time series, generating synthetic data, assisted medical diagnosis, spam detection, signature recognition) and unstructured data (knowledge extraction from IoT big data, opinion mining, community detection, semi-supervised text labelling, contradiction detection).
Business Intelligence application prototypes dealing with heterogeneous data integration through ontology-driven, (semi-)automatic design of unified data structures and automatic design of the corresponding ETL processes.

In the field of Knowledge Extraction from Data, we proposed techniques for:

Handling incomplete records, eliminating irrelevant or redundant information, and managing imbalanced class distributions and error costs.
Addressing complexities associated with Big Data, including algorithm selection, model optimization, and performance metric identification.
Forecasting time series data, generating artificial datasets, and performing schema mapping and data fusion.
Performing context-sensitive information retrieval from unstructured sources, as well as community detection and opinion mining.

The proposed techniques have been applied for developing solutions in various areas:

Near real-time sensor data processing: We designed a near real-time sensor management solution, focusing on technological specifics to ensure efficient data handling and processing for rapid decision-making.
Recommendation Systems: Our scalable recommendation system leverages machine learning to forecast device usage based on historical sensor data. We have also developed context-sensitive, semantic-driven recommendation systems tailored for applications like online advertising and tourism.
IoT device usage characterisation: Using machine learning, we created a novel solution to characterize IoT device usage patterns, enabling more efficient energy resource management through accurate usage profiling.
Forecasting time series data: Our forecasting methods support time series data with varying levels of intermittency, tested on both real-world and synthetically generated datasets.
Topic Extraction and Representation: Identifying the topic polarity in a given document; projecting (very) large (un)structured data onto relevant dimensions and providing representations that allow knowledge extraction.
Community Detection: Identifying clusters from implicit and/or explicit connections; community detection in social data; opinion-driven community detection.
User Profiling: We develop methods to group individuals by shared characteristics, uncover patterns across various profiles, and predict trends and behaviors, with applications in educational domains and IoT usage analytics.
Contradiction Detection: Opinion mining-driven contradiction detection.
Medical Decision Support Systems: Assisting medical diagnosis in prostate cancer and rheumatoid diseases.
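
As an illustration of forecasting intermittent time series (series in which many periods are zero), the sketch below implements Croston's method, a classical baseline for this kind of data. It is not the forecasting method developed by the group, which is not specified here; the function name croston_forecast and the smoothing parameter alpha are illustrative assumptions.

```python
def croston_forecast(demand, alpha=0.1):
    """Croston's method: forecast the average value per period for an
    intermittent series by smoothing the non-zero sizes and the intervals
    between non-zero observations separately.

    Assumption: `demand` is a sequence of non-negative numbers, one per period.
    """
    size = None        # smoothed size of non-zero observations
    interval = None    # smoothed number of periods between non-zero observations
    periods_since = 1  # periods elapsed since the last non-zero observation

    for value in demand:
        if value > 0:
            if size is None:
                # First non-zero observation initialises both estimates.
                size = float(value)
                interval = float(periods_since)
            else:
                # Exponential smoothing, updated only on non-zero periods.
                size += alpha * (value - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1

    if size is None:
        return 0.0  # no non-zero observation was seen
    return size / interval


# Hypothetical usage: a sensor that reports activity every third period.
usage_rate = croston_forecast([0, 0, 3, 0, 0, 3, 0, 0, 3], alpha=0.5)
```

Separating the size and interval estimates keeps long runs of zeros from dragging the size estimate down, which is why this baseline is often preferred over plain exponential smoothing for intermittent data.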

Natural Language Processing

Practical Approaches:

Employing state-of-the-art Deep Learning methods for various NLP tasks such as text classification, named entity recognition, and text generation.
Our work covers both scientific research and practical real-world problems, with our proposed solutions scaled and applied in industry.

Scientific topics of interest:

Enhancing and studying the logical reasoning capabilities of Large Language Models (LLMs) for Natural Language Inference (NLI) tasks in an unsupervised manner.
Exploring the cross-lingual capabilities of language models in low-data settings, increasing the degree of transfer at the language level and the task level.
Leveraging the inherent explanations generated by Large Multimodal Models (LMMs) for Autonomous Driving tasks, considering scene understanding, planning, and behavioral reasoning.
Bias detection and mitigation in image generation for LMMs, considering multiple types of bias such as gender, religious, or racial bias.
Knowledge distillation, employing smaller efficient models in order to limit training and inference costs, mitigating environmental impact.
Investigating explainability to enhance model transparency and improve robustness.
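
To make the knowledge distillation topic above concrete, the sketch below computes the commonly used temperature-scaled distillation loss: the KL divergence between the softened teacher and student output distributions. It is a minimal illustration under stated assumptions, not the group's training setup; the function names and the default temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The result is multiplied by temperature**2, a common convention that keeps
    gradient magnitudes comparable across temperatures.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return temperature ** 2 * kl


# Hypothetical usage: the student disagrees with the teacher on class ranking.
loss = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In practice this term is usually combined with the ordinary cross-entropy loss on the ground-truth labels; the weighting between the two is a tuning choice.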

Prototypes and solutions delivered for real-world problems:

AI Assistant for understanding and classifying multilingual online recipes for the cooking domain, the solution being used for smart ovens.
Sentiment Analysis for client conversations in the banking domain, focusing on the Romanian language.
AI Home Assistant for Romanian, trained on a wide range of tasks and intents.

Neuroscience

Practical Approaches:

Machine learning solutions for the analysis of neuronal data (including traditional and deep learning approaches) to further our understanding of the brain.

In the field of computational neuroscience, we proposed techniques for:

Spike sorting: Identifying individual neuron activity from extracellular recordings by isolating and classifying action potentials (spikes). This process is critical for understanding neural communication patterns and is often the first step in analyzing data from electrophysiological studies. Our group has designed and/or validated various methods for spike detection (source separation using GANs), feature extraction (weighted PCA, autoencoders, Superlet transform) and clustering (community detection-based methods, new methods).
Burst detection: Recognizing sequences of rapid spikes occurring closely in time, which indicate bursts of neuronal activity. These bursts are thought to play key roles in signal transmission and information processing within neural circuits. Our group has designed a new method for the detection of bursts.
LFP data analysis: Analyzing large-scale, coordinated fluctuations in local field potentials (LFPs), which reflect synchronized neural activity across populations of neurons. Our group has designed methods based on symbolic analysis and microstate analysis for the detection of anaesthesia in LFP data.
EEG data analysis: The study of brain activity patterns through processing and interpreting electrical signals recorded from the scalp, which helps to understand neural dynamics, cognitive processes, and brain disorders by examining signal features like frequency, power, and event-related changes. Our group has designed symbolic analysis-based methods for the interpretation of EEG data and GAN-based methods for the enhancement of classification.
Functional networks: Mapping and examining networks of brain regions or neurons based on their activity correlations. Functional networks offer insights into how different parts of the brain interact dynamically to perform complex tasks and are essential for studying brain connectivity and functionality. Within this direction of research, our group has focused on the study of graph neural networks: we have analysed methods for the extraction of relevant information from complete weighted graphs and approaches for inferring graphs from unstructured data using graph neural networks.
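
The burst detection task described above can be illustrated with a simple inter-spike-interval (ISI) threshold rule: a burst is a run of consecutive spikes whose gaps all stay below a maximum ISI. This is a generic textbook-style sketch, not the new method designed by the group; the function name and the thresholds max_isi and min_spikes are illustrative assumptions.

```python
def detect_bursts(spike_times, max_isi=0.01, min_spikes=3):
    """Return a list of (start_time, end_time, n_spikes) tuples, one per burst.

    A burst is a maximal run of at least `min_spikes` spikes in which every
    inter-spike interval is at most `max_isi` (same time unit as `spike_times`).
    """
    times = sorted(spike_times)
    bursts = []
    run_start = 0  # index of the first spike in the current run

    for i in range(1, len(times) + 1):
        # Close the current run at the end of the data or when the gap is too long.
        if i == len(times) or times[i] - times[i - 1] > max_isi:
            n_spikes = i - run_start
            if n_spikes >= min_spikes:
                bursts.append((times[run_start], times[i - 1], n_spikes))
            run_start = i

    return bursts


# Hypothetical spike train in seconds: two tight groups and one isolated spike.
spikes = [0.100, 0.105, 0.109, 0.300, 0.500, 0.504, 0.508, 0.511]
bursts = detect_bursts(spikes, max_isi=0.01, min_spikes=3)
```

Fixed thresholds like these are sensitive to the firing rate of the recorded neuron, which is one reason adaptive burst detection methods are an active research topic.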

For the neuroscience domain, our group collaborates with the Transylvanian Institute of Neuroscience (TINS), specifically with Dr. Eng. Raul-Cristian Muresan and Dr. Eng. Vlad Vasile Moca.

Projects

SEArCH - Adaptive eLearning Systems using Concept Maps

National grant funded by CNMP Program 4: Research partnership for priority domains (2008-2011)

Homepage: http://search.utcluj.ro/

The goal of the project is to define a model of an adaptive e-learning environment using Concept Maps. Adaptive e-learning systems are the newest paradigm in modern learning approaches. Adaptive presentation refers to content segmentation and management according to the student's particularities and goals, and is based on identifying the user's type. One of the key factors in such systems is the correct and continuous identification of the user's learning style, so that the most appropriate content presentation can be provided to each individual user. This starts with an initial evaluation of the user to identify their learning style and level of expertise. Based on those measurements, the content is presented according to the user's type, providing an initial curriculum segmentation and an adapted presentation. During the learning process, the user evaluation is continuously refined based on dynamic, ongoing measurements, in an attempt to best fit the user's particular needs. Thus, the model's ultimate goal is to correctly identify the user's type and to continuously adapt the content (both in quantity and difficulty) accordingly. So far, we have investigated various ways of identifying the initial user typology based on static features. We proposed two solutions: using a Bayesian network, and employing a clustering method to determine the different groups of learning typologies, corresponding to the theoretical learning styles present in the literature, based on the (psycho-pedagogical) pretest.
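
The clustering-based solution mentioned above can be sketched as follows. The project description does not name the clustering algorithm, so plain k-means is assumed here purely for illustration; the two pretest dimensions and the initial centroids are hypothetical.

```python
def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) starting from the given centroids.

    Returns (labels, centroids), where labels[i] is the index of the cluster
    assigned to points[i].
    """
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: attach each learner to the nearest centroid.
        for i, point in enumerate(points):
            labels[i] = min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(point, centroids[k])),
            )
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]

    return labels, centroids


# Hypothetical pretest scores per learner on two assumed dimensions.
pretest_scores = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centroids = kmeans(pretest_scores, initial_centroids=[(1.0, 0.0), (0.0, 1.0)])
```

Each resulting cluster would then be matched against a theoretical learning style; how that matching is done is outside the scope of this sketch.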

ArhiNet - Integrated System for Developing Semantically-Enhanced Archive Content

National grant funded by CNMP Program 4: Research partnership for priority domains (2007-2010)

Homepage: http://coned.utcluj.ro/ARHINET/arch.html

This inter-disciplinary project addresses the study, development and management of interactive e-content for digital enhancement of cultural heritage. The project aims at the study and development of an integrated system for creating and managing archival content based on semantic enhancements. The domain ontology-enhanced content allows for semantically relevant information retrieval. The project also aims at the development of an information mining subsystem and reasoning mechanisms to identify new correlations that will be added to the domain knowledge.

IntelPro - Intelligent System for Assisting the Therapeutic Decision at Patients with Prostate Cancer

National research grant funded by ANCS, CEEX - INFOSOC (2005-2008)

Homepage: http://cv.utcluj.ro/intelpro/

The goal of our task in the project was to provide robust solutions that can be used to assist physicians in the diagnosis of prostate cancer, or as support in the learning process. The data-mining system speeds up the diagnosis process and improves the accuracy of the diagnosis. The system could be extended to suggest possible treatments or courses of action in a particular case. It is not intended to replace the physician, but to support them. The developed components aim to tackle some of the particularities involved in mining medical problems. Although the techniques we have adopted so far are aimed at "solving" prostate cancer problems, they are not restricted to this field. The methods can be extended to different medical problems, or applied even further afield, in areas such as loan applications and oil-slick detection.

GridMOSI - Virtual Organization Using Grid Technology for High Performance Modeling, Simulation and Optimization

National research grant funded by ANCS, CEEX (2005-2008)

Homepage: http://wiki.gridmosi.ro/wiki/GridMOSI:Info

Datasets

Movie Reviews and Product Reviews - Amazon

The archive contains the following three datasets: Product Reviews, Movie Reviews, and Polarity Assignment Test Data, all containing data from amazon.com. Partial annotation performed by Alexandru Cristian Cosma, Vlad Vasile Itu, and Darius Suciu, 2014. Further information can be found in the Readme.txt files in the archive folders.

Datasets employed in: Alexandru Cristian Cosma, Vlad Vasile Itu, and Darius Suciu, "Unsupervised domain independent opinion extraction", awarded first prize at the Computer Science Students Conference 2014, CS Department, Technical University of Cluj-Napoca.

Download: Amazon Reviews Sentiment Analysis

Movie Reviews (Romanian)

The data has been manually collected from four different Romanian movie sites/blogs: filme-carti.ro, cineblog.info, procinema.ro, and filmblog.ro.

The reviews have been divided into two classes: positive and negative. The dataset contains 1000 documents: 500 positive and 500 negative. The data has been manually annotated for the task of sentiment analysis.

Datasets employed in: Roxana Russu and Oana Luminita Vlad, "Applying Opinion Mining Learning Techniques for Romanian Language", mention at the Computer Science Students Conference 2014, CS Department, Technical University of Cluj-Napoca.

Download: Movie Reviews Romanian