The Knowledge Engineering Group (KEG) addresses the problems of knowledge extraction, representation, storage and management that the information era has brought to various areas of human activity as a result of data overload.

Fundamental theoretical aspects: problem-specific feature extraction from both structured and unstructured data, pre-processing techniques for handling noisy and/or incomplete data, and learning from balanced/unbalanced and structured/unstructured data.

Practical approaches:
  • Data Mining application prototypes for both structured data (assisted medical diagnosis, spam detection, signature recognition) and unstructured data (topic extraction, opinion mining, community detection, semi-supervised text labelling, contradiction detection).
  • Business Intelligence application prototypes dealing with heterogeneous data integration through ontology-driven, (semi-)automatic design of unified data structures and automatic design of the corresponding ETL processes.
In the field of knowledge extraction from data, we have proposed techniques for:
  • handling incomplete records, irrelevant and/or redundant information, imbalanced class distributions and error costs
  • identifying the right performance metric for a given context; algorithm and model selection
  • schema mapping and data fusion
  • context-sensitive information retrieval (IR) from unstructured sources
  • community detection and opinion mining
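To illustrate why the choice of performance metric matters under class imbalance and error costs, here is a minimal sketch. It is not the group's method: the data, the two classifiers and the cost values (1 for a false positive, 10 for a false negative) are hypothetical, chosen only to show how accuracy and expected cost can disagree.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.fn + self.tn


def confusion(y_true, y_pred, positive=1) -> Confusion:
    """Count the four confusion-matrix cells for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return Confusion(tp, fp, fn, tn)


def accuracy(c: Confusion) -> float:
    return (c.tp + c.tn) / c.total


def balanced_accuracy(c: Confusion) -> float:
    """Mean of the true positive rate and the true negative rate."""
    tpr = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    tnr = c.tn / (c.tn + c.fp) if (c.tn + c.fp) else 0.0
    return (tpr + tnr) / 2


def expected_cost(c: Confusion, cost_fp: float, cost_fn: float) -> float:
    """Average misclassification cost per instance (hypothetical costs)."""
    return (c.fp * cost_fp + c.fn * cost_fn) / c.total


# Hypothetical imbalanced data set: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
pred_a = [0] * 100                               # always predicts the majority class
pred_b = [1] * 4 + [0] + [1] * 10 + [0] * 85     # finds 4 of 5 positives, 10 false alarms

c_a = confusion(y_true, pred_a)
c_b = confusion(y_true, pred_b)

for name, c in (("A", c_a), ("B", c_b)):
    print(name,
          f"accuracy={accuracy(c):.2f}",
          f"balanced_accuracy={balanced_accuracy(c):.3f}",
          f"expected_cost={expected_cost(c, cost_fp=1, cost_fn=10):.2f}")
```

In this sketch the majority-class baseline A has the higher accuracy (0.95 against 0.89), yet classifier B has the lower expected cost (0.2 against 0.5) once a missed positive is assumed to cost ten times more than a false alarm.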
The proposed techniques have been applied to develop prototype solutions in various areas:
  • Recommendation systems - context-sensitive, semantics-driven recommendation systems for online advertising and tourism
  • Topic extraction and representation - identifying the topic polarity in a given document; projecting (very) large (un)structured data onto relevant dimensions and providing representations that allow knowledge extraction
  • Community detection - identifying clusters from implicit and/or explicit connections; community detection in social data; opinion-driven community detection
  • User profiling - finding groups of individuals with similar features, finding/defining patterns for various profiles, predicting trends and future behaviour, applied to the educational domain
  • Contradiction detection - opinion-mining-driven contradiction detection
  • Medical decision support systems - assisting medical diagnosis in prostate cancer and rheumatoid diseases


SEArCH - Adaptive eLearning Systems using Concept Maps
National grant funded by CNMP Program 4: Research partnership for priority domains (2008-2011)
The goal of the project is to define a model of an adaptive e-learning environment using Concept Maps. Adaptive e-learning systems are a recent paradigm in modern learning approaches. Adaptive presentation refers to content segmentation and management according to the student's particularities and goals, and is based on identifying the user's type. A key factor in such systems is the correct and continuous identification of the user's learning style, so that the most appropriate content presentation can be provided to each individual user. This is achieved through an initial evaluation of the user, which identifies learning style and level of expertise. Based on those measurements, the content is presented according to the user's type, providing an initial curriculum segmentation and an adapted presentation. During the learning process, the user evaluation is continuously refined from dynamic, ongoing measurements, in an attempt to best fit the user's particular needs. Thus, the ultimate goal of the model is to correctly identify the user's type and to continuously adapt the content (both in quantity and difficulty) to that type. So far, we have investigated various ways of identifying the initial user typology based on static features. We proposed two solutions: one using a Bayesian network, and one employing a clustering method to determine the different groups of learning typologies, corresponding to the theoretical learning styles described in the literature, based on the (psycho-pedagogical) pretest.
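The clustering-based solution can be pictured with a minimal sketch. The text does not name the clustering method or the pretest features, so everything below is an assumption made for illustration: plain k-means with a deterministic farthest-first initialisation, and two hypothetical pretest scores per learner in the range 0 to 1.

```python
def dist2(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic initialisation: start from the first point, then
    repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, iterations=50):
    """Plain k-means; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each learner joins the nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return labels, centroids


# Hypothetical pretest results: (score on style dimension 1, score on dimension 2).
pretest = [
    (0.90, 0.10), (0.80, 0.20), (0.85, 0.15),
    (0.10, 0.90), (0.20, 0.80), (0.15, 0.85),
    (0.50, 0.50), (0.55, 0.45), (0.45, 0.55),
]

labels, centroids = kmeans(pretest, k=3)
for learner, label in zip(pretest, labels):
    print(learner, "-> typology group", label)
```

Each resulting group would then have to be matched against a theoretical learning style by inspecting its centroid; that mapping is a modelling decision the sketch does not attempt.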

ArhiNet - Integrated System for developing semantically-enhanced archive content
National grant funded by CNMP Program 4: Research partnership for priority domains (2007-2010)
This interdisciplinary project addresses the study, development and management of interactive e-content for the digital enhancement of cultural heritage. The project aims to study and develop an integrated system for creating and managing archival content based on semantic enhancements. Content enhanced with the domain ontology allows semantically relevant information retrieval. The project also aims to develop an information mining subsystem and reasoning mechanisms that identify new correlations to be added to the domain knowledge.

IntelPro - Intelligent system for assisting the therapeutically decision at patients with prostate cancer
National research grant funded by ANCS, CEEX - INFOSOC (2005-2008)
The goal of our task in the project was to provide robust solutions that can be used to assist physicians in the diagnosis of prostate cancer, or as support in the learning process. The data-mining system speeds up the diagnosis process and improves the accuracy of the diagnosis. The system could be extended to suggest possible treatments or courses of action in a particular case. It is not intended to replace the physician, but to support them. The developed components attempt to tackle some of the particularities of mining medical data. Although the techniques we have adopted so far are aimed at "solving" prostate cancer problems, they are not restricted to this field. The methods can be extended to other medical problems, or applied further afield, in areas such as loan applications and oil-slick detection.

GridMOSI - Virtual Organization using Grid Technology for High Performance Modeling, Simulation and Optimization
National research grant funded by ANCS, CEEX (2005-2008)