Managing the Scarcity of Monitoring Data through Machine Learning in Healthcare Domain

Maxhuni, Alban

In the field of Ubiquitous Computing, a significant obstacle to building accurate machine learning models is the time-consuming effort required to gather labeled data for the learning algorithm. Moreover, demands for efficient data use are constantly growing. Researchers are therefore exploring machine learning techniques to overcome the problem of data scarcity. In healthcare, classification tasks require a ground truth normally provided by an expert physician, which typically yields a small set of labeled data alongside a larger set of unlabeled data. It is also common to rely on self-reported data collected through questionnaires; however, this introduces an extra burden on the user, who is not always able or willing to fill them in. Finally, in some healthcare domains it is important to provide an immediate response (feedback), even if the user is not familiar with the application. In all of these cases the amount of available data may be insufficient to produce reliable models. This thesis proposes a new approach specifically designed to address these challenges and produce better predictive models. We propose using our novel Intermediate Models to predict the mood variables associated with the questionnaire from data acquired from smartphones. We then use the predicted mood variables together with the rest of the data to predict the class; in our empirical assessment, the class is the mood state of a bipolar disorder patient or the stress level of an employee. The motivation behind this new approach is that related methods in the literature, such as latent variables, use intermediate information to help create better predictive models. These methods are used in the literature to complete missing data using the most common value, the most probable value given the class, or a model induced to predict missing values from all the information in the features and the class.
However, such latent variables are artificially created and used as intermediate information to build a better model. In our Intermediate Models, we know in advance how many mood variables to use and we have the information from these variables, which allows us to produce better models. To address scarce data, we propose applying a semi-supervised learning setting that takes advantage of all the available unlabeled data. In addition, we propose using transfer learning methods to improve learning performance while avoiding expensive data labeling efforts. To the best of our knowledge, few works have used transfer learning in healthcare applications to address the problem of limited labeled data. The proposed methods have been applied in two healthcare fields: mental health and human behaviour. This thesis addresses two classification problems: a) classifying the episodic state of bipolar disorder patients, and b) detecting work-related stress using data acquired from smartphone sensing modalities.
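
The two-stage idea behind the Intermediate Models described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the single sensor feature, the toy values, the least-squares regressor, and the nearest-centroid classifier are all assumptions chosen only to keep the example self-contained. Stage 1 learns to predict a questionnaire mood variable from smartphone data; stage 2 predicts the class from the sensor data together with the predicted mood variable.

```python
from statistics import mean


def fit_linear(xs, ys):
    """Least-squares fit of y = a * x + b for a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx


def predict_linear(model, x):
    a, b = model
    return a * x + b


def fit_centroids(rows, labels):
    """Nearest-centroid classifier: one mean feature vector per class."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return {c: [mean(col) for col in zip(*rs)] for c, rs in by_class.items()}


def predict_centroid(centroids, row):
    def sq_dist(c):
        return sum((u - v) ** 2 for u, v in zip(row, centroids[c]))
    return min(centroids, key=sq_dist)


def fit_intermediate(sensor, mood, labels):
    """Stage 1: sensor -> mood variable. Stage 2: (sensor, predicted mood) -> class."""
    mood_model = fit_linear(sensor, mood)
    # Train stage 2 on predicted (not self-reported) mood, so that training
    # matches deployment, where no questionnaire is available.
    rows = [[s, predict_linear(mood_model, s)] for s in sensor]
    return mood_model, fit_centroids(rows, labels)


def predict_intermediate(models, s):
    mood_model, centroids = models
    return predict_centroid(centroids, [s, predict_linear(mood_model, s)])


# Hypothetical toy data: one sensor feature (e.g. daily call minutes),
# one self-reported mood score, and a stress label per day.
sensor = [10, 20, 30, 40, 50, 60]
mood = [1.0, 2.5, 2.5, 4.0, 5.5, 5.5]
labels = ["low", "low", "low", "high", "high", "high"]

models = fit_intermediate(sensor, mood, labels)
low_day = predict_intermediate(models, 15)
high_day = predict_intermediate(models, 55)
```

At prediction time only the sensor value is needed, which reflects the goal of reducing the questionnaire burden on the user.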

Managing the Scarcity of Monitoring Data through Machine Learning in Healthcare Domain / Maxhuni, Alban. - (2017), pp. 1-180.