In a new paper, Pegah Alizadeh, professor at ESILV and member of the Devinci Research Center, proposes a new method to reduce the complexity of classification problems in machine learning.
The paper, titled “Clustering Approach to Solve Hierarchical Classification Problem Complexity”, has been published at the AAAI Conference on Artificial Intelligence.
The conference is ranked A* and has an acceptance rate of around 25%.
The authors are Pegah Alizadeh, a member of the Digital Group of the Devinci Research Center, and two other researchers from the LIPN laboratory of Sorbonne Paris Nord University.
Applying classification knowledge to the Sussex Huawei Locomotion dataset
Classification is a common supervised learning problem in machine learning: the process of classifying, distinguishing, and distributing kinds of “things” into different groups.
When the classification domain is very large, or when the data are complex (signals, texts, or genomes, for example), the usual machine learning or deep learning approaches are not enough for the classification.
In this work, we focus specifically on the problem of human activity recognition on the Sussex Huawei Locomotion (SHL) dataset.
The SHL dataset is a versatile annotated dataset of the modes of locomotion and transportation of mobile users. It was recorded over seven months by three participants in a real-life setting in the United Kingdom and covers eight modes: still (standing or sitting), walking, running, biking, car, bus, train, and subway.
Based on our experience in similar cases, learning separable spaces between groups of classes (concepts) is easier than learning each class alone.
For example, it is easier to first learn to separate the body-movement group of activities (running, walking) from the “on-wheels” group (bicycling, driving a car).
We then focus on each group separately and learn the more specific classes within it.
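The two-step idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the grouping into "body" and "wheels", the two features, and the toy values are made up, and a nearest-centroid rule stands in for whatever classifier the paper actually uses.

```python
from statistics import fmean

# Hypothetical grouping of four activities into two concepts (assumption).
GROUPS = {"body": ["walk", "run"], "wheels": ["bike", "car"]}
CLASS_TO_GROUP = {c: g for g, classes in GROUPS.items() for c in classes}


def centroids(samples, labels):
    """Return the mean feature vector of each label."""
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    return {y: tuple(fmean(col) for col in zip(*xs)) for y, xs in by_label.items()}


def nearest(cents, x):
    """Return the label whose centroid is closest to x (squared Euclidean)."""
    return min(cents, key=lambda y: sum((a - b) ** 2 for a, b in zip(cents[y], x)))


def fit(samples, labels):
    """Fit one coarse model over the groups and one fine model per group."""
    coarse = centroids(samples, [CLASS_TO_GROUP[y] for y in labels])
    fine = {}
    for group, classes in GROUPS.items():
        idx = [i for i, y in enumerate(labels) if y in classes]
        fine[group] = centroids([samples[i] for i in idx], [labels[i] for i in idx])
    return coarse, fine


def predict(model, x):
    """Pick the group first, then the class inside that group."""
    coarse, fine = model
    group = nearest(coarse, x)
    return group, nearest(fine[group], x)


# Toy features, invented for the example: (step rate, speed).
X = [(2.0, 1.0), (2.1, 1.2), (3.0, 3.0), (3.1, 3.2),
     (0.2, 5.0), (0.3, 5.5), (0.0, 8.0), (0.1, 8.5)]
y = ["walk", "walk", "run", "run", "bike", "bike", "car", "car"]

model = fit(X, y)
print(predict(model, (2.9, 2.8)))  # group first, then class within the group
```

Each fine model is trained only on the samples of its own group, which is one way a hierarchy can reduce the number of instances needed per concept.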
An intuitive approach is to enumerate every possible hierarchy of classes and choose the one with the best performance. However, the theoretical analysis in this paper shows that finding an exact solution this way has a high complexity.
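To give a feel for how quickly exhaustive search grows, the sketch below counts candidate hierarchies under a simplifying assumption: only binary hierarchies are considered, that is, rooted binary trees whose leaves are the classes. Their number is the double factorial (2n - 3)!! for n classes. This is a standard counting result used here for illustration; it is not the complexity analysis given in the paper.

```python
def count_binary_hierarchies(n_classes: int) -> int:
    """Number of rooted binary trees with n labelled leaves: (2n - 3)!!."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n_classes - 2, 2):
        count *= k
    return count


for n in (2, 4, 8):
    print(n, count_binary_hierarchies(n))
```

For the eight SHL classes this already gives 135135 binary hierarchies, and each candidate would require training and evaluating its own set of classifiers.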
A new approach to perform classification
For this reason, this paper proposes an original approach that combines clustering and classification to overcome this limitation.
It offers a better way to learn the concepts: grouping classes recursively rather than treating them one by one.
It also introduces an effective greedy algorithm and two theoretical measures, cohesion and dispersion, to evaluate the connection between the clusters and the classes. Extensive experiments on the Sussex Huawei Locomotion (SHL) dataset show that this approach improves classification performance while reducing the number of instances used to learn each concept.
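The recursive grouping of classes can be sketched as follows. This is a hypothetical illustration, not the paper's greedy algorithm: each class is summarized by a made-up centroid, and a simple farthest-pair split stands in for the clustering step. The cohesion and dispersion measures are not implemented, since their definitions are not given here.

```python
from math import dist


def split(classes, centroids):
    """Split classes into two groups around the two most distant centroids."""
    a, b = max(
        ((p, q) for i, p in enumerate(classes) for q in classes[i + 1:]),
        key=lambda pq: dist(centroids[pq[0]], centroids[pq[1]]),
    )
    # b is always kept on the right so that neither side is ever empty.
    left = [c for c in classes
            if c != b and dist(centroids[c], centroids[a]) <= dist(centroids[c], centroids[b])]
    right = [c for c in classes if c not in left]
    return left, right


def build_hierarchy(classes, centroids):
    """Recursively group classes; a leaf is a single class name."""
    if len(classes) == 1:
        return classes[0]
    left, right = split(classes, centroids)
    return (build_hierarchy(left, centroids), build_hierarchy(right, centroids))


# Made-up class centroids in a two-dimensional feature space (assumption).
CENTROIDS = {
    "walk": (2.05, 1.1),
    "run": (3.05, 3.1),
    "bike": (0.25, 5.25),
    "car": (0.05, 8.25),
}

tree = build_hierarchy(list(CENTROIDS), CENTROIDS)
print(tree)
```

With these toy centroids the recursion first separates walking and running from biking and driving, then splits each pair, which mirrors the body-movement versus on-wheels example.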