Description

The Gait Analysis (GA) toolkit is a machine learning-based module for detecting common activities of daily living (ADLs), such as walking, jogging, going upstairs, going downstairs, sitting, and standing. The toolkit contains a pre-trained model based on a smartphone acceleration dataset obtained from wearable inertial sensors.

Documentation

Inputs and Outputs

The trained gait analysis module takes sample readings of raw acceleration sensor signals in CSV or text format. Each input record contains tri-axial acceleration values together with a timestamp and the subject ID. The prediction output is a JSON document with a gait class score, i.e., a probability estimate, for each of the six ADLs (walking, jogging, going upstairs, going downstairs, sitting, and standing). The module can also return only the gait class with the highest score.
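
For illustration, the sketch below shows what one input record and the corresponding prediction output could look like. The column order and JSON field names are assumptions made for illustration, not the toolkit's confirmed schema.

    # Hypothetical sketch of the input/output shapes described above
    # (column order and field names are assumed, not confirmed).
    import json

    # One raw input record: subject ID, timestamp, and tri-axial acceleration.
    sample_row = "33,49105962326000,-0.69,12.68,0.50"  # subject,timestamp,x,y,z

    # A possible prediction output: one probability per ADL plus the top class.
    prediction = {
        "scores": {
            "walking": 0.71, "jogging": 0.05, "upstairs": 0.12,
            "downstairs": 0.07, "sitting": 0.03, "standing": 0.02,
        },
        "predicted_class": "walking",
    }
    print(json.dumps(prediction, indent=2))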

Background Info

During the development of the GA toolkit, a range of machine learning methods (random forest, support vector machine, decision tree, logistic regression, etc.) and deep learning models (CNN, CNN-LSTM, DeepConvLSTM, etc.) were trained and compared. Of these, CNN-LSTM and random forest showed nearly equal performance, outperforming all other trained models. Random forest was selected as the benchmark model, as it detects both dynamic and static activities at a low error rate and at a comparatively low computational cost.

Before feeding the data to these models, the necessary preprocessing tasks (such as data cleaning, normalization, filtering, and balancing) were performed on the raw sensor signals. The data were then transformed into many short segments by framing the sensor signals into partially overlapping windows; the GA module uses a window length of 6.4 seconds (128 samples) with an overlap of 1.25 seconds. After segmentation, relevant time-domain features were extracted from each window, and the resulting feature vectors were used to train the machine learning models. The final predictive model was tuned to achieve the best possible performance using an exhaustive grid search with cross-validation over a defined hyperparameter space. Fig. 1 shows the general process of the GA toolkit development pipeline.
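
The segmentation, feature extraction, and tuning steps can be sketched as follows. This is a minimal illustration assuming a 20 Hz signal (so 128 samples ≈ 6.4 s and a 25-sample ≈ 1.25 s overlap) and a small hypothetical hyperparameter grid; it is not the toolkit's actual code.

    # Minimal sketch of the windowing + feature + tuning pipeline (assumed).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    WINDOW = 128                 # 6.4 s at 20 Hz
    OVERLAP = 25                 # 1.25 s at 20 Hz
    STEP = WINDOW - OVERLAP      # hop size between window starts

    def segment(signal, labels):
        """Frame an (N, 3) acceleration signal into partially overlapping windows."""
        windows, ys = [], []
        for start in range(0, len(signal) - WINDOW + 1, STEP):
            windows.append(signal[start:start + WINDOW])
            # Label each window by the majority activity inside it.
            vals, counts = np.unique(labels[start:start + WINDOW], return_counts=True)
            ys.append(vals[np.argmax(counts)])
        return np.array(windows), np.array(ys)

    def time_domain_features(windows):
        """Simple per-axis time-domain features: mean and standard deviation."""
        return np.hstack([windows.mean(axis=1), windows.std(axis=1)])

    # Dummy stand-ins for the preprocessed signal and its activity labels.
    rng = np.random.default_rng(0)
    X_raw = rng.standard_normal((6000, 3))
    y_raw = rng.choice(["walking", "sitting"], size=6000)

    W, y = segment(X_raw, y_raw)
    X = time_domain_features(W)

    # Exhaustive grid search with cross-validation over a (hypothetical) grid.
    grid = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [100, 300], "max_depth": [None, 10]},
        cv=5,
    )
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)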

Installation Instructions

The GA toolkit has been packaged as Docker images (a frontend and a backend) that can be run with the following commands:

  • Log in to the container registry using the guest account:

    docker login gitlab.telecom.ntua.gr:5050 -u alameda_ai_toolkit_registry_user -p yuhuVWLPfRZfNKgSUJef
  • Run the frontend Docker image:

    docker run -d -p 3000:3000 gitlab.telecom.ntua.gr:5050/alameda/alameda-source-code/ai-toolkit/ai-toolkit-registry/patientmanagement-sample:latest
  • Run the backend Docker image:

    docker run -d -p 8000:8000 gitlab.telecom.ntua.gr:5050/alameda/alameda-source-code/ai-toolkit/ai-toolkit-registry/app-web:latest
  • Log out of the container registry:

    docker logout gitlab.telecom.ntua.gr:5050
  • Visit in a browser:

    localhost:8000/apidocs
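
Once the backend container is running, the Swagger UI at /apidocs lists the actual endpoints. As a quick smoke test, a request along the following lines could be used; the /predict route, payload shape, and sample file name are purely illustrative assumptions, so consult the Swagger documentation for the real contract.

    # Hypothetical smoke test against the running backend (route name assumed).
    import requests

    # The actual route and payload format are listed in the Swagger UI at /apidocs.
    with open("sample_acceleration.csv", "rb") as f:    # assumed sample file
        resp = requests.post(
            "http://localhost:8000/predict",            # assumed endpoint
            files={"file": f},
        )
    print(resp.status_code, resp.json())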

Datasets & Samples

Currently, the GA toolkit model is trained on the publicly available WISDM dataset. The dataset contains a total of 1,098,209 samples collected from 36 volunteer subjects as they performed six daily living activities (walking, jogging, going upstairs, going downstairs, sitting, and standing), each for a specific period. During the experiments, each participant carried a smartphone with its built-in motion sensor in their front pants pocket. The accelerometer recorded each activity at a sampling frequency of 20 Hz, and the data collection process was supervised by a dedicated person to ensure data quality. WISDM is an imbalanced dataset: walking is the most represented activity at 38.6% of the samples, while standing accounts for only 4.4%.
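
For reference, the raw WISDM v1.1 file stores readings as comma-separated records (user, activity, timestamp, x, y, z) terminated by semicolons. A minimal loader might look like the sketch below; the local file name and the handling of malformed records are assumptions.

    # Minimal loader sketch for the raw WISDM file (user,activity,timestamp,x,y,z;).
    import pandas as pd

    rows = []
    with open("WISDM_ar_v1.1_raw.txt") as f:    # assumed local file name
        for line in f:
            for record in line.strip().split(";"):
                parts = [p.strip() for p in record.split(",")]
                if len(parts) == 6 and all(parts):   # skip malformed/empty records
                    rows.append(parts)

    df = pd.DataFrame(rows, columns=["user", "activity", "timestamp", "x", "y", "z"])
    df[["x", "y", "z"]] = df[["x", "y", "z"]].astype(float)
    print(df["activity"].value_counts(normalize=True))  # exposes the class imbalance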

OpenAPI Documentation

The Swagger (OpenAPI) documentation can be viewed at the /apidocs endpoint of the running backend container.