Table of Contents

  1. Problem Statement
  2. System Overview
  3. Training Data
    3.1 Datasets Used
    3.2 Data Preprocessing
  4. Real-Time Data
    4.1 Data Collection
    4.2 Data Preprocessing
  5. Machine Learning Model
    5.1 Model Topologies Tried
    5.2 Model Topology Selection
    5.3 Model Training
  6. Real-Time Inference
    6.1 Inference for Model Development
    6.2 Inference for Demo
  7. Why Deep Learning for HAR?
  8. Platforms Used
    8.1 Google Drive
    8.2 Google Colaboratory
    8.3 TensorFlow

Problem Statement

In this project, we detect human activities by applying a machine learning model to IMU data (accelerometer and gyroscope) collected from an Apple Watch worn by the user on their dominant hand. We are targeting the detection of the following human activities:

System Overview

The diagram below shows the overall approach used in this project:

Training Data

Datasets Used

We started our project with the PAMAP2 dataset, but the trained model's accuracy was not good enough on the real-time data we collected, because PAMAP2 is too small to train a large model successfully. We therefore moved to a bigger dataset, WISDM, which has data for 51 users. WISDM was large enough to train a large model, but the trained model must predict activities from real-time data collected with different sensors than the ones used for WISDM. So, to prevent the trained model from becoming biased toward any particular dataset (i.e., data collected from a particular type of sensor), we finally used a combination of the WISDM and PAMAP2 datasets.

Link to the datasets: PAMAP2 WISDM

Raw .dat files for PAMAP2 can be found here: Data/PAMAP2_Dataset/Protocol
Raw .csv files for WISDM can be found here: Data/WISDM_Dataset/raw/watch

Data Preprocessing

The following are the steps used to preprocess the training datasets:

Our combined training data has 60 unique users, each of whom performed all of the targeted activities. To validate and test the model accurately, the users assigned to validation and testing are not used for training. The processed training dataset has been divided into the following subsets:

This division of the data into three subsets (training, validation, and testing) was done randomly. The following snippet of the data-processing script selects subjects randomly for the three subsets:
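The original code snippet is not reproduced here. Below is a minimal sketch of what such a subject-level split could look like, assuming 60 numeric subject IDs and illustrative subset sizes; the exact sizes and variable names in the project's script may differ.

```python
import numpy as np

# Shuffle the 60 subject IDs so each subject lands in exactly one subset.
rng = np.random.default_rng(42)
shuffled = rng.permutation(np.arange(60))

# Illustrative split sizes (assumptions): 48 subjects for training,
# 6 for validation, and 6 for testing.
train_subjects = shuffled[:48]
val_subjects = shuffled[48:54]
test_subjects = shuffled[54:]

print("train:", sorted(train_subjects))
print("val:  ", sorted(val_subjects))
print("test: ", sorted(test_subjects))
```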

The final dataset used for training, validation and testing of the network has the following distribution:

Link to notebook used for PAMAP2 dataset preprocessing: Notebooks/Preprocess_PAMAP2.ipynb
Link to notebook used for WISDM+PAMAP2 dataset preprocessing: Notebooks/Preprocess_WISDM_PAMAP2.ipynb
Processed numpy files for PAMAP2 can be found here: Data/PAMAP2
Processed numpy files for PAMAP2+WISDM can be found here: Data/WISDM_PAMAP2

Real-Time Data

Data Collection

We use the SensorLog app on an Apple Watch to collect 6-DOF IMU data (accelerometer and gyroscope). The Apple Watch is worn by the user on their dominant hand. The SensorLog app can sample the data at a frequency of up to 100 Hz; since our processed training data has a sampling frequency of 20 Hz, we use the app to collect the IMU data at that same frequency. The app provides data in CSV format. Once we have the CSV files from the Apple Watch, they are uploaded to Google Drive for further processing and inference.

Raw .csv files can be found here: Data/Live_Data/raw

Data Preprocessing

We run a preprocessing Python script on the raw data that performs the following operations:
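The operations themselves are implemented in the preprocessing notebook linked below. As a rough illustration, here is a minimal sketch of one plausible flow, assuming the script selects the six accelerometer and gyroscope channels from the SensorLog CSV and slices them into fixed-length windows matching the training data; the column names, window length, and file names are assumptions, not the project's actual values.

```python
import numpy as np
import pandas as pd

# Hypothetical column names -- the real SensorLog CSV headers may differ.
IMU_COLUMNS = [
    "accelerometerAccelerationX", "accelerometerAccelerationY", "accelerometerAccelerationZ",
    "gyroRotationX", "gyroRotationY", "gyroRotationZ",
]
WINDOW_SIZE = 100  # assumed window length in samples (5 s at 20 Hz)

def preprocess_live_csv(csv_path):
    """Read a raw SensorLog CSV and return windows shaped (n_windows, WINDOW_SIZE, 6)."""
    df = pd.read_csv(csv_path)
    imu = df[IMU_COLUMNS].to_numpy(dtype=np.float32)
    n_windows = len(imu) // WINDOW_SIZE
    return imu[: n_windows * WINDOW_SIZE].reshape(n_windows, WINDOW_SIZE, len(IMU_COLUMNS))

# Example usage: save the processed windows as a numpy file for inference.
# X_live = preprocess_live_csv("raw_watch_data.csv")
# np.save("X_live.npy", X_live)
```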

Link to notebook used for real-time data preprocessing: Notebooks/Preprocess_Live_Data.ipynb
Processed numpy files can be found here: Data/Live_Data/processed

Machine Learning Model

Model Topologies Tried

We experimented with several different neural network topologies, such as MLPs, CNNs, LSTMs, and ConvLSTMs, to perform this human activity recognition task. Below is the list of network topologies we tried, along with their training and validation accuracies after 70 epochs.

Multilayer Perceptron (MLP)

A multilayer perceptron (MLP) is a class of feedforward artificial neural network (ANN). An MLP consists of at least three layers of nodes: an input layer, a hidden layer, and an output layer. Its multiple layers and non-linear activations allow it to distinguish data that is not linearly separable.
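As a point of reference, here is a minimal Keras sketch of an MLP classifier for windowed IMU data; the layer sizes, window length (100 samples x 6 channels), and number of classes (4) are illustrative assumptions rather than the exact topology used in the project.

```python
import tensorflow as tf

# Flatten each IMU window and pass it through fully connected layers.
mlp = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(100, 6)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])
mlp.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
mlp.summary()
```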

Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems. This is a behavior required in complex problem domains like machine translation, speech recognition, human activity recognition and more.
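A comparable Keras sketch of an LSTM classifier is shown below, with the same assumed input shape and class count as the MLP sketch above.

```python
import tensorflow as tf

# The LSTM consumes each window as a sequence of 100 timesteps with 6 channels.
lstm = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(100, 6)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])
lstm.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
lstm.summary()
```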

Convolutional Neural Networks (CNN)

A convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They have applications in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, brain-computer interfaces, and financial time series. CNNs are regularized versions of multilayer perceptrons. Multilayer perceptrons usually mean fully connected networks, that is, each neuron in one layer is connected to all neurons in the next layer. CNNs take advantage of the hierarchical pattern in data and assemble more complex patterns using smaller and simpler patterns. Therefore, on the scale of connectedness and complexity, CNNs are on the lower extreme.
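For IMU windows, the convolutions run along the time axis rather than over image pixels. Below is a minimal Keras sketch of such a 1D CNN classifier; the filter counts, kernel sizes, and dropout rate are illustrative assumptions, not the project's exact topology.

```python
import tensorflow as tf

# 1D convolutions extract local temporal patterns from the 100 x 6 IMU windows.
cnn = tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu", input_shape=(100, 6)),
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
cnn.summary()
```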

Convolutional Long Short-Term Memory (ConvLSTM)

ConvLSTM is an integration of a CNN (convolutional layers) with an LSTM. First, the convolutional layers of the model process the data, and the resulting one-dimensional features are fed into an LSTM.
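A minimal Keras sketch of this CNN-then-LSTM arrangement is shown below, again with assumed layer sizes and input shape.

```python
import tensorflow as tf

# Convolutional layers extract local features; the LSTM models their temporal order.
conv_lstm = tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu", input_shape=(100, 6)),
    tf.keras.layers.MaxPooling1D(pool_size=2),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(4, activation="softmax"),
])
conv_lstm.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
conv_lstm.summary()
```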

Model Topology Selection

The table below shows the various accuracies (training, validation, and testing) for the different network topologies, along with the activities each topology predicted correctly from real-time data:

| Model Topology | Training Accuracy (%) | Validation Accuracy (%) | Testing Accuracy (%) | Activities Predicted Correctly (Real-Time Data) |
|---|---|---|---|---|
| MLP | 96.77 | 78.72 | 81.64 | None |
| LSTM | 98.04 | 79.68 | 82.63 | None |
| CNN | 99.26 | 88.74 | 92.60 | All |
| ConvLSTM | 99.47 | 89.23 | 93.32 | Sitting & Eating |

From this table, we can say that CNNs work best for us, for the following reasons:

Model Training

We tried training the model with different numbers of epochs and batch sizes and settled on 70 epochs with a batch size of 256, since the model accuracy starts to saturate in the range of 60-70 epochs. It takes around 4 minutes and 40 seconds to train and validate the model. After training, the best model is saved and later loaded for inference on the test data and real-time data.
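Below is a minimal sketch of this training setup, with the epoch count, batch size, and best-model checkpointing matching the description above; the model definition, data, and file name are placeholders rather than the project's actual code.

```python
import numpy as np
import tensorflow as tf

# Placeholder model and synthetic data stand in for the project's CNN and the
# processed numpy arrays.
model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu", input_shape=(100, 6)),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

X_train, y_train = np.random.rand(1024, 100, 6), np.random.randint(0, 4, 1024)
X_val, y_val = np.random.rand(256, 100, 6), np.random.randint(0, 4, 256)

# Keep only the checkpoint with the best validation accuracy for later inference.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5", monitor="val_accuracy", save_best_only=True
)

model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=70,
    batch_size=256,
    callbacks=[checkpoint],
)
```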

Strategies used to improve the model’s accuracy:

| Label | Original Activities |
|---|---|
| Walking | Walking |
| Sitting | Sitting, Typing |
| Eating | Eating Chips, Eating Pasta |
| Brushing Teeth | Brushing Teeth |

This merging of activities also helps us to increase the amount of data per label.
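A minimal sketch of how such a label merge could be applied during preprocessing is shown below; the dictionary mirrors the table above, while the dataframe and its column names are hypothetical.

```python
import pandas as pd

# Merge the original dataset activities into the four target labels (see table above).
LABEL_MAP = {
    "Walking": "Walking",
    "Sitting": "Sitting",
    "Typing": "Sitting",
    "Eating Chips": "Eating",
    "Eating Pasta": "Eating",
    "Brushing Teeth": "Brushing Teeth",
}

# Hypothetical usage: `df` has an "activity" column with the original names.
df = pd.DataFrame({"activity": ["Typing", "Eating Pasta", "Walking"]})
df["label"] = df["activity"].map(LABEL_MAP)
print(df)
```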

Link to notebook used for network training: Notebooks/WALG.ipynb

Real-Time Inference

Inference for Model Development

In order to verify the accuracy of our model on real-time data, we collected multiple data files for each target activity from two different subjects, following the procedure explained in the Real-Time Data section. These data files were used to make decisions during the model development process. The raw data files were preprocessed as explained in the Real-Time Data - Data Preprocessing section.

Below is a snapshot of the script used for making inference:
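The original snapshot is not reproduced here; below is a minimal sketch of what the inference step could look like, assuming the best saved Keras model is loaded and applied to the processed numpy windows (file names are placeholders).

```python
import numpy as np
import tensorflow as tf

# Load the best model saved during training and the processed real-time windows.
model = tf.keras.models.load_model("best_model.h5")
X_live = np.load("X_live.npy")  # shape: (n_windows, window_size, 6)

# Predict an activity class for every window.
probabilities = model.predict(X_live)
predictions = np.argmax(probabilities, axis=1)
print(predictions)
```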

Below are the results of the inference from the trained model:

Raw .csv files can be found here: Data/Live_Data/raw
Processed numpy files can be found here: Data/Live_Data/processed

Inference for Demo

For the final demo, we used a different subject from the ones used during model development. Our subject performed all four target activities consecutively. We recorded a video with timestamps and collected sensor data using the SensorLog app. The Python script in the “WALG_Demo.ipynb” notebook first preprocesses the raw data, and the pretrained model is then used to make predictions.

To filter out noise, the predicted values are post-processed by the same script. We use a running window that compares the predictions for the last three samples before giving a final processed prediction. Below is a snapshot of the script used for post-processing the predictions:
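The original snapshot is not reproduced here; below is a minimal sketch of one way to implement such a filter, assuming a new label is emitted only when the last three per-window predictions agree and the previous label is kept otherwise. The exact rule in the project's script may differ.

```python
import numpy as np

def smooth_predictions(raw_predictions, window=3):
    """Post-process per-window predictions with a running window of `window` samples."""
    smoothed = []
    current = raw_predictions[0]
    for i in range(len(raw_predictions)):
        recent = raw_predictions[max(0, i - window + 1): i + 1]
        if len(set(recent)) == 1:  # last `window` predictions agree
            current = recent[-1]
        smoothed.append(current)
    return np.array(smoothed)

# Example: a transient misclassification (the 2) is filtered out.
print(smooth_predictions([0, 0, 0, 2, 0, 0, 1, 1, 1]))
```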

Link to the video demo.

Following are the predictions made by our model:

Link to raw data used for demo: Data/Live_Data/raw/2020_12_16_Suparno_All.csv
Link to processed data used for demo: Data/Live_Data/processed/X_2020_12_16_Suparno_All.npy
Link to the file storing timestamps for demo: Data/Live_Data/tstamps/2020_12_16_Suparno_All_tstamps.csv

Why Deep Learning for HAR?

In this project, we chose a deep learning approach over classical approaches for the following reasons:

Platforms Used

Google Drive

We stored all our data on Google Drive. The Shared Drives feature made collaboration much easier.

Google Colaboratory

All Python scripting was done on Google Colab. Linking Drive with Colab allowed easy access to the data.

TensorFlow

All the model development and training was done using TensorFlow.