StairAmbulationHealthy2021 - A Stride Segmentation and Event Detection dataset with focus on stairs#

The dataset can be downloaded from here:

Note

The dataset only contains the healthy participants of the full dataset presented in the paper!

General information#

The dataset was recorded with NilsPod V2 sensors from Portabiles. One sensor was attached to the instep of each foot and one sensor was attached to the lower back. On loading, we transform the coordinate systems of the foot-mounted IMUs to the gaitmap coordinate system.

Figure: coordinate system definition

We provide two tpcp.Dataset classes to access the data:

  1. StairAmbulationHealthy2021PerTest: This class provides access to all data and events for each of the performed gait tests individually.

  2. StairAmbulationHealthy2021Full: This class provides access to the entire recordings of each participant (two recordings per participant), independently of the performed gait tests.

In the following we will show the usage of both classes and the data that is contained within.

Warning

For this example to work, you need to have a global config set containing the path to the dataset. Check the README.md for more information.

StairAmbulationHealthy2021PerTest#

First, we create an instance of the dataset class and directly see the contained data points. Note that we enable the loading of all available data (pressure, baro, and hip sensor). You might want to disable some of these options to reduce RAM usage and speed up data loading.

from joblib import Memory

from gaitmap_datasets import StairAmbulationHealthy2021PerTest

dataset = StairAmbulationHealthy2021PerTest(
    include_pressure_data=True,
    include_baro_data=True,
    include_hip_sensor=True,
    memory=Memory("../.cache"),
)
dataset

StairAmbulationHealthy2021PerTest [520 groups/rows]

participant test
2 subject_01 slope_ascending_normal
3 subject_01 slope_descending_normal
4 subject_01 stair_flat_down_fast
5 subject_01 stair_flat_down_normal
6 subject_01 stair_flat_down_slow
... ... ...
555 subject_20 staircase_down_slow
556 subject_20 staircase_flying_normal
557 subject_20 staircase_up_fast
558 subject_20 staircase_up_normal
559 subject_20 staircase_up_slow

520 rows × 2 columns



We can see that we have 20 participants, each of whom performed 26 gait tests on different level-walking and stair configurations (20 × 26 = 520 rows). For more information about the individual tests, see the documentation of the dataset itself.

Using the dataset class, we can select any subset of tests and participants.

subset = dataset.get_subset(
    test=["stair_long_down_normal", "stair_long_up_normal"], participant=["subject_01", "subject_02"]
)
subset

StairAmbulationHealthy2021PerTest [4 groups/rows]

participant test
0 subject_01 stair_long_down_normal
1 subject_01 stair_long_up_normal
2 subject_02 stair_long_down_normal
3 subject_02 stair_long_up_normal


Once we have selected the data we want to work with, we can iterate over the dataset object to access the data of individual datapoints, or simply index it. Indexing the first entry of the subset gives:

StairAmbulationHealthy2021PerTest [1 groups/rows]

participant test
0 subject_01 stair_long_down_normal


On this datapoint, we can now access the data. We will start with the metadata. It contains all the general information about the participant and the sensors.

{'subject_id': '001', 'gender': 'm', 'age': 30, 'height': 180, 'weight': 80, 'shoe_size': 43, 'sensor_ids': {'left_sensor': '6f13', 'right_sensor': '48b4', 'hip_sensor': '7fe5'}, 'fsr_ids': {'left_sensor': {'toe': {'id': 11, 'r_ref': 960}, 'mth': {'id': 12, 'r_ref': 1080}, 'heel': {'id': 14, 'r_ref': 1600}}, 'right_sensor': {'toe': {'id': 25, 'r_ref': 960}, 'mth': {'id': 23, 'r_ref': 1040}, 'heel': {'id': 13, 'r_ref': 1750}}}}

We can also access the IMU data, the pressure data, and the barometer data. Each of them has an index that gives the time in seconds from the start of the individual test we selected.

hip_sensor right_sensor left_sensor
gyr_x gyr_y gyr_z acc_x acc_y acc_z gyr_x gyr_y gyr_z acc_x acc_y acc_z gyr_x gyr_y gyr_z acc_x acc_y acc_z
time [s]
0.000000 -0.673821 -2.219711 3.989576 -1.109224 -0.779031 3.269550 -1.708803 20.533450 13.178328 -4.148941 0.856966 7.934212 -27.653339 46.176474 1.623280 -5.973765 -1.405257 6.718803
0.004883 -0.148201 0.629284 3.921438 -1.242736 -0.927407 3.297521 4.232482 39.221054 8.136686 -4.662022 0.983592 7.465433 -32.374318 61.496262 6.074041 -7.198881 -1.264667 5.848837
0.009766 0.304408 2.350481 3.671904 -1.360925 -0.994182 3.301974 10.793392 51.340733 2.590154 -5.521944 0.932085 6.961508 -34.805837 63.604297 8.223882 -7.605480 -1.164228 5.691747
0.014648 0.980363 2.645559 3.613431 -1.436905 -1.038566 3.487607 17.245912 58.379854 -1.838347 -4.657856 1.070203 7.428502 -35.533192 59.861692 8.566129 -7.899871 -0.896148 5.474552
0.019531 1.588171 2.643522 3.801951 -1.475019 -1.050089 3.744858 25.502436 71.363512 -7.721585 -4.630227 1.252811 7.392234 -34.215627 51.900007 8.451064 -7.865341 -0.660739 5.492805


left_sensor right_sensor
toe_force mth_force heel_force total_force toe_force mth_force heel_force total_force
time [s]
0.000000 1.659813 6.777890 15.147939 23.585642 1.125175 4.705902 4.379144 10.210221
0.004883 1.553467 6.460682 16.192648 24.206797 1.116481 4.415445 4.623942 10.155869
0.009766 1.457679 6.101595 17.186031 24.745305 1.109248 4.183058 4.997029 10.289334
0.014648 1.366159 5.780482 18.726731 25.873372 1.106464 3.957821 5.470416 10.534701
0.019531 1.302267 5.527134 20.015014 26.844416 1.105327 3.695105 6.047230 10.847663


hip_sensor right_sensor left_sensor
baro baro baro
time [s]
0.000000 992.78 991.94 992.06
0.004883 992.78 991.94 992.06
0.009766 992.77 991.94 992.06
0.014648 992.77 991.94 992.07
0.019531 992.77 991.94 992.07


In addition we provide ground truth information for the event detection. All event data is provided in samples from the start of the test.
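Because the sensor data is indexed in seconds while the events are given in samples, converting between the two is a frequent step. The sketch below assumes a sampling rate of 204.8 Hz, which is inferred from the 0.004883 s step of the time index shown above; in practice the value should be read from the sampling_rate_hz attribute of the datapoint. The helper names are illustrative and not part of gaitmap_datasets.

```python
# Assumption: 204.8 Hz, inferred from the 0.004883 s time step in the tables above.
SAMPLING_RATE_HZ = 204.8


def samples_to_seconds(sample, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Convert a sample index (counted from the start of the test) to seconds."""
    return sample / sampling_rate_hz


def seconds_to_samples(seconds, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Convert a time in seconds to the nearest sample index."""
    return round(seconds * sampling_rate_hz)


print(samples_to_seconds(1))    # one sample step, about 0.004883 s
print(seconds_to_samples(2.5))  # 512
```

The plotting code later in this example performs the same conversion by dividing the event tables by sampling_rate_hz.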

Note that we use a trailing _ to indicate that this is data calculated based on the ground truth and not just the IMU data.

First, manually labeled stride borders.

start end
s_id
589 318.0 567.0
591 567.0 788.0
593 788.0 996.0
595 996.0 1204.0
597 1204.0 1411.0
599 1411.0 1619.0
601 1619.0 1832.0
603 1832.0 2061.0
605 2061.0 2279.0
607 2279.0 2473.0
609 2473.0 2673.0
611 2673.0 2869.0
613 2869.0 3070.0
615 3070.0 3269.0
617 3269.0 3483.0
619 3483.0 3714.0
621 3714.0 3969.0


Second, the events extracted using the pressure insoles. Note that the min_vel event is actually calculated based on the IMU data. For more information, see the docstring of this property.

start end ic tc min_vel pre_ic
s_id
586 501 713 655 569 501 NaN
588 713 925 869 789 713 655.0
590 925 1131 1079 996 925 869.0
592 1131 1335 1285 1205 1131 1079.0
594 1335 1554 1492 1410 1335 1285.0
596 1554 1754 1699 1618 1554 1492.0
598 1754 2002 1927 1829 1754 1699.0
600 2002 2222 2161 2061 2002 1927.0
602 2222 2406 2354 2278 2222 2161.0
604 2406 2601 2553 2473 2406 2354.0
606 2601 2801 2751 2671 2601 2553.0
608 2801 3008 2945 2868 2801 2751.0
610 3008 3207 3148 3070 3008 2945.0
612 3207 3405 3349 3268 3207 3148.0
614 3405 3644 3567 3481 3405 3349.0
616 3644 3894 3811 3712 3644 3567.0
618 3894 4231 4096 3970 3894 3811.0
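The columns of this table are sufficient to compute basic temporal parameters per stride. The sketch below uses three rows copied from the table above and the commonly used definitions stride time = ic - pre_ic, stance time = tc - pre_ic, and swing time = ic - tc. These definitions and the 204.8 Hz sampling rate (inferred from the time index) are assumptions for illustration, not something this example prescribes.

```python
SAMPLING_RATE_HZ = 204.8  # assumption, inferred from the 0.004883 s time step

# Rows copied from the event table above (values in samples).
strides = {
    588: {"pre_ic": 655, "tc": 789, "ic": 869},
    590: {"pre_ic": 869, "tc": 996, "ic": 1079},
    592: {"pre_ic": 1079, "tc": 1205, "ic": 1285},
}


def temporal_parameters(stride, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Return stride, stance, and swing time in seconds for one stride."""
    return {
        "stride_time": (stride["ic"] - stride["pre_ic"]) / sampling_rate_hz,
        "stance_time": (stride["tc"] - stride["pre_ic"]) / sampling_rate_hz,
        "swing_time": (stride["ic"] - stride["tc"]) / sampling_rate_hz,
    }


for s_id, stride in strides.items():
    params = temporal_parameters(stride)
    print(s_id, {name: round(value, 3) for name, value in params.items()})
```

The first stride in the table has no pre_ic value (NaN), so these parameters cannot be computed for it.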


As further ground truth, we provide a label for each segmented stride that contains information about the height change during the stride. This information was derived by measuring the heights of the individual stair steps and labeling each stride based on video, to mark all strides that were performed on a specific stair configuration.

{'left_sensor':        start     end        type  z_level
s_id
589    318.0   567.0       level      0.0
591    567.0   788.0  descending    -29.0
593    788.0   996.0  descending    -29.0
595    996.0  1204.0  descending    -29.0
597   1204.0  1411.0  descending    -29.0
599   1411.0  1619.0  descending    -29.0
601   1619.0  1832.0  descending    -29.0
603   1832.0  2061.0  descending    -29.0
605   2061.0  2279.0  descending    -14.5
607   2279.0  2473.0  descending    -29.0
609   2473.0  2673.0  descending    -29.0
611   2673.0  2869.0  descending    -29.0
613   2869.0  3070.0  descending    -29.0
615   3070.0  3269.0  descending    -29.0
617   3269.0  3483.0  descending    -29.0
619   3483.0  3714.0  descending    -29.0
621   3714.0  3969.0       level      0.0, 'right_sensor':        start     end        type  z_level
s_id
588    169.0   450.0       level      0.0
590    450.0   682.0  descending    -14.5
592    682.0   895.0  descending    -29.0
594    895.0  1106.0  descending    -29.0
596   1106.0  1309.0  descending    -29.0
598   1309.0  1513.0  descending    -29.0
600   1513.0  1720.0  descending    -29.0
602   1720.0  1950.0  descending    -29.0
604   1950.0  2183.0  descending    -14.5
606   2183.0  2377.0  descending    -29.0
608   2377.0  2576.0  descending    -29.0
610   2576.0  2773.0  descending    -29.0
612   2773.0  2972.0  descending    -29.0
614   2972.0  3170.0  descending    -29.0
616   3170.0  3378.0  descending    -29.0
618   3378.0  3593.0  descending    -29.0
620   3593.0  3842.0  descending    -14.5
622   3842.0  4131.0       level      0.0}
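One quick plausibility check on these labels is that both feet should cover the same total height change. The sketch below sums the z_level values copied from the output above. The unit of z_level is not stated in this example, so the totals are only compared between the two feet.

```python
# z_level per stride, copied from the output above (unit not stated in this example).
z_level = {
    "left_sensor": [0.0] + [-29.0] * 7 + [-14.5] + [-29.0] * 7 + [0.0],
    "right_sensor": [0.0, -14.5] + [-29.0] * 6 + [-14.5] + [-29.0] * 7 + [-14.5, 0.0],
}

# Sum the per-stride height changes for each foot.
totals = {foot: sum(levels) for foot, levels in z_level.items()}
print(totals)  # {'left_sensor': -420.5, 'right_sensor': -420.5}
```

Although the right foot has one more labeled stride and three half-height strides instead of one, both feet end up with the same total.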

The same method used to access this information can also be used to filter the stride list (e.g. to keep only level strides).

{'left_sensor':        start     end   type  z_level
s_id
589    318.0   567.0  level      0.0
621   3714.0  3969.0  level      0.0, 'right_sensor':        start     end   type  z_level
s_id
588    169.0   450.0  level      0.0
622   3842.0  4131.0  level      0.0}
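Conceptually, such a filter keeps only the rows whose type matches one of the requested labels. The following plain-Python sketch illustrates the idea with a few left-foot rows copied from the output above; the helper function is hypothetical and is not the library method.

```python
# A few left-foot strides copied from the output above.
strides = [
    {"s_id": 589, "start": 318.0, "end": 567.0, "type": "level", "z_level": 0.0},
    {"s_id": 591, "start": 567.0, "end": 788.0, "type": "descending", "z_level": -29.0},
    {"s_id": 605, "start": 2061.0, "end": 2279.0, "type": "descending", "z_level": -14.5},
    {"s_id": 621, "start": 3714.0, "end": 3969.0, "type": "level", "z_level": 0.0},
]


def filter_by_type(stride_list, stride_types):
    """Keep only the strides whose type is one of the requested types."""
    wanted = set(stride_types)
    return [stride for stride in stride_list if stride["type"] in wanted]


level_strides = filter_by_type(strides, ["level"])
print([stride["s_id"] for stride in level_strides])  # [589, 621]
```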

Below we plot all the relevant data for a single gait test to make it easier to understand.

For the selected test, we can see that the participant started walking almost right away. While it cannot easily be seen in the raw IMU data itself, the participant walked down the stairs in two bouts. This is more clearly visible in the baro data, which shows a slowly increasing pressure value, indicating a reduction in altitude.

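import matplotlib.pyplot as plt

# Load the data of the selected datapoint (the first entry of the subset shown above).
datapoint = subset[0]
imu_data = datapoint.data
baro_data = datapoint.baro_data

foot = "right_sensor"
_, axs = plt.subplots(nrows=3, figsize=(10, 10), sharex=True)
imu_data[foot].filter(like="gyr").plot(ax=axs[0])
imu_data[foot].filter(like="acc").plot(ax=axs[1])
baro_data[foot].plot(ax=axs[2])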

axs[0].set_ylabel("Rate of rotation [deg/s]")
axs[1].set_ylabel("Acceleration [m/s^2]")
axs[2].set_ylabel("Air Pressure [mbar]")
axs[2].set_xlabel("Time [s]")

plt.show()
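To get a feeling for how a pressure increase relates to a loss in altitude, the sketch below applies the international barometric formula for the standard atmosphere. This is a generic approximation and not how the dataset labels were derived. The function name and the two example pressures (chosen in the range of the baro values shown earlier) are illustrative.

```python
def pressure_to_altitude_m(pressure_mbar, sea_level_mbar=1013.25):
    """Altitude above sea level in metres from the international barometric formula."""
    return 44330.0 * (1.0 - (pressure_mbar / sea_level_mbar) ** (1.0 / 5.255))


# Hypothetical pressures before and after walking down a stair.
p_top, p_bottom = 992.0, 992.5
height_change = pressure_to_altitude_m(p_bottom) - pressure_to_altitude_m(p_top)
print(f"{height_change:.2f} m")  # negative: higher pressure means lower altitude
```

Near 990 mbar this works out to roughly 8.5 m per mbar, so a descent of a few metres changes the pressure by only a fraction of a mbar. This is why the increase in the baro signal looks small.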

When zooming in, we can see the individual events within the strides. The min_vel event is in the resting period between strides, and the IC and TC events are at the falling and rising edges of the pressure signal, respectively. The start and end points of the segmented strides (dashed lines) are at the maximum of the gyr_y signal.

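foot = "right_sensor"
# pressure_data, insole_events, and segmented_stride_list are the per-foot tables shown above.
fig, axs = plt.subplots(nrows=2, figsize=(10, 10), sharex=True)
imu_data[foot].filter(like="gyr").plot(ax=axs[0])
pressure_data[foot]["total_force"].plot(ax=axs[1])
events = insole_events[foot].drop(columns=["start", "end"])
# Convert the event positions from samples to seconds to match the time index.
events /= datapoint.sampling_rate_hz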
styles = ["ro", "gs", "b^", "m*"]
for style, (i, e) in zip(styles, events.T.iterrows()):
    e = e.dropna()
    axs[0].plot(e, imu_data[foot]["gyr_y"].loc[e.to_numpy()].to_numpy(), style, label=i, markersize=8)
    axs[1].plot(e, pressure_data[foot]["total_force"].loc[e.to_numpy()].to_numpy(), style, markersize=8)
for i, s in segmented_stride_list[foot].iterrows():
    s /= datapoint.sampling_rate_hz
    axs[0].axvline(s["start"], color="k", linestyle="--")
    axs[0].axvline(s["end"], color="k", linestyle="--")
    axs[1].axvline(s["start"], color="k", linestyle="--")
    axs[1].axvline(s["end"], color="k", linestyle="--")

axs[0].legend()
axs[0].set_xlim(12, 15)
axs[0].set_ylim(-500, 600)

axs[0].set_ylabel("Rate of rotation [deg/s]")
axs[1].set_ylabel("Pressure equivalent weight [kg]")
axs[1].set_xlabel("Time [s]")

plt.show()

StairAmbulationHealthy2021Full#

The StairAmbulationHealthy2021Full dataset contains the complete recordings of all 20 participants, not cut into individual tests. Note that there are still two recordings per participant. This is because the data was collected at two different locations and is therefore split into two parts.

The StairAmbulationHealthy2021Full dataset class can be used in the same way as the StairAmbulationHealthy2021PerTest dataset. The only difference is that, instead of the individual tests, the index of the dataset lists the two parts.

from gaitmap_datasets import StairAmbulationHealthy2021Full

dataset = StairAmbulationHealthy2021Full(
    include_pressure_data=True,
    include_baro_data=True,
    include_hip_sensor=True,
    memory=Memory("../.cache"),
)
dataset

StairAmbulationHealthy2021Full [40 groups/rows]

participant part
0 subject_01 part_1
1 subject_01 part_2
2 subject_02 part_1
3 subject_02 part_2
4 subject_03 part_1
5 subject_03 part_2
6 subject_04 part_1
7 subject_04 part_2
8 subject_05 part_1
9 subject_05 part_2
10 subject_06 part_1
11 subject_06 part_2
12 subject_07 part_1
13 subject_07 part_2
14 subject_08 part_1
15 subject_08 part_2
16 subject_09 part_1
17 subject_09 part_2
18 subject_10 part_1
19 subject_10 part_2
20 subject_11 part_1
21 subject_11 part_2
22 subject_12 part_1
23 subject_12 part_2
24 subject_13 part_1
25 subject_13 part_2
26 subject_14 part_1
27 subject_14 part_2
28 subject_15 part_1
29 subject_15 part_2
30 subject_16 part_1
31 subject_16 part_2
32 subject_17 part_1
33 subject_17 part_2
34 subject_18 part_1
35 subject_18 part_2
36 subject_19 part_1
37 subject_19 part_2
38 subject_20 part_1
39 subject_20 part_2


subset = dataset.get_subset(participant="subject_01", part="part_2")
subset

StairAmbulationHealthy2021Full [1 groups/rows]

participant part
0 subject_01 part_2


As most parameters and attributes are identical, we will not repeat them.

One interesting addition is the test_list attribute. If you need to know which tests were performed in the respective session, you can access them as a region-of-interest list.

start end
roi_id
slope_ascending_normal 5779 21912
slope_descending_normal 22409 40494
stair_flat_down_normal 41008 45937
stair_flat_up_normal 46470 49780
stair_flat_down_fast 50281 52782
stair_flat_up_fast 53390 55994
stair_flat_down_slow 56552 60832
stair_flat_up_slow 61383 65989
stair_long_flying_normal 66577 89845
stair_long_down_normal 90322 94722
stair_long_up_normal 95201 100466
stair_long_down_fast 100961 104364
stair_long_up_fast 104874 108498
stair_long_down_slow 109186 117087
stair_long_up_slow 117571 125476
stair_long_down_single_step 126056 135266
stair_long_up_single_step 135727 145803
stair_long_down_double_step 146219 149371
stair_long_up_double_step 149853 153682
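Such a region-of-interest list can be used to look up which test a given sample belongs to, or how long a test lasted. The sketch below uses three rows copied from the table above as plain tuples. The helper names, the inclusive treatment of the end sample, and the 204.8 Hz sampling rate (inferred from the time index of the data) are assumptions.

```python
SAMPLING_RATE_HZ = 204.8  # assumption, inferred from the 0.004883 s time step

# (roi_id, start, end) in samples, copied from the table above.
test_list = [
    ("stair_long_flying_normal", 66577, 89845),
    ("stair_long_down_normal", 90322, 94722),
    ("stair_long_up_normal", 95201, 100466),
]


def find_test(sample, rois):
    """Return the roi_id whose [start, end] range contains the sample, else None."""
    for roi_id, start, end in rois:
        if start <= sample <= end:
            return roi_id
    return None


def duration_s(roi_id, rois, sampling_rate_hz=SAMPLING_RATE_HZ):
    """Return the duration of the named test in seconds."""
    for name, start, end in rois:
        if name == roi_id:
            return (end - start) / sampling_rate_hz
    raise KeyError(roi_id)


print(find_test(92000, test_list))  # stair_long_down_normal
print(find_test(95000, test_list))  # None, this sample lies between two tests
print(round(duration_s("stair_long_down_normal", test_list), 2))  # 21.48
```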


When plotting the data in the entire part_2 recording, we can see that it spans multiple tests including multiple walks up and down various stairs.

If you zoom in, you can see that between tests, the participants were instructed to jump up and down three times. These jump events were used as markers to cut out the individual tests.

imu_data = subset.data
baro_data = subset.baro_data

foot = "right_sensor"
_, axs = plt.subplots(nrows=3, figsize=(10, 10), sharex=True)
imu_data[foot].filter(like="gyr").plot(ax=axs[0])
imu_data[foot].filter(like="acc").plot(ax=axs[1])
baro_data[foot].plot(ax=axs[2])
for i, s in subset.test_list.iterrows():
    s /= subset.sampling_rate_hz
    axs[0].axvspan(s["start"], s["end"], color="k", alpha=0.2)
    axs[1].axvspan(s["start"], s["end"], color="k", alpha=0.2)
    axs[2].axvspan(s["start"], s["end"], color="k", alpha=0.2)

axs[0].set_ylabel("Rate of rotation [deg/s]")
axs[1].set_ylabel("Acceleration [m/s^2]")
axs[2].set_ylabel("Air Pressure [mbar]")
axs[2].set_xlabel("Time [s]")

plt.xlim(550, 650)
plt.show()

A note on caching#

To make it possible to interact with the entire dataset without immediately filling your RAM, data is only loaded when you access the respective data attribute (e.g. data or pressure_data). However, this means that if you access the same piece of data multiple times (or multiple pieces of related data), it has to be loaded from disk and preprocessed again, which is slow. Therefore, the dataset classes support joblib.Memory to cache the data in a fast disk cache. You can configure the cache directory using the memory parameter of the dataset class. Keep in mind that the cache directory can become quite large. We recommend clearing the cache from time to time to free up space.
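The general behaviour of such a disk cache can be illustrated independently of the dataset. The sketch below is a minimal example that assumes joblib is installed; the function load_and_preprocess is a hypothetical stand-in for an expensive loading step and is not part of gaitmap_datasets.

```python
import tempfile

from joblib import Memory

# A throwaway cache directory for this sketch; in practice use a persistent path.
memory = Memory(tempfile.mkdtemp(), verbose=0)

calls = []  # records every time the function body is actually executed


def load_and_preprocess(participant):
    """Hypothetical stand-in for an expensive load and preprocessing step."""
    calls.append(participant)
    return [len(participant)] * 3


cached_load = memory.cache(load_and_preprocess)

cached_load("subject_01")  # computed and written to the disk cache
cached_load("subject_01")  # served from the disk cache, function body not executed
print(len(calls))  # 1

memory.clear(warn=False)  # frees the disk space used by the cache
```

After clearing, the next call with the same argument is computed again, which is the trade-off between disk usage and loading time described above.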
