# stat946w18/Predicting Floor-Level for 911 Calls with Neural Networks and Smartphone Sensor Data

## Latest revision as of 15:01, 18 April 2018

# Introduction

During emergency 911 calls, knowing the exact position of the victims is crucial to a fast response and a successful rescue. Knowing the victim's floor level in an emergency can speed up the search by a factor proportional to the number of floors in the building. Problems arise when the caller is unable to give their physical position accurately. This can happen, for instance, when the caller is disoriented, held hostage, or a child calling on behalf of the victim. GPS sensors on smartphones can provide the rescuers with the geographic location. However, GPS fails to give an accurate floor level inside a tall building. Previous work has explored using Wi-Fi signals or beacons placed inside buildings, but these methods are not self-contained and require prior knowledge of the infrastructure.

Fortunately, today’s smartphones are equipped with many more sensors, including barometers and magnetometers. Deep learning can be applied to predict floor level based on these sensor readings. First, an LSTM is trained to classify whether the caller is indoors or outdoors using GPS, RSSI (Received Signal Strength Indication), and magnetometer sensor readings. Next, an unsupervised clustering algorithm is used to predict the floor level from the barometric pressure difference. Working together, these two parts form a self-contained floor-level prediction system that reached 100% accuracy in the authors' experiments without any external prior knowledge.

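This paper was published at ICLR 2018. The code, data, and app are open-source on [GitHub](https://github.com/williamFalcon/Predicting-floor-level-for-911-Calls-with-Neural-Networks-and-Smartphone-Sensor-Data).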

# Data Description

The authors developed an iOS app called Sensory and used it to collect data on an iPhone 6. The following sensor readings were recorded: indoors, created at, session id, floor, RSSI strength, GPS latitude, GPS longitude, GPS vertical accuracy, GPS horizontal accuracy, GPS course, GPS speed, barometric relative altitude, barometric pressure, environment context, environment mean building floors, environment activity, city name, country name, magnet x, magnet y, magnet z, magnet total.

The indoor-outdoor data has to be manually entered as soon as the user enters or exits a building. To gather the data for floor level prediction, the authors conducted 63 trials among five different buildings throughout New York City. The actual floor level was recorded manually for validation purposes only, since unsupervised learning is being used.

### Note: Barometric formula

The barometric formula, sometimes called the exponential atmosphere or isothermal atmosphere, models how the pressure (or density) of the air changes with altitude. The pressure drops by approximately 11.3 Pa per meter over the first 1000 meters above sea level.

# Methods

The proposed method first determines whether the user is indoors or outdoors and detects the transitions between the two. When an outdoor-to-indoor transition occurs, the barometer reading at that moment is saved as the ground-level reference pressure, from which the user's elevation is later estimated. Finally, the exact floor level is predicted through clustering techniques. Indoor/outdoor classification is critical to this method: whenever the user is detected to be outdoors, they are assumed to be at ground level, and the vertical height and floor estimation are applied only when the user is indoors.

### Indoor/Outdoor Classification

An LSTM network is used to solve the indoor-outdoor classification problem. Here is a diagram of the network architecture.

Figure 1: LSTM network architecture. A 3-layer LSTM. Inputs are sensor readings for d consecutive time-steps. Target is y = 1 if indoors and y = 0 if outdoors.

[math] X_i[/math] contains a set of [math]d[/math] consecutive sensor readings, i.e. [math] X_i = [x_1, x_2,...,x_d] [/math]. [math]Y[/math] is labelled as 0 for outdoors and 1 for indoors. [math]d[/math] is chosen to be 3 by random search, so that each [math]X_i[/math] has 3 points, [math]X_i = [x_{j-1}, x_j, x_{j+1}][/math], and the label of the middle reading [math]x_j[/math] is used as the [math]y[/math] label.

The LSTM contains three layers. Layers one and two each have 50 neurons and are followed by a dropout layer with a rate of 0.2. Layer three has two neurons, which feed directly into a one-neuron feedforward layer with a sigmoid activation function. The input is the sensor readings, and the output is the indoor-outdoor label. The objective function is the cross-entropy between the true labels and the predictions.
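The construction of the windows [math]X_i[/math] described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `make_windows` and the list-of-lists input format are assumptions made for the example.

```python
def make_windows(readings, labels, d=3):
    """Build windows of d consecutive readings, labelled by the middle reading.

    readings: list of per-time-step feature lists (hypothetical format).
    labels:   list of 0 (outdoors) / 1 (indoors), one per time step.
    """
    if d % 2 == 0:
        raise ValueError("d must be odd so the window has a middle element")
    half = d // 2
    windows, targets = [], []
    # Slide over every index j that has `half` neighbours on both sides.
    for j in range(half, len(readings) - half):
        windows.append(readings[j - half : j + half + 1])
        targets.append(labels[j])  # label of the middle reading x_j
    return windows, targets


readings = [[1.0], [2.0], [3.0], [4.0], [5.0]]
labels = [0, 0, 1, 1, 1]
windows, targets = make_windows(readings, labels, d=3)
print(len(windows), targets)
```

With five time steps and [math]d = 3[/math], the first and last readings have no window of their own, so three windows are produced.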

\begin{equation} C(y_i, \hat{y}_i) = \frac{1}{n} \sum_{i=1}^{n} -\left(y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)\right) \label{equation:binCE} \end{equation}
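As a quick numerical check of the binary cross-entropy above, the following Python sketch evaluates it on a list of labels and predicted probabilities. The clipping constant `eps` is an assumption added here to avoid taking the logarithm of zero; it is not part of the equation.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # Clip the prediction so log() never receives 0 (assumption, see lead-in).
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# An uninformative prediction of 0.5 costs log(2) per sample.
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```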

The main reason why the neural network is able to predict whether the user is indoors or outdoors is that it learns a pattern of how the walls of buildings interfere with the GPS signals. The LSTM is able to find the pattern in the GPS signal strength in combination with other sensor readings to give an accurate prediction. However, the change in GPS signal does not happen instantaneously as the user walks indoors. Thus, a window of 20 seconds is allowed, and the minimum barometric pressure reading within that window is recorded as the ground-floor reference pressure.

### Indoor/Outdoor Transition

To determine the exact time the user makes an indoor/outdoor transition, two vector masks are convolved across the LSTM predictions.

\begin{equation} V_1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0] \end{equation}

\begin{equation} V_2 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] \end{equation}

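The Jaccard distance between a sub-sequence of predictions and a mask measures how similar the two are, and is calculated with the following equation. As defined here it is the Jaccard similarity coefficient, so larger values mean closer agreement: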

\begin{equation} J_j = J(s_i, V_j) = \frac{|s_i \cap V_j|}{|s_i| + |V_j| - |s_i \cap V_j|} \label{equation:Jaccard} \end{equation}

If the Jaccard distance between [math]V_{1}[/math] and a sub-sequence [math] s_i [/math] of LSTM predictions is greater than or equal to the threshold 0.4, there was a transition from indoors to outdoors somewhere within the 20-second range of the vector mask. Similarly, a value of 0.4 or greater against [math]V_{2}[/math] indicates a transition from outdoors to indoors. Transition windows that occur close together in time are merged, with the average of their transition times used as the new transition time.
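One way to read the mask-matching step is sketched below in Python. It is an illustration under stated assumptions, not the authors' code: the intersection is taken to be the number of positions where the sub-sequence and the mask agree, and [math]|s_i|[/math] and [math]|V_j|[/math] are taken to be the window length. Under that reading, a window of constant predictions scores 5/15 ≈ 0.33 against either mask, just below the 0.4 threshold. The `merge_gap` parameter and the function names are hypothetical.

```python
V1 = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # indoors -> outdoors
V2 = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # outdoors -> indoors


def jaccard(s, v):
    """Jaccard score, counting agreeing positions as the intersection (assumption)."""
    agree = sum(1 for a, b in zip(s, v) if a == b)
    return agree / (len(s) + len(v) - agree)


def find_transitions(preds, threshold=0.4, merge_gap=5):
    """Slide both masks over 0/1 predictions and return merged transition times."""
    width = len(V1)
    hits = []
    for i in range(len(preds) - width + 1):
        window = preds[i : i + width]
        center = i + width // 2  # index where the mask flips
        if jaccard(window, V1) >= threshold:
            hits.append((center, "in_to_out"))
        elif jaccard(window, V2) >= threshold:
            hits.append((center, "out_to_in"))
    # Merge hits of the same kind that are close in time (merge_gap is hypothetical).
    groups = []
    for center, kind in hits:
        if groups and groups[-1][1] == kind and center - groups[-1][0][-1] <= merge_gap:
            groups[-1][0].append(center)
        else:
            groups.append(([center], kind))
    # Each merged group is reported at the average of its member times.
    return [(sum(c) / len(c), kind) for c, kind in groups]


# Ten outdoor predictions followed by ten indoor predictions.
print(find_transitions([0] * 10 + [1] * 10))
```

In this toy input every window that overlaps the change point clears the threshold, and merging collapses them into a single outdoors-to-indoors transition at index 10.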

### Vertical Height Estimation

Once the barometric pressure of the ground floor is known, the user’s current relative altitude can be calculated with the international pressure equation, where [math]m_\Delta[/math] is the estimated height in meters, [math] p_1 [/math] is the current pressure reading of the device, and [math] p_0 [/math] is the reference pressure at ground level, recorded at the outdoor-to-indoor transition.

\begin{equation} m_\Delta = f_{floor}(p_0, p_1) = 44330 (1 - (\frac{p_1}{p_0})^{\frac{1}{5.255}}) \label{equation:baroHeight} \end{equation}
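The height equation above translates directly into code. The sketch below is a minimal Python version; the function name `relative_altitude_m` is chosen here for illustration, and the only requirement on the inputs is that both pressures use the same unit, since only their ratio enters the formula.

```python
def relative_altitude_m(p0, p1):
    """Estimated height in meters of pressure reading p1 above reference p0."""
    return 44330.0 * (1.0 - (p1 / p0) ** (1.0 / 5.255))


# A drop of 12 Pa from standard sea-level pressure is roughly one meter.
print(relative_altitude_m(101325.0, 101325.0 - 12.0))
```

Equal pressures give a height of zero, and a lower reading than the reference gives a positive height.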

In Appendix B.1, the authors acknowledge that for this system to work, pressure variations due to weather or temperature must be accounted for, as those variations are of the same order of magnitude as, or larger than, the pressure variations caused by changing altitude. They suggest using a nearby reference station with known altitude to continuously measure and correct for this effect.

### Floor Estimation

Given the user’s relative altitude, the floor level can be determined. However, this is not a straightforward task because different buildings have different floor heights, floor labeling differs between buildings (e.g., skipping the 13th floor), and floor heights within the same building can vary from floor to floor. To solve these problems, the collected altitude data are sorted and clustered: sorted altitude points that lie within 1.5 meters of each other are grouped together. Each cluster represents the approximate altitude of a floor.

Here is an example of altitude data collected across 41 trials in the Uris Hall building in New York City. Each dashed line represents the center of a cluster.

Figure 2: Distribution of measurements across 41 trials in the Uris Hall building in New York City. A clear size difference is especially noticeable at the lobby. Each dotted line corresponds to an actual floor in the building learned from clustered data-points.

Here is the algorithm for the floor level prediction.
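The clustering step described in this section can be approximated with the short Python sketch below. It is not the authors' algorithm: it assumes that "within 1.5 meters of each other" means the gap between consecutive sorted points, it uses the cluster mean as the floor altitude, and it returns a zero-based cluster index rather than a labelled floor number. The function names are hypothetical.

```python
def cluster_altitudes(altitudes, max_gap=1.5):
    """Group sorted altitudes whose consecutive gap is at most max_gap meters.

    Returns the mean altitude of each cluster, lowest cluster first.
    """
    clusters = []
    for a in sorted(altitudes):
        if clusters and a - clusters[-1][-1] <= max_gap:
            clusters[-1].append(a)  # close to the previous point: same floor
        else:
            clusters.append([a])  # large jump: start a new floor
    return [sum(c) / len(c) for c in clusters]


def nearest_cluster_index(centers, altitude):
    """Zero-based index of the cluster center closest to a new altitude."""
    return min(range(len(centers)), key=lambda k: abs(centers[k] - altitude))


centers = cluster_altitudes([4.5, 0.0, 8.0, 0.5, 4.0])
print(centers, nearest_cluster_index(centers, 4.4))
```

Mapping a cluster index to the floor label used in a particular building (for example, one that skips the 13th floor) would need building-specific information and is left out of the sketch.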

# Experiments and Results

The authors evaluated two different tasks: indoor-outdoor classification and floor level prediction. For the indoor-outdoor classification task, they compared six different models: LSTM, feedforward neural networks, logistic regression, SVM, HMM, and random forests. For the floor level prediction task, they evaluated the full system.

## Indoor-Outdoor Classification Results

Here are the results for the indoor-outdoor classification problem using different machine learning techniques. The LSTM has the best performance on the test set.

The LSTM is trained for 24 epochs with a batch size of 128. All hyper-parameters, such as the learning rate (0.006), number of layers, [math]d[/math] size, number of hidden units, and dropout rate, were selected by random search.

## Floor Level Prediction Results

The following are the results for the floor level prediction from the 63 collected samples. Results are given as the percentage of predictions that matched the floor exactly, were off by one, or were off by more than one. In each column, the left number is the accuracy using a fixed floor height, and the right number is the accuracy when clustering was used to calculate a variable floor height. Using the clustering technique produced 100% accuracy on floor predictions. The conclusion from these results is that building-specific floor heights produce significantly better predictions than a fixed floor height.
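The three reported categories can be computed as in the Python sketch below. The function name and the toy input are purely illustrative; the numbers are made up for the example and are not results from the paper.

```python
def floor_error_breakdown(true_floors, predicted_floors):
    """Percentage of predictions that are exact, off by one, or off by more."""
    n = len(true_floors)
    exact = off_by_one = off_by_more = 0
    for t, p in zip(true_floors, predicted_floors):
        diff = abs(t - p)
        if diff == 0:
            exact += 1
        elif diff == 1:
            off_by_one += 1
        else:
            off_by_more += 1
    return {
        "exact": 100.0 * exact / n,
        "off_by_one": 100.0 * off_by_one / n,
        "off_by_more": 100.0 * off_by_more / n,
    }


# Toy data only: two exact, one off by one, one off by two.
print(floor_error_breakdown([1, 2, 3, 4], [1, 2, 4, 6]))
```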

## Floor Level Clustering Results

Here is the comparison between the estimated floor height and the ground truth in the Uris Hall building.

# Criticism

This paper is an interesting application of deep learning and achieves an outstanding result of 100% accuracy. However, it offers no new theoretical discoveries. The machine learning techniques used are fairly standard. The neural network used in this paper contains only 3 layers, and the clustering is applied to one-dimensional data. This raises the question of whether deep learning is necessary and suitable for this task.

The paper explains that there are many cases where the system does not work. Some cases that were mentioned include buildings with glass walls, delayed GPS signals, and pressure changes caused by air conditioning. Other examples I can think of are uneven floors with some areas higher than others, floors that are rarely visited, and tunnels from one building to another. These special cases are not specifically mentioned in the paper, but the authors do note that pressure differences between the outdoors and pressure-sealed buildings are a problem.

Another weakness of the method comes from the clustering technique, which requires a fair amount of data for each building. The authors suggested two approaches. First, the data can be stored on the individual smartphone. This is not realistic, as most people do not visit every single floor of every building, even their own apartment building. The second approach is to let a central system (such as an emergency department) collect data from multiple users, which is what the paper’s results are based on. However, such data collection would need to be done in accordance with local laws. Perhaps a better solution would be to use the elevation reading to estimate a floor based on typical floor height. Even a small range of candidate floors could help first responders significantly reduce their response time.

Aside from all the technical issues, if knowing the exact floor is required, might it be easier to let the rescuers carry a barometer with them and search for the floor whose pressure matches the transmitted reading?

## Real-world Considerations

In the appendices, the authors discuss the real-world issues they encountered and propose possible solutions.

**Pressure Variance**

Changing weather conditions and geographical locations can greatly affect barometric pressure readings. As a possible solution, the current pressure conditions at a nearby landmark such as an airport can be used to normalize the local pressure. Alternatively, knowledge of local Wi-Fi access points can establish whether the user is changing location or the pressure is changing naturally.

Another potential issue when using pressure readings is that different phones were found to read the local pressure at varying offsets from each other. This suggests that some form of phone calibration would have to be performed before the app is used.

**Battery Impact**

An app that regularly collects data from the GPS and motion sensors will severely impact the battery life of the device. While motion sensing has already been addressed in iOS by running on a dedicated chip, the GPS would need to be sampled far less frequently.

# Conclusion

This paper presented a novel deep learning application for predicting floor level from smartphone sensor data. While there are no new theoretical discoveries, the application is novel and important for 911 responders; indeed, previous studies have shown that survival rates for urgent medical events drop exponentially with each additional floor. Although much of this is attributed to the actual floor height, it makes reducing ground-to-floor travel time all the more important.