Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations


Presented by

Cameron Meaney

Introduction

In recent years, there has been enormous growth in the amount of data and computing power available to researchers. Unfortunately, for many real-world scenarios, the cost of data acquisition is simply too high to collect enough data to guarantee robustness or convergence of training algorithms. In such situations, researchers face the challenge of generating results from partial or incomplete datasets. Regularization techniques, or methods that can artificially inflate the dataset, become particularly useful in these situations; however, such techniques are often highly dependent on the specifics of the problem.

Luckily, in many of the important real-world scenarios that we endeavor to analyze, there is a wealth of existing information from which we can draw. This existing information commonly manifests as a mathematical model, particularly a set of partial differential equations (PDEs). In this paper, the authors provide a technique for incorporating the information about a physical system contained in a PDE into the optimization of a deep neural network. This technique is most useful in situations where an established PDE model exists but the amount of available data is too small for neural network training. In essence, the accompanying PDE model can be used as a regularization agent, constraining the space of acceptable solutions to help the optimization converge more quickly and more accurately.

Problem Setup

Consider the following general PDE

\begin{align*} u_t + N[u; \vec{\lambda}] = 0 \end{align*}

where [math]\displaystyle{ u }[/math] is the function we wish to find, subscripts denote partial derivatives, [math]\displaystyle{ \vec{\lambda} }[/math] is the set of parameters on which the PDE depends, and [math]\displaystyle{ N }[/math] is a potentially nonlinear differential operator. This general form encompasses a wide array of PDEs used across the physical sciences, including conservation laws, diffusion processes, advection-diffusion-reaction systems, and kinetic equations. Suppose that we have noisy measurements of the PDE solution [math]\displaystyle{ u }[/math] scattered across the spatio-temporal input domain. Then, we are interested in answering two questions about the physical system:

(1) Given fixed model parameters [math]\displaystyle{ \vec{\lambda} }[/math], what can be said about the unknown hidden state [math]\displaystyle{ u(t,x) }[/math]?

and

(2) What parameters [math]\displaystyle{ \vec{\lambda} }[/math] best describe the observed data?
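For concreteness, a canonical example of this general form (and one studied in the paper) is the one-dimensional Burgers' equation,

\begin{align*} u_t + \lambda_1 u u_x - \lambda_2 u_{xx} = 0, \end{align*}

which corresponds to [math]\displaystyle{ N[u; \vec{\lambda}] = \lambda_1 u u_x - \lambda_2 u_{xx} }[/math] with parameters [math]\displaystyle{ \vec{\lambda} = (\lambda_1, \lambda_2) }[/math].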

Data-Driven Solutions of PDEs

We will begin by attempting to answer the first of the questions above. Specifically, given a small number of noisy measurements of the solution of the PDE

\begin{align*} u_t + N[u] = 0, \end{align*}

can we estimate the full solution, [math]\displaystyle{ u(t,x) }[/math], by approximating it with a deep neural network? Approximating the solution of the PDE with a neural network results in what the authors refer to as a 'Physics-Informed Neural Network' (PINN). Importantly, this technique is most useful in the small-data regime: if we had plenty of data, it simply wouldn't be necessary to include information from the PDE, because the data alone would be sufficient. In the examples considered here, we seek to learn from a very small amount of data, which makes the information contained in the PDE necessary to include.


Continuous-Time Models

Consider the case where our noisy measurements of the solution are randomly scattered across the spatio-temporal input domain. This case is referred to as the 'continuous-time case.' Define the function

\begin{align*} f = u_t + N[u] \end{align*}

as the left-hand side of the PDE above. Now assume that [math]\displaystyle{ u(t,x) }[/math] can be approximated by a deep neural network. Then the function [math]\displaystyle{ f(t,x) }[/math] can also be approximated by a neural network, since it is simply a function of [math]\displaystyle{ u(t,x) }[/math]. In order to calculate [math]\displaystyle{ f(t,x) }[/math] as a function of [math]\displaystyle{ u(t,x) }[/math], derivatives of [math]\displaystyle{ u(t,x) }[/math] must be taken with respect to the inputs, which is accomplished using a technique called automatic differentiation. Importantly, the weights of the two neural networks are shared, since [math]\displaystyle{ f(t,x) }[/math] is simply a function of [math]\displaystyle{ u(t,x) }[/math]. In order to find this set of weights, we create a loss function with two distinct parts. The first part quantifies how well the neural network satisfies the known data points and is given by:

\begin{align*} MSE_u = \frac{1}{N_u} \sum_{i=1}^{N_u} |u(t_u^i, x_u^i) - u^i|^2 \end{align*}

where [math]\displaystyle{ \{(t_u^i, x_u^i, u^i)\}_{i=1}^{N_u} }[/math] are the measured data points.
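The second part quantifies how well the network satisfies the PDE itself, evaluated at a set of [math]\displaystyle{ N_f }[/math] collocation points [math]\displaystyle{ \{(t_f^i, x_f^i)\}_{i=1}^{N_f} }[/math] in the input domain:

\begin{align*} MSE_f = \frac{1}{N_f} \sum_{i=1}^{N_f} |f(t_f^i, x_f^i)|^2 \end{align*}

The full loss is the sum [math]\displaystyle{ MSE = MSE_u + MSE_f }[/math], so minimizing it simultaneously fits the data and enforces the PDE.

To make the construction concrete, below is a minimal sketch of this continuous-time loss in PyTorch, assuming Burgers' equation [math]\displaystyle{ u_t + u u_x - \nu u_{xx} = 0 }[/math] as the PDE; the network architecture, the name u_net, and the helper functions are illustrative choices, not the authors' code.

import torch
import torch.nn as nn

# Fully connected network approximating u(t, x); architecture is illustrative.
u_net = nn.Sequential(
    nn.Linear(2, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 1),
)

nu = 0.01 / torch.pi  # viscosity in Burgers' equation (illustrative value)

def f(t, x):
    # PDE residual f = u_t + N[u], built from u_net via automatic differentiation.
    t = t.clone().requires_grad_(True)
    x = x.clone().requires_grad_(True)
    u = u_net(torch.cat([t, x], dim=1))
    ones = torch.ones_like(u)
    # First derivatives of u with respect to the network inputs.
    u_t = torch.autograd.grad(u, t, ones, create_graph=True)[0]
    u_x = torch.autograd.grad(u, x, ones, create_graph=True)[0]
    # Second derivative u_xx, differentiating through the graph of u_x.
    u_xx = torch.autograd.grad(u_x, x, torch.ones_like(u_x), create_graph=True)[0]
    return u_t + u * u_x - nu * u_xx

def pinn_loss(t_u, x_u, u_data, t_f, x_f):
    # Two-part loss: data misfit MSE_u plus PDE residual MSE_f.
    # The same u_net weights appear in both terms, as described above.
    u_pred = u_net(torch.cat([t_u, x_u], dim=1))
    mse_u = torch.mean((u_pred - u_data) ** 2)
    mse_f = torch.mean(f(t_f, x_f) ** 2)
    return mse_u + mse_f

Training then proceeds by minimizing pinn_loss over the shared weights of u_net with a standard optimizer such as Adam or L-BFGS.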

Discrete-Time Models