The application of neural networks—particularly deep learning—is vast and spans numerous fields. One such field is mechanical engineering, where these techniques can be applied to dynamics.
In this article, we introduce the use of neural networks to solve ordinary differential equations (ODEs), a core aspect of dynamic and multi-body systems. We will explore the key concepts and nuances necessary for formulating these neural network models. In the next article, we will discuss a practical example.
A neural network operates in stages: the inputs are preprocessed and the weights are initialized; feedforward propagation then applies an affine map and an activation function at each layer; a loss is calculated, and back-propagation computes its gradients; the weights are updated to refine the model; and, through iterative training, the network converges to an approximation of the solution.
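To make these stages concrete, here is a minimal sketch of one such training loop in plain NumPy. The two-layer architecture, the tanh activation, the quadratic loss, and the sine target are illustrative assumptions, not part of the formulation in this article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Input preprocessing: sample inputs on a common range (here, [0, 1]).
x = rng.uniform(0.0, 1.0, size=(32, 1))   # 32 samples, 1 feature
y = np.sin(2.0 * np.pi * x)               # illustrative target

# Weight initialization for a two-layer network (1 -> 16 -> 1).
W1, b1 = rng.normal(0, 0.5, (1, 16)), np.zeros(16)
W2, b2 = rng.normal(0, 0.5, (16, 1)), np.zeros(1)
lr = 0.1                                   # learning rate

for step in range(2000):
    # Feedforward propagation: affine map followed by the activation.
    A1 = x @ W1 + b1           # pre-activation of layer 1
    H1 = np.tanh(A1)           # activation p^(1)
    y_hat = H1 @ W2 + b2       # linear output layer

    # Loss calculation (mean squared error).
    loss = np.mean((y_hat - y) ** 2)

    # Back-propagation: gradients of the loss w.r.t. weights and biases.
    dY = 2.0 * (y_hat - y) / len(x)
    dW2, db2 = H1.T @ dY, dY.sum(axis=0)
    dH1 = dY @ W2.T
    dA1 = dH1 * (1.0 - H1 ** 2)            # derivative of tanh
    dW1, db1 = x.T @ dA1, dA1.sum(axis=0)

    # Weight update (plain gradient descent).
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final training loss: {loss:.4f}")
```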
Before we proceed, let's state the formula for how each layer transforms its input:

$$x^{(l)} = p^{(l)}\!\left(A^{(l)}\big(x^{(l-1)}\big)\right), \qquad A^{(l)}\big(x^{(l-1)}\big) = W^{(l)} x^{(l-1)} + b^{(l)}$$
Here, $W$ represents the weights and $b$ the biases; through training, we determine optimal values for both. In this expression, $l$ denotes the layer index, $p^{(l)}$ is the activation function applied at that layer, and $A^{(l)}$ is the affine transformation, that is, the pre-activation output of layer $l$.
It is important to note that the depth of the network corresponds to the number of affine maps, while the width, $n$, is the maximum dimension across layers, that is, the largest number of neurons in any single layer.
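As a rough illustration of the layer recursion and of counting depth and width, the sketch below builds an arbitrary 2-8-8-1 network (the dimensions and the tanh activation are assumptions chosen only for the example) and applies one affine map $A^{(l)}$ plus activation $p^{(l)}$ per layer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Layer dimensions of an illustrative network: 2 -> 8 -> 8 -> 1.
dims = [2, 8, 8, 1]

# One affine map A^(l)(x) = W^(l) x + b^(l) per layer.
weights = [rng.normal(0, 0.3, (m, n)) for m, n in zip(dims[:-1], dims[1:])]
biases = [np.zeros(n) for n in dims[1:]]

def forward(x):
    """Apply x^(l) = p^(l)(A^(l)(x^(l-1))) layer by layer (tanh hidden, linear output)."""
    for l, (W, b) in enumerate(zip(weights, biases), start=1):
        x = x @ W + b                # affine (pre-activation) A^(l)
        if l < len(weights):         # activation p^(l) on the hidden layers
            x = np.tanh(x)
    return x

depth = len(weights)   # number of affine maps
width = max(dims)      # largest layer dimension n

print("depth:", depth, "width:", width)
print("output:", forward(np.array([[0.5, -1.0]])))
```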
When measuring how well a neural network approximates the solution of an ODE in a multi-body system, several norms can be employed, namely the $C^0$-norm, the $C^k$-norm, and the $L^2$-norm. The $C^0$-norm is the maximum absolute difference between the functions, $\|f - g\|_{C^0} = \max_x |f(x) - g(x)|$, across the domain. In contrast, the $C^k$-norm not only considers the maximum difference between the functions but also measures the discrepancies in their derivatives, taking the maximum over every derivative order $0 \leq l \leq k$. Finally, the $L^2$-norm is the square root of the integral of the squared difference, $\|f - g\|_{L^2} = \left(\int |f(x) - g(x)|^2 \, dx\right)^{1/2}$, providing an overall measure of the approximation error over the entire domain.
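The sketch below shows how these error measures might be evaluated numerically on a grid. The functions $f$ and $g$ are arbitrary stand-ins, the integral is approximated by a simple Riemann sum, and the derivatives by finite differences, so this is only an illustration of the definitions.

```python
import numpy as np

# Illustrative pair: a target f and an "approximation" g on [0, 1].
x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
f = np.sin(2.0 * np.pi * x)
g = np.sin(2.0 * np.pi * x) + 0.01 * np.cos(10.0 * x)   # perturbed copy of f

# C0-norm: maximum absolute difference over the domain.
c0 = np.max(np.abs(f - g))

# L2-norm: square root of the integral of the squared difference (Riemann sum).
l2 = np.sqrt(np.sum((f - g) ** 2) * dx)

# A C1-style check also compares first derivatives (finite differences here).
df, dg = np.gradient(f, x), np.gradient(g, x)
c1 = max(c0, np.max(np.abs(df - dg)))

print(f"C0 error: {c0:.4f}, L2 error: {l2:.4f}, C1 error: {c1:.4f}")
```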
A key question arises regarding the appropriate size of a neural network: specifically, how many neurons are necessary to achieve a good approximation such that the error $\|F - \hat{F}\|$ remains below a given threshold $\varepsilon$. Yarotsky (2017) notes that meeting this condition is not straightforward, and balancing network complexity with approximation accuracy is a significant challenge.
For a rectified linear unit (ReLU), positive values of $x$ pass through unchanged, essentially acting as an identity mapping, while negative values are set to zero. This behavior is sometimes expressed as:

$$\mathrm{ReLU}(x) = \max(0, x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0. \end{cases}$$
Additionally, this concept extends to the formulation of the maximum function, which can be written in terms of the ReLU as:

$$\max(x, y) = y + \mathrm{ReLU}(x - y).$$
Both expressions are fundamental to understanding how neural networks build up piecewise-linear approximations by leveraging the ReLU activation function.
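As a quick numerical check of the two identities above (a minimal sketch, with arbitrary test values):

```python
import numpy as np

def relu(x):
    """ReLU(x) = max(0, x): identity for positive inputs, zero otherwise."""
    return np.maximum(0.0, x)

def max_via_relu(x, y):
    """max(x, y) = y + ReLU(x - y): the maximum expressed with a single ReLU."""
    return y + relu(x - y)

x = np.array([-2.0, -0.5, 0.0, 0.7, 3.0])
y = np.array([1.0, -1.0, 0.5, 0.7, -3.0])

print(relu(x))                                            # [0. 0. 0. 0.7 3.]
print(max_via_relu(x, y))                                 # matches np.maximum(x, y)
print(np.allclose(max_via_relu(x, y), np.maximum(x, y)))  # True
```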
For smooth functions, such as those arising in the solution of partial differential equations, a Taylor series expansion is a classical numerical approach to approximating the solutions of dynamic equations. However, these classical methods can become computationally expensive in high-dimensional problems or when rigorous uncertainty quantification is required, often leading to long solution times. In contrast, the primary challenge with neural networks is selecting the right architecture and parameters; once a suitable network has been established, it can perform these estimations in a matter of seconds.
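For comparison, here is a small sketch of the classical Taylor-series route mentioned above, approximating $\sin(x)$ about zero with an increasing number of terms; the target function and interval are arbitrary choices for illustration.

```python
import math
import numpy as np

def taylor_sin(x, terms):
    """Truncated Taylor series of sin(x) about 0: sum over k of (-1)^k x^(2k+1) / (2k+1)!."""
    total = np.zeros_like(x, dtype=float)
    for k in range(terms):
        total += (-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
    return total

x = np.linspace(-np.pi, np.pi, 7)
for terms in (2, 4, 6):
    err = np.max(np.abs(np.sin(x) - taylor_sin(x, terms)))
    print(f"{terms} terms -> max error {err:.2e}")
```

The error shrinks rapidly as more terms are added, which is exactly the behavior that becomes costly to reproduce once the problem dimension grows.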
In the next article, we will delve deeper into solving neural ordinary differential equations (ODEs) and present a practical example.