
Physics-Informed with Power-Enhanced Residual Network: Numerical Results

by The Interpolation Publication, February 28th, 2024

Too Long; Didn't Read

Discover the power of Power-Enhancing Residual Networks for superior interpolation in 2D/3D domains with physics-informed solutions, also available on Arxiv.

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Amir Noorizadegan, Department of Civil Engineering, National Taiwan University;

(2) D.L. Young, Core Tech System Co. Ltd, Moldex3D, Department of Civil Engineering, National Taiwan University & [email protected];

(3) Y.C. Hon, Department of Mathematics, City University of Hong Kong;

(4) C.S. Chen, Department of Civil Engineering, National Taiwan University & [email protected].

Abstract & Introduction

Neural Networks

PINN for Solving Inverse Burgers’ Equation

Residual Network

Numerical Results

Results, Acknowledgments & References

5 Numerical Results


1. Mean Square Error: The training errors shown in the plots, as a function of the iteration number, are computed using the mean square error (MSE) criterion.



2. Relative L2 Norm Error: The validation errors, calculated over the test data and plotted against the iteration number, are measured using the relative L2 norm error metric.



3. Maximum Absolute Error: When visualizing errors across the entire domain, whether in 2D or 3D scenarios, the error represented on the contour error plot is referred to as the maximum absolute error. It is important to note that the contour bars are scaled according to the largest error in the plot.


Maximum Absolute Error = max |u − N|, where u denotes the exact value and N the network prediction.
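The three metrics above are straightforward to compute; the following NumPy sketch (our own illustration with placeholder names, not code from the paper) spells them out:

```python
import numpy as np

def mean_square_error(u_exact, u_pred):
    """Training loss: mean of the squared point-wise differences."""
    return np.mean((u_exact - u_pred) ** 2)

def relative_l2_error(u_exact, u_pred):
    """Validation metric: ||u - N||_2 / ||u||_2 over the test points."""
    return np.linalg.norm(u_exact - u_pred) / np.linalg.norm(u_exact)

def maximum_absolute_error(u_exact, u_pred):
    """Contour-plot metric: max |u - N| over the sampled domain."""
    return np.max(np.abs(u_exact - u_pred))
```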


These error metrics provide valuable insight into the accuracy and convergence of the methods used in this study. In this section, four methods are investigated:


1. Plain NN: A conventional neural network without any additional modifications or residual connections (see Fig. 2(a)).


2. ResNet: A residual neural network architecture in which the input of each layer is added, through an identity skip connection, to that layer’s output (see Fig. 2(b)).


3. SkipResNet: An extension of ResNet, where the residual connection is applied every other layer, alternating between including and excluding the residual connection (see Fig. 2(c) where p = 1).


4. SQR-SkipResNet: An innovative variation of the ResNet architecture, where the squared residual is added every other layer. In this approach, the output of each alternate layer is obtained by squaring the previous layer’s output and adding the squared residual to it (see Fig. 2(c)-(d) where p = 2).
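To make the four variants concrete, here is a minimal PyTorch sketch of how their forward passes could be organized. It reflects our own reading of Fig. 2 rather than the authors' implementation; the class name, width, depth, and tanh activation are placeholder choices, and the exact placement of the skip connections may differ from the original architecture.

```python
import torch
import torch.nn as nn

class PowerSkipNet(nn.Module):
    """Sketch of the four network variants compared in this section.

    mode = "plain"  : plain feed-forward network (Plain NN)
    mode = "resnet" : identity skip added at every hidden layer (ResNet)
    mode = "skip"   : skip added at every other hidden layer, p = 1 (SkipResNet)
    mode = "sqr"    : skip raised to the power p = 2, added at every other
                      hidden layer (SQR-SkipResNet)
    """

    def __init__(self, mode="sqr", in_dim=2, width=50, depth=10, p=2):
        super().__init__()
        self.mode, self.p = mode, p
        self.inp = nn.Linear(in_dim, width)
        self.hidden = nn.ModuleList(nn.Linear(width, width) for _ in range(depth))
        self.out = nn.Linear(width, 1)

    def forward(self, x):
        h = torch.tanh(self.inp(x))
        for k, layer in enumerate(self.hidden):
            z = torch.tanh(layer(h))
            if self.mode == "resnet":
                z = z + h                      # residual connection at every layer
            elif self.mode in ("skip", "sqr") and k % 2 == 1:
                power = self.p if self.mode == "sqr" else 1
                z = z + h ** power             # (powered) skip every other layer
            h = z
        return self.out(h)
```

Instantiating this class with the four mode values yields the four networks whose training curves are compared in the figures below.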



The numerical experiments were executed on a computer equipped with an Intel(R) Core(TM) i9-9900 CPU operating at 3.10GHz with a total of 64.0 GB of RAM.


Example 1 For the first example, three test functions are investigated and depicted in Fig. 3. The top panel of Fig. 3 displays the 3D surface plot of the test functions, while the bottom panel presents the corresponding contour plots. F1 is a smooth function, originally introduced by Franke [9], which has been extensively used for studying radial basis function (RBF) interpolation. On the other hand, F2 and F3 are non-smooth functions [10].
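For readers who wish to reproduce F1, the function introduced by Franke [9] is usually written as a sum of four Gaussian-type terms; the NumPy sketch below assumes this standard form, which may differ in scaling from the authors' exact setup:

```python
import numpy as np

def franke(x, y):
    """Standard Franke test function on [0, 1]^2 (assumed form of F1)."""
    t1 = 0.75 * np.exp(-((9 * x - 2) ** 2 + (9 * y - 2) ** 2) / 4)
    t2 = 0.75 * np.exp(-((9 * x + 1) ** 2 / 49 + (9 * y + 1) / 10))
    t3 = 0.50 * np.exp(-((9 * x - 7) ** 2 + (9 * y - 3) ** 2) / 4)
    t4 = 0.20 * np.exp(-((9 * x - 4) ** 2 + (9 * y - 7) ** 2))
    return t1 + t2 + t3 - t4
```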



Our observations from these plots are as follows:


1. Plot (a) indicates that ResNet is not accurate enough compared to the other three networks, both during training and validation. This pattern has been observed consistently across our examples, so we do not investigate the ResNet performance further.


Figure 3: The profile of F1.


Figure 4: The profiles of training on F1 for different numbers of collocation points n. Dotted-line curves denote training error, and solid-line curves denote validation error.


Figure 5: Example 1: The profiles of (a) training and validation results on F1 with 5000 data points. Dotted-line curves denote training error, and solid-line curves denote validation error. The corresponding contour error plots for (b) the plain NN and (c) SQR-SkipResNet.


2. As indicated by the plot, Plain NN requires approximately 2400 iterations to converge, whereas the proposed SQR-SkipResNet converges in a significantly reduced 1400 iterations. In addition, the latter exhibits higher accuracy than Plain NN.


3. Plot (a) also shows that SkipResNet performs somewhere between Plain NN and SQR-SkipResNet. This behavior has been observed in other examples conducted by the authors, but we do not plan to investigate it further.


4. Contour error plots for both Plain NN and SQR-SkipResNet are presented in plots (b) and (c) respectively. These plots highlight that the maximum absolute error achieved with SQR-SkipResNet exhibits a remarkable improvement of approximately 60% compared to Plain NN.


Therefore, higher accuracy and better convergence are observed when using SQR-SkipResNet compared to the other algorithms.



We further investigate the performance of SQR-SkipResNet by interpolating the non-smooth functions F2 and F3. Figure 6 presents the interpolation results for F2 (top panel) and F3 (bottom panel) with n = 1000. The corresponding training and validation errors with respect to the epoch are also shown.


Figure 6: Example 1: The profiles of F2 and F3.


Table 1: Example 1: Maximum absolute errors and CPU time (t) for F2-F3.


The second and third columns show the interpolated surface using Plain NN and SQR-SkipResNet, respectively. Clearly, a better surface interpolation is obtained with the proposed method. More details are listed in Table 1. The table shows that the accuracy of SQR-SkipResNet is only slightly better than that of Plain NN; however, it is worth noting that these functions are non-smooth, and even slight changes in error affect the quality of the interpolation considerably, as shown in Fig. 6. On the other hand, to reach this better accuracy, SQR-SkipResNet requires a larger number of iterations and consequently more CPU time. This can be seen as the trade-off SQR-SkipResNet makes to obtain better accuracy when interpolating non-smooth functions, in contrast to the smaller CPU time it requires for interpolating smooth functions.


Example 2 In this example, we demonstrate the performance of the proposed method in a real case study. Specifically, we interpolate the Mt. Eden or Maungawhau volcano in Auckland, NZ, as depicted in Fig. 7(a) [11]. The available data consists of 5307 elevation points uniformly distributed in a mesh grid area of size 10 by 10 meters. Plot (b) shows the 3D surface, and plot (c) presents the contour plot of the volcano.



Figure 7: Example 2: (a) An image showcasing the Mt. Eden or Maungawhau volcano located in Auckland, New Zealand [11]. (b) A 3D surface representation generated from a dataset containing n = 5307 data points. (c) A contour plot providing insights into the topography of Mt. Eden.






Additionally, we provide more details on the interpolated surface and accuracy in Fig. 8. The second row shows the interpolated surface using Plain NN, while the third row shows the results obtained with SQR-SkipResNet. Specifically, plots 8(b) and 8(e) depict the interpolated surfaces for Plain NN and SQR-SkipResNet, respectively. Similarly, plots 8(c) and 8(f) display the contour plots for both methods. Finally, plots 8(d) and 8(g) represent the contour error plots, measured by the maximum absolute error, for Plain NN and SQR-SkipResNet, respectively.


Clearly, the results using SQR-SkipResNet significantly outperform those from Plain NN. In terms of the maximum absolute error, the SQR-SkipResNet architecture improves on Plain NN by roughly 500%. This underscores the superiority of SQR-SkipResNet in achieving more accurate and reliable interpolation results.


In our second experiment, we repeat the previous example, but this time we use the Adam optimizer with a learning rate of 1.0E-3 and 10k iterations. The plots are organized as in the previous example. Plot 9(a) shows that SQR-SkipResNet is considerably more accurate from the early iterations and fluctuates far less than Plain NN (compare plots 9(b) and 9(e), and the corresponding contour plots in 9(c) and 9(f)). The interpolated surface obtained with SQR-SkipResNet (plot 9(g)) is also clearly better than that of Plain NN (plot 9(d)). In terms of the maximum absolute error, the proposed method is about 462% more accurate than Plain NN.


Figure 10: Example 3: The Stanford Bunny.


A comparison between the two optimizers, L-BFGS-B (Fig. 8) and Adam (Fig. 9), shows better performance with Adam for both Plain NN and the proposed SQR-SkipResNet.
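As a rough illustration of the Adam setup used in this second experiment (learning rate 1.0E-3, 10k iterations), the loop below trains an interpolation network on scattered elevation data. It is a schematic of our own, reusing the hypothetical PowerSkipNet class sketched earlier, with random placeholder tensors standing in for the volcano data:

```python
import torch

# Placeholder tensors standing in for the 5307 (x, y) points and their elevations.
xy = torch.rand(5307, 2)
z = torch.rand(5307, 1)

model = PowerSkipNet(mode="sqr")                       # sketch class from above
optimizer = torch.optim.Adam(model.parameters(), lr=1.0e-3)
loss_fn = torch.nn.MSELoss()                           # training metric: MSE

for it in range(10_000):                               # 10k iterations, as in the text
    optimizer.zero_grad()
    loss = loss_fn(model(xy), z)
    loss.backward()
    optimizer.step()
    if it % 1000 == 0:
        print(f"iter {it:5d}  MSE {loss.item():.3e}")
```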




Finally, in all cases SQR-SkipResNet leads to better accuracy compared to Plain NN.


Example 3 In the concluding example on interpolation problems, we analyze the effectiveness of the proposed neural network in a 3D setting, using the Stanford bunny model depicted in Fig. 10(a). The entire bunny model has been scaled by a factor of 10. A distribution of points over the bunny’s surface is illustrated in Fig. 10(b), comprising a total of 8171 data points. The validation error is computed using the following test function (F4, see [12]):




Figure 11: Example 3: Error profile comparison for the Stanford Bunny model using the L-BFGS-B (top panel) and Adam (bottom panel) optimizers. Training errors are indicated by the dotted lines, and validation errors by the solid lines.


In Fig. 11, the training process (dotted line) is depicted with 500 data values, while the remaining 7671 points are reserved for validation error assessment (solid line). The top panel showcases results obtained using the L-BFGS-B optimizer, while the bottom panel displays outcomes achieved through the Adam optimizer. As demonstrated in Fig. 11(a), the SQR-SkipResNet surpasses the Plain NN in terms of accuracy and convergence rate across both the training and test datasets. The recorded CPU times amount to 35 seconds for Plain NN and 15 seconds for SQR-SkipResNet. Plots (b) and (c) offer insight into the maximum absolute error, highlighting an accuracy improvement of approximately 70% when implementing the proposed network architecture.



Moreover, the lower panel of the figure reveals that the efficacy of the SQR-SkipResNet method persists even when utilizing the Adam optimizer. Plot (a) illustrates a more rapid convergence rate for the proposed method when evaluated against test data. The plots (b) and (c) portraying the maximum absolute error clearly exhibit significantly improved accuracy achieved through the proposed approach. This consistent superiority serves to highlight the distinct advantages of the SQR-SkipResNet approach over its alternatives. In comparing the L-BFGS-B and Adam optimizers, it becomes evident that the former displays enhanced performance in both accuracy and CPU time, accomplishing the desired accuracy level more efficiently.
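For completeness, a comparable quasi-Newton training step can be sketched with PyTorch's built-in LBFGS optimizer (which, unlike the L-BFGS-B routine referenced in the paper, does not support box constraints). This is again a schematic of our own, built on the hypothetical PowerSkipNet class, with random stand-ins for the 500 bunny training points:

```python
import torch

# Placeholder bunny data: 500 training points on the surface (x, y, z) and target values.
pts = torch.rand(500, 3)
vals = torch.rand(500, 1)

model = PowerSkipNet(mode="sqr", in_dim=3)
optimizer = torch.optim.LBFGS(model.parameters(), max_iter=500)
loss_fn = torch.nn.MSELoss()

def closure():
    # LBFGS re-evaluates the objective several times per step, hence the closure.
    optimizer.zero_grad()
    loss = loss_fn(model(pts), vals)
    loss.backward()
    return loss

optimizer.step(closure)
print("final MSE:", closure().item())
```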


One might wonder about the advantages of employing deep neural networks and their computational implications. To illustrate this aspect, we examine the significance of network depth, as shown in Fig. 12, focusing on F4.


Figure 12: Example 3: Profiles of the validation errors for interpolating the Stanford bunny for different numbers of layers using (a) Plain NN and (b) SQR-SkipResNet.


We use n = 500 data points and the L-BFGS-B optimizer. The results presented here cover networks with 5, 10, and 20 hidden layers, each consisting of 50 neurons.
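One possible way to set up this depth study with the sketch network from earlier (the PowerSkipNet class is our own placeholder; training details are omitted) is simply to instantiate both variants at each depth:

```python
# Depth study: 5, 10, and 20 hidden layers of 50 neurons each (values from the text).
for depth in (5, 10, 20):
    plain = PowerSkipNet(mode="plain", in_dim=3, width=50, depth=depth)
    sqr = PowerSkipNet(mode="sqr", in_dim=3, width=50, depth=depth)
    # ... train each on the n = 500 bunny points and record the validation error
```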


Conversely, in the case of SQR-SkipResNet, a deeper network correlates with improved convergence rate and enhanced accuracy. This suggests that deeper hidden layers can identify features when embedded within an appropriate neural network architecture for this particular example. This stands in contrast to our findings in the second example (Table 2), which highlighted the problem-dependent nature of selecting an optimal number of layers. In this context, the recorded CPU times for models with 5 and 20 hidden layers amount to 19 and 16 seconds, respectively. This observation suggests that deeper networks may not necessarily result in longer CPU times; rather, they can potentially expedite training due to improved convergence rates, as evident in this case.



Figure 13: Example 4: Profiles of training (dotted line) and validation error (solid line) for different numbers of layers.



Comparing the two plots, we observe that the accuracy difference between Plain NN and SQR-SkipResNet becomes more pronounced as the network size increases. This emphasizes the crucial role of architecture selection in achieving stable results.