Andre Ye in Analytics Vidhya:
The Universal Approximation Theorem is, quite literally, the theoretical foundation of why neural networks work. Put simply, it states that a neural network with one hidden layer containing a sufficient but finite number of neurons can approximate any continuous function on a compact domain to any desired accuracy, under certain conditions on the activation function (in the original formulation, that it be sigmoid-like).
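A minimal sketch of the theorem in action: the snippet below (my illustration, not from the article) builds a single hidden layer of sigmoid units with randomly fixed hidden weights and solves only for the output weights by least squares, then checks how closely the network tracks a smooth target function on a compact interval. The target function, layer width, and weight scale are all arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# A continuous target function on the compact interval [-3, 3].
f = lambda x: np.sin(2 * x) + 0.5 * x
x = np.linspace(-3, 3, 400).reshape(-1, 1)

# One hidden layer of sigmoid units. Hidden weights/biases are fixed at
# random; only the output weights are fitted (by least squares), which is
# enough to illustrate approximation, though real networks train all layers.
n_hidden = 200
W = rng.normal(scale=4.0, size=(1, n_hidden))   # hidden-layer weights
b = rng.normal(scale=4.0, size=n_hidden)        # hidden-layer biases

H = 1.0 / (1.0 + np.exp(-(x @ W + b)))          # sigmoid activations
c, *_ = np.linalg.lstsq(H, f(x).ravel(), rcond=None)  # output weights

approx = H @ c
max_err = np.max(np.abs(approx - f(x).ravel()))
print(f"max |error| on [-3, 3]: {max_err:.4f}")
```

Increasing `n_hidden` drives the worst-case error down further, which is exactly the "sufficient but finite number of neurons" clause of the theorem.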
Formulated and proven in 1989 by George Cybenko for sigmoid activations, and extended by Kurt Hornik in 1991 to a much broader class of activation functions, the theorem showed that it is the multilayer feedforward architecture, not the particular choice of activation, that gives neural networks their approximation power. Its discovery was a significant driver in spurring the excited development of neural networks into the plethora of applications in which they are used today.
Most importantly, however, the theorem is an eye-opening explanation of why neural networks appear to behave so intelligently. Grasping it is a key step toward a strong and deep understanding of neural networks.