Mathematical foundations of machine learning

If you’re simply downloading and running a model and doing some basic tuning, mathematics isn’t essential; you can treat the model as a black box. However, to understand why large models are effective and what it means to define a function in a high-dimensional space, you need the relevant mathematical tools. Machine learning appears to be about algorithms, but its essence is data modeling, and building models is inseparable from mathematics. I recommend a book written specifically for this purpose, Mathematics for Machine Learning. It covers almost all the mathematical concepts used in machine learning, such as linear algebra, multivariate calculus, probability theory, and continuous optimization, and uses them to derive four core machine learning methods: linear regression, principal component analysis, Gaussian mixture models, and support vector machines. It’s possibly one of the best books for understanding machine learning from a mathematical perspective.
A large model can be viewed as a very large function with a massive number of parameters, but after reading this book you might instead describe it as a subspace within a larger space of functions. The book teaches you to see geometric structure through linear algebra, and to see gradient descent not merely as a formula but as a direction in parameter space, the one in which the loss falls fastest; training is an iterative optimization process that keeps stepping in that direction. So why do some models overfit? Answering this requires probability theory. The causes vary, but the most common are a model that is too large for the data or a sample that is too small. Seen this way, machine learning, which superficially looks like curve fitting, is essentially probabilistic inference about data the model has not yet seen.
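To make these two ideas concrete, here is a minimal sketch in Python with NumPy. It is not taken from the book: the function names, the sin(2πx) target, the noise level, the learning rate, and the polynomial degrees are all illustrative assumptions. The first part fits a line by repeatedly stepping against the gradient of the mean squared error; the second fits a small and a large polynomial to the same ten noisy samples and compares the error on the training points with the error on unseen points.

```python
import numpy as np


def gradient_descent_linear(X, y, lr=0.5, steps=500):
    """Fit y ≈ X @ w by gradient descent on the mean squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        residual = X @ w - y
        grad = 2.0 * X.T @ residual / len(y)  # gradient of the MSE with respect to w
        w = w - lr * grad                     # step against the gradient
    return w


def make_data(n_train, rng, noise=0.1):
    """Noisy training samples of sin(2*pi*x) plus a dense, noise-free test grid."""
    x_train = rng.uniform(0.0, 1.0, n_train)
    y_train = np.sin(2 * np.pi * x_train) + noise * rng.normal(size=n_train)
    x_test = np.linspace(0.0, 1.0, 200)
    y_test = np.sin(2 * np.pi * x_test)
    return x_train, y_train, x_test, y_test


def poly_errors(degree, x_train, y_train, x_test, y_test):
    """Least-squares polynomial fit; returns (train MSE, test MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse


# Part 1: recover the line y = 1 + 2x by gradient descent.
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])  # bias column plus the feature
w = gradient_descent_linear(X, 1.0 + 2.0 * x)
print("weights found by gradient descent:", w)

# Part 2: the same 10 samples, fitted with a small and a large model.
rng = np.random.default_rng(0)
data = make_data(10, rng)
for degree in (3, 9):
    train_mse, test_mse = poly_errors(degree, *data)
    print(f"degree {degree}: train MSE {train_mse:.2e}, test MSE {test_mse:.2e}")
```

With ten samples, a degree-9 polynomial has enough freedom to pass through every training point, so its training error is close to zero by construction; how far its test error departs from that is the quantity to watch. Increasing the number of training samples in `make_data` while keeping the degree fixed is a simple way to see the sample-size side of the same trade-off.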
One of the book’s three authors, Marc Peter Deisenroth (his co-authors are A. Aldo Faisal and Cheng Soon Ong), is a professor of computer science at University College London and has participated in the development of DeepMind and Google Brain. The main purpose of the book is to give readers a deeper understanding of the mathematical concepts involved in machine learning.