Machine learning: All about distributions

Foreword

I recently started the “Introduction to Machine Learning” course at college, so I have decided to begin a new series on machine learning, in which I will note down my important takeaways.

All about distributions

The essence of machine learning lies in statistics and optimization (an argument borrowed from my professor). The datasets we look at carry inherent patterns, or distributions; we build probabilistic models to fit these patterns, and we then use optimization tools to actually perform the fitting.

I know this summary sounds abstract, so let’s look at some examples together.

The examples

We first consider the classic example of linear regression. Viewed through the distribution lens, we assume each label is a linear function of the input plus Gaussian noise, so the dataset’s “inherent pattern” is a Gaussian centered on a line, and fitting the model means finding the line under which the observed data are most likely.
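To make this concrete (a minimal sketch in my own notation, assuming a one-dimensional input and i.i.d. Gaussian noise), write the model as y_i = w x_i + b + ε_i and compute the log-likelihood of the dataset:

```latex
\begin{align}
  y_i &= w x_i + b + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2) \\
  \log L(w, b)
      &= \sum_{i=1}^{n} \log \frac{1}{\sqrt{2\pi\sigma^2}}
         \exp\!\left( -\frac{(y_i - w x_i - b)^2}{2\sigma^2} \right) \\
      &= -\frac{n}{2} \log\left(2\pi\sigma^2\right)
         - \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - w x_i - b\right)^2
\end{align}
```

The first term does not depend on (w, b), so maximizing the likelihood is exactly minimizing the sum of squared errors: the least-squares objective falls out of a distributional assumption (Reference 3 walks through this intuition).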

Another example worth looking at is classification, which will be introduced in more detail in the following logs.

For now, the most important takeaway is: regression and classification are two of the most fundamental applications of machine learning.

Appendix: KL-Divergence and Cross-entropy

Remarks: It’s worth noting that KL-divergence and cross-entropy both characterize how different two distributions are. This may sound familiar: measuring the gap between the model’s distribution and the data’s distribution is exactly what fitting requires.
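For concreteness, here are the standard definitions for discrete distributions P and Q over the same support, together with the identity that links them:

```latex
\begin{align}
  D_{\mathrm{KL}}(P \,\|\, Q) &= \sum_{x} P(x) \log \frac{P(x)}{Q(x)}, \\
  H(P, Q) &= -\sum_{x} P(x) \log Q(x), \\
  \text{so that}\quad H(P, Q) &= H(P) + D_{\mathrm{KL}}(P \,\|\, Q).
\end{align}
```

Since the entropy H(P) of the data distribution is fixed, minimizing the cross-entropy with respect to the model Q is the same as minimizing the KL-divergence.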

Spoiler: Cross-entropy can be used as an optimization objective too! (Or you may also call it a loss function.)
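As a quick illustration (a minimal NumPy sketch; the function and variable names are my own), here is the cross-entropy between a one-hot label and a predicted distribution:

```python
import numpy as np

def cross_entropy(p_true, q_pred, eps=1e-12):
    """Cross-entropy H(P, Q) between a true and a predicted distribution.

    p_true: true probabilities (e.g. a one-hot label), shape (n_classes,)
    q_pred: predicted probabilities, shape (n_classes,)
    eps:    small constant to avoid taking log(0)
    """
    q_pred = np.clip(q_pred, eps, 1.0)       # guard against log(0)
    return -np.sum(p_true * np.log(q_pred))  # H(P, Q) = -sum_x p(x) log q(x)

# Example: a 3-class problem where the true class is the first one.
p = np.array([1.0, 0.0, 0.0])  # one-hot label
q = np.array([0.7, 0.2, 0.1])  # model's predicted distribution
print(cross_entropy(p, q))     # -log(0.7) ≈ 0.357
```

With a one-hot label, this reduces to the negative log-probability the model assigns to the correct class, which is precisely the loss used to train most classifiers.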

Reference & Extended Readings

  1. STATS 302 @ DKU
  2. “How can ‘cross-entropy’ serve as a loss function? Understanding ‘information content’, ‘bits’, ‘entropy’, ‘KL-divergence’, and ‘cross-entropy’ all at once”, bilibili
  3. “How are ‘loss functions’ designed? An intuitive understanding of ‘least squares’ and ‘maximum likelihood estimation’”, bilibili
  4. Dive into Deep Learning (《动手学深度学习》), 2.0.0 documentation
