Many newcomers learn machine learning through a single framework built around one mathematical model. It turns out that every one of these models has significant drawbacks. The point, however, is not simply to list them: knowing the drawbacks lets us draw several important conclusions.
People often pick one tool for learning machine learning, and it eventually becomes their “home framework”. (You have probably met people who use only kernels, or only Bayes’ theorem, or only probably approximately correct (PAC) learning.) It is important not to let this happen in your own machine learning practice.
Be open to new analytical frameworks. We usually tend to overlook the flaws of the model we know best and to exaggerate the flaws of the others. The best way to avoid this is education.
Theory on its own has no value without practice.
Below is a summary of the drawbacks (as well as the benefits) of various machine learning models.
Bayesian Learning
Methodology. You specify a prior probability P(h) over hypotheses, then use Bayes’ theorem to compute the posterior P(h | x) given the observed data x. True Bayesians integrate over the posterior to make predictions, while many practitioners simply use the hypothesis with the largest posterior directly.
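As a rough sketch of what this looks like in code (not from the original text): a Beta prior over a coin’s heads probability, updated with Bernoulli observations. The `posterior` helper and all numbers here are illustrative assumptions.

```python
# A minimal sketch of Bayesian learning on a coin-flip task, assuming a
# Beta(a, b) prior over the unknown heads probability h.

def posterior(a, b, flips):
    """Beta prior + Bernoulli data -> Beta posterior, via Bayes' theorem."""
    heads = sum(flips)
    tails = len(flips) - heads
    return a + heads, b + tails   # conjugacy: Beta(a + heads, b + tails)

flips = [1, 0, 1, 1, 0, 1]        # a small sample: Bayesian methods still work
a_post, b_post = posterior(a=2, b=2, flips=flips)

# "True Bayesian" prediction: integrate over the posterior.
# For Beta(a, b) the integral E[h] has a closed form, a / (a + b).
p_integrated = a_post / (a_post + b_post)

# Common shortcut: use the single hypothesis with the largest posterior (MAP).
p_map = (a_post - 1) / (a_post + b_post - 2)

print(f"posterior mean: {p_integrated:.3f}, MAP estimate: {p_map:.3f}")
```

Note how the conjugate prior makes the posterior update a closed-form expression; in general, that integral is exactly where the computational difficulty mentioned below comes from.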
Positive sides:
- The ability to work with small amounts of data.
- High flexibility.
- Applicable to engineering problems.
Negative sides:
- Information-theoretically problematic.
- It is often difficult to explicitly specify a reasonable prior.
- Computationally hard problems are frequently encountered.
- Requires significant human involvement. Partly because of the difficulties noted above, and partly because “first specify a prior” is built into the framework, this approach is not very automatable.
Graphical / Generative Models
Methodology. Sometimes a Bayesian method is applied, sometimes not. Samples are typically assumed to be independent and identically distributed (IID) draws of fixed or variable length. The distribution over samples is represented by a graph that encodes conditional independencies between variables. For some graphs, fast inference algorithms exist.
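To make the factorization concrete, here is a hedged toy sketch: a three-variable “rain / sprinkler / wet grass” network with made-up probabilities, showing how the graph’s conditional independencies factor the joint distribution and allow inference by enumeration.

```python
# A toy directed graphical model (Rain -> WetGrass <- Sprinkler), written
# from scratch to show how a graph's conditional independencies factor the
# joint distribution. All probabilities are invented for illustration.

from itertools import product

P_rain      = {True: 0.2, False: 0.8}
P_sprinkler = {True: 0.3, False: 0.7}
# WetGrass depends on both parents; Rain and Sprinkler are independent.
P_wet = {(True,  True): 0.99, (True,  False): 0.90,
         (False, True): 0.85, (False, False): 0.05}

def joint(r, s, w):
    """Factorized joint: P(r) * P(s) * P(w | r, s)."""
    p_w = P_wet[(r, s)] if w else 1 - P_wet[(r, s)]
    return P_rain[r] * P_sprinkler[s] * p_w

# Inference by enumeration: P(Rain | WetGrass = True).
num   = sum(joint(True, s, True) for s in (True, False))
denom = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
print(f"P(rain | wet grass) = {num / denom:.3f}")
```

The factorization is the whole point: instead of one table over all variable combinations, we store one small table per node, and the graph tells us which variables each table may depend on.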
Positive sides: compared to purely Bayesian systems, this approach sometimes has lower computational complexity. More importantly, the graphical language is natural for people, which makes it easier to specify a prior.
Negative sides:
- Often (still) does not resolve the Bayesian difficulties above.
- Real-world problems rarely exhibit true conditional independence, so performance degrades quickly when the independence assumptions encoded in the graph are violated.
Convex Loss Optimization
Methodology. Specify a surrogate loss function, related to the loss you actually care about, that is convex in the parameters of some parametric prediction system. Then optimize the parameters to find the global optimum.
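As a hedged illustration (not part of the original text), the sketch below minimizes the logistic loss, a convex surrogate for the nonconvex 0–1 loss, over a linear predictor by gradient descent. The data, step size, and iteration count are made up.

```python
# A minimal sketch of convex loss optimization: logistic loss (a convex
# surrogate for the 0-1 loss) minimized by gradient descent over a linear
# predictor. Synthetic data and hyperparameters are purely illustrative.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                      # features
y = (X @ np.array([1.5, -2.0]) > 0).astype(float)  # labels in {0, 1}

w = np.zeros(2)
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))    # sigmoid predictions
    grad = X.T @ (p - y) / len(y)         # gradient of mean logistic loss
    w -= lr * grad                        # convexity: descent reaches the global optimum

accuracy = np.mean((1.0 / (1.0 + np.exp(-(X @ w))) > 0.5) == y)
print(f"weights: {w}, training accuracy: {accuracy:.2f}")
```

Because the objective is convex in w, there are no bad local optima to get stuck in; this is exactly the “mathematically clean” property listed below.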
Positive sides:
- Mathematically clean solutions, with acceptable computational complexity partly built in (a convex objective has a single global optimum).
- A relatively automated approach.
Negative sides:
- It is tempting to forget that the loss we actually want to minimize is often nonconvex, and the mismatch between the surrogate and the true loss is always dangerous.
- Limited representations. Although switching to a convex loss makes some optimizations convex, optimizing over representations that are not single-layer linear combinations is usually difficult.