Understanding Cost Models

https://justinjaffray.com/understanding-cost-models/

成本模型可能是一个非连续的区间,非连续的区间意味着需要精准的估计,否则得到的就不会是最好的计划。

When I first learned that Oracle allows users to revert the behaviour of their query planner to ~any earlier version I was horrified. Shouldn’t we, as implementors, be striving to always make things better? I really saw this as throwing in the towel: an admission that none of us really have any idea what we’re doing here. In the time since then my opinion has shifted pretty substantially. When I learned that Spanner recently adopted this methodology as well, my reaction was “yup, makes sense.” With some time away to reflect on it, were I still at Cockroach Labs I think I’d be pushing for that same change to be made in CockroachDB in the near future*.

当我第一次得知 Oracle 允许用户将其查询规划器的行为恢复到任何早期版本时,我感到震惊。作为实施者,我们难道不应该努力让事情变得更好吗?我真的认为这是认输:承认我们没有人真正知道我们在这里做什么。从那时起,我的观点发生了很大的变化。当我得知 Spanner 最近也采用了这种方法时,我的反应是“是的,有道理”。如果我还有时间思考一下,如果我还在 Cockroach Labs,我想我会在不久的将来推动 CockroachDB 做出同样的改变*。

The reason for the shift in my mindset is twofold. 我的心态转变的原因有两个。

First, I think I’m starting to fully grasp the complexity we’re wrangling here. It’s easy to argue that we could build a simpler cost model that doesn’t defy human understanding, and that’s true! But I promise you that when your simpler cost model hits customer workloads there will be a reckoning in one direction or another. We could also throw in the towel in the other direction: if this problem is so intractable, why are we building these massive jenga towers of estimations? I think that’s a reasonable position to take too, though it’s one I’ve argued against in the past. 首先,我想我开始完全理解我们在这里争论的复杂性。人们很容易认为我们可以建立一个更简单的成本模型,并且不会违背人类的理解,这是事实!但我向您保证,当您的更简单的成本模型满足客户工作负载时,将会向一个方向或另一个方向进行计算。我们也可以从另一个方向认输:如果这个问题如此棘手,我们为什么要建造这些巨大的叠叠乐塔?我认为这也是一个合理的立场,尽管我过去曾反对过这一立场。

Second is the assymmetry present in the planner’s relationship with customers. None of your customers are going to call you up and offer you more money because the new version of your software ran their query 15% faster, but they will complain if your change made their query even a hair slower. This creates a tendency to fear change: if a change to the planner causes 99% of affected queries to run incrementally faster and 1% to incrementally slow down, that’s a bad change, and if I’m the one bashing on the keyboard to bring it about, the absolute best outcome is that nobody complains. But this kind of incremental change, over time, is how you make a planner robust, and how you slowly dig it out of local maxima. Basically, you have to be able to make Pareto-inefficient changes, and permitting your customers to insulate themselves from those changes is how you make them safely.

其次是规划者与客户关系中存在的不对称性。您的客户不会打电话给您并为您提供更多的钱,因为您的软件的新版本运行他们的查询速度快了 15%,但如果您的更改使他们的查询速度变慢了一点点,他们就会抱怨。这就产生了一种害怕改变的倾向:如果对规划器的改变导致 99% 受影响的查询运行速度逐渐加快,而 1% 的查询运行速度逐渐减慢,那么这就是一个糟糕的改变,如果我是那个敲击键盘带来的改变对此,最好的结果绝对是没有人抱怨。但随着时间的推移,这种增量变化是让规划器变得稳健的方法,也是你如何慢慢地将其从局部最大值中挖掘出来的方法。基本上,您必须能够进行帕累托低效的更改,并且让您的客户将自己与这些更改隔离开来,这就是您安全地进行更改的方法。

Where does this leave us? I think unfortunately more-or-less at the same place we started, but hopefully a little wiser: a cost model is effectively a set of beliefs your software holds about the real-world. Sometimes they will be wrong. Or right! Or wrong in one case, and right in another. Or right today, and wrong tomorrow.

这给我们留下了什么?我认为不幸的是或多或少与我们开始的地方相同,但希望更明智一点:成本模型实际上是你的软件对现实世界所持有的一组信念。有时他们会错。或者说对了!或者在一种情况下是错误的,在另一种情况下是正确的。或者今天对,明天错。