网络文章@202411


20刀好兄弟PolarDB:论数据库该卖什么价?

总体来说,云数据库的定价依然锚定的是专家服务费用,更具体的说就是盯着传统企业级市场数据库专业服务的定价设计的。只不过针对 SMB 极小微场景会有一些优待 —— 因为凑补的超卖实例反正也没啥成本。像 Neon / Supabase 干脆就对这种场景直接免费了。

换个角度看,云上年消费过百万的公司,就应该开始仔细算帐了;过千万就该全面下云了;云上年消费上亿的公司如果还不自建,那真的就是在头上顶着一个 “人傻钱多” 的肉猪 Flag,招杀猪盘。米哈游下云,亡羊补牢为时未晚,至于这种规模还要往云上搬的小红书,那就祝他们好运了

无论是商业数据库的订阅支持,还是开源数据库专业服务,还是云数据库,都不难看出这里的核心生产要素是 “专家”,而不是 “软件” 和 ”硬件“。

从成本上看,当下的综合硬件单位成本约 60 ~ 300 ¥/ vCPU · 年,在数据库服务中的成本可以说微不足道。因为开源数据库的出现,绝大多数商业数据库产品的许可证价值直接归零;很多国产数据库也就是 PG 换皮,没有什么研发成本;所以核心成本就是专家和销售的成本。

因此在2024年的当下,靠 许可 赚钱的数据库,要么是垄断/供应商锁定的保护费,要么是认知不对称的杀猪钱。要么本质上还是挂羊头卖狗肉,把专家费摊丁入亩抹入到许可费用中。

**数据库公司,真正出售的不是软件,而是专家的服务支持**。从上面的例子不难看出,专家的服务支持价格与数据库规模绑定,国际市场公允售价为 **1 ~ 2 万人民币 / vCPU 每年**。无论从数据库服务公司采购,还是从云厂商采购,都差不多是这个范围。


软件2.0。我有时看到人们提到神经…… |作者:安德烈·卡帕蒂 |中等的 — Software 2.0. I sometimes see people refer to neural… | by Andrej Karpathy | Medium

Instead, our approach is to specify some goal on the behavior of a desirable program (e.g., “satisfy a dataset of input output pairs of examples”, or “win a game of Go”), write a rough skeleton of the code (i.e. a neural net architecture) that identifies a subset of program space to search, and use the computational resources at our disposal to search this space for a program that works. In the case of neural networks, we restrict the search to a continuous subset of the program space where the search process can be made (somewhat surprisingly) efficient with backpropagation and stochastic gradient descent.

相反,我们的方法是指定所需程序的行为目标(例如,“满足示例的输入输出对的数据集”,或“赢得围棋比赛”),编写代码的粗略框架(即神经网络架构),它识别要搜索的程序空间的子集,并使用我们可支配的计算资源在该空间中搜索有效的程序。在神经网络的情况下,我们将搜索限制在程序空间的连续子集内,其中通过反向传播和随机梯度下降可以使搜索过程变得高效(有点令人惊讶)。

It turns out that a large portion of real-world problems have the property that it is significantly easier to collect the data (or more generally, identify a desirable behavior) than to explicitly write the program. Because of this and many other benefits of Software 2.0 programs that I will go into below, we are witnessing a massive transition across the industry where of a lot of 1.0 code is being ported into 2.0 code. Software (1.0) is eating the world, and now AI (Software 2.0) is eating software.

事实证明,现实世界中的大部分问题都具有这样的特性:收集数据(或者更一般地说,识别所需的行为)比显式编写程序要容易得多。由于软件 2.0 程序的这一优势以及我将在下面介绍的许多其他优势,我们正在见证整个行业的巨大转变,其中许多 1.0 代码正在被移植到 2.0 代码中。软件(1.0)正在吞噬世界,而现在人工智能(软件2.0)正在吞噬软件。

In particular, we’ve built up a vast amount of tooling that assists humans in writing 1.0 code, such as powerful IDEs with features like syntax highlighting, debuggers, profilers, go to def, git integration, etc. In the 2.0 stack, the programming is done by accumulating, massaging and cleaning datasets. For example, when the network fails in some hard or rare cases, we do not fix those predictions by writing code, but by including more labeled examples of those cases. Who is going to develop the first Software 2.0 IDEs, which help with all of the workflows in accumulating, visualizing, cleaning, labeling, and sourcing datasets? Perhaps the IDE bubbles up images that the network suspects are mislabeled based on the per-example loss, or assists in labeling by seeding labels with predictions, or suggests useful examples to label based on the uncertainty of the network’s predictions.

特别是,我们已经构建了大量帮助人们编写 1.0 代码的工具,例如具有语法突出显示、调试器、分析器、go to def、git 集成等功能的强大 IDE。在 2.0 堆栈中,编程是通过积累、整理和清理数据集来完成的。例如,当网络在某些困难或罕见的情况下失败时,我们不会通过编写代码来修复这些预测,而是通过包含这些情况的更多标记示例来修复这些预测。谁将开发第一个 Software 2.0 IDE,以帮助完成积累、可视化、清理、标记和采购数据集等所有工作流程?也许 IDE 会根据每个示例的损失来冒泡网络怀疑被错误标记的图像,或者通过用预测播种标签来帮助标记,或者根据网络预测的不确定性建议有用的示例进行标记。


我觉得受访者里面提到分治方法和端到端算法之间的差异非常有启发性:过去的算法比较强调程序如何操控数据,程序通常来说都非常短小;但是在大模型时代,程序和数据其实是融为一体的,两者之间是没有区分边界。这样的话,那么冯诺依曼架构的机器可能并不是适合未来的的LLM和AI程序,硬件需要变化和革新。

Software 2.0. I sometimes see people refer to neural… | by Andrej Karpathy | Medium Andrej提到的Software2.0应该也是这个道理。

(149) 大模型解决不了英伟达的难题,AI新范式必将出现:专访安克创新CEO阳萌 - YouTube

Pasted-Image-20241115110822.png


A deep look into our new massive multitenant architecture

转向DST(deterministic simulation testing)来保证正确性。rust的async调度性能开销相比自己手写event loop更大一些,大约有50us左右的开销。

Pasted-Image-20241113110712.png


捕获的少于你创造的 — Capture less than you create

But it's also possible to look at this through another lens, and see a huge missed opportunity! If hundreds of billions of dollars in valuations came to be from tools that I originated, why am I not at least a pétit billionaire?! What missteps along the way must I have made to deserve life as merely a rich software entrepreneur, with so few direct, personal receipts from the work on Rails? 但也可以从另一个角度来看这个问题,并看到一个巨大的错失机会!如果数千亿美元的估值来自我发明的工具,为什么我至少不是一个小亿万富翁?我一路走来到底犯了哪些错误,才能过上仅仅一个富有的软件企业家的生活,从 Rails 工作中获得的直接、个人收入如此之少?

This line of thinking is lethal to the open source spirit. 这种思路对于开源精神来说是致命的。

The moment you go down the path of gratitude grievances, you'll see ungrateful ghosts everywhere. People who owe you something, if they succeed. A ratio that's never quite right between what you've helped create and what you've managed to capture. If you let it, it'll haunt you forever. 当你走上恩怨之路的那一刻,你就会看到忘恩负义的鬼魂无处不在。如果他们成功了,他们就欠你一些东西。你帮助创造的东西和你成功捕获的东西之间的比例从来都不太正确。如果你放任不管,它会永远困扰你。

And that's also the open source spirit: To let a billion lemons go unsqueezed. To capture vanishingly less than you create. To marvel at a vast commons of software, offered with no strings attached, to any who might wish to build. 这也是开源精神:让十亿个柠檬不被挤压。捕捉的东西远远少于你创造的东西。惊叹于为任何想要构建的人提供的大量软件,没有任何附加条件。

Thou shall not lust after thy open source's users and their success. 你不应该贪图你的开源用户和他们的成功。