数据库公司质量管理问题
是否会接受performance regression? 如果有某些优化在大量场景有positive, 但是某些case有regression如何处理和对待呢? 如何处理因为外部系统造成的performance gression, 比如一段时间s3 bucket latency有明显增加或者是不稳定。
当我们设计一个系统,需要对接许多外部系统的时候,理论上来说我们是需要编写测试用来进行覆盖。但是我们在现实过程中会出现测试覆盖不完全的的问题,只是简单地测试来其中某些功能可以工作,但是实际在用户使用时发现各种问题。
这里的原因可能是
- RD对于这个外部系统理解并不深入(比如S3 prefix IOPS有5500的上限)
- RD因为需要开发更多的功能而导致没有时间做更加详细的测试。
你是否遇到过这种情况?我想听听你对于这个问题的看法。你是否遇到过项目因为需要更多的feature/optimization, 而导致有意或者是无意地牺牲了质量。
Will performance regression be accepted on level ? If some optimizations are positive in a large number of scenarios, but some cases have regression, how to handle and treat them? Will you reject this optimization?
How to deal with performance regression caused by external systems? such as, during performance regression, a significant increase in s3 bucket latency or instability over time. Do you have any way to exclude this kind of side effect? Or to say, we have to deal with this latency instability in our code?
When we design a system that needs to interface with many external systems, theoretically we need to write thorough tests.
However, in reality, we don't do that thorough tests. We just write some simple test cases to make sure it works, then ship it out. But when customers use it, there are many unexpected problems.
And I try to summarize the reasons are:
- RD does not have a deep understanding of the external system (e.g. S3 prefix IOPS has an upper limit of 5500)
- RD has no time to do more detailed testing because he needs to develop more features and optimizations.
Have you ever encountered this situation? I would like to hear your thoughts on this issue.
Have you ever encountered projects where quality was sacrificed, intentionally or unintentionally, for more features/optimizations.
几个开源数据库的CI/CD pipeline
- CR(commit, pipeline)
- Commit pipeline (需要跑哪些过程,UT,大约多长时间。准入标准是什么?)
- 测试Case由谁来负责添加? 如果没有的话,QA主要工作是什么?
- QA是否负责提供infra / resource provision for testing? QA提供了哪些常见服务?
- 测试用例中是否会对接到其他外部系统?如果有的话,那么怎么进行这些系统的管理
- 是否需要保存测试数据?还是可以现场进行构造?测试数据是否会长期保存?
- 测试是使用什么语言编写的?RD是否可以往里面添加。在添加test case的时候是否有什么需要注意的地方吗?规范性,一致性, 运行时间。
- 每天是否跑完全的功能测试/性能测试,大约跑多久时间?
- 如果出现性能问题,但是涉及到外部系统的话,怎么排除这些外部系统的影响?
- 有什么办法可以进行主动测试?
开源数据库CI/CD Pipeline的探讨问题
- 提交触发的CI流程:
- 什么过程会在每次提交后自动执行?
- 这些过程通常需要多长时间?
- 代码合并到主分支的准入标准是什么?
- 测试用例责任和管理:
- 谁负责添加和维护测试用例?
- QA的主要职责包括哪些方面?
- 如何确保测试覆盖全面并且与需求相匹配?
- QA的基础设施职责:
- QA是否负责提供和维护测试所需的基础设施和资源?
- QA提供了哪些常见的测试服务?
- 涉及外部系统的测试:
- 测试用例是否需要与外部系统接口?
- 如果需要,如何管理这些外部系统以确保测试的准确性和可靠性?
- 测试数据的管理:
- 测试数据是现场构造还是预先准备?
- 测试数据需要长期保存吗?如果是,如何处理和存储这些数据?
- 测试编写和标准:
- 测试通常使用什么语言编写?
- 在添加测试用例时,有哪些规范性和一致性需要注意?
- 测试的执行时间有何要求和限制?
- 完整功能和性能测试:
- 是否每天进行完整的功能测试或性能测试?
- 这些测试通常需要多长时间?
- 外部系统影响下的性能问题:
- 如果性能问题涉及到外部系统,有哪些方法可以排除这些影响?
- 主动测试方法:
- 有哪些方法可以进行主动测试以预见和解决潜在问题?
Open Source Database CI/CD Pipeline Discussion Questions
- CI Process Triggered by Commit:
- What processes are automatically executed after each commit? (UT, Performance Test? Integration Test?)
- How long do these processes typically take?
- What are the criteria for code to be merged into the main branch? (Coverage? Static linter? )
- Test Case Responsibility and Management:
- Who is responsible for adding and maintaining test cases? (RD or QA? )
- What are the primary responsibilities of the QA team? (To construct test cases? Or to triage test failures or performance regression? )
- How can we ensure that the test coverage is comprehensive and matches the requirements? (Is there any hard requirements about code coverage? Static analysis tools or something? )
- QA Infrastructure Responsibilities:
- Is the QA team responsible for providing and maintaining the infrastructure and resources needed for testing? (I know sometimes RD need a cluster to reproduce a bug to fix it, or to do performance test before commit)
- What common testing services does the QA provide? (Is there any way to do proactive testing? )
- Testing Involving External Systems:
- Do the test cases need to interface with external systems? (Take starrocks for example, we have to test integration with aws glue, s3, or other services)
- If so, how are these external systems managed to ensure the accuracy and reliability of the tests? (For performance regression example, how can you assure performance is not affected by external systems)
- Management of Test Data:
- Is the test data dynamically constructed on the spot or prepared in advance?
- Is there a need to save test data long-term? If so, how is this data handled and stored? (Do you have any backup about test cases and data?)
- Test Writing and Standards:
- What language are the tests typically written in? Or is there any testing framework? (Lower barrier for RD to add test cases)
- What considerations regarding consistency and standards should be noted when adding test cases? (Like is there any coding style, or requirements of test cases, to make test cases more coherent and easier to read/maintain)
- What are the requirements and limitations regarding the execution time of tests?