Software Engineers because only a Systems Engineer on the infrastructure team was allowed to access it directly. The Systems Engineers did not sufficiently understand the production code being deployed to these servers because that was the responsibility of the Software Engineer. This often led to hours spent debugging environment issues and constantly having to work towards keeping a consistent state across the testing, staging, and production environments. Software Engineers would develop locally but could not test their code until they promoted it to the test environment, which would lead to situations where code could not be deployed because it was waiting for a test server to become available. Because the test and staging environments were managed separately, there were often differences between the environments, causing additional issues in testing changes before releasing to production.
目前的基础设施依赖性造成了瓶颈并增加了周期时间。所有的Scrum团队都依赖于一个外部的基础设施团队来管理内部的服务器。这些服务器对软件工程师来说是一个黑匣子,因为只有基础设施团队的系统工程师被允许直接访问它。系统工程师并不充分了解部署在这些服务器上的生产代码,因为那是软件工程师的责任。这往往会导致花费数小时来调试环境问题,并不断努力保持测试、预发环境和生产环境的状态一致。软件工程师会在本地开发,但在提交到测试环境之前无法测试他们的代码,这将导致代码无法部署,因为它在等待可用的测试服务器。因为测试环境和预发环境是分开管理的,环境之间经常有差异,导致在发布到生产之前,当测试变化时会引起额外的问题。
To eliminate these dependencies, infrastructure was containerized, and a Docker environment was implemented, including automated migration and setup/termination as needed using terraform scripting. Additionally, a blue/green deployment model was implemented to test before moving traffic. This allowed each Scrum Team to work independently, free from being dependent on the external infrastructure team.
为了消除这些依赖性,基础设施被容器化,并实施了Docker环境,包括自动迁移、设置和终止,根据需要使用terraform 脚本。此外,采用蓝/绿部署模型,在用户使用系统产生流量之前进行测试。这使得每个Scrum团队可以独立工作,不需要依赖外部的基础设施团队。
7.10 OKRs OKR
Objectives and Key Results [36] were established for all Release Trains. The specific relevant Key Results were:
New-New Code Coverage
A target coverage rate for automated testing of all new lines of code for either legacy or new applications was established.
Overall Code Coverage
A target coverage rate for automated testing overall (new and legacy code) was established.
Mean Time To Recovery (MTTR)
The failure of any individual gate within the CI pipeline must be recovered within 24 hours.
所有的发布火车都建立了OKR[36]。具体的相关关键结果(KR)是:
新代码覆盖率
无论是遗留的的还是新的应用程序,对所有新的代码行确立了自动化测试的目标覆盖率。
总体代码覆盖率
对全部新代码和遗留代码,确立了自动化测试的总体覆盖率目标。
平均恢复时间(MTTR)
在CI流水线中的任何单个故障必须在24小时内恢复。
8 Results 结论
As one of the largest Scaled Agile implementations, Rocket Mortgage was more successful than most large implementations, reducing cycle time by 46.5%. The goal of implementing Scaled Scrum on top of Scaled Agile was further reduction in cycle time, higher quality, and improved customer and team satisfaction.
作为最大的SAFe实施案例之