Software Delivery Best Practices
Many highly effective software teams share a core set of best practices, described in sources such as The DevOps Handbook, The Phoenix Project, The Unicorn Project, and Google's Site Reliability Engineering book. This page summarizes a set of core practices that will help teams become, and remain, highly effective.
Software Engineering Best Practices
Practice Zero: Retrospect regularly, continuously improve, and everything else will follow.
Make work visible with an Agile or Kanban board, physical or virtual.
Implement a checklist or decision tree to indicate how work moves across the agile board and out to customers.
Prioritize creating high-quality environments on a self-service basis for customers, developers, testers, and marketers.
Ideally, automate cloning a given customer environment as a test/development environment, with obfuscated data. ✍️
With development, testing, and marketing environments, include a realistic sample data set.
Keep the data set in a source control repository, like any other asset.
Continuously extend the sample data set as new features are deployed. ✍️
Ideally, include new sample data as part of user stories or other business requirements. ✍️
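Cloning customer data into test environments only works if the data is obfuscated first. A minimal sketch of deterministic masking (the salt, field names, and `example.com` domain are illustrative; hashing the same input to the same output keeps joins across tables intact):

```python
import hashlib


def obfuscate_email(email, salt="demo-salt"):
    """Deterministically mask an email so references across tables still line up."""
    local, _, _domain = email.partition("@")
    digest = hashlib.sha256((salt + local).encode()).hexdigest()[:10]
    return f"user_{digest}@example.com"


def obfuscate_record(record):
    """Return a copy of a customer record that is safe for test environments."""
    masked = dict(record)
    if "email" in masked:
        masked["email"] = obfuscate_email(masked["email"])
    if "name" in masked:
        masked["name"] = "Customer " + hashlib.sha256(masked["name"].encode()).hexdigest()[:6]
    return masked
```

Because the masking is deterministic, the obfuscated clone behaves like the original for testing while exposing no real identities.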
Leverage source control like your life depends on it.
- In each source control repository, include a LICENSE file (or equivalent) itemizing any external dependencies.
Select a work planning tool that integrates with source control (such as Jira, LeanKit, or Visual Studio Team Services) to manage the five primary activities of your software delivery lifecycle (requirements, development, documentation, deployment, and support).
Use your planning tool's source-control integration to seamlessly link every change to the default branch ("trunk") with a work item.
- Ideally, configure source control to reject any commit that isn't linked to the planning tool.
Incorporate the planning tool and agile board workflow into your change management and authorization process.
- When integrated with source control, a work planning tool can provide context, traceability, and an audit trail for all changes.
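One way to enforce the work-item link is a `commit-msg` hook that rejects commits whose messages carry no work item key. This sketch assumes Jira-style keys such as `PROJ-123`; the pattern and script are illustrative, not tied to any particular planning tool:

```python
import re
import sys

# Hypothetical Jira-style key, e.g. "PROJ-123"; adjust the pattern to your tracker.
WORK_ITEM_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def has_work_item(message):
    """Return True if the commit message references a work item key."""
    return bool(WORK_ITEM_PATTERN.search(message))


def main(path):
    """commit-msg hook entry point: a non-zero exit code rejects the commit."""
    with open(path, encoding="utf-8") as fh:
        message = fh.read()
    if has_work_item(message):
        return 0
    sys.stderr.write("Rejected: commit message must reference a work item (e.g. PROJ-123)\n")
    return 1


if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```

Installed as `.git/hooks/commit-msg` it gates local commits; the same check run server-side (for example in a pre-receive hook) enforces the rule for the whole team.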
Implement feature toggles to separate deploying a new capability from releasing a new capability to customers.
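A minimal sketch of the toggle pattern (the `new-dashboard` flag and `render_dashboard` call site are hypothetical; real products typically load flags from configuration or a flag service): code for the new capability is deployed dark, then released by flipping the flag, with no new deployment.

```python
class FeatureToggles:
    """Minimal toggle registry: new code ships dark until explicitly released."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name):
        # Unknown toggles default to off: deployed, but not yet released.
        return self._flags.get(name, False)

    def release(self, name):
        self._flags[name] = True


def render_dashboard(toggles):
    """Example call site: the new capability is present but gated."""
    if toggles.is_enabled("new-dashboard"):
        return "new dashboard"
    return "classic dashboard"
```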
Verify code works by delivering automated unit tests with the user-facing software.
Set a minimum level of unit test coverage, such as 75%.
Establish a development style guide with expected conventions and common coding patterns, including security coding patterns.
- Enforce the style guide by applying static analysis to new pull requests being merged into the default branch – using a tool like Checkmarx or Codacy – including both quality and security rule sets.
Peer review and peer test each work item that commits a change to a source code repository, as part of the agile board workflow.
- Ideally, use two code review checkpoints, reserving the second for a subject matter expert or senior developer.
Merge pull requests promptly or decline them.
For any product with fifty features or more, subdivide the code base into highly cohesive, loosely coupled modules. ✍️
- Align test suites with the modules that exercise the included features. ✍️
- Ideally, design the modules so that they can be independently deployed.
- (Often, micro-services are used to implement the modules.)
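One way to keep modules loosely coupled is to have each depend only on a contract, not on a concrete neighbor. A sketch using Python's structural typing (the `BillingModule`/`InvoiceStore` names are invented for illustration; in a micro-service design the second implementation would be a remote service client):

```python
from typing import Protocol


class InvoiceStore(Protocol):
    """The contract the billing module depends on; any storage module can satisfy it."""

    def save(self, invoice_id: str, total: float) -> None: ...


class BillingModule:
    """High-cohesion billing logic, coupled only to the InvoiceStore contract."""

    def __init__(self, store):
        self._store = store

    def bill(self, invoice_id, items):
        total = round(sum(items), 2)
        self._store.save(invoice_id, total)
        return total


class InMemoryStore:
    """One interchangeable implementation; a deployed service could be another."""

    def __init__(self):
        self.saved = {}

    def save(self, invoice_id, total):
        self.saved[invoice_id] = total
```

Test suites then align naturally with modules: the billing tests exercise `BillingModule` against any store that honors the contract.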
Release Engineering Best Practices
Assign a unique fix version to each new release.
In source control, tag the commits included in a release with the fix version.
- If the default branch is not always released in its entirety, maintain a release branch to represent what has been deployed to date.
Use a technical change log to itemize the development work items resolved with each fix version, cross referencing the commits to source control.
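The change log itself can be generated rather than hand-maintained. A sketch that groups commit subjects by fix version (how the `(fix_version, sha, subject)` tuples are extracted from `git log` and tags is left to your tooling, and the lexicographic version sort is a simplification; real tooling should sort versions semantically):

```python
def build_change_log(entries):
    """Group resolved work items by fix version, cross-referencing commits.

    `entries` is a list of (fix_version, short_sha, subject) tuples, e.g. as
    exported from `git log` against release tags.
    """
    by_version = {}
    for version, sha, subject in entries:
        by_version.setdefault(version, []).append(f"- {subject} ({sha})")
    lines = []
    # Simplified ordering: lexicographic, newest first. Use a semantic-version
    # sort key in real tooling so that 1.10.0 orders after 1.2.0.
    for version in sorted(by_version, reverse=True):
        lines.append(f"## {version}")
        lines.extend(by_version[version])
    return "\n".join(lines)
```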
Automate the version build process.
Orchestrate builds with a continuous integration server like Jenkins.
- Design builds to run scripts that developers can also run from the command-line without the server, and place the CI scripts under version control.
- Stand up a runbook (or runpage) for each build server and development tool with essential maintenance details.
- Maintain the build server and other tools with software updates and security patches.
Automatically build and unit test a development version on every merge to the default branch of the source control repository.
Fail any version or reopen any task with one or more failing unit tests. #ZeroTolerance
Resolve all build failures immediately.
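A build script that both Jenkins and developers invoke the same way keeps the pipeline reproducible from the command line. A sketch (the step list is hypothetical; substitute your real test and lint commands, and keep the script under version control alongside the code):

```python
import subprocess
import sys


def run_step(name, cmd):
    """Run one build step; the same function serves the CI server and local use."""
    print(f"--- {name}: {' '.join(cmd)}")
    return subprocess.run(cmd).returncode


def build(steps):
    """Run steps in order, stopping at the first failure; return its exit code."""
    for name, cmd in steps:
        code = run_step(name, cmd)
        if code != 0:
            return code
    return 0


# Hypothetical pipeline; swap in your real commands.
DEFAULT_STEPS = [
    ("unit tests", [sys.executable, "-m", "pytest", "-q"]),
]

if __name__ == "__main__":
    sys.exit(build(DEFAULT_STEPS))
```

Jenkins then simply executes `python build.py`, so a failing step fails the version exactly as it would on a developer's machine.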
Validate each issue included in a version as it will be distributed to production ("test like a customer").
Version external APIs so that customers can transition dependent services. ✍️
For any product with an external API, create a set of regression unit tests that can be run from an external environment, to exercise the API.
- Ideally, align API docs with the API tests so that developers can learn through example. ✍️
- Maintain this set of regression tests in a separate repository so that updates can be easily monitored for breaking changes.
- Continually extend the regression test suite whenever a new API access point is created.
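An external API regression test is just a customer-shaped call plus a contract check. A sketch (the `api.example.com` base URL and the `/users` payload shape are hypothetical; only `check_user_payload` is exercised offline here, while `regression_test_get_user` would run from the external environment):

```python
import json
from urllib.request import urlopen

BASE_URL = "https://api.example.com/v1"  # hypothetical endpoint


def check_user_payload(payload):
    """Return a list of contract violations for an assumed /users response shape."""
    problems = []
    for field, kind in (("id", int), ("email", str)):
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], kind):
            problems.append(f"wrong type for {field}")
    return problems


def regression_test_get_user(user_id):
    """Exercise the API from outside the product, exactly as a customer would."""
    with urlopen(f"{BASE_URL}/users/{user_id}") as resp:
        return check_user_payload(json.load(resp))
```

Because the contract checks are plain data assertions, the same functions can double as the worked examples in the API docs.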
Continuously update customer documentation and other aspects of the distribution so that your product remains release-ready at all times.
Under Release on Demand, conduct a routine Go/NoGo checkpoint to confirm with internal change-management stakeholders that new major versions are ready for release to customers.
Stage upgrades to customer environments so that a predefined subset (10-30%) can be updated first.
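Selecting the first upgrade wave by hashing tenant identifiers keeps the subset stable across runs, so the same customers pilot each release. A sketch (the `tenant-N` identifiers and 20% default are illustrative):

```python
import hashlib


def in_first_wave(tenant_id, percent):
    """True if this tenant falls inside the first upgrade wave.

    A stable hash keeps the same tenants in the early wave run after run.
    """
    bucket = int(hashlib.sha256(tenant_id.encode()).hexdigest(), 16) % 100
    return bucket < percent


def first_wave(tenants, percent=20):
    """Return the predefined subset of environments to upgrade first."""
    return [t for t in tenants if in_first_wave(t, percent)]
```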
Some Site Reliability Best Practices
Everything degrades. If you are not getting better, then you are getting worse.
Establish a status site to provide transparency around service availability and performance, showing current availability, availability incidents, and maintenance events. ✍️
- For site reliability, examples include Atlassian Status, Azure Status, and Salesforce Trust.
- For product reliability, see Salesforce Known Issues.
Prioritize and resolve defect reports promptly, or close them as won't-do.
After a critical incident, conduct a blameless postmortem to identify the contributing causes and to indicate where and how services can be improved.
"Regardless of what we discover, we understand and truly believe that everyone did the best job they could, given what they knew at the time, their skills and abilities, the resources available, and the situation at hand." – Norm Kerth, the Retrospective Prime Directive.
Analyze performance metrics, defect reports, and critical errors to identify trends and support upgrade immunity. ✍️
Log critical errors that occur in customer environments, especially uncaught exceptions.
Log performance metrics and non-critical errors from customer environments.
- Aggregate logs for visualization and audit control using a telemetry tool, such as Splunk or Kibana. ✍️
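Uncaught exceptions can be captured for telemetry by installing a process-wide hook before the interpreter's default handler runs. A sketch using Python's `sys.excepthook` (the `telemetry` logger name is illustrative; in production the logger would ship records to your aggregation tool):

```python
import logging
import sys

logger = logging.getLogger("telemetry")


def log_uncaught(exc_type, exc_value, exc_tb):
    """Record any uncaught exception before the process dies."""
    logger.critical("uncaught exception", exc_info=(exc_type, exc_value, exc_tb))
    # Chain to the default handler so the traceback still reaches stderr.
    sys.__excepthook__(exc_type, exc_value, exc_tb)


def install():
    """Install the hook; call once at process startup."""
    sys.excepthook = log_uncaught
```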
For added-value features that rely on external APIs, or that can impede performance under load, use a runtime feature toggle (circuit breaker) to disable the capability on demand. ✍️
- Be sure that the UI degrades gracefully.
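A minimal sketch of the circuit-breaker toggle (the failure threshold and fallback strings are illustrative; production breakers usually also reset after a cool-down period): once the external dependency fails repeatedly, the breaker opens and the feature degrades to its fallback instead of blocking under load.

```python
class CircuitBreaker:
    """Disable an added-value call after repeated failures; the UI falls back."""

    def __init__(self, max_failures=3):
        self.max_failures = max_failures
        self.failures = 0

    @property
    def open(self):
        return self.failures >= self.max_failures

    def call(self, func, fallback):
        if self.open:
            return fallback  # degrade gracefully instead of timing out
        try:
            return func()
        except Exception:
            self.failures += 1
            return fallback
```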
Conduct penetration, performance, and load-balancing tests in production at peak times to verify and improve runtime resilience.
More to come ...
Some Product Management Best Practices
Build developer innovation tasks into the schedule by using hackathons, innovation sprints, or idea days. You'll thank me later.
Use Value Stream Mapping and a Value Chain to reduce waste and improve efficiency. ✍️
- Construct a Value Chain by defining maps for the six primary activities of marketing, requirements, development, deployment, documentation, and support.
Create customer personas so that everyone can better understand and relate to the people for whom you’re building products.
Based on the needs of the customer personas, build a set of user stories, framed inside of your business outcomes.
Provide a general Definition of Ready and a Definition of Done for your product's user stories.
Include specific and testable Acceptance Criteria for each user story that can be used to confirm a story is complete and working as intended.
Monitor Agile metrics such as Depth of Ready Backlog, Throughput, Cycle Time, Feature Adoption, Net Defects Created vs Resolved, Defect Removal Efficiency.
- Define expected metric ranges, review regularly, and remediate negative trends.
- Track feature adoption by observing feature toggle settings and other telemetry.
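Cycle Time, for example, is simple to compute once each work item records when it started and when it was done. A sketch (the date pairs stand in for data your planning tool would export):

```python
from datetime import date


def cycle_times(items):
    """Cycle time in days per work item, from (started, done) date pairs."""
    return [(done - started).days for started, done in items]


def average_cycle_time(items):
    """Average cycle time in days; 0.0 for an empty backlog slice."""
    times = cycle_times(items)
    return sum(times) / len(times) if times else 0.0
```

Tracking this average per sprint, against a defined expected range, is one way to spot the negative trends the practice above calls out.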
More to come ...
Please submit feedback to the DreamOps Success Group at http://dreamops.org/group.