In our previous two blog posts, we saw that to increase your competitive advantage, you need to improve engineering productivity, and that for engineering to be productive, companies need to embrace DevOps culture, tools, and technologies. DevOps supports a high-velocity, high-performance engineering culture that is focused on achieving results where the score is always kept – for the customer and for the business.
So what does implementing DevOps best practices throughout the application lifecycle look like in a mature DevOps organization? The steps are similar to any application lifecycle, but because the endgame is maximum velocity via continuous deployment, the actions within each step are different from traditional software development.
To illustrate it, let’s assume the following scenario: Your CIO has just given you the mandate to take an application to the cloud. You will need a number of new environments for it; at the very least for development, continuous integration (CI), and production. What does this process look like in a DevOps organization? Below is an overview of how we handle it at Pythian.
This is fairly obvious, but the key to collecting requirements is to identify the right people, involve them early, and clearly understand their requirements and goals, as well as the dependencies between stakeholders.
Having a clear understanding of the ‘big picture’ and your business stakeholders’ expectations is critical to aligning efforts throughout the lifecycle of the project. For example, even though we know we need to build in platform elasticity from day one, we still need to understand the intended audience, expected starting scale, and many other details that need to be reflected in the outcome of our work. The business will be able to paint a complete picture that sets the stage for our execution. This is where we start our journey.
Compliance, Auditability, and Security
The mandate from our Security Office requires that our environment provide a granular trail for internal, external, and statutory audits. Specifically, we need to track all changes to the environment, software deployment details, and user access to all layers of the stack. We must also ensure that all sensitive data is encrypted both in transit and in storage. We need to be ready to demonstrate that access control and risk management procedures are in place throughout the whole software delivery lifecycle, and that we have the means to identify and mitigate common threats and vulnerabilities.
As for infrastructure, at minimum, it must be programmable (ideally with a rich API), composable, highly available, elastic, scalable, secure, auditable, and testable. We also need to be able to version our infrastructure as it evolves over time. We must be able to programmatically describe the entire infrastructure in very high detail and then execute the resulting code to provision the entire environment, in a highly available fashion – in multiple geographical regions, with one click or command. Upon successful execution of that code, the infrastructure must be brought to life configured, scaled, and ready for workload deployment. We also need the whole process to be repeatable, so that we can spin up new environments rapidly, with predictable results, in a matter of minutes.
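To make the idea of describing infrastructure as versioned, repeatable code more concrete, here is a minimal, tool-agnostic sketch. The `EnvironmentSpec` class, its fields, the region names, and the version string are all hypothetical and exist only for illustration; a real project would hand a description like this to a provisioning tool rather than print it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentSpec:
    """Hypothetical description of one environment; every field is illustrative."""
    name: str        # e.g. "development", "ci", "production"
    regions: tuple   # geographical regions to provision in
    min_nodes: int   # lower bound for elastic scaling
    max_nodes: int   # upper bound for elastic scaling
    version: str     # version of the infrastructure definition itself


def render_plan(spec):
    """Expand a spec into one provisioning plan per region.

    The output depends only on the input, so running it twice yields the
    same plan. That property is what makes environments repeatable.
    """
    if spec.min_nodes > spec.max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    return [
        {
            "stack_name": f"{spec.name}-{region}",
            "region": region,
            "min_nodes": spec.min_nodes,
            "max_nodes": spec.max_nodes,
            "infra_version": spec.version,
        }
        for region in sorted(spec.regions)
    ]


# Assumed example values, not taken from a real deployment.
production = EnvironmentSpec("production", ("us-west-2", "us-east-1"), 2, 10, "1.4.0")
plan = render_plan(production)
print(json.dumps(plan, indent=2))
```

Because the specification is plain data, it can be committed to version control, reviewed like any other change, and reused to stand up a development or CI copy by changing only the name and node counts.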
With infrastructure codified and provisioned, we will need to ensure we have good operational visibility into the entire stack, so that we can be well informed of the platform’s health and assume a proactive stance towards maintenance. We also want it to auto-scale in line with demand. In case of failure, we want to be alerted to the failure, but we want our nodes to auto-heal by automatically provisioning a replacement for the failed node. Ideally, all of this should happen with as little human interaction as possible.
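The auto-healing behavior described above boils down to a reconciliation step: compare the desired node count with the nodes that are actually healthy, raise an alert for each failure, and request replacements. The sketch below is a simplified model of that logic with hypothetical node names; it is not how any particular cloud service implements it.

```python
def reconcile(desired_count, nodes):
    """Compare desired capacity with observed node health.

    `nodes` maps a node name to True (healthy) or False (failed).
    Returns the healthy nodes, the number of replacements to provision,
    and one alert message per failed node.
    """
    healthy = [name for name, ok in nodes.items() if ok]
    failed = [name for name, ok in nodes.items() if not ok]
    alerts = [f"node {name} failed its health check" for name in failed]
    replacements = max(desired_count - len(healthy), 0)
    return healthy, replacements, alerts


# Hypothetical fleet state: three nodes desired, one has failed.
healthy, replacements, alerts = reconcile(
    desired_count=3,
    nodes={"app-1": True, "app-2": False, "app-3": True},
)
print(healthy, replacements, alerts)
```

Running this step on a schedule, with no human in the loop, is what lets the platform alert us to a failure while it is already repairing itself.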
CTO and CIO Mandate
While we are intimately familiar with the base requirements mandated by our technology leaders, such as security, compliance, auditability, availability, and elasticity, we still need to involve our CIO and CTO to ensure we can satisfy their requirements for the project. This may be as simple as restating our internal best practices, or it may involve discussing and agreeing on finer details, such as disaster recovery requirements, the associated recovery time objectives (RTO) and recovery point objectives (RPO), or high availability requirements.
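A quick way to reason about RTO and RPO during these discussions is to compare them with what the current backup and restore process can actually deliver. The sketch below assumes a simplified model in which the worst-case data loss equals the interval between backups and the recovery time equals the measured restore duration; the function name and all of the figures are hypothetical.

```python
from datetime import timedelta


def check_objectives(backup_interval, restore_duration, rpo, rto):
    """Check a backup and restore process against agreed objectives.

    Simplifying assumption: in the worst case, everything written since
    the last backup is lost, so the data-loss window is the backup interval.
    """
    return {
        "rpo_met": backup_interval <= rpo,
        "rto_met": restore_duration <= rto,
    }


# Illustrative numbers only: hourly backups and a 45-minute restore,
# measured against a 15-minute RPO and a 2-hour RTO.
result = check_objectives(
    backup_interval=timedelta(hours=1),
    restore_duration=timedelta(minutes=45),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=2),
)
print(result)
```

In this made-up case the restore is fast enough, but hourly backups cannot satisfy a 15-minute RPO, which is exactly the kind of gap worth surfacing before the platform is designed.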
Once we have requirements from the business, our CTO, and our CIO, we involve our development team, so that we clearly understand their technology and infrastructure needs. Once the technology requirements are clear, together we shape a rough picture of the platform we will be building and running. We will be working closely with the development team throughout all phases of the project to ensure ongoing tight alignment between their requirements and the supporting infrastructure we are building. The development team will contribute to the infrastructure codebase at various stages of the project. Once we finish working on the detailed picture of the infrastructure, we can discuss the costs with our CFO.
Our CFO is acutely aware that running our own servers at scale requires a significant upfront expenditure and ongoing operational expense, regardless of actual computing resource utilization. She likes the fact that workloads running in a properly automated cloud change the financial paradigm, removing CapEx and leaving only OpEx, and that operational costs become directly proportional to the resources required to meet demand. However, we still need to present our estimated financial requirements to our CFO and seek approval. The CFO communicates the budget to us, and together we figure out how to complete our work under the constraints it imposes. If we have to make changes to the infrastructure plan to fit the budget, we work with the development team and iterate on revisions until we arrive at a solution that satisfies all stakeholders’ requirements.
Choosing a Platform
At Pythian, our DevOps team’s key strength is its culture of agility, ownership, constant experimentation, learning, and drive towards deep understanding and improvement. As practitioners of Agile, Lean, and continuous improvement, we must be able to repeatedly test, fail, learn, and improve, at a rapid pace. Therefore, we need automatable, repeatable, and disposable environments; we need a cloud environment. When choosing a cloud provider, we need to ensure it’s a good fit with all of the stakeholder requirements established earlier in the process.
Let’s briefly review how the Amazon Web Services environment fits into our needs and the mandates of our individual stakeholders.
For Infrastructure Automation and Configuration Management
We have at our disposal tools like AWS OpsWorks – the enabler of versionable infrastructure-as-code. For working with a broader set of AWS resources, there’s AWS CloudFormation. It allows us to provision complex configuration option sets across all core AWS resources, from VPC to auto-scaling policies to data pipelines, access control, and log management.
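As an illustration of what a CloudFormation description might look like, the sketch below builds a small template as a Python dictionary and prints it as JSON. The resource type names and properties are standard CloudFormation, but the logical names (`AppLaunchTemplate`, `AppGroup`, `CpuScalingPolicy`), the instance type, and the sizes are assumptions chosen for the example, and the template has not been validated against a live account.

```python
import json


def build_template(min_size, max_size, target_cpu_percent):
    """Return a CloudFormation template for an auto-scaled group of instances."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative auto-scaling application tier",
        "Parameters": {
            # Supplied at deploy time so the template stays environment-neutral.
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
        },
        "Resources": {
            "AppLaunchTemplate": {
                "Type": "AWS::EC2::LaunchTemplate",
                "Properties": {
                    "LaunchTemplateData": {
                        "ImageId": {"Ref": "ImageId"},
                        "InstanceType": "t3.micro",  # assumed size
                    }
                },
            },
            "AppGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    "MinSize": str(min_size),
                    "MaxSize": str(max_size),
                    "VPCZoneIdentifier": {"Ref": "SubnetIds"},
                    "LaunchTemplate": {
                        "LaunchTemplateId": {"Ref": "AppLaunchTemplate"},
                        "Version": {
                            "Fn::GetAtt": ["AppLaunchTemplate", "LatestVersionNumber"]
                        },
                    },
                },
            },
            "CpuScalingPolicy": {
                # Keeps average CPU near the target by adding or removing nodes.
                "Type": "AWS::AutoScaling::ScalingPolicy",
                "Properties": {
                    "AutoScalingGroupName": {"Ref": "AppGroup"},
                    "PolicyType": "TargetTrackingScaling",
                    "TargetTrackingConfiguration": {
                        "PredefinedMetricSpecification": {
                            "PredefinedMetricType": "ASGAverageCPUUtilization"
                        },
                        "TargetValue": float(target_cpu_percent),
                    },
                },
            },
        },
    }


template = build_template(min_size=2, max_size=10, target_cpu_percent=50)
print(json.dumps(template, indent=2))
```

The resulting JSON is the kind of file that can live in version control next to the application code and be submitted to CloudFormation to create or update a stack.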
For Operational Visibility
We can instrument OpsViz with Amazon CloudWatch, using the collected metrics to drive our auto-scaling policies and the collected events to drive auto-healing. Coupling CloudWatch with services like Amazon Simple Notification Service (SNS) will allow us to receive alerts on critical events, while simultaneously triggering other services to help remedy the situation quickly.
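The pairing of CloudWatch and SNS can also be expressed as code. The sketch below builds a CloudFormation fragment, again as a Python dictionary, containing an SNS topic and an alarm that notifies it. The logical names (`AlertTopic`, `HighCpuAlarm`), the `GroupName` parameter, and the threshold values are assumptions made for the example rather than recommended settings.

```python
import json


def build_alerting_template(cpu_threshold, periods):
    """Return a template with an SNS topic and a CloudWatch alarm that notifies it."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Name of an existing Auto Scaling group to watch (assumed to exist).
            "GroupName": {"Type": "String"},
        },
        "Resources": {
            "AlertTopic": {"Type": "AWS::SNS::Topic"},
            "HighCpuAlarm": {
                "Type": "AWS::CloudWatch::Alarm",
                "Properties": {
                    "AlarmDescription": "Average CPU is above the agreed threshold",
                    "Namespace": "AWS/EC2",
                    "MetricName": "CPUUtilization",
                    "Dimensions": [
                        {"Name": "AutoScalingGroupName", "Value": {"Ref": "GroupName"}}
                    ],
                    "Statistic": "Average",
                    "Period": 300,  # seconds per datapoint
                    "EvaluationPeriods": periods,
                    "Threshold": cpu_threshold,
                    "ComparisonOperator": "GreaterThanThreshold",
                    # Ref on an SNS topic resolves to its ARN.
                    "AlarmActions": [{"Ref": "AlertTopic"}],
                },
            },
        },
    }


alerting = build_alerting_template(cpu_threshold=80, periods=2)
print(json.dumps(alerting, indent=2))
```

Subscribers to the topic could be people (email or chat) or other services, which is how a single alarm can both page a human and kick off an automated remedy.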
For Compliance and Auditability
AWS Certificate Manager will allow us to provision and manage the TLS certificates required to encrypt data in transit, and AWS Key Management Service (KMS) can manage the keys used to encrypt stored data. To address the compliance requirements, we will set up AWS CloudTrail to record all changes to the infrastructure, AWS Config to store change history in an auditable format, and CloudWatch to automate elasticity and alert us to events occurring in the stack. All of the above can be complemented with extensive logging coupled with Amazon Kinesis, as well as an Amazon EMR-based log analysis and alerting system triggered by SNS notifications.
Throughout the entire platform lifecycle, the AWS security suite – IAM, Inspector, and Certificate Manager – allows us to maintain a highly secure operational posture. By utilizing granular IAM roles, applying the principle of least privilege, and following the well-documented AWS security best practices, we can keep our platform and all its components compliant and secure. To add a layer of security for our web applications, gain fine-grained control, and minimize our application’s exposure profile, we would employ a web application firewall.
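To show what least privilege can look like in practice, the sketch below builds a narrowly scoped IAM policy document and runs a simple check that flags overly broad grants. The policy structure follows the standard IAM JSON format; the bucket name and the helper functions are hypothetical, and the check is deliberately simplistic compared with a real policy analyzer.

```python
def read_only_bucket_policy(bucket_name):
    """Allow reading objects from a single bucket and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            }
        ],
    }


def _as_list(value):
    """IAM accepts either a single string or a list; normalize to a list."""
    return [value] if isinstance(value, str) else list(value)


def find_broad_grants(policy):
    """Return Allow statements that grant every action or touch every resource."""
    findings = []
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        too_broad = (
            any(a == "*" or a.endswith(":*") for a in actions)
            or "*" in resources
        )
        if too_broad:
            findings.append(statement)
    return findings


# "example-app-assets" is a made-up bucket name.
scoped = read_only_bucket_policy("example-app-assets")
admin_like = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}
print(len(find_broad_grants(scoped)), len(find_broad_grants(admin_like)))
```

A check along these lines could run in the delivery pipeline so that an overly permissive role is caught in review rather than in an audit.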
Now that we have the platform ready for a workload, let’s discuss our software delivery pipeline. We are agile, we have excellent test coverage, and we deploy often—many times per day. We need an automated, consistent, and speedy release process—we need Continuous Integration and Continuous Delivery.
We can already stand up our development, integration and production environments quickly, because we’ve already codified our infrastructure. What we need is a way to efficiently promote our code through all the stages of the pipeline all the way to production.
Enter AWS CodeCommit, a managed Git repository service; AWS CodeDeploy, a deployment automation tool; and AWS CodePipeline, a continuous delivery service. We can use Jenkins with the AWS CodePipeline plugin: we wire it to our GitHub repository, set Jenkins to poll the pipeline, and let it run through all stages, all the way to production. This isn’t the only way to deploy code to AWS, however. We could also upload an archive containing our code, along with all required artifacts, to Amazon S3. We would then use CodeDeploy to provision our nodes with the required dependencies, then safely distribute and unpack our artifact to a predefined set of application nodes. Regardless of the method chosen, AWS will provide good visibility into the process and allow us to build in the safeguards required to avoid problems.
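Whichever tooling is chosen, the core idea of the pipeline is the same: a build is promoted through each stage in order and stops at the first failure, so nothing unverified reaches production. The sketch below models that safeguard in plain Python; the stage names and the check functions are placeholders and do not call Jenkins, CodePipeline, or CodeDeploy.

```python
def promote(build_id, stages):
    """Run a build through ordered stages, halting at the first failure.

    `stages` is a list of (name, check) pairs, where `check` takes the
    build id and returns True on success.
    """
    completed = []
    for name, check in stages:
        if not check(build_id):
            return {"build": build_id, "completed": completed, "failed_at": name}
        completed.append(name)
    return {"build": build_id, "completed": completed, "failed_at": None}


# Placeholder checks standing in for real test suites and deployments.
def always_pass(build_id):
    return True


def always_fail(build_id):
    return False


good_run = promote("build-101", [
    ("development", always_pass),
    ("integration", always_pass),
    ("production", always_pass),
])
bad_run = promote("build-102", [
    ("development", always_pass),
    ("integration", always_fail),
    ("production", always_pass),
])
print(good_run)
print(bad_run)
```

In the second run the production stage is never attempted, which is the behavior we want from an automated release process that deploys many times per day.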
There are also some significant side benefits to executing our DevOps strategy on AWS. Implementing a formal Disaster Recovery and Business Continuity policy is made much easier, because we gain the ability to rapidly stand up new production-grade environments by taking advantage of native multi-region services like Amazon S3 or Amazon DynamoDB in conjunction with infrastructure-as-code. The software development lifecycle is streamlined by using scaled-down mirror copies of production infrastructure, testability is improved, and deployments are automated.
I hope I was able to convey, in admittedly very broad strokes, how AWS is a great place in which to execute your DevOps strategy, as it provides a robust cloud environment in which the DevOps culture can rapidly iterate and thrive. With its services and tools thoughtfully tied together, AWS provides a broad, powerful, and advanced toolset that, when utilized properly, can be a game changer when it comes to operations, software delivery, security, and compliance. What may not be obvious at first is that the AWS ecosystem can be composed in a number of ways, and that following best practices and common AWS design patterns guides the user towards a mature, highly available, secure, resilient, and elastic architecture.
At Pythian, we have been using Amazon Web Services for a number of years, and have been recommending it as one of the primary options to our clients whenever we need a robust and flexible cloud platform. We believe that organizations that adopt the DevOps mindset and embrace the AWS tools and services can compete with market leaders at the game of software, and even set the pace themselves. Pythian’s cloud experts would be very happy to help your organization get a competitive edge by utilizing all that AWS has to offer.