Anatomy of an Outage


A Quick Fix Turned Long-Term Solution

In 2017, an organization we know well decided to expand its website’s reach globally. Business pressures were strong at the time, so it outsourced the front-end development to a well-known but low-cost offshore development shop. The development was completed on time and on budget. They used an AWS-based microservices tech stack and, to save a bit of time, focused on functionality and planned to think through the solution’s scalability later.

Over the next few years the application worked well and was left alone. Time passed and the business grew.

The Technical Debt Surfaced

Then, at 2 a.m. on a Wednesday in mid-December 2022, the application seized up. It didn’t fall over entirely, but it slowed to a crawl, making it effectively useless for customers. The application’s success had finally caused demand for its services to exceed the capabilities of its architecture.

Triage

In the short term, our architects scaled various microservices to account for the increased load and stopped a bot attack that happened to be under way at the same time. Those updates temporarily eliminated the slow load times.

The longer-term fix consisted of multiple improvements:

  • Replacing the current in-instance memory caching with a Redis in-memory caching service so that all instances share the same cache.
  • Adjusting the CMS cache to heal itself automatically.
  • Adjusting the custom WordPress plugin to use the same outbound API call timeouts as the other pieces of the infrastructure.
  • Putting proper load testing in place to identify inefficiencies in services.
  • Most importantly, adding Performance Sprints to the project roadmap so that the team has the room to focus on performance and growth.
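
The first item on that list can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from this project: SharedCache is a hypothetical in-process stand-in for a shared service such as Redis, and AppInstance, get_page, and load_from_origin are names invented for the example. It shows the cache-aside pattern, in which every instance checks the shared cache before calling the slower origin.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SharedCache:
    """Hypothetical stand-in for a shared cache service such as Redis."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, expiry timestamp)
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Drop expired entries so the next caller reloads from the origin.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (value, self._clock() + ttl_seconds)


class AppInstance:
    """One application instance that reads through the shared cache."""

    def __init__(
        self, cache: SharedCache, load_from_origin: Callable[[str], str]
    ) -> None:
        self._cache = cache
        self._load = load_from_origin

    def get_page(self, key: str, ttl_seconds: float = 60.0) -> str:
        cached = self._cache.get(key)
        if cached is not None:
            return cached
        # Cache miss: fetch from the slow origin, then share the result.
        value = self._load(key)
        self._cache.set(key, value, ttl_seconds)
        return value
```

Because both instances read through the same cache, a page loaded by one instance is served from the cache by every other instance, which is exactly what per-instance memory caching cannot provide.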

Quick Fixes are Great, but Know the Implications

Situations like this are tough. We all understand the reality of business pressures and budget constraints, the value of a solution that will work for now, and the need for speed. But remember that if you do live to fight another day, and your short-term decisions are successful and your user base grows, those quick solutions, the conscious decisions, and the unconscious mistakes do not go away.

Proactively Managing Technical Debt

What could have helped? Here are some suggestions to help you start managing your application’s technical debt:

Automated Testing

You can write a few scripts and run them against your application on whatever cadence it requires, so that you stay on top of the priority fixes that need to be made. At Valtira, we’ve built a team that can get automated tests up and running in just a couple of weeks.

  • This process usually includes load testing to make sure the application is prepared for expected usage and growth. We can also do geographic testing to ensure it works in every region it is intended to serve.
  • Monitoring performance trends over time provides a great early indicator that can help you catch problems before they affect customers.
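
One way to turn performance trends into an early warning is to compare a high percentile of response times from the latest test run against earlier runs. A minimal Python sketch follows; the function names, the 95th percentile, and the 20 percent tolerance are all assumptions chosen for illustration, not values from the project described above.

```python
import math
from typing import List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def is_regressing(
    runs: List[Sequence[float]], pct: float = 95.0, tolerance: float = 1.2
) -> bool:
    """Flag the latest run when its percentile exceeds the historical baseline.

    `runs` holds response times (for example, in milliseconds) for each test
    run, oldest first. The baseline is the mean percentile of all earlier runs.
    """
    if len(runs) < 2:
        return False  # nothing to compare against yet
    history = [percentile(run, pct) for run in runs[:-1]]
    baseline = sum(history) / len(history)
    return percentile(runs[-1], pct) > baseline * tolerance
```

A scheduled job could call is_regressing after each load test and alert the team when it returns True, well before customers notice a slowdown.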

Yearly Architecture Reviews

We help our clients compile a high-level picture of all legacy and current systems. We recommend yearly reviews of this documentation to update and monitor any changes in usage across those platforms.

  • This process may identify updates that need to be planned for in the next year. Even if you’re phasing out an application, there may still be low-cost updates you can make to decrease the probability of an outage, especially if a planned decommissioning gets delayed or postponed.
  • These yearly reviews can also help new staff onboard and get familiar with a system so that, if they ever need to support the application, they can move more quickly.

Analytics Reviews

It’s important to check in on the analytics of your legacy applications because you may notice that more users are using an application than it was built for. Or there may be new devices and screen sizes being used that the application is not optimized for.

Plug-In Updates

If your application uses plug-ins, you not only need to make sure that they are up to date, but also that each time you add functionality, the application is smoke tested to ensure that the updates haven’t broken anything.
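
A smoke test after a plug-in update can be as small as a list of pages and the text each one must contain. The Python sketch below assumes a hypothetical fetch function that returns a status code and body for a path (in practice it would wrap whatever HTTP client your project uses); the function name and the example paths are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# A fetch function takes a path and returns (status_code, body).
Fetch = Callable[[str], Tuple[int, str]]


def run_smoke_checks(expected: Dict[str, str], fetch: Fetch) -> List[str]:
    """Return human-readable failures; an empty list means every check passed.

    `expected` maps each path to a piece of text its page must contain.
    """
    failures: List[str] = []
    for path, marker in expected.items():
        try:
            status, body = fetch(path)
        except Exception as exc:
            # A request that raises is a failed check, not a crashed run.
            failures.append(f"{path}: request failed ({exc})")
            continue
        if status != 200:
            failures.append(f"{path}: expected status 200, got {status}")
        elif marker not in body:
            failures.append(f"{path}: missing expected text {marker!r}")
    return failures
```

Running a check like this after every plug-in update, and blocking the release when the list of failures is not empty, keeps a broken page from reaching customers unnoticed.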

At Valtira, we have two decades of experience building, maintaining, and fixing outages for our customers. If you’re looking for a trusted partner to help you manage and identify risk, reach out and we can walk you through our process.

Fixing outages is just one way we help our clients. We also build solutions from the ground up! We generally follow a four-step approach that helps guide our clients through the entire process.

Strategy & Discovery
We plan the strategic, creative, functional, and technical aspects of the website or web app during the discovery phase. This phase is of utmost importance to the project, as the deliverables will be the basis for not only the website build-out but also the overall digital strategy. We also outline the Key Performance Indicators (KPIs) during this stage.

Design
The Information Architecture (IA) dictates the structure of the site and validates the user’s path from entry to conversion. Wireframes highlight how each page type is laid out on both desktop and mobile devices. Our user experience (UX) team determines how best to lay out each page for an intuitive, user-centric experience across the website. Once IA, wireframes, and UX are complete, we apply the new or existing brand guidelines to the design, ensuring the visual design is consistent with our clients’ brands and delivers the customer experience they expect.

Development
Using the delivered design and strategy documentation, our team of developers merges the design with the technology. The development phase is performed as a series of sprints delivering functionality as it is completed in accordance with the project plan and technical architecture. Ongoing status reports and development deliverables are created and tested in-line as they are completed.

Deployment
Once completed, the application is thoroughly tested against the documented test plans and submitted to the client for user acceptance testing. After launch, we generally begin monthly maintenance on the application to ensure the customer experience stays consistent and high-performing. We also work closely with our clients to plan for future enhancements based on internal and external feedback gathered after launch.

Want to learn more? Contact us today to get your custom quote for your technology needs.

