The trick to making decommissioning legacy a safe, effective experience – that can even be fun!
As much as we all like to play with shiny new things, the chances are you'll be surrounded by a fair amount of legacy tech (unless you're at a start-up). Decommissioning this safely can be a mammoth task and requires enlisting the help of many people who may not be interested in what you're doing – unless, of course, you break something. Motivating others to help with changes where a successful launch results in no visible impact – other than reducing risks people may not be aware of – is hard. Having led several successful projects of this type, I will cover what did and didn't work well for me and how it could be applied to any change.
The first and most important challenge is getting buy-in. Your project may have been officially scheduled and approved by whoever decides the team priorities, but that does not mean everyone else is really on board.
Sometime later, when the going gets tough and you run into gnarly problems, it will be useful if everyone has previously bought into the vision – even people you may not think will be directly involved. This means other engineers, customer support, customer account management, product, architecture. For anyone who the ripples of this change may reach, it will be better if they understand the problem you are solving from the start, before their snazzy new feature gets pushed back a month to help with the decommission.
Think about what impact doing, and more importantly not doing, this project will have on everyone. Even if your old server is causing you a lot of pain, that pain may not be visible to others – especially if you are dealing with it effectively. It may be easier for them to see the problems with implementing the change you are asking for than to foresee what could happen if it’s not done.
For example, the platform team is regularly restarting your logging server in the night when it collapses, but it is always done before anyone notices. Your dev teams are likely to think about the pain of having to change the log endpoint in all their application config, rather than the problem of them inevitably losing logs in the future if the logging system is not replaced.
It’s important to tailor your messaging to your audience, rather than think about the impact on the team leading the project. Your frame of reference should be whatever that person cares about the most. For example, the customer account team will be more interested in not having to deal with angry customers during an outage where you have lost all your logs. The dev team will be keener to hear about the Terraform provider for the new system which will give them a higher degree of self-service. Neither of them will likely be interested in the logging system having a better authentication mechanism.
So, now that everyone thinks your decommissioning project is the best thing since sliced bread, what’s next?!
One of the difficulties in decommissioning old systems is that it’s often not clear what is dependent on them, and it’s tricky to find out. A common way to handle this is by asking a question like, ‘I want to delete system X, can anyone who thinks this may impact them let me know?’ in general engineering channels.
However, one week later, when you have only two replies and you are wondering whether to press the big red delete button, what is your conclusion?
- Everyone read your message, did their due diligence, and is sure there is no impact, so didn’t reach out.
- Some people read your message, a few people checked their resources, and everyone else forgot.
A lesson I have learned (to the disappointment of my ego) is:
Don’t assume everyone reads your messages.
Even with the best will in the world, people are busy and absorbed in their own engineering challenges, so may easily miss the message that is key to your next steps.
This problem is exacerbated with remote working where there are fewer natural opportunities to casually check in over the water cooler on whether your morning’s clean-up is likely to take out someone’s application in prod. If you have strong management backing for your project, you could take the stance that they had their warning and will have to deal with any consequences of not paying adequate attention. However, the likelihood is that you are going to have to help pick up the pieces anyway, as the dependency is on your system.
A more effective approach I have found, after a few weeks of shouting ‘Can I delete this? Anyone? Someone?’ into the Slack void, is to try and assign ownership before asking for help. To do this, you could sit down with a couple of folks who have been around the longest and so have a good idea of which team things sit with. Alternatively, you can approach someone who may have less knowledge but more authority to assign things to different teams.
Even if you get it wrong and assign some resources to the wrong team, it is easier to review a shorter list where they can call out the ones you’ve got wrong, than a lengthy company-wide document.
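As a rough illustration of splitting one long inventory into short per-team review lists, here is a minimal Python sketch. The resource names, team names, and the `group_by_owner` helper are all hypothetical and only show the shape of the idea; your inventory will probably come from a spreadsheet or a cloud API instead.

```python
from collections import defaultdict

# Hypothetical inventory: (resource name, best-guess owning team).
# These names and teams are made up for illustration.
inventory = [
    ("logs.internal.example.com", "platform"),
    ("legacy-api.example.com", "payments"),
    ("old-dashboard.example.com", "platform"),
    ("batch-runner.example.com", None),  # nobody could place this one yet
]


def group_by_owner(resources):
    """Return a dict mapping each team to the sorted resources assigned to it."""
    by_team = defaultdict(list)
    for name, team in resources:
        # Resources without a guessed owner go into an explicit bucket
        # so they are not silently dropped from the review.
        by_team[team or "unassigned"].append(name)
    return {team: sorted(names) for team, names in by_team.items()}


review_lists = group_by_owner(inventory)

# Print one short list per team, ready to paste into a direct message.
for team, names in sorted(review_lists.items()):
    print(f"{team}: {len(names)} to review")
    for name in names:
        print(f"  - {name}")
```

Keeping an explicit "unassigned" bucket makes the leftovers visible, which is usually where the conversation with the longest-serving folks is most useful.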
Focus on individuals
Once you have effectively divided ownership, it is easier to follow up with people on a 1:1 basis on their part of the project. People are more likely to respond to a direct message, and it’s easier for you to keep track of what progress is being made and where.
Now you’ve convinced everyone of the big picture and worked out who needs to do what, it’s going to be all smooth sailing from here, right?
Unfortunately, even though this is significant progress, it is not the whole battle. I remember when I presented a new tagging strategy, saying how it would make our cloud estate much more manageable for everyone. Everyone agreed with it, they would all help make the changes in their apps, and they saw the neatly tagged nirvana on the horizon. I was delighted.
However, over the next couple of weeks, there was very little adoption of the new tags, and I asked a few people why they hadn’t done anything they agreed to. ‘The thing is, Natalie, we are in the middle of migrating from Angular to React and back again, and solving world peace with Kubernetes is up next.’
‘But you said you would replace all your hyphens with underscores! I was looking forward to sleeping better at night.’
‘Oh we will, but not now.’
Now those examples are silly, but you get the point.
Be clear with the expected time frames
Companies handle scheduling commitments and deadlines in different ways, but goodwill doesn’t replace a time-framed commitment. You need both.
Ask people to commit to a deadline and state whether they foresee any reason it would not work for them. You need to make it clear that this is not just a best-efforts job, but that these are timelines everyone needs to meet for the project to succeed; any time conflicts can then be discussed and re-prioritized upfront. Otherwise, you may end up in a last-minute scramble when you realize you don’t have what you need a week before the turn-off date and people are constrained by urgent customer commitments.
Communicate success along the way to make the journey clear
Engineers love solving problems and it’s easy to focus on what’s left to do rather than what progress has been made. Announcing what achievements have been made and their impact can help motivate the team and engage others in the project. It also brings people on the journey and makes them aware of the great work you’re doing.
I remember feeling overwhelmed in the middle of a clean-up of 500 DNS records when faced with the remaining 300 DNS records to review and remove. However, removing 200 records without any negative impact was already a great achievement; publicly thanking everyone and showing the collective progress from many small contributions helped drive us the last part of the way.
Although this work is not glamorous, it is incredibly important. The faster we build new shiny services, the faster the legacy will pile up, so safely and effectively decommissioning things is a skill that will always be needed. It can even be fun (removing whole systems and servers is better than culling code) so happy deleting!