Engineering disaster recovery: How to build a DRaaS solution for FinTech
COVID, wars, and increasingly sophisticated cyberattacks: you are here. To survive in this hyper-paced world where so many things can go wrong, your business needs to ensure the continuity of its operations.
Disaster Recovery as a Service (DRaaS) exists to safeguard your crucial data and infrastructure against unforeseen disasters. Explore why DRaaS is indispensable for FinTechs and how to build or choose a robust solution tailored to your needs.
What is a DRaaS solution, and why do FinTechs need it?
DRaaS (Disaster Recovery as a Service) is a solution that takes data as it travels through your network and saves a backup copy. Some offerings stop there: they are called BaaS (Backup as a Service). Others also protect your business’s processing power and the business logic embedded in your applications; these are full-scale DRaaS solutions. So, if you have specific code or commands running in containers, DRaaS can back those containers up so that when they or your databases go down, it can bring back all of the infrastructure your code runs on.
Major DRaaS use cases
Real disasters
DRaaS can help handle such threats to business continuity as natural disasters, equipment failures, and cyberattacks. The latter can be an especially pressing danger in the modern war context. Let’s take the current Russo-Ukrainian war as an example.
The international hacker group LockBit is notorious for attacking financial institutions in Western countries that back Ukraine with ammunition and other assistance. Over the last five years, the criminals carried out more than 3,000 ransomware attacks, demanding large sums for restoring functionality. For instance, one American company faced a $90 million ransom demand from LockBit. Just recently, two members of the criminal group were caught.
As cyberattacks escalate globally, with geopolitical tensions keeping pace, DRaaS providers respond with the same speed. Disaster recovery technologies are being enhanced and diversified to meet the demand. In the scenario above, when the attackers encrypt all of your data and say, “Give us bitcoin, and we’ll unlock it,” a DRaaS solution lets you restore your system to a previous point in time, before the encryption happened.
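To make the point-in-time restore idea concrete, here is a minimal sketch of how a restore point could be picked after a ransomware incident. The function name, the snapshot schedule, and the attack time are hypothetical; a real DRaaS product exposes this through its own tooling.

```python
from datetime import datetime
from typing import Optional, Sequence


def latest_clean_snapshot(
    snapshots: Sequence[datetime], compromised_at: datetime
) -> Optional[datetime]:
    """Return the most recent snapshot taken strictly before the compromise."""
    candidates = [taken_at for taken_at in snapshots if taken_at < compromised_at]
    return max(candidates) if candidates else None


# Hypothetical daily snapshots and a hypothetical time of attack.
snapshots = [
    datetime(2024, 3, 1, 0, 0),
    datetime(2024, 3, 2, 0, 0),
    datetime(2024, 3, 3, 0, 0),
]
attack = datetime(2024, 3, 2, 14, 30)

restore_point = latest_clean_snapshot(snapshots, attack)
print(restore_point)
```

Everything written between the chosen restore point and the attack is lost, which is why the backup frequency matters so much.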
Smaller-scale nuisances
Things like floods or $90 million cyberattacks occur only once in a while. In between, smaller but still harmful events happen. These may include databases going down, AWS having issues in one of its regions, or denial-of-service (DoS) attacks. Basically, it’s about just one part of your infrastructure collapsing or having issues.
Audits
One of the requirements for proper oversight of your data is keeping a log that shows who accessed what. DRaaS solutions store traffic and backup data, and they can also keep your logs and other information on what’s happening in your system, which can help you with audits.
The question is, which options suit your business? Let’s see what’s on offer.
How to choose a DRaaS solution
Whatever solution you’re looking for, it’s always a good idea to ask for referrals. Considering big names is great, but it might be more helpful to ask companies similar to yours. This helps you find a DRaaS provider whose specialization fits your needs. Some may be big enterprise government contractors; these will know the government-related specifics. Others may be more suitable for startups.
Also, paying attention to the provider’s security standards is essential. Check whether they are SOC 2 certified, are PCI-compliant, and hold other relevant certifications. Another crucial point is data access and controls. Make sure you share only as much as needed, because otherwise, once your system or its parts get locked, they may be exposed to more parties than the attackers alone.
Finally, depending on your system’s location, on-prem or cloud, look for a provider with expertise in the relevant area.
Metrics
To choose the DRaaS variant that your business can make the most of, you should first carefully look at the important metrics like the ones below.
Time to recovery. If your system goes down or an issue happens, how long does it take for the system to recover?
Time to alert. How quickly does your system signal there’s any issue? How long from that alert does it take for your system to take action automatically?
Data to lose. How much system data can your company afford to lose? Say a DRaaS solution does a daily backup. This means that if the incident happens one minute before the next backup, you’ll lose almost a day’s worth of data. Some companies can afford that, while others count data per millisecond.
So, on the one hand, you have a snapshot of your daily backup. On the other, you can go all the way to continual monitoring: as your system is processing data, it’s streaming it to another system; any time you update code, the system will push it to multiple places with as low latency as possible.
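The trade-off between these metrics can be checked with simple arithmetic. The sketch below is an illustration only: the function names and all tolerances (one day versus one second of data loss, hours versus minutes of downtime) are hypothetical numbers, not recommendations.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    # An incident right before the next backup loses everything written
    # since the previous one, so the worst case is one full interval.
    return backup_interval


def meets_targets(
    backup_interval: timedelta,
    measured_recovery: timedelta,
    max_data_loss: timedelta,
    max_downtime: timedelta,
) -> bool:
    """Check a backup schedule and a measured recovery time against targets."""
    return (
        worst_case_data_loss(backup_interval) <= max_data_loss
        and measured_recovery <= max_downtime
    )


daily = timedelta(days=1)
recovery = timedelta(hours=2)  # hypothetical result of a recovery test

# A business that can re-collect a day of data and wait a few hours.
lending_ok = meets_targets(
    daily, recovery, max_data_loss=timedelta(days=1), max_downtime=timedelta(hours=4)
)
# A business that cannot lose more than a second of transactions.
payments_ok = meets_targets(
    daily, recovery, max_data_loss=timedelta(seconds=1), max_downtime=timedelta(minutes=5)
)
print(lending_ok, payments_ok)
```

With these assumed tolerances, a daily snapshot is enough for the first business and far from enough for the second, which would need something closer to continuous streaming.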
If you have doubts about your company’s ability to adequately and fully assess the business context and come up with the requirements, look for a provider who can help you with such an analysis.
Business use cases
Another factor affecting your DRaaS solution choice (whether you buy or build) is the size of your FinTech business and its operating area.
- If you’re doing payments, accounting, or any kind of money movement, you’ll need fairly sophisticated disaster recovery. The reason is that a lost day’s worth of transactions or new customer data is not something you can easily restore, and that loss can be the end of your business.
- If you’re a lending startup, with applications coming in, you’ll need to store those applications somewhere. If you lose a day’s worth of originations, you can probably collect that data again and re-originate. That might actually eat fewer of your resources than getting a real-time DRaaS engineered for your business.
Level of engineering
Any team should think about resiliency and rigor in their system from the beginning. It’s much easier to design the architecture from the start for situations like losing data or the system going down. Changing the architecture to accommodate disaster recovery later, when your business has grown and is forging partnerships, can cost more, whether you use a vendor’s help or build a DRaaS solution of your own.
Key elements your DRaaS solution must have
Again, this is very individual: much depends on what kind of business you’re running. But let’s make up an example to outline the kind of tasks you might be facing. Take a credit card business, for instance. It has several components.
- Originations (assessing applicants, processing applications): These can be covered with a daily data snapshot, plus making sure that applications are stored in multiple locations so that when one goes down, applications are safe in another.
- Code: Having it checked into places like GitHub will keep it safe. Also, if you use Infrastructure as Code, you can spin up infrastructure as needed.
- Transactions: It’s not just about data. In case of an emergency, you’ll need transaction processing and the ability to go live in a different region as quickly as possible. Processing can be split between different locations so that only the affected parts of the infrastructure have to be restored. The same goes for the compute capacity doing the processing: you’d want it available in two different places, with the ability to roll over to the second location if you get a bunch of errors in the first.
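The “roll over if you get a bunch of errors” rule for transactions can be sketched as a small router that watches recent results. This is an illustration under assumptions: the class name, the region names, the window size, and the error threshold are all hypothetical, and a production setup would normally rely on load balancer or DNS health checks rather than application code.

```python
from collections import deque


class FailoverRouter:
    """Send traffic to the primary location until its recent error rate
    crosses a threshold, then switch to the secondary location."""

    def __init__(self, primary: str, secondary: str,
                 window: int = 20, max_error_rate: float = 0.5) -> None:
        self.primary = primary
        self.secondary = secondary
        self.max_error_rate = max_error_rate
        self.results = deque(maxlen=window)  # True = success, False = error
        self.active = primary

    def record(self, success: bool) -> None:
        """Record one request outcome and fail over if the window looks bad."""
        self.results.append(success)
        window_full = len(self.results) == self.results.maxlen
        if self.active == self.primary and window_full:
            error_rate = self.results.count(False) / len(self.results)
            if error_rate > self.max_error_rate:
                self.active = self.secondary
                self.results.clear()


router = FailoverRouter("us-east", "us-west", window=4, max_error_rate=0.5)
for ok in [True, False, False, False]:
    router.record(ok)
print(router.active)
```

Waiting for a full window before deciding avoids rolling over because of a single failed request. Switching back to the primary is deliberately left out here, since that decision is usually made by a person after the incident.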
Models and options: DRaaS architecture for 2024
I’d sort the architecture for DRaaS into two main buckets.
- The first is the one where you have fine-grained control and want to understand your architecture, system, and business cases. Also, it’s the one where you want to walk through every possible mistake and see what kind of backup is suitable for each. This will be a custom solution for the most part.
If you fall into this category, you’d probably want one of two options. One is a self-service DRaaS model, where you take charge of all disaster recovery planning, testing, and management. The other is an assisted DRaaS model, where you hand over part of the responsibility for creating your DRaaS system to a vendor or engineer and rely on their expertise. If you want a kind of template you can use to engineer every stage on your own, you might opt for self-service DRaaS. This model is the cheapest, but it also requires a skilled team.
- The second is typically a plug-and-play managed DRaaS solution. If you haven’t thought about resiliency in the past or simply don’t have the resources to engineer your own DRaaS solution, this might be the best option to consider.
The importance and nuances of testing a DRaaS solution
Whether you build on your own or buy an off-the-shelf solution, testing the solution is vital to ensure it won’t fail you at a critical moment.
- If it’s a third-party service, roll over and see what happens when you try to access the data from the recovery side or try a processing run. That will help you measure the time to recover and fix the problems (and they always appear the first time you go through testing).
- Sometimes, the problem is just DNS (the Domain Name System): your domains can still point at the old system even though the new systems are running. If all the domains point at the old system, the rollover won’t work, and customers won’t reach the new one.
- At times, you’ll have similar issues between services, where part of your infrastructure is tied to one place. Environment variables, secrets, and passwords may be stored in one place and may differ across regions. For example, if you have two databases, they may need different secure tokens to log in. So, part of rolling over is to really test, see where the gaps are, and work through iterations. Make a list of the things that were hard to do or didn’t work as expected, and fix them with the vendor or your team.
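Two of the gaps described above, stale DNS records and settings that exist in only one region, can be checked before a rollover test. The sketch below is a simplified illustration: the function names, domains, addresses, and setting names are hypothetical, and the resolver is passed in as a parameter so that the check can be tried without touching real DNS (in a live check, Python’s `socket.gethostbyname` could be passed instead).

```python
from typing import Callable, Dict, Iterable, List, Set


def stale_domains(
    domains: Iterable[str],
    resolve: Callable[[str], str],
    new_addresses: Set[str],
) -> List[str]:
    """Return domains that do not resolve to the new system's addresses."""
    return [d for d in domains if resolve(d) not in new_addresses]


def missing_settings(primary: Dict[str, str], recovery: Dict[str, str]) -> List[str]:
    """Return setting names defined in the primary environment but absent
    in the recovery one. Values are not compared, because tokens and
    passwords may legitimately differ across regions."""
    return sorted(set(primary) - set(recovery))


# Hypothetical DNS answers: one domain still points at the old system.
fake_dns = {"app.example.com": "10.0.2.5", "api.example.com": "10.0.1.5"}
stale = stale_domains(fake_dns, fake_dns.__getitem__, {"10.0.2.5"})

# Hypothetical environment settings for the two locations.
primary_env = {"DB_URL": "primary-db", "DB_TOKEN": "token-a"}
recovery_env = {"DB_URL": "recovery-db"}
missing = missing_settings(primary_env, recovery_env)

print(stale, missing)
```

Anything these checks return goes on the list of things to fix before the next iteration.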
Overall, the platinum standard would be testing in production and moving real customer traffic. That’s a big jump and definitely risky to attempt the first time. So, before attempting that, make sure to do the following:
- Set up a control group so that only a certain percentage of traffic goes to the other location.
- If you’re doing this for the first time, start by rolling over a lower, non-production environment (staging, QA, or Dev) to check for any issues. It’s not as good as prod, but it’s a great place to start.
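One possible way to send a fixed percentage of traffic to the other location is to hash a stable customer identifier into a bucket. This is a sketch under assumptions: the function name and the ID format are hypothetical, and in practice the split is often configured in a load balancer or DNS weighting rather than in application code.

```python
import hashlib


def routes_to_recovery(customer_id: str, percent: int) -> bool:
    """Deterministically place a customer in the test group.

    The same customer always lands in the same bucket, so they do not
    bounce between locations from one request to the next.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return bucket < percent


# Hypothetical customer IDs; roughly 5% should go to the recovery location.
customers = [f"customer-{n}" for n in range(10_000)]
moved = sum(routes_to_recovery(c, percent=5) for c in customers)
print(f"{moved} of {len(customers)} customers routed to the recovery location")
```

Raising `percent` step by step lets you widen the test gradually instead of moving all real customer traffic at once.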
Wrapping it up on DRaaS
Implementing a robust DRaaS solution is not just a precautionary measure; it’s a strategic imperative. By carefully assessing your business needs, selecting the right provider, and rigorously testing your solution, you can fortify your FinTech against disruptions.
If you have any questions left on DRaaS and need engineering expertise, let us know by dropping your question here.