From September 13 to September 16, 2016, our operations team received alerts that our Managed DNS servers had come under a distributed denial-of-service (DDoS) attack. These attacks are nothing new to us here at Thexyz. In 2015 we made a number of improvements to protect ourselves from DDoS, which we described in an earlier post. We also recently assisted law enforcement after an unsuccessful extortion attempt by the Armada Collective.
Thexyz does not and will not respond to extortion attempts. We have dealt with DDoS attacks in the past, and last year we made significant improvements to our infrastructure to combat them.
What is a DDoS attack?
In a distributed denial-of-service attack, a criminal uses a large number of computers to send requests to a particular website or IP address. If they send enough requests, they use up all of the target's resources and the site appears to be offline.
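The following deliberately simplified Python model makes "using up all the target's resources" concrete. It assumes a server with a fixed capacity that cannot tell attack traffic from legitimate traffic, so when it is overloaded every request has the same chance of being served. The function name and all numbers are hypothetical and are not measurements from this incident.

```python
def legit_served_fraction(capacity_rps: float, legit_rps: float, attack_rps: float) -> float:
    """Fraction of legitimate requests a server can answer per second.

    Toy model (assumption): the server handles at most `capacity_rps` requests
    per second and cannot distinguish attack traffic from real traffic, so when
    it is overloaded every request has an equal share of the capacity.
    """
    total = legit_rps + attack_rps
    if total <= capacity_rps:
        return 1.0  # enough capacity for everyone
    return capacity_rps / total  # scarce capacity is shared equally by all requests


if __name__ == "__main__":
    # Hypothetical numbers, not Thexyz measurements.
    print(legit_served_fraction(1_000, 200, 0))       # 1.0
    print(legit_served_fraction(1_000, 200, 99_800))  # 0.01
```

Without an attack, every legitimate request is served. Once 99,800 attack requests per second arrive alongside the 200 legitimate ones, only about 1% of real visitors get through, so the site appears to be down.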
Even if a DDoS attack is successful in knocking a website offline, it does not lead to data being compromised or lost. It is more like being stuck in a really bad traffic jam and unable to reach your destination until the traffic clears up.
Why would someone attack Thexyz?
Since November 2015, extortion demands have been sent to various email providers, including Fastmail, Runbox, Hushmail, Zoho and ProtonMail. The attackers count on the target's network being unable to cope with the attack, and they hope the provider will simply pay the ransom to make it stop. In one case, ProtonMail did pay after several days of disruptions, but paying did not end the attack.
What is Thexyz doing to stop it?
We have been working with our data center and upstream providers to make sure we have strong mitigation in place for a range of DDoS scenarios, so that we are ready to adapt. We have also notified our local police department in Toronto, which has been working with the international law enforcement agencies involved in previous attacks.
Our System Operations Team started receiving alerts for our Managed DNS service. Each node was handling about 20,000 to 25,000 queries per second (QPS), roughly five times our normal traffic. Support staff received multiple reports from people who could not reach our website or webmail. We immediately started mitigation through Neustar and analyzed tcpdump captures for unusual patterns to help bring the QPS count down. We also moved all of our traffic onto all 16 IPs and placed them under mitigation with only port 53 allowed, because DNS uses that port over both UDP and TCP. This brought the QPS count under control, and the alerts started to clear.
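Our actual analysis was done by reading tcpdump captures, and Neustar applied the filtering. The Python sketch below is a hypothetical illustration of the same two ideas. It keeps only DNS traffic on port 53 (UDP or TCP) and then tallies the most frequent source addresses and query names, which is one simple way an unusual pattern can surface. The `Packet` record, its field names, and the assumption that captures have already been parsed into it are ours. The addresses are from documentation ranges.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified record of one captured packet (e.g. parsed from tcpdump output)."""
    src_ip: str
    dst_port: int
    proto: str          # "udp" or "tcp"
    qname: str | None   # DNS query name, if the packet carries a DNS query


def allowed_by_filter(pkt: Packet) -> bool:
    """Mirror the 'only port 53' rule: keep DNS over UDP or TCP, drop everything else."""
    return pkt.dst_port == 53 and pkt.proto in ("udp", "tcp")


def top_talkers(packets: list[Packet], n: int = 3) -> tuple[list, list]:
    """Return the n most common source IPs and query names among packets that pass the filter."""
    kept = [p for p in packets if allowed_by_filter(p)]
    sources = Counter(p.src_ip for p in kept).most_common(n)
    qnames = Counter(p.qname for p in kept if p.qname).most_common(n)
    return sources, qnames


if __name__ == "__main__":
    sample = [
        Packet("198.51.100.7", 53, "udp", "example.com"),
        Packet("198.51.100.7", 53, "udp", "example.com"),
        Packet("203.0.113.9", 53, "tcp", "www.example.com"),
        Packet("203.0.113.9", 80, "tcp", None),  # dropped by the port-53 filter
    ]
    print(top_talkers(sample, n=1))  # ([('198.51.100.7', 2)], [('example.com', 2)])
```

A real attack with a shifting vector would need many more signals than these two counters, so treat this as a starting point rather than a detection method.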
Connectivity issues were sporadic yet repeatable. One of the major challenges with this attack was that the attack vector kept changing. Updating the mitigation filter template took time because each filter had to be improvised in real time. Neustar was dropping 960K packets per second (PPS) and 350 Mbps of traffic, yet the QPS count did not improve significantly. We therefore decided to cancel the mitigation, spread the load across all of our data centers, and deploy legacy mitigation at each one. This plan worked at first but had its own pitfalls. We had to move back to Neustar quickly, since another attack would have put this temporary setup in jeopardy. Our internal team then reviewed the problem and concluded that adding nodes at our colocation facility would let us load-balance traffic across all available nodes, old and new, after Neustar's mitigation.
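The two figures Neustar reported can be combined into a rough average size for the dropped packets. This assumes both figures were measured over the same interval and at the same network layer, which the numbers alone do not confirm:

$$
\bar{s} \;=\; \frac{350 \times 10^{6}\ \text{bit/s}}{960 \times 10^{3}\ \text{packets/s}} \;\approx\; 365\ \text{bit/packet} \;\approx\; 46\ \text{bytes/packet}
$$

An average this small points to a flood of many tiny packets rather than a few large ones. In this attack, the packet rate rather than raw bandwidth is the figure that mattered.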
What improvements have we made?
We are always working to improve your experience. Everyone here at Thexyz is committed to winning your business every day, and we do our best to maintain industry-leading uptime and reliability. Following this DDoS attack, we are making the following improvements:
- Terminate traffic over multiple GRE tunnels instead of just one. If this works as planned, DNS traffic will no longer need to be directed to the DNS nodes in a single data center (DC) and can be spread across multiple locations.
- Optimize the network stack on our DNS servers so they can accept more packets.
- Review our current DNS servers and verify whether any optimizations can increase DNS throughput.
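Spreading traffic over more tunnels, locations and nodes follows simple arithmetic: the same total load divided among more nodes leaves each node further from its limit. The Python sketch below assumes traffic splits evenly across nodes, although real anycast and GRE routing rarely divide traffic perfectly evenly. The function names, the per-node capacity, the 50% headroom default and the totals in the example are all hypothetical.

```python
import math


def per_node_qps(total_qps: float, nodes: int) -> float:
    """Average queries per second each node sees if traffic is spread evenly (idealized assumption)."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return total_qps / nodes


def nodes_needed(total_qps: float, node_capacity_qps: float, headroom: float = 0.5) -> int:
    """Smallest node count that keeps each node at or below `headroom` times its capacity."""
    if node_capacity_qps <= 0:
        raise ValueError("node capacity must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return math.ceil(total_qps / (node_capacity_qps * headroom))


if __name__ == "__main__":
    # Hypothetical figures: 200,000 QPS in total, nodes able to answer 25,000 QPS each.
    n = nodes_needed(200_000, 25_000)
    print(n, per_node_qps(200_000, n))  # 16 12500.0
```

Keeping headroom matters because an attack vector can change faster than filters can be updated, as happened during this incident.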
We set out to build a highly resilient Anycast Managed DNS service backed by Neustar's mitigation services. This attack was the first in the past year to cause intermittent outages. We have learned some lessons and have improvements to make, but we remain confident that this is the right strategy for us.