Server downtime & Action Plan
UPDATED POST & VIDEO HERE
This post is a result of the two outages we’ve had in the last 31 days.
2019-04-07 09:00:47, Connection Timeout, 1 hr 6 mins
2019-05-05 18:54:20, Connection Timeout, 7 hrs 45 mins
The first issue was due to a network configuration change that was assumed to be safe to push live. The second, longer outage came down to our geographic location relative to our user load: a script that failed to execute ran the server up to 100% usage and crashed some services at about 4 AM local time. We had no fallbacks, and this was clearly a mistake. Below I’ve outlined the steps we’re taking to prevent this in future and to keep our customers updated.
We’ve submitted an order for a bigger and better main server. This will allow us to allocate as many resources as possible to Postable. We’re also moving all our sister sites and tools to a separate server in order to eliminate any bottlenecks or potential issues.
So far we’ve only deployed Uptime Robot to monitor our website and send push alerts when it notices an issue. This isn’t good enough. We’re currently looking at some paid options and will settle on something with calling or webhook integration, so we can ensure someone is alerted if any issue lasts more than 20 minutes.
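To make the 20-minute rule above concrete, here is a minimal Python sketch of the escalation logic. It is an illustration only and is not tied to any particular monitoring product: the `OutageTracker` class and `record_check` method are hypothetical names, and the actual phone call or webhook is left to whatever tool ends up being chosen.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Threshold taken from the post: alert a person if an issue lasts more than 20 minutes.
ESCALATE_AFTER = timedelta(minutes=20)


@dataclass
class OutageTracker:
    """Hypothetical helper that decides when a failed check should page someone."""

    down_since: Optional[datetime] = None  # time of the first failed check in a row
    escalated: bool = False                # whether this outage has already paged someone

    def record_check(self, is_up: bool, now: datetime) -> bool:
        """Record one uptime check and return True when someone should be called."""
        if is_up:
            # Site is healthy again: forget the previous outage.
            self.down_since = None
            self.escalated = False
            return False
        if self.down_since is None:
            self.down_since = now
        if not self.escalated and now - self.down_since > ESCALATE_AFTER:
            # Escalate once per outage so nobody is called on every failed check.
            self.escalated = True
            return True
        return False
```

A monitoring loop would call `record_check` after every check and trigger the call or webhook only when it returns `True`.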
Fallbacks and software solutions
We’re looking into splitting our service across a cluster of servers with failover protection. This would take some time to implement, but we’re very aware it needs to be looked at.
We’ve also set up a new software monitoring panel to help us find irregularities and problems. We’ll be combining this with some network monitoring tools and a new firewall.
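As a rough picture of what failover protection means in practice, the Python sketch below picks the first healthy server from a priority-ordered list. This is a simplified illustration under our own assumptions, not a description of the final architecture: the function name, the server names and the health check are all hypothetical.

```python
from typing import Callable, Optional, Sequence


def pick_server(
    servers: Sequence[str],
    is_healthy: Callable[[str], bool],
) -> Optional[str]:
    """Return the first server, in priority order, that passes its health check.

    Returns None when every server is down, so the caller can raise an alert.
    """
    for server in servers:
        if is_healthy(server):
            return server
    return None


# Hypothetical example: the primary is down, so traffic goes to the first standby.
healthy_now = {"standby-1", "standby-2"}
chosen = pick_server(
    ["primary", "standby-1", "standby-2"],
    lambda name: name in healthy_now,
)
```

In a real cluster the health check would be a network probe and the switch would usually be handled by a load balancer or DNS, but the decision it makes is the same.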
What to expect
This will be a big change. Once our servers are provisioned we will be doing a database migration, which should take anywhere from 30 minutes to 3 hours depending on network load. We’re currently planning to do this at 2 PM on the 9th of May, Brisbane, Australia time.
This will give us an opportunity to migrate our servers, set everything up and monitor the result. We will also be upgrading our video encoding during this time, so if you’ve had video issues, these will be fixed.
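For anyone outside Brisbane, the planned migration window above can be converted to UTC with a few lines of Python. This is a small sketch under two assumptions: the year is 2019 (inferred from the outage dates), and Brisbane is treated as a fixed UTC+10 offset, which holds because Queensland does not observe daylight saving. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Brisbane is UTC+10 all year round (no daylight saving), so a fixed offset is enough.
BRISBANE = timezone(timedelta(hours=10), "AEST")


def migration_window_utc(start_local: datetime, shortest: timedelta, longest: timedelta):
    """Return (start, earliest finish, latest finish) of the window in UTC."""
    start_utc = start_local.astimezone(timezone.utc)
    return start_utc, start_utc + shortest, start_utc + longest


# 2 PM on the 9th of May, Brisbane time. The year 2019 is an assumption.
start = datetime(2019, 5, 9, 14, 0, tzinfo=BRISBANE)
start_utc, earliest_end, latest_end = migration_window_utc(
    start,
    shortest=timedelta(minutes=30),  # best case from the estimate above
    longest=timedelta(hours=3),      # worst case from the estimate above
)
```

Under those assumptions the migration starts at 04:00 UTC on the 9th of May and should finish somewhere between 04:30 and 07:00 UTC.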
Delays: bugs and slower support times
We will do our best to make this bug-free and pain-free, but problems will happen and our live chat will be delayed. If you have any immediate problems related to billing, please reach out and we will do our best to assist. If you’re having issues with bot speeds, we will already be aware of them and working on a fix.