We have a live public dashboard showing uptime for Plan✕.

| Area | Details |
| --- | --- |
| Guaranteed uptime | The service is currently hosted on AWS and therefore benefits from its underlying availability (99.9% uptime) and resilience.<br><br>In the event that the Plan✕ service itself goes down, our team will be alerted automatically and will seek to restore the service as quickly as possible.<br><br>No refunds are currently agreed as part of the SLA in the event of service downtime. We strongly encourage shorter contract periods and encourage customers not to renew if they are dissatisfied with service availability. |
| Approach to resilience | The weakest link in Plan✕ is where data is pulled in from a customer or third-party host (Ordnance Survey and Digital Land). These integrations are separated, so Plan✕ is designed to continue to function without that third-party data, and previous enquiries will remain stable (a minimal sketch of this fallback pattern follows the table). |
| Outage reporting | Customers will be notified of any planned outages by email in advance, and such outages will be timed to minimise disruption. In the event of an unplanned outage, the Customer will be informed as quickly as possible by email. |
| Minimising processor / memory / storage utilisation | Plan✕ is deployed on cloud-based infrastructure (AWS). We use caching to reduce the impact of usage spikes, and we monitor containers to ensure that they are restarted on error or if memory leaks occur (see the health-check sketch after the table). |
| Response time for transactions | Network traffic response times will be monitored, and reports can be shared with the Customer on request. |
| Scaling | We have carefully chosen to work with technologies that can handle thousands of requests per second on extremely modest hardware.<br><br>We enable dynamic scaling / elastic load balancing to ensure that our infrastructure is responsive to load and always optimal in terms of hardware specification and the number of instances that deliver the service (see the autoscaling sketch after the table).<br><br>We use Cloudflare for load monitoring. |
| Load testing | We plan to run simulated load tests using external services before any significant infrastructure change or service expansion event, including public launches for new local authorities (see the load-test sketch after the table). |
| Monitoring load | We will also carefully monitor metrics such as active users and response times, to ensure that growth in active users does not degrade response times. |
| Disaster recovery | In the event of a main system failure, our steps would be:<br>1. Identify the point of failure.<br>2. Agree a response strategy to get the service back online.<br>3. Set a status page and phase banner notification (if appropriate) on the service to keep users up to date.<br>4. Test and redeploy.<br><br>In all but the most exceptional circumstances, the Recovery Time Objective (RTO) would be within 24 hours.<br><br>We have tried to mitigate the effects of a disaster by deploying our services in separate containers on cloud infrastructure. Complete data backups are taken several times per day, which defines our Recovery Point Objective (RPO) (see Backups). |
| Fault reporting & response | In many cases, our product team will be notified of issues automatically (see Monitoring).<br><br>When an issue is reported to us, we will immediately decide:<br>1. Whether it can be fixed / resolved immediately, in which case we will do so.<br>2. Whether it is sufficiently serious that users or admins need to be notified, taking into account any knock-on effects and anyone we are obliged to inform.<br>3. If it cannot be resolved immediately, its priority level. For high-priority issues we will keep in contact with the reporter until the issue is fixed.<br><br>If an admin is not satisfied that the team is responding in a reasonable way, they should escalate the issue to the CEO. |
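The resilience and caching rows describe intended behaviour rather than an implementation. As a minimal sketch of the fallback pattern, assuming a hypothetical third-party lookup (the URL, function name, TTL and data shape below are illustrative, not taken from this document):

```python
import json
import time
import urllib.request

# Hypothetical integration point: the real Plan✕ data sources and URLs are not
# specified in this document; this only sketches the "keep working without live
# third-party data" idea described under "Approach to resilience".
THIRD_PARTY_URL = "https://third-party-host.example/constraints?uprn={uprn}"
CACHE_TTL_SECONDS = 3600

_cache: dict[str, tuple[float, dict]] = {}  # uprn -> (fetched_at, payload)


def lookup_constraints(uprn: str) -> dict | None:
    """Fetch third-party data, falling back to the last cached answer on failure."""
    cached = _cache.get(uprn)
    if cached and time.time() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]  # fresh enough: no need to call the third-party host at all
    try:
        with urllib.request.urlopen(THIRD_PARTY_URL.format(uprn=uprn), timeout=5) as resp:
            payload = json.loads(resp.read())
        _cache[uprn] = (time.time(), payload)
        return payload
    except OSError:
        # Third-party host is unreachable: serve stale data if we have it, otherwise
        # return None so the rest of the enquiry can continue without this dataset.
        return cached[1] if cached else None
```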
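The container-monitoring row does not say how restarts are triggered. One common pattern, shown here as an assumption-laden sketch, is a health endpoint that deliberately fails once memory use crosses a threshold so that the container orchestrator recycles the instance; the path, port and 1 GiB limit are invented for illustration.

```python
import resource
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical threshold: fail the health check if peak resident memory exceeds ~1 GiB
# so the container orchestrator restarts the instance (a crude guard against leaks).
# Note: ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
MAX_RSS_KB = 1 * 1024 * 1024


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/healthz":
            self.send_response(404)
            self.end_headers()
            return
        rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        status = 200 if rss_kb < MAX_RSS_KB else 503  # 503 tells the orchestrator to recycle us
        self.send_response(status)
        self.end_headers()
        self.wfile.write(f"rss_kb={rss_kb}\n".encode())


if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8081), HealthHandler).serve_forever()
```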
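The scaling row states that dynamic scaling / elastic load balancing is enabled on AWS but does not name the services involved. Purely as an illustrative sketch, assuming an ECS service scaled with Application Auto Scaling (the cluster/service names, capacities and 60% CPU target are hypothetical):

```python
import boto3

# Illustrative only: the document says Plan✕ runs containerised on AWS with dynamic
# scaling / elastic load balancing, but does not specify the resources or thresholds
# used here.
client = boto3.client("application-autoscaling")
resource_id = "service/planx-cluster/planx-api"  # hypothetical cluster/service names

# Allow the service to scale between 2 and 10 tasks.
client.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=2,
    MaxCapacity=10,
)

# Add instances when average CPU rises above the target, remove them when it falls.
client.put_scaling_policy(
    PolicyName="planx-cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId=resource_id,
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,  # keep average CPU utilisation around 60%
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 120,
    },
)
```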
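The load-testing row mentions external services without naming a tool. As one concrete illustration only (Locust is our assumption, not something this document specifies, and the routes and user counts are invented):

```python
from locust import HttpUser, task, between


class ApplicantUser(HttpUser):
    """Simulated applicant browsing a Plan✕ service ahead of a public launch."""

    wait_time = between(1, 5)  # seconds of think time between actions per simulated user

    @task(3)
    def view_landing_page(self):
        self.client.get("/")

    @task(1)
    def start_enquiry(self):
        # Hypothetical route: the real flow paths are not listed in this document.
        self.client.get("/find-out-if-you-need-planning-permission")
```

A run might look like `locust -f loadtest.py --host https://<environment-under-test> --users 500 --spawn-rate 50`, with the headline results (error rate, p95 response time) compared against the response-time monitoring described above.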