Following the recent trend of “10 Things You Need To…” articles I decided to write my own list of things that you need to do. Nah. I have just built the list based on my experience and mistakes that I do not want to repeat. This is my checklist for future production deployments.
Considering that each point of the list might be a good topic for a separate article I described each of them really briefly. For some of them I give an example of software that I recommend (there may be better alternatives).
- Staging environment
Set up a staging environment which should look the same as production. Automate your deployment and upgrade process. Consider the usage of packages (sbt-native-packager), configuration management tools (chef, ansible) and containers (docker).
- Continuous Integration
Make your CI (jenkins, bamboo) able to build, test and deploy newest version of the application to the staging environment.
- Collect and display metrics
Collect usage of CPU, memory, disk, network from all hosts (collectd). Collect number of requests per seconds and response times of services (metrics). Store data in a central point (graphite). Display metrics as graphs (grafana).
- Collect and manage logs
Collect, store and analyze logs. It will help you in finding and resolving problems faster. (splunk, logstash)
- Configure alerts
Configure monitoring system to send an email to support in case of a system outage (grafana-alerts).
- Real-life simulations
Prepare testing scenarios which simulate real traffic (gatling). Run tests all the time (even during upgrades) to make sure that system behaves properly. Monitoring helps a lot during that step.
- Smoke tests
Prepare smoke tests (sanity tests) to verify basic functionality of the system which can be run even on a production environment. It can be used to be notified as soon as possible in case something goes wrong on production.
- Memory leaks
You have an environment, monitoring, and tests. Run them for several hours and check if the memory usage is not increasing during constant load. There are also tools that might help like plumbr.
- Blocking threads
When running tests, check if there are no unexpected blocking of threads. Tools like visualvm or yourkit help a lot.
- Keep alive HTTP connections
When using an HTTP client, make sure it is using a pool of keep-alive connections.
- Confidential data in logs
Ensure there are not passwords and other confidential data in the logs.
- Graceful shutdown
On shutdown application should stop accepting new requests and finish the already processing ones.
If you app is doing some asynchronous work, it might get into a state when is overloaded and not able to handle more tasks. Think about a backpressure mechanism to avoid that.
- Input validation
Ensure proper validation and filtering of incoming data. Prevent SQL injection.
- Audit logs
Log every user action as audit logs.
- Database indexes
Make sure that all required indexes are set. The application probably should not do full table scan queries at all. Disable full table scan queries in database and run tests to verify that.
- Memory settings
Adjust memory settings of services according to host capacity.
- OS hardening
Adjust operating system settings to increase the performance. My two most common tweaks are to increase the maximum number of open file descriptors (ulimit -n) and allow reusing sockets (by enabling tcp_tw_reuse).
- Log rotation
Avoid full disk by enabling log rotation (logback). Backup archived logs if necessary.
- Startup script
Prepare start/stop/restart scripts for the application (upstart). It might be useful if the start script waits and checks if the application did actually start properly.
- Backups, updates, and reverts
Schedule backups, test update process and prepare disaster recovery plan.
- Tests on different browsers, smartphones, and tablets
For that use a smart tool like browserstack.
- Input field validation
Test if input fields have proper validation attached.
- HttpOnly cookie
- CSRF protection
Use CSRF tokens to protect users from executing unwanted actions (csrf).
Reduce the size of js, html and css files and improve the performance using minifiers.
- Configure access to hosts
Make sure only authorized people can access hosts.
- Expose only required ports
Make only ports for public endpoints open. Bind other services to localhost.
- HTTPs for public endpoints
Use HTTPs and signed SSL certificate on public endpoints.
- Brute force and DDoS protection
Think about brute force and DDoS protection before it is too late.
- Strong hashing algorithm for passwords
Make sure you are not using a weak hashing algorithm for passwords.
- Prepare/update documentation
Remember to finish the documentation before final deployment. For APIs use tools like swagger.
This list for sure is not exhaustive. For each type of application, there are different things that can be easily forgotten. But I think that nowadays most of these things are common for JVM/REST application and can be used for a quick basic check.