Thursday, January 16, 2014

Tuning tips

Improving existing systems has been one of my biggest passions over the years, and not only because I work mainly with infrastructure and operations. Understanding a system’s complexity and improving it can be as challenging as black magic. I’d like to share some of my experience in this area.

Step 1 – Overview. Before you open Google and search for “tuning tcp/ip on tomcat” or whatever you need to improve, create a holistic view, either top down or bottom up, from the user to the database, covering every component used in the workflow. You might later find that the least expected thing is causing the latency. For example, a busy web cluster does millions of DNS lookups for the same backend server because it does not use a DNS cache. But the delay is not caused by the DNS lookups themselves; it is caused by the firewall between the layers randomly dropping UDP packets.
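To make the DNS example concrete, here is a minimal sketch of a small TTL cache placed in front of a resolver, so repeated lookups for the same backend do not leave the host every time. The class name `CachingResolver`, the 30-second TTL, and the `misses` counter are assumptions made for illustration; in practice a local caching daemon would usually do this job, and a cache only reduces exposure to the dropped packets rather than fixing the firewall.

```python
import socket
import time


class CachingResolver:
    """Tiny TTL cache in front of a resolver function (illustrative only)."""

    def __init__(self, resolve=socket.gethostbyname, ttl=30.0, clock=time.monotonic):
        self._resolve = resolve   # function mapping hostname -> address
        self._ttl = ttl           # assumed lifetime of a cached answer, in seconds
        self._clock = clock       # injectable clock, which makes the cache testable
        self._cache = {}          # hostname -> (expires_at, address)
        self.misses = 0           # number of lookups that reached the real resolver

    def lookup(self, hostname):
        now = self._clock()
        entry = self._cache.get(hostname)
        if entry is not None and entry[0] > now:
            return entry[1]       # still fresh: answer from the cache
        self.misses += 1
        address = self._resolve(hostname)
        self._cache[hostname] = (now + self._ttl, address)
        return address
```

Comparing `misses` with the total number of calls gives a rough idea of how many lookups the cache keeps off the network.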

Step 2 – Measure. Tuning a complex system can drain enormous resources, and you might feel hopeless or clueless from time to time. The key to successful tuning is measuring, not only end to end but also between the services and applications. If you cannot measure it, you cannot improve it; more importantly, you won’t know how much you’ve improved. Review the overview and measure the interesting paths. The more data you collect, the easier it is to find the bottleneck.
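As a rough sketch of measuring between stages as well as end to end, the snippet below wraps each step in a timing context manager and summarizes the samples per stage. The stage names, the in-memory `timings` store, and the `time.sleep` calls standing in for real work are all assumptions for illustration; a real setup would more likely ship these numbers to something like graphite.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import median

# Hypothetical in-memory store: stage name -> list of durations in seconds.
timings = defaultdict(list)


@contextmanager
def measure(stage):
    """Record how long the wrapped block takes under the given stage name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)


def summarize():
    """Return {stage: (sample_count, median_seconds, max_seconds)}."""
    return {
        stage: (len(samples), median(samples), max(samples))
        for stage, samples in timings.items()
    }


if __name__ == "__main__":
    for _ in range(5):
        with measure("end_to_end"):
            with measure("dns_lookup"):
                time.sleep(0.001)   # stand-in for a real lookup
            with measure("backend_call"):
                time.sleep(0.002)   # stand-in for a real request
    for stage, (count, med, worst) in summarize().items():
        print(f"{stage}: n={count} median={med * 1000:.2f}ms max={worst * 1000:.2f}ms")
```

Looking at the median and the maximum side by side is a deliberate choice here: an occasional stall, such as a dropped packet followed by a retry, tends to show up in the maximum long before it moves the median.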

Step 3 – Bias Free. When performance becomes an issue, people tend to blame unknowns to protect themselves. We also love to attack the symptom rather than look a couple of steps further for the root cause. That might be a perfectly reasonable action, but the real gain usually comes from resolving the root cause. In reality, we need both short-term mitigation and long-term resolution. Bias will blind your judgment and instinct. In many practical cases, the symptom is only a reflection of the real problem.

Last but not least, you need a great toolbox to complete your mission. Depending on the situation, you might need different tools. Here are some of my favorites.
  • logstash – a great central logging system
  • tcpdump, dsniff, wireshark – packet sniffing
  • graphite – graph tool, perfect for time-series data
  • ab, siege – HTTP load test tools
  • new relic, tracelytics – full stack performance insight tools
  • sysstat, vmstat, iostat, htop – OS level monitoring tools
  • strace, valgrind, systemtap – deep OS level troubleshooting tools
  • iperf – TCP/UDP bandwidth measurement tool
  • charles – client side web debugging proxy

