Performance Optimization Theory You Should Know as a Programmer
Over many years of work, I have found that many programmers have only a shallow understanding of performance optimization, because they spend most of their time on business logic.
Before you reach for Google, ask yourself: which performance optimization theories do you already know?
Performance: using limited resources to get work done in a limited amount of time. The most important measurement factor is time, so many measurement indicators can use time as the horizontal axis.
A slow-loading website is penalized by search ranking algorithms and ends up ranked lower. Loading speed is therefore an intuitive indicator of whether an optimization has worked, but performance covers more than the speed of a single request.
1. Metrics
1.1 Throughput and Responsiveness
A distributed, high-concurrency application cannot be judged by a single request; its performance is usually a statistical result. The most commonly used metrics are throughput and responsiveness, and both are central concepts when reasoning about performance.
In day-to-day development we often talk about QPS (queries per second), TPS (transactions per second), HPS (HTTP requests per second), and so on. These are all common quantitative indicators of throughput.
When optimizing performance, we first need to decide whether the goal is throughput or response speed. Sometimes each individual response is relatively slow but overall throughput is very high, as with batched database operations or merged buffers. Individual items are delayed longer, but if our goal is throughput, this still counts as a large performance improvement.
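The batching trade-off can be sketched with a toy cost model. The constants below (a 5 ms fixed cost per round trip, 0.1 ms per row, one row arriving every millisecond) are assumptions chosen only for illustration, not measurements, and the model ignores queueing.

```python
# Toy cost model: every write pays a fixed round-trip cost plus a per-row cost.
# All constants are illustrative assumptions, not measurements.
FIXED_COST_MS = 5.0      # assumed overhead per round trip (network, commit)
PER_ROW_COST_MS = 0.1    # assumed cost of writing one row


def batch_stats(batch_size, arrival_interval_ms=1.0):
    """Return (throughput in rows/s, worst-case latency in ms) for a batch size."""
    batch_time_ms = FIXED_COST_MS + PER_ROW_COST_MS * batch_size
    throughput = batch_size / batch_time_ms * 1000.0
    # The first row of a batch has to wait for the rest of the batch to arrive
    # before the write even starts.
    wait_ms = (batch_size - 1) * arrival_interval_ms
    worst_latency_ms = wait_ms + batch_time_ms
    return throughput, worst_latency_ms


for size in (1, 10, 100):
    rows_per_s, latency_ms = batch_stats(size)
    print(f"batch={size:>3}  throughput={rows_per_s:8.0f} rows/s  "
          f"worst latency={latency_ms:6.1f} ms")
```

Under these assumptions, larger batches raise throughput while making each individual row wait longer, which is the trade-off described above.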
In general:
- Response speed is about serial execution: it improves by optimizing the steps a single request goes through
- Throughput is about parallel execution: it improves by making good use of the available computing resources
Day-to-day optimization mostly targets response speed, because once each request is served faster, overall throughput usually rises with it.
High-concurrency Internet applications, however, need both response speed and throughput. They are expected to handle heavy traffic, and their users have little tolerance for delay, so we have to find a balance within limited hardware resources.
1.2 Average Response Time
Our most commonly used metric, Average Response Time (AVG), reflects the average processing capacity of a service interface. It is simply the sum of all request times divided by the number of requests. As a simple example:
suppose there are 100 requests, of which 10 take 1ms, 20 take 5ms, and 70 take 10ms. The average time is:
(10*1 + 20*5 + 70*10)/100 = 8.1 ms
The average response time stays flat unless the service has serious problems for a sustained period. Because high-concurrency applications handle a very large volume of requests, long-tail requests are quickly averaged away: many users can experience slow requests without the average showing it.
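As a quick check of the arithmetic, and of how a long tail disappears into the average, here is a minimal sketch. The second data set (9,990 requests at 10 ms and 10 requests stuck for 2 seconds) is a hypothetical window invented for illustration.

```python
# The worked example: 10 requests at 1 ms, 20 at 5 ms, 70 at 10 ms.
latencies_ms = [1] * 10 + [5] * 20 + [10] * 70
avg_ms = sum(latencies_ms) / len(latencies_ms)
print(avg_ms)  # 8.1

# Hypothetical busy window: 9,990 requests at 10 ms, 10 requests stuck for 2 s.
busy_window_ms = [10] * 9_990 + [2_000] * 10
busy_avg_ms = sum(busy_window_ms) / len(busy_window_ms)
print(busy_avg_ms)  # 11.99
```

Ten users waited two full seconds, yet the average moved by less than 2 ms.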
1.3 Percentile
To solve this problem, another commonly used indicator is the percentile. We pick a time window, add the time spent on each request to a list, and sort the list in ascending order. The value found at a given percentile position is the TP value. Like the median and the average, the TP value (Top Percentile) is a statistical term.
It means that N% of requests return within time X. For example, TP90 = 50ms means that 90% of requests return within 50ms.
This indicator is very important because it reflects the overall response behavior of an interface. For example, if a long GC pause occurs during some period, the high percentile values for that period will jitter severely, while the low percentile values will barely change.
We generally track TP50, TP90, TP95, TP99, TP99.9, and similar levels. The stricter the requirement on the high percentile values, the more stable the system's response times must be.
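The sort-and-pick procedure can be sketched as follows. This uses the nearest-rank definition of a percentile, which is one common convention (monitoring systems may interpolate instead), and the sample data is hypothetical.

```python
import math


def tp(latencies_ms, percentile):
    """Nearest-rank percentile: the smallest sample such that at least
    `percentile` percent of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # round() guards against float noise such as 9990.000000000002
    rank = max(1, math.ceil(round(percentile / 100 * len(ordered), 9)))  # 1-based
    return ordered[rank - 1]


# Hypothetical window: 9,980 requests at 10 ms and 20 slow ones at 2,000 ms.
samples_ms = [10] * 9_980 + [2_000] * 20
for p in (50, 90, 99, 99.9):
    print(f"TP{p}: {tp(samples_ms, p)} ms")
```

In this sample TP50 through TP99 stay at 10 ms, and only TP99.9 exposes the 2-second requests, which is why the high percentiles are the ones to watch for long-tail problems.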
In systems with such high stability requirements, the goal is to eliminate the long-tail requests that seriously hurt the system. To collect performance data for these requests, metrics alone are not enough, so we add more detailed logging. For example, whenever an interface call takes more than 1s, we write its input parameters and execution steps to the log system in detail.
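One way to implement this kind of logging is a decorator that records the arguments of any call exceeding a threshold. This is a minimal sketch, not a production design: `log_if_slow` and `query_orders` are hypothetical names, and the demo threshold is lowered to 50 ms so it triggers quickly.

```python
import functools
import logging
import time

logger = logging.getLogger("slow_requests")


def log_if_slow(threshold_s=1.0):
    """Log the arguments and duration of any call slower than threshold_s."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if elapsed > threshold_s:
                    logger.warning(
                        "slow call %s args=%r kwargs=%r took %.3fs",
                        func.__name__, args, kwargs, elapsed,
                    )
        return wrapper
    return decorator


@log_if_slow(threshold_s=0.05)      # low threshold so the demo triggers quickly
def query_orders(user_id, page=1):  # hypothetical interface
    time.sleep(0.1)                 # stand-in for slow work
    return [f"order-{user_id}-{page}"]


print(query_orders(42, page=2))
```

Fast calls cost only two clock reads, so the detailed log line is paid for only by the requests we actually want to investigate.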
1.4 Concurrency
Concurrency is the number of requests the system can process at the same time. This indicator reflects the load capacity of the system.
For a high-concurrency application, high throughput alone is not enough; it must also serve many users at the same time. High concurrency leads to serious contention for shared resources, so we need to reduce resource conflicts and avoid holding resources for a long time.
1.5 Sub-Second Open Rate
In the mobile Internet era, and especially for pages inside apps, opening within a second makes for an excellent user experience. If a page loads in less than 1 second, it feels smooth and the user is not left waiting anxiously.
Different page-opening targets can be set according to the business situation. The sub-second open rate, for example, is the proportion of page loads that complete in less than 1 second.
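As a small illustration, the rate can be computed directly from recorded page load times. The sample values below are hypothetical, and the 1-second threshold is a parameter so that other business targets can be plugged in.

```python
def open_rate(load_times_s, threshold_s=1.0):
    """Fraction of page loads that finished in under threshold_s seconds."""
    if not load_times_s:
        raise ValueError("need at least one sample")
    fast = sum(1 for t in load_times_s if t < threshold_s)
    return fast / len(load_times_s)


# Hypothetical page load times, in seconds.
load_times_s = [0.4, 0.7, 0.9, 1.2, 0.6, 2.5, 0.8, 0.95, 0.3, 1.1]
print(f"{open_rate(load_times_s):.0%}")  # 70%
```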
1.6 Correctness
Here is an instructive story. One of our technical teams found during testing that an interface responded very smoothly. Even after concurrency was raised to 20, the interface still responded very quickly.
However, when the application was launched, a major incident occurred because the interface returned unusable data.
The cause was easy to locate: the project used a circuit breaker. During the stress test, the load exceeded the capacity of the service and the circuit breaker tripped, but the stress test never checked whether the responses were correct. The result was a very basic mistake.
So don’t forget the key element of correctness when doing performance evaluations.
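The failure mode in this story can be reproduced with a toy stress test that validates every response as well as timing it. Everything here is a hypothetical stand-in: `get_price` simulates an interface, and the semaphore is a crude substitute for a tripped circuit breaker that returns a fallback as soon as the assumed capacity of 5 concurrent calls is exceeded.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

CAPACITY = 5  # assumed: the toy service can handle 5 calls at once
_slots = threading.BoundedSemaphore(CAPACITY)


def get_price(item_id):
    """Hypothetical interface. When overloaded it returns a fallback at once,
    a crude stand-in for a tripped circuit breaker."""
    if not _slots.acquire(blocking=False):
        return {"item_id": item_id, "price": None}   # fast, but unusable
    try:
        time.sleep(0.01)                             # simulated real work
        return {"item_id": item_id, "price": item_id * 10}
    finally:
        _slots.release()


def stress(concurrency, total_requests):
    """Fire requests concurrently and report latency AND correctness."""
    def one_call(item_id):
        start = time.perf_counter()
        response = get_price(item_id)
        elapsed_ms = (time.perf_counter() - start) * 1000
        return elapsed_ms, response["price"] == item_id * 10

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_call, range(1, total_requests + 1)))

    avg_ms = sum(ms for ms, _ in results) / len(results)
    wrong = sum(1 for _, ok in results if not ok)
    return {"avg_ms": avg_ms, "wrong": wrong, "total": len(results)}


print(stress(concurrency=5, total_requests=50))
print(stress(concurrency=20, total_requests=200))
```

Once concurrency exceeds capacity, the average latency in this sketch typically goes down, because fallbacks return instantly. Only the `wrong` counter reveals that many of those fast responses are unusable.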
2. Theoretical methods
There are many theoretical methods for performance optimization, such as the cask (barrel) theory, benchmarking, Amdahl’s law, and so on. Below we briefly explain three of the most commonly used ideas.
2.1 Cask Theory
For a barrel to hold the most water, every plank must be the same length and unbroken. If a single plank falls short, the barrel cannot be filled to the top.
How much water the barrel holds depends on the shortest plank, not the longest.
The barrel effect is also a good way to explain system performance. The components that make up a system run at different speeds, and the overall performance of the system depends on its slowest component.
For example, the most serious performance constraint in database applications is the disk I/O needed to persist data. In this scenario the hard disk is the short plank, and our first task is to make up for it.
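The shortest-plank idea can be sketched numerically. The stage names and capacities below are hypothetical, and the model assumes every request passes through every stage, so the slowest stage caps the whole pipeline.

```python
# Hypothetical per-stage capacity of a request pipeline, in requests per second.
stage_capacity_rps = {
    "load balancer": 50_000,
    "application server": 8_000,
    "cache": 30_000,
    "database disk I/O": 1_200,
}


def bottleneck(stages):
    """Return (name, capacity) of the slowest stage; it caps the whole pipeline."""
    name = min(stages, key=stages.get)
    return name, stages[name]


print(bottleneck(stage_capacity_rps))

# Doubling a stage that is not the bottleneck changes nothing overall.
faster_app = dict(stage_capacity_rps, **{"application server": 16_000})
print(bottleneck(faster_app))

# Doubling the bottleneck stage raises the ceiling for the whole system.
faster_disk = dict(stage_capacity_rps, **{"database disk I/O": 2_400})
print(bottleneck(faster_disk))
```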
2.2 Benchmarks
A benchmark is not just a simple performance test; its purpose is to measure the best performance a program can achieve.
2.3 Warm Up
Application interfaces often respond slowly, and may even time out briefly, right after the application starts. Before testing, we need to warm up the application to remove the effects of factors such as JIT compilation. In Java, JMH (the Java Microbenchmark Harness) supports warm-up iterations that help eliminate these differences.
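The warm-up idea itself is language independent. The sketch below is a plain Python stand-in, not JMH: it runs a number of untimed warm-up calls before measuring. The lazily built lookup table in `lookup` is a hypothetical substitute for cold-start costs such as JIT compilation or empty caches.

```python
import statistics
import time


def benchmark(func, warmup=5, runs=20):
    """Return per-call timings in seconds, measured only after `warmup` calls."""
    for _ in range(warmup):
        func()  # warm-up calls: executed but never timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return timings


_table = None  # lazily built lookup table, a stand-in for cold-start work


def lookup():
    global _table
    if _table is None:
        _table = {i: i * i for i in range(200_000)}  # paid once, on first call
    return _table[1234]


cold = benchmark(lookup, warmup=0, runs=20)  # the first timed call builds the table
_table = None                                # reset to a cold state
warm = benchmark(lookup, warmup=5, runs=20)  # the table is built during warm-up

print(f"no warm-up: max={max(cold) * 1e6:.0f} us, "
      f"median={statistics.median(cold) * 1e6:.1f} us")
print(f"warm-up:    max={max(warm) * 1e6:.0f} us, "
      f"median={statistics.median(warm) * 1e6:.1f} us")
```

Without warm-up, the one-time setup cost lands inside the measurements and distorts the maximum and the mean; with warm-up, the timings describe steady-state behavior only.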
3. Notes
3.1 By numbers, not guesswork
Some developers have a good feel for programming and can list a system's bottlenecks by guessing. This does happen, but it is not something to rely on. Complex systems are usually affected by many factors at once. We should put performance analysis first and performance optimization second. Intuition is only an assistant, not a tool for drawing conclusions.
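In practice, "by numbers" means running a profiler before touching any code. Here is a minimal sketch using Python's built-in cProfile and pstats modules; `parse_payload`, `compute_report`, and `handle_request` are hypothetical functions standing in for real request handling.

```python
import cProfile
import io
import pstats


def parse_payload(n):
    """Cheap step (hypothetical)."""
    return [str(i) for i in range(n)]


def compute_report(n):
    """Expensive step (hypothetical)."""
    return sum(i * i for i in range(n))


def handle_request():
    parse_payload(1_000)
    return compute_report(200_000)


profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

# Sort by cumulative time and show the top entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

The report shows where the time actually went, which may or may not match what intuition predicted.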
3.2 Don’t prematurely optimize and over-optimize
Although performance optimization has many benefits, that does not mean we have to push everything to the extreme; optimization has its limits too. Getting a program to run correctly is harder than getting it to run faster.
Computer scientist Donald Knuth famously wrote that “premature optimization is the root of all evil”, and the warning still holds.
Code optimized for execution speed is often obscure and hard to read, and its structure usually involves many compromises. Premature optimization brings this maintenance burden into your project too early, and when the code is later refactored, it takes more effort to untangle.
The right approach is to treat project development and performance optimization as two separate steps, and to hold off on optimization until the architecture and functionality of the project have largely stabilized.
3.3 Maintain Good Coding Habits
We said above not to optimize prematurely or excessively, but that does not mean we can ignore performance altogether while coding.
In the pursuit of high-performance, high-quality code, good habits accumulate over time and grow into solid professional discipline, which benefits us greatly.
Finally
Thanks for reading. I hope you will follow along and read more articles like this one.
