Optimize performance through static code analysis


A static code analyzer is a tool that parses program source code and checks it against a set of predefined rules. The most popular tools of this type for Java include Checkstyle, PMD and FindBugs. They are usually associated with finding minor issues, such as an unused variable, or with detecting deviations from naming conventions for classes, methods or variables. At our company, we have found that static code analysis is also very effective at detecting many common performance problems, such as excessive database queries, suboptimal code, or even memory leaks.

Static code analysis is part of our continuous integration process: it runs every night on all repositories in which code has changed. Thanks to this, we can detect many performance problems while the code is still being written, before the application goes to testing.

Our code analyzer ruleset is built on Sonar technology (www.sonarsource.com) and is constantly being developed. Below I describe some interesting rules that may be useful to you.

Frequent database queries

One typical performance problem in many applications is frequent database access. With today's infrastructure, executing a simple database query is quite fast and may take about 1 ms. Problems begin, however, when we execute large numbers of such queries, for example in a loop, and their number depends on the size of the input data.

Take, as an example, a code fragment like this:
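A minimal hypothetical sketch of such a fragment, written against plain JDBC. The `PaymentImporter` class, the `Payment` type and the `payment` table with its `id` and `status` columns are all invented for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class PaymentImporter {

    // Hypothetical input type: one line of the uploaded payment file
    public static final class Payment {
        final long id;
        final String status;

        public Payment(long id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    // One UPDATE round trip per payment: the number of queries grows with the file size
    public static void updateStatuses(Connection connection, List<Payment> payments) throws SQLException {
        for (Payment payment : payments) {
            try (PreparedStatement ps = connection.prepareStatement(
                    "UPDATE payment SET status = ? WHERE id = ?")) {
                ps.setString(1, payment.status);
                ps.setLong(2, payment.id);
                ps.executeUpdate(); // executed once for every element of the list
            }
        }
    }
}
```

Each iteration prepares and executes its own statement, so the number of database round trips equals the number of payments in the file.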

This is perfectly valid code that will pass all functional tests. It may even work fine for some time after deployment. But one day a user may upload a file containing 100,000 payment transactions, which results in 100,000 database queries. At roughly 1 ms per query, that alone adds about 100 seconds to the processing time, and the transaction may even be rolled back after a timeout.

To prevent this type of problem, we implemented an analyzer rule that flags database operations performed inside loops as potentially dangerous code. In some cases, it turns out that the entire loop can be replaced with a single UPDATE statement whose WHERE clause selects the same records. Alternatively, a batching mechanism can be used to reduce the number of queries.
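As a sketch of the batching approach, assuming the same hypothetical `payment` table, the loop can prepare one statement and send the updates with the standard JDBC `addBatch()` and `executeBatch()` calls. `BatchedPaymentImporter` and `BATCH_SIZE` are illustrative names, and 1000 is an arbitrary value:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchedPaymentImporter {

    // Hypothetical input type, same shape as in the loop example
    public static final class Payment {
        final long id;
        final String status;

        public Payment(long id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    // Hypothetical batch size; tune it for the driver and database in use
    static final int BATCH_SIZE = 1000;

    // Prepares the statement once and sends the updates in groups of BATCH_SIZE
    public static void updateStatuses(Connection connection, List<Payment> payments) throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(
                "UPDATE payment SET status = ? WHERE id = ?")) {
            int pending = 0;
            for (Payment payment : payments) {
                ps.setString(1, payment.status);
                ps.setLong(2, payment.id);
                ps.addBatch(); // queued on the client side, not yet sent
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the remaining updates
            }
        }
    }
}
```

For 100,000 payments this issues 100 `executeBatch()` calls instead of 100,000 `executeUpdate()` calls. Whether each batch really travels in a single round trip depends on the JDBC driver and its settings, so the gain is worth measuring on the target database.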

Detect frequent database queries

Resource leak

By resources we mean primarily connections to a database or to a message queue server. A connection leak is simply missing code that closes the connection or, more generally, releases the resource. This is a very dangerous error and, at the same time, quite difficult to detect with typical tests. Over time, open connections can use up all the memory available to the virtual machine or exhaust the connection pool; in both cases the system becomes unstable.

Sample code with a missing close() call in the finally block
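A hypothetical illustration of such a leak and of its fix, assuming a pooled `javax.sql.DataSource` and an invented `account` table. The try-with-resources form (Java 7 and later) has the same effect as calling `close()` in a `finally` block:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public class AccountDao {

    private final DataSource dataSource;

    public AccountDao(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Leaky: if prepareStatement() or executeQuery() throws, close() is never reached
    // and the connection is never returned to the pool
    public int countAccountsLeaky() throws SQLException {
        Connection connection = dataSource.getConnection();
        PreparedStatement ps = connection.prepareStatement("SELECT COUNT(*) FROM account");
        ResultSet rs = ps.executeQuery();
        rs.next();
        int count = rs.getInt(1);
        connection.close();
        return count;
    }

    // Safe: try-with-resources closes rs, ps and connection (in reverse order)
    // on every path, including when an exception is thrown
    public int countAccounts() throws SQLException {
        try (Connection connection = dataSource.getConnection();
             PreparedStatement ps = connection.prepareStatement("SELECT COUNT(*) FROM account");
             ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getInt(1);
        }
    }
}
```

A rule that spots the first form only has to notice a `Connection` obtained in a method and not closed on every path out of it, which is the simple, local case that static analysis handles well.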

Detecting leaks through static code analysis is not a trivial task. In more complex cases, when connections are passed between methods or stored in a separate data structure (such as a map or a list), the analyzer may not be able to find the error. Keep in mind that static analysis is “weaker” than actually executing the code: above all, it does not know the runtime context, such as the call stack and the state of variables. In practice, however, such complex cases are rare.

Using precompiled regular expressions

Programmers often forget that regular expressions are actually small programs that are compiled while the application runs. Compiling an expression has a cost, which can be neglected in isolated cases (after all, it involves no I/O, only CPU time), but keep the effect of scale in mind: a given piece of code may be executed many times, for example in a loop or on many parallel threads.

Fortunately, such code fragments are very easy both to detect and to fix, as the example below shows:

Wrong:
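A hypothetical illustration of the pattern such a rule flags. `String.matches()` delegates to `Pattern.matches()`, which compiles the regular expression again on every call. The class name and the simplified e-mail pattern are invented for this sketch and are not a complete address validator:

```java
import java.util.ArrayList;
import java.util.List;

public class EmailFilterSlow {

    // String.matches() recompiles the pattern for every element of the list
    public static List<String> validEmails(List<String> input) {
        List<String> result = new ArrayList<>();
        for (String s : input) {
            if (s.matches("[\\w.+-]+@[\\w-]+\\.[\\w.]+")) {
                result.add(s);
            }
        }
        return result;
    }
}
```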

Correct:
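A sketch of the fix for the same hypothetical filter: the pattern is compiled once into a constant and only a cheap `Matcher` is created per input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class EmailFilter {

    // Compiled once, when the class is initialized
    private static final Pattern EMAIL = Pattern.compile("[\\w.+-]+@[\\w-]+\\.[\\w.]+");

    public static List<String> validEmails(List<String> input) {
        List<String> result = new ArrayList<>();
        for (String s : input) {
            // A new Matcher per input; the shared Pattern is not recompiled
            if (EMAIL.matcher(s).matches()) {
                result.add(s);
            }
        }
        return result;
    }
}
```

Keeping the `Pattern` in a static constant is safe because `Pattern` instances are immutable and may be shared between threads, whereas `Matcher` instances are not thread-safe and are therefore created per call.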

Detecting suboptimal use of regular expressions

Implementation

If you are interested in the details of the solution, please refer to the Sonar documentation (www.sonarsource.com). Implementing rules is relatively easy and involves extending the BaseTreeVisitor class, a base class from the org.sonar.plugins.java.api.tree package that traverses the syntax tree according to the visitor pattern.

Below is an example of implementing one of the code analyzer rules:

Implementation of the AvoidMethodsInLoopCheck rule
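In Sonar, a check such as AvoidMethodsInLoopCheck is built on BaseTreeVisitor. To try the same visitor idea without a SonarQube installation, the sketch below swaps in the JDK's own compiler tree API (`com.sun.source.util.TreeScanner` from the `jdk.compiler` module). It tracks how deeply the visitor is nested in loops and records calls to a hypothetical list of database methods made inside a loop. This is an assumed structure for such a rule, not the author's implementation:

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.DoWhileLoopTree;
import com.sun.source.tree.EnhancedForLoopTree;
import com.sun.source.tree.ExpressionTree;
import com.sun.source.tree.ForLoopTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.tree.WhileLoopTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;

public class MethodsInLoopScanner extends TreeScanner<Void, Void> {

    // Hypothetical list of method names treated as database calls
    private static final Set<String> SUSPICIOUS = Set.of("executeQuery", "executeUpdate", "execute");

    private int loopDepth = 0; // greater than 0 while inside any loop
    private final List<String> findings = new ArrayList<>();

    @Override
    public Void visitForLoop(ForLoopTree tree, Void p) {
        loopDepth++;
        try { return super.visitForLoop(tree, p); } finally { loopDepth--; }
    }

    @Override
    public Void visitEnhancedForLoop(EnhancedForLoopTree tree, Void p) {
        loopDepth++;
        try { return super.visitEnhancedForLoop(tree, p); } finally { loopDepth--; }
    }

    @Override
    public Void visitWhileLoop(WhileLoopTree tree, Void p) {
        loopDepth++;
        try { return super.visitWhileLoop(tree, p); } finally { loopDepth--; }
    }

    @Override
    public Void visitDoWhileLoop(DoWhileLoopTree tree, Void p) {
        loopDepth++;
        try { return super.visitDoWhileLoop(tree, p); } finally { loopDepth--; }
    }

    @Override
    public Void visitMethodInvocation(MethodInvocationTree tree, Void p) {
        ExpressionTree select = tree.getMethodSelect();
        String name = select instanceof MemberSelectTree
                ? ((MemberSelectTree) select).getIdentifier().toString()   // st.executeQuery(...)
                : select instanceof IdentifierTree
                        ? ((IdentifierTree) select).getName().toString()  // executeQuery(...)
                        : "";
        if (loopDepth > 0 && SUSPICIOUS.contains(name)) {
            findings.add(name);
        }
        return super.visitMethodInvocation(tree, p); // keep scanning nested calls
    }

    // Parses (without compiling or resolving types) a Java source string and returns flagged calls
    public static List<String> analyze(String source) {
        JavaFileObject file = new SimpleJavaFileObject(
                URI.create("string:///Input.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return source;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(file));
        MethodsInLoopScanner scanner = new MethodsInLoopScanner();
        try {
            for (CompilationUnitTree unit : task.parse()) {
                scanner.scan(unit, null);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scanner.findings;
    }
}
```

Matching on method names alone keeps the sketch short but can produce false positives; a real Sonar rule would more likely check the owner type through symbol information and report issues through Sonar's API instead of collecting them in a list. Note also that a `for` loop's initializer is counted as part of the loop here.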

Summary

Static code analysis cannot detect all performance problems. It will not replace real performance tests, run on large data sets in near-production environments, but it is a valuable complement that lets you “weed out” many problems at a very early stage.

Another improvement we are working on is “live” code analysis in the development environment (IDE), while the code is being written. This way, the IDE itself will draw the programmer's attention to a performance problem, which can then be fixed before the changes are committed to the repository.

Author: Robert Popławski (IT Architect Specialist)
