What can the repository tell you about the work of your team?
We live in a time when the term “big data” is familiar to almost everyone. Every step we take and every page we look at is analyzed by countless IT systems, and statistics from our browsers are used to build personalized ads. So why not use the data our own repositories provide to improve the way our teams work?
Have you ever wondered who the leader of your project is, or how much individual developers contribute to the code? Or at what time to schedule meetings so that they suit most of the team? In this article, I will try to show that these and many other questions can be answered by analyzing the history of the repository you are working on.
In this article, we will focus on the most popular version control system, Git. If you don’t know what Git is, the easiest way to find out is to ask one of your developers, or good old Uncle Google :). Like any version control system, Git lets you view the history of changes to a repository. This history contains not only the changes themselves but also who made them, the exact date they were added to the repository, and plenty of other valuable information. Most Git hosting services (GitLab, GitHub, Bitbucket) provide basic tools for analyzing this history.
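To get a feel for what that history contains, here is a minimal Python sketch that reads it with `git log`. It assumes `git` is on the `PATH` and that you point it at a local clone; the `Commit` type and the function names are hypothetical, chosen only for this example. The placeholders `%H` (full commit hash), `%an` (author name), `%aI` (author date in strict ISO 8601) and `%x09` (a tab character) are standard `git log --pretty` format placeholders.

```python
import subprocess
from datetime import datetime
from typing import List, NamedTuple


class Commit(NamedTuple):
    """One entry of the repository history (hypothetical helper type)."""
    sha: str
    author: str
    date: datetime


# %H = full commit hash, %an = author name, %aI = author date in strict
# ISO 8601 (including the author's UTC offset), %x09 = tab used as separator.
LOG_FORMAT = "%H%x09%an%x09%aI"


def parse_log(text: str) -> List[Commit]:
    """Turn `git log --pretty=format:<LOG_FORMAT>` output into Commit records."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        sha, author, iso_date = line.split("\t")
        commits.append(Commit(sha, author, datetime.fromisoformat(iso_date)))
    return commits


def read_history(repo_path: str = ".") -> List[Commit]:
    """Run `git log` in repo_path and parse its output (requires git)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)
```

Calling `read_history()` inside a clone returns the commits newest first, each with its author and a timezone-aware date: the raw material for all the graphs discussed in this article.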
The most basic and easiest-to-understand feature is the so-called “Contributors graph”, which shows the most active members of the team working on a given repository. It also shows how the number of commits (or of lines added to or removed from the repository) has been distributed over time since the project was created. This information often reveals problems the developers are facing. In poorly organized teams, you can easily notice a rise in the number of commits just before or during the release of a new version of the software (or of a new feature). This also often happens in open-source projects, where developers work after hours and usually find it hard to finish all the functionality before the planned release. Such behavior shows up as a local maximum in the commit distribution graph around the release date of the new version. What to do with this data I leave to you; perhaps it is worth discussing with the team? I know from experience that programmers are usually aware of the problem but are afraid to admit it.
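A contributors graph boils down to two counts: commits per author and commits per unit of time. The sketch below computes both, plus a crude “release crunch” indicator: the share of all commits made in the days just before a release. The sample history, the release date and the 7-day window are made-up assumptions for illustration, and the function names are hypothetical; in practice you would feed in (author, date) pairs taken from `git log`.

```python
from collections import Counter
from datetime import date
from typing import List, Tuple

History = List[Tuple[str, date]]  # (author, commit date) pairs

# Hypothetical sample history; real data would come from `git log`.
HISTORY: History = [
    ("Anna", date(2017, 1, 2)),
    ("Piotr", date(2017, 1, 4)),
    ("Anna", date(2017, 1, 10)),
    ("Kasia", date(2017, 1, 25)),
    ("Anna", date(2017, 1, 26)),
    ("Piotr", date(2017, 1, 27)),
    ("Anna", date(2017, 1, 28)),
]


def commits_per_author(history: History) -> List[Tuple[str, int]]:
    """Commit count per author, most active first."""
    return Counter(author for author, _ in history).most_common()


def commits_per_week(history: History) -> List[Tuple[Tuple[int, int], int]]:
    """Commit count per ISO (year, week), in chronological order."""
    counts = Counter(tuple(day.isocalendar())[:2] for _, day in history)
    return sorted(counts.items())


def release_crunch(history: History, release_day: date, window_days: int = 7) -> float:
    """Share of all commits made on release_day or in the window_days - 1 days before it."""
    in_window = sum(1 for _, day in history
                    if 0 <= (release_day - day).days < window_days)
    return in_window / len(history)


print(commits_per_author(HISTORY))
print(commits_per_week(HISTORY))
print(release_crunch(HISTORY, date(2017, 1, 28)))
```

For this sample, ISO week 4 of 2017 holds 4 of the 7 commits, so `release_crunch` returns 4/7 (about 0.57): the kind of local maximum around a release that the contributors graph makes visible.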
Punch card graph
A more interesting and potentially more informative chart is the punch card graph. Unfortunately, it is available only on GitHub, but you can easily generate it for any repository with one of the available tools. The punch card (I wonder where that name came from ;) ) shows the days and hours when changes to the repository are made most often. For most projects worked on by developers hired locally by the company on an employment contract, this graph should be very predictable: the largest circles should appear on working days between 9:00 and 15:00 or 16:00. However, in many cases this turns out not to be so. Programmers aren’t machines. Among the most common deviations are reduced team activity between 12:00 and 13:00 (the inviolable lunch hour in every company 🙂) and breaks in activity for so-called scrum meetings (only in teams that work in Scrum, of course). The chart also makes it easy to see when you should not disturb programmers absorbed in the creative process, and when it is worth scheduling a meeting that lets them take a break for a while. What if the programmers work remotely? Then the punch card lets you keep an eye on their work: it is easy to see whether the hired team is actually working the hours stated in the contract.
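If your hosting service has no punch card, the underlying data is just a histogram of commit timestamps by weekday and hour. The minimal sketch below builds that histogram, prints it as text, and picks the quietest working-hours slot as one possible (not authoritative) heuristic for scheduling a meeting. The 9:00 to 16:00 window, the sample timestamps and all function names are assumptions made for illustration; the timestamps are meant to be author dates such as those printed by `git log --pretty=format:%aI`.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, Tuple

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
Card = Dict[Tuple[int, int], int]  # (weekday, hour) -> number of commits


def punch_card(timestamps: Iterable[datetime]) -> Card:
    """Count commits per (weekday, hour) cell; weekday 0 is Monday.

    Uses the local time stored in each timestamp (the author's UTC offset),
    so 9 means 9:00 for the person who made the commit.
    """
    return Counter((ts.weekday(), ts.hour) for ts in timestamps)


def render(card: Card) -> str:
    """Text version of the graph: one row per day, one column per hour."""
    lines = ["   " + "".join(f"{h:>3}" for h in range(24))]
    for d, name in enumerate(DAYS):
        row = "".join(f"{card.get((d, h), 0) or '.':>3}" for h in range(24))
        lines.append(name + row)
    return "\n".join(lines)


def quietest_slot(card: Card, days=range(5), hours=range(9, 16)) -> Tuple[int, int]:
    """Working-hours cell with the fewest commits (the first one wins on ties)."""
    return min(((d, h) for d in days for h in hours),
               key=lambda cell: card.get(cell, 0))


# Hypothetical sample of author dates in ISO 8601 format.
SAMPLE = [
    "2017-02-20T09:12:00+01:00",  # Monday
    "2017-02-20T09:48:00+01:00",  # Monday
    "2017-02-20T12:30:00+01:00",  # Monday
    "2017-02-21T14:05:00+01:00",  # Tuesday
    "2017-02-25T22:40:00+01:00",  # Saturday
]
card = punch_card(datetime.fromisoformat(s) for s in SAMPLE)
print(render(card))
print("Quietest working-hours slot (weekday, hour):", quietest_slot(card))
```

Counting in each author's own local time is a deliberate choice here: for a distributed or remote team, converting everything to one time zone first would answer a different question (when the team as a whole is active).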
Code frequency graph
Every experienced programmer tries to stick to a basic principle: leave the code you touch cleaner than it was before your changes. This process is called code refactoring. If the team neglects its code, adding new functionality or changing existing functionality may later cost many times more than it would if the code were clear and transparent. This situation is characterized by a significant, disproportionate growth in the number of lines of code. As the amount of code grows, so does its complexity, and with it the time needed to make functional changes. This is where the “Code frequency graph” comes in: it shows, simply and at a glance, the number of lines added to the repository compared with the number of lines removed from it. With this information, a team leader (Scrum Master, Product Owner, etc.) can easily decide to freeze new functionality in order to improve the quality of the code the team is working on.
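The numbers behind a code frequency graph can be reproduced from `git log --numstat`, which prints the lines added and removed per file for every commit. The minimal sketch below assumes the output was captured with `git log --numstat --pretty=format:@%aI`, so that each commit starts with an `@`-prefixed author date (a convention chosen for this example); `SAMPLE_LOG` is made-up data and `code_frequency` and `net_growth` are hypothetical names. It sums the lines per ISO week and skips binary files, for which `--numstat` reports `-` instead of counts.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Tuple

Week = Tuple[int, int]  # ISO (year, week)

# Hypothetical output of: git log --numstat --pretty=format:@%aI
# Each commit starts with "@<author date>", followed by
# "<added>\t<removed>\t<path>" lines from --numstat.
SAMPLE_LOG = """@2017-02-13T10:00:00+01:00
120\t10\tsrc/app.py
5\t0\tREADME.md

@2017-02-15T16:20:00+01:00
30\t45\tsrc/app.py
-\t-\tdocs/logo.png

@2017-02-21T11:00:00+01:00
200\t5\tsrc/feature.py
"""


def code_frequency(log_text: str) -> Dict[Week, Tuple[int, int]]:
    """Sum lines added and removed per ISO week, in chronological order."""
    weeks = defaultdict(lambda: [0, 0])
    current = None
    for line in log_text.splitlines():
        if line.startswith("@"):
            current = tuple(datetime.fromisoformat(line[1:]).isocalendar())[:2]
        elif "\t" in line and current is not None:
            added, removed, _path = line.split("\t", 2)
            if added == "-":  # binary files report "-" instead of line counts
                continue
            weeks[current][0] += int(added)
            weeks[current][1] += int(removed)
    return {week: (a, r) for week, (a, r) in sorted(weeks.items())}


def net_growth(freq: Dict[Week, Tuple[int, int]]) -> Dict[Week, int]:
    """Lines added minus lines removed per week."""
    return {week: added - removed for week, (added, removed) in freq.items()}


freq = code_frequency(SAMPLE_LOG)
print(freq)
print(net_growth(freq))
```

In the sample, the second week adds 200 lines and removes only 5. A single week like that means little; net growth that stays high week after week is the disproportionate increase described above.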
I hope that after reading this article at least a handful of you will sit down at your computer, go to your repository’s site, and analyze the three graphs discussed above. They are just the tip of the iceberg of information that can be drawn from repository history. I hope you will treat these examples as an introduction to learning more about your repository and take a closer look at what each team member’s commits say about your work :)
Krzysztof Studniarek, senior IT analyst at mBank, 20 February 2017