Version control

During your career as a researcher, you will write code and create documents, go back and edit them, reuse parts of them, share your code with other people, and collaborate with others to build tools and write documents.

Have you ever lost files that weren’t saved? Or have you gone to a conference or interview and met someone interested in your work and realized you don’t have the files on your laptop? On a gloomy day, have you changed some part of your code when suddenly everything broke and you wished you could just go back to the previous working version, but alas there is no backup and you have tens of folders with misleading names?

Or are you familiar with this scenario: you are working in a group, writing a function, and you notice that another person is changing the same file at the same time, and you don’t know how to merge the changes? Or someone makes changes to your working version and now when you run it, everything crashes? Have you experienced these or a million other situations when you felt frustrated and stressed, spent hours trying to fix things, and wished there was a time machine to go back in time? The time machine has already been invented, and it’s called version control.

What is version control?

Version control software keeps track of every modification to the code in a special kind of database. If a mistake is made, developers can turn back the clock and compare earlier versions of the code to help fix the mistake while minimizing disruption to all team members.
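To make the "turn back the clock and compare" idea concrete, here is a minimal sketch in Python. It is a toy that keeps full copies of one text in memory; it is not how git or any other real tool stores history, and the History class and its method names are made up for this illustration.

```python
import difflib


class History:
    """Toy version store (hypothetical): every saved version of one text, in memory."""

    def __init__(self):
        self.versions = []  # (message, text) pairs, oldest first

    def commit(self, message, text):
        """Record a new version and return its number."""
        self.versions.append((message, text))
        return len(self.versions) - 1

    def checkout(self, number):
        """Return the text exactly as it was at the given version."""
        return self.versions[number][1]

    def diff(self, old, new):
        """Return the lines removed ("- ") and added ("+ ") between two versions."""
        before = self.checkout(old).splitlines()
        after = self.checkout(new).splitlines()
        return [line for line in difflib.ndiff(before, after)
                if line.startswith(("- ", "+ "))]


history = History()
v0 = history.commit("working version", "x = 1\nprint(x)\n")
v1 = history.commit("experiment", "x = 2\nprint(x)\n")

print(history.diff(v0, v1))   # what changed between the two versions
print(history.checkout(v0))   # the earlier working version, unchanged
```

Real version control systems add much more on top of this idea, such as authorship, timestamps, branches, and efficient storage, but the core promise is the same: no saved version is ever overwritten.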

Advantages:

  • You can save all your code, data, and documents in the cloud as you develop a project.

  • You can manage the versions over time and see which changes were made, when, and by whom.

  • You can find other projects, import their scripts and modify them to reuse them for your purpose.

  • You can share your code online: it’s good for science and it’s good for your resume.

  • If you are a PhD student, you can start saving your files early on, and by the time you finish, you will have all your analyses documented and easily accessible, which will help a lot when you’re writing your thesis.

There are many version control systems, such as git, Subversion, and Mercurial. git is by far the most popular one.

So what is git? git is an open source tool for creating repositories, downloading them, and getting and pushing updates. It lets teams work on the same project, manage conflicts, and monitor changes; hosting platforms built around git add features such as issue tracking.
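The operations just listed can be sketched as a short Python script that calls the git command line. This is only an illustration under several assumptions: git is installed and on your PATH, a local bare repository stands in for a hosting platform, and the names shared.git, alice, bob, and analysis.txt are invented for the example.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(*args, cwd):
    """Run one git command in `cwd`, using a throwaway identity for commits."""
    cmd = ["git", "-c", "user.name=Example", "-c", "user.email=example@example.org", *args]
    subprocess.run(cmd, cwd=cwd, check=True, capture_output=True, text=True)


def demo(base):
    """Share two commits between two working copies via a local 'remote'."""
    base = Path(base)
    alice, bob = base / "alice", base / "bob"
    notes = "analysis.txt"  # hypothetical file name

    git("init", "--bare", "shared.git", cwd=base)    # make a repository
    git("clone", "shared.git", "alice", cwd=base)    # download it

    (alice / notes).write_text("first result\n")
    git("add", notes, cwd=alice)
    git("commit", "-m", "Add first result", cwd=alice)
    git("push", "origin", "HEAD", cwd=alice)         # push updates

    git("clone", "shared.git", "bob", cwd=base)      # a colleague downloads it

    (alice / notes).write_text("first result\nsecond result\n")
    git("commit", "-am", "Add second result", cwd=alice)
    git("push", "origin", "HEAD", cwd=alice)

    git("pull", cwd=bob)                             # the colleague gets updates
    return (bob / notes).read_text()


if __name__ == "__main__" and shutil.which("git"):
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(tmp))
```

In day-to-day work you would type the same commands (git clone, git add, git commit, git push, git pull) directly in a terminal, with the address of a repository on a hosting platform in place of the local shared.git folder.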

Version control platforms

The most widely used version control platforms supporting git are GitHub and Bitbucket.

  • Repositories on Bitbucket are by default private and only viewable by you and your team.

  • Repositories on GitHub are by default public (everyone can see them). Private repositories used to require a paid plan, but GitHub has offered them on free accounts since 2019.

For a more comprehensive comparison of the two platforms see this comparison by UpGuard. When choosing a platform, consider the limitations of each tool; if you are employed in research, you will most likely have to use the platform preferred by your research institute or company. Note that Bitbucket limits the number of teams you can create for free, and beyond that point you will need to pay.

Another platform for git is GitLab.