This intro is a compilation of various tutorials available on the web:
- https://www.miximum.fr/blog/enfin-comprendre-git/
- https://www.git-tower.com/learn/git/ebook/en/command-line/
- https://linogaliana.gitlab.io/collaboratif/git.html
- https://www.book.utilitr.org/git.html
- https://thinkr.fr/travailler-avec-git-via-rstudio-et-versionner-son-code/
- https://www.bioinformatics.babraham.ac.uk/training/RStudio_GitHub/Initial_setup.html
The aim is to teach you the basic concepts and commands so that you can work independently when:
- using git via RStudio with your Gitlab repository
- looking up git features
- and, above all, solving your git problems (because yes, you will have some!)
1 Why should you use Git?
1.1 Without git
- Chaotic file management across:
  - different versions of a file (V1, V2, final, final_ok, …)
  - different system backups (PC, backup disks, …)
  - different users
- No trace of why changes were made
- Impossible to return to an earlier version
This comic strip should bring back memories for everyone:
1.2 With git
Preservation and archiving of your project
Clear history of changes
Efficient collaborative working:
- Work in parallel and easily merge files
- Keep track of who did what
2 Git can be seen as a time machine
Git lets you write the history of your project, alone or with others, using snapshots: pictures of a folder and of the files in it that you wish to track. This folder is the repository. Each time you want to freeze the state of the repository, that is, to take a snapshot, you make what’s called a commit.
Each commit records a certain amount of information:
- the modifications made
- who made the modifications
- when the modifications were made
- why the modifications were made, description of the modifications (via a commit message).
Over the course of commits, you’ll build up a history that can be consulted. The main history, which contains the “clean” version of your repository, is located on the master branch.
You can also create parallel branches from a commit. For example, you could create an additional branch to do something “just to see” and abandon your idea, or keep your modifications and merge them with the master branch via a merge. But either way, you’ll have kept track of them.
In this course, we won’t be using any additional branches, but you should know that they do exist, and that they make this tool, git, so powerful and indispensable for collaborative projects.
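Although branches are not used in this course, the idea can be sketched on the command line. This is a minimal sketch in a throwaway repository: the branch name just-to-see, the file notes.txt and the identity values are hypothetical, chosen only for illustration.

```shell
# Throwaway repository in a temporary folder (nothing here touches your real projects)
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet
git config user.name "your_pseudo"            # identity for this sandbox only
git config user.email "your_mail@mail.com"

# A first commit on the main history
echo "version 1" > notes.txt
git add notes.txt
git commit --quiet -m "First version"
main_branch=$(git symbolic-ref --short HEAD)  # "master" or "main", depending on your Git settings

# Create a parallel branch "just to see" (hypothetical name) and commit on it
git checkout --quiet -b just-to-see
echo "an experiment" >> notes.txt
git commit --quiet -am "Try an idea"

# Come back to the main branch and merge the experiment into it
git checkout --quiet "$main_branch"
git merge --quiet just-to-see
git log --oneline
```

Abandoning the idea instead would simply mean never running the merge: the commit stays on its own branch and the main history is untouched.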
2.1 Collaborate or archive your code via a remote repository (e.g. gitlab/github)
Git lets you keep a backup of your versioned project on a remote server; this copy is called the remote. Your remote can be hosted on Github (the most famous) or on a self-hosted Gitlab (as here at the ENS).
To retrieve a project from a remote for the first time, you clone it: as the name suggests, you make a copy of the project that you retrieve locally, on your machine. When you make commits in your local project, you can send them to the remote with a push. Other people connected to the remote then perform a pull to retrieve your commits.
In this way, the local version (on your computer) and the remote version of your project stay synchronized, provided you push and pull regularly.
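The clone / push / pull cycle can be tried out without any server. In the sketch below, a local bare repository stands in for the Gitlab remote, and the users alice and bob are hypothetical; with a real remote, only the address given to git clone would change.

```shell
# A local bare repository plays the role of the remote (assumption for this demo)
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"

# clone: alice retrieves the (still empty) project for the first time;
# Git warns that the repository is empty, which is expected here
git clone --quiet "$sandbox/remote.git" "$sandbox/alice"
cd "$sandbox/alice"
git config user.name "alice"                  # identity for this sandbox only
git config user.email "alice@example.com"

# alice commits locally, then pushes her commit to the remote
echo "# demo project" > README.md
git add README.md
git commit --quiet -m "Add README"
git push --quiet origin HEAD

# bob clones the project and gets alice's first commit
git clone --quiet "$sandbox/remote.git" "$sandbox/bob"

# alice keeps working and pushes a second commit
echo "More details." >> README.md
git commit --quiet -am "Extend README"
git push --quiet origin HEAD

# pull: bob retrieves the new commit from the remote
cd "$sandbox/bob"
git pull --quiet
git log --oneline
```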
2.2 How do you write your story?
The three most common manipulations are shown in the diagram below:
- pull: retrieve the latest version of the files from the remote repository
- commit: validate my changes with a message explaining them
- push: transmit validated changes to the remote repository
To be more precise, there is an additional step before validating your modifications (i.e. making a commit): indexing them. Git lets you manage modifications finely: a commit does not have to include every modification present in your workspace (working directory). Only indexed modifications, those you have added to your staging area via the stage command (git add on the command line), will be saved in your commit.
To summarize:
1 - First you make changes to your files; at this point these changes are not yet saved in the repository.
2 - Use the stage command to select the modifications you’re going to include in the next commit and place them in the staging area.
3 - Then use the commit command to save the changes selected in the staging area.
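On the command line, these three steps could look like the sketch below. It runs in a throwaway repository, and the file names analysis.R and notes.txt are hypothetical; the point is that only the staged file ends up in the commit.

```shell
# Throwaway repository in a temporary folder
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet
git config user.name "your_pseudo"            # identity for this sandbox only
git config user.email "your_mail@mail.com"

# 1 - change files in the working directory (nothing is saved in the repository yet)
echo "x <- 1" > analysis.R
echo "scratch ideas" > notes.txt

# 2 - stage only the modification wanted in the next commit
git add analysis.R

# 3 - commit what is in the staging area, with an explicit message
git commit --quiet -m "Add first analysis script"

# notes.txt was not staged, so it is not part of the commit
git status --short
```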
These steps can be carried out via the command line, but there are also graphical tools to do this, and most editors (IDEs), such as RStudio or Visual Studio Code, offer built-in support or plug-ins to make life easier.
2.3 Summary of key commands
- clone: retrieve the repository from the remote for the first time
- stage: select the changes that will be added to the next commit.
- commit: record a snapshot, a frozen moment in the life of your project.
- push: send new commits to the remote.
- pull: retrieve new commits locally from the remote.
- checkout: jump back in time to a commit.
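The last command of this list, checkout, is the "time machine" part. Here is a minimal sketch in a throwaway repository; the file report.txt and its contents are hypothetical.

```shell
# Throwaway repository with two successive versions of one file
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet
git config user.name "your_pseudo"            # identity for this sandbox only
git config user.email "your_mail@mail.com"

echo "version 1" > report.txt
git add report.txt
git commit --quiet -m "First version"

echo "version 2" > report.txt
git commit --quiet -am "Second version"
current_branch=$(git symbolic-ref --short HEAD)

# Jump back in time to the previous commit (HEAD~1 means "one commit before the latest")
git checkout --quiet HEAD~1
cat report.txt        # the file is back in its first state

# Return to the present
git checkout --quiet "$current_branch"
cat report.txt
```

In practice the commit to visit is usually designated by the identifier shown by git log --oneline rather than by HEAD~1.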
You can get a more global view of your environment with this diagram:
Now that we’ve covered the basics, it’s time to give it a try!
3 It’s time to give it a try: Initialize your Git project on Gitlab and use Rstudio to manage it locally
3.1 Linking Gitlab and your machine using RStudio
3.1.1 Create an account on ENS’s Gitlab: https://gitbio.ens-lyon.fr/
If you don’t have an account yet:
- Go to the site and try to log in via SSO Ens de Lyon; this will redirect you to the CAS, where you log in with your ENS credentials.
- You will then be blocked, which is normal: Carine will receive an account request and will be able to validate it.
- Send an e-mail to Carine (carine.rey@ens-lyon.fr) specifying your group name.
3.1.2 Create a new repository on the ENS’s Gitlab
Click on the UE group (Menu -> groups -> your groups) (https://gitbio.ens-lyon.fr/ue/ue-ngs/students_2023)
Then create a project by clicking on Create new project
- Select Create blank project
- Give your project a name containing your group name and your name (Ideally, your project name should be in lower case, without periods, spaces or underscores, and should not begin with a number, e.g. scRNAseq_arabido_carine).
- Leave Initialize repository with a README selected.
- Click on Create project
3.1.3 Creating your ssh key pair to link Rstudio and Gitlab
We need to enable Rstudio to connect to gitlab, so we’re going to use an ssh key pair, which is more secure than a login/password.
If you already have an ssh key pair, it’s still a good idea to make a new pair of files specifically for connecting to gitlab.
We’re going to use Rstudio to generate this pair of keys (which are in fact simply 2 files, one called “private key” and the second “public key”). The private key can be symbolized by a padlock key and the public key by the lock of this padlock.
We’re going to create these two files directly via Rstudio on your virtual machine.
In Rstudio:
1 - Click on “Tools > Global Options…> Git/SVN”.
2 - Click on “Create ssh Key …”.
3 - A window will open: enter a passphrase (= a password to secure the use of your private key) and validate.
4 - Then click on “Close”.
5 - Access the public key (= content of the id_ed25519.pub file) by clicking on “View public key”.
6 - Copy your public key.
7 - In your Gitlab profile, top left, click on your avatar, then on Preferences, then on ssh keys (in the panel on the left). You should arrive on this page: https://gitbio.ens-lyon.fr/-/profile/keys
8 - Add your new public key.
You could also create these files on the command line via the terminal. You can find help in the gitlab documentation, or on the Internet if you need to do it again (e.g. https://happygitwithr.com/ssh-keys.html#create-an-ssh-key-pair).
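As a rough idea of the command-line route, the sketch below generates an ed25519 key pair with ssh-keygen. It writes the files to a temporary folder and sets a demo passphrase with -N so that it runs without prompting; both choices are assumptions made for the demonstration. In practice the files would live in ~/.ssh, and leaving out -N lets ssh-keygen ask for the passphrase interactively.

```shell
# Demo location for the key files (in practice: ~/.ssh)
key_dir=$(mktemp -d)
key_file="$key_dir/id_ed25519"

# -t: key type, -C: a comment to recognize the key, -f: output file,
# -N: passphrase given non-interactively (demo value; omit -N to be prompted instead)
ssh-keygen -q -t ed25519 -C "your_mail@mail.com" -N "a long demo passphrase" -f "$key_file"

# Two files are created: the private key and the public key (.pub)
ls "$key_dir"

# The public key is the text to copy into Gitlab
cat "$key_file.pub"
```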
To check that everything is OK, you can type in the terminal (via RStudio):
ssh -T git@gitbio.ens-lyon.fr
- Answer “yes” to the question
- Enter your passphrase (it won’t be displayed, that’s normal.)
The answer should be:
Welcome to GitLab, @your_login!
3.1.4 Configuring git in Rstudio
Finally, you need to declare your identity: in the RStudio terminal (not the R console), set your name and e-mail address so that each of your commits is linked to you:
git config --global user.name "your_pseudo"
git config --global user.email "your_mail@mail.com"
3.1.5 Clone your empty repository and create an R project in Rstudio
To associate this Git repository with an R project via RStudio, you need to make a clone:
- On Gitlab: click on Code and copy the URL (SSH protocol)
- In RStudio now click on: File > New Project… > Version Control > Git,
- enter the SSH URL of the repository that you copied and the name of the R project (ideally the same as the Gitlab project),
- enter the folder in which to place it (~/mydatalocal),
- click on Create Project and finally enter your passphrase.
In this newly created RStudio project, you’ll see the git tab in the top right-hand corner.
3.2 Using git commands in Rstudio
RStudio’s Git panel shows you, in real time, the status of the files and folders in your project:
- A new file will be associated with an orange icon containing a ?
- This new file will be associated with a green icon containing an A once you’ve checked it (in the ‘staged’ column).
- A modified file will be associated with a blue icon containing an M
- A deleted file will be associated with a red icon containing a D
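The same information is available in the terminal with git status --short, which uses matching codes: ?? for a new untracked file, A for a newly staged file, M for a modified file and D for a deleted file. The sketch below reproduces the four situations in a throwaway repository; all file names are hypothetical.

```shell
# Throwaway repository with two committed files
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet
git config user.name "your_pseudo"            # identity for this sandbox only
git config user.email "your_mail@mail.com"
echo "a" > modified.txt
echo "b" > deleted.txt
git add modified.txt deleted.txt
git commit --quiet -m "Initial files"

echo "a change" >> modified.txt   # a modified file          -> M
rm deleted.txt                    # a deleted file           -> D
echo "c" > staged.txt
git add staged.txt                # a new file, once staged  -> A
echo "d" > new.txt                # a new file, not staged   -> ??

# One line per file, prefixed by its status code
git status --short
```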
3.3 Configuring files to be synchronized or not using a .gitignore file
You don’t need to synchronize all the files in your project: only those you stage will be included in commits. You can also explicitly ask Git not to monitor particular files: this is the role of the .gitignore file at the root of your project. This is a text file that accepts wildcard patterns (for example *), so that a single rule can match several files.
By default, when creating the Rstudio project, a .gitignore file is added containing the following lines:
.Rproj.user
.Rhistory
.RData
.Ruserdata
This means that the Rstudio project configuration files are not tracked.
For the practical sessions, and in general, we don’t want to track changes to raw data or results.
- Add the following lines to the .gitignore file:
data/
results/
*.Rproj
Stage the change by ticking the box next to the .gitignore file in the Staged column.
Then commit with an explicit message.
Then push the modifications to synchronize your local modifications with the remote.
View changes on gitlab
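To see the effect of these rules, git check-ignore reports which rule causes a file to be ignored. The sketch below rebuilds the .gitignore described above in a throwaway repository; the files counts.tsv, plot.txt, demo.Rproj and script.R are hypothetical.

```shell
# Throwaway repository
sandbox=$(mktemp -d)
cd "$sandbox"
git init --quiet

# The default lines added by RStudio, followed by the three lines added in this section
printf '%s\n' '.Rproj.user' '.Rhistory' '.RData' '.Ruserdata' > .gitignore
printf '%s\n' 'data/' 'results/' '*.Rproj' >> .gitignore

# Some files, some of which match the rules
mkdir data results
echo "raw data" > data/counts.tsv
echo "a result" > results/plot.txt
echo "Version: 1.0" > demo.Rproj
echo "x <- 1" > script.R

# For each ignored file, show the line of .gitignore that matches it
git check-ignore -v data/counts.tsv results/plot.txt demo.Rproj

# Only .gitignore and script.R remain visible to Git
git status --short
```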
3.4 Organizing your working directory
It’s a good idea to put all your project-related files in the same folder:
- raw data
- scripts
- results
- project documentation,
- …
To help you find your way around and avoid mixing up files or accidentally deleting them, we recommend that you separate the different types of data into sub-folders.
For example, your working directory might look like this:
project_name/
├── README.md    # overview of the project
├── data/        # data files used in the project
├── results/     # results of the analysis (data, tables, figures)
├── src/         # contains all code in the project
│   └── ...
└── doc/         # documentation for your project
    └── ...
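Such a layout can be created in one go from the terminal. This sketch builds it in a temporary location; in practice you would run the same commands at the root of your cloned repository.

```shell
# Demo location (in practice: the folder of your cloned repository)
project_root="$(mktemp -d)/project_name"

# -p creates the parent folder as well and does not complain if a folder already exists
mkdir -p "$project_root/data" "$project_root/results" "$project_root/src" "$project_root/doc"

# An empty README.md, to be filled in as the project goes along
touch "$project_root/README.md"

# -p marks folders with a trailing /
ls -p "$project_root"
```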
In addition, for ease of use and reproducibility, you need to add a file, often called README.md, to the root of your folder, which will contain all the information you need to get started with the project.
This is also the file that will be visible on your project’s home page on Gitlab. This way, when someone wants to (re)work on the project, they can open the file, and they’ll know where to go to see and understand what’s been done. This person could be a collaborator, your manager or simply yourself 6 months later.
3.4.1 The README.md
In concrete terms, the README.md file is a text file written in markdown (hence the .md extension). Markdown is a language that allows you to encode the formatting of plain text simply and easily.
For example, a # means that the following sentence is a title, ## , a subtitle, ###, a sub-subtitle. You can browse the various tags here: https://www.markdownguide.org/basic-syntax/
This makes it possible to write text without wasting time on formatting, keeping the file “light” and, above all, readable for everyone. On your project’s Gitlab page, you’ll find your formatted README.md file.
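As an illustration, the sketch below writes a very small README.md from the terminal, using only the three heading levels mentioned above. The titles and the sentence are placeholders, not a required structure, and the file is written to a temporary folder rather than to your project.

```shell
# Demo location (in practice: the root of your project)
readme_dir=$(mktemp -d)

# Everything between the two EOF markers is written to the file as is
cat > "$readme_dir/README.md" <<'EOF'
# Project title

## Data

### Raw files

Plain paragraphs need no special markup.
EOF

cat "$readme_dir/README.md"
```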
Don’t forget to add to your README.md as you go along, so you don’t forget anything. You can think of it as your laboratory notebook or your laboratory report.
At the end of the course, the quality of your README.md will be particularly important in the evaluation.
3.4.2 Creating your project architecture
- Create your README.md
- Start completing it
- Index, commit, push…
- Create data, results and scripts folders
- Index, commit, push …
- Update .gitignore file
- Index, commit, push …