Git

A brief introduction to Git

Version control

Why version control?

  • Tracking: “But it worked yesterday…” 😭
  • Storing versions: “Could you recreate this plot for me?”
  • Backup: “I replaced the wrong my_file_2_final_final.py” 😱
  • Exchange: “My colleague sent me a script via mail! Or Mattermost…?”
  • Collaboration: review_comments.pdf, review_feedback-2.docx

Version control systems

  • Manage a sequence of snapshots (e.g. of folders)
  • There are many version control systems out there (e.g. Subversion, CVS, Mercurial, Git)
  • We will focus on Git which is the de facto standard
  • We will learn how to use Git on the command line

Expectation management

  • You won’t master Git in an afternoon; it becomes intuitive only through regular use
  • Git may feel unfamiliar at first, but it’s a tool that rewards practice
  • Excellent documentation and many tutorials are available when you need them

Git is a skill, not a button you press

The basic workflow

Creating a repository

git-init — Create an empty Git repository or reinitialize an existing one

  • Initializes an empty repository in a directory
  • Creates a hidden .git folder for housekeeping data
mkdir my_repo
cd my_repo
git init .
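
To see what git init leaves behind, the following sketch repeats the step in a throwaway directory. The mktemp location and the variable names are assumptions made for illustration only.

```shell
# Work in a temporary directory so nothing important is touched
repo="$(mktemp -d)"
cd "$repo"

git init -q .                           # -q suppresses the informational message

ls -a                                   # the listing now includes the hidden .git folder
git_dir="$(git rev-parse --git-dir)"    # ask Git where its housekeeping data lives
echo "$git_dir"
```

Deleting the .git folder removes the whole history, so treat it with care.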

The staging area

git-add — Add file contents to the index

  • Adds a file to the index by creating an object in the .git folder
  • The file is “staged” for the next commit
date > date.txt  # Create date.txt with current date
git add date.txt
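
The effect of staging can be observed with git status. The sketch below (temporary directory and variable names are illustrative assumptions) records the short status before and after git add.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .

date > date.txt
before="$(git status --short)"   # "??" marks an untracked file
git add date.txt
after="$(git status --short)"    # "A" marks a file staged for the next commit

echo "$before"
echo "$after"
```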

Commit

git-commit — Record changes to the repository

  • Creates a new commit object, which contains the current content of the index
  • In addition to the content, a log message and author information are stored
git commit -m "Initial commit"

gitGraph
   commit id: "df21a: Initial commit" tag: "HEAD"

Commits

  • Commits are linked to form a sequence of snapshots
  • Each new commit is a direct child of the current HEAD
date > date.txt  # Update content of date.txt
git add date.txt
git commit -m "Update date"

gitGraph
   commit id: "df21a: Initial commit"
   commit id: "315f2: Update date" tag: "HEAD"
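
The parent-child link between commits can be checked directly. This is a minimal sketch in a temporary repository; the identity set with git config is a placeholder, and date >> (append) is used instead of date > so that the second commit always contains a change, even if both commands run within the same second.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"          # placeholder identity, local to this repo
git config user.email "user@example.com"

date > date.txt
git add date.txt
git commit -q -m "Initial commit"
first="$(git rev-parse HEAD)"                # ID of the first commit

date >> date.txt                             # append, so the content is guaranteed to differ
git add date.txt
git commit -q -m "Update date"

parent="$(git rev-parse HEAD~1)"             # HEAD~1 names the parent of the current commit
git log --oneline                            # newest commit first
```

The parent of the second commit is the first commit, which is exactly the chain drawn in the graph.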

Day-to-day life

  • Git has many features that are extensively documented

  • You will only need a handful of commands in your daily work:

    git status  # show the working tree status
    git diff my_file  # show what has changed
    git add my_file  # add file contents to the index
    git commit  # record changes to the repository
    git log  # show the commit history

Configuration

git-config — Get and set repository or global options

  • You can configure certain (global) settings for your local Git client

  • For example, you can set the user name and email address attached to each commit

    git config --global user.name "Your Name"
    git config --global user.email "youremail@yourdomain.com"
  • The concept of authorship is widely used on platforms such as GitHub or GitLab
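
Settings can also be stored per repository by leaving out --global. The sketch below uses a temporary repository and a placeholder identity (both assumptions) so that your real global configuration is not modified.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .

# Without --global the values are written to this repository's .git/config only
git config user.name "Example User"
git config user.email "user@example.com"

name="$(git config --get user.name)"   # effective value; local settings override global ones
echo "$name"

git config --list --show-origin        # every setting together with the file it comes from
```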

Hands-on session

  1. Set the user name and email address in your local Git client
  2. Create a directory and initialize a Git repository
  3. Create a file and commit it to the repo
  4. Change the file, inspect the differences, and commit the changes

On Levante: module load git

Branches

git-branch — List, create, or delete branches

  • Branches are names that point to a certain commit
  • Several branches can point to the same commit, and a branch moves as commits are added
  • Encapsulate the changes required for a feature/bugfix
  • Allow incremental development without impacting other branches

Switching branches

Create a new branch develop to work on a feature

git branch develop

Switch to the new branch (changing the HEAD)

git switch develop

gitGraph
   commit
   commit
   branch develop
   checkout develop
   commit
   commit tag: "HEAD"

Show the differences to the main branch

git diff main
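
That a branch is only a name pointing to a commit can be verified with git rev-parse. The following sketch assumes a temporary repository with a placeholder identity; git branch -M main is used so the example does not depend on the configured default branch name.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "v1" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch -M main                           # make sure the branch is called main

git branch develop
main_before="$(git rev-parse main)"
develop_before="$(git rev-parse develop)"    # same commit as main right after creation

git switch -q develop
echo "v2" >> file.txt
git commit -q -am "Extend file"              # -a stages all modified tracked files

main_after="$(git rev-parse main)"           # main did not move
develop_after="$(git rev-parse develop)"     # develop now points to the new commit

git diff --stat main                         # summary of what differs from main
```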

Merge two branches

git-merge — Join two or more development histories together

  • Include changes from another branch into the current one
  • Creates a “merge commit” with two parent commits, unless Git can simply fast-forward the current branch
git switch main
git merge develop

gitGraph
   commit
   commit
   branch develop
   checkout develop
   commit
   commit
   checkout main
   merge develop tag: "HEAD"
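
A merge commit only appears when both branches have moved since they diverged. The sketch below sets up that situation in a temporary repository (file names and identity are illustrative assumptions) and counts the parents of the resulting commit.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch -M main

git switch -q -c develop                     # create the branch and switch to it
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git switch -q main
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix"                   # main has moved too, so no fast-forward is possible

git merge -q --no-edit develop               # --no-edit accepts the default merge message

# Count the "parent" header lines of the new commit object
parent_count="$(git cat-file -p HEAD | grep -c '^parent ')"
echo "$parent_count"
git log --oneline --graph
```

If main had not received the "Add fix" commit, Git would have fast-forwarded main to develop without creating a merge commit.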

Hands-on session

  1. Create a branch
  2. Commit something to the branch
  3. Merge the branch into main
  4. Check the log of the main branch
  5. Delete the branch (git branch --help)

Conflicts

Collaboration can lead to disagreements

gitGraph
   commit
   commit
   branch alice
   checkout alice
   commit
   checkout main
   branch bob
   checkout bob
   commit

Alice fixes an obvious error in file.txt

-This course is lame
+This course is nice!

Bob fixes the same line, but differently

-This course is lame
+This course is awesome!

This creates a conflict when merging both branches

gitGraph
   commit
   commit
   branch alice
   checkout alice
   commit
   checkout main
   branch bob
   checkout main
   merge alice
   checkout bob
   commit
   checkout main
   merge bob

Auto-merging file.txt
CONFLICT (content): Merge conflict in file.txt
Recorded preimage for 'file.txt'
Automatic merge failed; fix conflicts and then commit the result.

Solving conflicts

Solving conflicts requires your decision

file.txt
<<<<<<< HEAD
This course is nice!
=======
This course is awesome!
>>>>>>> bob

Edit the file: remove the conflict markers and keep the version you decide on

file.txt
This course is awesome!

After resolving the conflict, you have to commit your changes

git add file.txt
git commit
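
The whole Alice and Bob scenario can be replayed in a temporary repository. This is a sketch under assumptions: the identity is a placeholder, and the conflict is resolved by simply overwriting the file with Bob's version rather than editing it by hand.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "This course is lame" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch -M main

git switch -q -c alice
echo "This course is nice!" > file.txt
git commit -q -am "Fix statement (Alice)"

git switch -q main
git switch -q -c bob
echo "This course is awesome!" > file.txt
git commit -q -am "Fix statement (Bob)"

git switch -q main
git merge -q alice                           # fast-forward, no conflict yet
git merge bob || echo "merge stopped with a conflict"

conflicted="$(git diff --name-only --diff-filter=U)"   # files with unresolved conflicts
cat file.txt                                 # shows the conflict markers

echo "This course is awesome!" > file.txt    # decide on one version
git add file.txt                             # mark the conflict as resolved
git commit -q --no-edit                      # conclude the merge with the default message
```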

Hands-on session

  1. Create a file.txt in two different branches,
    each with different content
  2. Merge both branches into main (CONFLICT)
  3. Resolve the conflict and commit your changes

Best practices

Atomic commits

Commits should only deal with one task — one logical unit

  • Commit regularly to track your progress
  • Commit changes independently for clarity and easier management
  • Commit self-consistent (i.e. working) states

  • A single logical unit is not the same as a file

  • You can use Git to interactively add parts of a file

    git add -p hello.txt
    diff --git a/hello.txt b/hello.txt
    index a042389..cd08755 100644
    --- a/hello.txt
    +++ b/hello.txt
    @@ -1 +1 @@
    -hello world!
    +Hello world!
    (1/1) Stage this hunk [y,n,q,a,d,e,p,?]?
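
Before committing, it helps to check what is staged and what is not. The sketch below is a non-interactive variant that stages one of two unrelated changes at file level; notes.txt and the identity are assumptions added for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

printf 'hello world!\n' > hello.txt
printf 'notes\n' > notes.txt
git add hello.txt notes.txt
git commit -q -m "Add greeting and notes"

printf 'Hello world!\n' > hello.txt          # change 1: fix the capitalization
printf 'more notes\n' >> notes.txt           # change 2: an unrelated edit

git add hello.txt                            # stage only the first change
staged="$(git diff --staged --name-only)"    # what the next commit will contain
unstaged="$(git diff --name-only)"           # what stays behind in the working tree
echo "staged: $staged, unstaged: $unstaged"

git commit -q -m "Capitalize greeting"       # one commit, one logical unit
```

When both changes live in the same file, git add -p offers the same choice hunk by hunk.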

Commit messages

Write meaningful commit messages

  • Use the imperative mood in the subject line (what is done)
  • Limit the subject line to 50 characters
  • Use the body to explain why the change was made

ice: Fix freeing uninitialized pointers

Automatically cleaned up pointers need to be initialized before exiting
their scope.  In this case, they need to be initialized to NULL before
any return statement.

Fixes: 90f821d72e11 ("ice: avoid unnecessary devm_ usage")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Example from kernel.org

Large (binary) files

You should not commit large (binary) files

  • Git stores every version of a file in the repository

  • Binary files are hard to meaningfully compare (diff)

  • Use a .gitignore file to exclude file patterns:

    .gitignore
    *.nc
    plots/*
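
Whether a pattern matches can be tested with git check-ignore. The sketch below recreates the .gitignore from above in a temporary repository; the file names data.nc, plots/figure.png and analysis.py are made up for illustration.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .

printf '*.nc\nplots/*\n' > .gitignore
mkdir plots
touch data.nc plots/figure.png analysis.py

git status --short                                # ignored files do not show up as untracked
git check-ignore -v data.nc plots/figure.png      # prints the pattern that matches each path

if git check-ignore -q analysis.py; then
    script_ignored="yes"
else
    script_ignored="no"                           # no pattern matches the script
fi
echo "analysis.py ignored: $script_ignored"
```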

Decentralization

  • Git is a decentralized version control system
  • Each repository contains the full project history
  • Technically, there is no single source of truth

GitHub and GitLab

  • There are many services to host Git repositories
  • They offer plenty of additional functionality
    (merge/pull requests, code review, automated testing, …)
  • GitHub offers the largest user base, while GitLab can be self-hosted

This lecture is hosted on the DKRZ GitLab

Tracking a remote repository

Add a remote repository to a local repository

git remote add origin <PATH_TO_REPO>

Clone an existing remote repository

git clone <PATH_TO_REPO>

Syncing with a remote repository

  • Push your local references to the remote repo

    git push origin <YOUR_BRANCH>
  • Pull remote changes to your local repo

    git pull origin <YOUR_BRANCH>

Syncing with a remote repository

  • Push with -u once to set the upstream branch of your local branch

    git push -u origin <YOUR_BRANCH>
  • Afterwards, a plain git pull fetches and merges from that upstream branch

    git pull
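
The push and pull cycle can be tried without a server by using a bare repository on the local disk as a stand-in for the remote. Everything below (directory layout, identities, file names) is an assumption for illustration; with GitLab, <PATH_TO_REPO> would be a URL instead of a local path.

```shell
base="$(mktemp -d)"
git init -q --bare "$base/remote.git"        # bare repo: history only, no working tree

git init -q "$base/local"
cd "$base/local"
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"
echo "hello" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch -M main

git remote add origin "$base/remote.git"
git push -q -u origin main                   # -u records origin/main as upstream of main
upstream="$(git rev-parse --abbrev-ref 'main@{upstream}')"
echo "$upstream"

# A colleague clones the repository, adds a commit, and pushes it
git clone -q --branch main "$base/remote.git" "$base/copy"
cd "$base/copy"
git config user.name "Example Colleague"     # placeholder identity
git config user.email "colleague@example.com"
echo "more" >> file.txt
git commit -q -am "Extend file"
git push -q                                  # git clone already set up the upstream

cd "$base/local"
git pull -q                                  # no arguments needed thanks to -u
```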

Hands-on session

  1. Generate a new SSH Key pair (ssh-keygen) for use with GitLab
  2. Add the public SSH key to your DKRZ GitLab account
  3. Configure your SSH client to use the new private key
  4. Open your personal Git repo in your web browser
  5. Follow the instructions to “Push an existing Git repository”

Configure your local SSH client

  • Configure your local SSH client to always use your private key for GitLab
  • The key will be added to your SSH agent automatically when needed
~/.ssh/config
Host gitlab.dkrz.de
    Hostname gitlab.dkrz.de
    User git
    IdentityFile <path_to_your_private_key>
    AddKeysToAgent yes

Merge/Pull requests

  • GitLab and GitHub provide an interface to discuss changes before a merge
  • You can request reviews from specific people to get feedback
  • We will use merge requests to collect and review the exercises!

Merge request on DKRZ GitLab

Take-home messages

  • Git has a learning curve — the more you use it, the more natural it becomes
  • It is the de facto standard for version control across science and industry
  • Use it for personal version control — and get collaboration features “for free”

Shotgun buffet

I am just gonna throw a bunch of stuff at you. Take what you might find interesting.
Scott Chacon

Rebase vs merge

git-rebase — Reapply commits on top of another base tip

Instead of merging branches, one can also rebase

%%{init: {'gitGraph': {'showCommitLabel': false}} }%%
gitGraph
   commit id: "92664df"
   commit id: "39afe64"
   branch develop
   checkout develop
   commit id: "5aeaccd"
   commit id: "6c53a8a"
   checkout main
   commit id: "b125e2f"

%%{init: {'gitGraph': {'showCommitLabel': false}} }%%
gitGraph
   commit id: "92664df"
   commit id: "39afe64"
   commit id: "b125e2f"
   branch develop
   checkout develop
   commit id: "d925af7"
   commit id: "dbdb8e4"

Rebasing keeps the history linear by rewriting commits: the rebased commits get new IDs (!)
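
The change of commit IDs can be observed directly. The sketch below builds the diverged history from the first graph in a temporary repository (file names and identity are illustrative assumptions) and compares the tip of develop before and after the rebase.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.com"

echo "base" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git branch -M main

git switch -q -c develop
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
old_tip="$(git rev-parse develop)"

git switch -q main
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix"                   # main and develop have now diverged

git switch -q develop
git rebase -q main                           # replay the develop commits on top of main
new_tip="$(git rev-parse develop)"

echo "before: $old_tip"
echo "after:  $new_tip"
git log --oneline --graph                    # a single straight line of commits
```

Because the IDs change, rebasing commits that others have already pulled forces them to reconcile their copies.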

Forks

  • A fork is a copy of a repository on the server side

  • Used to contribute to public repositories without needing write access to the original

  • Standard names for locally defined remotes:

    origin    https://github.com/lkluft/numpy (fetch)
    origin    https://github.com/lkluft/numpy (push)
    upstream  https://github.com/numpy/numpy (fetch)
    upstream  https://github.com/numpy/numpy (push)
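
A remote setup like the one listed above can be created with git remote add. The sketch uses the URLs from the listing in an empty temporary repository; adding a remote does not contact the server, so no network access is needed for these commands.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q .

git remote add origin https://github.com/lkluft/numpy      # your fork
git remote add upstream https://github.com/numpy/numpy     # the original project

upstream_url="$(git remote get-url upstream)"
git remote -v                                # prints the fetch and push URL of each remote

# With network access, changes from the original project would be
# retrieved with:  git fetch upstream
```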

Git Submodules

  • Let you embed a Git repo as a subdirectory of another
  • Keep your commits separate
  • Avoid them unless you have a specific, strong reason
  • Heavily used in ICON development

Cloning a repo and its submodules

git clone --recursive <PATH_TO_REPO>

Update submodules after changing/updating a branch

git submodule update

Further reading