Welcome

Welcome to Git Gud, an introductory book and reference to using the Git-SCM tool and the GitHub platform. In this book, you will learn how to utilise the Git Command Line Interface (CLI) to manage the development, distribution, tracking and versioning of software as well as utilise the online platform GitHub to store this information remotely, run Continuous Integration (CI) jobs, contribute to Closed Source and Open Source Software (OSS), integrate your changes into large codebases and publish your own software.

How to use this book

Using the book is pretty self-explanatory. The content is split up into chapters which cover a particular topic which can be further broken down into sections. You navigate through the book mostly chronologically using the arrow buttons on either side of the page (can't miss them). You can also look through the chapters and sections to find particular topics or use the search bar, which can be activated by pressing S. Each chapter has a challenges section. These contain various tasks to complete related to the content of each chapter.

Contributing

You can contribute to the book by accessing its GitHub repository (GitHub logo in the top right-hand corner of any page). Follow the contributing guidelines on the repository for more details.



TL;DR

This section of the book is designed to be a quick reference to some of the most common commands and actions for Git and GitHub.

Installation

Download

Setup

# Set the name that is identifiable for credit when reviewing version history
git config --global user.name "github-username"

# Set an Email address that will be associated with each history marker
git config --global user.email "github-email"

# Set automatic command line coloring for Git for easy reviewing
git config --global color.ui auto

Initialisation

# Initialise a directory as a Git repo
git init

# Clone a repo from the URL
git clone <url>

Staging and Commits

# Show which files are staged
git status

# Stage file or pattern
git add <file-or-pattern>

# Unstage all staged changes (the working tree is left untouched)
git reset

# Show diff of changed files
git diff

# Show diff of staged files
git diff --staged

# Commit staged changes w/ message
git commit -m "Your commit message"

Branches

# List branches (* indicates current branch)
git branch

# Checkout to branch
git checkout <branch>

# Create a new branch
git branch <branch>

# Create new branch and checkout to it
git checkout -b <branch>

# Merge incoming branch into current branch
git merge <incoming-branch>

Inspecting and Comparing

# Show commits from the current branch's history
git log

# Show commits on branch-A not on branch-B
git log <branch-B>..<branch-A>

# Show commits that modified a given file, even across renames
git log --follow <file>

# Show diff of what is in branch-A that is not on branch-B
git diff <branch-B>...<branch-A>

# Display any Git object in human-readable format
git show <SHA>

Remotes and Sharing

# Add Git URL as remote
git remote add <remote> <url>

# Retrieve changes from remote without integrating changes
git fetch <remote>

# Fetch and merge changes from a remote branch into your local branch
git pull <remote>

# Transmit changes from local branch into a remote branch
git push <remote> <branch>

# Set current branch to track branch at remote
git push -u <remote> <branch>

# Merge a remote branch into local branch
git merge <remote-name>/<branch>

Rebasing

# Apply commits from the current branch on top of the new-base-branch.
git rebase <new-base-branch>

# Interactively apply commits from the current branch on top of the new-base-branch.
git rebase -i <new-base-branch>

# Clear all staged changes and rewrite working tree from the specified commit-hash
git reset --hard <commit-hash>

Stashes

# Save modified and staged changes in stash
git stash

# Show all stashes in stack-order
git stash list

# Pop stash from top of stash-stack
git stash pop

# Drop stash from top of stash-stack
git stash drop

About

What is Version Control?

Version Control allows for changes within a repository to be tracked. This allows you to retain a historical ledger of your source code. This allows you to easily move between different points in your repository's history. It also allows you to develop features on separate branches so the changes do not affect the currently working codebase.

What is Git?

Git is a Source Control Management (SCM) tool. It keeps a history of multiple files and directories in a bundle called a repository. Git tracks changes using save points called commits. Each commit stores a snapshot of the tracked files, and Git computes the differences (diffs) between snapshots when you ask for them. Repositories can have multiple branches, allowing many different developers to create new changes and fixes to a codebase that are separate from each other. You can also switch between branches to work on many different changes at once. These branches can then later be merged back together into a main branch, integrating the various changes.

Common Terms in Git

  • Repository (or repo) - A project, workspace or folder containing your codebase.
  • Staging - Marking the files that have been added, changed or deleted so that they are included in the next commit.
  • Commit - A saved snapshot of the codebase that has an associated hash.
  • Branch - A separate history chain that can later be merged into other branches.
  • Clone - A machine local copy of a repository, usually obtained from a remote repository hosting service (GitHub, GitLab).
  • HEAD - The top (most recent) commit of a branch or repository.
  • Checkout - The means to switch to a branch (HEAD) or commit.
  • Pull/Push - Sync the local repository with the remote repository by pushing up your changes or pulling in the remote ones.
  • Fetch - Pull metadata about remote changes without integrating remote changes.
  • Merge - Combine the history of another branch into the current branch.
  • Stash - Save changes in a temporary save commit as a Work-In-Progress (WIP).
  • Tags - A named commit in a repository's history.
  • Rebase - A technique for reapplying commits on top of a base branch HEAD.
  • Diff - The difference between a file, folder, commit or branch across commits and branches.
  • Remote - A copy of a repository that lives off your machine.
  • Pull Request - A request to merge a branch's changes into your branch (usually a feature branch into main).
  • Fork - A clone of a repository that shares its history and acts similarly to a branch but is logically separated from the original, with the person who forked it as the owner of the fork.
  • Upstream - The source location of a repository.
  • OSS - Acronym for Open Source Software.

Git Workflow

The basic workflow for getting started is as follows:

  1. Initialise a repository [ie. create the repository]
  2. Add/write file contents
  3. Stage changes
  4. Commit changes
  5. Create branches for different features, usually branched from the main branch (or equivalent)
  6. Add/make changes
  7. Stage new changes
  8. Commit changes
  9. Repeat 6-8 until the feature is done
  10. Create a 'Pull Request'. This is a request for the owner of the main branch to merge your branch's changes into main ie. pull them into main.
  11. Code owner merges feature branch into main
  12. Deploy changes
  13. Repeat 5-12 for the project's lifetime.
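The workflow above can be sketched locally as follows. This is only an illustration under a few assumptions: the repository lives in a throwaway temporary directory, the branch name feature/greeting, the file names and the identity values are all hypothetical, and the Pull Request in steps 10-11 is approximated by a plain local merge since no remote host is involved.

```shell
set -eu

# Throwaway repository in a temporary directory (all names are hypothetical)
repo="$(mktemp -d)"
cd "$repo"

# 1. Initialise the repository and name the initial branch 'main'
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# 2-4. Write a file, stage it and commit it
echo "# My Project" > README.md
git add README.md
git commit -q -m "Initial commit"

# 5. Create a feature branch from main and switch to it
git checkout -q -b feature/greeting

# 6-8. Make a change, stage it and commit it
echo 'echo "Hello"' > greet.sh
git add greet.sh
git commit -q -m "Add greeting script"

# 10-11. Locally, the merge behind a Pull Request looks roughly like this
git checkout -q main
git merge -q --no-ff -m "Merge feature/greeting" feature/greeting

# Inspect the resulting history
git log --oneline --graph
```

The --no-ff flag forces a merge commit even though main has not moved, which makes the feature branch visible in the graph.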

Installing Git

Git is really simple to install. Instructions and binaries can be obtained from its official download page. Select your target platform (Windows, macOS, Linux) to see the various options for installing Git.

Note: On Windows, you will have to go through an install wizard. Customize the installation however you see fit, but you must tick the option for adding Git to your system %PATH% or 'PATH'.

For the purposes of this book, we will assume you are using a Bash-based terminal. This is just the regular shell on Linux and macOS. You can use the Git Bash Shell that installs with Git on Windows.

Common Git Commands

Git is predominantly used through its command line interface (CLI). This means Git operates using a variety of different commands, many of which have their own sub-commands and options. Some of the most commonly used commands and their purpose:

| Command  | Description |
|----------|-------------|
| init     | Initialise a repository |
| clone    | Clone repository from remote host at URL |
| checkout | Checkout to branch or commit |
| branch   | Create branch |
| add      | Add files to commit stage |
| commit   | Commit staged changes |
| merge    | Merge changes from another branch into current branch |
| push     | Push changes to remote repository |
| pull     | Pull changes from remote repository |
| fetch    | Pull changes from remote repository without integrating changes into local repository |
| rebase   | Reapply commits on top of new branch HEAD |
| status   | List the currently modified files and status of remotes |
| tag      | Create or list a tag at the current HEAD |

There are many more commands within Git's CLI. You can view them and their functionality using Git's Manpage. To access the Manpage run the following command.

man git

Creating a Repository

To create a repository you first want to create a new directory. Then you can initialise this directory as the root of your new repo.

# Make a new directory/folder to be the root of your new repo
mkdir project

# Navigate into the root directory
cd project

# Initialise your new repo
git init

Changes, Staging and Commits

Whenever you make changes to the contents of a repository, these changes will be compared as a diff between the working tree and the HEAD of the repository's history. To view which files have changed you can run the git status command. Once you are happy with your changes, you can git add <file> them to the file index for staging. You can specify certain files you want to keep or use . to stage all new changes.

Finally, you can commit your changes. Commits are saved snapshots of a repo's state. When you create a commit, a hash value is generated for it. This hash can be used to jump back to the commit using switch or checkout. Each commit also has a short message associated with it to describe the changes it makes.

# Stage all changes in repo
git add .

# Commit changes
git commit -m "Commit message"
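To see how a file moves through these states, the sketch below records the short-format output of git status before staging, after staging and after committing. It assumes a fresh throwaway repository; the file name notes.txt and the identity values are hypothetical.

```shell
set -eu

# Fresh throwaway repository (hypothetical file and identity)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > notes.txt

# Untracked file: reported with '??'
before="$(git status --short)"
echo "$before"

# Staged new file: reported with 'A' in the first (index) column
git add notes.txt
staged="$(git status --short)"
echo "$staged"

# After committing, the working tree is clean and the output is empty
git commit -q -m "Add notes"
after="$(git status --short)"
echo "clean: [$after]"
```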

Branches, Stashes and Tags

Git has various ways of storing and marking the different states and histories of a repo. The most common is a branch, which serves as a logically separate history for a repository. This allows various changes from multiple people to not conflict with each other. Stashes are temporary stores of changes that can be pushed to or popped from, kind of like a stack of changes. Tags are named markers for commits in a repo's history.

Branches

To create a new branch, you simply use the git branch command with the name of the branch as the argument. This will create a new branch HEAD at the commit you are currently at. You can also pass a commit hash as a second argument to create the branch at that commit. Renaming a branch can be achieved by using the -m or -M (forced) flags and passing the old name (optional) and new name as arguments respectively. You can also delete branches using the -d flag. To switch to a branch we use the checkout command, supplying the branch name as an argument.

# Create a new branch
git branch new-branch

# Switch to the new branch
git checkout new-branch

# Rename the current branch
git branch -M new-branch-2

# Switch back to the main branch
git checkout main

# Delete the previously created branch
git branch -d new-branch-2

Note: You can create a new branch and switch to it using the -b flag with the checkout command.

git checkout -b new-branch
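The paragraph above also mentions creating a branch at a specific commit and renaming with both the old and new name. A minimal sketch of those two forms, assuming a throwaway repository with two commits and hypothetical branch names hotfix and hotfix-1:

```shell
set -eu

# Throwaway repository with two commits (names are hypothetical)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
first="$(git rev-parse HEAD)"

echo "two" >> file.txt
git commit -q -am "Second"

# Create a branch at the first commit rather than at the current HEAD
git branch hotfix "$first"

# Rename it by passing the old name followed by the new name
git branch -m hotfix hotfix-1
hotfix_commit="$(git rev-parse hotfix-1)"

# -d only deletes branches that are already merged; hotfix-1 points at an
# ancestor of main, so the deletion is allowed
git branch -d hotfix-1
```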

Stashes

Stashes store WIP changes that can be reapplied on to the currently checked out commit. Stashes are labelled by a number value starting at 0, indicating the most recent stash. You can pop or drop (delete) a stash from a particular index by providing it as an argument, with the default being the most recent stash. You can also list a repo's stashes using the list sub-command.

# Create a stash
git stash

# Pop the most recent stash
git stash pop

# Drop (delete) the most recent stash
git stash drop

# Stash with a message
git stash -m "Stash 1"
# Some changes
git stash -m "Stash 2"
# Some changes
git stash -m "Stash 3"

# List stashes
git stash list
stash@{0}: On main: Stash 3
stash@{1}: On main: Stash 2
stash@{2}: On main: Stash 1
stash@{3}: WIP on main: cea4a92 current HEAD commit message

git stash pop 1

git stash list
stash@{0}: On main: Stash 3
stash@{1}: On main: Stash 1
stash@{2}: WIP on main: cea4a92 current HEAD commit message

git stash drop 1

git stash list
stash@{0}: On main: Stash 3
stash@{1}: WIP on main: cea4a92 current HEAD commit message

Tags

Tags allow us to name a particular commit so that it can be referenced and checked out later. Tagging is mostly used to mark certain versions of a codebase. Tags can be created using the tag command and providing a label for the tag. As with branches, the -d flag can be used to delete a tag.

# Create tag
git tag v1.0.0

# List tags
git tag
v1.0.0

# Delete tag
git tag -d v1.0.0

Merging

Merging is the process of combining two (or more) histories together. Git finds the common ancestor commit of the branches, combines the changes made on each side since that point and, unless the current branch can simply be fast-forwarded, records the result in a single merge commit. Merging two branches requires that the branches have a common ancestor commit that can be used as the base of the combined histories. To merge branches we use the merge command, supplying the name of the branch we want to merge, which will be merged into the current branch.

# Merge branch 'feature' into your current branch
git merge feature
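As a rough illustration of what a merge produces when both branches have moved on, the sketch below builds two diverging branches in a throwaway repository and merges them. File names and identity values are hypothetical.

```shell
set -eu

# Throwaway repository (hypothetical files and identity)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

# Common ancestor commit
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# One commit on 'feature'
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"

# One commit on 'main', so the histories diverge
git checkout -q main
echo "main" > main.txt
git add main.txt
git commit -q -m "Add main file"

# Merge 'feature' into the current branch (main)
git merge -q -m "Merge branch 'feature'" feature

# The merge commit has two parents, one from each history
parents="$(git cat-file -p HEAD | grep -c '^parent ')"
echo "parents of merge commit: $parents"
git log --oneline --graph
```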

Dealing with Merge Conflicts

Conflicts occur when Git cannot automatically merge two branches together. When this occurs, you will have to manually intervene and decide which changes you want to keep. When you have resolved the conflicts you then stage and commit them in a new commit as you would any other set of changes.

Conflict Markers

Git uses conflict markers in your source to indicate the difference between your local changes and the changes you are integrating in. These markers are:

  • <<<<<<< - Indicates the start of your local difference.
  • ======= - The separator between the local and external differences (ie. the end of the local difference and the start of the external one).
  • >>>>>>> - Indicates the end of the external difference.
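To see the markers in practice, the sketch below deliberately creates a conflict in a throwaway repository and then resolves it. The file config.txt, its contents and the identity values are hypothetical, and the marker layout shown in the comments assumes Git's default conflict style.

```shell
set -eu

# Throwaway repository (hypothetical file and identity)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "colour = red" > config.txt
git add config.txt
git commit -q -m "Base"

# Change the same line differently on two branches
git checkout -q -b feature
echo "colour = blue" > config.txt
git commit -q -am "Use blue"

git checkout -q main
echo "colour = green" > config.txt
git commit -q -am "Use green"

# The merge stops with a conflict; '|| true' stops 'set -e' aborting the script
git merge feature >/dev/null 2>&1 || true

# With the default conflict style the file now contains:
#   <<<<<<< HEAD
#   colour = green
#   =======
#   colour = blue
#   >>>>>>> feature
conflicted="$(cat config.txt)"
echo "$conflicted"

# Resolve by writing the content you want to keep, then stage and commit
echo "colour = blue" > config.txt
git add config.txt
git commit -q -m "Merge feature, keep blue"
```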

Rebasing

Rebasing is a powerful feature of Git that allows you to rewrite the history of a branch. When you create a branch, the commit you branched from becomes the base of your branch's history. However, if changes are then made to the base branch, your branch and the base branch are said to have divergent histories. If you want to integrate the changes from the base into your branch you have two options:

  1. Merge the changes into your branch
  2. Rebase your branch

Merging is the simplest and most obvious solution. It involves simply pulling the changes from the base branch back into your branch as a new merge commit, tying the histories together. This is often what you will want to do but it can clutter the history of your branch if the base branch is very active as you will often have to merge the base back into your branch.

Rebasing, as the name suggests, allows you to rip up the base of your branch and re-root it at the HEAD of the same base branch or even another branch. Your commits are then reapplied on top of the changes from that branch, so those changes become part of your branch's history.

git rebase <new-base-branch>
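A small sketch of the effect, assuming a throwaway repository in which main gains a commit after feature was branched from it (file names and identity values are hypothetical):

```shell
set -eu

# Throwaway repository (hypothetical files and identity)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Base"

# Work on a feature branch
git checkout -q -b feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Feature work"

# Meanwhile main moves on, so the histories diverge
git checkout -q main
echo "main" > main.txt
git add main.txt
git commit -q -m "Main moves on"

# Re-root the feature branch on the tip of main
git checkout -q feature
git rebase -q main

# The history is now linear: main's tip is the parent of the feature commit
git log --oneline --graph
```

Unlike the merge approach, no merge commit is created; the feature commit is rewritten with a new hash on top of main.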

Interactive Rebasing

Git also offers interactive rebasing, which allows you to control exactly how the history is rewritten for the branch. This feature can be enhanced by IDEs (Integrated Development Environments), allowing you to effectively rewrite your progress as you go.

git rebase -i <new-base-branch>
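Interactive rebasing normally opens the list of commits (the 'todo list') in your editor. So that it can run unattended, the sketch below replaces the editor with a sed command via the GIT_SEQUENCE_EDITOR environment variable; that substitution is an assumption made purely for the demo, as are the file name and commit messages. It folds the most recent commit into the one before it using the fixup action.

```shell
set -eu

# Throwaway repository with three small commits (hypothetical content)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"

for n in 1 2 3; do
  echo "line $n" >> notes.txt
  git add notes.txt
  git commit -q -m "Commit $n"
done

# Interactively rebase the last two commits. Instead of a real editor, sed
# rewrites line 2 of the todo list from 'pick' to 'fixup', which melds
# "Commit 3" into "Commit 2" and discards the message of "Commit 3".
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2s/^pick/fixup/'" git rebase -q -i HEAD~2

# Two commits remain, but the file still has all three lines
git log --oneline
```

When run by hand you would simply omit GIT_SEQUENCE_EDITOR and edit the todo list yourself.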

When not to rebase

Rebasing is powerful but can make it difficult to track when upstream/divergent changes are integrated into a branch. In general, it is best not to rebase a branch that is public (eg. rebasing main onto a feature branch), especially if the branch has an upstream remote. Doing so creates divergent histories between your local rebased branch and the remote, which can be extremely difficult to fix and track.

Configuring Git

The properties and behaviour of Git can be customized using special files within your repo. These can control which files Git tracks as well as give certain properties to certain files within a Git repo.

.gitignore

The .gitignore file is used to specify files or file patterns you wish for Git to ignore. File patterns are specified using Unix-based globbing (ie. the use of wildcard patterns).
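One way to check what a set of patterns does is git check-ignore, which exits with status 0 when a path is ignored and 1 when it is not. The sketch below is a small experiment in a throwaway repository; the patterns and file names are hypothetical examples.

```shell
set -eu

# Throwaway repository (hypothetical patterns and files)
repo="$(mktemp -d)"
cd "$repo"
git init -q

# Ignore all .log files except important.log, and everything under build/
printf '%s\n' '*.log' '!important.log' 'build/' > .gitignore

mkdir -p build
touch debug.log important.log build/output.bin main.c

# Report, per path, whether the patterns above cause Git to ignore it
for path in debug.log important.log build/output.bin main.c; do
  if git check-ignore -q "$path"; then
    echo "ignored:     $path"
  else
    echo "not ignored: $path"
  fi
done
```

Running git check-ignore -v with a path additionally prints which line of which file matched, which helps when several ignore files are involved.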

.gitattributes

The .gitattributes file is used to specify the attributes of files or file patterns. One particularly useful case is ensuring the correct End-of-Line (EOL) characters are used in certain files. The default EOL differs between Unix-like systems and Windows, which can lead to some unique bugs and inconsistencies when collaborating with people using different systems. The file also lets you control exactly which files these attributes apply to, ensuring only the right files are affected.

The table below lists common .gitignore patterns. Where a row shows several patterns, each one goes on its own line in the file.

| Pattern | Example matches | Explanation |
|---------|-----------------|-------------|
| **/logs | logs/debug.log, logs/monday/foo.bar, build/logs/debug.log | You can prepend a pattern with a double asterisk to match directories anywhere in the repository. |
| **/logs/debug.log | logs/debug.log, build/logs/debug.log but not logs/build/debug.log | You can also use a double asterisk to match files based on their name and the name of their parent directory. |
| *.log | debug.log, foo.log, .log, logs/debug.log | An asterisk is a wildcard that matches zero or more characters. |
| *.log, !important.log | debug.log, trace.log but not important.log, logs/important.log | Prepending an exclamation mark to a pattern negates it. If a file matches a pattern, but also matches a negating pattern defined later in the file, it will not be ignored. |
| *.log, !important/*.log, trace.* | debug.log, important/trace.log but not important/debug.log | Patterns defined after a negating pattern will re-ignore any previously negated files. |
| /debug.log | debug.log but not logs/debug.log | Prepending a slash matches files only in the repository root. |
| debug.log | debug.log, logs/debug.log | By default, patterns match files in any directory. |
| debug?.log | debug0.log, debugg.log but not debug10.log | A question mark matches exactly one character. |
| debug[0-9].log | debug0.log, debug1.log but not debug10.log | Square brackets can also be used to match a single character from a specified range. |
| debug[01].log | debug0.log, debug1.log but not debug2.log, debug01.log | Square brackets match a single character from the specified set. |
| debug[!01].log | debug2.log but not debug0.log, debug1.log, debug01.log | An exclamation mark can be used to match any character except one from the specified set. |
| debug[a-z].log | debuga.log, debugb.log but not debug1.log | Ranges can be numeric or alphabetic. |
| logs | logs, logs/debug.log, logs/latest/foo.bar, build/logs, build/logs/debug.log | If you don't append a slash, the pattern will match both files and the contents of directories with that name. In the example matches, both directories and files named logs are ignored. |
| logs/ | logs/debug.log, logs/latest/foo.bar, build/logs/foo.bar, build/logs/latest/debug.log | Appending a slash indicates the pattern is a directory. The entire contents of any directory in the repository matching that name, including all of its files and subdirectories, will be ignored. |
| logs/, !logs/important.log | logs/debug.log, logs/important.log | Wait a minute! Shouldn't logs/important.log be negated in this example? Nope! Due to a performance-related quirk in Git, you cannot negate a file that is ignored due to a pattern matching a directory. |
| logs/**/debug.log | logs/debug.log, logs/monday/debug.log, logs/monday/pm/debug.log | A double asterisk matches zero or more directories. |
| logs/*day/debug.log | logs/monday/debug.log, logs/tuesday/debug.log but not logs/latest/debug.log | Wildcards can be used in directory names as well. |
| logs/debug.log | logs/debug.log but not debug.log, build/logs/debug.log | Patterns specifying a file in a particular directory are relative to the repository root. (You can prepend a slash if you like, but it doesn't do anything special.) |

Note:

  • These explanations assume your .gitignore file is in the top level directory of your repository, as is the convention. If your repository has multiple .gitignore files, simply mentally replace "repository root" with "directory containing the .gitignore file" (and consider unifying them, for the sanity of your team).
  • Additionally, lines starting with # are treated as comments, and a \ character can be prepended to a character that usually has a special meaning to escape it (eg. use \[ in the .gitignore to match a literal [ in a file name instead of starting a match group).

Example .gitattributes

# Use the 'line feed' EOL character for all files
*       eol=lf

# Use the 'carriage return & line feed' EOL character for text files
*.txt   eol=crlf
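To check which attributes Git resolves for a given path, git check-attr can be used. The sketch below writes the example above into a throwaway repository and queries the eol attribute for two hypothetical file names; the files do not need to exist for the query to work.

```shell
set -eu

# Throwaway repository containing the example .gitattributes from above
repo="$(mktemp -d)"
cd "$repo"
git init -q
printf '%s\n' '*       eol=lf' '*.txt   eol=crlf' > .gitattributes

# Query the resolved 'eol' attribute for two hypothetical paths.
# When several patterns match, the later line in the file wins.
git check-attr eol -- script.sh notes.txt
# script.sh: eol: lf
# notes.txt: eol: crlf
```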

GitHub

GitHub is a remote Git service. This allows you to store Git repositories online so that individuals and teams can access and work on Git repositories and projects remotely. It offers many features on top of basic version control such as branch, issue and feature tracking, releases, CI/CD pipelines, project management and more. It's predominately used through its website which offers control of these features through a simple GUI. Throughout your time at Monash DeepNeuron, university and probably for the rest of your career (if in a software-based role), you will use services like GitHub to help manage the development of projects.

Your first task is to sign up for a GitHub account, if you haven't already. I would highly recommend using a personal email address (not a university one) as you will most likely want access to your account after university.

GitHub - Join

It is also a good idea to install the GitHub mobile app. This allows you to track and manage projects and reply to messages and Issues from your phone.

Setup

Once you have a GitHub account set up, it is a good idea to link it with your local Git configuration for your system. Open a new shell and run the following commands, filling in your details.

# Set the name that is identifiable for credit when reviewing version history
git config --global user.name "github-username"

# Set an Email address that will be associated with each history marker
git config --global user.email "github-email"

# Set automatic command line coloring for Git for easy reviewing
git config --global color.ui auto

GitHub Personal Access Tokens

To access private repos from GitHub you will need to either set up SSH (below) or generate a personal access token to use as a password for HTTP login. Follow this link and click 'Generate new token' (select the 'classic' token option). Tick every box and name the token something like 'Full Access Token'. This will act as a universal password for accessing your GitHub resources. Make sure to also set it to have no expiration date so it never becomes invalid. Once you generate the token, make sure to copy and store it in a secure location, as you will not be able to view it again.

Note: This is not best practice, as losing or leaking this token will completely expose your account. This method favours convenience over safety. In future, you should read through the options and tick only the permissions a token actually needs.

SSH + Git

Typically you can just use the HTTP protocol to clone or upload repos, however, this is not as secure as using SSH and cloning private repos requires a Personal Access Token. Using SSH is generally far more secure and far more convenient. To do this, we must first install OpenSSH, the process of which can be different for every platform.

Install OpenSSH

Windows

Open PowerShell as Administrator and run the following commands.

# Check if OpenSSH Client and Server packages are already installed
Get-WindowsCapability -Online | Where-Object Name -like 'OpenSSH*'

# Install the OpenSSH Client
Add-WindowsCapability -Online -Name OpenSSH.Client~~~~0.0.1.0

# Install the OpenSSH Server
Add-WindowsCapability -Online -Name OpenSSH.Server~~~~0.0.1.0

Linux (Ubuntu)

sudo apt install openssh-client openssh-server

macOS

# Install OpenSSH with Homebrew
brew install openssh

SSH Keygen and Setup w/ GitHub

Next we generate a key using ssh-keygen. Running the command below will begin the process of generating an SSH key. It will prompt you to enter a file location. Just press enter to use the default location. It will then ask you to enter a passphrase. This is optional, but recommended.

ssh-keygen -t ed25519 -C "your_email@example.com"

Note: It is best to use the ed25519 algorithm. However, on legacy systems this may not be available, so the RSA-4096 algorithm should be used instead.

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

Once you have generated your key, you need to start an SSH agent and add the key.

# Start new SSH agent
eval "$(ssh-agent -s)"

# Add your private key to the agent
ssh-add ~/.ssh/id_ed25519

You will then need to copy the public key to your clipboard. You can print the key using the cat command.

cat ~/.ssh/id_ed25519.pub

Then, go to your GitHub account, go to settings, and click on the SSH and GPG keys tab (or click this link). Click on "New SSH key", and paste the key into the box. Give it a name, and click "Add SSH key". You should now be able to clone repos using SSH. To do this, go to the repo you want to clone, but instead of copying the HTTP link, copy the SSH link. From there it's just regular Git cloning.
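If you would like to try the key generation steps without touching your real ~/.ssh directory, the sketch below writes a key pair into a temporary directory instead. The -f (output file) and -N (passphrase) options are used only so the demo runs without prompts; the empty passphrase and the email address are assumptions for the demo and not a recommendation for a real key.

```shell
set -eu

# Temporary directory so that existing keys in ~/.ssh are left alone
keydir="$(mktemp -d)"

# Generate an ed25519 key pair non-interactively (demo only: empty passphrase)
ssh-keygen -q -t ed25519 -C "your_email@example.com" -f "$keydir/id_ed25519" -N ""

# Two files are produced: the private key and the matching .pub public key
ls "$keydir"

# The public key is the part that is pasted into GitHub
cat "$keydir/id_ed25519.pub"

# Show the key's fingerprint
ssh-keygen -l -f "$keydir/id_ed25519.pub"

# Once your real key has been added to GitHub, the connection can be tested
# with the following command (needs network access, so it is not run here):
#   ssh -T git@github.com
```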

Creating and Cloning a Remote Repository

Creating and cloning a repository on GitHub is super simple. Simply go to github.com and click the green 'New' button on the left panel. This will instruct you on how to create a repository. You will have to give your repo a name, a description (optional) and attach a software license and README.md. Additionally, you can have it generate a .gitignore for the programming language that the source code of the repo will be written in to prevent commonly ignored files (e.g. executables and binaries) from being committed. This will then generate a very boilerplate repo which you can then clone using the clone command via SSH or HTTP.

# Clone with SSH
git clone git@github.com:<username>/<remote-repo-name>.git

# Clone with HTTP
git clone https://github.com/<username>/<remote-repo-name>.git

Uploading a Repository

Alternatively you can create a repository locally and upload it to GitHub at a later date.

Upload Empty Repository

If you want to upload a new empty repo you can do so by first creating the empty repo on GitHub followed by these commands to upload it to the new empty remote, replacing the content between the angle brackets (<>).

mkdir my-awesome-project
cd my-awesome-project
echo "# test" >> README.md
git init
git add README.md
git commit -m "first commit"
git branch -M main

# SSH
git remote add origin git@github.com:<username>/<remote-repo-name>.git
# HTTP
git remote add origin https://github.com/<username>/<remote-repo-name>.git

git push -u origin main

Upload Existing Repository

You can also upload an existing and well established project to GitHub by simply adding the new GitHub repo as a new remote location.

cd my-awesome-project

# SSH
git remote add origin git@github.com:<username>/<remote-repo-name>.git
# HTTP
git remote add origin https://github.com/<username>/<remote-repo-name>.git

git branch -M main
git push -u origin main

Note: A repository can have many remotes. The term origin isn't a special command or 'keyword' in Git but rather just the conventional name for the default remote host location for a repository. You can replace origin with whatever name you like.
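The sketch below illustrates the note about remote names. A local bare repository stands in for the empty GitHub repo (an assumption so the example runs offline), and the remote is called backup rather than origin; the directory names and identity values are hypothetical.

```shell
set -eu

work="$(mktemp -d)"

# A bare repository stands in for the empty remote repo on GitHub
git init -q --bare "$work/remote.git"

# Local project with a single commit on 'main'
git init -q "$work/project"
cd "$work/project"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "# test" > README.md
git add README.md
git commit -q -m "first commit"

# Any name works for a remote; 'backup' is used here instead of 'origin'
git remote add backup "$work/remote.git"
git push -q -u backup main

# List the configured remotes and show which branch 'main' now tracks
git remote -v
git rev-parse --abbrev-ref 'main@{upstream}'
```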

Push, Pull, Fetch

When interacting with remote repos, there are three key operations you will perform frequently. These are pushing your local changes to a remote repo, pulling and integrating new changes from a remote into your local repo and fetching the metadata of available changes.

Push

Pushing is the process of uploading our local changes to a remote location. You can also use the -u flag to set up a new remote reference if the local branch does not exist on the remote yet.

git push <remote-name>
git push -u <remote-name> <new-upstream-branch-name>

Pull

Pulling is the process of integrating remote changes into the current branch. Under the hood, git pull calls git fetch, forwarding all command line arguments from git pull to git fetch. If the current local branch is simply behind the remote's history the local branch will be fast-forwarded by default. If the branch's histories diverge then either git rebase or git merge will be run depending on the configuration of Git. This allows you to remain up to date with the remote as you develop locally if multiple other people are working on the same branch.

git pull <remote-name>

Fetch

Fetching is a very powerful tool. It allows us to pull the updated refs from a remote location without integrating the changes into your local repo meaning that we can view what has changed before our local history is fast-forwarded or merged with the corresponding remote branch.

git fetch <remote-name>
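The difference between fetching and integrating can be seen in the sketch below. It assumes a local bare repository standing in for the remote and two hypothetical working copies, alice and bob; Bob fetches Alice's new commit, inspects it, and only then fast-forwards his branch.

```shell
set -eu

work="$(mktemp -d)"

# A bare repository stands in for the remote host
git init -q --bare "$work/remote.git"

# Alice creates the first commit and pushes it
git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/main
git config user.name "Alice Example"
git config user.email "alice@example.com"
git remote add origin "$work/remote.git"
echo "v1" > file.txt
git add file.txt
git commit -q -m "v1"
git push -q -u origin main

# Bob clones the repository
git clone -q -b main "$work/remote.git" "$work/bob"

# Alice pushes a second commit
echo "v2" >> file.txt
git commit -q -am "v2"
git push -q origin main

# Bob fetches: origin/main is updated, but his local main does not move
cd "$work/bob"
before="$(git rev-parse main)"
git fetch -q origin
after_fetch="$(git rev-parse main)"

# Review what would be integrated before touching the local branch
incoming="$(git rev-list --count main..origin/main)"
git log --oneline main..origin/main

# Integrate once happy; --ff-only refuses anything but a fast-forward
git merge -q --ff-only origin/main
```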

Forks

Forks are a combination of branches and clones of a repo. They allow you to copy a remote repo to another (usually remote) location that is independent of the 'upstream' but still retains a connection to its upstream source. Forks can exist for many reasons. A common one is you wish to extend the functionality of a codebase for your own or your organisation's/company's uses. You may also want to make and test an improvement or extension of the codebase and request the changes be integrated into the upstream source for other users to benefit from the change.

To fork a repository you can simply clone a repository and rename the remote name from origin to upstream and set the repo to track a different remote location as origin. This allows you to sync with the upstream remote but collaborate at origin. Alternatively, you can use GitHub and select the fork button on a repo's '<> Code' page and have GitHub set up the remote fork for you which can then allow you to simply clone the fork to your local machine to start developing on the fork.
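The manual approach described above might look like the following sketch. Two local bare repositories stand in for the original project and for your own hosted copy (an assumption so the example runs offline); all directory names and identity values are hypothetical.

```shell
set -eu

work="$(mktemp -d)"

# Bare repositories standing in for the original project and your own copy
git init -q --bare "$work/original.git"
git init -q --bare "$work/my-fork.git"

# Seed the original project with one commit
git init -q "$work/seed"
cd "$work/seed"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example User"
git config user.email "user@example.com"
echo "# Project" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q "$work/original.git" main

# Clone the original, then repoint the remotes
git clone -q -b main "$work/original.git" "$work/fork"
cd "$work/fork"
git remote rename origin upstream
git remote add origin "$work/my-fork.git"

# Publish to your own copy and track it from now on
git push -q -u origin main

# 'upstream' points at the source, 'origin' at your fork
git remote -v
```

From here, git fetch upstream keeps you in sync with the source while pushes go to origin.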

When forking a repository you have to be wary of the software license the original source operates on as this restricts what you are able to do with the forked software and how you are allowed to publish and distribute it.

Template Repositories

When creating repositories, you have the option of creating the repository as a template repository. This can be useful for creating a skeleton for a project that you wish to distribute but keep disconnected from the upstream source. This option is a checkbox located above the text field where you enter the repo's name. You can then either clone the template to update the template itself or copy the template to use it as the starting point for your new project.

Collaborating

While GitHub can simply be used as a remote host for repos, allowing you to push and pull to and from a central location, the platform has many more features for allowing teams of people to collaborate on many different kinds of codebases and coordinate development with people from across the world.

GitHub flavoured Markdown

A large proportion of text written on GitHub is written in Markdown (like this book's source). Markdown is a markup language which can be used to describe the format and structure of written text, similar to HTML or LaTeX. There are many variations of Markdown which add various functionality to the language. GitHub uses its own version which extends the capabilities expressed in the CommonMark (standard) Markdown specification.

GitHub's extensions to Markdown include some common extensions found across many Markdown parsers, including support for strikethrough text, tables and task list items (checklists). These allow you to express more complex structures with Markdown; however, they are neither particularly interesting nor specific to GitHub. You can see the syntax for most of Markdown in the second and third links above, including these common extensions.

More interesting are the collaboration-based extensions to Markdown that are specific to GitHub. Within any Markdown text on GitHub, you are able to link and refer to Issues, discussions, Pull Requests (PR), commits, people and even organisation teams from any repo/org. This is done by using the characters # and @ as prefixes.

Prefix - #

You may notice that Issues, PRs and discussions have an associated #-number. This is used to reference the item within other items of the same or other repos. The # prefix can be used to create a link to the item in the Markdown of the current item. The Markdown editors on GitHub will even render a scrollable UI component you can use to search by name for the item you want to link when you start typing #. You can even reference items from different repos (as long as they are public) by prefixing the # with the name of the owner (GitHub username) and the repo's name, e.g. MonashDeepNeuron/HPC-Training#1. You can also create a pretty link if you put the link somewhere in a list item.

Prefix - @

Like many social media platforms, the @ prefix is used to mention users and teams, with a user's handle being their username. Like #, typing @ will bring up a list of users and teams you can mention, auto-completing the text if you select an option manually. Teams from organisations are prefixed with the org's name (like cross-repo item links) followed by the team name. For example, the Monash DeepNeuron GitHub organisation has a GitHub team called 'HPC Training' which can be mentioned like so: @MonashDeepNeuron/hpc-training. Mentioning a GitHub team notifies the whole team. You can also use the @ prefix to link specific commits; however, no matter which repo the commit is from, you must prefix the whole thing with the repo owner and name, e.g. MonashDeepNeuron/HPC-Training@767c7f0.

Note: Team names can have spaces, but in links the spaces are replaced with dashes/hyphens (-).
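To see the # and @ forms side by side, here is a small sketch that writes an example comment to a scratch file. The Issue number #1 within the current repo and the username @octocat are assumptions made up for illustration; the other references are the examples used in the text above.

```shell
# Write an example comment body to a scratch file
# (#1 and @octocat are hypothetical values used only for illustration)
comment="$(mktemp)"
cat > "$comment" <<'EOF'
This looks related to #1 in this repo and to
MonashDeepNeuron/HPC-Training#1 in another repo.

@octocat could you take a look? Also pinging @MonashDeepNeuron/hpc-training.

The change was introduced in MonashDeepNeuron/HPC-Training@767c7f0.
EOF

# Print the Markdown so it can be pasted into an Issue, PR or discussion
cat "$comment"
```

When text like this is posted on GitHub, each reference should be rendered as a link, provided the item, user, team or commit it points to exists and is visible to you.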

Collaboration Guidelines and Documents

Within a repo, there are often many common documents that are used to describe the collaboration process and guidelines for an OSS project. These documents are used to outline the behaviour expected of contributors, how to make contributions, code styling guidelines etc. Here are some common documents you may find in a repo and the overarching purpose they serve to contributors and users.

Note:

  • Any kind of file can be used to store this as long as the text is accessible by users and contributors, so often plain text files (*.txt) or Markdown files (*.md) are used. Markdown is often favoured as it has better facilities to describe the structure of the document and can then be rendered on GitHub or externally.
  • It is a convention that the names of these contributor documentation files be capitalised, however, it is not required.
  • None of these documents are strictly necessary for a repo to function, however, they can make it easier for people to start contributing to one.

  • README - This is the front page of your codebase. It often contains a summary of the project, a 'quickstart' tutorial and maybe some examples (all optional).
  • CONTRIBUTING - Describes the steps potential contributors should take to make contributions to the project.
  • CODE_OF_CONDUCT - The expected behaviour of contributors when interacting with each other and users.
  • GUIDELINES - Similar to, and often a combination of, the CODE_OF_CONDUCT and the CONTRIBUTING documents.
  • LICENSE - Details the license the code operates under.
  • INSTALL - Instructions on how to install, setup and use a project, either for development or end-user usage.
  • CITATION - If your work is academic in nature you can use this file to hold the appropriate citation so users of the project or people who took inspiration from the project can properly cite your work (often uses the *.cff extension).
  • ACKNOWLEDGMENTS / AUTHORS - A file listing the contributors, authors, co-authors, owners and/or co-owners of a repository. Can also be used to pay tribute to people who've helped with the project, particularly if the contributions were indirect or undocumented (advice etc.).
  • CHANGELOG - Used to describe the changes from one version to another. Many of these files could be littered throughout your project if it has many independent parts or has had a long lifetime.

Issues

Issues are a fundamental tool for users and collaborators to express faults in a codebase and track the progression to fixing those faults. Issues are like tickets on many general purpose project management tools. A new Issue can be raised by anyone on GitHub (provided the repo is public) to indicate a problem with the current codebase. This can be security bugs, documentation that does not match the source code, unexpected behaviour or even new features users want to request from the project. Issues on GitHub have many additional features that let contributors easily link development with Issue tracking. These additional features and controls can be found on the UI panel component to the right (except the discussion thread).

Discussion Thread

Following the original creation of an Issue and its description, collaborators can reply to the Issue using the discussion thread. This is a history of all actions and comments made on an Issue and can be used to focus discussions about the Issue at the Issue's location. You can also add reactions to an Issue or comment.

Closing an Issue

In the dialog box that is used to craft a comment, there is an option to close the Issue, which marks the Issue as resolved. Alternatively, you can close an Issue as "Won't Fix", meaning the Issue will not be resolved or considered. You can also reopen the Issue if it persists or becomes an Issue again.

Todos

Using a Markdown checklist in the description of an Issue will add a todo meter and item count to the top of the Issue's page as well as to the Issue's row on the 'Issues' tab. The list can be treated like a regular todo list/checklist for anything that needs to be done to complete the Issue; you can tick/untick items as you go and it will be reflected in the UI.
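As a small illustration of the checklist syntax, the sketch below writes an example Issue description to a scratch file and counts the ticked and unticked items, roughly mirroring the meter GitHub shows. The task names are placeholders invented for this example.

```shell
# Write an example Issue description to a scratch file
# (the task names are made up for illustration)
body="$(mktemp)"
cat > "$body" <<'EOF'
Steps needed to resolve this Issue:

- [x] Reproduce the bug
- [ ] Write a failing test
- [ ] Fix the bug
EOF

# Count ticked and unticked items as fixed strings
done_items="$(grep -cF -- '- [x]' "$body")"
open_items="$(grep -cF -- '- [ ]' "$body")"
echo "$done_items of $((done_items + open_items)) tasks complete"
```

Pasting the contents of the file into an Issue's description should give a checklist with one of its three items ticked.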

Assignees

Assignees allow you to add a person you want to work on a particular Issue. You can assign individual users or teams within an organisation.

Labels

Labels are descriptor tags that can make it easier to categorise Issues, allowing us to find or group common Issues together.

Projects

This section allows you to add your Issue to a project board and have it tracked by GitHub automatically, allowing for better integration between a repo's Issues and the general management of the team or organisation.

Milestones

You can make an Issue a part of a milestone's progress.

Development

This section allows you to easily create and track branches linked to the progression of the Issue.

Notifications

Change your notification settings for the Issue.

Participants

View who is/has been involved in an Issue.

Lock conversations

Lock the conversation from further modifications (usually for archival purposes).

Pin Issues

Pin the Issue on the repo's main 'Issues' tab, putting it at the forefront of all collaborators' view.

Convert to Discussion

Convert the Issue, its discussion thread and tags to a dedicated discussion.

Deleting an Issue

Delete the Issue from existence.

Pull Requests

Pull Requests (PRs) are requests from collaborators to pull in their new changes from a feature branch into one of the main development, testing or deployment branches. This allows the maintainers of a project to more carefully view the changes made and consider how they want the changes to be integrated. Maintainers and reviewers can also ask the contributor to make changes to their contribution so that it fits better with the existing codebase. PRs are essential to quality assurance (QA) and to ensuring that breaking changes are not introduced to a codebase unless explicitly allowed. A PR is essentially just a call to git merge with extra steps and oversight of the merge, although a squash merge or rebase can be chosen instead. PRs can also be created as drafts, allowing you to continue to commit changes to the incoming branch without the risk of maintainers or other contributors completing the PR before it is ready.

Note: A squash merge will combine the history of all commits on the incoming branch into a single commit that is then added to the base branch.
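To make the effect of a squash merge concrete, here is a minimal local sketch using plain Git rather than a GitHub PR. The repository, the file notes.txt, the branch name feature and the author identity are all throwaway values assumed for illustration.

```shell
# Build a throwaway repository with a 'main' branch
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main       # name the first branch 'main'
git config user.name "Example Author"       # placeholder identity for the demo
git config user.email "author@example.com"

echo "base" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Make three small commits on a feature branch
git checkout -q -b feature
for n in 1 2 3; do
    echo "change $n" >> notes.txt
    git commit -q -am "Feature step $n"
done

# Squash merge: stage the combined changes on main, then commit them once
git checkout -q main
git merge -q --squash feature
git commit -q -m "Add feature (squashed)"

# Count the commits reachable from each branch
git rev-list --count main
git rev-list --count feature
```

Here main ends up with two commits (the initial commit plus the single squashed commit) while feature still carries its four, which is the trade-off of a squash merge: a tidier base branch history at the cost of the individual steps.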

Discussion Thread

Just like Issues, PRs also have a discussion thread that allows conversations about the PR to be in the same place as the PR.

Closing a PR

Again, like Issues, you can close a PR meaning it cannot be merged without reopening the PR.

Reviewers

This section allows you to assign people you wish to review the changes. Many repos require a PR to have at least one approving review before it can go through. Reviewers look at the diff of the changes and can add comments specific to a line, to an entire file or to the whole PR. Reviews can request that changes be made, blocking the PR until they have been resolved.

Assignees

Assignees allow you to add a person you want to work on merging a particular PR.

Labels

Labels are descriptor tags that can make it easier to categorise PRs, allowing us to find or group common PRs together.

Projects

This section allows you to add your PR to a project board and have it tracked by GitHub automatically allowing better integration between a repo's PRs and the general management of the team or organisation.

Milestones

You can make a PR a part of a milestone's progress.

Development

This section shows the Issues that are linked to the PR. Issues can be linked by creating the branch from the Issue or by using 'Closing Keywords' in the description of the PR.
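Besides the PR description, GitHub also recognises closing keywords such as Closes, Fixes and Resolves in commit messages once the commit reaches the repo's default branch. The sketch below shows one way to write such a message in a throwaway repository; the Issue number #42, the file name and the author identity are hypothetical.

```shell
# Throwaway repository for the demo
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"

echo "fix" > patch.txt
git add patch.txt

# The first -m is the subject line; the second -m becomes the message body.
# '#42' is a made-up Issue number used only for illustration.
git commit -q -m "Handle empty input" -m "Closes #42"

# Show the full commit message
git log -1 --format=%B
```

The same line, Closes #42, could equally be placed in the description of the PR itself.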

Notifications

Change your notification settings for the PR.

Participants

View who is/has been involved in a PR.

Lock conversations

Lock the conversation from further modifications (usually for archival purposes).

Milestones

Milestones are progression trackers that can be used to, well, mark important milestones for a project and link this progress to the resolution of Issues or merging of PRs. Milestones also have an end date associated with them to help keep you on track. To create a Milestone, go to either the 'Issues' or 'Pull requests' tab of the repo and to the top right of the list views you will see a button called 'Milestones'. From this page, you can create a Milestone. Issues and PRs are added to the Milestone from the page of the particular item you want to add. Milestones are good for tracking large goals that span many Issues and PRs. You must go back to the Milestones page and close the Milestone manually to meet its deadline.

Tags and Releases

We saw tags in the Git chapter and how they can be used to create named commit points, often for marking versions of the codebase. GitHub takes this to the next step by allowing you to use tags to create GitHub releases where you can store your packaged versions of the codebase and share what has changed with your users. It can even generate crude changelogs for you based on your commit messages (a good reason to write good commit messages).
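A GitHub release is built on top of a tag that exists on the remote, so the tag has to be pushed first. The following is a minimal sketch of creating and pushing an annotated tag, using a local bare repository as a stand-in for GitHub. The version number v1.0.0, the file name and the author identity are assumptions for illustration.

```shell
# Scratch directory with a bare repository standing in for the GitHub remote
work="$(mktemp -d)"
cd "$work"
git init -q --bare remote.git

# Clone it (Git warns that the repository is empty, which is fine here)
git clone -q "$work/remote.git" project
cd project
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "First release"

# Create an annotated tag marking this commit as version 1.0.0
git tag -a v1.0.0 -m "Version 1.0.0"

# Tags are not pushed with branches by default, so push the tag explicitly
git push -q origin v1.0.0

# Confirm the remote now knows about the tag
git ls-remote --tags origin
```

Once a tag like this exists on GitHub, it can be selected when drafting a release from the repo's releases page.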

Discussions

Discussions are places where you can ask questions, share ideas or have general conversations that aren't necessarily Issues. This can be a great place to discuss ideas or get further information on how to use a project or even collaborate on it. It is also beneficial to the maintainers as it can help surface things missing in the project's official documentation.

Discussions are very much like the Discussion Threads from Issues and PRs. You can comment, react with emojis and even upvote posts and comments. Top-level comments are also individual items which can be replied to directly making it easier to focus on sub-topics generated by comments. On Q&A discussions you can also mark particular comments as the answer like a mini StackOverflow.

Discussions can also be converted into an Issue making it easier to transfer a conversation that was once a Discussion into an Issue if the discussion resulted in an Issue being found or surfaced a desired feature from the community.

Licensing and OSS

When developing any software, the code and resulting binaries that you distribute will have to have some software license governing how the software can be used and redistributed. This is especially important for OSS as your source code, not just built binaries, is available for all to see and use. Being able to control how your source code is used, and being able to limit the amount of liability you hold as the creator from other peoples'/companies'/organisations' use of your software can become very important.

There are many existing OSS licenses, each with their own benefits, drawbacks and use cases. Each allows for the software licensed under them to be used for different things and can also control how derivative work can be distributed and used. Here are a few of the most common OSS licenses with links to copies of them.

  • AGPL-3.0 - GNU Affero General Public License, a free copyleft license similar to GPL-3.0 but has an additional term to allow users who interact with the licensed software over a network to receive the source for that program.
  • Apache-2.0 - This copyright license allows anyone to modify the original source as long as the source retains its original Apache-2.0 license and the modifications are listed in the distribution of the modified source. Authors of Apache-2.0 licensed source cannot be held liable under this license.
  • BSD-3-Clause - Simple copyright license that allows anyone to do whatever they want with BSD 3-Clause licensed software as long as the software retains its original BSD 3-Clause license. It also states that the names of contributors to the original source cannot be used to endorse derivative works without explicit prior permission.
  • BSL-1.0 - Simple copyright license that allows anyone to do whatever they want with BSL-1.0 licensed software as long as the software retains its original BSL-1.0 license unless distributed as a compiled binary.
  • GPL-2.0 - GNU General Public License Version 2, a free copyleft license.
  • GPL-3.0 - GNU General Public License Version 3, a free copyleft license similar to GPL-2.0 but with stricter copyleft requirements.
  • LGPL-3.0 - GNU Lesser General Public License Version 3, a free copyleft license similar to GPL-3.0 but LGPL-3.0 software is able to be used and modified with non-GPL-3.0 licensed source as long as the originally LGPL-3.0 licensed source retains its original license and the LGPL-3.0 source can be replaced with other sources with no effect on the end user's usage of the compiled program.
  • MIT - Simple copyright license that allows anyone to do whatever they want with MIT licensed software as long as the software retains its original MIT license. Authors of MIT-licensed source cannot be held liable under this license.
  • MPL-2.0 - Mozilla Public License Version 2.0 is a middle-ground license that aims to balance the benefits of permissive licenses like MIT and copyleft licenses like GPL-3.0.
  • Unlicense - A license that releases the software into the public domain.

Workflow

On this page, we are going to explore a potential workflow you can use with Git and GitHub when collaborating with multiple people. No, it doesn't have a name, I'm not a marketing guy and it doesn't need one. The focus of this workflow is to allow as-easy-as-possible integration of many different features without breaking the entire system and its deployment by funnelling changes down from feature (or patch) branches into staging branches and finally into the main deployment branch.

Main Deployment Branch

In essence, you have a single branch that is used as the deployment branch. You can of course call this whatever you want but we'll assume this is the main branch. From main you make deployments and use tags to mark releases.

Development and Staging Branches

You also have a mirroring branch that acts as the source of both new feature branches and the staging area to ensure integrating multiple features does not break each other's new functionality or existing functionality. I typically call this dev but again, you are free to call it whatever you want. You can also have multiple staging levels if you want more rigorous filtering down of features. The dev branch is to be kept up to date with main. This should be easy to maintain if you only pull dev changes into main and don't update main directly but you can pull main back into dev if need be.

Creating Feature Branches

From the HEAD of dev, new branches are created for feature development. This ensures that your new feature can benefit from any other new features already pulled into dev and reduces the chance of problems later on when merging the two histories together. If need be, features can be created from any point in the dev history if you do not want to include changes staged from other branches. When a feature is based on an Issue, use GitHub's 'Create branch from Issue' option. Make sure to change the base branch to the dev branch. You can name these branches whatever you want but the GitHub generated name is good enough.
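When you are not starting from an Issue, the same branch layout can be set up from the command line. This is a minimal sketch in a throwaway repository; the branch name feature/login, the file name and the author identity are made up for illustration.

```shell
# Throwaway repository with a 'main' branch and one commit
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main       # name the first branch 'main'
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"
echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Create the dev branch so that it mirrors main
git branch dev

# Start a new feature branch from the HEAD of dev and switch to it
git checkout -q -b feature/login dev

# List the branches; the current one is marked with '*'
git branch --list
```

To branch from an earlier point in dev's history instead, the final argument to git checkout -b can be a commit hash rather than the branch name.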

Merging Features to Development or Staging Branches

When a feature is ready to be integrated, create a PR from the feature branch into the dev branch (or appropriately levelled staging branch if applicable) and request reviews from the relevant people/maintainers. If you are not a maintainer then the reviewers/maintainers will do the rest and should keep you updated on the PR's progress. They may request clarification or revisions to make the changes fit into the existing/upcoming codebase. In the PR's description it is good to include a brief description/list of the changes you have made as well as links to any related Issues, PRs, external resources or anything else relevant to the PR. However, consult the CONTRIBUTING, CODE_OF_CONDUCT and GUIDELINES files for the repo to ensure you adhere to the specific repo's contributing policies, as every repo can be different, even if managed by the same people, company and/or organisation.

Merging Development or Staging Branches into the Main Deployment Branch

If you are a maintainer of a repo you may have to set up a PR for merging upcoming changes into the main branch. If you are maintaining a bigger project then coordinating with other maintainers and contributors about which changes you want to integrate into each release can make it easier to create the PRs and stage as many new changes in dev as is feasible so you can thoroughly test how the changes play together. You can make a single release for each PR into main or you can do multiple PRs if you have a more structured release schedule. Ensure you also have this PR reviewed by others as it can help catch problems before they hit your main branch. You can also branch the HEAD of main and merge a test PR into this new branch to do test deployments before pulling dev into main.
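The whole funnel, from feature branch to dev to main, can be sketched locally with plain merges standing in for the PRs. This is only an approximation of what happens on GitHub, with no reviews involved; the branch name feature/login, the file names, the tag v0.1.0 and the author identity are hypothetical.

```shell
# Throwaway repository with a 'main' branch and one commit
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main       # name the first branch 'main'
git config user.name "Example Author"       # placeholder identity
git config user.email "author@example.com"
echo "app" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Branch dev from main, then a feature branch from dev
git branch dev
git checkout -q -b feature/login dev
echo "login" > login.txt
git add login.txt
git commit -q -m "Add login feature"

# Stand-in for the feature PR: merge the feature into dev
git checkout -q dev
git merge -q --no-ff -m "Merge feature/login into dev" feature/login

# Stand-in for the release PR: merge dev into main and tag the release
git checkout -q main
git merge -q --no-ff -m "Merge dev into main" dev
git tag -a v0.1.0 -m "Version 0.1.0"

# Show how the histories were funnelled together
git log --oneline --graph
```

The --no-ff flag forces a merge commit even where a fast-forward would be possible, which keeps each integration step visible in the history; whether you want that is a matter of team preference.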

Notes

This is not a very thorough workflow but it aims to be as simple as possible while still being effective for teams collaborating together. This workflow does not govern coding practices or commit conventions, nor does it stop bad code from entering a codebase, but it can help set up a skeleton for collaboration between multiple people. Maintainers and collaborators should still make use of Issues, Discussions, Projects etc. to help coordinate and track problems and changes for a repo.

GitHub Pages

🚧 Under Construction 🚧

⚠️ Coming soon! 🏗️

Acknowledgements

This book is part of Monash DeepNeuron's collection of technical information and internal training resources. It is built and maintained internally by members of Monash DeepNeuron.

Authors

Contributors