WHAT? WHAT IS THIS?
Woah, woah, calm down.
Welcome to “Building git: a git tutorial”! This is a series of files, and hopefully videos, that you can use to get introduced to git. Since git is strange to some and challenging to others, I think this set of articles and videos will help bring it down to Earth.
The whole approach to this tutorial, guide, book, videoblog, or whatever it becomes, is meant to be “graspable”. If you spot errors, or think of improvements, check the “Contributing” section.
How was this built?
This is actually just a huge tree written down in Gingko, then exported and post-processed by a small library I built on my own (gingko-to-markdown). After that, there’s a bot I paid for that commits it to my repository daily.
[Table of contents for level 1 only]
You and I have probably seen several git tutorials on the internet. And you may agree with me that they jump too quickly into the advanced stuff, and that they look too complicated on their own.
If they don’t, it’s very likely that they don’t present the whole set of features that git has. And this sucks, because git is a wonderful tool, a Swiss Army knife of development, and just using it to open your beer is cool, but not cool enough.
This tutorial aims to be an easy-to-follow, progressive, light-hearted approach to git that is comprehensible to developers of every level. It’ll start with the basic concepts and later on jump to the real deal of git usage.
About the author
Since we’re on the subject of what I’d do, let me introduce myself.
My name is Juan Diego Raimondi, called “JD” by everyone, and I work for a company called MakingSense as a solutions architect. I help support clients and internal or external projects, doing a variety of things to make sure we create things that make sense.
In case you wondered, yes, the company name itself is a bad pun but we’re that bold and we can afford to do it.
MakingSense both builds products of its own and works with clients in different industries.
I usually do this stuff in my spare time, but if you like it and want me to put more time into it, contact us and tell us “I want JD to do more open source stuff!”.
Now, if you want to contact me personally, you can do so on Twitter.
Enough about me. Let’s talk about you.
What YOU can expect from this tutorial
- YOU can expect a simple-to-follow tutorial starting from very basic concepts
- YOU can expect a guide that’s extensive, but that you can read in pieces and use as a reference
- YOU can expect a light-hearted approach to it all and to have a few laughs
Wow, it’s all you, you, you. Stop it already.
We’re going to start with the basic concepts of VCSs and progressively work our way towards the absolute best approach that we can, ending up with what git actually is.
Note that while the tutorial is named “building git”, we’re not actually going to be building it, at least not literally. (It feels SO dumb to use this title while I’m drafting this and already making excuses.) We’re going to figuratively build our way towards what a VCS should do and once it makes sense, git is going to make that dream a reality.
Finally, you won’t become an expert, and not everything will be perfectly accurate. This is on purpose (yeah, right). The point of the tutorial is to bring git’s complexities down to Earth for you to grasp them.
What you should NOT expect from this tutorial
This is not a list of best practices on git. This is going to simplify its use cases, and even touch on the advanced features, but this is not a comprehensive guide on how to use git for teams or even solo. Different things work for different people and different approaches have different pros and cons. While I’d love to get into them, because I’m as opinionated as someone can be, this is not the place to do it.
This is not a description of git’s internal mechanisms. git is freaking complicated on the inside. Performance, conflict resolution strategies, cryptography, dark magic… everything here makes for the amazingly complex product that git is. We’re going to approach it from a user’s point of view, not from a git-internals developer’s point of view.
This is not a guide to using a particular git GUI. IMO, they assume too much, and they widely differ. This guide is going to base itself on the console usage of git. Once you understand what the console does, all GUIs should be very simple to understand.
I’ll assume that you have some knowledge about computers and some knowledge about code. You don’t need to be really experienced in any programming language, but if seeing code makes your skin crawl, you might not be in the right place.
I’ll assume that you have git installed and it is already in the PATH for your system, which means that you can start up a console, invoke “git” and it will call the git program. If you don’t have it or it doesn’t work that way, check the installing git tutorial.
I’ll be using bash files as code examples, because it’s simple to illustrate the point of the changes with them, and because they can be executed to perform some of the tasks that we want to use too. If you don’t get them that’s fine, there will be some explanation of what they do so you can follow along. Windows users, I’m sorry, but batch files are just awful and Powershell is great but I don’t like the syntax. (Plus, you need like a bazillion things before you can run it, and then policies and… no thanks, this is a tutorial on another thing.)
(Derived from the last point) I’m going to be using the bash shell. Again, there should be nothing crazy here, except for moving around in repositories and what the prompt tells us. This should be pretty simple to follow and you can expect the same from other OSs, with minor differences that you can adjust by changing your configuration.
Ok, let’s start.
[Table of contents from level 2]
What we want from a VCS
We first have to lay down what we expect from a VCS. In case you didn’t know about this, a VCS is a Version Control System. It allows you to maintain different versions of your code, and people use it for progressively tracking changes to what they do, be it code or not. Also, if you didn’t know about it, I’m not angry, just disappointed, son.
VCSs are an essential tool in software development because we mess up. A lot. In order to make sure that the messes are small and controlled, we also need to control the changes that we’re introducing to a system.
What we want is:
- To be able to make one change after the other
- To be able to go back and forth between these changes
- To be able to come up with memorable names for each of these points in time
One change at a time
Imagine that you’re making several changes to your document, your code, your sweater knitting, or whatever it is you’re doing. It is always simpler and less error prone to make your changes one at a time, evaluate how everything is going and continue with the next change.
[image of circles, one after the other, connected with lines, line at the last circle with an arrow that says “you are here”]
In each of these checkpoints and evaluations, if you happen to find that something’s wrong, you don’t need to backtrack through everything that you’ve done so far to find out where you messed up, but rather just from the last checkpoint. This means fewer wasted hours, fewer headaches, and a more predictable use of your time.
Making one change at a time is pretty much what any organized person would do, and these checkpoints come up naturally when you realize that whatever you’ve done so far can be considered one “change” in and of itself.
Now, if there was a way to formalize these checkpoints… maybe committing to what you have done so far…
Going back and forth
Oops! You messed up. And you’ve been messing up for a while, since you did your checkpoints. You did your evaluations, but still something was wrong from the very start. I could have gotten away with it if it wasn’t for you meddling kids and your dog…
[image of circles, one after the other, connected with lines, some of those at the left are filled up in green, some others only have the green border. The last filled circle has an arrow to it saying “last good checkpoint”]
Now, not everything is lost. Our best approach at the moment is to go back as many checkpoints as needed until we reach a state that was still correct. This will allow us to restart from that point without throwing away what has been done since then.
What we need to do now is to check out how the situation was at that point in time, and go from there.
[same as last image of circles, but with a new set of changes branching from the last good known change, with more filling circles]
- “Hey Mike, let’s go back to the good checkpoint.”
- “You mean the last one?”
- “No, the one we found out was good… until we messed up”
- “Yesterday or the day before?”
- “Well… no.. ff..ARRRGHH” explodes
Naming things is useful. It allows us to refer without confusion to what we mean… unless you mess up with names too, and if that’s the case, I really suggest that you check with someone before having a baby.
[image of circles, the last known good change now has a sticker onto it that says “payment code done”, and the last one says “last change”]
You’re getting ahead of yourself
Have you guessed it already? Are you thinking of branches, commits and tags? Yes? Then stop. You’re getting ahead of yourself.
There are a few things we need to introduce first before we get there.
As you may imagine, most of our work will happen locally. That is, on our own computer. Later on we’ll learn how to collaborate with others, but, hey, baby steps.
Well, what are we going to be tracking changes to? It’s very likely that you’re working with files. So let’s track changes in those.
As obvious as this sounds, this is actually a deliberate design limitation of git. There are other kinds of changes that you could be making that don’t live in files. Maybe you’re changing the configuration of your system. Maybe you’re tracking physical objects (why are you here, then?). #needs-more-examples
As such, our VCS (git) is going to operate on files. But operating on all files on the system would be inefficient. Your OS changes a lot of files every time that you do nothing and every time that you do something, so we clearly need to create a better scope for what git will track.
Let’s decide that git will track changes made to a particular directory and the directories inside it. As such, we need to call git and indicate that this is what we want to use as our repository. We’ll call the set of files and folders that we’re going to track with this VCS a repository, and we’ll call the process of starting a new repository from a common folder initializing.
Initializing a repository
If we jump into any directory, we can ask git to initialize it like this:
> git init
Initialized empty Git repository in /home/alpha/building-git/.git/
As you see, git will tell us that it has initialized that repository, and that the folder is now considered a git repository. You can also see that it created a .git directory in it. This is what it will use to store its inner workings, tracking devices and everything else it needs to take care of this folder for you. This means that it will not mess with the existing files in your directory at all.
Similarly, if you want to specify a particular folder to be initialized, you can specify it:
> git init my-repo
Initialized empty Git repository in /home/alpha/building-git/my-repo/.git/
Congratulations! You have already created your first repository and are ready to start tracking changes in it.
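If you want to try this without touching any folder you care about, here is a minimal sketch. It assumes a Unix-like shell with mktemp available; the folder name my-repo is the same illustrative name used above, and the workdir variable is just a helper for this example.

```shell
# Work inside a throwaway directory so nothing important gets touched.
workdir="$(mktemp -d)"
cd "$workdir"

# Initialize a repository in a new sub-folder.
git init my-repo

# The only thing git added is the hidden .git directory.
ls -a my-repo

# Ask git itself whether we are now standing inside a repository.
cd my-repo
git rev-parse --is-inside-work-tree
```

The last command prints true when the current folder belongs to a repository, which is a handy check whenever you are not sure where you are.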
Staging Area / Index
Hold on. Isn’t it a little bit tyrannical if git were to track everything under that directory? I mean, we could have sensitive files, and I’m not talking files with very breakable feelings, but files with information that you don’t want to be stored forever in a repository.
For example, you may be developing an invoicing system and you need API codes to interact with a tax calculator. These API codes are private, and you may be working with other people and sharing the repository with them. In cases like this, you may not want to share those particular files.
So, what we need is a way to tell git what we want to include there, instead of everything.
If you’ve worked in software development or even as a sysadmin, you may have heard of “staging” servers or environments. This is where you get “prepared” to test something out, before you actually commit to putting it in production. Git has the very same concept, and it’s called the staging area. Because of its technical nature, you may sometimes hear it being referred to as the index, but it’s more likely that you’ll see staging area.
What we want to do is to add the stuff that we want to track to our staging area, and if it all looks correct, we can add it to the repository so it is tracked.
How do we add files? Well…
> git add file1.txt
> git add file2.txt
> git status
On branch master
Initial commit
Changes to be committed:
  (use "git rm --cached <file>..." to unstage)
        new file:   file1.txt
        new file:   file2.txt
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        passwords.txt
WOAH WOAH, what was that?
Well, we already introduced the concept of files being tracked, files not being tracked, and a staging area. It’s getting a little bit complex, so we don’t want to make this all work from memory, right?
So, we have a way to check what our current status is, which is appropriately called status. Invoking git status will show a few things:
[image of a typical status result with pointers and numbers that refer to the enumeration below] #needs-contents
- Where you are located in the continuous set of changes that you’re making. (If that place has a name, it shows the name. More on this when we talk about branches and tags.)
- What the staging area contains.
- What files are not currently being tracked.
This makes it pretty easy to check if you accidentally added a file that you shouldn’t have added, or if you repent from all your sins and you want to take a file out of it.
This is the chance for you to check that the staging area is correct.
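Here is a small sketch of that review step, under a few assumptions: it runs in a throwaway repository, the file names match the ones used above, and it uses the short form of status (git status --short), which the text has not introduced. The unstaging command is the one that git status itself suggests in its output.

```shell
# Throwaway repository so the example is safe to run anywhere.
repo="$(mktemp -d)"
cd "$repo"
git init -q

echo "one" > file1.txt
echo "two" > file2.txt
echo "hunter2" > passwords.txt

# Oops: we staged the sensitive file along with the others.
git add file1.txt file2.txt passwords.txt

# Take it back out of the staging area (the file itself stays on disk).
git rm --cached -q passwords.txt

# Short status: "A" means staged as a new file, "??" means untracked.
git status --short
```

After this, only file1.txt and file2.txt are waiting to be committed, and passwords.txt is back to being untracked.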
Not to blow your mind right now (just joking, I totally want to blow your mind right now), but you can actually have different parts of the same file in and out of the staging area. But this is for later, when we talk about interactive-adding and chunk adding.
At this point you have added and removed files from your staging area, and you’re ready to commit to those changes. Each commit will represent a snapshot of your repository, and if you lay down the snapshots side by side (which git totally does), you can also think of commits as what changes were performed since the last commit.
In order to commit the files that you have added to the staging area, you just run:
git commit
By default, git will open a text editor to ask for a message to be displayed.
[image of a text editor prompting for a commit message] #needs-contents
You’ll also be able to review several of the details for the commit, and abort the operation if you want. Enter your commit message, save the file in the editor that git opened, and when you close it, git will do its magic.
[image of what git displays after creating a commit] #needs-contents
The rest of the development process for working on a feature is a repetition of the last steps.
- Make changes
- Reach a checkpoint, verify
- Add to staging area
- Commit
- Rinse and repeat
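The loop above can be sketched as a short script. It is only an illustration: the file name notes.txt, the commit messages and the identity are made up, and it passes the message with -m (the same flag used later in this tutorial) so that no editor opens.

```shell
# Throwaway repository with a local identity, so commits work anywhere.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# 1. Make changes
echo "first draft" > notes.txt
# 2. Reach a checkpoint, verify
git status --short
# 3. Add to staging area
git add notes.txt
# 4. Commit (-m supplies the message inline instead of opening an editor)
git commit -q -m "Add first draft of notes"

# Rinse and repeat
echo "second thoughts" >> notes.txt
git add notes.txt
git commit -q -m "Add second thoughts"

# One line per checkpoint, newest first.
git log --oneline
```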
Quick adding: adding all
You’ll find quite often that you have a pretty good knowledge of what your changes are, so you don’t want to add files to the staging area to review them. You just know that you want to add all pending changes to a commit.
Git has you covered.
git commit -a
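With this option, you can skip the staging step: git stages every modified or deleted file that it is already tracking and commits them in one go. Brand new (untracked) files are not included; those still need a git add first.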
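One detail worth seeing with your own eyes is what git commit -a does with files git has never heard of. A minimal sketch, assuming a throwaway repository and made-up file names:

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Start with one tracked file.
echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Track a file"

# Change the tracked file and create a brand new one.
echo "v2" >> tracked.txt
echo "secret" > untracked.txt

# -a picks up the change to tracked.txt, but not the new file.
git commit -q -a -m "Commit all tracked changes"

# untracked.txt is still listed as untracked ("??").
git status --short
```

So the shortcut is safe for files git already knows about, and new files never sneak into a commit through it.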
Commits are pretty much the central part of git. The bread and butter of this toast that is this VCS. Learn to love them. Put some thought into how you divide your work and how you signal what has been done in them. There will be a lot of pleasure from making your work easy for yourself, or a lot of pain and suffering for making your work difficult.
Let’s say that your environment has files that should not be committed to the repository. Files that every developer keeps their own version of, because, for example, they contain local configurations, or usernames and passwords, or anything that you don’t want shared in your project history. This is especially useful if your project is open source, because while you want to share your code, you don’t want to share your credentials and accounts. (You’re not so open, are you now?)
While the usual way of dealing with this is using environment configuration and having your servers do the work, that configuration itself could be anything, from registry entries, to a database, to local files.
In the case of files local to the project, you guessed it, git will tell you that they are untracked. And if you stage everything in one go with a command like git add -A, those will go along as well.
Finally, you may have a project build, or compilation, or transpiled files that you don’t want to go along with the source code. And this is perfectly fine and understandable. No, really, you don’t need to explain it. I get you.
Git allows you to ignore files that match a pattern. Listing files one by one would be really tedious, but patterns allow for wildcards and the like. It is really not that difficult.
In order to do this, just create or edit the .gitignore file in the root directory of the repository. Actually, you can have several of these files in different directories; each one applies to its own directory and everything below it, on top of the rules coming from the directories above.
To keep it simple, let’s go with just one:
# 1
thumbs.db
# 2
/config/passwords.txt
# 3
*.tmp
# 4
/dist/*
# 5
!/dist/builder.js
What this previous example is saying is:
- Ignore the thumbs.db file, wherever it is (because we did not specify a directory).
- Ignore the file passwords.txt that is present in the config directory. Any other passwords.txt file will not be ignored.
- Ignore any file with the extension .tmp.
- Ignore the full contents of the directory dist.
- Make an exception for the file builder.js in the dist directory, so do NOT ignore that one.
If you happen to know what a glob is, you should be on your way. If not, Wikipedia has a nice quick refresher.
After crafting your file, just save it and add it to the index. Yes, .gitignore is a file inside the repository, so it needs to be committed too. While this sounds strange, it is actually great: it means that different points in time can ignore a different set of files. One caveat: ignore rules only affect files that git isn’t tracking yet. If you commit a base configuration and add it to .gitignore afterwards, git keeps tracking changes to it; to stop that, you have to take it out of the index first (with git rm --cached, the same command git status suggested earlier).
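To check that a set of ignore rules does what you expect, you can ask git which files it would still offer for tracking. The sketch below assumes a throwaway repository and made-up files (bundle.js, scratch.tmp and the docs folder are only for illustration). Two details are deliberate: comments sit on their own lines, because git only treats a # at the start of a line as a comment, and the dist rule is written as /dist/* because git cannot re-include a file when its whole parent directory is excluded.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q

# The ignore rules, one pattern per line.
cat > .gitignore <<'EOF'
# comments go on their own line
thumbs.db
/config/passwords.txt
*.tmp
/dist/*
!/dist/builder.js
EOF

# A few files to test the rules against.
mkdir -p config dist docs
touch thumbs.db docs/thumbs.db
touch config/passwords.txt docs/passwords.txt
touch scratch.tmp dist/bundle.js dist/builder.js

# List untracked files that are NOT ignored.
git ls-files --others --exclude-standard
```

Only .gitignore itself, dist/builder.js and docs/passwords.txt survive the filter; everything else is ignored.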
Ahh, I see you there, already committing like a crazy fella. Change after change, making your way to completion.
But life is not linear, is it? Sometimes, your work can deviate in different paths that you want to try before really knowing which of the paths you took is final. Maybe you need a fancy name to refer to the work that you’re doing. Or maybe someone else needs to work on the same code base, but on a different set of changes than you.
What you want is a level of isolation, and a pointer, some kind of marker that can be referred by name.
This last point is important: if you look at commits in some detail, you’ll see that they can be identified by their hash. But remembering hashes is a pain, especially when they have thirty thousand and twelve digits. (Well, maybe not that many.)
Enter the tag. You can imagine tags as just that: fancy names or markers that you can give to commits.
Creating tags is as easy as any other command we have seen so far:
git tag v0.1
That’s it! Now you have a marker indicating that this commit, with all the changes included in it, is what you call “v0.1”.
They don’t need to be versions at all. They can be releases, they can be features being completed, they can be an important point in your project development life, or they can be the maiden name of your mother.
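A quick way to convince yourself that a tag is only a name for a commit is to ask git what each name resolves to. This is a sketch in a throwaway repository; the second tag name is made up, and git rev-parse is a command this tutorial has not covered, used here only to print the hash behind a name.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > file1.txt
git add file1.txt
git commit -q -m "Added file 1"

# Two different names for the very same commit.
git tag v0.1
git tag payment-code-done

# All three lines print the same hash.
git rev-parse v0.1
git rev-parse payment-code-done
git rev-parse HEAD
```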
If you wish to see the list of tags that are present in your repository, you just need to write:
git tag
As you know by now, a tag is pretty much just that, a tag: a particular note, an indicator, something stuck on that particular commit. For some operations, you will need something more reliable, something that actually has special information about the state of the repository at that point.
Enter the scene: annotated tags.
For example, GitHub uses annotated tags to differentiate releases in a particular repository.
Annotated tags have everything that a regular tag has, plus:
- checksum of the contents
- tag author name, email
- tag date
- a tagging message
- optionally, a signature verifiable with GPG
So you can see why for particular purposes you’d prefer an annotated tag.
Creating one is as simple as it gets:
git tag -a v0.1 -m "First test version"
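To see the difference between the two kinds of tag, you can ask git what type of object each name refers to. A minimal sketch in a throwaway repository; the tag name light-v0.1 is made up for the comparison, and git cat-file and git for-each-ref are inspection commands outside the scope of this tutorial.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > file1.txt
git add file1.txt
git commit -q -m "Added file 1"

# One regular tag and one annotated tag on the same commit.
git tag light-v0.1
git tag -a v0.1 -m "First test version"

# A regular tag is just a name for the commit...
git cat-file -t light-v0.1   # prints: commit
# ...while an annotated tag is an object of its own.
git cat-file -t v0.1         # prints: tag

# The annotated tag carries its own author; the regular one has none.
git for-each-ref --format='%(refname:short) %(objecttype) %(taggername)' refs/tags
```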
Checking out different points in time
“Ok, so that’s cool”, you say. “I have a bunch of commits, which represent a bunch of changes in time. I also have tags which give them fancy names. But this is not very useful if I can’t move around.”
And you’re right, my friend. Moving around is one of the easiest things that you can do in git. And since you want to “check out” what a particular snapshot (commit) looked like, that’s all that it takes to go to it.
> git checkout v0.1
Note: checking out 'v0.1'.
(a bunch of text on what you can do)
HEAD is now at 091b0ed... Added files 1 and 2
What you’re seeing here is pretty much what you would expect. Git tells you “I’m moving here” and then it tells you “I’m here”. It tells you a portion of the commit hash and it shows you the first line of the commit message.
It tells you that HEAD (the pointer to where you are) is at a particular number. What is that number?
Well, that number is the hash of the commit. The tag took us straight to that commit because, if you remember, tags are nothing more than a fancy name for one of these snapshots.
You don’t need to check out a particular tag. Anything that points to a snapshot will work.
Using the hash value of the commit:
> git checkout 091b0ede152bf754f28cf34d17f32990c13e7b4d
HEAD is now at 091b0ed... Added files 1 and 2
Using just a portion of it:
> git checkout 091b0ed
HEAD is now at 091b0ed... Added files 1 and 2
> git checkout 091b0ede152bf
HEAD is now at 091b0ed... Added files 1 and 2
> git checkout v0.1
HEAD is now at 091b0ed... Added files 1 and 2
The text that you saw before will make more sense when we talk about branches, HEAD and the detached HEAD mode.
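If you want to verify that all of these names really land on the same snapshot, here is a sketch you can run in a throwaway repository. The hashes will differ from the ones shown above, so the script looks them up with git rev-parse instead of hard-coding them; the variable names full and short are just helpers for this example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "one" > file1.txt
echo "two" > file2.txt
git add file1.txt file2.txt
git commit -q -m "Added files 1 and 2"
git tag v0.1

echo "change" >> file1.txt
git commit -q -a -m "Change 1"

# Three ways of naming the tagged commit.
full="$(git rev-parse v0.1)"
short="$(git rev-parse --short v0.1)"

# Check each one out and print where HEAD ended up.
for name in v0.1 "$full" "$short"; do
    git checkout -q "$name"
    git rev-parse HEAD
done
```

The loop prints the same hash three times: the tag, the full hash and the shortened hash are interchangeable.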
Now that you’re comfortable tagging commits with names, you’ve probably seen a problem: you don’t have a way to refer to an in-progress set of changes. For example, you may tag a particular commit as “payment-code-finished” or “v0.1”. But how do you indicate that the set of commits that you’re working on is “new-version” or “cleaning-up-code” or “ongoing-development”?
This is what branches are for. Branches are just like tags in the sense that they are references to commits, but branches can move around.
This is how it works: where you are right now is called HEAD. HEAD is pretty much like the head of a printer: it’s the point of reference of the “now”, the “where we are” and the “where we can jam up the paper”.
Wherever you move, HEAD goes with you. So, when you make a commit, HEAD points to the new commit.
Tags point to a commit and stay there. Branches point to a commit too, but the branch you’re standing on moves along with HEAD every time you commit, until you switch to something else. Let me show you.
But first, let’s create a branch:
$ git checkout -b develop
$ git status
On branch develop
...
$ echo "new line" >> file2.txt
$ git add file2.txt
$ git commit -m "Added new line"
[develop 5d0a91a] Added new line
$ git status
On branch develop
...
Note that git checkout -b creates the branch and switches to it in one step; git branch develop on its own would create the branch but leave us standing where we were. So, as you can see, as long as we’re “standing” on a branch, making commits will move the branch as well, which means that the branch follows our “HEAD” position.
Let’s walk step by step into what happened after executing each command:
$ git checkout -b develop
[image, branch named develop pointing to a specific commit, a signaler indicating that our HEAD is there too]
$ echo "new line" >> file2.txt
[same image as before, but now indicating that there is something in a “working area”, mark it in a different color to indicate that it is not a commit]
$ git add file2.txt
[same image as before, but now the working area is colored in a different way and it is labeled staging area]
$ git commit -m "Added new line"
[same image as before, but without working area or staging area, now there is a new commit and it is labeled with the message. The branch is now pointing to it and the HEAD is now pointing to it.]
I’ve skipped over the git status commands because they don’t perform any change on the structure of the tree.
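The difference between a tag and a branch can be checked with a few lines. This sketch runs in a throwaway repository and records where develop points before and after a commit; the variable names before and after are only helpers for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "line" > file2.txt
git add file2.txt
git commit -q -m "Added files 1 and 2"

# A tag and a branch, both starting on the same commit.
git tag v0.1
git checkout -q -b develop
before="$(git rev-parse develop)"

# Commit while standing on develop.
echo "new line" >> file2.txt
git add file2.txt
git commit -q -m "Added new line"
after="$(git rev-parse develop)"

# The branch followed us; the tag stayed put.
echo "develop moved from $before to $after"
echo "v0.1 is still at   $(git rev-parse v0.1)"
```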
Detached head mode
Now, you’ve seen how the branch that we’re in will follow around our HEAD pointer. However, it is possible to override this behavior if we were to not use a branch. For example, let’s continue from the very same commit that we just did, and we’ll make another commit where the develop branch won’t be affected.
$ git checkout 5d0a91a
You are in 'detached HEAD' state.
...
$ echo "new line 2" >> file2.txt
$ git commit -a -m "Added another new line"
[detached HEAD ed29c4b] Added another new line
$ git log --oneline
ed29c4b Added another new line
5d0a91a Added new line
c354288 Change 1
091b0ed Added files 1 and 2
$ git checkout develop
Warning: you are leaving 1 commit behind, not connected to any of your branches:
  ed29c4b Added another new line
If you want to keep them by creating a new branch, this may be a good time to do so with:
  git branch new_branch_name ed29c4b
$ git log --oneline
5d0a91a Added new line
c354288 Change 1
091b0ed Added files 1 and 2
As you can see, we went “back” to the develop branch, and that meant leaving our commit abandoned. Since nothing is pointing to this commit, eventually git’s internal garbage collector will pick it up and remove it altogether. Git warns us about this and even gives us some instructions on how to create a branch that points to it.
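Here is a sketch of that rescue in a throwaway repository. The branch name rescued is made up, and the orphan variable simply remembers the hash of the commit made in detached HEAD mode so we can point a branch at it afterwards.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "line" > file2.txt
git add file2.txt
git commit -q -m "Added new line"
git checkout -q -b develop

# Check out the commit itself instead of the branch: detached HEAD.
git checkout -q "$(git rev-parse develop)"
echo "new line 2" >> file2.txt
git commit -q -a -m "Added another new line"
orphan="$(git rev-parse HEAD)"

# Going back to develop leaves that commit without a name...
git checkout -q develop

# ...so we give it one before the garbage collector gets any ideas.
git branch rescued "$orphan"
git log --oneline rescued
```

The develop branch never moved, and the new rescued branch now keeps the extra commit reachable.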
At some point you may want to suspend your work and go back to the latest commit. Or switch to another branch. Whatever the reason, you want to leave your current changes behind, but you don’t want to lose them, and you don’t want to commit them either. Maybe you were just experimenting. Maybe it’s a work in progress. The world may never know.
Whatever the reason, git has you covered.
The stash is yet another storage method that git contains, this one being purely local (so you won’t be able to push it to a remote repository).
All you need to do to save your work to the stash is
git stash
And when you’re ready to come back to it:
git stash pop
A fun trick is that you can even pop the stash when you’re not in the same commit where you saved it (pushed it). This means you can move your current work around without having to commit it.
git stash
git checkout another-branch
git stash pop
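The same trick as a script you can run end to end. It is a sketch in a throwaway repository; another-branch is the illustrative name from above and the file contents are made up.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "base" > file1.txt
git add file1.txt
git commit -q -m "Base commit"
git branch another-branch

# Start some work that we are not ready to commit.
echo "work in progress" >> file1.txt

# Put it away: the working directory is clean again.
git stash
git status --short

# Move somewhere else and bring the work back.
git checkout -q another-branch
git stash pop

# The uncommitted change is back, now on another-branch.
git status --short
```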
You know what conflicts are, right? No? I’ll bash your head and we’ll see.
Whenever you switch to another branch, or apply changes from the stash (or from other places, as we’ll see), you may run into conflicts. But what are conflicts in the first place?
A conflict is what happens when git tries to apply two different changes to the same place in the same file at the same time.
Follow this set of commands as an exercise.
# our base file to work with
git checkout -b baseBranch
echo "Line 1" >> myFile.txt
echo "Line 2" >> myFile.txt
git add myFile.txt
git commit -m "Base commit"
# first set of changes from base
# (s/old/new/ replaces text; -i edits the file in place, GNU sed syntax)
git checkout -b branch1
sed -i 's/Line 2/Line 2 Modified 1/' myFile.txt
git add myFile.txt
git commit -m "Commit 1"
# second set of changes from base
git checkout baseBranch
git checkout -b branch2
sed -i 's/Line 2/Line 2 Modified 2/' myFile.txt
git add myFile.txt
git commit -m "Commit 2"
# combine those changes -- uh oh
git merge branch1
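If you would rather watch the conflict happen and then back out safely, here is a self-contained sketch of the same exercise. It makes a few assumptions: it runs in a throwaway repository, it rewrites the file with printf instead of sed so it behaves the same on every system, and it uses git merge --abort, which this tutorial has not covered yet, to return to the state before the merge. The variables base and conflicted are helpers for the example.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Base commit shared by both branches.
printf 'Line 1\nLine 2\n' > myFile.txt
git add myFile.txt
git commit -q -m "Base commit"
base="$(git rev-parse HEAD)"

# First set of changes from base.
git checkout -q -b branch1
printf 'Line 1\nLine 2 Modified 1\n' > myFile.txt
git commit -q -a -m "Commit 1"

# Second set of changes, also starting from base.
git checkout -q -b branch2 "$base"
printf 'Line 1\nLine 2 Modified 2\n' > myFile.txt
git commit -q -a -m "Commit 2"

# git merge exits with an error when it hits a conflict.
conflicted=""
if ! git merge branch1; then
    # Files that still have unresolved conflicts.
    conflicted="$(git diff --name-only --diff-filter=U)"
    echo "Conflicted files: $conflicted"
    # git leaves both versions in the file, separated by markers.
    cat myFile.txt
    # Give up on the merge and go back to how things were.
    git merge --abort
fi
```

After the abort, branch2 is exactly as it was before the merge attempt, so nothing is lost by trying.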
Pushing, fetching and pulling
- force pushing
- fetching strategies
Global, local and effective configuration.
Getting gitty with it
Chunk adding / interactive add
- Bare repositories
- Reference log
References and links
I don’t intend this tutorial to be taken very seriously, but I do intend it to be accurate. If you find inconsistencies or errors, or you want to suggest improvements, please do so. This full tutorial, its texts and videos are going to be available as a GitHub repository.
If you do want to contribute, there are several ways in which you can reach me (Twitter, Email, Facebook, LinkedIn, …), but I suggest that the best approach would be to actually visit the repository and open a ticket or a pull request.
This work is shared under the CC BY-NC-SA 4.0 license, which basically means you can do whatever you want with it as long as:
- You don’t commercialize it
- You don’t remove the attribution to the creators
- You share anything you build from it under the same license
Note that these are still possibilities, but the difference is that for it to be legal, you need to request permission from me. No biggie, I’m a reasonable guy, y’know? I’ll probably say yes unless you’re printing copies for the mafia (and you’re not giving me a good share).