Git vs TFS - Source Control
07/02/2013
[This is a somewhat nerdy and abbreviated overview of today's Dirigo Tech topic. We use TFS, Git and Google Code. We’re constantly looking for new and better ways to develop. The discussion was lively.]
Let’s take a look at the differences between a distributed version control system and a centralized one. I want to make sure that we all understand them: the two systems may seem similar at first glance, but they operate quite differently, and the differences are substantial.
Here’s the basic gist that I think we can all agree upon: Microsoft's Team Foundation Server (TFS) highly discourages branching, while Git/DVCS highly encourages it.
Here are three underlying reasons, which I hope will shed some light on why TFS discourages branching and why branching in general can be a huge asset in other systems.
- TFS is file-system based and not changeset-based. When you create a branch in TFS you are creating a copy of the entire directory, so creating branches is very expensive across multiple resources. Creating 10 branches from production is essentially copying that folder 10 times.
- Since the system is file-system based and not changeset-based, diffs are made using file-system properties. This is why you have to “check out” a file before it can be committed. If you overwrite that file without checking it out, TFS will not be able to detect a change.
- TFS does not support 3-way merges, but instead relies on the concept of “baseless” merges. TFS cannot show you a common ancestor for the branches you are trying to merge, which potentially leaves you resolving a lot of conflicts by hand. The general idea is to store file systems and prevent users from working on the same file at the same time – hence check-in/check-out and the read-only attribute. Most of this is carried over from Visual SourceSafe, Microsoft’s earlier versioning system and TFS’s predecessor.
Git / DVCS:
- Git is a Distributed Version Control System (DVCS). Changes are distributed between the users. Creating a branch is extremely quick and cheap, with very little overhead. Branching and merging are considered part of the daily workflow, and in Git merge conflicts can be resolved easily due to its concept of base merging (I’ll explain this in a bit). File contents are stored as binary objects (blobs) that can be compared, so data is never duplicated across branches. If you create 10 branches from trunk, you’re just referencing a snapshot, so the branches take up bytes of data instead of duplicating the data over and over.
- Since Git is changeset-based, diffs and file comparisons are initially based on SHA-1 hashes. So if you make a file system change, Git will detect the hash difference and mark the file as changed. When checked in, only the content that changed actually goes up. Think of it like a snapshot of the data and then the differences on top. This is an important, fundamental concept to understand.
- Git supports the concept of “rebasing”, or pulling in changes from trunk or even multiple sources. TFS does not rebase, since it simply stores its changes as duplicate files on a server. So you will not rebase in TFS; that was a concept I had incorrect in our meeting this morning.
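To make the hash-based comparison above concrete, here is a minimal sketch of how Git derives the ID of a file's content: a SHA-1 over a short `blob <size>` header followed by the bytes. The footer text, the `object_store` dictionary and the `feature-N` branch names are made up for illustration. In real Git a branch points at a commit ID rather than a blob ID, so the last part is only a stand-in for the idea that a branch is a small pointer and not a copy.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a file's content (a blob)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Hypothetical footer file, before and after a one-character edit.
before = b"Copyright Dirigo\n"
after = b"Copyright Dirigo.\n"

# Identical content always hashes to the same ID, so it is stored only once.
object_store = {}
for content in (before, before, after):
    object_store[git_blob_id(content)] = content

# Stand-in for branches: each one is just a name pointing at an ID.
# (Real Git branches point at commit IDs, not blob IDs.)
branches = {f"feature-{n}": git_blob_id(before) for n in range(10)}

print(git_blob_id(before) != git_blob_id(after))  # True: the edit changes the ID
print(len(object_store))                           # 2: distinct contents stored
print(len(set(branches.values())))                 # 1: ten names, one shared ID
```

The same ID scheme is what `git hash-object` prints for a file, so the function can be checked against a real repository.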
In Git, if you have a base (typically trunk), Git will automatically find the common ancestor (the base) and merge only the changes. This can be extremely powerful. For example, you can merge multiple branches or feature sets into or out of other branches. You can also cherry-pick changesets out of branches, into branches, etc. It allows for a lot of flexibility that I don’t believe TFS affords. If you wanted to pull a change out in TFS, you would have to completely revert to that check-in (essentially a file/folder copy), and from there I’m not sure where you would go. I guess you would just have to fix the problem and try to merge the later commits back in – it gets weird.
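The role of the common ancestor can be sketched with a toy file-level three-way merge. This is not Git's actual merge code (Git also merges within a file, line by line). It only shows the decision rule under the assumption that each snapshot is a plain mapping from path to content. The function name and the sample files are hypothetical.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two snapshots (path -> content) using their common ancestor."""
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree, or neither touched the file
            result = o
        elif o == b:      # only their side changed it
            result = t
        elif t == b:      # only our side changed it
            result = o
        else:             # both sides changed it in different ways
            conflicts.append(path)
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"footer.html": "Copyright Dirigo", "index.html": "Hello"}
ours = {"footer.html": "Copyright Dirigo.", "index.html": "Hello"}
theirs = {"footer.html": "Copyright Dirigo", "index.html": "Hello, world",
          "about.html": "About us"}

merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)
print(conflicts)
```

Without `base`, the merge could not tell which side changed `footer.html` and which side left it alone, so every difference would have to be treated as a conflict.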
How changes are stored in Git / TFS
Let’s take a simple example of how TFS and Git would handle a simple text edit change. Let’s say I need to add a ‘.’ to the footer of my website. When I check those changes in, two variations can happen:
- In TFS, the entire file is checked in and can be diffed against prior releases. This is a simple file-based diff, line for line, against prior copies on the TFS server.
- In Git, you are only committing the additional ‘.’ – not the entire file. Under the hood the commit is saying something like “at line 219, column 24, a period was added after the last word of the paragraph” (I’m simplifying – it’s much smarter than that in practice!). If someone wants to revert this particular change, they are actually reverting the added ‘.’, not comparing files, and since there is a common ancestor (the trunk) to compare against, branching, rebasing, cherry-picking, etc. are straightforward for the system to handle automatically for you.
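As a rough illustration of what "only the added period" looks like, the sketch below uses Python's standard `difflib` to produce a unified diff, the same format `git diff` displays. The footer markup and the file names are invented for the example, and it says nothing about how either system stores the change internally.

```python
import difflib

# Hypothetical footer, before and after adding the period.
old = ["<footer>", "  <p>Thanks for visiting</p>", "</footer>"]
new = ["<footer>", "  <p>Thanks for visiting.</p>", "</footer>"]

diff = list(difflib.unified_diff(
    old, new,
    fromfile="a/footer.html", tofile="b/footer.html",
    lineterm="",
))
print("\n".join(diff))

# Keep only the changed lines, skipping the ---/+++ file headers.
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print(changed)
```

The change reduces to one removed line and one added line. Everything else in the file is context, and it is this small delta that a revert or cherry-pick has to apply.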
So Dirigo is not going to switch wholesale to Git or SVN because of legacy systems, and since TFS has branching limitations we will ultimately end up using TFS with a mostly-branchless paradigm. Git is known to be friendlier with branching, but we shouldn’t try to force a Git model on a system that isn’t going to support it. There is a well-thought-out discussion here: http://stackoverflow.com/questions/4415127/git-vs-team-foundation-server
I want to be sure that everyone understands that branching and merging are often a good thing and not a scary, complex beast, at least not inherently. And while the TFS branchless model may be simpler initially, that simplicity comes at the cost of flexible branching and merging, which I personally believe should be a fundamental concept of any versioning system. Yes, we can still branch in TFS, but if we understand what’s happening under the hood, we can understand why it may be better not to.
TFS has served Dirigo well without any issues for more than three years. We manage around 50 million lines of code in TFS. It doesn’t get much simpler than a single release branch that merges back into the trunk. Assuming the trunk is development, merges are really easy, and we remain agile and able to quickly service clients while steaming forward with development.