Medium 9780596520120

Version Control with Git


Version Control with Git takes you step-by-step through ways to track, merge, and manage software projects, using this highly flexible, open source version control system.

Git permits a virtually infinite variety of methods for development and collaboration. Created by Linus Torvalds to manage development of the Linux kernel, it's become the principal tool for distributed version control. But Git's flexibility also means that some users don't understand how to use it to their best advantage. Version Control with Git offers tutorials on the most effective ways to use it, as well as friendly yet rigorous advice to help you navigate Git's many functions.

With this book, you will:

  • Learn how to use Git in several real-world development environments
  • Gain insight into Git's common use cases, initial tasks, and basic functions
  • Understand how to use Git for both centralized and distributed version control
  • Use Git to manage patches, diffs, merges, and conflicts
  • Acquire advanced techniques such as rebasing, hooks, and ways to handle submodules (subprojects)
  • Learn how to use Git with Subversion

Git has earned the respect of developers around the world. Find out how you can benefit from this amazing tool with Version Control with Git.

List price: $27.99

Your Price: $22.39

You Save: 20%


16 Slices


1. Introduction


No cautious, creative person starts a project nowadays without a back-up strategy. Because data is ephemeral and can be lost easily (through an errant code change or a catastrophic disk crash, say), it is wise to maintain a living archive of all work.

For text and code projects, the back-up strategy typically includes version control, or tracking and managing revisions. Each developer can make several revisions per day, and the ever-increasing corpus serves simultaneously as repository, project narrative, communication medium, and team and product management tool. Given its pivotal role, version control is most effective when tailored to the working habits and goals of the project team.

A tool that manages and tracks different versions of software or other content is referred to generically as a version control system (VCS), a source code manager (SCM), a revision control system (RCS), and with several other permutations of the words revision, version, code, content, control, management, and system. Although the authors and users of each tool might debate esoterics, each system addresses the same issues: develop and maintain a repository of content, provide access to historical editions of each datum, and record all changes in a log. In this book, the term version control system (VCS) is used to refer generically to any form of revision control system.


2. Installing Git


At the time of this writing, Git is (seemingly) not installed by default on any GNU/Linux distribution or any other operating system. So, before you can use Git, you must install it. The steps to install Git depend greatly on the vendor and version of your operating system. This chapter describes how to install Git on Linux and Microsoft Windows and within Cygwin.

Many Linux vendors provide pre-compiled, binary packages to make installation of new applications, tools, and utilities easy. Each package specifies its dependencies, and the distribution's package manager typically installs the prerequisites and the desired package in one (well-orchestrated and automated) fell swoop.

On most Debian and Ubuntu systems, Git is offered as a collection of packages, where each package can be installed independently depending on your needs. The primary Git package is called git-core, documentation is available in git-doc, and there are other packages to consider, too:


3. Getting Started


Git manages change. Given that intent, Git shares much with other version control systems. Many tenets (the notion of a commit, the change log, the repository) are the same, and workflow is conceptually similar among the corpus of tools. However, Git offers many novelties, too. The notions and practices of other version control systems may work differently in Git or may not apply at all. Yet, no matter what your experience, this book explains how Git works and teaches mastery.

Let's get started.

Git is simple to use. Just type git. Without any arguments, Git lists its options and the most common subcommands:

For a complete (and somewhat daunting) list of git subcommands, type git help --all.

As you can see from the usage hint, a small handful of options apply to git. Most options, shown as [ARGS] in the hint, apply to specific subcommands.

For example, the option --version affects the git command and produces a version number:


4. Basic Git Concepts


The previous chapter presented a typical application of Git, and probably sparked a good number of questions. Does Git store the entire file at every commit? What's the purpose of the .git directory? Why does a commit ID resemble gibberish? Should I take note of it?

If you've used another version control system (VCS), such as Subversion or CVS, the commands in the last chapter likely seemed familiar. Indeed, Git serves the same function and provides all the operations you expect from a modern VCS. However, Git differs in some fundamental and surprising ways.

In this chapter, we explore why and where Git differs by examining the key components of its architecture and some important concepts. Here we focus on the basics and demonstrate how to interact with one repository. Chapter 11 explains how to work with many interconnected repositories. Keeping track of multiple repositories may seem like a daunting prospect, but the fundamentals you learn in this chapter apply just the same.


5. File Management and the Index


When your project is under the care of a version control system, you edit in your working directory and commit your changes to your repository for safekeeping. Git works similarly but inserts another layer, the index, between the working directory and the repository to stage, or collect, alterations. When you manage your code with Git, you edit in your working directory, accumulate changes in your index, and commit whatever has amassed in the index as a single changeset.

You can think of Git's index as a set of intended or prospective modifications. You add, remove, move, or repeatedly edit files right up to the culminating commit, which actualizes the accumulated changes in the repository. Most of the critical work actually precedes the commit step.

Remember, a commit is a two-step process: stage your changes and commit the changes. An alteration found in the working directory but not in the index isn't staged and thus can't be committed.
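The two-step process can be pictured with a small toy model. The sketch below is only an illustration in Python, not how Git is implemented: the class ToyRepo, its methods, and the file names are all hypothetical, and each area is reduced to a plain dictionary.

```python
"""Toy model of the three areas: working directory, index, repository."""


class ToyRepo:
    def __init__(self):
        self.working = {}   # path -> content currently being edited
        self.index = {}     # path -> content staged for the next commit
        self.commits = []   # committed snapshots, oldest first

    def edit(self, path, content):
        """Change a file in the working directory only."""
        self.working[path] = content

    def add(self, path):
        """Stage the current working copy of one file."""
        self.index[path] = self.working[path]

    def commit(self):
        """Record whatever has amassed in the index as a single changeset."""
        snapshot = dict(self.index)
        self.commits.append(snapshot)
        return snapshot


repo = ToyRepo()
repo.edit("a.txt", "first draft")
repo.edit("b.txt", "notes")
repo.add("a.txt")                    # only a.txt is staged
repo.edit("a.txt", "second draft")   # edited again after staging
snapshot = repo.commit()
print(snapshot)
```

In this model the commit contains the staged version of a.txt and nothing else: b.txt was never staged, and the later edit to a.txt stays in the working directory until it is staged again.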


6. Commits


In Git, a commit is used to record changes to a repository.

At face value, a Git commit seems no different from a commit or check-in found in other version control systems. However, under the hood, a Git commit operates in a unique way.

When a commit occurs, Git records a snapshot of the index and places that snapshot in the object store. (Preparing the index for a commit is covered in Chapter 5.) This snapshot does not contain a copy of every file and directory in the index, because such a strategy would require enormous and prohibitive amounts of storage. Instead, Git compares the current state of the index to the previous snapshot and so derives a list of affected files and directories. Git creates new blobs for any file that has changed and new trees for any directory that has changed, and it reuses any blob or tree object that has not changed.

Commit snapshots are chained together, with each new snapshot pointing to its predecessor. Over time, a sequence of changes is represented as a series of commits.
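A rough sketch of why unchanged files cost nothing extra: if every object is stored under a hash of its content, an unchanged file maps to an object that is already present. The Python below is a simplified model under stated assumptions. The blob header ("blob", the size, a NUL byte) follows Git's blob format, but the "toy-commit" record, the helper names store and snapshot, and the sample files are hypothetical, and tree objects are left out entirely.

```python
import hashlib

object_store = {}  # object id -> stored bytes


def store(kind, body):
    """Save an object under the SHA-1 of its header plus body; return the id."""
    data = f"{kind} {len(body)}".encode() + b"\0" + body
    oid = hashlib.sha1(data).hexdigest()
    object_store[oid] = data  # storing identical content twice is a no-op
    return oid


def snapshot(files, parent=None):
    """Store one blob per file, then a simplified commit record listing them."""
    entries = sorted((path, store("blob", content)) for path, content in files.items())
    lines = [f"{oid} {path}" for path, oid in entries]
    if parent:
        lines.append(f"parent {parent}")  # each snapshot points to its predecessor
    commit_id = store("toy-commit", "\n".join(lines).encode())
    return commit_id, dict(entries)


first, blobs1 = snapshot({"README": b"hello\n", "main.c": b"int main(){}\n"})
objects_after_first = len(object_store)

second, blobs2 = snapshot(
    {"README": b"hello\n", "main.c": b"int main(){return 0;}\n"},
    parent=first,
)
new_objects = len(object_store) - objects_after_first
print(objects_after_first, new_objects)
```

The second snapshot adds only two objects in this model: a new blob for the changed main.c and the new commit record. The README blob is reused because its content, and therefore its ID, is unchanged.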


7. Branches


A branch is the fundamental means of launching a separate line of development within a software project. A branch is a split from a kind of unified, primal state, allowing development to continue in multiple directions simultaneously and, potentially, to produce different versions of the project. Often, a branch is reconciled and merged with other branches to reunite disparate efforts.

Git allows many branches and thus many different lines of development within a repository. Git's branching system is lightweight and simple. Moreover, Git has first-rate support for merges. As a result, most Git users make routine use of branches.

This chapter shows you how to select, create, view, and remove branches. It also provides some best practices, so your branches don't twist into something akin to a manzanita.[17]

A branch can be created for countless technical, philosophical, managerial, and even social reasons. Here is just a smattering of common rationales:


8. Diffs


A diff is a compact summary of the differences (hence the name diff) between two items. For example, given two files, the Unix and Linux diff command compares the files line by line and summarizes the deviations in a diff, as shown in the following code. In the example, initial is one version of some prose and rewrite is a subsequent revision. The -u option produces a unified diff, a standardized format used widely to share modifications.

Let's look at the diff in detail. In the header, the original file is denoted by --- and the new file by +++. The @@ line provides line number context for both file versions. A line prefixed with a minus sign (-) must be removed from the original file to produce the new file. Conversely, a line with a leading plus sign (+) must be added to the original file to produce the new file. A line that begins with a space is the same in both files, and is provided by the -u option as context.
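To see these markers on a concrete input, a unified diff can also be produced with Python's standard difflib module. This is a stand-in illustration rather than the book's own example: the labels initial and rewrite come from the text above, but the sample prose lines are invented here.

```python
import difflib

# Hypothetical sample prose; each list element is one line of a file.
initial = [
    "Now is the time\n",
    "for all good men\n",
    "to come to the aid\n",
]
rewrite = [
    "Now is the time\n",
    "for all good people\n",
    "to come to the aid\n",
    "of their country\n",
]

# unified_diff yields the ---/+++ header, the @@ hunk line, then the
# context (space), removed (-), and added (+) lines.
diff_lines = list(
    difflib.unified_diff(initial, rewrite, fromfile="initial", tofile="rewrite")
)
print("".join(diff_lines), end="")
```

Running it prints a header naming both files, one @@ line, and then each line of the hunk with its one-character prefix, so the roles of the space, minus, and plus prefixes can be checked against the description above.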

By itself, a diff offers no reason or rationale for a change, nor does it justify the initial or final state. However, a diff offers more than just a digest of how files differ. It provides a formal description of how to transform one file to the other. (You'll find such instructions useful when applying or reverting changes.) In addition, a diff can be extended to show differences among multiple files and entire directory hierarchies.


9. Merges


Git is a distributed version control system (DVCS). It allows a developer in Japan, say, and another in New Jersey to make and record changes independently, and it permits the two developers to combine their changes at any time, all without a central repository. In this chapter, we'll learn how to combine two or more different lines of development.

A merge unifies two or more commit history branches. Most often, a merge unites just two branches, although Git supports a merge of three, four, or many branches at the same time.

In Git, a merge must occur within a single repository; that is, all the branches to be merged must be present in the same repository. How the branches come to be in the repository is not important. (As you will see in Chapter 11, Git provides mechanisms for referring to other repositories and for bringing remote branches into your current working repository.)

When modifications in one branch do not conflict with modifications found in another branch, Git computes a merge result and creates a new commit that represents the new, unified state. But when branches conflict, which occurs whenever changes compete to alter the same line of the same file, Git does not resolve the dispute. Instead, Git marks such contentious changes as unmerged in the index and leaves reconciliation to you, the developer. When Git cannot merge automatically, it's also up to you to make the final commit once all conflicts are resolved.
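The distinction between a clean merge and a conflict can be sketched in a few lines of Python. This is a deliberately simplified model, not Git's merge algorithm: it assumes both branches only modify lines in place (so the base and both versions have the same number of lines), and the function name merge_lines and the sample data are hypothetical.

```python
def merge_lines(base, ours, theirs):
    """Three-way merge of equal-length line lists.

    Returns (merged, conflicts); a conflicting line is left as None in
    merged and its 1-based line number is recorded in conflicts.
    """
    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (untouched, or changed identically)
            merged.append(o)
        elif o == b:      # only their branch changed this line
            merged.append(t)
        elif t == b:      # only our branch changed this line
            merged.append(o)
        else:             # both changed the same line in different ways
            merged.append(None)
            conflicts.append(number)
    return merged, conflicts


base = ["alpha", "beta", "gamma"]

# Changes to different lines combine cleanly.
clean = merge_lines(base, ["ALPHA", "beta", "gamma"], ["alpha", "beta", "GAMMA"])

# Competing changes to the same line are left for the developer.
clash = merge_lines(base, ["ALPHA", "beta", "gamma"], ["Alpha!", "beta", "gamma"])
print(clean, clash)
```

In the first call each side touched a different line, so the result takes both changes. In the second, both sides rewrote line 1 differently, and the sketch reports the line as unresolved instead of choosing a winner.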


10. Altering Commits


A commit records the history of your work and keeps your changes sacrosanct, but the commit itself isn't cast in stone. Git provides several tools and commands specifically designed to help you modify and improve the commit history cataloged within your repository.

There are many valid reasons why you might modify or rework a commit or your overall commit sequence:

  • You can fix a problem before it becomes a legacy.
  • You can decompose a large, sweeping change into a number of small, thematic commits. Conversely, you can combine individual changes into a larger commit.
  • You can incorporate review feedback and suggestions.
  • You can reorder commits into a sequence that doesn't break a build requirement.
  • You can order commits into a more logical sequence.
  • You can remove debug code committed accidentally.

As you'll see in Chapter 11, which explains how to share a repository, there are many more reasons to change commits prior to publishing your repository.


11. Remote Repositories


So far, you've worked almost entirely within one local repository. Now it's time to explore the much-lauded distributed features of Git and learn how to collaborate with other developers via shared repositories.

Working with multiple and remote repositories adds a few new terms to the Git vernacular.

A clone is a copy of a repository. A clone contains all the objects from the original; as a result, each clone is an independent and autonomous repository and a true, symmetric peer of the original. A clone allows each developer to work locally and independently without centralization, polls, or locks. Ultimately, it's cloning that allows Git to scale to projects that are large and dispersed.

Essentially, separate repositories are useful:

  • Whenever a developer works autonomously.
  • Whenever developers are separated by a wide area network. A cluster of developers in the same location may share a local repository to amass localized changes.
  • Whenever a project is expected to diverge significantly along separate development paths. Although the regular branching and merging mechanism demonstrated in previous chapters can handle any amount of separate development, the resulting complexity may become more trouble than it's worth. Instead, separate development paths can use separate repositories, to be merged again whenever appropriate.


12. Repository Management


This chapter presents two approaches to managing and publishing repositories for cooperative development. One approach centralizes the repository; the other distributes the repository. Each solution has its place, and which is right for you and your project depends on your requirements and philosophy.

However, no matter which approach you adopt, Git implements a distributed development model. For example, even if your team centralizes the repository, each developer has a complete, private copy of the repository and can work independently. The work is distributed, albeit coordinated through a central, shared repository. The repository model and the development model are orthogonal characteristics.

Some version control systems use a centralized server to maintain a repository. In this model, every developer is a client of the server, which maintains the authoritative version of the repository. Given the server's jurisdiction, almost every versioning operation must contact the server to obtain or update repository information. Thus, for two developers to share data, all information must pass through the centralized server; no direct sharing of data between developers is possible.


13. Patches


Designed as a peer-to-peer version control system, Git allows development work to be transferred directly and immediately from one repository to another using both a push and a pull model.

Git implements its own transfer protocol to exchange data between repositories. For efficiency (to save time and space), Git's transfer protocol performs a small handshake, determines what commits in the source repository are missing from the target, and finally transfers a binary, compressed form of the commits. The receiving repository incorporates the new commits into its local history, augments its commit graph, and updates its branches and tags as needed.
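The "determine what is missing" step can be pictured as a walk over the commit graph. The Python sketch below is a conceptual model only and says nothing about Git's actual wire format: the function missing_commits, the single-letter commit names, and the dictionary encoding of the graph are hypothetical. It also assumes that a target holding a commit already holds all of that commit's ancestors.

```python
def missing_commits(parents, source_tip, target_has):
    """Return the commits reachable from source_tip that the target lacks.

    parents maps each commit to the list of its parent commits;
    target_has is the set of commits already present in the target.
    """
    missing, seen = [], set()
    stack = [source_tip]
    while stack:
        commit = stack.pop()
        if commit in seen or commit in target_has:
            continue  # already visited, or the target has it and its ancestors
        seen.add(commit)
        missing.append(commit)
        stack.extend(parents.get(commit, []))
    return missing


# Linear history A <- B <- C <- D; the target stopped at B.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"]}
to_send = missing_commits(parents, "D", {"A", "B"})
print(to_send)
```

Only C and D need to be transferred in this example. The walk stops as soon as it reaches a commit the target already has, which is why the exchange can stay small even when the full history is long.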

Chapter 11 mentioned that HTTP can also be used to exchange development between repositories. HTTP is not nearly as efficient as Git's native protocol, but it is just as capable of moving commits to and fro. Both protocols ensure that a transferred commit remains identical in both source and destination repositories.


14. Hooks


You can use a Git hook to run one or more arbitrary scripts whenever a particular event, such as a commit or a patch, occurs in your repository. Typically, an event is broken into several prescribed steps, and you can tie a custom script to each step. When the Git event occurs, the appropriate script is called at the outset of each step.

Hooks belong to and affect a specific repository and are not copied during a git clone. In other words, hooks you set up in your private repository are not propagated to and do not alter the behavior of the new clone. If for some reason your development process mandates hooks in each coder's personal development repository, arrange to copy the directory .git/hooks through some other (nonclone) method.

A hook runs either in the context of your current, local repository or in the context of the remote repository. For example, fetching data into your repository from a remote repository and making a local commit can cause local hooks to run; pushing changes to a remote repository may cause hooks in the remote repository to run.
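Because hooks are not carried along by a clone, one possible "nonclone" method is a small setup script that each developer runs once. The following Python sketch shows the idea under stated assumptions: the helper install_hook is hypothetical, the hook body is a placeholder that always succeeds, and the demonstration writes into a temporary directory rather than a real repository. The only Git-specific facts it relies on are the .git/hooks location and the need for the hook file to be executable.

```python
import os
import stat
import tempfile


def install_hook(repo_dir, name, script_text):
    """Write script_text to <repo_dir>/.git/hooks/<name> and make it executable."""
    hooks_dir = os.path.join(repo_dir, ".git", "hooks")
    os.makedirs(hooks_dir, exist_ok=True)
    hook_path = os.path.join(hooks_dir, name)
    with open(hook_path, "w") as handle:
        handle.write(script_text)
    # Add the execute bits on top of whatever permissions the file already has.
    mode = os.stat(hook_path).st_mode
    os.chmod(hook_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return hook_path


# Demonstration against a throwaway directory standing in for a repository.
demo_repo = tempfile.mkdtemp()
hook_path = install_hook(demo_repo, "pre-commit", "#!/bin/sh\nexit 0\n")
print(hook_path)
```

A team could keep the hook scripts in a tracked directory of the project and have a script like this copy them into place after each clone.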


15. Combining Projects


There are many reasons to combine outside projects with your own. A submodule is simply a project that forms a part of your own Git repository but also exists independently in its own source control repository. This chapter discusses why developers create submodules and how Git attempts to deal with them.

Earlier in this book, we worked with a repository named public_html that we imagine contains your web site. If your web site relies on an AJAX library such as Prototype or jQuery, you'll need to have a copy of that library somewhere inside public_html. Not only that, you'd like to be able to update that library automatically, see what has changed when you do, and maybe even contribute changes back to the authors. Or maybe, as Git allows and encourages, you want to make changes and not contribute them back but still be able to update your repository to their latest version.

Git does make all these things possible.

But here's the bad news. Git's initial support for submodules was unapologetically awful, for the simple reason that none of the Git developers had a need for them. At the time this book is being written, the situation has only recently started to improve.


16. Using Git with Subversion Repositories


As you become more and more comfortable with Git, you'll likely find it harder and harder to work without such a capable tool. But sometimes you'll have to do without Git: say, if you work with a team whose source code is managed by some other version control system. (Subversion, for example, is popular among open source projects.) Fortunately, the Git developers have created numerous plug-ins to import and synchronize source code revisions with other systems.

This chapter demonstrates how to use Git when the rest of your team employs Subversion. This chapter also provides guidance if more of your teammates want to make the switch to Git, and it explains what to do if your team wants to drop Subversion entirely.

To begin, let's make a shallow clone of a single Subversion branch. Specifically, let's work with the source code of Subversion itself (which is guaranteed to be managed with Subversion for as long as this book is in print) and a particular set of revisions, 33005 through 33142, from the 1.5.x branch of Subversion.


