This repository has been archived by the owner on Jun 11, 2021. It is now read-only.

01.01 About Git

stuindhamma edited this page Jul 30, 2014 · 3 revisions

In software development, Git is a distributed revision control(1) and SCM (source code management)(2) system with an emphasis on speed.

Every Git working directory is a full-fledged repository(3) with complete history and full version tracking capabilities, not dependent on network access or a central server.

Git has two data structures: a mutable index (also called stage or cache) that caches information about the working directory and the next revision to be committed; and an immutable, append-only object database.

The object database contains four types of objects:

  • A blob (binary large object) is the content of a file. Blobs have no file name, time stamps, or other metadata.

  • A tree object is the equivalent of a directory. It contains a list of file names, each with some type bits and the name of the blob or tree object that holds the contents of that file, symbolic link, or directory. A tree object therefore describes a snapshot of the source tree.

  • A commit object links tree objects together into a history. It contains the name of a tree object (of the top-level source directory), a time stamp, a log message, and the names of zero or more parent commit objects.

  • A tag object is a container that holds a reference to another object, along with additional metadata about that object. Most commonly, it stores a digital signature of the commit object corresponding to a particular release of the data being tracked by Git.
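The way these objects reference one another can be sketched in a few lines of Python. This is a simplified illustration, not Git's own code: the helper names (`object_id`, `tree_body`, `commit_body`), the author string, the timestamp, and the fixed `+0000` time zone are assumptions made for the example. Object names are computed as SHA-1 hashes over a short header plus the object body.

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Name an object: SHA-1 over '<type> <size>' + NUL + body."""
    header = f"{obj_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialize (mode, name, object_id_hex) entries into a tree body.

    Each entry is '<mode> <name>' + NUL + the 20 raw bytes of the object name.
    Entries are sorted by name, with directories compared as 'name/'.
    """
    def sort_key(entry):
        mode, name, _ = entry
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    body = b""
    for mode, name, oid in sorted(entries, key=sort_key):
        body += f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(oid)
    return body


def commit_body(tree_oid, parent_oids, author, timestamp, message) -> bytes:
    """Serialize a commit: one tree, zero or more parents, then the message."""
    lines = [f"tree {tree_oid}"]
    lines += [f"parent {p}" for p in parent_oids]
    lines.append(f"author {author} {timestamp} +0000")      # time zone assumed
    lines.append(f"committer {author} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")


# A blob holds file contents only; the file name lives in the tree.
blob_oid = object_id("blob", b"hello world\n")
tree_oid = object_id("tree", tree_body([("100644", "hello.txt", blob_oid)]))
commit_oid = object_id(
    "commit",
    commit_body(tree_oid, [], "A U Thor <author@example.com>", 1406678400,
                "Initial commit"),
)
print("blob  ", blob_oid)
print("tree  ", tree_oid)
print("commit", commit_oid)
```

Because each tree names its blobs and each commit names its tree and parents, changing one byte in a file changes the name of the blob, the tree, and every commit built on top of it.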

The index serves as the connection point between the object database and the working tree.

Each object is identified by a SHA-1(4) hash(5) of its contents, prefixed with a short header giving the object's type and size. Git computes the hash and uses this value as the object’s name. The object is stored in a directory named after the first two characters of its hash, and the rest of the hash is used as the file name for that object.
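As a minimal sketch of this naming scheme, the Python below computes the name of a blob and the path where it would be stored as a loose object. The function names are hypothetical, and the `.git/objects` prefix assumes a default, non-bare repository layout.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 hex name of an object of the given type."""
    # Git hashes a header ('<type> <size>' followed by a NUL byte) plus the content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex characters name the directory, the rest name the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


oid = git_object_id("blob", b"hello world\n")
print(oid)                     # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
print(loose_object_path(oid))  # .git/objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The same name should be reported by `echo 'hello world' | git hash-object --stdin`, which is a convenient way to check the sketch against a real Git installation.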

Git stores each revision of a file as a unique blob. The relationships between the blobs can be found through examining the tree and commit objects. Newly added objects are stored in their entirety using zlib(6) compression. This can consume a large amount of disk space quickly, so objects can be combined into packs, which use delta compression(7) to save space, storing blobs as their changes relative to other blobs.
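The whole-object storage described above can be illustrated with Python's `zlib` module. This is a rough sketch of the loose-object encoding only; it does not attempt the pack format or delta compression, and the function names and sample file contents are made up for the example.

```python
import zlib


def encode_loose_object(obj_type: str, content: bytes) -> bytes:
    """Compress '<type> <size>' + NUL + content, as a loose object is stored."""
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return zlib.compress(header + content)


def decode_loose_object(data: bytes):
    """Inverse of encode_loose_object: return (type, content)."""
    raw = zlib.decompress(data)
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type, content


# Two revisions of the same file differ by one line, yet each is stored whole.
rev1 = b"a line of text\n" * 1000
rev2 = rev1 + b"one more line\n"
stored1 = encode_loose_object("blob", rev1)
stored2 = encode_loose_object("blob", rev2)
print(len(rev1), "bytes raw ->", len(stored1), "bytes stored")
print(len(rev2), "bytes raw ->", len(stored2), "bytes stored")
```

Each revision pays for its full compressed size, even though the two differ by a single line. Packs address this by storing one blob as a delta against another.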

Footnotes:

(1) A DRCS (distributed revision control system) takes a peer-to-peer approach to version control, as opposed to the client-server approach of centralized systems. Rather than a single, central repository with which clients synchronize, each peer’s working copy of the codebase is a complete repository. Distributed revision control synchronizes repositories by exchanging patches (sets of changes) from peer to peer.

(2) Version control: the management of information that is documented and regularly revised, edited, and updated.

(3) A hierarchical file system that contains files and directories, records of changes in the repository, a set of commit objects, and a set of references to commit objects, called heads.

(4) Secure Hash Algorithm 1; a cryptographic hash function that produces a 160-bit (40 hexadecimal character) value.

(5) An irreversible, fixed-length value created by the hashing algorithm.

(6) A software library used for data compression.

(7) A way of storing or transmitting data in the form of differences between sequential data rather than complete files.
