|  | 
|  | 1 | +gitdatamodel(7) | 
|  | 2 | +=============== | 
|  | 3 | + | 
|  | 4 | +NAME | 
|  | 5 | +---- | 
|  | 6 | +gitdatamodel - Git's core data model | 
|  | 7 | + | 
|  | 8 | +SYNOPSIS | 
|  | 9 | +-------- | 
|  | 10 | +gitdatamodel | 
|  | 11 | + | 
|  | 12 | +DESCRIPTION | 
|  | 13 | +----------- | 
|  | 14 | + | 
|  | 15 | +It's not necessary to understand Git's data model to use Git, but it's | 
|  | 16 | +very helpful when reading Git's documentation so that you know what it | 
|  | 17 | +means when the documentation says "object", "reference" or "index". | 
|  | 18 | + | 
|  | 19 | +Git's core operations use 4 kinds of data: | 
|  | 20 | + | 
|  | 21 | +1. <<objects,Objects>>: commits, trees, blobs, and tag objects | 
|  | 22 | +2. <<references,References>>: branches, tags, | 
|  | 23 | +   remote-tracking branches, etc. | 
|  | 24 | +3. <<index,The index>>, also known as the staging area | 
|  | 25 | +4. <<reflogs,Reflogs>>: logs of changes to references ("ref log") | 
|  | 26 | + | 
|  | 27 | +[[objects]] | 
|  | 28 | +OBJECTS | 
|  | 29 | +------- | 
|  | 30 | + | 
|  | 31 | +All of the commits and files in a Git repository are stored as "Git objects". | 
|  | 32 | +Git objects never change after they're created, and every object has an ID, | 
|  | 33 | +like `1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`. | 
|  | 34 | + | 
|  | 35 | +This means that if you have an object's ID, you can always recover its | 
|  | 36 | +exact contents as long as the object hasn't been deleted. | 
|  | 37 | + | 
|  | 38 | +Every object has: | 
|  | 39 | + | 
|  | 40 | +[[object-id]] | 
|  | 41 | +1. an *ID* (aka "object name"), which is a cryptographic hash of its | 
|  | 42 | +  type and contents. | 
|  | 43 | +  It's fast to look up a Git object using its ID. | 
|  | 44 | +  This is usually represented in hexadecimal, like | 
|  | 45 | +  `1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a`. | 
|  | 46 | +2. a *type*. There are 4 types of objects: | 
|  | 47 | +   <<commit,commits>>, <<tree,trees>>, <<blob,blobs>>, | 
|  | 48 | +   and <<tag-object,tag objects>>. | 
|  | 49 | +3. *contents*. The structure of the contents depends on the type. | 
|  | 50 | + | 
|  | 51 | +Here's how each type of object is structured: | 
|  | 52 | + | 
|  | 53 | +[[commit]] | 
|  | 54 | +commit:: | 
|  | 55 | +    A commit contains the full directory structure of every file | 
|  | 56 | +    in that version of the repository and each file's contents. | 
|  | 57 | +    It has these required fields | 
|  | 58 | +    (though there are other optional fields): | 
|  | 59 | ++ | 
|  | 60 | +1. The *files* in the commit, stored as the *<<tree,tree>>* ID | 
|  | 61 | +   of the commit's base directory. | 
|  | 62 | +2. Its *parent commit ID(s)*. The first commit in a repository has 0 parents, | 
|  | 63 | +  regular commits have 1 parent, and merge commits have 2 or more parents. | 
|  | 64 | +3. An *author* and the time the commit was authored | 
|  | 65 | +4. A *committer* and the time the commit was committed | 
|  | 66 | +5. A *commit message* | 
|  | 67 | ++ | 
|  | 68 | +Here's how an example commit is stored: | 
|  | 69 | ++ | 
|  | 70 | +---- | 
|  | 71 | +tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a | 
|  | 72 | +parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647 | 
|  | 73 | +author Maya <maya@example.com> 1759173425 -0400 | 
|  | 74 | +committer Maya <maya@example.com> 1759173425 -0400 | 
|  | 75 | + | 
|  | 76 | +Add README | 
|  | 77 | +---- | 
|  | 78 | ++ | 
|  | 79 | +Like all other objects, commits can never be changed after they're created. | 
|  | 80 | +For example, "amending" a commit with `git commit --amend` creates a new | 
|  | 81 | +commit with the same parent; the original commit is left unchanged. | 
|  | 82 | ++ | 
|  | 83 | +Git does not store the diff for a commit: when you ask Git to show | 
|  | 84 | +the commit with linkgit:git-show[1], it calculates the diff from its | 
|  | 85 | +parent on the fly. | 
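The stored commit shown above is plain text: header lines, a blank line, then the message. As a rough sketch of how a tool might read those fields, the hypothetical `parse_commit` helper below splits that text into a dictionary. It is a simplification: it skips continuation lines (which start with a space and are used by some optional multi-line headers) instead of joining them.

```python
def parse_commit(text: str) -> dict:
    """Parse the text form of a commit, as printed by `git cat-file -p`."""
    headers, _, message = text.partition("\n\n")
    commit = {"parents": [], "message": message.strip()}
    for line in headers.splitlines():
        if line.startswith(" "):
            continue  # continuation of a multi-line header; ignored in this sketch
        key, _, value = line.partition(" ")
        if key == "parent":
            commit["parents"].append(value)  # merge commits have several parents
        else:
            commit[key] = value
    return commit

example = """\
tree 1b61de420a21a2f1aaef93e38ecd0e45e8bc9f0a
parent 4ccb6d7b8869a86aae2e84c56523f8705b50c647
author Maya <maya@example.com> 1759173425 -0400
committer Maya <maya@example.com> 1759173425 -0400

Add README
"""
commit = parse_commit(example)
print(commit["tree"], commit["parents"], commit["message"])
```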
|  | 86 | + | 
|  | 87 | +[[tree]] | 
|  | 88 | +tree:: | 
|  | 89 | +    A tree is how Git represents a directory. | 
|  | 90 | +    It can contain files or other trees (which are subdirectories). | 
|  | 91 | +    It lists, for each item in the tree: | 
|  | 92 | ++ | 
|  | 93 | +1. The *filename*, for example `hello.py` | 
|  | 94 | +2. The *type*: either <<blob,`blob`>> (a file), `tree` (a directory), | 
|  | 95 | +  or <<commit,`commit`>> (a Git submodule, which is a | 
|  | 96 | +  commit from a different Git repository) | 
|  | 97 | +3. The *file mode*. Git has these file modes, which are only | 
|  | 98 | +   spiritually related to Unix permissions: | 
|  | 99 | ++ | 
|  | 100 | +  - `100644`: regular file (with type `blob`) | 
|  | 101 | +  - `100755`: executable file (with type `blob`) | 
|  | 102 | +  - `120000`: symbolic link (with type `blob`) | 
|  | 103 | +  - `040000`: directory (with type `tree`) | 
|  | 104 | +  - `160000`: gitlink, for use with submodules (with type `commit`) | 
|  | 105 | + | 
|  | 106 | +4. The <<object-id,*object ID*>> with the contents of the file or directory | 
|  | 107 | ++ | 
|  | 108 | +For example, this is how a tree containing one directory (`src`) and one file | 
|  | 109 | +(`README.md`) is stored: | 
|  | 110 | ++ | 
|  | 111 | +---- | 
|  | 112 | +100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md | 
|  | 113 | +040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src | 
|  | 114 | +---- | 
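As an illustration of how the mode, type, ID, and filename fit together, this sketch parses the tree listing above and labels each entry using the mode table. The `MODE_KINDS` table and `parse_tree_entry` function are names invented for this example, and the parser only handles the printed form produced by `git cat-file -p`, not the binary form Git stores.

```python
# Meaning of each file mode, as listed above.
MODE_KINDS = {
    "100644": "regular file",
    "100755": "executable file",
    "120000": "symbolic link",
    "040000": "directory",
    "160000": "gitlink (submodule)",
}

def parse_tree_entry(line: str) -> dict:
    """Split one printed tree line into mode, type, object ID, and filename."""
    # Split on whitespace at most 3 times so filenames containing spaces survive.
    mode, obj_type, object_id, name = line.split(None, 3)
    return {
        "mode": mode,
        "kind": MODE_KINDS.get(mode, "unknown"),
        "type": obj_type,
        "id": object_id,
        "name": name,
    }

listing = """\
100644 blob 8728a858d9d21a8c78488c8b4e70e531b659141f README.md
040000 tree 89b1d2e0495f66d6929f4ff76ff1bb07fc41947d src
"""
entries = [parse_tree_entry(line) for line in listing.splitlines()]
for entry in entries:
    print(entry["name"], "->", entry["kind"])
```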
|  | 115 | + | 
|  | 116 | +[[blob]] | 
|  | 117 | +blob:: | 
|  | 118 | +    A blob object contains a file's contents. | 
|  | 119 | ++ | 
|  | 120 | +When you make a commit, Git stores the full contents of each file that | 
|  | 121 | +you changed as a blob. | 
|  | 122 | +For example, if you have a commit that changes 2 files in a repository | 
|  | 123 | +with 1000 files, that commit will create 2 new blobs, and use the | 
|  | 124 | +previous blob ID for the other 998 files. | 
|  | 125 | +This means that commits can use relatively little disk space even in a | 
|  | 126 | +very large repository. | 
|  | 127 | + | 
|  | 128 | +[[tag-object]] | 
|  | 129 | +tag object:: | 
|  | 130 | +    Tag objects contain these required fields | 
|  | 131 | +    (though there are other optional fields): | 
|  | 132 | ++ | 
|  | 133 | +1. The *ID* and *type* of the object (often a commit) that they reference | 
|  | 134 | +2. The *tagger* and tag date | 
|  | 135 | +3. A *tag message*, similar to a commit message | 
|  | 136 | ++ | 
|  | 137 | +Here's how an example tag object is stored: | 
|  | 138 | ++ | 
|  | 139 | +---- | 
|  | 140 | +object 750b4ead9c87ceb3ddb7a390e6c7074521797fb3 | 
|  | 141 | +type commit | 
|  | 142 | +tag v1.0.0 | 
|  | 143 | +tagger Maya <maya@example.com> 1759927359 -0400 | 
|  | 144 | + | 
|  | 145 | +Release version 1.0.0 | 
|  | 146 | +---- | 
|  | 147 | + | 
|  | 148 | +NOTE: All of the examples in this section were generated with | 
|  | 149 | +`git cat-file -p <object-id>`. | 
|  | 150 | + | 
|  | 151 | +[[references]] | 
|  | 152 | +REFERENCES | 
|  | 153 | +---------- | 
|  | 154 | + | 
|  | 155 | +References are a way to give a name to a commit. | 
|  | 156 | +It's easier to remember "the changes I'm working on are on the `turtle` | 
|  | 157 | +branch" than "the changes are in commit bb69721404348e". | 
|  | 158 | +Git often uses "ref" as shorthand for "reference". | 
|  | 159 | + | 
|  | 160 | +A reference can refer to either: | 
|  | 161 | + | 
|  | 162 | +1. An object ID, usually a <<commit,commit>> ID | 
|  | 163 | +2. Another reference. This is called a "symbolic reference". | 
|  | 164 | + | 
|  | 165 | +References are stored in a hierarchy, and Git handles references | 
|  | 166 | +differently based on where they are in the hierarchy. | 
|  | 167 | +Most references are under `refs/`. Here are the main types: | 
|  | 168 | + | 
|  | 169 | +[[branch]] | 
|  | 170 | +branches: `refs/heads/<name>`:: | 
|  | 171 | +    A branch refers to a commit ID. | 
|  | 172 | +    That commit is the latest commit on the branch. | 
|  | 173 | ++ | 
|  | 174 | +To get the history of commits on a branch, Git will start at the commit | 
|  | 175 | +ID the branch references, and then look at the commit's parent(s), | 
|  | 176 | +the parent's parent, etc. | 
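The walk described above can be sketched with a plain dictionary standing in for the object store. Everything here is hypothetical scaffolding: `parents` maps each (abbreviated) commit ID to its parent IDs, and `history` visits commits breadth-first, which is simpler than the ordering that `git log` actually uses.

```python
from collections import deque

def history(parents: dict, start: str) -> list:
    """Return every commit reachable from `start` by following parent IDs."""
    seen, order, queue = set(), [], deque([start])
    while queue:
        commit_id = queue.popleft()
        if commit_id in seen:
            continue  # already reached through another parent
        seen.add(commit_id)
        order.append(commit_id)
        queue.extend(parents.get(commit_id, []))
    return order

# A branch pointing at 750b4ea, whose parent 4ccb6d7 is the first commit.
parents = {"750b4ea": ["4ccb6d7"], "4ccb6d7": []}
print(history(parents, "750b4ea"))
```

Note that the branch itself only stores the starting commit ID; the rest of the history comes from the commits.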
|  | 177 | + | 
|  | 178 | +[[tag]] | 
|  | 179 | +tags: `refs/tags/<name>`:: | 
|  | 180 | +    A tag refers to a commit ID, tag object ID, or other object ID. | 
|  | 181 | +    There are two types of tags: | 
|  | 182 | +    1. "Annotated tags", which reference the ID of a <<tag-object,tag object>> | 
|  | 183 | +       that contains a tag message | 
|  | 184 | +    2. "Lightweight tags", which reference a commit, blob, or tree ID | 
|  | 185 | +       directly | 
|  | 186 | ++ | 
|  | 187 | +Even though branches and tags can both refer to a commit ID, Git | 
|  | 188 | +treats them very differently. | 
|  | 189 | +Branches are expected to change over time: when you make a commit, Git | 
|  | 190 | +will update your <<HEAD,current branch>> to point to the new commit. | 
|  | 191 | +Tags are usually not changed after they're created. | 
|  | 192 | + | 
|  | 193 | +[[HEAD]] | 
|  | 194 | +HEAD: `HEAD`:: | 
|  | 195 | +    `HEAD` is where Git stores your current <<branch,branch>>, | 
|  | 196 | +    if there is a current branch. `HEAD` can either be: | 
|  | 197 | ++ | 
|  | 198 | +1. A symbolic reference to your current branch, for example | 
|  | 199 | +   `ref: refs/heads/main` if your current branch is `main`. | 
|  | 200 | +2. A direct reference to a commit ID. In this case there is no current branch. | 
|  | 201 | +   This is called "detached HEAD state"; see the DETACHED HEAD section | 
|  | 202 | +   of linkgit:git-checkout[1] for more. | 
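The two cases above can be told apart by looking at the text of `HEAD`. This is a small sketch, assuming you already have that text in the form shown above; `describe_head` is a made-up name, and how `HEAD` is stored on disk may depend on how the repository keeps its references.

```python
def describe_head(head_text: str) -> tuple:
    """Classify HEAD as a symbolic reference or a direct (detached) commit ID."""
    value = head_text.strip()
    prefix = "ref: "
    if value.startswith(prefix):
        # Symbolic reference: the rest of the text names the current branch.
        return ("symbolic", value[len(prefix):])
    # Otherwise HEAD holds a commit ID and there is no current branch.
    return ("detached", value)

print(describe_head("ref: refs/heads/main\n"))
print(describe_head("750b4ead9c87ceb3ddb7a390e6c7074521797fb3\n"))
```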
|  | 203 | + | 
|  | 204 | +[[remote-tracking-branch]] | 
|  | 205 | +remote-tracking branches: `refs/remotes/<remote>/<branch>`:: | 
|  | 206 | +    A remote-tracking branch refers to a commit ID. | 
|  | 207 | +    It's how Git stores the last-known state of a branch in a remote | 
|  | 208 | +    repository. `git fetch` updates remote-tracking branches. When | 
|  | 209 | +    `git status` says "you're up to date with origin/main", it's looking at | 
|  | 210 | +    this. | 
|  | 211 | ++ | 
|  | 212 | +`refs/remotes/<remote>/HEAD` is a symbolic reference to the remote's | 
|  | 213 | +default branch. This is the branch that `git clone` checks out by default. | 
|  | 214 | + | 
|  | 215 | +[[other-refs]] | 
|  | 216 | +Other references:: | 
|  | 217 | +    Git tools may create references anywhere under `refs/`. | 
|  | 218 | +    For example, linkgit:git-stash[1], linkgit:git-bisect[1], | 
|  | 219 | +    and linkgit:git-notes[1] all create their own references | 
|  | 220 | +    in `refs/stash`, `refs/bisect`, etc. | 
|  | 221 | +    Third-party Git tools may also create their own references. | 
|  | 222 | ++ | 
|  | 223 | +Git may also create references other than `HEAD` at the base of the | 
|  | 224 | +hierarchy, like `ORIG_HEAD`. | 
|  | 225 | + | 
|  | 226 | +[[index]] | 
|  | 227 | +THE INDEX | 
|  | 228 | +--------- | 
|  | 229 | +The index, also known as the "staging area", is a list of files and | 
|  | 230 | +the contents of each file, stored as a <<blob,blob>>. | 
|  | 231 | +You can add files to the index or update the contents of a file in the | 
|  | 232 | +index with linkgit:git-add[1]. This is called "staging" the file for commit. | 
|  | 233 | + | 
|  | 234 | +Unlike a <<tree,tree>>, the index is a flat list of files. | 
|  | 235 | +When you commit, Git converts the list of files in the index to a | 
|  | 236 | +directory <<tree,tree>> and uses that tree in the new <<commit,commit>>. | 
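To picture the conversion from a flat list to a directory tree, here is a rough sketch that uses nested dictionaries in place of real tree objects. The `build_tree` function and the tuple layout are assumptions made for this example; Git itself writes a tree object for each directory rather than building dictionaries.

```python
def build_tree(index_entries: list) -> dict:
    """Turn flat (mode, blob_id, path) entries into nested dicts, one per directory."""
    root = {}
    for mode, blob_id, path in index_entries:
        *directories, filename = path.split("/")
        node = root
        for directory in directories:
            # Create the subdirectory the first time a path passes through it.
            node = node.setdefault(directory, {})
        node[filename] = (mode, blob_id)
    return root

index_entries = [
    ("100644", "8728a858d9d21a8c78488c8b4e70e531b659141f", "README.md"),
    ("100644", "665c637a360874ce43bf74018768a96d2d4d219a", "src/hello.py"),
]
tree = build_tree(index_entries)
print(sorted(tree))         # top-level names
print(sorted(tree["src"]))  # names inside the src directory
```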
|  | 237 | + | 
|  | 238 | +Each index entry has 4 fields: | 
|  | 239 | + | 
|  | 240 | +1. The *<<tree,file mode>>* | 
|  | 241 | +2. The *<<blob,blob>> ID* of the file | 
|  | 242 | +3. The *file path*, for example `src/hello.py` | 
|  | 243 | +4. The *stage number*, either 0, 1, 2, or 3. This is normally 0, but if | 
|  | 244 | +   there's a merge conflict there can be multiple versions of the same | 
|  | 245 | +   filename in the index. | 
|  | 246 | + | 
|  | 247 | +It's extremely uncommon to look at the index directly: normally you'd | 
|  | 248 | +run `git status` to see a list of changes between the index and <<HEAD,HEAD>>. | 
|  | 249 | +But you can use `git ls-files --stage` to see the index. | 
|  | 250 | +Here's the output of `git ls-files --stage` in a repository with 2 files: | 
|  | 251 | + | 
|  | 252 | +---- | 
|  | 253 | +100644 8728a858d9d21a8c78488c8b4e70e531b659141f 0 README.md | 
|  | 254 | +100644 665c637a360874ce43bf74018768a96d2d4d219a 0 src/hello.py | 
|  | 255 | +---- | 
|  | 256 | + | 
|  | 257 | +[[reflogs]] | 
|  | 258 | +REFLOGS | 
|  | 259 | +------- | 
|  | 260 | + | 
|  | 261 | +Every time a branch, remote-tracking branch, or HEAD is updated, Git | 
|  | 262 | +updates a log called a "reflog" for that <<references,reference>>. | 
|  | 263 | +This means that if you make a mistake and "lose" a commit, you can | 
|  | 264 | +generally recover the commit ID by running `git reflog <reference>`. | 
|  | 265 | + | 
|  | 266 | +A reflog is a list of log entries. Each entry has: | 
|  | 267 | + | 
|  | 268 | +1. The *commit ID* | 
|  | 269 | +2. The *timestamp* of the change | 
|  | 270 | +3. A *log message*, for example `pull: Fast-forward` | 
|  | 271 | + | 
|  | 272 | +Reflogs only log changes made in your local repository. | 
|  | 273 | +They are not shared with remotes. | 
|  | 274 | + | 
|  | 275 | +You can view a reflog with `git reflog <reference>`. | 
|  | 276 | +For example, here's the reflog for a `main` branch which has changed twice: | 
|  | 277 | + | 
|  | 278 | +---- | 
|  | 279 | +$ git reflog main --date=iso --no-decorate | 
|  | 280 | +750b4ea main@{2025-09-29 15:17:05 -0400}: commit: Add README | 
|  | 281 | +4ccb6d7 main@{2025-09-29 15:16:48 -0400}: commit (initial): Initial commit | 
|  | 282 | +---- | 
|  | 283 | + | 
|  | 284 | +GIT | 
|  | 285 | +--- | 
|  | 286 | +Part of the linkgit:git[1] suite | 