Git 2 exercise - Appendix
Git objects have to be formatted in a specific way to be considered valid by git. We’ve talked about the individual objects already. This section briefly describes how they are formatted and provides example Python code to generate them.
Object header
Every git object is prefixed with a header before it is stored in the git object store. The header consists of the object type, the object size (in bytes) and a 0-byte that separates it from the actual contents:
<object type> <object size>\0
The header can be generated using this function:
def make_object_header(object_type: str, object_contents: bytes) -> bytes:
    # "<object type> <object size>\0"; len() of a bytes object is its size in bytes
    return f"{object_type} {len(object_contents)}".encode("ascii") + b"\0"
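As a quick check of the header format, the following sketch builds a header by hand and hashes header plus contents with SHA-1, which is how git derives an object's ID. The helper name `object_id` and the sample text are illustrative assumptions, not part of the exercise code.

```python
import hashlib


def object_id(object_type: str, contents: bytes) -> str:
    """Return the hex SHA-1 over header + contents (illustrative helper)."""
    # Header: "<object type> <object size>\0", with the size counted in bytes
    header = f"{object_type} {len(contents)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + contents).hexdigest()


# The size is the number of bytes, not the number of characters
text = "h\u00e9llo\n"  # contains one non-ASCII character
contents = text.encode("utf-8")
header = f"blob {len(contents)}".encode("ascii") + b"\0"
print(len(text), len(contents))  # 6 characters, 7 bytes
print(header)  # b'blob 7\x00'
print(object_id("blob", b""))  # ID of an empty blob
```

Writing the same bytes to a file and running `git hash-object` on it should print the same ID as `object_id("blob", ...)`.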
blob object
A “blob” object consists of an arbitrary sequence of bytes and is used to represent file contents in git.
<header><contents>
def make_blob_object(contents: bytes) -> bytes:
    return make_object_header("blob", contents) + contents
tree object
A tree object contains a sequence of tree entries (either blobs or other trees). Each tree entry consists of a mode, a name and a hash.
- The mode is one of
  - “100644” for a normal file
  - “100755” for an executable file
  - “40000” for a directory
  - other modes exist, but not every combination of digits is valid
- The name is the file or directory name
- The hash is the hash of the referenced item (blob or tree). It is stored in binary form (the raw digest bytes), not as hexadecimal text.
<header><mode1> <name1>\0<hash1><mode2> <name2>\0<hash2>...
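To make the binary layout concrete, here is a minimal sketch that walks the entries of a tree body (the bytes after the header) and splits them back into mode, name and hash. It assumes SHA-1 hashes, that is 20 raw bytes per entry; `parse_tree_entries` is a hypothetical helper name chosen for this illustration.

```python
import hashlib

HASH_SIZE = 20  # raw SHA-1 digest length in bytes (assumes a SHA-1 repository)


def parse_tree_entries(body: bytes):
    """Split a tree body (without header) into (mode, name, hex hash) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        # "<mode> <name>" runs up to the next 0-byte ...
        end = body.index(b"\0", pos)
        mode, name = body[pos:end].split(b" ", 1)
        # ... and is followed directly by the binary hash
        raw_hash = body[end + 1 : end + 1 + HASH_SIZE]
        entries.append((mode.decode("ascii"), name.decode("utf-8"), raw_hash.hex()))
        pos = end + 1 + HASH_SIZE
    return entries


# Build a one-entry body by hand and parse it again
blob_hash = hashlib.sha1(b"blob 0\0")  # hash of an empty blob object
body = b"100644 empty.txt\0" + blob_hash.digest()
print(parse_tree_entries(body))
```

Because the hash has a fixed length, the parser skips over it by size instead of searching for a delimiter; the raw digest may itself contain 0-bytes.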
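def make_tree_object(items) -> bytes:
    """
    items: list of (mode, name, hash) tuples
    hash: a hashlib.sha1 object
    """

    def sort_key(item):
        # git orders entries by their name bytes and compares directories
        # as if their name ended in "/"
        mode, name, _ = item
        return (name + "/" if mode == "40000" else name).encode("utf-8")

    tree_object = b""
    for mode, name, item_hash in sorted(items, key=sort_key):
        tree_object += (
            f"{mode} {name}".encode("utf-8") + b"\0" + item_hash.digest()
        )
    return make_object_header("tree", tree_object) + tree_object
commit object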
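The commit object contains a reference to a tree and zero or more references to parent commits. Unlike in tree objects, hashes are stored in hexadecimal form. The timestamp is given in seconds since the Unix epoch and the timezone as a UTC offset such as +0100. An empty line separates the metadata from the commit message. The format is as follows:
<header>tree <tree_hash>
parent <parent_hash>
...
author Author Name <author.name@email.example> <timestamp> <timezone>
committer Committer Name <committer.name@email.example> <timestamp> <timezone>

<commit message>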
import datetime


def make_commit_object(
    tree,
    author,
    message,
    parents=None,
    committer=None,
    author_time=None,
    committer_time=None,
) -> bytes:
    """
    all hashes must be hashlib.sha1 objects
    author_time and committer_time must be timezone-aware datetime objects
    """
    parents = parents or []
    committer = committer or author
    # default to the current local time, including its UTC offset
    author_time = author_time or datetime.datetime.now().astimezone()
    committer_time = committer_time or author_time
    commit_object = f"tree {tree.hexdigest()}\n"
    for parent in parents:
        commit_object += f"parent {parent.hexdigest()}\n"
    commit_object += f"author {author} {author_time.timestamp():.0f} {author_time:%z}\n"
    commit_object += (
        f"committer {committer} {committer_time.timestamp():.0f} {committer_time:%z}\n"
    )
    # an empty line separates the metadata from the message
    commit_object += "\n" + message
    commit_object = commit_object.encode("utf-8")
    return make_object_header("commit", commit_object) + commit_object
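The following sketch shows how the three object types chain together: the blob's hash goes into the tree in binary form, and the tree's hash goes into the commit in hexadecimal form. To keep the block runnable on its own, it builds the objects inline with a small hypothetical helper `with_header` instead of calling the functions above; the file name, author and date are made-up sample values.

```python
import datetime
import hashlib


def with_header(object_type: str, body: bytes) -> bytes:
    """Prefix body with the object header (illustrative helper)."""
    return f"{object_type} {len(body)}".encode("ascii") + b"\0" + body


# 1. blob: the raw file contents
blob = with_header("blob", b"hello world\n")
blob_hash = hashlib.sha1(blob)

# 2. tree: a single entry "<mode> <name>\0<binary hash>"
tree = with_header("tree", b"100644 hello.txt\0" + blob_hash.digest())
tree_hash = hashlib.sha1(tree)

# 3. commit: hex tree hash, author/committer lines, empty line, message
when = datetime.datetime(2024, 1, 1, 12, 0, tzinfo=datetime.timezone.utc)
person = "Author Name <author.name@email.example>"
stamp = f"{when.timestamp():.0f} {when:%z}"
commit_body = (
    f"tree {tree_hash.hexdigest()}\n"
    f"author {person} {stamp}\n"
    f"committer {person} {stamp}\n"
    "\n"
    "Initial commit\n"
)
commit = with_header("commit", commit_body.encode("utf-8"))
commit_hash = hashlib.sha1(commit)

print("blob  ", blob_hash.hexdigest())
print("tree  ", tree_hash.hexdigest())
print("commit", commit_hash.hexdigest())
```

Because the timestamp is fixed, this prints the same commit hash on every run. With the `datetime.datetime.now()` default of `make_commit_object`, the commit hash changes whenever the time does.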