↩ go back to index

self-hosting git; or, how git servers actually work, and how to keep yours secure

march 7, 2021

notices and foreword

Disclaimer: I’m no sysadmin extraordinaire, please please please correct me if there’s something blatantly wrong here:

alex@nytpu.com

Also, I have to thank a whole bunch of resources that helped me put this together:

june’s post on their cgit setup

the git book - chapter 4 and 10 for this in particular, but the whole thing is great

zx2c4 for writing cgit, but also for writing actually good documentation

Finally, to see my setup in action before we get started:

https://git.nytpu.com/

https://golang.nytpu.com/

why?

Well, I’m sure most people here on Gemini recognize the value of having control over your own software and data, but for git specifically there are enough possible objections that I feel I should address them.

The first major objection I’ve heard (and had myself in the distant past) is “well how do people contribute to your code?” The answer is how git was originally intended to be used, before github’s model of locking people in and forcing them to make an account: just use email! In your READMEs, make a note of what address you want people to send patches to, customarily a mailing list, but a private email works just fine.

git-send-email.io is a great tutorial!
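For example, once your smtp settings are configured per that tutorial, sending your latest commit to a project looks something like this (the address is a placeholder; use whatever the README asks for):

```
# set a default recipient for this repo (placeholder address)
git config sendemail.to "~someone/project-devel@lists.example.org"
# mail every commit in the range HEAD^..HEAD, i.e. just the latest one
git send-email HEAD^
```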

Submitting a patch over email can be really scary; I know I was worried I’d messed something up the first time I sent one. (I actually hadn’t that time, but I did in later ones.) But maintainers are generally nice and don’t mind if you mess something up, and it’s trivial to resend a fixed patch, no more stressful than editing a pull request.

just how do git protocols work?

Before we can get to setting stuff up, we need to go over some prerequisite information about how cloning/pulling and pushing works in git.

“local protocol”

The local protocol is what’s used when you do something like `git clone ../repo`; all it does is /essentially/ copy the repo from one place to another. There’s a bit of other stuff that goes on, but in principle it really is just copying the folder. The interesting thing is that you can push and pull to it the same way you could with any other protocol. Not /particularly/ useful, but you could have a full “stable” repo instead of just a branch if you wanted to, or you could put the “remote” on a network disk.
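For example (the paths and branch name are just placeholders):

```
# clone from a plain path on the same machine (or a mounted network disk)
git clone /mnt/nas/repos/project.git
# or add a local path as an extra "stable" remote on an existing repo
git remote add stable /mnt/nas/repos/project.git
git push stable master
```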

ssh protocol

Now, the ssh protocol works exactly the same as the local protocol, just over an ssh connection. The same way you can `rsync local1 local2` or `rsync local1 user@remote:remote1`, or `borg create localrepo::name ~` and `borg create user@remote:remoterepo::name ~`, you can do with git. Just use an ssh user, host, and path instead of just a path, and you’ve got it working.
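So cloning over ssh looks something like this (user, host, and repo name are placeholders):

```
# scp-style syntax: user@host, then a path relative to the remote home directory
git clone user@example.com:pub/project.git
# pushing and pulling then work exactly like they do with any other remote
```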

For something like a code forge, they use a special shell that comes with git called `git-shell`, and all it does is allow you to run `git receive-pack`, `git upload-pack`, and `git upload-archive` (these are what’s used behind the scenes for pushing and pulling); it doesn’t allow interactive use, which is why you can’t `ssh git@git.sr.ht` and start issuing commands. However, in principle it’s the exact same. And on your self-hosted git, you can actually just use your regular ssh user the same way. For instance, my remotes are alex@nytpu.com:pub/.git, instead of a special user like `git@nytpu.com`.
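If you do want a locked-down, forge-style user, a rough sketch (the dedicated “git” user is an assumption, and you’d still need to add people’s keys to its authorized_keys) looks like:

```
# make git-shell a valid login shell, then create a user restricted to it
command -v git-shell | sudo tee -a /etc/shells
sudo useradd -m -s "$(command -v git-shell)" git
# ssh'ing in as that user can now push/pull but never get an interactive shell
```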

Edit (2021-03-08, suggested by Drew DeVault): If you want to make it read-only, you still have to give ssh access to the machine, but since git over ssh respects unix file permissions you can simply make the repos read-only and that'll stop pushing.
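A blunt sketch of that (path is a placeholder):

```
# strip write permission from a published repo so pushes over ssh are refused
chmod -R a-w /home/you/pub/project.git
```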

git protocol

The git protocol works the same way as the ssh protocol, but doesn’t require authentication or encryption, and is restricted to only what actions a git-shell can do. And yes, it supports all the actions: you could technically make a push-able git: repo, but you just never would, because it’d be publicly modifiable. It’s generally just used as an alternative to the http protocols when cloning.
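Cloning over it just looks like:

```
# anonymous, unencrypted clone over the git protocol (port 9418)
git clone git://git.example.com/project.git
```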

“dumb” http

The “dumb” http protocol literally just expects a bare git repo to be served as raw files by a web server. The client reads the files in the bare repo, figures out what objects it needs to pull, constructs diffs, etc. This is the only method cgit supports on its own without anything else. It’s also /really/ slow, because the client has to fetch the refs and walk the whole history itself, every single time, just to figure out what to download.

“smart” http

“Smart” http is what you see used on something like github for pulling and pushing, but even if it’s a read-only http link you usually still use smart http, because it’s so much faster than the “dumb” protocol. It’s essentially like the ssh protocol, but instead of shoving {receive,upload}-pack and upload-archive into an ssh connection, they’re shoved into an http connection. Otherwise it’s the same principle: authenticate in a protocol-specific way, then shove the data from these git commands into the stream. The main difference is that smart http lets you have a mix of authenticated and unauthenticated users, unlike the completely unauthenticated git and dumb http protocols. That mix is usually used to make things read-only for everyone and read-write for authorized users, but you could technically make it work like the git protocol and let everyone write, or make it like dumb http and completely read-only.

summary

So that’s all the background. Basically all you need to know is that there are four major protocols for accessing remote git repositories: local, ssh, git, and http (in both “dumb” and “smart” flavors).

We’re going to get set up with smart http (but still read-only), and ssh for our read-write access. We can also optionally serve the repos read-only over the git protocol.

setting up cgit

cgit itself

Just install it. I’m going to be using arch package names for this whole thing, but it should be trivial to find what you need for your distro. `sudo pacman -S cgit` and you’re ready. We need to configure it, but there’s no default or example configuration file! Luckily, I’m providing you all with my cgitrc, and if you need to make any changes the cgitrc(5) man page is very good:

https://git.zx2c4.com/cgit/tree/cgitrc.5.txt

my annotated cgitrc

Also, the cgitrc references a custom.css, given here:

files/cgit/custom.css

Finally, cgit has these things called “filters,” which are used as an html preprocessor before displaying the file. cgit comes with some pretty good ones, and I use its provided one for the source-filter, but I wrote my own about-filter to add support for scdoc.

files/cgit/cgit-about-filter

That’s about it for cgit itself, the rest is for nginx.

nginx

I assume you have a working nginx installation, and all we need to do is add a new subdomain `git.example.com`. If you don’t already have it set up, then there’s numerous tutorials online, go get it working and come back.

Something that’s good for nginx security but bad for trying to run stuff is that nginx has no built-in cgi support, only fastcgi, while cgit is a regular cgi script. Luckily, there’s a simple little thing called fcgiwrap that comes to the rescue! Just install fcgiwrap (`sudo pacman -S fcgiwrap`), and start up its service (if you’re unfortunate enough to be using systemd on your server, there’s a `fcgiwrap.socket`). That’s it, no more configuration required for fcgiwrap!
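On systemd that startup is just:

```
# socket activation is enough; nginx will talk to fcgiwrap over its unix socket
sudo systemctl enable --now fcgiwrap.socket
```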

Before we even start getting nginx set up, we’re going to address a very common issue with cgit: cloning through http is /really fucking slow/. This is because cgit only supports the “dumb” http protocol (see above)! Luckily, git comes with a program called `git-http-backend` that supports smart http. In the nginx config we’ll just match the paths that a smart git request asks for and pass those to git-http-backend. If we didn’t do this, those requests would 404 and the client would fall back to cgit’s dumb http.
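As a minimal sketch, that matching looks roughly like this; the backend path, project root, and socket path are assumptions you’ll need to adjust (my full annotated config further down is the real thing):

```
# hand smart-http requests to git-http-backend through fcgiwrap;
# everything else falls through to cgit
location ~ ^/.+/(info/refs|git-upload-pack)$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME /usr/lib/git-core/git-http-backend;
    fastcgi_param GIT_PROJECT_ROOT /home/you/pub;
    fastcgi_param PATH_INFO $uri;
    fastcgi_pass unix:/run/fcgiwrap.sock;
}
```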

Now, this is where people can get worried about security. Well, as long as you haven’t given out your ssh private key, your ssh is safe. cgit’s dumb http is safe, and when we set up the git protocol daemon later on, it disables pushing unless you explicitly enable it on a per-repo basis (which you shouldn’t, that’s public!). The smart http is all we have to worry about, but it’s fine just like this. It won’t enable pushing unless you set up authentication in your nginx config, so unless you just authenticate people willy-nilly it’s all read-only!

Now all you need to do is make a new config in sites-available (for instance, mine’s git.nytpu.com.conf), then enable it once you’ve got it all pasted in.

finally, here’s my annotated nginx config. it’s stripped down a bit to just the relevant parts, make sure to top it off with your own specific configuration

A final note: if you mess around with the css, or even just change repo descriptions, you’re going to need to delete the cache files in order for your changes to show up immediately. For instance, I run `sudo rm /var/cache/cgit/*` (which is pretty dangerous, what if my history got corrupted and put a space after “cgit/”?).

There! Look you lucky duck, you have your cgit all set up once you restart nginx! Before you start doing anything though, let’s get a regular git protocol server set up and show you how to actually set up repos.

git protocol

Just for fun, let’s set up a git protocol server. However, my setup is literally word-for-word from the git book; all I changed is the path it’s serving from. Just follow this link:

git book: 4.5 Git on the Server - Git Daemon

A few notes: make sure to punch open port 9418 for incoming connections in your server’s firewall. Also, if you’re a clever boi and already have repos set up, it won’t work quite yet; I’ll go over why in the next section.
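For reference, the book’s invocation boils down to a single command (the base path, i.e. wherever your public repos live, is an assumption; in practice you’d run it under a service manager as the book shows):

```
# serve every repo under the base path that contains a git-daemon-export-ok file
git daemon --reuseaddr --base-path=/home/you/pub/ /home/you/pub/
```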

actually using the darn thing

Now we gotta work on getting repos set up on your server. First off is creating them: even if you have a local repo you want to push up, you need to create an empty bare repo on the server to push to.

i just threw together a script for creating a new repo since it’s such a common thing to do. make sure to change the first cd to your public directory.

It initializes a bare repo (i.e. no working copies of files), and cds into it. Bare repos traditionally have “.git” on the end to indicate that they’re bare. Don’t worry, our cgit configuration strips the .git off when creating the repo name. It then touches “git-daemon-export-ok”, which is what tells the git daemon we set up last section that it’s okay to make this repo public. If you didn’t touch this, the git daemon would assume you might have accidentally put the repo in the public folder and won’t serve it. Finally, it opens up the “description” file in the standard editor so you can set it. Now all you do is `git remote add origin user@host:pub/newrepo.git` and it’s all set up!
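Put together, a rough sketch of such a script (the linked one above is the real thing; the pub/ path and editor fallback are assumptions) looks like:

```
#!/bin/sh
set -e
cd "$HOME/pub"                  # your public repo directory
git init --bare "$1.git"        # bare repo, no working copies of files
cd "$1.git"
touch git-daemon-export-ok      # tell the git daemon this repo may be served
"${EDITOR:-vi}" description     # cgit displays this file as the repo description
```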

Edit (2021-03-08): I forgot to say this when I first wrote this but you can actually have “private” and “public” repos. For instance, I serve my repos from ~/pub/, but I also have a ~/priv/ that I can still push/pull to over ssh. When I’m ready to make a repo public, I just move the repo from priv/ to pub/ and change my remote url’s path. As simple as that!

bonus: fixing go’s mess for them: setting up a shim for go’s meta tags

I host a few go repos, none of them are libraries but I am partway through writing an actual library that people won’t^W^W^W/will/ want to use. The thing is, google has the /outright hubris/ to expect a language-agnostic, no, file-agnostic (git works for anything, not just source code) version control system to include programming-language–specific meta tags! At least they didn’t go through with hardcoding specific code forges. I guess since google is the centralization monopoly of the world they probably don’t quite understand that the entire point of git is that it’s decentralized, but I mean, come on, you can’t be so sheltered as to not know you can self-host git. They didn’t even include sourcehut, they basically said “if you’re not on github or gitlab, you’re SOL.”

drew devault says a thing about how shitty pkg.go.dev is.

But that’s neither here nor there; they discarded the latter proposal, but you still need the go-specific meta tags. Luckily, the author of cgit also wrote a little script to automatically serve these for us on a new subdomain (mine’s golang.nytpu.com). Unluckily, the dependencies are really egregious unless you already have them installed.

golang-package-server: this very dumb script will serve up the proper go-import and godoc meta tags [and redirect to the source code when necessary].

He gives /no/ documentation, and I’ve never once in my life used uwsgi before this, so I’ll share what I got working for your benefit.

First, clone golang-package-server wherever you want (I’m using “/usr/share/webapps/” for the rest of this). Edit repos.txt and load it up with the names of your modules and links to the repos in cgit.

my repos.txt. note that it must be tab separated.
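I’m guessing at the exact columns from the description above, but each line should be a module name and the matching cgit link, separated by a single tab, something like (LIBRARY is a placeholder):

```
LIBRARY	https://git.nytpu.com/LIBRARY
```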

Now we have to install uwsgi and the uwsgi python plugin. We’re going to be using uwsgi’s “emperor” mode because it’s the easiest. Start out by checking that /etc/uwsgi/emperor.ini exists, and if not create it and use mine:

files/cgit/golang-package-server/emperor.ini

Then, create /etc/uwsgi/vassals/ and edit /etc/uwsgi/vassals/golang-package-server.ini with this:

my golang-package-server.ini. make sure to change `chdir` to the directory to *above* where golang-package-server is, not in the golang-package-server directory itself. also make sure to change the socket path if necessary.

Now you can start up the uwsgi emperor daemon using your preferred method.

Finally, we need to set up our new nginx config:

all you should need to change is the socket path, if necessary.

And that’s it: you can go to golang.nytpu.com and get a list of hosted repos, import golang.nytpu.com/LIBRARY as normal, and if you go to golang.nytpu.com/LIBRARY you’ll be redirected to git.nytpu.com/LIBRARY!
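From the client side that’s just the usual (LIBRARY again being a placeholder):

```
# go reads the meta tags from golang.nytpu.com and fetches the code from cgit
go get golang.nytpu.com/LIBRARY
```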

appendix

You are in a gemlog post. There is an appendix with strange gothic lettering here.

> examine appendix

The engravings translate to "This space intentionally left blank."