Rewriting History - Updating All Occurrences of a Name/Email in Git

It turns out Git really likes to keep things around. That's kinda its whole job, and it's damn good at it. Unfortunately, sometimes we really don't want that and need Git to forget about something. Maybe you left a company or school and you want to fix old email references, or maybe you need to remove some string from a repo for security purposes.

Whatever the reason, sometimes you just need to really dig into it and scrub your git history squeaky clean. This guide will teach you the tools at your disposal to make these dreams a reality, whatever that may mean to you! It will have a strong focus on names and emails, since it's mostly a document of everything that I would have liked to know when I first needed to worry about this.

I am not at all responsible for anything that breaks in the process of doing this yourself. Many of these tools rewrite git history, so you can very, very easily fuck up a repo with them. Please, please be sure to make a backup first (see below).

What This Guide Is Not About

  • Removing files - some of these will do that, but this is mostly about names and emails, in files, commit messages, and commit metadata. Removing files is a whole different world that many people smarter than me have already figured out, just go look that up.
  • Removing things from the last commit - if you haven't pushed it yet, you want git commit --amend (plus git rm --cached if you need to drop a file from it). Look it up and see how it works!

TLDR

If you can't change git history, go and rename your GitHub account and add a .mailmap file to the repo that makes the change you want.

If you can re-write history:

  1. Make a backup BEFORE you do anything else. Copy the whole repo and put it in a tarball then chuck it somewhere
  2. The single best tool for this is git-filter-repo. It's modern, fast, can do pretty much anything, and is better than all the other tools that aim to do this. It can be used to apply a mailmap permanently in git history AND even change all occurrences of a string to another one in place in history without leaving a trace afterwards. It's the perfect tool, see below for usage.

Without further ado, let's get started with our first option:

Don't Change History At All

Oftentimes, this is the best you can do on a public project if you can't expect everyone else to make major changes to their workflow. It's more limited, but in a lot of cases it will change what most people see most of the time.

Rename Your GitHub Account

GitHub is pretty good about displaying your name and username as it currently is on the platform anywhere it can, so this is really the easiest step to cover a lot of cases. This will change what is shown for you pretty much anywhere there is a UI, but won't affect the repo itself and won't affect mentions of your username in issues and the like.

It won't affect what people see in the Git CLI or non-GitHub Git GUIs (and who the heck uses the GitHub GUI app? It's absolute shit, go use SourceTree instead). It also won't work on patch URLs (like this https://github.com/Cobular/attila/commit/ba7f69b2838434aa8b6992b4c1eaa80bb0839249.patch) but I think most people forget those even exist anyway.

If you want your changes to be respected by other tools:

.mailmap File

Git has a feature for this already! This doesn't change history at all, but it does automatically change what nearly all Git UIs (inc. the CLI, GitHub, etc.) show for committer and author info. It may not be supported by all tools that directly read from the .git folder, but it seems very well supported by most things. Still, if you need to entirely scrub an old name or email, this won't be enough.

The syntax is pretty weird but well documented. Put this in a .mailmap file at the root of the project. See the small example below but really just look at the docs (linked below the example)

  • New Name <email to match>
  • New Name <new email> <email to match>
  • New Name <new email> Old Name <email to match>

line 1 - replaces all names associated with a given email, keeping the email the same
line 2 - replaces all names and emails on commits matching the given email
line 3 - replaces all names and emails on commits matching the given name-email combo
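
To see this in action, here's a minimal sketch that builds a throwaway repo and compares what's stored in the commit with what Git displays once a .mailmap is in place. It uses the third form from the list above. Every name, email, and path in it is a placeholder I made up, so swap in your own.

```shell
#!/bin/sh
# Throwaway repo; every name and email below is a placeholder.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

# Make a commit under the old identity.
echo "hello" > file.txt
git add file.txt
git -c user.name="Old Name" -c user.email="old@example.com" \
    commit -q -m "first commit"

# Third form: match on the old name + old email pair.
printf '%s\n' 'New Name <new@example.com> Old Name <old@example.com>' > .mailmap

# %an/%ae print what is stored in the commit (unchanged by .mailmap)...
raw=$(git log -1 --format='%an <%ae>')
# ...while %aN/%aE print the identity after the mailmap is applied.
mapped=$(git log -1 --format='%aN <%aE>')

echo "stored:  $raw"
echo "display: $mapped"
```

The stored identity stays the old one, which is exactly why a .mailmap alone isn't enough if you need the old name fully gone.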

If you would like to apply this syntax permanently without leaving a trace, look at the git-filter-repo example below!

Git - gitmailmap Documentation

Documentation for Git Mailmap Files. Nice opengraph image u got there git!

Re-write History

All the following tools will allow you to re-write git history. This has a lot of consequences, including changing the hashes that git uses to identify literally everything. This will break things. It is usually not a problem if it's just you working on it, but it may (probably will) cause issues if you're trying to work with others.

Before You Start

Backups

When you're breaking stuff like this, backups are important. Especially when they involve force pushing your changes over the server's copy. Please do a backup. I'm begging you. Just do it. Thanks.

There are a lot of ways to back this stuff up. You need more than just a copy on GitHub as your backup plan, since that's what you're inevitably trying to change permanently. It should be perfectly safe to make a copy of your repo, .git and current working directory and all, so you should probably do that!

Also, hold onto it for a bit locally after you push! You might not notice some major issues for a while, so I recommend making a tarball, archiving it somewhere, and putting in a reminder to come back and delete it in a few months or something if nothing comes up.

Git GC

After changing any of this stuff, you might have a lot of orphan commits sitting around outside the branches with all the old information. These will be very hard to access now, but if you want to remove them still, use something like $ git reflog expire --expire=now --all && git gc --prune=now --aggressive to totally clean these out of your local copy.
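
If you want to convince yourself this actually works, here's a small sketch in a throwaway repo. It orphans a commit by amending it, then checks whether the old commit object still exists before and after the cleanup. The repo, file name, and identity are all placeholders.

```shell
#!/bin/sh
# Throwaway repo showing an orphaned commit being cleaned out; names are placeholders.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "commit with the old info"
old=$(git rev-parse HEAD)

# Rewrite the commit; the original is now orphaned but still on disk.
git commit -q --amend -m "rewritten commit"

if git cat-file -e "$old" 2>/dev/null; then before=present; else before=gone; fi

# Drop the reflog entries that still point at it, then prune unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now --aggressive

if git cat-file -e "$old" 2>/dev/null; then after=present; else after=gone; fi

echo "before gc: $before, after gc: $after"
```

Keep in mind this only covers your local copy, and only objects that nothing (branch, tag, stash, reflog) points at anymore.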

This won't remove them from GitHub however. They are a little bit iffy on if they ever GC things (I have a commit that's been out of history for over a year and still up), but they do have a procedure on getting things cleaned up which you can follow right here.

git clone --mirror

If you're messing with history, you should always use this instead of a normal clone. Basically, this does a super full clone of every ref and also sets it up so that everything will be overridden on the remote when pushed. This can be dangerous since it means you can totally break everything if you mess up, but it's also the easiest way to get everything properly changed (and you already made a backup, right? No? Go do it now. I'll be here when you get back).

For reference: https://git-scm.com/docs/git-clone#Documentation/git-clone.txt---mirror
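
Here's a rough sketch of what a mirror clone gives you, using a local repo as a stand-in for the one on GitHub. The directory names, branch, and tag are all made up for the demo.

```shell
#!/bin/sh
# Local stand-ins for a hosted repo; all paths and names are placeholders.
work=$(mktemp -d)
cd "$work" || exit 1

# A small "upstream" with an extra branch and a tag.
git init -q upstream
git -C upstream -c user.name="Demo User" -c user.email="demo@example.com" \
    commit -q --allow-empty -m "first commit"
git -C upstream tag v1.0
git -C upstream branch feature

# Mirror clone: a bare repo holding every ref, wired to push them all back.
git clone -q --mirror upstream upstream-mirror.git

is_bare=$(git -C upstream-mirror.git rev-parse --is-bare-repository)
is_mirror=$(git -C upstream-mirror.git config --get remote.origin.mirror)
ref_count=$(git -C upstream-mirror.git for-each-ref | wc -l | tr -d ' ')

echo "bare: $is_bare, mirror: $is_mirror, refs: $ref_count"
```

Because remote.origin.mirror is set, a plain git push from inside the mirror behaves like git push --mirror and overwrites every ref on the remote, so only run it once you've checked the rewritten history.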

Spot Check Things

Before you do anything, it'd be a good idea to go looking through history and grab a random selection of things you're going to change so you can follow up later and make sure it worked. Don't identify them by hash, since the hashes will probably change. Instead, consider getting things at a ref like a branch head or tag, or relative to one using the relative commit syntax with the tildes and carets (like main~3). These should stay reliable despite the underlying commits changing.
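
One way to do that, sketched against a throwaway repo: record the author and subject at a few positions relative to HEAD and save them to a file outside the repo. The identity, commit messages, and the snapshot file are placeholders for the demo.

```shell
#!/bin/sh
# Throwaway repo standing in for the one you are about to rewrite.
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .
for n in 1 2 3; do
    git -c user.name="Old Name" -c user.email="old@example.com" \
        commit -q --allow-empty -m "commit $n"
done

# Record identities by position relative to a ref, not by hash.
# HEAD~2 means "two commits before HEAD", which survives a rewrite.
snapshot=$(mktemp)
for rev in HEAD HEAD~1 HEAD~2; do
    printf '%s\t%s\n' "$rev" "$(git log -1 --format='%an <%ae> | %s' "$rev")"
done > "$snapshot"

cat "$snapshot"
```

After the rewrite, run the same loop into a second file and diff the two. The subjects should match line for line while the names and emails change.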

Tools

Git-filter-repo

The ultimate multitool for this kind of work, it'll be able to do everything you want and more. It's not as well known as its frankly less useful cousins, so if you get anything out of this guide, I hope it'll be this!

It will leave some breadcrumbs (in the form of files mapping old commits to new ones and the like), please see this for how to remove them.

Pros

  • Very very versatile, supersedes everything filter-branch and the BFG can do
  • Faster than filter-branch
  • Handles more edge cases than either other tool, like updating hashes in commit messages
  • Solid docs full of specific examples

Cons

  • Not as many ready-to-run examples on stackoverflow as the other tools, but it's so much easier to use it's not really an issue.

Repo: https://github.com/newren/git-filter-repo#why-filter-repo-instead-of-other-alternatives

Docs: https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html

Usage examples:

Make Mailmap Changes Permanent

  1. Create the mailmap file as described above
  2. git filter-repo --mailmap my-mailmap

Ref
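
The two steps above can be sketched end to end like this. It assumes git-filter-repo is installed (it skips itself otherwise) and runs against a throwaway clone. The repo names, the identities, and the my-mailmap path are all placeholders.

```shell
#!/bin/sh
# Sketch only: needs git-filter-repo installed; all names and paths are placeholders.
if ! git filter-repo --version >/dev/null 2>&1; then
    echo "git-filter-repo not found, skipping demo"
    result="skipped"
else
    work=$(mktemp -d)
    cd "$work" || exit 1

    # Stand-in for your real repo, with a commit under the old identity.
    git init -q source
    echo "hello" > source/file.txt
    git -C source add file.txt
    git -C source -c user.name="Old Name" -c user.email="old@example.com" \
        commit -q -m "first commit"

    # filter-repo expects a fresh clone; --no-local makes a local clone act like one.
    git clone -q --no-local source rewrite

    # Keep the mailmap outside the clone so it isn't an untracked file in it.
    printf '%s\n' 'New Name <new@example.com> <old@example.com>' > "$work/my-mailmap"

    git -C rewrite filter-repo --mailmap "$work/my-mailmap"

    # Author and committer as now stored in the rewritten commit.
    result=$(git -C rewrite log -1 --format='%an <%ae> / %cn <%ce>')
fi
echo "$result"
```

Unlike a plain .mailmap, the new identity is now what's actually stored in the commit, for both author and committer.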

Change text content to more than just ***REMOVED***

  1. Create an expression file:
    p455w0rd
    foo==>bar
    glob:*666*==>
    regex:\bdriver\b==>pilot
    literal:MM/DD/YYYY==>YYYY-MM-DD
    regex:([0-9]{2})/([0-9]{2})/([0-9]{4})==>\3-\1-\2
  2. git filter-repo --replace-text expressions.txt

Ref (see for an explanation of the expression file format)

Change commit message text

  1. Create an expression file like above
  2. git filter-repo --replace-message expressions.txt

Ref
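
Here's a hedged sketch of both replacements (file contents and commit messages) running in one go on a throwaway clone. It assumes git-filter-repo is installed and skips itself otherwise. The NOTICE file, the names, and the expressions.txt path are made up for the demo.

```shell
#!/bin/sh
# Sketch only: needs git-filter-repo installed; all names and paths are placeholders.
if ! git filter-repo --version >/dev/null 2>&1; then
    echo "git-filter-repo not found, skipping demo"
    content="skipped"
    subject="skipped"
else
    work=$(mktemp -d)
    cd "$work" || exit 1

    # Stand-in repo with the old name in both a file and a commit message.
    git init -q source
    echo "Copyright Old Name" > source/NOTICE
    git -C source add NOTICE
    git -C source -c user.name="Demo User" -c user.email="demo@example.com" \
        commit -q -m "add notice for Old Name"

    # filter-repo expects a fresh clone; --no-local makes a local clone act like one.
    git clone -q --no-local source rewrite

    # One literal replacement rule, kept outside the clone.
    printf '%s\n' 'Old Name==>New Name' > "$work/expressions.txt"

    git -C rewrite filter-repo \
        --replace-text "$work/expressions.txt" \
        --replace-message "$work/expressions.txt"

    content=$(git -C rewrite show HEAD:NOTICE)
    subject=$(git -C rewrite log -1 --format='%s')
fi
echo "file: $content"
echo "message: $subject"
```

I used the same expression file for both flags here just to keep the demo short. On a real repo you may want separate files so a rule meant for file contents doesn't mangle a commit message.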

Just Nuke the History

This is the simplest option (and really the only other option than filter-repo worth considering) but it also leaves you with no commit history, just one commit with the current status of everything. It will clean up all the history and often is all you need for a personal project, but make extra sure this is what you want to do!!!

Pros

  • Very easy!
  • Very fast!

Cons

  • All the commit history is gone
  • You'll literally only have one commit after, be sure this is what you want!
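
#!/bin/bash

# From:
# http://stackoverflow.com/questions/13716658/how-to-delete-all-commit-history-in-github

# Start a new branch with no parent commits, keeping the current files
git checkout --orphan latest_branch
# Stage everything and record it as the one and only commit
git add -A
git commit -m "Initial commit"
# Swap the new branch in as main
git branch -D main
git branch -m main
# Overwrite the remote's main with the single-commit history
git push -f origin main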

nuke_history.sh

Filter-branch

Pretty simple: it will edit the committer and author fields, filtering by email or name and changing them to something new, similar to filter-repo. It's literally just worse though. It's only included here since it's really old and will probably be the first thing you find.

Pros

  • Changes committer and author name without changing date
  • Kinda easy to do some simple things

Cons

  • Slow
  • Hard to do anything you can't find a stackoverflow post with an example
  • Straight up superseded by filter-repo
  • Will change hashes, won't update references in text like filter-repo will
  • Harder to use than applying a mailmap with filter-repo

#!/bin/sh

git filter-branch --env-filter '
OLD_EMAIL="your_old_email@example.com"
NEW_NAME="Your New Name"
NEW_EMAIL="your_new_email@example.com"

if [ "$GIT_COMMITTER_EMAIL" = "$OLD_EMAIL" ]
then
    export GIT_COMMITTER_NAME="$NEW_NAME"
    export GIT_COMMITTER_EMAIL="$NEW_EMAIL"
fi
if [ "$GIT_AUTHOR_EMAIL" = "$OLD_EMAIL" ]
then
    export GIT_AUTHOR_NAME="$NEW_NAME"
    export GIT_AUTHOR_EMAIL="$NEW_EMAIL"
fi
' --tag-name-filter cat -- --branches --tags

edit_by_email.sh

BFG Repo Cleaner

A tool (https://rtyley.github.io/bfg-repo-cleaner/) specifically for removing things from the contents of files and not changing names. Originally designed to pull plaintext secrets, it's also great for changing references to your name in like copyrights and things. It's fast, but I still recommend filter-repo over this since it's just straight up better, yet again.

Pros

  • Fast
  • Great for replacing or removing text in the contents of files

Cons

  • Doesn't change names or emails in commit metadata
  • Straight up superseded by filter-repo
  • Will change hashes

Usage

  1. Make a passwords.txt file in the same format as the expression file from filter-repo
  2. Run the command: $ bfg --replace-text passwords.txt my-repo.git

More examples: https://rtyley.github.io/bfg-repo-cleaner/#examples

Automating

If you're anything like me, you probably have tens to hundreds of repos from a bunch of random projects, all of which have no forks and no other contributors (and if we're being honest probably were never finished 😔). Still, you might want to clean those up! You're in luck, there are some tools for this, but keep in mind automating this puts you wayyy off the deep end into the danger zone since you might not be reviewing the changes before you push them, and this hasn't really been tested by me.

All-repos

This lets you run things on a bunch of repos all at once. Unfortunately, the tooling isn't really there to do this kind of thing on it, however I'll look at getting it working with filter-branch one day. If you end up doing it, find me and let me know and I'll add it!!!

Repo: https://github.com/asottile/all-repos
Possible Starting Example: https://github.com/jugmac00/til/blob/master/zope/how-to-update-all-zopefoundation-repositories-at-once.md

Conclusion

There are a lot of weird different ways to change your name, ranging from different display-level options all the way to fully re-writing history, so pick the one that works best for your needs. You got this!!

If you see an error or just have feedback, feel free to reach out: blog at cobular dot com. Thanks!!
