We have a very nice series of informal computing-centric talks/seminars at the University Observatory Munich, labelled “Code Coffee”. I enjoy these a lot, although I prefer tea. Even for topics on which I already know a lot, I usually learn something new and interesting. The most recent installment was an introduction to git. Although by no means an expert, I was probably one of the more advanced users in the room. One question, whether and how I use git cherry-pick, led to a (I believe) somewhat convoluted answer and got me thinking that this should probably be the “comeback” blog post after a longer absence. Basic familiarity with git add, git commit, git merge, and how to identify a commit by its hash is assumed throughout.
To see how I use git cherry-pick to quickly backport important changes, like critical bug fixes, from development branches, we will create a new, initially buggy, project. As a toy project, let’s create a Python module that computes the $n$-th Fibonacci number. We will start by creating a new directory for our project and initializing git.
%mkdir fibonacci
%cd fibonacci
!git init
We are now in the new project directory and can start adding code.
%%writefile -a fibonacci.py
import sys
def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    if n in [0, 1]:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
if __name__ == "__main__":
    print(fibonacci(int(sys.argv[1])))

from fibonacci import fibonacci
[fibonacci(n) for n in range(3, 15)]
This seems to work. Each number in this list is the sum of its two predecessors. So let’s commit our first version.
%%bash
git add fibonacci.py
git commit -m "First version of recursive Fibonacci sequence" fibonacci.py
A little further testing, however, shows that this becomes terribly slow for numbers only slightly larger than the ones we tested above.
for n in range(31, 36):
    print(f'Execution time fibonacci({n})', end=': ')
    %time fibonacci(n)
And what’s worse: the execution time of fibonacci(n) is roughly the sum of the execution times of fibonacci(n-1) and fibonacci(n-2). At this rate, fibonacci(100) will finish in roughly 4 million years! Clearly, something must be done about this. So, we create a new development branch to deal with this problem.
%%bash
git checkout -b make_faster
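To see where the slowdown comes from, one can count function calls instead of measuring time. The sketch below is an illustration only and not part of the project: count_calls is a hypothetical helper that tallies how many calls the naive recursion triggers for a given n, using the same recurrence as the recursion itself.

```python
def count_calls(n):
    """Total number of calls made by the naive recursive fibonacci(n).

    Hypothetical helper for illustration; it is not part of fibonacci.py.
    """
    # calls[k] holds the total number of calls triggered by fibonacci(k).
    # The base cases 0 and 1 return immediately, so they cost one call each.
    calls = [1, 1]
    for k in range(2, n + 1):
        # One call for fibonacci(k) itself, plus everything its two
        # recursive calls trigger in turn.
        calls.append(1 + calls[k - 1] + calls[k - 2])
    return calls[n]


for n in (10, 20, 30):
    print(n, count_calls(n))
```

The call count obeys essentially the same recurrence as the Fibonacci numbers themselves, so it grows exponentially with n, and the run time grows with it.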
First, our developer, apparently unaware of the goodness that is Jupyter notebooks and the IPython %time and %timeit magics, creates their own time measurement to see how they are doing.
%%writefile fibonacci.py
import sys
import time
def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    if n in [0, 1]:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
if __name__ == "__main__":
    t0 = time.time()
    print(fibonacci(int(sys.argv[1])))
    duration = time.time() - t0
    print(f'Executed in {duration:.2} seconds')

!python ./fibonacci.py 35
That’s apparently working. So let’s commit this.
%%bash
git commit -m 'add debugging timer' fibonacci.py
Meanwhile, somebody notices that nothing in fibonacci ensures that n actually is an integer and makes a quick commit directly to master (not usually encouraged!) to rectify this.
!git checkout master

%%writefile fibonacci.py
import sys
def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    n = int(n)
    if n in [0, 1]:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
if __name__ == "__main__":
    print(fibonacci(int(sys.argv[1])))
%%bash
git commit -m "ensure that n is an integer or cast to int" fibonacci.py
git log --all --graph
Okay, we have two branches (master and make_faster), which both have commits not present in the other one.
Time to get back to development and to making our program faster. While doing so, our developer notices a serious problem:
%%bash
git checkout make_faster
for i in `seq 0 5`; do echo -n "$i "; python ./fibonacci.py $i; done
Our Fibonacci series is off by one! fibonacci(0) should be 0, fibonacci(4) should be 3, and fibonacci(5) should be 5. Let’s fix this first.
%%writefile fibonacci.py
import sys
import time
def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    if n in [0, 1]:
        return n  # Return n, not 1!
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
if __name__ == "__main__":
    t0 = time.time()
    print(fibonacci(int(sys.argv[1])))
    duration = time.time() - t0
    print(f'Executed in {duration:.2} seconds')

# Let's make some simple tests. These should really be automated unit tests …
assert fibonacci(5) == 5
assert fibonacci(1) == 1
assert fibonacci(10) == 55
!git commit -m "fix obiwan bug in fibonacci" fibonacci.py
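The simple asserts above could be turned into automated tests along the following lines. This is a minimal sketch, assuming the tests live in a hypothetical file such as test_fibonacci.py and are collected by a runner like pytest. To keep the example self-contained it carries its own copy of the fixed function; in the real project one would import it from fibonacci.py instead.

```python
def fibonacci(n):
    """Copy of the fixed recursive function so this sketch runs on its own."""
    if n in [0, 1]:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


def test_base_cases():
    # These are exactly the values the off-by-one bug got wrong or hid.
    assert fibonacci(0) == 0
    assert fibonacci(1) == 1


def test_known_values():
    assert fibonacci(5) == 5
    assert fibonacci(10) == 55


def test_recurrence():
    # Every number is the sum of its two predecessors.
    for n in range(2, 15):
        assert fibonacci(n) == fibonacci(n - 1) + fibonacci(n - 2)
```

Note that the recurrence test alone would also have passed for the buggy version, which is why the explicit base cases and known values matter.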
Phew, this fix is important enough that it should go directly into our production code in master. This time, however, we’ll follow the good (best?!) practice of not committing to master directly and create a bug fix branch instead. This could then be reviewed before being merged into master. We don’t want to merge the make_faster branch into master, because our developer is experimenting and has littered it with debugging code, and god knows what else they have been trying there.
We make a note of the (short) commit hash of the bug fix (33e0161) now, but could of course retrieve it later from git log as well. This is the commit we will cherry-pick into our bug fix branch.
%%bash
# We want to fix master, so we'll branch from there
git checkout master
git checkout -b fix_obiwan
git cherry-pick 33e0161
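Conceptually, git cherry-pick takes the change that a single commit introduced relative to its parent and replays it on the current branch, leaving the rest of the source branch behind. The toy model below illustrates that idea on plain lists of lines. It is only a sketch under loud assumptions: toy_cherry_pick is a hypothetical helper, the file contents are abbreviated, and git's real merge machinery is considerably more careful than this naive search-and-replace.

```python
from difflib import SequenceMatcher


def toy_cherry_pick(parent, commit, target):
    """Replay the parent -> commit change onto target (all lists of lines).

    Hypothetical illustration only; this is not how git is implemented.
    """
    result = list(target)
    matcher = SequenceMatcher(a=parent, b=commit, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # lines the commit did not touch
        old, new = parent[i1:i2], commit[j1:j2]
        if not old:
            raise ValueError("toy model cannot place pure insertions")
        # Look for the lines the commit replaced and swap in the new ones.
        for start in range(len(result) - len(old) + 1):
            if result[start:start + len(old)] == old:
                result[start:start + len(old)] = new
                break
        else:
            raise ValueError("conflict: changed lines not found in target")
    return result


# Abbreviated stand-ins for the three versions of fibonacci.py.
parent = ["import time", "def fibonacci(n):", "    if n in [0, 1]:",
          "        return 1", "    return fibonacci(n - 1) + fibonacci(n - 2)"]
commit = ["import time", "def fibonacci(n):", "    if n in [0, 1]:",
          "        return n", "    return fibonacci(n - 1) + fibonacci(n - 2)"]
target = ["def fibonacci(n):", "    n = int(n)", "    if n in [0, 1]:",
          "        return 1", "    return fibonacci(n - 1) + fibonacci(n - 2)"]

picked = toy_cherry_pick(parent, commit, target)
print("\n".join(picked))
```

Only the corrected return statement travels to the target. The debugging import from the development branch stays behind, and the int cast that exists only on master is preserved.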
And then, after code review, etc. this bug fix branch is merged into master.
%%bash
git checkout master
git merge fix_obiwan
Meanwhile, our developer has an epiphany. Every time the recursion calls fibonacci(n - 2), it computes the same values that the previous call to fibonacci(n - 1) already computed. If only there were a way to cache the results of previous function calls … Well, this being Python, of course there is.
!git checkout make_faster

%%writefile fibonacci.py
from functools import lru_cache
import sys
import time
@lru_cache()
def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    if n in [0, 1]:
        return n  # Return n, not 1!
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
if __name__ == "__main__":
    t0 = time.time()
    print(fibonacci(int(sys.argv[1])))
    duration = time.time() - t0
    print(f'Executed in {duration:.2} seconds')

!python ./fibonacci.py 100
Whoa, that’s a lot faster than 4 Myr! Let’s commit this and then clean up.
!git commit -m "speed up by using least-recently used cache decorator" fibonacci.py
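To check that the speed-up really comes from the cache, one can inspect the statistics that lru_cache attaches to the decorated function. The following sketch repeats the cached function so that it runs on its own and then reads cache_info(); the variable names value and info are just for illustration.

```python
from functools import lru_cache


@lru_cache()
def fibonacci(n):
    """Cached recursion, as in the version committed above."""
    if n in [0, 1]:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)


value = fibonacci(100)
info = fibonacci.cache_info()
print(value)
# misses: arguments that had to be computed; hits: answers served from cache
print(f"computed: {info.misses}, served from cache: {info.hits}")
```

Each argument from 0 to 100 is computed only once, and every repeated request is answered from the cache, so the work grows linearly with n instead of exponentially.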
%%writefile fibonacci.py
from functools import lru_cache
import sys
@lru_cache()
def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    if n in [0, 1]:
        return n  # Return n, not 1!
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
if __name__ == "__main__":
    print(fibonacci(int(sys.argv[1])))
%%bash
git commit -m "remove debugging statements" fibonacci.py
git checkout master
git merge make_faster
!git log --all --graph
Great, we merged everything back into one branch! The cherry-picked bug fix, which was a modification present in both branches, did not cause a merge conflict.
In summary, git cherry-pick is a convenient way to quickly backport important bug fixes from development branches to more slowly moving branches, such as stable releases.
And finally, there’s of course no need to compute the Fibonacci sequence with recursion. One could as well do this iteratively.
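An iterative version could look like the following sketch. The name fibonacci_iterative and the check for negative input are my own additions for illustration; they are not part of the project above.

```python
def fibonacci_iterative(n):
    """Compute the n-th Fibonacci number with a simple loop."""
    n = int(n)
    if n < 0:
        raise ValueError("n must be non-negative")
    # previous and current hold two consecutive Fibonacci numbers,
    # starting with fibonacci(0) = 0 and fibonacci(1) = 1.
    previous, current = 0, 1
    for _ in range(n):
        previous, current = current, previous + current
    return previous


print([fibonacci_iterative(n) for n in range(10)])
```

This needs neither recursion nor a cache, and it only ever keeps two numbers in memory.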
This blog post is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This post was written as a Jupyter Notebook in Python. You can download the original notebook.