Random Thoughts in a Random Universe

How I use git cherry-pick

We have a very nice series of informal computing centric talks/seminars at the University Observatory Munich, labelled “Code Coffee”. I enjoy these a lot, although I prefer tea. Even for topics on which I already know a lot, I usually learn something new and interesting. The most recent installment was an introduction to git. Although by no means an expert, I was probably one of the more advanced users in the room. One question, whether and how I use git cherry-pick led to a — I believe — somewhat convoluted answer and got me thinking that this should probably be the “comeback” blog post after a longer absence. Basic familiarity with git add, git commit, git merge, and how to identify a commit by its hash is assumed throughout.

To see how I use git cherry-pick to quickly backport important changes like critical bug fixed from development branches, we will create a new, initially buggy, project. As a toy project, let’s create a Python module, which computes the $n$-th Fibonacci number. We will start by creating a new directory for our project and initializing git.

In [1]:
%mkdir fibonacci
%cd fibonacci
!git init
/home/joerg/IPyNB/Blog/fibonacci
Initialized empty Git repository in /home/joerg/IPyNB/Blog/fibonacci/.git/

We are now in the new project directory and can start adding code.

In [2]:
%%writefile -a fibonacci.py

import sys
def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    if n in [0, 1]:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
    
if __name__ == "__main__":
    print(fibonacci(int(sys.argv[1])))
Writing fibonacci.py
In [3]:
from fibonacci import fibonacci

[fibonacci(n) for n in range(3, 15)]
Out[3]:
[3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]

This seems to work. Each number in this list ist the sum of its two predecessors. So let’s commit our first version.

In [4]:
%%bash
git add fibonacci.py
git commit -m "First version of recursive Fibonacci sequence" fibonacci.py
[master (root-commit) b90d77e] First version of recursive Fibonacci sequence
 1 file changed, 11 insertions(+)
 create mode 100644 fibonacci.py

A little further testing, however, shows that this becomes terribly slow for only slightly smaller number than we tested above.

In [5]:
for n in range(31, 36):
    print(f'Execution time fibonacci({n})', end=': ')
    %time fibonacci(n)
Execution time fibonacci(31): CPU times: user 512 ms, sys: 3.79 ms, total: 516 ms
Wall time: 515 ms
Execution time fibonacci(32): CPU times: user 831 ms, sys: 158 µs, total: 831 ms
Wall time: 831 ms
Execution time fibonacci(33): CPU times: user 1.63 s, sys: 4.11 ms, total: 1.64 s
Wall time: 1.64 s
Execution time fibonacci(34): CPU times: user 2.17 s, sys: 160 µs, total: 2.17 s
Wall time: 2.17 s
Execution time fibonacci(35): CPU times: user 3.48 s, sys: 20 µs, total: 3.48 s
Wall time: 3.48 s

And what’s worse: The execution time of fibonacci(n) is the sum of the execution times of fibonacci(n-1) and fibonacci(n-2). At this rate, fibonacci(100) will finish in roughly 4 million years! Clearly, something must be done about this. So, we create a new development branch to deal with this problem.

In [6]:
%%bash
git checkout -b make_faster
Switched to a new branch 'make_faster'

First, our developer, apparently unaware of the goodnes that is jupyter notebooks and the ipython %time and %timeit magics, creates their own time measurement, to see how they are doing.

In [7]:
%%writefile fibonacci.py

import sys
import time

def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    if n in [0, 1]:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
    
if __name__ == "__main__":
    t0 = time.time()
    print(fibonacci(int(sys.argv[1])))
    duration = time.time() - t0
    print(f'Executed in {duration:.2} seconds')
Overwriting fibonacci.py
In [8]:
!python ./fibonacci.py 35
14930352
Executed in 3.7 seconds

That’s apparently working. So lets commit this.

In [9]:
%%bash
git commit -m 'add debugging timer' fibonacci.py
[make_faster 7a418cd] add debugging timer
 1 file changed, 6 insertions(+), 1 deletion(-)

Meanwhile somebody notices that nothing in fibonacci ensures that n actually is an integer and makes a quick commit directly to master (not usually encouraged!) to rectify this.

In [10]:
!git checkout master
Switched to branch 'master'
In [11]:
%%writefile fibonacci.py

import sys
def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    n = int(n)
    if n in [0, 1]:
        return 1
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
    
if __name__ == "__main__":
    print(fibonacci(int(sys.argv[1])))
Overwriting fibonacci.py
In [12]:
%%bash
git commit -m "ensure that n is an integer or cast to int" fibonacci.py
git log --all --graph
[master cfd121c] ensure that n is an integer or cast to int
 1 file changed, 1 insertion(+)
* commit cfd121c94a0d96a73aad4fae30b646fb176e7398
| Author: Jörg Dietrich <astro@joergdietrich.com>
| Date:   Sat Mar 17 00:05:34 2018 +0100
| 
|     ensure that n is an integer or cast to int
|    
| * commit 7a418cdc3dc2effa01829ab3c45dab3f0777a35b
|/  Author: Jörg Dietrich <astro@joergdietrich.com>
|   Date:   Sat Mar 17 00:05:19 2018 +0100
|   
|       add debugging timer
|  
* commit b90d77e2d9fd1a0e5e1f8383751ca6a208264e6c
  Author: Jörg Dietrich <astro@joergdietrich.com>
  Date:   Sat Mar 17 00:04:49 2018 +0100
  
      First version of recursive Fibonacci sequence

Okay, we have two branches (master and make_faster), which both have commits not present in the other one.

Time to get back to development and trying to make our program faster where our developer notices a serious problem:

In [13]:
%%bash
git checkout make_faster
for i in `seq 0 5`; do echo -n "$i "; python ./fibonacci.py $i; done
0 1
Executed in 4.4e-05 seconds
1 1
Executed in 2.3e-05 seconds
2 2
Executed in 2e-05 seconds
3 3
Executed in 1.7e-05 seconds
4 5
Executed in 1.9e-05 seconds
5 8
Executed in 2.3e-05 seconds
Switched to branch 'make_faster'

Our Fibonacci series is off by one! fibonacci(0) should be 0, and fibonacci(4) should be 3, while fibonacci(5) should 5. Let’s fix this first.

In [14]:
%%writefile fibonacci.py

import sys
import time

def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    if n in [0, 1]:
        return n  # Return n, not 1!
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
    
if __name__ == "__main__":
    t0 = time.time()
    print(fibonacci(int(sys.argv[1])))
    duration = time.time() - t0
    print(f'Executed in {duration:.2} seconds')
Overwriting fibonacci.py
In [15]:
# Let's make some simple test. These should really be automated unit tests …
assert fibonacci(5) == 5
assert fibonacci(1) == 1
assert fibonacci(10) == 55
In [16]:
!git commit -m "fix obiwan bug in fibonacci" fibonacci.py
[make_faster 33e0161] fix obiwan bug in fibonacci
 1 file changed, 1 insertion(+), 1 deletion(-)

Phew, this fix is important enough that it should go directly into our production code in master. This time, however, we’ll follow the good (best?!) practice of not committing to master directly but create a bug fix branch. This could then be reviewed before being merged into master. We don’t want to merge the make_faster branch into master, because our developer is experimenting and has littered it with debugging code, and god know’s what he’s been trying there.

We make a note of the bug fix (short) commit hash (33e0161) now, but could of course retrieve this later from git log as well. This is the one we will cherry-pick into our bug fix branch.

In [17]:
%%bash
# We want to fix master, so we'll branch from there
git checkout master
git checkout -b fix_obiwan
git cherry-pick 33e0161
[fix_obiwan dbff8eb] fix obiwan bug in fibonacci
 Date: Sat Mar 17 00:06:02 2018 +0100
 1 file changed, 1 insertion(+), 1 deletion(-)
Switched to branch 'master'
Switched to a new branch 'fix_obiwan'

And then, after code review, etc. this bug fix branch is merged into master.

In [18]:
%%bash
git checkout master
git merge fix_obiwan
Updating cfd121c..dbff8eb
Fast-forward
 fibonacci.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Switched to branch 'master'

Meanwhile our developer has an epiphany. Every time the recursion calls fibonacci(n - 2) it computes the same values that the previous call to fibonacci(n - 1) already computed. If only there was a way to cache the results of previous function calls … Well, this being Python, of course there is.

In [19]:
!git checkout make_faster
Switched to branch 'make_faster'
In [20]:
%%writefile fibonacci.py

from functools import lru_cache
import sys
import time


@lru_cache()
def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    if n in [0, 1]:
        return n  # Return n, not 1!
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
    
if __name__ == "__main__":
    t0 = time.time()
    print(fibonacci(int(sys.argv[1])))
    duration = time.time() - t0
    print(f'Executed in {duration:.2} seconds')
Overwriting fibonacci.py
In [21]:
!python ./fibonacci.py 100
354224848179261915075
Executed in 0.00012 seconds

Whoah, that’s a lot faster than 4 Myr! Let’s commit this and then clean up.

In [22]:
!git commit -m "speed up by using least-recently used cache decorator" fibonacci.py
[make_faster abf1332] speed up by using least-recently used cache decorator
 1 file changed, 3 insertions(+)
In [23]:
%%writefile fibonacci.py

from functools import lru_cache
import sys

@lru_cache()
def fibonacci(n):
    """Computes the n-th Fibonacci by recursion"""
    if n in [0, 1]:
        return n  # Return n, not 1!
    else:
        return fibonacci(n - 1) + fibonacci(n - 2)
    
if __name__ == "__main__":
    print(fibonacci(int(sys.argv[1])))
Overwriting fibonacci.py
In [24]:
%%bash
git commit -m "remove debugging statements" fibonacci.py
git checkout master
git merge make_faster
[make_faster 6d8d63f] remove debugging statements
 1 file changed, 1 insertion(+), 6 deletions(-)
Auto-merging fibonacci.py
Merge made by the 'recursive' strategy.
 fibonacci.py | 3 +++
 1 file changed, 3 insertions(+)
Switched to branch 'master'
In [25]:
!git log --all --graph
*   commit 06f69272a3289715bae44ba8bc00701dc13c84f6
|\  Merge: dbff8eb 6d8d63f
| | Author: Jörg Dietrich <astro@joergdietrich.com>
| | Date:   Sat Mar 17 00:06:43 2018 +0100
| | 
| |     Merge branch 'make_faster'
| |   
| * commit 6d8d63f612ae1b087f1449b24d7b50de3665b280
| | Author: Jörg Dietrich <astro@joergdietrich.com>
| | Date:   Sat Mar 17 00:06:43 2018 +0100
| | 
| |     remove debugging statements
| |   
| * commit abf13323a213d94dd4fe0c095e9d4a50335dd253
| | Author: Jörg Dietrich <astro@joergdietrich.com>
| | Date:   Sat Mar 17 00:06:37 2018 +0100
| | 
| |     speed up by using least-recently used cache decorator
| |   
| * commit 33e016177dd7c3805767decd4f6d45b58bded942
| | Author: Jörg Dietrich <astro@joergdietrich.com>
| | Date:   Sat Mar 17 00:06:02 2018 +0100
| | 
| |     fix obiwan bug in fibonacci
| |   
| * commit 7a418cdc3dc2effa01829ab3c45dab3f0777a35b
| | Author: Jörg Dietrich <astro@joergdietrich.com>
| | Date:   Sat Mar 17 00:05:19 2018 +0100
| | 
| |     add debugging timer
| |   
* | commit dbff8eba5b33067c6a9dbe468c46f71b742fce6a
| | Author: Jörg Dietrich <astro@joergdietrich.com>
| | Date:   Sat Mar 17 00:06:02 2018 +0100
| | 
| |     fix obiwan bug in fibonacci
| |   
* | commit cfd121c94a0d96a73aad4fae30b646fb176e7398
|/  Author: Jörg Dietrich <astro@joergdietrich.com>
|   Date:   Sat Mar 17 00:05:34 2018 +0100
|   
|       ensure that n is an integer or cast to int
|  
* commit b90d77e2d9fd1a0e5e1f8383751ca6a208264e6c
  Author: Jörg Dietrich <astro@joergdietrich.com>
  Date:   Sat Mar 17 00:04:49 2018 +0100
  
      First version of recursive Fibonacci sequence

Great, we merged everything back into one branch! There cherry-picked bug fix, which was modification in both branches did not cause a merge conflict.

In summary, git cherry-pick is a convenient way to quickly backport important bug fixes from development branches to more slowly moving branches, such as stable releases.

And finally, there’s of course no need to compute the Fibonacci sequence with recursion. One could as well do this iteratively.

Comments