While working through the close to 250 applications for PhD positions at the International Max Planck Research School (IMPRS) for Astrophysics[1] in the current round of applications, I noticed an odd pattern: candidates whose reference letter writers saw their strengths more in theory than in experiment seemed to be ranked consistently higher than candidates whose strengths are more in experiment.
Each applicant is asked to have three letters of reference submitted. In addition to the free-form letter that these references can submit, letter writers are also asked to fill out a form that asks (among other questions) “Are the applicant’s strengths more in theory or in experiment?” and “Where does the applicant rank with respect to his/her fellow students?”. The answer to the first question is a free-form reply. For the second question, “Upper 10%” or “Upper 30%” can be selected (or the field left empty). Now the notion of “experiment” is a little unusual for astronomers. We have instrumentation groups at LMU and MPE but no laboratory astrophysics anymore, so this is mostly, and correctly, interpreted as “observational” rather than “experimental”.
Was I imagining this pattern, or were theory-inclined candidates indeed consistently ranked higher? I decided to check. Here are the data compiled from a large subset of applications (anonymized and in random order to prevent alphabetic matching):
%matplotlib inline
import numpy as np
import pandas as pd
import seaborn as sns
sns.set_palette("Set2", 8)
sns.set_style("white")
sns.set_style("ticks")
df = pd.read_csv("https://raw.githubusercontent.com/joergdietrich/joergdietrich.github.io/master/data/TheoristObserver.csv")
df.tail()
StrengthN and PercentileN are the student’s strength and ranking according to reviewer N.
Keep in mind that these are students halfway through a Master’s program or sometimes even Bachelor students. For them it is often difficult to find three reviewers who know them well, especially outside course settings. Many of these reviewers felt unable to judge whether a person’s strengths were more in theory or in experiment. I only took those replies where the reviewers were confident enough to make a judgment call, either in the form or in their letter. Descriptions of BSc or MSc projects as being more theoretical or more observational were discarded for this purpose, as were statements such as “Person X is more interested in theory/experiment”, since a student’s interests do not necessarily match their abilities. Possible values for an applicant’s strengths are Theory, Experiment, or Both.
Wherever possible, percentiles were taken directly from the form. If the form was left empty, I tried to gather this information from the letter and rounded up to the “upper 10%” or “upper 30%” category. Where this field is missing data (indicated as NaN), the reviewers either did not specify the percentile or placed the applicant below the upper 30%. I ignored the latter information for this analysis.
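As a quick sanity check of this encoding, we can count how many of the percentile fields each reviewer actually filled in; this is a minimal sketch using the data frame loaded above:
# Count the filled-in percentile fields per reviewer.
# Values are 10 ("upper 10%"), 30 ("upper 30%"), or NaN (no judgment / below 30%).
for i in range(1, 4):
    pk = "Percentile%d" % i
    n_filled = df[pk].notnull().sum()
    print("Reviewer %d: %d of %d percentile fields filled" % (i, n_filled, len(df)))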
After all this talk, let’s look at the data:
# Stack the three (StrengthN, PercentileN) column pairs into one long
# table with one row per reviewer reply.
frames = []
for i in range(1, 4):
    sk = "Strength%d" % i
    pk = "Percentile%d" % i
    df_tmp = df[[sk, pk]].copy()
    df_tmp.columns = ["Strength", "Percentile"]
    frames.append(df_tmp)
df_allref = pd.concat(frames, ignore_index=True).dropna()
df_allref.Percentile = df_allref.Percentile.astype(int)
strength_counts = df_allref.Strength.value_counts()
# Count plot of the percentile classifications per strength category.
g = sns.catplot(x="Strength", hue="Percentile", kind="count",
                data=df_allref, order=["Experiment", "Theory", "Both"])
strength_counts.to_frame("Count")
There is clearly a huge difference between the experimentalists, who are split almost equally between the upper 10% and upper 30% categories, and the theorists and universalists, who show up clearly more often in the upper 10% than in the upper 30% category. Another way to look at this is the ratio of “upper 10%” to “upper 30%” classifications, with Poisson errors on the counts.
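A minimal sketch of how such a ratio plot can be made, treating the two counts as independent Poisson variables and propagating their uncertainties into the ratio (the matplotlib details here are illustrative):
import matplotlib.pyplot as plt

# Ratio of "upper 10%" to "upper 30%" counts per strength category.
order = ["Experiment", "Theory", "Both"]
counts = df_allref.groupby(["Strength", "Percentile"]).size().unstack()
n10 = counts.loc[order, 10].astype(float)
n30 = counts.loc[order, 30].astype(float)
ratio = n10 / n30
# Poisson errors on the counts, propagated into the ratio:
# sigma_r = r * sqrt(1/N10 + 1/N30)
err = ratio * np.sqrt(1.0 / n10 + 1.0 / n30)
plt.errorbar(np.arange(len(order)), ratio, yerr=err, fmt="o")
plt.xticks(np.arange(len(order)), order)
plt.xlim(-0.5, len(order) - 0.5)
plt.xlabel("Strength")
plt.ylabel("N(upper 10%) / N(upper 30%)")
sns.despine()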
The picture is very clear: there is a huge and highly significant gap between how often observers are placed in the upper 10 percent of their peers and how often this happens for theoretically inclined students. The small and not significant increase from theorists to universalists is not too surprising: most people are strong in one area and will do worse in the other discipline, so somebody who is equally good in both naturally compares favorably.
The gap between observers and theorists is surprising (at least to me) and deserves further checking. It is clear from the reference letters that many more referees are able to place students on the percentile scale than to say with confidence whether somebody’s strengths are more in theory or in experiment. Maybe the individual strength ratings are too noisy? What happens if we only look at candidates whose letter writers have a majority opinion on whether this person is better in theory or in experiment and whether this person is in the upper 10% or upper 30%?
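A minimal sketch of such a majority filter (the majority helper is illustrative and may differ in detail from the exact selection used):
# Majority vote across the three reviewers: keep a value only if at least
# two reviewers agree on it; otherwise return NaN.
def majority(row):
    counts = row.dropna().value_counts()
    if len(counts) > 0 and counts.iloc[0] >= 2:
        return counts.index[0]
    return np.nan

df_major = pd.DataFrame({
    "Strength": df[["Strength1", "Strength2", "Strength3"]].apply(majority, axis=1),
    "Percentile": df[["Percentile1", "Percentile2", "Percentile3"]].apply(majority, axis=1),
}).dropna()

g = sns.catplot(x="Strength", hue="Percentile", kind="count",
                data=df_major, order=["Experiment", "Theory", "Both"])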
The picture does not change if we consider only those students for whom the letter writers agree on their strengths.
Possible Explanations
I could now answer the provocative title question with a resounding yes and be done. The real question, however, is what is going on here. Are the smartest physics students simply not interested in experiment or observation? If so, why not? There are obviously (to me, anyway) very interesting and intellectually rewarding problems to work on in these fields. Or is this difference due to some underlying bias in evaluating students?
- It may simply be that the way university physics courses are structured drives some very good students away from experimental physics. Although I am primarily an observational astrophysicist (I have some theory/numerical papers as well), I remember with horror the lab courses I had to take. Formulaic experimental setups without any chance of developing experimental skills were the norm in my course of studies. While I could play with the equations in my homework, actually trying something in the lab was neither expected nor indeed wanted.[2] I know many very good students who were driven into theory by these labs because the labs favored those who followed a strict script over those who sought a deep understanding of the material.
- Except for a few lab courses, students are mostly evaluated on pen-and-paper work, even in experimental physics courses. Moreover, the labs are often taught by (graduate) students. As a result, the people writing recommendation letters usually have never seen the applicant in a lab setting. The obvious exception is the direct supervisor of an MSc student who is doing experimental/observational work. However, that still leaves two other letter writers. If somebody is a top 10% experimentalist/universalist, how would the letter writers know this?
- It may be that we value talent in theory more highly than talent in experiment. How often have we heard “You’re a physicist/astronomer. You must be so smart!”? New York Magazine deconstructed this myth in a recent article, which I highly recommend. Maybe we are (possibly unconsciously) subject to the same bias when evaluating students who excel at maths and theoretical physics, and judge them to be better students than those with more talent or interest in experimental/observational work.
I don’t know what the correct explanation for the observed ranking preference for theory-minded individuals is. Maybe it is a combination of more than one factor. However, ignoring the provocative blog post title, this may point to a real problem in our community. There are many problems and research topics in observational astrophysics and astronomy that require the best and brightest minds. If we do not value the talents of students in these fields as highly as those in theory, or are simply not aware of them, we may drive these students out of the field. Or maybe we fail to make the observational side of astrophysics as attractive to the best students as theory. Then we also risk an imbalance between theory and observations in our community.
So, to finally answer the question: are theorists smarter than observers? Possibly, or there is a hidden bias in the way we educate or evaluate students.
This blog post is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. This post was written as a Jupyter Notebook in Python. You can download the original notebook.
Footnotes
[1] The IMPRS on Astrophysics at the Ludwig-Maximilians-Universität (LMU) Munich is the joint PhD program of the LMU, the Max Planck Institutes for Astrophysics (MPA) and for Extraterrestrial Physics (MPE), and the European Southern Observatory (ESO). It is “at the LMU” because that is the degree-granting institution. This is a large graduate program of four prestigious institutions and as such receives a very large number of applications across almost all fields of astronomy and astrophysics: observational, theoretical, and instrument building.
[2] In fact, I clearly remember an episode from a beginners’ lab course when we had some time left after finishing the required measurements. We started playing around with the experiment and trying different setups until the head of the lab program entered the room and scolded us for doing things we were not supposed to do (we did not put any material or persons in danger). It took three more years until I could take ownership of a lab experiment in the way I could take ownership of more complex theory homework.