Assignment 3

Exercise 1.

A data file, ~bromley/courses/ap7730/data/pop.dat, has three columns: t (time in days), N (population), and ε_N (the uncertainty in the population, which arises from the difficulty of an accurate, instantaneous census in a rapidly growing population; assume these are the 1-sigma values for normally distributed uncertainties). The starting time is t=0 days, and data points run up to t=10 days. The N values are inferred indirectly and so have some uncertainty associated with them.

NOTE to instructor: use 5th order next time!

Try three different models for fitting this data in an attempt to predict the value of N at t=14 days. The models are:

linear: N = a₀ + a₁t
quadratic: N = a₀ + a₁t + a₂t²
6th-order: N = a₀ + a₁t + ... + a₆t⁶

For this exercise, write a code census.py that fits the data for each model and prints (at a minimum) output of the form:

Model A. p-value: #####;  predicted N at 14 days: ####
Model B. p-value: #####;  predicted N at 14 days: ####
Model C. p-value: #####;  predicted N at 14 days: ####

Also have your code print out a narrative about what you would pick as the best estimate of the population on day 14, based on the data running from t=0 to day 10. Feel free to comment on the reasonableness and risks (R&R?) of this endeavor....

Note: To print out a paragraph of text in a python script, try

  print(
  '''
  blah blah
  blah
  ''')
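One possible way to set up the fits in Exercise 1 is a weighted least-squares polynomial fit followed by a chi-square goodness-of-fit p-value. The sketch below is only an illustration under stated assumptions: the helper name fit_poly is hypothetical, the data are synthetic stand-ins (the real values would come from pop.dat, e.g. via np.loadtxt with unpack=True), and the chi-square p-value is one reasonable choice of fit statistic, not necessarily the intended one.

```python
import numpy as np
from scipy import stats


def fit_poly(t, N, sigma, order):
    """Weighted least-squares polynomial fit (hypothetical helper).

    Returns the coefficients (highest power first), chi-square,
    degrees of freedom, and the chi-square p-value.
    """
    # np.polyfit weights multiply the residuals, so 1/sigma is the
    # appropriate weight for Gaussian 1-sigma uncertainties.
    coeffs = np.polyfit(t, N, order, w=1.0 / sigma)
    chi2 = np.sum(((N - np.polyval(coeffs, t)) / sigma) ** 2)
    dof = len(t) - (order + 1)
    pval = stats.chi2.sf(chi2, dof)
    return coeffs, chi2, dof, pval


# Synthetic stand-in data (an assumption; replace with the columns of pop.dat).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 21)
sigma = np.full_like(t, 5.0)
N = 100.0 + 12.0 * t + rng.normal(0.0, sigma)

for label, order in (("A", 1), ("B", 2), ("C", 6)):
    coeffs, chi2, dof, pval = fit_poly(t, N, sigma, order)
    pred = np.polyval(coeffs, 14.0)  # extrapolation beyond the data range
    print(f"Model {label}. p-value: {pval:.3g};  predicted N at 14 days: {pred:.1f}")
```

A high-order polynomial can fit the data well inside t=0 to 10 days and still extrapolate poorly, so the p-value alone does not settle which prediction at t=14 days to trust.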

Exercise 2.

This problem is inspired by -- and hugely over-simplifies -- data from the cosmic ray experiments that have been taking place here in Utah, led by faculty in our department.

Here we consider 3000 simulated ultra-high energy "events", each one corresponding to a cosmic ray that is detected after having careened through the Earth's atmosphere. The text file in ~bromley/courses/ap7730/data/crevents.txt contains three columns: an event identification number, the energy of the cosmic ray (eV), and a number, 1 or 2, that distinguishes two groups within this collection of events. The group number indicates some attribute such as arrival direction. For definiteness, let's say group 1 was observed in the northern part of the sky (as seen from the Utah desert) and group 2 in the southern part.

Write a python code crspec.py to do the following, with appropriately descriptive output:

i. Use the full data set (groups 1+2) to test whether the data are consistent with a power-law distribution of energy E, such that p(E)=E**(-a) where a = 2.75.

ii. These simulated ultra-high energy events correspond to energies that are all over 10^19 eV. An important issue is whether there is a cut-off in the energy spectrum above ~5x10^19 eV, related to how cosmic rays from extragalactic sources interact with the cosmic microwave background -- the "GZK cut-off". Estimate the likelihood that, if p(E)=E**(-2.75), we would detect zero events with energy higher than the highest energy observed among the 3000 events. In other words, is there evidence in this data set for a GZK cut-off (using this power-law model)?

iii. Find a best-fit power-law index, a_bestfit, such that p(E)=E**(-a_bestfit) best represents the data in some quantifiable sense.

iv. Whether the flux of cosmic rays varies across the sky is key to understanding their origin. Assess a likelihood that the two groups 1 and 2, representing "northern" and "southern" sources, were drawn from the same distribution.
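The four parts above can be approached with standard tools for a power law normalized above a minimum energy. The sketch below is a hedged illustration, not a prescribed solution. It assumes a lower cut-off E_MIN = 10^19 eV, so that the cumulative distribution is F(E) = 1 - (E/E_MIN)**(1-a). It uses a Kolmogorov-Smirnov test for parts i and iv and the maximum-likelihood estimate a = 1 + n/sum(ln(E_i/E_MIN)) for part iii. The function names and the synthetic events are hypothetical stand-ins for the contents of crevents.txt.

```python
import numpy as np
from scipy import stats

E_MIN = 1.0e19  # assumed lower energy cut-off (eV)


def powerlaw_cdf(E, a, e_min=E_MIN):
    """CDF of p(E) ~ E**(-a), normalized for E >= e_min (requires a > 1)."""
    return 1.0 - (np.asarray(E) / e_min) ** (1.0 - a)


def mle_index(E, e_min=E_MIN):
    """Maximum-likelihood estimate of the power-law index."""
    E = np.asarray(E)
    return 1.0 + E.size / np.sum(np.log(E / e_min))


def prob_none_above(e_max, n, a, e_min=E_MIN):
    """Probability that all n independent events fall at or below e_max."""
    return powerlaw_cdf(e_max, a, e_min) ** n


# Synthetic stand-in events drawn by inverse-transform sampling (an assumption).
rng = np.random.default_rng(1)
a_true = 2.75
E = E_MIN * (1.0 - rng.random(3000)) ** (-1.0 / (a_true - 1.0))
group = rng.integers(1, 3, size=E.size)  # random labels 1 or 2

# i. one-sample KS test against the a = 2.75 power law
ks = stats.kstest(E, lambda x: powerlaw_cdf(x, 2.75))
print(f"KS test vs a=2.75: D = {ks.statistic:.4f}, p-value = {ks.pvalue:.3g}")

# ii. chance of seeing nothing above the observed maximum
p_none = prob_none_above(E.max(), E.size, 2.75)
print(f"P(no event above {E.max():.3g} eV in {E.size} draws) = {p_none:.3g}")

# iii. best-fit index by maximum likelihood
print(f"a_bestfit (MLE) = {mle_index(E):.3f}")

# iv. two-sample KS test between the groups
ks2 = stats.ks_2samp(E[group == 1], E[group == 2])
print(f"Two-sample KS: D = {ks2.statistic:.4f}, p-value = {ks2.pvalue:.3g}")
```

Other choices are possible, for example a chi-square test on binned energies for part i, or a least-squares fit to a log-log histogram for part iii. The KS and maximum-likelihood routes are shown only because they avoid binning.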

Exercise 3.

Diffusive processes affect many things in nature, including (famously) pollen grains in a liquid, charge carriers in a semiconductor, and molecular motors traversing the microtubules in cells. In this exercise, we consider the last of these examples.

A molecular motor, such as those studied in Prof. Michael Vershinin's Lab, transports cellular cargo along microtubules, "girders" that span a cell's cytoplasm. The motion of a motor is partly diffusive and partly linear (a constant "drift" speed). In this exercise you are to derive a statistic that can be used to detect whether there is linear motion in an otherwise diffusive trajectory of a molecular motor in 1-D.

To mimic the diffusive process, assume that a molecular motor (treated as a point) takes a sequence of random, normally distributed steps along a microtubule such that the standard deviation of the steps is length delx=0.5 microns. Let the motor take such steps at a rate of one per delt=0.1 microsecond. This corresponds to a diffusion coefficient of D = 0.5*delx**2/delt. The distance that a motor travels under diffusion in some time T is itself normally distributed, with standard deviation L=sqrt(2*D*T). As an example, consider one realization of a motor with no drift -- pure diffusion -- along 100 steps (T=10 microseconds):
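A realization of this kind can be generated as a cumulative sum of normally distributed steps. The following is a minimal sketch under the parameters stated above; the function name simulate and its drift argument (in microns per microsecond, zero for pure diffusion) are hypothetical choices made for illustration.

```python
import numpy as np

DELX = 0.5     # standard deviation of one step (microns)
DELT = 0.1     # time per step (microseconds)
NSTEPS = 100   # number of steps, so T = 10 microseconds


def simulate(nsteps=NSTEPS, drift=0.0, rng=None):
    """Return times and positions of one 1-D trajectory starting at x = 0.

    Each step is normal with standard deviation DELX; a nonzero drift
    speed shifts the mean of each step by drift * DELT.
    """
    rng = np.random.default_rng() if rng is None else rng
    steps = rng.normal(drift * DELT, DELX, size=nsteps)
    x = np.concatenate(([0.0], np.cumsum(steps)))
    t = DELT * np.arange(nsteps + 1)
    return t, x


D = 0.5 * DELX**2 / DELT     # diffusion coefficient (micron^2 / microsecond)
T = NSTEPS * DELT            # total time (microseconds)
L = np.sqrt(2.0 * D * T)     # expected spread of the final position (microns)

t, x = simulate(rng=np.random.default_rng(0))
print(f"D = {D:.3f}, T = {T:.1f}, L = {L:.2f}")
print(f"final position of this realization: {x[-1]:.2f} microns")
```

With these numbers L works out to delx*sqrt(100) = 5 microns, since 2*D*T equals delx**2 times the number of steps.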

Note that the average speed (L/T) is comparable to a drift speed of v=1 micron/microsecond, which (barring any misunderstanding on my part) is typical of observed molecular motors.

Write a python code molmot.py that simulates 100 steps (T=10 microseconds) of a molecular motor that moves by pure diffusion and by diffusion+drift. Come up with a statistical measure that can test the hypothesis that a trajectory of 100 steps is purely diffusive. Have your code demonstrate the efficacy of your test. What is the slowest drift speed that you can reliably distinguish from a "no-drift" scenario?
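As one illustration of what such a test could look like (not the only or necessarily the best statistic), the net displacement after 100 steps can be compared with its pure-diffusion spread L = delx*sqrt(100): under the no-drift hypothesis, z = x_final/L is a standard normal variable. The sketch below estimates how often a two-sided test at the 5% level rejects pure diffusion for a few drift speeds. The function name rejection_rate, the significance level, and the trial count are all assumptions made for this example.

```python
import numpy as np
from scipy import stats

DELX, DELT, NSTEPS = 0.5, 0.1, 100   # microns, microseconds, steps
L = DELX * np.sqrt(NSTEPS)           # equals sqrt(2*D*T) for D = 0.5*DELX**2/DELT


def rejection_rate(drift, ntrials=2000, alpha=0.05, rng=None):
    """Fraction of simulated trajectories for which pure diffusion is rejected.

    drift is the drift speed in microns per microsecond.
    """
    rng = np.random.default_rng() if rng is None else rng
    steps = rng.normal(drift * DELT, DELX, size=(ntrials, NSTEPS))
    x_final = steps.sum(axis=1)
    # Two-sided p-value of z = x_final / L under the no-drift hypothesis.
    pvals = 2.0 * stats.norm.sf(np.abs(x_final) / L)
    return np.mean(pvals < alpha)


rng = np.random.default_rng(2)
for v in (0.0, 0.5, 1.0, 2.0):
    rate = rejection_rate(v, rng=rng)
    print(f"drift {v:.1f} micron/microsecond: rejected in {rate:.0%} of trials")
```

The rate at zero drift is the false-positive rate and should sit near the chosen significance level. The drift speed at which the rejection rate becomes acceptably high depends on what "reliably" is taken to mean.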

Submit your answers.

submit p7730 a03 .....