# Software I : Anaconda, AstroPy, and libraries

In this school, we are making extensive use of Python and various associated libraries and so the first thing we need to ensure is that we all have a common setup and are using the same software. The Python distribution that we have decided to use is <i>Anaconda</i> which can be downloaded from <a href="http://continuum.io/downloads">here</a> (although we hope that you have already done this prior to the school). Make sure that you installed the Python 2.7 version for your operating system (there is nothing wrong with Python 3.x but it is slightly different syntactically and 2.7 is the currently approved version for LSST code development).

## Installing packages

One of the advantages of the <i>Anaconda</i> distribution is that it comes with many of the most commonly used Python packages, such as <a href="http://www.numpy.org">numpy</a>, <a href="http://www.scipy.org">scipy</a>, and <a href="http://scikit-learn.org">scikit-learn</a>, preinstalled. If you do need to install a new package, it is straightforward: you can use either the Anaconda installation tool <i>conda</i> or the generic Python tool <i>pip</i>. Each uses a different registry of available packages, so a particular package is sometimes unavailable via one tool but available via the other.

For example, <a href="https://github.com/bwlewis/irlbpy">irlbpy</a> is a superfast algorithm for finding the largest eigenvalues (and corresponding eigenvectors) of very large matrices. We can try to install it first with <i>conda</i>:

<code>conda install irlbpy</code>

but this will not find it:

<code>Fetching package metadata: ....
Error: No packages found in current osx-64 channels matching: irlbpy

You can search for this package on Binstar with <br/>
    binstar search -t conda irlbpy
</code>

so instead we try with <i>pip</i>:

<code>pip install irlbpy</code>

In the event that both fail, you can always download the package source code and then install it manually with:

<code>python setup.py install</code>

in the appropriate source directory.
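After installing, it can be worth confirming that Python can actually import the package. The following is a minimal sketch using only the standard library; the helper name `check_packages` and the package names passed to it are illustrative choices, not part of Anaconda or pip.

```python
import importlib


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


# The names below are only examples; substitute the packages you need.
for name, ok in sorted(check_packages(["numpy", "scipy", "irlbpy"]).items()):
    print("%-10s %s" % (name, "found" if ok else "missing"))
```

Any package reported as missing can then be installed with <i>conda</i> or <i>pip</i> as described above.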

We'll now take a brief look at a few of the main Python packages. 

## NumPy

<a href="http://www.numpy.org">NumPy</a> is the main Python package for working with N-dimensional arrays. Any list of numbers can be recast as a NumPy array:

In [2]:
import numpy as np
x = np.array([2,5,3,9,7])
x

array([2, 5, 3, 9, 7])

Arrays have a number of useful methods associated with them:

In [12]:
print x.min(), x.max(), x.sum(), x.argmin(), x.argmax() 

2 9 26 0 3


and NumPy functions can act on arrays in an elementwise fashion: 

In [9]:
np.sin(x * np.pi / 180.)

array([ 0.0348995 ,  0.08715574,  0.05233596,  0.15643447,  0.12186934])
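To make "elementwise" concrete, the sketch below compares the vectorized call with an explicit loop over the same values. It uses `np.radians` for the degree-to-radian conversion instead of multiplying by `np.pi / 180.`; the two are equivalent. The single-argument `print(...)` form is used so that the snippet runs unchanged on Python 2.7 and 3.x.

```python
import math

import numpy as np

x = np.array([2, 5, 3, 9, 7])

# Vectorized: a single call operates on every element of the array.
vectorized = np.sin(np.radians(x))

# Equivalent explicit loop using the standard-library math module.
looped = np.array([math.sin(math.radians(v)) for v in x])

# The two approaches agree to floating-point precision.
print(np.allclose(vectorized, looped))
```

The vectorized form is shorter and, for large arrays, usually much faster because the loop runs in compiled code.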

Ranges of values are easily produced:

In [3]:
np.arange(1, 10, 0.25)

array([ 1.  ,  1.25,  1.5 ,  1.75,  2.  ,  2.25,  2.5 ,  2.75,  3.  ,
        3.25,  3.5 ,  3.75,  4.  ,  4.25,  4.5 ,  4.75,  5.  ,  5.25,
        5.5 ,  5.75,  6.  ,  6.25,  6.5 ,  6.75,  7.  ,  7.25,  7.5 ,
        7.75,  8.  ,  8.25,  8.5 ,  8.75,  9.  ,  9.25,  9.5 ,  9.75])

In [4]:
np.linspace(1, 20, 5)

array([  1.  ,   5.75,  10.5 ,  15.25,  20.  ])

In [16]:
np.logspace(1, 3, 5)

array([   10.        ,    31.6227766 ,   100.        ,   316.22776602,
        1000.        ])
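The three functions differ in how the spacing is specified, which the short sketch below checks explicitly: `arange` takes a step and excludes the stop value, `linspace` takes a number of points and includes both endpoints by default, and `logspace` is `linspace` applied to base-10 exponents.

```python
import numpy as np

# arange: fixed step, stop value excluded.
a = np.arange(1, 10, 0.25)
print("%d points, last value %s" % (len(a), a[-1]))

# linspace: fixed number of points, both endpoints included by default.
b = np.linspace(1, 20, 5)
print("first %s, last %s" % (b[0], b[-1]))

# logspace(start, stop, num) is 10 ** linspace(start, stop, num).
c = np.logspace(1, 3, 5)
print(np.allclose(c, 10 ** np.linspace(1, 3, 5)))
```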

Random numbers are also easily generated in the half-open interval [0, 1):

In [17]:
np.random.random(10)

array([ 0.32236496,  0.21506812,  0.43010248,  0.00518381,  0.76868494,
        0.40007316,  0.54393627,  0.47369813,  0.84379927,  0.64993354])

or from one of the large number of statistical distributions provided:

In [5]:
np.random.normal(loc = 2.5, scale = 5, size = 20)

array([ -3.73267855,   1.37170819,   8.16304747,  -0.27896174,
        10.39564368,   0.45185063,   7.04073321,  -1.39126892,
         1.3358371 ,   3.90961175,   2.30816605,  19.87333166,
         5.41203659,   5.13839341,   5.06171327,   4.38265858,
         3.78950268,   7.06296169,   4.80470557,  10.20694391])
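Because these draws change on every run, results that depend on them are not reproducible unless the generator is seeded. A minimal sketch, assuming an arbitrary seed value of 42: two `np.random.RandomState` generators created with the same seed return identical samples.

```python
import numpy as np

# Two independent generators initialised with the same (arbitrary) seed.
rng_a = np.random.RandomState(42)
rng_b = np.random.RandomState(42)

draws_a = rng_a.normal(loc=2.5, scale=5, size=20)
draws_b = rng_b.normal(loc=2.5, scale=5, size=20)

# Same seed, same sequence of calls: the samples are identical.
print(np.array_equal(draws_a, draws_b))
```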

Another useful function is <i>where</i>, which identifies the elements that satisfy a particular condition:

In [10]:
x = np.random.normal(size = 100)
print x
np.where(x > 2.)

[  1.87105877e-01  -2.39603961e+00   1.44687770e+00  -2.16482071e-01
   5.75713272e-01  -4.05465421e-01   3.24493064e-01   1.94923740e+00
   1.82203983e-01   9.14587481e-01  -2.71385704e-02  -3.40197555e-01
  -1.23485896e+00  -8.59566120e-04   1.58623798e+00  -1.45167513e-02
  -1.04614749e+00   7.98648076e-01   2.14170370e+00   3.06397570e-01
   7.26984985e-01   3.30964686e-01  -8.96114730e-01   2.16686828e+00
   2.96145626e-01  -8.16399365e-01  -9.49566662e-01   4.22530260e-01
  -1.61935998e+00  -4.40557627e-01  -3.17059317e-01   1.19800654e+00
  -9.78872255e-02  -1.55824225e+00  -3.78734151e-01   1.94701640e-02
  -2.35352615e-01  -1.66271785e+00  -6.62442251e-01   8.82626351e-01
   1.09057493e+00  -3.89036409e-01   8.15959802e-01  -6.51061933e-01
   7.40576461e-01   7.21376501e-02  -2.10867165e+00   1.62439110e-01
  -3.96635524e-01  -6.51321684e-01   1.04665708e+00   3.81217863e-01
   2.10488771e-01  -1.28420630e+00   1.64306857e+00   2.26955679e-01
  -5.90036730e-01   6.67998555e-01

(array([18, 23]),)
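Note that `np.where` with a single argument returns a tuple containing one index array per dimension, which is why the output above is wrapped in parentheses. The sketch below (with an arbitrary seed so that it is repeatable) shows how those indices relate to a boolean mask, and the three-argument form that selects values elementwise.

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary seed, for repeatability only
x = rng.normal(size=100)

# Single-argument form: a tuple with one index array per dimension.
idx = np.where(x > 2.)[0]

# A boolean mask selects the same elements directly.
mask = x > 2.
print(np.array_equal(x[idx], x[mask]))

# Three-argument form: take 2. where the condition holds, x elsewhere.
clipped = np.where(x > 2., 2., x)
print(clipped.max() <= 2.)
```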

Of course, all of these work equally well with multidimensional arrays.

In [25]:
x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
np.sin(x)

array([[ 0.84147098,  0.90929743,  0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849, -0.54402111]])
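For multidimensional arrays, reductions such as `sum`, `min` and `max` also accept an `axis` argument that controls which dimension is collapsed. A short sketch using the same array:

```python
import numpy as np

x = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(x.shape)         # 2 rows, 5 columns
print(x.sum())         # no axis: sum over every element, giving 55
print(x.sum(axis=0))   # collapse the rows: column sums 7, 9, 11, 13, 15
print(x.max(axis=1))   # collapse the columns: row maxima 5 and 10
```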

Data can also be automatically loaded from a file into a NumPy array via the <i>loadtxt</i> or <i>genfromtxt</i> functions:

In [None]:
data = np.loadtxt("somefile.csv", delimiter = ",", skiprows = 3)
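Since `somefile.csv` is only a placeholder, here is a self-contained sketch of the same call. `np.loadtxt` also accepts a list of lines, so the hypothetical file contents below (three header lines followed by comma-separated rows, all invented for illustration) stand in for a real file.

```python
import numpy as np

# Hypothetical file contents: three header lines, then the data rows.
lines = [
    "catalogue: example",
    "created for illustration only",
    "ra,dec,mag",
    "10.625,41.2,3.4",
    "56.38,38.43,17.1",
]

# skiprows=3 skips the header; delimiter="," splits the columns.
data = np.loadtxt(lines, delimiter=",", skiprows=3)
print(data.shape)

# Transposing gives one array per column.
ra, dec, mag = data.T
print(ra)
```

<i>genfromtxt</i> works similarly but names the header argument `skip_header` and can handle missing values.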

## SciPy

<a href="http://www.scipy.org">SciPy</a> provides a number of subpackages that deal with common operations in scientific computing, such as numerical integration, optimization, interpolation, Fourier transforms and linear algebra.

In [33]:
f = lambda x: np.cos(-x ** 2 / 9.)

x = np.linspace(0, 10, 11)
y = f(x)

from scipy.interpolate import interp1d
f1 = interp1d(x, y)
f2 = interp1d(x, y, kind = 'cubic')

from scipy.integrate import quad
print quad(f1, 0, 10)
print quad(f2, 0, 10)
print quad(f, 0, 10)

(1.6346035509274763, 1.1580617576001373e-08)
(1.2743983238992254, 1.1301087878068563e-08)
(1.4332524555959525, 5.534144099065977e-11)
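The three results differ because only 11 samples are used, which undersamples the function where it oscillates rapidly near x = 10. The sketch below illustrates this for the linear case. The integral of a piecewise-linear interpolant is exactly the trapezoid sum over the samples, so it can be computed without `interp1d`; the helper name `linear_interp_integral` is an illustrative choice.

```python
import numpy as np
from scipy.integrate import quad

f = lambda x: np.cos(-x ** 2 / 9.)


def linear_interp_integral(n_points):
    """Exact integral of the piecewise-linear interpolant of f on [0, 10]."""
    x = np.linspace(0, 10, n_points)
    y = f(x)
    # Trapezoid sum: area under each straight segment between samples.
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))


# Reference value from integrating the function itself.
exact, _ = quad(f, 0, 10)

for n in (11, 101, 1001):
    approx = linear_interp_integral(n)
    print("%5d points: %.6f (error %.2e)" % (n, approx, abs(approx - exact)))
```

With 11 points the trapezoid sum reproduces the first result above; increasing the number of samples brings the interpolated integral towards the value obtained from the function itself.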


## scikit-learn

<a href="http://scikit-learn.org">scikit-learn</a> provides algorithms for machine learning tasks, such as classification, regression, and clustering, as well as associated operations, such as cross-validation and feature normalization. These topics will be covered in greater depth in Guillermo Cabrera's talks <a href="">here</a>. A related module is <a href="http://www.astroml.org">astroML</a> which is a wrapper around a lot of the scikit-learn routines but also offers some additional functionality and faster/alternate implementations of some methods.

## pandas

<a href="http://pandas.pydata.org/index.html">pandas</a> offers data structures, particularly data frames, and operations for manipulating numerical tables and time series, such as fancy indexing, reshaping and pivoting, and merging, as well as a number of analysis tools. Although similar functionality already exists in numpy, pandas is highly optimized for performance and large data sets.  Some of these topics will be covered in greater depth in Mauricio San Martin's talk <a href="placeholder">here</a>.
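As a brief illustration of merging, pivoting and boolean indexing, here is a minimal sketch. The two small tables and their column names are invented for this example.

```python
import pandas as pd

# Hypothetical catalogue and photometry tables, invented for illustration.
sources = pd.DataFrame({"id": [1, 2, 3], "type": ["G", "UvS", "RadioS"]})
photometry = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "band": ["g", "r", "g", "r", "g"],
    "mag": [17.2, 16.8, 19.1, 18.7, 20.3],
})

# Merge: attach the object type to every photometric measurement.
merged = pd.merge(photometry, sources, on="id")

# Pivot: one row per source, one column per band (missing entries become NaN).
wide = merged.pivot(index="id", columns="band", values="mag")
print(wide)

# Boolean indexing selects rows, much as it does for NumPy arrays.
bright = merged[merged["mag"] < 19.0]
print(len(bright))
```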

## AstroPy

<a href="http://www.astropy.org">AstroPy</a> aims to provide a core set of subpackages to specifically support astronomy. These include methods to work with image and table data formats, e.g., FITS, VOTable, etc., along with astronomical coordinate and unit systems, and cosmological calculations.

In [38]:
from astropy import units as u
from astropy.coordinates import SkyCoord

c = SkyCoord(ra = 10.625 * u.degree, dec = 41.2 * u.degree, frame = 'icrs')
print c.to_string('hmsdms')
print c.galactic

00h42m30s +41d12m00s
<SkyCoord (Galactic): (l, b) in deg
    (121.12334339, -21.6403587)>


In [39]:
from astropy.cosmology import WMAP9 as cosmo
print cosmo.comoving_distance(1.25), cosmo.luminosity_distance(1.25)

3944.5841858 Mpc 8875.31441806 Mpc


In [None]:
from astropy.io import fits
hdulist = fits.open('someimage.fits')
hdulist.info()

In [None]:
from astropy.io.votable import parse
votable = parse('sometable.xml')
table = votable.get_first_table()
data = table.array

A useful affiliated package is <a href="https://astroquery.readthedocs.org">Astroquery</a> which provides tools for querying astronomical web forms and databases. This is not part of the regular AstroPy distribution and needs to be installed separately. Whereas many data archives have standardized VO interfaces to support data access (see Amelia Bayo's talk <a href="">here</a>), Astroquery mimics a web browser and provides access via an archive's form interface. This can be useful as not all of the information an archive provides is necessarily available via the VO.

For example, the <a href="http://ned.ipac.caltech.edu">NASA Extragalactic Database</a> is a very useful human-curated resource for extragalactic objects. However, a lot of the information that is available via the web pages is not available through an easy programmatic API. Let's say that we want to get the list of object types associated with a particular source:

In [13]:
from astroquery.ned import Ned
from astropy.coordinates import SkyCoord
from astropy import units as u

co = SkyCoord(ra = 56.38, dec = 38.43, unit = (u.deg, u.deg))
result = Ned.query_region(co, radius = 0.07 * u.deg)
print result
set(result.columns['Type'])

No.         Object Name          ... Diameter Points Associations
                                 ...                             
--- ---------------------------- ... --------------- ------------
  1 GALEXASC J034513.29+382620.9 ...               0            0
  2      2MASX J03451354+3825588 ...               2            0
  3          6C B034157.1+381808 ...               0            1
  4                    4C +38.10 ...               0            1
  5 GALEXASC J034517.39+382239.7 ...               0            0
  6 GALEXASC J034517.60+382255.0 ...               0            0
  7 GALEXASC J034518.64+382750.8 ...               0            0
  8 GALEXASC J034522.97+382939.1 ...               0            0
  9 GALEXASC J034526.22+382730.1 ...               0            0
 10 GALEXASC J034529.64+382240.9 ...               0            0
 11 GALEXASC J034530.57+382408.8 ...               0            0
 12 GALEXASC J034531.71+382446.4 ...               0            0
 13 GALEXA

{'G', 'RadioS', 'UvS'}

## Other libraries

For some of the other lectures or projects this week, you might also need to install the following Python packages:

<ul>
<li> photutils
<li> glue
</ul>

## Unrelated software use survey

<a href="https://goo.gl/W0jDMJ">https://goo.gl/W0jDMJ</a>