Using Nested-Pandas with Astronomical Spectra

Using Nested-Pandas with Astronomical Spectra#

In Astronomy, a spectrum is a measurement (or combination of measurements) of an object that shows the intensity of light emitted over a range of energies. In this tutorial, we’ll walk through a simple example of working with spectra from the Sloan Digital Sky Survey (SDSS), in particular showing how it can be represented as a NestedFrame.

First, we’ll use astroquery and astropy to download a handful of spectra from SDSS:

[3]:
from astroquery.sdss import SDSS
from astropy import coordinates as coords
import astropy.units as u
import nested_pandas as npd

# Query SDSS for a set of objects with spectra
pos = coords.SkyCoord("0h8m10.63s +14d50m23.3s", frame="icrs")
xid = SDSS.query_region(pos, radius=3 * u.arcmin, spectro=True)
xid_ndf = npd.NestedFrame(xid.to_pandas())
xid_ndf
[3]:
ra dec objid run rerun camcol field z plate mjd fiberID specobjid run2d
0 2.076662 14.843455 1237652943176204410 1739 301 3 316 -0.000464 6112 56191 560 6881654266119084032 v5_13_2
1 2.023446 14.839824 1237652943176138868 1739 301 3 315 0.045591 751 52251 160 845594848269461504 26
2 2.016424 14.810989 1237652943176139005 1739 301 3 315 0.113535 752 52251 302 846759780839090176 26

3 rows × 13 columns

This initial query returns a set of objects with spectra (as specified by the spectro=True flag). To actually retrieve the spectra, we can do the following:

[4]:
# Query SDSS for the corresponding spectra
SDSS.clear_cache()
sp = SDSS.get_spectra(matches=xid)
sp
[4]:
[[<astropy.io.fits.hdu.image.PrimaryHDU object at 0x119f595e0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bc36600>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bc74aa0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bc77d40>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bc36e40>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bc17320>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bc92e70>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bc938c0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bcbe1b0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bcd5d60>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bce98e0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd04650>],
 [<astropy.io.fits.hdu.image.PrimaryHDU object at 0x11bbbe090>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd06390>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd15b20>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd28ce0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd2b320>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd424e0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd51700>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd688f0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd6bad0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd7ecc0>],
 [<astropy.io.fits.hdu.image.PrimaryHDU object at 0x11bc34560>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bd907a0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bda8080>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bdab230>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bdb9850>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bdccaa0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bdcfcb0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bde6e40>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11bdf9fd0>, <astropy.io.fits.hdu.table.BinTableHDU object at 0x11be0d220>]]

The result is a list of FITS formatted data. From this point there are a few ways that we could move towards a nested-pandas representation. The most straightforward is to build a “flat” spectra table from all the objects, where we gather the information from each spectrum into a single combined table.

[5]:
import numpy as np

# Build a flat spectrum dataframe

# Initialize some empty arrays to hold the flat data
wave = np.array([])
flux = np.array([])
err = np.array([])
index = np.array([])
# Loop over each spectrum, adding its data to the arrays
for i, hdu in enumerate(sp):
    wave = np.append(wave, 10 ** hdu["COADD"].data.loglam)  # * u.angstrom
    flux = np.append(flux, hdu["COADD"].data.flux * 1e-17)  # * u.erg/u.second/u.centimeter**2/u.angstrom
    err = np.append(err, 1 / hdu["COADD"].data.ivar * 1e-17)  # * flux.unit

    # We'll need to set an index to keep track of which rows correspond
    # to which object
    index = np.append(index, i * np.ones(len(hdu["COADD"].data.loglam)))

# Build a NestedFrame from the arrays
flat_spec = npd.NestedFrame(dict(wave=wave, flux=flux, err=err), index=index.astype(np.int8))
flat_spec
/var/folders/lc/dws63_cs5gz5mf8s869hjpx40000gn/T/ipykernel_21256/2955152026.py:14: RuntimeWarning: divide by zero encountered in divide
  err = np.append(err, 1 / hdu["COADD"].data.ivar * 1e-17)  # * flux.unit
[5]:
wave flux err
0 3606.617188 1.699629e-16 9.050196e-17
0 3607.447021 -4.092098e-17 7.294453e-17
... ... ... ...
2 9185.439453 1.398144e-16 1.269789e-17
2 9187.557617 1.487952e-16 1.259023e-17

12245 rows × 3 columns

From here, we can simply nest our flat table within our original query result:

[ ]:
spec_ndf = xid_ndf.join_nested(flat_spec, "coadd_spectrum").set_index("objid")
spec_ndf
  ra dec run rerun camcol field z plate mjd fiberID specobjid run2d coadd_spectrum
1237652943176204410 2.076662 14.843455 1739 301 3 316 -0.000464 6112 56191 560 6881654266119084032 v5_13_2
wave flux err
3606.617188 0.0 0.0
+4594 rows ... ...
1237652943176138868 2.023446 14.839824 1739 301 3 315 0.045591 751 52251 160 845594848269461504 26
wave flux err
3799.268555 0.0 0.0
+3840 rows ... ...
1237652943176139005 2.016424 14.810989 1739 301 3 315 0.113535 752 52251 302 846759780839090176 26
wave flux err
3822.962891 0.0 0.0
+3808 rows ... ...
3 rows x 13 columns

And we can see that each object now has the coadd_spectrum nested column with the full spectrum available.

[9]:
# Look at one of the spectra
spec_ndf.iloc[1].coadd_spectrum
[9]:
wave flux err
0 3799.268555 3.059662e-16 1.552601e-16
1 3800.142578 3.324573e-16 inf
... ... ... ...
3839 9196.020508 5.023617e-16 3.563417e-17
3840 9198.140625 5.170272e-16 5.481493e-17

3841 rows × 3 columns

We now have our spectra nested, and can proceed to do any filtering and analysis as normal within nested-pandas.

[10]:
import matplotlib.pyplot as plt

# Plot a spectrum
spec = spec_ndf.iloc[1].coadd_spectrum

plt.plot(spec["wave"], spec["flux"])
plt.xlabel("Wavelength (Å)")
plt.ylabel(r"Flux ($ergs/s/cm^2/Å$)")
[10]:
Text(0, 0.5, 'Flux ($ergs/s/cm^2/Å$)')
../_images/pre_executed_nested_spectra_13_1.png