Data Preprocessing#

The TimeSeriesData class provides a standard container for photometric time series, with built-in preprocessing and computation of the power spectral density (PSD) and autocorrelation function (ACF).

Sections

  1. Creating a TimeSeriesData from simulated data

  2. Loading Kepler data via lightkurve

  3. Normalization, sigma clipping, PSD, and ACF

import sys
sys.path.append("../..")

import numpy as np
import matplotlib.pyplot as plt

plt.rcParams.update({
    "font.size":        16,   # base font size
    "axes.titlesize":   20,   # axes title
    "axes.labelsize":   18,   # x/y axis labels
    "xtick.labelsize":  14,   # x tick labels
    "ytick.labelsize":  14,   # y tick labels
    "legend.fontsize":  14,
    "figure.titlesize": 22,   # suptitle
    "axes.formatter.useoffset": False,  # disable offset notation on tick labels
})

from spotgp import TimeSeriesData

1. Simulated data#

Create a synthetic lightcurve with a periodic signal, noise, and a few outliers.

np.random.seed(42)

N = 500
t = np.sort(np.random.uniform(0, 100, N))  # irregular sampling
period = 8.0
flux = 1.0 + 0.005 * np.sin(2 * np.pi * t / period) + 0.001 * np.random.randn(N)
flux_err = np.full(N, 0.001)

# Inject some outliers and NaNs
flux[10] = 1.05
flux[200] = 0.95
flux[300] = np.nan

ts = TimeSeriesData(t, flux, flux_err)
print(ts)
print(f"Median flux: {np.median(ts.y):.6f}")
TimeSeriesData(N=499, baseline=98.79, median_dt=0.1299)
Median flux: 1.000000

The constructor automatically:

  • Removes NaN entries (here N drops from 500 to 499)

  • Normalizes the flux so that its median is 1.0
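The two constructor steps listed above can be sketched in plain NumPy. This is an illustration only, not the spotgp implementation: the helper name clean_and_normalize is hypothetical, and scaling the uncertainties by the same median is an assumption about how the errors are treated.

```python
import numpy as np

def clean_and_normalize(t, flux, flux_err):
    """Drop non-finite samples, then divide flux and errors by the median flux.

    Hypothetical helper; assumes errors are rescaled along with the flux.
    """
    t, flux, flux_err = (np.asarray(a, dtype=float) for a in (t, flux, flux_err))
    keep = np.isfinite(t) & np.isfinite(flux) & np.isfinite(flux_err)
    t, flux, flux_err = t[keep], flux[keep], flux_err[keep]
    med = np.median(flux)
    return t, flux / med, flux_err / med

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
flux = np.array([2.0, np.nan, 2.2, 1.8, 2.0])
flux_err = np.full(5, 0.1)

t_c, y_c, e_c = clean_and_normalize(t, flux, flux_err)
print(len(y_c), np.median(y_c))
```

Dividing the uncertainties by the same factor keeps the relative error of each point unchanged.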

Plot the raw time series#

ts.plot()
plt.title("Simulated lightcurve (with outliers)")
plt.show()
[Figure: simulated lightcurve with outliers]

Sigma clipping#

Remove the outliers with sigma_clip():

print(f"Before clipping: N = {ts.N}")
ts.sigma_clip(lower=3, upper=3)
print(f"After clipping:  N = {ts.N}")

ts.plot()
plt.title("After sigma clipping")
plt.show()
Before clipping: N = 499
After clipping:  N = 497
[Figure: simulated lightcurve after sigma clipping]
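For intuition, asymmetric sigma clipping can be sketched as follows. The source does not say which centre and scatter estimates sigma_clip() uses, so the median and the scaled median absolute deviation here are assumptions, and sigma_clip_mask is a hypothetical name.

```python
import numpy as np

def sigma_clip_mask(y, lower=3.0, upper=3.0):
    """Return a boolean mask that is True for the points to keep.

    Assumption: centre = median, scatter = 1.4826 * MAD (a robust
    estimate that equals sigma for Gaussian noise).
    """
    y = np.asarray(y, dtype=float)
    center = np.median(y)
    sigma = 1.4826 * np.median(np.abs(y - center))
    return (y >= center - lower * sigma) & (y <= center + upper * sigma)

rng = np.random.default_rng(0)
y = 1.0 + 0.001 * rng.standard_normal(200)
y[10], y[50] = 1.05, 0.95          # injected outliers

keep = sigma_clip_mask(y, lower=3, upper=3)
print(f"kept {keep.sum()} of {y.size} points")
```

A robust scatter estimate matters here because a plain standard deviation would itself be inflated by the outliers it is meant to detect.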

Downsampling#

Bin the time series into uniform time bins using inverse-variance weighted averaging. This reduces the number of points while preserving the signal and correctly propagating uncertainties.

print(f"Before downsampling: N = {ts.N}, median dt = {ts.median_dt:.4f}")
ts.downsample(dt=1.0)
print(f"After downsampling:  N = {ts.N}, median dt = {ts.median_dt:.4f}")

ts.plot()
plt.plot(ts.x, ts.y)
plt.title("After downsampling (dt = 1.0 day)")
plt.show()
Before downsampling: N = 497, median dt = 0.1311
After downsampling:  N = 97, median dt = 0.9924
[Figure: simulated lightcurve after downsampling (dt = 1.0 day)]
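The inverse-variance weighted binning described in this section can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the spotgp code: bin_inverse_variance is a hypothetical name, bins are assumed to start at the first time stamp, and empty bins are assumed to be dropped.

```python
import numpy as np

def bin_inverse_variance(t, y, yerr, dt):
    """Bin (t, y, yerr) into non-overlapping bins of width dt.

    Each bin value is the inverse-variance weighted mean of its points;
    the bin uncertainty is 1 / sqrt(sum of weights).
    """
    t, y, yerr = (np.asarray(a, dtype=float) for a in (t, y, yerr))
    idx = np.floor((t - t.min()) / dt).astype(int)   # bin index of each sample
    w = 1.0 / yerr**2                                # inverse-variance weights
    n_bins = idx.max() + 1
    w_sum = np.bincount(idx, weights=w, minlength=n_bins)
    filled = w_sum > 0                               # drop empty bins
    t_bin = np.bincount(idx, weights=w * t, minlength=n_bins)[filled] / w_sum[filled]
    y_bin = np.bincount(idx, weights=w * y, minlength=n_bins)[filled] / w_sum[filled]
    yerr_bin = 1.0 / np.sqrt(w_sum[filled])
    return t_bin, y_bin, yerr_bin

t = np.array([0.1, 0.4, 0.9, 1.2, 1.7, 3.5])
y = np.array([1.0, 1.2, 1.1, 0.9, 1.1, 1.0])
yerr = np.full(6, 0.1)

t_b, y_b, e_b = bin_inverse_variance(t, y, yerr, dt=1.0)
print(y_b, e_b)
```

With equal errors, a bin holding n points has an uncertainty smaller by a factor of sqrt(n), which is what "correctly propagating uncertainties" amounts to for independent Gaussian noise.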

PSD and ACF#

Compute and plot the power spectral density and autocorrelation function:

fig, axes = plt.subplots(1, 2, figsize=(14, 4))

ts.plot_psd(ax=axes[0])
axes[0].axvline(1.0 / period, color="r", ls="--", label=f"Injected: $f = 1/{period:.0f}$ d$^{{-1}}$")
axes[0].legend()
axes[0].set_title("Lomb-Scargle PSD")

ts.plot_acf(ax=axes[1], n_bins=100, max_lag=40)
for i in range(1, 5):
    axes[1].axvline(i * period, color="r", ls="--", alpha=0.4)
axes[1].set_title("Autocorrelation function")

plt.tight_layout()
plt.show()
[Figure: Lomb-Scargle PSD (left) and autocorrelation function (right) of the simulated lightcurve]

The PSD peaks at the injected frequency (\(1/8\) d\(^{-1}\)), and the ACF peaks at lags that are multiples of the 8-day period.
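To see why the peak lands at the injected frequency, the classic Lomb-Scargle periodogram can be written out directly in NumPy. This is a sketch of the textbook unnormalized formula, not necessarily the normalization or frequency grid that compute_psd() uses; the function name lomb_scargle and the grid below are illustrative choices.

```python
import numpy as np

def lomb_scargle(t, y, freqs):
    """Classic (unnormalized) Lomb-Scargle periodogram at positive frequencies."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float) - np.mean(y)
    omega = 2.0 * np.pi * np.asarray(freqs, dtype=float)[:, None]   # (n_freq, 1)
    # Time offset tau that makes the sine and cosine terms orthogonal
    tau = np.arctan2(np.sin(2 * omega * t).sum(axis=1),
                     np.cos(2 * omega * t).sum(axis=1))[:, None] / (2 * omega)
    c = np.cos(omega * (t - tau))
    s = np.sin(omega * (t - tau))
    return 0.5 * ((c @ y) ** 2 / (c ** 2).sum(axis=1)
                  + (s @ y) ** 2 / (s ** 2).sum(axis=1))

# Synthetic series mirroring the simulated lightcurve: irregular sampling, 8-day period
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 100, 500))
period = 8.0
y = 1.0 + 0.005 * np.sin(2 * np.pi * t / period) + 0.001 * rng.standard_normal(500)

freqs = np.linspace(0.01, 1.0, 2000)     # cycles per day
power = lomb_scargle(t, y, freqs)
f_peak = freqs[np.argmax(power)]
print(f"peak at f = {f_peak:.3f} 1/d, injected f = {1 / period:.3f} 1/d")
```

Unlike an FFT, this estimator works on the irregular time stamps directly, so no interpolation onto a uniform grid is needed.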


2. Kepler data with lightkurve#

Load a Kepler quarter for a known active star and preprocess it with TimeSeriesData.

Note

This section requires lightkurve. Install it with pip install lightkurve.

import lightkurve as lk

# Download a single Kepler quarter for a spotted star
search = lk.search_lightcurve("KIC 7985370", mission="Kepler", cadence="long", quarter=5)
lc = search.download()
lc.head()
KeplerLightCurve length=5 LABEL="KIC 7985370" QUARTER=5 AUTHOR=Kepler FLUX_ORIGIN=pdcsap_flux
column           unit          dtype    row 1               row 2               row 3               row 4               row 5
time                           Time     443.48970645623194  443.51014069512166  443.5305747343591   443.55100887370645  443.5714431134111
flux             electron / s  float32  ———                 1.5641606e+06       1.5655112e+06       1.5667932e+06       1.5680732e+06
flux_err         electron / s  float32  ———                 3.2047684e+01       3.2063877e+01       3.2078529e+01       3.2089455e+01
quality                        int32    10000               10000000010000      10000000010000      10000               10010000
timecorr         d             float32  -1.448464e-03       -1.447925e-03       -1.447386e-03       -1.446846e-03       -1.446307e-03
centroid_col     pix           float64  319.40248           319.40256           319.40289           319.40306           319.40302
centroid_row     pix           float64  398.02002           398.02001           398.01947           398.01928           398.01933
cadenceno                      int32    16373               16374               16375               16376               16377
sap_flux         electron / s  float32  1.5593305e+06       1.5606115e+06       1.5619460e+06       1.5632136e+06       1.5645386e+06
sap_flux_err     electron / s  float32  3.1996595e+01       3.2009605e+01       3.2023129e+01       3.2036301e+01       3.2048767e+01
sap_bkg          electron / s  float32  8.0061802e+03       8.0055918e+03       8.0033838e+03       8.0128496e+03       8.0063745e+03
sap_bkg_err      electron / s  float32  1.6346836e+00       1.6347697e+00       1.6351467e+00       1.6329130e+00       1.6347470e+00
pdcsap_flux      electron / s  float32  ———                 1.5641606e+06       1.5655112e+06       1.5667932e+06       1.5680732e+06
pdcsap_flux_err  electron / s  float32  ———                 3.2047684e+01       3.2063877e+01       3.2078529e+01       3.2089455e+01
sap_quality                    int32    10000               10000000010000      10000000010000      10000               10010000
psf_centr1       pix           float64  ———                 ———                 ———                 ———                 ———
psf_centr1_err   pix           float32  ———                 ———                 ———                 ———                 ———
psf_centr2       pix           float64  ———                 ———                 ———                 ———                 ———
psf_centr2_err   pix           float32  ———                 ———                 ———                 ———                 ———
mom_centr1       pix           float64  319.40248           319.40256           319.40289           319.40306           319.40302
mom_centr1_err   pix           float32  2.0221185e-05       2.0208812e-05       2.0198473e-05       2.0186624e-05       2.0174755e-05
mom_centr2       pix           float64  398.02002           398.02001           398.01947           398.01928           398.01933
mom_centr2_err   pix           float32  2.9403081e-05       2.9394563e-05       2.9383289e-05       2.9375227e-05       2.9367231e-05
pos_corr1        pix           float32  -4.5754738e-02      -4.5619853e-02      -4.5340929e-02      -4.5147281e-02      -4.5176014e-02
pos_corr2        pix           float32  4.8700325e-02       4.9024601e-02       4.8469041e-02       4.8476472e-02       4.8612297e-02
ts_kepler = TimeSeriesData.from_lightkurve(lc, normalize=True)
print(ts_kepler)
TimeSeriesData(N=4486, baseline=94.65, median_dt=0.0204)

Plot the raw lightcurve#

ts_kepler.plot(xlabel="Time [BKJD]", ylabel="Normalized flux")
plt.title("KIC 7985370 — Kepler Q5")
plt.show()
[Figure: KIC 7985370 Kepler Q5 lightcurve, normalized flux vs. time in BKJD]

Downsample#

Kepler long-cadence data has ~30 min sampling. Downsample to 0.5-day bins to speed up GP fitting:

print(f"Before downsampling: N = {ts_kepler.N}, median dt = {ts_kepler.median_dt:.4f}")
ts_kepler.downsample(dt=0.5)
print(f"After downsampling:  N = {ts_kepler.N}, median dt = {ts_kepler.median_dt:.4f}")

ts_kepler.plot(xlabel="Time [BKJD]", ylabel="Normalized flux")
plt.plot(ts_kepler.x, ts_kepler.y, alpha=0.4)
plt.title("After downsampling (dt = 0.5 day)")
plt.show()
Before downsampling: N = 187, median dt = 0.5006
After downsampling:  N = 169, median dt = 0.5006
[Figure: KIC 7985370 lightcurve after downsampling (dt = 0.5 day)]
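As a rough check on what 0.5-day binning does to long-cadence data, the arithmetic can be worked through with the cadence and baseline that TimeSeriesData printed for this quarter (median_dt=0.0204, baseline=94.65). The error-reduction estimate assumes equal, independent uncertainties within a bin, which is an idealization.

```python
import numpy as np

cadence = 0.0204        # days; median_dt reported for the Kepler quarter
baseline = 94.65        # days; baseline reported for the Kepler quarter
dt = 0.5                # days; bin width passed to downsample()

points_per_bin = dt / cadence               # samples averaged per bin
max_bins = int(np.ceil(baseline / dt))      # upper bound; data gaps leave fewer
error_shrink = np.sqrt(points_per_bin)      # for equal, independent errors

print(f"~{points_per_bin:.1f} points per bin")
print(f"at most {max_bins} bins")
print(f"per-bin uncertainty smaller by ~{error_shrink:.1f}x")
```

Since exact GP likelihood evaluation typically scales as the cube of the number of points, reducing N by a factor of about 24 is a large saving.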

PSD and ACF#

fig, axes = plt.subplots(1, 2, figsize=(14, 4))

ts_kepler.plot_psd(ax=axes[0])
axes[0].set_title("Lomb-Scargle PSD")

ts_kepler.plot_acf(ax=axes[1], n_bins=100)
axes[1].set_title("Autocorrelation function")

plt.tight_layout()
plt.show()
[Figure: Lomb-Scargle PSD (left) and autocorrelation function (right) of KIC 7985370]
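A binned empirical ACF for irregularly sampled data can be sketched as follows: form all pairs of points, bin them by time lag, and average the product of their mean-subtracted fluxes. This illustrates the general technique only. The estimator and normalization inside compute_acf() are not documented here, and binned_acf is a hypothetical name whose n_bins and max_lag arguments merely mirror those passed to plot_acf().

```python
import numpy as np

def binned_acf(t, y, n_bins=50, max_lag=None):
    """Binned empirical autocorrelation of an irregularly sampled series."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    dy = y - y.mean()
    i, j = np.triu_indices(len(t), k=1)          # all pairs with i < j
    lag = np.abs(t[j] - t[i])
    prod = dy[i] * dy[j]
    if max_lag is None:
        max_lag = lag.max()
    edges = np.linspace(0.0, max_lag, n_bins + 1)
    which = np.digitize(lag, edges) - 1          # lag-bin index of each pair
    ok = (which >= 0) & (which < n_bins)
    counts = np.bincount(which[ok], minlength=n_bins)
    sums = np.bincount(which[ok], weights=prod[ok], minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(invalid="ignore", divide="ignore"):
        acf = sums / counts / dy.var()           # empty bins become NaN
    return centers, acf

# Synthetic series with an 8-day period, irregularly sampled
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 100, 500))
y = 1.0 + 0.005 * np.sin(2 * np.pi * t / 8.0) + 0.001 * rng.standard_normal(500)

centers, acf = binned_acf(t, y, n_bins=40, max_lag=40)
print(centers[np.nanargmax(acf[4:]) + 4])        # lag of the strongest peak beyond 4 days
```

The pairwise approach costs O(N²) memory and time, which is one more reason to downsample dense lightcurves first.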

Summary panel#

A combined three-panel view of the data:

fig, axes = plt.subplots(3, 1, figsize=(12, 10))

ts_kepler.plot(ax=axes[0], xlabel="Time [BKJD]", ylabel="Normalized flux")
axes[0].set_title("KIC 7985370 — Kepler Q5")

ts_kepler.plot_psd(ax=axes[1])
axes[1].set_title("Power spectral density")

ts_kepler.plot_acf(ax=axes[2], n_bins=150)
axes[2].set_title("Autocorrelation function")

plt.tight_layout()
plt.show()
[Figure: three-panel summary for KIC 7985370 Kepler Q5: lightcurve, power spectral density, autocorrelation function]

API summary#

Method                                Description
TimeSeriesData(x, y, yerr)            Create from arrays (auto-removes NaN, normalizes by default)
TimeSeriesData.from_lightkurve(lc)    Create from a lightkurve.LightCurve
.normalize()                          Divide flux by median
.sigma_clip(lower, upper)             Remove outliers beyond N-sigma
.downsample(dt)                       Bin into non-overlapping intervals with inverse-variance weighted mean
.compute_psd()                        Lomb-Scargle PSD
.compute_acf()                        Binned empirical ACF
.plot()                               Plot time series with error bars
.plot_psd()                           Plot PSD (log-log by default)
.plot_acf()                           Plot ACF