[Institute of Statistics and Data Science]A model-free variable screening method for optimal treatment regimes with high-dimensional survival data

Abstract/Objectives

We study the recent outburst of the black hole candidate EXO 1846-031, which went into an outburst in 2019 after almost 34 yr in quiescence. We use archival data from the Swift/XRT, MAXI/GSC, NICER/XTI, and NuSTAR/FPM satellites/instruments to study the evolution of the spectral and temporal properties of the source during the outburst. The low-energy (2-10 keV) X-ray flux of the outburst shows multiple peaks, making it a multipeak outburst. Evolving type-C quasi-periodic oscillations are observed in the NICER data in the hard, hard-intermediate, and soft-intermediate states. We use the physical two-component advective flow (TCAF) model to analyze the combined spectra of multiple satellite instruments. According to the TCAF model, the accreting matter is divided into Keplerian and sub-Keplerian parts, and the variation in the observed spectra in different spectral states arises out of the variable contributions of these two types of accreting matter in the total accretion rate. Studying the evolution of the accretion rates and other properties of the accretion flow obtained from the spectral analysis, we show that the multiple peaks in the outburst flux arise out of the variable supply of accreting matter from the pile-up radius. We determine the probable mass of the black hole to be $10.4^{+0.1}_{-0.2}\,M_\odot$ from the spectral analysis with the TCAF model. We also estimate the viscous timescale of the source in this outburst to be ∼8 days from the peak difference of the Keplerian and sub-Keplerian mass-accretion rates. © 2023. The Author(s). Published by the American Astronomical Society.

Results/Contributions

In this paper, we propose a model-free variable screening method for optimal treatment regimes with high-dimensional survival data. The method provides a unified framework for screening important variables within a prespecified target population, with the treated group included as a special case. Under this framework, the optimal treatment regime is the classifier that minimizes a weighted misclassification rate, where the weights depend on the survival outcome, the censoring distribution, and the prespecified target population. Our main contribution is to reformulate this weighted classification problem as an unweighted classification problem, in which the observed data can be viewed as a sample drawn by outcome-dependent sampling with selection probability inversely proportional to the weights. Building on this reformulation, we introduce a weighted Kolmogorov–Smirnov method for screening the important variables in the optimal treatment regime, extending the traditional Kolmogorov–Smirnov method for binary classification.
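To make the screening idea concrete, the following is a minimal sketch of a generic weighted two-sample Kolmogorov–Smirnov statistic used to rank covariates. It is not the exact statistic from the paper. The function names `weighted_ks` and `screen` are hypothetical, and the weights here are arbitrary positive numbers supplied by the user, whereas in the paper they are built from the survival outcome, the censoring distribution, and the target population. The labels are assumed to be a binary treatment indicator.

```python
import random


def weighted_ks(x, labels, weights):
    """Generic weighted two-sample KS statistic: sup_t |F1(t) - F0(t)|,
    where Fk is the weight-normalised empirical CDF of x within class k."""
    total = {0: 0.0, 1: 0.0}
    for lab, w in zip(labels, weights):
        total[lab] += w
    if total[0] <= 0 or total[1] <= 0:
        raise ValueError("both classes need positive total weight")
    order = sorted(range(len(x)), key=lambda i: x[i])
    cum = {0: 0.0, 1: 0.0}
    stat, i, n = 0.0, 0, len(order)
    while i < n:
        t = x[order[i]]
        # Absorb all observations tied at t before comparing the two CDFs.
        while i < n and x[order[i]] == t:
            j = order[i]
            cum[labels[j]] += weights[j]
            i += 1
        stat = max(stat, abs(cum[1] / total[1] - cum[0] / total[0]))
    return stat


def screen(X, labels, weights, keep):
    """Score every column of X (a list of rows) and return the indices of the
    `keep` columns with the largest weighted KS statistic, plus all scores."""
    p = len(X[0])
    scores = [weighted_ks([row[j] for row in X], labels, weights)
              for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return ranked[:keep], scores


# Toy data (entirely synthetic): only column 0 differs between the classes.
rng = random.Random(0)
n, p = 400, 20
labels = [rng.randint(0, 1) for _ in range(n)]
weights = [rng.uniform(0.5, 1.5) for _ in range(n)]  # placeholder weights
X = [[rng.gauss(2.0 * labels[i] if j == 0 else 0.0, 1.0) for j in range(p)]
     for i in range(n)]

selected, scores = screen(X, labels, weights, keep=1)
print(selected)
```

Because each covariate is scored marginally and only through empirical distribution functions, a screen of this kind needs no model linking outcome, treatment, and covariates, which is the sense in which such a procedure is model-free.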

Furthermore, the proposed method is robust on two levels. First, it requires no model assumptions about how the survival outcome depends on the treatment and covariates. Second, the form of the treatment regime may be left unspecified, and no convex surrogate loss (such as logit loss or hinge loss) is needed. As a result, the screening method is robust to model misspecification, and non-parametric learning methods such as random forests and boosting can be applied to the selected variables for further analysis. We establish the theoretical properties of the method, evaluate its performance through simulation studies, and illustrate it with an application to lung cancer data.

This paper was published in Biometrika (2024) under the title "A model-free variable screening method for optimal treatment regimes with high-dimensional survival data," and was written in collaboration with my former master's student Yang Cheng-Han.

Keywords

variable screening method, high-dimensional survival data, random forests, boosting

References

1. https://academic.oup.com/biomet/article-abstract/111/4/1369/7658388

Contact Information

Prof. Cheng Yu-Jen (鄭又仁)

ycheng@stat.nthu.edu.tw