TG2 Publications

State of the art in selection of variables and functional forms in multivariable analysis—outstanding issues

Willi Sauerbrei, Aris Perperoglou, Matthias Schmid et al for TG2 of the STRATOS initiative



How to select variables and identify functional forms for continuous variables is a key concern when creating a multivariable model. Ad hoc ‘traditional’ approaches to variable selection have been in use for at least 50 years. Similarly, methods for determining functional forms for continuous variables were first suggested many years ago. More recently, many alternative approaches to address these two challenges have been proposed, but knowledge of their properties and meaningful comparisons between them are scarce. Many outstanding issues in multivariable modelling must be resolved before a state of the art can be defined and evidence-supported guidance provided to researchers who have only a basic level of statistical knowledge. Our main aims are to identify and illustrate such gaps in the literature and present them at a moderate technical level to the wide community of practitioners, researchers and students of statistics.


We briefly discuss general issues in building descriptive regression models, strategies for variable selection, different ways of choosing functional forms for continuous variables and methods for combining the selection of variables and functions. We discuss two examples, taken from the medical literature, to illustrate problems in the practice of modelling.


Our overview revealed that there is not yet enough evidence on which to base recommendations for the selection of variables and functional forms in multivariable analysis. Such evidence may come from comparisons between alternative methods. In particular, we highlight seven important topics that require further investigation and make suggestions for the direction of further research.


Selection of variables and of functional forms are important topics in multivariable analysis. To define a state of the art and to provide evidence-supported guidance to researchers who have only a basic level of statistical knowledge, further comparative research is required.

A review of spline function procedures in R

Aris Perperoglou, Willi Sauerbrei, Michal Abrahamowicz, Matthias Schmid on behalf of TG2 of the STRATOS initiative



With progress on both the theoretical and the computational fronts, spline modelling has become an established tool in statistical regression analysis. An important issue in spline modelling is the availability of user-friendly, well-documented software packages. Following the idea of the STRengthening Analytical Thinking for Observational Studies initiative to provide users with guidance documents on the application of statistical methods in observational research, the aim of this article is to provide an overview of the most widely used spline-based techniques and their implementation in R.


In this work, we focus on the R Language for Statistical Computing, which has become hugely popular statistical software. We identified a set of packages that include functions for spline modelling within a regression framework. Using simulated and real data, we provide an introduction to spline modelling and an overview of the most popular spline functions.


We present a series of simple scenarios of univariate data, where different basis functions are used to identify the correct functional form of an independent variable. Even in simple data, using routines from different packages would lead to different results.


This work illustrates challenges that an analyst faces when working with data. Most differences can be attributed to the choice of hyper-parameters rather than the basis used. In fact, an experienced user will know how to obtain a reasonable outcome, regardless of the type of spline used. However, many analysts do not have sufficient knowledge to use these powerful tools adequately and will need more guidance.
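As a rough illustration of the point about hyper-parameters, the sketch below fits the same spline basis twice to simulated univariate data, changing only the number of knots. It is not taken from the article and does not use any of the R packages reviewed there: it is a plain-Python toy, and the linear truncated-power basis, the knot positions, the sine-shaped true function, the noise level and the random seed are all assumptions chosen for illustration.

```python
import math
import random

def linear_spline_basis(x, knots):
    """One design-matrix row: 1, x, and (x - k)+ for each knot k."""
    return [1.0, x] + [max(0.0, x - k) for k in knots]

def solve_least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gaussian elimination."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, p), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, p):
            factor = a[row][col] / a[col][col]
            for c in range(col, p + 1):
                a[row][c] -= factor * a[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(a[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (a[i][p] - tail) / a[i][i]
    return beta

def fit_rss(x, y, knots):
    """Fit the spline by least squares and return the residual sum of squares."""
    rows = [linear_spline_basis(xi, knots) for xi in x]
    beta = solve_least_squares(rows, y)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

# Simulated data (assumed): one period of a sine curve plus Gaussian noise.
rng = random.Random(0)
x = [i / 99 for i in range(100)]
y = [math.sin(2 * math.pi * xi) + rng.gauss(0, 0.2) for xi in x]

# Same basis type, different hyper-parameter (number of knots).
rss_one_knot = fit_rss(x, y, [0.5])
rss_three_knots = fit_rss(x, y, [0.25, 0.5, 0.75])
print("RSS with 1 knot: ", rss_one_knot)
print("RSS with 3 knots:", rss_three_knots)
```

Because the three-knot basis contains the one-knot basis, its residual sum of squares cannot be larger; how many knots are appropriate for a given data set is exactly the kind of choice on which analysts need guidance.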

Introducing the Topic Group on Selection of Variables and Functional Forms in Multivariable Analysis (TG2)

Aris Perperoglou, Georg Heinze, Willi Sauerbrei on behalf of STRATOS TG2


The Biometric Bulletin has recently introduced its readership to the STRATOS initiative and described the activities of the Topic Groups on Missing Data (TG1), Measurement Error (TG4) and on Initial Data Analysis (TG3). This series now continues with an introduction to TG2, dealing with selection of variables and functional forms in multivariable analysis.

Systematic review of education and practical guidance on regression modeling for medical researchers who lack a strong statistical background: Study protocol

Paul Bach, Christine Wallisch, Nadja Klein, Lorena Hafermann, Willi Sauerbrei, Ewout W. Steyerberg, Georg Heinze, Geraldine Rauch, for topic group 2 of the STRATOS initiative


In the last decades, statistical methodology has developed rapidly, in particular in the field of regression modeling. Multivariable regression models are applied in almost all medical research projects. Therefore, the potential impact of statistical misconceptions within this field can be enormous. Indeed, the current theoretical statistical knowledge is not always adequately transferred to the current practice in medical statistics. Some medical journals have identified this problem and published isolated statistical articles and even whole series thereof. In this systematic review, we aim to assess the current level of education on regression modeling that is provided to medical researchers via series of statistical articles published in medical journals. The present manuscript is a protocol for a systematic review that aims to assess which aspects of regression modeling are covered by statistical series published in medical journals that intend to train and guide applied medical researchers with limited statistical knowledge. Statistical paper series cannot easily be summarized and identified by common keywords in an electronic search engine like Scopus. We therefore identified series by a systematic request to statistical experts who are part of or related to the STRATOS Initiative (STRengthening Analytical Thinking for Observational Studies). For each identified article, two raters will independently check its content with respect to a predefined list of key aspects related to regression modeling. The content analysis of the topic-relevant articles will be performed using a predefined report form to assess the content as objectively as possible. Any disputes will be resolved by a third reviewer. Summary analyses will identify potential methodological gaps and misconceptions that may have an important impact on the quality of analyses in medical research.
This review will thus provide a basis for future guidance papers and tutorials in the field of regression modeling which will enable medical researchers 1) to interpret publications in a correct way, 2) to perform basic statistical analyses in a correct way and 3) to identify situations when the help of a statistical expert is required.

Suggested Publications


  • Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Springer: New York, 2015.

  • Royston P, Sauerbrei W. Multivariable model-building: a pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables. John Wiley & Sons; 2008.

  • Wood S. Generalized Additive Models. Chapman & Hall/CRC: New York, 2006.

  • Miller A. Subset Selection in Regression. Taylor & Francis: Boca Raton, Florida, 2002.

  • de Boor C. A Practical Guide to Splines, revised edn. Springer: New York, 2001.

  • Hastie T, Tibshirani R. Generalized Additive Models. Chapman & Hall/CRC: New York, 1990.


  • Lu Z, Lou W. Bayesian approaches to variable selection: a comparative study from practical perspectives. Int J Biostat 2021;

  • Heinze G, Wallisch C, Dunkler D. Variable selection – a review and recommendations for the practicing statistician. Biometrical J. 2018; 60:431–49.

  • Binder H, Sauerbrei W, Royston P. Comparison between splines and fractional polynomials for multivariable model building with continuous covariates: a simulation study with continuous response. Statistics in Medicine 2013; 32(13): 2262–2277.

  • Royston P, Sauerbrei W. Multivariable Model-building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Continuous Variables. Wiley: Chichester, 2008.

  • Sauerbrei W, Royston P, Binder H. Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Statistics in Medicine 2007; 26: 5512–5528.

  • Abrahamowicz M, MacKenzie TA. Joint estimation of time-dependent and nonlinear effects of continuous covariates on survival. Statistics in Medicine 2007; 26(2): 392–408.

  • Abrahamowicz M, Du Berger R, Grover SA. Flexible modeling of the effects of serum cholesterol on coronary heart disease mortality. American Journal of Epidemiology 1997; 145(8): 714–729.

  • Greenland S. Avoiding power loss associated with categorization and ordinal scores in dose-response and trend analysis. Epidemiology (Cambridge, Mass.) 1995; 6(4): 450–454.

  • Royston P, Altman DG. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling. Applied Statistics 1994; 43(3): 429–467.