mrtool.core package

data

data module for mrtool package.

class MRData(obs=<factory>, obs_se=<factory>, covs=<factory>, study_id=<factory>, data_id=<factory>)[source]

Bases: object

Data for simple linear mixed effects model.

get_covs(covs)[source]

Get covariate matrix.

Parameters:covs (Union[List[str], str]) – List of covariate names or one covariate name.
Returns:Covariate matrix, with one column per covariate.
Return type:np.ndarray
get_study_data(studies)[source]

Get study specific data.

Parameters:studies (Union[List[Any], Any]) – List of studies or one study.
Returns:Data object containing the study-specific data.
Return type:MRData
has_covs(covs)[source]

Check whether the data has the provided covariates.

Parameters:covs (Union[List[str], str]) – List of covariate names or one covariate name.
Returns:True if the data has the provided covariates.
Return type:bool
has_studies(studies)[source]

Check whether the data has the provided study_id values.

Parameters:studies (Union[List[Any], Any]) – List of studies or one study.
Returns:True if the data has the provided studies.
Return type:bool
is_cov_normalized(covs=None)[source]

Return True when the covariates are normalized.

Return type:bool
is_empty()[source]

Return True when the object contains no data.

Return type:bool
load_df(data, col_obs=None, col_obs_se=None, col_covs=None, col_study_id=None, col_data_id=None)[source]

Load data from data frame.

load_xr(data, var_obs=None, var_obs_se=None, var_covs=None, coord_study_id=None)[source]

Load data from xarray.

normalize_covs(covs=None)[source]

Normalize covariates by the largest absolute value for each covariate.
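
The scaling rule can be pictured with a small standalone sketch. This is plain Python, not the MRData implementation: the helper name `normalize_by_max_abs`, the dict-of-lists layout, and the handling of all-zero columns are assumptions made for illustration.

```python
def normalize_by_max_abs(covs):
    """Scale each covariate so that its largest absolute value becomes 1.

    `covs` maps a covariate name to a list of floats (hypothetical layout).
    All-zero columns are returned unchanged to avoid dividing by zero.
    """
    scaled = {}
    for name, values in covs.items():
        scale = max(abs(v) for v in values)
        if scale == 0:
            scaled[name] = list(values)
        else:
            scaled[name] = [v / scale for v in values]
    return scaled


covs = {"dose": [2.0, -4.0, 1.0], "intercept": [1.0, 1.0, 1.0]}
normalized = normalize_by_max_abs(covs)
```

After scaling, every covariate lies in [-1, 1], which is the property is_cov_normalized would be expected to check.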

num_covs

Number of covariates.

num_obs

Number of observations.

num_points

Number of data points.

num_studies

Number of studies.

reset()[source]

Reset all the attributes to default values.

to_df()[source]

Convert data object to data frame.

Return type:DataFrame

cov_model

Covariates model for mrtool.

class CovModel(alt_cov, name=None, ref_cov=None, use_re=False, use_re_mid_point=False, use_spline=False, use_spline_intercept=False, spline_knots_type='frequency', spline_knots=array([0., 0.33333333, 0.66666667, 1. ]), spline_degree=3, spline_l_linear=False, spline_r_linear=False, prior_spline_derval_gaussian=None, prior_spline_derval_gaussian_domain=(0.0, 1.0), prior_spline_derval_uniform=None, prior_spline_derval_uniform_domain=(0.0, 1.0), prior_spline_der2val_gaussian=None, prior_spline_der2val_gaussian_domain=(0.0, 1.0), prior_spline_der2val_uniform=None, prior_spline_der2val_uniform_domain=(0.0, 1.0), prior_spline_funval_gaussian=None, prior_spline_funval_gaussian_domain=(0.0, 1.0), prior_spline_funval_uniform=None, prior_spline_funval_uniform_domain=(0.0, 1.0), prior_spline_monotonicity=None, prior_spline_monotonicity_domain=(0.0, 1.0), prior_spline_convexity=None, prior_spline_convexity_domain=(0.0, 1.0), prior_spline_num_constraint_points=20, prior_spline_maxder_gaussian=None, prior_spline_maxder_uniform=None, prior_spline_normalization=None, prior_beta_gaussian=None, prior_beta_uniform=None, prior_beta_laplace=None, prior_gamma_gaussian=None, prior_gamma_uniform=None, prior_gamma_laplace=None)[source]

Bases: object

Covariates model.

attach_data(data)[source]

Attach data.

create_constraint_mat()[source]

Create constraint matrix.

Returns:Linear constraint matrix and its uniform prior.
Return type:tuple{numpy.ndarray, numpy.ndarray}

create_design_mat(data)[source]

Create design matrix.

Parameters:data (mrtool.MRData) – The data frame used for storing the data
Returns:Return the design matrix for linear cov or spline.
Return type:tuple{numpy.ndarray, numpy.ndarray}
create_regularization_mat()[source]

Create regularization matrix.

Returns:Linear regularization matrix and its Gaussian prior.
Return type:tuple{numpy.ndarray, numpy.ndarray}

create_spline(data, spline_knots=None)[source]

Create spline given current spline parameters.

Parameters:
  • data (mrtool.MRData) – The data frame used for storing the data
  • spline_knots (np.ndarray, optional) – Spline knots, if None determined by frequency or domain.
Returns:The spline object.
Return type:xspline.XSpline
create_x_fun(data)[source]
create_z_mat(data)[source]
has_data()[source]

Return True if there is one data object attached.

num_constraints
num_regularizations
num_x_vars
num_z_vars
class LinearCovModel(*args, **kwargs)[source]

Bases: mrtool.core.cov_model.CovModel

Linear Covariates Model.

create_x_fun(data)[source]

Create design function for the fixed effects.

create_z_mat(data)[source]

Create design matrix for the random effects.

Parameters:data (mrtool.MRData) – The data frame used for storing the data
Returns:Design matrix for random effects.
Return type:numpy.ndarray
class LogCovModel(*args, **kwargs)[source]

Bases: mrtool.core.cov_model.CovModel

Log Covariates Model.

create_constraint_mat(threshold=1e-06)[source]

Create constraint matrix. Overrides the superclass method, adding non-negative constraints.

create_x_fun(data)[source]

Create design functions for the fixed effects.

Parameters:data (mrtool.MRData) – The data frame used for storing the data
Returns:Design functions for fixed effects.
Return type:tuple{function, function}
create_z_mat(data)[source]

Create design matrix for the random effects.

Parameters:data (mrtool.MRData) – The data frame used for storing the data
Returns:Design matrix for random effects.
Return type:numpy.ndarray
num_constraints
num_z_vars

model

Model module for mrtool package.

class LimeTr[source]

Bases: object

class MRBRT(data, cov_models, inlier_pct=1.0)[source]

Bases: object

MR-BRT Object

attach_data(data=None)[source]

Attach data to cov_model.

check_input()[source]

Check the input type of the attributes.

create_c_mat()[source]

Create the constraints matrices.

create_draws(data, beta_samples, gamma_samples, random_study=True, sort_by_data_id=False)[source]

Create draws for the given data set.

Parameters:
  • data (MRData) – MRData object containing the data to predict for.
  • beta_samples (np.ndarray) – Samples of beta.
  • gamma_samples (np.ndarray) – Samples of gamma.
  • random_study (bool, optional) – If True the draws will include uncertainty from study heterogeneity.
  • sort_by_data_id (bool, optional) – If True, sort the final predictions in the order of the original data frame used to create the data. Defaults to False.
Returns:

Returns outcome sample matrix.

Return type:

np.ndarray

create_gprior()[source]

Create direct Gaussian prior.

create_h_mat()[source]

Create the regularizer matrices.

create_lprior()[source]

Create direct Laplace prior.

create_uprior()[source]

Create direct uniform prior.

create_x_fun(data=None)[source]

Create the fixed effects function, link with limetr.

create_z_mat(data=None)[source]

Create the random effects matrix, link with limetr.

extract_re(study_id)[source]

Extract the random effect for a given dataset.

Return type:ndarray
fit_model(**fit_options)[source]

Fit the model through limetr.

Parameters:
  • x0 (np.ndarray) – Initial guess for the optimization problem.
  • inner_print_level (int) – If non-zero, print iteration information of the inner problem.
  • inner_max_iter (int) – Maximum inner number of iterations.
  • inner_tol (float) – Tolerance of the inner problem.
  • outer_verbose (bool) – If True print out iteration information.
  • outer_max_iter (int) – Maximum outer number of iterations.
  • outer_step_size (float) – Step size of the outer problem.
  • outer_tol (float) – Tolerance of the outer problem.
  • normalize_trimming_grad (bool) – If True, normalize the gradient of the outer trimming problem.
get_cov_model(name)[source]

Get the covariate model with the given name.

Return type:CovModel
get_cov_model_index(name)[source]

Get the index of the covariate model with the given name.

Return type:int
predict(data, predict_for_study=False, sort_by_data_id=False)[source]

Create new prediction with existing solution.

Parameters:
  • data (MRData) – MRData object containing the data to predict for.
  • predict_for_study (bool, optional) – If True, use the random effects to predict for specific studies. If a study_id in data does not appear in the fitting data, the corresponding random effect is assumed to be 0.
  • sort_by_data_id (bool, optional) – If True, sort the final predictions in the order of the original data frame used to create the data. Defaults to False.
Returns:

Predicted outcome array.

Return type:

np.ndarray
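
The effect of predict_for_study can be pictured with a toy one-covariate model. The sketch below is not MRBRT code: `toy_predict`, `beta`, and `random_effects` are hypothetical names, and the random effect is assumed to act as a study-specific intercept shift. It only illustrates the rule that studies absent from the fitting data get a random effect of 0.

```python
def toy_predict(x, study_ids, beta, random_effects, predict_for_study=False):
    """Predict beta * x, optionally adding a per-study random effect.

    `random_effects` maps study id -> fitted random effect (hypothetical).
    Studies that were not part of the fit fall back to an effect of 0.
    """
    predictions = []
    for value, study in zip(x, study_ids):
        prediction = beta * value
        if predict_for_study:
            prediction += random_effects.get(study, 0.0)
        predictions.append(prediction)
    return predictions


beta = 2.0
random_effects = {"A": 0.5, "B": -0.25}  # study "C" was never seen in the fit
x = [1.0, 2.0, 3.0]
study_ids = ["A", "B", "C"]

fixed_only = toy_predict(x, study_ids, beta, random_effects)
study_specific = toy_predict(x, study_ids, beta, random_effects,
                             predict_for_study=True)
```

For the unseen study "C" both calls give the same value, since its random effect defaults to 0.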

sample_soln(sample_size=1, sim_prior=True, sim_re=True, max_iter=100, print_level=0)[source]

Sample solutions.

Parameters:
  • sample_size (int, optional) – Number of samples.
  • sim_prior (bool, optional) – If True, simulate priors.
  • sim_re (bool, optional) – If True, simulate random effects.
  • max_iter (int, optional) – Maximum number of iterations. Defaults to 100.
  • print_level (int, optional) – Level of detail of the optimization information printed during the sampling process. If 0, nothing is printed.
Returns:

Return beta samples and gamma samples.

Return type:

Tuple[np.ndarray, np.ndarray]

summary()[source]

Return the summary data frame.

Return type:Tuple[DataFrame, DataFrame]
class MRBeRT(data, ensemble_cov_model, ensemble_knots, cov_models=None, inlier_pct=1.0)[source]

Bases: object

Ensemble model of MRBRT.

create_draws(data, beta_samples, gamma_samples, random_study=True, sort_by_data_id=False)[source]

Create draws. For function description please check create_draws for MRBRT.

Return type:ndarray
fit_model(x0=None, inner_print_level=0, inner_max_iter=20, inner_tol=1e-08, outer_verbose=False, outer_max_iter=100, outer_step_size=1.0, outer_tol=1e-06, normalize_trimming_grad=False, scores_weights=array([1., 1.]), slopes=array([ 2., 10.]), quantiles=array([0.4, 0.4]))[source]

Fit the model through limetr.

predict(data, predict_for_study=False, sort_by_data_id=False, return_avg=True)[source]

Create new prediction with existing solution.

Parameters:return_avg (bool) – When it is True, the function will return an average prediction based on the score, and when it is False the function will return a list of predictions from all groups.
Return type:ndarray
sample_soln(sample_size=1, sim_prior=True, sim_re=True, max_iter=100, print_level=0)[source]

Sample solution.

Return type:Tuple[List[ndarray], List[ndarray]]
score_model(scores_weights=array([1., 1.]), slopes=array([ 2., 10.]), quantiles=array([0.4, 0.4]))[source]

Score the model by their fitting and variation.

summary()[source]

Create summary data frame.

Return type:Tuple[DataFrame, DataFrame]
create_knots_samples(data, alt_cov_names=None, ref_cov_names=None, l_zero=True, num_splines=50, num_knots=5, width_pct=0.2, return_settings=False)[source]

Create knot samples for relative risk application.

Parameters:
  • data (MRData) – Data object.
  • alt_cov_names (List[str], optional) – Name of the alternative exposures, if None use ['b_0', 'b_1']. Default to None.
  • ref_cov_names (List[str], optional) – Name of the reference exposures, if None use ['a_0', 'a_1']. Default to None.
  • l_zero (bool, optional) – If True, assume the exposure min is 0. Default to True.
  • num_splines (int, optional) – Number of splines. Default to 50.
  • num_knots (int, optional) – Number of the spline knots. Default to 5.
  • width_pct (float, optional) – Minimum percentage distance between knots. Default to 0.2.
  • return_settings (bool, optional) – Returns the knots setting if True. Default to False.
Returns:

Knots samples.

Return type:

np.ndarray

score_sub_models_datafit(mr)[source]

Score the data fit of the MRBeRT sub-models.

score_sub_models_variation(mr, ensemble_cov_model_name, n=1)[source]

Score the variation of the MRBeRT sub-models.

Return type:float

utils

utils module of the mrtool package.

class Matrix[source]

Bases: object

class Polyhedron[source]

Bases: object

get_generators()[source]
class RepType[source]

Bases: object

INEQUALITY = None
avg_integral(mat, spline=None, use_spline_intercept=False)[source]

Compute average integral.

Parameters:
  • mat (numpy.ndarray) – Matrix that contains the starting and ending points of the integral, or a single column representing the mid-points.
  • spline (xspline.XSpline | None, optional) – Spline to integrate over; when None, treat the function as linear.
  • use_spline_intercept (bool, optional) – If True use all bases from the spline, otherwise remove the first basis.
Returns:

Design matrix when spline is not None, otherwise the mid-points.

Return type:

numpy.ndarray
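
For the linear case (spline is None) the average of f(x) = x over [a, b] is the mid-point (a + b) / 2, which is why the mid-points are returned. The sketch below shows only that case with plain lists; `linear_avg_integral` is a hypothetical helper, not the library function, and the spline branch is deliberately left out.

```python
def linear_avg_integral(mat):
    """Average of f(x) = x over each row's interval.

    Each row is either [start, end] or a single-element [mid_point].
    """
    mid_points = []
    for row in mat:
        if len(row) == 1:
            # A single column is already interpreted as the mid-point.
            mid_points.append(row[0])
        else:
            start, end = row
            mid_points.append((start + end) / 2.0)
    return mid_points


intervals = [[0.0, 2.0], [1.0, 4.0]]
mids = linear_avg_integral(intervals)
```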

col_diff_mat(n)[source]

Column difference matrix.

combine_cols(cols)[source]

Combine column names into one list of names.

Parameters:cols (list{str | list{str}}) – A list of names of columns or list of column names.
Returns:Combined names of columns.
Return type:list{str}
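
A minimal sketch of the flattening described above, assuming each entry is either a single name or a list of names. `combine_names` is a hypothetical stand-in written for illustration and is not the library code.

```python
def combine_names(cols):
    """Flatten a mix of column names and lists of column names."""
    combined = []
    for col in cols:
        if isinstance(col, str):
            combined.append(col)
        else:
            combined.extend(col)
    return combined


names = combine_names(["obs", ["x1", "x2"], "study_id"])
```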
empty_array()[source]
expand_array(array, shape, value, name)[source]

Expand array when it is empty.

Parameters:
  • array (np.ndarray) – Target array. If the array is empty, fill it with value. If it is not empty, assert that the shape agrees and return the original array.
  • shape (Tuple[int]) – The expected shape of the array.
  • value (Any) – The expected value in final array.
  • name (str) – Variable name of the array (for error message).
Returns:

Expanded array.

Return type:

np.ndarray
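
The fill-or-validate behaviour can be sketched with one-dimensional lists. This simplification is an assumption: the documented function takes a NumPy array and a shape tuple, while the hypothetical `expand_list` below takes a list and a length, and raises ValueError instead of asserting.

```python
def expand_list(array, size, value, name):
    """Fill an empty list with `value`, otherwise check its length."""
    if len(array) == 0:
        return [value] * size
    if len(array) != size:
        # `name` is only used to make the error message readable.
        raise ValueError(f"{name} must have length {size}, got {len(array)}.")
    return array


filled = expand_list([], 3, 1.0, "obs_se")
given = [0.1, 0.2, 0.3]
kept = expand_list(given, 3, 1.0, "obs_se")
```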

get_cols(df, cols)[source]

Return the columns of the given data frame.

Parameters:
  • df (pandas.DataFrame) – Given data frame.
  • cols (str | list{str} | None) – Given column name(s); if None, an empty data frame is returned.
Returns:The data frame containing the columns.
Return type:pandas.DataFrame | pandas.Series
input_cols(cols, append_to=None, default=None)[source]

Process the input column name.

Parameters:
  • cols (str | list{str} | None) – The input column name(s).
  • append_to (list{str} | None, optional) – A list that keeps track of all the column names.
  • default (str | list{str} | None, optional) – Default value when cols is None.
Returns:The name of the column(s)
Return type:str | list{str}
input_gaussian_prior(prior, size)[source]

Process the input Gaussian prior

Parameters:
  • prior (numpy.ndarray) – Either a one- or two-dimensional array, with the first group referring to the mean and the second group to the standard deviation.
  • size (int, optional) – Size of the variable that the prior relates to.
Returns:

Prior after processing, with shape (2, size), where the first row stores the mean and the second row stores the standard deviation.

Return type:

numpy.ndarray
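
The (2, size) layout can be illustrated with nested lists. The sketch below is an assumption-laden stand-in, not the library routine: `broadcast_gaussian_prior` is a hypothetical name, it accepts either a flat [mean, sd] pair or two rows, and the positivity check on the standard deviation is an assumption.

```python
def broadcast_gaussian_prior(prior, size):
    """Return [[means...], [sds...]] with `size` entries per row."""
    if all(isinstance(p, (int, float)) for p in prior):
        # Flat [mean, sd]: repeat the pair for every variable.
        mean, sd = prior
        rows = [[float(mean)] * size, [float(sd)] * size]
    else:
        # Already two rows: first the means, then the standard deviations.
        rows = [[float(v) for v in row] for row in prior]
    if len(rows) != 2 or any(len(row) != size for row in rows):
        raise ValueError(f"prior must have shape (2, {size})")
    if any(sd <= 0 for sd in rows[1]):
        raise ValueError("standard deviations must be positive")
    return rows


shared = broadcast_gaussian_prior([0.0, 1.0], 3)
per_variable = broadcast_gaussian_prior([[0, 1], [2, 3]], 2)
```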

input_laplace_prior(prior, size)

Process the input Laplace prior.

Parameters:
  • prior (numpy.ndarray) – Either a one- or two-dimensional array, with the first group referring to the mean and the second group to the standard deviation.
  • size (int, optional) – Size of the variable that the prior relates to.
Returns:

Prior after processing, with shape (2, size), where the first row stores the mean and the second row stores the standard deviation.

Return type:

numpy.ndarray

input_uniform_prior(prior, size)[source]

Process the input uniform prior.

Parameters:
  • prior (numpy.ndarray) – Either a one- or two-dimensional array, with the first group referring to the lower bound and the second group to the upper bound.
  • size (int, optional) – Size of the variable that the prior relates to.
Returns:

Prior after processing, with shape (2, size), where the first row stores the lower bound and the second row stores the upper bound.

Return type:

numpy.ndarray

is_cols(cols)[source]

Check whether the variable type falls into the column name category.

Parameters:cols (str | list{str} | None) – Column name candidates.
Returns:True if cols is either str, list{str} or None.
Return type:bool
is_gaussian_prior(prior, size=None)[source]

Check if the variable satisfies the Gaussian prior format.

Parameters:prior (numpy.ndarray) – Either a one- or two-dimensional array, with the first group referring to the mean and the second group to the standard deviation.
Keyword Arguments:
 size (int | None, optional) – Size of the variable that the prior relates to.
Returns:True if the condition is satisfied.
Return type:bool
is_laplace_prior(prior, size=None)

Check if the variable satisfies the Laplace prior format.

Parameters:prior (numpy.ndarray) – Either a one- or two-dimensional array, with the first group referring to the mean and the second group to the standard deviation.
Keyword Arguments:
 size (int | None, optional) – Size of the variable that the prior relates to.
Returns:True if the condition is satisfied.
Return type:bool
is_numeric_array(array)[source]

Check if an array is numeric.

Parameters:array (np.ndarray) – Array to be checked.
Returns:True if the array is numeric.
Return type:bool
is_uniform_prior(prior, size=None)[source]

Check if the variable satisfies the uniform prior format.

Parameters:prior (numpy.ndarray) – Either a one- or two-dimensional array, with the first group referring to the lower bound and the second group to the upper bound.
Keyword Arguments:
 size (int | None, optional) – Size of the variable that the prior relates to.
Returns:True if the condition is satisfied.
Return type:bool
mat_to_fun(alt_mat, ref_mat=None)[source]
mat_to_log_fun(alt_mat, ref_mat=None, add_one=True)[source]
nonlinear_trans(score, slope=6.0, quantile=0.7)[source]
ravel_dict(x)[source]

Ravel dictionary.

Return type:dict
sample_knots(num_intervals, knot_bounds=None, interval_sizes=None, num_samples=1)[source]

Sample knots given a set of rules.

Parameters:
  • num_intervals (int) – Number of intervals (number of knots minus 1).
  • knot_bounds (Optional[ndarray]) – Bounds for the interior knots. The domain is assumed to span 0 to 1, so the bounds for a knot should lie between 0 and 1, e.g. [0.1, 0.2]. knot_bounds should have one row per interior knot, each row giving the bounds for the corresponding knot, e.g. knot_bounds=np.array([[0.0, 0.2], [0.3, 0.4], [0.3, 1.0]]) for three interior knots.
  • interval_sizes (Optional[ndarray]) – Bounds for the distances between knots. For the same reason, elements of interval_sizes should lie between 0 and 1. For example, interval_sizes=np.array([[0.1, 0.2], [0.1, 0.3], [0.1, 0.5], [0.1, 0.5]]) means that the distance between the first knot (0) and the second knot has to be between 0.1 and 0.2, and so on. interval_sizes must have num_intervals rows.
  • num_samples (int) – Number of knots samples.
Returns:

Knots samples as an array, with num_samples rows and one column per knot.

Return type:

np.ndarray
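
One simple way to honour both kinds of bounds is rejection sampling: draw each interior knot uniformly within its bounds, then keep the draw only if every gap between neighbouring knots falls inside its interval bounds. The sketch below takes that approach with the standard library only. It is not how the library samples knots; `sample_knots_sketch`, its `rng` and `max_tries` arguments, and the fixed end knots at 0 and 1 are assumptions for illustration.

```python
import random


def sample_knots_sketch(num_intervals, knot_bounds=None, interval_sizes=None,
                        num_samples=1, rng=None, max_tries=100000):
    """Draw knot vectors on [0, 1] by rejection sampling (illustrative only)."""
    rng = random.Random() if rng is None else rng
    if knot_bounds is None:
        knot_bounds = [(0.0, 1.0)] * (num_intervals - 1)
    if interval_sizes is None:
        interval_sizes = [(0.0, 1.0)] * num_intervals
    if len(knot_bounds) != num_intervals - 1 or len(interval_sizes) != num_intervals:
        raise ValueError("bounds do not match num_intervals")

    samples = []
    for _ in range(max_tries):
        if len(samples) == num_samples:
            break
        interior = [rng.uniform(lb, ub) for lb, ub in knot_bounds]
        knots = [0.0] + interior + [1.0]
        gaps = [right - left for left, right in zip(knots, knots[1:])]
        # Non-negative lower bounds on the gaps also force increasing knots.
        if all(lo <= gap <= hi for gap, (lo, hi) in zip(gaps, interval_sizes)):
            samples.append(knots)
    if len(samples) < num_samples:
        raise RuntimeError("could not satisfy the constraints; relax the bounds")
    return samples


knots = sample_knots_sketch(
    num_intervals=3,
    knot_bounds=[(0.2, 0.4), (0.6, 0.8)],
    interval_sizes=[(0.1, 0.5)] * 3,
    num_samples=4,
    rng=random.Random(0),
)
```

Rejection sampling is easy to reason about but can be slow when the bounds are tight, which is why the sketch caps the number of attempts.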

sample_simplex(n, N=1)[source]

Sample from the n-dimensional simplex.
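
A standard way to draw uniformly from the simplex is to normalize independent exponential variates, which yields a Dirichlet(1, ..., 1) sample. The sketch below uses that technique with the standard library; whether the library uses the same method is not stated here, and `sample_simplex_sketch` with its `rng` argument is a hypothetical helper.

```python
import random


def sample_simplex_sketch(n, num_samples=1, rng=None):
    """Return `num_samples` points with n non-negative entries summing to 1."""
    rng = random.Random() if rng is None else rng
    samples = []
    for _ in range(num_samples):
        draws = [rng.expovariate(1.0) for _ in range(n)]
        total = sum(draws)
        samples.append([d / total for d in draws])
    return samples


points = sample_simplex_sketch(3, num_samples=5, rng=random.Random(0))
```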

sizes_to_indices(sizes)[source]

Convert sizes to the corresponding indices.

Parameters:sizes (numpy.ndarray) – An array consisting of non-negative numbers.
Returns:List of the indices.
Return type:list{range}
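
A minimal sketch of the conversion, assuming that consecutive groups occupy consecutive index ranges starting at 0. `sizes_to_ranges` is a hypothetical stand-in for illustration.

```python
def sizes_to_ranges(sizes):
    """Turn group sizes into consecutive index ranges."""
    ranges = []
    start = 0
    for size in sizes:
        ranges.append(range(start, start + size))
        start += size
    return ranges


index_ranges = sizes_to_ranges([2, 0, 3])
as_lists = [list(r) for r in index_ranges]
```

A size of 0 produces an empty range, so groups without entries still keep their position in the result.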
to_list(obj)[source]

Convert an object to a list of objects.

Parameters:obj (Any) – Object to be converted.
Returns:If obj is already a list, return obj itself; otherwise wrap obj in a list and return it.
Return type:List[Any]
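
The documented behaviour fits in one line. The sketch below uses the hypothetical name `to_list_sketch` and is written only from the description above.

```python
def to_list_sketch(obj):
    """Return lists unchanged and wrap anything else in a one-element list."""
    return obj if isinstance(obj, list) else [obj]


names = ["x1", "x2"]
same = to_list_sketch(names)
wrapped = to_list_sketch("x1")
```

Note that a string is wrapped rather than split into characters, and tuples are wrapped as a single element.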