mlguess package#

Subpackages#

Submodules#

mlguess.classifier_uq module#

mlguess.classifier_uq.brier_multi(targets, probs, num_classes=4, skill_score=True)#

mlguess.classifier_uq.brier_skill_score_multiclass(true_labels, pred_probs, skill_score=True)#

mlguess.classifier_uq.classifier_attribution(true_labels, pred_probs, num_bins=10, legend_cols=['Rain', 'Snow', 'Sleet', 'Frz Rain'], save_location=False, prefix=False)#

mlguess.classifier_uq.classifier_discard_fraction(df, num_bins=10, num_classes=4, uncertainty_cols=['aleatoric', 'epistemic', 'total', 'evidential'], legend_cols=['Rain', 'Snow', 'Sleet', 'Frz Rain'], plt_titles=['Aleatoric', 'Epistemic', 'Total', 'Evidential'], colors=['#f8d605', '#ce4912', '#042c71', 'b', 'g'], save_location=False, prefix=False)#

mlguess.classifier_uq.classifier_skill_scores(true_labels, pred_probs, aleatoric, epistemic, total, evidential, num_bins=10, legend_cols=['Aleatoric', 'Epistemic', 'Total', 'Evidential'], save_location=False, prefix=False)#

mlguess.classifier_uq.plot_uncertainties(df, input_cols, output_cols, num_bins=20, legend_cols=None, x_labels=None, y_labels=None, fontsize=10, save_location=None)#

mlguess.classifier_uq.sort_arr(true_labels, pred_probs, confidences, n_bins=10, data_min=False, data_max=False)#

mlguess.classifier_uq.uq_results(df, save_location=None, prefix=None)#

mlguess.pbs module#

mlguess.pbs.launch_distributed_jobs(config_file, script_path, launch=True)#

Launches a distributed job across multiple nodes using PBS and MPI.

This function generates a PBS script based on the provided configuration file, copies the necessary files, and optionally submits the job to the queue.

Parameters:

config_file (str) – Path to the YAML configuration file containing PBS options.
script_path (str) – Path to the Python script to be executed in the distributed environment.
launch (bool, optional) – If True, submits the job using qsub. If False, only generates the script. Defaults to True.

mlguess.pbs.launch_pbs_jobs(config_file, trainer_path, args='')#

Launches a PBS job using the specified configuration file and trainer script.

This function reads the configuration file to construct a PBS script, writes the script to a file, submits the job using qsub, and then cleans up the script file.

Parameters:

config_file (str) – Path to the YAML configuration file containing PBS options.
trainer_path (str) – Path to the Python training script to be executed.
args (str, optional) – Additional command-line arguments to pass to the training script. Defaults to an empty string.

Raises:

ValueError – If the ‘pbs’ section is not present in the configuration file.

mlguess.pit module#

mlguess.pit.pit_deviation(y_true, y_pred, pred_type='ensemble', bins=10)#: Runs pit_histogram then calculates the pit_deviation. See docstring for pit_histogram.

mlguess.pit.pit_deviation_skill_score(y_true, y_pred, pred_type='ensemble', bins=10)#: Calculate PITD score relative to the worst possible PITD for a given number of bins. Ranges from 0 to 1.

mlguess.pit.pit_deviation_worst(n_bins)#: Calculate the worst possible PITD score based on the number of bins. Assumes all the forecasts end up in one of the outermost bins.
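The page does not give formulas for these three entries, so the following is only a sketch under stated assumptions: PITD is taken to be the root-mean-square difference between the normalized PIT histogram and the uniform value 1/n_bins, and the skill score is taken to be 1 - PITD / PITD_worst. The function names below are hypothetical and are not the mlguess implementations.

```python
import numpy as np


def pitd(hist_fractions):
    """RMS deviation of a normalized PIT histogram from 1/n_bins (assumed definition)."""
    hist_fractions = np.asarray(hist_fractions, dtype=float)
    n_bins = hist_fractions.size
    return np.sqrt(np.mean((hist_fractions - 1.0 / n_bins) ** 2))


def pitd_worst(n_bins):
    """PITD when every forecast lands in a single outermost bin."""
    worst = np.zeros(n_bins)
    worst[0] = 1.0
    return pitd(worst)


def pitd_skill_score(hist_fractions):
    """Assumed skill score: 1 for a flat histogram, 0 for the worst case."""
    hist_fractions = np.asarray(hist_fractions, dtype=float)
    return 1.0 - pitd(hist_fractions) / pitd_worst(hist_fractions.size)


flat = np.full(10, 0.1)      # perfectly flat 10-bin histogram
one_bin = np.zeros(10)
one_bin[-1] = 1.0            # all forecasts in the last bin
```

Under these assumptions, `pitd_skill_score(flat)` evaluates to 1 and `pitd_skill_score(one_bin)` to 0, which matches the stated 0 to 1 range.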

mlguess.pit.pit_gaussian_ensemble(y_true, y_pred_gauss_ens)#

Calculate the probability integral transform quantiles for an ensemble of Gaussian parametric models.

Parameters:

y_true – true values with shape (n_samples,)
y_pred_gauss_ens – ensemble of Gaussian predictions (mean and standard deviation) with shape (n_samples, n_params, n_members)

Returns:

pit_quantiles – for each sample, the true value’s quantile in the predicted distribution.

mlguess.pit.pit_histogram(y_true, y_pred, pred_type='ensemble', bins=10)#

Calculate PIT histogram values for different types of ensemble predictions. Predictions can take one of three formats:

* ensemble: array of n deterministic ensemble predictions with shape (n_samples, n_members).
* gaussian: array of one Gaussian distribution prediction (mean and standard deviation) with shape (n_samples, n_params).
* gaussian_ensemble: array of multiple Gaussian distributions for the same prediction with shape (n_samples, n_params, n_members).

Parameters:

y_true – true values with shape (n_samples,)
y_pred – predictions in format of gaussian, gaussian_ensemble, or ensemble.
pred_type – ‘ensemble’, ‘gaussian’, or ‘gaussian_ensemble’
bins – number of bins or array of specific bin edges. Bins should cover space between 0 and 1.

Returns:

pit_hist, pit_bins
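To illustrate the binning step, here is a minimal sketch that turns PIT quantiles into a histogram over [0, 1] with NumPy. Normalizing the counts to fractions is an assumption made for this example, and `pit_histogram_sketch` is a hypothetical name, not the library function.

```python
import numpy as np


def pit_histogram_sketch(pit_quantiles, bins=10):
    """Bin PIT quantiles on [0, 1] and normalize the counts to fractions."""
    # `range` only applies when `bins` is an integer; explicit edges are used as given.
    counts, edges = np.histogram(pit_quantiles, bins=bins, range=(0.0, 1.0))
    return counts / counts.sum(), edges


rng = np.random.default_rng(0)
# PIT values of a well-calibrated forecast are uniformly distributed,
# so each of the 10 bins should hold roughly 10% of the samples.
hist, edges = pit_histogram_sketch(rng.uniform(size=10_000), bins=10)
```

A U-shaped histogram would suggest an underdispersive forecast, while a dome shape would suggest an overdispersive one.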

mlguess.pit.pit_histogram_values(pit_quantiles, bins=10)#

mlguess.pit.probability_integral_transform_ensemble(y_true, y_pred_ens)#

Calculate the probability integral transform quantiles for an ensemble of predictions

Parameters:

y_true – true values with shape (n_samples,)
y_pred_ens – predicted ensemble values with shape (n_samples, n_ensemble_members)

Returns:

pit_quantiles – for each sample, the true value’s quantile in the predicted distribution.
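One common way to compute such a quantile is the fraction of ensemble members at or below the observed value. The sketch below uses that convention as an assumption; the library may rank ties or endpoints differently, and `pit_ensemble_sketch` is a hypothetical name.

```python
import numpy as np


def pit_ensemble_sketch(y_true, y_pred_ens):
    """Fraction of ensemble members <= the true value, per sample."""
    y_true = np.asarray(y_true, dtype=float)          # shape (n_samples,)
    y_pred_ens = np.asarray(y_pred_ens, dtype=float)  # shape (n_samples, n_members)
    # Broadcast each true value against its own row of ensemble members.
    return np.mean(y_pred_ens <= y_true[:, None], axis=1)


y_true = np.array([0.5, 2.0, 10.0])
y_pred_ens = np.array([
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 2.0, 3.0],
])
pit = pit_ensemble_sketch(y_true, y_pred_ens)  # -> [0.25, 0.75, 1.0]
```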

mlguess.pit.probability_integral_transform_gaussian(y_true, y_pred_gaussian)#

Calculate the probability integral transform quantiles for a single Gaussian distribution.

Parameters:

y_true – true values with shape (n_samples,)
y_pred_gaussian – predicted Gaussian parameters (mean, standard deviation) with shape (n_samples, n_params)

Returns:

pit_quantiles – for each sample, the true value’s quantile in the predicted distribution.
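For a Gaussian prediction, the PIT quantile is the predicted cumulative distribution function evaluated at the true value. The sketch below shows that idea with the standard library's `statistics.NormalDist`. It assumes each prediction row is ordered (mean, standard deviation), as the parameter description indicates, and `pit_gaussian_sketch` is a hypothetical name rather than the library's code.

```python
from statistics import NormalDist


def pit_gaussian_sketch(y_true, y_pred_gaussian):
    """Evaluate each predicted Gaussian CDF at the corresponding true value."""
    return [
        NormalDist(mu, sigma).cdf(y)
        for y, (mu, sigma) in zip(y_true, y_pred_gaussian)
    ]


# Two samples, both predicted as a standard normal (mean 0, std 1).
pit = pit_gaussian_sketch([0.0, 1.0], [(0.0, 1.0), (0.0, 1.0)])
```

A true value equal to the predicted mean gives a quantile of 0.5, and a value one standard deviation above the mean gives roughly 0.84.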

mlguess.plotting module#

mlguess.plotting.compute_cov(df, col='pred_conf', quan='uncertainty', ascending=False)#

mlguess.plotting.conus_plot(df, dataset='mping', column='pred_label', title='Predicted', save_path=False)#

mlguess.plotting.coverage_figures(test_data, output_cols, colors=None, title=None, save_location=None)#

mlguess.plotting.plot_confusion_matrix(y_true, y_pred, classes, model_name, normalize=False, title=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, filename=None)#: Function to plot a confusion matrix.

mlguess.preprocessing module#

mlguess.preprocessing.load_preprocessing(conf, seed=1000)#

mlguess.preprocessing.load_preprocessor(stype, params, seed=1000)#

mlguess.regression_metrics module#

mlguess.regression_metrics.calculate_skill_score(y_true, y_pred, sigma, num_bins=10, log=False, filter_top_percentile=0)#

mlguess.regression_metrics.regression_metrics(y_true, y_pred, total=None, split='val')#

Compute common regression metrics for continuous data.

Parameters:

y_true (array-like) – True target values.
y_pred (array-like) – Predicted target values.

Returns:

dict – A dictionary containing common regression metrics.

mlguess.regression_metrics.rmse_crps_skill_scores(y, mu, total, filter_top_percentile=0)#

mlguess.regression_uq module#

mlguess.regression_uq.calculate_skill_score(y_true, y_pred, sigma, num_bins=10, log=False, filter_top_percentile=5)#

mlguess.regression_uq.calibration(dataframe, a_cols, e_cols, mae_cols, legend_cols, bins=10, save_location=False)#

mlguess.regression_uq.calibration_curve(df, col='var', quan='error', bins=10)#

mlguess.regression_uq.compute_coverage(df, col='var', quan='error')#

mlguess.regression_uq.compute_results(df, output_cols, mu, aleatoric, epistemic, ensemble_mu=False, ensemble_type=False, legend_cols=['Friction_velocity', 'Sensible_heat', 'Latent_heat'], fn=None)#

mlguess.regression_uq.compute_skill_score(y_true, y_pred, y_std, num_bins=10)#

Computes the skill score with RMSE on the y-axis and binned spread on the x-axis.

Parameters:

y_true (array-like) – A 1D array of true values.
y_pred (array-like) – A 1D array of predicted values.
y_std (array-like) – A 1D array of standard deviations of predicted values.
num_bins (int, optional) – The number of bins to use for binning the spread.

Returns:

ss (array-like) – A 2D array of skill scores.
bins (array-like) – A 1D array of bin edges for the spread.
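The binning idea behind a spread-skill comparison can be sketched as follows: group samples by predicted spread, then compute the RMSE within each group. This is a simplified illustration under assumptions, not the library's implementation. It uses equal-width bins and returns a 1D per-bin RMSE rather than the 2D array described above, and `spread_skill_sketch` is a hypothetical name.

```python
import numpy as np


def spread_skill_sketch(y_true, y_pred, y_std, num_bins=10):
    """Per-bin RMSE of predictions grouped by predicted spread."""
    y_true, y_pred, y_std = (
        np.asarray(a, dtype=float) for a in (y_true, y_pred, y_std)
    )
    # Equal-width bin edges spanning the observed spread (an assumption).
    bins = np.linspace(y_std.min(), y_std.max(), num_bins + 1)
    # Map each sample to a bin index in 0..num_bins-1; the maximum goes in the last bin.
    idx = np.clip(np.digitize(y_std, bins) - 1, 0, num_bins - 1)
    rmse = np.full(num_bins, np.nan)  # NaN marks empty bins
    for b in range(num_bins):
        mask = idx == b
        if mask.any():
            rmse[b] = np.sqrt(np.mean((y_true[mask] - y_pred[mask]) ** 2))
    return rmse, bins


# Synthetic data whose error scale equals the predicted spread.
rng = np.random.default_rng(0)
y_std = rng.uniform(0.5, 2.0, size=20_000)
y_pred = np.zeros_like(y_std)
y_true = y_pred + y_std * rng.standard_normal(y_std.size)
rmse, bins = spread_skill_sketch(y_true, y_pred, y_std, num_bins=10)
```

When the predicted spread is well calibrated, as in this synthetic case, the per-bin RMSE should track the bin centers closely.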

mlguess.regression_uq.discard_fraction(df, output_cols, legend_cols, save_location=False, fontsize=10)#

mlguess.regression_uq.pit_figure_ensemble(df, output_cols, mu, legend_cols=['Friction velocity', 'Sensible heat', 'Latent heat'], title='Ensemble', save_location=None)#

mlguess.regression_uq.pit_figure_gaussian(df, output_cols, mu, aleatoric, epistemic, titles=['Aleatoric', 'Epistemic', 'Total'], legend_cols=['Friction velocity', 'Sensible heat', 'Latent heat'], save_location=None)#

mlguess.regression_uq.plot_skill_score(y_true, y_pred, y_ale, y_epi, output_cols, num_bins=50, legend_cols=None, save_location=False)#

Plots the skill score with RMSE on the y-axis and binned spread on the x-axis.

Parameters:

y_true (array-like) – A 1D array of true values.
y_pred (array-like) – A 1D array of predicted values.
y_std (array-like) – A 1D array of standard deviations of predicted values.
num_bins (int, optional) – The number of bins to use for binning the spread.

mlguess.regression_uq.plot_uncertainties(ale, epi, output_cols, num_bins=20, legend_cols=None, fontsize=10, save_location=None)#

mlguess.regression_uq.regression_attributes(df, output_cols, legend_cols, nbins=11, save_location=False, fontsize=10)#

mlguess.regression_uq.rmse_crps_skill_scores(output_cols, df, mu, aleatoric, epistemic, titles, save_location=None)#

mlguess.splitting module#

mlguess.splitting.load_splitter(splitter, n_splits=1, random_state=1000, train_size=0.9)#
