mlguess package#
Subpackages#
- mlguess.keras package
- mlguess.tests package
- mlguess.torch package
- Submodules
- mlguess.torch.class_losses module
- mlguess.torch.layers module
- mlguess.torch.mc_dropout module
- mlguess.torch.models module
- mlguess.torch.regression_losses module
- Module contents
Submodules#
mlguess.classifier_uq module#
- mlguess.classifier_uq.brier_multi(targets, probs, num_classes=4, skill_score=True)#
- mlguess.classifier_uq.brier_skill_score_multiclass(true_labels, pred_probs, skill_score=True)#
- mlguess.classifier_uq.classifier_attribution(true_labels, pred_probs, num_bins=10, legend_cols=['Rain', 'Snow', 'Sleet', 'Frz Rain'], save_location=False, prefix=False)#
- mlguess.classifier_uq.classifier_discard_fraction(df, num_bins=10, num_classes=4, uncertainty_cols=['aleatoric', 'epistemic', 'total', 'evidential'], legend_cols=['Rain', 'Snow', 'Sleet', 'Frz Rain'], plt_titles=['Aleatoric', 'Epistemic', 'Total', 'Evidential'], colors=['#f8d605', '#ce4912', '#042c71', 'b', 'g'], save_location=False, prefix=False)#
- mlguess.classifier_uq.classifier_skill_scores(true_labels, pred_probs, aleatoric, epistemic, total, evidential, num_bins=10, legend_cols=['Aleatoric', 'Epistemic', 'Total', 'Evidential'], save_location=False, prefix=False)#
- mlguess.classifier_uq.plot_uncertainties(df, input_cols, output_cols, num_bins=20, legend_cols=None, x_labels=None, y_labels=None, fontsize=10, save_location=None)#
- mlguess.classifier_uq.sort_arr(true_labels, pred_probs, confidences, n_bins=10, data_min=False, data_max=False)#
- mlguess.classifier_uq.uq_results(df, save_location=None, prefix=None)#
mlguess.pbs module#
- mlguess.pbs.launch_distributed_jobs(config_file, script_path, launch=True)#
Launches a distributed job across multiple nodes using PBS and MPI.
This function generates a PBS script based on the provided configuration file, copies the necessary files, and optionally submits the job to the queue.
- Parameters:
config_file (str) – Path to the YAML configuration file containing PBS options.
script_path (str) – Path to the Python script to be executed in the distributed environment.
launch (bool, optional) – If True, submits the job using qsub. If False, only generates the script. Defaults to True.
- mlguess.pbs.launch_pbs_jobs(config_file, trainer_path, args='')#
Launches a PBS job using the specified configuration file and trainer script.
This function reads the configuration file to construct a PBS script, writes the script to a file, submits the job using qsub, and then cleans up the script file.
- Parameters:
config_file (str) – Path to the YAML configuration file containing PBS options.
trainer_path (str) – Path to the Python training script to be executed.
args (str, optional) – Additional command-line arguments to pass to the training script. Defaults to an empty string.
- Raises:
ValueError – If the ‘pbs’ section is not present in the configuration file.
mlguess.pit module#
- mlguess.pit.pit_deviation(y_true, y_pred, pred_type='ensemble', bins=10)#
Runs pit_histogram, then calculates the PIT deviation (PITD) from the resulting histogram. See the pit_histogram docstring for the accepted prediction formats.
- mlguess.pit.pit_deviation_skill_score(y_true, y_pred, pred_type='ensemble', bins=10)#
Calculate PITD score relative to the worst possible PITD for a given number of bins. Ranges from 0 to 1.
- mlguess.pit.pit_deviation_worst(n_bins)#
Calculate the worst possible PITD score based on the number of bins. Assumes all the forecasts end up in one of the outermost bins.
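The three PITD functions above can be illustrated with a short standard-library sketch. It assumes that PITD is the root-mean-square difference between a normalized PIT histogram and a flat histogram, and that the skill score is one minus the ratio of PITD to its worst case; neither formula is stated on this page, and the function names below are hypothetical, not part of mlguess.

```python
import math


def pit_deviation_from_hist(pit_hist):
    """RMS difference between a normalized PIT histogram and a flat one (assumed definition)."""
    n_bins = len(pit_hist)
    uniform = 1.0 / n_bins
    return math.sqrt(sum((h - uniform) ** 2 for h in pit_hist) / n_bins)


def worst_pit_deviation(n_bins):
    """PITD when every forecast lands in a single outermost bin."""
    worst_hist = [1.0] + [0.0] * (n_bins - 1)
    return pit_deviation_from_hist(worst_hist)


def pit_deviation_skill(pit_hist):
    """Assumed convention: 1 for a flat histogram, 0 for the worst case."""
    return 1.0 - pit_deviation_from_hist(pit_hist) / worst_pit_deviation(len(pit_hist))
```

Under these assumptions the worst case simplifies to sqrt(n_bins - 1) / n_bins, which is 0.3 for ten bins.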
- mlguess.pit.pit_gaussian_ensemble(y_true, y_pred_gauss_ens)#
Calculate the probability integral transform quantiles for an ensemble of Gaussian parametric models.
- Parameters:
y_true – true values with shape (n_samples,)
y_pred_gauss_ens – ensemble of Gaussian predictions (mean and standard deviation) with shape (n_samples, n_params, n_members)
- Returns:
for each sample, the true value’s quantile in the predicted distribution.
- Return type:
pit_quantiles
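One way to read this function is to treat the ensemble as an equal-weight mixture of Gaussians, so that the PIT quantile is the average of the member CDFs evaluated at the true value. The sketch below follows that reading; the mixture assumption, the ordering of parameters (means first, then standard deviations), and the function name are assumptions, not taken from the mlguess source.

```python
from statistics import NormalDist


def pit_gaussian_mixture(y_true, y_pred_gauss_ens):
    """PIT quantiles for an equal-weight Gaussian mixture (assumed interpretation).

    y_pred_gauss_ens[i] is [means, stds], each with one entry per ensemble member,
    mirroring the documented (n_samples, n_params, n_members) layout.
    """
    quantiles = []
    for y, (means, stds) in zip(y_true, y_pred_gauss_ens):
        # Evaluate each member's CDF at the true value, then average.
        member_cdfs = [NormalDist(m, s).cdf(y) for m, s in zip(means, stds)]
        quantiles.append(sum(member_cdfs) / len(member_cdfs))
    return quantiles
```

For two unit-variance members centered at -1 and 1, a true value of 0 sits at the mixture median, so its quantile is 0.5.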
- mlguess.pit.pit_histogram(y_true, y_pred, pred_type='ensemble', bins=10)#
Calculate PIT histogram values for different types of ensemble predictions. Predictions can take one of three formats:
ensemble – array of n deterministic ensemble predictions with shape (n_samples, n_members).
gaussian – array of one Gaussian distribution prediction (mean and standard deviation) with shape (n_samples, n_params).
gaussian_ensemble – array of multiple Gaussian distributions for the same prediction with shape (n_samples, n_params, n_members).
- Parameters:
y_true – true values with shape (n_samples,)
y_pred – predictions in format of gaussian, gaussian_ensemble, or ensemble.
pred_type – ‘ensemble’, ‘gaussian’, or ‘gaussian_ensemble’
bins – number of bins or array of specific bin edges. Bins should cover space between 0 and 1.
- Returns:
pit_hist, pit_bins
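As a rough illustration of the binning step, the sketch below counts PIT quantiles in equal-width bins over [0, 1] and returns relative frequencies with the bin edges. Whether mlguess returns counts or normalized frequencies is not stated here, so the normalization, like the function name, is an assumption.

```python
def pit_histogram_sketch(pit_quantiles, n_bins=10):
    """Bin PIT quantiles into n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for q in pit_quantiles:
        # Clamp so that q == 1.0 falls in the last bin rather than out of range.
        index = min(int(q * n_bins), n_bins - 1)
        counts[index] += 1
    total = len(pit_quantiles)
    pit_hist = [c / total for c in counts]  # relative frequencies (assumed)
    pit_bins = [i / n_bins for i in range(n_bins + 1)]  # bin edges
    return pit_hist, pit_bins
```

A well-calibrated forecast yields a roughly flat histogram, with every entry close to 1 / n_bins.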
- mlguess.pit.pit_histogram_values(pit_quantiles, bins=10)#
- mlguess.pit.probability_integral_transform_ensemble(y_true, y_pred_ens)#
Calculate the probability integral transform quantiles for an ensemble of predictions
- Parameters:
y_true – true values with shape (n_samples,)
y_pred_ens – predicted ensemble values with shape (n_samples, n_ensemble_members)
- Returns:
for each sample, the true value’s quantile in the predicted distribution.
- Return type:
pit_quantiles
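For a deterministic ensemble, the PIT quantile of a sample can be estimated as the fraction of members that do not exceed the true value. The following is a minimal sketch of that idea; how mlguess treats ties is not documented here, so the `<=` comparison and the function name are assumptions.

```python
def pit_ensemble(y_true, y_pred_ens):
    """Empirical PIT quantiles: fraction of members at or below the true value."""
    quantiles = []
    for y, members in zip(y_true, y_pred_ens):
        n_at_or_below = sum(1 for m in members if m <= y)
        quantiles.append(n_at_or_below / len(members))
    return quantiles


# Three samples sharing the same four-member ensemble.
ensemble = [[1.0, 2.0, 3.0, 4.0]] * 3
print(pit_ensemble([2.5, 0.0, 10.0], ensemble))  # [0.5, 0.0, 1.0]
```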
- mlguess.pit.probability_integral_transform_gaussian(y_true, y_pred_gaussian)#
Calculate the probability integral transform quantiles for a single Gaussian distribution.
- Parameters:
y_true – true values with shape (n_samples,)
y_pred_gaussian – predicted Gaussian parameters (mean, standard deviation) with shape (n_samples, n_params)
- Returns:
for each sample, the true value’s quantile in the predicted distribution.
- Return type:
pit_quantiles
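For a single Gaussian prediction, the PIT quantile is the Gaussian CDF evaluated at the true value. The sketch below uses `statistics.NormalDist` from the standard library and assumes each prediction row is ordered (mean, standard deviation); the function name is hypothetical.

```python
from statistics import NormalDist


def pit_gaussian(y_true, y_pred_gaussian):
    """PIT quantiles: Gaussian CDF of each true value under its predicted distribution."""
    return [
        NormalDist(mu, sigma).cdf(y)
        for y, (mu, sigma) in zip(y_true, y_pred_gaussian)
    ]
```

A true value equal to the predicted mean gives a quantile of 0.5, and one standard deviation above the mean gives about 0.84.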
mlguess.plotting module#
- mlguess.plotting.compute_cov(df, col='pred_conf', quan='uncertainty', ascending=False)#
- mlguess.plotting.conus_plot(df, dataset='mping', column='pred_label', title='Predicted', save_path=False)#
- mlguess.plotting.coverage_figures(test_data, output_cols, colors=None, title=None, save_location=None)#
- mlguess.plotting.plot_confusion_matrix(y_true, y_pred, classes, model_name, normalize=False, title=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, filename=None)#
Plot a confusion matrix.
mlguess.preprocessing module#
- mlguess.preprocessing.load_preprocessing(conf, seed=1000)#
- mlguess.preprocessing.load_preprocessor(stype, params, seed=1000)#
mlguess.regression_metrics module#
- mlguess.regression_metrics.calculate_skill_score(y_true, y_pred, sigma, num_bins=10, log=False, filter_top_percentile=0)#
- mlguess.regression_metrics.regression_metrics(y_true, y_pred, total=None, split='val')#
Compute common regression metrics for continuous data.
- Parameters:
y_true (array-like) – True target values.
y_pred (array-like) – Predicted target values.
- Returns:
A dictionary containing common regression metrics.
- Return type:
dict
- mlguess.regression_metrics.rmse_crps_skill_scores(y, mu, total, filter_top_percentile=0)#
mlguess.regression_uq module#
- mlguess.regression_uq.calculate_skill_score(y_true, y_pred, sigma, num_bins=10, log=False, filter_top_percentile=5)#
- mlguess.regression_uq.calibration(dataframe, a_cols, e_cols, mae_cols, legend_cols, bins=10, save_location=False)#
- mlguess.regression_uq.calibration_curve(df, col='var', quan='error', bins=10)#
- mlguess.regression_uq.compute_coverage(df, col='var', quan='error')#
- mlguess.regression_uq.compute_results(df, output_cols, mu, aleatoric, epistemic, ensemble_mu=False, ensemble_type=False, legend_cols=['Friction_velocity', 'Sensible_heat', 'Latent_heat'], fn=None)#
- mlguess.regression_uq.compute_skill_score(y_true, y_pred, y_std, num_bins=10)#
Computes the skill score with RMSE on the y-axis and binned spread on the x-axis.
- Parameters:
y_true (array-like) – A 1D array of true values.
y_pred (array-like) – A 1D array of predicted values.
y_std (array-like) – A 1D array of standard deviations of predicted values.
num_bins (int, optional) – The number of bins to use for binning the spread.
- Returns:
ss (array-like) – A 2D array of skill scores.
bins (array-like) – A 1D array of bin edges for the spread.
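The idea of pairing RMSE with binned spread can be sketched as follows: group samples into equal-width bins of predicted standard deviation, then compute the RMSE of the predictions in each bin. This is an illustration under stated assumptions, not the mlguess implementation; in particular it returns one RMSE per bin rather than the 2D `ss` array described above, and the equal-width binning and function name are assumptions.

```python
import math


def binned_spread_skill(y_true, y_pred, y_std, num_bins=10):
    """Per-bin RMSE, with samples binned by predicted standard deviation."""
    lo, hi = min(y_std), max(y_std)
    width = (hi - lo) / num_bins
    if width == 0.0:
        width = 1.0  # all spreads identical: avoid dividing by zero
    edges = [lo + i * width for i in range(num_bins + 1)]
    sq_errors = [[] for _ in range(num_bins)]
    for yt, yp, s in zip(y_true, y_pred, y_std):
        # Clamp so that the largest spread lands in the last bin.
        index = min(int((s - lo) / width), num_bins - 1)
        sq_errors[index].append((yt - yp) ** 2)
    # Empty bins are reported as NaN.
    rmse = [math.sqrt(sum(e) / len(e)) if e else float("nan") for e in sq_errors]
    return rmse, edges
```

For a well-calibrated model, the RMSE in each bin should be close to the spread values that define that bin.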
- mlguess.regression_uq.discard_fraction(df, output_cols, legend_cols, save_location=False, fontsize=10)#
- mlguess.regression_uq.pit_figure_ensemble(df, output_cols, mu, legend_cols=['Friction velocity', 'Sensible heat', 'Latent heat'], title='Ensemble', save_location=None)#
- mlguess.regression_uq.pit_figure_gaussian(df, output_cols, mu, aleatoric, epistemic, titles=['Aleatoric', 'Epistemic', 'Total'], legend_cols=['Friction velocity', 'Sensible heat', 'Latent heat'], save_location=None)#
- mlguess.regression_uq.plot_skill_score(y_true, y_pred, y_ale, y_epi, output_cols, num_bins=50, legend_cols=None, save_location=False)#
Plots the skill score with RMSE on the y-axis and binned spread on the x-axis.
- Parameters:
y_true (array-like) – A 1D array of true values.
y_pred (array-like) – A 1D array of predicted values.
y_std (array-like) – A 1D array of standard deviations of predicted values.
num_bins (int, optional) – The number of bins to use for binning the spread.
- mlguess.regression_uq.plot_uncertainties(ale, epi, output_cols, num_bins=20, legend_cols=None, fontsize=10, save_location=None)#
- mlguess.regression_uq.regression_attributes(df, output_cols, legend_cols, nbins=11, save_location=False, fontsize=10)#
- mlguess.regression_uq.rmse_crps_skill_scores(output_cols, df, mu, aleatoric, epistemic, titles, save_location=None)#
mlguess.splitting module#
- mlguess.splitting.load_splitter(splitter, n_splits=1, random_state=1000, train_size=0.9)#