Quick Start
To run an example fit, get the example datasets by cloning this repo:
Then, within a Python session (or simply run python fit_example.py):
import importlib  # imported explicitly so the example does not rely on the star import below
from symbolfit.symbolfit import *

# Load the example dataset and the PySR configuration as modules.
dataset = importlib.import_module('examples.datasets.toy_dataset_1.dataset')
pysr_config = importlib.import_module('examples.pysr_configs.pysr_config_gauss').pysr_config

# Set up the fit.
model = SymbolFit(
    x = dataset.x,
    y = dataset.y,
    y_up = dataset.y_up,
    y_down = dataset.y_down,
    pysr_config = pysr_config,
    max_complexity = 60,
    input_rescale = True,
    scale_y_by = 'mean',
    max_stderr = 20,
    fit_y_unc = True,
    random_seed = None,
    loss_weights = None
)

# Run the fit.
model.fit()
After the fit, save results to CSV files:
and plot results to PDF files:
model.plot_to_pdf(
    output_dir = 'output_dir/',
    bin_widths_1d = dataset.bin_widths_1d,
    plot_logy = False,
    plot_logx = False,
    sampling_95quantile = False,
    #bin_edges_2d = dataset.bin_edges_2d,
    #plot_logx0 = False,
    #plot_logx1 = False,
    #cbar_min = None,
    #cbar_max = None,
    #cmap = None,
    #contour = None,
    # ^ additional options for 2D plotting
)
Candidate functions with full substitutions can be printed directly:
Each fit will produce a batch of candidate functions and will automatically save all results to six output files:
candidates.csv: saves all candidate functions and evaluations in a CSV table.
candidates_compact.csv: saves a reduced version with essential information only, without intermediate results.
candidates.pdf: plots all candidate functions (1D/2D only for now) with associated uncertainties, one by one, for fit quality evaluation.
candidates_sampling.pdf: plots all candidate functions (1D only for now) with total uncertainty coverage generated by sampling parameters.
candidates_gof.pdf: plots the goodness-of-fit scores.
candidates_correlation.pdf: plots the correlation matrices for the parameters of the candidate functions.
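As a quick sanity check after a run, the six file names listed above can be looked up in the output directory. The sketch below is not part of SymbolFit: the helper missing_outputs is a hypothetical name, and it assumes that all six files are written directly into the directory passed as output_dir.

```python
from pathlib import Path

# File names as listed in the documentation above.
EXPECTED_OUTPUTS = (
    "candidates.csv",
    "candidates_compact.csv",
    "candidates.pdf",
    "candidates_sampling.pdf",
    "candidates_gof.pdf",
    "candidates_correlation.pdf",
)


def missing_outputs(output_dir):
    """Return the expected output files that are absent from output_dir.

    Hypothetical helper (not a SymbolFit function). Assumes the files are
    written directly into output_dir, not into subdirectories.
    """
    out = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (out / name).is_file()]
```

Calling missing_outputs('output_dir/') after saving and plotting returns an empty list when every expected file is present, and the names of the absent files otherwise.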
Note
The function space is usually huge, even when constrained by the PySR
config. If you are not satisfied with the results of a fit, you can
simply rerun it with the exact same config and obtain a completely
different set of candidate functions; the only difference is the random
seed that initiates the seeding functions. You can therefore rerun the
fit as many times as you want until you are satisfied with the results.
If you use model = SymbolFit(..., random_seed = None, ...), nothing
needs to be changed: just rerun the fit. If you set a specific
random_seed, change its value before rerunning. If you are still not
satisfied with the results after many trials, however, the config itself
may be the issue. In that case, try a different config or tune the
current one, and start new runs.
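The rerun workflow described in the note can be organized so that each trial keeps its own outputs. The following is a minimal sketch, not part of SymbolFit: rerun_fits and run_fit are hypothetical names, and run_fit stands for a user-written function that builds SymbolFit with the given random_seed, calls model.fit(), and writes results into the given directory.

```python
from pathlib import Path


def rerun_fits(run_fit, seeds, base_dir="output_runs"):
    """Call run_fit once per seed, giving each run its own output directory.

    run_fit is a hypothetical user-supplied callable taking
    (seed, output_dir); it is expected to construct SymbolFit with
    random_seed=seed, run the fit, and save its results to output_dir.
    """
    output_dirs = []
    for seed in seeds:
        # One subdirectory per seed so results from different runs do not collide.
        out = Path(base_dir) / f"seed_{seed}"
        out.mkdir(parents=True, exist_ok=True)
        # Trailing slash mirrors the 'output_dir/' style used in the example above.
        run_fit(seed, str(out) + "/")
        output_dirs.append(out)
    return output_dirs
```

Keeping one directory per seed makes it easy to compare candidate functions across trials and to reproduce a run that gave good results by reusing its seed.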
Output files from an example fit can be found and downloaded here for illustration.