Quick Start

To run an example fit, get the example datasets by cloning this repo:

git clone https://github.com/hftsoi/symbolfit.git
cd symbolfit

Then, within a Python session (or simply run python fit_example.py):

import importlib

from symbolfit.symbolfit import *

# Load the example dataset and the PySR configuration shipped with the repo.
dataset = importlib.import_module('examples.datasets.toy_dataset_1.dataset')
pysr_config = importlib.import_module('examples.pysr_configs.pysr_config_gauss').pysr_config

model = SymbolFit(
        x = dataset.x,
        y = dataset.y,
        y_up = dataset.y_up,
        y_down = dataset.y_down,
        pysr_config = pysr_config,
        max_complexity = 60,
        input_rescale = True,
        scale_y_by = 'mean',
        max_stderr = 20,
        fit_y_unc = True,
        random_seed = None,
        loss_weights = None
)

model.fit()
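To fit your own data rather than the example, you need an object that exposes the same attributes used above (x, y, y_up, y_down, and bin_widths_1d for plotting). The following is a minimal sketch, assuming these are equal-length sequences with one entry per bin and that y_up and y_down are non-negative uncertainty magnitudes. The helper make_toy_dataset, the peak shape, and all numbers are hypothetical and are not part of SymbolFit; check the example dataset in the repo for the exact format the library expects.

```python
import math
import random


def make_toy_dataset(num_bins=20, x_min=0.0, x_max=10.0, seed=0):
    """Build a hypothetical 1D binned dataset with a Gaussian-shaped peak.

    Returns a dict whose keys mirror the attribute names used in the
    quick-start example (x, y, y_up, y_down, bin_widths_1d).
    """
    rng = random.Random(seed)
    width = (x_max - x_min) / num_bins

    # Bin centers along the x axis.
    x = [x_min + (i + 0.5) * width for i in range(num_bins)]

    # Gaussian peak on a flat baseline, with small multiplicative noise.
    y = [
        (5.0 + 100.0 * math.exp(-0.5 * (xi - 5.0) ** 2)) * rng.uniform(0.95, 1.05)
        for xi in x
    ]

    # Assumption: symmetric, Poisson-like uncertainties given as magnitudes.
    y_up = [math.sqrt(yi) for yi in y]
    y_down = [math.sqrt(yi) for yi in y]

    return {
        "x": x,
        "y": y,
        "y_up": y_up,
        "y_down": y_down,
        "bin_widths_1d": [width] * num_bins,
    }


toy = make_toy_dataset()
print(len(toy["x"]), "bins, first bin center at", toy["x"][0])
```

The resulting values could then be passed in place of dataset.x, dataset.y, and so on, converting them to arrays first if the library requires it.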

After the fit, save results to csv files:

model.save_to_csv(output_dir = 'output_dir/')

and plot results to pdf files:

model.plot_to_pdf(
        output_dir = 'output_dir/',
        bin_widths_1d = dataset.bin_widths_1d,
        plot_logy = False,
        plot_logx = False,
        sampling_95quantile = False,
        #bin_edges_2d = dataset.bin_edges_2d,
        #plot_logx0 = False,
        #plot_logx1 = False,
        #cbar_min = None,
        #cbar_max = None,
        #cmap = None,
        #contour = None,
        # ^ additional options for 2D plotting
)

A candidate function with all parameter values substituted in can be printed directly:

model.print_candidate(candidate_number = 20)

Each fit produces a batch of candidate functions, and the save and plot steps above write all results to six output files:

candidates.csv: saves all candidate functions and their evaluations in a csv table.
candidates_compact.csv: saves a reduced version with only the essential information, omitting intermediate results.
candidates.pdf: plots all candidate functions (1D/2D only for now) with associated uncertainties, one by one, for evaluating fit quality.
candidates_sampling.pdf: plots all candidate functions (1D only for now) with the total uncertainty coverage generated by sampling parameters.
candidates_gof.pdf: plots the goodness-of-fit scores.
candidates_correlation.pdf: plots the correlation matrices for the parameters of the candidate functions.
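After saving and plotting, it can be useful to confirm that all six files exist before inspecting them. The sketch below uses only the Python standard library and the file names listed above. It searches the output directory recursively, since the exact directory layout is an assumption here, and it reads only the header row of a csv file so that no column names are assumed. The helpers find_outputs and csv_columns are hypothetical and are not part of SymbolFit.

```python
import csv
from pathlib import Path

# File names taken from the list of output files above.
EXPECTED_OUTPUTS = [
    "candidates.csv",
    "candidates_compact.csv",
    "candidates.pdf",
    "candidates_sampling.pdf",
    "candidates_gof.pdf",
    "candidates_correlation.pdf",
]


def find_outputs(output_dir):
    """Map each expected file name to the first match under output_dir, or None."""
    root = Path(output_dir)
    found = {}
    for name in EXPECTED_OUTPUTS:
        matches = sorted(root.rglob(name))  # also looks in subdirectories
        found[name] = matches[0] if matches else None
    return found


def csv_columns(csv_path):
    """Return the header row of a csv file, or an empty list if the file is empty."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle), [])
```

For example, find_outputs('output_dir/') returns a dictionary in which any missing file maps to None, and csv_columns can then be called on the path found for candidates.csv to see which columns the table contains.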

Note

The function space is usually huge, even when constrained by the PySR config. If you are not satisfied with the results of a fit, you can rerun it with the exact same config and obtain a completely different set of candidate functions; the only difference between runs is the random seed that initializes the seeding functions. You can therefore rerun the fit as many times as needed until you are satisfied with the results. If you use model = SymbolFit(..., random_seed = None, ...), nothing needs to be changed: just rerun the fit. If you set a specific random_seed, change its value before rerunning. If you are still not satisfied after many trials, the config itself may be the issue. In that case, try a different config or tune the current one, then start new runs.
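The rerun strategy described in this note can be sketched as a small loop that fits once per seed and saves each run to its own directory so that earlier results are not overwritten. This is an illustration rather than part of SymbolFit: rerun_fits and the make_model factory are hypothetical names, and the factory is assumed to return a model built as SymbolFit(..., random_seed = seed, ...) with all other arguments unchanged. Only fit() and save_to_csv(output_dir = ...) from the example above are called on the model.

```python
from pathlib import Path


def rerun_fits(make_model, seeds, base_dir="output_dir"):
    """Run one fit per seed and save each run's csv output to its own directory.

    make_model is a hypothetical factory: it takes a seed and returns an
    unfitted model configured with that seed.
    """
    output_dirs = []
    for seed in seeds:
        model = make_model(seed)
        model.fit()

        # One directory per seed, e.g. output_dir/seed_1/
        out = Path(base_dir) / f"seed_{seed}"
        out.mkdir(parents=True, exist_ok=True)
        model.save_to_csv(output_dir=f"{out}/")

        output_dirs.append(out)
    return output_dirs
```

A factory for the quick-start example could be a lambda that wraps the SymbolFit(...) call shown earlier and forwards the seed to random_seed. The same loop could also call plot_to_pdf for each run if plots are needed.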

Output files from an example fit can be found and downloaded here for illustration.