CMS dijet dataset (template spec)

CMS search for high-mass dijet resonances at sqrt(s) = 13 TeV

Differential dijet spectrum (Figure 5), public data taken from HEPDATA

See >>notebook<< for the complete procedure.

This fit generates 20 candidate functions in total! The output files can be found here (feel free to download them and look at what a typical fit will produce).

Let's look at the output file candidates_compact.csv, a CSV table that stores all candidate functions and their evaluations:

Unnamed: 0 Parameterized equation, unscaled Parameters: (best-fit, +1, -1) Covariance Correlation RMSE R2 NDF Chi2 Chi2/NDF p-value
0 a1a3log(a2x0) {'a1': (5.08e-05, 0, 0), 'a2': (7.58007e-05, 1.43e-06, -1.43e-06), 'a3': (0.000894616, 5.94e-05, -5.94e-05)} {'a2, a3': -8.460737866786669e-11, 'a2, a2': 2.0465412685531722e-12, 'a3, a3': 3.5270273803835507e-09} {'a2, a3': -0.9961} 1.493 0.9977 40 37870 946.8 0
1 a2x0*a1 {'a1': (-7.02, 0, 0), 'a2': (4.26379e+24, 5.08e+22, -5.08e+22)} {} {} 1.497 0.9977 41 37870 923.7 0
2 a3(a2x0 + 1)**a1 {'a1': (-11.0, 0, 0), 'a2': (0.000796547, 4.53e-06, -4.53e-06), 'a3': (1102750.0, 40900.0, -40900.0)} {'a2, a3': 0.1848810500350927, 'a2, a2': 2.0559557509711373e-11, 'a3, a3': 1669091717.0163503} {'a2, a3': 0.9979} 0.2736 0.9999 40 1412 35.29 3.265e-270
3 a3(a2x0 + 1)**a1 {'a1': (-11.0, 0, 0), 'a2': (0.000796547, 4.53e-06, -4.53e-06), 'a3': (1102750.0, 40900.0, -40900.0)} {'a2, a3': 0.1848813041222993, 'a2, a2': 2.0559559759106455e-11, 'a3, a3': 1669096119.40858} {'a2, a3': 0.9979} 0.2736 0.9999 40 1412 35.29 3.265e-270
4 a3(a2x03 + x0)a1 {'a1': (-5.74, 0, 0), 'a2': (2.29242e-08, 7.81e-11, -7.81e-11), 'a3': (4.49778e+20, 6.91e+17, -6.91e+17)} {'a2, a3': 50435336.252600916, 'a2, a2': 6.091893850573889e-21, 'a3, a3': 4.778770078487348e+35} {'a2, a3': 0.9346} 0.02582 1 40 78.09 1.952 0.0002948
5 a5(a4x02(a2 + a3x0) + x0)a1 {'a1': (-5.91, 0, 0), 'a2': (-0.0738, 0, 0), 'a3': (8.17716e-05, 5.18e-06, -5.18e-06), 'a4': (0.000277021, 2.14e-05, -2.14e-05), 'a5': (1.3175e+21, 1.36e+19, -1.36e+19)} {'a3, a4': -1.1072343065298495e-10, 'a3, a5': 70285160218420.27, 'a4, a5': -288507989246391.75, 'a3, a3': 2.6877414399463356e-11, 'a4, a4': 4.5706758516093817e-10, 'a5, a5': 1.8537814861120806e+38} {'a3, a4': -0.9988, 'a3, a5': 0.9977, 'a4, a5': -0.9913} 0.03517 1 39 63.32 1.624 0.008193
6 a6(x0(a2 + a5x0)(a3 + a4x0) + x0)*a1 {'a1': (-5.91978, 0.0124, -0.0124), 'a2': (-0.116, 0, 0), 'a3': (-0.0423, 0, 0), 'a4': (7.69e-05, 0, 0), 'a5': (0.000295233, 4.38e-06, -4.38e-06), 'a6': (1.44531e+21, 1.29e+20, -1.29e+20)} {'a1, a5': 5.3207088865981514e-08, 'a1, a6': -1.604028115464063e+18, 'a5, a6': -550756620569488.44, 'a1, a1': 0.00015457215800504133, 'a5, a5': 1.9208385665966632e-11, 'a6, a6': 1.6648001756780168e+40} {'a1, a5': 0.9797, 'a1, a6': -1, 'a5, a6': -0.9748} 0.03672 1 39 62.99 1.615 0.008821
7 a6(a5x0 + x0(a2 + a4x0)(a3 + a4x0))**a1 {'a1': (-6.0, 0, 0), 'a2': (-0.0756072, 0.00373, -0.00373), 'a3': (-0.0272, 0, 0), 'a4': (7.69e-05, 0, 0), 'a5': (0.260647, 0.00366, -0.00366), 'a6': (7.683e+17, 6.98e+16, -6.98e+16)} {'a2, a5': 1.3164057291764314e-05, 'a2, a6': 252989104907177.62, 'a5, a6': 255456422340728.62, 'a2, a2': 1.3926636968702662e-05, 'a5, a5': 1.3394693429415872e-05, 'a6, a6': 4.875841650232542e+33} {'a2, a5': 0.9643, 'a2, a6': 0.9717, 'a5, a6': 1.0} 0.04453 1 39 61.99 1.589 0.01102
8 a6(a5x0(a3x0 + a4x0(a2x02 + a3x0)) + x0)**a1 {'a1': (-5.7, 0, 0), 'a2': (5.92e-09, 0, 0), 'a3': (7.69e-05, 0, 0), 'a4': (0.000551201, 3.6e-05, -3.6e-05), 'a5': (0.344921, 0.0177, -0.0177), 'a6': (3.83357e+20, 3.36e+18, -3.36e+18)} {'a4, a5': -6.355651112150391e-07, 'a4, a6': -118706308135185.27, 'a5, a6': 5.890069363802254e+16, 'a4, a4': 1.2935165564338607e-09, 'a5, a5': 0.000313223039972398, 'a6, a6': 1.1275164449287815e+37} {'a4, a5': -0.9974, 'a4, a6': -0.9814, 'a5, a6': 0.9904} 0.03545 1 39 52.37 1.343 0.07459
9 a6(a5x0(a3x0 + a4x0(a2x02 + a3x0)) + x0)**a1 {'a1': (-5.7, 0, 0), 'a2': (5.92e-09, 0, 0), 'a3': (7.69e-05, 0, 0), 'a4': (0.000551201, 3.6e-05, -3.6e-05), 'a5': (0.344921, 0.0177, -0.0177), 'a6': (3.83357e+20, 3.36e+18, -3.36e+18)} {'a4, a5': -6.355651112150391e-07, 'a4, a6': -118706308135185.27, 'a5, a6': 5.890069363802254e+16, 'a4, a4': 1.2935165564338607e-09, 'a5, a5': 0.000313223039972398, 'a6, a6': 1.1275164449287815e+37} {'a4, a5': -0.9974, 'a4, a6': -0.9814, 'a5, a6': 0.9904} 0.03545 1 39 52.37 1.343 0.07459
10 a6(x0(a3x0(a2x03 + a4x0)(a4x0 + a5) + a3x0) + x0)*a1 {'a1': (-5.66, 0, 0), 'a2': (4.55e-13, 0, 0), 'a3': (2.90315e-05, 1.56e-06, -1.56e-06), 'a4': (7.69e-05, 0, 0), 'a5': (7.8163, 0.562, -0.562), 'a6': (2.96622e+20, 2.76e+18, -2.76e+18)} {'a3, a5': -8.787776403453511e-07, 'a3, a6': 4284326536206.3115, 'a5, a6': -1.528946951212225e+18, 'a3, a3': 2.448790039843823e-12, 'a5, a5': 0.3160777337787326, 'a6, a6': 7.612952544133508e+36} {'a3, a5': -1, 'a3, a6': 0.9951, 'a5, a6': -0.9857} 0.03575 1 39 45.73 1.173 0.2128
11 a7(a5x0(a3x0(a2x03 + a3x0)(a4x0 + a6) + a3x0) + x0)a1 {'a1': (-5.65, 0, 0), 'a2': (4.55e-13, 0, 0), 'a3': (7.69e-05, 0, 0), 'a4': (8.98e-05, 0, 0), 'a5': (0.40515, 0.0209, -0.0209), 'a6': (7.16935, 0.507, -0.507), 'a7': (2.79392e+20, 2.64e+18, -2.64e+18)} {'a5, a6': -0.01056172798546536, 'a5, a7': 5.472024910590873e+16, 'a6, a7': -1.3184128943651082e+18, 'a5, a5': 0.00043587733738634563, 'a6, a6': 0.2565488436052721, 'a7, a7': 6.973351191632643e+36} {'a5, a6': -0.9967, 'a5, a7': 0.9917, 'a6, a7': -0.985} 0.03552 1 39 45.48 1.166 0.2202
12 a7(x0(a4x0 + a6x0(a3x02(a2x03 + a5x0) + a5x0)) + x0)**a1 {'a1': (-5.64, 0, 0), 'a2': (4.55e-13, 0, 0), 'a3': (5.92e-09, 0, 0), 'a4': (3.08548e-05, 1.33e-06, -1.33e-06), 'a5': (7.69e-05, 0, 0), 'a6': (0.000236736, 3.15e-06, -3.15e-06), 'a7': (2.61039e+20, 2.13e+18, -2.13e+18)} {'a4, a6': -4.0798210147698854e-12, 'a4, a7': 2822695252079.888, 'a6, a7': -6259278883513.492, 'a4, a4': 1.7817207290485798e-12, 'a6, a6': 9.912795450928038e-12, 'a7, a7': 4.556286156023498e+36} {'a4, a6': -0.9738, 'a4, a7': 0.9964, 'a6, a7': -0.9329} 0.03548 1 39 42.64 1.093 0.3175
13 a7*(a6*x0*(a4*x0 + a5*x0*(a3*x0**2*(a2*x0**3 + a4*x0) + a4*x0)) + x0)**a1 {'a1': (-5.64, 0, 0), 'a2': (5.76e-13, 0, 0), 'a3': (5.92e-09, 0, 0), 'a4': (7.69e-05, 0, 0), 'a5': (0.000573849, 3.16e-05, -3.16e-05), 'a6': (0.409393, 0.0172, -0.0172), 'a7': (2.61919e+20, 2.13e+18, -2.13e+18)} {'a5, a6': -5.427769212039176e-07, 'a5, a7': -65885162378837.13, 'a6, a7': 3.62581744488366e+16, 'a5, a5': 9.971040085253033e-10, 'a6, a6': 0.0002964678195715919, 'a7, a7': 4.5187787342402686e+36} {'a5, a6': -0.9986, 'a5, a7': -0.9789, 'a6, a7': 0.9897} 0.03678 1 39 42.48 1.089 0.3236
14 a7(a6x0(a4x0 + a5x0(a3x02(a2x04 + a4x0) + a4x0)) + x0)*a1 {'a1': (-5.64, 0, 0), 'a2': (1.06e-16, 0, 0), 'a3': (5.92e-09, 0, 0), 'a4': (7.69e-05, 0, 0), 'a5': (0.00057743, 3.15e-05, -3.15e-05), 'a6': (0.408045, 0.017, -0.017), 'a7': (2.61876e+20, 2.1e+18, -2.1e+18)} {'a5, a6': -5.362956057832366e-07, 'a5, a7': -65035437031858.36, 'a6, a7': 3.5484478765296036e+16, 'a5, a5': 9.936510784214594e-10, 'a6, a6': 0.0002904277118866256, 'a7, a7': 4.4182484934170157e+36} {'a5, a6': -1, 'a5, a7': -0.9832, 'a6, a7': 0.994} 0.03736 1 39 41.57 1.066 0.3593
15 a7(a6x0(a4x0 + a5x0(a3x02(a2x05 + a4x0) + a4x0)) + x0)*a1 {'a1': (-5.64, 0, 0), 'a2': (2.0000000000000002e-20, 0, 0), 'a3': (5.92e-09, 0, 0), 'a4': (7.69e-05, 0, 0), 'a5': (0.00057826, 3.16e-05, -3.16e-05), 'a6': (0.407912, 0.017, -0.017), 'a7': (2.61928e+20, 2.1e+18, -2.1e+18)} {'a5, a6': -5.370977765256392e-07, 'a5, a7': -65126236281690.78, 'a6, a7': 3.5477735908249676e+16, 'a5, a5': 9.967053856289958e-10, 'a6, a6': 0.0002904026653106192, 'a7, a7': 4.417003112436634e+36} {'a5, a6': -0.9998, 'a5, a7': -0.9814, 'a6, a7': 0.9938} 0.03795 1 39 41.54 1.065 0.3607
16 a8(a7x0(a4x0(a3x02(a2x05 + a4x0) + a5x0 + a6) + a4x0) + x0)*a1 {'a1': (-5.64, 0, 0), 'a2': (2.0000000000000002e-20, 0, 0), 'a3': (4.4e-08, 0, 0), 'a4': (7.64229e-05, 6.89e-06, -6.89e-06), 'a5': (0.000572, 0, 0), 'a6': (0.00915, 0, 0), 'a7': (0.413139, 0.0393, -0.0393), 'a8': (2.62736e+20, 7.33e+17, -7.33e+17)} {'a4, a7': -2.7063911809956403e-07, 'a4, a8': -4209939715836.5503, 'a7, a8': 2.438658828730293e+16, 'a4, a4': 4.748553837687573e-11, 'a7, a7': 0.0015434853116839161, 'a8, a8': 5.378342107509645e+35} {'a4, a7': -0.9995, 'a4, a8': -0.8336, 'a7, a8': 0.8466} 0.0403 1 39 41.67 1.069 0.3552
17 a9(a8x0(a5x0(a4x02(a3x04(a2 + a5x0) + a5x0) + a6x0 + a7) + a5x0) + x0)*a1 {'a1': (-5.64, 0, 0), 'a2': (-0.0199, 0, 0), 'a3': (2.6e-16, 0, 0), 'a4': (4.4e-08, 0, 0), 'a5': (7.6729e-05, 5.47e-06, -5.47e-06), 'a6': (0.000572, 0, 0), 'a7': (0.00927, 0, 0), 'a8': (0.411471, 0.031, -0.031), 'a9': (2.62743e+20, 6.58e+17, -6.58e+17)} {'a5, a8': -1.6947439140739597e-07, 'a5, a9': -2833077941506.8174, 'a8, a9': 1.645739013663623e+16, 'a5, a5': 2.98823363031568e-11, 'a8, a8': 0.0009621492771438784, 'a9, a9': 4.3278896221434915e+35} {'a5, a8': -0.9994, 'a5, a9': -0.7871, 'a8, a9': 0.8068} 0.04041 1 39 41.56 1.066 0.3598
18 a8(x0(a4x0(a3x02(a2x06 + a5x0) + a6x0 + a7) + a4x0) + x0)**a1 {'a1': (-5.64, 0, 0), 'a2': (3.08e-24, 0, 0), 'a3': (4.4e-08, 0, 0), 'a4': (3.17e-05, 0, 0), 'a5': (7.88417e-05, 7.1e-06, -7.1e-06), 'a6': (0.000568227, 4.58e-06, -4.58e-06), 'a7': (0.00889, 0, 0), 'a8': (2.6287e+20, 5.13e+17, -5.13e+17)} {'a5, a6': -2.9340640604404285e-11, 'a5, a8': -2929128573480.344, 'a6, a8': 2267372291131.628, 'a5, a5': 5.044990489393001e-11, 'a6, a6': 2.0960202796305254e-11, 'a8, a8': 2.6302513209257903e+35} {'a5, a6': -0.9023, 'a5, a8': -0.8042, 'a6, a8': 0.965} 0.04068 1 39 42.01 1.077 0.342
19 a10(a9x0(a5x0(a4x02(a3x02(a2x04 + a8) + a5x0) + a6x0 + a7) + a5*x0) + x0)a1 {'a1': (-5.64, 0, 0), 'a2': (5.21e-16, 0, 0), 'a3': (5.92e-09, 0, 0), 'a4': (4.4e-08, 0, 0), 'a5': (7.77126e-05, 6.94e-06, -6.94e-06), 'a6': (0.000572, 0, 0), 'a7': (0.00914, 0, 0), 'a8': (0.0425, 0, 0), 'a9': (0.406031, 0.0383, -0.0383), 'a10': (2.62672e+20, 7.37e+17, -7.37e+17)} {'a5, a9': -2.6566736387648835e-07, 'a5, a10': -4264555557282.979, 'a9, a10': 2.3898145634758404e+16, 'a5, a5': 4.8190228952493635e-11, 'a9, a9': 0.0014655714847557353, 'a10, a10': 5.4292481309155894e+35} {'a5, a9': -0.9995, 'a5, a10': -0.8338, 'a9, a10': 0.8466} 0.04013 1 39 41.96 1.076 0.3437

Recall that we used pysr.TemplateExpressionSpec to constrain the final expressions to a "dijet-like" structure (see the PySR config above): p[1] * f(x/13000) ^ g(log(x/13000)), with both f and g required to be polynomials. The f and g found by the search are listed in the PySR template spec column of the full CSV file candidates.csv. The expressions shown in candidates_compact.csv and in the PDF files are the algebraically simplified forms.

The goodness-of-fit scores are plotted in candidates_gof.pdf, such as the chi2/ndf:

image

For other goodness-of-fit scores:

image

^ p-value

image

^ Root-mean-square error

image

^ Coefficient of determination R2
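
As a rough cross-check of the Chi2/NDF and p-value columns, the sketch below recomputes both for candidate #13 from its Chi2 and NDF entries. It assumes the p-value is the upper-tail probability of a chi-square distribution with NDF degrees of freedom; this is not stated in the text, but it reproduces the tabulated number. The helper name `chi2_sf` is hypothetical and uses only the standard library.

```python
import math

def chi2_sf(chi2, ndf, max_terms=10_000):
    """Upper-tail chi-square probability (assumed definition of the p-value).

    Evaluates the power series of the regularized lower incomplete gamma
    function P(a, x) with a = ndf/2 and x = chi2/2 and returns 1 - P(a, x).
    Adequate when chi2 is of the same order as ndf, not for extreme tails.
    """
    a, x = 0.5 * ndf, 0.5 * chi2
    if x <= 0.0:
        return 1.0
    term = 1.0 / a            # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < 1e-16 * total:
            break
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    return max(0.0, 1.0 - total * math.exp(log_prefactor))

# Chi2 and NDF of candidate #13, copied from the table above.
chi2, ndf = 42.48, 39
chi2_per_ndf = chi2 / ndf
p_value = chi2_sf(chi2, ndf)
print(f"chi2/ndf = {chi2_per_ndf:.3f}, p-value = {p_value:.4f}")
```

For candidates with a very large Chi2 (such as #0 and #1) this simple series is not suitable; a library routine such as `scipy.stats.chi2.sf` would be the safer choice there.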

Now, let's take a look at one of the candidate functions, say candidate #13. Its functional form can be found in the corresponding plots in the PDF files and in the CSV table above. After some algebraic simplification of the original template p[1] * f(x/13000) ^ g(log(x/13000)), it reads:

a7*(a6*x0*(a4*x0 + a5*x0*(a3*x0**2*(a2*x0**3 + a4*x0) + a4*x0)) + x0)**a1.

The template expressions f and g themselves can be found in the PySR template spec column of the full candidates.csv file:

f = ((((((#1 * ((((#1 * #1) * (#1 * 1.2657217)) + #1) * #1)) + #1) * #1) * 7.4338403) + #1) * (#1 * 0.4126327)) + #1; g = -5.635317; p = [0.0016633321].

In this case, f is a polynomial in x/13000 (of degree 7 once expanded) and g is a constant function. The algorithm therefore finds that this combination already fits the dijet spectrum well, without needing a polynomial in log(x/13000) in the exponent g. Other suitable candidate functions may have very different combinations of f and g, as the equation space is still very large.
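
To make the link between the template and the simplified expression concrete, here is a minimal sketch that evaluates both for candidate #13. The constants are copied from the template spec above, with `#1` read as the argument x/13000. The mapping from template constants to a1..a7 is worked out by hand here and is an assumption of this sketch, not something taken from the fit output.

```python
import math

K = 13000.0  # scale appearing in the template arguments x/13000

# Constants copied from the PySR template spec of candidate #13.
C1, C2, C3 = 1.2657217, 7.4338403, 0.4126327
G = -5.635317      # g is a constant function for this candidate
P = 0.0016633321   # the single template parameter p

def f(u):
    """f from the template spec, with #1 replaced by u = x/13000."""
    return ((((((u * ((((u * u) * (u * C1)) + u) * u)) + u) * u) * C2) + u) * (u * C3)) + u

def template(x):
    """Template form: p * f(x/13000) ^ g."""
    return P * f(x / K) ** G

# Parameters of the simplified form implied by the template constants
# (hand-derived mapping, before the re-optimization step).
a1 = G
a2 = C1 / K**3
a3 = 1.0 / K**2
a4 = 1.0 / K
a5 = C2 / K
a6 = C3
a7 = P * K ** (-G)

def simplified(x0):
    """Simplified form quoted for candidate #13."""
    return a7 * (a6 * x0 * (a4 * x0 + a5 * x0 * (a3 * x0**2 * (a2 * x0**3 + a4 * x0) + a4 * x0)) + x0) ** a1

for x in (1500.0, 3000.0, 6000.0):
    print(x, template(x), simplified(x))
print("a2, a3, a4 =", a2, a3, a4)
print("a7 =", a7)
```

The two forms agree to floating-point precision. The implied a2, a3, a4 round to the fixed values in the table (5.76e-13, 5.92e-09, 7.69e-05), and the implied a7 comes out at about 2.5e+20, close to the refitted 2.61919e+20. The small differences in a5, a6, a7 are consistent with these three parameters having been re-optimized afterwards.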

This candidate function originally has 7 parameters: a1, a2, a3, a4, a5, a6, a7. However, only 3 of them vary in the final fit: a5, a6, a7, as can be seen from the Parameters: (best-fit, +1, -1) column in the csv tables or directly from the pdf files:

{'a1': (-5.64, 0, 0), 'a2': (5.76e-13, 0, 0), 'a3': (5.92e-09, 0, 0), 'a4': (7.69e-05, 0, 0), 'a5': (0.000573849, 3.16e-05, -3.16e-05), 'a6': (0.409393, 0.0172, -0.0172), 'a7': (2.61919e+20, 2.13e+18, -2.13e+18)}

where a1, a2, a3, a4 have zeros in both the +1 and -1 uncertainty entries, meaning they were all held fixed during the re-optimization. This is because the objective function was too complex to minimize in the re-optimization loop, so some parameters were held fixed to reduce the number of free parameters and achieve a better fit. This is common when the functions or the distribution shapes are not very simple.
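
The fixed and free parameters can also be read off programmatically. The sketch below parses the parameter cell of candidate #13 with the standard library and separates the two groups. Relating NDF to the number of fitted points is an assumption of this sketch, though it is consistent with the NDF column of the table.

```python
import ast

# Parameter cell of candidate #13, copied from the
# "Parameters: (best-fit, +1, -1)" column.
params_cell = (
    "{'a1': (-5.64, 0, 0), 'a2': (5.76e-13, 0, 0), 'a3': (5.92e-09, 0, 0), "
    "'a4': (7.69e-05, 0, 0), 'a5': (0.000573849, 3.16e-05, -3.16e-05), "
    "'a6': (0.409393, 0.0172, -0.0172), 'a7': (2.61919e+20, 2.13e+18, -2.13e+18)}"
)

# The cell is a Python dict literal, so literal_eval parses it safely.
params = ast.literal_eval(params_cell)

# A parameter counts as fixed when both uncertainty entries are zero.
fixed = [name for name, (best, up, down) in params.items() if up == 0 and down == 0]
free = [name for name in params if name not in fixed]

# Assumption: NDF = (number of fitted points) - (number of free parameters).
ndf = 39
n_points_implied = ndf + len(free)

print("fixed:", fixed)
print("free: ", free)
print("implied number of fitted points:", n_points_implied)
```

Under that assumption the implied number of fitted points is the same for the other rows as well, for example NDF 41 with one free parameter for candidate #1.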

The plots below show how this candidate function behaves when each of these 3 parameters is varied to its +/-1 sigma value:

image

^ +/-1 sigma variations of parameter a5

image

^ +/-1 sigma variations of parameter a6

image

^ +/-1 sigma variations of parameter a7

image

^ Correlation matrix

As shown in the correlation matrix, these parameters are strongly correlated or anti-correlated in this case (for example, -0.9986 between a5 and a6 and 0.9897 between a6 and a7), so it is useful to see the actual uncertainty coverage when the uncertainties of all parameters in a candidate function are taken into account. This is plotted in candidates_sampling.pdf. An ensemble of functions is generated for each candidate function by sampling its parameters from a multidimensional normal distribution, with the best-fit parameter values as the mean and the parameter covariance matrix as the covariance. In this way, the total uncertainty is obtained by considering the uncertainties from all parameters simultaneously. The 68% quantile range of this function ensemble is then drawn as green bands in the plots and compared with the input data.
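
The sampling idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the package's actual implementation: the helper names, the x values, the ensemble size and the seed are all hypothetical, and the central 68% range is taken as the 16% to 84% quantiles. The best-fit values and covariance entries are those of candidate #13 from the table.

```python
import math
import random

# Free parameters of candidate #13: best-fit values and covariance entries.
best = {"a5": 0.000573849, "a6": 0.409393, "a7": 2.61919e20}
cov = {
    ("a5", "a5"): 9.971040085253033e-10,
    ("a6", "a6"): 0.0002964678195715919,
    ("a7", "a7"): 4.5187787342402686e36,
    ("a5", "a6"): -5.427769212039176e-07,
    ("a5", "a7"): -65885162378837.13,
    ("a6", "a7"): 3.62581744488366e16,
}
fixed = {"a1": -5.64, "a2": 5.76e-13, "a3": 5.92e-09, "a4": 7.69e-05}
names = list(best)
n = len(names)

def candidate13(x0, a1, a2, a3, a4, a5, a6, a7):
    return a7 * (a6 * x0 * (a4 * x0 + a5 * x0 * (a3 * x0**2 * (a2 * x0**3 + a4 * x0) + a4 * x0)) + x0) ** a1

# Work with the correlation matrix so the very different parameter scales
# do not hurt the numerics; rescale by sigma when sampling.
sigma = [math.sqrt(cov[(name, name)]) for name in names]
corr = [[cov[(names[min(i, j)], names[max(i, j)])] / (sigma[i] * sigma[j])
         for j in range(n)] for i in range(n)]

def cholesky_clipped(m):
    """Lower-triangular factor; negative pivots from rounding are clipped to 0."""
    size = len(m)
    low = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(max(s, 0.0))
            else:
                low[i][j] = s / low[j][j] if low[j][j] > 0.0 else 0.0
    return low

chol = cholesky_clipped(corr)
rng = random.Random(0)  # arbitrary seed, for reproducibility only

def sample_params():
    """Draw one correlated parameter set around the best-fit values."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return {name: best[name] + sigma[i] * sum(chol[i][k] * z[k] for k in range(i + 1))
            for i, name in enumerate(names)}

samples = [sample_params() for _ in range(2000)]

# Central 68% range of the function ensemble at a few illustrative x values.
band = {}
for x0 in (1500.0, 3000.0, 6000.0):
    values = sorted(candidate13(x0, **fixed, **s) for s in samples)
    band[x0] = (values[int(0.16 * len(values))], values[int(0.84 * len(values))])
    print(x0, band[x0])
```

With NumPy available, the Cholesky step and the sampling loop could be replaced by a single call to `numpy.random.Generator.multivariate_normal` with the best-fit values as the mean and the full covariance matrix.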

image

Note that the 95% quantile range can also be added by setting sampling_95quantile = True.