archive-org.com » ORG » A » ARCHDUKE.ORG Total: 328 Choose link from "Titles, links and description words view": Or switch to
"Titles and links view". |

- stuff » English football league predictions

predictions The English football league is a tiered system of leagues each of which operates a double all play all every football season August May Fans punters and bookies all have an interest in predicting how the final table may end up You could possibly add mathematicians to that list since there is some scope to make a statistical model of how teams behave and simulate the rest of the season based on this model This model uses Markov Chain Monte Carlo together with a Poisson model of goal scoring and a prior distribution from the previous season to predict future matches The rest of the season is rolled out many thousands of times and the proportion of times a team gets promoted relegated in the playoffs is displayed in the tables linked to here The software is freely available Prediction tables updated daily Description of method Download software including results gathering program and prediction program Comment RSS Trackback Leave a Reply Click here to cancel reply Name required Mail will not be published required Website Pages About D Wave comment on comparison with classical computers D Wave scaling comparison with classical algorithms Efficient subgraph based sampling of models with

Original URL path: http://www.archduke.org/stuff/english-football-league-predictions/ (2016-04-27)

Open archived version from archive - stuff » Description of predictor

where xi i are distributed as independent normals xi i sim N mu i i i 1 The 2n dimensional density of x 1 ldots x 2n 2 is then proportional to int infty infty d lambda 1 int infty infty d lambda 2 exp left half sum i 1 2n 2 i i x i lambda 1g 1 i lambda 2g 2 i mu i 2 right where vec g 1 1 ldots 1 0 ldots 0 1 1 and vec g 2 0 ldots 0 1 ldots 1 1 1 generate G This only makes sense as a probability distribution when restricted to a 2n dimensional surface our current choice being W vec x vert sum i a i sum i d i 0 It is not necessary to actually do any integration here instead you can use standard properties of Gaussian measures Consider the inner product on V R 2n 2 given by vec x vec y sum i i i x i y i with corresponding norm norm and volume form omega V 2 pi n 1 i 1 ldots i 2n 2 half dx 1 ldots dx 2n 2 The unintegrated prior density as a function of vec xi in V is proportional to exp half norm vec xi vec mu 2 omega V vec xi and the integrated density as a function of vec x in V is proportional to exp half norm pi G bot vec x vec mu 2 omega G bot vec x Here G bot is the subspace orthogonal to vec g 1 vec g 2 with respect to the above inner product and pi G bot V to G bot is the corresponding orthogonal projection As before we restrict to W to get a probability distribution In practice it s convenient to calculate norm pi G bot vec v 2 as norm vec v 2 norm pi G vec v 2 since G is only two dimensional Concretely this amounts to the prior component of the log likelihood being given by begin equation begin split L mathrm pr vec x vec mu vec i half sum i 1 2n 2 i i x i mu i 2 half begin matrix begin pmatrix w 1 w 2 end pmatrix mbox end matrix begin pmatrix A B B C end pmatrix 1 begin pmatrix w 1 w 2 end pmatrix label eq Lpr1 end split end equation where begin equation begin split A left sum i 1 n i i right i 2n 1 i 2n 2 B i 2n 1 i 2n 2 C left sum i n 1 2n i i right i 2n 1 i 2n 2 w 1 left sum i 1 n i i x i mu i right i 2n 1 x 2n 1 mu 2n 1 i 2n 2 x 2n 2 mu 2n 2 w 2 left sum i n 1 2n i i x i mu i right i 2n 1 x

Original URL path: http://www.archduke.org/stuff/english-football-league-predictions/description-of-football-league-predictor/ (2016-04-27)

Open archived version from archive - stuff » Software download and use

the best sources and combines them to form a consensus view results and analysis for past years going back to the 1993 94 season a program to simulate playing out the rest of the season a program to make a simple web page from the results It is currently tested on Ubuntu 12 04 though should probably run on most Unix like systems with a C compiler and Python It can be obtained from github as a zipfile or if you prefer to run git directly then you can use git clone git github com alex1770 footballpredictions git In either case cd into the resulting directory then type make to compile To use in its basic mode use s update 1 which will retrieve the results for the current season for the top five leagues and then simulate them to the end of the season To retrieve results only no processing type for example getresults py 0 This will retrieve all known results for the current season of the Premiership Replacing 0 by 1 2 3 or 4 will give results for the Championship League One League Two and the Conference respectively To retrieve results for a past season type

Original URL path: http://www.archduke.org/stuff/english-football-league-predictions/football-league-prediction-software/ (2016-04-27)

Open archived version from archive - stuff » Harder QUBO instances on a Chimera graph

44 s introduces an element of non planarity so methods that are successful for planar or low genus graphs are of limited benefit here Instead the effect of an external field in the case of C N is to introduce a bias which gives a clue as to which way the spins should point This presumably aids heuristic type searches by decreasing the correlation length After some experimentation it became apparent that another factor making the previous examples easier was the discreteness of the weights in particular restricting to using only pm1 which makes the ground state is highly degenerate That is there are many local minima that happen to be global minima This is cured by choosing J ij from a continuous distribution or in practice a sufficiently finely divided discrete distribution The intra K 44 weights were chosen uniformly from the set 100 99 ldots 99 100 Using more steps than this did not appear to make it much harder No attempt was made to discover if a nonuniform distribution would make it harder still I don t know if D Wave hardware can cope with this kind of precision but this doesn t matter for the purposes of trying to get an upper bound to its capabilities Turning to the last item rebalancing neglecting edge effects there are essentially two types of edge in a Chimera graph the intra K 44 edges and the inter K 44 edges so it s reasonable to suppose that we correspondingly use two different distributions for the weights J rm intra and J rm inter say It s clearly wrong to use either extreme J rm intra gg J rm inter the K 44 s split off into N 2 independent subgraphs or J rm intra ll J rm inter the graph splits into two trivial planes of horizontally connected and vertically connected vertices But it is also presumably wrong to make these distributions the same for then the two semi K 44 s within each K 44 would be more tightly bound than two semi K 44 s are between adjacent K 44 s since there are 16 edges from one semi K 44 to another within a K 44 but only 4 to another semi K 44 from an adjacent K 44 As the weights have mean 0 for maximum glassiness a reasonable hunch is that balance is achieved by making J rm inter about twice as big as J rm intra reasoning from 16 4 1 2 2 Something like this ratio does indeed turn out to generate the hardest examples of the form tried as is seen below So then after various experiments an ansatz of the form J rm intra U 100 99 ldots 99 100 and J rm inter U L L 1 ldots L 1 L U meaning uniform distribution was settled on with the parameter L to be determined by experiment The best search strategy for N 8 was found to be S13 and the best for N 16 was S14 S13 and S14 are variants of the previously described S3 and S4 in which instead of the state being completely randomised every so often only a certain proportion of the nodes are randomised It is interesting that the treewidth 2 strategy S14 is superior to the treewidth 1 strategy S13 for N 16 Previously with N 8 there was no example where using treewidth 2 was an improvement on treewidth 1 but now it appears that was because the problem instances were too easy for the treewidth 2 strategies to overcome their initial overheads The heuristically generated answers here have not all been verified for optimality In both N 8 and N 16 cases the first runs that determine L have not been checked and the final N 16 run at L 220 has not been checked either Or in other words the only answers that have been checked are those from the final run with N 8 at L 200 Doing a proving search at N 16 is beyond the capability of the current exhaustive searcher I hope to upgrade the proving search to cope with these more difficult cases but in the meantime the N 16 results rely for optimality on the assumptions alluded to previously I think there is good reason to believe that the 500 N 16 answers below at L 220 that were searched to 50 independent solutions are in fact optimal A plausibility argument is as follows The heuristic search is generating a stream of local minimum states Every now and again it resets itself so that it does not use any previously discovered information that required non negligible processing Saying that two states are independently generated just means that there was such a reset separating the two finds It is possible they are the same state If you heuristically search until you have generated 50 independent presumed solutions i e 50 states of energy E 0 where E 0 is the minimum energy seen on the whole run then along the way you will find other local minimum states with energies E E 0 For a particular run let n E denote the number of times that a state of energy E independently crops up before a lower energy is discovered There is a technicality explained before whereby the first occurrence of energy E doesn t count Now taking the collection of all n E E E 0 over all the 500 N 16 final runs below we get an empirical process distribution though cut off at 50 n E i Number of occurrences m i 1 309 2 71 3 26 4 21 5 9 6 7 7 3 8 5 9 1 11 1 12 1 13 1 26 1 ge 50 The number of occurrences sums to 456 which is less than 500 because on average there is less than 1 positive n E per run If i doesn t appear in

Original URL path: http://www.archduke.org/stuff/d-wave-comment-on-comparison-with-classical-computers/harder-qubo-instances-on-a-chimera-graph/ (2016-04-27)

Open archived version from archive - stuff » Efficient subgraph-based sampling of models with frustration

of the above pictures is traced in some detail For U 3 in the first picture there will be for each of the two possible values pm1 of S 7 the value Z U 3 given by Z U 3 S 7 sum S 4 S 5 S 6 e beta H U 3 S 4 S 5 S 6 S 7 SC G setminus T where H U 3 contains the contribution to the energy from edges meeting the vertices 4 5 6 of U 3 These edges will join vertices 4 5 6 to each other to vertex 7 and to G setminus T but they can t meet the other vertices of T by construction This is the definition of Z U 3 not the method by which it is calculated A choice of spins C S 7 S 4 S 5 S 6 Moving from the first of the above pictures to the second the vertex 7 is incorporated into U 3 and S 8 is the new boundary vertex It is seen that Delta H H U 3 H U 3 only depends on S 7 and S 8 and spins over G setminus T which we suppress not any other vertices of T So The new value Z U 3 is easily calculated as Z U 3 S 8 sum S 7 pm1 e beta Delta H S 7 S 8 Z U 3 S 7 and the new choice of spins is given by C S 8 begin cases S 7 1 S 4 S 5 S 6 C 1 text with prob Z U 3 S 8 1 Z S 7 1 S 4 S 5 S 6 C 1 text with prob Z U 3 S 8 1 Z end cases where Z pm are the summands in the above expression for Z U 3 S 8 There is a slight extension to the process when the new vertex involves fusing more than one U i in which case the expression for the new Z will involve products of the old Z s In the example above in the transition from the second to third picture Z U 2 is built up of the sum of terms of the form e beta Delta H Z U 2 Z U 3 In practice this algorithm can be easily and compactly implemented with the spin choice being represented by a pointer Some care is needed for a fast implementation since straightforward methods of dealing with the inevitable large numerical range of values involve taking logs exps and random numbers in the inner loop or the loop one level outside it which would be rather slow These can be avoided for example by using look up tables for the exps rescaling the Z values every now and again and storing and judiciously reusing random numbers See Appendix B for details The code used to produce the results below is available from github with notes on how to invoke it in Appendix C Comparison with existing methods 1 In a recent nice paper of Katzgraber Hamze and Andrist the authors study inter alia the critical behaviour of the bimodal J ij pm1 spin glass on the Chimera graph They make a prediction for its universality class and test this by showing that the expectation of the Binder ratio as a function of suitably scaled temperature is independent of the size N of the underlying graph figure 2 upper of Katzgraber For the present purpose this represents an opportunity to compare the subgraph based sampling methods described here with the standard though highly tuned Monte Carlo methods used in the paper There are now two levels of probability space the space of random J ij known as the disorder and for a given choice of J ij the space of spins SC averaging over which can be referred to as a thermal average In Katzgraber they use replica exchange Monte Carlo see e g Hukushima et al which greatly improves convergence in this sort of problem It is a kind of Monte Carlo meta method and can be used on top of another Monte Carlo method Katzgraber uses a standard Monte Carlo sweep as the subroutine For comparison we also use exchange Monte Carlo but instead base it on top of subgraph based sampling The comparison is with p 0 5 N 512 of Katzgraber et al That is the graph is the Chimera graph of order 8 and J ij are IID pm1 For avoidance of doubt I believe there is a slight typo in their paper where the Hamiltonian is given as sum i j 1 N J ij S iS j I m sure that sum i j J ij S iS j was intended otherwise the edge weights would be trimodal 1 0 1 not bimodal as they are meant to be The choice of temperatures used here covers a similar range 0 2 2 to that specified in table 1 of Katzgraber et al 0 212 1 632 The temperature choice here was decided upon by trying to make the exchange probabilities all equal to 0 6 which ended up requiring 25 temperatures for the range 0 2 2 As it turned out these probabilities got a little higher than that for the bottom two temperatures For a given disorder Katzgraber takes two independent random spins SC and SC and defines the spin overlap q 1 N sum iS iS i Then the quantity of interest is its excess kurtosis the Binder ratio g q frac 1 2 left 3 frac q 4 q 2 2 right Here denotes thermal average and disorder average The interest is in the thermal average as it is trivial to sample J ij for the disorder average At very low temperatures assuming the ground state is unique ish you might expect q to take the values close to 1 or 1 according to whether SC happened to hit the same ground state as SC or its negation This would make g q close to 1 At high temperatures S i will be independently pm1 which makes q approx2 1 N B N 1 2 1 approx N 0 1 N and so g q approx0 This is what we see with g q decreasing smoothly at intermediate temperatures The subgraph based method used here was the Gibbs analogue of the method known as strategy 3 here It uses the collapsed Chimera graph of 128 aggregate vertices with 16 spin states each and there is a prescribed set of 32 trees The relevant question in this simulation is how many steps the Monte Carlo takes to equilibrate i e for the associated Markov chain to reach something close to an equilibrium distribution For each disorder a pair of spin states is initialised uniformly at random at each temperature Then R exchange Monte Carlo steps are performed which involves R tree steps for each temperature At that point the states are deemed to be in thermal equilibrium and they are run for R further Monte Carlo steps during which the thermal averages q 2 and q 4 are evaluated This whole process including disorder average was repeated for increasing values of R 250 500 1000 2000 ldots until the final values of g q appeared to stabilise within acceptable error margins up to 0 01 absolute It turned out this required R 1000 i e 1000 tree steps As far as I can gather in terms of elementary operations such a tree step should be approximately comparable to about 100 standard Monte Carlo sweeps when both are optimised making about 100 000 sweep equivalents for equilibration This compares with 2 21 or about 2 000 000 sweeps required in Katzgraber so it appears there might be a significant advantage with this measure However see the discussion below The above graph left shows the results of the method described here applied to one of Katzgraber s example problems N 512 bimodal The right hand graph shows the errors artificially amplified by a factor of 10 since they are too small to be seen clearly at their normal size As can be seen there is good agreement with Katzgraber s results taken from his Fig 2 p 3 In both cases 5964 disorders were used These graphs serve as a check that the method described here is functioning correctly They do not compare performance because that is hidden in the number of equilibration steps required for each disorder Returning to the performance comparison as alluded to above there are some problems with this comparison i we re guessing an equivalence between sweeps and tree step ii we don t know for sure that the accuracy used as a termination criterion is comparable or if Katzgraber s equilibration criterion was more or less severe iii we don t know if Katzgraber s exchange Monte Carlo is tuned in the same way well it won t be exactly but how much does that matter What we d really like to do is construct a comparison where as little as possible is changed between the method based on single site updates and that based on subgraph updates We could then compare the equilibration times required for a given final accuracy and we could be sure about the basis of comparison between the two kinds of times We could also see how both methods scale These considerations inform the next comparison below Comparison with existing methods 2 Due to the possible shortcomings of the above comparison it seems sensible to run a much more careful and controlled comparison of subgraph update based sampling with site update based sampling We choose a particular problem and try to simulate it using versions of these two methods that are as near identical as possible Notation SGU shall mean subgraph update based sampling as described above In the example below the subgraph will be a tree SSU shall mean single site update based sampling In other words the traditional method whereby each spin is updated depending only on its immediate neighbours So to be clear this category includes multispin methods such as described in Isakov et al even though they operate on more than one spin at a time The particular problem set chosen is that of the Chimera graphs of sizes 6 6 8 8 10 10 and 12 12 number of spins 288 512 800 1152 respectively For these graphs the couplings on the edges are chosen uniformly from the 14 possibilities pm1 pm2 ldots pm7 there are no external fields and each spin can be 1 or 1 This mimics the range 7 harder example set from Rønnow et al In fact the actual J ij used were half the values stated above i e chosen from pm frac12 pm frac22 ldots pm frac72 This factor of frac12 means that the energy quantum the smallest possible change in energy due to a change in S i for a given disorder J ij is 1 rather than 2 Of course this scaling factor doesn t fundamentally change anything because you can always rescale beta but stating it allows the reader to interpret the numbers in what follows where we mention specific values of beta and maximum allowable absolute errors in energy To make a good comparison we would ideally compare the best SGU based method against the best SSU based method in so far as that makes sense Of course we don t necessarily know the best methods but we can use an approximation to the best known methods Here we use exchange Monte Carlo in both cases tuned with slightly different parameters The temperature sets used in both cases are fixed Probably this could be improved by using an adaptive method whereby the temperature set evolves according to the needs of the problem instance This would be something to try in a further investigation Before describing the exchange Monte Carlo setup it should be mentioned that if one attempts to do without it and equilibrate by simply doing updates at a fixed low temperature then convergence is extremely slow for both SGU and SSU SSU behaves particularly poorly scaling much worse than SGU and having a higher effective critical temperature below which problems are effectively intractable Of course this isn t a big surprise and it is more interesting to compare better methods such as the exchange Monte Carlo considered here Exchange Monte Carlo setup For each problem size 6 6 8 8 10 10 12 12 we choose 100 random disorders and for each of these we determine the time required for the SGU and SSU based methods to have equilibrated to a reasonable accuracy The principal statistic comparing SGU with SSU is simply the respective totals of these times though other comparisons are considered It may be argued that fixing the required accuracy for each disorder does not match the likely underlying objective which is to get a good expectation over disorders It may be for example that it is more efficient to spend less time trying to equilibrate the most difficult disorders allowing them to be less accurate on the grounds that they will be compensated for by the majority of more accurate disorders I do not believe this kind of optimisation would make a large difference because a by the nature of the exponential decay of the subleading eigenvalue of the Monte Carlo iteration there should be a fairly sharp change from an inaccurate result to an accurate one as the number of equilibration steps crosses the characteristic threshold for that particular disorder That means that you can t afford to use very much less than the proper number of equilibration steps otherwise the results would be very inaccurate and pollute all the others and b scrutinising the actual results though there are certain disorders that are considerably harder than the others these still don t represent an overwhelming proportion of the total equilibration steps expended over all 100 disorders The set of temperatures is determined by fixing an effective absolute zero at beta beta 20 and aiming for a constant transition acceptance rate 0 25 between neighbouring beta s The value beta 20 is sufficiently large for the class of disorders considered here that there is only a negligible chance of a state at this temperature not being in the ground state This article on Exchange Monte Carlo by Katzgraber describes such a constant acceptance rate as in general not too bad but not optimal The temperature set is determined in advance of the actual simulation using a method based on E and C V The acceptance rates during actual simulations match the target value reasonably well almost always lying between 0 2 and 0 3 Having fixed the top beta and having fixed the spacing between the beta s by deciding on the transition acceptance rate the remaining parameter to be decided on is the minimum beta or equivalently the number of temperatures This is determined by trying a range of different possible values on a trial set of disorders and seeing which number of temperatures requires the fewest steps on average to equilibrate It is found that SGU requires slightly fewer temperatures than SSU for a given problem size thus SSU will end up with some bonus information about high temperature states However it is not given credit for this and all that is compared is the time required to estimate the ground state energy and so presumably low temperature states to a given accuracy The justification for this is that it is assumed that the main interest lies in colder temperatures strong coupling high beta since higher temperatures are relatively easy to simulate using any sensible method Full details of the sets of temperatures used are in Appendix A Determination of Equilibration Equilibration is determined using the following method Starting from a set of uniformly random states at each temperature n Monte Carlo steps are performed Each such step consists of doing a single temperature step at each temperature and then attempting to perform exchanges After that n more steps are performed during which any observables can be measured and the energy at beta is averaged to make a sample value E 1 This whole process is repeated 25 times starting each time from random initial states and the average value E E 1 ldots E 25 25 is formed If this is within a chosen threshold taken to be 0 2 here of the smallest energy observed at any point so far E text min then it is deemed that n steps are sufficient for equilibration It is possible to simultaneously test for m being sufficiently large for all m n in not much more time than it takes just to test n itself This procedure relies on a number of assumptions 1 that E text min is the actual ground state energy Empirical evidence strongly suggests that it is but for the purposes of comparison it may not matter too much if it isn t provided that the same value of E text min is used for both SSU and SGU and this can be checked 2 that beta is the hardest value of beta to equilibrate and the states at other temperatures will have equilibrated if the state at beta has If this turns out not to be an accurate assumption then at least SSU and SGU are being compared on an equal basis In any case it is assumed that the lower temperatures are the objects of interest and the higher temperatures are a means to this end 3 That the number of restarts 25 is sufficiently large to get a good estimate of the required number of equilibration steps The number 25 was chosen to limit the overall computer time to a week or so but in fact it is on

Original URL path: http://www.archduke.org/stuff/d-wave-comment-on-comparison-with-classical-computers/efficient-tree-based-sampling-of-models-with-frustration/ (2016-04-27)

Open archived version from archive - stuff » D-Wave: scaling comparison with classical algorithms

actually want optimising Without such a use case in mind it s hard to know whether the uniform couplings are representative but they do at least generate difficult problem instances In the Prog QUBO graphs above for N ge 648 the optimal values ground state energies were not verified using a proving solver It is therefore possible that some of these presumed optimal values are not actually optimal which would change the graphs though for what it s worth I believe that all the presumed optimal values are in fact optimal for reasons given in previous posts Comment RSS Trackback 6 Comments Sol Warda says 23 June 2014 at 15 07 Hello Alex I wonder if you have seen this latest study of benchmarking the DW2 against some optimised software including yours by Dr Itay Hen of USC s ISI at a recent conference on Adiabatic Quantum Computation Thanks http www isi edu sites default files top level events aqc2014 day4 talk4 Hen pdf Alex Selby says 24 June 2014 at 01 20 Yes I saw that and it looks very interesting By the way the author informs me that we should disregard the slides after p 46 as these were not meant to be included in the upload and may possibly be erroneous Joshua Job says 24 June 2014 at 03 58 Hello Alex Do you find strategy 13 scales better than plain strategy 3 Also how are you getting these 99 confidence timings as I don t see any options in the code on github I ve modified my local copy to time stabletreeexhaust directly We have found that using precise timing of the stabletreeexhaust function to estimate the actual 99th percentile of the times to solution doesn t actually match the expectation given from the assumption that the times for a single instance are Poisson distributed It s close but not quite right and varies from instance to instance Alex Selby says 24 June 2014 at 20 27 Hi Joshua Yes strategy 13 appears to scale a bit better than strategy 3 They are similar for N 392 or so but S13 is better for larger N Did you mean to talk about the times to solution TTS being exponentially distributed which would make the number of solutions found in a time interval Poisson You are right to question this and I think the results above should be adjusted slightly as mentioned below though I don t think it will make that much difference I found that these times fitted an exponential distribution well apart from very small problem sizes N 72 where it wasn t such a good fit Not so surprising for larger N the chance of any one attempt at finding the solution is small so a solution is the result of many quasi independent attempts at finding a rare thing On the exponential Poisson assumption the 99th percentile time should be log 100 TTS 4 6 TTS which is what I used for

Original URL path: http://www.archduke.org/stuff/d-wave-comment-on-comparison-with-classical-computers/d-wave-scaling-comparison-with-classical-algorithms/ (2016-04-27)

Open archived version from archive - stuff › Log In

stuff Username Password Remember Me Lost your password Back to stuff

Original URL path: http://www.archduke.org/stuff/wp-login.php (2016-04-27)

Open archived version from archive - stuff

subgraph based sampling of models with frustration D Wave scaling comparison with classical algorithms Not Found Sorry no posts matched your criteria Pages About D Wave comment on comparison with classical computers D Wave scaling comparison with classical algorithms Efficient subgraph based sampling of models with frustration Harder QUBO instances on a Chimera graph English football league predictions Description of predictor Software download and use Categories No categories Archives Blogroll

Original URL path: http://www.archduke.org/stuff/ (2016-04-27)

Open archived version from archive