Date: Wed, 8 Sep 1999 11:44:16 -0400 (EDT)
From: Greg Hammett
To: SNIPES@psfc.mit.edu
Subject: Notes on Cross-Validation estimates of uncertainties
Reply-to: Hammett@pppl.gov

Hi Joe,

I hope the following provides some help. First some references, and then my notes on cross-validation estimates of uncertainties (applied to tau_E).

Refs on the standard error estimate for extrapolating linear regressions are given in Chap. II.6 (Sec. 2.6.5) of the ITER physics basis document. In particular it says, "As explained in [2.6.24], it is based on error propagation from the centre of gravity of the data to the ITER operating point. A geometrical interpretation yielding a simple summation of projections in principal axes was derived in [2.6.2], and a linearized projection formula around a standard operating point in [2.6.26]."

[2.6.24] O. Kardaun and A. Kus, Basic Probability and Statistics for Experimental Plasma Physics, IPP 5/68 (1996).

[2.6.2] Christiansen, Cordey, Thomsen, et al. (incl. Kardaun), "Global Energy Confinement H-mode Database for ITER", Nucl. Fus. 32 (1992) 291-338.

[2.6.26] O. Kardaun, "Interval Estimation of Global H-mode Energy Confinement in ITER", 1997, to be published (PPCF).

****************************************************************
Section on Confidence Interval Estimation for ITER Performance,
for the ITER Physics Basis Article
****************************************************************

Draft by Greg Hammett, Oct. 27, 1997

Summary
*******

By some measures, DB3 is an improvement over DB2, with a somewhat reduced uncertainty in extrapolating to ITER. For example, the distance of extrapolation from the center of gravity of DB3 to ITER is 7.2 standard deviations, while it had been an extrapolation of 12 standard deviations from DB2. However, there are still significant systematic tokamak-to-tokamak variations in the database which are not well understood, and cross-validation tests estimate a 1-sigma uncertainty for ITER's log(tau_E) of +-0.24 to +-0.37, depending on how the results are weighted. These correspond to 95% confidence intervals in ITER's H_H of 0.62-1.62 or 0.48-2.08. It may be possible to reduce these uncertainties in the near future if a better physics-based understanding of these systematic variations can be developed and controlled for.
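To make the conversion from log(tau_E) to H_H explicit (taking log(tau_E) as a natural log, which reproduces the quoted intervals): a 1-sigma uncertainty sigma in log(tau_E) gives a 2-sigma (95%) interval in H_H of exp(-2*sigma) to exp(+2*sigma). A quick check in Python:

    import math

    sigma = 0.24                             # 1-sigma uncertainty in log(tau_E)
    lo, hi = math.exp(-2*sigma), math.exp(2*sigma)
    print(round(lo, 2), round(hi, 2))        # -> 0.62 1.62
    # a sigma near 0.37 similarly gives the quoted 0.48-2.08 interval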
The ignition margin for ITER might also be improved if a better physics-based understanding can be developed of the recently discovered methods of reducing turbulence and improving tokamak confinement, and of how they might scale to a larger device.

These are estimates of the statistical uncertainties only, and there are potentially other sources of uncertainty such as hidden variables (like edge conditions, atomic physics, divertor geometry, density and current profiles, sheared flows, Ti/Te, ELM severity, etc.) that might vary systematically in present tokamaks or in the extrapolation from present tokamaks to ITER. There are also uncertainties in the functional form of tau_E (such as nonlinear behavior near various operational limits in density, the H-mode power threshold, beta, beta/rho_*, etc.). The uncertainty in the H-mode power threshold is also not accounted for in the estimates presented here.

Introduction
************

The standard result for the uncertainty in the extrapolation of a regression formula, assuming that the functional form is correct and that all deviations of the data from the fit are random uncorrelated (independent) errors in the dependent variable, is:

    sigma_ITER = sigma_fit/sqrt(N_eff)*(1+lambda**2)**0.5        (Eq. 1)

where sigma_ITER is the uncertainty in the mean tau_E extrapolated to ITER, sigma_fit is the reduced RMS error of the fit to the present database, lambda is the distance being extrapolated from the center of the present database to ITER (lambda is measured in units of standard deviations of the independent variables in principal component form), and N_eff is the effective number of independent measurements in the database.

The problem of estimating the error propagation when the errors in the database are statistically non-ideal (i.e., exhibit systematic correlations of various kinds) can be complicated, and various methods to deal with specific situations have been developed in general statistics research (for example, see Mosteller and Tukey or Efron and Gong). Some of these methods have been applied specifically to fusion confinement scalings (see Riedel and Kaye, Kardaun, and Hammett et al.).

Note that non-ideal correlated errors can be due to several causes, and not necessarily just systematic measurement errors in tau_E between tokamaks. There can also be physical effects which make the true tau_E vary systematically between tokamaks (such as differences in how effective various divertor configurations or wall conditions are in controlling edge neutral density; systematic variations in density, rotation, heating, and current profiles; Ti/Te; ELM type and severity; etc.). Similarly, non-ideal errors result if the true functional form is not log-linear and some tokamaks tend to operate in different regions of parameter space where the true tau_E varies systematically from the approximate fitting function (such as some tokamaks operating closer to various limits in density, beta, or rho_star).

Cross-Validation Tests
**********************

The main idea of cross-validation is to test uncertainty estimates like Eq. 1 by doing a fit to a subset of the existing database, and observing how well it does at predicting the rest of the data that it wasn't fit to. In the following table, we summarize the results of a cross-validation test using the 11 tokamaks in the DB3 ELMy database. One tokamak at a time is dropped from the database, and a log-linear fit (in the standard 8 engineering variables) is done to the remaining 10 tokamaks, as sketched below.
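Here is a minimal sketch of this drop-one-tokamak procedure in Python (the array names and data layout are hypothetical, not the actual database format; lambda is computed here as the Mahalanobis distance from the center of gravity of the fit data, which is one way to express the offset in principal-component standard deviations):

    import numpy as np

    def drop_one_tokamak(X, y, tok):
        """X: (N, 8) log engineering variables; y: (N,) log(tau_E);
        tok: (N,) tokamak label for each observation."""
        results = {}
        for t in np.unique(tok):
            fit = (tok != t)                    # fit to the other 10 tokamaks
            A = np.column_stack([np.ones(fit.sum()), X[fit]])
            coef, *_ = np.linalg.lstsq(A, y[fit], rcond=None)
            resid = y[fit] - A @ coef
            # reduced RMS error of the fit (N minus the number of coefficients)
            sigma_fit = np.sqrt(resid @ resid / (fit.sum() - A.shape[1]))
            Ad = np.column_stack([np.ones((~fit).sum()), X[~fit]])
            err = y[~fit] - Ad @ coef           # errors on the dropped tokamak
            results[t] = dict(sigma_drop=np.sqrt(np.mean(err**2)),  # Table I col 1
                              avg_drop=np.mean(err),                # col 2
                              sigma_fit=sigma_fit,                  # col 3
                              N_drop=int((~fit).sum()))             # col 4
        return results

    def extrap_distance(X_fit, x_target):
        """lambda: offset of x_target from the center of gravity of the fit
        data, in principal-component standard deviations (Mahalanobis)."""
        d = x_target - X_fit.mean(axis=0)
        return np.sqrt(d @ np.linalg.solve(np.cov(X_fit, rowvar=False), d))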
The resulting regression formula for tau_E is then used to predict the data from the tokamak which was dropped. For example, from the following table we see that the RMS error of predicting JET data from a fit to the other tokamaks is 24.8% (or more precisely, the RMS error in predicting JET's log(tau_E) is 0.248), and the JET tau_E data is on average 18.2% high compared to these predictions. The RMS error of the fit to the other 10 tokamaks (excluding JET) was 15.7%. The RMS extrapolation to the JET datapoints, from the database excluding JET, is 5.4 standard deviations. The number of observations excluding JET is 1092 (=1398-306). Even reducing this by a factor of 4 and using N_eff=1092/4, we see that Eq. 1 predicts that the uncertainty in predicting the mean of the JET data should be only 5.2%. This is significantly smaller than the actual error of predicting JET, and is evidence that there are non-ideal errors of some sort.

Table I. Cross-validation test of predicting each tokamak from a fit to the other 10 tokamaks. The three error columns are in percent (i.e., 100 times the error in log(tau_E)); lambda is in standard deviations.

    Dropped    RMS error    Avg err      RMSE of     # of      RMS
    Tokamak    on dropped   on dropped   fit         Dropped   Extrapolation
               sigma_drop   avg_drop     sigma_fit   N_drop    lambda
    ASDEX      17.0804      -10.24018    16.5053     431       3.93410
    D3D        17.2394       -4.71607    15.6707     270       3.98883
    JET        24.7698       18.1938     15.7154     306       5.41192
    JFT2M      10.7039        7.43903    16.0517      59       5.50338
    PBXM       51.1138       48.5839     14.8946      59       7.60707
    PDX        23.9204       14.2842     15.3481      97       4.24153
    AUG        15.0425       -6.65805    15.9078     102       2.00001
    C_MOD      50.5558       49.5839     15.2740      37       8.95939
    JT60U      33.7669      -33.5355     15.6528       9       3.33234
    COMPASS    28.8206      -23.1805     15.6700      17       6.84810
    TCV        48.9837      -45.7694     15.3400      11       4.22000

Weighting the first column of numbers by the number of data points for that tokamak (the 4th column), we find that the average RMS error of predicting a tokamak (when that tokamak was not included in the fit) is 24%. If the first column is instead weighted equally (giving equal weight to each tokamak), the average RMS error of predicting a tokamak is 33%. The second column is the average (i.e., mean) error (instead of the root-mean-squared error) of predicting a tokamak, and it is rather large, suggesting that there are significant systematic tokamak-to-tokamak variations for some reason.

To be more quantitative, note that the uncertainty in predicting data from a dropped tokamak, sigma_drop, should be due to the uncertainty in the mean (as given by Eq. 1) plus the expected scatter of data around that mean (which is assumed to be the same as sigma_fit):

    sigma_drop**2 = sigma_mean**2 + sigma_fit**2

We will use Eq. 1 for sigma_mean, multiplied by a coefficient c to account for the increased uncertainty due to non-ideal errors:

    sigma_drop**2 = sigma_fit**2 + c*sigma_fit**2/N_fit*(1+lambda**2)

[The coefficient c can be thought of as a measure of correlations in the data, so that the effective number of independent (uncorrelated) measurements is modeled as N_eff=N_fit/c. However, the structure of the systematic errors may be complicated and not correspond to a simple model of a single N_eff. So, alternatively, c can be thought of as just a measure of how much larger the actual errors are than Eq. 1 indicates, based on the empirical test of predicting each existing tokamak using a fit to the other existing tokamaks.]

Everything in this equation is measured (sigma_drop is from column 1 of the table, sigma_fit is column 3, N_fit=1398-N_drop, where N_drop is from column 4, and lambda is column 5). Averaging the results over each of the dropped tokamaks (with angle brackets denoting the average):

    <sigma_drop**2> = <sigma_fit**2> + c*<sigma_fit**2/N_fit*(1+lambda**2)>    (Eq. 2)

This equation can now be solved for c, as in the short numerical check below.
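For example, this short Python script solves Eq. 2 for c directly from the columns of Table I and then evaluates Eq. 1 for ITER; it reproduces the N_eff and uncertainty values quoted in the next paragraph:

    import numpy as np

    # Columns of Table I (errors converted from percent to fractions of log tau_E)
    sigma_drop = np.array([17.0804, 17.2394, 24.7698, 10.7039, 51.1138, 23.9204,
                           15.0425, 50.5558, 33.7669, 28.8206, 48.9837]) / 100
    sigma_fit  = np.array([16.5053, 15.6707, 15.7154, 16.0517, 14.8946, 15.3481,
                           15.9078, 15.2740, 15.6528, 15.6700, 15.3400]) / 100
    N_drop     = np.array([431, 270, 306, 59, 59, 97, 102, 37, 9, 17, 11])
    lam        = np.array([3.93410, 3.98883, 5.41192, 5.50338, 7.60707, 4.24153,
                           2.00001, 8.95939, 3.33234, 6.84810, 4.22000])

    N_tot = N_drop.sum()          # 1398
    N_fit = N_tot - N_drop

    def solve_c(w):
        # <sigma_drop**2> = <sigma_fit**2> + c*<sigma_fit**2/N_fit*(1+lam**2)>
        w = w / w.sum()
        excess = np.sum(w * (sigma_drop**2 - sigma_fit**2))
        return excess / np.sum(w * sigma_fit**2 * (1 + lam**2) / N_fit)

    for w, label in [(N_drop.astype(float), "equal-shot"),
                     (np.ones(11), "equal-tokamak")]:
        N_eff = N_tot / solve_c(w)
        # Eq. 1 for ITER, with sigma_fit = 0.1582 and lambda = 7.23 (full fit)
        sigma_iter = 0.1582 * np.sqrt((1 + 7.23**2) / N_eff)
        print(label, round(N_eff, 1), round(sigma_iter, 2))
        # -> equal-shot: N_eff ~ 24, sigma_iter ~ 0.24
        #    equal-tokamak: N_eff ~ 10, sigma_iter ~ 0.37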
If each column in Table I is weighted by the number of data points contributed by that tokamak, then N_eff=N_tot/c=24. Using this in Eq. 1, together with sigma_fit=15.82% for the fit to the full DB3 database and lambda=7.23 for the extrapolation to ITER, gives a 1-sigma uncertainty in ITER's mean log(tau_E) of +-0.24. This appears to be an improvement over the previous DB2 database, where a similar analysis gave a 1-sigma uncertainty of +-0.33. However, if each tokamak is weighted equally when calculating the averages in Eq. 2 for DB3, then the result is N_eff=10 and the resulting uncertainty is +-0.37.

In summary, the 1-sigma uncertainty in ITER's mean log(tau_E) is in the range of +-0.24 to +-0.37, depending on how the cross-validation results are weighted. This corresponds to 2-sigma (95%) confidence intervals in H_H of 0.62-1.62 or 0.48-2.08.

Sensitivity to Data Selection and Weighting
*******************************************

There is some sensitivity in the results to the choice of data selection and weighting, which is illustrated by the following table.

Table II. Effect of data selection and weighting on the ITER prediction.

    Data Set        a_wgt    H_H for ITER
    Standard DB3    0.0      1.0
    -Ohmic, -ECH    0.0      0.82
    -Ohmic, -ECH    0.5      0.74
    -Ohmic, -ECH    1.0      0.63

(Other results could go in this table also.) H_H=tau_E/tau_E_ref, where tau_E_ref=6 seconds is the reference case. Each observation is weighted by the factor 1/N_obs_tok**a_wgt, where N_obs_tok is the number of observations from that tokamak. Thus a_wgt=0.0 corresponds to equal weighting for each shot in the database, a_wgt=1.0 corresponds to equal weighting for each tokamak in the database, and a_wgt=0.5 is in between; a short sketch of this weighting is given below.
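For concreteness, here is a minimal Python sketch (with hypothetical array names) of building these per-observation weights and using them in a weighted log-linear fit:

    import numpy as np

    def tokamak_weights(tok, a_wgt):
        """Per-observation weight 1/N_obs_tok**a_wgt: a_wgt=0 weights every
        shot equally, a_wgt=1 weights every tokamak equally."""
        _, inv, counts = np.unique(tok, return_inverse=True, return_counts=True)
        return 1.0 / counts[inv]**a_wgt

    def weighted_loglinear_fit(X, y, w):
        """Weighted least squares for y = b0 + X @ b (rows scaled by sqrt(w))."""
        A = np.column_stack([np.ones(len(y)), X])
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        return coef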
DB2 concentrated on NBI-heated H-modes only, while DB3 has been expanded to allow Ohmic, ICRF, and ECH-heated H-mode data. The Ohmic and ECH data come from the 28 shots contributed by the Compass and TCV tokamaks. While some of the characteristics of Ohmic and ECH H-modes are like regular NBI-heated H-modes, there may be some differences which suggest caution. For example, Ti/Te < 0.5 in some of the ECH (and Ohmic, as I recall?) discharges, while the NBI-heated discharges tend to have Ti/Te in the range of 1 to 1.5?? (need to check this number), and ITER will have Ti/Te~1 due to its high density.

The database itself suggests that there are some systematic differences between these Ohmic and ECH-heated H-modes and the other H-modes, as indicated by the last two lines of Table I or by the first two lines of Table II, which show that when these 28 Ohmic/ECH shots (2% of the shots in the full DB3 database) are removed from the database, the predicted tau_E for ITER falls by 18%. As the weighting is then varied from equal-shot weighting to equal-tokamak weighting, the confinement time drops significantly further.

On the other hand, dropping the C-Mod data in addition to the Compass and TCV data would raise ITER's tau_E back up. Table I indicates that C-Mod also appears to have significant differences from the other tokamaks for some reason. It is an ICRF-heated machine with some differences in its wall and divertor configuration, and with somewhat different ELM characteristics. However, by some measures, C-Mod's H-modes appear very standard, with confinement times about twice L-mode, and C-Mod's L-mode results are quite close to standard L-mode scalings. We do not mean to single out C-Mod data here, since Table I indicates that many of the tokamaks seem to have systematic variations from what one would expect based on fits to the other tokamaks.

A related issue, which has been noted in previous papers (Riedel et al., Christiansen et al., Goldston et al.), is that there are statistically significant differences between tokamaks in their scalings of tau_E with some variables (such as density, I_p, ...), making the construction of an overall scaling more uncertain.

The examples in Table II and elsewhere show that there is some sensitivity in the ITER prediction to the choice of data selection and weighting. Further work to understand some of the systematic variations in performance observed in different tokamaks or in different operating regimes could help to reduce the uncertainties.

Sensitivity to Various Nonlinear Functional Forms
*************************************************

A table of results from various nonlinear functional forms would go here. (Kardaun's paper explores several nonlinear functional forms for DB2, and I understand that Dorland suggested several possible new ones for DB3 to Kardaun, and Kardaun has been investigating them.)

Refs.:

Kardaun 97, the draft paper Kardaun has recently circulated.

Hammett, Dorland, Kotschenreuther, "Estimating Statistical Uncertainties in Global Energy Confinement Scalings", manuscript in preparation.

B. Efron and G. Gong, "A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation", The American Statistician 37, 36 (1983).

F. Mosteller and J.W. Tukey, "Data Analysis and Regression", Addison-Wesley, Reading, Massachusetts, 1977.

J.P. Christiansen, J.G. Cordey, K. Thomsen, et al., "Global Energy Confinement H-Mode Database for ITER", Nucl. Fus. 32, 291 (1992).

K.S. Riedel and S.M. Kaye, "Uncertainties associated with extrapolating L-mode energy confinement to ITER and CIT", Nucl. Fus. 30, 731 (1990).

R.J. Goldston, R.E. Waltz, et al., "Burning Plasma Experiments Physics Design Description, III. Confinement", Fusion Technology 21, 1067 (1992).

Date: Wed, 8 Sep 1999 11:58:56 -0400 (EDT)
From: Greg Hammett
To: SNIPES@psfc.mit.edu
Subject: cross-validation
Reply-to: Hammett@pppl.gov

As I recall, the description of "cross-validation" in Otto's recent paper, PPCF 41, 429 (1999), is incorrect. Also, he excluded the case of dropping JET from his jackknife. In the original DB2 this was a huge effect, although in DB3 (which I used in the notes I just sent you) it is less of an effect.
--Greg

Date: Wed, 8 Sep 1999 14:58:09 -0400 (EDT)
From: Greg Hammett
To: SNIPES@psfc.mit.edu
Subject: standard regression uncertainty estimate
Reply-to: Hammett@pppl.gov

Hi Joe,

I wrote up some notes on the standard method for estimating uncertainties in a regression formula, in

    http://w3.pppl.gov/~hammett/work/1999/stderr.pdf

I'm sure there are many standard textbooks that derive this stuff besides Otto's tutorial. I learned this stuff many years ago in intro and advanced statistics courses I took in college (I also had a part-time job doing some econometrics and business statistics in college), but Otto's tutorial was a convenient recent place I had seen it all written down. Getting these standard uncertainty estimates is a first step to doing cross-validation.

By the way, cross-validation is used in AI circles to test how good a neural net is. Just as in regression, with its dangers of "overfitting", it can be misleading to only ask what the errors are in a neural net on the same data it was trained on. It is better to train the neural net on one set of data, and then see how big the errors are in using that neural net to predict a new set of data... Cross-validation is a well-known technique among statistics experts.

Simple textbooks tend to focus on the ideal case where all of the errors are uncorrelated and uniform ("independent and identically distributed" in their parlance), because then a lot of the math can be done easily and straightforwardly to get the main concepts across. Once one has to face the problem that in reality there are often systematic or correlated errors in the data, so that the number of truly independent observations is not just N, things get much more complicated. The way in which systematic errors are correlated becomes highly dependent on the problem at hand (all of the measurements in instrument A might be 10% high relative to the measurements in instrument B, or doctor B might tend to give more optimistic evaluations of his older patients than doctor A does, etc.). So different methods to deal with different types of non-idealized errors have been developed for different applications. One of the best introductory books I found that talks about non-ideal errors and some methods to try to deal with them is F. Mosteller and J.W. Tukey, "Data Analysis and Regression", Addison-Wesley, Reading, Massachusetts, 1977.

Let me know if you have any more questions.

Greg