R&D and Productivity in OECD Firms and Industries: A Hierarchical Meta-Regression Analysis

Effects of R&D investment on firm/industry productivity have been investigated widely, thanks to pioneering contributions by Zvi Griliches and others in the late 1970s and early 1980s. We aim to establish where the balance of the evidence lies and what factors may explain the variation in the research findings. Using 1,258 estimates from 65 primary studies and hierarchical meta-regression models, we report that the average elasticity and rate-of-return estimates are both positive, but smaller than those reported in prior narrative reviews and meta-analysis studies. We discuss the likely sources of upward bias in prior reviews, investigate the sources of heterogeneity in the evidence base, and discuss the implications for future research. Overall, this study contributes to existing knowledge by placing the elasticity and rate-of-return estimates under a critical spotlight and providing empirically-verifiable explanations for the variation in the evidence base.


Introduction
The relationship between research and development (R&D) investment and productivity has been a subject of major interest for researchers and policy makers for a long time. The pioneering work is that of Minasian (1969) and Griliches (1973) on R&D and productivity, and Terleckyj (1974) on rates of return to R&D. Empirical work expanded significantly after Griliches (1979), who articulated a lasting framework for the range of measurement, modeling and estimation issues encountered in empirical work.
The work tended to follow the so-called primal approach, which consists of a Cobb-Douglas production function augmented with R&D (knowledge) capital in addition to physical capital and labour. A smaller number of studies have adopted a dual approach, which draws on a system of factor demand equations and cost-function representations of technology. We synthesize the evidence from the primal approach only, because the dual-approach studies are not only small in number but also more heterogeneous in their model specifications.
Several narrative reviews of the extant literature exist. Of these, Mairesse and Sassenou (1991) and Mairesse and Mohnen (1994) review the literature on R&D and productivity at the firm and industry levels, respectively. Hall (1996) focuses on rates-of-return estimates, differentiating between private and social returns to R&D. A recent and comprehensive review by Hall et al. (2010) provides an authoritative account of the analytical, measurement and estimation issues that characterise the research field. Finally, two meta-analysis studies by Wieser (2005) and Moen and Thorsen (2013) provide meta-regression evidence on productivity and rates-of-return estimates.
We have identified a number of issues that justify a novel attempt at synthesizing the rich evidence base and explaining the sources of heterogeneity therein. First, existing narrative and quantitative reviews tend to present summary measures based on 'representative' or 'preferred' estimates and as such call for more effective use of all available information. More importantly, however, the evidence base may be contaminated with publication selection bias. To the extent that this is the case, summary measures based on 'preferred' or 'representative' estimates are inappropriate for inference.
The second issue relates to sampling bias in existing reviews, which follow a cascading approach that updates the list of studies covered in preceding reviews. This approach may allow for replication and extension as methods of verification, but the absence of explicit criteria for including or excluding primary studies may limit the representativeness of the sample and the generalizability of the offered synthesis.
The third issue relates to how existing reviews quantify the effects of moderating factors on the variation among primary-study estimates. The narrative reviews rely on 'vote counting' for deciding whether a moderating factor (e.g., the estimation method, the measure of inputs or output, the type of R&D or firms/industries, etc.) is associated with systematically larger or smaller productivity estimates. Of the meta-analysis studies, Wieser (2005) controls for a number of moderating factors within a multiple meta-regression framework, but his sample consists of only 52 observations chosen from 17 primary studies.
To address these issues, we utilise 1,258 estimates from 65 studies and follow the best-practice recommendations for meta-analysis. The remainder of this article is organised in three sections. Section 2 provides an overview of the analytical and empirical issues that characterise the research field. In Section 3 we present the meta-analysis strategy, including the study search and selection criteria and the meta-regression methodology. In Section 4 we present meta-regression estimates, based on 440 elasticity estimates in the level dimension, 468 elasticity estimates in the temporal dimension, and 350 estimates for rates of return. The sample consists of primary studies using OECD firm or industry data, published in English between 1980 and July 2013.
We focus on OECD firm/industry studies for three reasons. First, differences in data quality are less likely to be a source of unobserved heterogeneity, as the definition and collection of R&D data in OECD countries has been harmonised substantially since 1963. Second, primary studies on OECD firms/industries account for more than 80% of the existing evidence base. Finally, there are multiple studies per country over time, and this provides scope for investigating whether productivity and rates-of-return estimates have varied over time and across OECD countries.
We report that the average elasticity and rate-of-return estimates are both positive, but smaller than those reported in prior narrative reviews and meta-analysis studies. We discuss the likely sources of upward bias in prior reviews and investigate the sources of heterogeneity in the evidence base. Our findings also indicate that there is room for innovation in future research in several areas, including the modelling and estimation of the rates of return on R&D, the measurement of the spillover pool, the relationship between R&D intensity and market power, and the separation of public and private R&D in productivity estimations.

Analytical and empirical dimensions of the research field
Primary studies on R&D and productivity usually draw on a Cobb-Douglas production function, augmented with knowledge (R&D) capital. Assuming perfect competition in factor markets and separability of the conventional inputs (physical capital and labour) from knowledge (R&D) capital, the production function can be stated as:

Y = A e^(λt) C^α L^β K^γ        (1)

Here, Y is deflated output (sales or gross output or value-added); C is deflated physical capital stock; K is deflated knowledge capital; L is labour (number of employees or hours worked); λ is the rate of disembodied technological change; and A is a constant. Taking natural logarithms and using lower-case letters to denote logged values, t to denote time and i to denote firm or industry, the empirical model can be written as:

yit = a + α cit + β lit + γ kit + ηi + λt + uit        (2a)

The log of technical progress yields a firm- or industry-specific effect (ηi) and a time effect (λt). In (2a), returns to scale are assumed constant. However, this assumption can be relaxed and returns to scale can be tested explicitly by subtracting logged labour from both sides of (2a):

yit − lit = a + α (cit − lit) + γ (kit − lit) + (μ − 1) lit + ηi + λt + uit        (2b)

Here, μ = α + β + γ and implies constant returns to scale if μ = 1, but variable returns to scale otherwise. The coefficient of interest in both (2a) and (2b) is γ, the output elasticity with respect to knowledge capital.
Usually, the R&D capital (K) is constructed with the perpetual inventory method (PIM), assuming a growth rate of 5% for R&D investment prior to the initial year and a depreciation rate of 15%. 1 The consensus in the literature is that the assumed rates of growth or depreciation do not alter the elasticity estimates (Hall and Mairesse, 1995; Bartelsman et al., 1996; Verspagen, 1995; Harhoff, 1994). 2 Therefore, we do not control either for growth or depreciation rates as potential sources of variation in the evidence base. Nevertheless, PIM's appropriateness for constructing R&D capital has been debated widely (see, for example, Klette, 1991). Several contributors indicate that the development of novel methods in this area constitutes a promising avenue for future research (Griliches, 1979; Bitzer and Stephan, 2007; Hall et al., 2010). Therefore, we investigate whether estimates based on other methods differ systematically from those based on PIM.
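As a concrete illustration, the PIM construction described above can be sketched as follows. The 5% pre-sample growth rate and 15% depreciation rate are the conventional values mentioned in the text; the initial-stock formula K0 = R0/(g + δ) is the standard steady-state approximation, and the R&D series itself is made up for illustration rather than taken from any primary study.

```python
# Perpetual inventory method (PIM) for R&D capital: K_t = (1 - d) K_{t-1} + R_t,
# with the initial stock set to the steady-state value implied by pre-sample
# growth g: K_0 = R_0 / (g + d). Illustrative sketch only.

def rd_capital_pim(rd_flows, growth=0.05, depreciation=0.15):
    """Build an R&D capital series from annual (deflated) R&D expenditures."""
    k0 = rd_flows[0] / (growth + depreciation)
    stocks = [k0]
    for r in rd_flows[1:]:
        stocks.append((1 - depreciation) * stocks[-1] + r)
    return stocks

rd = [100, 105, 110, 116, 122]   # hypothetical deflated R&D outlays
capital = rd_capital_pim(rd)
print(capital[0], capital[1])    # K_0 = 100 / 0.20 = 500.0; K_1 = 0.85*500 + 105 = 530.0
```

The steady-state initialisation is one common convention; as the text notes, the pre-sample growth rate can instead be estimated from a sufficiently long R&D series.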
The second contentious issue is whether elasticities or rates of return should be equalised between firms/industries. Assuming elasticity equalisation overlooks the possibility that firms may choose different factor shares depending on the competitive equilibria they are faced with. 1 The growth rate for R&D investment can also be calculated from the R&D series over a period of τ years prior to the initial year, if the R&D series is sufficiently long. The depreciation rate of 15% is informed by findings in a number of studies, which range from 10% to 36% (Bosworth, 1978; Klette, 1994; Pakes and Schankerman, 1984; Hall, 2005). 2 The assumed depreciation rate is not relevant when rates of return are estimated, because the latter are based on R&D intensity rather than R&D capital.
Hence, a substantial number of contributors assume rates-of-return equalisation, which is more compatible with the assumption of competitive markets.
To obtain rates-of-return estimates, model (2a) is expressed in first-differences, yielding:

Δyit = λ + α Δcit + β Δlit + γ Δkit + Δuit        (3a)

Note that the firm- or industry-specific fixed effect (ηi) has disappeared and the time effect is now a growth-rate effect relative to the initial observation rather than a level effect. Assuming that the depreciation rate (δ) is sufficiently close to zero and recalling that the elasticity of output with respect to R&D capital is given by γ = (∂Y/∂K)(K/Y), (3a) can be rewritten as (3b) below, where ρ = ∂Y/∂K is the gross rate of return on R&D investment and R/Y is R&D intensity:

Δyit = λ + α Δcit + β Δlit + ρ (R/Y)it + Δuit        (3b)

Model (3b) allows for estimating gross rates of return on R&D investment directly, using output growth. Direct rates of return can also be estimated using total factor productivity (TFP) growth, obtained by subtracting conventional inputs (physical capital and labour) from both sides of (3b):

Δtfpit = λ + ρ (R/Y)it + Δuit        (3c)

Alternatively, R&D rates of return can be obtained indirectly, using the definition of the R&D elasticity (ρ = γ Y/K). 4 A small number of studies report indirectly-measured rates-of-return estimates. We included such estimates in the meta-analysis only if they were reported together with their standard errors.
Differences in econometric specification are significant sources of variation in the evidence base. For example, the variables may be expressed in levels or first-differences; the production function may or may not be augmented with a measure of spillovers or with time-industry dummies; and different estimators may be used. 3 By definition, the elasticity of output with respect to R&D capital is γ = (∂Y/∂K)(K/Y). Since ∂Y/∂K = ρ is the marginal productivity of R&D capital, (3a) can be re-written as: Δy = λ + α Δc + β Δl + ρ (K/Y) Δk + Δu. Then the term for knowledge capital simplifies to ρ (ΔK/Y), which is approximately equal to ρ (R/Y) when depreciation (δ) is close to zero. To highlight the difference between elasticity estimates based on level and first-differenced data, consider the total error term in (2a) or (2b), which is: eit = ηi + λt + uit. Estimating (2a) or (2b) with level data is feasible if one assumes ηi and λt are constant across all units and time periods, respectively. Another approach would be to maintain that assumption about fixed effects (ηi) but eliminate the time effects by estimating the model for each period or by averaging over a time period. In all of these cases, the elasticity estimates (γ) are in the level dimension, as they reflect cross-sectional variation in the levels of R&D capital and other inputs.
Alternatively, (2a) or (2b) can be estimated by first-differencing or by using a within estimator that utilises deviations from the mean. In both cases, the unit-specific fixed effect (ηi) disappears and the elasticity estimates are referred to as elasticities in the temporal dimension.
Elasticity estimates in both dimensions will be consistent and similar if (2a) and (2b) are specified correctly and the variables are free of measurement errors. In practice, however, the model is estimated with different control variables; and measurement errors cannot be ruled out. Therefore, and in accordance with the existing practice, we analyse the elasticity estimates in the level and temporal dimensions separately.
Unlike elasticities, rates of return are difficult to interpret, for two reasons. First, when estimated indirectly, they could be interpreted either as a risk premium or as a supra-normal rate of profit on R&D investments (see Griliches, 1979; Schankerman, 1981; Hall et al., 2010). However, Griliches (1980a: 389) points out that this interpretation is valid only when the elasticity estimate (γ) used to derive them is in the level (as opposed to temporal) dimension. Secondly, the rates of return reported in primary studies (estimated directly or indirectly) measure private returns only, which may be much smaller than social returns in the presence of externalities (spillovers). In this meta-analysis, we analyse the gross private rates of return only, but we control for whether the latter differ between studies depending on whether spillovers are controlled for as an additional source of productivity.
Model specification and estimation methods are additional sources of heterogeneity in the evidence base. First, some studies control for spillover effects (e.g., Bartelsman et al., 1996; Griliches, 1980b; Griliches and Lichtenberg, 1984). Secondly, most of the primary studies use a standard Cobb-Douglas production function, but a minority utilises a translog version of the latter (e.g., Cameron et al., 2005; Lehto, 2007; Smith et al., 2004). Third, some studies control for endogeneity via instrumental-variable techniques, including the generalised method of moments (GMM). We also control for differences in measurement. One measurement issue is double counting, which arises when R&D capital expenditures and R&D personnel are counted twice: on their own and as part of physical capital (C) and labour (L). It is often reported that failure to correct for double counting leads to downward bias in the elasticity estimates (Griliches, 1979; Harhoff, 1994; Mairesse and Hall, 1994; Hall et al., 2010) and in rates of return estimated indirectly (Schankerman, 1981). Therefore, we control for whether primary studies correct for double counting in both elasticity and rates-of-return estimations. Another measurement issue relates to output. Cunéo and Mairesse (1984) and Mairesse and Hall (1994) report that elasticity estimates based on value-added do not differ from those based on sales without including materials as an additional input. However, other studies indicate that elasticity estimates based on value-added tend to be smaller than those based on sales without materials. Therefore, we control for measurement of output as a potential source of heterogeneity in the evidence base.
The final set of moderating factors we control for include publication type (whether the primary study is a journal article or a working paper against a reference category consisting of all others); countries (France, Germany, UK or US versus other OECD countries); time period (whether the median of the data period is before or after 1980); R&D type (private versus public R&D); R&D intensity; and whether the underlying data is at the firm or industry level. Definitions of and summary statistics for all moderating variables are given in Tables A1 and A2 in the Appendix.

Meta-analysis: procedures and method
We use meta-regression analysis (MRA) to provide verifiable estimates for: (a) the 'average' productivity or rate-of-return estimate after taking account of publication selection bias; (b) the extent of publication selection bias; and (c) the effects of a wide range of moderating factors on the variation in the evidence base. We follow the best-practice recommendations for meta-analysis and identify the relevant studies by searching nine databases, using 13 search terms for the Title field and 20 search terms for the Abstract field (Table P1). 5 We also use the snowballing approach and identify 32 studies through backward citations.
In the first stage, two reviewers read the title and abstract information of 979 total hits to select the relevant studies. In stage two, we made inclusion and exclusion decisions based on full-text information. In both stages, all studies were coded with the de-selection and exclusion criteria specified in Table P2 (see note 5). We de-selected 343 studies on the basis of relevance criteria and 297 studies as duplicates. In stage two, we excluded 274 studies by invoking at least one of the exclusion criteria. The final sample consists of 65 primary studies that report elasticity and private rates-of-return estimates and were published in English between 1980 and July 2013. The frequencies with which a de-selection criterion is invoked in stage one are given in Table P3; and those for exclusion decisions in stage two are given in Table P4 (see note 5). Data extraction yielded 1,262 estimates, but 4 of these were excluded from estimations as they were found to have undue influence. 6 Hence, the meta-analysis is based on 1,258 estimates, of which 440 are elasticity estimates in the level dimension, 468 are elasticity estimates in the temporal dimension, and 350 are rate-of-return estimates.
First, we calculate fixed-effect weighted means (FEWMs) per study and for each evidence pool, in accordance with:

FEWM = Σi (ei/SEi^2) / Σi (1/SEi^2)

Here, ei is the elasticity or rate-of-return estimate and 1/SEi^2 is precision-squared, which assigns lower weights to estimates with larger standard errors. FEWMs are more reliable than simple means, but they may conceal a high degree of heterogeneity among the estimates extracted from each study.
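A minimal sketch of the FEWM computation, with made-up estimates and standard errors:

```python
# Fixed-effect weighted mean (FEWM): precision-squared weights, so imprecise
# estimates (large standard errors) count for less. Illustrative values only.

def fewm(estimates, std_errors):
    weights = [1 / se ** 2 for se in std_errors]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)

elasticities = [0.08, 0.12, 0.03]
ses = [0.02, 0.05, 0.01]
print(fewm(elasticities, ses))   # pulled towards 0.03, the most precise estimate
```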
To estimate the 'genuine effect' beyond publication selection bias, we draw on meta-regression analysis (MRA) models proposed by Stanley (2005, 2008) and Stanley and Doucouliagos (2012, 2013a). The underpinning theoretical framework is that of Egger et al. (1997), who postulate that researchers with small samples and large standard errors search intensely across model specifications, econometric techniques and data measures to find sufficiently large (hence statistically significant) effect-size estimates. Hence, denoting the effect size (i.e., the elasticity or rate-of-return estimate) with ei and its standard error with SEi:

ei = β + α SEi + ui        (7)

Rejecting the null hypothesis of α = 0 indicates the presence of publication selection bias. This is in line with the increasing emphasis on the need to control for selection bias in both social sciences and medical research (see Card and Krueger, 1995; Dickersin and Min, 1993; Ioannidis, 2005; Simmons et al., 2011). The test is also known as the funnel-asymmetry test (FAT), which evaluates the asymmetry of the funnel graphs that chart the effect-size estimates against their precisions. 7 Testing for β = 0 is a test for whether a genuine effect exists beyond selection bias.
However, estimating (7) poses several issues. First, the model is heteroskedastic because the effect-size estimates have widely different standard errors (hence variances), violating the assumption of an independently and identically distributed (i.i.d.) error term (ui). To address this issue, Stanley (2008) and Stanley and Doucouliagos (2012) propose a weighted least squares (WLS) version, obtained by weighting both sides of (7) with precision (1/SEi), leading to:

ti = ei/SEi = α + β (1/SEi) + vi        (8)

Here ti is the t-value reported in or calculated from the primary studies; and the error term vi = ui/SEi. Under the Gauss-Markov theorem, OLS estimation of (8) yields minimum-variance linear unbiased estimates. Testing for α = 0 is a test for publication selection bias, whereas testing for β = 0 is a 'genuine effect' test (or precision-effect test, PET) after controlling for selection bias. The selection bias is considered substantial if |α| ≥ 1 and severe if |α| ≥ 2 (Doucouliagos and Stanley, 2009). The second issue is whether there is random variation between and within studies beyond idiosyncratic errors. We address this issue by estimating a hierarchical version in which primary-study estimates (the lower-level observations) are nested within primary studies (higher-level clusters). Hence:

tij = α + β (1/SEij) + vj + εij        (9)

Here, subscripts j and i refer to higher-level clusters and lower-level observations, respectively; and εij is a multivariate-normal error term with mean zero and variance matrix σ²R, with R containing the residual-variance parameters. The study-level random effect (vj) is assumed orthogonal to the error term εij. The random effects (vj) are not estimated directly, but their variance (or standard error) is.
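The FAT-PET logic of equation (8) can be illustrated with a short sketch: regressing t-values on precision recovers the selection term (the intercept α) and the genuine effect (the slope β). The data below are synthetic and constructed so that ei = 0.05 + 2·SEi, i.e., a pure selection pattern with a genuine effect of 0.05.

```python
# FAT-PET regression (equation 8): t_i = alpha + beta * (1/SE_i) + v_i.
# The intercept alpha tests for publication selection (FAT); the slope beta
# is the selection-corrected 'genuine effect' (PET). Closed-form OLS.

def fat_pet(estimates, std_errors):
    t = [e / se for e, se in zip(estimates, std_errors)]
    prec = [1 / se for se in std_errors]
    n = len(t)
    mt, mp = sum(t) / n, sum(prec) / n
    beta = (sum((p - mp) * (ti - mt) for p, ti in zip(prec, t))
            / sum((p - mp) ** 2 for p in prec))
    alpha = mt - beta * mp
    return alpha, beta

ses = [0.01, 0.02, 0.04, 0.08]
es = [0.05 + 2 * se for se in ses]       # pure selection pattern, by construction
alpha, beta = fat_pet(es, ses)
print(round(alpha, 3), round(beta, 3))   # prints: 2.0 0.05 (exact by construction)
```

With real data the fit is of course noisy, and (as the text explains) heteroskedasticity and the nested structure of the estimates motivate the WLS and hierarchical refinements.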
Hierarchical models have two advantages over standard linear models. First, they allow for inclusion of random deviations other than those associated with the idiosyncratic error term. Secondly, they allow for modeling the random deviations as both between-and within-study variations (Demidenko, 2004;McCulloch et al., 2008). As such, hierarchical models are particularly relevant for meta-analysis of the research evidence, which reflects a high level of heterogeneity across and within primary studies.
To account for both between- and within-study variation, the random-effect term in (9) can be modelled as follows:

vj = v0j + v1j (1/SEij)        (10) (Random intercepts and slopes)

Here, v0j captures the between-study variation (the so-called random intercepts) and v1j captures the within-study variation between slopes. If the precision-effect test in (9) or (10) rejects the null hypothesis of zero effect, then the correct specification is referred to as the precision-effect test corrected for standard errors (PEESE) and can be stated as follows:

tij = α SEij + β (1/SEij) + vj + εij        (11) (Random intercepts and slopes)
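The nested structure behind the hierarchical specification can be illustrated with a small simulation: estimates (level 1) are drawn within studies (level 2), with a study-level random effect on top of idiosyncratic noise. The standard ANOVA decomposition then shows how much residual variation sits between studies, which is what motivates a mixed-effects model over pooled OLS. All parameter values below are made up for illustration.

```python
# Simulate the two-level structure of equation (9): t_ij depends on precision,
# a study-level random effect v_j, and an idiosyncratic error. Then compare
# total residual variance with the pooled within-study variance.
import random

random.seed(1)
alpha, beta = 1.0, 0.08                   # selection term and genuine effect
sigma_between, sigma_within = 0.5, 0.3    # random-effect and error std devs

rows = []
for study in range(40):                   # 40 hypothetical studies
    v_j = random.gauss(0, sigma_between)
    for _ in range(random.randint(3, 10)):    # several estimates per study
        prec = random.uniform(5, 50)          # precision = 1/SE
        t = alpha + beta * prec + v_j + random.gauss(0, sigma_within)
        rows.append((study, prec, t))

# Residuals after removing the known fixed part:
resid_by_study = {}
for s, p, t in rows:
    resid_by_study.setdefault(s, []).append(t - alpha - beta * p)

all_resid = [r for rs in resid_by_study.values() for r in rs]
mean_all = sum(all_resid) / len(all_resid)
total_var = sum((r - mean_all) ** 2 for r in all_resid) / len(all_resid)

# Pooled within-study variance (deviations from each study's own mean):
within = [r - sum(rs) / len(rs) for rs in resid_by_study.values() for r in rs]
within_var = sum(w * w for w in within) / len(within)

print(within_var < total_var)   # True: between-study variation is present
```

Because the total sum of squares decomposes into within- plus between-study components, the within-study variance is necessarily no larger than the total; the gap between the two is what the random intercepts in (9)-(10) absorb.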
We select the appropriate model on the basis of likelihood ratio (LR) tests, with the null hypothesis that the comparison model is nested within the preferred model. A rejection of the null hypothesis indicates that the preferred model is a better fit for the data at hand. The testing procedure can be summarised as follows: (i) estimate model (9) and establish whether the LR test justifies model (9) as opposed to standard OLS in (8); (ii) if (9) is preferred, test the random-intercepts-only model (9) against the model with random intercepts and slopes (10); and (iii) if the precision-effect test (PET) in (ii) confirms the presence of a non-zero effect, conduct an LR test to choose between the random-intercepts and random-intercepts-and-slopes versions of the PEESE model in (11).
We also address the issue of overly influential observations, using the DFBETA routine in Stata. This involves calculating the difference in the regression coefficient when the i-th observation is included and excluded. The difference is scaled by the estimated standard error of the coefficient, and observations with |DFBETA| > 1 are excluded from the estimation.
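The DFBETA screen can be sketched in a few lines for a bivariate regression. The paper uses Stata's DFBETA routine, so this pure-Python version, run on made-up data, is only illustrative of the |DFBETA| > 1 rule.

```python
# DFBETA influence check: for each observation, compare the slope estimated
# with and without it, scaled by the full-sample standard error of the slope;
# observations with |DFBETA| > 1 are flagged for exclusion.
import math

def slope_and_se(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - a - b * x for x, y in zip(xs, ys)]
    se = math.sqrt(sum(r * r for r in resid) / (n - 2) / sxx)
    return b, se

def dfbetas(xs, ys):
    b_full, se_full = slope_and_se(xs, ys)
    out = []
    for i in range(len(xs)):
        b_i, _ = slope_and_se(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        out.append((b_full - b_i) / se_full)   # scaled change from dropping i
    return out

xs = [1, 2, 3, 4, 5, 20]                 # last point has high leverage
ys = [0.1, 0.2, 0.3, 0.4, 0.5, 5.0]      # and sits far off the others' line
flagged = [i for i, d in enumerate(dfbetas(xs, ys)) if abs(d) > 1]
print(flagged)   # -> [5]: only the influential point is flagged
```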
The effect-size estimate (β) obtained in (11) is an 'average' R&D elasticity or rate of return, taking account of publication selection bias and the quadratic relationship between primary-study estimates and their standard errors. Although this is more reliable than simple or fixed-effect weighted means, its out-of-sample generalizability may be limited due to excessive heterogeneity in the evidence base. To identify the sources of heterogeneity, we estimate a multivariate meta-regression model in which we control for a wide range of moderating factors (i.e., observable sources of heterogeneity) that reflect the dimensions of the research field. This can be stated as follows:

tij = α + β (1/SEij) + Σk γk (Zk,ij/SEij) + vj + εij        (12) (Random intercepts and slopes)

All terms and subscripts are as defined above. The k x 1 vector of moderating variables (Zk) reflects the dimensions of the research field and constitutes the observable sources of heterogeneity in the evidence base. Because the inclination towards publishing statistically-significant findings is pervasive in the social sciences, medicine and the physical sciences in general (Card and Krueger, 1995; Dickersin and Min, 1993; Ioannidis, 2005; Simmons et al., 2011), it is important that the Z-variables are interacted with precision.
To minimise the risks of multicollinearity and over-fitting, we estimate (12) through a general-to-specific estimation routine, whereby we omit the most insignificant variable one at a time until all remaining covariates are statistically significant. We present the findings from the specific and general models side by side to: (a) establish the extent of congruence between the significant moderating factors; and (b) identify the range of moderating variables that do not affect the variation in the evidence base.
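The general-to-specific routine can be sketched as a simple backward-elimination loop. The `fit` callback standing in for the meta-regression is hypothetical, and in this toy version the p-values are fixed rather than re-estimated at each step, unlike a real regression where dropping a covariate changes the remaining p-values.

```python
# General-to-specific backward elimination: starting from the full set of
# moderators, repeatedly drop the covariate with the highest p-value until
# every remaining covariate is significant at the chosen level.

def general_to_specific(covariates, fit, alpha=0.05):
    """`fit` maps a list of covariate names to {name: p-value}."""
    remaining = list(covariates)
    while remaining:
        pvalues = fit(remaining)
        name, worst = max(pvalues.items(), key=lambda kv: kv[1])
        if worst <= alpha:
            break                 # all remaining covariates are significant
        remaining.remove(name)    # drop the most insignificant, re-estimate
    return remaining

# Toy stand-in for the meta-regression: fixed p-values per moderator.
P = {"double_count": 0.01, "value_added": 0.03, "gmm": 0.40, "translog": 0.20}
keep = general_to_specific(P, lambda names: {n: P[n] for n in names})
print(sorted(keep))   # -> ['double_count', 'value_added']
```

One variable is removed per iteration (rather than all insignificant ones at once) precisely because each removal can change the significance of the covariates that remain.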

Meta-analysis results: R&D effects and sources of heterogeneity
We report three sets of evidence on R&D elasticities and rates of return: (1) fixed-effect weighted means (FEWMs); (2) 'average' effect-size estimates that take account of publication selection bias; and (3) multivariate meta-regression evidence on how moderating variables affect the estimates reported in primary studies. Table 1 presents FEWMs for elasticity estimates in the level and temporal dimensions (1a and 1b) and for rates-of-return estimates (1c). The FEWM is 0.053 for the sample of elasticities in the level dimension; 0.012 for the sample of elasticities in the temporal dimension; and 11.5% for the rates-of-return sample. These indicate that knowledge capital has positive effects on productivity, but the effects are smaller than those reported in existing reviews, including the meta-analysis by Wieser (2005). Given the extent of heterogeneity, it is difficult to make inferences on the basis of representative estimates chosen by primary-study authors or reviewers. The risk of incorrect inference is higher if the primary-study estimates are contaminated with publication selection bias. Therefore, and as a second step towards correct inference, we estimate the average elasticity and rate-of-return estimates after controlling for publication selection bias.

Elasticities and rates of return beyond selection bias
PET/FAT and PEESE results from bivariate hierarchical meta-regressions are given in Table 2. The models are fitted with random intercepts and random slopes in accordance with LR test results. Standard deviations of the random slopes and the residuals are all significant, indicating that the hierarchical model specification is preferable due to the presence of between- and within-study variations that cannot be explained by sampling differences.
Notes to Table 2: *, **, *** indicate significance at 10%, 5% and 1%, respectively. All models are estimated with random intercepts and random slopes, in accordance with LR tests. Significance of the random-effect terms is based on standard errors (not reported here) for the natural logarithms of the standard deviations. Observations with undue influence are excluded, using the DFBETA routine in Stata. Cluster-robust standard errors (in brackets) are clustered within primary studies. Wald Chi-square tests indicate the overall significance of the hierarchical models (HM), which are also preferable to OLS, given that the log-likelihoods for these models are smaller in magnitude.
In panel A, the PET/FAT results indicate substantial and positive selection bias, as the Constant is larger than 1 and significant across all three evidence pools. The presence of positive selection bias can be verified visually by inspecting the funnel graphs in Figure A1 in the Appendix. As indicated above, publication selection is a prevalent practice in medicine, the physical sciences and the social sciences. Results in Table 2 indicate that the research on R&D and productivity does not constitute an exception. It is beyond the scope of this study to discuss why publication selection bias exists despite the fact that the work of the leading contributors to this field was in great demand in the 1980s and 1990s. However, it is possible to conjecture that public policy makers need to justify public support for private R&D investment on the basis of evidence demonstrating that the latter has direct or indirect positive effects on the productivity of resident firms/industries. This public policy preference may be conducive to selection by researchers, who are interested in research uptake by and impact on the process of public policy-making. This conjecture draws support from evidence in the research field that both leading contributors and major reviews of the literature tended to highlight representative/preferred estimates that indicate larger 'effects' compared to what the evidence indicates within each study and across studies.
However, the existence of selection bias does not invalidate the 'genuine' effect, which remains significant after controlling for selection bias. Therefore, we report PEESE results in Panel B of Table 2. The average estimate is 0.079 for elasticities in the level dimension; 0.057 for elasticities in the temporal dimension; and 11.7% for rates of return. These are still smaller than simple averages or representative/preferred estimates reported in previous reviews. 8 Two points are worth emphasizing here. First, our findings confirm the consensus view that elasticity estimates in the temporal dimension are smaller and may be inconsistent (Hall et al., 2010;Hall and Mairesse, 1995). These estimates are likely to be influenced by collinearity between capital (both R&D and physical capital) and the time-effect that reflects autonomous technical change; and by measurement errors that are amplified when the data is first-differenced.
A more striking aspect of our findings is that they indicate a gross private rate of return on R&D that is smaller than the typical depreciation rate (usually, 15%) assumed in primary studies. This finding raises doubt as to whether the rate-of-return estimates reported in primary studies do indeed measure what they are supposed to measure.
With the exception of the debate on the difference between private and social returns on R&D, the primary-study authors and reviewers do not question whether the private rate-of-return estimates measure what they are supposed to measure. This is despite the fact that Griliches and Mairesse (1991a) drew attention to the limitations of the rate-of-return estimates and characterised them as only "distant reflections" of the true rate-of-return measures, for two reasons. First, the estimates are contemporaneous partial effects of the R&D intensity on output or TFP growth. This is a naïve measure because R&D projects take several years to complete and the returns on completed R&D projects may not materialise until a few years after completion. Second, the estimates are obtained from R&D intensity in one period only, in contrast to elasticity estimates based on past R&D capital stock and current R&D flows. Therefore, Griliches and Mairesse (1991a: 338) conjecture that private rates-of-return estimates obtained from microeconometric models tend to be biased downward by an order of 50%.
Taken in conjunction with this warning, our rate-of-return estimate of 11.7% indicates that the current specifications of the primal model may not be adequate for obtaining correct rates-of-return estimates. To obtain such estimates, it is necessary to model the lag structure of the R&D investment explicitly; and to estimate 'long-run' rates of return that take account of the lag structure for the relationship between R&D investment and its effects on output/TFP growth.

Multivariate meta-regression results
In what follows, we investigate how the moderating factors affect the estimates reported in primary studies. We measure the moderating factors with dummy variables that capture a specific feature of the research field vis-a-vis a reference category. 9 We use a hierarchical model specification justified by LR tests and follow the general-to-specific model estimation routine discussed in the methodology. Specific- and general-model estimates are presented side by side with a view to establishing whether the findings are stable and congruent across models.
In the upper half of Table 3, we present coefficient estimates for a range of moderating variables that capture six dimensions of the research field: (i) publication type; (ii) measurement of output and inputs; (iii) model specification; (iv) estimation method; (v) country of origin for the data; and (vi) sample-related issues. At the bottom of Table 3, LR tests indicate that the hierarchical models fit the data better than their standard linear counterparts.
Before we discuss the findings for each dimension, we first discuss the range of insignificant moderating variables. In the publication-type dimension, neither journal articles nor working papers are associated with systematically different elasticity estimates in the level or temporal dimensions. However, there is evidence that journal articles tend to report smaller rate-of-return estimates. These findings suggest that journal articles do not suffer from the 'winner's curse', which arises when journals with higher levels of perceived quality capitalise on their reputations and publish more selected results (Costa-Font et al., 2013). We also find that the country of origin for the data is usually insignificant, with the exception of US data, which we discuss below. The lack of a systematic difference between France, Germany, or the UK on the one hand and the rest of the OECD as the reference category on the other may be driven by similar levels of R&D intensity or by rate-of-return equalisation or both. With respect to the sampling dimension, we find that the median year in the time dimension of the panel data, the use of firm as opposed to industry data, and the restriction of the sample to small firms as opposed to large or mixed-size firms do not explain the variation in elasticity or rate-of-return estimates.
As indicated above, US data is associated with relatively larger elasticity estimates in the level dimension but lower rate-of-return estimates; and small-firm data is associated with relatively larger elasticity estimates in the temporal dimension. The larger elasticity estimates associated with US data are likely to be driven by higher R&D intensity in the US, which tops the OECD group throughout the time period covered in primary studies. This finding is congruent with country-level evidence reported in Soete and Verspagen (1993) and Coe and Helpman (1995), and can be explained by higher technological capacity that enables R&D-intensive firms/industries to better exploit the productivity gains from R&D investment. On the other hand, the lower rate-of-return estimates associated with US data are likely to be due to diminishing returns on R&D investment in countries where R&D intensity is higher than in the reference category.

Notes: NA indicates not applicable. ***, ** and * indicate significance at 1%, 5% and 10%, respectively. Significance of the random-effect components is based on the natural log of the standard deviations. Results for elasticity estimates in the level and temporal dimensions are based on a model with random intercepts and random slopes; those for rate-of-return estimates are based on a model with random intercepts only. Model choices are based on LR tests. Observations with undue influence are excluded, using the DFBETA routine in Stata. Models are not estimated with cluster-robust standard errors because the number of restrictions in the general model exceeds the number of clusters. However, cluster-robust estimation of the specific model is available on request. The results are the same, with the exception of the firm-level data and R&D-intensive covariates, which become insignificant in the cluster-robust estimation of the level-dimension model.
Wald Chi-square tests indicate overall significance of the hierarchical models that, according to the log-likelihood values, are also preferable to their standard linear counterparts.
Now we return to dimensions of the research field where a range of moderating variables tend to have a significant effect on the elasticity and rate-of-return estimates. Four dimensions stand out: measurement of output and inputs; sample-related issues; model specification; and estimation methods.
With respect to input/output measurement, we find that studies that measure output with value added tend to report larger elasticity and rate-of-return estimates. This is an interesting result given the lack of consensus in the literature on whether the use of value added affects reported estimates. Some studies report no difference between elasticity estimates based on value added and those based on sales corrected for intermediate inputs (Cunéo and Mairesse, 1984; Mairesse and Hall, 1994). Some others indicate that elasticity estimates based on value added tend to be smaller than those based on sales not corrected for intermediate inputs. Hall et al. (2010) indicate that the preferred measure is gross output, used in conjunction with intermediate inputs, capital and labour. However, they also cite two reasons as to why value added should be preferred to gross output or sales, particularly when the analysis is at the firm level. First, the ratio of materials to gross output can vary substantially across firms because of different degrees of vertical integration. Secondly, when output is measured by sales or gross output, the demand for intermediate inputs should be modelled explicitly, including the adjustment costs related to the stocking of materials. These conditions are usually not satisfied due to data limitations.
We argue that the larger elasticity estimates associated with the use of value added are a result to be expected. Note that the elasticity estimate is η = ρ(K/Y), where ρ is the rate of return on R&D capital, and Y and K are the sample means of output and R&D capital, respectively. When measured as value added, output is smaller than sales or gross output including the cost of materials. Therefore, the value of (K/Y) is relatively larger and hence the elasticity (η) is also relatively larger for the same rate of return (ρ). A similar explanation holds for larger rate-of-return estimates. Recall from equations (3b) and (3c) that the rate-of-return estimate (ρ) is the coefficient on R&D intensity (R/Y), which is larger when output is measured as value added rather than sales or gross output. Schankerman (1981) is the first study that quantifies the downward bias in elasticity estimates when physical capital and labour inputs are not corrected for double counting. The bias is larger the larger are the ratios of R&D capital and R&D personnel to conventional capital and labour, respectively. Our finding confirms that, unless R&D investment is capitalised in national accounts, it is good practice to deduct R&D capital and R&D labour from physical capital and total employment. 10 This is the case too when rates of return are estimated indirectly, using the elasticity estimate (η). However, correcting for double counting introduces a downward bias in rates of return estimated directly. This is to be expected because direct rate-of-return estimates are based on R&D intensity rather than R&D capital as an additional input.
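The denominator effect described above can be verified with simple arithmetic; the magnitudes below are purely illustrative, not drawn from the primary studies:

```python
# Illustrative firm-level magnitudes (assumed for the sake of the example).
rho = 0.12                  # rate of return on R&D capital
K = 50.0                    # sample-mean R&D capital stock
gross_output = 200.0
materials = 80.0
value_added = gross_output - materials   # 120.0

# Elasticity eta = rho * (K / Y): shrinking the output denominator from
# gross output to value added mechanically inflates the elasticity for
# the very same underlying rate of return.
eta_gross = rho * K / gross_output       # 0.03
eta_va = rho * K / value_added           # 0.05
assert eta_va > eta_gross
```

The same denominator effect applies to the R&D-intensity regressor (R/Y) in the rate-of-return specification, which is why both types of estimate move in the same direction when value added replaces gross output.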
Primary studies that use other methods to construct the R&D capital stock tend to report smaller elasticity estimates compared to those using the perpetual inventory method (PIM). Although this finding is limited to elasticities in the level dimension, it is worth highlighting here because the appropriate method is a contentious issue in the literature. PIM is compatible with the neo-classical theory of capital, which assumes that firms within a given industry will carry out less R&D investment in the current period if they have relatively higher levels of R&D capital stock in the preceding period. However, this assumption is usually not supported by empirical evidence, which indicates that firms that carry out above-average levels of R&D investment in the preceding period tend to do so in the current period too (Hall et al., 1986; Klette, 1994). Therefore, alternative methods for constructing R&D capital are recommended (Hall and Hayashi, 1989; Klette, 1994; Bitzer and Stephan, 2007).
Although primary studies and Hall et al. (2010) discuss the merits and demerits of different methods for constructing the R&D capital stock, no systematic evaluation has been provided of whether productivity estimates differ between studies using different methods. We address this question here and report a downward bias in elasticity estimates when primary studies use other methods instead of PIM. This is due to the fact that the majority of the alternative methods are based on R&D capital proxies such as the proportion of researchers in total employment or R&D investment per employee. 11 These measures do not correct for the empirical pattern that seems to be in contradiction with the assumption underlying the PIM, namely the positive correlation between R&D capital stocks in the preceding and current periods. Therefore, we suggest that future research should draw on the more innovative alternatives suggested by Klette (1994) rather than simple proxies for R&D capital.
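For reference, the perpetual inventory method at issue can be sketched as follows. The depreciation rate δ (15% is the conventional choice in this literature) and the pre-sample growth rate g used to initialise the stock are assumptions of the sketch:

```python
def pim_rd_capital(rd_flows, delta=0.15, g=0.05):
    """Perpetual inventory method for an R&D capital stock.

    K_0 = R_0 / (g + delta)   (steady-state initial stock, assuming
                               pre-sample R&D expenditure grew at rate g)
    K_t = (1 - delta) * K_{t-1} + R_t
    """
    stocks = [rd_flows[0] / (g + delta)]
    for r in rd_flows[1:]:
        stocks.append((1 - delta) * stocks[-1] + r)
    return stocks

stocks = pim_rd_capital([10.0, 10.0, 12.0])
# K_0 = 10/0.20 = 50.0; K_1 = 0.85*50 + 10 = 52.5; K_2 = 0.85*52.5 + 12 = 56.625
```

The alternative proxies discussed above (researcher shares, R&D per employee) bypass this accumulation step entirely, which is one reason their elasticity estimates are not directly comparable with PIM-based ones.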
Some studies use weighted variables when they estimate rates of return on R&D. The weight could be the square root of R&D intensity, firm size, or the industry's share in sectoral value added (Bartelsman, 1996; Cameron et al., 2005; Hall, 1993; and Lichtenberg and Siegel, 1991). Bartelsman et al. (1996) report that weighted estimations yield lower elasticity but higher rate-of-return estimates; but others do not provide comparative findings. Our finding corroborates Bartelsman et al. (1996) with respect to larger rate-of-return estimates, but not with respect to smaller elasticity estimates in the level or temporal dimensions. Therefore, we suggest that researchers and research users should compare rate-of-return estimates based on both weighted and unweighted specifications.
With respect to sampling issues, we find that data on R&D-intensive firms/industries is associated with larger elasticity estimates in both dimensions and with higher rates of return. This is in line with several findings in primary studies (Griliches, 1980b; Cunéo and Mairesse, 1984; Odagiri, 1983; Bartelsman, 1990) and with the conclusions derived in the narrative synthesis by Hall et al. (2010). The standard explanation in the literature is that R&D-intensive firms/industries have better technological capacity to exploit the benefits of the product and process innovations that R&D investment generates.
We also find smaller elasticity (and rate-of-return) estimates when the underlying data is government-funded R&D instead of privately-funded or total R&D. Bartelsman (1990), Lichtenberg and Siegel (1991), Mansfield (1980), Terleckyj (1980), and Wolff and Nadiri (1993) all report smaller elasticity and rate-of-return estimates for government-funded R&D. Hall et al. (2010) suggest several reasons for the difference. First, firms may underestimate the risks when they use public funds for R&D purposes. Second, public funds for R&D may be spent in areas such as health and defence, with high levels of externalities. Finally, government funding of R&D may be concentrated in a few industries (such as pharmaceuticals and IT) where returns are lower due to high levels of R&D intensity.
We are of the view that these sample-related findings point out two important issues that do not feature sufficiently in the existing literature and its reviews. The first concerns the role of market power in the case of larger productivity effects in R&D-intensive firms/industries. The question is: are larger elasticity and rate-of-return estimates in R&D-intensive firms/industries due to better technological capabilities or higher market power that enables them to extract innovation rents? The relationship between innovation and market power has been discussed extensively in the industrial organisation literature (see, Gilbert, 2006 for a review) but not in the R&D and productivity literature. The evidence from the former indicates that R&D intensity and market power are correlated positively -at least until a threshold of market power is reached (Aghion et al., 2005). Therefore, controlling for market power and/or for interactions between market power and R&D-intensity would constitute useful avenues for future research.
Secondly, and as indicated by Griliches (1979), decomposing the R&D capital into public and private components raises the issue of functional form in the primal model, where all inputs are assumed to be complements and, as such, it is legitimate to include each input separately. However, is complementarity also applicable to the components of the R&D capital itself (e.g., private versus public R&D capital, or basic versus applied R&D capital)? Or should these components be considered substitutes and summed into a single aggregate measure of R&D capital? These questions cannot be answered without testing for functional form, which is usually not done in existing studies. Hence, further research is necessary to ascertain whether productivity differentials between public and private R&D are genuine or reflect a model-specification bias.
With respect to estimation methods, the most contentious issue is how to address endogeneity and whether addressing endogeneity yields systematically different estimates. Some studies address endogeneity through a semi-reduced form of the production function. Some others use three-stage least squares (3SLS) (Verspagen, 1995) or a generalised method of moments (GMM) estimator (Mairesse and Hall, 1996; Aldieri et al., 2008; Blanchard et al., 2006; and Griffith et al., 2006). Yet, there is no consensus about how instrumental-variable estimation methods affect the reported estimates (see Hall et al., 2010). Our findings indicate that studies that implement an instrumental-variable estimation method (2SLS, 3SLS, or system/difference GMM) tend to report smaller elasticity estimates in both level and temporal dimensions. The effect is also negative but insignificant in the rate-of-return estimates. Therefore, the assumption that different estimators produce statistically similar results is at best questionable. However, it must also be noted that results from instrumental-variable estimators (including GMM) are highly contingent on specifying the correct instrument mix.
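As a sketch of why instrumenting matters here, the simulation below (the instrument, coefficients, and data-generating process are all hypothetical) shows OLS overstating the elasticity when R&D is positively correlated with the productivity shock, while a hand-rolled 2SLS recovers the true value:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                        # instrument: shifts R&D, excluded from output eq.
u = rng.normal(size=n)                        # unobserved productivity shock
rd = 0.8 * z + 0.5 * u + rng.normal(size=n)   # R&D is endogenous: correlated with u
y = 0.10 * rd + u                             # true elasticity: 0.10

X = np.column_stack([np.ones(n), rd])
Z = np.column_stack([np.ones(n), z])

# OLS is biased upward because cov(rd, u) > 0.
ols = np.linalg.lstsq(X, y, rcond=None)[0][1]

# 2SLS: project the regressors on the instruments, then regress y on the
# fitted values.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
tsls = np.linalg.lstsq(X_hat, y, rcond=None)[0][1]

print(ols, tsls)   # OLS overstates the elasticity; 2SLS is close to 0.10
```

This is consistent with the meta-regression finding that non-instrumented elasticity estimates tend to be larger, though, as noted above, the 2SLS result is only as good as the instrument's validity and strength.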
The largest number of moderating factors with significant effects is observed within the model-specification dimension. However, the findings here are partial in the sense that they relate to a single evidence base (elasticities in the level or temporal dimensions, or rate-of-return estimates only). Following the order of reporting in Table 3, we observe that studies that use enhanced panel-data models that control for panel cointegration and cross-sectional dependence report larger elasticity estimates in the level dimension. Notable among these studies are Añón Higón (2007), who utilises an autoregressive distributed lag (ARDL) model; Doraszelski and Jaumandreu (2013), who utilise a controlled Markov process that captures the impact of R&D on the evolution of productivity; and Eberhardt et al. (2013), who take account of cross-sectional dependence by using the common correlated effects pooled estimator (CCEP) or the common correlated effects mean group estimator (CCEMG). These are innovative approaches, but the larger estimates they report are significant only in the level dimension.
A large body of empirical work investigates whether R&D spillovers have indirect productivity effects. However, there is no systematic evaluation of whether the direct effects differ between studies that do or do not control for the spillover effects separately. Our finding indicates no systematic difference when studies estimate elasticities, but rate-of-return studies that control for spillovers tend to report smaller rate-of-return estimates compared to others that do not.
Unlike existing reviews, however, we argue that this finding should be taken with a pinch of salt because the way in which spillovers are measured and modelled begs important questions, as indicated emphatically in a working paper by Griliches (1991). First, primary studies tend to report contemporaneous spillover effects despite the fact that spillovers are likely to take more time than own R&D to have an effect on productivity. Secondly, it is difficult to measure the pool of external R&D capital with precision, either through weighted measures, where the weights usually consist of a technology-proximity matrix, or through unweighted measures, where the spillover effects are assumed to be symmetric across industries. Finally, primary studies tend to assume that all firms benefit from the spillover pool equally, whereas heterogeneity is more likely due to cross-firm differences in R&D intensity as a potential determinant of absorption capacity.
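The weighted spillover pool criticised above is typically constructed as S_i = Σ_j w_ij K_j, with w_ij drawn from a technology-proximity matrix. A minimal sketch, with an entirely hypothetical proximity matrix and stocks:

```python
import numpy as np

# Hypothetical R&D capital stocks for three industries.
K = np.array([100.0, 40.0, 60.0])

# Technology-proximity weights (row i = receiving industry). The unweighted
# variant sets all off-diagonal entries to 1, i.e. symmetric spillovers.
W = np.array([[0.0, 0.6, 0.2],
              [0.6, 0.0, 0.5],
              [0.2, 0.5, 0.0]])

# External R&D pool for each industry: S_i = sum over j of w_ij * K_j,
# with the own stock excluded via the zero diagonal.
S = W @ K
# e.g. S_0 = 0.6*40 + 0.2*60 = 36.0
```

The construction makes the criticisms concrete: S is contemporaneous (no diffusion lags), and every firm in industry i is assigned the same pool regardless of its own absorption capacity.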
The third partial finding indicates that primary studies that include industry dummies in their models tend to report smaller elasticity estimates in the level dimension. This is in line with Hall et al. (2010), who report that elasticities in the level dimension tend to be smaller when primary studies include industry/sector dummies in their models. In the temporal dimension, however, we find a positive coefficient. With respect to time dummies, we find that their inclusion in primary-study models is associated with larger elasticity estimates in the temporal dimension and with higher rate-of-return estimates.
Inclusion of industry/time dummies is considered as good practice that takes account of unobservable factors such as quality differences not reflected in prices (Hall et al., 2010). However, there is little or no guidance on correcting for new sources of bias that the inclusion of time/industry dummies may introduce when the latter are correlated with existing covariates. Therefore, we recommend that researchers should complement the use of time/industry dummies with estimations based on industry subsets characterised by technological proximity. Pavitt's (1984) taxonomy of technology classes can be a useful framework for such analysis.
The fourth set of partial findings indicates that primary studies that allow for variable returns to scale tend to report smaller elasticity estimates in the temporal dimension. This is in line with Hall et al. (2010), who report that studies imposing constant returns to scale tend to report larger elasticity estimates in the temporal dimension. It is also in line with a number of primary studies that report a similar pattern, including Cunéo and Mairesse (1984) and Griliches and Mairesse (1991a). However, when we checked the primary studies allowing for variable returns to scale, we found that the lower elasticity estimates they report are associated with decreasing returns to scale. Hence, the lower elasticity estimates in studies allowing for variable returns to scale are likely to be driven by decreasing returns to scale. However, Cunéo and Mairesse (1984: 378) indicate that the decreasing returns to scale they establish are somewhat implausible. Therefore, the reliability of the elasticity estimates in the temporal dimension is questionable not only because of the amplification of the measurement error discussed above, but also because of their susceptibility to whether the data at hand indicates increasing or decreasing returns to scale.

Conclusions
The work on R&D and productivity has made significant contributions to knowledge, not only in terms of empirically rich findings but also with respect to the measurement, modeling and estimation issues involved. Our review has enabled us to take stock of the findings from the extant literature and provide verifiable findings on the productivity effects of R&D investment, including elasticity estimates in the level and temporal dimensions and rates-of-return estimates. Some of our findings are in line with those reported in a number of narrative reviews and meta-analysis studies. Congruent findings include: (i) the average productivity effect of R&D investment is positive, whether it is measured as output elasticity or rate of return; (ii) controlling for double-counting of R&D capital and R&D personnel is necessary to avoid a downward bias in the elasticity estimates; (iii) elasticity estimates in R&D-intensive firms/industries are usually larger than in other firms/industries; and (iv) elasticity and rate-of-return estimates for government-funded R&D are lower compared to privately-funded R&D.
However, even with respect to congruent findings, we provide additional evidence indicating that the findings in existing reviews should be qualified. Specifically: (i) the productivity effect may be positive, but it is smaller than what is reported in existing reviews, and the 'average' rate-of-return estimate is smaller than the rate of depreciation usually assumed; (ii) controlling for double counting may be appropriate in the estimation of elasticities, but it is likely to introduce a downward bias in rate-of-return estimates; (iii) the productivity effects of R&D investment may be larger in R&D-intensive firms/industries, but the cause can be either better technological capabilities among R&D-intensive firms/industries, as suggested by existing reviews, or higher market power, as indicated in the industrial organisation literature; and (iv) the productivity effects of public R&D may be smaller than those of privately-funded R&D, but this result raises the question of functional form and is contingent on whether private and public R&D are complements rather than substitutes.
Beyond these qualifications, we also provide verifiable evidence about the effects of a range of moderating factors with respect to which the existing reviews provide either inconclusive or divergent conclusions. Here, we find that: (i) elasticity and rate-of-return estimates based on value added as the measure of output are larger than those based on gross output or sales; (ii) elasticity estimates that do not take account of endogeneity in the level or temporal dimensions are likely to be biased upward; (iii) elasticity estimates based on other methods of constructing the R&D capital stock are smaller than those based on the perpetual inventory method (PIM) and are likely to suffer from downward bias; (iv) elasticity estimates based on small-firm data are smaller than those for large firms in the temporal dimension, but this difference is likely to reflect a downward bias due to measurement error in small-firm data, which is exacerbated in the temporal dimension; (v) studies that control for spillovers as an additional source of productivity report smaller rates-of-return estimates compared to others that do not, but the existing methods through which spillovers are measured and modeled should be questioned; and (vi) variable returns to scale are associated with smaller elasticity estimates in the temporal dimension, but this association is likely to be due to decreasing returns to scale rather than relaxing the assumption of constant returns to scale.
Our findings indicate that future research could benefit from a number of innovations in modeling and estimating the R&D-productivity relationship. One innovation relates to rate-of-return estimations, where it is necessary to account for the lag structure in the R&D-productivity relationship and for the time lag involved in the completion of R&D projects. Secondly, the relatively larger productivity effects associated with R&D-intensive firms pose the question as to whether this association is due to variations in absorption capacities or market power or both. Therefore, we recommend that the primal model be augmented with covariates capturing technology-absorption capacity and market power/concentration measures, preferably with interaction terms between the latter and R&D intensity. Third, the relatively smaller elasticity and rate-of-return estimates associated with government-funded R&D may be due to concentration of public support in R&D-intensive industries with high levels of externality; but they may also reflect a model-specification bias due to the untested assumption of complementarity between public and private R&D. Therefore, it is necessary to test for complementarity before reporting separate estimates for the productivity effects of private and public R&D. Finally, we agree with Hall et al. (2010) that there is scope for innovation in the methods used for constructing R&D capital; however, the current practice is likely to constitute an additional source of bias as it consists of using simple proxies rather than alternative methods that challenge the neo-classical assumption of within-industry equalisation of R&D capital.

Rate-of-return estimates
Note: the funnel graphs are generated after excluding observations with undue influence (observations with |DFBETA| > 1).