
## Abstract

A persistent problem for hedge fund researchers is the inconsistency and diversity of style classifications within and across database providers. In this article, single-manager hedge funds from the Hedge Fund Research (HFR) and Hedgefund.Net (HFN) databases are classified on the basis of a common factor extracted using the factor axis methodology. It is assumed that the returns of all sample hedge funds are attributable to a common factor that is shared across hedge funds within one classification, and a specific factor that is unique to a particular hedge fund. In contrast to earlier research and the application of principal component analysis (Fung and Hsieh [1997, 2001]), factor axis seeks to determine how much of the covariance in the dataset is attributable to common factors (commonality). Factor axis largely ignores the diagonal elements of the covariance matrix, and orthogonal factor rotation maximizes the covariance between hedge fund return series. In an iterative approach, common factors are extracted until all return series are described by one common and one specific factor. Prior to factor extraction, the series are tested for autoregressive moving-average processes, and the residuals of such models are used in further analysis to improve the squared multiple correlations that serve as initial commonality estimates. The methodology is applied to the July 1995 to June 2010 timeframe. The results indicate that the number of distinct style classifications is reduced in comparison with the arbitrary self-selected classifications of the databases.

The following literature review is divided into two parts: The first part focuses on autocorrelation and autoregressive/moving-average processes in hedge fund returns; the second provides an overview of the current research on the application of economic and statistical factor models to hedge fund return series.

Extensive research has been conducted to identify autocorrelation and to quantify the effects of serial correlation in hedge fund time series. For example, Asness, Krail, and Liew [2001] found that hedge fund returns have significant betas on lagged market returns. They argued that stale or managed prices may prevent hedge funds that are closely correlated with the market from moving in tandem with their benchmark in the same month. Lo [2001] argued that market frictions such as transaction costs and borrowing and short-sale constraints contribute to the possibility of serial correlation that cannot be arbitraged away. Similarly, Kat and Lu [2002] found evidence of both non-normality and autocorrelation of hedge funds reporting to the Trading Advisor Selection System (TASS). Skewness, excess kurtosis, and first-order autocorrelation are found to be significant across all style classifications, while merger arbitrage, distressed securities, convertible arbitrage, and emerging markets exhibit the highest positive autocorrelation coefficients. Amenc et al. [2004] recommended the Herfindahl index or, alternatively, the Ljung-Box statistic (Ljung and Box [1978]) to determine the significance of cumulative autocorrelation coefficients and thus the liquidity risk in hedge funds.

The magnitude of price distortions due to illiquidity of assets, stale prices, and performance smoothing is not identical across different hedge fund classifications. Loudon, Okunev, and White [2006] found significant first- and second-order autocorrelation coefficients for both high-yield and mortgage-backed securities hedge fund indexes. Diversified hedge funds, on the other hand, show no evidence of serial correlation. The authors recommended eliminating the first two autocorrelations in an adjusted time series. Further evidence of conditional return smoothing can be found in work by Bollen and Pool [2008]: The return smoothing process is conditional on the current performance of the fund (i.e., performance smoothing occurs during periods of large negative returns).

The presence of autocorrelation has implications for the modelling of hedge fund returns and the use of autoregression. According to Miura, Aoki, and Yokouchi [2009], monthly individual hedge fund returns cannot be treated as independent and identically distributed (i.i.d.) observations. Rather, current observations depend on lagged return observations plus an error term and are expressed in an autoregressive model of order *p* (*AR*(*p*)). Risk-adjusted returns are higher for some hedge fund series described by an *AR*(1) process, in particular for the long–short equity and managed futures classifications.
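As an illustration of the *AR*(*p*) specification, the following minimal Python sketch fits autoregressions of order 1 to 3 by ordinary least squares and keeps the order with the lowest Schwarz information criterion. It assumes a pure *AR* model with an intercept and fits every candidate order on a common effective sample; it is not the estimation procedure of Miura, Aoki, and Yokouchi [2009], and the function names `fit_ar_ols` and `select_ar_order` are hypothetical.

```python
import numpy as np


def fit_ar_ols(r, p):
    """Fit r_t = c + phi_1 r_{t-1} + ... + phi_p r_{t-p} + e_t by OLS.

    Returns (coef, resid); coef[0] is the intercept c, coef[j] is phi_j.
    """
    r = np.asarray(r, dtype=float)
    T = len(r)
    # Design matrix: a constant column plus p lagged copies of the series
    lags = [r[p - j:T - j] for j in range(1, p + 1)]
    X = np.column_stack([np.ones(T - p)] + lags)
    y = r[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, y - X @ coef


def select_ar_order(r, max_p=3):
    """Return (p, sic, coef, resid) for the AR order with the lowest Schwarz criterion.

    Every candidate is fitted on the same effective sample r[max_p:],
    so the criteria are comparable across orders.
    """
    r = np.asarray(r, dtype=float)
    best = None
    for p in range(1, max_p + 1):
        coef, resid = fit_ar_ols(r[max_p - p:], p)
        n = len(resid)
        # SIC = n * ln(SSR / n) + (number of parameters) * ln(n)
        sic = n * np.log(resid @ resid / n) + (p + 1) * np.log(n)
        if best is None or sic < best[1]:
            best = (p, sic, coef, resid)
    return best


# Usage: simulate a hypothetical AR(1) series with phi = 0.4
rng = np.random.default_rng(0)
r = np.zeros(2000)
for t in range(1, len(r)):
    r[t] = 0.4 * r[t - 1] + rng.normal(scale=0.02)
p, sic, coef, resid = select_ar_order(r, max_p=3)
print(p, coef.round(3))
```

Fitting all orders on the same trimmed sample keeps the criteria comparable. Once moving-average terms are added, plain least squares no longer suffices and an (approximate) maximum likelihood estimator would be needed.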

Bollen and Whaley [2009] confirmed that 30% of single-manager funds and 37.7% of funds of hedge funds (FoHFs) feature significantly positive coefficients in an *AR*(1) process. They identified three sources of high autocorrelation: trading in illiquid assets and lagged response times to system shocks, deliberately inflated Sharpe ratios (performance smoothing), and performance measurement bias at the single-manager level. For commodity trading advisors (CTAs), no such evidence of illiquidity or performance manipulation can be discerned.

Getmansky, Lo, and Makarov [2004] considered a moving-average process (*MA*(*q*)) to describe hedge fund returns as a linear combination of white noise processes, where the *MA* coefficients sum to one (i.e., smoothing takes place over only the most recent *q* + 1 observations). They found *MA*(2) to be a reasonable specification for hedge fund returns (i.e., the coefficient estimates are significant for all but one of the 908 funds in the TASS subsample). Autoregressive moving-average models (*ARMA*(*p*,*q*)) are a natural extension when the current return observation depends linearly on both previous return observations and a combination of current and previous values of a white noise error term. López de Prado and Peijan [2004] implemented an autoregressive integrated moving-average (*ARIMA*(*p*,1,*q*)) model for the Hedge Fund Research (HFR) index series.^{1} In a stepwise procedure, the relevant terms are selected up to an *ARMA*(3,3) process. No *ARMA* time dependence can be identified for equity market neutral funds, market timers, short sellers, and managed futures.
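The smoothing model can be illustrated by simulating i.i.d. true returns, smoothing them with weights that sum to one, and comparing the sample first-order autocorrelation of the observed series with the value implied by the weights. Writing $\theta_j$ for the weight on the true return lagged *j* periods, an *MA*(2) smoother implies $\rho_1 = (\theta_0\theta_1 + \theta_1\theta_2)/(\theta_0^2 + \theta_1^2 + \theta_2^2)$. The weights 0.6, 0.3, and 0.1 and the return distribution below are hypothetical choices for the sketch, not estimates from Getmansky, Lo, and Makarov [2004].

```python
import numpy as np


def smooth_returns(true_returns, theta):
    """Observed returns as a weighted moving average of true returns.

    theta[j] is the weight on the true return lagged j periods; the weights
    must sum to one. The first q = len(theta) - 1 periods are dropped.
    """
    theta = np.asarray(theta, dtype=float)
    if not np.isclose(theta.sum(), 1.0):
        raise ValueError("smoothing weights must sum to one")
    # 'valid' convolution gives sum_j theta[j] * r[t - j] for t = q, ..., T - 1
    return np.convolve(np.asarray(true_returns, dtype=float), theta, mode="valid")


def theoretical_acf(theta, lag):
    """Autocorrelation at lag >= 1 implied by smoothing i.i.d. true returns."""
    theta = np.asarray(theta, dtype=float)
    if lag >= len(theta):
        return 0.0  # an MA(q) process has no autocorrelation beyond lag q
    return (theta[:-lag] @ theta[lag:]) / (theta @ theta)


def sample_acf(x, lag):
    """Sample autocorrelation of x at lag >= 1."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    return (d[lag:] @ d[:-lag]) / (d @ d)


rng = np.random.default_rng(1)
true_r = rng.normal(0.01, 0.03, size=20_000)  # i.i.d. "true" monthly returns
theta = [0.6, 0.3, 0.1]                        # hypothetical MA(2) weights
obs_r = smooth_returns(true_r, theta)
print(round(theoretical_acf(theta, 1), 3), round(sample_acf(obs_r, 1), 3))
```

The true series shows essentially no autocorrelation, while the smoothed series inherits a first-order autocorrelation of roughly 0.46 from the weights alone.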

The existing research on factor models and hedge funds can be divided into two categories: first, adaptations of the Sharpe [1992] 12-factor model and its application to mutual funds to determine hedge fund exposure to asset class or asset-based factors; and second, statistical factor models and the extraction of principal components from hedge fund time series as a dimensionality reduction technique to identify unobservable factors that drive hedge fund performance. In particular, principal component analysis (PCA) is applied to hedge fund indexes from different providers to extract the best possible one-dimensional representation of competing indexes (see, e.g., Amenc and Martellini [2001, 2003]). This section presents a brief overview of the research that is relevant in the context of this article.

Schneeweis and Spurgin [1998] used a multifactor framework to estimate the risk exposure of hedge funds and managed futures. Besides stock and bond proxies, the asset class factors include a commodity index and an intramonth volatility index to account for hedge funds and managed futures trading long and short positions. In a similar vein, Agarwal and Naik [1999] applied an asset class factor model to hedge fund returns in the spirit of the Sharpe [1992] 12-factor model. They employed a stepwise regression algorithm to limit multicollinearity between the regressors. Similarly, Liang [1999] used stepwise regression to identify factor loadings on equity, fixed-income, commodity, and cash proxies. Edwards and Caglayan [2001] employed a six-factor model including the Fama and French [1992] high-minus-low (HML) and small-minus-big (SMB) portfolios, the Carhart [1997] winners-minus-losers (WML) portfolio, and a yield curve proxy to determine alphas for trend-following hedge funds.

Some additional examples of asset class factor models to estimate hedge fund risk factors include studies by Boyson [2003] on multifactor models using standard asset indexes, HML and SMB portfolios, and a momentum factor; Teo, Koh, and Koh [2003] explaining returns in Asian hedge funds replacing U.S. equity and bond proxies with regional indexes; Harri and Brorsen [2004] and Hasanhodzic and Lo [2007] on linear six-factor models based on broad asset indexes; Capocci, Corhay, and Hübner [2005] combining the factors from previous research, including that by Agarwal and Naik [2004]; Ammann and Moerth [2008] working on asset class factor models for FoHFs; Eling [2009] comparing several factor models, including the capital asset pricing model (CAPM) and the Fama–French/momentum extension; and Eling and Faust [2010] constructing asset class factor models for emerging markets hedge funds using various equity and bond proxies.

Using lookback straddles on a number of standard asset indexes, Fung and Hsieh [2002] showed that primitive trend-following strategies (PTFSs) can explain the returns in trend-following hedge funds. The PTFSs subsume the nonlinear relationship between the hedge fund style factors and the markets in which hedge funds trade. In a similar approach to accounting for the nonlinearities in the relationship between hedge funds and risk factors, Agarwal and Naik [2004] extended their original model by incorporating option-based risk factors. Other risk factors include the Fama-French SMB and HML factors, the Carhart momentum factor, and a commodity proxy. The *R*² varies between 44% for the HFR Event Arbitrage Index and 92% for the Equity Non-Hedge Index.^{2}

An extension of asset class factor models includes asset-based style (ABS) factors, as described by Fung and Hsieh [2002]. The four equity ABS factors include the S&P 500 and emerging market index as well as small-cap–large-cap stock and value–growth stock proxies. The proxies for fixed-income hedge funds include various yield curve spreads. The risk factors for hedge funds depend on the prevailing underlying strategy: directional, event-driven, or market neutral/relative value. ABS factors aid investors in identifying (portable) alphas adjusted for systematic style risks (see also Fung and Hsieh [2004]).

Extensive research has been conducted with respect to factor models, considering the option-like payoff of hedge fund investments. Mitchell and Pulvino [2001] found that the return profile of risk arbitrage funds correlates with that of selling uncovered index put options. Kouwenberg [2003] accounted for nonlinearities by considering the exposure of hedge funds to two option strategy portfolios, the first selling one-month put options and the second selling one-month call options on the S&P 500. Jaeger and Wagner [2005] used a similar approach, employing the Chicago Board Options Exchange’s BXM index, which mimics a covered call-writing strategy on the S&P 500. An updated application of the PTFSs described by Fung and Hsieh [2001] can be found in a study by Kosowski, Naik, and Teo [2005]. The hypothetical hedge funds used by Levchenkov, Coleman, and Li [2009] are option-based market timing dynamic strategies used to compare various approaches to hedge fund return modelling.

Aragon [2007] argued that an ex ante estimation of an appropriate model to describe the systematic risk of hedge funds may be difficult to achieve. Four models are considered for hedge funds of the eVestment Hedge Fund (HFN) database: a lagged market model including contemporaneous as well as lagged terms for the value-weighted market index to account for illiquidity; a broad market model including passive equity, fixed-income, and commodity benchmarks; an option model accounting for the dynamic market risk exposure as presented by Fung and Hsieh [2001] and Agarwal and Naik [2004]; and a Fama–French four-factor model including a momentum factor and a market index. The quality of the regression models is found to be comparable across all four models.

Some researchers have argued that the relationship between hedge fund and asset class returns is inherently nonlinear. Some of the nonlinearities are attributable to hedge fund return distributions that represent significant deviations from a normal distribution: For the HFR database, Agarwal and Naik [2000a] established significant skewness and kurtosis in hedge fund return series. Kat and Miffre [2008] commented on the overstatement of hedge fund alphas and the risks from non-normality of the return distribution. Similarly, Chan et al. [2006] established higher moments and nonlinear risk exposure for classifications of the TASS database.

Seeking to mitigate estimation errors due to non-normality and nonlinearity, researchers, most notably Fung and Hsieh [1997], have resorted to statistical factor models to extract unobservable factors based on the variances and covariances of the hedge fund return series. They identified five principal components that account for 43% of the variation in 409 hedge fund return series. Although the initial style factors have no economic interpretation, regression against asset indexes and PTFSs reveals systematic exposure of the style factors to different asset classes and trading strategies. Fung and Hsieh [2002, 2004] provided extensions of their initial research.

Barès, Gibson, and Gyger [2003] also relied on principal components to determine factors driving hedge fund performance. However, they refrained from adding any economic interpretation to the factors. The estimation of principal components is in line with related research: estimation of eigenvalues and corresponding eigenvectors from the correlation matrix of the standardized hedge fund return series and subsequent calculation of principal components as a function of the original series and the eigenvector matrix. Rather than using five factors as in Fung and Hsieh [1997], Barès, Gibson, and Gyger [2003] settled on eight factors explaining 60% of the variation in returns.
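The PCA steps described above can be sketched as follows: standardize each series, eigen-decompose the correlation matrix, and form the components as linear combinations of the standardized series and the eigenvectors. This is a generic illustration on simulated data, not the code of any of the cited studies; `principal_components` is a hypothetical helper.

```python
import numpy as np


def principal_components(returns, n_components):
    """PCA of standardized return series via their correlation matrix.

    returns: T x k array (T months, k funds or indexes).
    Returns (scores, eigenvalues in descending order, share of variance explained).
    """
    X = np.asarray(returns, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each series
    eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigval)[::-1]                  # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = Z @ eigvec[:, :n_components]             # components from series and eigenvectors
    explained = eigval[:n_components].sum() / eigval.sum()
    return scores, eigval, explained


# Usage on hypothetical data: 60 months of 10 simulated return series
rng = np.random.default_rng(2)
common = rng.normal(size=(60, 1))
X = 0.7 * common + 0.5 * rng.normal(size=(60, 10))
scores, eigval, explained = principal_components(X, n_components=5)
print(round(explained, 3))
```

Because the eigenvalues of a correlation matrix sum to the number of series, the explained share is simply the sum of the retained eigenvalues divided by *k*.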

Christiansen, Madsen, and Christensen [2004] extracted five principal components from the hedge fund return series of the Center for International Securities and Derivatives Markets (CISDM) database in a fashion similar to Fung and Hsieh [1997], allowing for a statistical classification of the dominating components in various strategies. They regressed the five components against broad market indexes and passive option strategies (compare Agarwal and Naik [2000b]). For 185 funds with a continuous track record of 37 months, Christiansen, Madsen, and Christensen [2004] found that the first five components explain more than 60% of the total variance. Although the first component explains a larger proportion of the total variance than in Fung and Hsieh [1997], this is mainly due to the smaller sample of funds.

Amenc and Martellini [2001, 2003] considered principal components in the context of constructing equally weighted portfolios of competing hedge fund indexes. The portfolio weights are chosen so as to capture the largest possible fraction of information contained in the original index series. In a similar approach, Goltz, Martellini, and Vaissié [2007] constructed factor-replicating portfolios from a small number of individual hedge funds. They extracted the first *k* principal components and formed style portfolios from hedge funds that are highly correlated with the *k*th principal component. The resulting continuously rebalanced portfolios can be thought of as investable pure style indexes.

Kugler, Henn-Overbeck, and Zimmermann [2010] tested the consistency of style classifications across database providers. Using PCA, they identified considerable heterogeneity of index returns within the same classification. They analyzed the series of 78 hedge fund indexes pooled from seven different index providers. Indexes for the same classification from different providers show similar loadings for the first five components, indicating homogeneous style characteristics across different providers. However, the cumulative proportion of variance explained by the first five components is below 80%. Additionally, no comments are made with respect to the somewhat arbitrary attribution of single-manager funds to style classifications.^{3}

So far, other statistical factor models, such as principal axis, and their application to alternative investments have received little attention in the existing literature. One possible explanation is that principal axis accounts for commonalities across hedge fund return series but ignores the specific return component that comes about as a result of highly sophisticated trading strategies. However, it will be demonstrated that the proposed method yields unbiased estimators of a common strategic theme that may be used to classify single-manager hedge funds on the basis of their covariances with other hedge funds.

The remainder of this article is structured as follows. The second section discusses the data sources and the sampling process. The third describes the factor model and algorithm used to eliminate first- and second-order autocorrelation. Here, the rationale for choosing principal axis over PCA is given. The fourth section delivers the empirical results and gives brief explanations of the style classifications. The fifth section concludes the article.

**DATA SOURCES AND SAMPLING**

Existing research into statistical factor models for hedge funds is focused primarily on the TASS, HFR, and CISDM databases. In this article, all samples are created from hedge funds reporting to one of two databases: HFR and HFN. The HFR database is complemented by the HFN for two reasons. First, only some hedge funds report to both databases; as a consequence, the sample size is increased for the combined samples of the HFR and HFN databases. Second, the HFN database includes managed futures/CTAs, whereas the HFR does not.^{4} To account for attrition rates and survivorship bias, defunct or derelict funds formerly reporting to HFR are included in the analysis in the form of the HFR graveyard database.^{5}

The period of interest is July 1995 to June 2010. The timeframe is selected so as to include the demise of Long-Term Capital Management (LTCM) in 1998, the subsequent period of economic recovery, and the subprime lending and banking crisis of 2007. This should allow the results to be tested across different economic cycles. First, it is postulated that classification according to common factor loadings yields meaningful strategic clusters of hedge funds. Second, it is postulated that this classification is robust with respect to macroeconomic impact factors (i.e., all hedge funds within a style classification are expected to react in a similar fashion to external shocks).

To test for the persistency of the results, we decided to conduct the analysis for 180 rolling-window estimation periods. The analysis was repeated using different timeframes to determine the common and specific factors of hedge funds. For example, the factor composition of hedge funds in July 2010 was estimated using three different timeframes:

• July 2005 to June 2010 timeframe (*T* = 60 observations)

• January 2003 to June 2010 timeframe (*T* = 90 observations)

• July 2000 to June 2010 timeframe (*T* = 120 observations)

All findings were replicated for the three rolling-window estimation periods. The results from the estimation periods were used as forecasts for the composition of common factor portfolios.
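The window arithmetic can be checked by counting back *T* − 1 months from the last month of each window. The helper below and its (year, month) representation are illustrative assumptions.

```python
def window_start(end_year, end_month, T):
    """First (year, month) of a T-month window that ends in (end_year, end_month)."""
    # Map months onto one integer axis, step back T - 1 months, convert back
    idx = end_year * 12 + (end_month - 1) - (T - 1)
    return idx // 12, idx % 12 + 1


for T in (60, 90, 120):
    print(T, window_start(2010, 6, T))
# 60 (2005, 7)
# 90 (2003, 1)
# 120 (2000, 7)
```

The three outputs match the July 2005, January 2003, and July 2000 start months listed above.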

The classification according to principal factor extraction was expected to be an improvement over both the self-selected classification of hedge fund managers in databases and the results from PCA and classification according to the first *k* principal components. The common factor loadings were expected to be statistically significant: The specific return component, while significant, does not fully explain the variation in the performance of the individual hedge fund.

The minimum number of observations required for hedge funds entering the sample was *T* = 63.^{6} The extracted factors explaining the commonalities across hedge funds varied according to the number of observations *T* included. It is reasonable to assume that longer time series yield classifications that are both robust and persistent. Similarly, larger samples of hedge funds are expected to require an increasing number of extracted factors explaining the covariance between the sample hedge funds. For example, the initial sample for the July 1990 to June 1995 timeframe includes 55 single-manager hedge funds meeting the selection criteria outlined subsequently. For the final July 1995 to June 2010 window, that number increases to 1,500 funds. We acknowledge that the hedge fund industry is growing and that an increasing number of hedge funds warrants additional style classifications as represented by a larger number of extracted common factors.

Factor axis and rotation were assumed to reveal style classifications that prevail over time. Assuming that style classifications are persistent, existing hedge funds can be expected to fall within the same classification as in previous periods.^{7} For the timeframes under observation, some evolutionary development of the prevailing style classifications was expected as the number of hedge funds in the sample increased (i.e., as new funds enter the sample, they may warrant their own classification). The decision to include an additional classification was unbiased, because the selected procedure relies upon parallel analysis to decide on the number of common factors extracted. Principal axis was deemed appropriate if (1) some persistence was observed with respect to hedge funds belonging to one classification or another throughout their reporting history, and (2) the main style classifications prevailed throughout different estimation windows.
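Horn's parallel analysis, named in the Method section as the rule for the number of factors, retains a factor only while its eigenvalue exceeds the corresponding eigenvalue obtained from random data of the same dimensions. The sketch below is a minimal version under stated assumptions: it works with the unreduced correlation matrix and uses the mean random eigenvalue as the benchmark (common variants use the reduced correlation matrix or the 95th percentile), and the two-factor data set is simulated and hypothetical.

```python
import numpy as np


def parallel_analysis(returns, n_sims=200, seed=0):
    """Horn's parallel analysis on a T x k array of return series.

    Returns (n_factors, observed_eigenvalues, random_benchmark), eigenvalues descending.
    """
    X = np.asarray(returns, dtype=float)
    T, k = X.shape
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    rng = np.random.default_rng(seed)
    simulated = np.empty((n_sims, k))
    for s in range(n_sims):
        noise = rng.standard_normal((T, k))  # uncorrelated data of the same size
        simulated[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    benchmark = simulated.mean(axis=0)
    # Count leading eigenvalues until the first one that fails to beat its benchmark
    n_factors = 0
    for obs, bench in zip(observed, benchmark):
        if obs <= bench:
            break
        n_factors += 1
    return n_factors, observed, benchmark


# Usage: 20 hypothetical funds driven by two factors (10 funds each), T = 120
rng = np.random.default_rng(1)
factors = rng.standard_normal((120, 2))
B = np.zeros((20, 2))
B[:10, 0] = 0.8
B[10:, 1] = 0.8
X = factors @ B.T + 0.6 * rng.standard_normal((120, 20))
n_factors, observed, benchmark = parallel_analysis(X)
print(n_factors)
```

Because the benchmark comes from data of the same *T* and *k*, a larger cross-section of funds does not automatically produce more retained factors; it does so only when the additional funds bring genuinely new common variation.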

The information extracted from the two databases included the monthly return on investment (ROI), main investment strategy, and substrategy. In addition, for the HFR graveyard database, the date that the fund stopped reporting to the database was included. For both databases, ROI was defined as change in net asset value during the month, assuming the reinvestment of any distributions on the fund’s reinvestment date, divided by net asset value at the beginning of the month. In general, returns were reported net of management fees, incentive fees, or other expenditures. Net-of-fee performance was calculated and provided by the fund managers. Reported returns were assumed to be an accurate representation of investors’ realized returns.

The sampling criteria were as follows:

• Single-manager funds only

• Continuous track records of *T* = 63, *T* = 93, or *T* = 123 return observations; no inconsistencies or performance gaps

• USD as returns currency

• At least monthly reporting frequency

• Reporting style: net-of-all-fees

Double-reporting funds within as well as between databases were accounted for. Hedge funds reporting to a database may elect to include time series for onshore and offshore investment vehicles separately. Although the after-tax return may differ with respect to investor residency, the ROI reported to the database is identical for both onshore and offshore funds. Additionally, some managers offer several classes of the same investment strategy that differ with respect to the underlying currency (e.g., USD, EUR, or GBP). To avoid accounting twice for the same investment fund, the analysis was limited to funds reporting in USD.^{8} Lastly, a fund manager may offer several classes of the same basic investment strategy (e.g., market neutral) that differ with respect to hedge overlay and leverage. In addition, similar series of the same funds may be offered as different share classes for regulatory and accounting reasons. Where these funds produced identical time series, one of the two series was eliminated.

In the combined sample, if funds were found to report to both databases, only one of the two records was retained. The HFR database was used as the primary database, and the samples were complemented with hedge funds reporting only to the HFN database. In the event that duplicate hedge funds belonged to different strategy classifications in the HFR and HFN databases, the record was removed from the HFN classification, which may lead to fewer funds of that particular strategy entering the joint sample. The combined HFR and HFN database includes 26,300 funds, 18,471 of which are single-manager funds. Between April 1985 and June 2010, after accounting for double reporters, 3,204 funds provided a minimum continuous performance track record of *T* = 63 while fulfilling the sampling criteria previously outlined.
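Keeping one record when two funds report identical return series, with HFR funds listed first so that the HFR record is the one retained, can be sketched in plain Python as follows. Matching on identical returns is a simplifying assumption, since the article does not spell out how double reporters were identified, and the fund names are hypothetical.

```python
def drop_identical_series(series_by_fund):
    """Keep the first fund of each group of funds with identical return series.

    series_by_fund: dict mapping fund name -> sequence of monthly returns,
    in priority order (e.g. all HFR funds before all HFN funds).
    """
    seen = set()
    kept = {}
    for name, returns in series_by_fund.items():
        key = tuple(returns)  # identical series produce identical keys
        if key not in seen:
            seen.add(key)
            kept[name] = returns
    return kept


hfr = {"hfr_fund_a": [0.012, -0.004, 0.021], "hfr_fund_b": [0.003, 0.007, -0.010]}
hfn = {"hfn_fund_a": [0.012, -0.004, 0.021],  # same series as hfr_fund_a
       "hfn_fund_c": [0.015, 0.002, 0.004]}
kept = drop_identical_series({**hfr, **hfn})  # dicts keep insertion order
print(list(kept))
# ['hfr_fund_a', 'hfr_fund_b', 'hfn_fund_c']
```

The same rule covers onshore/offshore vehicles and share classes that produce identical time series within a single database.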

All classifications and substrategies for HFR and HFN are displayed in Exhibits 1 and 2. The exhibits show the *reported* main strategy and substrategy that are expected to deviate from the classification as a result of factor analysis. It is evident that the self-reported classifications are diverse and inconsistent across database providers. For the HFN database, the differentiations between market-neutral and directional hedge funds, as well as the broader strategic themes, are included to facilitate comparison between the databases.

**METHOD**

The factor model for hedge fund *i* at time *t* takes the following form:

$$R_{it} = \beta_{i1} f_{1t} + \beta_{i2} f_{2t} + \cdots + \beta_{im} f_{mt} + \varepsilon_{it} \tag{1}$$

Here, $R_{it}$ is the return of hedge fund *i* at time *t*, $\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}$ are the fund-specific factor loadings, and $f_{1t}, f_{2t}, \ldots, f_{mt}$ are the realizations of the common factors at time *t*. The common factor model postulates that the return of a hedge fund at time *t* is linearly dependent on a number of unobserved random variables $\mathbf{f}_t = (f_{1t}, f_{2t}, \ldots, f_{mt})$ and additional noise $\varepsilon_{it}$ (specific errors), which may be different for every variable (i.e., hedge fund) under observation. Here, $\boldsymbol{\varepsilon}_i = (\varepsilon_{i1}, \varepsilon_{i2}, \ldots, \varepsilon_{iT})$ are i.i.d. error terms with zero mean and finite variance that may not be the same for all *k* variables. In matrix terms, the orthogonal factor model from Equation (1) can be rephrased for all hedge funds in the sample as

$$\mathbf{R} = \mathbf{F}\mathbf{b} + \mathbf{e} \tag{2}$$

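where **R** is the *T* × *k* matrix of hedge fund returns, **b** is the *m* × *k* matrix of factor loadings, the *T* × *m* dimensional matrix **F** represents the common factors, and **e** is a *T* × *k* dimensional matrix containing the specific factors and their contribution to performance. The factor model is an orthogonal factor model if **F** has zero mean, the common factors are uncorrelated, and the asset-specific factor **e** is a white noise process that is uncorrelated with the common factors. To formalize the assumptions, with $\mathbf{f}_t$ and $\boldsymbol{\varepsilon}_t = (\varepsilon_{1t}, \ldots, \varepsilon_{kt})$ denoting the *t*th rows of **F** and **e**:

$$E(\mathbf{f}_t) = \mathbf{0} \tag{3}$$

$$\operatorname{Cov}(\mathbf{f}_t) = \mathbf{I}_m \tag{4}$$

$$E(\boldsymbol{\varepsilon}_t) = \mathbf{0}, \qquad \operatorname{Cov}(\boldsymbol{\varepsilon}_t) = \boldsymbol{\Psi} = \operatorname{diag}(\psi_1, \ldots, \psi_k) \tag{5}$$

$$\operatorname{Cov}(\mathbf{f}_t, \boldsymbol{\varepsilon}_t) = \mathbf{0} \tag{6}$$

Assuming $\hat{\boldsymbol{\Psi}}$ is an unbiased estimate of the error covariance matrix and **P** is the correlation matrix of the standardized return series, then

$$\mathbf{P} - \hat{\boldsymbol{\Psi}} = \mathbf{b}'\mathbf{b} \tag{7}$$

is the reduced correlation matrix, with the commonalities $h_i^2 = 1 - \hat{\psi}_i$ in place of the unities on its diagonal:

$$\mathbf{P} - \hat{\boldsymbol{\Psi}} = \begin{pmatrix} h_1^2 & r_{12} & \cdots & r_{1k} \\ r_{21} & h_2^2 & \cdots & r_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ r_{k1} & r_{k2} & \cdots & h_k^2 \end{pmatrix} \tag{8}$$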

Squared multiple correlations (SMCs) are used as a lower bound for the commonality estimates in the diagonal of the reduced correlation matrix. The squared multiple correlations can be calculated from the inverse of the correlation matrix with unities in the diagonal as

$$\mathrm{SMC}_i = 1 - \frac{1}{r^{ii}} \tag{9}$$

where $r^{ii}$ is the *i*th diagonal element of the inverse correlation matrix $\mathbf{P}^{-1}$. The principal factors are obtained from the spectral decomposition of the reduced correlation matrix; they are its eigenvectors rescaled by the square roots of the corresponding eigenvalues. Each eigenvalue equals the sum of the squared loadings on the corresponding principal factor and indicates the covariance accounted for by that factor. Note that where the diagonal elements are less than unity (as with the common factor approach), there will be fewer factors than variables. The QR method is used to solve for eigenvalues and eigenvectors. A Monte Carlo approach, Horn’s parallel analysis (Horn [1965]), is used to determine the number of nontrivial factors to extract. The factor rotation criterion is Varimax.^{9}
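A minimal sketch of one pass of principal axis factoring follows: the squared multiple correlations from the inverse correlation matrix replace the unities on the diagonal, and the loadings are the leading eigenvectors of this reduced matrix scaled by the square roots of their eigenvalues. It assumes NumPy's symmetric eigensolver (`numpy.linalg.eigh`) in place of the QR method named above, does not iterate the commonality estimates, and takes the number of factors as given; the function names are hypothetical.

```python
import numpy as np


def squared_multiple_correlations(P):
    """SMC_i = 1 - 1 / r^{ii}, with r^{ii} the i-th diagonal element of inv(P)."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(P))


def principal_axis(P, n_factors):
    """One-pass principal axis factoring with SMCs as initial commonalities.

    Returns (loadings, eigenvalues); loadings is k x n_factors.
    """
    reduced = np.array(P, dtype=float)  # copy, so P itself is left unchanged
    np.fill_diagonal(reduced, squared_multiple_correlations(P))
    eigval, eigvec = np.linalg.eigh(reduced)           # symmetric eigensolver
    order = np.argsort(eigval)[::-1][:n_factors]       # largest eigenvalues first
    # Rescale eigenvectors; clipping guards against small negative eigenvalues
    loadings = eigvec[:, order] * np.sqrt(np.clip(eigval[order], 0.0, None))
    return loadings, eigval[order]


P = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
loadings, eigval = principal_axis(P, n_factors=1)
print(squared_multiple_correlations(P).round(3), loadings.round(3).ravel())
```

For two variables with correlation *r*, the SMC reduces to *r*², which is a quick sanity check on the formula.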

All hedge funds in the sample were initially classified according to their maximum factor loading. Principal factor estimation was repeated for every subsample of hedge funds within a classification. In an iterative process, hedge funds within one classification were further categorized by their new rotated absolute factor loadings. This process was repeated until hedge funds within one category loaded on one factor only (and all factor loadings were positive). The resulting model for hedge fund *i* at time *t* was then of the following form:

$$R_{it} = \beta_i f_t + \varepsilon_{it} \tag{10}$$

Within each classification, hedge funds loaded on one common factor $f_t$ only. The unique return contribution $\varepsilon_{it}$ can be large and significant. However, it represents a unique factor that has no significant impact on other hedge funds within the same classification. That is, all hedge funds within a classification can now be described by exactly one common and one specific factor.
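The iterative splitting can be sketched as a recursion. To keep the example self-contained, factor extraction is passed in as a callable (`extract_loadings`, a hypothetical name that would wrap principal axis factoring, parallel analysis, and Varimax rotation), and the stopping rule shown (stop once every fund in a group has its largest absolute loading on the same factor with the same sign) is one reading of the procedure, not a verified reproduction of it. The toy extractor in the usage is likewise hypothetical.

```python
import numpy as np


def assign_by_max_loading(loadings):
    """Label each fund (row) by (factor index, sign) of its largest absolute loading."""
    L = np.asarray(loadings, dtype=float)
    j = np.argmax(np.abs(L), axis=1)
    sign = np.sign(L[np.arange(L.shape[0]), j]).astype(int)
    return list(zip(j.tolist(), sign.tolist()))


def classify(fund_ids, returns, extract_loadings, min_size=3, path=()):
    """Recursively split funds until each group shares one dominant factor and sign.

    returns is a T x k array whose columns follow fund_ids; the result maps a
    path of (factor, sign) labels to the funds in that final group.
    """
    if len(fund_ids) < min_size:  # too few funds to re-run factor extraction
        return {path: list(fund_ids)}
    groups = {}
    for col, label in enumerate(assign_by_max_loading(extract_loadings(returns))):
        groups.setdefault(label, []).append(col)
    if len(groups) == 1:  # every fund shares the same dominant factor and sign
        return {path: list(fund_ids)}
    result = {}
    for label, cols in groups.items():
        result.update(classify([fund_ids[c] for c in cols], returns[:, cols],
                               extract_loadings, min_size, path + (label,)))
    return result


# Usage with a toy extractor: a single "factor" = correlation with the first column
def toy_extractor(x):
    return np.corrcoef(x, rowvar=False)[:, [0]]


rng = np.random.default_rng(3)
base = rng.normal(size=120)
noise = 0.1 * rng.normal(size=(120, 6))
data = np.column_stack([base, base, base, -base, -base, -base]) + noise
print(classify(list("abcdef"), data, toy_extractor))
```

Each recursive call works on a strict subset of funds, so the recursion always terminates; the `min_size` guard stops factor extraction on groups too small to support it.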

In a final step, all hedge funds within the same classification were combined into a portfolio (the common factor portfolio [CFP]). Different weightings were applied to the portfolio constituents to maximize the correlation of the portfolio return with the factor scores from principal factor analysis. Because monthly returns differ across the hedge funds that make up a factor portfolio, the weight of each constituent gradually drifts away from its optimum. To create time series for a portfolio of hedge funds, the portfolio constituents and the portfolio itself therefore had to be rebased at regular intervals. All portfolios were rebalanced every 12 months. The resulting portfolio return may be viewed as an equally balanced index return series representative of a particular style classification.
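The drift-and-rebalance mechanics of a CFP can be sketched as below. The sketch assumes equal target weights and simple monthly returns; the correlation-maximizing weights described above are not reproduced, and `cfp_returns` is a hypothetical helper.

```python
import numpy as np


def cfp_returns(returns, rebalance_every=12):
    """Monthly returns of an equally weighted portfolio rebalanced periodically.

    returns: T x n array of simple monthly returns of the constituent funds.
    """
    R = np.asarray(returns, dtype=float)
    T, n = R.shape
    out = np.empty(T)
    w = np.full(n, 1.0 / n)
    for t in range(T):
        if t % rebalance_every == 0:
            w = np.full(n, 1.0 / n)  # reset to equal weights
        out[t] = w @ R[t]            # portfolio return in month t
        w = w * (1.0 + R[t])         # holdings grow with each fund's return
        w = w / w.sum()              # convert holdings back to weights
    return out


# Usage: two hypothetical funds returning 10% and 0% every month
two = np.array([[0.10, 0.00],
                [0.10, 0.00]])
print(cfp_returns(two), cfp_returns(two, rebalance_every=1))
```

Between rebalancing dates the weights follow each fund's cumulative performance, so the portfolio behaves like a buy-and-hold position within each 12-month block; with monthly rebalancing the weights never drift.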

Serial correlation is a significant problem when using factor models in hedge fund return series and must be removed from the original series prior to factor estimation. The Ljung-Box statistic (Ljung and Box [1978]) was used to determine the significance of cumulative autocorrelation coefficients and thus the liquidity risk in hedge funds. An autoregressive moving-average *ARMA*[*p*,*q*] model was estimated for all hedge fund series used in further analysis. The appropriate model was selected on the basis of the Schwarz Information Criterion (Schwarz [1978]). Based on the observed autocorrelation and partial autocorrelation coefficients, the highest-order model considered was an *ARMA*[3,3] model. The residuals from ordinary least squares regression were used to estimate the squared multiple correlations in matrix (8).
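The Ljung-Box statistic $Q(h) = n(n+2)\sum_{k=1}^{h} \hat\rho_k^2/(n-k)$ can be computed directly from the sample autocorrelations, as in the NumPy sketch below (a hypothetical helper, not the software used in the article). Under the null hypothesis of no serial correlation, $Q(h)$ is approximately chi-squared with *h* degrees of freedom; when it is applied to the residuals of a fitted *ARMA*[*p*,*q*] model, the usual convention reduces the degrees of freedom to *h* − *p* − *q*. P-values can be obtained, for example, with `scipy.stats.chi2.sf`.

```python
import numpy as np


def ljung_box_q(x, max_lag):
    """Ljung-Box Q(h) for h = 1..max_lag from the sample autocorrelations of x."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    d = x - x.mean()
    denom = d @ d
    lags = np.arange(1, max_lag + 1)
    rho = np.array([(d[k:] @ d[:-k]) / denom for k in lags])
    # Cumulative sum yields Q(1), Q(2), ..., Q(max_lag) in one pass
    return n * (n + 2) * np.cumsum(rho ** 2 / (n - lags))


# Usage: hypothetical white noise versus a smoothed version of the same noise
rng = np.random.default_rng(4)
e = rng.normal(size=180)
smoothed = np.convolve(e, [0.6, 0.3, 0.1], mode="valid")
print(ljung_box_q(e, 12)[-1].round(1), ljung_box_q(smoothed, 12)[-1].round(1))
```

Returning the whole sequence of cumulative statistics makes it easy to report significance at lags 1 through 12, as in the exhibits discussed below.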

**EMPIRICAL RESULTS**

Classical linear regression models require the regression error term to be i.i.d. and approximately normal, or

$$\varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2) \tag{11}$$

Similarly, some estimation techniques (e.g., maximum likelihood) require explicit assumptions about the frequency distribution. It is easy to see from Exhibit 3 that these assumptions are unlikely to hold in the context of the distribution of hedge fund returns. Exhibit 3 gives the first four moments of the frequency distribution of returns as well as a parametric and nonparametric test statistic to estimate the deviation from the normal distribution function. The results are for the reported strategic classifications from the HFR and HFN databases. One upshot of using the factor axis methodology is that the unique component of the model subsumes the abnormal return component of hedge funds, and thus non-normality of the original series is less of a concern.

Autocorrelation coefficients and partial autocorrelation coefficients and their significance are provided in Exhibit 4. The Ljung-Box portmanteau test gives the significance at cumulative lags. Exhibit 5 displays the results after correcting for the autoregressive and moving-average processes described in the Method section (i.e., testing the residual series of a sufficiently defined *ARMA*[*p*,*q*] model). It is evident that most of the autocorrelation at the first three lags is removed. In addition, the number of sample hedge fund return series exhibiting cumulative significance of serial correlation at lags 4 through 12 is substantially reduced.

After correcting for autocorrelation, the sample of hedge funds was entered into the factor model.^{10} The factor axes were rotated so that most hedge funds load heavily on only one factor, and all hedge funds in the sample were classified according to their maximum factor loading. Hedge funds within a classification can be expected to share a higher degree of commonality (i.e., display higher covariance) with one another than with hedge funds loading on different factors. However, despite the increased commonality, more than one factor may be required to describe the return series of hedge funds within a classification. Hedge funds were therefore further classified on the basis of newly extracted factors. This process was repeated until all hedge funds within a classification loaded on one common factor only (and all factor loadings were positive).
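The extraction and classification step can be illustrated with a minimal NumPy sketch: iterated principal axis factoring of the correlation matrix (squared multiple correlations as initial communalities), Kaiser's varimax rotation, and assignment of each fund to the factor on which its absolute loading is largest, with separate portfolios for positive and negative loadings (so *k* factors give at most 2*k* portfolios). Convergence tolerances, the clipping of communalities at 1, and all function names are assumptions; the recursive re-extraction within each group is not shown.

```python
import numpy as np


def principal_axis(R, k, max_iter=500, tol=1e-6):
    """Iterated principal axis factoring of correlation matrix R with k factors."""
    # Initial communalities: squared multiple correlations (SMC).
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                 # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        top = np.argsort(vals)[::-1][:k]         # k largest eigenvalues
        L = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        h2_new = np.clip(np.sum(L**2, axis=1), 0.0, 1.0)
        done = np.max(np.abs(h2_new - h2)) < tol
        h2 = h2_new
        if done:
            break
    return L


def varimax(L, max_iter=100, tol=1e-8):
    """Kaiser's varimax rotation of a loading matrix L (p variables x k factors)."""
    p, k = L.shape
    T = np.eye(k)
    crit = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        G = L.T @ (Lr**3 - Lr @ np.diag(np.sum(Lr**2, axis=0)) / p)
        u, s, vt = np.linalg.svd(G)
        T = u @ vt
        crit_old, crit = crit, s.sum()
        if crit_old > 0 and crit < crit_old * (1 + tol):
            break
    return L @ T


def classify(L_rot):
    """Portfolio id per fund: 2 * (factor with largest |loading|) + (1 if that loading < 0)."""
    idx = np.argmax(np.abs(L_rot), axis=1)
    neg = L_rot[np.arange(len(idx)), idx] < 0
    return 2 * idx + neg.astype(int)
```

In this labeling, four factors yield at most eight portfolio ids, matching the count given in the worked example that follows.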

An example is given for the July 1993 to June 1998 estimation period (*T* = 60). The initial 261 funds of the sample are described by four common factors and one factor unique to each hedge fund (the error term).^{11} Because funds can display positive as well as negative factor loadings (the Varimax criterion tends to drive the factor loadings toward −1, 0, or 1), the maximum number of factor portfolios is eight. Of the original sample, 176 hedge funds were found to load most heavily on the first factor. Second-level factor analysis yielded three common factors driving the performance of these 176 funds, which were sorted into subsamples of 100, 44, and 32 funds. A third-level factor rotation was required for the third group to yield the final statistical clusters (19 and 13 funds). In the end, every hedge fund was attributed to a group of funds sharing only one common factor.

The same principle of iterative factor extraction was applied to all subsamples of the initial factor classification. A total of 10 statistical clusters were required to identify homogeneous classifications across the initial sample of 261 hedge funds. Most of these clusters contained very few funds, which were not necessarily representative of a dominant strategic theme. Hence, only the largest clusters were retained, and the funds from the smaller clusters were attributed to the retained clusters.^{12} A statistical cluster was regarded as significant if, for a given estimation window, it represented at least 30 single-manager hedge funds. The statistical clusters are henceforth referred to as CFPs.
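Endnote 12 describes how funds from discarded small clusters were reattached. A hedged sketch of that rule: clusters with fewer than 30 funds are dropped, and each of their funds is regressed on the factor score of every retained cluster and assigned where the slope t-statistic is largest, positive, and above a critical value (1.96 here, an assumption). The article's further check that inclusion must not require an additional common factor is omitted, and all names are hypothetical.

```python
import numpy as np


def retain_and_reassign(labels, returns, scores, min_size=30, t_crit=1.96):
    """Keep clusters with >= min_size funds; reattach funds from smaller ones.

    labels:  (n_funds,) integer cluster label per fund
    returns: (T, n_funds) residual return series
    scores:  dict label -> (T,) factor score of a retained cluster
    A fund from a dropped cluster joins the retained cluster with the largest
    positive slope t-statistic above t_crit; otherwise it is marked -1.
    """
    labels = np.asarray(labels)
    kept = [int(c) for c in np.unique(labels) if np.sum(labels == c) >= min_size]
    out = np.where(np.isin(labels, kept), labels, -1)
    T = returns.shape[0]
    for i in np.flatnonzero(out == -1):
        y = returns[:, i]
        best_c, best_t = -1, t_crit
        for c in kept:
            X = np.column_stack([np.ones(T), scores[c]])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ beta
            s2 = resid @ resid / (T - 2)          # OLS residual variance
            t = beta[1] / np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
            if t > best_t:
                best_c, best_t = c, t
        out[i] = best_c
    return out
```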

After the initial analysis, the estimation window rolls forward one month (in the preceding example, to August 1993 through July 1998). After allowing hedge funds to exit the sample (e.g., defunct funds or funds that stopped reporting), factor extraction was repeated for the existing CFPs, again until all hedge funds within one portfolio loaded on one common factor only. If the hedge funds within a CFP required more than one common factor to explain their commonalities, a new CFP was created for each such factor (as before, CFPs were retained only if they represented a significant number of hedge funds from the sample). New funds entering the sample were attributed to the existing CFPs on the basis of their correlation with the existing factor scores. New factor portfolios were created if the resulting common factor was orthogonal to the existing factors and the CFP represented a significant number of return series.
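The rule for new entrants can be sketched in the same spirit: each entering fund is attached to the CFP whose current factor score it correlates with most strongly. The minimum correlation of 0.5 and the function name are hypothetical; the article does not state a threshold.

```python
import numpy as np


def assign_new_entrants(new_returns, scores, min_corr=0.5):
    """Attach each entering fund to the CFP whose factor score it correlates with most.

    new_returns: (T, m) residual returns of the entering funds
    scores:      dict label -> (T,) current factor score of each CFP
    min_corr:    hypothetical minimum correlation; below it the fund stays unassigned (None)
    """
    out = []
    for j in range(new_returns.shape[1]):
        best, best_r = None, min_corr
        for label, f in scores.items():
            r = np.corrcoef(new_returns[:, j], f)[0, 1]
            if r >= best_r:
                best, best_r = label, r
        out.append(best)
    return out
```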

Rolling-window estimation periods were used to create unbiased estimators of the CFP constituents. The results from factor extraction using the previous 60, 90, or 120 observations predict the CFP composition one month ahead; for example, the July 1993 to June 1998 estimation period provides the CFP constituents for July 1998.^{13} For 180 windows, there are up to 180 estimates of the composition of each CFP. The initial estimate for July 1998 yielded three CFPs as strategic representations of the 261 hedge funds in the sample. As time progresses and more hedge funds enter the sample, additional portfolios are required to capture the increasing strategic diversity of the funds. For the final June 2010 estimate, the initial CFPs were split several times to yield seven CFPs (for *T* = 60). The number of CFPs varies slightly with the estimation window: eight CFPs for *T* = 90 and six CFPs for *T* = 120. Consequently, not all nine distinct strategic classifications are represented in the results for each of the three estimation windows (see Exhibit 6).
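The mapping from rolling estimation windows to forecast months is simple month arithmetic. The standard-library sketch below (function names hypothetical) generates (start, end, forecast) months for a window length *T*; with a first window starting in July 1993 and *T* = 60 it reproduces the example in the text.

```python
def add_months(year, month, k):
    """Shift a (year, month) pair by k months."""
    idx = year * 12 + (month - 1) + k
    return idx // 12, idx % 12 + 1


def rolling_windows(first_start, n_windows, T):
    """Yield (estimation_start, estimation_end, forecast_month) as (year, month) tuples.

    Each window spans T months; the factor results of a window are used as the
    forecast of CFP membership for the month immediately after it.
    """
    for w in range(n_windows):
        start = add_months(*first_start, w)
        end = add_months(*start, T - 1)
        yield start, end, add_months(*end, 1)
```

For example, `next(rolling_windows((1993, 7), 1, 60))` gives `((1993, 7), (1998, 6), (1998, 7))`.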

All returns within a CFP were weighted to maximize correlation with the factor score, and the components were rebalanced annually to create style indexes (i.e., indexes comprising hedge fund return series that load on the same common factor). For 180 estimation periods, the resulting style index return series comprise up to 180 observations (July 1995 to June 2010). Because the initial CFPs split into additional CFPs over the course of the 180 months, some indexes share the same initial performance history. Some of the existing classification terminology was used to label the style indexes.^{14}
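Weighting constituents to maximize correlation with the factor score amounts to an OLS projection of the score on the constituent returns, because least-squares fitted values have the highest attainable correlation with the dependent variable among all linear combinations of the regressors. A minimal sketch follows (name hypothetical); rescaling the weights to sum to one is an assumption, and annual rebalancing would simply recompute the weights every 12 months from the current window.

```python
import numpy as np


def max_corr_weights(returns, score):
    """Constituent weights maximizing corr(returns @ w, score).

    The OLS projection of the (demeaned) factor score on the (demeaned)
    constituent returns gives the correlation-maximizing combination; the
    weights are then rescaled to sum to one. If the raw weights sum to a
    negative number, the rescaling flips the sign of the correlation.
    """
    X = returns - returns.mean(axis=0)
    f = score - score.mean()
    w, *_ = np.linalg.lstsq(X, f, rcond=None)
    return w / w.sum()
```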
Exhibit 6 provides an overview of the distinct classifications for different estimation windows.

The equally weighted CFP returns were regressed against a number of asset-based factors and trading portfolios representative of particular investment strategies. Lagged factors were considered, and the standard errors were heteroskedasticity and autocorrelation consistent (HAC). Because of the large number of candidate factors, a stepwise forward regression algorithm was employed. The results for the CFPs as of June 2010 are displayed in Exhibit 7. Exhibit 8 lists the asset-based factors and all regressors used, together with their data sources. Exhibit 9 gives the regression function and the statistical significance of the coefficient estimates.
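A forward stepwise search with HAC standard errors can be sketched in NumPy as below: at each step the candidate with the largest Newey-West |t| is added, until no remaining candidate exceeds the critical value. The Bartlett kernel, three lags, the 1.96 cutoff, and the absence of a small-sample correction are assumptions, and the function names are hypothetical; lagged factors would enter simply as additional candidate columns.

```python
import numpy as np


def newey_west_se(X, resid, lags):
    """HAC (Newey-West, Bartlett kernel) standard errors for OLS coefficients."""
    u = X * resid[:, None]                 # score contributions x_t * e_t
    S = u.T @ u
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)         # Bartlett weight
        G = u[l:].T @ u[:-l]
        S += w * (G + G.T)
    XtX_inv = np.linalg.inv(X.T @ X)
    return np.sqrt(np.diag(XtX_inv @ S @ XtX_inv))


def forward_stepwise(y, candidates, t_crit=1.96, lags=3):
    """Add, one at a time, the candidate with the largest HAC |t| above t_crit."""
    T, K = candidates.shape
    selected = []
    while len(selected) < K:
        best_j, best_t = None, t_crit
        for j in range(K):
            if j in selected:
                continue
            X = np.column_stack([np.ones(T), candidates[:, selected + [j]]])
            beta, *_ = np.linalg.lstsq(X, y, rcond=None)
            se = newey_west_se(X, y - X @ beta, lags)
            t = abs(beta[-1] / se[-1])     # t-statistic of the newly added column
            if t > best_t:
                best_j, best_t = j, t
        if best_j is None:
            break
        selected.append(best_j)
    return selected
```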

From the *R*² values in Exhibit 7, it is evident that asset-based factor models explain a significant proportion of the common variation in hedge fund index returns. In large portfolios, the specific return component is expected to be diversified away, leaving the common factor to drive the performance of the style index.

The results from Exhibit 7 may suggest that the performance of some hedge fund portfolios is easily replicated, subject to some important limitations. First, *R*² describes the in-sample fit rather than the tracking ability of the model. Second, the model should be considered a long-term equilibrium: phase-locking behavior and style drift are likely to necessitate conditional or regime-switching models to match performance over shorter intervals. Third, the proposed model is not a passive index or asset class model. Some coefficients are difficult to interpret, in particular where lagged coefficients have signs different from those of contemporaneous coefficients. In addition, it is assumed that short selling is allowed and that all rollover dates and maturities match the monthly reporting frequency of the style indexes. Despite these simplifying assumptions, the regression results allow for an objective assessment of the prevailing investment philosophy in each classification.

The *equity hedge–fundamental value* portfolio exhibits significant directional exposure to broad equity indexes and the MSCI Emerging Markets Index. The CBOE VIX coefficient (volatility proxy) suggests that these funds have a long bias in U.S. equity.^{15} The Fama-French SMB and WML portfolios proxy for higher moments and nonlinear risk exposure (compare with Chung, Johnson, and Schill [2006]).

Similarly, the *equity hedge–fundamental growth* portfolio correlates with the MSCI Emerging Markets Index and the Fama-French SMB factor portfolio. Exposure to the WML portfolio suggests that a number of equity hedge growth managers employ a momentum strategy of buying past winners and shorting past losers. The sign of the WML coefficients varies with the time lag, which is consistent with both momentum and contrarian strategies.

The factor exposures of the *equity hedge–sector* CFP indicate some evidence of sector-specific classifications (finance, technology, and healthcare) that stand apart from broader equity hedge funds.

The performance of the *long–short equity* portfolio correlates strongly with the MSCI ex USA index. The inverse relationship between the USD index and portfolio performance suggests an investment strategy biased toward offshore equity investments.

Lastly, *equity hedge–sector focus* exhibits exposure to the same asset-based factors as the other equity hedge portfolios, although with higher explanatory power in regression. Adding the SMB and WML portfolios improves the goodness of fit relative to regression on the indexes alone.

*Macro system/trend* hedge funds are characterized by trend-following behavior, as evidenced by their linear exposure to the Fung-Hsieh factors. Systems traders employ technical trading rules or rely on mathematical and algorithmic models to identify investment opportunities in markets exhibiting trending or momentum characteristics. Similar to systematic macro funds, CTAs invest mainly in listed options and futures on commodities or currencies, often using a long-term trend-following strategy (for details on trend followers, refer to Collins [2003]). The exposure to fixed-income markets can be explained by the cash margins required to trade on futures exchanges, which are invested in riskless bonds, as well as by spread trades between the short and long ends of the yield curve. Overall, the return series of macro funds and CTAs prove more difficult to replicate.

In contrast to trend-following funds, *emerging markets* portfolio returns correlate with the movements of broad equity indexes (changes in the MSCI Emerging Markets Index alone explain 62% of the in-sample variation in prices of the *emerging markets* portfolio).

*Relative value* funds, generally regarded as market neutral, show some surprising correlations with broad asset indexes and other asset-based factors. As a subclassification of relative value funds, fixed-income corporate funds focus primarily on high-yield corporate bonds with low or no credit ratings. This explains the strategy’s correlation with movements of the yield curve as well as with changes in the spread between fixed-rate conventional home mortgage commitments and 10-year U.S. Treasuries. The primitive trend-following strategy for commodities (PTFSCOM) and the USD index are both significant in explaining changes in portfolio performance.

Like all market-neutral strategies, *event-driven* funds are less susceptible to price movements in equities. However, the portfolio reveals some exposure to equity markets, which may be attributable to style drift and to the recent above-average performance of emerging markets funds. As with equity hedge funds, the SMB and WML portfolios prove to be good estimators of the underlying investment strategy, albeit less significant in explaining overall portfolio performance.

The descriptive statistics and asymptotic properties of the predicted indexes are displayed in Exhibit 10. The expected performance varies with the index rebalancing interval and the performance impact of defunct hedge funds. Exhibit 10 reports the mean of reported returns as $\mu$ and the median return as $Q_{50}$. The CFP return series exhibit substantially lower volatility, at the cost of lower period returns, than the average single-manager fund. Note that performance is somewhat conditional on the chosen rebalancing interval. All index series except *equity hedge–fundamental growth* and *macro–systems/trends* exhibit statistically significant deviations from the normal distribution. Tests for autocorrelation at cumulative lags do not yield significant test statistics, and the effects of non-normality are partially diversified away for all CFPs.

In summary, the statistical clusters of hedge funds are both meaningful and an improvement over the existing classifications. Principal axis as a dimensionality reduction technique greatly limits the number of statistical clusters representative of particular investment strategies in hedge funds. For the CFPs as of June 2010, based on the *T* = 60 estimation window, 513 hedge funds belong to one of only seven distinct CFPs. However, this still leaves a large proportion of the in-sample hedge funds unclassified. Recall that smaller clusters are omitted when creating the CFPs: hedge funds belonging to those clusters are attributed to larger clusters to the extent possible (i.e., as long as their inclusion does not increase the number of extracted common factors). The remaining funds, which are not representative of a major strategic theme, stay unclassified.

Unclassified hedge funds potentially share some homogeneity among themselves. Each year, the commonality of these funds is tested to determine whether it warrants an additional CFP that is (a) representative of a significant number (at least 30) of hitherto unclassified funds and (b) representative of a strategic thrust that is orthogonal to the other common factors. If both conditions are fulfilled, a new CFP and associated index are launched. The exposure of unclassified funds to one or more CFPs is easily identified using correlation analysis. Note that hedge fund classification is an ongoing process and that the growing number of hedge funds may warrant additional style classifications.
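The two conditions for launching a new CFP can be expressed as a small check. Because sample factor scores are never exactly orthogonal, the sketch uses a hypothetical tolerance of 0.3 on the absolute correlation with each existing factor score; the minimum of 30 funds is taken from the text, and the function name is hypothetical.

```python
import numpy as np


def warrants_new_cfp(candidate_score, member_count, existing_scores,
                     min_funds=30, max_abs_corr=0.3):
    """Check (a) enough unclassified member funds and (b) near-orthogonality.

    candidate_score: (T,) factor score extracted from the unclassified funds
    existing_scores: iterable of (T,) factor scores of the current CFPs
    max_abs_corr:    hypothetical tolerance standing in for exact orthogonality
    """
    if member_count < min_funds:
        return False
    for f in existing_scores:
        if abs(np.corrcoef(candidate_score, f)[0, 1]) > max_abs_corr:
            return False
    return True
```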

Despite the aforementioned limitations, principal axis is a significant improvement over self-classification. First, it identifies hedge funds that behave differently from their self-declared strategy, either because of trading restrictions or because of style drift. Second, it does not depend on managers reporting fund strategy to database vendors. Lastly, the common factor shared across hedge funds in one classification represents a unique trading strategy that shares no common trait with other trading strategies (owing to the orthogonality of the extracted factors). This is of particular interest to practitioners seeking to complement an existing portfolio with hedge fund investments and to compare diversification benefits across hedge fund classifications.

No evidence is found that the distinct style classifications or their composition change in estimation periods after the subprime lending crisis. This suggests that hedge funds within a particular classification reacted similarly to tightening liquidity after the demise of the Bear Stearns funds in 2007. This is to be expected, since hedge funds of a particular classification share some common traits with respect to financial gearing, leverage, and hedge overlay. It is highly probable that hedge funds belonging to one classification will belong to the same classification in the following period, irrespective of external shocks.

On a month-to-month basis, any individual fund that is a constituent of a CFP is reassessed based on its performance over the past 60, 90, or 120 months. Three outcomes of reassessment are possible: (1) the fund is retained as constituent of the CFP, (2) the fund is reassigned to a different CFP (usually attributable to style shift), or (3) the fund ceases to be associated with any CFP (either attributable to style drift or because the fund stopped reporting in that month). The likelihood of the three outcomes for a random fund varies according to the CFP and the underlying strategy it represents, the rolling windows used in estimating the CFPs, the number of hedge funds attributed to a CFP, and the specific month under observation. However, some general statements can be made that allow for an assessment of the persistence of style classification as well as the presence of style drift.

Once classified, hedge funds tend to load on the same common factor throughout the period under observation, even though the factor weighting may change over time. The month-to-month probability that a random hedge fund loads on a different common factor, and is hence assigned to a different factor portfolio, is *P*(*X*) = 0.02 across all classifications identified. Hedge funds classified under equity hedge–fundamental value or long–short equity are the most likely to be reclassified from one month to the next. The proportion of funds eliminated from a CFP each month, due either to style drift or to a reporting stop, is similar across all classifications at around *P*(*X*) ≈ 0.05.
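Month-to-month persistence statistics of this kind can be tallied from a panel of monthly CFP assignments. A minimal standard-library sketch, assuming a hypothetical dict-of-lists layout in which `None` marks a month without CFP membership; only transitions out of a month in which the fund belonged to a CFP are counted.

```python
def transition_rates(panel):
    """Shares of month-to-month transitions that retain, reassign, or drop a fund.

    panel: dict fund_id -> list of monthly CFP labels (None = not in any CFP).
    Only transitions starting from a month with CFP membership are counted.
    """
    counts = {"retained": 0, "reassigned": 0, "dropped": 0}
    for labels in panel.values():
        for prev, curr in zip(labels, labels[1:]):
            if prev is None:
                continue
            if curr is None:
                counts["dropped"] += 1       # style drift or reporting stop
            elif curr == prev:
                counts["retained"] += 1
            else:
                counts["reassigned"] += 1    # loads on a different common factor
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}
```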

As expected, results based on longer estimation periods (*T* = 120) are more persistent than those for shorter timeframes (*T* = 90, *T* = 60). This is because style drift is detected sooner and features more prominently in the shorter time series used for factor analysis. Additional observations improve the significance of the results from factor analysis but preclude hedge funds with shorter track records from being sampled.

Principal factor axis yields better results than PCA, in particular where the number of principal components is larger than the number of observations (*k* > *n*) and the asymptotic properties of the estimators do not hold. Furthermore, the explanatory power of the truncated component model is limited: in the example of Fung and Hsieh [1997], the five extracted style factors account for only 43% of the return variance across 409 hedge funds, despite a relatively short observation window of 36 months. This stems from small yet significant eigenvalues associated with the extracted eigenvectors. Principal factors, on the other hand, acknowledge that part of a hedge fund’s return variation is attributable to a unique component and instead seek to extract the communalities as defined by the covariance.

Considering the marginal differences between the eigenvalues, the truncated component model is arbitrary and of little statistical significance. Using the broken-stick methodology in PCA, additional extracted factors are discarded if their inclusion does not significantly improve the explanatory power of the model. One shortcoming of this approach is that the discarded factors may be jointly significant; consequently, the dimensionality reduction comes at the cost of representativeness. Using parallel analysis to determine the number of nontrivial factors in principal axis prevents such selection bias. In contrast to earlier research, the methodology is also unbiased owing to the number of observations included.^{16}
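The two stopping rules contrasted here can be sketched as follows. `broken_stick` returns the expected variance share of each component under the broken-stick model (a component is kept while its observed share exceeds that value). `parallel_analysis` implements Horn's method: factors are retained while each observed eigenvalue exceeds the 95th percentile of the corresponding eigenvalue from uncorrelated normal data of the same dimensions; with `reduced=True` it uses the reduced correlation matrix (squared multiple correlations on the diagonal), one common variant for principal axis. The simulation count, percentile, and names are assumptions.

```python
import numpy as np


def broken_stick(p):
    """Expected variance share of components 1..p under the broken-stick model."""
    return np.array([np.sum(1.0 / np.arange(k, p + 1)) / p for k in range(1, p + 1)])


def _sorted_eigs(X, reduced):
    R = np.corrcoef(X, rowvar=False)
    if reduced:
        # Reduced correlation matrix: squared multiple correlations on the diagonal.
        np.fill_diagonal(R, 1.0 - 1.0 / np.diag(np.linalg.inv(R)))
    return np.sort(np.linalg.eigvalsh(R))[::-1]


def parallel_analysis(X, n_sims=200, quantile=95, reduced=True, seed=0):
    """Number of factors whose eigenvalues beat random-data benchmarks (Horn's method)."""
    rng = np.random.default_rng(seed)
    T, p = X.shape
    obs = _sorted_eigs(X, reduced)
    sims = np.array([_sorted_eigs(rng.standard_normal((T, p)), reduced)
                     for _ in range(n_sims)])
    thresh = np.percentile(sims, quantile, axis=0)
    k = 0
    while k < p and obs[k] > thresh[k]:      # count leading eigenvalues above benchmark
        k += 1
    return k
```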

**CONCLUDING REMARKS**

Principal axis and factor rotation were used to extract common factors from hedge funds reporting to the HFR and HFN databases. The stepwise procedure yielded statistical clusters of hedge funds that loaded on one common factor only. Equally weighted indexes were created from the CFPs and labelled according to the predominant strategies of hedge funds within the CFPs. The index return series were regressed against asset-based factors and factors representing primitive trend followers. Nine classifications were identified that subsumed a significant proportion of the sample hedge funds. For the remaining hedge funds, commonalities with other hedge funds were not substantial enough to warrant a separate style representation.

This leads to two conclusions. First, the majority of hedge funds follow a broad strategic theme that is common to all hedge funds within a classification; the long-term return on a hedge fund is a function of the contribution from the common factor and from the specific factor representing the manager’s unique trading style. Second, the remaining hedge funds operate in niche markets or employ a specialized investment approach that is not easily replicated. The results are persistent over time, across estimation windows, and across macroeconomic cycles. Given the relatively small sample that results from the minimum entry requirements for hedge funds, it is unlikely that the nine classifications describe the entire spectrum of investment strategies. However, they indicate the predominant investment themes over the July 1995 to June 2010 period. The regression analysis provides initial indications of how those investment themes might be replicated. Out-of-sample testing is a natural extension of the current research.

## ENDNOTES


^{1}Autoregressive integrated moving-average models can be thought of as an adaptation for integrated autoregressive processes (i.e., processes whose characteristic equation has a unit root). An *ARMA*(*p*,*q*) model in the variable differenced *d* times is equivalent to an *ARIMA*(*p*,*d*,*q*) model on the original data (Brooks [2008]).

^{2}Note here that some market-neutral strategies such as fixed-income arbitrage or equity market neutral are not represented. Related research indicates that the inclusion of option-based risk factors does not significantly improve upon the results for market-neutral strategies (e.g., Fung and Hsieh [2001] on the risk in fixed-income-based hedge fund styles).

^{3}Considering that common classifications of hedge funds are used in the analysis, single-manager funds may be expected to report under the same classification across database providers. Thus, a certain homogeneity in index return series is expected.

^{4}Identifying hedge funds reporting to both databases revealed that some funds are classified as CTAs when reporting to HFN but as macro funds when reporting to the HFR database.

^{5}Note that HFN does not provide a graveyard database. It is acknowledged that inclusion of the HFN database biases the analysis toward survivors. In the opinion of the authors, the benefits from a larger sample size outweigh the estimation error due to survivorship bias.

^{6}This represents a continuous track record of 60 return observations plus an additional three observations to build a residual series of sufficient length for an *AR*(*p*) model of maximum order *p* = 3.

^{7}This is expected to be true in the long run despite style drift and phase-locking behavior. Although hedge fund returns may become synchronized due to market disruptions, and hedge funds may dynamically shift their exposure to benefit from short-term arbitrage opportunities, they are expected to revert to a broad strategic theme after a short while. Conversely, comparing covariances between hedge funds over extended periods of time may help identify funds that have permanently departed from their initial strategic focus.

^{8}This is consistent with Fung and Hsieh [2002], who included only funds reporting in USD to avoid currency fluctuations. Because most hedge funds have a USD version, focusing on USD-denominated funds also helps avoid errors from duplicated funds.

^{9}Special thanks to Prof. Dr. G. S. Moe from the *Gesellschaft fuer Internationale Zusammenarbeit* (GIZ) for valuable insights on the application of factor axis to financial time series.

^{10}Upon manual inspection, one fund was removed from factor analysis due to inconsistencies in performance reporting. Its historical time series suggests that the fund began as a quarterly reporting fund and changed reporting standards at a later stage.

^{11}In factor axis, the error term at *T* can be large and significant. However, it shares no traits with the specific factor component of other hedge funds in the same classification.

^{12}The remaining funds were attributed to retained clusters on the basis of the significance of the coefficient when regressing the return series against the factor score of the existing portfolio. Funds were entered only if their inclusion did not require an additional factor to explain the commonalities across the hedge funds within the portfolio.

^{13}To account for lockups and other trading restrictions, the results from each estimation period were used as forecasts for the month following the last observed month of that estimation period.

^{14}Indexes are named in accordance with the prevailing reported hedge fund strategy within the CFPs, as well as the correlation of said hedge funds with strategy indexes from the HFN and HFR database and results from the regression in Exhibit 10.

^{15}The volatility proxy explains a significant proportion of the variation in the index return series; however, the factor was removed during the stepwise regression procedure to avoid multicollinearity with other factors.

^{16}Inclusion of additional observations increases confidence in establishing the long-term communalities between single-manager hedge funds across periods of financial distress as well as recovery.

- © 2016 Pageant Media Ltd