Description

Charles University in Prague Faculty of Social Sciences Institute of Economic Studies Diploma thesis Jaromír Malenko Stock Market Trend Prediction Using Genetic Algorithms 2008 Supervisor: PhDr. Filip

Information

Category:
## Computers & Electronics

Publish on:

Views: 16 | Pages: 32

Extension: PDF | Download: 0

Share

Transcript

Charles University in Prague Faculty of Social Sciences Institute of Economic Studies Diploma thesis Jaromír Malenko Stock Market Trend Prediction Using Genetic Algorithms 2008 Supervisor: PhDr. Filip Žikeš, MSc. 1 Declaration I hereby declare that I have written this diploma thesis on my own, have used only the cited sources and literature. Prohlášení Prohlašuji, že jsem diplomovou práci vypracoval samostatně a použil pouze uvedené prameny a literaturu. Jaromír Malenko 2 Abstract In this thesis the genetic algorithm is considered as a method of computer learning and it is applied to stock market returns predicting. In the theoretical part we define the prediction task and summarize the results on stock market predictability. We present the statistical methods and measures to evaluate the accuracy of predictor and related computer learning approaches. We introduce the genetic algorithm as a method able to evolve an arbitrary algorithm. In the implementation part a tree representation of any algorithm is employed, the related genetic operators and user application are implemented. In the application part we first evolve predictor as an arbitrary program the black-box predictor. The result does not differ significantly from liner model, but the economic measures improved. Next, we optimize the parameters of MACD and RSI technical indicators. Optimized indicators may generate profit on the contrary to unoptimized indicators. We find that genetic algorithm is a general technique that is not straightforward to be applied for prediction, but it is efficient in optimizing parameters of a specific model. Abstrakt (in Czech) V této práci uvažujeme genetický algoritmus jako způsob počítačového učení a aplikujeme jej na předpovídání výnosu na akciových trzích. V teoretické části definujeme předpovídání a shrneme výsledky o předpověditelnosti akciových trhů. Ukážeme statistické metody a míry pro vyhodnocování přesnosti předpovědi a související přístupy počítačového učení. Zavedeme genetický algoritmus jako metodu pro odvození libovolného algoritmu. V implementační části používáme stromovou reprezentaci algoritmu, naprogramujeme související genetické operátory a uživatelskou aplikaci. V aplikační části vyvineme předpovídač jako program fungující jako černá skříňka. Výsledek se významně neliší od předpovědi lineárního modelu, ale ekonomické ukazatele jsou lepší. Dále optimalizujeme parametry technických indikátorů MACD a RSI. Optimalizované indikátory mohou generovat zisk na rozdíl od neoptimalizovaných indikátorů. Shledali jsme, že genetický algoritmus je obecná technika, která se nedá přímočaře aplikovat, ale je účinná při optimalizaci parametrů konkrétního modelu. 3 Contents 1 Introduction Motivation of this Thesis Structure of Thesis Markets and Predictability Stock Market Predictability The Efficient Market Hypothesis Price Models...10 Martingale model...10 Law of Iterated Expectations...10 Random Walk Model Trading strategies Technical analysis Fundamental analysis Derived data Specification of Computer Learning...13 Time Series Approach...14 Trading Rule Approach Evaluation of Predictor Overall Measures of an Predictor Error Function Comparing Two Predictors Computer learning Prediction as Black Box Approaches to Computer Learning...21 Exhaustive Search...21 Heuristics...21 Expert Systems Search Space...22 Traveling Salesperson Problem Preprocessing the data Genetic Algorithms Biological background Evolution Terminology 3.2 Representation of a Chromosome Binary encoding The SIMPLE programming language Genetic Algorithm Generate Fitness End Condition Selection Crossover Mutation Extensions...38 Hybrid crossover and mutation operators...38 Elitism Further References Applications Input Data Index Prediction Indicator Prediction Moving Average Convergence/Divergence Indicator Relative Strength Index Indicator Conclusion of Predictions Experiments Lesson Learned During Implementation...46 SIMPLE Programming Language...46 Genetic Algorithm Future Work Transaction costs Integration with Trading Programs Learning Time Language Constructs Missing Data Challenge of Predictors Conclusion References 1 Introduction 1.1 Motivation of this Thesis As a student of financial markets and a enthusiastic fan of computers the topic of my diploma thesis was clear beforehand. I apply and an advanced computer approach to solve all the econometric challenges. The genetic algorithms have been showed to be successful in wide areas. I generalized the idea of evolving binary codes to evolving algorithms. Firstly, I defined a computer language that can encode such algorithms. I developed the SIMPLE programming language and implemented a framework to run the programs (the interpreter being the most difficult part of it). Secondly, I implemented the genetic algorithm that performs its operators on the SIMPLE programs. A novel tree representation of SIMPLE source codes is used to implement the genetic operators of crossover and mutation. Only the genetic algorithm and the tree representation is described in this thesis. The implementation of both SIMPLE language and genetic algorithm was similar in time and complexity. The total length of source code is 685 kilobytes in JAVA language. During all phases of development I tested the program on stock data. I observed the huge impacts of little changes in program. There is no evidence how many error function I used, almost always arriving at a different optimal solution of the same (maximization) problem. The change of parameters of genetic algorithm and also a different implementation of a feature leads to substantial change in performance and optimality of solution. In the application part I present only a representative few of the large amount of results I obtained. I headed toward the noble task in which the predictor is simply an algorithm that works on input series by its own and outputs the (correct) predicted value. I must sadly admit, that I wasn't able to evolve such an algorithm. This is no wonder, but my expectation of genetic algorithm were high... So the genetic algorithm is not able to evolve an universal and general model by itself. The evolved solution doesn't significantly differ from prediction linear model. My next 6 experiments focused on evolving just parts of a given model. I haven't evolved whole algorithms (source codes) but only the value of parameters of technical indicators. In this thesis I present only the standard MACD and RSI indicators. I played also with oter indicators: the WoodieCCI indicator has a huge and active community supporting (and, they claim, also trading) it. To my surprise this indicator is not better that MACD or RSI indicators! My conclusion is that genetic algorithm is a useful method, that is theoretically able to find solution to many problem. The problem is that the search space is large and the available computer power is not sufficient to find the solution. The special model for predicting, like neural networks, are still worth discovering. The genetic algorithm may be effectively used to optimize its parameters. 1.2 Structure of Thesis This thesis is organized as follows. In chapter 2 the predictability of markets is discussed from the theoretical point of view as the efficient market hypothesis. Also the achieved practical predictability results of various models are presented. The prediction task is defined for time series prediction and stock market trend prediction. Then, the statistical test, overall measures and error functions of predictors are discussed. The approaches and problems related to computer learning are summarized. In chapter 3 the introduction to genetic algorithms is presented. Genetic algorithms use genetic operators to evolve optimal solution. The genetic operators usually work with binary codes. We explain genetic algorithms on tree representation of algorithms. The algorithms are written in the SIMPLE computer language which is defined. Together with this thesis an implementation of the described procedures was implemented in JAVA language. Our experiences and the results are reported in chapter 4. The suggestions for further research present a list of issues we came across. In chapter 5 we summarize the lessons learned and the obtained results. 7 2 Markets and Predictability The crucial assumption of predictability is that patterns of price development observed in the past will repeat in the future. The goal of prediction task is to identify such patterns. In the era of automation we let the computers do the hard work. This problem is very demanding since it stimulates various approaches, there is (probably) no optimal solution and, finally, a solution that is just a bit better than others may lead to extensive profits. Only the profit generated by prediction matters. The predictor does not need to give a prediction at each time point of a time series. The lack of prediction does not mean the loss of money, while the wrong prediction does. Although we present a list of statistical test and measures to evaluate predictors, the achieved profit is the only true rating. As new prediction algorithms and techniques prove successful, the investors start including them in their investing decisions. The predicting power will decline until it becomes zero and the prediction algorithm becomes useless. That's the reason we must seek for novel approaches. Although, the fixed costs of developing a new prediction method and an econometric software are high. The variable costs transaction costs are negligible. Even small investors may enjoy the benefits of applying novel predictors. 2.1 Stock Market Predictability The stock prices are usually compared to a random walk process. In the the random walk, it is not possible (from definition) to predict the future price. Earlier studies inclined to the random walk hypothesis: The stock market prices behave as random walk process. Hawawini (1995) found that the serial correlations in stock price time series were insignificant. Fama (undated) tested size filters (a simple indicator) and concludes that the simple buy-and-hold method consistently beats the profits produced by different size filters. Recently many researchers published the evidence that the investigated market is predictable to some extent. See Lo and MacKinlay (1988) or Žikeš (2007) for valuable results or any other publication that presents a new prediction model (typically neural networks) or technical trading strategy to soaring slogans. Nygren (2004) successfully 8 predict the Swedish stock index with a neural network; the results of predicting stock are less convincing, nevertheless the network outperforms the naive strategy. The following section discusses efficiency of markets. In low efficient markets, the more informed and knowledgeable investors can outperform the other ones. The paradox of efficient markets is that if every investor believed the market was efficient, then investors would not expect any profit, thus they would not be trading and no market would exist at all. To sum up, the the markets exist only because the participants believe the market is inefficient. 2.2 The Efficient Market Hypothesis A market is said to be efficient if the prices fully reflect all the relevant information. According to Campbell (1997), depending on the information set that determine future prices, we distinguish the thee forms of market efficiency. 1. Weak-form efficiency: The information set includes only the past prices. In other words, technical analysis is of no use. 2. Semistrong-form efficiency: The information set includes all publicly available information known to all market participants. In other words, fundamental analysis is of no use. 3. Strong-form efficiency: The information set includes all privately available information known to any market participants. In other words, even insider information is of no use. The weak and semistrong-form of efficient market hypothesis is confirmed on stock market by White (1988). On the other hand Hellstrom (1998) states that based on recent progress in computer science, the neural networks or genetic algorithms can predict effectively. This is not necessary a counterexample of the efficient market hypothesis, since the effective methods are instantly adopted by investors. There was an evidence of a drop in profit based on technical analysis the last but one decade of the 20 th century observed by LeBaron (1991). The strong-form efficiency tests are usually restricted to a subset of market participants. According to Trippi (1996), the performance of mutual fund managers is not superior to the performance of other investor types. 9 2.2.1 Price Models We present several theoretical models that focus on the pricing of an asset. The common conclusion of all these models is that the expectation of price in next period equals the price in current period. Martingale model Martingale hypothesis by Girolamo Cardano (1565) states that the agents must have equal probability of winning or loosing to enter the game E [P t 1 I t ]=P t where P t is stock price at time t and I t is information set available at time t. The next price is sum of current price and change r t (called return) in this period E [P t 1 I t ]=P t E [r t 1 I t ]. Putting these two equations together we obtain E [r t 1 I t ]=E [P t 1 I t ] P t =P t P t =0. In the martingale model the price changes are uncorrelated. From the point of view of current asset pricing models, this model does not allow trade-off between risk and return. The martingale hypothesis is considered to be a necessary condition for market efficiency. Law of Iterated Expectations The market is efficient with respect to some information if the prices remain unaffected by revealing this information to all market participants. We demonstrate this by Law of Iterated Expectations by Samuelson. Let's consider two information sets I and J. The information set J is superior to information set I in the sense that it contains some extra information, hence we assume I J. We compare the expectation of a random variable X conditional on these information sets E[ X I ] and E[ X J ]. The Law of Iterated Expectations states that if one has a limited information I, the best forecast one can make of a random variable X is the forecast of the forecast one would make if one would have a superior information J. In mathematical notation E [ X I ]=E [E [ X J ] I ]. Samuelson applied the law of iterated expectations to the market analysis. Consider variables available at time t. The security price P t is the expectation of its fundamental value V conditional on the information set I t. P t =E [V I t ]=E t [V ]. Similar reasoning holds for the next period P t 1 =E [V I t 1 ]=E t 1 [V ]. 10 The expectation of price change in the next period is E t [P t 1 P t ]=E t [E t 1 [V ] E t [V ]]=E t [ E t 1 [V ]] E t [E t [V ]]. From I t I t 1 we have E t [E t 1 [V ]]=E t [V ]. Together with the fact that E t [E t [V ]]=E t [V ] we conclude E t [P t 1 P t ]=E t [V ] E t [V ]=0 with the natural reading that the price changes are unforeseeable given information in the set I t. Random Walk Model The random walk model p t 1 = p t t where p t is logarithm of price at time t, expected price change and t a random variable independently and identically distributed t ~ 0, 2. Campbel shows that the assumptions put on t guarantee that the random walk is sufficient but not necessary condition for weak-form efficient market. 2.3 Trading strategies The following two approaches are used for asset valuation. The distinction is based only on the type of known past data used to produce a prediction of future data Technical analysis Predictions are based on the past stock data only. The data for each asset contain the following values for each period (day, minute, etc.): Open, close, high and low values Volume A basic analysis of stock market excess return data shows both linear and non-linear dependence present. This observation was used to argue that it must therefore be possible to predict future values. However, Manton (undated) shows that the linear and non-linear dependence can be explained by simply allowing the mean and variance of Gaussian noise to be modulated by a (typically 3 state) hidden Markov model. Attempting to fit a Markov modulated AR process proved fruitless; the conclusion is that there is no ARpredictability present in excess return data. 11 Technical indicators such as moving average convergence divergence (MACD), relative strength index (RSI), trend-lines, support and resistance, volatility, momentum etc. are used in technical analysis. As we show in the empirical section, the significance of technical indicators is negligible with the publicly recommended parameters. We also discuss optimizing the parameters Fundamental analysis Predictions are based on the past stock data (as in technical analysis) and any other variables reflecting the company and market situation. Fundamental analysis aims at the true value of company's stock. Economic variables Inflation Interest rates Foreign exchange rates Imports, exports Industry variables Stock market indexes global indexes, sector indexes Stock price of direct competitors Prices of commodities Company variables based on the financial statements of company Number of shares Dividends Price/earnings ratio, book value, assets, liabilities EBIT, total income, total costs Prognoses of future profit, sales, etc. issued by management or analysts 12 2.3.3 Derived data From the technical and fundamental data artificial variables can be computed and also provided for analysis. This is done because the derived values may reveal new information (adjusting data for dividends). The asset prices X t have high variation and grow exponentially in time. Thus its difficult to create a valid model for prices. Instead, the return is usually used. The logreturn is used in theoretical papers. The relative increase in price is mainly used in technical analysis. Both definitions are equally well since where we used the theorem that log 1 x x for x close to 1 and the prices in adjoining periods to be similar X t / X t 1 1. The return as defined has many convenient properties. First, the values are small and around zero. Second, the return can be used to compare prices of one asset in distant periods. Finally different assets my be compared using returns. Volatility is used to estimate the investment risk. The increase in volume often means a new information is published. Relative stock performance allows the spread trading. There is a great deal of derived data and new ideas may prove useful. The genetic algorithms are constructed in such a way that derived data are computed automatically during the learning process. Thus the computer may find new relations in input data that a human may not exploit. R t =log X t X t 1 R t = X t X t 1 X t 1 log a b =log 1 a b 1 a b 1=a b b 2.4 Specification of Computer Learning We now define what prediction means in general models and the stock market application. N Given a set of N training examples { X i,y i } i=1 where X i are inputs and Y i expected prediction. The target function f X i =Y i is to be approximated by predictor g. On each training input X i the predictor computes the predicted values Z i =g X i. 13 An arbitrary error function e is forms the error vector e i =e Y i, Z i. The goal of computer learning is to find a predictor that minimizes the norm of the error vector E = e 1,,e N. Time Series Approach T In the standard time series approach a time series of prices {P t } t=1 these observation the training examples X i = P i, P i 1,, P i b 1 Y i =P i b 1 h N =T b h 1 was observed. From where b 1 is the number of past values used to compute prediction and h 1 is prediction horizon. The error function

Related Search

Similar documents

We Need Your Support

Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks