Exploring Graph Neural Networks for Stock Market Predictions with Rolling Window Analysis. Matsunaga et al., arXiv 2019

A theme that seems to proliferate in machine trading is to emulate the human trader while enhancing predictive capabilities. Since the world has become more interconnected, common technical and fundamental analysis techniques are often not capable of handling all the factors necessary to accurately model the financial world today. This paper takes an approach of integrating the vast network of relations between companies, suppliers, customers, and industries into a predictive deep learning model.

This paper investigates the effectiveness of work at the intersection of market predictions and graph neural networks, which hold the potential to mimic the ways in which investors make decisions by incorporating company knowledge graphs directly into the predictive model.

Incorporating interconnected relationships into models has been a challenge in the past. The authors of this paper build upon prior research by utilizing a Graph Neural Network algorithm. These algorithms operate directly on a graph structure, using the dependencies between nodes to learn from the relationships among them.

For clarity, see the diagram I added below, which shows an example of the graph structures in question, consisting of vertices (nodes) and edges (the connecting lines). The adjacency matrix simply summarizes the connections: the entry for a pair of nodes is 1 if an edge joins them and 0 otherwise.
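To make the adjacency matrix concrete, here is a minimal sketch in plain Python. The four nodes and three edges are a toy graph I made up for illustration; they are not taken from the paper.

```python
# Hypothetical toy graph: 4 nodes and 3 undirected edges (index pairs).
nodes = ["A", "B", "C", "D"]
edges = [(0, 1), (0, 2), (2, 3)]

def adjacency_matrix(num_nodes, edge_list):
    """Return a symmetric 0/1 matrix with adj[i][j] = 1 when i and j share an edge."""
    adj = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edge_list:
        adj[i][j] = 1
        adj[j][i] = 1  # undirected graph: mirror the entry
    return adj

A = adjacency_matrix(len(nodes), edges)
for name, row in zip(nodes, A):
    print(name, row)
```

Each row lists the neighbours of one node, so row A has ones in the B and C columns and zeros elsewhere.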

So, why are Graph Neural Networks (GNNs) useful for modeling stock performance?

Similar to how a professional investor makes decisions, graph neural networks can utilize the network structure to incorporate the interconnectivity of the market and make better stock price predictions, rather than relying solely on the historical stock prices of each individual company or on hand-crafted features.

This type of model takes into account relationships such as those between partners and companies in a supply chain. For example, Lens Technology Co. supplies screen parts to Apple for the iPhone. If the iPhone is expected to sell well after its release, Lens Tech’s stock price is likely to rise as well.

In order to accommodate these relational patterns in historic stock data, the authors utilize unique neural network modelling methods.

The Question

Are knowledge graph datasets, e.g. company relations data, useful in making stock predictions across different markets and longer time horizons?

To answer this, the authors leverage the modified Graph Convolutional Network (GCN) algorithm proposed by Feng, et al. in their paper Temporal Relational Ranking for Stock Prediction. It incorporates time-sensitivity into the relations among the nodes, something that a regular GCN does not account for. The formula is relatively complex but is explained quite thoroughly in the two linked papers.

This algorithm is combined with a sequential LSTM model following four steps:

1. Historical stock price data is fed into an LSTM layer as features to output node embeddings, essentially vector representations in which similar stocks sit closer together. Features can be, for example, the previous day’s closing price, moving averages, or a combination of the two. This captures the trend of each stock price.
2. An adjacency matrix is created from the company, sector, supplier, customer, partner, and shareholder relations.
3. The node embeddings created by the LSTM layer in step 1 and the adjacency matrix are combined and fed into a Graph Neural Network layer to update the node embeddings.
4. These updated embeddings are fed into a fully connected layer to make price predictions.
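The data flow of the four steps can be sketched with NumPy. This is only an illustration under heavy assumptions: random vectors stand in for the LSTM output, the graph layer is a plain row-normalised graph convolution rather than the time-sensitive variant from Feng, et al., and all weights are random and untrained. Every name and shape below is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_stocks, embed_dim = 4, 3

# Step 1 stand-in: random vectors in place of the LSTM's per-stock embeddings.
embeddings = rng.normal(size=(num_stocks, embed_dim))

# Step 2: hypothetical 0/1 relation matrix between the four stocks.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

def graph_layer(h, a, weight):
    """One plain graph-convolution step: average self + neighbours, project, ReLU."""
    a_hat = a + np.eye(a.shape[0])                      # add self-loops
    a_norm = a_hat / a_hat.sum(axis=1, keepdims=True)   # row-normalise
    return np.maximum(a_norm @ h @ weight, 0.0)

# Step 3: update each embedding with information from related stocks.
w_graph = rng.normal(size=(embed_dim, embed_dim))
updated = graph_layer(embeddings, adj, w_graph)

# Step 4: fully connected layer mapping each embedding to one predicted value.
w_out = rng.normal(size=(embed_dim, 1))
predictions = (updated @ w_out).ravel()
print(predictions.shape)
```

The point of the sketch is the order of operations: each stock's embedding is mixed with those of its neighbours before the final layer sees it, so a related company's price trend can influence the prediction.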

The paper also uses a rolling window to account for more realistic time horizons. This reflects the idea that recent stock price data has more predictive power than older data.

In the financial sector, professionals are often concerned with models which reliably can predict the market over long time horizons. Therefore, we have implemented the rolling window analysis method for splitting the full dataset into the training and test set. We use 2000 timesteps for training and 200 for testing as a fixed window, and slide that window over the full number of timesteps. For example, index 0 to 2000 and 2000 to 2200 will be used in the first iteration for training and testing, respectively. The second iteration will have indices 200 to 2200 and 2200 to 2400, and so on.
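The split described above can be sketched as a small generator. I am assuming the window slides forward by the test size (200), which is what the example indices imply; the function name and the dataset length of 2600 are hypothetical and chosen only to show a few iterations.

```python
def rolling_windows(num_timesteps, train_size=2000, test_size=200):
    """Yield (train_range, test_range) pairs, sliding forward by test_size each time."""
    start = 0
    while start + train_size + test_size <= num_timesteps:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # assumed step: one test window per iteration

# Hypothetical dataset length, picked only for illustration.
for train, test in rolling_windows(2600):
    print(f"train {train.start}-{train.stop}, test {test.start}-{test.stop}")
```

Each test window starts where its training window ends, so the model is always evaluated on data that comes after everything it was trained on.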

The Market Data

The authors test the model on the Japanese market with companies listed on the Nikkei 225 index. In total, 176 companies are used, after filtering out those that don’t have enough data to conduct long-term backtesting.

The Nikkei Value Search database is used to generate a comprehensive knowledge graph between the companies with 12,473 total nodes and 38,252 edges, with an average node degree of 6.13.
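As a quick sanity check, the reported average degree is consistent with the node and edge counts if the graph is treated as undirected, so that every edge contributes to the degree of two nodes. That undirected reading is my assumption, not something stated here.

```python
num_nodes = 12_473
num_edges = 38_252

# Each undirected edge touches two nodes, so the degrees sum to 2 * num_edges.
avg_degree = 2 * num_edges / num_nodes
print(round(avg_degree, 2))  # 6.13
```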

The nodes have a “node_type” attribute with values “company”, “sector”, “industry” or “country”, while edges have an “edge_type” attribute with values “in-country”, “in-industry”, “in-sector”, “parent-company-of”, “partner-with”, “related-to”, “same-company”, or “shareholder”.

This all boils down to the below summary. First-order relations are relationships where connected stocks could be directly related. The authors note this is an area where investors are already actively analyzing correlations. Second-order relations link companies that are only indirectly connected through a shared node (e.g. both have an ‘in-industry’ edge connected to the same ‘industry’ node).
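One way to derive second-order relations is to group companies by the node they share and pair up the members of each group. The sketch below does this for ‘in-industry’ edges; the company and industry names are invented, and the edge-tuple layout is my own assumption rather than the format of the Nikkei dataset.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical edges: (company, edge_type, target node). All names are invented.
edges = [
    ("CompanyA", "in-industry", "Electronics"),
    ("CompanyB", "in-industry", "Electronics"),
    ("CompanyC", "in-industry", "Automobiles"),
    ("CompanyD", "in-industry", "Electronics"),
]

def second_order_pairs(edge_list, edge_type):
    """Pair up companies that reach the same target node through edge_type."""
    members = defaultdict(set)
    for company, kind, target in edge_list:
        if kind == edge_type:
            members[target].add(company)
    pairs = set()
    for companies in members.values():
        pairs.update(combinations(sorted(companies), 2))
    return pairs

print(sorted(second_order_pairs(edges, "in-industry")))
# [('CompanyA', 'CompanyB'), ('CompanyA', 'CompanyD'), ('CompanyB', 'CompanyD')]
```

CompanyC gets no pair because no other company in the toy data shares its industry node.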

Evaluating the Results

Our preliminary results in Table 4 show the following.
(a) Both the LSTM model and the various graph neural network models outperform the market benchmark.
(b) The LSTM model is outperformed by graph neural networks for “customer-of” and “all” in terms of both the Sharpe and return ratio, and is outperformed by “supplier-of” in terms of the Sharpe ratio.
(c) “all” relations have the best performance based on both the return ratio and Sharpe ratio.
(d) “customer-of” relations had a relatively high return and Sharpe ratio despite its sparsity, which suggests that customer relations are highly effective in improving predictive performance.
The results for (d) make intuitive sense since supply chain analysis is an important industry practice among investors. However, it is important to note that while customer relations seemed to be effective in the aggregate, its effectiveness varies by time periods.
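For readers unfamiliar with the Sharpe ratio used in these comparisons, here is a minimal sketch of the textbook per-period definition: mean excess return divided by the standard deviation of the excess returns. The paper may annualise or otherwise define it differently, and the return values below are invented purely for illustration.

```python
import statistics

def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return per period divided by its sample standard deviation."""
    excess = [r - risk_free_rate for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

# Hypothetical daily returns, invented for illustration only.
daily_returns = [0.01, -0.005, 0.007, 0.002, -0.001]
print(sharpe_ratio(daily_returns))
```

A higher value means more return per unit of volatility, which is why a strategy can win on the Sharpe ratio without having the highest raw return.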

The authors break down the results for the ‘customer-of’ edges in more detail and suggest directions for future analysis:

With the current model, the direct customer relation proved to be an effective indicator only for 1-day predictions, where the effectiveness wanes as the timesteps increase, as shown in Table 5. However, it is possible that the accuracy for predictions of longer time horizons will increase as we increase the number of hops. For example, the price of a customer two hops away can be a strong indicator for predicting the 5 or 10-day future returns of the target company, if the correlations propagate. Of course, the prediction of longer time horizons becomes more and more difficult as the timesteps increase due to the multivariate and non-linear nature of the stock market leading to higher error propagation.

Room for Improvement

The authors finish with suggestions for improving the predictive power of this type of model, chiefly by expanding the graph dataset.

…it can also be worthwhile to incorporate other information such as macroeconomic data or news articles rather than solely relying on supply chain relationships, which can be regarded as an important predictor variable in a family of factors which affect the market, rather than the sole indicator. Finally, it is also worthwhile to explore the method of feeding the entire knowledge graph, rather than extracting relations only among the relevant companies listed in the Nikkei 225 market. For instance, if a company X listed in the Nikkei 225 had a relation with an American Company Y, we were only able to connect other listed companies if they also had the same relation with Company Y. This will be possible once we can collect stock price data for Company Y and other non-listed companies.

Overall, the paper shows an example of how graph neural networks hold promise in creating practical prediction models that incorporate new datasets to account for the relationships among different firms and markets.