Trading via Image Classification

Trading via Image Classification, Cohen et al., arXiv 2019

I’ve been reading many papers lately that apply machine learning methods to trading strategies. Most of these are, naturally, regression algorithms trying to predict a continuous variable like price or volatility. I’ve come across a particularly interesting paper from the AI research team at JP Morgan, which looks into using image classification of stock charts as an alternative to algorithms that process raw time-series data.

Conventional algorithms crunch time-series data with the goal of detecting patterns and trends that can be traded. This paper instead represents the time-series data as images and uses a classifier to label the samples. The inspiration for this work is to model the way human traders see the data:

In this perspective, financial time-series analysis can be thought of as a visual process; when experienced traders look at time-series data, they process and act upon the image instead of mentally executing algebraic operations on the sequence of numbers.

Traders use their intuition, knowledge, and experience to derive trading opportunities from stock charts and other sources. This sets up the main focus of the study more clearly:

In this study, we create and analyze an extensive financial time-series data set and using a supervised classification approach, we evaluate predictions using over 15 different classifiers when the input data is presented graphically as images. We make use of three label-generating rules following three algebraically-defined binary trade strategies and show that the classifiers are very efficient in recovering the complicated, sometimes multiscale, labels.

The focus of the study is not to find which model is the best predictor; rather, it is an attempt to train models to find the trading signals in time-series data that are typical of technical analysis and normally defined mathematically.

Breaking it down

The paper focuses on the daily values of all S&P 500 stocks from 2010 to 2017. All data is a discretized form of the continuous price process, consisting only of the Open, High, Low, and Close (OHLC) values. The OHLC values are represented in the usual candlestick chart, while the Close values alone are drawn as a line chart. The paper includes a simple diagram outlining these discretized OHLC values for anyone unfamiliar with the format.
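For concreteness, here is a minimal sketch of how a window of daily OHLC rows could be rendered as a candlestick image; the column names, colors, and figure size are my assumptions, not the paper's exact setup:

```python
# Hypothetical sketch: render a window of daily OHLC rows as a
# candlestick image of the kind fed to the classifiers.
import matplotlib.pyplot as plt
import pandas as pd

def ohlc_to_image(window: pd.DataFrame, path: str) -> None:
    """One candle per day: thin high-low wick, thick open-close body."""
    fig, ax = plt.subplots(figsize=(2, 2))
    for i, (_, day) in enumerate(window.iterrows()):
        color = "green" if day["Close"] >= day["Open"] else "red"
        ax.plot([i, i], [day["Low"], day["High"]], color=color, lw=1)   # wick
        ax.plot([i, i], [day["Open"], day["Close"]], color=color, lw=4) # body
    ax.axis("off")  # the classifier sees pixels, not axes or labels
    fig.savefig(path, dpi=100)
    plt.close(fig)
```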

Three commonly known technical buy signals, each computed from prior prices, are compared; a code sketch of all three rules follows the list:

  1. Bollinger Band (BB) Crossing: Models volatility by using price bands as upper and lower bounds, set at a 20-day moving average ±2 standard deviations (σ). A buy signal is triggered when the price crosses up through the lower band.
  2. Moving Average Convergence Divergence (MACD): A trend-following momentum indicator built from a long-term 26-day EMA, a shorter-term 12-day EMA, and a signal line that is a 9-day EMA of their difference. A buy signal occurs when the MACD line crosses the signal line from below.
  3. Relative Strength Index (RSI): The ratio of the 14-day EMA of incremental price increases to that of incremental decreases, scaled to values between 0 and 100. A buy signal occurs when the RSI crosses above 30, indicating recovery from an oversold market.
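As a rough illustration (not the paper's code), the three rules could be computed from a pandas Series of daily closes along these lines; the window lengths follow the text above, while the exact smoothing choices are assumptions:

```python
# Sketch of the three buy rules on a pandas Series of daily closes.
import pandas as pd

def bollinger_buy(close: pd.Series) -> pd.Series:
    ma = close.rolling(20).mean()
    lower = ma - 2 * close.rolling(20).std()
    # Buy when the price crosses back up through the lower band.
    return (close > lower) & (close.shift(1) <= lower.shift(1))

def macd_buy(close: pd.Series) -> pd.Series:
    macd = close.ewm(span=12).mean() - close.ewm(span=26).mean()
    signal = macd.ewm(span=9).mean()
    # Buy when the MACD line crosses the signal line from below.
    return (macd > signal) & (macd.shift(1) <= signal.shift(1))

def rsi_buy(close: pd.Series) -> pd.Series:
    delta = close.diff()
    gain = delta.clip(lower=0).ewm(span=14).mean()
    loss = (-delta.clip(upper=0)).ewm(span=14).mean()
    rsi = 100 - 100 / (1 + gain / loss)
    # Buy when RSI crosses up through 30, leaving oversold territory.
    return (rsi > 30) & (rsi.shift(1) <= 30)
```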

Images labelled according to the above indicators are then used to examine the supervised classification predictions, with 5,000 samples per class for each indicator. Buy triggers and no-buy triggers are chosen at random and the corresponding images are created, resulting in 10,000 high-resolution images per indicator; a rough sketch of this sampling step follows.
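A hypothetical sketch of how such a balanced set could be assembled from one of the signal series above; the paper's exact sampling procedure is not spelled out here, so treat this purely as an illustration:

```python
# Hypothetical: sample balanced buy / no-buy dates from a boolean
# signal series, then render the trailing window for each date.
import numpy as np
import pandas as pd

def sample_dates(signal: pd.Series, n_per_class: int = 5000, seed: int = 0):
    rng = np.random.default_rng(seed)
    buys = signal.index[signal].to_numpy()
    no_buys = signal.index[~signal].to_numpy()
    return (rng.choice(buys, n_per_class, replace=False),
            rng.choice(no_buys, n_per_class, replace=False))
```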

The paper is careful to account for the different time spans that each indicator uses (20 days for BB, 26 days for MACD, etc.). Each image is cropped to the number of effective training days per indicator, resulting in 80–108 features depending on the indicator's time span (e.g., 20 days × 4 OHLC values = 80 features for BB).

High-Resolution Images

High-resolution images are created as a result of the above method. The paper then asks: what is the optimal resolution? As resolution increases, so does the potential for noise to be introduced into the classifier, so the task is to find a resolution that minimizes noise while preserving sharpness. The authors tackle this problem by downscaling the images using the Lanczos filter and evaluating the results with several classifiers:

We examine this point by varying the resolution of the input images in logarithmic scale and compare the accuracy score of a hard voting classifier over the following 16 trained classifiers: Logistic Regression, Gaussian Naive-Bayes, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Gaussian Process, K-Nearest Neighbors, Linear SVM, RBF SVM, Deep Neural Net, Decision Trees, Random Forest, Extra Randomized Forest, Ada Boost, Bagging, Gradient Boosting, and Convolutional Neural Net.
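To make the ensemble concrete, here is a sketch of a hard-voting classifier over a handful of the models named in that list, using scikit-learn; the choice of subset and all hyperparameters are placeholders rather than the paper's settings:

```python
# Sketch of a hard-voting ensemble over a few of the named classifiers.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

voter = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("gnb", GaussianNB()),
        ("knn", KNeighborsClassifier()),
        ("rbf_svm", SVC(kernel="rbf")),
        ("rf", RandomForestClassifier(n_estimators=100)),
    ],
    voting="hard",  # majority vote over predicted labels
)
# X: flattened grayscale pixels, one row per chart image; y: buy / no-buy.
# voter.fit(X_train, y_train); voter.score(X_test, y_test)
```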

The authors evaluate the models using 5-fold cross-validation at resolutions of 3×3, 7×7, 10×10, 30×30, and 70×70. The results show that accuracy and precision scores increase with resolution, but no real gain is made beyond 30×30.
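A minimal sketch of that resolution sweep, assuming the charts are saved as image files and reusing the hypothetical `voter` from above; `image_paths` and the labels `y` are placeholders:

```python
# Sketch: downscale each chart with a Lanczos filter and flatten the
# grayscale pixels into a feature vector for the classifiers.
import numpy as np
from PIL import Image
from sklearn.model_selection import cross_val_score

def downscale(path: str, size: int) -> np.ndarray:
    img = Image.open(path).convert("L")               # grayscale
    small = img.resize((size, size), Image.LANCZOS)   # Lanczos resampling
    return np.asarray(small, dtype=np.float32).ravel() / 255.0

# Hypothetical evaluation at the paper's resolutions:
# for size in (3, 7, 10, 30, 70):
#     X = np.stack([downscale(p, size) for p in image_paths])
#     print(size, cross_val_score(voter, X, y, cv=5).mean())
```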

Notion of Time Represented in Images

A key issue arises when using images to convey time-series data: how to indicate the notion of time, given that the classifiers are expected to identify time-dependent labels. The authors take a two-part approach: each image is labelled based on its most recent data point, and time-dependent features are embedded into the image itself (widening bars, etc.). The labelling half is simple enough to state in code, as sketched below.
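In terms of the hypothetical signal series sketched earlier, an image covering days t−19 through t would receive the label of day t:

```python
# Minimal sketch of the labelling rule, reusing the hypothetical
# buy-signal series from the earlier snippet.
import pandas as pd

def label_window(signal: pd.Series, end_date) -> int:
    """Label the chart ending at end_date by the trigger on that day."""
    return int(bool(signal.loc[end_date]))  # 1 = buy, 0 = no-buy
```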

Five different chart designs are used to examine how best to solve this problem. The paper experiments with regular OHLC and line charts, then adds three combination charts that merge multiple factors into a single image (panels c, d, and e) to give each chart a sense of time-dependency (e.g., bar widths that scale with volume to hint at the direction of time). One such encoding is sketched after this paragraph.
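As an illustration of the width idea, here is a variant of the earlier candlestick sketch where bar thickness scales with each day's volume; the `Volume` column and the width range are my assumptions:

```python
# Hypothetical: encode each day's volume as candle width, one of the
# extra visual cues the paper experiments with.
import matplotlib.pyplot as plt
import pandas as pd

def ohlc_volume_image(window: pd.DataFrame, path: str) -> None:
    fig, ax = plt.subplots(figsize=(2, 2))
    widths = 1 + 4 * window["Volume"] / window["Volume"].max()
    for i, (_, day) in enumerate(window.iterrows()):
        color = "green" if day["Close"] >= day["Open"] else "red"
        ax.plot([i, i], [day["Low"], day["High"]],
                color=color, lw=widths.iloc[i])  # thicker = heavier volume
    ax.axis("off")
    fig.savefig(path, dpi=100)
    plt.close(fig)
```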

The same 16 classifiers as above are used to examine the accuracy and precision of the five chart types. The clear best performer is the line plot that uses only the Close values. The charts with varying bar widths and added previous-close values turn out to be less useful, with the exception of the MACD indicator. It may be that the extra information simply introduced more uncertainty and noise, resulting in lower scores.

Predictability Results

The authors end the paper by testing the classification task as a forecasting tool, examining performance at the specific window sizes of each indicator discussed earlier. The paper notes that performance decreases marginally as the window size grows, due to the incorporation of unnecessary information.

We take the daily trading data from 2018 (remember that the previous training and evaluation was done over data from the period between 2010 and the end of 2017) and create 20-days images for every day in the data. Then we feed these images to the tuned voting classifier, as a test set, and for each image, predict what the label is.
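Under the assumptions of the earlier sketches (a trained `voter`, the `ohlc_to_image` and `downscale` helpers, and a hypothetical `ohlc_2018` DataFrame of daily values), that rolling evaluation could look roughly like this:

```python
# Hypothetical rolling test: one trailing 20-day image per 2018 day,
# classified by the tuned voting ensemble.
predictions = {}
for t in range(20, len(ohlc_2018) + 1):
    window = ohlc_2018.iloc[t - 20:t]            # the 20 most recent days
    ohlc_to_image(window, "tmp.png")
    x = downscale("tmp.png", 30).reshape(1, -1)  # 30x30 sufficed earlier
    predictions[window.index[-1]] = voter.predict(x)[0]
```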

In the paper's figure, red triangles indicate the manually identified buy opportunities, while blue triangles indicate the buy opportunities predicted from the 20-day images. Over five opportunities were classified correctly, and the authors note that the others were very close to indicating a correct upward cross of the lower band.

Additional Significant Results

Visualization by itself is not straightforward, especially for high-dimensional data, and it might take some time for the analyst to find a proper graphical design that will encapsulate the full complexity of the data. In this study, we essentially suggest considering that display as the input over the raw data. Our research indicates that even very complex multi-scale algebraic operations can be discovered by transferring the task to an image-recognition problem.

The paper tackled the difficult problem of conveying time-dependency in static images in two ways: first by specific labelling of the images, and second by embedding features into the images themselves (e.g., widening bars with volume). The labelling approach was found to be more helpful than the embedded image features.

We also covered the paper's examination of optimal image resolution, which proved helpful in eliminating noise while preserving features.

We find that even at very low resolutions (see Fig. 5), time-series classification can be resolved effectively by transforming the task into an image recognition problem. This finding is in accordance with [5] who concurrently showed that classification using special visual designs or smoothed downscaling relates far-apart data and reveal global information that aid the classifiers in identifying the driving pattern and achieve better performance comparing to the raw tabular form.
