Daily Linear Classifier

I wanted to get a baseline started for a machine learning model that would predict if a stock would go up or down the next period. The period in this case is daily, so if the stock will go up the next day. Building a baseline model is the first step in all my projects. There is magic in the 80 / 20 approach to everything. Here I start with a 2 layer linear neural network that tries to classify just one row of data as buy or sell.

Code Repo

Data

A single row of data has the candle for that day, and some technical indicators. Here is a raw example of a row:

Column Name	Value
id	200
event_datetime	2016-10-17 04:00:00
event_unix_timestamp	1476676800000
open_price	17.7999992370605
close_price	17.7700004577637
high_price	18.2000007629395
low_price	17.7049999237061
volume	4385696.0
volume_weighted_price	17.8098182678223
stock_symbol	JBLU
timeframe	1D
bar_trend	bearish
buy_or_sell	1
next_frame_price	17.7800006866455
next_frame_trend	bearish
next_frame_unix_timestamp	1476763200000
next_frame_event_datetime	2016-10-18 04:00:00
hundred_day_sma	17.1679515838623
hundred_day_ema	17.1679515838623
fifty_day_sma	16.9481010437012
fifty_day_ema	16.9481010437012
twenty_day_sma	17.5162487030029
twenty_day_ema	17.5162487030029
nine_day_ema	17.8033351898193
nine_day_sma	17.8033351898193
hundred_day_high	18.9400005340576
hundred_day_low	14.7600002288818
fifty_day_high	18.4699993133545
fifty_day_low	15.6999998092651
ten_day_high	18.4699993133545
ten_day_low	17.1499996185303
fourteen_day_rsi	57.9687461853027
top_bollinger_band	18.182430267334
middle_bollinger_band	17.5162487030029
bottom_bollinger_band	16.8500671386719

Data processing

The framework I use is called Burn. It provides a number of utilities that help create nueral nets and train them. They have a sqlite database utility that I use for this model. Their opinions on table format are linked here.

Along with formatting the data for the framework, I need to split it into training and validation. There are 1,000,000 rows in the dataset. I split it into 80% training and 20% validation. Also, since this is time series data, I want the validation to be after the training data. Here is an example of one of those tables after the split:

Column Name	Value
row_id	1
open_price	17.7999992370605
close_price	17.7700004577637
high_price	18.2000007629395
low_price	17.7049999237061
volume	4385696.0
volume_weighted_price	17.8098182678223
bar_trend	1
buy_or_sell	1
hundred_day_sma	17.1679515838623
hundred_day_ema	17.1679515838623
fifty_day_sma	16.9481010437012
fifty_day_ema	16.9481010437012
twenty_day_sma	17.5162487030029
twenty_day_ema	17.5162487030029
nine_day_ema	17.8033351898193
nine_day_sma	17.8033351898193
hundred_day_high	18.9400005340576
hundred_day_low	14.7600002288818
fifty_day_high	18.4699993133545
fifty_day_low	15.6999998092651
ten_day_high	18.4699993133545
ten_day_low	17.1499996185303
fourteen_day_rsi	57.9687461853027
top_bollinger_band	18.182430267334
middle_bollinger_band	17.5162487030029
bottom_bollinger_band	16.8500671386719

Training

Linear classifiers are usually the starting point for classification tasks on tabular data. I will walk through my experiment configs on each run and see how they perform. Hopefully adjusting hpyerparameters will improve performance with each run, but that's why it's called experimenting. First though, I will start with the most simple one that I can think of.

Run 1

Hyperparameters	Value
epochs	10
learning_rate	1e-4
weight_decay	5e-5
batch_size	64
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	1
hidden_layer_size	64
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true

Results

Epoch	Loss	Accuracy
0	51.0	50.0
1	52.0	51.0
2	52.0	51.0

Early stop... no improvement.

Run 2

Hyperparameters	Value
epochs	10
learning_rate	1e-3
weight_decay	5e-5
batch_size	256
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	1
hidden_layer_size	128
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true

Results

Model { input_layer: Linear {d_input: 25, d_output: 128, bias: true, params: 3328} output_layer: Linear {d_input: 128, d_output: 2, bias: true, params: 258} activation: Relu params: 3586 } Total Epochs: 5

Split	Metric	Min.	Epoch	Max.	Epoch
Train	CPU Usage	52.270	3	56.542	1
Train	CPU Memory	19.332	5	19.482	2
Train	Loss	0.692	5	0.692	1
Train	Accuracy	51.839	1	52.129	5
Valid	CPU Usage	50.215	4	55.582	1
Valid	CPU Memory	18.967	5	19.404	2
Valid	Loss	0.693	3	0.695	2
Valid	Accuracy	51.061	1	51.369	5

Run 3

Hyperparameters	Value
epochs	10
learning_rate	1e-5
weight_decay	5e-5
batch_size	256
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	1
hidden_layer_size	256
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true

Results

Model { input_layer: Linear {d_input: 25, d_output: 256, bias: true, params: 6656} output_layer: Linear {d_input: 256, d_output: 2, bias: true, params: 514} activation: Relu params: 7170 } Total Epochs: 3

Split	Metric	Min.	Epoch	Max.	Epoch
Train	Loss	0.692	3	0.692	1
Train	CPU Memory	19.209	3	19.498	1
Train	Accuracy	51.713	1	51.851	3
Train	CPU Usage	53.681	1	54.714	3
Valid	Loss	0.693	1	0.693	2
Valid	CPU Memory	19.228	2	19.481	1
Valid	Accuracy	51.387	3	51.414	1
Valid	CPU Usage	52.290	1	53.637	3

Run 4

Notes

tried taking the log of the volume column and removing the min max norm. Spoiler alert, fail.

Config

Hyperparameters	Value
epochs	10
learning_rate	1e-5
weight_decay	5e-5
batch_size	256
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	1
hidden_layer_size	256
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true

Results

Split	Metric	Min.	Epoch	Max.	Epoch
Train	CPU Usage	54.374	3	55.855	1
Train	Loss	NaN	1	NaN	3
Train	CPU Memory	19.632	1	19.952	3
Train	Accuracy	48.180	2	48.184	1
Valid	CPU Usage	50.668	3	53.622	1
Valid	Loss	NaN	1	NaN	3
Valid	CPU Memory	19.915	2	19.958	3
Valid	Accuracy	48.584	1	48.584	3

Run 5

Notes

tried taking the log of the volume column and removing the min max norm. Spoiler alert, fail.

Config

Hyperparameters	Value
epochs	10
learning_rate	1e-5
weight_decay	5e-5
batch_size	256
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	1
hidden_layer_size	256
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true

Results

Split	Metric	Min.	Epoch	Max.	Epoch
Train	CPU Usage	54.374	3	55.855	1
Train	Loss	NaN	1	NaN	3
Train	CPU Memory	19.632	1	19.952	3
Train	Accuracy	48.180	2	48.184	1
Valid	CPU Usage	50.668	3	53.622	1
Valid	Loss	NaN	1	NaN	3
Valid	CPU Memory	19.915	2	19.958	3
Valid	Accuracy	48.584	1	48.584	3

Run 6

Notes

Going to add a dropout layer.

Config

Hyperparameters	Value
epochs	10
learning_rate	1e-5
weight_decay	5e-5
batch_size	256
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	1
hidden_layer_size	256
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true
dropout	0.5

Results

Model { input_layer: Linear {d_input: 25, d_output: 256, bias: true, params: 6656} output_layer: Linear {d_input: 256, d_output: 2, bias: true, params: 514} dropout: Dropout {prob: 0.5} activation: Relu params: 7170 } Total Epochs: 3

Split	Metric	Min.	Epoch	Max.	Epoch
Train	Accuracy	50.903	1	51.391	3
Train	Loss	0.693	3	0.694	1
Train	CPU Memory	19.722	1	19.949	2
Train	CPU Usage	55.119	1	56.131	2
Valid	Accuracy	51.397	3	51.417	1
Valid	Loss	0.693	1	0.693	2
Valid	CPU Memory	19.340	3	20.021	2
Valid	CPU Usage	52.641	2	54.060	3

Run 7

Notes

Added 2 more hidden layers.

Config

Hyperparameters	Value
epochs	10
learning_rate	1e-5
weight_decay	5e-5
batch_size	256
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	3
hidden_layer_size	256
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true
dropout	0.5

Results

Split	Metric	Min.	Epoch	Max.	Epoch
Train	Accuracy	50.975	1	51.648	3
Train	CPU Usage	51.280	2	51.635	1
Train	Loss	0.693	3	0.693	1
Train	CPU Memory	19.638	3	19.773	2
Valid	Accuracy	51.416	1	51.416	3
Valid	CPU Usage	48.600	3	49.028	1
Valid	Loss	0.693	1	0.693	3
Valid	CPU Memory	19.627	2	19.733	1

Run 8

Notes

Taking out the bias.

Config

Hyperparameters	Value
epochs	10
learning_rate	1e-5
weight_decay	5e-5
batch_size	256
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	3
hidden_layer_size	256
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	false

Results Model { input_layer: Linear {d_input: 25, d_output: 256, bias: true, params: 6656} ln1: Linear {d_input: 256, d_output: 256, bias: true, params: 65792} ln2: Linear {d_input: 256, d_output: 256, bias: true, params: 65792} output_layer: Linear {d_input: 256, d_output: 2, bias: true, params: 514} dropout: Dropout {prob: 0.5} activation: Relu params: 138754 } Total Epochs: 3

Split	Metric	Min.	Epoch	Max.	Epoch
Train	CPU Usage	55.270	1	61.351	2
Train	CPU Memory	19.492	2	19.718	1
Train	Loss	0.693	3	0.693	1
Train	Accuracy	50.980	1	51.621	3
Valid	CPU Usage	50.155	1	56.421	3
Valid	CPU Memory	19.342	2	19.663	3
Valid	Loss	0.693	1	0.693	3
Valid	Accuracy	51.416	1	51.416	3

Run 9

Notes

Seems I am stuck at a loss of 0.693. My learning rate or initialization probably off. Going to up my learning rate.

Config

Hyperparameters	Value
epochs	10
learning_rate	5e-1
weight_decay	5e-5
batch_size	256
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	3
hidden_layer_size	256
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true

Results

Split	Metric	Min.	Epoch	Max.	Epoch
Train	Loss	0.719	2	NaN	6
Train	CPU Usage	54.249	6	66.030	4
Train	Accuracy	50.465	4	50.626	1
Train	CPU Memory	19.801	2	20.048	4
Valid	Loss	0.693	4	NaN	6
Valid	CPU Usage	51.482	5	57.649	2
Valid	Accuracy	48.584	1	51.416	5
Valid	CPU Memory	19.713	1	20.040	6

Run 10

Notes

The learning rate increase was fine, still only getting my losst to around 0.7. Let's add more layers.

Config

Hyperparameters	Value
epochs	10
learning_rate	5e-1
weight_decay	5e-5
batch_size	256
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	7
hidden_layer_size	256
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true

Results

Split	Metric	Min.	Epoch	Max.	Epoch
Train	Accuracy	51.278	2	51.310	3
Train	CPU Usage	60.536	1	61.883	2
Train	CPU Memory	19.877	3	20.124	2
Train	Loss	0.694	2	0.698	1
Valid	Accuracy	48.584	1	51.416	3
Valid	CPU Usage	54.317	2	57.741	3
Valid	CPU Memory	19.761	3	20.103	1
Valid	Loss	0.694	1	0.756	2

Run 11

Notes

Not budging. Going to take out the shuffle and add 2 more workers, and slow wieght decay.

Config

Hyperparameters	Value
epochs	10
learning_rate	5e-2
weight_decay	2e-5
batch_size	256
num_workers	6
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	Adam
input_size	25
hidden_layers	7
hidden_layer_size	256
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	false
bias	true

Results

Model { input_layer: Linear {d_input: 25, d_output: 256, bias: true, params: 6656} ln1: Linear {d_input: 256, d_output: 256, bias: true, params: 65792} ln2: Linear {d_input: 256, d_output: 256, bias: true, params: 65792} ln3: Linear {d_input: 256, d_output: 256, bias: true, params: 65792} ln4: Linear {d_input: 256, d_output: 256, bias: true, params: 65792} ln5: Linear {d_input: 256, d_output: 256, bias: true, params: 65792} ln6: Linear {d_input: 256, d_output: 256, bias: true, params: 65792} output_layer: Linear {d_input: 256, d_output: 2, bias: true, params: 514} dropout: Dropout {prob: 0.5} activation: Relu params: 401922 } Total Epochs: 5

Split	Metric	Min.	Epoch	Max.	Epoch
Train	Accuracy	51.259	5	51.362	1
Train	CPU Memory	19.808	3	20.172	4
Train	CPU Usage	74.068	1	77.307	4
Train	Loss	0.693	5	0.721	1
Valid	Accuracy	51.416	1	51.416	5
Valid	CPU Memory	19.788	2	20.195	4
Valid	CPU Usage	72.220	4	78.686	3
Valid	Loss	0.693	3	0.693	1

Run 12

Notes

Last run. Going to add gradient clipping, and reduce the number of layers. Trying to combat maybe a vanishing gradient problem.

Config

Hyperparameters	Value
epochs	10
learning_rate	1e-2
weight_decay	5e-5
batch_size	512
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	SGD
input_size	25
hidden_layers	2
hidden_layer_size	512
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true

Results

Model { input_layer: Linear {d_input: 25, d_output: 512, bias: true, params: 13312} ln1: Linear {d_input: 512, d_output: 512, bias: true, params: 262656} output_layer: Linear {d_input: 512, d_output: 2, bias: true, params: 1026} dropout: Dropout {prob: 0.5} activation: Relu params: 276994 } Total Epochs: 5

Split	Metric	Min.	Epoch	Max.	Epoch
Train	CPU Usage	49.157	1	51.136	4
Train	Accuracy	51.389	5	51.486	2
Train	CPU Memory	20.042	4	20.456	3
Train	Loss	0.693	3	0.705	1
Valid	CPU Usage	48.439	2	51.192	3
Valid	Accuracy	51.416	1	51.416	5
Valid	CPU Memory	19.971	4	20.622	3
Valid	Loss	0.696	3	0.701	2

Run 13

Notes

Added two more features, previous bar trend and macd signal. This was more of the same result wise.

Config

Hyperparameters	Value
epochs	10
learning_rate	1e-2
weight_decay	5e-5
batch_size	512
num_workers	4
seed	42
device	wgpu
loss	CrossEntropyLoss
optimizer	SGD
input_size	27
hidden_layers	2
hidden_layer_size	512
output_size	2
hidden_layer_activation	Relu
output_activation	with logits
shuffle_batch	true
bias	true

Results

Model { input_layer: Linear {d_input: 27, d_output: 512, bias: true, params: 14336} ln1: Linear {d_input: 512, d_output: 512, bias: true, params: 262656} output_layer: Linear {d_input: 512, d_output: 2, bias: true, params: 1026} dropout: Dropout {prob: 0.5} activation: Relu params: 278018 } Total Epochs: 8

Split	Metric	Min.	Epoch	Max.	Epoch
Train	CPU Memory	20.956	7	21.549	2
Train	CPU Usage	53.669	8	58.540	3
Train	Loss	0.693	2	0.709	1
Train	Accuracy	51.283	6	51.380	4
Valid	CPU Memory	20.979	7	21.453	2
Valid	CPU Usage	52.584	8	59.560	6
Valid	Loss	0.692	6	0.698	3
Valid	Accuracy	48.155	1	51.845	8

Conclusion

Still not moving the needle. This was expected, as predicting stocks is hard. I may need to rework my data, but a simple model like this was expected to not be very accurate. Random guessing is all that it can do, and my assumption is there is not much predictive power in the data. So my next step it do do some feature engineering and try to improve the dataset.