Toy Transformer

train loss: 0.0000

val accuracy: 0.0000

Dataset

Learn to sort a sequence of characters

Number of items to sort: 2

Maximum value in the sequence: 26
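
As a rough illustration of the sorting task described above, the sketch below generates (input, target) pairs in Python. It is not the page's own data pipeline. The function names make_sorting_example and make_dataset are hypothetical, and it assumes that a maximum value of 26 corresponds to the lowercase letters a to z and that items are sampled with replacement.

```python
import random
import string


def make_sorting_example(num_items=2, max_value=26, rng=None):
    """Return one (input, target) pair; the target is the input in sorted order.

    Assumption: values 1..max_value map to the letters 'a'.., so max_value=26
    covers the whole lowercase alphabet. Items are drawn with replacement.
    """
    if not 1 <= max_value <= 26:
        raise ValueError("max_value must be between 1 and 26")
    rng = rng or random.Random()
    alphabet = string.ascii_lowercase[:max_value]
    seq = [rng.choice(alphabet) for _ in range(num_items)]
    return "".join(seq), "".join(sorted(seq))


def make_dataset(num_examples, num_items=2, max_value=26, seed=0):
    """Build a reproducible list of (input, target) pairs (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        make_sorting_example(num_items, max_value, rng)
        for _ in range(num_examples)
    ]


if __name__ == "__main__":
    # Print a few pairs using the settings shown on the page (2 items, max value 26).
    for source, target in make_dataset(5, num_items=2, max_value=26):
        print(source, "->", target)
```

With two items per sequence, each target is either the input unchanged or the input with its two characters swapped, which keeps the task small enough to train on quickly.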

Model Architecture

1
6
16 per head

Optimizer

1.00e-5 (slider range: 10^-5 to 1)

Training Options

2
1.00e-5 (slider range: 10^-5 to 1)