What the model has learned
Layer-0 weights are extracted via model.layers[0].getWeights()[0].dataSync()
and averaged across the 16 hidden units for each tag. The values update after every training step.
The deck refills automatically from WikiArt using the highest-weight styles.
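
The per-tag averaging can be sketched in plain JavaScript. This is a minimal illustration, not the page's actual code. It assumes the layer-0 kernel has shape [numTags, 16] (one row per tag, one column per hidden unit) and that dataSync() returns it flattened in row-major order. The names tagAffinities and topTags are hypothetical helpers introduced here, and the "top k" rule for choosing refill styles is a guess at what "highest-weight" means.

```javascript
// Hypothetical helper: mean weight per tag from a flattened dense kernel.
// `flat` stands in for the typed array returned by dataSync(); it is assumed
// to hold numTags * hiddenUnits values laid out row by row (tag-major).
function tagAffinities(flat, numTags, hiddenUnits = 16) {
  if (flat.length !== numTags * hiddenUnits) {
    throw new Error(`expected ${numTags * hiddenUnits} weights, got ${flat.length}`);
  }
  const means = new Array(numTags).fill(0);
  for (let tag = 0; tag < numTags; tag++) {
    let sum = 0;
    for (let unit = 0; unit < hiddenUnits; unit++) {
      sum += flat[tag * hiddenUnits + unit];
    }
    means[tag] = sum / hiddenUnits;
  }
  return means;
}

// Hypothetical helper: indices of the k tags with the largest mean weight,
// highest first. Assumed here to be the styles used to refill the deck.
function topTags(means, k) {
  return means
    .map((mean, index) => [mean, index])
    .sort((a, b) => b[0] - a[0])
    .slice(0, k)
    .map(([, index]) => index);
}

// Toy example with 2 tags and 2 hidden units instead of 16.
const toyMeans = tagAffinities(new Float32Array([1, 3, -2, 0]), 2, 2);
console.log(toyMeans); // [ 2, -1 ]
```

In the real page, flat would come from model.layers[0].getWeights()[0].dataSync() and numTags from the size of the tag vocabulary.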
Model State
Architecture: —
Training steps: 0
Last reward: —
Prediction accuracy: —
Avg liked score: —
Avg passed score: —
Repetition penalty: —
Explore temperature τ: —
Deck remaining: —
Last refill source: —
Saved state: —
Layer 0 Weights — Tag Affinities
Sampling Distribution — Exploration Policy
τ = — · high τ = explore · low τ = exploit
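
The effect of τ can be illustrated with a temperature-scaled softmax. This sketch assumes the sampling distribution is a softmax over per-style scores divided by τ. The page does not state the exact formula, so treat that as an assumption. softmaxWithTemperature and sampleIndex are hypothetical names.

```javascript
// Hypothetical helper: turn raw scores into sampling probabilities.
// Dividing by tau before the softmax flattens the distribution when tau is
// large (explore) and sharpens it when tau is small (exploit).
function softmaxWithTemperature(scores, tau) {
  if (!(tau > 0)) throw new Error("tau must be positive");
  const scaled = scores.map((s) => s / tau);
  const max = Math.max(...scaled); // subtract the max for numerical stability
  const exps = scaled.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Hypothetical helper: draw one index from a probability vector.
// `rand` is injectable so the draw can be made deterministic in tests.
function sampleIndex(probs, rand = Math.random) {
  let r = rand();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

const scores = [0.2, 1.0, 0.5];
console.log(softmaxWithTemperature(scores, 5.0)); // close to uniform
console.log(softmaxWithTemperature(scores, 0.1)); // concentrated on index 1
```

With the same scores, raising τ moves probability mass toward low-scoring styles, and lowering it concentrates sampling on the current favourite.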
Prediction Accuracy — Over Time
(x-axis: swipe number)
Predicted Score — Liked vs Disliked
(x-axis: swipe number)
Policy Evolution — Style Weight Trajectories
(x-axis: training step)
← pass
Space skip
→ like
Esc close detail