Austensible: Machine learning and Victorian literature
Artificial neural networks, in all their forms, might share some underlying premises, but their design and implementation vary across applications. As a way to learn more about recurrent neural networks (RNNs), Joelle Zweekhoorst and I applied a neural network trained on the complete works of Jane Austen to the problem of original text generation.
Recurrent neural networks are networks of sequential nodes, wherein the output produced for a given input depends on the previous outputs and inputs in the network’s memory. As Andrej Karpathy writes, they have advantages over convolutional neural networks:
“A glaring limitation of [CNNs] is that they accept a fixed-sized vector as input and produce a fixed-size vector as output.”
Because RNNs process their input as a sequence, Karpathy notes, they are not bound to a fixed size and can produce output of open-ended length.
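The dependence on earlier inputs described above can be sketched in a few lines. This is only a toy illustration, not the model we trained: it uses a single scalar hidden state, and the weights w_x, w_h and b are made-up values chosen for the example.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One step of a toy scalar RNN.

    The new hidden state mixes the current input with the previous
    hidden state, which acts as the network's memory.
    The weight values are hypothetical, chosen only for illustration.
    """
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_rnn(inputs, h0=0.0):
    """Feed a sequence through the toy RNN and collect every hidden state."""
    h = h0
    states = []
    for x in inputs:
        h = rnn_step(x, h)
        states.append(h)
    return states

# The same input value at every step still gives a different state each time,
# because each step also sees the state left behind by the previous one.
states = run_rnn([1.0, 1.0, 1.0])
print(states)
```

Because the loop simply runs once per element, the same function handles a sequence of any length, which is the property that sets RNNs apart from fixed-size models.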
To test an RNN, we attempted to recreate a long short-term memory (LSTM) model based on Megan Risdal’s online tutorial. This approach uses a character-level model for generating text. While the author’s model used internet forum comments as its source material, ours was trained on text from Shakespeare.
In one of the first attempts at creating output, the model produced the following text after one epoch:
What's done isy phehinss tarid s you se pont oy y fs im, ts on yay pyel yoy yongy s yinonss
While this output shows a rudimentary “understanding” that some sequences of letters are flanked by spaces, it is still incomprehensible. After adjusting some parameters, a run of ten epochs produced:
BRURIOLANUS:
Aeng:
Aemumnm you,
Welelllyssy
y lom uty; letaeed m.
BRUTUS:
Theury ul youl yourr m muus us
Fdlly unean
OAcam niRICINIUS:
Theitese your mhtyelurll smy
OAun gomsluy our surm Miumoumy mep
While the output is still unintelligible, we begin to see distinct features of a play, including character names and the different typefaces used to denote the speaker and the speaker’s lines. Additionally, the model produces names that resemble those from the play, or in the case of “Brutus,” are taken directly from it.
Austensible
Ostensible: stated or appearing to be true, but not necessarily so.
Following our work with an RNN for Shakespeare generation, we decided to apply a similar model to a much larger body of text. Using files available from Project Gutenberg, we sought to create a text generator in the style of Jane Austen.
We opted for an RNN here because text generation can be described as a sequence prediction problem, a kind of problem RNNs are well suited to solving. Sequence prediction is about using input sequences to predict new values in a sequence, and we wanted to use Jane Austen’s existing works to predict what might be produced from a particular input word or phrase. We began by importing the text of seven of Austen’s novels and one compilation of short stories.
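One common way to frame text as sequence prediction is to pair each chunk of text with the same chunk shifted one character ahead, so that every position has a "next character" to predict. The sketch below assumes that framing; the function name make_training_pairs and the seq_length value are ours, invented for illustration.

```python
def make_training_pairs(text, seq_length):
    """Split text into (input, target) pairs for next-character prediction.

    Each target is its input shifted one character to the right, so the
    model learns to predict character i+1 from characters up to i.
    """
    pairs = []
    for start in range(0, len(text) - seq_length, seq_length):
        chunk = text[start:start + seq_length + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs

pairs = make_training_pairs("It is a truth universally acknowledged", seq_length=10)
for source, target in pairs:
    print(repr(source), "->", repr(target))
```

Any leftover characters that do not fill a complete chunk are dropped in this sketch.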
After some initial steps of decoding and uniformly encoding the text of the file, we stripped the file of extraneous copyright and publisher information, leaving behind only Austen’s words and chapter headings. Again, we used a character-level system of generation.
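The cleanup and character-level encoding might look roughly like the following. This is a sketch under assumptions: the marker patterns reflect the "*** START OF ..." and "*** END OF ..." lines that Project Gutenberg files typically carry, but the exact wording varies between files, and the helper names strip_gutenberg and build_vocab are hypothetical.

```python
import re

# Assumed marker wording; individual Project Gutenberg files may differ.
START_RE = re.compile(
    r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*", re.S
)
END_RE = re.compile(r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK")

def strip_gutenberg(raw_bytes):
    """Decode a downloaded file and keep only the text between the markers."""
    text = raw_bytes.decode("utf-8-sig", errors="replace").replace("\r\n", "\n")
    start = START_RE.search(text)
    begin = start.end() if start else 0
    end = END_RE.search(text, begin)
    stop = end.start() if end else len(text)
    return text[begin:stop].strip()

def build_vocab(text):
    """Map every distinct character to an integer id."""
    chars = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(chars)}
    return chars, char_to_id

# A tiny stand-in for a real file, used only to exercise the functions.
sample = (
    b"Produced by volunteers\r\n"
    b"*** START OF THE PROJECT GUTENBERG EBOOK EMMA ***\r\n"
    b"CHAPTER I\r\nEmma Woodhouse\r\n"
    b"*** END OF THE PROJECT GUTENBERG EBOOK EMMA ***\r\n"
    b"License text"
)
body = strip_gutenberg(sample)
chars, char_to_id = build_vocab(body)
ids = [char_to_id[c] for c in body]
print(body)
print(ids)
```

The integer ids are what a character-level model actually consumes; the chars list maps its predictions back to readable text.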
For training, we used a sparse categorical cross-entropy loss function to keep score of how our model improved. This loss function seemed appropriate for a model that must select from a broad range of potential outputs (in this case, the available characters), many of which will have low probabilities for selection. The “sparse” variant also takes each target as an integer index rather than a one-hot vector, which matches integer-encoded characters.
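To make the loss concrete, here is a small hand-rolled version of the calculation. It is a sketch of the general formula, not the library routine we called: for each position it takes the probability the model assigned to the correct character, takes the negative natural log, and averages. The probability tables are invented for the example.

```python
import math

def sparse_categorical_crossentropy(target_ids, predicted_probs):
    """Mean negative log-probability assigned to the correct class.

    target_ids: integer class ids, one per position.
    predicted_probs: one probability distribution (list of floats) per position.
    """
    losses = [-math.log(probs[t]) for t, probs in zip(target_ids, predicted_probs)]
    return sum(losses) / len(losses)

targets = [0, 3]

# A model with no idea: every one of four characters is equally likely.
uniform = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
loss_uniform = sparse_categorical_crossentropy(targets, uniform)

# A model that puts most of its probability on the right character.
confident = [[0.97, 0.01, 0.01, 0.01], [0.01, 0.01, 0.01, 0.97]]
loss_confident = sparse_categorical_crossentropy(targets, confident)

print(loss_uniform, loss_confident)
```

A uniform guess over four characters scores log(4), and the loss falls toward zero as the model concentrates probability on the correct character.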
Output and challenges
A significant issue with a text generation problem like this is that it is difficult to judge the success of the algorithm from its output. For example, our first output was produced after three training epochs of 692 steps each. The training reported a loss of 1.1735. The text was, at least in our opinion, already discernibly influenced by Austen.
After 20 epochs of training, the model showed a loss of approximately 0.91, and the text was clearly more developed. Punctuation was used consistently and was more varied, with commas, question marks and exclamation points finding their way into the text. In continued tests, the model performed best after approximately 45 epochs, reporting a loss of 0.8650. Beyond this count, the text produced seemed to lose coherence again, or at least not gain any.
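One way to make these loss figures easier to read is to convert them to perplexity, the exponential of the loss. This assumes the reported loss is a mean cross-entropy per character measured with natural logarithms, which we did not verify for our setup; under that assumption, perplexity can be read loosely as the number of equally likely characters the model is choosing among at each step.

```python
import math

# Loss values reported in the text, keyed by number of training epochs.
reported_losses = {3: 1.1735, 20: 0.91, 45: 0.8650}

# Perplexity = exp(loss), assuming the loss is in nats per character.
perplexities = {epochs: math.exp(loss) for epochs, loss in reported_losses.items()}

for epochs, ppl in perplexities.items():
    loss = reported_losses[epochs]
    print(f"{epochs:>2} epochs: loss {loss:.4f} -> perplexity {ppl:.2f}")
```

Under that reading, the model went from hesitating among about 3.2 characters at each step to about 2.4, a real but modest gain that fits how gradually the text itself improved.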
The following was the result of 50 epochs:
Yesterday, as fond of England and herself, in the sweetness of the house, and she admired her the exagement of Mr. Rushworth. Her composure to me, that she _missally to compare herself, she has a comfortableness of yesterdageriate cornexposing their family were quiet and more respectable about what her hair in room with delight for Frank Churchill, only was by visit in. At last he was to give his companion, but he thought her there is totally striking orders soon afterwards.
While text generators that imitate the style of a given author are a novelty, there are clear applications for neural networks in the field of natural language processing (NLP). Corpora of user-input text or speech could be used alongside neural networks to produce machine translations or speech synthesis.
Additionally, these sequential models could be used to help map the ways in which languages evolve over time. There are many examples in which these strategies are already being implemented. While the previous exercises are rudimentary examples, they serve to illustrate the efficacy of RNNs and their ability to accept input and produce output of varying length.