Visualizing the text of (children’s) book series

Established: August 24, 2016

A stream of dancing lights, for all the world like the shimmering curtains of the aurora, blazed across the screen. They took up patterns that were held for a moment only to break apart and form again, in different shapes, or different colours; they looped and swayed, they sprayed apart, they burst into showers of radiance that suddenly swerved this way or that like a flock of birds changing direction in the sky. And as Lyra watched, she felt the same sense, as of trembling on the brink of understanding, that she remembered from the time when she was beginning to read the alethiometer.

Philip Pullman, The Subtle Knife (Scholastic, 1997; USA, Knopf, 1997)

Introduction

Digital technologies have repeatedly redefined the paper world of books. Digital printing has overhauled publishing processes, and the internet has revolutionised the way audiences and authors connect to share their enthusiasm and criticism. Now the digitization of books themselves, whether for searching, browsing, and reading on a computer screen through services like Google Books, or for reading on dedicated devices like Amazon’s Kindle or the Sony Reader, is threatening the established order.

For this project, however, we side-step these issues and concentrate instead on how the analytical power and display capabilities of computers can be used to enhance our understanding of book texts. We use the term “book texts” rather than “books” because we are not trying to build computer systems that understand books; instead, we take the computer’s ability to treat a book as an abstract sequence of words as the starting point for new analytical tools.
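As a minimal sketch of what treating a book as “an abstract sequence of words” can mean in practice, the Python below lowercases a text, extracts words with a simple regular expression, and counts them. The tokenization rule and the function names `words` and `word_counts` are assumptions made for illustration; they are not the tokenizer used in this project.

```python
import re
from collections import Counter

# Hypothetical tokenization rule: runs of letters, allowing internal
# apostrophes (straight or curly) so that "Lyra's" stays one word.
# A real project would also need rules for hyphens, numbers, and so on.
WORD_RE = re.compile(r"[a-z]+(?:['’][a-z]+)*")

def words(text: str) -> list[str]:
    """Reduce a book text to an ordered sequence of lowercase words."""
    return WORD_RE.findall(text.lower())

def word_counts(text: str) -> Counter:
    """Count how often each word occurs, discarding word order."""
    return Counter(words(text))

if __name__ == "__main__":
    sample = ("And as Lyra watched, she felt the same sense, "
              "as of trembling on the brink of understanding")
    print(words(sample)[:4])  # ['and', 'as', 'lyra', 'watched']
    print(word_counts(sample).most_common(3))
```

The ordered list from `words` keeps position information (useful for drawing where words fall in a book), while `word_counts` throws order away in favour of frequencies.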

Who would use such tools? Anyone with an interest in books (authors, readers, publishers, agents, critics, academics, and so on) may find them useful, but we have designed our visualizations with fans and academic readers in mind. These readers form theories about the books that stand alongside the author’s own understanding, and we hope the abstract visualizations provided here may help in that endeavour.

Background and Related Work

The statistical analysis of texts is an important area of work and is used widely in information retrieval (e.g. web search). It is also a mature area of research in its own right, and has been applied to tasks ranging from authorship attribution to ordering works in time. For example, in a letter published in 1882, Augustus De Morgan speculated about using statistical techniques to explore authorship questions around St Paul’s Epistles and the Epistle to the Hebrews [Lea76], while more recently Jockers, Witten, and Criddle [JWC08] used sophisticated statistical techniques to reassess the authorship of the Book of Mormon.
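To make statistical authorship analysis a little more concrete, here is a simplified sketch of Burrows’s Delta, the “delta” named in the title of [JWC08]. It is not the procedure Jockers, Witten, and Criddle actually ran (they also use nearest shrunken centroid classification and far larger texts); the tokenizer, the default of 30 feature words, and the use of the population standard deviation are all assumptions of this sketch. The idea: take the most frequent words across a corpus, express each text’s relative frequency of each word as a z-score, and measure the distance between two texts as the mean absolute difference of those z-scores.

```python
import re
from collections import Counter
from statistics import mean, pstdev

def tokenize(text: str) -> list[str]:
    # Simplistic tokenizer (an assumption): lowercase runs of letters only.
    return re.findall(r"[a-z]+", text.lower())

def burrows_delta(texts: dict[str, str], n_words: int = 30) -> dict[tuple[str, str], float]:
    """Pairwise Burrows's Delta between named texts (smaller = more similar style)."""
    tokenized = {name: tokenize(body) for name, body in texts.items()}

    # 1. Feature set: the n most frequent words across the whole corpus.
    overall = Counter()
    for tokens in tokenized.values():
        overall.update(tokens)
    vocab = [w for w, _ in overall.most_common(n_words)]

    # 2. Relative frequency of each feature word in each text.
    freqs = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        freqs[name] = [counts[w] / total for w in vocab]

    # 3. Standardise each feature across the texts (z-scores).
    columns = list(zip(*freqs.values()))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero
    z = {name: [(f - m) / s for f, m, s in zip(fs, mus, sigmas)]
         for name, fs in freqs.items()}

    # 4. Delta = mean absolute difference of z-scores, for each pair of texts.
    names = list(texts)
    return {(a, b): mean(abs(x - y) for x, y in zip(z[a], z[b]))
            for i, a in enumerate(names) for b in names[i + 1:]}

if __name__ == "__main__":
    corpus = {
        "a": "the cat sat on the mat and the cat slept",
        "b": "the dog sat on the rug and the dog slept",
        "c": "a bird flew over a hill in a storm",
    }
    for pair, d in sorted(burrows_delta(corpus).items(), key=lambda kv: kv[1]):
        print(pair, round(d, 3))
```

Lower Delta values suggest more similar styles. Because the features are common words rather than topic words, the measure is meant to pick up habits of style rather than subject matter, which is why it has been used for authorship questions.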

In contrast, the abstract visualization of book texts is neither a large nor a mature field of study, but there are notable and inspirational examples, several of which are listed in the references below.

Visualizations

[Figure: fragment from a whole-text visualization]

Our work focuses on the abstract visualization of children’s book series, and in particular on Philip Pullman’s trilogy “His Dark Materials”. The trilogy is made up of three novels: “Northern Lights” (called “The Golden Compass” in the USA and in the film adaptation), “The Subtle Knife”, and “The Amber Spyglass”. We chose this genre partly out of personal passion and partly because of the range of potentially enthusiastic readers. The best children’s book series (especially before they are completed) are read and discussed by both child and adult readers, and many of these readers develop their own theories, which they share with their friends and with other readers online. Academic interest is similarly piqued, and there are conferences and journals dedicated to the study of children’s literature.

Future Work

There are several directions in which we would like to take this work next.

User studies

We need to take these visualizations out of the research lab and put them in front of the fans and academics who are theorising about Pullman’s works, to establish whether the tools are useful, how they might be improved, and what other visualizations would be of value to the community.

Infer.Net

Throughout this work we took the view that computers are not adept at understanding books and should essentially just count words and draw the results for people to interpret. However, advances in machine learning, and especially toolkits that allow machine learning techniques to be applied quickly to new domains, have led us to explore applying Infer.Net to the analysis phase of the visualization.
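The project’s own analysis and drawing code is not shown on this page. As a hypothetical sketch of the “count words and draw the results” approach, the Python below splits a text’s word sequence into equal-sized segments (a stand-in for chapters, which is an assumption), counts a chosen term in each segment, and prints the counts as text bars. `term_profile` and `draw` are illustrative names, not part of the original work.

```python
import re

def term_profile(text: str, term: str, n_segments: int = 10) -> list[int]:
    """Count occurrences of `term` in each of `n_segments` equal slices of the text."""
    tokens = re.findall(r"[a-z]+", text.lower())  # crude word sequence
    target = term.lower()
    counts = [0] * n_segments
    for position, token in enumerate(tokens):
        if token == target:
            # Map the word's position to a segment index in [0, n_segments).
            counts[position * n_segments // len(tokens)] += 1
    return counts

def draw(counts: list[int], width: int = 20) -> str:
    """Render per-segment counts as text bars scaled to the busiest segment."""
    peak = max(counts) or 1
    return "\n".join(f"{i:>3} {'#' * round(c * width / peak)} {c}"
                     for i, c in enumerate(counts))

if __name__ == "__main__":
    text = "Dust falls. " * 3 + "Lyra reads. " * 9
    print(draw(term_profile(text, "Dust", n_segments=4), width=10))
```

Case-folding keeps the sketch simple but merges Pullman’s capitalised Dust with ordinary dust; distinguishing the two is exactly the kind of judgement the text leaves to human readers.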

Other visualizations

Inevitably, building visualizations leads to 1,001 other ideas for how the data might be visualized. We would like to add the ability to pivot (e.g. so that clicking one flower’s bud opens another flower side by side). We would also like to add animations, so that the movement between visualizations, or the way a visualization forms, becomes part of the semantics of the visualization itself.

Online version

The visualizations we made are not available for public use, either online or as a download. This is partly because we have not spent time looking at the rights implications and partly because we have not engineered the code to the quality required for public use. It would be great to get this to a level where people can try the visualizations for themselves without us present.

Other Books

It would be interesting to apply this work to other children’s book series, to see whether the characteristic patterns revealed in the visualizations differ from author to author. We might also move from a reader’s perspective to a learner’s perspective and choose books that often appear on high-school syllabuses. But most intriguing would be to build visualizations that contrast the content and style of different authors’ work.

References

[Bec07] Linda Becker, 2007, “In Translation”, http://lindabecker.net/in-translation/
[Dan05] Anh Dang, 2005, “Gospel Spectrum”, http://thirteensquares.com/gospelspectrum/
[Har08] Chris Harrison, 2008, “Visualizing the Bible”, http://www.chrisharrison.net/projects/bibleviz/
[JWC08] Jockers, Witten, and Criddle, 2008, “Reassessing authorship of the Book of Mormon using delta and nearest shrunken centroid classification”, Literary and Linguistic Computing, http://llc.oxfordjournals.org/cgi/content/abstract/23/4/465
[Lar14] Clarence Larkin, 1914–1918, “Dispensational Charts”, http://www.preservedwords.com/charts.htm
[Lea76] Peter Lea, 1976, “The Style is the Man”, unpublished lecture notes and slides, University of York
[Pal02] W. Bradford Paley, 2002, “TextArc”, http://textarc.org/
[Pos06] Stephanie Posavec, 2006, “Writing Without Words”, http://www.itsbeenreal.co.uk/index.php?/wwwords//about-this-project/
[PRYAKSCL06] Plaisant, Rose, Yu, Auvil, Kirschenbaum, Nell Smith, Clement, and Lord, 2006, “Exploring erotics in Emily Dickinson’s correspondence with text mining and visual interfaces”, http://portal.acm.org/citation.cfm?id=1141753.1141781
[Sha08] Ebany Spencer, 2008, “Romancing Dimensions”, http://www.ebanyshae.com/page11.htm
[SK07] Philipp Steinweber and Andreas Koller, 2007, “Similar Diversity”, http://similardiversity.net/project/
[Wal08] Tim Walter, 2008, “textour”, http://www.timwalter.de/portfolio/textour/
[Wat02] Martin Wattenberg, 2002, “Arc Diagrams: Visualizing Structure in Strings”, http://portal.acm.org/citation.cfm?id=857733, http://www.research.ibm.com/visual/papers/arc-diagrams.pdf
[WV08] Martin Wattenberg and Fernanda B. Viégas, 2008, “The Word Tree: An Interactive Visual Concordance”, http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4658133, http://www.research.ibm.com/visual/papers/wordtree_final2.pdf

Acknowledgements

This work was done as a collaboration between Linda Becker and Tim Regan during Linda’s internship at Microsoft Research’s Cambridge Lab in the summer of 2008. The work would not have been possible without the generous, thought-provoking, and supportive help of Pullman’s publishers, especially Marion Lloyd and Claire Tagg at Scholastic, Pullman’s agent Caradoc King, and Philip Pullman himself.

People

Ken Woodberry

Principal Software Engineering Lead, Azure Sphere
