The Creation of the Comic Strip as an Audiovisual Stage in the New York Journal 1896-1900

By Eike Exner

This article examines the origins of the form of graphic narrative that Thierry Smolderen, in The Origins of Comics, calls "an audiovisual stage on paper," a form that today has largely become synonymous with comics.[1] This audiovisual-stage form of graphic narrative integrates sound and motion into the image. It thereby renders external narration obsolete and distinguishes itself from earlier forms such as the picture story employed by Rodolphe Töpffer and Wilhelm Busch. Contrary to the popular claim that R. F. Outcault created 'audiovisual' comics with a single cartoon, the emergence of visual representations of what I define below as "transdiegetic" content, including sound and motion, was a more gradual process, one intricately tied to larger social and technological changes. The article first examines when comics began to use dialog as a primary means of developing a narrative, with that dialog represented visually as "sound images" between characters who talk and respond to each other. It then outlines, more broadly, the history of the early transdiegetic content that preceded these first modern, audiovisual comics, focusing on how such comics came about.

Balloons versus Labels; Speech versus Clues and Commentary

The point at which sound and speech truly became part of graphic narrative has been much debated in comics historiography, particularly in connection with the question of the speech balloon. David Kunzle and others who assert a centuries-old continuity of comics have described all instances of text enclosed by a line culminating in a "tail" or "appendix" pointing at a particular character as "speech balloons."[2] As various comics historians have noted, broadsheets and single-panel cartoons did feature such balloon shapes long before the Yellow Kid. Pre-1890s multi-panel narratives, however, are strangely (from the perspective of these scholars) devoid of them. Some of these historians also concede that most of these earlier 'balloons' seem awkward to the modern reader.[3] Smolderen explains why by distinguishing between "labels" and "balloons." He points out that ahistorically subsuming all balloons under the category of speech balloons obscures their different functions:

For the 20th-century reader, speech balloons are simple graphic devices through which pictorial characters speak, and it is tempting to attribute any obscurity in more ancient examples to the clumsiness or naivety of artists from another era ... What I hope to show, in this essay, is that modern speech balloons—and the way we interpret them as part of an audiovisual scene played on paper—could not have existed before the technological changes of the 1890s.[4]

According to Smolderen, the 'balloons' that we see in works created prior to the 1890s are more appropriately described as "labels." They do not seek to represent the spoken word, even if it is tempting to read them that way from our contemporary perspective. These labels instead function as authorial commentary or as "self-representational devices" that help us decipher the graphic puzzle by providing information about the dramatis personae.[5] Smolderen's argument contains more nuance and detail than can be reproduced here. Its central point, however, quickly becomes obvious when surveying pre-comic-strip cartoons: 'balloons' before the comic strip form do not represent a "sound image" synchronous with the depicted action (Smolderen's "audiovisual stage"). For example, The Prodigall Sifted, an English print first published in 1677, shows a husband and wife holding a sieve with their son in it (Fig. 1, this impression c. 1740).[6] Each of the three characters has a phylactery (a 'balloon'-like shape approximating the appearance of a paper scroll) with a "tail" connected to him or her. The father's reads, "No more indulgence to our Graceless Son [line break] Let's Sift him, wife, to know what he has don," the mother's, "Then Sift on Husband; for it must be knowne [line break] How he hath Spent our mony, not his owne," and the son's, "Pardon Kinde parents and I'le tell the truth [line break] What I have done in my debauched youth."

The three 'balloons' here are not intended as an accurate representation of dialog spoken by the three characters as they shake or are shaken. Instead, they provide the clue to understanding each character's relationship to the others, as well as their motivation, emotional state, or role in the static, timeless composition. Even the objects shown 'falling,' i.e. underneath the sieve, appear suspended in time. The print posits no interior story-world in which the viewer/reader is supposed to imagine the three persons depicted actually, audibly speaking to each other, as a contemporary comic would. (To understand the difference, it may help to imagine recreating the print in Fig. 1 as an animated film and to compare the difficulty of that task with doing the same for the comic strip in Fig. 5.) The 'balloons' visible in the print cannot accurately be called speech balloons in the modern sense of a visual approximation of sound; to refer to them as such is ahistorical. Using speech balloons to represent sound on the page synchronously with the depicted action may seem entirely obvious to us today, but only because we are familiar with recorded sound that is reproducible at will. Prior to the philosophical and technological developments that enabled the recording of sound, the representation of a concrete sound image was simply inconceivable, as opposed to the timeless, representative commentary seen in works such as the print in Fig. 1. This explains the absence of multi-panel narratives employing speech balloons until the 1890s.[7]

The Yellow Kid and the Speech Balloon

A number of comics historians hence agree that the speech balloon, properly so called, appeared no earlier than the 1890s. They credit its creation and establishment to the aforementioned R. F. Outcault, creator of the Yellow Kid. The "Yellow Kid origin theory" of comics essentially holds that contemporary comics were born in an October 25, 1896 cartoon titled "The Yellow Kid and His New Phonograph." In it, Outcault combined his recurring character with speech balloons and with multiple, sequential panels, though these were not separated by panel borders or gutters. The Yellow Kid responds to (or, more accurately, comments on) speech balloons coming out of a phonograph advertising the New York Journal.

The Kid's commentary is printed, as is usually the case, on his yellow nightshirt. In the fifth and last panel, a parrot (itself a recurring character in the Yellow Kid cartoons) is revealed to have been inside the phonograph and the true originator of the speech balloons. In this panel the Yellow Kid's commentary is suddenly expressed inside a balloon of its own, instead of on his shirt. According to adherents of the "Yellow Kid origin theory," Outcault with this cartoon discovered a new storytelling form and instituted the beginning of modern comics history. As Bill Blackbeard puts it, "The pattern or template for the modern comic strip established in this five-panel strip soon dominated comic art in the United States and spread around the world."[8]

The main problem with this claim is the lack of a direct link between Outcault's experiment and later comic strips. Outcault himself created only 16 further sequential Yellow Kid cartoons. First came a series of nine six-panel (or, in one instance, seven-panel) cartoons following the five-panel phonograph cartoon of October 25; this series began one month later and continued until February 14, 1897.[9] After a break of three months, seven more multi-panel cartoons appeared between May 23, 1897 and January 23, 1898, spaced out over eight months. The January 23, 1898 cartoon was the last Yellow Kid cartoon signed by Outcault, apart from another single-panel one on May 1, 1898 in which the Kid makes a cameo as an old man.[10] This makes a total of 17 cartoons starring the Yellow Kid that can be considered sequential narratives. Outcault appears to have lost interest in sequential narrative after the first ten of these, which appeared between October 25, 1896 and February 14, 1897.

Of course, Outcault could have inaugurated the lineage of audiovisual comics even with this limited number of episodes. However, no other artist copied this form while the Yellow Kid was appearing. Uses of sound and/or of 'balloons' that might more accurately be described as labels were extremely rare in the New York Journal at the time. I have been able to find only four such uses by other artists in sequential cartoons for the entire year of 1897.[11] Sound and 'balloons' were generally absent from the New York Journal outside of the Yellow Kid cartoons during the period in which those cartoons were featured, and this absence is striking.

Smolderen explains the Yellow Kid's lack of obvious influence on other works by claiming that "it took several years for Outcault's peers to adapt and generalize the principle." He cautions that "The Yellow Kid and His New Phonograph was never meant to light the fuse of a new genre; it was just a very clever standalone cartoon," and adds that the public had not yet become sufficiently accustomed to the phonograph.[12] If we accept all of the above as true, however, it would mean that Outcault may have been ahead of his time, not that his cartoons ushered in the age of the audiovisual comic strip. Bill Blackbeard is the most famous proponent of the view that the Yellow Kid "started the comics." Yet even he writes that "Outcault apparently never realized the significance of his invention. Nor did anyone else see anything out of the ordinary in the [sic] 'The Yellow Kid and His New Phonograph'" (emphasis added).[13]

Nonetheless, in the same paragraph Blackbeard also claims that "[b]y the time he had established his second success, Buster Brown, his invention had been widely adopted, but there is no doubt that Outcault invented the form."[14] But if neither Outcault himself nor anyone else recognized the Yellow Kid cartoons as a new form of graphic narrative, it is a conundrum how this new form came to be so widely adopted. As mentioned above, Smolderen's solution to this problem is to claim that "it took several years for Outcault's peers to adapt and generalize the principle." He does not, however, explain why it would have taken Outcault's peers so long to adapt. Suppose that, as Blackbeard claims, Outcault's peers did not pay attention to his invention and instead generalized the principle (of the speech balloon) on their own. In that case, the Yellow Kid would be essentially irrelevant to the creation of the audiovisual comic strip form.

Part of the solution to the conundrum may be that the Yellow Kid cartoons were not as revolutionary as Smolderen and Blackbeard assume. I am not convinced that the Yellow Kid cartoons can truly be considered examples of graphic narrative as an audiovisual stage. For one, some of Outcault's 17 multi-panel narratives (including four of the last five) feature no 'balloons' at all. The others use balloons exclusively for inanimate objects (such as the phonograph alarm clock) and animals, with two exceptions. The October 25 and December 13, 1896 episodes are the only occasions on which the Yellow Kid himself uses a speech balloon (a single one in each). Other human characters in the single-panel cartoons do use balloons, such as the recurring falling kid who is always portrayed in mid-air. But in no instance do characters, human or otherwise, actually converse (i.e. respond to each other's utterances) using speech balloons. The same is true of the Kid's nightshirt. The writing on it could theoretically be a representation of sound as well, since there is no reason to limit the representation of speech to the balloon, despite its fetishization in many comics studies texts. The lack of communication between characters indicates that Outcault's balloons retain a strong self-representational element, primarily providing information directed at the reader. They thus still function at least partly as labels, even if they can simultaneously be interpreted as sound images.

This becomes obvious when we examine the two cartoons that can most convincingly be claimed to contain representations of sound: the October 25, 1896 cartoon about the phonograph and the February 14, 1897 one about the phonograph alarm clock. Both cartoons show characters responding to sounds. The Kid clearly reacts to the speech balloons coming out of the phonograph in the former and is awoken by the phonograph clock in the latter. However, in the former the Kid's balloon reads, "De phonograph is a great invention—NIT! I don't think—Wait till I git dat foolish bird home I won't do a thing te him well say!" and the parrot's final one, "I am sick of that stuffy little box." There is no direct, mutual interaction between the Kid and the parrot. By referring to the parrot in the third person, the Kid is evidently addressing the reader rather than the parrot. As in the single-panel cartoons, the Kid's "utterances" (including the text on his nightshirt) are for the most part commentary directed at the reader. In the first panel, for example, the shirt reads "Listen te de woids of wisdom wot de phonograff will give yer." In the fourth, when the phonograph/parrot mentions the Kid's girlfriend Liz, it reads "De phonograph knows her see."

Likewise, in the cartoon about the clock, the Yellow Kid, the parrot, the goat, the cat, and the dog (and even a painting of Liz) are all woken up by the sound of the clock. Yet when the Yellow Kid's shirt introduces the alarm clock in the first two panels, none of the animals responds to it. The parrot comments in a balloon in the first panel, "I'm suspicious of dat fool ting [line break] it don't seem on de level," and in the third, "I cant sleep wit dat blame ting in de room." None of the other characters appears to hear the parrot's words either. A limited number of Outcault's Yellow Kid cartoons thus do feature visual representations of sound, as well as balloons (even if the two are not necessarily the same thing). Because the characters do not interact with each other through speech, however, these cartoons do not fully create "an audiovisual stage." Outcault used balloons almost exclusively for objects and non-human animals, and both of his most "audiovisual" cartoons are about the phonograph itself, as their titles indicate. It therefore appears that he used "sound image" balloons in his sequential cartoons primarily as an extension of his phonograph/parrot joke, ridiculing the idea of speech existing apart from a human speaker.[15]

The Kids (and Hooligan) Who Started the Comics

Much of the resistance to the "Yellow Kid origin theory" has focused on its exclusion of earlier works from the category of comics. Thierry Smolderen avoids this debate by defining the Yellow Kid not as the beginning of comics, but as the beginning of comics as an audiovisual stage.[16] As we have seen above, however, even those few Yellow Kid cartoons that can be claimed to feature an "audio" component focus more on addressing the reader than on showing characters addressing each other. Perhaps the strongest evidence against the claim that the Yellow Kid started the (audiovisual) comic strip form is the general absence of audiovisual strips until 1900. In that year, Rudolph Dirks's Katzenjammer Kids and Frederick Burr Opper's Happy Hooligan began to regularly show their characters talking to each other. A more valid criticism of the "Yellow Kid origin theory" would therefore be that it dates the beginning of the contemporary (i.e. audiovisual) form of comics too early, rather than too late.

Some histories of comics imply that the form started by Outcault was immediately continued by Rudolph Dirks. Dirks's Katzenjammer Kids first appeared on December 12, 1897, a month before the Yellow Kid (as signed by Outcault) last appeared on January 23, 1898 (not counting the May 1 cameo).[17] This link is often supported with an image of Dirks's March 27, 1898 Yellow Kid parody featuring the Katzenjammer Kids (and a dog with a balloon), implying that the Katzenjammer Kids simply took over where the Yellow Kid left off.[18] The actual Katzenjammer Kids franchise, however, did not feature its first speech balloon until a year later, on March 19, 1899. Even then it appeared only in a rare single-panel cartoon, while the separate seven-panel Katzenjammer Kids strip above it remained silent. It was not until July 2, 1899 that Dirks first used a (single) speech balloon in a "Katzies" strip, and this strip also featured other dialog written underneath a panel.[19] The first Katzenjammer Kids episodes to use speech balloons without external dialog were published the following month: a single-panel cartoon on August 6 and a multi-panel strip on August 20, 1899. On August 27 came the first strip in which the characters actually talk and respond to each other using balloons. The Kids tell Mamma Katzenjammer a joke ("Ven iss a door not a door? Ven it's a jar!") and try, in vain, to explain it to her. The strip ends with the Kids leaving in frustration while Mamma contemplates a container the Kids pointed to in their attempt, wondering, "Vat should it be, a jug der answer?" This episode of the Katzenjammer Kids is the earliest work I have found that can claim to make use of a truly audiovisual stage. It is hence, beyond all doubt, an example of the modern comic strip form.

Nevertheless, Dirks kept using other forms for the Katzenjammer Kids even after his first strips employing the audiovisual form. Sakamoto Ichirō discusses these forms in "Manga no bunpō to sono yomi no moderu" (English title: "Grammer [sic] and reading model of comics"). Besides the one that I call the comic strip/audiovisual form and that Sakamoto calls the "speech balloon form," he distinguishes the following forms of graphic narrative:
- the serifu, or "dialog," form, which uses no narration but writes all dialog outside of the images without visually connecting speech and speaker;
- the sashie, or "illustration," form, in which narrative text supplies information similar to that shown by the images;
- the komento, or "comment," form, in which the narration comments on the images rather than duplicating their content;
- the sairento, or "silent," form, in which the cartoon consists of images without narration, dialog, or sound.

Dirks frequently returned to the silent form; the latest example I have found dates from February 17, 1901.[20] Occasionally Dirks also used the illustration, comment, or dialog forms. The latest examples known to me date from January 21, 1900 and August 19, 1900. The first is an episode that is silent apart from a single narrative line providing comment: after running away, the Kids return home in the end because they "find there are no whippings like mamma's, after all." The second is an episode written mostly in the dialog form, with one line of illustration-form narration ("But just then mamma saw a milkmaid, who said").[21]

A few historians, like Richard Marschall, credit Fred Opper's Happy Hooligan with establishing the audiovisual form. It is true that Happy Hooligan is the first strip that never used the dialog or comment forms. It first appeared on March 11, 1900, however, months after Dirks had begun employing the audiovisual form at least occasionally. The Katzenjammer Kids were appearing in the New York Journal's Sunday cartoon supplement, which had been publishing cartoons by Opper since June 4, 1899, so Opper must have been familiar with Dirks's work. On March 25, 1900, the Journal also published a dialog-form four-panel cartoon signed "F. Opper after sketch by Dirks." The first (single) speech balloon in Happy Hooligan appeared in the same issue; the first two episodes on March 11 and 18 had been written in the silent form. The single speech balloon on March 25 shows Happy speaking German, trying to pass as a proper Teutonic member at the Sangerbundverein's [sic] Grand Bierfest in order to get away with drinking the free beer ("Feller citizens, Hoch der Kaiser! Bully fer de Dutch!"). Unsurprisingly, Happy is found out and kicked out. The following episode of Happy Hooligan reverts to the silent form, and until April 29 each episode uses at most a single balloon, usually for an exclamation. From May 6 on, Opper drew Happy Hooligan almost exclusively in the audiovisual form using multiple balloons, with rare exceptions (such as on August 5, 1900, when the strip featured only one balloon and was mostly silent). Interestingly, the May 6 episode hearkens back to the voice/speaker disconnect joke begun by Outcault's The Yellow Kid and His New Phonograph, likewise having its protagonist misled by a parrot's utterances. In contrast to the Yellow Kid, however, Happy Hooligan interacts directly (verbally and physically) with the trickster parrot in his strip.

As seen above, Opper began using the audiovisual form consistently before Dirks did, though Dirks was the first to use the form at least semi-regularly. Perhaps the fairest way to put it is that the form was established by both artists. Their March 25, 1900 collaboration suggests that the two had some kind of personal relationship. Later episodes in which the Katzenjammer Kids and Happy Hooligan star together, signed by both artists, further support this assumption.[22] Outcault's last multi-panel Yellow Kid cartoon using balloons appeared on October 24, 1897. That was nearly two years before the first speech balloon in a multi-panel Katzenjammer Kids strip on July 2, 1899. Given this gap, it is difficult to assert a direct influence of the Yellow Kid cartoons on Dirks's Katzenjammer Kids and Opper's Happy Hooligan, even if one were to consider that Yellow Kid cartoon an early example of the audiovisual form. From 1900 onwards, however, the New York Journal's two most popular cartoon features both predominantly used the comic strip form. Other artists then adopted it as well, including Outcault, who began his first truly audiovisual-form strip, Buster Brown, in 1902. The direct lineage of the comic strip form begun by Dirks and Opper continues to this day.

Speech, Sound, and the Transdiegetic

Like many others,[23] Thierry Smolderen thus overestimates the importance of the Yellow Kid for the establishment of the audiovisual, comic-strip form. His argument is nonetheless revolutionary, for two reasons. First, it examines "speech balloons" by their narrative function rather than by their graphic appearance. Second, it focuses on the history of sound in comics, refuting the ahistorical assumption that comics are a timeless medium that sometimes, seemingly arbitrarily, uses "speech balloons" and sometimes does not.[24] Smolderen's focus on "balloons," though, is unnecessarily restrictive, given that visual representations of sound (Smolderen's "sound images") neither require balloons nor are limited to them. This should be obvious to the modern reader, who is most likely well acquainted with sound effects like "BANG" and "POW." These effects are usually not enclosed in balloons, and these specific examples are now strongly associated, ironically, with the 1960s Batman TV series. The distinction between speech on the one hand and sound effects (or "onomatopoeia") on the other is unfortunately common in comics studies, but it is arbitrary.

The speech balloon is a convenient way of distinguishing speech from intradiegetic writing, i.e. letters, characters, and symbols that exist and are visible as such inside (=intra) the story-world (=diegesis). It is not, however, a necessary condition for representing speech. It is possible to represent "speech" (linguistic sound images) without balloons, and "sound effects" (non-linguistic sound images) within them, and examples of both exist. Focusing on a particular graphic shape used to represent sound, instead of on the representation of sound itself, obscures important historical developments and differences between distinct forms of graphic narrative. This happens, for instance, when we apply the term speech balloon to labels found in Victorian broadsheets (see Marschall).

The history of the comic strip form, that is, of the creation of the "audiovisual stage on paper," is more than the history of the speech balloon. It is the history of the creation of transdiegetic content, and of the division of graphic signs into intra-, extra-, and transdiegetic. Film studies has long used the dichotomy between intra- and extradiegetic content. There it is necessary to distinguish, for example, between music audible only to the spectator and music that characters within the diegesis (the fictional world created by the narrative) can also perceive. The former is extradiegetic; the latter is intradiegetic. Narrative comics, like films, clearly contain intradiegetic elements, such as the characters, objects, and locations depicted. They also contain extradiegetic elements, such as titles or panel borders. The division between intradiegetic and extradiegetic content is occasionally violated for comedic effect. A famous example is Winsor McCay's 1905 episode of Little Sammy Sneeze, in which the title character's sneeze causes the panel borders around him to collapse. The existence of such violations does not refute the existence of the intra-/extradiegetic boundary, but rather confirms it. The more realistic and/or serious a comic strives to be, the less likely it is to violate this boundary.

What, then, is to be made of the speech balloon? Neither the balloon nor the writing in it is visible to intradiegetic characters, yet they understand its content. It helps to consider the speech balloon as a Saussurean sign. The balloon's signifier (the convention through which meaning is expressed), i.e. the concrete graphic object on paper, is extradiegetic: it is perceptible only to the reader. Its signified (the expressed meaning), i.e. the sound within the story-world to which it refers, is intradiegetic: it is perceptible to characters inside the story-world.[25] Devices like the speech balloon thus translate non-visual intradiegetic content into a visual extradiegetic form, which is why I refer to such devices as transdiegetic.[26] The comic strip form of graphic narrative, the audiovisual stage on paper, can best be understood via this tripartite division of content into intra-, extra-, and transdiegetic.

This conceptual division of all narrative content into three distinct categories, while simultaneously combining them in the same image space, is precisely what distinguishes the comic strip from earlier forms of graphic narrative. It makes no sense, for example, to apply these narratological categories to the print first published in 1677 showing parents sifting their son, because the print does not establish a meaningful distinction between a diegetic world and its outside. As noted above, its phylacteries ('balloons') cannot justifiably be seen as representing an actual conversation between the three figures; they serve instead as clues to the print's intended meaning. Rodolphe Töpffer's picture stories, on the other hand, do establish a diegetic world. However, they separate intradiegetic images from extradiegetic narration and feature little, if any, transdiegetic content. Even in R. F. Outcault's Yellow Kid cartoons, one cannot say for certain how to classify the writing on the Yellow Kid's nightshirt. It could be intradiegetic (Can other characters see it?), extradiegetic (Is it written commentary aimed purely at the reader?), or transdiegetic (Is it invisible, but signifying spoken words?). In the comic strip form established by Dirks and Opper, today's globally dominant form of graphic narrative, every element can be clearly understood as intra-, extra-, or transdiegetic. The few exceptions include the above-mentioned violations of boundaries for comedic effect.

The History of the Comic Strip as the History of Transdiegetic Content

1) Motion Lines and Impact Stars

Dirks and Opper's completion of the comic strip form as an audiovisual stage was the result of a gradual process. That process began when 19th-century artists started experimenting with ways to depict motion on the page. In Techniques of the Observer, Jonathan Crary describes how the Enlightenment in Europe initiated a shift in philosophical and scientific notions of seeing. The shift led away from a model represented by the camera obscura and towards one represented by the stereoscope. Crary argues that during the 19th century the understanding spread that vision was rooted in the corporeal observer, i.e. in the concrete bodily functions that create the ability to see. Vision was hence subjective, rather than a scientifically precise and objective method of capturing material truth. This change in the conception of vision coincided with the development of new technologies of visual perception, culminating in the invention of photography. More precisely, according to Crary, the change enabled and brought about these technologies. Prior to the 19th century, it was inconceivable to depict motion itself. Motion did not possess a material existence that could be captured within the camera obscura model, since the perception of motion is an illusion generated by the brain. In pre-19th-century visual art, moving objects could only be represented in their objective state at the precise moment of depiction. To the modern observer, this creates the impression that such objects are frozen in time. Examples are the British "sifting" print, or the scene in Hogarth's A Harlot's Progress in which the harlot kicks over a table.

As both Kunzle and Sasaki point out, we see early experiments in representing motion in works by Wilhelm Busch and to a lesser extent in those of Rodolphe Töpffer.[27] Kunzle traces the incorporation of movement into graphic narrative as far back as Töpffer's use of wavy lines and "montage," a term Kunzle uses to refer to depicting the same figure in different positions to suggest movements in between them. Whether these techniques should be considered representations of movement is arguable, but it is certain that representations of movement started to proliferate during the second half of the 19th century, when new technologies of visual entertainment became available. The influence of such technologies on illustration, cartooning, and graphic narrative is evident.

For example, Kunzle ties the use of black silhouette figures in graphic narrative during this period to the spread of magic lanterns.[28] The magic lantern does not create a moving image, but devices that did exerted a similar influence once they were created and became available. These included phenakistoscopes, zoetropes, praxinoscopes, zoopraxiscopes, and kinetoscopes. Kunzle illustrates this with an 1882 multi-panel cartoon called "New Zoöpraxiscopic Views of an Eminent Actor in Action."[29] The New York Journal, too, ran a multi-panel cartoon called The Journal Kinetoscope, with the tagline "Taken At The Rate Of A Million A Minute." It ran at least from December 6, 1896 to March 10, 1901. The cartoon depicts short humorous vignettes and is drawn to look like a celluloid film strip, suggesting, in this case, a direct link between new 19th-century visual technologies and graphic narrative. Specific transdiegetic motion devices, such as speed/motion lines and blurs, emerged in graphic narrative between Töpffer's works and the Katzenjammer Kids and Happy Hooligan. It is thus plausible to assume that this emergence was an effect of the paradigm shift from "camera obscura" to "stereoscope," and of the new technologies accompanying it.

Besides movement, a second form of early transdiegetic content preceded sound images: stars signifying pain or confusion. These stars are often called pain or impact stars in the secondary literature, and they are the oldest transdiegetic element not representing movement. It is unclear when impact stars were first created. Most likely this happened during the latter half of the 19th century, the same period in which transdiegetic representations of movement begin to appear. By the century's end, both were already common elements of cartoons. On May 16, 1897, for example, The Journal Kinetoscope ran an episode titled "The Inflated Tire and the Empty Goat, or The Inflated Goat and the Empty Tire." It showed a goat inhaling air from a tire and floating through the air before exhaling and crashing down. After the goat exhales, motion lines in the penultimate panel indicate that it is spinning from the sudden movement. The final panel shows the goat on the ground, with several pain stars next to its head. The earliest examples of pain stars in the New York Journal come from two cartoons. The first is a November 1, 1896 Yellow Kid single-panel cartoon ("McFadden's Row of Flats"), with a few small, crude stars among several straight lines signifying the impact of a blow to the head. The second is a November 8, 1896 multi-panel cartoon about an elephant and a monkey, with more clearly defined and more numerous stars. It is easy to understand why artists would have begun to experiment with motion lines and blurs after becoming familiar with the visual technologies mentioned above. This applies in particular after the advent of photography and the experience of seeing moving objects leave trails within a still image. It is less obvious how someone first came up with the idea of representing a physiological occurrence like pain through a seemingly arbitrary graphic device.

Many texts on comics mention such forms of transdiegetic content, for example stars, lightbulbs, and heart shapes representing interior states. To the best of my knowledge, however, no one has written a history of them or attempted to explain their origins.

It is likely no coincidence that the invention and spread of impact stars and that of transdiegetic representations of motion happened during the same period (the mid-to-late 19th century). The Oxford English Dictionary lists three recorded uses of the phrase "to see stars" prior to 1800. The first two, from the 1598 The Sixth Book of the Myrrour of Knighthood and the 1640 The Love & Armes of the Greeke Princes (translations from Spanish and French, respectively), appear to describe armor-clad warriors from Greek mythology fighting with swords. One could hypothesize that the original Spanish and French phrases may have been related to sparks resulting from metal striking metal, but even if both instances did describe the phenomenon of seeing lights after a blow to the head, this concept does not appear to have been widespread at the time (at least not in English). The third pre-19th-century entry, and the first native English one, from Charles Stearns's 1798 Dramatic Dialogues for the Use of Schools, is about seeing stars as the result not of physical impact but of imbibing potent potables ("It may make us see stars if it be too strong"). The next four entries are from 1838, 1839, 1868, and 1894, and all relate to being hit in the head, suggesting that the concept of seeing stars due to a strong physical impact (in particular to the head) only became widespread from the mid-19th century onwards.

Due to the lack of data on the earliest instance(s) of impact stars in 19th-century cartoons, it is impossible to determine their precise point of origin. I suspect, however, that the 19th-century emergence of the phrase "to see stars" in reference to the result of a blow to the head is correlated with that of impact stars. Both likely result from the transformation that the understanding of the nature of vision underwent in the 19th century: the discovery of "the corporal subjectivity of the observer" described by Jonathan Crary, who first identifies it in Goethe's 1810 Theory of Colours.[30] Crary cites biologist Johannes Müller's work on the physiology of the senses as having laid much of the groundwork for the widespread acceptance of this discovery.[31] In a chapter subtitled "Physical Conditions Necessary for the Production of Luminous Images" (according to Crary, "a phrase that would have been unimaginable before the nineteenth century") in his Handbuch der Physiologie des Menschen, first published in 1833, Müller lists as one of his five causes of luminous images "mechanical influences; as concussion or blow."[32] As Crary points out, and as we have seen in the examples from The Sixth Book of the Myrrour of Knighthood and The Love & Armes of the Greeke Princes, knowledge of the ability to produce "luminous images" through mechanical means preexisted the 19th century (Crary cites Thomas Hobbes's Leviathan: "And as pressing, rubbing, or striking the eye, makes us fancy a light"[33]). Crary emphasizes, however, that while such experiences had earlier been considered deceptive illusions, "in the early nineteenth century, particularly with Goethe, such experiences attain the status of optical 'truth.' They are no longer deceptions that obscure 'true' perceptions; rather they begin to constitute an irreducible component of human vision."[34] The realization that physical impact on the human body could manifest itself in one's vision (specifically, a "concussion or blow" causing one to see bright lights) thus appears likely to have come about as part of the same rethinking of vision described by Crary that generated knowledge of the afterimage and thus led to the creation of motion lines and blurs. This connection offers a plausible explanation for why these seemingly dissimilar forms of transdiegetic content emerged together during the 19th century, though further research is necessary.

2) Narrative Images without Narrative Text; Early Transdiegetic Sound

By the time Outcault began experimenting with representations of sound, cartoonists had thus already become familiar with the concept of transdiegetic devices, that is, of making visible that which could not ordinarily be seen (pain), or at least not in still images on the page (motion). So-called pantomime cartoons (like the Journal Kinetoscope) demonstrated that action could be shown without the crutch of narration. Smolderen writes of such 'wordless' strips that "the absence of authorial intervention was a statement in itself; for the comic artist, the deadpan tone of these pantomimes represented a very deliberate form of irony. What the author tried to emulate (with a grain of salt) was the mechanical recording of human action by such processes as chronophotography and Edison's Kinetoscope."[35] The Journal Kinetoscope has already provided evidence bolstering Smolderen's claim of a connection between new technological apparatuses and graphic narrative. Kunzle writes that "graphic autonomy in caricature was not to be attained until the 1880s" (although it had been written about as early as 1844), yet he also reprints excerpts from George Du Maurier's 1869 "The Philosopher's Revenge. (A Story Without Words.)," a sixteen-panel cartoon that, contrary to its title, does feature words (extradiegetic panel labels and an intradiegetic sign in the last panel) but is indeed "graphically autonomous." Cartoons relying exclusively on the image to tell their story thus go back at least as far as 1869 and seem to have been part of the larger shift in the European concept of vision identified by Crary, along with the new technologies and devices it enabled (both in the material sense of the kinetoscope and in the abstract sense of motion lines).

Du Maurier's "The Philosopher's Revenge" is remarkable for a second reason: it may well be the earliest attempt at a transdiegetic representation of sound. In the cartoon, the titular philosopher is perturbed by his next-door neighbor's singing and piano playing. In response, he ventures outside to buy a street organ player's instrument, which he then places on the other side of the wall from his neighbor's piano. The hurdy-gurdy wins the battle of the two instruments, and the last panel shows the philosopher at peace again, while a sign outside his neighbor's apartment proclaims it vacant, indicating that she has moved out. The sound coming from the instruments, as well as from the neighbor's vocal cords, is represented by long thin lines whose ends are reminiscent of musical notes and which are intersected perpendicularly by multiple short ones. This method of representation differs strikingly from the way such musical sounds would be represented today. Had the cartoon been drawn half a century later, the sound would likely have been indicated by a mixture of actual music notes, a speech balloon featuring text for the vocal sound, sound effect words approximating the sound of a piano and a street organ, and/or concentric lines emanating from each source (or straight lines radiating from it). "The Philosopher's Revenge" thus illustrates how new a concept transdiegetic representations of sound were at the time of its creation.

Just as pantomime cartoons emerged concurrently with technological devices that captured and reproduced short visual vignettes in the 19th century, it is plausible that the emergence of transdiegetic sound was strongly intertwined with the development of equipment that could record and eventually also reproduce sound. Smolderen and other writers have tied Outcault's use of balloons to the invention and spread of the phonograph (and to Outcault's temporary employment by Thomas Edison, its inventor). One can see "The Philosopher's Revenge" as a similar response to Edouard-Léon Scott de Martinville's phonautograph, patented in 1857, which was able to record sound as two-dimensional lines.[36] Jonathan Crary argues convincingly that 19th-century developments in painting (such as the emergence of Impressionism) were not a response to technological inventions (mainly photography); rather, both were separate responses to an underlying shift in European knowledge about the nature of vision (i.e., the shift from the camera obscura model to that of the stereoscope). Something similar may be true for sound. Rather than Outcault's and Du Maurier's cartoons being the respective effects, and the phonograph and the phonautograph the respective causes, it could well be that all of them were responses to a new conception of sound that developed in the 19th century.

In The Audible Past, Jonathan Sterne suggests precisely this, claiming that "[s]ound-reproduction technologies are artifacts of vast transformations in the fundamental nature of sound, the human ear, the faculty of hearing, and practices of listening that occurred over the long nineteenth century."[37] Sterne posits that in the scientific study of sound, attention shifted from the mouth to the ear as the locus of sound: away from the sources of the vibrations that the human sensory apparatus perceives and renders as the experience of sound ("the mouth") and toward this apparatus itself ("the ear"). Like Crary, Sterne cites as a factor the work of Johannes Müller, which concluded that "[s]ound has no existence but in the excitement of a quality of the auditory nerve."[38] Focusing on the eardrum (tympanum) and how it registered and transmitted vibrations allowed researchers like Alexander Graham Bell to understand that in order to reproduce sound, it was necessary only to replicate the vibrations themselves, not the precise conditions under which they had originally been generated. Other researchers, focusing on the mouth rather than the ear, had tried to build automata that would generate sonic vibrations much as an actual human does, but this approach proved ineffective. Seeing the ear as the locus of sound production was a prerequisite for the invention of devices such as the phonograph, which were utterly disconnected from the original means by which a sound had been produced.

Sterne's argument has in common with Crary's the suggestion that over the nineteenth century in Europe and the United States, the understanding of a sense (sight or hearing) shifted away from conceptualizing it as a passive reflection of an objective reality existing outside the human mind and toward understanding its phenomena as actively generated by the nerves and the brain. This explains why we do not see transdiegetic depictions of motion or sound before the mid-to-late nineteenth century. Prior to the paradigm shifts from camera obscura to stereoscope and from voice to tympanum, it was simply impossible to recreate motion and sound on the page, because according to the prevailing knowledge at the time, doing so would have required reproducing their objective, material reality in some form. This was feasible only with physical objects and bodies, whose material reality could be reproduced on paper much as it appeared to their observer, albeit in simplified or caricatured fashion. Only with the spreading knowledge that motion and sound were centered in the observer/listener did it become possible to recreate these phenomena on the page, because all that mattered now was to evoke their perception in this observer/listener; their physicality had become irrelevant.

Whether or not one finds this theoretical argument convincing, it is certain that transdiegetic representations of sound developed concurrently with the emergence of technology that could record it—whether the former was inspired by the latter or whether both responded to a deeper underlying shift in knowledge. It is unsurprising that it took longer for transdiegetic sound to become a regular element of cartoons than it did for transdiegetic motion; considering the visual nature of cartooning, it must have appeared significantly more obvious to artists to incorporate transdiegetic content related to vision than such content related to sound. Despite Du Maurier's early example, transdiegetic representations of sound remained rare until the spread of the phonograph. The first phonograph was patented by Thomas Edison in 1877, though phonographs would not become affordable for many Americans until the late 1890s, after multiple improvements to the technology by Edison and his competitors across the intervening years.[39] The late 1890s is of course precisely when the phonograph, and along with it transdiegetic sound, begins to appear prominently in American cartoons such as Outcault's Yellow Kid works.

The cartoons featuring depictions of sound in the New York Journal between Outcault's October 25, 1896 "The Yellow Kid and His New Phonograph" and Dirks's and Opper's regular use in 1900 (as opposed to Dirks's earlier occasional use) of transdiegetic speech balloons to show intradiegetic characters conversing with each other support Smolderen's hypothesis that the introduction of sound to cartooning happened through artists addressing the new phenomenon of the mechanical sound image without an author. Most early uses of transdiegetic sound are tied to phonographs (and sometimes to similar technological devices such as the telephone) and parrots. The earliest transdiegetic sound in the New York Journal unrelated to the Yellow Kid appeared in a single-panel (whole-page) cartoon by Homer Davenport on November 1, 1896, which, among other things, shows multiple gadgets (likely a form of phonograph) emitting the words "16 to 1."

On January 10, 1897 the New York Journal published a six-panel cartoon titled "The Mysterious Trunk—A Story With Words" (whose title reads almost like a reference to Du Maurier's "The Philosopher's Revenge," given the second part of the latter's title: "A Story Without Words").[40] The subtitle is a wry comment on the fact that the cartoon, unlike most at the time, does not feature (extradiegetic) narrative text, while it does feature, again unlike most cartoons then, (transdiegetic) words written directly into the panels. The cartoon's plot is as follows: A wealthy-looking man is followed by a porter dragging his trunk, out of which cries for help (such as "Police!!" "Let me out..." "Murder!!!") can suddenly be heard.[41] The surprised porter alerts police, who cut open the trunk to reveal a parrot that thanks them for its liberation. What is most striking about the cartoon is the complete silence of the various human actors, especially in contrast to the parrot's plentiful words in five out of the six panels.

Similar to Outcault's earlier cartoon about a phonograph and a parrot, the joke appears to be the disconnection of human language from a human source. It is not difficult to imagine that the entry of phonographs into American homes at the time was a significant factor both in generating the idea for such a joke and in making the audience receptive to it. The notion that an ordinary (i.e. not supernatural or magic) parrot's voice could be mistaken for that of a human would likely have been incomprehensible to pre-phonograph readers, for whom any given human voice outside of stories of a metaphysical nature had always corresponded in an immediate manner to a physically present human source. Although parrots are of course known for their ability to imitate human sounds, the parrot in "The Mysterious Trunk," rather than "parroting" individual words, appears to be speaking on its own, much as a phonograph must have appeared to do to the 1897 audience.

Two months after "The Mysterious Trunk," on March 21, 1897, the Journal featured a five-panel "dialog form" cartoon (see Sakamoto's classifications) called "A Phonographic Proposal," in which a woman's suitor uses a phonograph recording to ask her father's permission to marry her. Unfortunately for the suitor, the recording only captures him badmouthing the father, who then hears it via the phonograph and, outraged, kicks the suitor out of the house. All of the cartoon's speech is written underneath the panels, disconnected from the images. However, the phonograph's recording, when played by the father in the fourth panel, is represented twice, the second time transdiegetically within the image, with a speech balloon coming out of the phonograph. As in "The Mysterious Trunk," the transdiegetic sound here is connected to the phenomenon of human speech originating from a non-human source.

The disconnect between voice and human author links "A Phonographic Proposal" and "The Mysterious Trunk" to "The Yellow Kid and His New Phonograph." Outcault appears to have been part of (or, unless earlier such cartoons are discovered, to have started) a trend of cartoons about the voice/human speaker disconnect, a subject that by its nature induced artists to employ the first transdiegetic representations of speech, which later enabled Dirks and Opper to create the comic strip as an audiovisual stage. Since none of the cartoons by Outcault and other artists that employ at least some transdiegetic sound show characters interacting with each other through transdiegetic speech, the audiovisual stage strip did not yet exist in 1897. The parrot/phonograph experiments with transdiegetic sound nonetheless represent an important intermediary between "The Philosopher's Revenge" and the August 27, 1899 "Ven iss a door not a door?" Katzenjammer Kids strip.

One of the reasons that ahistorical claims about Victorian 'speech balloons' persist is that it seems such an obvious idea to us today to show human beings speaking to each other and to simultaneously render their actual words as a sound image near them (as opposed to depicting the characters silently in the act of uttering them and supplying the words as separate external dialog or narrating them). One must keep in mind that until the early 20th century, even until after the establishment of the comic strip as an audiovisual stage, there was no other visual medium that did this. The other popular visual entertainment medium of the late 19th century besides graphic narrative, the cinema, had not yet come up with a viable method to feature speech simultaneously with its human sources. It took until 1927's The Jazz Singer for sound to become a regular part of feature-length films (though there were earlier experiments), and, according to subtitle scholar Henrik Gottlieb, even subtitles were not used until 1922's Mireille.[42] Before then, speech had to be represented with the help of intertitles, functioning in a way similar to Sakamoto's dialog form of cartoons, with the dialog separated from the intradiegetic images (spatially in graphic narrative, temporally in film). As the few examples of transdiegetic sound in cartoons before 1900 demonstrate, to represent characters actually speaking to each other (as opposed to the commentary-style balloon-labels used in many single-panel cartoons) along with the content of their speech must have been a revolutionary idea that was able to take hold only gradually because it had been inconceivable for all of human history preceding the phon(aut)ograph.

Dirks and the Emancipation of the Sound Image from the Voice/Speaker Disconnect Joke

The transdiegetic "sound image" (enclosed in balloons or not) thus came about as an essential element in cartoons that mocked the voice/speaker disconnect brought about by the phonograph, via representations of phonographs and parrots. (The next example of a transdiegetic speech balloon in the New York Journal after "A Phonographic Proposal," too, comes from a three-panel cartoon about a parrot, published on September 26, 1897.) At this point no one was yet thinking of using sound images to portray an entire sequential narrative audiovisually. Two more years passed before Rudolph Dirks drew the August 27, 1899 Katzenjammer Kids episode that should be considered the first true instance of the comic strip as an audiovisual stage. Contrary to existing lore, the form's creation took more than simply adopting Richard Outcault's 'invention' of the balloon.

Dirks seems to have quickly understood the potential of transdiegetic content. Within his first year of working on the Katzenjammer Kids, which had started out as merely another American newspaper version of Wilhelm Busch's Max und Moritz, he employed motion lines (January 8, 1898); motion swirls, freestanding exclamation and question marks, and hats flying off (i.e. drawn above) characters' heads to indicate surprise (all January 30, 1898); music notes (March 6, 1898); dust clouds to show movement/speed (June 19, 1898); and pain stars and exaggerated bumps resulting from blows to the body (September 18, 1898).[43] Few histories of comics mention that the Katzenjammer Kids were preceded by Harry Cornell Greening's Tinkle Brothers, a strip that ran from September 5 to October 17, 1897 (titled Tinkle Kids in its fifth and last installment) and whose protagonists bear a similar resemblance to Busch's duo.[44] Dirks likely knew the strip, given the general similarities as well as some specific ones in content, such as the Katzenjammer Kids' attempt to ride a goat on January 16, 1898, much as the Tinkle Brothers had done on September 12, 1897. The Katzenjammer Kids, however, somehow managed to avoid the Tinkle Brothers' short-lived fate.

Not counting the earlier use of music notes, it took until March 5, 1899 for the Katzenjammer Kids to feature its first sound image. In that day's episode, one of the Kids pretends to stab his brother, who screams, "Help!! Murder - Help!" Notably, the words are not framed by a balloon but instead accompanied by several straight lines radiating from the source of the sound, much as in January 10, 1897's "The Mysterious Trunk" and in a July 31, 1898 six-panel cartoon by William Marriner imitative of the Katzenjammer Kids, "The Gashouse Twins Nearly Commit Murder." In the latter cartoon, the Twins hide a speaking doll (i.e. a phonograph doll) in a well, presumably to prank their parents. The single (but repeated) sound image is of the doll saying "Mama!!!" over and over again. This cartoon fits in with the other examples of early sound images discussed above, which all centered on the voice/speaker disconnect brought about by sound recording technology.[45] Another six-panel cartoon, "The Story That Wasn't Printed - And the Reason for It" (January 1, 1899), about an editor demanding "more copy" through an intercom, also falls into this category and likewise uses the same balloon-less extradiegetic depiction of the intradiegetic sound (which together form the transdiegetic sound image). The same is true of "The Parrot Learned Not Wisely But Too Well," a February 26, 1899 five-panel cartoon about a parrot repeating swear words.

Dirks's March 5, 1899 Katzenjammer cartoon uses such a sound image, linked to the voice/speaker disconnect, for a human speaker instead. Although the sound image does remain linked to the voice/speaker disconnect in that the Katzenjammer Kid to whom the sound image is attached is using a literal balloon as a fake head above his own, which is hidden inside his coat, the sound image's use for a human speaker instead of a parrot or phonograph (and at the same time, unlike in "The Yellow Kid and His New Phonograph," for an utterance clearly audible to other characters) marks an important shift. Two weeks later, on March 19, a Katzenjammer character (Mamma) uses a speech balloon for the first time, albeit in a single-panel cartoon, saying, "Alreaty but not vunce yet." Rather than serving as a self-representation device or label, the balloon emphasizes the sonic qualities of the utterance, i.e. Mamma's strong German accent. While the Yellow Kid's balloons generally were written in a way to emphasize his manner of speaking, this was never their sole purpose, as it seems to be in the case of Mamma Katzenjammer. Evidence for this interpretation is provided by a five-panel cartoon by Dirks in the same issue, "Mrs. Schneider Gets Her First Telegram," in which a messenger tries to deliver a telegram to the eponymous Mrs. Schneider. The messenger's utterances ("A telegram for you, lady! A telegram! A tel--!!-! A---!!-!!??-! Aw rats!") are all written underneath the panels (i.e. in Sakamoto's "dialog form"), while Mrs. Schneider's single line, in German pidgin, "VAT IT IS?" [sic] is expressed transdiegetically as a sound image in a balloon.

For the Katzenjammer Kids episode of April 23, 1899, "The Katzenjammer Kids Play Mazeppa With Two Dummies," Dirks reverts to the "speech lines" type of sound image seen in "The Mysterious Trunk" et al. when the Kids, hiding in a haystack, shout for help to play a prank on Mamma (panel 4). The joke is that Mamma Katzenjammer misattributes the cries for help to two dummies of her children, which the Kids have strapped to the back of a mule in order to send her chasing after it. As with the Gashouse Twins' speaking doll and the Kids' toy balloon, the joke remains that of an inanimate object (or parrot) being mistaken for the source of a human voice, essentially the same voice/speaker disconnect gag seen in previous cartoons going back all the way to "The Yellow Kid and His New Phonograph."

However, in "The Katzenjammer Kids Play Mazeppa With Two Dummies," the sound image is finally emancipated from the voice/speaker disconnect, as the speech lines emerge directly from the Kids' mouths and are intended as an accurate depiction of the words they enunciate. Unlike the Yellow Kid's speech balloon in "The Yellow Kid and His New Phonograph," which was directed at the extradiegetic reader and ignored by the intradiegetic parrot, the Katzenjammer Kids' cries for help are both addressed to Mamma Katzenjammer and reacted to by her. Although Dirks does not switch to a complete audiovisual stage right away, it is only a small step from here to the fully audiovisual strips he begins to draw four months later. Even after August 1899, Dirks and others (including Opper) do not immediately use transdiegetic sound images universally, but often primarily for utterances of a particular vocal quality, such as exclamations, singing, recitations, or dialects, likely because artists were still accustomed to the silent and dialog forms of graphic narrative and used sound images first and foremost when the narrative required emphasizing certain vocal acts. Through this use, however, sound images became an increasingly familiar device, and during late 1899 and 1900 Dirks and Opper pioneered their general use as part of an audiovisual stage, with other artists gradually following suit and adopting the audiovisual stage model.

I argued above that the history of the audiovisual stage form of graphic narrative (and hence of contemporary comics) is not the history of "the balloon," but the history of the creation of a multitude of transdiegetic signs, representing motion, emotion, and sound, which all come together in the audiovisual stage, or perhaps more accurately, the transdiegetic stage. The history of transdiegetic signs did not end with the sound image. It continued with the creation of other inventive ways of making visible that which escapes direct visual apperception, such as the act of having an idea (represented by a light bulb), greed (pupils replaced by dollar signs), or love (the stylized heart shape), and it continues today. The rudimentary form of this transdiegetic stage was nonetheless completed by Rudolph Dirks and Fred Opper when they made the sound image an essential element of their works. The combination of motion and sound made comics the first audiovisual narrative medium and enabled artists to tell complex stories mimetically, without the need for external narration to supply what earlier forms of graphic narrative had not been able to express visually.


