Books on the Web

This document introduces a user-focused vision of books as first-class citizens of the Web—with off-Web reading systems (beyond normal Web user agents) made superfluous, and “Portable Web Publications” made irrelevant.

The vision presented in this document is one of providing the best possible user experience for readers: A seamless reading experience that automatically remains consistent even as the user’s reading context changes.

So this vision assumes no user desire for “portability” of books, nor user desire to ever be required to “switch” into off-Web reading of electronic books, nor user desire for “convergence” of the reading experience of books on the Web with that of legacy reading systems. Instead, this document starts by reminding all that the fundamental quality of the Web, consciously designed into it by its creator from its very creation, is Universality.

This document outlines how to draw on that fundamental quality of Universality designed into the Web’s soul to build a user-first, universal reading experience for books on the Web based on the principle of the Priority of Constituencies[[Priority-of-Constituencies]], always putting the needs of users first, above the needs of any other constituency.

Why work on this now?

In June 2013, the W3C launched a new “Digital Publishing” Interest Group with a charter for the group that states the following goal:

the current format- and workflow-level separation between offline/portable and online document publishing should be diminished to zero

The W3C is now considering proposals for achieving that “zero separation between offline and online” goal for Web publications, and this document outlines a specific, concrete technical direction for making that actually happen: for making it a reality, rather than just a vision.

However, the W3C is instead considering investing organizationally in a contrary direction, based on the much vaguer idea of an EPUB-inspired “EPUB+WEB” “Portable Web Publication” format as envisioned in Portable Web Publications for the Open Web Platform and as illustrated in the figure below.

Figure: Flawed conceptual model that assumes needing to “package” and “unpackage” Web publications just to be readable offline or on mobile devices. (Illustration of a conceptual model in which Web content needs to be “packed” and “unpacked” for use on a smartphone.)

It would be a mistake to invest in that or any direction that is based on the assumption that the solution needed is a “portable” document format usable outside of the Web.

To be very clear about the distinction, a reading system is not part of the Web if:

- the reading system does not use URLs
- the reading system does not use the Web security model
- the reading system requires users to deal with the extra steps of having a “packed” Web publication, copied off its home on the Web, that then needs to be “unpacked” in order for the user to read it

Such a reading system is instead just part of the non-Web, and designing any solution for users that optimizes for “portability” to such off-Web reading systems is a mistake. While a document format that is “portable” into such off-Web reading systems may solve problems for constituencies other than users, it is absolutely not the right way to provide the best possible user experience for users themselves.

So instead of designing for “portability”, what’s actually needed is to design for universality[[Universality]]—using a true 100% Web-based approach that doesn’t require users to download copies of Web publications in order to read them offline but instead requires nothing more from users than to simply browse to the URL for a Web publication in exactly the same way they normally would, and subsequently just be able to automatically continue to access that Web publication at that same URL regardless of whether they are offline or online.

Explaining how such seamlessly consistent reading experiences of Web publications can be achieved for users is the purpose of this document—as well as making clear why the direction outlined in this document is a much better solution for providing the best possible user experience to users—and truly achieving the “zero format- and workflow-level separation between offline and online” goal in the “Digital Publishing” Interest Group charter—than is the contrary direction outlined in Portable Web Publications for the Open Web Platform.

What are the areas of interest?

A principled approach to making books first-class citizens of the Web involves addressing the needs of the following stakeholders:

Users

The right approach to making the best possible user experience for books on the Web starts simply by meeting the needs of users first, beyond all other needs. The needs of other constituencies will naturally be addressed by meeting user needs (at least for other constituencies that are actually acting in good faith to meet user needs, instead of acting in practice against the best interests of users).

In contrast to the vision presented in Portable Web Publications for the Open Web Platform, this document starts from a realization that users in fact do not want to be ushered (however subtly) into having to make choices among different reading systems for the same content, depending on the particular reading contexts they find themselves in. Users instead want a truly universal system for reading. Users want the Web.

Before 2008 or so, when smartphones and tablets capable of running full Web browsers with full access to the Web began to become prevalent, there was some value to users in having specialized off-Web devices with specialized reading systems. But for years now, users have actually had smartphones and tablets with full Web capabilities—so there is no longer any user need or desire for such specialized off-Web devices, nor for such legacy off-Web specialized reading systems.

Even in the case where, say, roaming charges are often high, or when internet access may be of low quality or not available at all, the Web itself—through Service Workers—provides users with the possibility to automatically have, in an easy manner, offline access at any time, anywhere, to any Web publication they are reading.

So the best possible user experience can now be provided to readers just through the power of the Web. A truly universal reading experience.

Secondary constituencies

Secondary constituencies who can help give readers the best possible user experience of books on the Web:

Publishers

Book publishers are investing in the development of technical expertise in Web technologies. While gaining an understanding of technical topics is important to new and future publishing workflows, the lack of communication between the trade-publishing and Web-application developer communities is resulting in unnecessary duplication of effort and investment.

Collaboration between the Web content development and publishing communities will result in major benefits to publishers. Moving to a truly user-first model of books on the Web that takes full advantage of the universality[[Universality]] of the Web means publishers can concentrate on engaging content authors in the production of high quality content.

Along with familiar, established Web technologies such as CSS and SVG, support for technologies enabling features such as 3D rendering (and even visualization tools such as D3) will naturally flow into the publishing realm by moving to a user-first model of books on the Web, hence increasing publishers’ opportunities to sell new content products across the board.

Realizing new opportunities is a reality for publishers traditionally considered to be on the leading edge of technological advances in working with content. These publishers include STM and educational publishing houses, as well as scholarly and journal publishing organizations.

A user-first model of books on the Web will support more tools and services and a much larger population of trained practitioners compared to the current state of working in parallel universes.

Scholarly journal and STM publishers

Scholarly journal publishers also provide articles for download these days. The most popular distribution format for journal articles continues to be PDF—as a direct reflection of the scholarly community, which highly prioritizes linear text and preservation of print typography. Indeed, the original goal for scholarly publishers to make files available online was to enable readers to download and print content directly, instead of borrowing a paper copy of a journal issue and photocopying relevant articles.

But things are changing. First of all, Web-only publications are becoming part of the mainstream (e.g., the multidisciplinary PLOS ONE or the new PeerJ CompSci journals), with the main content being published using traditional Web technologies like HTML and CSS. And there is much more. Scholarly communication increasingly uses additional media such as video, audio, animated graphics, or very large images, and the trend is to consider these as integral parts of the scientific output. (Mike Bostock’s recent article on visualizing algorithms or the “live” presentation of data in a paper published by F1000 Research are good examples of the new possibilities.)

Furthermore, publishing the scientific data sources, like the results of, say, a sociological survey or measurement output of biochemical experiments alongside the “main” publication, is also coming to the fore, with some journals and institutions actually requiring public access to those. Gaining truly universal access to all these various media and contents is important for scholars—in order for them to be able to read articles at any time, anywhere, without ever being forced into using off-Web reading systems.

And tools such as peer-review systems and bibliographic management systems like Mendeley or Zotero, to the degree they don’t yet handle 100% Web-based publications, will necessarily evolve to support them—in order to better meet the needs of scholars, and to provide the best possible user experience for scholars.

And as those tools evolve to actually put the needs of their scholarly users first, any requirement for authors to be forced to try to duplicate Web publications into redundant offline versions of the same content—in order to compensate for deficiencies and short-sightedness in the user model—will disappear.

And with the advent of Service Workers, things like user annotations, formal reviews, etc., can be performed by the scholar online or offline directly in Web applications running in a normal mobile browser on small, mobile devices—with no need for a separate redundant off-Web version of a publication, and so no need to ever be forced to have to synchronize the Web publication with any off-Web version.

The transformative effects of moving to a Web-first model that truly places the needs of users above all else will fundamentally change the way scholarly publishing works—for the better.

In-house publishers

A special form of document production is related to technical and/or user documentation of complex products as well as complex administrative documents. Such documents are often akin to STM or scholarly publications edited by traditional trade or scholarly publishers; often, however, the sheer quantity and complexity of production, as well as confidentiality requirements, mean that production is done in-house. In many respects major corporations such as IBM, Intel, Renault, or Boeing, or institutions like the European Commission, the FAO, or UNESCO have become specialized publishers themselves.

The quantity of such documentation makes it infeasible to produce these documents in print (or print-only); instead, Web-publishing them is the optimal way to provide the best possible user experience for them.

Just as for scientific publications, moving to a user-first model of books on the Web will provide new possibilities for these types of in-house publications. Using Service Workers, a Web publication can be made fully usable even when offline—in, for example, an airplane cockpit.

Libraries and archival services

The archiving of digital assets is coming to the fore as a significant issue for dedicated institutions like national libraries. With the arrival of highly dynamic and possibly interactive Web publications as primary content, the traditional means of archiving (i.e., storing on some backup device for long term preservation) is no longer adequate. Web publications depend on a multitude of auxiliary files, like CSS stylesheets, images, videos, JavaScript programs, etc. The good news is that the Web itself already provides the means for archiving such publications, and there will be large growth in business opportunities for developers of systems that cater to this need well.

Financial reporting

The domain of financial reporting is increasingly making use of dynamic and interactive content for reports published on the Web—but in order to meet legal, audit, and market requirements, it is important to be able to identify a specific, unchanging version of the document, including all auxiliary files it references.

For example, financial reports must often be submitted to tax authorities, company registrars, stock exchanges, securities regulators, and other institutions, and it is critical to be able to identify the specific version that was filed, and to ensure that the document is not subsequently altered, including through changes to any external files that it references. Similarly, an auditor will give an opinion on a financial document, and it is important to be able to identify the exact version that was audited. These documents should be universally available in a common location for all parties to access, while still enabling confidentiality of the information among the parties.

The Web itself provides the means for both universally and confidentially serving snapshot versions of such reports to all authenticated users who need access to them. And in contrast to off-Web Portable Web Publications, the Web itself, through standard Web features such as TLS (HTTPS) and Subresource Integrity, lets any user independently verify which server a Web-published report came from and that it was not tampered with in transit, including the integrity of auxiliary files such as scripts and stylesheets that it references. That is, the Web already provides mechanisms for authenticating the publishing server and for checking referenced files against published cryptographic hashes.
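To make the integrity check concrete: a Subresource Integrity value is the name of a hash algorithm, a hyphen, and the base64-encoded digest of the file's bytes. The following is a minimal sketch, assuming a Node.js build step on the publisher's side and Node's built-in `crypto` module; the function names and the sample stylesheet are hypothetical and do not come from any filing system. In a browser, the comparison happens automatically when an element such as `<script>` or `<link rel="stylesheet">` carries an `integrity` attribute.

```typescript
import { createHash } from "node:crypto";

// Compute an SRI value of the form "sha384-<base64 digest>" for a file's contents.
function integrityFor(content: string | Uint8Array): string {
  const digest = createHash("sha384").update(content).digest("base64");
  return `sha384-${digest}`;
}

// Check that content still matches a previously published integrity value.
function matchesIntegrity(content: string | Uint8Array, expected: string): boolean {
  return integrityFor(content) === expected;
}

// Hypothetical auxiliary file referenced by a filed report.
const stylesheet = "body { font-family: serif; }";
const published = integrityFor(stylesheet);

console.log(published.startsWith("sha384-"));                // true
console.log(matchesIntegrity(stylesheet, published));        // true
console.log(matchesIntegrity(stylesheet + " ", published));  // false: any change alters the digest
```

Publishing the integrity value alongside the reference means that a later change to the auxiliary file, however small, no longer matches, and a browser enforcing the attribute would refuse to load the altered file.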

Terminology

This document is based on the following definitions.

Books on the Web

The term Books on the Web itself is chosen to convey, in the clearest and simplest possible way, that a Web book is simply just a website, or a subdirectory or path within a website, with a Web URL from which the book is made available to readers—a URL that can be passed around (for example, sent to others via e-mail) and that forms a universal address for the book for potential access by anyone, anywhere.
And the term book is used here in part just as shorthand for any Web publication—a scholarly journal, or an in-house publication, or a financial report, or whatever other class of document published on the Web.
That said, books as a class are called out in this term because in practice currently, the user experience around reading books is often made unnecessarily different, in suboptimal ways, from the user experience of reading other classes of online publications—for example, online newspapers.
The difference with books in practice is that book readers are unnecessarily ushered into reading books in specialized off-Web reading systems that do not participate in the richness of the full feature set of the Web and so that do not provide users with the full advantages of the Web and the best user experience possible.
But there is no reason in principle why reading a book should ever require a user to need an off-Web reading system—any more than reading a newspaper should require a user to need an off-Web reading system. So the term books on the Web is meant to convey the lack of any need for a packaging mechanism like the one on which the vision in Portable Web Publications for the Open Web Platform is premised.
If specialized reading systems are made superfluous, then there is no need for a “portable” package format for Web publications; instead, books and other publications can be universally accessed as 100% true Web publications—that is, just as books on the Web.
A Web Publication is a Web Resource which itself is an aggregated set of interrelated Web Resources, and which is intended to be considered as a single Web Resource. Furthermore:
- As a Web Resource, a Web Publication has its own URL which refers to the full set of the constituent resources (as opposed to a particular Web Resource within the Web Publication).
- A Web Publication must be constructed of resources whose formats enable (individually or in conjunction with other resources in the same Web Publication) delivery of essential content and functionality based on standard technologies of the Open Web Platform.
- A Web Publication should provide accessibility of content to all users regardless of disability.
(Note that Portable Web Publications for the Open Web Platform redefines this term to include mention of delivery through undefined “delivery platforms” other than the Web.)
A Web Resource is anything that can be uniquely addressed by a Web URL; that is, content that can be accessed through baseline Web protocols such as HTTP and through secure Web protocols such as HTTPS/TLS, and whose integrity can be verified using Web features such as Subresource Integrity, etc. (Note that Portable Web Publications for the Open Web Platform incorrectly redefines this term too broadly to include off-Web resources on a user’s local file system, not just resources actually on the Web.)
Content of a Web Resource: information and sensory experience to be communicated to the user by means of a user agent, including code or markup that defines the content’s structure, presentation, and interactions.
Essential Content of a Web Resource: content which, if removed, would fundamentally change the information or functionality of the Web Resource.
Functionality related to a Web Resource: processes and outcomes achievable through user action.

The concepts of content, essential content, and functionality have been taken from the W3C Web Content Accessibility Guidelines, though slightly modified for this context.

Offline contexts for Web publications

It is important to understand that in order to be made available to users for reading offline, a Web publication never needs to go through any explicit transformation—specifically, a Web publication never needs to be packaged up into a “packed” off-Web format in the way described in Portable Web Publications for the Open Web Platform, and so also then never needs to be “unpacked” for users to read in a different context.

So the only state actually changing in relation to the reading context is the state of a user’s Internet connection: A user is either online, with an active Internet connection, or offline, with no active Internet connection.

Ideally, to provide the best possible user experience for reading Web publications, a Web publication should remain readable and usable regardless of the current state of the user’s Internet connection. Users should be able to navigate to the URL for a Web publication while online, but then subsequently, even when the state of their Internet connection changes and the users are offline, they should still be able to seamlessly continue reading the Web publication, in a Web browser, at the same URL they originally navigated to.

Service Workers enable such a reading scenario—in which the reading experience remains seamlessly consistent even if the state of the user’s Internet connection changes.

It seems clear that users would prefer such a Service-Worker-enabled seamless reading experience—rather than, in the way described in Portable Web Publications for the Open Web Platform, being required to instead deal with getting the Web publication transformed into a “packed” off-Web format, in preparation for reading offline, and then needing for that to be “unpacked” for the users to actually read in an offline context.

Enabling seamless reading experiences

This section provides details of an approach for providing users with the best possible user experience of books on the Web, and with Web publications in general—regardless of the state of users’ connections to the Internet. It also lists a number of non-requirements that don’t need to be considered when designing reading experiences for users that remain seamlessly consistent whether the users are online or offline.

General architecture and demo

Service Workers[[Service-Workers]] offer the means to provide users with a seamlessly consistent reading experience of a Web publication even if the state of the user’s Internet connection changes and the user drops offline.

Service Workers work by providing programmable local caching of Web resources and by acting as a network proxy. While a user is offline, the rest of the browser engine continues to behave just as it did when the user was online, but with a Service Worker intercepting all network requests for any Web resources that are part of a particular Web publication and automatically re-routing them to be served from the local cache instead.

The end result for users is that nothing changes: Even while they are offline, their reading experience remains exactly the same—seamlessly consistent, even though the state of their Internet connection has changed.

In this architecture there is absolutely no need for any additional “packing” and “unpacking” steps such as those proposed in Portable Web Publications for the Open Web Platform. Instead, Service Workers alone can provide all that is needed to ensure a seamless reading experience.
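The interception described in this section can be sketched in a few lines. The following TypeScript is a simplified model of a cache-first strategy, not an actual Service Worker script: the `Map`-based cache and the `Fetcher` type are stand-ins introduced here for illustration. In a real worker, the same logic would sit in a `fetch` event handler and use the Cache API (`caches.match()`, `cache.put()`) together with `event.respondWith()`.

```typescript
// Stand-in for the network: resolves with the resource body,
// or rejects when the user is offline. (Hypothetical type for this sketch.)
type Fetcher = (url: string) => Promise<string>;

// Cache-first lookup: serve from the local cache when possible,
// otherwise fetch from the network and remember the result.
async function cacheFirst(
  url: string,
  cache: Map<string, string>,
  network: Fetcher,
): Promise<string> {
  const cached = cache.get(url);
  if (cached !== undefined) {
    return cached; // same answer whether the user is online or offline
  }
  const body = await network(url); // rejects if offline and never cached
  cache.set(url, body);
  return body;
}
```

Under this strategy, a chapter fetched once while online is served from the cache on every later request, so reading continues at the same URL after the connection drops. A chapter that was never fetched still fails offline, which is why a worker for a book would typically cache all of the book's resources ahead of time.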

For proof-of-concept demonstrations of Service-Worker-offline-enabled Web publications, see the following:

- A simple “readme” document that provides more details and explains how to view the demo content offline.
- Robert Louis Stevenson’s Treasure Island as an offline-enabled web book: all 34 chapters, plus all 115 of Louis Rhead’s illustrations from the 1915 Harper’s edition of the novel.

Both are based on code from an earlier demo developed by Jake Archibald. Jake has also created another demo that uses Service Workers in the same way to allow you to create your own custom “book” of Wikipedia pages to take offline for reading anywhere:

https://wiki-offline.jakearchibald.com/

It’s easy to imagine many other interesting demonstrations for making existing content readable offline.

No archive format needed

A Service-Worker-enabled Web publication requires no archiving step in order to remain readable while offline. So, given that the Web itself can provide a truly universal reading experience even when users are offline, there is no need for any archive format to provide “portability” of a Web publication to off-Web legacy reading systems in the way envisioned in Portable Web Publications for the Open Web Platform.

No publication manifests needed

A Service-Worker-enabled Web publication requires no publication manifest in order to remain readable while offline. That said, an author or publisher might choose to use a manifest when implementing such a publication, and manifests can be useful for other purposes; they are just not a strict requirement for offline reading.

No special addressing/identification needed

URLs (generally HTTP/HTTPS URLs) serve as universal addresses for resources on the Web. Books on the Web and all other Web publications are simply websites, with URLs that identify them. Therefore, books on the Web come with no special addressing needs. In terms of addressing and identification, books on the Web benefit from the same universality[[Universality]] that all other Web resources enjoy.

Even when a user is offline, the user can read a previously-fetched Service-Worker-offline-enabled book simply by navigating to that book’s universal address in a normal Web browser; that is, by browsing back to the same URL from which the user originally fetched the book for reading. When offline, the browser essentially uses an internal catalog to automatically map the URL for the book and the URLs for all its constituent resources to locally-cached locations from which it fetches the resources, rather than (re)fetching them over the Internet.
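One way to picture this mapping is as a lookup table keyed by absolute URL. The sketch below is a hypothetical model only (browsers do not expose their cache in this form), and the book address under `example.org` and the helper names are assumptions made for illustration. It uses the standard `URL` class to resolve each constituent resource against the book's address, so the same URL serves as the key whether the user is online or offline.

```typescript
// Hypothetical address of a book on the Web.
const bookUrl = "https://example.org/books/treasure-island/";

// Resolve a relative reference (as written in the book's markup)
// to the absolute URL that identifies the resource.
function resolveInBook(reference: string): string {
  return new URL(reference, bookUrl).href;
}

// True if an absolute URL lies within the book's path on the book's origin.
function belongsToBook(url: string): boolean {
  const target = new URL(url);
  const book = new URL(bookUrl);
  return target.origin === book.origin && target.pathname.startsWith(book.pathname);
}

// Stand-in for the browser's local catalog: absolute URL -> cached body.
const catalog = new Map<string, string>();
catalog.set(resolveInBook("chapter-01.html"), "<h1>Chapter 1</h1>");

console.log(resolveInBook("chapter-01.html"));
// https://example.org/books/treasure-island/chapter-01.html
console.log(belongsToBook("https://example.org/books/other/")); // false
console.log(catalog.has("https://example.org/books/treasure-island/chapter-01.html")); // true
```

No identifier other than the URL appears anywhere in the model: the address a reader shares by e-mail is the same string used to find the cached copy.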

No special metadata needed

No special metadata is required to enable users to read a Web book or other Web publication offline. The state of a Web publication never changes whether the user is online or offline (only the state of the user’s Internet connection changes), and to be readable offline, a Service-Worker-enabled Web publication never actually needs to leave the Web. This is in contrast to scenarios outlined in Portable Web Publications for the Open Web Platform, where users are instead essentially required to have redundant “packed” copies of Web publications just to be able to read them offline, in separate off-Web reading systems. Any special metadata requirements that would arise from forcing a user to use an off-Web copy of a Web publication are avoided just by never forcing users into needing to use off-Web copies.

No new styling/layout/pagination needed

The requirements for taking the hard-copy printed-page conventions of paper books and other paged media and emulating them on the Web are either largely already known or are the subject of ongoing discussion in the W3C CSS Working Group. What mostly remains is just for browser-engine projects to implement support for solutions that have already been specified, and to continue working with other participants in the CSS Working Group to get the remaining solutions identified. Direct discussion in the CSS Working Group is the most productive means to ensure the necessary requirements get identified and addressed.

Regardless, the requirement to make Web publications readable offline—whether in the way outlined in this document or in the contrary way envisioned in Portable Web Publications for the Open Web Platform—is completely orthogonal to any styling, layout, and pagination requirements. In other words, making Web publications readable by users offline introduces no new styling, layout, or pagination needs.

No new security model needed

Service-Worker-offline-enabled books on the Web are simply websites that are served to users under the standard Web security model whether the user is online or offline. So, in contrast to the scenarios envisioned in Portable Web Publications for the Open Web Platform where users are required to use separate off-Web reading systems in order to read documents offline (which would necessitate creating a corresponding off-Web security model of some kind), no new security model is needed.

No standard personalization control needed

The requirement to make Web publications readable offline—whether in the way outlined in this document or the contrary way envisioned in Portable Web Publications for the Open Web Platform—is completely orthogonal to user abilities to personalize the presentation of a publication by adapting it to suit their needs.

Web browsers already provide some degree of built-in capabilities for users to personalize the presentation of Web publications as they read them (for example, to dynamically change font size as they read), and third-party browser extensions are also available for users to have a greater level of control over such presentation-related personalization (for example, to dynamically change background/foreground color schemes).

Authors and publishers of Web publications can also innovate in this area, by adding JavaScript code to enable their own built-in personalization controls in publications they provide to users. And JavaScript library developers can create shared libraries for authors and publishers to use for enabling such personalization controls.

So while there’s no disagreement on the value of users being provided with presentation-related personalization controls, there is no agreement to require any standard mechanisms for doing it. Instead, browser projects, extension developers, JavaScript-library creators, as well as authors and publishers can all innovate and compete in this area to try to produce the best possible user experience around such personalization.

No standard domain profiles needed

Different domains of publishing have different expectations from users around the nature of their content and their presentation. Educational publishers, for example, have a particular set of expectations from users for things like content structure and metadata. And comic books, for example, have a default presentation that is typically pre-paginated, fixed-form, and image-based, with a set of comic-book-reading user-interaction conventions commonly followed across publications from different authors and publishers.

However, it is a mistake to believe that the best way to address those domain-specific expectations is for standards organizations to attempt to produce designed-by-committee profiles that content can be authored and validated against (in order to produce predictability of content within particular domains, or for whatever other reasons). Past experiences have taught us that such attempts at producing domain-specific profiles of Web technologies are rarely successful—among other reasons because the market moves much faster than standards committees, and any specifications for profiles quickly become stale and irrelevant.

Instead, innovation in the area of addressing domain-specific user expectations for Web publications can and will just continue naturally in the market at the hands of JavaScript library creators, browser-extension developers, and authors and publishers themselves—as they compete and collaborate on new ideas about how to make the best possible user experiences for readers, as selling points to set their publications apart from others that are less creative about trying to meet specific domain expectations.

Occasionally, as such new ideas are allowed to incubate and mature naturally in the market, some agreement about standardizing particular mechanisms eventually emerges. But it is imprudent to attempt to prematurely mandate standard profiles or requirements before that natural incubation has had time to occur and mature.

Why not do something EPUB-inspired?

EPUB merits mention here for being the inspiration behind the “Portable Web Publication” idea envisioned in Portable Web Publications for the Open Web Platform (the original title of which was in fact EPUB+WEB).

As an older format that was developed outside the W3C for use in specialized off-Web reading systems, EPUB was designed in the years before users had smartphones and tablets with full Web browsers. In those years, it made sense for users to carry low-capability devices that were specialized just for reading electronic books, and it made sense for there to be a format that targeted such users.

However, in the intervening years, EPUB and the general approach underlying it have increasingly faced criticism for not being at all a good match for current user needs; for one representative example, see the article “The publishing industry has a problem, and EPUB is not the solution”[[EPUB]] from Jani Patokallio (who at the time he wrote the article was working for the publisher Lonely Planet, and now works for Google).

The shared understanding underlying those criticisms of EPUB and the general notion of any similar “portable” document format is that since users have commonly had smartphones and tablets with full Web capabilities for years now, there is no longer any user requirement for specialized low-capability devices just to read electronic books, nor for legacy off-Web specialized reading systems—and so going forward into the future, there is no need for EPUB as a format, nor more generally for any EPUB-inspired “Portable Web Publication” format as envisioned in Portable Web Publications for the Open Web Platform.

Instead what we should do is to look at the actual current user needs without concern for any existing legacy formats, and to start back from first principles and design fresh solutions—with the absolute highest priority given to user needs, over all other constituencies—and with the goal being purely to provide the best possible user experience of books on the Web and other Web publications to users today and for the future.

That is what this document does.
