Our vision of Books on the Web is for Web books to become
true first-class citizens of the Web, with user needs given the
highest priority. The focus is on enabling the best possible user
experience for readers of Web books and other Web publications by
providing users with a seamless reading experience that remains
consistent even if the state of a user’s Internet connection changes,
so that users can always continue reading right on the Web, whether
they are offline or online.
We believe that such a
seamless reading experience for users,
using a truly 100% Web-based approach, is very clearly the right way to
achieve a
“zero format- and
workflow-level separation between offline and online” goal.
Further, we believe it is clear that the “zero separation between offline
and online” goal can be met without any kind of
“Portable Web Publication”
format, and in fact without resorting in any way to the notion of
“portability”. It can instead be based just on the Web’s fundamental quality of
Universality[[Universality]]:
users never need to download copies of Web publications
in order to read them offline. Nothing more is required from users
than to browse to the URL for a Web publication in exactly
the same way they normally would; they can then automatically continue
to access that Web publication at that same
URL regardless of whether they are offline or online. In other words, a
truly universal reading experience.
Why work on this now?
In June 2013, the W3C launched a new
“Digital Publishing” Interest Group
with
a charter for the group
that states the following goal:
the current format- and workflow-level separation between
offline/portable and online document publishing should be diminished to
zero
The W3C is now considering proposals for achieving that “zero separation
between offline and online” goal for Web publications, and this
document outlines a specific, concrete technical direction for making that
goal a reality rather than just a vision.
However, the W3C is considering instead investing organizationally in
a contrary direction based on the much vaguer idea of an EPUB-inspired
“EPUB+WEB”
“Portable Web Publication” format as envisioned in
Portable Web Publications for the Open Web Platform,
and as illustrated in .
It would be a mistake to invest in that or any direction that is based
on the assumption that the solution needed is a “portable” document format
usable outside of the Web.
To be very clear about the distinction, a reading system is not part of the Web if:
- the reading system does not use URLs
- the reading system does not use the Web security model
- the reading system requires users to deal with the extra steps of
having a “packed” Web publication, copied off its home on the Web,
that then needs to be “unpacked” in order for the user to read it
Such a reading system is instead just part of the non-Web, and
designing any solution for users that optimizes for “portability” to such
off-Web reading systems is a mistake. While a document format that is
“portable” into such off-Web reading systems may solve problems for
constituencies other than users, it is absolutely not the right way to
provide the best possible user experience for users themselves.
So instead of designing for “portability”, what’s actually needed is to
design for
universality[[Universality]]:
a true 100% Web-based approach that doesn’t require users to download
copies of Web publications in order to read them offline, but instead
requires nothing more from users than to browse to the URL for a
Web publication in exactly the same way they normally would. Users can
then automatically continue to access that
Web publication at that same URL regardless of whether they are
offline or online.
The purpose of this document is to explain how such
seamlessly consistent reading experiences of Web publications
can be achieved for users. It also makes clear why the direction outlined
here is a much better solution than the contrary direction outlined in
Portable Web Publications for the Open Web Platform,
both for providing the best possible user experience and for truly
achieving the “zero format- and workflow-level separation between offline
and online” goal in the
“Digital Publishing” Interest Group charter.
What are the areas of interest?
A principled approach to making books first-class citizens of the Web
involves addressing the needs of the following stakeholders:
Users
The right approach to making the best possible user experience
for books on the Web starts simply by meeting the needs of users
first, beyond all other needs. The needs of other constituencies will
naturally be addressed by meeting user needs (at least for other
constituencies that are actually acting in good faith to meet user needs,
instead of acting in practice against the best interests of users).
In contrast to the vision presented in
Portable Web Publications for the Open Web Platform,
this document starts from the realization that users in fact do not want
to be ushered (however subtly) into having to make choices among
different reading systems for the same content, depending on the
particular reading contexts they find themselves in. Users instead want
a truly universal system for reading. Users want the Web.
Before 2008 or so, when smartphones and tablets capable of running full
Web browsers with full access to the Web began to become prevalent,
there was some value to users in having specialized off-Web devices with
specialized reading systems. But for years now, users have actually had
smartphones and tablets with full Web capabilities—so there is no longer any
user need or desire for such specialized off-Web devices, nor for such
legacy off-Web specialized reading systems.
Even where, say, roaming charges are high, or where
Internet access is of low quality or not available at all, the Web
itself, through
Service Workers, can give
users automatic offline access, at any time and anywhere, to the
Web publications they are reading.
So the best possible user experience can now be provided to readers just
through the power of the Web. A truly universal reading experience.
Secondary constituencies
Secondary constituencies who can help give readers the best possible user
experience of books on the Web:
- Publishers
-
Book publishers are investing in the development of technical
expertise in Web technologies. While gaining understanding of technical
topics is important to new and future publishing workflows, the lack of
communication between the trade publishing and Web-application developer
communities is resulting in unnecessary duplication of effort and
investment.
Collaboration between the Web content development and publishing
communities will result in major benefits to publishers. Moving to
a truly user-first model of books on the Web that takes full
advantage of the
universality[[Universality]]
of the Web means publishers can concentrate on engaging content
authors in the production of high quality content.
Along with familiar, established Web technologies such as CSS and
SVG, support for technologies enabling features such as 3D rendering
(and even visualization tools such as D3)
will naturally flow into the publishing realm by moving to a user-first
model of books on the Web, hence increasing publishers’
opportunities to sell new content products across the board.
New opportunities are already a reality for publishers
traditionally considered to be on the leading edge of technological
advances in working with content. These publishers include
STM and educational publishing houses, as well as scholarly
and journal publishing organizations.
A user-first model of books on the Web will support more
tools and services and a much larger population of trained practitioners
compared to the current state of working in parallel universes.
- Scholarly journal and STM publishers
-
Scholarly journal publishers also provide articles for download
these days. The most popular distribution format for journal articles
continues to be PDF—as a direct reflection of the scholarly community,
which highly prioritizes linear text and preservation of print typography.
Indeed, the original goal for scholarly publishers to make files available
online was to enable readers to download and print content directly,
instead of borrowing a paper copy of a journal issue and photocopying
relevant articles.
But things are changing. First of all, Web-only publications are
becoming part of the mainstream (e.g., the multidisciplinary PLOS ONE or the new PeerJ CompSci journals), with
the main content being published using traditional Web technologies like
HTML and CSS. And there is much more. Scholarly communication increasingly
uses additional media such as video, audio, animated graphics, or very
large images, and the trend is to consider these as integral parts of the
scientific output. (Mike
Bostock’s recent article on visualizing algorithms or the “live”
presentation of data in a paper published by
F1000 Research are good examples of the new possibilities.)
Furthermore, publishing the scientific data sources, like the
results of, say, a sociological survey or measurement output of biochemical
experiments alongside the “main” publication, is
also coming to the fore, with some journals and institutions actually
requiring public access to those. Gaining truly universal access to all
these various media and contents is important for scholars—in order
for them to be able to read articles at any time, anywhere, without
ever being forced into using off-Web reading systems.
And tools such as peer-review systems and bibliographic management systems
like
Mendeley or
Zotero, to
the degree they don’t yet handle 100% Web-based publications, will
necessarily evolve to support them—in order to better meet the needs of
scholars, and to provide the best possible user experience for scholars.
And as those tools evolve to actually put the needs of their scholarly
users first, any requirement for authors to duplicate Web publications
as redundant offline versions of the same content, in order to
compensate for deficiencies and short-sightedness in the user model,
will disappear.
And with the advent of
Service Workers,
things like user annotations, formal reviews, etc., can be performed
by the scholar online or offline directly in Web applications running in a
normal mobile browser on small, mobile devices, with no need for a separate,
redundant off-Web version of a publication, and so no need to ever
synchronize the Web publication with any off-Web
version.
The transformative effects of moving to a Web-first model that truly
places the needs of users above all else will fundamentally change
the way scholarly publishing works—for the better.
- In-house publishers
-
A special form of document production is related to technical
and/or user documentation of complex products as well as complex
administrative documents. Such documents are often akin to STM or scholarly
publications edited by traditional trade or scholarly publishers—but,
often, the sheer quantity and complexity of production, as well as
confidentiality requirements, mean that the production is done in-house.
In many respects major corporations such as IBM, Intel, Renault, or Boeing,
or institutions like the European Commission, the FAO, or UNESCO have
become specialized publishers themselves.
The quantity of such documentation makes it infeasible to produce these
documents in print (or print-only); instead, Web-publishing them is
the optimal way to provide the best possible user experience for them.
Just as for scientific publications, moving to a user-first model of
books on the Web will provide new possibilities for these types of
in-house publications. Using
Service Workers,
a Web publication can be made fully usable even when offline—in, for
example, an airplane cockpit.
- Libraries and archival services
-
The archiving of digital assets is coming to the fore as a
significant issue for dedicated institutions like national libraries. With
the arrival of highly dynamic and possibly interactive Web publications
as primary content, the traditional means of archiving (i.e., storing on some
backup device for long term preservation) is no longer adequate.
Web publications depend on a multitude of auxiliary files, like CSS
stylesheets, images, videos, JavaScript programs, etc. The good news is
that the Web itself already provides the means for archiving such
publications, and there will be large growth in business opportunities
for developers of systems that cater to this need well.
- Financial reporting
-
The domain of financial reporting is increasingly making use of
dynamic and interactive content for reports published on the Web—but in
order to meet legal, audit, and market requirements, it is important to be
able to identify a specific, unchanging version of the document, including
all auxiliary files it references.
For example, financial reports must often be submitted to tax
authorities, company registrars, stock exchanges, securities regulators,
and other institutions, and it is critical to be able to identify the
specific version that was filed, and to ensure that the document is not
subsequently altered, including through changes to any external files that
it references. Similarly, an auditor will give an opinion on a financial
document, and it is important to be able to identify the exact version that
was audited. These documents should be universally available in a
common location for all parties to access, while still enabling
confidentiality of the information among the parties.
The Web itself provides the means for both universally and
confidentially serving snapshot versions of such reports to all
authenticated users who need access to them. And in contrast to off-Web
Portable Web Publications,
the Web itself, through standard Web features such as TLS (HTTPS) and
Subresource Integrity, enables
any user to independently verify the provenance of any Web-published
report and its integrity against tampering in transit, including the
integrity of the auxiliary files it references. TLS authenticates the
server a report comes from, and Subresource Integrity lets a page pin the
exact content of the scripts and stylesheets it references.
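To make the Subresource Integrity part of this concrete, here is a minimal TypeScript sketch, assuming a Node.js environment for `node:crypto`. It computes an integrity value in the form the SRI specification uses (`sha384-` followed by the base64-encoded SHA-384 digest), checks received content against it, and records one value per file so that a filed version of a report can be identified exactly. The function and file names are hypothetical; in a browser, the check itself is performed by the user agent.

```typescript
import { createHash } from "node:crypto";

// SRI metadata for a resource: algorithm name, a hyphen, then the base64-encoded digest.
function sriSha384(content: string | Uint8Array): string {
  return `sha384-${createHash("sha384").update(content).digest("base64")}`;
}

// True if the content actually received matches the integrity value recorded for the filed version.
function matchesIntegrity(content: string | Uint8Array, integrity: string): boolean {
  return sriSha384(content) === integrity;
}

// Record one integrity value per file (hypothetical file names), identifying the exact
// version of a report together with its auxiliary files.
function snapshot(files: Record<string, string | Uint8Array>): Record<string, string> {
  const integrity: Record<string, string> = {};
  for (const [path, content] of Object.entries(files)) {
    integrity[path] = sriSha384(content);
  }
  return integrity;
}
```

A value produced this way can be placed in the `integrity` attribute of a `<script>` or `<link rel="stylesheet">` element, or passed as the `integrity` option of `fetch()`; the browser then refuses to use the resource if its bytes do not match. The check is only as trustworthy as the place the integrity value itself comes from, which is why it pairs with a page served over HTTPS.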
Terminology
This document is based on the following definitions.
Books on the Web
-
The term
Books on the Web
itself is chosen to convey, in the clearest and simplest possible
way, that a Web book is simply a website, or a subdirectory or
path within a website, with a Web URL from which the book is made
available to readers—a URL that can be passed around (for example,
sent to others via e-mail) and that forms a universal address for the
book for potential access by anyone, anywhere.
And the term book is used here in part just as shorthand
for any Web publication—a scholarly journal, or an
in-house publication, or a
financial report, or whatever other
class of document published on the Web.
That said, books as a class are called out in this term because,
currently, in practice, the user experience around reading books is often
made unnecessarily different, in suboptimal ways, from the user
experience of reading other classes of online publications, for example,
online newspapers.
The difference with books in practice is that book readers are
unnecessarily ushered into reading books in specialized off-Web reading
systems that do not participate in the richness of the full feature set of
the Web, and so do not provide users with the full advantages of the
Web and the best user experience possible.
But there is no reason in principle why reading a book should
ever require an off-Web reading system, any more than
reading a newspaper should. So the term books on the Web is meant to convey the
lack of any need for a packaging mechanism like the one on which
the vision in
Portable Web Publications for the Open Web Platform
is premised.
If specialized reading systems are made superfluous, then there is no
need for a “portable” package format for Web publications;
instead, books and other publications can be universally accessed as
100% true Web publications—that is, just as books on the Web.
- A Web Publication
is a Web Resource which itself is an aggregated set of
interrelated Web Resources, and which is intended to be considered
as a single Web Resource. Furthermore:
(Note that
Portable Web Publications for the Open Web Platform
redefines this term to include mention of delivery through
undefined “delivery platforms” other than the Web.)
- A Web Resource is
anything that can be uniquely addressed by a Web URL; that is, content
that can be accessed through baseline Web protocols such as HTTP and
through secure Web protocols such as HTTPS/TLS, and whose integrity
can be verified using Web features such as
Subresource Integrity,
etc.
(Note that
Portable Web Publications for the Open Web Platform
incorrectly redefines this term too broadly to include off-Web
resources on a user’s local file system, not just resources actually
on the Web.)
- Content of a Web Resource: information and sensory
experience to be communicated to the user by means of a user agent,
including code or markup that defines the content’s structure,
presentation, and interactions.
- Essential Content
of a Web Resource: content which, if removed, would
fundamentally change the information or functionality of the Web
Resource.
- Functionality related to a Web Resource:
processes and outcomes achievable through user action.
The concepts of content, essential
content, and functionality have been taken from the W3C
Web Content Accessibility Guidelines,
though slightly modified for this context.
Offline contexts for Web publications
It is important to understand that in order to be made available to
users for reading offline, a Web publication never needs to
go through any explicit transformation—specifically, a Web publication
never needs to be packaged up into a “packed” off-Web format in the way
described in
Portable Web Publications for the Open Web Platform,
and so never needs to be “unpacked” for users to read
in a different context.
So the only state actually changing in relation to the reading context
is the state of a user’s Internet connection: A user is either online,
with an active Internet connection, or offline, with no active Internet
connection.
Ideally, to provide the best possible user experience for reading Web
publications, a Web publication should remain readable and
usable regardless of the current state of the user’s Internet connection.
Users should be able to navigate to the URL for a Web publication
while online, but then subsequently, even when the state of their
Internet connection changes and the users are offline, they should still
be able to seamlessly continue reading the Web publication, in a
Web browser, at the same URL they originally navigated to.
Service Workers
enable such a reading scenario—in which the
reading experience remains seamlessly consistent
even if the state of the user’s Internet connection changes.
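A minimal sketch of the page-side step, in TypeScript: the book’s pages register a Service Worker whose scope is the book’s path, so that later requests under that path, online or offline, pass through it. The `/books/example/` path and the `sw.js` script name are hypothetical; `register(scriptURL, { scope })` is the standard `ServiceWorkerContainer` call. The container is passed in as a parameter so the sketch can also run outside a browser.

```typescript
// The one method of the ServiceWorkerContainer API this sketch uses.
interface SWContainerLike {
  register(scriptURL: string, options?: { scope?: string }): Promise<unknown>;
}

// Hypothetical layout: the book lives under /books/example/, with its worker script beside it.
const BOOK_PATH = "/books/example/";

// Returns true if a Service Worker was registered for the book, false if offline reading is unavailable.
async function enableOfflineReading(container: SWContainerLike | undefined): Promise<boolean> {
  // `navigator.serviceWorker` is undefined in browsers without support and outside
  // secure contexts (such as plain-HTTP pages); the book then works as an ordinary website.
  if (!container) return false;
  await container.register(`${BOOK_PATH}sw.js`, { scope: BOOK_PATH });
  return true;
}

// In the book's pages: enableOfflineReading(navigator.serviceWorker);
```

Placing the worker script inside the book’s own directory keeps the default maximum scope equal to the book’s path, so the worker only ever sees requests for that publication’s pages.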
It seems clear that users would prefer such a Service-Worker-enabled
seamless reading experience to the one described in
Portable Web Publications for the Open Web Platform,
in which users must get the Web publication
transformed into a “packed” off-Web format in preparation for reading offline,
and then have it “unpacked” before they can actually
read it in an offline context.
Enabling seamless reading experiences
This section provides details of an approach for providing users with
the best possible user experience of books on the Web, and with
Web publications in general—regardless of the state of users’
connections to the Internet. It also lists a number of non-requirements
that don’t need to be considered when designing reading experiences for users
that remain seamlessly consistent whether the users are online or offline.
General architecture and demo
Service Workers[[Service-Workers]]
offer the means to provide users with a seamlessly consistent reading
experience of a Web publication even if the state of the user’s
Internet connection changes and the user drops offline.
Service Workers work by providing programmable local caching of
Web resources and acting as a programmable network proxy. While a user is
offline, the rest of the browser engine continues to behave just as it
did when the user was online, but now with a Service Worker intercepting
all network requests for any Web resources that are part of a
particular Web publication and automatically re-routing them to
be served from the local cache instead.
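One common way to get this behavior is a “network first, falling back to cache” strategy, sketched below in TypeScript. A Service Worker actually sees every request in its scope whether the user is online or not; this strategy goes to the network when it can and keeps the local copy fresh, and answers from the cache when the network is unreachable. The cache and the network are injected through minimal interfaces so the logic can run outside a Service Worker; the cache name `book-v1` in the wiring comment is a hypothetical choice.

```typescript
// The subset of the Cache API this sketch relies on; a real `Cache` from `caches.open()` fits it.
interface CacheLike {
  match(request: Request): Promise<Response | undefined>;
  put(request: Request, response: Response): Promise<void>;
}

// The network side, injected so the sketch can run outside a Service Worker.
type FetchLike = (request: Request) => Promise<Response>;

async function networkFallingBackToCache(
  request: Request,
  cache: CacheLike,
  fetchFn: FetchLike,
): Promise<Response> {
  // The Cache API only stores GET responses, so other methods go straight to the network.
  if (request.method !== "GET") return fetchFn(request);
  let response: Response;
  try {
    response = await fetchFn(request);
  } catch (networkError) {
    // Offline: serve the copy stored during an earlier online visit, if there is one.
    const cached = await cache.match(request);
    if (cached) return cached;
    throw networkError;
  }
  // Online: refresh the locally stored copy of this resource, then return the live response.
  if (response.ok) await cache.put(request, response.clone());
  return response;
}

// Wiring inside the Service Worker script (needs the WebWorker lib types):
// self.addEventListener("fetch", (event) => {
//   event.respondWith(
//     caches.open("book-v1").then((cache) => networkFallingBackToCache(event.request, cache, fetch)),
//   );
// });
```

A cache-first strategy would make pages load faster offline and on slow links, at the cost of readers seeing stale content until the cache is refreshed; network-first keeps online reading identical to an ordinary website.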
The end result for users is that nothing changes: Even while they are
offline, their reading experience remains exactly the same—seamlessly
consistent, even though the state of their Internet connection has
changed.
In this architecture there is absolutely no need for any additional
“packing” and “unpacking” steps such as those proposed in
Portable Web Publications for the Open Web Platform.
Instead, Service Workers alone can provide all that is needed to ensure
a seamless reading experience.
For proof-of-concept demonstrations of Service-Worker-offline-enabled
Web publications, see the following:
Both are based on code from an earlier demo developed by
Jake Archibald.
Jake has also created another demo that uses Service Workers in the
same way to allow you to create your own custom “book” of Wikipedia pages
to take offline for reading anywhere:
https://wiki-offline.jakearchibald.com/
It’s easy to imagine many other interesting demonstrations for making
existing content readable offline.
No archive format needed
A Service-Worker-enabled Web publication requires no archiving
step in order to remain readable while offline.
So, given that the Web itself can provide a truly universal reading
experience even when users are offline, there is no need for any
archive format to provide “portability” of a Web publication
to off-Web legacy reading systems in the way envisioned in
Portable Web Publications for the Open Web Platform.
No publication manifests needed
A Service-Worker-enabled Web publication requires no publication
manifest in order to remain readable while offline.
That said, an author or publisher might choose to use a manifest when
implementing such a publication, and manifests can be useful for other
purposes; they are just not a strict requirement for offline reading.
No special addressing/identification needed
URLs (generally HTTP/HTTPS URLs) serve as universal addresses for
resources on the Web. Books on the Web and all other Web
publications are simply websites, with URLs that identify
them. Therefore, books on the Web come with no special addressing
needs. In terms of addressing and identification, books on the Web
benefit from the same
universality[[Universality]]
that all other Web resources enjoy.
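Because a book on the Web is just a path within a website, deciding whether a given URL belongs to a particular publication can be as simple as comparing URLs. The TypeScript sketch below follows the same basic idea as Service Worker scope matching, a prefix test on the serialized URL, using the standard `URL` class; the function name and the example URLs are hypothetical.

```typescript
// Decide whether `candidate` (absolute, or relative to the book) belongs to the publication
// rooted at `bookRoot`. Like Service Worker scope matching, this is a prefix test on the
// serialized URL, so `bookRoot` should end with "/" to avoid matching sibling paths.
function isPartOfPublication(candidate: string, bookRoot: string): boolean {
  const root = new URL(bookRoot);
  const url = new URL(candidate, root); // resolve relative links against the book's root
  url.hash = ""; // a fragment points into a resource; it does not name a different one
  return url.href.startsWith(root.href);
}
```

For example, with a root of `https://example.org/books/example/`, the relative link `ch2.html#sec3` belongs to the book, while `https://example.org/books/example-2/` and the same path on another origin do not.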
Even when a user is offline, the user can read a previously fetched
Service-Worker-offline-enabled book simply by navigating to that book’s
universal address in a normal Web browser, that is, by browsing back to
the same URL from which the user originally fetched the book for
reading. When offline, the book’s Service Worker essentially uses the
browser’s cache as an internal catalog to automatically map the URL for
the book and the URLs for all its constituent resources to locally cached
copies, from which it serves the resources rather than (re)fetching them
over the Internet.
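One way a publication could make sure all of its constituent resources are in that catalog after the reader’s first online visit is to store them when its Service Worker installs. The TypeScript sketch below assumes a hypothetical resource list and cache name; `cache.addAll(urls)` is the standard Cache API call that fetches and stores every URL, and it rejects without storing anything if any single fetch fails. The cache is passed in through a minimal interface so the sketch can run outside a Service Worker.

```typescript
// The one Cache API method this sketch uses; the object returned by `caches.open()` fits it.
interface PrecacheTarget {
  addAll(requests: string[]): Promise<void>;
}

// Hypothetical list of the book's constituent resources, relative to its root URL.
// The list lives in the worker script itself; no separate publication manifest is involved.
const BOOK_RESOURCES: readonly string[] = [
  "./", "index.html", "ch1.html", "ch2.html", "style.css", "figures/fig1.svg",
];

// Resolve each entry to the absolute URL the reader's browser will later request.
function resourceURLs(bookRoot: string, resources: readonly string[] = BOOK_RESOURCES): string[] {
  return resources.map((path) => new URL(path, bookRoot).href);
}

// Fetch and store every resource of the publication in one all-or-nothing step.
async function precachePublication(cache: PrecacheTarget, bookRoot: string): Promise<string[]> {
  const urls = resourceURLs(bookRoot);
  await cache.addAll(urls);
  return urls;
}

// In the Service Worker script (needs the WebWorker lib types):
// self.addEventListener("install", (event) => {
//   event.waitUntil(
//     caches.open("book-v1").then((cache) => precachePublication(cache, self.registration.scope)),
//   );
// });
```

Because the cache is keyed by these same URLs, an offline reader who browses back to the book’s address is served exactly the resources stored here.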
No new styling/layout/pagination needed
The requirements for taking the hard-copy printed-page conventions
of paper books and other paged media and emulating them on the Web are
either largely already known or are the subject of ongoing discussion
in the W3C CSS Working Group. What mostly remains is just for
browser-engine projects to implement support for solutions that have
already been specified, and to continue working with other participants
in the CSS Working Group to get the remaining solutions identified.
Direct discussion in the CSS Working Group is the most productive means
to ensure the necessary requirements get identified and addressed.
Regardless, the requirement to make Web publications readable
offline—whether in the way outlined in this document or in the
contrary way envisioned in
Portable Web Publications for the Open Web Platform—is
completely orthogonal to any styling, layout, and pagination
requirements. In other words, making Web publications readable by
users offline introduces no new styling, layout, or pagination needs.
No new security model needed
Service-Worker-offline-enabled books on the Web are simply
websites that are served to users under the standard Web security model
whether the user is online or offline. So, in contrast to the scenarios
envisioned in
Portable Web Publications for the Open Web Platform
where users are required to use separate off-Web reading systems in
order to read documents offline—which would necessitate creating a
corresponding off-Web security model of some kind—no new security model
is needed.
No standard personalization control needed
The requirement to make Web publications readable offline—whether
in the way outlined in this document or the contrary way envisioned in
Portable Web Publications for the Open Web Platform—is
completely orthogonal to user abilities to personalize the presentation of
a publication by adapting it to suit their needs.
Web browsers already provide some degree of built-in capabilities for
users to personalize the presentation of Web publications as they
read them (for example, to dynamically change font size as they read),
and third-party browser extensions are also available for users to
have a greater level of control over such presentation-related
personalization (for example, to dynamically change background/foreground
color schemes).
Authors and publishers of Web publications can also innovate in
this area, by adding JavaScript code to enable their own built-in
personalization controls in publications they provide to users. And
JavaScript library developers can create shared libraries for authors and
publishers to use for enabling such personalization controls.
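As a small illustration of such a publisher-provided control, the TypeScript sketch below steps a reader’s preferred font size up or down within fixed bounds and remembers the choice across pages and visits. The storage and style interfaces are minimal stand-ins that `localStorage` and `document.documentElement.style` both fit; the storage key, step size, and bounds are hypothetical choices.

```typescript
// Minimal stand-ins: `localStorage` and an element's `style` object both fit these shapes.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}
interface StyleLike {
  fontSize: string;
}

const KEY = "book-font-size-percent"; // hypothetical storage key
const MIN = 80, MAX = 200, STEP = 10; // percent of the browser's default font size

// Read the saved size, falling back to 100% if nothing valid has been stored.
function currentPercent(storage: StorageLike): number {
  const saved = Number(storage.getItem(KEY));
  return Number.isFinite(saved) && saved >= MIN && saved <= MAX ? saved : 100;
}

// Apply one step (+1 larger, -1 smaller), clamp it, persist it, and return the new size.
function adjustFontSize(direction: 1 | -1, storage: StorageLike, style: StyleLike): number {
  const next = Math.min(MAX, Math.max(MIN, currentPercent(storage) + direction * STEP));
  storage.setItem(KEY, String(next));
  style.fontSize = `${next}%`;
  return next;
}

// Restore the saved size when a page of the book loads.
function restoreFontSize(storage: StorageLike, style: StyleLike): void {
  style.fontSize = `${currentPercent(storage)}%`;
}

// In a page: button.onclick = () => adjustFontSize(1, localStorage, document.documentElement.style);
```

Since `localStorage` is available whether or not the reader is online, a control like this keeps working in an offline-enabled publication without any extra machinery.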
So while there’s no disagreement on the value of users being provided
with presentation-related personalization controls, there is no agreement
to require any standard mechanisms for doing it. Instead, browser projects,
extension developers, JavaScript-library creators, as well as authors and
publishers can all innovate and compete in this area to try to produce
the best possible user experience around such personalization.
No standard domain profiles needed
In different domains of publishing, users have different expectations
of the nature of the content and its presentation.
Educational publishing, for example, comes with a particular set of
user expectations for things like content structure and metadata.
And comic books, for example, have a default presentation that is
typically pre-paginated, fixed-form, and image-based, with a set of
comic-book-reading user-interaction conventions commonly followed
across publications from different authors and publishers.
However, it is a mistake to believe that the best way to address those
domain-specific expectations is for standards organizations to attempt
to produce designed-by-committee profiles that content can be authored
and validated against (in order to produce predictability of content within
particular domains, or for whatever other reasons). Past experiences have
taught us that such attempts at producing domain-specific profiles of Web
technologies are rarely successful—among other reasons because the market
moves much faster than standards committees, and any specifications for
profiles quickly become stale and irrelevant.
Instead, innovation in the area of addressing domain-specific user
expectations for Web publications can and will just continue
naturally in the market at the hands of JavaScript library creators,
browser-extension developers, and authors and publishers themselves—as
they compete and collaborate on new ideas about how to make the best
possible user experiences for readers, as selling points to set their
publications apart from others that are less creative about trying to
meet specific domain expectations.
Occasionally, as such new ideas are allowed to incubate and mature
naturally in the market, some agreement about standardizing particular
mechanisms eventually emerges. But it is imprudent to attempt to prematurely
mandate standard profiles or requirements before that natural
incubation has had time to occur and mature.
Why not do something EPUB-inspired?
EPUB merits mention here for being the inspiration behind the
“Portable Web Publication” idea envisioned in
Portable Web Publications for the Open Web Platform
(the original title of which was in fact
EPUB+WEB).
As an older format that was developed outside the W3C for use in
specialized off-Web reading systems, EPUB was designed in the years before
users had smartphones and tablets with full Web browsers. In those years,
it made sense for users to carry low-capability devices that were
specialized just for reading electronic books, and it made sense for there
to be a format that targeted such users.
However, in the intervening years, EPUB and the general approach
underlying it have increasingly faced criticism for not being at all a
good match for current user needs; for one representative example, see
the article
“The publishing industry has a problem, and EPUB is not the solution”[[EPUB]]
from Jani Patokallio (who at the time he wrote the article was
working for the publisher Lonely Planet, and now works for Google).
The shared understanding underlying those criticisms of EPUB and the
general notion of any similar “portable” document format is that
since users have commonly had smartphones and tablets with full Web
capabilities for years now, there is no longer any user requirement for
specialized low-capability devices just to read electronic books, nor for
legacy off-Web specialized reading systems—and so going forward into the
future, there is no need for EPUB as a format, nor more generally for
any EPUB-inspired
“Portable Web Publication”
format as envisioned in
Portable Web Publications for the Open Web Platform.
Instead what we should do is to look at the actual current user needs
without concern for any existing legacy formats, and to start back from
first principles and design fresh solutions—with the absolute highest
priority given to user needs, over all other constituencies—and with the
goal being purely to provide the best possible user experience of
books on the Web and other Web publications to users today
and for the future.
That is what this document does.
Conclusions
This document outlines a vision for achieving the best possible user
experience for readers of books on the Web and other
Web publications, by looking at the problem from a fresh
perspective—starting from first principles.
So in this vision there is no need for “convergence” with other legacy
off-Web approaches, nor any need to require users to ever “switch” back and
forth between the Web and legacy off-Web reading systems, and no need for a
new “portable” document format, and in particular no need for an
EPUB-inspired
“EPUB+WEB”
“Portable Web Publication” idea as envisioned in
Portable Web Publications for the Open Web Platform.
Instead as outlined here we can use
Service Workers
and the Web’s fundamental quality of
universality[[Universality]]
as the foundation to provide users with a user-optimized seamless
reading experience of books on the Web and other Web
publications any time, anywhere, regardless of their reading
context—specifically, we can give users a 100% Web-based reading experience
that
remains seamlessly consistent whether the users are online or offline.
Just through the power of the Web itself, we can give users a truly
universal reading experience.