PDF accessibility introduction

Published on May 30, 2019 in PDF accessibility category

Although web accessibility is what most people think of when talking about digital accessibility, document accessibility also plays an important role. Document formats such as Word, PowerPoint, Excel, ePub, and especially PDF are widely used and are usually linked from web pages or sent via email. As such, it’s important to make sure that those documents are accessible as well.

Overview

Before diving into PDF accessibility, ask yourself whether you really need the document in PDF format. Usually, you can achieve better accessibility more easily and quickly with regular HTML pages.

PDF format

PDF is an end format, not an authoring format. That means you will usually create your document in some other software, most commonly in the Office suite (Word, Excel, PowerPoint), and then convert that source document into PDF. Apart from the Office suite, HTML pages and scanned documents are also common sources from which PDFs are generated.

PDF is used because it has the ability to preserve the visual formatting. That means that you can be sure that your document will look exactly the same on any computer and won’t be “broken” in the process.

PDF is an open standard and as such has wide software support, mainly in terms of various PDF readers. The most popular one is the free Adobe Reader.

Preserving accessibility

In an ideal world, when you write a document in e.g. Microsoft Word, all the accessibility features would be preserved during conversion to PDF, and PDFs would be accessible by default (assuming that your source document is accessible). However, that is usually not the case, and most generated PDFs are not accessible. That is why you need to tweak and fine-tune the PDF afterward with Adobe Acrobat Pro, even if your source document is fully accessible.

Adobe Reader vs. Adobe Acrobat Pro

These two programs serve different purposes.

Adobe Reader is just that, a PDF reader, while Adobe Acrobat Pro also lets you tweak and fine-tune PDFs. Adobe Acrobat Pro is also packed with many more features, such as the creation of PDFs from various sources (images, web pages, etc.), integrated OCR (Optical Character Recognition), which helps you convert your scanned documents into text, and more.

Adobe Reader is free, while Adobe Acrobat Pro is a paid product.

If you would like to just read your PDFs, you can also use any other PDF reader, such as Foxit Reader, Nitro Reader, Sumatra PDF, etc. Some are free, some are paid only.

If you would like to make your PDFs accessible, you will need Adobe Acrobat Pro so you can add metadata, create and modify tags, rearrange the reading and stacking order, and more.

PDF accessibility

Because a PDF is derived from various sources (Word, Excel, HTML, scanned pages, etc.), PDF accessibility starts before the PDF document even exists. That means that the most efficient way to make a PDF accessible is to start with the foundations - in this case, with the accessibility of the source document.

When your source document is accessible, chances are that your PDF will need only some minor tweaks here and there, instead of you having to create an accessible PDF from scratch.

That means that if you have both the PDF and the source document, you should always scrap the PDF and start from the source document. If you don’t have access to the source document, but only the PDF file, then you’ll need to work with that - doable, but simply not the most efficient way of doing things.

There are three steps to an accessible PDF:

  1. optimize source document
  2. convert it to a tagged PDF
  3. touch-up resulting PDF

Optimize source document

Because a source document can be a lot of things, including but not limited to Word, Excel, PowerPoint, ePub, HTML, and scanned images, I won’t include software-specific details but just provide an overview. For specific instructions for e.g. Word, see Make your Word documents accessible to people with disabilities.

Some source formats are more accessible than others. One of the more accessible ones is a Microsoft Word document, while e.g. Google Docs isn’t that great. For example, you can’t mark table cells as header cells in Google Docs, which means that the table won’t be accessible to screen readers by default and will need to be made accessible post-conversion using Adobe Acrobat Pro.

The goal is simple: use as many of the accessibility features in your source document as possible so that, when you convert it to a PDF, most of those accessibility attributes are preserved.

There is a range of accessibility features/attributes, but the most common ones are:

  • headings
  • lists
  • document and passage-specific language
  • images
  • hyperlinks
  • headers/footers
  • footnotes
  • tables
  • forms

If supported, each of those features can be embedded into the source document, and as such will provide not only visual but semantic meaning as well which can then be programmatically accessed by assistive technology.

For example, instead of changing the font size, font type or color manually to create a “heading”, use the predefined Heading style in Word. Instead of drawing a table, use the Table option in Word to create a table. Instead of creating a list by indented asterisks, use the List option to create lists.

It’s the same thing as in HTML - instead of changing the font size, type, and color to simulate a particular heading, you would use the heading elements, <h1> through <h6>. Why? Because that way assistive technology can, based on those tags, differentiate headings from the regular text and provide some additional options on top of that (e.g. quick navigation via headings).
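To make the point about real heading elements a bit more tangible, here is a minimal sketch of what "programmatically accessible" means. It only uses Python’s standard html.parser module; the HeadingOutline class, the heading_outline function, and the sample markup are my own illustrative names, not part of any assistive technology.

```python
from html.parser import HTMLParser


class HeadingOutline(HTMLParser):
    """Collects (level, text) pairs for every <h1>..<h6> element."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.outline = []    # finished headings, in document order
        self._level = None   # level of the heading currently open, if any
        self._parts = []     # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        # Only keep text that sits inside an open heading element.
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            self.outline.append((self._level, "".join(self._parts).strip()))
            self._level = None


def heading_outline(html):
    """Return the list of (level, text) headings found in an HTML string."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


# Hypothetical sample: one paragraph is only styled to look like a heading.
SAMPLE = """
<h1>Annual report</h1>
<p style="font-size: 2em"><b>Looks like a heading</b></p>
<h2>Finances</h2>
<p>Regular text.</p>
"""

print(heading_outline(SAMPLE))
```

The visually styled paragraph never shows up in the outline, because nothing in the markup says it is a heading. The same applies to a "heading" in Word that is just big bold text.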

When you have finished your source document and made it as accessible as possible, it’s time to convert it into a PDF.

Convert to tagged PDF

A tagged PDF is a PDF that contains tags (the same way that HTML contains tags such as a, img, table, etc.), which serve as a foundation for assistive technologies to know the semantics. In other words, a non-tagged PDF is a non-accessible PDF. A tagged PDF is usually an accessible PDF, but not necessarily: you can tag your PDF with completely invalid tags, or with 99% of the tags missing.

Tags within a PDF document, as viewed via Adobe Acrobat Pro

Only a correctly tagged PDF is an accessible PDF.
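If you want a quick first look at whether a file claims to be tagged at all, a rough heuristic is to search its raw bytes for the two markers that a tagged PDF carries in its document catalog: a /StructTreeRoot entry and /MarkInfo with /Marked true. The sketch below does only that. The function name looks_tagged and the sample byte strings are my own, and the approach has known limits: many PDFs keep the catalog inside compressed object streams, where a plain byte search will miss it, and a positive result says nothing about whether the tags are correct.

```python
import re


def looks_tagged(pdf_bytes):
    """Heuristic: True if the raw bytes contain the tagged-PDF markers.

    False negatives are possible when the catalog is stored in a
    compressed object stream; a True result does not prove the tags
    are valid or complete.
    """
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    is_marked = re.search(rb"/Marked\s+true", pdf_bytes) is not None
    return has_struct_tree and is_marked


# Hypothetical, heavily simplified catalog fragments for illustration only.
TAGGED = (b"1 0 obj << /Type /Catalog /MarkInfo << /Marked true >> "
          b"/StructTreeRoot 5 0 R >> endobj")
UNTAGGED = b"1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj"

print(looks_tagged(TAGGED), looks_tagged(UNTAGGED))
```

For a real file you would pass in the bytes read from disk (opened in binary mode). Treat the answer as a hint and confirm it in the Tags panel of Adobe Acrobat Pro.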

The most common methods of converting a source document to a PDF are:

  • using plugins in document authoring tools (such as MS Office, Firefox, etc.)
  • opening and converting documents directly in Adobe Acrobat Pro
  • using a third-party conversion tool or API (Adobe provides such PDF conversion tool as well)

Pop-up for Adobe Create PDF cloud service when converting to a PDF

Depending on the method, the preserved accessibility features will vary. Among the better methods are the authoring-tool plugins and Adobe’s cloud-based document-to-PDF conversion tool. One of the worse methods is “Printing to a PDF”, as no accessibility features will be preserved.

Touch-up resulting PDF

Metadata

PDF metadata are essentially document properties such as document title, language, whether it’s a tagged PDF or not, etc. There are just a few metadata attributes, but they need to be provided for an accessible PDF.

PDF metadata

All of the properties can be modified with Adobe Acrobat Pro.
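As a small illustration, the metadata requirements can be written down as a checklist. This is only a sketch under my own assumptions: PdfMetadata and missing_metadata are hypothetical names, the three fields are just the ones mentioned above (title, language, tagged or not), and you would fill in the values by hand from the Document Properties dialog rather than reading them from the file.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PdfMetadata:
    """The document properties named above, as filled in by hand."""
    title: Optional[str] = None
    language: Optional[str] = None
    tagged: bool = False


def missing_metadata(meta: PdfMetadata) -> List[str]:
    """Return a human-readable list of metadata problems (empty if none)."""
    problems = []
    if not (meta.title or "").strip():
        problems.append("document title is missing")
    if not (meta.language or "").strip():
        problems.append("document language is missing")
    if not meta.tagged:
        problems.append("document is not marked as tagged")
    return problems


# Hypothetical example values.
print(missing_metadata(PdfMetadata(title="Annual report", language="en-US", tagged=True)))
print(missing_metadata(PdfMetadata(title="  ")))
```

An empty list means the basic properties are in place. It does not say anything about the quality of the tags themselves.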

PDF tags

The most important part of PDF accessibility is the tags. With tags, you provide semantics (meaning) so that assistive technology can correctly interpret a specific piece of content (is it an image, a paragraph, or a table?).

If you are familiar with HTML, you’ll quickly grasp PDF tags as they are quite similar (e.g. h1 in HTML is H1 in PDF, ul in HTML is L in PDF, li in HTML is LI in PDF, etc.).
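For a rough side-by-side, the correspondence between common HTML elements and standard PDF tags can be sketched as a lookup table. The names HTML_TO_PDF_TAG and pdf_tag_for are mine, and the table is deliberately incomplete. Note that in PDF the list container is L and each list item is LI.

```python
# Partial, illustrative mapping from HTML element names to PDF tag names.
HTML_TO_PDF_TAG = {
    "h1": "H1", "h2": "H2", "h3": "H3",
    "h4": "H4", "h5": "H5", "h6": "H6",
    "p": "P",
    "ul": "L", "ol": "L",   # both list kinds map to the L (list) tag
    "li": "LI",
    "table": "Table", "tr": "TR", "th": "TH", "td": "TD",
    "img": "Figure",
    "a": "Link",
}


def pdf_tag_for(html_element):
    """Return the PDF tag for an HTML element name, or None if not listed."""
    return HTML_TO_PDF_TAG.get(html_element.lower())


print(pdf_tag_for("ul"), pdf_tag_for("li"), pdf_tag_for("img"))
```

The mapping is not one-to-one in every detail (for example, a PDF list item usually contains its own label and body tags), so take it as orientation rather than a conversion rule.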

Unlike with HTML, all of your tag modifications happen within Adobe Acrobat Pro, either in the Tags panel or in the Touch Up Reading Order panel. The Tags panel is more code-based (you see the actual tags) and more comprehensive, while Touch Up Reading Order is a more visual tool.

Touch Up Reading Order panel

Essentially, you create, modify, and delete tags and then associate them with the content in the PDF. The result is a list of tags, where each tag is tied to specific content (e.g. a Figure tag to an image) with optional attributes (e.g. Description, which is the equivalent of the alt attribute in HTML).

Important: tag order is screen reader reading order, meaning that a screen reader will read the content one item after the other based on the ordering in the Tags panel.
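To illustrate why tag order matters, here is a toy model of a tag tree and the order in which its content would be read. It assumes a simple top-to-bottom, depth-first walk, which is how I think of the Tags panel ordering. The Tag class, the reading_order function, and the sample document are hypothetical and have nothing to do with Acrobat’s internals.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    """A toy stand-in for one entry in the Tags panel."""
    name: str
    text: str = ""
    children: List["Tag"] = field(default_factory=list)


def reading_order(tag):
    """Walk the tree depth-first, top to bottom, collecting tagged content."""
    result = []
    if tag.text:
        result.append(f"{tag.name}: {tag.text}")
    for child in tag.children:
        result.extend(reading_order(child))
    return result


# Hypothetical document structure.
DOCUMENT = Tag("Document", children=[
    Tag("H1", "Annual report"),
    Tag("P", "Intro paragraph."),
    Tag("L", children=[
        Tag("LI", "First item"),
        Tag("LI", "Second item"),
    ]),
])

for line in reading_order(DOCUMENT):
    print(line)
```

If the P tag were dragged above the H1 tag in this tree, the paragraph would be read before the heading, regardless of where the two appear visually on the page.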

Reading order

When you have your tags ready, you need to make sure that the reading order is correct. For simple documents that is usually the case by default; for more complex ones, however, things can become a bit tricky.

Before going further you must know about the Content Reflow option. Reflow is a temporary view that shows the content of a PDF document in a single column, taking up the entire width of the document pane. Users then have the option to magnify text to their desired size, possibly making it easier for them to read.

Reflow view

There are two types of reading order:

  1. screen reader reading order
  2. content reflow reading order

Screen reader reading order is based on the ordering in the Tags panel and is what screen readers use, while content reflow reading order is based on the ordering in the Order panel and is what Acrobat uses when users activate the Content Reflow option. Those two are usually the same, but they don’t have to be.

Tab order

This one is easy, as all you need to do is make sure that the Use Document Structure option is selected in the Tab Order tab of Page Properties. That ensures that tab order is based on the tag order that you specified earlier - the same order that screen readers use.

Use Document Structure option inside the Page Properties panel

Check your results

When completed, you’ll need to verify your results before marking the PDF as accessible. As is the case with web pages, there are automated and manual means of doing so.

Adobe Acrobat Pro has its own Accessibility checker - an automated solution that checks for some things (e.g. missing alt text, non-tagged content, etc.) and can be a good starting point.

Adobe Acrobat Pro Accessibility checker results
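To give a feel for what an automated check does, here is a minimal sketch of one such rule: flagging Figure tags that have no alternative text. It is not how Acrobat’s checker is implemented. The nested dictionaries, the figures_missing_alt function, and the sample tree are purely illustrative assumptions.

```python
def figures_missing_alt(tag, path=""):
    """Return the paths of all Figure tags whose alt text is empty or absent."""
    here = f"{path}/{tag['name']}"
    problems = []
    if tag["name"] == "Figure" and not tag.get("alt", "").strip():
        problems.append(here)
    for child in tag.get("children", []):
        problems.extend(figures_missing_alt(child, here))
    return problems


# Hypothetical tag tree: one figure is described, one is not.
TREE = {"name": "Document", "children": [
    {"name": "H1"},
    {"name": "Figure", "alt": "Bar chart of revenue by quarter"},
    {"name": "Sect", "children": [
        {"name": "Figure", "alt": ""},
    ]},
]}

print(figures_missing_alt(TREE))
```

A rule like this can tell you that alt text exists, but not whether it actually describes the image. That part still needs a human.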

Automated checks are not enough, though: you also need to test the PDF manually with your screen reader of choice. The recommendation is to test with at least two screen readers, e.g. JAWS and NVDA, because VoiceOver on Mac is quite rudimentary and doesn’t support navigation through headings, lists, tables, etc.

Provide alternative formats

Even if your PDF is completely accessible, it may not be experienced that way on Mac and mobile devices, which lack good support for accessible PDFs. That is why it’s recommended to provide an alternative format - usually an HTML page, which can be easily accessed on Mac and mobile devices.

You can provide the source document as well, e.g. a Word document. However, although the majority of users have software that can read such a format (Microsoft Office), HTML is a safer bet as it’s universally supported.

Summary

First, ask yourself whether you need the document in PDF format at all. If not, then use a good old HTML page and you will be good to go. If, however, you need a PDF document, make sure to think about accessibility from the start - starting inside your source document.

When finished with your source document, convert it to a tagged PDF with a tool that will preserve as many accessibility features as possible. Then, tweak it using Adobe Acrobat Pro (add metadata, check tags, reading order, and tab order), and finally check the result using automated and manual solutions. Provide alternative formats if possible.

I’ll go into the nitty-gritty details in later posts, but this should serve as a brief overview of how things work and how to create accessible PDFs.

Sven Kapuđija
Author