With over one billion people worldwide affected by disabilities, there is good reason to make digital content more accessible. When it comes to digital PDF documents, the PDF/UA (Universal Accessibility) standard aims to deliver disabled users a first-rate digital experience. To ensure this, PDF/UA expands upon the general PDF 1.7 specifications with some additional requirements, PDF applications and assistive technologies (AT). By improving this general accessibility for disabled persons by adding more structure and meta-data, PDF/UA also opens up options for machine-reading that benefit all users, with or without disabilities. In this blog, we explore those benefits.
For more general information about the standard, requirements, validation, key stakeholders, laws and regulations and creating and manipulating PDF/UA with a PDF library, we recommend reading the newly released ebook: PDF/UA: the inclusive document format or watching the webinar.
People with motor disabilities are often limited by their input controls when viewing and navigating a PDF document. In a best-case scenario, they may navigate the document with an adaptive keyboard, but often even with eye-tracking software, a sip-and-puff switch, and so on, it is easy to see that viewing and navigating a document with these devices require more effort than the traditional mouse and keyboard setup. The PDF/UA requirements aim to make sure navigation with these AT devices can happen as efficiently as possible. It does so by requiring tagging of structural elements and mapping these in a structure tree that represents a logical reading order.
Yet users without disabilities also use devices with limited input controls on a daily basis. Navigating complex and inaccessible documents on our phones and tablets can be a frustrating experience. Luckily, we can already alleviate much of those frustrations by allowing for automatically adjusting content to a readable size and layout. Furthermore, navigating can be made easier by allowing the user to jump to certain structural elements in the document. This can only happen if the document is properly tagged.
Thanks to text-to-speech it is now possible to let your car or your home assistant (such as Alexa, Siri, Google Assistant, Cortana, or the open-source Home Assistant) read your documents for you, while you can go about your business and have your hands free.
Since PDF/UA compliant documents cater to users with visual impairments that often use screen readers, such documents will already have the right reading order, language detection, and alternative descriptions for images in place to potentially make it a far better experience.
We say potentially because today home assistants don’t yet fully leverage PDF/UA’s provisions. However, web content-focused initiatives like Google’s Read It aim to make long-read content available no matter the form, and PDF is certainly an essential format for long-read content.
ngPDF stands for the next generation of PDF and focuses on the challenge of making PDFs a first-class citizen of the web, where users consume information through a wide range of devices with diverging screen sizes.
The Tags in PDF/UA and the structure tree are comparable to the HTML tags and the DOM (Document Object Model) they form, and this similarity is exactly what ngPDF leverages. It does so by using an algorithm developed by the PDF Association that can produce a reliable HTML presentation of a properly Tagged PDF document. For example, the PDF structure element for a paragraph P translates to a p tag in HTML. There is an available that serves as a proof of concept where you can see this algorithm in action. This demo was created by Dual Lab and powered by the iText PDF Library.
You can read more about ngPDF in the whitepaper co-authored by iText and Dual Lab Web-Friendly PDFs with ngPDF.
Making a PDF document accessible allows easier web crawling by search engines such as Google. This forms an important part of SEO and improves the likelihood of your PDF finding its way to your target audience. Google has been indexing PDFs since 2001, and today, PDFs are even included as featured snippets.
How does PDF/UA help? Well first and foremost, PDF/UA requires text-based PDFs, rather than image-based PDF. This makes the document’s body text searchable. If you want to convert your scanned image-based PDFs, you can do so by using an OCR (Optical Character Recognition) tool such as iText pdfOCR.
Since we also tag structure elements like headings (H1, H2, H3 and so on) in PDF/UA, search engine algorithms can navigate the document structure and determine which content is more important. PDF/UA also requires alternative descriptions for images, not only allowing access by screen readers for people with a visual impairment but also allowing web crawlers to understand what an image represents. Finally, PDF/UA requires for the language of all document content to be specified so the web crawler immediately understands what language (or languages) the document is written in.
Many of the requirements of PDF/UA improve machine readability by adding structure and metadata to a PDF, and thereby benefit users with and without disabilities alike. Making documents accessible was already the right thing to do, since it influences information access for people with disabilities greatly. And since accessibility is mandated by law in many countries, failing to provide accessible documents can create a legal liability for organizations. Now we have learned PDF/UA’s universal benefits add yet another incentive to make your documents compliant.
Also Published Here