
Call us on +353 (0)65 684 7040, email info at simiusweb.ie, or use our online form.

Accessible Document Conversion

Convert PDF, Word (and other formats) to accessible, semantic and lean HTML/XHTML

Converting documents from PDF, Word and other formats to accessible, semantic and lean HTML

Converting content from a variety of sources, often in a wide range of formats, to HTML/XHTML is one of the biggest tasks facing most web managers.

Making this content accessible, semantic, and lean helps to ensure that it:

  • can be read by a wider audience
  • is better optimised for search engines
  • loads quickly

The problem

We are often asked, "Can you recommend any tools to auto-convert my document to HTML?" It is a tricky question, and our answer is another question: "How well structured are your source documents?"

Auto-conversion tools work on a set of rules which assume a degree of structural mark-up in the original document, whether it is a PDF, Word, RTF or another format. The problem with this assumption is that most source documents are poorly or inconsistently marked up, and many lack any structure at all: a heading, for example, is often typed in bold, enlarged text rather than marked with a heading style.

The result is an automatic conversion that is neither accurate nor consistent and tends to be weighed down with excessive tagging. We should stress that this is not the fault of the conversion tools. No automated system can cope without well-structured source documents. The basic rule is "rubbish in, rubbish out": you cannot expect more from an automated tool.
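To illustrate why structure matters, here is a minimal sketch in Python of a rule-based converter. It is hypothetical and deliberately simplified: it assumes the source has already been read into (style name, text) pairs, and `STYLE_RULES` and `convert` are illustrative names, not the behaviour of any particular conversion tool.

```python
from html import escape

# Hypothetical rule table: source paragraph style name -> HTML element.
STYLE_RULES = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "List Bullet": "li",
}

def convert(paragraphs):
    """Convert (style, text) pairs to HTML using only the style names."""
    html = []
    in_list = False
    for style, text in paragraphs:
        tag = STYLE_RULES.get(style, "p")  # unknown styles fall back to <p>
        # Open or close a <ul> when entering or leaving a run of list items.
        if tag == "li" and not in_list:
            html.append("<ul>")
            in_list = True
        elif tag != "li" and in_list:
            html.append("</ul>")
            in_list = False
        html.append(f"<{tag}>{escape(text)}</{tag}>")
    if in_list:
        html.append("</ul>")
    return "\n".join(html)

# Well-structured source: the heading and list items use named styles.
good = [("Heading 1", "Annual Report"), ("List Bullet", "Revenue"), ("List Bullet", "Costs")]
# Poorly structured source: same content, formatted by hand in the "Normal" style.
bad = [("Normal", "Annual Report"), ("Normal", "• Revenue"), ("Normal", "• Costs")]

print(convert(good))
print(convert(bad))
```

The same content yields a heading and a proper list when the source uses named styles, but only a run of generic paragraphs when the formatting was applied by hand. The tool has followed its rules correctly in both cases; only the input differs.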

Using an automated tool can speed up the initial conversion in many cases, but the manual tidy-up needed afterwards can easily outweigh any savings made.

Our approach

In the short term, we advocate manual conversion of your documents by experienced and efficient staff.

We convert documents to highly semantic XHTML rapidly: a 200-page technical document can take us less than three hours to complete.

Whilst we prefer to stick to the purest mark-up possible, we recognise that some websites need additional classes and IDs to achieve advanced styling. If you have clearly defined rules, we can easily apply them during the conversion process.

We prefer to work on bulk conversions or with organisations that have ongoing publishing needs. This allows us to learn your specific requirements and offer you the very best service at an excellent price.

If you have large, ongoing requirements for document conversion, we recommend a review of your entire information supply chain to identify whether you would benefit from a document or content management system. If implemented correctly, such systems can provide significant cost and time savings.

Document formats that we convert

We can convert from any of the following:

  • Microsoft Office files (e.g. Word, Excel, PowerPoint)
  • PDFs
  • OpenOffice files
  • StarOffice files
  • Other common formats (e.g. .txt, .rtf, .csv or tab-delimited spreadsheets)

These are just the most common formats, so if you do not see the format you use listed above, please ask.

Semantics and coding standards

By default, we convert to XHTML 1.0 Strict, but we can work to older HTML standards if your website still uses them.

Our code is 100% semantic, which lays the foundation for the highest levels of accessibility. It is also exceptionally lean, which reduces loading times and keeps your website in excellent shape, free of 'bloat'.

To discuss your needs please contact us.