Jan's Illustrated Computer Literacy 101



Jan's Working with the Web

   Convert

Often people want to use an existing document on their web site. Why rewrite material that you have already massaged into an attractive and useful shape? Unfortunately, getting a well-shaped web page from a paper document is not as straightforward as you might think. Print and the Web do not share the same definition of what makes a "good" page.

In the examples below, the original Word document was saved as PDF and also saved as HTML. Very different results!

Example 1: Text document with formatting

[Images: the original Word document; the Word doc saved as PDF; the Word doc saved as a web page (HTM)]


Example 2: Complex document with images, charts, table:

[Images: the complex Word doc; the complex doc saved as PDF; the complex Word doc saved as a web page]


What's PDF?

Many documents for download on the Internet are in PDF format. PDF stands for Portable Document Format. This format was designed to duplicate the formatting and layout of the original printed document. It requires a PDF reader program to open and read the document. Adobe Reader and many other similar programs are available for free download. Most people who use the Internet have such a program already.

When to use PDF?

  • Exact layout is important.

  • Your intended users don't have a program that can open the original file type of the document.
  • You want to keep viewers from editing the document. 

With the better PDF writing programs, you can control who can open your PDF, who can make changes, and what type of changes they can make. You can make a form that the user can fill in while online, but the user cannot edit the form itself. Many cool features are not available in programs that just save in PDF format! Adobe Acrobat is the standard (expensive) PDF writing program, but there are also many free programs that offer many of the advanced features.


Design Issues for Converted Documents

Web pages have different design issues than documents prepared for print. The shape of the space is different. The physical interface is different. The way people interact with the pages is different.

You may need to modify your converted document to make it more web-friendly by:

  • Rearranging parts
  • Adding visual divisions for original pages or sections
  • Rewriting for less reading, more scanning (fewer words, more lists)
  • Breaking it into separate pages
  • Adding navigation

Visual Field

A normal browser window shows much less text at once than a normal printed page. Try to place your illustrations so that they show in the same window as the related text. It is harder for the reader to scroll around than it is to flip back and forth between pages of a book.

How Readers Read

People scan web pages more than they read, especially pages with lots of text or long paragraphs. This is partly because text on screen is not yet as comfortable to read as on paper. The characters are not as crisp. Eyes get tired more easily.

Bullet lists and short paragraphs work best. (Some topics are not easy to do in short form!)

Navigation

Unlike a printed page, each web page needs clear navigation to the top of the site and to other pages in the site. A single table of contents page won't do! 

Clear Purpose

The visitor to your page may have popped in directly from a search engine. Does the page make sense by itself? Can the visitor tell where this page is in your site structure? Can the visitor navigate to an overview page, to the home page, or to the beginning of the section?

Length

Breaking a long document into separate pages may work better than a single long page. Readers don't like to scroll and scroll and scroll. Neither do they like the long download time for a long page. Be sure that the navigation between pages is clear.

Each page should make sense by itself, so you may need to rewrite or add text after breaking the document into separate pages.
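
Here is a minimal Python sketch of the idea, not a finished tool. It assumes the long document has already been split by hand into sections, each with a title and a chunk of HTML. The function name build_pages, the file names page1.html, page2.html, and so on, and the home page index.html are all made up for illustration.

```python
import html


def build_pages(sections, home_href="index.html"):
    """Turn a list of (title, body_html) sections into separate pages.

    Returns a dict mapping a hypothetical file name to the page's HTML.
    Every page gets a Home link, plus Previous and Next links where they apply.
    """
    names = [f"page{i}.html" for i in range(1, len(sections) + 1)]
    pages = {}
    for i, (title, body) in enumerate(sections):
        links = [f'<a href="{home_href}">Home</a>']
        if i > 0:  # no Previous link on the first page
            links.append(f'<a href="{names[i - 1]}">Previous</a>')
        if i < len(sections) - 1:  # no Next link on the last page
            links.append(f'<a href="{names[i + 1]}">Next</a>')
        nav = "<nav>" + " | ".join(links) + "</nav>"
        safe_title = html.escape(title)
        pages[names[i]] = (
            "<!DOCTYPE html>\n"
            f"<html><head><title>{safe_title}</title></head>\n"
            f"<body>\n{nav}\n<h1>{safe_title}</h1>\n{body}\n{nav}\n</body></html>\n"
        )
    return pages


# Example with three made-up sections
demo = build_pages([
    ("Getting There", "<p>Flights and ferries.</p>"),
    ("Where to Stay", "<p>Hotels and hostels.</p>"),
    ("What to See", "<p>Mountains and beaches.</p>"),
])
for name in demo:
    print(name)
```

The navigation is repeated at the top and bottom of each page so a reader who has scrolled to the end does not have to scroll back up. The text of each section may still need rewriting so that the page makes sense by itself.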


What to Use to Convert

You have choices for converting a document to a web page: 

  • Original program that created the document - with its own Save as HTML or Save as Web Page command.

  • HTML Editing software - Copy and Paste from the original to a blank page in the program's Design view, and then save as a web page.

  • Conversion program - There are many free programs and online services for converting between many different file types, including to HTML.

Original Program

Many programs, including Word and Excel, allow you to save documents in HTML format. The resulting web page may need some tweaking, however. Sometimes the changes that the conversion makes are quite startling.

Conversion errors

You must proofread your converted document very carefully to be sure that all of the parts arrived safely and in the arrangement you want. You may need to edit the source code yourself to make any corrections. Older versions of Office were notorious for coding errors. They dropped any text that was in a separate text box, and they turned all background colors white without making sure the text was in a dark color. White text just vanished!
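
A script can rough out part of that check, though it does not replace reading the page yourself. This sketch assumes Python and its standard html.parser module. It lists words from the original text that never show up in the visible text of the converted page, which is how text lost from a text box would reveal itself. The names TextExtractor, visible_words, and missing_words are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of a page, skipping style sheets and scripts."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # greater than 0 while inside <style> or <script>

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def visible_words(html_text):
    """Return the words a reader could see in the converted page."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.chunks).split()


def missing_words(original_text, html_text):
    """Return words from the original that are absent from the page.

    Crude on purpose: punctuation stuck to a word counts as part of the word,
    and the order of the words on the page is ignored.
    """
    converted = {word.lower() for word in visible_words(html_text)}
    return [w for w in original_text.split() if w.lower() not in converted]


original = "Welcome to Kiwi Tours Call today"
converted_page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Welcome to Kiwi Tours</p></body></html>"
)
print(missing_words(original, converted_page))
```

This check only catches text that is missing. Text that arrived in the wrong place, or in a color you cannot see, still needs your eyes.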

[Image: Word doc saved as HTML creates a lot of extra lines in the source code]

Microsoft Office Round-tripping: Since Office 2000, all Office programs that can save as a web page use a complex internal style sheet to make the HTML page look almost exactly like the Word document. This code makes it easy to 'round trip' the document. That is, it lets you open the web page in Word again, edit, and save again as a web page. But... it takes a lot of very messy code. This makes the converted document MUCH longer and VERY hard to edit in a text editor or even with an HTML editor.

The illustration shows the first 47 lines of the web page version of Example 1 (the text-only Word document) above, opened in EditPad Pro 7. There are 2811 lines of code! The BODY tag does not appear until line 2504. That means the code for what shows in the browser took only about 300 lines. Wow! That's a lot of styles and special code for the 'round tripping'!
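
You can make the same kind of count on your own converted page. The short Python sketch below reports how many lines a page has and on which line the BODY tag first appears. The function name body_line_stats and the tiny sample page are made up for illustration; to test a real page you would read the saved file into a string first.

```python
import re


def body_line_stats(html_text):
    """Return (total_lines, body_line).

    body_line is the 1-based number of the first line containing an opening
    <body> tag, or None if the page has no such tag.
    """
    lines = html_text.splitlines()
    body_pattern = re.compile(r"<body\b", re.IGNORECASE)
    body_line = None
    for number, line in enumerate(lines, start=1):
        if body_pattern.search(line):
            body_line = number
            break
    return len(lines), body_line


sample = (
    "<html>\n<head>\n<style>\np {margin: 0}\n</style>\n</head>\n"
    "<body>\n<p>Hello</p>\n</body>\n</html>"
)
total, body_at = body_line_stats(sample)
print(total, body_at)  # prints: 10 7
```

Everything before the BODY line is head material: styles, settings, and the like. For the example above, 2811 total lines with BODY at line 2504 leaves 2811 - 2504 = 307 lines for the visible page.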

Tip - Stick to the same editor: If you use Save as Web Page from Word or Excel, you should plan to use that same program for all future edits of this document. (Open the HTML page in that program, make changes, and then re-save as a web page.) It is very hard to deal with this source code directly!


HTML Editing Software

Copy and Paste: An HTML editor can open HTML, CSS, script files, and plain text files, but that's about it. To bring in any other kind of document, open it in its original program and Copy all of it. Then open a fresh blank document in your HTML editor and Paste. You may lose all or part of the layout and formatting. Headers, footers, and images probably won't paste. Some formatting choices in Word, for example, just cannot be done easily or at all in a web page. But you can at least get the document text into a page that you can work with.

Example, using Copy and Paste: 

[Images: the original document from Word; pasted from Word to the Dreamweaver editor; pasted into Google Docs and saved as a web page]

Copied from Microsoft Word. The Dreamweaver version needs tweaking. There is a stray word at the far right.
Google Docs saved as a web page with the image in a subfolder.

Both examples above look better at a glance than they do on close examination. The original Word document created the effect of columns with tabs and spaces. These do not translate well into HTML. Some words are in the wrong column. A table would have converted better.
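
To see why a table converts better, here is a small Python sketch that turns tab-separated lines into an HTML table, so that each word is tied to its column instead of floating on tabs and spaces. The function name tabbed_text_to_table is made up for this example. It assumes every row has a value in every column and treats a run of several tabs as a single column break. That is a simplification: a row that starts with an empty column would lose it.

```python
import html
import re


def tabbed_text_to_table(text):
    """Convert tab-separated lines into a simple HTML table.

    Assumes each non-blank line is one row and that runs of tabs
    separate the columns (no empty cells).
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = [cell.strip() for cell in re.split(r"\t+", line.strip())]
        cell_html = "".join(f"<td>{html.escape(cell)}</td>" for cell in cells)
        rows.append(f"  <tr>{cell_html}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


columns = "Name\tPrice\nTea & cake\t\t$4"
print(tabbed_text_to_table(columns))
```

The same result can be had without any code by converting the tabbed text to a table in the original program before saving or copying.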


Problem: Complex Documents

Saving a complex document as HTML may not produce a good web page.

Example: The original tri-fold brochure used text boxes to create three columns. In old versions of Word, a Save As using the file type HTML will include only the blank mailing address section and the front flap. Totally useless on the Web!  Word 2007/2010 handles this much better, as the illustration shows.

[Images: NZ brochure, outside; NZ brochure, inside]

Outside and inside of the original tri-fold brochure Word document

[Images: tri-fold brochure saved from Word 2010 as MHT (single file web page); tri-fold brochure saved from Word 2010 as HTM]

Saved as a single file web page, MHT (Looks good!)
Saved as a web page, HTML, which puts images in a folder (A mess!)

What are the problems with these web page versions?

  • Blank area in the middle of the top of the page.
    Originally meant for a mailing address.
  • Sideways text
  • HTML format: Images badly misaligned
  • MHT format: Cannot be edited with another editor.
    It includes the photos in the same file.
  • Minimum window width to show page in the browser = 960px. Is that a problem or not?

Tip - Working with complex documents: Simplify first!

Copy the contents and paste to a new document in the original program. Simplify the document. Convert this new document to HTML. It may take several steps, depending on how complex your document is. It's awkward but more workable in the long run.