Often people want to use an existing document on their web site. Why rewrite material that you have already massaged into an attractive and useful shape? Unfortunately, getting a well-shaped web page from a paper document is not as straightforward as you might think. Print and the Web do not share the same definition of what makes a "good" page.
In the examples below, the original Word document was saved as PDF and also saved as HTML. Very different results!
Many documents for download on the Internet are in PDF format. PDF stands for Portable Document Format. This format was designed to duplicate the formatting and layout of the original printed document. It requires a PDF reader program to open and read the document. Adobe Reader and many other similar programs are available for free download. Most people who use the Internet have such a program already.
Use PDF when exact layout is important.
With the better PDF writing programs, you can control who can open your PDF, who can make changes, and what type of changes they can make. You can create a form that the user can fill in while online but cannot edit. Many of these features are not available in programs that merely save in PDF format! Adobe Acrobat is the standard (expensive) PDF writing program, but there are also many free programs that offer many of the advanced features.
Web pages have different design issues than documents prepared for print. The shape of the space is different. The physical interface is different. The way people interact with the pages is different.
You may need to modify your converted document in several ways to make it more web-friendly:
A normal browser window shows much less text at once than a normal printed page. Try to place your illustrations so that they show in the same window as the related text. It is harder for the reader to scroll around than it is to flip back and forth between pages of a book.
People scan web pages more than they read, especially pages with lots of text or long paragraphs. This is partly because text on screen is not yet as comfortable to read as on paper. The characters are not as crisp. Eyes get tired more easily.
Bullet lists and short paragraphs work best. (Some topics are not easy to do in short form!)
Unlike a printed page, each web page needs clear navigation to the top of the site and to other pages in the site. A single table of contents page won't do!
The visitor to your page may have popped in directly from a search engine. Does the page make sense by itself? Can the visitor tell where this page is in your site structure? Can the visitor navigate to an overview page, to the home page, or to the beginning of the section?
Breaking a long document into separate pages may work better than a single long page. Readers don't like to scroll and scroll and scroll. Neither do they like the long download time for a long page. Be sure that the navigation between pages is clear.
Each page should make sense by itself, so you may need to rewrite or add text after splitting the document into separate pages.
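If your long document is already in HTML, part of the splitting can be automated. The Python sketch below is a minimal illustration and is not a feature of any program mentioned here. It assumes that each section starts with an <h2> heading. The file names page1.html, page2.html, and so on, and the home page index.html, are hypothetical. Each new page gets Home, Previous, and Next links at the top and bottom. The sketch uses simple pattern matching rather than a full HTML parser, so check the results. You will still need to rewrite text so that each page makes sense by itself.

```python
import html
import re


def split_into_pages(body_html, title, home="index.html"):
    """Split one long HTML body into several pages at each <h2> heading.

    Returns a list of (filename, page_html) pairs. File names and the
    home page name are hypothetical placeholders. Each page gets a small
    navigation bar linking to the home page and to the previous/next page.
    """
    # Split just before every <h2 ...>; the lookahead keeps each heading
    # attached to the section it introduces. Empty pieces are dropped.
    sections = [s for s in re.split(r"(?i)(?=<h2[\s>])", body_html) if s.strip()]
    names = [f"page{i}.html" for i in range(1, len(sections) + 1)]
    pages = []
    for i, section in enumerate(sections):
        links = [f'<a href="{home}">Home</a>']
        if i > 0:
            links.append(f'<a href="{names[i - 1]}">Previous</a>')
        if i < len(sections) - 1:
            links.append(f'<a href="{names[i + 1]}">Next</a>')
        nav = "<nav>" + " | ".join(links) + f" (page {i + 1} of {len(sections)})</nav>"
        # Same navigation at the top and bottom, so readers never have
        # to scroll back up to move on.
        page = (
            "<!DOCTYPE html>\n<html>\n<head>\n"
            f"<title>{html.escape(title)}: page {i + 1}</title>\n"
            "</head>\n<body>\n"
            f"{nav}\n{section.strip()}\n{nav}\n"
            "</body>\n</html>\n"
        )
        pages.append((names[i], page))
    return pages
```

For example, split_into_pages(body_html, "My Guide") returns a list of (file name, page) pairs that you can write out as separate files. Putting the same navigation bar at both ends of each page follows the advice above: the visitor can always tell where they are and where to go next.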
You have choices for converting a document to a web page:
Original program that created the document - with its own Save as HTML or Save as Web Page command.
HTML editing software - Copy from the original, Paste into a blank page in the program's Design view, and then save it as a web page.
Conversion program - There are many free programs and online services for converting between many different file types, including to HTML.
Many programs, including Word and Excel, allow you to save documents in HTML format. The resulting web page may need some tweaking, however. Sometimes the changes that the conversion makes are quite startling.
You must proofread your converted document very carefully to be sure that all of the parts arrived safely and in the arrangement you want. You may need to edit the source code yourself to make corrections. Older versions of Office were notorious for conversion errors. They dropped any text that was in a separate text box, and they turned all background colors white without making sure the text would be a dark color. White text just vanished!
Microsoft Office round-tripping: Since Office 2000, all Office programs that can save as a web page use a complex internal style sheet to make the HTML page look almost exactly like the original document. This code makes it easy to 'round trip' the document. That is, you can open the web page in the same program again, edit it, and save it again as a web page. But... it takes a lot of very messy code. This makes the converted document MUCH longer and VERY hard to edit in a text editor or even in an HTML editor.
The illustration shows the first 47 lines of the web page version of Example 1 (the text-only Word document) above, opened in EditPad Pro 7. There are 2811 lines of code! The BODY tag does not appear until line 2504. That means the code for what actually shows in the browser took only about 300 lines. Wow! That's a lot of styles and special code for 'round tripping'!
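You can repeat this measurement on your own converted files. The short Python sketch below is one way to do it; the function name is hypothetical. It counts the lines in an HTML string and reports the line where the first <body> tag appears.

```python
import re


def body_start_report(source):
    """Return (total_lines, body_line) for an HTML string.

    body_line is the 1-based line number of the first <body> tag,
    or None if the text has no <body> tag at all.
    """
    lines = source.splitlines()
    # \b stops the match at the end of the tag name, so "<body>",
    # "<body lang=EN-US>" and a bare "<body" at a line end all count.
    pattern = re.compile(r"<body\b", re.IGNORECASE)
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            return len(lines), number
    return len(lines), None
```

To check a saved file, read it into a string first, for example with open("example1.htm", encoding="utf-8", errors="replace").read(). The file name is hypothetical, and Word may save in a different encoding. Using the numbers above (2811 lines, BODY on line 2504), 2503 of the 2811 lines, or roughly 89%, come before any visible content.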
Stick to the same editor: If you use Save as Web Page from Word or Excel, you should plan to use that same program for all future edits of this document. (Open the HTML page in that program, make your changes, and then save it again as a web page.) It is very hard to deal with this source code directly!
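If you do end up editing such a file directly, a rough first step is to strip out the Office-only markup. The Python sketch below is a hypothetical helper and is not an Office feature. It uses regular expressions to remove a few common kinds of Word markup: conditional comments, <o:p> tags, Mso class names, and mso- style properties. It assumes that your file uses these exact forms. Regular expressions can miss or mangle unusual HTML, so always work on a copy. After stripping, the page will probably no longer round-trip cleanly into Word.

```python
import re

# Each (pattern, replacement) pair targets markup that Word adds for
# round-tripping. This is a rough regex-based sketch, not a full HTML parser.
WORD_PATTERNS = [
    # Conditional comments such as <!--[if gte mso 9]><xml>...</xml><![endif]-->
    (re.compile(r"<!--\[if[^\]]*\]>.*?<!\[endif\]-->", re.S | re.I), ""),
    # Office namespace paragraph tags such as <o:p></o:p>
    (re.compile(r"</?o:p>", re.I), ""),
    # Word class names such as class=MsoNormal or class="MsoListParagraph"
    (re.compile(r'\s+class="?Mso\w*"?', re.I), ""),
    # Office-only style properties such as mso-bidi-font-size:12.0pt;
    (re.compile(r"mso-[\w-]+:[^;\"']*;?", re.I), ""),
    # style attributes left empty by the previous step
    (re.compile(r"\s+style=(\"\s*\"|'\s*')", re.I), ""),
]


def strip_word_markup(source):
    """Apply each cleanup pattern in order and return the shorter HTML."""
    for pattern, replacement in WORD_PATTERNS:
        source = pattern.sub(replacement, source)
    return source
```

The order matters: the mso- properties are removed first, and only then are the empty style attributes they leave behind cleaned up.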
Copy and Paste: An HTML editor can open HTML, CSS, script files, and plain text files, but that's about it. Instead, open your document in its original program and Copy all of it. Then open a fresh blank document in your HTML editor and Paste. You may lose all or part of the layout and formatting. Headers, footers, and images probably won't paste. Some formatting choices in Word, for example, just cannot be reproduced easily, or at all, in a web page. You can at least get the document text into a page that you can work with.
Copied from Microsoft Word. The Dreamweaver version needs tweaking. There is a stray word at the far right.
Google Docs saved as a web page with the image in a subfolder.
Both examples above look better at a glance than they do on close examination. The original Word document created the effect of columns with tabs and spaces. These do not translate well into HTML. Some words are in the wrong column. A table would have converted better.
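If the columns in the original were made with tabs, you can turn them into a real table yourself instead of relying on the converter. The Python sketch below makes several assumptions. Each line is one row. Columns are separated only by tabs, not spaces. No cell is empty, because the sketch treats a run of several tabs as a single separator. The function name is hypothetical.

```python
import html


def tabs_to_table(text):
    """Turn tab-separated lines into a simple HTML table.

    Each non-blank line becomes a row; each tab-separated field becomes
    a cell. Runs of tabs count as one separator, since tab-aligned Word
    text often uses several tabs to line up a column (so empty cells are lost).
    """
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cells = [c.strip() for c in line.split("\t") if c.strip()]
        # Escape &, <, > so cell text cannot break the HTML.
        tds = "".join(f"<td>{html.escape(c)}</td>" for c in cells)
        rows.append(f"  <tr>{tds}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"
```

You can paste the result into your HTML editor and add headings and styling there. A table keeps each item in its column however wide the browser window is.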
Saving a complex document as HTML may not produce a good web page.
Example: The original tri-fold brochure used text boxes to create three columns. In old versions of Word, saving the file as HTML kept only the blank mailing address section and the front flap. Totally useless on the Web! Word 2007/2010 handles this much better, as the illustration shows.
Outside and inside of the original tri-fold brochure Word document
Saved as a single-file web page, MHT (Looks good!)
Saved as a web page, HTML, which puts images in a folder (A mess!)
What are the problems with these web page versions?
Working with complex documents: Simplify first!
Copy the contents and paste them into a new document in the original program. Simplify the document. Convert this new document to HTML. It may take several steps, depending on how complex your document is. It's awkward but more workable in the long run.