Why does it take so long to download or to send files?


Scanning of text documents


The vast majority of REA members who have access to the Internet have nothing better than dial-up modems to connect them.  Many are connected to the nearest switching exchange office of the telephone company by many miles of copper lines.  That means that they cannot get connection speeds faster than 14.4 kbs (kilobits per second, more commonly written kbps).  Modem speeds are often loosely quoted in baud, but strictly speaking baud counts signal changes per second, which is not the same thing as bits per second.  Eight bits make up a byte, and the transmission framing may add another bit, such as a parity bit, for each byte.  Some rural residents are a bit closer to the nearest switching exchange office and may be lucky enough to have their dial-up modem connect them at a speed of 28.8 kbs.

Eight or nine bits are required to transmit a single character of text.  Modems also use error-checking protocols: a packet of data that arrives corrupted must be sent a second or even a third time before it is received without errors.  As a result, the effective data-transfer rate (the rate at which data is actually received error-free) is usually far lower than the speed at which the modem connects the user to the Internet.
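A rough sketch of how framing bits and retransmissions eat into the connection speed, in Python.  The retransmission model is an assumption, not something this article specifies: each packet is taken to fail independently with some probability, so on average it needs 1 / (1 - p) attempts.  The function name `effective_bytes_per_second` is hypothetical.

```python
def effective_bytes_per_second(connect_bps: float,
                               bits_per_char: int = 9,
                               failure_rate: float = 0.0) -> float:
    """Rough error-free throughput, in bytes (characters) per second.

    Illustrative assumptions, not measurements:
    - each character costs `bits_per_char` bits on the line
      (8 data bits plus, e.g., one parity bit);
    - each packet fails independently with probability `failure_rate`,
      so on average it must be sent 1 / (1 - failure_rate) times.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    raw_chars_per_second = connect_bps / bits_per_char
    expected_attempts = 1.0 / (1.0 - failure_rate)
    return raw_chars_per_second / expected_attempts


# Compare both connection speeds under a few hypothetical failure rates.
for bps in (14_400, 28_800):
    for p in (0.0, 0.25, 0.5):
        rate = effective_bytes_per_second(bps, failure_rate=p)
        print(f"{bps} bps, failure rate {p:.0%}: {rate:,.0f} bytes/s")
```

With no failures, 28.8 kbs at 9 bits per character gives 3,200 characters per second; if half the attempts fail, that drops to 1,600.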

Even at a connection speed of 28.8 kbs, anyone using a dial-up modem is lucky to receive or transmit a page of plain text (ASCII characters in *.txt files) in three seconds.  People connected at 14.4 kbs need about six seconds for such a page.  A single page of plain text is a file of about 4 to 8 kB, and a larger file takes proportionally more time to be transmitted successfully.
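The per-page figures can be checked with a best-case estimate.  This sketch assumes 9 bits per character on the line and no retransmissions or protocol overhead, and takes 1 kB as 1,000 bytes; `transfer_seconds` is a hypothetical helper name.

```python
def transfer_seconds(size_bytes: int, connect_bps: float,
                     bits_per_char: int = 9) -> float:
    """Best-case transfer time: no retransmissions, no protocol overhead."""
    return size_bytes * bits_per_char / connect_bps


# A page of plain text is about 4 to 8 kB (taken here as 4,000 to 8,000 bytes).
for page_bytes in (4_000, 8_000):
    for bps in (14_400, 28_800):
        t = transfer_seconds(page_bytes, bps)
        print(f"{page_bytes:>5} bytes at {bps} bps: {t:.2f} s")
```

For an 8 kB page the best case works out to 2.5 seconds at 28.8 kbs and 5 seconds at 14.4 kbs, so the article's three and six seconds are in line with that once some retransmissions and overhead are added.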

It is therefore important for anyone who sends text to people limited by dial-up modems to consider how agonizingly slow it is for them to receive the information they are supposed to receive.

Plain text documents (in ASCII format) are the first choice, and indeed a must, for sending text to people with dial-up modems.  From their perspective, any other file format is vastly inferior.

Scanning a printed document and converting it to plain text with OCR (Optical Character Recognition) software is a bit of a chore, but the conversion takes far less time than it saves even a very small community of users.  That simple trade-off should guide the decision of when the originator or designer of a document should spend time in order to save time for its users or recipients, whether they are few or many.

At a connection speed of 28.8 kbs it takes about one hour to receive 6 MB of data over a dial-up modem, and that is when the connection doesn't get reset.  (The nominal 28.8 kbs line rate alone would allow it in just under half an hour, but the effective rate is far lower.)  With a 14.4 kbs connection it takes twice as long.  Multiply the time it takes one user to receive that document by the number of intended users.  It does not take very many users to collectively spend far more time receiving the document than it would have taken the originator to optimize it.
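The multiplication described above can be sketched as follows.  The effective rate is not a measured value; it is simply the rate implied by "6 MB in about one hour" at 28.8 kbs.  The two hours of originator effort and the 600 kB optimized size in the break-even example are hypothetical, as are all function names.

```python
import math


def hours_to_receive(size_bytes: float, effective_bps: float) -> float:
    """Hours for one user to receive a file at an error-free bit rate."""
    return size_bytes * 8 / effective_bps / 3600


def collective_hours(size_bytes: float, effective_bps: float, users: int) -> float:
    """Total hours all recipients together spend downloading the same file."""
    return hours_to_receive(size_bytes, effective_bps) * users


def break_even_users(originator_hours: float, hours_saved_per_user: float) -> int:
    """Smallest number of users whose combined savings at least match the
    time the originator spent making the file smaller."""
    return math.ceil(originator_hours / hours_saved_per_user)


# Effective rate implied by "6 MB in about one hour" at 28.8 kbs (~13,333 bit/s).
IMPLIED_BPS = 6_000_000 * 8 / 3600

for users in (1, 10, 100, 1000):
    hours = collective_hours(6_000_000, IMPLIED_BPS, users)
    print(f"{users:>4} users spend about {hours:,.0f} hours receiving 6 MB")

# Hypothetical: two hours of work shrink the 6 MB document to 600 kB.
saved = (hours_to_receive(6_000_000, IMPLIED_BPS)
         - hours_to_receive(600_000, IMPLIED_BPS))
print("break-even after", break_even_users(2.0, saved), "users")
```

Under these assumptions each user saves about 0.9 hours, so the originator's two hours are repaid by the third recipient.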

A JPG file of a scanned 8.5" by 11" page of text needs to be no larger than about 90 kB to stay legible, even after it is printed out.  Receiving a file of that size over a dial-up modem takes a little under a minute at 28.8 kbs and about a minute and a half at 14.4 kbs (the 82 kB JPG in the table below took 43 and 86 seconds).  As a rule, if the file is larger than 90 kB, the originator is doing something wrong.  There are exceptions.  Someone may need to transmit, say, a high-resolution photograph at maximum resolution.  However, before you decide that is what the intended recipients need, consider that, collectively, they may be forced to spend hundreds and perhaps thousands of hours receiving it.  Talk to them first, so that you can decide whether they need a JPG file larger than 90 kB for a page of scanned text.  After all, you want them to be able to read the file.  That's all, and for that they don't need a JPG file larger than 90 kB.

When a full page of scanned text is put through the OCR application software that comes with scanners, the resulting text file is 4 kB to 8 kB in size.  Scanning time differs little between resolutions, but the resolution used for scanning does affect the quality of the optical character recognition.

A page scanned at 75 dpi (dots per inch) produces a legible JPG file, but after it is put through the OCR software few of its words are intelligible.

A page scanned at 150 dpi produces a JPG file that is more legible than users need, and the OCR software turns it into a comprehensible page of text.  But the plain text file will still contain a good number of character recognition errors that need correcting.

A page scanned at 300 dpi produces a JPG file that is far more legible than needed, and the OCR software turns it into a text file with very few, relatively insignificant character recognition errors.
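One way to see why resolution matters so much for OCR is to count how many scanner pixels a character gets.  This sketch assumes body text set in 10-point type (1 point = 1/72 inch) on an 8.5" by 11" page; both the type size and the helper names are assumptions for illustration.

```python
POINTS_PER_INCH = 72  # typographic points per inch


def glyph_height_px(point_size: float, dpi: int) -> float:
    """Height of the type's point size (its em square) in scanner pixels;
    lowercase letters are smaller still."""
    return point_size / POINTS_PER_INCH * dpi


def page_pixels(width_in: float, height_in: float, dpi: int) -> int:
    """Number of pixels in a full-page scan at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumption: 10-point body text on an 8.5" x 11" page.
for dpi in (75, 150, 300):
    print(f"{dpi:>3} dpi: 10-pt type ~{glyph_height_px(10, dpi):4.1f} px tall, "
          f"page = {page_pixels(8.5, 11, dpi):,} pixels")
```

Doubling the resolution doubles the pixels available to each character but quadruples the pixels on the page, which is why the JPG files in the table below grow quickly, though JPG compression keeps their growth below fourfold per step.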

Examples of a printed text scanned at different resolutions.

The examples were created from a scanned newspaper article, which is shown in two formats, JPG and ASCII text.
The following table compares the legibility of the text (ASCII format) and graphic (JPG format) versions scanned at 75 dpi, 150 dpi and 300 dpi, along with the resulting file sizes and the download time for each file.

Resolution  File type   File size  Legibility                  Download time (seconds) *
(dpi)                   (kB)                                   at 14.4 kbs   at 28.8 kbs
 75         ASCII text      4      Unintelligible                     8             4
 75         JPG            82      Acceptable                        86            43
150         ASCII text      4      Intelligible (a)                   6             3
150         JPG           209      Better than necessary            222           111
300         ASCII text      4      Good (b)                           6             3
300         JPG           678      Far better than necessary        742           361

(a) About 15 errors that require careful reading to correct.
(b) About 26 errors; all but about three are nothing more than missing spaces and are easily corrected.

* The download times do not include the overhead of loading the basic layout of the HTML page that contains the file; that time must be added to the times shown.  Depending on how efficiently a page is designed, the overhead ranges from just a few seconds to a few minutes.  Very few people will wait more than 30 seconds for a page to load.
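The table and the 30-second limit can be combined into a rough file-size budget.  This sketch infers effective bit rates from the 82 kB JPG row (an inference from the table, not a figure the article states), takes 1 kB as 1,000 bytes, and assumes a hypothetical page overhead of 5 seconds; the function names are also hypothetical.

```python
def implied_bps(size_kb: float, seconds: float) -> float:
    """Effective bit rate implied by a measured download (1 kB = 1,000 bytes)."""
    return size_kb * 1000 * 8 / seconds


def max_kb_within(patience_s: float, overhead_s: float, effective_bps: float) -> float:
    """Largest file (kB) that, together with page overhead, loads in patience_s."""
    budget_s = max(0.0, patience_s - overhead_s)
    return budget_s * effective_bps / 8 / 1000


# Effective rates implied by the 82 kB JPG row of the table.
rate_28k = implied_bps(82, 43)   # ~15,300 bit/s on a 28.8 kbs connection
rate_14k = implied_bps(82, 86)   # ~7,600 bit/s on a 14.4 kbs connection

# Hypothetical page overhead of 5 s; 30 s is the patience limit cited above.
for label, rate in (("28.8 kbs", rate_28k), ("14.4 kbs", rate_14k)):
    print(f"{label}: about {max_kb_within(30, 5, rate):.0f} kB fits in 30 s")
```

Under those assumptions only about 48 kB (28.8 kbs) or 24 kB (14.4 kbs) fits in 30 seconds, so even the 82 kB JPG would not finish loading in time, while the 4 kB text versions easily would.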

The advantages of an ASCII text version produced from the scanned text are that it takes the least time to download and that portions of the text can be copied and pasted as quotes into other documents.

The advantages of a JPG file are that it is difficult to falsify and that it accurately reflects the layout of the source document.  However, a legible JPG version of the text takes roughly ten times as long to download as the text version and is about twenty times its file size.  A JPG file that is more than 20 times larger than its text version is objectionable and unacceptable overkill.

It is relatively easy (a few seconds of work) to take a file scanned at 300 dpi for the best OCR results and re-scale it into a 75 dpi JPG file that is legible and prints out at full page size.  Therefore, if you want to produce a JPG version of a text page scanned at 300 dpi or higher, open the file with an image editor such as Adobe Photoshop, set the print size to 18 cm by 24 cm at a resolution of 75 dpi, resample the image and save it.  That way, a few mouse clicks and a couple of values typed into the graphics software's dialog box will save the intended users hundreds, if not thousands, of hours of wasted time.
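The arithmetic behind those settings can be sketched in a few lines: an 18 cm by 24 cm print at 75 dpi needs far fewer pixels than the same area scanned at 300 dpi.  The helper name `pixels_for_print` is hypothetical; the 18 cm by 24 cm size and the resolutions come from the text above.

```python
CM_PER_INCH = 2.54


def pixels_for_print(width_cm: float, height_cm: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions needed to print width_cm x height_cm at the given dpi."""
    return (round(width_cm / CM_PER_INCH * dpi),
            round(height_cm / CM_PER_INCH * dpi))


target = pixels_for_print(18, 24, 75)   # settings suggested in the text
scan = pixels_for_print(18, 24, 300)    # the same area as scanned at 300 dpi
reduction = (scan[0] * scan[1]) / (target[0] * target[1])
print(f"300 dpi: {scan}, 75 dpi: {target}, about {reduction:.0f}x fewer pixels")
```

Going from 300 dpi to 75 dpi cuts the pixel count by a factor of about 16.  If the resizing were scripted rather than done in Photoshop, the same target dimensions could be passed to an image library, for example Pillow's `Image.resize`.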

Moreover, replacing unreasonably large JPG or other graphics files with reasonably sized, much smaller ones can considerably reduce the cost of renting web space.

For users who are limited by dial-up modems (practically all REA members), text files should be posted for downloading or distribution only in ASCII format (first choice) or as graphics files at 75 dpi resolution (second and inferior choice).  Some web-design experts advise corporations to have their web designers spend time working in exactly the same environment as their intended user community.

If you are posting or distributing text files and don't know how to go about it the right way, get in touch with me.  If you receive text in graphic format and the files are larger than 90 kB for a single page of text, complain to the originator and make it clear that you will refuse to accept files larger than 90 kB.

The Adobe PDF format is very convenient for someone who wants to post a large number of pages to a website, but by posting such files the originator places a large and unjustifiable burden on their users.  At the very least, originators of large PDF files must also make HTML versions available, so that users need to turn to the much larger PDF versions only when absolutely necessary.  PDF documents take up so much storage space that many users must forgo keeping copies on their hard drives and are therefore condemned to downloading them again every time they need them.

Photos embedded in HTML pages rarely need to be larger than 30 kB, and often need to be no larger than 12 to 15 kB.

A web designer or webmaster who does not pay attention to, or does not follow, the requirements listed above is not worth his salt.  If he works for you and refuses to adhere to those requirements and change his inconsiderate ways, look for someone to replace him.


Posted 2004 09 06