Why does it take so long to download or to send files?
Scanning of text documents
Considerations
The vast majority of REA members who have access to the Internet have
nothing better than dial-up modems to connect them. Many are connected
to the telephone company's nearest switching exchange office by many
miles of copper line. That means they cannot get connection speeds
faster than 14.4 kbs (kilobits per second). These bits are often loosely
called bauds, although that is not quite correct in transmission
terminology. Eight bits make up a byte, and one additional bit is sent
for parity. Some rural residents are a bit closer to the nearest
switching exchange office and may be lucky enough to have their dial-up
modem connect them at 28.8 kbs.
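As a rough illustration of the bit arithmetic above, the sketch below converts a nominal connection speed into an upper bound on characters per second. It assumes nine bits per character (eight data bits plus one parity bit) and ignores all other protocol overhead and retransmissions; the function name is hypothetical.

```python
def chars_per_second(line_rate_bps: float, bits_per_char: int = 9) -> float:
    """Upper bound on characters per second for a given nominal line rate.

    Assumes every character costs `bits_per_char` bits on the line
    (8 data bits + 1 parity bit by default) and ignores all other overhead.
    """
    if line_rate_bps <= 0 or bits_per_char <= 0:
        raise ValueError("rates and bit counts must be positive")
    return line_rate_bps / bits_per_char


for rate in (14_400, 28_800):  # 1 kbs = 1,000 bits per second
    print(f"{rate / 1000} kbs -> at most {chars_per_second(rate):,.0f} characters per second")
```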
Eight or nine bits are required to transmit a single character of text.
Modems use error-checking protocols that require some packets of data
to be transmitted a second or even a third time before they are
received without errors. As a result, the effective data-transfer rate
(the rate at which data is actually received error-free) is usually far
lower than the speed at which the modem connects the user to the Internet.
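To make the effect of retransmissions concrete, here is a minimal model, assuming each packet is corrupted independently with the same probability `p` and is simply resent until it arrives intact. Under that assumption the number of sends per packet is geometrically distributed with mean 1/(1 - p), so the effective rate is the line rate times (1 - p). Real modem protocols are more involved; the function names and error probabilities here are illustrative only.

```python
def expected_transmissions(packet_error_prob: float) -> float:
    """Mean number of sends per packet if each send fails independently
    with probability `packet_error_prob` (geometric distribution)."""
    if not 0.0 <= packet_error_prob < 1.0:
        raise ValueError("packet_error_prob must be in [0, 1)")
    return 1.0 / (1.0 - packet_error_prob)


def effective_rate_bps(line_rate_bps: float, packet_error_prob: float) -> float:
    """Rate at which data arrives error-free, ignoring all other overhead."""
    return line_rate_bps / expected_transmissions(packet_error_prob)


# Hypothetical error probabilities, chosen only to show the trend.
for p in (0.0, 0.1, 0.25, 0.5):
    print(f"p={p:.2f}: {expected_transmissions(p):.2f} sends/packet, "
          f"{effective_rate_bps(28_800, p):,.0f} bps effective at 28.8 kbs")
```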
Even at a connection speed of 28.8 kbs, anyone using a dial-up modem is
lucky to receive or transmit a page of plain text (ASCII characters in
*.txt files) in three seconds. People connected at 14.4 kbs need about
six seconds for such a page. A single page of plain text has a file
size of about 4 to 8 kB; a larger file takes proportionally longer to
transmit successfully.
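The page times above can be checked with a back-of-the-envelope calculation: bytes times bits per character, divided by the line rate. The sketch below assumes nine bits per character and no retransmissions, so it gives a best case; the roughly three and six seconds quoted above leave room for the error-checking overhead just described. The helper name is hypothetical, and 1 kB is taken as 1,000 bytes.

```python
def best_case_seconds(size_bytes: int, line_rate_bps: float, bits_per_char: int = 9) -> float:
    """Lower bound on transfer time: every byte costs `bits_per_char` bits,
    nothing is ever resent, and the line runs at its full nominal rate."""
    return size_bytes * bits_per_char / line_rate_bps


# A plain-text page of 4 to 8 kB (taken here as 4,000 and 8,000 bytes).
for page_bytes in (4_000, 8_000):
    for rate in (28_800, 14_400):
        t = best_case_seconds(page_bytes, rate)
        print(f"{page_bytes:,} bytes at {rate / 1000} kbs: {t:.2f} s best case")
```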
It is therefore important for anyone who sends text to people limited
to dial-up modems to consider how agonizingly slow it is for them to
receive the information they are supposed to receive.
Plain-text documents (in ASCII format) are the first choice, indeed a
must, for sending text to people with dial-up modems. From their
perspective, any other file format is vastly inferior.
Scanning a printed document and converting it to plain text with OCR
(Optical Character Recognition) software is a bit of a chore, but it
takes far less time than the resulting ASCII file will save even a very
small user community. These simple considerations must be weighed when
deciding whether the originator or designer of a document should spend
time to save time for the few or many users or recipients of that
document.
It takes about one hour at a connection speed of 28.8 kbs to receive 6
MB of data over a dial-up modem (assuming the connection does not get
reset). At 14.4 kbs it takes twice as long.
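The one-hour figure implies an effective rate well below the nominal line speed. The sketch below backs that rate out, assuming 6 MB means 6,000,000 bytes and eight bits per byte (both are assumptions). The result, about 13.3 kbs or under half of 28.8 kbs, is in line with the retransmission and protocol overhead described earlier, and keeping the same efficiency at 14.4 kbs reproduces the doubling to about two hours.

```python
def implied_effective_bps(size_bytes: int, seconds: float, bits_per_byte: int = 8) -> float:
    """Average throughput actually achieved when `size_bytes` take `seconds`."""
    return size_bytes * bits_per_byte / seconds


SIZE = 6_000_000   # 6 MB, assumed to mean 6,000,000 bytes
ONE_HOUR = 3_600   # seconds

effective = implied_effective_bps(SIZE, ONE_HOUR)
fraction = effective / 28_800  # share of the nominal 28.8 kbs actually achieved
print(f"effective rate: {effective / 1000:.1f} kbs ({fraction:.0%} of the nominal 28.8 kbs)")

# Same efficiency on a 14.4 kbs line: half the rate, so twice the time.
hours_at_14k4 = SIZE * 8 / (14_400 * fraction) / 3_600
print(f"same efficiency at 14.4 kbs: {hours_at_14k4:.1f} h")
```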
Multiply the time it takes one user to receive that document by the
number of intended users. It does not take very many users to
collectively spend far more time receiving the document than the
originator would have spent optimizing it.
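The trade-off can be expressed as a break-even count: the number of recipients at which their combined savings equal the originator's extra effort. In the sketch below, the originator's two hours of scanning and OCR correction and the 45 minutes saved per recipient are hypothetical numbers chosen for illustration, not figures from the text.

```python
import math


def break_even_recipients(originator_minutes: float, minutes_saved_per_recipient: float) -> int:
    """Smallest number of recipients whose combined time savings at least
    equal the time the originator spends optimizing the document."""
    if minutes_saved_per_recipient <= 0:
        raise ValueError("each recipient must save some time")
    return math.ceil(originator_minutes / minutes_saved_per_recipient)


# Hypothetical numbers: 2 hours (120 minutes) of work by the originator;
# each recipient saves 45 minutes of download time.
n = break_even_recipients(120, 45)
print(f"break-even at {n} recipients")
for recipients in (10, 100, 1000):
    saved_hours = recipients * 45 / 60
    print(f"{recipients:>5} recipients save {saved_hours:,.1f} h in total vs 2 h spent")
```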
A JPG file that remains legible after it is printed out need be no
larger than about 90 kB for an 8.5" by 11" page of scanned text.
Receiving that file over a dial-up modem takes from about 43 seconds
(at 28.8 kbs) to about 86 seconds (at 14.4 kbs).
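Because transfer time grows in proportion to file size and in inverse proportion to connection speed, a single reference measurement can be scaled to other cases. The sketch below takes the 90 kB, 43-second, 28.8 kbs figures above as its reference; it assumes the ratio of effective to nominal speed stays the same at every size and speed, which real connections will not honor exactly. The constant and function names are hypothetical.

```python
REF_BYTES = 90_000     # 90 kB JPG (1 kB taken as 1,000 bytes)
REF_SECONDS = 43.0     # transfer time at the reference speed
REF_RATE_BPS = 28_800  # 28.8 kbs


def scaled_seconds(size_bytes: float, line_rate_bps: float) -> float:
    """Estimate transfer time by scaling the reference measurement:
    proportional to size, inversely proportional to line rate."""
    return REF_SECONDS * (size_bytes / REF_BYTES) * (REF_RATE_BPS / line_rate_bps)


cases = [
    ("90 kB JPG", 90_000, 14_400),
    ("8 kB text page", 8_000, 28_800),
    ("500 kB JPG", 500_000, 28_800),
]
for label, size, rate in cases:
    print(f"{label} at {rate / 1000} kbs: about {scaled_seconds(size, rate):.0f} s")
```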
As a rule, if the file is larger than 90 kB, the originator is doing
something wrong. There are exceptions: someone may need to transmit,
say, a high-resolution photograph at maximum resolution. However,
before you decide that is what the intended recipients need, consider
that collectively they may be forced to spend many hundreds, perhaps
thousands, of hours receiving it. Talk to them first to decide whether
they need a JPG file larger than 90 kB for a page of scanned text.
After all, you want them to be able to read the file. That's all, and
for that they don't need a JPG file larger than 90 kB.
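The collective cost of an oversized file is easy to estimate: per-recipient download time multiplied by the number of recipients. In the sketch below, the effective rate of 2,000 bytes per second (roughly what 90 kB in 43 seconds implies), the 3.6 MB file size, and the recipient counts are illustrative assumptions.

```python
def collective_hours(size_bytes: int, recipients: int, effective_bytes_per_s: float = 2_000) -> float:
    """Total hours all recipients together spend downloading one file."""
    per_recipient_s = size_bytes / effective_bytes_per_s
    return per_recipient_s * recipients / 3_600


# Hypothetical comparison: a 90 kB page scan vs. a 3.6 MB high-resolution scan.
for size in (90_000, 3_600_000):
    for recipients in (100, 1_000):
        hours = collective_hours(size, recipients)
        print(f"{size:>9,} bytes x {recipients:>5,} recipients: {hours:,.1f} h")
```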
When a full page of scanned text is put through the OCR software that
comes with scanners, the resulting text file is 4 kB to 8 kB in size.
Scanning time does not differ much between resolutions, but the quality
of the optical character recognition results depends on the resolution
used for scanning.
A page scanned at 75 dpi (dots per inch) will produce a legible JPG
file, but few of the words that come out of the OCR software will be
intelligible.
A page scanned at 150 dpi will produce a JPG file that is better than
needed for legibility, and the OCR software will turn it into a
comprehensible page of text. However, the plain-text file will require
correcting a good number of character-recognition errors.
A page scanned at 300 dpi will produce a JPG file that is far better
than needed for legibility, and the text file produced by the OCR
software contains very few, relatively insignificant
character-recognition errors.
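One reason resolution matters so much for OCR is that the number of pixels on the page grows with the square of the dpi: doubling the resolution quadruples the pixels the scanner captures and the OCR software can work with. The sketch below counts pixels for an 8.5" by 11" page; it says nothing about JPG file sizes, which also depend on color depth and compression settings. The names are hypothetical.

```python
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0


def page_pixels(dpi: int, width_in: float = PAGE_WIDTH_IN, height_in: float = PAGE_HEIGHT_IN) -> int:
    """Number of pixels in a full-page scan at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)


for dpi in (75, 150, 300):
    print(f"{dpi:>3} dpi: {page_pixels(dpi):>10,} pixels")
```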
Examples of a printed text scanned at different resolutions.
The examples were created from a scanned newspaper article, shown in
two formats: JPG and plain text (ASCII). The following table compares
the legibility of the text (ASCII) and graphic (JPG) versions scanned at
75 dpi, 150 dpi and 300 dpi, along with the resulting file sizes and
loading times for each file size and type.