Thursday, June 6, 2013

Cost saving in Bulk Scanning by intelligent use of technology

Challenge

Bulk scanning for online archiving has to be done intelligently. "Unclean" scans produce large files, which create storage and bandwidth overheads for organizations. The example below illustrates this.

Example

The following is a scan of two facing pages of a book. The scan was taken in black and white because the book had no images.
The image clearly shows that:
  • The background is almost grey, because the book was old and its pages had turned yellowish.
  • There are speckles caused by aging (uneven oxidation of the pages).
  • The edges are black due to shadowing (the book was quite voluminous).
  • The original scan is about 591 KB in size.
After digital cleaning, the scan looks as shown below:


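The post does not say which software performed the cleanup. As a hypothetical sketch only, the three defects listed above map onto three standard image-processing steps: a global threshold (grey background), a 3x3 median filter (speckles) and cropping of mostly-black border rows and columns (shadowed edges). The sketch assumes a grayscale page held as a list of rows of 0 to 255 values; the function names, the threshold of 160 and the 50% darkness cutoff are illustrative assumptions, not values from the original workflow.

```python
from statistics import median_low

# Assumed representation: a grayscale page as rows of pixels, 0 = black, 255 = white.
Image = list[list[int]]


def binarize(img: Image, threshold: int = 160) -> Image:
    """Global threshold: pixels lighter than `threshold` (e.g. a yellowed,
    greyish background) become white; darker pixels (ink) become black."""
    return [[0 if p < threshold else 255 for p in row] for row in img]


def crop_dark_borders(img: Image, max_dark_fraction: float = 0.5) -> Image:
    """Trim edge rows and columns that are mostly black (scanner shadow)."""
    def is_dark(line) -> bool:
        return sum(1 for p in line if p == 0) / len(line) > max_dark_fraction

    top, bottom = 0, len(img)
    while top < bottom and is_dark(img[top]):
        top += 1
    while bottom > top and is_dark(img[bottom - 1]):
        bottom -= 1
    img = img[top:bottom]
    if not img:
        return img

    cols = list(zip(*img))  # transpose so columns can be inspected like rows
    left, right = 0, len(cols)
    while left < right and is_dark(cols[left]):
        left += 1
    while right > left and is_dark(cols[right - 1]):
        right -= 1
    return [row[left:right] for row in img]


def despeckle(img: Image) -> Image:
    """3x3 median filter: an isolated dark speck is outvoted by its white
    neighbours. median_low always returns one of the inputs, so pixels stay
    0 or 255; at the borders only the neighbours that exist are used."""
    if not img:
        return img
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(median_low(window))
        out.append(row)
    return out


def clean_page(img: Image, threshold: int = 160) -> Image:
    """Threshold, crop the shadowed edges, then remove speckles."""
    return despeckle(crop_dark_borders(binarize(img, threshold)))


# Hypothetical 7x7 page: grey background (200), one dark speck (40)
# and a black shadow down the left edge (20).
page = [[20] + [200] * 6 for _ in range(7)]
page[3][3] = 40
print(clean_page(page) == [[255] * 6 for _ in range(7)])  # True
```

A plain 3x3 median filter also erases strokes that are only one pixel wide, so in practice the window size would need to be tuned to the scan resolution.
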
Image size reduction = savings on bandwidth and storage costs

In the example above, the original scan was 591 KB. After digital cleaning, the left page is 64 KB and the right page is 124 KB, or 188 KB in total. Since 188 / 591 ≈ 0.32, the cleaned pages take up about 32% of the original space, a reduction of roughly 70%. The page quality has also improved significantly.
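The percentage follows directly from the figures in the example: 591 KB before cleaning, 64 KB + 124 KB after. The helper below (hypothetical names, for illustration) makes the calculation explicit; the download count used for the bandwidth figure is an assumed number, not one from the post.

```python
def size_reduction(original_kb: float, cleaned_kb: list[float]) -> float:
    """Percentage saved when one original scan is replaced by cleaned pages."""
    return (1 - sum(cleaned_kb) / original_kb) * 100


def bandwidth_saved_mb(original_kb: float, cleaned_kb: list[float],
                       downloads: int) -> float:
    """Bandwidth saved (in MB) if the page is downloaded `downloads` times."""
    return (original_kb - sum(cleaned_kb)) * downloads / 1024


# Figures from the example above.
print(f"Reduction: {size_reduction(591, [64, 124]):.1f}%")  # Reduction: 68.2%
# Assumed: 1,000 downloads of this page spread.
print(f"Saved: {bandwidth_saved_mb(591, [64, 124], 1000):.1f} MB")  # Saved: 393.6 MB
```

Because the saving is per download, it grows linearly with traffic and with the number of pages archived.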

Contact
For all your bulk scanning requirements, please contact Shrivathsa at [scan AT bhagirathi DOT in] or call at +91-9845830632. 

Wednesday, June 5, 2013

Bulk Scanning

What is bulk scanning?

Any scanning job involving more than 100 pages can be called bulk scanning. Organizations typically need to scan such volumes of:
  • Old documents for which no soft copies exist.
  • Legal documents which consist of signatures / seals etc.
  • Documents containing photos / cartoons etc. (as in magazines, newspapers etc.)
  • Books which have gone out of print and for which no soft copies exist.

Motivation:

Bulk scanning is sometimes done to archive old documents. Often, the aim is to take material that exists only as hard copies, digitize it and put it up on the internet. Of late, many publishing houses have been scanning their old books to make them available as e-books. Once the text of an old book has been extracted using optical character recognition, it can be used to republish the book, cutting down the time needed to retype it.

Challenges:

The main challenges in bulk scanning include:
  • Quick and efficient scanning of documents with minimal errors: Quality control at every stage of scanning and digitization ensures minimal errors.
  • Efficient storage of digitized content (reducing the size of digitized content): Any scan needs to be cleaned up and improved before it is put up on a website. An unclean scan can at times be 8 to 16 times the size of a clean scan, so serving it can consume 8 to 16 times more bandwidth on your server.
  • Improving the quality of digitized documents to make them easier to read: The paper used in old books tends to turn yellow over the years, and pages also pick up stray marks, speckles etc. Once these are removed, the page becomes much easier to read.

Optical character recognition (OCR)

One way to reduce the size of text-heavy documents is optical character recognition (OCR). OCR-ed documents also become searchable, which is a great advantage. OCR technology is now at a stage where it can achieve 95% accuracy for English text; OCR for other languages is still being improved.
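The post does not define how the 95% figure is measured. One common convention, assumed here, is character accuracy: one minus the edit (Levenshtein) distance between the OCR output and a hand-checked transcription, divided by the length of that transcription. A minimal sketch, with hypothetical function names and sample text:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b` (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_output: str) -> float:
    """1 - (edit distance / length of the ground truth), floored at 0."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    errors = levenshtein(ground_truth, ocr_output)
    return max(0.0, 1 - errors / len(ground_truth))


truth = "The quick brown fox jumps over the lazy dog"  # 43 characters
ocr = "Tbe quick brown fox jumps over the 1azy dog"    # two misread characters
print(f"{character_accuracy(truth, ocr):.1%}")  # 95.3%
```

At 95% character accuracy, a page of about 2,000 characters would still contain around 100 errors to correct by hand.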

Bulk scanning and archiving solutions

We provide bulk scanning and archiving solutions. Our team of experts will ensure that all documents handed to us are digitized and optimized for online use. Please contact Shrivathsa at [scan AT bhagirathi DOT in] or call +91-9845830632.