Turn your Raspberry Pi into a Scan-To-Cloud Device


by Frederik Granna

I’ve always been frustrated with the complicated process required to scan a document: you need to get the scanner from the shelf, install the right software, hope that the hardware gets recognized when you plug in the USB cable, start the scanning process, click through alerts, and so on. Not to mention that with tablets and smartphones it’s usually not even possible to connect an external device. I’ve been looking for a solution to all of these problems in one always-online device. I wanted to be able to archive all incoming postal mail as searchable PDFs directly in my doctape account, without needing my laptop. Fortunately, I had an unused Raspberry Pi at home, which I decided to use for this purpose.

Today, I created an interesting DIY solution for uploading documents to a cloud service without spending much time or money doing so!

I did some research and stumbled upon the great solution from Eduardo Luís: http://eduardoluis.com/raspberry-pi-and-usb-network-scanner/. Eduardo’s solution scans TIFF files to email, emulating a basic function of modern desktop scanners in a device with only a single button.

After planning a bit, I realized that a single button was a little too restrictive for my case. I wanted a little more comfort: buttons to cancel the scan process and to add more pages to a document made more sense for me. What was still missing was a bit of feedback, so I decided to use a display. After a few minutes of research I found that Adafruit’s add-on shield display, with its two-line LCD and five small input buttons, was a perfect match.

The scanning itself is carried out by the software “SANE” (http://www.sane-project.org/). Then “tiff2pdf” converts the raw image into a PDF. Initially I planned for the OCR step to be handled by the well-known open source OCR engine “tesseract”. The problem: the Raspberry Pi took too much time for the OCR part (at least 1-2 minutes per page). For this reason I started looking for an alternative and finally found the ABBYY Cloud OCR SDK (http://ocrsdk.com/), a great SaaS solution for converting existing PDFs into searchable ones (http://ocrsdk.com/plans-pricing/).
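The scan and conversion steps can be sketched as two command lines driven from Node.js. This is only a minimal sketch, not the code running on my device: the helper names, the file paths, and the 300 dpi grayscale defaults are assumptions for illustration, and the --resolution and --mode options depend on the scanner backend (check the output of scanimage --help for your device).

```javascript
const { execFile } = require("child_process");
const fs = require("fs");

// Build the argument list for SANE's scanimage (hypothetical defaults).
// --resolution and --mode are backend-specific options.
function scanArgs({ resolution = 300, mode = "Gray" } = {}) {
  return ["--format=tiff", "--resolution", String(resolution), "--mode", mode];
}

// Build the argument list for tiff2pdf: write pdfPath from tiffPath.
function pdfArgs(tiffPath, pdfPath) {
  return ["-o", pdfPath, tiffPath];
}

// Scan one page to tiffPath, then convert it to pdfPath.
// scanimage writes the image to stdout, so it is captured as a Buffer.
function scanToPdf(tiffPath, pdfPath, callback) {
  const options = { encoding: "buffer", maxBuffer: 64 * 1024 * 1024 };
  execFile("scanimage", scanArgs(), options, (scanErr, stdout) => {
    if (scanErr) return callback(scanErr);
    fs.writeFile(tiffPath, stdout, (writeErr) => {
      if (writeErr) return callback(writeErr);
      execFile("tiff2pdf", pdfArgs(tiffPath, pdfPath), callback);
    });
  });
}
```

Keeping the argument lists in small functions makes them easy to check without a scanner attached; scanToPdf itself needs SANE and libtiff's tools installed.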

An important requirement was to make the device usable by multiple users. The best way to achieve this without additional hardware was to use QR codes: a QR code is scanned first to identify the user whose doctape account the PDFs should be sent to.

Doctape (http://doctape.com) provides a free email inbox for its users: any attachments that are sent to the personal email address are not only saved to your doctape account, but also indexed for full-text search! So all I had to do was recognize the username in the provided QR code. This way, any doctape user just needs to create a QR code containing their doctape username, and the cloud connection is established :-)
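Turning the QR code into a recipient could look roughly like the sketch below. It assumes the scanned page has already been run through zbarimg, which by default prints one line per detected symbol, such as "QR-Code:alice". The function names, the username rule, and the example.com domain are hypothetical; the real doctape inbox address format is not shown here.

```javascript
// Extract the payload from one line of zbarimg output.
// Returns null for lines that are not QR code results.
function parseZbarLine(line) {
  const match = /^QR-Code:(.*)$/.exec(line.trim());
  return match ? match[1].trim() : null;
}

// Return the first QR payload that looks like a plain username.
// Assumed rule: letters, digits, ".", "_" and "-" only.
function extractUsername(zbarOutput) {
  for (const line of zbarOutput.split("\n")) {
    const payload = parseZbarLine(line);
    if (payload && /^[A-Za-z0-9._-]+$/.test(payload)) return payload;
  }
  return null;
}

// Hypothetical address scheme, used here only as a placeholder.
function inboxAddress(username, domain = "inbox.example.com") {
  return `${username}@${domain}`;
}
```

Rejecting payloads that do not look like a username keeps a stray QR code on a scanned letter from being mistaken for a user.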

 

  

 

The display and the buttons are controlled over I2C. Specifically for this project, I cloned a small Node.js module (https://github.com/korevec/node-i2c) and adapted it to my case.

Picture: Communication with the LCD in nodejs
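The logic around the I2C traffic is simple enough to sketch without the bus itself. The helpers below are illustrative and do not reproduce the node-i2c API: they assume the five buttons arrive as bits 0 to 4 of a single byte read from the keypad (with 1 meaning pressed), and that each LCD line holds 16 characters. The bit order is an assumption and should be checked against the kit's documentation.

```javascript
// Assumed bit positions of the five keypad buttons in the byte read
// from the port expander (verify against the kit's documentation).
const BUTTONS = { select: 0, right: 1, down: 2, up: 3, left: 4 };

// Return the names of all buttons whose bit is set in portValue.
function pressedButtons(portValue) {
  return Object.keys(BUTTONS).filter(
    (name) => ((portValue >> BUTTONS[name]) & 1) === 1
  );
}

// Pad or truncate a message to the width of one LCD line.
function lcdLine(text, width = 16) {
  return text.slice(0, width).padEnd(width, " ");
}
```

A polling loop would read the byte over I2C, pass it to pressedButtons, and write status messages formatted with lcdLine to the display.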

 

The scanning process:

To start, scan the QR code that contains your doctape username.
(There are many ways to improve this part, for example by identifying users via RFID.)

Scan 1 to N pages.

Processing Steps:

(1) merge the single scans (pages) into one file

(2) convert to PDF

(3) send the PDF to Abbyy for text recognition

Upload
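The whole flow can be modelled as a small session object plus an ordered plan of steps. This is a sketch under assumptions, not the code on my device: the function names and file names are made up, the merge and convert steps use the tiffcp and tiff2pdf tools listed below, and the OCR and upload steps are left as placeholders because their service calls are not covered here.

```javascript
// A scan session: the identified user plus the pages scanned so far.
function createSession(username) {
  return { username, pages: [] };
}

// Return a new session with one more scanned page (a TIFF file path).
function addPage(session, tiffPath) {
  return { ...session, pages: [...session.pages, tiffPath] };
}

// Build the ordered processing plan for a finished session.
function processingPlan(session, baseName) {
  if (session.pages.length === 0) throw new Error("no pages scanned");
  const merged = `${baseName}.tiff`;
  const pdf = `${baseName}.pdf`;
  return [
    // tiffcp concatenates the input files into one multi-page TIFF
    { step: "merge", cmd: "tiffcp", args: [...session.pages, merged] },
    // tiff2pdf converts the multi-page TIFF into a PDF
    { step: "convert", cmd: "tiff2pdf", args: ["-o", pdf, merged] },
    // placeholders: text recognition and delivery to the user's account
    { step: "ocr", input: pdf },
    { step: "upload", input: pdf, user: session.username },
  ];
}
```

Cancelling a scan then simply means discarding the session, and the "add page" button maps to addPage.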

Used components:

Hardware:

· Raspberry Pi (version B)

· Adafruit Blue&White 16×2 LCD+Keypad Kit for Raspberry Pi

http://www.adafruit.com/products/1115

· Canon CanoScan LiDE 110* (works with other scanners too; photo proof attached)


Software:

· SANE

http://www.sane-project.org/

· zbarimg for QR-Code Recognition

https://github.com/herbyme/zbar

· tiffcp

http://www.libtiff.org/tools.html

· tiff2pdf

http://www.remotesensing.org/libtiff/

· NodeJS

http://nodejs.org

· Node-i2c-Lib

https://github.com/sysrun/node-i2c

 

Services:

· ABBYY Cloud OCR SDK

http://ocrsdk.com/

· doctape

http://www.doctape.com

 

12 Responses to Turn your Raspberry Pi into a Scan-To-Cloud Device

  1. Hans says:

    Generally I like tinkering.
    But I hate wasting time.

    So for years I have had a Fujitsu ScanSnap S1500 scanner, which does that and much more.

  2. nomadtales says:

    A couple of suggestions.

    You could use a Gmail account and Google Drive to do something similar. Drive has built-in OCR already. Send the email with the attached scan to your Gmail account using an address like username+bills@gmail.com. Have a Google Apps Script periodically scan your mailbox for any messages sent to that address, grab the attachment, and copy it to a Drive folder.

    If anyone wants the Apps Script code for this let me know, I have one doing this at the moment.

    Another idea, why not use the LCD screen and buttons to select the recipient you want to send to?

  3. Élio Severiano says:

    Can you share the code you are using on raspberry pi?

    I really want to do the same and I already have all the components, but if you could share the code it would be much faster to get it all working.

  4. Alexey Zimarev says:

    Very nice. I have a ScanSnap 1300 like you do, and I hate the need to power up my PC to scan. I am surprised to see it worked at all, since I thought ScanSnap uses its own proprietary protocol. At least there was no TWAIN driver that I could find.
