openDIAS - Specification.
This document is open for community scrutiny. If you have any feedback, please send it here.

INTRODUCTION
------------
The open Document Imaging Archive System (openDIAS) is an application that provides an office-style document workflow to the home user. The application will accept documents and/or images from a file or from a scanning interface. These documents will then be saved seamlessly into a database, using 'tags' as index markers. Optionally, when the input is an image, the source can be OCRed to store the basic text of the document (basic text can also be extracted from other document formats). This text will be linked to, and stored along with, the original document. Later, documents can be browsed, updated, printed or deleted.

A second application will provide a user interface to filter by the stored tags and to search using the OCRed text or the document body.

MODULES
-------
1. Document collection
   Collect documents either from an ODF file or by scanning a document.
   Allow the user to select:
     - the location of the ODF document;
     - whether to extract basic text;
   or
     - the number of pages to scan;
     - the proposed resolution (always B&W);
     - whether to OCR the document (OCR will not be done in this module).
   Scanning uses openDIAS's own interface to the SANE API (similar to XSane, but tailored for SDA use).
   After document collection [possible loop for multiple docs], move to module 2.

2. Saving interface
   Shows the scanned document and allows the user to operate on the image:
     - Discard the image and move to the next scan;
     - Open the image/document for editing [keep version control];
     - OCR (if not already done and the image is in the required resolution) [OCR done here];
     - Save with specific tags.

3. Retrieve & control
   Provides the main interface to the openDIAS application.
   Filter and search documents, then allow interaction:
     - Filter by tag
     - Search by OCR text or document body
     - Browse resultant documents
     - Change attributes [opens module 2 for specific file(s)]
     - Delete, email, print, export [PDF, etc]
     - Start an acquisition process [opens module 1]

STRUCTURES
----------
The project will be released under GPL (2 or 3).
The system will be written in C, using the GlibC and GTK interfaces. The system will be optimised for the GNOME platform, but should be portable. The system should be fully localisable.

PRE-REQUISITES
--------------
Scanning API    [sane]
OCR interface   [tesseract]
PDF creator     [unknown]
Database        [sqlite3]
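
As an illustration of the "filter by tag" and "search by OCR text" steps in module 3, the following is a minimal sketch in C. It is not part of the specification: the `doc_record` structure, the `MAX_TAGS` limit and the function names are all hypothetical, and it works on an in-memory array rather than on the sqlite3 database the specification calls for. Text search here is a plain case-sensitive substring match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_TAGS 8 /* hypothetical per-document limit, for illustration only */

/* Hypothetical in-memory view of one stored document. */
struct doc_record {
    int id;
    const char *title;
    const char *tags[MAX_TAGS]; /* unused slots are NULL */
    const char *text;           /* OCRed or extracted text; NULL if none */
};

/* Returns true if the document carries the given tag (exact match). */
bool doc_has_tag(const struct doc_record *doc, const char *tag)
{
    for (size_t i = 0; i < MAX_TAGS && doc->tags[i] != NULL; i++) {
        if (strcmp(doc->tags[i], tag) == 0)
            return true;
    }
    return false;
}

/* Returns true if the document passes both filters.
 * A NULL tag disables tag filtering; a NULL needle disables text search. */
bool doc_matches(const struct doc_record *doc, const char *tag,
                 const char *needle)
{
    if (tag != NULL && !doc_has_tag(doc, tag))
        return false;
    if (needle != NULL &&
        (doc->text == NULL || strstr(doc->text, needle) == NULL))
        return false;
    return true;
}

/* Writes the ids of matching documents into out_ids (at most max_out)
 * and returns how many were written. */
size_t filter_docs(const struct doc_record *docs, size_t n_docs,
                   const char *tag, const char *needle,
                   int *out_ids, size_t max_out)
{
    assert(docs != NULL || n_docs == 0);

    size_t found = 0;
    for (size_t i = 0; i < n_docs && found < max_out; i++) {
        if (doc_matches(&docs[i], tag, needle))
            out_ids[found++] = docs[i].id;
    }
    return found;
}
```

Calling `filter_docs(docs, n, "bill", NULL, ids, 16)` would list every document tagged "bill", while passing a non-NULL `needle` narrows the result to documents whose stored text contains that string. In the real system the same selection would presumably be expressed as a database query; the sketch only shows the intended filtering logic.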
