Search results

1 – 10 of over 93000
Article
Publication date: 8 May 2017

Carl Wilson, Rebecca McGuinness and Joachim Jung

Abstract

Purpose

This paper describes the development of the veraPDF validator. The objective of veraPDF is to build an industry supported, open source validator for all parts and conformance levels of the PDF/A specification for archival PDF documents. The project is led by the Open Preservation Foundation and the PDF Association and is funded by the EU PREFORMA project.

Design/methodology/approach

veraPDF is designed to meet the needs of the digital preservation community and the PDF industry alike. The technology is subject to the review of and acceptance by the PDF Association’s PDF Validation Technical Working Group, including many participants of the relevant ISO working groups. Cultural heritage institutions are collecting ever-increasing volumes of digital information, which they have a mandate to preserve for the long term. However, in many cases, they need to ensure their content has been produced to the specifications of a standard file format, as well as any acceptance criteria stated in their institutional policy.

Findings

With increasing knowledge and experience of processes and policies, cultural heritage institutions are influencing the production and development of digital preservation software. The product development funded by the PREFORMA project shows how such cooperation can benefit the community as a whole.

Originality/value

This paper describes the value of an open source approach to developing a PDF/A validator for cultural heritage organisations.

Details

Digital Library Perspectives, vol. 33 no. 2
Type: Research Article
ISSN: 2059-5816

Article
Publication date: 1 January 2006

Susan J. Sullivan

Abstract

Purpose

This article sets out to explain the purpose of PDF/A, how it addresses archival and records management concerns, how PDF/A was designed to have “desirable properties of a long‐term preservation format”, and the future of PDF/A.

Design/methodology/approach

The contents of this article are based on the author's knowledge and experience of the subject.

Findings

It is emphasized that PDF/A must be implemented in conjunction with policies and procedures, including quality assurance procedures to ensure acceptable replication of source material.

Originality/value

This article will be of interest to anyone working with PDF files. Work has already begun on PDF/A Part 2 which will be based on PDF 1.6. Application notes and a listing of frequently asked questions will be made publicly available to assist developers of PDF/A applications to better understand the requirements of the file format and provide implementation guidance.

Details

Records Management Journal, vol. 16 no. 1
Type: Research Article
ISSN: 0956-5698

Article
Publication date: 20 November 2009

Michael Seadle

Abstract

Purpose

The purpose of this paper is to consider whether PDF formats are appropriate for long‐term digital archiving.

Design/methodology/approach

The approach takes the form of examining how well PDF's capabilities fit eReader devices that future scholars may use in addition to or instead of paper print‐outs.

Findings

Fixity is the advantage that PDF offers for archiving, while its alternatives generally offer greater flexibility for eReader devices. The question for long‐term digital archiving is whether fixity or flexibility best suits the interests of future readers.

Originality/value

PDF is widely accepted as a digital archiving format and PDF documents are found in virtually every repository. There has, however, been little discussion as to whether the fixed format is not in fact a long‐term disadvantage.

Details

Library Hi Tech, vol. 27 no. 4
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 1 December 2005

Yakov Ben‐Haim

Abstract

Purpose

To study the effect of Knightian uncertainty – as opposed to statistical estimation error – in the evaluation of value‐at‐risk (VaR) of financial investments. To develop methods for augmenting existing VaR estimates to account for Knightian uncertainty.

Design/methodology/approach

The value at risk of a financial investment is assessed as the quantile of an estimated probability distribution of the returns. Estimating a VaR from historical data entails two distinct sorts of uncertainty: probabilistic uncertainty in the estimation of a probability density function (PDF) from historical data, and non‐probabilistic Knightian info‐gaps in the future size and shape of the lower tail of the PDF. A PDF is estimated from historical data, while a VaR is used to predict future risk. Knightian uncertainty arises from the structural changes, surprises, etc., which occur in the future and therefore are not manifested in historical data. This paper concentrates entirely on Knightian uncertainty and does not consider the statistical problem of estimating a PDF. Info‐gap decision theory is used to study the robustness of a VaR to Knightian uncertainty in the distribution.
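
The quantile definition of VaR in the abstract above can be illustrated with a minimal sketch. It covers only the conventional historical-data step, not the info-gap robustness analysis the paper develops. The function name `historical_var`, the nearest-rank quantile rule and the sample returns are all assumptions made for illustration; none of them come from the paper.

```python
def historical_var(returns, tail_percent=5):
    """Nearest-rank empirical VaR, reported as a positive loss.

    returns: hypothetical sequence of periodic returns (0.01 means +1%).
    tail_percent: integer size of the lower tail in percent (5 gives a 95% VaR).
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    if not 0 < tail_percent < 100:
        raise ValueError("tail_percent must be between 1 and 99")
    ordered = sorted(returns)
    # Nearest-rank quantile: the smallest observation with at least
    # tail_percent of the sample at or below it (ceiling division).
    rank = -(-tail_percent * len(ordered) // 100)
    quantile = ordered[rank - 1]
    # VaR is conventionally quoted as a loss, so flip the sign.
    return -quantile


# Made-up sample of 20 returns, for illustration only.
sample_returns = [
    0.012, -0.031, 0.004, 0.020, -0.007, 0.015, -0.052, 0.009, 0.001, -0.014,
    0.027, -0.003, 0.006, -0.021, 0.011, 0.018, -0.009, 0.003, -0.040, 0.008,
]
var_95 = historical_var(sample_returns, tail_percent=5)   # worst of 20 observations
var_90 = historical_var(sample_returns, tail_percent=10)  # second-worst of 20
print(var_95, var_90)
```

Because the estimate is read directly off past observations, it cannot reflect future changes in the shape of the lower tail, which is the Knightian uncertainty the paper addresses.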

Findings

It is shown that VaRs, based on estimated PDFs, have no robustness to Knightian errors in the PDF. An info‐gap safety factor is derived that multiplies the estimated VaR in order to obtain a revised VaR with specified robustness to Knightian error in the PDF. A robustness premium is defined as a supplement to the incremental VaR for comparing portfolios.

Practical implications

The revised VaR and incremental VaR augment existing tools for evaluating financial risk.

Originality/value

Info‐gap theory, which underlies this paper, is a non‐probabilistic quantification of uncertainty that is very suitable for representing Knightian uncertainty. This enables one to assess the robustness to future surprises, as distinct from existing statistical techniques for assessing estimation error resulting from randomness of historical data.

Details

The Journal of Risk Finance, vol. 6 no. 5
Type: Research Article
ISSN: 1526-5943

Article
Publication date: 1 March 2003

Kathy Konicek, Joy Hyzny and Richard Allegra

Abstract

Electronic reserves help registered campus users who need anytime‐access to documents. Electronic reserves comprise digital files, mostly in HTML or PDF format. In some circumstances the HTML or PDF file is “readable” to the sighted individual, but is partially or completely unreadable to visually impaired users of assistive technology. Creating “accessible” PDF files poses more challenges than creating “accessible” HTML files. Several options are suggested to help solve this problem.

Details

Library Hi Tech, vol. 21 no. 1
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 20 February 2007

J.P. Noonan and Prabahan Basu

Abstract

Purpose

In many problems involving decision‐making under uncertainty, the underlying probability model is unknown but partial information is available. In some approaches to this problem, the available prior information is used to define an appropriate probability model for the system uncertainty through a probability density function. When the prior information is available as a finite sequence of moments of the unknown probability density function (PDF) defining the appropriate probability model for the uncertain system, the maximum entropy (ME) method derives a PDF from an exponential family to define an approximate model. This paper aims to investigate some optimality properties of the ME estimates.

Design/methodology/approach

For n>m, when the exact model can be best approximated by one of an infinite number of unknown PDFs from an n‐parameter exponential family, the upper bound of the divergence distance between any PDF from this family and the m‐parameter exponential family PDF defined by the ME method is derived. A measure of adequacy of the model defined by the ME method is thus provided.

Findings

These results may be used to establish confidence intervals on the estimate of a function of the random variable when the ME approach is employed. Additionally, it is shown that when working with large samples of independent observations, a probability density function (PDF) can be defined from an exponential family to model the uncertainty of the underlying system with measurable accuracy. Finally, a relationship with maximum likelihood estimation for this case is established.

Practical implications

The so‐called known moments problem addressed in this paper has a variety of applications in learning, blind equalization and neural networks.

Originality/value

An upper bound is derived for the error in approximating an unknown density function, f(x), by its ME estimate based on m moment constraints, obtained as a PDF p(x, α) from an m‐parameter exponential family. The error bound will help us decide if the number of moment constraints is adequate for modeling the uncertainty in the system under study. In turn, this allows one to establish confidence intervals on an estimate of some function of the random variable, X, given the known moments. It is also shown how, when working with a large sample of independent observations instead of precisely known moment constraints, a density from an exponential family can be defined to model the uncertainty of the underlying system with measurable accuracy. In this case, a relationship to ML estimation is established.

Details

Kybernetes, vol. 36 no. 1
Type: Research Article
ISSN: 0368-492X

Article
Publication date: 1 September 1995

Zuu‐Chang Hong, Ching Lin and Ming‐Hua Chen

Abstract

A transport equation for the one‐point velocity probability density function (pdf) of turbulence is derived, modelled and solved. The new pdf equation is obtained by two modeling steps. In the first step, a dynamic equation for the fluid elements is proposed in terms of the fluctuating part of the Navier‐Stokes equation. A transition probability density function (tpdf) is extracted from the modelled dynamic equation. Then the pdf equation of Fokker‐Planck type is obtained from the tpdf. In the second step, the Fokker‐Planck type pdf equation is modified by Lundgren’s formal pdf equation to ensure it can properly describe the intrinsic mechanism of turbulence. With the new pdf equation, the turbulent plane Couette flow is solved by the direct finite difference method coupled with dimensionality reduction and the QUICKER scheme. A simple boundary treatment is proposed so that the near‐wall solution is tractable and no refined grid is required. The calculated mean velocity, friction coefficient, and turbulence structure are in good agreement with available experimental data. In the region away from the center of the flow field, the contours of the isojoint pdf of V1 and V2 are very similar to those of the experimental result for channel flow. These agreements show the validity of the new pdf model and the applicability of the boundary treatment and QUICKER scheme for solving the turbulent plane Couette flow.

Details

International Journal of Numerical Methods for Heat & Fluid Flow, vol. 5 no. 9
Type: Research Article
ISSN: 0961-5539

Article
Publication date: 6 June 2018

Roland Erwin Suri and Mohamed El-Saad

Abstract

Purpose

Changes in file format specifications challenge long-term preservation of digital documents. Digital archives thus often focus on specific file formats that are well suited for long-term preservation, such as the PDF/A format. Since only a few customers submit PDF/A files, digital archives may consider converting submitted files to the PDF/A format. The paper aims to discuss these issues.

Design/methodology/approach

The authors evaluated three software tools for batch conversion of common file formats to PDF/A-1b: LuraTech PDF Compressor, Adobe Acrobat XI Pro and 3-Heights™ Document Converter by PDF Tools. The test set consisted of 80 files, with 10 files each of the eight file types JPEG, MS PowerPoint, PDF, PNG, MS Word, MS Excel, MSG and “web page.”

Findings

Batch processing was sometimes hindered by stops that required manual intervention. Depending on the software tool, three to four of these stops occurred during batch processing of the 80 test files. Furthermore, the conversion tools sometimes failed to produce output files even for supported file formats: three (Adobe Pro) up to seven (LuraTech and 3-Heights™) PDF/A-1b files were not produced. Since Adobe Pro does not convert e-mails, a total of 213 PDF/A-1b files were produced. The faithfulness of each conversion was investigated by comparing the visual appearance of the input document with that of the produced PDF/A-1b document on a computer screen. Meticulous visual inspection revealed that the conversion to PDF/A-1b impaired the information content in 24 of the converted 213 files (11 percent). These reproducibility errors included loss of links, loss of other document content (unreadable characters, missing text, document part missing), updated fields (reflecting time and folder of conversion), vector graphics issues and spelling errors.
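
The totals reported above can be cross-checked with a short sketch. Two readings of the abstract are assumed here and are not stated in it explicitly: that the unconverted e-mails are the 10 MSG test files, and that "three (Adobe Pro) up to seven" means 3 failures for Adobe Pro and 7 each for the other two tools. Under those assumptions the arithmetic reproduces the reported 213 files and 11 percent.

```python
# Figures from the abstract: 8 file types with 10 files each.
FILE_TYPES = 8
FILES_PER_TYPE = 10
test_files = FILE_TYPES * FILES_PER_TYPE  # 80 files per tool

# Assumption: Adobe Pro skips the 10 MSG (e-mail) files entirely.
attempted = {
    "LuraTech": test_files,
    "Adobe Pro": test_files - FILES_PER_TYPE,
    "3-Heights": test_files,
}

# Assumption: failures on supported formats are 3, 7 and 7 respectively.
failed = {"LuraTech": 7, "Adobe Pro": 3, "3-Heights": 7}

# Files actually produced, per tool and in total.
produced = {tool: attempted[tool] - failed[tool] for tool in attempted}
produced_total = sum(produced.values())

# Share of produced files whose information content was impaired.
impaired = 24
impaired_percent = 100 * impaired / produced_total

print(produced)                     # per-tool output counts
print(produced_total)               # 213
print(round(impaired_percent, 1))   # 11.3
```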

Originality/value

These results indicate that large-scale batch conversions of heterogeneous files to PDF/A-1b cause complex issues that need to be addressed for each individual file. Even with considerable efforts, some information loss seems unavoidable if large numbers of files from heterogeneous sources are migrated to the PDF/A-1b format.

Details

Library Hi Tech, vol. 39 no. 2
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 9 September 2014

Quan Lu, Gao Liu and Jing Chen

Abstract

Purpose

The purpose of this paper is to propose a novel approach to integrating a portable document format (PDF) interface into Java-based digital library applications. It bridges the gap between conducting content operation and viewing on PDF document asynchronously.

Design/methodology/approach

In this paper, the authors first review some related research and discuss PDF and its drawbacks. Next, the authors propose the design steps and implementation of three modes of displaying a PDF document: PDF display, image display and extensible markup language (XML) display. A comparison of these three modes has been carried out.

Findings

The authors find that the PDF display is able to completely present the original PDF document contents and thus obviously superior to the other two displays. In addition, the format specification of PDF-based e-book does not perform well; lack of standardization and complex structure is exposed to the publication.

Practical implications

The proposed approach makes viewing the PDF documents more convenient and effective, and can be used to retrieve and visualize the PDF documents and to support the personalized function customization of PDF in the digital library applications.

Originality/value

This paper proposes a novel approach to solve the problem between content operation and the view of PDF synchronously, providing users a new tool to retrieve and reuse the PDF documents. It contributes to improving the service specification and policy of viewing PDF documents in digital libraries. In addition, the personalized interface and public index make further development and application more feasible.

Details

Library Hi Tech, vol. 32 no. 3
Type: Research Article
ISSN: 0737-8831

Article
Publication date: 21 September 2015

Yan Han

Abstract

Purpose

The purpose of this paper is to introduce PDF/A to replace TIFF as the preferred file format for digitization of textual documents. In addition, PDF/A can be used as an open archival information system (OAIS) submission information package (SIP) container to reduce digitization and digital preservation costs.

Design/methodology/approach

The author first reviews the current digitization guidelines and the OAIS model and provides an overview of the development of PDF and PDF/A as international standards. A literature review of the uses of PDF/A is then presented. The author analyzes the pitfalls of TIFF as the preferred format for digitization and shows how to use PDF/A to code a digitization SIP.

Findings

The TIFF file format has been the preferred master file format in the Federal Agency Digitization Guidelines Initiative digitization guidelines for the past 20 years. However, the TIFF format has drawbacks. Literature reviews show that PDF/A has been the preferred standard for coding born-digital documents in the court, government and business sectors. PDF/A-2 and PDF/A-3 are relatively new standards released after 2010. However, few have understood the standards or utilized their full potential in digitization. The author shows that PDF/A can be used as an OAIS SIP container.

Practical implications

In order to deliver OAIS SIPs, current practices require a combination of files, directories and various types of metadata. The author shows that PDF/A (PDF/A-2 and/or PDF/A-3) can be a better file format for textual document digitization: various types of metadata can be coded in Extensible Metadata Platform (XMP), and arbitrary files or data can be coded in PDF/A-3. These features of PDF/A provide much better ways to deliver SIPs in a cost-efficient manner.

Originality/value

PDF/A has been recognized as the preferred standard for born-digital documents, but it has not been used as the preferred file format for digitized materials. The author recommends PDF/A with lossless JPX compression as the preferred file format, and PDF/A with lossless JPX compression along with metadata/data as the preferred OAIS SIP container. These uses reduce costs in digitization and digital preservation and also increase productivity. The author recommends updating national and international digitization practices to use PDF/A.

Details

Library Hi Tech, vol. 33 no. 3
Type: Research Article
ISSN: 0737-8831
