Data Conversion Area

The Data Conversion Area contains services that convert data from one domain into data of another domain. The services handle either metadata content in XML format, converted from one format into another (i.e., XML schema to XML schema), or generic file content, e.g., images converted to thumbnails.
Metadata records can undergo structural mappings or semantic mappings (e.g., from an input vocabulary to an output vocabulary). Such transformations can be one-to-one, where an XML record is transformed into another XML record; one-to-many, where an XML record is transformed into a number of XML records; or many-to-one, where a number of XML records are combined into one.
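As a rough illustration of the two transformation shapes described above, the following sketch uses only Python's standard library. It is not the services' actual implementation: the element names (record, item, language), the VOCABULARY table, and both function names are hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical vocabulary mapping (input term -> output term).
VOCABULARY = {"eng": "English", "ita": "Italian"}


def map_vocabulary(record_xml: str) -> str:
    """One-to-one: return the record with its <language> values remapped."""
    root = ET.fromstring(record_xml)
    for elem in root.iter("language"):
        term = (elem.text or "").strip()
        # Terms missing from the vocabulary pass through unchanged.
        elem.text = VOCABULARY.get(term, term)
    return ET.tostring(root, encoding="unicode")


def split_record(record_xml: str) -> List[str]:
    """One-to-many: emit one <record> for each <item> child of the input."""
    root = ET.fromstring(record_xml)
    outputs = []
    for item in root.findall("item"):
        record = ET.Element("record")
        record.append(item)
        outputs.append(ET.tostring(record, encoding="unicode"))
    return outputs
```

For example, split_record applied to a record holding two item elements returns a list of two serialized records, one per item.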
More specifically, the services are:

  • Metadata Transformation service. It is used to convert one XML file into another XML file by following a number of transformation rules (which can be specified through a user interface).
  • Metadata Cleaner service. It is used to convert the values of XML elements from one format to another (e.g., data format conversion) or from one vocabulary to another.
  • Metadata Unpackaging service. It is used to apply XSLT scripts to input XML records in order to obtain a set of (possibly interlinked) XML records. The service is also capable of the reverse operation, packaging a set of XML records into one XML record.
  • Feature Extraction service. It is used to process a list of incoming files (provided as a list of URLs) with a given algorithm, producing one output file for each processed file. The service is designed to be extended with new algorithms when required. Examples of processing algorithms are: extraction of text from PDF files, language detection from text files, extraction of thumbnails from images, and extraction of text from images (OCR).
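
The extensibility described for the Feature Extraction service can be pictured with a small plug-in registry. This is a minimal sketch under stated assumptions, not the service's real interface: the ALGORITHMS registry, the algorithm decorator, the toy word-count algorithm, and the extract_features function are all hypothetical names, and the fetch parameter stands in for whatever mechanism retrieves the content behind a URL.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical registry: algorithm name -> function from input bytes to output bytes.
ALGORITHMS: Dict[str, Callable[[bytes], bytes]] = {}


def algorithm(name: str):
    """Decorator that registers a processing algorithm under the given name."""
    def register(func: Callable[[bytes], bytes]) -> Callable[[bytes], bytes]:
        ALGORITHMS[name] = func
        return func
    return register


@algorithm("word-count")
def word_count(content: bytes) -> bytes:
    """Toy algorithm standing in for, e.g., text extraction or language detection."""
    return str(len(content.split())).encode("ascii")


def extract_features(
    urls: Iterable[str],
    algorithm_name: str,
    fetch: Callable[[str], bytes],
) -> List[Tuple[str, bytes]]:
    """Run one algorithm over every URL and return one output per input."""
    run = ALGORITHMS[algorithm_name]
    return [(url, run(fetch(url))) for url in urls]
```

In this sketch a new algorithm is added by decorating one more function, without touching extract_features. A caller could pass a fetch function built on urllib.request.urlopen to retrieve real files, or a dictionary lookup for testing.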