DocCharConvert

A Converter between different font encodings

Document Character Converter

DocCharConvert: Introduction

Document Character Converter is a tool for converting text between different encodings and different scripts.

It currently supports the following types of converters:

TECkit Converters [scripts.sil.org/TECkit]
Syllable Converters
External Programs

It currently allows conversion of the following file types:

Plain text files
OpenDocument files (the native OpenOffice format) [www.OpenOffice.org]
TeX files (limited support)

DocCharConvert: Installation and Configuration

Prerequisites

You must have Java 1.6 or later installed. You can get it from the Sun website; you only need the JRE.

To convert OpenOffice documents, it is recommended that you install OpenOffice from OpenOffice.org.

Starting DocCharConvert

If the installation went smoothly, you should be able to double-click the DocCharConvert Desktop icon or start it from the Programs Menu. The DocCharConvert Main Form should appear.

If this does not work, you can start it from a command line. If you are already in the DocCharConvert installation directory, then you may be able to just type:

eclipse

If you need to specify a different Java virtual machine you can use something like:

eclipse -jvm "C:\Program Files\Java\jre1.6.0_06\bin\java.exe"

You will need to adjust the paths as appropriate to your system.

Configuration

Choose Preferences from the Window menu.

You can change the Converter directory by browsing to the directory containing the converters and selecting one of the dccx files.

The built-in converters are in directories under plugins in the installation directory.

DocCharConvert: Getting Started

Converting some files

Open the Document Conversion Wizard from the File Menu or by clicking on its icon.

1. Choose whether to convert existing files or to type or paste the text directly. If you choose to convert existing files, select the File Mode, usually "Plain Text" or "OpenOffice".
2. Click Next.
3. Select the converter that you want to use and click Next. (The font choices are only used with OpenOffice files.)
4. If you chose to convert existing files, choose which files to convert:
   1. Browse to the file that you want to convert.
   2. Browse to the file where you want to save the conversion result.
   3. Add more files if necessary and click Next.
5. Set the encoding of the input and output files. If you are not sure what to use, Windows-1252 is a common encoding used by older fonts. More modern Unicode files probably use UTF-8 or UTF-16.
6. Click Finish.

If you want to convert the same set of files many times, you may want to save the list of files in a text file. You can do this by clicking the Save List button. You can then reload it with Load List. The list is saved in a simple text file format with one file pair per line:

"input file.txt" "output file.txt"
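If you want to generate or check such a list outside the wizard, the format above can be sketched in a few lines of Python. This is only an illustration of the one-pair-per-line layout shown above. The function names are hypothetical, and the sketch assumes that both paths are always wrapped in double quotes and never contain a double quote themselves.

```python
import re

# Assumption: each line holds exactly two double-quoted paths separated by whitespace.
PAIR_PATTERN = re.compile(r'^\s*"([^"]*)"\s+"([^"]*)"\s*$')


def format_pair(input_path, output_path):
    """Return one list line with both paths wrapped in double quotes."""
    return f'"{input_path}" "{output_path}"'


def parse_list(text):
    """Parse the text of a saved list into (input_path, output_path) tuples."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        match = PAIR_PATTERN.match(line)
        if match is None:
            raise ValueError(f"not a quoted file pair: {line!r}")
        pairs.append((match.group(1), match.group(2)))
    return pairs


if __name__ == "__main__":
    line = format_pair("input file.txt", "output file.txt")
    print(line)
    print(parse_list(line))
```

A regular expression is used instead of a shell-style splitter so that backslashes in Windows paths are left untouched.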

File Encoding

The file encoding refers to the standard that is used to map the raw bytes in a file into specific characters in a font. For correct results, you need to know the format of the files that you want to convert.

OpenOffice supports many document formats, so that may be the best choice for any non-text files. You will need to open the files in OpenOffice and save them in OpenDocument format before conversion. If you are using text files, then you need to know the encoding of the text in the files. The default on many versions of Windows is Windows-1252; however, UTF-8 or another Unicode format will probably be necessary for non-Latin languages. If you are converting from text in an old legacy font to Unicode, then you will probably want Windows-1252 for Input and UTF-8 for Output.
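To see why the encoding matters, the following Python sketch shows the same text stored as different bytes under Windows-1252 and UTF-8. It is only an illustration of file encodings in general and is not part of DocCharConvert. It does not perform the font conversion itself, which is the job of the selected converter. The function name transcode is hypothetical.

```python
def transcode(data, input_encoding="windows-1252", output_encoding="utf-8"):
    """Decode raw bytes with one encoding and re-encode them with another."""
    text = data.decode(input_encoding)
    return text.encode(output_encoding)


if __name__ == "__main__":
    raw = b"caf\xe9"  # "café" as stored in a Windows-1252 file

    # Read with the correct encoding, the byte 0xE9 becomes the character é.
    print(raw.decode("windows-1252"))

    # In UTF-8 the same character needs two bytes (0xC3 0xA9).
    print(transcode(raw))

    # Read with the wrong encoding, the same bytes are not valid text.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as error:
        print("not valid UTF-8:", error.reason)
```

If the Input encoding is set incorrectly, the converter receives the wrong characters, so the output will be wrong even when the right converter is selected.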

DocCharConvert: Command Line

Command Line Usage

If you are running lots of conversions on a regular basis you may want to use a command line version of the tool.

Make sure that the correct version of Java is in your path. Change to the DocCharConvert directory, which has plugins as a subdirectory. You can then run the command:

java -cp plugins/org.thanlwinsoft.doccharconvert_1.0.0.jar org.thanlwinsoft.doccharconvert.CommandLine

On Windows you can run the DocCharConvert.bat script from a Windows Command Console.

cd C:\Program Files\ThanLwinSoft.org\DocCharConvert
DocCharConvert.bat

You can see the command line options with the --help option:

Using config dir:C:\Program Files\ThanLwinSoft.org\DocCharConvert
Arguments: [-i iEnc] [-o oEnc] [-r] converter.dccx mode 
           [-f list]|[inputFile outputFile]
           [--converters ConvertersPath]
Modes:
	0	Plain Text
	1	OpenOffice
	2	TeX
	3	OpenDocument
Optional Arguments:
	--help display this help
	-r use the converter in reverse mode
	-i iEnc = input encoding e.g. -i iso-8859-1 (default UTF-8)
	-o oEnc = output encoding e.g. -o iso-8859-1 (default UTF-8)
	-f fileList = file containing list input output files
	--converters path = change the default Converters dir to path
Please choose from one of the following converters:
	Academy.dccx
	AcademyExt.dccx
	AcademyPipe.dccx
	IwinMedium.dccx
	Winnwa.dccx
	Wwin_burmese.dccx
	MyanmarUni4ToUni5.dccx
	WinnwaUTN11.dccx

The list of converters may vary according to what is installed on your system.

For example, to convert a text file in WinInnwa to Myanmar Unicode, you would type something like:

DocCharConvert.bat -i windows-1252 Winnwa.dccx 0 wininnwa.txt myUni.txt
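If you run many such conversions from a script, the argument list can be assembled programmatically. The Python sketch below is one possible way to do this, using only the options and mode numbers listed in the help output above. The names build_command, MODES and launcher are hypothetical, and the sketch assumes that the script is run from the DocCharConvert installation directory.

```python
# Mode numbers as listed in the --help output.
MODES = {"Plain Text": 0, "OpenOffice": 1, "TeX": 2, "OpenDocument": 3}


def build_command(converter, mode, input_file, output_file,
                  input_encoding=None, output_encoding=None,
                  reverse=False, launcher="DocCharConvert.bat"):
    """Build the argument list for one conversion.

    The order follows the usage line:
    [-i iEnc] [-o oEnc] [-r] converter.dccx mode inputFile outputFile
    """
    args = [launcher]
    if input_encoding:
        args += ["-i", input_encoding]
    if output_encoding:
        args += ["-o", output_encoding]
    if reverse:
        args.append("-r")
    args += [converter, str(MODES[mode]), input_file, output_file]
    return args


if __name__ == "__main__":
    command = build_command("Winnwa.dccx", "Plain Text",
                            "wininnwa.txt", "myUni.txt",
                            input_encoding="windows-1252")
    # Prints the same command as the example above.
    print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to start the conversion. Keeping the arguments in a list avoids quoting problems with file names that contain spaces.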

သံ‌လွင် Soft