Document Character Converter is a tool to convert between different encoding formats and different scripts.
It currently supports the following types of converters:
It currently allows conversion of the following file types:
You must have Java 1.6 or later installed. You can get it from the Sun website - you only need the JRE.
For conversion of OpenOffice documents, it is recommended to have OpenOffice installed from OpenOffice.org.
If the installation went smoothly, then you should be able to just double click on the DocCharConvert Desktop icon or start it from the Programs Menu. The DocCharConvert Main Form should appear.
If this does not work, you can start it from a command line. If you are already in the DocCharConvert installation directory, then you may be able to just type:
eclipse
If you need to specify a different Java virtual machine you can use something like:
eclipse -jvm "C:\Program Files\Java\jre1.6.0_06\bin\java.exe"
You will need to adjust the paths as appropriate to your system.
Choose Preferences from the Window menu.
You can change the Converter directory by browsing to the directory containing the converters and selecting one of the dccx files.
The built in converters will be in directories under plugins in the installation directory.
Open the Document Conversion Wizard from the File Menu or by clicking on its icon.
If you want to convert the same set of files many times, you may want to save the list of files in a text file. You can do this by clicking the Save List button. You can then reload it with Load List. The list is saved in a simple text file format with one file pair per line:
"input file.txt" "output file.txt"
The file encoding refers to the standard that is used to map the raw bytes in a file into specific characters in a font. For correct results, you need to know the format of the files that you want to convert.
OpenOffice supports many document formats, so that may be the best choice for any non-text files. You will need to open the files in OpenOffice and save them in OpenDocument format before conversion. If you are using text files, then you need to know the encoding of the text in the files. The default on many versions of Windows is windows-1252, but UTF-8 or another Unicode format will probably be necessary for non-Latin languages. If you are converting from text in an old legacy font to Unicode, then you will probably want windows-1252 for Input and UTF-8 for Output.
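If you are not sure which encoding a text file uses, a quick check outside DocCharConvert can help. The following is a minimal Python sketch, not part of the tool: the function name looks_like_utf8 is made up for illustration, and the test only tells you whether the bytes are valid UTF-8, not which legacy encoding they use otherwise.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes are valid UTF-8 (hypothetical helper).

    Pure ASCII text passes this check too, because ASCII is a subset of
    both UTF-8 and windows-1252.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def read_sample(path: str, size: int = 4096) -> bytes:
    """Read the first `size` bytes of a file for inspection.

    Note: cutting a UTF-8 file at an arbitrary byte can split a multi-byte
    character, so for a reliable answer read the whole file instead.
    """
    with open(path, "rb") as handle:
        return handle.read(size)


# The same bytes mean different things under different encodings:
# 0xE9 is a complete character in windows-1252, but in UTF-8 it is only
# the first byte of a three-byte sequence, so on its own it is invalid.
legacy_bytes = b"caf\xe9"
legacy_is_utf8 = looks_like_utf8(legacy_bytes)          # False
legacy_text = legacy_bytes.decode("windows-1252")       # 'caf' + U+00E9
```

A file written with a legacy font will usually fail this check as soon as it contains a byte above 0x7F, which suggests windows-1252 as the Input encoding.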
If you are running lots of conversions on a regular basis you may want to use a command line version of the tool.
Make sure that the correct version of Java is in your path. Change directory to the DocCharConvert directory which has plugins as a subdirectory. You can then run the command:
java -cp plugins/org.thanlwinsoft.doccharconvert_1.0.0.jar org.thanlwinsoft.doccharconvert.CommandLine
On Windows you can run the DocCharConvert.bat script from a Windows Command Console.
cd C:\Program Files\ThanLwinSoft.org\DocCharConvert
DocCharConvert.bat
You can see the command line options with the --help option:
Using config dir:C:\Program Files\ThanLwinSoft.org\DocCharConvert
Arguments: [-i iEnc] [-o oEnc] [-r] converter.dccx mode [-f list]|[inputFile outputFile] [--converters ConvertersPath]
Modes:
  0 Plain Text
  1 OpenOffice
  2 TeX
  3 OpenDocument
Optional Arguments:
  --help display this help
  -r use the converter in reverse mode
  -i iEnc = input encoding e.g. -i iso-8859-1 (default UTF-8)
  -o oEnc = output encoding e.g. -o iso-8859-1 (default UTF-8)
  -f fileList = file containing list input output files
  --converters path = change the default Converters dir to path
Please choose from one of the following converters:
  Academy.dccx
  AcademyExt.dccx
  AcademyPipe.dccx
  IwinMedium.dccx
  Winnwa.dccx
  Wwin_burmese.dccx
  MyanmarUni4ToUni5.dccx
  WinnwaUTN11.dccx
The list of converters may vary according to what is installed on your system.
For example, to convert a text file in WinInnwa to Myanmar Unicode, you would type something like:
DocCharConvert.bat -i windows-1252 Winnwa.dccx 0 wininnwa.txt myUni.txt
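For regular batch runs it can be convenient to build the command from a script. The following Python sketch assembles the arguments in the order given by the --help output and writes a file list with one quoted pair per line. The names write_file_list and build_command are hypothetical, and two things are assumptions rather than documented behaviour: that the -f option accepts the same format as the Save List button, and that the list file may be written as UTF-8.

```python
from pathlib import Path
from typing import Iterable, List, Optional, Sequence, Tuple


def write_file_list(pairs: Iterable[Tuple[str, str]], list_path: str) -> None:
    """Write one quoted "input" "output" pair per line (hypothetical helper).

    Assumption: UTF-8 is an acceptable encoding for the list file.
    """
    lines = ['"{}" "{}"'.format(src, dst) for src, dst in pairs]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def build_command(
    converter: str,
    mode: int,
    files: Optional[Sequence[str]] = None,
    file_list: Optional[str] = None,
    input_encoding: Optional[str] = None,
    output_encoding: Optional[str] = None,
    reverse: bool = False,
    launcher: str = "DocCharConvert.bat",
) -> List[str]:
    """Assemble the argument list following the usage line:

    [-i iEnc] [-o oEnc] [-r] converter.dccx mode [-f list]|[inputFile outputFile]
    """
    # Exactly one of the two input styles must be given.
    if (files is None) == (file_list is None):
        raise ValueError("give exactly one of files or file_list")
    command = [launcher]
    if input_encoding:
        command += ["-i", input_encoding]
    if output_encoding:
        command += ["-o", output_encoding]
    if reverse:
        command.append("-r")
    command += [converter, str(mode)]
    if file_list is not None:
        command += ["-f", file_list]
    else:
        if len(files) != 2:
            raise ValueError("files must be (inputFile, outputFile)")
        command += list(files)
    return command


# Rebuilds the WinInnwa example above as an argument list.
example = build_command(
    "Winnwa.dccx",
    0,
    files=("wininnwa.txt", "myUni.txt"),
    input_encoding="windows-1252",
)
```

The resulting list could then be passed to Python's subprocess.run from the DocCharConvert directory; the sketch stops short of running anything so that it can be checked without the tool installed.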
Some of the data is converted correctly, but other data isn't. This probably means that the encoding specified for either the Input or Output files is wrong. Check the original source of the data and the documentation for the specific converter that you are using. Internally, the text is all converted to Unicode before it is processed. If the encoding that you are using has codes that cannot be translated to Unicode, then these may fail to be converted correctly. Old files created with legacy pre-Unicode fonts should probably be converted as windows-1252 with the Output set to UTF-8.
Some legacy fonts use code points that are undefined in Windows-1252. In this case you may want to try the RawBytes encoding.
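To see whether a file is affected, you can look for bytes that have no windows-1252 mapping. The sketch below is illustrative only: undefined_cp1252_bytes is a made-up name, and it relies on Python's cp1252 table, which may not treat these bytes exactly as the Java decoder used by DocCharConvert does. It does not show what the RawBytes encoding does internally.

```python
from typing import List


def undefined_cp1252_bytes(data: bytes) -> List[int]:
    """Return the distinct byte values in `data` that Python's cp1252
    codec cannot decode, in ascending order (hypothetical helper)."""
    undefined = set()
    for value in set(data):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            undefined.add(value)
    return sorted(undefined)


def file_has_undefined_bytes(path: str) -> bool:
    """Check a whole file for bytes with no windows-1252 mapping."""
    with open(path, "rb") as handle:
        return bool(undefined_cp1252_bytes(handle.read()))
```

If the check reports any such bytes in a legacy-font file, that is a hint that RawBytes may be worth trying for the Input encoding.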
For other problems and bugs please send an email to develNO JUNK@thanlwinsoft.org. Please try to be as explicit as possible in describing your problem. If it is a case of incorrect conversion, then it is very hard to diagnose the problem unless I can reproduce it. If possible, please send some example files (though not too big please!) that illustrate the problem.
For more information contact ThanLwinSoft.org