Difference between revisions of "Ebook Conversion"

From Blue-IT.org Wiki

[[#Changelog|Changelog here]]

= A word before we start =
I bought an eBook reader last Christmas (2011), a [[Ebook_Reader#PocketBook Pro 912|PocketBook Pro 912]] with an open source Linux operating system and firmware and an e-Ink display.

By the way:

* this article is based on the prerequisites of working with an e-book reader with an [[Ebook_Reader#A_word_in_advance_about_e-ink_readers|e-Ink display]]
* the tips might be applicable to other devices, but could become obsolete with future, better display technologies
* this article uses Linux (Ubuntu 11.10) and a lot of bash shell scripting
  
 
I didn't know how much work it would cost me to dive deep into this topic. The e-Ink technology was what caught my interest. What I didn't know: this technology also demands a special way of preparing your documents. You cannot simply throw a PDF at the device and expect a readable result.
  
=== Why PDF is not a good choice for e-reader devices ===

Years ago, when I began dealing with LaTeX - my preferred text processing method - I was already aware of the problem that the information about the semantics of the text in a PDF document is extremely low, if it exists at all.

Speaking in technical terms, one would say the ''entropy'' of a PDF document is - compared to HTML, SGML, XML, or even LaTeX - very low. What does that mean in practice? In a PDF document the computer - not you - can no longer distinguish whether a text passage is plain text, which format it has, whether it is a header, and so on. We can do this very well with our brains; a computer has to use e.g. an OCR program, scan the PDF, and make a "guess" with certain algorithms. In HTML, the markup (the "M" in HTML) assigns a header a "header" tag, and therefore the header '''is''' and remains a header, no matter what page format or font size you apply to it!
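A minimal sketch makes this concrete (the file name and headings are made up): in HTML a program can recover the document structure by its tags alone, with a one-liner.

```shell
# Build a tiny HTML page, then extract its headings by markup alone.
# In a PDF there is no equivalent tag to search for - a heading is
# just larger text positioned on the page.
cat > /tmp/sample.html <<'EOF'
<html><body>
<h1>Chapter 1</h1>
<p>Some text.</p>
<h1>Chapter 2</h1>
</body></html>
EOF

# Every level-1 heading, found without any guessing:
grep -o '<h1>[^<]*</h1>' /tmp/sample.html | sed -e 's/<[^>]*>//g'
# -> Chapter 1
#    Chapter 2
```

The same two chapters in a PDF would be nothing but larger glyphs at certain coordinates; there is nothing comparable to grep for.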
PDF is an excellent ''print'' medium but not (!) a format that should be used for '''reading''' documents on a computer display. Why do you think the internet is built on the markup language HTML?

Reasons why you might want to convert your documents:

* you cannot alter the font size in a PDF
* a program cannot easily extract the text flow (split page layout)
* on a reader with an e-Ink display, it might be impossible to read the document, because the fonts are too small or not of an ideal shape (sans serif)

How we would like to do the e-book conversion:

* unattended
* batch-like
* automatically
* repeatedly
* with open source software ;)
  
Most of what you read here I found in the excellent [http://www.mobileread.com/forums/forumdisplay.php?f=177 mobile read forums - here about E-Book formats].
  
 
= Test it =
 
If you like to test, you should have the following documents at hand:

# Best a LaTeX or LyX document in a separate directory with
#* titlepage
#* table of content
#* footnotes
#* pictures and all that other fancy stuff ;)
# Convert this document to HTML, RTF, PDF
# Test all the conversion programs
# Test your reader software, if it can show
#* titlepage
#* table of content
#* footnotes
#* pictures and all that other fancy stuff ;)

If you like, you can use this LyX file: [[File:Lyx whysiwym editor.zip]]. Be aware that you need LyX for this file.
  
 
= The process =
 
== PDF ==

There are mainly three points of interest when it comes to reading a PDF on an e-Ink device:

# Keep all metainformation - like the table of content - after conversion
# Cut away as much of the unnecessary space as possible (title, white borders, ...)
# Batch processing
 
=== Cropping the pages is the most sophisticated solution ===
 
Remember ... we are talking about e-Ink devices! We want to get as much visible text as possible, without white page margins.

==== Briss ====

I tried different approaches to crop the pages of my PDFs, from the commercial Acrobat Writer, over imagemagick and other "crop" tools. After all, and keeping the three points from above in mind, the only tool I can recommend is [http://sourceforge.net/projects/briss briss].
  
==== Wrapper for the java program briss ====

Download and extract it somewhere. Briss is written in Java, so to use it like any other program you should write a little wrapper and put it in your path (e.g. ~/bin, /usr/local/bin, /usr/bin):

 $> sudo youreditor /usr/local/bin/briss

  #!/bin/bash
  version="0.0.13"
  java -jar /whereever/is/installed/briss-${version}/briss-${version}.jar "${@}"

 $> sudo chmod 755 /usr/local/bin/briss
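The two manual steps above can also be scripted. A sketch, with assumptions: the wrapper goes to ~/bin (which must be in your $PATH), and the jar is assumed to live under /usr/local/share/briss-0.0.13 - adjust both to your installation.

```shell
# Generate the briss wrapper and make it executable in one go.
# Assumptions: ~/bin is in $PATH; adjust the jar path to where
# you actually unpacked briss.
WRAPPER="${HOME}/bin/briss"
mkdir -p "$(dirname "${WRAPPER}")"

cat > "${WRAPPER}" <<'EOF'
#!/bin/bash
version="0.0.13"
java -jar /usr/local/share/briss-${version}/briss-${version}.jar "${@}"
EOF

chmod 755 "${WRAPPER}"
```

Afterwards a plain `briss mybook.pdf` works from any directory.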
 
==== Nautilus script for briss ====

Wouldn't it be nice to open a PDF or even a symbolic link in nautilus?

Here we go: put this file into

 gedit ~/.gnome2/nautilus-scripts/"Open with briss ..."

and make it executable:

 chmod +x ~/.gnome2/nautilus-scripts/"Open with briss ..."

The following script works in most environments, but NOT with an sftp-mounted filesystem (see the next script):

 #!/bin/bash
 
 cd "$NAUTILUS_SCRIPT_CURRENT_URI"
 
 MYTYPE="pdf"
 MYFILE="${1}"
 MYBASENAME="$(basename "${MYFILE}" .${MYTYPE})"
 
 if file -L "${MYFILE}" | grep "PDF document"
 then
     briss "${MYFILE}"
 else
     zenity --info --title "Error" --text "${MYFILE} seems not to be a file of type ${MYTYPE}. Please check."
     exit 0
 fi

But if you are mounting the Pocketbook like me via sftp, you will run into trouble. [https://help.ubuntu.com/community/EnvironmentVariables#System-wide_environment_variables Nautilus scripts don't work in this environment] (scroll down to "Gnome-specific variables")!
What we have to do:

* Create a symbolic link so that you can access the mountpoint directly, e.g. with

 ln -s ~/".gvfs/SFTP - Pocketbook" ~/MyPocketbook

* When executing the script: warn the user if an sftp-mounted directory is detected, and abort
* When executing the script: set the proper path variable

And this is how it works when you access your device via sftp and are inside the ~/.gvfs dir or a link to it:
 #!/bin/bash
 # version 0.0.1, 12-05-2012, Axel Pospischil, http://blue-it.org
 
 MYTYPE="pdf"
 MYFILE="${1}"
 
 MYPATH="${NAUTILUS_SCRIPT_CURRENT_URI}"
 
 if echo "${MYPATH}" | grep -q "^sftp:"
 then
     # SFTP (sftp://) - refuse to run, see above
     zenity --error --title "Error" --text "${MYPATH}\n\nYou are trying to run this script within a directory mounted via sftp.\n\nThis will not work.\n\nPlease create a symbolic link, e.g. with\n\nln -s ~/.gvfs ~/GVFS\n\nThen access the device through this newly created link and - if you like - create a shortcut in nautilus!"
     exit 1
 else
     if echo "${MYPATH}" | grep -q "^file:"
     then
         # Symbolic link (file:///): strip the scheme, decode spaces
         MYPATH="$(echo "${NAUTILUS_SCRIPT_CURRENT_URI}" | sed -e 's/^file:\/\///' -e 's/%20/ /g')"
     else
         # Standard (local dir)
         MYPATH="${NAUTILUS_SCRIPT_CURRENT_URI}"
     fi
 fi
 
 cd "${MYPATH}"
 
 MYBASENAME="$(echo "${MYFILE}" | sed -e 's/\.pdf$//')"
 [ "${MYBASENAME}" ] || exit 1
 
 if file -L "${MYFILE}" | grep "PDF document"
 then
     briss "${MYFILE}"
 
     # remove the cropped file if it came out empty
     FILESIZE=$(ls -l "${MYBASENAME}_cropped.${MYTYPE}" | awk '{ print $5 }')
     [ "${FILESIZE}" -eq 0 ] && rm -f "${MYBASENAME}_cropped.${MYTYPE}"
 else
     zenity --info --title "Error" --text "${MYFILE} seems not to be a file of type ${MYTYPE}. Please check."
     exit 1
 fi
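The heart of the path handling above is converting Nautilus' file:// URI into an ordinary path. A quick, self-contained demonstration (the sample URI is made up):

```shell
# Nautilus hands scripts a URI such as file:///home/user/My%20Books;
# strip the scheme and decode the %20 escapes to get a usable path.
URI="file:///home/user/My%20Books"
MYPATH="$(echo "${URI}" | sed -e 's/^file:\/\///' -e 's/%20/ /g')"
echo "${MYPATH}"
# -> /home/user/My Books
```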
==== Unattended batch conversion with briss ====
  
 
This way prepared, I wrote a batch conversion program. The main problem when writing such a conversion script is that many ebook titles contain characters like "[" or "&". This is something the bash does not like at all! In short: I know this script has a lot of duplicate code in it. But believe me when I say: I tried more than once to change this.

  #!/bin/bash
  # version 0.0.1, 12-03-2012, Axel Pospischil, http://blue-it.org
  # version 0.0.2
  #    - added parenthesis when doing filetype check for single-mode: file "${1}"
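As a footnote to the quoting trouble: one way to keep titles with "[", "&" or spaces intact is to let find hand the names over null-delimited and to quote every expansion. A minimal sketch - the echo is a stand-in for the real conversion call:

```shell
# Create two demo files whose names contain bash-hostile characters.
DEMO="$(mktemp -d)"
touch "${DEMO}/Bash [Pocket] Guide.pdf" "${DEMO}/C & C++.pdf"

# -print0 and read -d '' pass each name through unmangled;
# replace the echo with the real conversion command.
find "${DEMO}" -name '*.pdf' -print0 |
while IFS= read -r -d '' f
do
    echo "converting: ${f}"
done
```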
  
 
=== Contrast enhancement ===

Briss is doing a very good job cropping the documents. And when it comes to the Adobe reader, I think the contrast of the files is much better after cropping with briss.

But e-Ink devices normally can only show 16 grayscale "colors". So it would be handy to convert a document to grayscale and thereby enhance the contrast ;)

The solutions:

The problems:

# ''imagemagick''
## does not preserve the metacontent of our PDF ('''you lose the table of content'''!)
## the document is not searchable any more, and TTS does not work any more
## the document becomes significantly bigger
## the result on my [[Ebook_Reader#PocketBook Pro 912|PocketBook Pro 912]] is not what I expected when it comes to quality and contrast enhancement. The PDFs seem to be not that crisp and clear.
# reader software
## I did not find any software that satisfied me
## There is mainly one that can handle DJVUs: convert the pdf to djvu and use a fork of ''djviewer'', called ''djviewer-bw''; you will find it when you search the http://www.mobileread.com forum. Mainly [http://www.mobileread.com/forums/showthread.php?t=120200 you should read and post in this thread]. The modified software has 3 levels for viewing documents: black and white, grayscale and normal. You can choose these either via the quickmenu or by clicking in the upper left, bottom left or bottom right area of the reader.
## Just to note: the reader software ''coolreader'' can not (!) display PDF files.

Do it with Imagemagick:

  convert -density 600 -contrast -gamma 0.1 -colorspace GRAY input.pdf output.pdf
  
scriptified ;)

 #!/bin/bash
 MYFILE="${1}"
 MYTYPE="pdf"
 MYCONVTYPE="pdf"
 TMPDIR="/tmp"
 
 if file "${MYFILE}" | grep "PDF document"
 then
     MYBASENAME="$(basename "${MYFILE}" .${MYTYPE})"
     MYCONVNAME="${MYBASENAME}-gray.${MYCONVTYPE}"
 
     if convert -density 600 -contrast -gamma 0.1 -colorspace GRAY "${MYFILE}" "${TMPDIR}/${MYCONVNAME}"
     then
         # remove an empty result, otherwise move it into the current dir
         FILESIZE=$(ls -l "${TMPDIR}/${MYCONVNAME}" | awk '{ print $5 }')
         if [ "${FILESIZE}" = "0" ]
         then
             rm -f "${TMPDIR}/${MYCONVNAME}"
         else
             mv "${TMPDIR}/${MYCONVNAME}" .
         fi
     else
         echo "ERROR converting the pdf."
     fi
 else
     echo "Wrong format. Please use a PDF file."
     exit 1
 fi
=== A word on djvu and rescanning of PDF ===
 
DJVU is a very good format for keeping your scanned documents. It is NOT a good format for reading text on an e-Ink device. It has the same disadvantages as PDF.

There are a lot of converters out there. Mainly:

* pdf2djvu (contained in any linux distribution). Keeps all the metainformation - including the table of content - of the pdf!
* [http://code.google.com/p/pdf2djvu/wiki/DjVuDigital djvudigital], which uses ghostscript. Because there are licence issues, you have to compile it yourself, which isn't very much fun.

In the section about [[#Contrast enhancement|contrast enhancement]] (see problem no. 2.2) there is a link to a djviewer fork that has a contrast-enhanced viewing mode (mainly a black and white mode) for djvu files. But I could not see a big difference compared to a well cropped pdf. This depends heavily on the kind of pdf you have.

If you are interested, please read the corresponding webpages or the online manuals ;)
  
 
== From LaTeX to PDF or HTML ==
 
  
=== A word on MHT ===

There would be an ideal solution for archiving webpages in one single file: the mht format. There are plugins for firefox to view these files.

Unfortunately none of the existing readers on the Pocketbook is able to read mht files. One exception: the coolreader, but there are errors displaying complex files and also no "table of content".

Probably future software versions can handle this format.

Articles about using and creating mht files:

* [http://ubuntumanual.org/posts/444/how-to-save-web-pages-as-an-mht-file-in-ubuntu MHT - How to save web pages as an mht file in Ubuntu. | Ubuntu Manual]
* [https://addons.mozilla.org/en-US/firefox/addon/unmht/ Firefox addon UnMHT]
* [http://baohaojun.wordpress.com/2011/11/26/html2mht-pl-and-dir2html-pl batch convert html to mht with html2mht.pl and dir2html.pl]
* [http://html2mht.sourceforge.net SourceForge.net: HTML to MHT converter]
  
 
== From PDF to epub, reflow and rescanning of PDF ==
 
* [http://www.willus.com/archive pdfr] from the same author. Almost like k2pdfopt
  
== From HTML to epub or mobi (or htmlz) ==

[Update --[[User:Apos|Apos]] 08:30, 26 February 2012 (CET)] '''Amazon is discontinuing the mobipocket format.''' Although there are and will be a lot of books in mobi format and kindle readers will support it, there is no guarantee that this proprietary format can be read on future devices of other vendors!
[Update End]

One of the most sophisticated formats when it comes to eBooks is '''epub'''.

'''HTML is - from my point of view - the best starting point for conversion.''' As described above, it can easily be created using LaTeX or even other word processors. There are a lot of tools out there converting from e.g. HTML. But almost none is capable of keeping the "table of content". And this is - when it comes to ereading - one of the most important parts.

So here are the candidates:

# The free tool [[#Kindlegen|Kindlegen]] from amazon
# The well known [[#Calibre|Calibre]] cross platform software
# Free for private use is [[#eCub|eCub]], a simple version of the next tool
# The shareware [[#Jutoh|Jutoh]], which you can test for demonstration purposes
  
=== How do I get my html page(s) ===

A good question. Normally you would just download the html to your desktop. Every browser will do that.

The ultimate method for me is the [https://addons.mozilla.org/de/firefox/addon/scrapbook/ scrapbook extension for firefox]. With this extension you can recursively (!) download - even password protected - pages locally to your computer. This works best if you simply choose a page with all the links you like to access as a master table of content (toc).

The entry point is then a page "index.html" in the scrapbook directory you specified. But that doesn't matter, because you will have that link in your browser when you open the page with it.

==== Tidy up your HTML ====

Before you go any further, you should make sure you are working with a so called '''"well formed" html document'''. You can do this by using the software [http://tidy.sourceforge.net tidy]:

  tidy -m -asxhtml -utf8 <yourfile>.html

Most pages, though, should be "clean", and you would not need to tidy them up.

You can also use the free online [http://services.w3.org/tidy/tidy tidy service]. But be aware that sending confidential content to a web service might not be a good idea!

If you are on windows, [http://tidybatchfiles.info this site might be of interest for you].
=== Calibre ===

[http://calibre-ebook.com/ Calibre] does a good job, but also has its limitations. HTML documents that are not too complex can be converted without hassle into a vast variety of formats.

You can either use the graphical user interface of calibre or the batch commandline program [http://manual.calibre-ebook.com/cli/ebook-convert.html ebook-convert], which can be fine-tuned [http://manual.calibre-ebook.com/cli/ebook-convert-4.html#html-input-to-epub-output in various ways].
==== Scrapbook and Calibre - a dream team ====

I have had some very good experiences with the combination of the [[#How do I get my html page(s)|firefox extension scrapbook]], Calibre and [http://wiki.mobileread.com/wiki/Ereader_program_Coolreader Coolreader] or fbreader (particularly fbreader180).

I download my page collection with scrapbook, open the result in my browser, copy the link (something like "file:///path/to/index.html") into the import dialog of Calibre, import it, edit the metadata and export it to the epub format.

This works pretty well, and Coolreader also displays graphics very well (which neither Fbreader - exception: fbreader180 - nor Adobe Reader does).

You will also get a toc if one exists; and since you downloaded an entry page with scrapbook, you always have a good starting point by setting a bookmark on the first page.
  
 
=== Kindlegen ===
 
'''[Update February 2012]''' The "mobipocket" format is going to die! This is - for me - another chapter in the discussion about using DRM protected eBooks. People: if at all possible, don't use them. Use ePub wherever available!

[http://www.amazon.com/gp/feature.html?ie=UTF8&docId=1000234621 Kindlegen] is cross-platform and does a very good job. I have nothing to complain about. It produces files in the "*.mobi" format.

The only problem I have:

* the "table of content" goes away!

==== The exotic htmlz format ====

Thereby I found the zip-compressed HTML format "HTMLZ". I had never come across it before. Searching for software that can read this format, I found [http://coolreader.org coolreader] to be the only candidate. I like the idea of simply compressing the HTML file with all its folders and then just starting over ;) Without dealing with MHT. So I gave it a try.

Even complex pages seem to open, although it seems you lose the images (they won't show up). But the formatting (e.g. source code, tables, ...) is excellent. Big documents take a lot of time to load on the [[Ebook_Reader#PocketBook_Pro_912|Pocketbook device]], but nevertheless they will.

Navigation is not easy because of the lack of a "table of content". I converted an open book, a 10 MB zipped HTML document from [http://www.galileocomputing.de/katalog/openbook/ galileocomputing.de], which resulted in a 2000 (!) page, 25 MB HTMLZ file. Things are very slow altogether!

My recommendation: for smaller websites, if coolreader is available for the device.
  
 
=== eCub ===
 
[http://www.juliansmart.com/ecub eCub] did not pass the "table of content" test, but was quite handy. It is only free for private usage and has the same limitations as [[#Jutoh|Jutoh]]. It seems to be a simpler version of that program.

For this and the next program, and for more complex documents, it is important to edit the settings and add a correct css file. Why it is not possible to use the style links in the HTML document is a mystery to me.
  
 
=== Jutoh ===
 
[http://www.jutoh.com Jutoh] is not free software. But it is a full-featured editor for ebook generation.
  
 
Advantages:
 
 
Disadvantages:
 
* Project based. So you first have to import a document via the GUI into a project file.
* This is a GUI tool, not a commandline tool. But it has a batch mode, if you have previously created a project file.
* Generation of the "table of content" will not always succeed with my LaTeX documents.
  
= Changelog =

* Section about Scrapbook and Calibre to ePub --[[User:Apos|Apos]] 03:17, 13 January 2012 (CET)
* Section about [[#A word on MHT|MHT]] --[[User:Apos|Apos]] 21:44, 4 January 2012 (CET)
* Nautilus script for [[#Briss|briss]] --[[User:Apos|Apos]] 05:32, 5 January 2012 (CET)
* Nautilus script for [[#Briss|briss]] which prevents the user from using sftp mounts --[[User:Apos|Apos]] 15:59, 5 January 2012 (CET)
* There was an error in [[#Contrast enhancement]], sorry. The reader is not fbreader, but djvureader; link added

[[Category:Document Management]]
[[Category:E-Books]]

Latest revision as of 09:00, 15 October 2012

Changelog here

A word before we start

I bought an eBook-Reader last chrismas (2011), a PocketBook Pro 912 with an open source linux operating system and firmware and an e-Ink Display.

By the way:

  • this article is based on the prerequisites to work with an e-book reader with an e-Ink display
  • tips might be applicable to other devices but could be obsolte for future - better -display technologies
  • this article uses Linux (Ubuntu 11.10) and a lot of bash shell scripting

I didn't know, how much work it will cost myself to dive deep into the theme. The e-Ink-technology was what kept my interest. What I didn't know: this technology also needs a special way of reading your documents. It is not possible to simply throw a PDF to your device being sacrificed.

Why PDF is not a good choice for e-reader devices

Years ago, when a began dealing with LaTeX - my preferred text processing method - I already had been aware of the problem that the information about the semantics of the given text in a PDF document is extremely low, if not exists at all any more.

Speaking in technical terms one would say, the entropy of a PDF-document is - compared to HTML, SGML, or XML, or even LaTeX - very low. What dows that mean practically?: In a PDF document, the computer - not you - cannot distinguich any more if a text passage is pure text, which format is has, if it is a header, ... We ourself with our brain can do this very well, a computer has to use e.g. an OCR programm, scan the PDF, and do a "guess" with certain algorithm. In HTML the Markup (the "M" in HTML) assign a header a "header" tag and therefore the header is and remains a header, no matter what page format, script size you are applying to it!

PDF is an excellent print media but not (!) a format that should be used reading documents on a computer display. Why do you think the internet has consists on the markup language HTML?

Reasons for why you might want to convert your documents are:

  • you cannot alter the font size in a PDF
  • a program cannot easily get the text flow (split page layout)
  • on a reader with e-ink display, it might be impossible to read the document, because the fonts are to small or not of ideal shape (sans serif)

How do we like to do the E-Book-conversion:

  • unattended
  • batch-like
  • automatically
  • repeatedly
  • with open source software ;)

Most of what you read here I had found in the excellent mobile read forums - here about E-Book formats.

Test it

If you like to test you should have the following documents on your side:

  1. Best a LaTeX or LyX document in a separate directory with
    • titlepage
    • table of content
    • footnotes
    • pictures and all that other fancy stuff ;)
  2. Convert this document to HTML, RTF, PDF
  3. Test all the conversion programs
  4. Test your reader software, if it can show
    • titlepage
    • table of content
    • footnotes
    • pictures and all that other fancy stuff ;)

If you like, you can use this Lyx-file: File:Lyx whysiwym editor.zip. Be aware you need Lyx for this file to run.

The process

PDF

There are mainly 3 points of interest, when it comes to read an PDF on an e-Ink device:

  1. Keep all metainformation - like the table of content - after conversion
  2. Cut as much of unnecessary space (title, white borders, ...)
  3. Batch processing

Cropping the pages is the most sophisticated solution

Remember ... we are talking about e-Ink devices! We like to get as much as possible visible text without white page margins.

Briss

I had different approaches to cut the pages my PDFs, from the commercial acrobat writer, over imagemagick and other "crop"-tools. After all, and keeping the 3 points from above in mind the only tool I can recommend is briss.

Wrapper for the java program briss

Download and extract it somewhere. Written in java, and so that you can use it as any other program, you should write a little wrapper an put in in your path (e.g. ~/bin, /usr/local/bin, /usr/bin):

$> sudo youreditor /usr/local/bin/briss

#!/bin/bash
version="0.0.13"
java -jar /wherever/is/installed/briss-${version}/briss-${version}.jar "${@}"

$> sudo chmod 755 /usr/local/bin/briss
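If you upgrade briss often, hard-coding the version in the wrapper gets tedious. Here is a minimal sketch of a version-agnostic lookup, assuming the archives are all extracted under one prefix (BRISS_HOME is a hypothetical location, adjust it to yours):

```shell
#!/bin/bash
# Hypothetical install prefix containing briss-<version>/ directories.
BRISS_HOME="${BRISS_HOME:-/opt/briss}"

# Print the path of the newest briss jar below BRISS_HOME, if any.
newest_briss_jar() {
    # sort -V compares version components numerically (0.0.9 < 0.0.13)
    ls "${BRISS_HOME}"/briss-*/briss-*.jar 2>/dev/null | sort -V | tail -n 1
}

# In the wrapper you would then call:
#   java -jar "$(newest_briss_jar)" "${@}"
```

This way the wrapper keeps working after you drop a new briss release next to the old one.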

Nautilus script for briss

Wouldn't it be nice to open a PDF - or even a symbolic link to one - directly from nautilus?

Here we go: put this file into

gedit ~/".gnome2/nautilus-scripts/Open with briss ..."

and make it executable:

chmod +x ~/".gnome2/nautilus-scripts/Open with briss ..."

The following script works in most environments, but NOT on an sftp-mounted filesystem (see the next script):

#!/bin/bash

cd "$NAUTILUS_SCRIPT_CURRENT_URI"

MYTYPE="pdf"
MYFILE="${1}"
MYBASENAME="$(basename "${MYFILE}" ".${MYTYPE}")"

if file -L "${MYFILE}" | grep -q "PDF document"
then
	briss "${MYFILE}"
else
	zenity --info --title "Error" --text "${MYFILE} seems not to be a file of type ${MYTYPE}. Please check."
	exit 1
fi

But if you mount the PocketBook via sftp like me, you will run into trouble: Nautilus scripts don't work in this environment (scroll down to "Gnome-specific variables" there)!

What we have to do

  • Create a symbolic link so that you can access the mountpoint directly, e.g. with
ln -s ~/".gvfs/SFTP - Pocketbook" ~/MyPocketbook
  • When executing the script: warn the user and abort if an sftp-mounted directory is detected
  • When executing the script: set the proper path variable

And this is the way it works when you access your device via sftp and are inside the ~/.gvfs dir or a link to it:

#!/bin/bash
# version 0.0.1, 12-05-2012, Axel Pospischil, http://blue-it.org

MYTYPE="pdf"
MYFILE="${1}"

MYPATH="${NAUTILUS_SCRIPT_CURRENT_URI}"

if echo ${MYPATH} | grep -v grep | grep "^sftp:"
then
# SFTP (sftp://)
	zenity --error --title "Error" --text "${MYPATH}\n\nYou are trying to run this script within a directory mounted via sftp.\n\nThis will not work.\n\nPlease create a symbolic link, e.g. with\n\nln -s ~/.gvfs ~/GVFS\n\nThen access the device with this newly created link and - if you like - create a shortcut in nautilus!"
	exit 1
else
# Symbolic link (file:///)
	if echo ${MYPATH} | grep -v grep | grep "^file:"
	then
		MYPATH="$(echo "${NAUTILUS_SCRIPT_CURRENT_URI}" | sed -e 's/^file:\/\///g' | sed -e 's/\%20/ /g' )"
	else
# Standard (local dir)
		MYPATH="${NAUTILUS_SCRIPT_CURRENT_URI}"
	fi
fi


cd "${MYPATH}"

MYBASENAME="$(echo "${MYFILE}" | sed -e 's/\.pdf$//g' )"
[ "${MYBASENAME}" ] || exit 1

if file -L "${MYFILE}" | grep -v grep | grep "PDF document" 
then
	briss "${MYFILE}"

	FILESIZE=$(ls -l "${MYBASENAME}_cropped.${MYTYPE}" | awk '{ print $5 }')
	[ "${FILESIZE}" -eq 0 ] && rm -f "${MYBASENAME}_cropped.${MYTYPE}"

else
	zenity --info --title "Error" --text "${MYFILE} seems not to be a file of type ${MYTYPE}. Please check."
	exit 1
fi

Unattended batch conversion with briss

Prepared this way, I wrote a batch conversion script. The main problem when writing a conversion script is that many ebook titles contain characters like "[" or "&" - something the bash does not like at all! In short: I know this script contains a lot of duplicate code. But believe me when I say: I tried more than once to change this.

The script (assuming you name it crop_with_briss.sh) mainly does the following:

  • General: all found PDFs are cropped and suffixed with "_cropped.pdf" (this is the default way briss works in batch mode "-s")
  • crop_with_briss.sh myPDF.pdf : The given PDF will be automatically cropped to myPDF_cropped.pdf
  • crop_with_briss.sh -l : Scans for all PDF files in the local directory. Files which have been cropped before (a file with the name *_cropped.pdf exists) will not be cropped again!
  • crop_with_briss.sh -lf : Scans for all PDF files in the local directory. All (!!!) PDFs are cropped again.
  • crop_with_briss.sh -r : Same as -l, but the script recurses into all (!) subdirectories.
  • crop_with_briss.sh -rf : Same as -lf, but the script recurses into all (!) subdirectories.

So, here we go:

#!/bin/bash
# version 0.0.1, 12-03-2012, Axel Pospischil, http://blue-it.org
# version 0.0.2
#    - added parentheses when doing the filetype check in single mode: file "${1}"

which briss > /dev/null || { echo "Briss must be installed to run this script."; exit 1; }

[ "${1}" == "" ] && echo "Please specify -l (local path only) or -r (recursive) as parameter." && exit 1
MODE=""
FORCE="false"
[ "${1}" == "-r" ]  && MODE="recursive"
[ "${1}" == "-rf" ] && MODE="recursive"
[ "${1}" == "-rf" ] && FORCE="true"
[ "${1}" == "-l" ]  && MODE="local"
[ "${1}" == "-lf" ] && MODE="local"
[ "${1}" == "-lf" ] && FORCE="true"
[ -f "${1}" ] && file "${1}" | grep -q "PDF document" && MODE="single"

if [ "${MODE}" == "single" ]
then
	briss -s "${1}"
	exit 0

fi

	
if [ "${MODE}" == "recursive" ]
then

if [ "${FORCE}" == "true" ]
then	
# First scan the local dir, then recursive
find '.' -name "*.pdf" | grep -v "cropped" | awk '{print $0}' | sed -e 's/^\.\///g' | sed -e 's/\.pdf$//g' | sed -e 's/(/\\(/g' | sed -e 's/)/\\)/g' | sed -e "s/'/\\\'/g" | sed -e 's/\[/\\[/g' | sed -e 's/\]/\\]/g' | sed -e 's/\&/\\&/g' | sed -e 's/\ /\\ /g' | awk '{system("briss -s " $0 ".pdf");}'
exit 0

else
find '.' -name "*.pdf" | grep -v "cropped" | awk '{print $0}' | sed -e 's/^\.\///g' | sed -e 's/\.pdf$//g' | sed -e 's/(/\\(/g' | sed -e 's/)/\\)/g' | sed -e "s/'/\\\'/g" | sed -e 's/\[/\\[/g' | sed -e 's/\]/\\]/g' | sed -e 's/\&/\\&/g' | sed -e 's/\ /\\ /g' | awk '{system("\[ -f " $0 "_cropped.pdf \] \|\| briss -s " $0 ".pdf");}'
exit 0

fi

fi

if [ "${MODE}" == "local" ]
then

if [ "${FORCE}" == "true" ]
then
ls -b *.pdf | grep -v "cropped" | awk '{print $0}' | sed -e 's/\.pdf$//g' | sed -e 's/(/\\(/g' | sed -e 's/)/\\)/g' | sed -e "s/'/\\\'/g" | sed -e 's/\[/\\[/g' | sed -e 's/\]/\\]/g' | sed -e 's/\&/\\&/g' | sed -e 's/\ /\\ /g' | awk '{system("briss -s " $0 ".pdf");}'
exit 0

else
ls -b *.pdf | grep -v "cropped" | awk '{print $0}' | sed -e 's/\.pdf$//g' | sed -e 's/(/\\(/g' | sed -e 's/)/\\)/g' | sed -e "s/'/\\\'/g" | sed -e 's/\[/\\[/g' | sed -e 's/\]/\\]/g' | sed -e 's/\&/\\&/g' | sed -e 's/\ /\\ /g' | awk '{system("\[ -f " $0 "_cropped.pdf \] \|\| briss -s " $0 ".pdf");}'
exit 0

fi

fi
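All the sed escaping above exists only because the file names pass through an extra shell inside awk's system(). As a hedged alternative sketch: find -print0 together with a null-delimited read loop hands each name - brackets, ampersands, spaces and all - to the crop command as a single argument. CROP_CMD is a stand-in for "briss -s" so the logic can be tried without briss installed:

```shell
#!/bin/bash
# Stand-in for "briss -s"; replace with the real command.
CROP_CMD="${CROP_CMD:-briss -s}"

# crop_all [force] - crop every *.pdf below the current directory;
# already-cropped files are skipped unless "true" is passed.
crop_all() {
    force="${1:-false}"
    find . -name '*.pdf' ! -name '*_cropped.pdf' -print0 |
    while IFS= read -r -d '' pdf; do
        base="${pdf%.pdf}"
        # skip files that already have a cropped companion
        if [ "${force}" != "true" ] && [ -f "${base}_cropped.pdf" ]; then
            continue
        fi
        ${CROP_CMD} "${pdf}"
    done
}
```

Because the file name never goes through a second shell, no escaping of "[", "&" or spaces is needed, and the -l/-lf/-r/-rf duplication collapses into the single force flag.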

Contrast enhancement

Briss does a very good job cropping the documents. And at least in the Adobe Reader, I think the contrast of the files is much better after cropping with briss.

But e-Ink devices normally can only show 16 shades of gray. So it would be handy to convert a document to grayscale and thereby enhance the contrast ;)

The solutions:

  1. convert the PDF with ImageMagick ("convert" or "mogrify" are the corresponding commands)
  2. use a reader software capable of enhancing the gamma or contrast of the content

The problems:

  1. ImageMagick
    1. does not preserve the metainformation of our PDF (you lose the table of contents!)
    2. the document is not searchable any more, and TTS no longer works
    3. the document becomes significantly bigger
    4. the result on my PocketBook Pro 912 is not what I expected in terms of quality and contrast enhancement. The PDFs do not seem that crisp and clear.
  2. reader software
    1. I did not find any software that satisfied me
    2. There is mainly one that can handle DJVUs: convert the PDF to DJVU and use a fork of djviewer called djviewer-bw; you will find it when you search the http://www.mobileread.com forum. Mainly you should read and post in this thread. The modified software has 3 levels for viewing documents: black and white, grayscale and normal. You can choose these either via the quick menu or by clicking in the upper left, bottom left or bottom right area of the reader.
    3. Just for the record: the reader software coolreader can not (!) display PDF files.

Do it with ImageMagick:

convert -density 600 -contrast -gamma 0.1 -colorspace GRAY input.pdf output.pdf

scriptified ;)

#!/bin/bash
MYFILE="${1}"
MYTYPE="pdf"
MYCONVTYPE="pdf"
TMPDIR="/tmp"

cd "$(pwd)"

if file "${MYFILE}" | grep -v grep | grep "PDF document"
then

	MYBASENAME="$(basename "${MYFILE}" ".${MYTYPE}")"
	MYCONVNAME="${MYBASENAME}-gray.${MYCONVTYPE}"

	if convert -density 600 -contrast -gamma 0.1 -colorspace GRAY "${MYFILE}" "${TMPDIR}/${MYCONVNAME}"
	then

		# only move non-empty output in place; remove empty files
		if [ -s "${TMPDIR}/${MYCONVNAME}" ]
		then
			mv "${TMPDIR}/${MYCONVNAME}" .
		else
			rm -f "${TMPDIR}/${MYCONVNAME}"
			echo "ERROR: conversion produced an empty file."
		fi

	else
		echo "ERROR converting the pdf."
	fi

else
	echo "Wrong format. Please use a PDF file."
	exit 1
fi
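As an aside, the ls -l | awk size check used in the script above can be expressed with the shell's built-in -s test, which succeeds only for files that exist and are not empty - a small sketch:

```shell
#!/bin/bash
# Succeeds only if the given file exists and has a size greater than zero.
nonempty() {
    [ -s "${1}" ]
}

# Typical use after a conversion:
#   nonempty "${TMPDIR}/${MYCONVNAME}" || rm -f "${TMPDIR}/${MYCONVNAME}"
```

This avoids parsing ls output, which breaks on unusual file names.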

A word on djvu and rescanning of PDF

DJVU is a very good format for keeping your scanned documents. It is NOT a good format for reading text on an e-Ink device; it has the same disadvantages as PDF.

There are a lot of converters out there. Mainly:

  • pdf2djvu (contained in any Linux distribution). It keeps all the metainformation of the PDF - including the table of contents!
  • djvudigital, which uses ghostscript. Because of licence issues you have to compile it yourself, which isn't much fun.

In the section about contrast enhancement (see problem no. 2.2) there is a link to a djviewer fork that has a contrast-enhanced viewing mode (mainly a black and white mode) for DJVU files. But I could not see a big difference compared to a well cropped PDF. This depends heavily on the kind of PDF you have.

If you are interested, please read the corresponding webpages or the online manuals ;)

From LaTeX to PDF or HTML

My LaTeX-documents can easily be altered to produce appropriate output for an e-Ink device.

But a generated PDF will only be suitable for one particular e-Ink device (when it comes to the font size). By the way: there is no direct way to create an epub or mobi document from LaTeX (as far as I know at the moment).

So my preferred output format is HTML! There is nothing more to say about it.

  • The table of contents is preserved
  • No problems with font-sizing
  • Easy conversion to other ebook formats (epub, mobi, ...)

Elyxer

I am using elyxer to convert my LaTeX files. I have to admit that I work - exclusively (!) - with LyX. So everything here (elyxer, scripts) is only suitable if you are working with LyX. You can alter the scripts for usage with plain LaTeX; there should not be any problem.

The next script converts all LyX files, either locally ( -l ) or recursively ( -r ):

#!/bin/bash
which elyxer.py > /dev/null || { echo "Elyxer must be installed to run this script."; exit 1; }

[ "${1}" == "" ] && echo "Please specify -l (local path only) or -r (recursive) as parameter." && exit 1

if [ "${1}" == "-r" ]
then
	for myfile in "$(find '.' -name "*.lyx" | sed -e 's/ /\\ /g' | sed -e 's/\.lyx$//')"
	do 
        	echo "${myfile}" | awk '{system("elyxer.py " $0 ".lyx > " $0 ".html");}'
	done
fi

if [ "${1}" == "-l" ]
then
	for myfile in "$(ls *.lyx | sed -e 's/ /\\ /g' | sed -e 's/\.lyx$//')"
	do 
        	echo "${myfile}" | awk '{system("elyxer.py " $0 ".lyx > " $0 ".html");}'
	done
fi
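The echo | awk | sed chain in this script can also be avoided with a null-delimited loop plus bash parameter expansion ("${f%.lyx}" strips the extension). A hedged sketch - CONVERT_CMD stands in for elyxer.py so the loop can be tried without eLyXer installed:

```shell
#!/bin/bash
# Stand-in for elyxer.py; replace with the real converter.
CONVERT_CMD="${CONVERT_CMD:-elyxer.py}"

# Convert every *.lyx below the current directory to HTML.
lyx_to_html() {
    find . -name '*.lyx' -print0 |
    while IFS= read -r -d '' f; do
        # ${f%.lyx} strips the extension; safe with spaces in names
        ${CONVERT_CMD} "${f}" "${f%.lyx}.html"
    done
}
```

eLyXer accepts the input and output file names as arguments, so no redirection is needed here.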


The same script for producing PDF files using pdflatex ( lyx --export pdf2 ). You can easily adapt this:

#!/bin/bash

[ "${1}" == "" ] && echo "Please specify -l (local path only) or -r (recursive) as parameter." && exit 1

if [ "${1}" == "-r" ]
then
	for myfile in "$(find '.' -name "*.lyx" | awk '{print $0}' | sed -e 's/ /\\ /g')"
	do 
        	echo "${myfile}" | awk '{system("lyx --export pdf2 -f " $0);}'
	done
fi

if [ "${1}" == "-l" ]
then
	for myfile in "$(ls *.lyx | awk '{print $0}' | sed -e 's/ /\\ /g')"
	do 
        	echo "${myfile}" | awk '{system("lyx --export pdf2 -f " $0);}'
	done
fi

A word on MHT

There would be an ideal solution for archiving webpages in one single file: the MHT format. There are plugins for Firefox to view these files.

Unfortunately none of the existing readers on the PocketBook is able to read MHT files. One exception: the coolreader, but there are errors displaying complex files and also no table of contents.

Probably future software versions will handle this format.

Articles about using and creating MHT files:

From PDF to epub, reflow and rescanning of PDF

Generally: not a good idea, but possible. Why? The reason is simple: a PDF has almost no meta information about the document structure left. So all tools more or less have to guess - a very clever guess, of course - about the document structure: what is a heading, what is text, which kind of heading do we have, are there 2 or more columns per page. What should I say: I leave it and read my PDF files - if they are too big for my screen - in landscape format. Any 10'' device can turn the pages 90° to the left or right so you can read them.

It doesn't matter how you generate your epub out of a PDF or whether you use a reflow software - either standalone or integrated in your reader: the more complex your PDF document is, the more disappointing the result will be. So better don't waste your time. There might come a time when these converters are smart enough to produce acceptable results from complex documents, but my approach would be to buy an appropriate format (like HTML or EPUB) before running into this trouble!

The reflow software of most readers out there seems to do quite a good job. But there are some problems when it comes to complex documents.

For those who would like to try it out:

From HTML to epub or mobi (or htmlz)

[Update --Apos 08:30, 26 February 2012 (CET)] Amazon is discontinuing the Mobipocket format. Although there are and will be a lot of books in mobi format and Kindle readers will support it, there is no guarantee that this proprietary format can be read on future devices of other vendors! [Update End]

One of the most sophisticated formats when it comes to eBooks is epub.

HTML is - from my point of view - the best starting point for conversion. As described above it can easily be created using LaTeX or even other word processors. There are a lot of tools out there converting from e.g. HTML. But almost none is capable of keeping the table of contents. And this is - when it comes to e-reading - one of the most important parts.

My prerequisites for a conversion tool

  • should be open source
  • cross-platform (Windows, Linux, Mac, ...)
  • command line batch processing
  • should preserve most of the text structure (table of contents, footnotes, ...)

So here are the candidates:

  1. The free tool Kindlegen from Amazon
  2. The well known Calibre cross-platform software
  3. eCub, free for private use, a simpler version of the next tool
  4. The shareware Jutoh, which you can test for demonstration purposes

How do I get my HTML page(s)?

A good question. Normally you would just download your HTML to your desktop. Every browser will do that.

The ultimate method for me is the Scrapbook extension for Firefox. With this extension you can recursively (!) download - even password protected - pages locally to your computer. This works best if you simply choose a page with all the links you want to access as a master table of contents (toc).

The entry point is then a page "index.html" in the scrapbook directory you specified. But that doesn't matter, because you will have that link in your browser when you open the page with it.

Tidy up your HTML

Before you go on any further, you should make sure you are working with a so-called "well formed" HTML document. You can do this by using the software tidy:

tidy -m -asxhtml -utf8 <yourfile>.html

Most pages, though, should be "clean", and you will not need to tidy them up.

You can also use the free online tidy service. But be aware that sending confidential content to a web service might not be a good idea!

If you are under Windows, this site might be of interest for you.

Calibre

Calibre does a good job, but also has its limitations. HTML documents that are not too complex can be converted without hassle into a vast variety of formats.

You can either use the graphical user interface of Calibre or the batch command line program ebook-convert, which can be fine-tuned in various ways.

Scrapbook and Calibre - a dream team

I had some very good experiences with the combination of the Firefox extension Scrapbook, Calibre and Coolreader or fbreader (particularly fbreader180).

I download my page collection with Scrapbook, open the result in my browser, copy the link called something like "file:///path/to/index.html" into the import dialog of Calibre, import it, edit the metadata and export it to epub format.

This works pretty well, and Coolreader also displays graphics very well (which neither Fbreader - exception: fbreader180 - nor Adobe Reader does).

You will also be provided with a toc if one exists, but since you downloaded an entry page with Scrapbook, you should always have a good starting point by setting a bookmark on the first page.

Kindlegen

[Update February 2012] The Mobipocket format is going to die! This is - for me - another chapter in the discussion about using DRM protected eBooks. People: if at all possible, don't use them. Use ePub wherever available!

Kindlegen is cross-platform and does a very good job. I have nothing to complain about. It produces files in the "*.mobi" format.

The only problem I have:

  • the table of contents goes away!

The exotic htmlz format

Along the way I found the zip-compressed HTML format HTMLZ. I had never noticed it before. Searching for software that can read this format, I found coolreader to be the only candidate. I like the idea of simply compressing the HTML file with all its folders and then just starting over ;) without dealing with MHT. So I gave it a try.

Even complex pages seem to open, although it seems you lose the images (they won't show up). But the formatting (e.g. source code, tables, ...) is excellent. Big documents take a lot of time to load on the PocketBook device, but nevertheless they will.

Navigation is not easy because of the lack of a table of contents. I converted an open book - a 10 MB zipped HTML document from gallileocomputing.de - which resulted in a 2000 (!) page, 25 MB HTMLZ file. Things are very slow altogether!

My recommendation for smaller websites, if coolreader is available for the device.

eCub

eCub did not pass the table of contents test, but was quite handy. It is only free for private usage and has the same limitations as Jutoh. It seems to be a simpler version of that program.

For this and the next program, and for more complex documents, it is important to edit the settings and add a correct CSS file. Why it is not possible to use the style links in the HTML document is a mystery to me.

Jutoh

Jutoh is not free software. But it is a full featured editor for ebook generation.

Advantages:

  • Support of almost any ebook format
  • Fully WYSIWYG editor

But I don't need the latter, because I want to edit my documents with the word processor of my choice.

Disadvantages:

  • Project based. So you first have to import a document via the GUI into a project file.
  • This is a GUI tool, not a command line tool. But it has a batch mode if you have created a project file before.
  • Generation of the table of contents will not always succeed with my LaTeX documents.

Changelog

  • Section about Scrapbook and Calibre to ePub --Apos 03:17, 13 January 2012 (CET)
  • Section about MHT --Apos 21:44, 4 January 2012 (CET)
  • Nautilus script for briss --Apos 05:32, 5 January 2012 (CET)
  • Nautilus script for briss which prevents the user from using sftp mounts --Apos 15:59, 5 January 2012 (CET)
  • There was an error in #Contrast enhancement, sorry. The reader is not fbreader, but djvureader; link added