Newbie question: How to convert a Wikidot to an offline project and embed images

Hans

03 Jan, 2016 04:25 PM

Hello all,

My name is Hans and I am a technical translator for German > Dutch.

I created the Wiki http://cafetran.wikidot.com and I'd like to convert it to a well-structured document that can be viewed offline. The file format isn't really that important (ePub, DOCX, PDF, whatever ...).

I have found this site that allows conversion of Wikidot content to Markdown files: http://apps.philippklaus.de/static/convert_wikidot_to_markdown.html

I use TextWrangler for editing and I noticed that embedded images (e.g. [[image /Users/Hans/Desktop/memsource/1.png width="680" title="Click to enlarge the image" link="</Users/Hans/Desktop/memsource/1.png> " style="border: 1px solid gray;"]]) aren't displayed in Marked 2.

Q: Is it possible to embed images?

I'd be very grateful for any tips on converting a Wikidot project to an offline project.

Best wishes for 2016!

Hans

  1. Support Staff 1 Posted by Brett on 03 Jan, 2016 05:16 PM

    That isn't markdown syntax for images. It would be

    ![optional caption](path/to/image.png)

    As far as converting a folder of files, Marked supports index formats via LeanPub and mmd_merge styles, as well as include syntax. I can provide additional info when I'm back at my main computer.

    -Brett

  2. 2 Posted by Hans on 03 Jan, 2016 05:20 PM

    Thank you, Brett. I’ll test it right away :).

    Hans

  3. 3 Posted by Hans on 05 Jan, 2016 01:06 PM

    I have been reading the documentation and playing a little with Marked. What great software; if only I had discovered it earlier!

    I will now definitely start porting http://cafetran.wikidot.com to a Markdown project. I am planning to host it in HTML via a public Dropbox folder. (Besides that, I will also be offering a PDF version for printing and an ePub version for e-readers.)

    Q: Does anyone have experience with hosting an HTML project via Dropbox?

    My storage location for maintaining the project (not necessarily the same as the one for the public access) would be:

    /Users/Hans/Dropbox/CT/Man

    Q: If I wanted to make the part before /Dropbox generic, so that other people can help me with writing and testing from their *own* Dropbox (e.g. /Users/AntonObermayer/Dropbox/CT/Man), is there a way to code the path (e.g. to the embedded images)?

    Regarding the HTML version of the CafeTran manual:

    Q: What would be the easiest way to divide the webpage into 1/4 and 3/4, with the 1/4 column on the left-hand side containing the TOC for navigation?

    Regarding the PDF version of the CafeTran manual:

    I've understood that the internal links cannot be made clickable (yet?), unless I use a work-around with Acrobat Pro (which I don't have ;)).

    Q: Is it possible to have the pages numbered and the page numbers appear in the TOC?

    Regarding the conversion from Wikidot markup to Markdown: I have decided to do this in TextWrangler, using regular expressions etc. After all, the conversion is pretty straightforward. However, with 454 Wiki articles it is a lot of work. Currently I am looking for a specific regular expression:

    Q: Does anyone have a regular expression (preferably for TextWrangler) on hand to convert Wikidot-style links to their Markdown counterparts? E.g. [http://www.barebones.com/products/textwrangler TextWrangler for Mac] to [TextWrangler](http://www.barebones.com/products/textwrangler).

    Currently I have the images related to a single Wiki article located in a folder with the name of the article (as this is how Wikidot created an export of my database).

    So the images belonging to /Users/Hans/Dropbox/CT/Man/source/adding-an-existing-glossary.txt are located in the folder /Users/Hans/Dropbox/CT/Man/files/adding-an-existing-glossary. For the time being, I will maintain this structure (at a later moment, when I am making new screenshots, I will probably start using one central folder for all PNGs). For now, I will have to insert the name of the folder containing the images belonging to a specific article into the path to the linked images. E.g. for the article adding-glossary-terms, the current Wikidot notation is:

    [[image 14.png width="680" title="Click to enlarge the image" link="*local--files/adding-glossary-terms/14.png" style="border: 1px solid gray;"]]

    In Markdown this would be:

    ![](/Users/Hans/Dropbox/CT/Man/files/adding-glossary-terms/14.png)

    Q: Does someone have a way to have the file name of the article (adding-glossary-terms.md) inserted (without the extension) in an image link as above?
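
One possible way to script this outside of Marked is sketched below in Python. It is only a sketch under assumptions: the names `IMAGE_TAG` and `convert_images` are made up for illustration, and it assumes that the image folder is always named after the article file without its extension, as in the example above.

```python
import re
from pathlib import PurePosixPath

# Matches a Wikidot image tag such as [[image 14.png width="680" ...]]
# and captures only the first token after "image" (the image file).
IMAGE_TAG = re.compile(r'\[\[image\s+(\S+)[^\]]*\]\]')

def convert_images(text, article_path, files_root):
    """Replace Wikidot image tags with Markdown image links.

    The folder name is derived from the article file name without its
    extension, e.g. adding-glossary-terms.md -> adding-glossary-terms.
    (Assumption: every article keeps its images in such a folder.)
    """
    folder = PurePosixPath(article_path).stem

    def to_markdown(match):
        # Keep only the file name, in case the tag holds a full path.
        image_name = PurePosixPath(match.group(1)).name
        return "![]({}/{}/{})".format(files_root.rstrip("/"), folder, image_name)

    return IMAGE_TAG.sub(to_markdown, text)

wikidot = ('[[image 14.png width="680" title="Click to enlarge the image" '
           'link="*local--files/adding-glossary-terms/14.png" '
           'style="border: 1px solid gray;"]]')
result = convert_images(
    wikidot,
    "/Users/Hans/Dropbox/CT/Man/source/adding-glossary-terms.md",
    "/Users/Hans/Dropbox/CT/Man/files",
)
print(result)
```

The width, title, link and style attributes are dropped, because the plain Markdown image syntax has no place for them.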

    I have already experimented with Multi-file documents and that was the point where I started to see the great power of Marked 2! Wow!

    Before I get too enthusiastic, some questions:

    Q: What is the maximal depth of nesting Multi-file documents?

    Q: What is the maximal number of files to nest in total (can I nest/embed all 454 MDs)?

    Q: How can I insert structuring headings in the TOC? E.g.:

    III. Interoperability with other tools

     1. memoQ
     2. Déjà Vu
     3. SDL
     
    The only way that I see is to create a dummy article that only contains the heading.

    Thank you very much for all suggestions!

    Hans

  4. 4 Posted by Hans on 05 Jan, 2016 02:48 PM

    In an attempt to solve this myself:

    Q: Does someone have a way to have the file name of the article (adding-glossary-terms.md) inserted (without the extension) in an image link as above?

    I tried this:

    1. Copy the MD files to the folder that contains the PNGs.
    2. Embed the images in the MD files without any path: ![](preparation.png)

    When I saved everything, the images for the one opened MD were visible in Marked 2.

    Then I created the Master.MD:

    # Document title

    <!--TOC-->

    <<[/Users/Hans/Desktop/xlz/xlz.md]

    <<[/Users/Hans/Desktop/adding-glossary-terms/adding-glossary-terms.md]

    As it turned out, Marked 2 couldn't find the path to the images in the included MDs.

    I had to change the path to:

    ![](/Users/Hans/Desktop/xlz/preparation.png)

  5. Support Staff 5 Posted by Brett on 05 Jan, 2016 03:52 PM

    Most of this is beyond Marked's stated features and requires additional coding. If you're interested, I can contact you privately with rates and an estimate. To answer a few Marked-related questions, though:

    >Q: Is it possible to have the pages numbered and the page numbers appear in the TOC?

    Not at the current time. Once I've finished the RTF export rewrite, I intend to tackle a custom PDF generator that will provide both working internal links and features such as this. In its current incarnation, the TOC is generated from headlines at a point where the file isn't paginated yet, so it really has no means of determining where the anchor is going to fall after pagination.

    >So the images belonging to /Users/Hans/Dropbox/CT/Man/source/adding-an-existing-glossary.txt are located in the folder /Users/Hans/Dropbox/CT/Man/files/adding-an-existing-glossary. For the time being, I will maintain this structure (at a later moment, when I am making new screenshots, I will probably start using one central folder for all PNGs). For now, I will have to insert the name of the folder containing the images belonging to a specific article into the path to the linked images.

    This is currently a sore spot when including multiple files in a single document. Once the main document is compiled, any relative references within subfolders break because it builds them from the root document instead of the document within the subfolder. It's an issue that I could solve by rewriting image paths during processing, but I've always felt that might cause more issues than it would solve. I'm reconsidering that, but at the current time you either need a central folder (as you mentioned) with images resolved to an absolute path or a relative path from the destination index document.

    Beyond that, a custom pre-processor could (possibly) handle the rewriting, using the environment variables Marked passes to processors to rewrite image paths. It would likely be complex as even the pre-processor doesn't run until after the included documents are compiled, erasing the obvious idea of rewriting based on current file path.
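
Since the pre-processor only sees the already combined document, one alternative is to rewrite the image paths in the source files before Marked ever opens them. The Python sketch below illustrates that idea; it is not a Marked pre-processor and does not use Marked's environment variables. The names `MD_IMAGE` and `absolutize_images` are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Markdown image syntax: ![caption](target)
MD_IMAGE = re.compile(r'!\[([^\]]*)\]\(([^)\s]+)\)')

def absolutize_images(text, document_path):
    """Turn relative image targets into absolute paths.

    Relative targets are resolved against the folder of the document
    that contains them, so they keep working after the document is
    included from an index file in another folder.
    """
    base = PurePosixPath(document_path).parent

    def fix(match):
        caption, target = match.group(1), match.group(2)
        # Leave absolute paths and URLs untouched.
        if target.startswith("/") or "://" in target:
            return match.group(0)
        return "![{}]({})".format(caption, base / target)

    return MD_IMAGE.sub(fix, text)

print(absolutize_images("![](preparation.png)", "/Users/Hans/Desktop/xlz/xlz.md"))
```

Absolute paths only work on the machine that produced them, so for a shared folder it may be preferable to compute paths relative to the index document instead.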

    >Q: Does anyone have a regular expression (preferably for TextWrangler) on shelf to convert Wikidot-style links to their Markdown counterparts? E.g. [http://www.barebones.com/products/textwrangler TextWrangler for Mac] to [TextWrangler](http://www.barebones.com/products/textwrangler).

    This isn't necessarily a Marked feature, but you can script this pretty easily to run on the entire folder (probably using Ruby or Python) by searching for:

    \[(\S+) (.*?)\]
    

    and replacing with the template:

    [$2]($1)
    

    You'd want to strip any leading and trailing whitespace from $2 in order to ensure Markdown compatibility.
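
As a rough illustration, the search pattern and replacement template above could be applied in Python as follows. The function name `convert_links` is made up for this sketch; the pattern is the one given above.

```python
import re

# An opening bracket, a URL without spaces, one space,
# then the link text up to the first closing bracket.
WIKIDOT_LINK = re.compile(r'\[(\S+) (.*?)\]')

def convert_links(text):
    """Turn [url text] into [text](url), trimming whitespace around the text."""
    def to_markdown(match):
        url, label = match.group(1), match.group(2).strip()
        return "[{}]({})".format(label, url)
    return WIKIDOT_LINK.sub(to_markdown, text)

sample = "[http://www.barebones.com/products/textwrangler TextWrangler for Mac]"
print(convert_links(sample))
```

Note that this pattern matches any bracketed text that contains a space, including Wikidot image tags, so image tags would need to be converted first. A stricter variant could require the first group to start with http.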

    >Q: What is the maximal depth of nesting Multi-file documents?

    >Q: What is the maximal number of files to nest in total (can I nest/embed all 454 MDs)?

    There shouldn't be a limit, but I haven't tested with that many includes or nested includes beyond 3 levels. It may cause slower rendering, but it should work.

    >Q: How can I insert structuring headings in the TOC? E.g.:

    The easiest way would be to include the headers in the index document prior to the include directive. This would only work if you were using Marked's include syntax and not one of the index formats (leanpub or mmd_merge). You'd use something like:

    ## Interoperability with other tools
    
    <<[memoQ.md]
    
    <<[Déjà-Vu.md]
    
    ## Next category
    
    [...]
    

    If the generation of the index file were scripted, you could then inject a level 3 header at the top of each file based on the filename. The script would read in all of the files, regex replace the wiki syntax, add the header, and then rewrite the original file (or save to a ".md" copy). Then it would add the <<[] directive to the index file with the path to the current filename. The ## Section title could also be created from the subfolder name and inserted at the beginning of each processed subfolder.
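
A minimal Python sketch of such a generation script might look like the following. All function names are hypothetical, the file system handling is left out, and it assumes that include paths are written relative to the index document and that titles can be derived from hyphenated file and folder names.

```python
from pathlib import PurePosixPath

def title_from_name(name):
    """Derive a readable title from a file or folder name.

    adding-glossary-terms.md -> Adding glossary terms
    """
    words = PurePosixPath(name).stem.replace("-", " ")
    return words[:1].upper() + words[1:]

def add_header(text, filename):
    """Put a level 3 header based on the file name at the top of an article."""
    return "### {}\n\n{}".format(title_from_name(filename), text.lstrip())

def build_index(title, sections):
    """Build the text of the index document.

    sections maps a subfolder name to the list of article files in it.
    Each subfolder becomes a level 2 header followed by one include
    directive per article.
    """
    lines = ["# " + title, "", "<!--TOC-->", ""]
    for folder, files in sections.items():
        lines += ["## " + title_from_name(folder), ""]
        for filename in files:
            lines += ["<<[{}/{}]".format(folder, filename), ""]
    return "\n".join(lines)

index = build_index(
    "CafeTran Training Manual",
    {"interoperability-with-other-tools": ["memoQ.md", "Déjà-Vu.md"]},
)
print(index)
```

A complete script would also read each file, apply the wiki syntax replacements, and write the result back (or to a ".md" copy) before adding it to the index.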

    Like I said, that's well beyond Marked's built-in functionality, but I can consult on the generation script if needed.

    -Brett

  6. 6 Posted by Hans on 05 Jan, 2016 04:35 PM

    Thank you very much, Brett!

    I’ll study your answers very thoroughly. I hope you don’t mind that some questions indeed exceeded the limits of Marked 2.

    Cheers,

    Hans

  7. 7 Posted by Hans on 05 Jan, 2016 04:59 PM

    Thanks, Brett!

    With your help I could create the TW syntax for this question:

    >Q: Does anyone have a regular expression (preferably for TextWrangler) on shelf to convert Wikidot-style links to their Markdown counterparts? E.g. [http://www.barebones.com/products/textwrangler TextWrangler for Mac] to [TextWrangler](http://www.barebones.com/products/textwrangler).

  8. 8 Posted by Hans on 08 Jan, 2016 09:23 AM

    I'm making good progress with the conversion process.

    I have some more questions:

    When I want to have a TOC in the master document, I noticed that it's not enough to insert the tag <!--TOC--> only in the master document, before the listing of the included subdocuments.

    I also have to insert a <!--TOC--> at the start of every subdocument.

    Q: Is this observation correct?

    One other question:

    I'm trying to shorten the link to the embedded images from:

    ![](/Users/Hans/Dropbox/CT/Man/en-GB/a-glossary-is-not-a-dictionary-dev1.png)

    to something more generic like:

    ![](˜/Dropbox/CT/Man/a-glossary-is-not-a-dictionary-dev1.png)

    I have tried several ways, including using the tilde and inserting a Transclude statement at the top of every subdocument.

    Q: Is there no way to shorten the path to the images while they are still displayed in the master document?

  9. 9 Posted by Hans on 10 Jan, 2016 07:46 AM

    >I also have to insert a <!--TOC--> at the start of every subdocument.
    >Q: Is this observation correct?

    I think that I've misunderstood this. I've now created an HTML export:

    https://dl.dropboxusercontent.com/u/15919910/ct/index.html

    and the Master TOC is repeated at the start of every included sub document.

  10. 10 Posted by Hans on 10 Jan, 2016 09:08 AM

    I just created a paginated PDF output. This one only contains one central TOC. That's different behaviour from the HTML creator. I'm a little puzzled now.

  11. 11 Posted by Hans on 11 Jan, 2016 07:55 AM

    I have finished the conversion of my wikidot site to Markdown and I'm still very enthusiastic about this language and about Marked 2 as a tool.

    I can now start arranging the individual files into a logical structure, via a skeleton document 'index.md':

    Transclude Base: ~/Dropbox/CT/Man/en-GB/

    # CafeTran Training Manual
    ## Table of Contents

    <!--TOC-->

    <<[a-glossary-is-not-a-dictionary.md]
    <<[abbreviations.md]
    <<[about-dialog.md]
    <<[about.md]
    <<[adding-an-existing-glossary.md]

    I have some advanced questions now and hope that someone will take the time to answer them.

    Regarding PDF output: Is it possible to create one global TOC via index.md and let every included chapter begin with its own small TOC?

    Regarding HTML output: I now have one big file index.HTML that looks okay, but that requires a long loading time (its size is about 240 MB). Is it possible to split this huge file into smaller parts?

  12. Support Staff 12 Posted by Brett on 12 Jan, 2016 01:25 AM

    Table of contents can only be generated once per combined document. Any other smaller TOCs would have to be manually generated.
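
For the smaller TOCs, one option is to generate the list with a script and paste it at the top of each chapter. The Python sketch below collects the ATX headings (lines starting with #) of one file into an indented plain list. The name `mini_toc` is hypothetical, and the entries are deliberately not links, because the anchor IDs depend on the Markdown processor in use.

```python
import re

# An ATX heading: one to six # characters, a space, the title,
# and optionally some closing # characters.
HEADING = re.compile(r'^(#{1,6})\s+(.*?)\s*#*\s*$')
FENCE = "`" * 3  # marker that opens and closes a fenced code block

def mini_toc(text):
    """Return an indented plain-text list of the headings in text."""
    entries = []
    in_fence = False
    for line in text.splitlines():
        # Ignore lines inside fenced code blocks, where # is not a heading.
        if line.startswith(FENCE):
            in_fence = not in_fence
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            level = len(match.group(1))
            entries.append("{}- {}".format("    " * (level - 1), match.group(2)))
    return "\n".join(entries)

chapter = "# Glossaries\n\nText.\n\n## Adding terms\n\nMore text.\n\n## Importing\n"
print(mini_toc(chapter))
```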

    The HTML file size is possibly because you're choosing to embed images in the exported file. Making them external would greatly reduce file size.

    My initial understanding was that your intent was pdf or doc, not html. What is your goal with multiple html files? That sounds a lot like where you started to me.

    Thanks,

    Brett

  13. 13 Posted by Hans on 12 Jan, 2016 07:54 AM

    >My initial understanding was that your intent was pdf or doc, not html. What is your goal with multiple html files? That sounds a lot like where you started to me.

    I want to offer the manual in two or three different forms: PDF and ePub for offline reading and HTML for online reading during the use of the software (quick reference).

    I thought that splitting up the HTML into smaller parts would make the loading time acceptable. But I'll test how fast the loading time is when the images are external. I guess that they will be copied to the folder where I let Marked 2 create the index.HTML. I also assume that the images will only be loaded when they are displayed, thus reducing the loading time of the manual.

    We'll see :). It was suggested that I use a tool like Help & Manual, but I'd rather use Markdown (and Marked), because of its simplicity.

  14. Support Staff 14 Posted by Brett on 12 Jan, 2016 02:47 PM

    Marked won't copy images, nor will it provide "lazy loading" to improve render time. Splitting at least the major sections apart would be preferable, but Marked would require you to create index files for each section.

    For a split-up HTML version (which is well beyond Marked's intended use) you'd really be best off using a wiki like gollum or a static site generator that works with markdown.

    Thanks,

    Brett

  15. 15 Posted by Hans on 12 Jan, 2016 04:08 PM

    Thank you!

    It's really great to read that there is a wiki system like Gollum that can process Markdown without further conversion.

    For the time being, I'll focus on the PDF output, since this is what users of the CafeTran wikidot were always asking for.

    I appreciate your great help!

    Hans

  16. 16 Posted by Hans on 17 Jan, 2016 04:42 PM

    I've started to write the TOC (structure) of the new CafeTran manual (http://CafeTran-Training.com) and was wondering:
    - Can I use Marked 2 to create an ePub version of my Markdown document?
    - Can the output to DOCX assign styles to the headings?

  17. Support Staff 17 Posted by Brett on 17 Jan, 2016 05:15 PM

    Marked does not currently export ePub, but there are conversion tools
    that can take Markdown that Marked exports and handle the book creation.

    DOCX is currently a glorified RTF export, and does not support element
    styles. This is being updated in the next couple of months to export
    fully styleable DOCX formats.

    -Brett

  18. 18 Posted by Hans on 17 Jan, 2016 05:18 PM

    >fully styleable DOCX formats

    I'm really impressed!

    Many thanks

  19. 19 Posted by Hans on 21 Jan, 2016 04:50 PM

    Is Marked very tolerant of syntax errors in Markdown?

    When I read this sentence again, I find it a little strange. After all: Markdown is about simplicity and syntax sounds like Latin and complex.

    The reason I'm asking is that I bought two apps for my iPhone 6. First Byword: it crashed all the time while opening an MD file that I created in TextWrangler and that looked okay in Marked.

    Then I bought 1Writer. It could open my MDs but the files looked strange. I had to insert an empty line after bulleted lists and before headings.

    In Marked this wasn't necessary. So is Marked more tolerant than 1Writer?

  20. Support Staff 20 Posted by Brett on 21 Jan, 2016 09:53 PM

    Every language has a syntax, even English :).

    Marked normalizes a certain amount of formatting that would throw one
    processor off but not another. In general, empty lines are highly
    encouraged around every element (headlines, lists, tables, paragraphs,
    etc.).

    Marked also offers two different built-in processors, MultiMarkdown and
    Discount, both of which will treat certain syntax in different ways. For
    the best compatibility across all processors, there are definitely rules
    you can take into account:
    http://brettterpstra.com/2015/08/24/write-better-markdown/
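
For existing files, the two cases mentioned earlier in this thread (an empty line before headings and after bulleted lists) could be normalized with a small script. The Python sketch below is only an illustration with a hypothetical function name, `add_blank_lines`. It does not skip code blocks and treats any unindented non-list line after a list item as the end of the list, so the result should be reviewed before overwriting any files.

```python
import re

HEADING = re.compile(r'^#{1,6}\s')
LIST_ITEM = re.compile(r'^\s*([-*+]|\d+\.)\s')

def add_blank_lines(text):
    """Insert an empty line before headings and after lists where missing."""
    out = []
    for line in text.splitlines():
        prev = out[-1] if out else ""
        starts_heading = bool(HEADING.match(line))
        # A list ends when an unindented, non-empty, non-list line
        # directly follows a list item.
        ends_list = (
            bool(LIST_ITEM.match(prev))
            and bool(line.strip())
            and not line[0].isspace()
            and not LIST_ITEM.match(line)
        )
        # Only add a separator when the previous line is not already empty.
        if prev.strip() and (starts_heading or ends_list):
            out.append("")
        out.append(line)
    return "\n".join(out)

source = "Intro\n## Heading\n- one\n- two\nAfter list"
fixed = add_blank_lines(source)
print(fixed)
```

Running the function on its own output leaves the text unchanged, so it can be applied repeatedly to the same files.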

    -Brett

  21. 21 Posted by Hans on 23 Jan, 2016 07:23 AM

    That's a very nice text, Brett! Thanks for the link.

    I've now replaced all tabs with four spaces and set the display font in TextWrangler to Monaco (monospaced), so the tables look nice too.

    Is it possible to make external links open in a new browser tab?

  22. Support Staff 22 Posted by Brett on 23 Jan, 2016 01:52 PM

    Not with Markdown, but you can use a snippet of JavaScript to do so on
    page load:

    <script>
    // Set target="_blank" on every link that points to another host.
    function externalLinks() {
      var links = document.getElementsByTagName("a");
      for (var i = 0; i < links.length; i++) {
        var link = links[i];
        if (link.getAttribute("href") && link.hostname !== location.hostname) {
          link.target = "_blank";
        }
      }
    }
    externalLinks();
    </script>

    That will find any links in the current page that link to other hosts
    and set the target attribute to "_blank".

  23. 23 Posted by Ovod on 22 Jul, 2021 11:34 AM

    I've written an offline converter that handles most Wikidot syntax features:
    https://github.com/IlyaOvodov/wikidot2markdown
