Problems with custom preprocessor


paul.reinerfelt

25 Apr, 2017 06:55 PM

Hi!

I was trying to write a custom pre-processor for Marked 2 in Python. However, every time I tried it out I just got an infinite spinner in the lower right-hand corner. (The log shows "Processing using custom preprocessor" and "Launching pre-render task" but never "Custom Preprocessor Finished".) Obviously I first thought I had made some stupid mistake myself and tried a lot of tweaks to get any results. Then I started disabling more and more of the code until it basically was just copying stdin to stdout (expensively). Still infinite spinner! I added logging, writing every line copied to a file, and then I realised that I didn't get the whole source document! (Nor did I get an end-of-file on stdin.) My code was standing still, waiting for more text from Marked 2 but never getting it! That can't be right!

I wrote a dummy pre-processor, a bash script that (apart from the shebang line) contains only a tee to a logfile, and tried it out. Same result. Infinite spinner.

But I did get some of the document, so could there be a cut-off point? I started running through my mmd documents in size order. The answer was yes: short documents do finish (near instantly, of course), but at a certain point it becomes an infinite spinner instead. That cut-off point seems to be somewhere between 1527 and 1547 bytes! That is, I have a document at 1527 bytes that runs through like it should and one at 1547 bytes that gets the infinite spinner.

I'm not quite sure what is going on, but it seems there is something wrong in Marked 2. It looks like something causes it to cease moving data through the pipe, but why at those sizes? The numbers don't look that suspicious. (Unless it, for some reason, manages three 512-byte buffers but not four or more?)

Thanks for an otherwise excellent application! (Seriously, nvAlt and Marked 2 are indispensable in my daily workflow!)
/Paul

  1. Support Staff 1 Posted by Brett on 25 Apr, 2017 07:26 PM


    First off, are your scripts taking STDIN input? i.e., in a bash script,
    you would want to `cat|tee` to ensure the STDIN continues. Assuming
    that's all true, if you run `cat testfile.md|yourscript.py` on the
    command line does it finish immediately? And is your script writing out
    incrementally or all at once at the end?

    1. It needs to pick up the STDIN
    2. Nothing should be written out (STDOUT) until the script is complete
    and ready to exit with a success code of zero
    3. Writing to STDERR should be avoided unless there's actually an error

    You can see the STDOUT/STDERR output of the running script in Marked
    using Help->Show Custom Processor Log

    As far as size, no, there's not an inherent limit, and definitely not
    under 500k.
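    A minimal skeleton that follows those three rules might look like this
    (just a sketch to illustrate the contract, not an official example from
    the Marked docs; `preprocess` is a placeholder for your own logic):

    ```python
    import sys

    def preprocess(text):
        # Placeholder transformation: a real pre-processor would rewrite
        # the Markdown source here. This sketch passes it through unchanged.
        return text

    if __name__ == "__main__":
        # Rule 1: consume all of STDIN before producing any output.
        source = sys.stdin.read()
        # Rule 2: write everything to STDOUT in one go, only when done,
        # then exit with status 0 so Marked knows the run succeeded.
        sys.stdout.write(preprocess(source))
        # Rule 3: leave STDERR alone unless something actually went wrong.
    ```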

    -Brett

  2. 2 Posted by paul.reinerfelt on 26 Apr, 2017 04:07 AM


    Hi Brett! Thanks for the fast reply!

    Yeah, the scripts take STDIN, write to STDOUT and they work correctly on the command line. Classic UNIX filter scripts.

    I guess the culprit is your point 2. I didn't know that I couldn't work through the data incrementally (I have not seen anything about that in the Marked help, nor in any of your articles dealing with custom pre-processors). My script reads a line from STDIN, checks if it needs special handling and, if not, writes it to STDOUT, reads the next line from STDIN, etc.

    I suppose that at this particular size the system's buffering of STDOUT fills up and gets flushed, so Marked sees data coming from the pre-processor, stops sending, and switches to reading instead?

    I'll try to rewrite the script so that it gobbles the entirety of STDIN into memory and then works through that instead. I just thought that would slow things down unnecessarily but, at the sizes of normal documents, I guess the cost is negligible.
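    Something like this, I mean (`needs_special_handling` and `transform`
    stand in for the real checks in my script):

    ```python
    import sys

    def needs_special_handling(line):
        # Placeholder predicate: here, lines starting with "@@" are "special".
        return line.startswith("@@")

    def transform(line):
        # Placeholder transformation for the special lines.
        return line.replace("@@", "", 1)

    def process(lines):
        # Same line-by-line logic as before, but buffered into a list
        # instead of being written to STDOUT as we go.
        out = []
        for line in lines:
            out.append(transform(line) if needs_special_handling(line) else line)
        return "".join(out)

    if __name__ == "__main__":
        # Slurp all of STDIN first, then emit the result in a single write.
        sys.stdout.write(process(sys.stdin.readlines()))
    ```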

    /Paul

  3. Support Staff 3 Posted by Brett on 26 Apr, 2017 04:35 AM


    If your script closes the STDOUT file handle in between writes, then
    yes, the system will report that it's done writing after a couple of
    microseconds. It's best to collect the output in a variable then write
    it out at the end. Let me know if that works for you.

    -Brett

  4. 4 Posted by paul.reinerfelt on 27 Apr, 2017 09:29 AM


    I seriously doubt that Python's standard print() function would close STDOUT in between writes! :-)

    But the incremental workflow is apparently the problem. I wrote minimal (pass-through) preprocessors in bash:

    #!/bin/bash
    cat > /Users/paulrein/dummy.log  
    cat /Users/paulrein/dummy.log
    

    and Python:

    #!/usr/local/opt/python3/Frameworks/Python.framework/Versions/3.6/bin/python3
    
    import sys  
    import logging  
    logging.basicConfig(filename='/Users/paulrein/Minimal.log', level=logging.DEBUG)
    
    if __name__ == "__main__":  
        with sys.stdin as f:
            SOURCE = [line for line in f]
        logging.info("Starting")
        logging.debug("Source: "+str(SOURCE))
        logging.info("STDIN is "+str(sys.stdin))
        logging.info("STDOUT is "+str(sys.stdout))
        LINES = iter(SOURCE)
    
        logging.info("Dumping the output to STDOUT")
        for line in LINES:
            sys.stdout.write(line)
        logging.info("Finished!")
        sys.exit(0)
    

    and they have no problem with any documents. I tried to bolt that kind of buffering onto my existing script, but it still does not want to cooperate, so I guess I have something else wrong. Now that I know that the minimal buffered script works, I can rewrite the processing logic into it and see if I can get it to work.

    Thanks for your help!

  5. Support Staff 5 Posted by Brett on 28 Apr, 2017 04:02 PM


    You should just be able to replace your current print/write statements
    with appending to an output array, then writing out line by line as you
    have or just doing a join/dump at the end. I'm not well versed in
    Python, but that's how the scripts I've written in it have worked.
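    In Python terms that pattern is roughly the following (a sketch of the
    append-then-join idea, with `.upper()` standing in for whatever the real
    processing does):

    ```python
    import sys

    def collect(lines):
        # Append each processed line to a list instead of printing it
        # immediately (i.e. replace print(processed) with buffered.append).
        buffered = []
        for line in lines:
            buffered.append(line.upper())  # stand-in for the real processing
        # One joined string, written out once at the very end.
        return "".join(buffered)

    if __name__ == "__main__":
        sys.stdout.write(collect(sys.stdin))
    ```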

    -Brett
