Brett on 25 Apr, 2017 07:26 PM
First off, are your scripts taking STDIN input? i.e., in a bash script,
you would want to `cat|tee` to ensure the STDIN continues. Assuming
that's all true, if you run `cat testfile.md|yourscript.py` on the
command line does it finish immediately? And is your script writing out
incrementally or all at once at the end?
1. It needs to pick up the STDIN
2. Nothing should be written out (STDOUT) until the script is complete
and ready to exit with a success code of zero
3. Writing to STDERR should be avoided unless there's actually an error
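The three rules above can be sketched as a minimal Python skeleton. This is a sketch only; `process` and `main` are assumed names standing in for your real logic, not anything Marked requires:

```python
def process(text):
    # Hypothetical transform standing in for your real pre-processing.
    return text.upper()


def main(infile, outfile):
    source = infile.read()      # 1. pick up all of STDIN first
    result = process(source)    # do the work before writing anything
    outfile.write(result)       # 2. one write to STDOUT at the end
    return 0                    # 3. nothing on STDERR; success code zero
```

In the actual script you would invoke it under a `__main__` guard as `sys.exit(main(sys.stdin, sys.stdout))`.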
You can see the STDOUT/STDERR output of the running script in Marked
using Help->Show Custom Processor Log
As far as size, no, there's not an inherent limit, and definitely not one you'd hit with a normal document.
Yeah, the scripts take STDIN, write to STDOUT and they work correctly on the command line. Classic UNIX filter scripts.
I guess the culprit is your point 2. I didn't know that I couldn't work through the data incrementally (I have not seen anything about that either in the Marked help or in any of your articles dealing with custom pre-processors). My script reads a line from STDIN, checks if it needs special handling and, if not, writes it to STDOUT, then reads the next line from STDIN, and so on.
I suppose that at this particular size the system's STDOUT buffer fills up and gets flushed, so Marked sees data coming back from the pre-processor, stops sending, and switches to reading instead?
I'll try to rewrite the script so that it gobbles the entirety of STDIN into memory and then works through that instead. I just thought that would unnecessarily slow things down, but at the size of a normal document I guess the overhead is negligible.
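That slurp-then-process approach could look something like the sketch below; `handle` and `filter_all` are made-up names for illustration:

```python
def handle(line):
    # Hypothetical per-line special-case handling; passes lines through as-is.
    return line


def filter_all(stdin_lines):
    # Gobble the whole input into memory, work through it there, and
    # buffer the results instead of writing each line out incrementally.
    buffered = [handle(line) for line in stdin_lines]
    return "".join(buffered)
```

The script would then call it once, e.g. `sys.stdout.write(filter_all(sys.stdin.readlines()))`, so nothing reaches STDOUT until all of STDIN has been consumed.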
Brett on 26 Apr, 2017 04:35 AM
If your script closes the STDOUT file handle in between writes, then
yes, the system will report that it's done writing after a couple of
microseconds. It's best to collect the output in a variable then write
it out at the end. Let me know if that works for you.
import sys
import logging

if __name__ == "__main__":
    # Read all of STDIN into memory before producing any output
    with sys.stdin as f:
        SOURCE = [line for line in f]
    logging.info("STDIN is " + str(sys.stdin))
    logging.info("STDOUT is " + str(sys.stdout))
    LINES = iter(SOURCE)
    logging.info("Dumping the output to STDOUT")
    for line in LINES:
        sys.stdout.write(line)
Scripts like that have no problem with any documents. I tried to bolt that kind of buffering onto my existing script, but it still does not want to cooperate, so I guess I have something else wrong. Now that I know that the minimal buffered script works, I can rewrite the processing logic into it and see if I can get it to work.
Brett on 28 Apr, 2017 04:02 PM
You should just be able to replace your current print/write statements
with appending to an output array, then writing out line by line as you
have or just doing a join/dump at the end. I'm not well versed in
Python, but that's how the scripts I've written in it have worked.
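A minimal sketch of that pattern, with `transform` standing in for whatever each former print/write statement emitted (both function names here are assumptions, not from the original scripts):

```python
def transform(line):
    # Stand-in for whatever each old print/write statement produced.
    return line.rstrip("\n").lower() + "\n"


def run(infile, outfile):
    output = []                     # append here instead of printing
    for line in infile:
        output.append(transform(line))
    outfile.write("".join(output))  # single join/dump at the very end
```

Called as `run(sys.stdin, sys.stdout)`, the script stays silent until the loop has consumed every input line, which matches the behavior Marked expects.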