python - How to subprocess the files on console directly (with or without using StringIO)?
I am trying to read a GTF file and edit it (using subprocess, grep, and awk) before loading it into pandas.
The file has header information (indicated by #), which I need to grep and remove first. I can do this in Python, but I want to introduce grep into the pipeline to make processing more efficient.
I tried doing:
    import subprocess
    from io import StringIO
    gtf_file = open('chr2_only.gtf', 'r').read()
    gtf_update = subprocess.Popen(["grep '^#' " + StringIO(gtf_file)], shell=True)
and
    gtf_update = subprocess.Popen(["grep '^#' " + gtf_file], shell=True)
Both of these attempts throw an error. For the first attempt it was:
    Traceback (most recent call last):
      File "/home/everestial007/pycharmprojects/stitcher/phase-stitcher-markov/markov_final_test/phase_to_vcf.py", line 39, in <module>
        gtf_update = subprocess.Popen(["grep '^#' " + StringIO(gtf_file)], shell=True)
    TypeError: Can't convert '_io.StringIO' object to str implicitly
However, if I specify the filename directly, it works:
    gtf_update = subprocess.Popen(["grep '^#' chr2_only.gtf"], shell=True)
and the output is:
    <subprocess.Popen object at 0x7fc12e5ea588>
    #!genome-build v.1.0
    #!genome-version jgi8x
    #!genome-date 2008-12
    #!genome-build-accession gca_000004255.1
    #!genebuild-last-updated 2008-12
Could someone please provide different examples for a problem like this, and explain why I am getting the error and why/how it is possible to run subprocess directly on files loaded on the console/in memory?
I also tried using subprocess with call, check_call, check_output, etc., and I've gotten several different error messages, such as these:
    OSError: [Errno 7] Argument list too long
and
    Subprocess in Python: File Name too long
Here is a possible solution that allows you to send a string to grep. Essentially, you declare in the Popen constructor that you want to communicate with the called program via stdin and stdout. You then send the input via communicate and receive the output as the return value of communicate.
    #!/usr/bin/python
    import subprocess
    gtf_file = open('chr2_only.gtf', 'r').read()
    gtf_update = subprocess.Popen(["grep '^#' "], shell=True, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate returns (stdout, stderr); the latter is empty here because stderr is not piped
    gtf_filtered, _ = gtf_update.communicate(gtf_file)
    print gtf_filtered
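As for why the two attempts in the question fail, the following sketch may help. It uses a short made-up string (`gtf_text`, an assumption standing in for the real file contents) rather than chr2_only.gtf. The first attempt tries to concatenate a str with a StringIO object, which Python refuses. The second attempt concatenates the file contents themselves, so the whole file ends up inside the command line rather than on grep's standard input; for a large file this is the likely source of the "Argument list too long" error.

```python
import io

# Hypothetical stand-in for the contents of chr2_only.gtf.
gtf_text = "#!genome-build v.1.0\nchr2\tsource\tgene\t1\t100\n"

# Attempt 1: a str cannot be concatenated with a StringIO object,
# so this raises TypeError before grep is ever started.
first_error = None
try:
    command = "grep '^#' " + io.StringIO(gtf_text)
except TypeError as exc:
    first_error = exc

# Attempt 2: the concatenation succeeds, but the file contents are now
# part of the command string instead of being fed to grep as input.
command = "grep '^#' " + gtf_text
print(repr(command))
```

A file name on the command line works because grep opens that file itself; text already held in memory has to be passed through a pipe instead.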
Note that it is wise not to use shell=True. Therefore, the Popen line should be written as
    gtf_update = subprocess.Popen(["grep", '^#'], shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
The rationale is that you don't need the shell to parse the arguments to a single executable, so you avoid unnecessary overhead. It is also better from a security point of view, at least if some argument is potentially unsafe because it comes from a user (think of a filename containing |). (This is not the case here.)
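The traceback in the question suggests Python 3, where communicate expects bytes unless the pipes are opened in text mode. A minimal Python 3 sketch of the same idea, without a shell, could look like this. The function name `grep_header_lines` and the `sample` text are made up for illustration, and the sketch assumes a grep executable is available on the PATH.

```python
import subprocess


def grep_header_lines(text):
    """Return the lines of `text` that start with '#', as selected by grep."""
    result = subprocess.run(
        ["grep", "^#"],           # argument list, so no shell is involved
        input=text,               # sent to grep on its standard input
        stdout=subprocess.PIPE,   # capture what grep prints
        universal_newlines=True,  # text mode: str in, str out
    )
    return result.stdout


# Hypothetical sample standing in for the contents of a GTF file.
sample = "#!genome-build v.1.0\nchr2\tsource\tgene\t1\t100\n#!genome-date 2008-12\n"
header = grep_header_lines(sample)
print(header)
```

The sketch deliberately does not pass check=True, because grep exits with status 1 when no line matches, which is not necessarily an error for this use.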
Note that from a performance point of view, I expect that having grep read the file directly is faster than first reading the file with Python and then sending it to grep.
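A sketch of that direct approach, under a few assumptions: the helper name `read_without_header` and the temporary demo file are invented here, grep is assumed to be on the PATH, and `-v` is used to invert the match so that the header lines are dropped (the stated goal in the question) rather than selected.

```python
import io
import os
import subprocess
import tempfile


def read_without_header(path):
    """Let grep read `path` itself and return the non-header lines."""
    result = subprocess.run(
        ["grep", "-v", "^#", "--", path],  # "--" ends grep's option parsing
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    # A file-like object; it could be handed to pandas.read_csv(..., sep="\t").
    return io.StringIO(result.stdout)


# Hypothetical miniature GTF file, written only to make the sketch runnable.
with tempfile.NamedTemporaryFile("w", suffix=".gtf", delete=False) as handle:
    handle.write("#!genome-build v.1.0\nchr2\tsrc\tgene\t1\t100\n")
    demo_path = handle.name

body = read_without_header(demo_path).read()
os.remove(demo_path)
print(body)
```

Here the file contents never pass through Python on the way in; only the filtered output does.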