Welcome to grep_vcf’s documentation!

User Guide

grep_vcf is a tiny tool to filter a vcf file based on a position file, and vice versa. The position file must be a tab-delimited file with a genomic position in the first column. The tool is designed to handle big files without consuming a huge amount of memory.
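
To make the expected layout of a position file concrete, here is a minimal sketch. The file content, the column names and the read_positions helper are all hypothetical and are not part of grep_vcf; skipping blank lines is also an assumption of this sketch.

```python
import io

# Hypothetical position file: tab-separated, position in the first column.
POSITIONS_TSV = (
    "# pos\tgene\n"
    "1042\tgeneA\n"
    "2310\tgeneB\n"
)


def read_positions(handle):
    """Yield the integer found in the first column of each data line."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # comment line, or blank line (assumption of this sketch)
        yield int(line.split("\t", 1)[0])


positions = list(read_positions(io.StringIO(POSITIONS_TSV)))
print(positions)
```

Only the first column matters for filtering; the other columns are carried along untouched.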

Usage

positional arguments:
positions The text file with the positions to look for in the vcf file. It must be a tsv file (https://en.wikipedia.org/wiki/Tab-separated_values) where the positions are in the first column. Lines starting with ‘#’ are considered comments.
optional arguments:
-h, --help show this help message and exit
--vcf VCF The path to the vcf file. By default grep_vcf searches for the same path as the position file, but with ‘.vcf’ as extension.
--out OUT The path to an output file, default is stdout. If the file exists, it will be replaced.
--invert, -v Invert the sense of matching, to select non-matching vcf lines.
--switch Filter the position file instead, keeping the lines whose position matches one in the vcf file.
--version, -V Display version information and quit.
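
The default for --vcf described above can be pictured with the following sketch. It assumes the extension is simply swapped; default_vcf_path is a hypothetical name, and how grep_vcf computes the path internally is not documented here.

```python
from pathlib import Path


def default_vcf_path(positions_path):
    """Return the position file path with its extension replaced by '.vcf'."""
    return Path(positions_path).with_suffix(".vcf")


guessed = default_vcf_path("samples/run1.tsv")
print(guessed)
```

With this rule, a position file named samples/run1.tsv would be paired with samples/run1.vcf unless --vcf is given.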

Requirements

grep_vcf needs Python >= 3.6 (tested with 3.6, 3.7 and 3.8).

Installation

pip install git+https://github.com/bneron/grep_vcf.git#egg=grep_vcf

Developer Guide

Installation

The recommended way to install grep_vcf is to use a virtualenv:

python -m venv grep_vcf
cd grep_vcf
source bin/activate
git clone https://github.com/bneron/grep_vcf.git
cd grep_vcf
pip install -e .[dev]

Overview

There are two main files:
  • grep_vcf/grep_vcf.py which is the module
  • grep_vcf/scripts/grep_vcf.py which is the entrypoint to run grep_vcf from command line.

API

Module API

The module mainly contains two functions:

  • match_generator, which keeps the lines of the target file whose position is found in the reference file.
  • invert_match_generator, which filters out the lines of the target file whose position is found in the reference file.

These two functions are generators, in order to work in constant memory as far as possible, even with big files.

Note

In both cases, lines starting with # are considered comments and are ignored.

The other functions are helpers.
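
The streaming idea behind the two generators can be sketched as below. This is not the grep_vcf implementation: it is a minimal merge-style illustration that assumes both files are sorted by ascending position, and sketch_match and _positions are hypothetical names.

```python
import io


def _positions(lines):
    """Yield (position, line) for each non-comment line."""
    for line in lines:
        if not line.startswith("#"):
            yield int(line.split("\t", 1)[0]), line


def sketch_match(ref_file, target_file, invert=False):
    """Yield target lines whose position is (or, with invert, is not) in ref_file.

    Assumes both inputs are sorted by ascending position, so that each file
    is read once and only one reference position is held at a time.
    """
    refs = (pos for pos, _ in _positions(ref_file))
    ref = next(refs, None)
    for pos, line in _positions(target_file):
        # Advance the reference cursor until it reaches the target position.
        while ref is not None and ref < pos:
            ref = next(refs, None)
        found = ref is not None and ref == pos
        if found != invert:
            yield line


REF = "# positions\n10\n30\n40\n"
TARGET = "# header\n10\tA\n20\tB\n30\tC\n50\tD\n"

kept = list(sketch_match(io.StringIO(REF), io.StringIO(TARGET)))
dropped = list(sketch_match(io.StringIO(REF), io.StringIO(TARGET), invert=True))
```

Because nothing is loaded into a list or a set, memory use does not grow with the size of either file. The price is the sorted-input assumption stated above.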

grep_vcf.grep_vcf._parse_line(file)

Go to the next line and parse it: extract the first field and convert it to an int. Comments (lines starting with #) are ignored.

Parameters:

file (a file object) – the file to parse. It must be a tsv file with an integer in the first column.

Returns:

the position parsed

Return type:

int

Raises:
  • StopIteration – when the end of the file is reached
  • ValueError – when the first column cannot be cast to an integer
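
The behaviour described above, including both exceptions, can be reproduced with this minimal sketch. parse_line_sketch is a hypothetical stand-in written from the description, not the actual code of _parse_line.

```python
import io


def parse_line_sketch(file):
    """Return the integer in the first column of the next non-comment line."""
    line = next(file)  # raises StopIteration at the end of the file
    while line.startswith("#"):
        line = next(file)
    return int(line.split("\t", 1)[0])  # raises ValueError if not an integer


handle = io.StringIO("# comment\n12\tx\nabc\ty\n")
first = parse_line_sketch(handle)
print(first)
```

A second call on the same handle would raise ValueError (the first column is "abc"), and a third call would raise StopIteration.
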
grep_vcf.grep_vcf._until_the_end(file)

Iterate over the lines until the end of the file, skipping lines starting with ‘#’.

Parameters: file – the file to iterate over
Returns: lines
Return type: str
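
A helper with this contract could look like the sketch below; until_the_end_sketch is a hypothetical name and the real function may differ. The example starts from a handle that has already been partly read, to show that iteration resumes from the current position.

```python
import io


def until_the_end_sketch(file):
    """Yield every remaining line of file, skipping lines starting with '#'."""
    for line in file:
        if not line.startswith("#"):
            yield line


handle = io.StringIO("10\tA\n# note\n20\tB\n30\tC\n")
next(handle)  # consume the first line, as a caller might already have done
remaining = list(until_the_end_sketch(handle))
print(remaining)
```
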
grep_vcf.grep_vcf.invert_match_generator(ref_file, target_file)

Create a generator which iterates over the lines of target_file whose position does not appear in ref_file. Positions are extracted from the first column of both ref_file and target_file.

Parameters:
  • ref_file (file object) – the text file to extract the positions from
  • target_file (file object) – the vcf file to compare against
Returns:

a generator

Return type:

generator

grep_vcf.grep_vcf.match_generator(ref_file, target_file)

Create a generator which iterates over the lines of target_file whose position appears in ref_file. Positions are extracted from the first column of both ref_file and target_file.

Parameters:
  • ref_file (file object) – the text file to extract the positions from
  • target_file (file object) – the vcf file to compare against
Returns:

a generator

Return type:

generator

Scripts API

grep_vcf.scripts.grep_vcf.get_version_message()
Returns: the version information
Return type: str
grep_vcf.scripts.grep_vcf.main(args=None)
Parameters:
  • args (list of str) – the arguments to run with
grep_vcf.scripts.grep_vcf.parse_args(args)
Parameters: args (list of strings, without the program name) – the arguments provided on the command line
Returns: the parsed arguments
Return type: argparse.Namespace object.
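
For orientation, a parser matching the options listed in the Usage section could be declared as in the sketch below. It is reconstructed from the help text only: build_parser_sketch is a hypothetical name, and the actions, defaults and attribute names are assumptions, not the actual code of parse_args.

```python
import argparse


def build_parser_sketch():
    """Build an argparse parser mirroring the documented command line."""
    parser = argparse.ArgumentParser(prog="grep_vcf")
    parser.add_argument("positions",
                        help="tsv file with the positions in the first column")
    parser.add_argument("--vcf",
                        help="path to the vcf file")
    parser.add_argument("--out",
                        help="path to an output file, default is stdout")
    parser.add_argument("--invert", "-v", action="store_true",
                        help="select non-matching vcf lines")
    parser.add_argument("--switch", action="store_true",
                        help="filter the position file instead of the vcf")
    parser.add_argument("--version", "-V", action="store_true",
                        help="display version information and quit")
    return parser


# The list excludes the program name, as parse_args expects.
ns = build_parser_sketch().parse_args(["pos.tsv", "--invert", "--out", "kept.vcf"])
print(ns.positions, ns.invert, ns.out)
```

The result is an argparse.Namespace whose attributes are named after the long options, for example ns.invert and ns.out.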
