annoq-py (Python Library)
A Python package for programmatically accessing SNP data from the AnnoQ API.
Installation
Install directly from GitHub using pip:
pip install git+https://github.com/USCbiostats/annoq-py.git
Requirements
- Python 3.7 or higher
Quick Start
import annoq
# Get available SNP attributes
attributes = annoq.get_snp_attributes()
# Search SNPs on chromosome 1
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=100000,
fields=["chr", "pos", "ref", "alt", "rs_dbSNP151"]
)
Core Functions
The package provides 7 main functions organized into three categories:
Attribute Discovery
get_snp_attributes()- List all available SNP attributes
SNP Retrieval
get_snps_by_chr()- Query by chromosome and position rangeget_snps_by_rsid_list()- Query by RSID identifiersget_snps_by_gene_product()- Query by gene information
SNP Counting
count_snps_by_chr()- Count SNPs by chromosomecount_snps_by_rsid_list()- Count SNPs by RSID listcount_snps_by_gene_product()- Count SNPs by gene
Detailed Usage
1. Getting SNP Attributes
Retrieve the list of all available SNP attributes that can be queried.
import annoq
# Get all available attributes
attributes = annoq.get_snp_attributes()
# attributes is a list of dictionaries with attribute metadata
for attr in attributes:
print(f"{attr['label']}: {attr['description']}")
2. Querying SNPs by Chromosome
Search for SNPs within a specific chromosome region.
Basic Usage
# Query chromosome 1 from position 1 to 100,000 and get basic fields
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=100000,
fields=["chr", "pos", "ref", "alt", "rs_dbSNP151"]
)
# Query the X chromosome from position 1,000 to 50,000 and get basic default fields
snps = annoq.get_snps_by_chr(
chromosome_identifier="X",
start_position=1000,
end_position=50000
)
Selecting Specific Fields
You can specify which fields to return in three different ways:
As a list of field names:
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=10000,
fields=["chr", "pos", "ref", "alt", "rs_dbSNP151"]
)
As a string config exported from AnnoQ:
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=10000,
fields='{"_source":["chr", "pos", "ref", "alt", "rs_dbSNP151"]}'
)
From a JSON config exported from AnnoQ:
# Export the config file: config.txt from AnnoQ
# {"_source":["chr", "pos", "ref", "alt", "rs_dbSNP151"]}
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=10000,
fields="/path/to/config.txt"
)
Note: The maximum number of fields you can request is 20. For more fields you can make multiple queries and combine the results.
Filtering by Non-Empty Fields
Return only SNPs where specific annotation fields have values:
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=100000,
filter_fields=["ANNOVAR_ucsc_Transcript_ID", "VEP_ensembl_Gene_ID"]
)
Pagination
By default, the API returns 1,000 results per page with a maximum of 10,000 results across all pages.
# Get first 500 results
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=1000000,
pagination_from=0,
pagination_size=500
)
# Get next 500 results
snps_page2 = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=1000000,
pagination_from=500,
pagination_size=500
)
# Note: pagination_from + pagination_size must be <= 10,000
Fetching All Results
To retrieve all matching SNPs (up to 1,000,000), use fetch_all=True:
# This will download all matching SNPs
all_snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=100000,
fetch_all=True
)
# When fetch_all=True, the pagination parameters are ignored
Important: When fetch_all=True, the function downloads a lot of data in a different format and may take a long time for large result sets.
3. Querying SNPs by RSID
Search for SNPs using RSID identifiers.
Basic Usage
# Using a comma-separated string
snps = annoq.get_snps_by_rsid_list(
rsid_list="rs1219648,rs2912774,rs2981582"
)
# Using a list
snps = annoq.get_snps_by_rsid_list(
rsid_list=["rs1219648", "rs2912774", "rs2981582"]
)
With Custom Fields
snps = annoq.get_snps_by_rsid_list(
rsid_list=["rs1219648", "rs2912774", "rs2981582"],
fields=["chr", "pos", "ref", "alt", "rs_dbSNP151"]
)
With Filtering
snps = annoq.get_snps_by_rsid_list(
rsid_list="rs1219648,rs2912774,rs2981582",
filter_fields=["VEP_ensembl_Gene_ID"],
pagination_from=0,
pagination_size=100
)
Fetching All Matching RSIDs
# Get all SNPs for a large list of RSIDs
all_snps = annoq.get_snps_by_rsid_list(
rsid_list=["rs1219648", "rs2912774", "rs2981582", "rs123456", "rs789012"],
fetch_all=True
)
4. Querying SNPs by Gene Product
Search for SNPs associated with a gene using gene ID, gene symbol, or UniProt ID.
Basic Usage
# Search by gene symbol
snps = annoq.get_snps_by_gene_product(gene="BRCA1")
# Search by gene ID or UniProt ID
snps = annoq.get_snps_by_gene_product(gene="ENSG00000012048")
With Custom Fields and Filtering
snps = annoq.get_snps_by_gene_product(
gene="TP53",
fields=["chr", "pos", "ref", "alt", "rs_dbSNP151"],
filter_fields=["ANNOVAR_ucsc_Transcript_ID"]
)
With Pagination
# Get first 500 SNPs for a gene
snps = annoq.get_snps_by_gene_product(
gene="APOE",
pagination_from=0,
pagination_size=500
)
Fetching All Gene-Associated SNPs
# Get all SNPs associated with a gene
all_snps = annoq.get_snps_by_gene_product(
gene="ZMYND11",
fetch_all=True
)
5. Counting SNPs
Count functions return the number of matching SNPs without retrieving the actual data.
Count by Chromosome
# Count all SNPs in a region
count = annoq.count_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=100000
)
print(f"Found {count} SNPs")
# Count with filters
count = annoq.count_snps_by_chr(
chromosome_identifier="X",
start_position=1000,
end_position=50000,
filter_fields=["VEP_ensembl_Gene_ID", "ANNOVAR_ucsc_Transcript_ID"]
)
Count by RSID List
# Count matching RSIDs
count = annoq.count_snps_by_rsid_list(
rsid_list=["rs1219648", "rs2912774", "rs2981582"]
)
# Count with filters
count = annoq.count_snps_by_rsid_list(
rsid_list="rs1219648,rs2912774,rs2981582",
filter_fields=["ANNOVAR_ucsc_Transcript_ID"]
)
Count by Gene Product
# Count SNPs for a gene
count = annoq.count_snps_by_gene_product(gene="BRCA1")
# Count with filters
count = annoq.count_snps_by_gene_product(
gene="TP53",
filter_fields=["VEP_ensembl_Gene_ID"]
)
Common Patterns
Example 1: Progressive Filtering
# First, count to see how many SNPs match
total = annoq.count_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=1000000
)
print(f"Total SNPs: {total}")
# Count with filters applied
filtered_count = annoq.count_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=1000000,
filter_fields=["VEP_ensembl_Gene_ID"]
)
print(f"Filtered SNPs: {filtered_count}")
# Retrieve the filtered data
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=1000000,
filter_fields=["VEP_ensembl_Gene_ID"],
fields=["chr", "pos", "ref", "alt", "rs_dbSNP151", "VEP_ensembl_Gene_ID"]
)
Example 2: Working with Large Datasets
# For large regions, first check the count
count = annoq.count_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=10000000
)
if count > 1000000:
print(f"Warning: {count} SNPs found. Consider narrowing your search.")
elif count > 10000:
# Use fetch_all for counts between 10K and 100K
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=10000000,
fetch_all=True
)
else:
# Use regular pagination for smaller datasets
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=10000000,
pagination_size=count # Get all in one go
)
Example 3: Gene-Focused Analysis
# Get all SNPs for multiple genes
genes = ["BRCA1", "BRCA2", "TP53"]
all_gene_snps = {}
for gene in genes:
count = annoq.count_snps_by_gene_product(gene=gene)
print(f"{gene}: {count} SNPs")
all_gene_snps[gene] = annoq.get_snps_by_gene_product(
gene=gene,
fields=["chr", "pos", "ref", "alt", "rs_dbSNP151"],
fetch_all=True
)
Example 4: Batch RSID Lookup
# Read RSIDs from a file
with open("rsid_list.txt", "r") as f:
rsids = [line.strip() for line in f if line.strip()]
# Check how many exist in the database
count = annoq.count_snps_by_rsid_list(rsid_list=rsids)
print(f"{count} out of {len(rsids)} RSIDs found")
# Retrieve all matching SNPs
snps = annoq.get_snps_by_rsid_list(
rsid_list=rsids,
fields=["chr", "pos", "ref", "alt", "rs_dbSNP151"],
fetch_all=True
)
Important Limitations
Pagination Constraints
- Regular queries: Maximum of 10,000 results across all pages (
pagination_from + pagination_size ≤ 10,000) - Fetch all queries: Maximum of 1,000,000 total results.
- Note: For large datasets, the results may be too large and could lead to performance issues. It is recommended to narrow down the query if possible.
Field Selection
- Maximum of 20 fields can be requested per query
- Use the
get_snp_attributes()function to see all available fields
Rate Limiting
- The API may implement rate limiting for excessive requests
- Use count functions before large retrievals to estimate data size
Error Handling
All functions raise exceptions for common error cases:
try:
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=100000,
pagination_from=9500,
pagination_size=1000 # This exceeds the 10,000 limit
)
except ValueError as e:
print(f"Pagination error: {e}")
try:
snps = annoq.get_snps_by_chr(
chromosome_identifier="1",
start_position=1,
end_position=100000,
fields="/nonexistent/file.json"
)
except ValueError as e:
print(f"File error: {e}")
try:
snps = annoq.get_snps_by_chr(
chromosome_identifier="invalid",
start_position=1,
end_position=100000
)
except requests.exceptions.HTTPError as e:
print(f"API error: {e}")
Contributing
Contributions are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.
License
This package is licensed under the MIT License.
Support
For questions or issues related to AnnoQ itself, please visit the site AnnoQ