AnnoQR (R Package)
An R package for programmatically accessing SNP data from the AnnoQ API.
Installation
Install directly from GitHub using the devtools package:
install.packages("devtools")
devtools::install_github("USCbiostats/AnnoQR")
Requirements
- R 3.5 or higher
- Required packages: httr, jsonlite (installed automatically)
Quick Start
library(AnnoQR)
# Get available SNP attributes
attributes <- snpAttributesQuery()
# Search SNPs on chromosome 1
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 100000,
fields = c("chr", "pos", "ref", "alt", "rs_dbSNP151")
)
Core Functions
The package provides 7 main functions organized into three categories:
Attribute Discovery
- snpAttributesQuery(): List all available SNP attributes
SNP Retrieval
- regionQuery(): Query by chromosome and position range
- rsidsQuery(): Query by RSID identifiers
- geneQuery(): Query by gene ID, gene symbol, or UniProt ID
SNP Counting
- countRegionQuery(): Count SNPs in a chromosome region
- countRsidsQuery(): Count SNPs by RSID list
- countGeneQuery(): Count SNPs by gene
Detailed Usage
1. Getting SNP Attributes
Retrieve the list of all available SNP attributes that can be queried.
library(AnnoQR)
# Get all available attributes
attributes <- snpAttributesQuery()
# attributes is a list of attribute metadata
for (i in seq_along(attributes)) {
cat(sprintf("%s: %s\n", attributes[[i]]$label, attributes[[i]]$description))
}
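To search the attribute list instead of printing it, you can flatten it into a data frame. The sketch below assumes each element has label and description entries, as the loop above does. The helpers attributes_to_df() and find_attributes() are hypothetical and are not part of AnnoQR. The small placeholder list stands in for a real snpAttributesQuery() result so the sketch runs offline.

```r
# Hypothetical helper (not part of AnnoQR): flatten attribute metadata into
# a data frame. Assumes each element has $label and $description, as used
# in the loop above; missing entries become NA.
attributes_to_df <- function(attributes) {
  get_field <- function(x, name) {
    value <- x[[name]]
    if (is.null(value)) NA_character_ else as.character(value)
  }
  data.frame(
    label = vapply(attributes, get_field, character(1), name = "label"),
    description = vapply(attributes, get_field, character(1), name = "description"),
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: case-insensitive keyword search over labels and descriptions
find_attributes <- function(attr_df, keyword) {
  hits <- grepl(keyword, attr_df$label, ignore.case = TRUE) |
    grepl(keyword, attr_df$description, ignore.case = TRUE)
  attr_df[hits, , drop = FALSE]
}

# Offline stand-in for snpAttributesQuery() output (placeholder descriptions)
example_attributes <- list(
  list(label = "chr", description = "placeholder: chromosome"),
  list(label = "VEP_ensembl_Gene_ID", description = "placeholder: gene identifier")
)
attr_df <- attributes_to_df(example_attributes)
find_attributes(attr_df, "gene")
```

With a live connection you would call attributes_to_df(snpAttributesQuery()) instead of using the placeholder list.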
2. Querying SNPs by Chromosome
Search for SNPs within a specific chromosome region.
Basic Usage
# Query chromosome 1 from position 1 to 100,000 and get basic fields
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 100000,
fields = c("chr", "pos", "ref", "alt", "rs_dbSNP151")
)
# Query the X chromosome from position 1,000 to 50,000 and get default fields
snps <- regionQuery(
chromosome_identifier = "X",
start_position = 1000,
end_position = 50000
)
Selecting Specific Fields
You can specify which fields to return in three different ways:
As a vector of field names:
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 10000,
fields = c("chr", "pos", "ref", "alt", "rs_dbSNP151")
)
As a JSON config string exported from AnnoQ:
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 10000,
fields = '{"_source":["chr", "pos", "ref", "alt", "rs_dbSNP151"]}'
)
As a path to a JSON config file exported from AnnoQ:
# Export a config file (e.g. config.txt) from AnnoQ; its contents look like:
# {"_source":["chr", "pos", "ref", "alt", "rs_dbSNP151"]}
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 10000,
fields = "/path/to/config.txt"
)
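If you build field lists in code, you can also write a config file in the same {"_source": [...]} shape yourself, without exporting it from AnnoQ. The sketch below uses base R only and assumes field names contain no quotes or backslashes. The helper write_field_config() is hypothetical. jsonlite, which AnnoQR already depends on, could generate the same JSON.

```r
# Hypothetical helper (not part of AnnoQR): write a field list in the
# {"_source": [...]} shape shown above so it can be passed as `fields`.
# Base R only; assumes field names contain no quotes or backslashes.
write_field_config <- function(fields, path) {
  stopifnot(is.character(fields), length(fields) > 0)
  quoted <- paste0('"', fields, '"', collapse = ", ")
  json <- paste0('{"_source":[', quoted, ']}')
  writeLines(json, path)
  invisible(path)
}

config_path <- file.path(tempdir(), "config.txt")
write_field_config(c("chr", "pos", "ref", "alt", "rs_dbSNP151"), config_path)
# Then, with network access:
# snps <- regionQuery(chromosome_identifier = "1", start_position = 1,
#                     end_position = 10000, fields = config_path)
```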
Note: You can request at most 20 fields per query. For more fields, make multiple queries and combine the results.
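One way to stay under the 20-field limit is to split a long field list into groups and repeat a few key columns in every group, so the partial results can be joined afterwards. The sketch below makes two assumptions: the key columns chr, pos, ref, and alt are enough to identify a record, and the field_1 ... field_40 names are placeholders. The helper chunk_fields() is hypothetical.

```r
# Hypothetical helper (not part of AnnoQR): split a long field list into
# groups that stay within the 20-field limit. Key columns are repeated in
# every group so the partial results can be merged back together.
chunk_fields <- function(fields, keys = c("chr", "pos", "ref", "alt"), max_fields = 20) {
  extra <- setdiff(fields, keys)
  per_chunk <- max_fields - length(keys)
  if (per_chunk < 1) stop("max_fields must exceed the number of key fields")
  if (length(extra) == 0) return(list(keys))
  groups <- split(extra, ceiling(seq_along(extra) / per_chunk))
  unname(lapply(groups, function(g) c(keys, g)))
}

# Placeholder field names for illustration
many_fields <- c("chr", "pos", "ref", "alt", paste0("field_", 1:40))
chunks <- chunk_fields(many_fields)
lengths(chunks)  # each group has at most 20 fields
```

You could then pass each group as fields to the same query and join the results, for example with merge(part1, part2, by = c("chr", "pos", "ref", "alt")). This assumes each query returns a data frame, which is worth confirming for your version of the package.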
Filtering by Non-Empty Fields
Return only SNPs where specific annotation fields have values:
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 100000,
filter_fields = c("ANNOVAR_ucsc_Transcript_ID", "VEP_ensembl_Gene_ID")
)
Pagination
By default, the API returns 1,000 results per page with a maximum of 10,000 results across all pages.
# Get first 500 results
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 1000000,
pagination_from = 0,
pagination_size = 500
)
# Get next 500 results
snps_page2 <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 1000000,
pagination_from = 500,
pagination_size = 500
)
# Note: pagination_from + pagination_size must be <= 10,000
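To page through a result set without breaking the 10,000-result window, you can compute the pagination_from and pagination_size pairs up front, for example from a countRegionQuery() result. The helper plan_pages() below is a hypothetical sketch and is not part of AnnoQR. It caps the plan at the window described above, so anything beyond 10,000 results needs fetch_all = TRUE or a narrower query.

```r
# Hypothetical helper (not part of AnnoQR): compute pagination_from /
# pagination_size pairs that cover `total` results without exceeding the
# 10,000-result window (pagination_from + pagination_size <= 10,000).
plan_pages <- function(total, page_size = 1000, max_window = 10000) {
  limit <- min(total, max_window)
  if (limit <= 0) {
    return(data.frame(from = integer(0), size = integer(0)))
  }
  from <- seq(0, limit - 1, by = page_size)
  size <- pmin(page_size, limit - from)  # last page may be shorter
  data.frame(from = as.integer(from), size = as.integer(size))
}

# Each row corresponds to one
# regionQuery(..., pagination_from = from, pagination_size = size) call
plan_pages(2300, page_size = 1000)
```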
Fetching All Results
To retrieve all matching SNPs (up to 1,000,000), use fetch_all = TRUE:
# This will download all matching SNPs
all_snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 100000,
fetch_all = TRUE
)
# When fetch_all = TRUE, the pagination parameters are ignored
Important: When fetch_all = TRUE, the function downloads the entire result set, which may be returned in a different format from paginated results and can take a long time for large queries.
3. Querying SNPs by RSID
Search for SNPs using RSID identifiers.
Basic Usage
# Using a comma-separated string
snps <- rsidsQuery(
rsid_list = "rs1219648,rs2912774,rs2981582"
)
# Using a vector
snps <- rsidsQuery(
rsid_list = c("rs1219648", "rs2912774", "rs2981582")
)
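RSID lists pasted from spreadsheets or files often contain stray spaces, blank entries, or duplicates. The sketch below accepts either input form shown above and returns a clean vector. The helper normalize_rsids() is hypothetical. Its "rs followed by digits" check is an assumption about what counts as a well-formed ID, so the helper only warns and does not drop entries that fail it.

```r
# Hypothetical helper (not part of AnnoQR): accept a comma-separated string
# or a character vector and return a clean vector of RSIDs. Trims whitespace,
# drops empty entries and duplicates, and warns about entries that do not
# look like "rs" followed by digits (an assumed format check).
normalize_rsids <- function(rsid_list) {
  ids <- trimws(unlist(strsplit(as.character(rsid_list), ",", fixed = TRUE)))
  ids <- unique(ids[nzchar(ids)])
  bad <- !grepl("^rs[0-9]+$", ids)
  if (any(bad)) {
    warning("Unexpected RSID format: ", paste(ids[bad], collapse = ", "))
  }
  ids
}

normalize_rsids("rs1219648, rs2912774,,rs2981582")
normalize_rsids(c("rs1219648", "rs2912774", "rs2981582"))
```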
With Custom Fields
snps <- rsidsQuery(
rsid_list = c("rs1219648", "rs2912774", "rs2981582"),
fields = c("chr", "pos", "ref", "alt", "rs_dbSNP151")
)
With Filtering
snps <- rsidsQuery(
rsid_list = "rs1219648,rs2912774,rs2981582",
filter_fields = c("VEP_ensembl_Gene_ID"),
pagination_from = 0,
pagination_size = 100
)
Fetching All Matching RSIDs
# Get all SNPs for a large list of RSIDs
all_snps <- rsidsQuery(
rsid_list = c("rs1219648", "rs2912774", "rs2981582", "rs123456", "rs789012"),
fetch_all = TRUE
)
4. Querying SNPs by Gene Product
Search for SNPs associated with a gene using gene ID, gene symbol, or UniProt ID.
Basic Usage
# Search by gene symbol
snps <- geneQuery(gene = "BRCA1")
# Search by gene ID or UniProt ID
snps <- geneQuery(gene = "ENSG00000012048")
With Custom Fields and Filtering
snps <- geneQuery(
gene = "TP53",
fields = c("chr", "pos", "ref", "alt", "rs_dbSNP151"),
filter_fields = c("ANNOVAR_ucsc_Transcript_ID")
)
With Pagination
# Get first 500 SNPs for a gene
snps <- geneQuery(
gene = "APOE",
pagination_from = 0,
pagination_size = 500
)
Fetching All Gene-Associated SNPs
# Get all SNPs associated with a gene
all_snps <- geneQuery(
gene = "ZMYND11",
fetch_all = TRUE
)
5. Counting SNPs
Count functions return the number of matching SNPs without retrieving the actual data.
Count by Chromosome
# Count all SNPs in a region
count <- countRegionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 100000
)
cat(sprintf("Found %d SNPs\n", count))
# Count with filters
count <- countRegionQuery(
chromosome_identifier = "X",
start_position = 1000,
end_position = 50000,
filter_fields = c("VEP_ensembl_Gene_ID", "ANNOVAR_ucsc_Transcript_ID")
)
Count by RSID List
# Count matching RSIDs
count <- countRsidsQuery(
rsid_list = c("rs1219648", "rs2912774", "rs2981582")
)
# Count with filters
count <- countRsidsQuery(
rsid_list = "rs1219648,rs2912774,rs2981582",
filter_fields = c("ANNOVAR_ucsc_Transcript_ID")
)
Count by Gene Product
# Count SNPs for a gene
count <- countGeneQuery(gene = "BRCA1")
# Count with filters
count <- countGeneQuery(
gene = "TP53",
filter_fields = c("VEP_ensembl_Gene_ID")
)
Common Patterns
Example 1: Progressive Filtering
# First, count to see how many SNPs match
total <- countRegionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 1000000
)
cat(sprintf("Total SNPs: %d\n", total))
# Count with filters applied
filtered_count <- countRegionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 1000000,
filter_fields = c("VEP_ensembl_Gene_ID")
)
cat(sprintf("Filtered SNPs: %d\n", filtered_count))
# Retrieve the filtered data
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 1000000,
filter_fields = c("VEP_ensembl_Gene_ID"),
fields = c("chr", "pos", "ref", "alt", "rs_dbSNP151", "VEP_ensembl_Gene_ID")
)
Example 2: Working with Large Datasets
# For large regions, first check the count
count <- countRegionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 10000000
)
if (count > 1000000) {
cat(sprintf("Warning: %d SNPs found. Consider narrowing your search.\n", count))
} else if (count > 10000) {
# Use fetch_all for counts between 10K and 1M
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 10000000,
fetch_all = TRUE
)
} else if (count > 0) {
# Use regular pagination for smaller datasets
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 10000000,
pagination_size = count # Get all in one go (count <= 10,000)
)
} else {
cat("No SNPs found in this region.\n")
}
Example 3: Gene-Focused Analysis
# Get all SNPs for multiple genes
genes <- c("BRCA1", "BRCA2", "TP53")
all_gene_snps <- list()
for (gene in genes) {
count <- countGeneQuery(gene = gene)
cat(sprintf("%s: %d SNPs\n", gene, count))
all_gene_snps[[gene]] <- geneQuery(
gene = gene,
fields = c("chr", "pos", "ref", "alt", "rs_dbSNP151"),
fetch_all = TRUE
)
}
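After the loop, all_gene_snps holds one result per gene. You may want to stack these results into a single table that records the source gene. The sketch below assumes each geneQuery() result is a data frame with the same columns, which the documentation above does not guarantee, especially with fetch_all = TRUE. The helper combine_gene_results() is hypothetical.

```r
# Hypothetical helper (not part of AnnoQR): stack a named list of per-gene
# results into one data frame with a "gene" column. Assumes each element is
# a data frame with the same columns; NULL or empty results are skipped.
combine_gene_results <- function(results) {
  keep <- vapply(results, function(df) is.data.frame(df) && nrow(df) > 0, logical(1))
  results <- results[keep]
  if (length(results) == 0) return(data.frame())
  # Prepend the gene name (taken from the list names) to each result
  tagged <- Map(function(df, gene) cbind(gene = gene, df, stringsAsFactors = FALSE),
                results, names(results))
  combined <- do.call(rbind, unname(tagged))
  rownames(combined) <- NULL
  combined
}

# With real results from the loop above:
# gene_table <- combine_gene_results(all_gene_snps)
```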
Example 4: Batch RSID Lookup
# Read RSIDs from a file
rsids <- readLines("rsid_list.txt")
rsids <- rsids[nchar(rsids) > 0] # Remove empty lines
# Check how many exist in the database
count <- countRsidsQuery(rsid_list = rsids)
cat(sprintf("%d out of %d RSIDs found\n", count, length(rsids)))
# Retrieve all matching SNPs
snps <- rsidsQuery(
rsid_list = rsids,
fields = c("chr", "pos", "ref", "alt", "rs_dbSNP151"),
fetch_all = TRUE
)
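The count above tells you how many records matched, not which RSIDs were missing. The sketch below lists requested IDs that do not appear in the returned records. It assumes the result is a data frame with an rs_dbSNP151 column holding one ID per row, as requested through fields above. If your results store several IDs per cell, adapt the matching. The helper missing_rsids() is hypothetical.

```r
# Hypothetical helper (not part of AnnoQR): report which requested RSIDs are
# absent from the returned records. Assumes the result is a data frame with
# one RSID per row in the "rs_dbSNP151" column.
missing_rsids <- function(requested, snps, id_col = "rs_dbSNP151") {
  found <- if (is.data.frame(snps) && id_col %in% names(snps)) snps[[id_col]] else character(0)
  setdiff(requested, as.character(found))
}

# With real results from above:
# not_found <- missing_rsids(rsids, snps)
```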
Important Limitations
Pagination Constraints
- Regular queries: Maximum of 10,000 results across all pages (pagination_from + pagination_size <= 10,000)
- fetch_all = TRUE queries: Maximum of 1,000,000 total results
- Note: Very large result sets can be slow to download and may cause performance problems, so narrow the query where possible.
Field Selection
- Maximum of 20 fields can be requested per query
- Use snpAttributesQuery() to see all available fields
Rate Limiting
- The API may implement rate limiting for excessive requests
- Use count functions before large retrievals to estimate data size
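If you hit rate limits or transient network failures, you can wrap a query in a retry loop with exponential backoff. The AnnoQR error messages for rate limiting are not documented here, so this sketch retries on any error. Deterministic failures such as an invalid chromosome are therefore retried too, which only delays the final error. The wrapper with_retry() and its defaults are hypothetical.

```r
# Hypothetical wrapper (not part of AnnoQR): retry a query with exponential
# backoff (1 s, 2 s, 4 s, ... with the defaults). Retries on any error
# because rate-limit error messages are not documented here.
with_retry <- function(query_fun, max_attempts = 4, base_delay = 1) {
  stopifnot(max_attempts >= 1)
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(query_fun(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt == max_attempts) stop(result)  # re-raise the last error
    delay <- base_delay * 2^(attempt - 1)
    message(sprintf("Attempt %d failed (%s); retrying in %.1f s",
                    attempt, conditionMessage(result), delay))
    Sys.sleep(delay)
  }
}

# Usage (requires network access):
# snps <- with_retry(function() regionQuery(
#   chromosome_identifier = "1", start_position = 1, end_position = 100000))
```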
Error Handling
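All functions raise R errors for common failure cases, such as invalid pagination, missing config files, and API errors. You can catch these with tryCatch():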
# Pagination error
tryCatch({
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 100000,
pagination_from = 9500,
pagination_size = 1000 # This exceeds the 10,000 limit
)
}, error = function(e) {
cat(sprintf("Pagination error: %s\n", e$message))
})
# File error
tryCatch({
snps <- regionQuery(
chromosome_identifier = "1",
start_position = 1,
end_position = 100000,
fields = "/nonexistent/file.json"
)
}, error = function(e) {
cat(sprintf("File error: %s\n", e$message))
})
# API error
tryCatch({
snps <- regionQuery(
chromosome_identifier = "invalid",
start_position = 1,
end_position = 100000
)
}, error = function(e) {
cat(sprintf("API error: %s\n", e$message))
})
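In batch loops such as the gene loop in Example 3, one failing query stops the whole loop. The sketch below turns an error into a warning and a NULL result so the loop can continue past individual failures. The helper query_or_null() is hypothetical.

```r
# Hypothetical helper (not part of AnnoQR): run one query and turn an error
# into a warning plus NULL, so a loop over many genes or regions can
# continue past individual failures.
query_or_null <- function(query_fun, label = "query") {
  tryCatch(query_fun(), error = function(e) {
    warning(sprintf("%s failed: %s", label, conditionMessage(e)), call. = FALSE)
    NULL
  })
}

# Usage inside the Example 3 loop (requires network access):
# all_gene_snps[[gene]] <- query_or_null(
#   function() geneQuery(gene = gene, fetch_all = TRUE), label = gene)
```

In R, assigning NULL with [[<- removes the list element, so failed genes are simply absent from all_gene_snps.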
Contributing
Contributions are welcome! If you encounter any issues or have suggestions for improvements, please open an issue or submit a pull request on the GitHub repository.
License
This package is licensed under the MIT License.
Support
For questions or issues related to AnnoQ itself, please visit the AnnoQ website.