## Quick Start
Prerequisites
- Python 3.11+ installed
- API Token (provided via secure email)
- Data Formats: Multiple input formats supported (see below)
Supported Data Formats
| Format | Example | Use Case | Endpoints |
|---|---|---|---|
| H5AD Files | `pbmc3k.h5ad` | Direct upload of single-cell data | All genomics + Bio module |
| GEO Accessions | `GSE12345` | Auto-download from NCBI GEO | `/load` |
| Direct URLs | `https://example.com/data.h5ad` | Download from external sources | `/load` |
| Gene Lists (JSON) | `["CD3E", "CD8A", "CD4"]` | Encode cell type markers as queries | `/query/encode` |
| Text Queries | `"T cell markers"` | Natural language searches | `/query/encode` |
- Genomics endpoints accept all formats above
- Bio module requires H5AD files only (for ResoTrace integration)
- GEO accessions and URLs are automatically converted to H5AD format
Installation
```bash
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install required packages
pip install requests scanpy anndata pandas numpy
```
First API Call
```python
import requests

API_URL = "https://don-research-api.onrender.com"
TOKEN = "your-texas-am-token-here"  # Replace with your token

headers = {"Authorization": f"Bearer {TOKEN}"}

# Health check
response = requests.get(f"{API_URL}/health", headers=headers)
print(response.json())
# Expected: {"status": "ok", "timestamp": "2025-10-24T..."}
```

If the response shows `{"status": "ok"}`, you're connected and authenticated.
Format-Specific Examples
Example 1: H5AD File Upload
```python
with open("pbmc3k.h5ad", "rb") as f:
    files = {"file": ("pbmc3k.h5ad", f, "application/octet-stream")}
    response = requests.post(
        f"{API_URL}/api/v1/genomics/vectors/build",
        headers=headers,
        files=files,
        data={"mode": "cluster"}
    )
print(response.json())
```
Example 2: GEO Accession
```python
# Automatically downloads from NCBI GEO
data = {"accession_or_path": "GSE12345"}
response = requests.post(
    f"{API_URL}/api/v1/genomics/load",
    headers=headers,
    data=data
)
h5ad_path = response.json()["h5ad_path"]
```
Example 3: Gene List Query
```python
import json

# T cell markers
gene_list = ["CD3E", "CD8A", "CD4", "IL7R"]
data = {"gene_list_json": json.dumps(gene_list)}
response = requests.post(
    f"{API_URL}/api/v1/genomics/query/encode",
    headers=headers,
    data=data
)
query_vector = response.json()["psi"]  # 128-dimensional vector
```
Example 4: Text Query
```python
# Natural language query
data = {
    "text": "T cell markers in PBMC tissue",
    "cell_type": "T cell",
    "tissue": "PBMC"
}
response = requests.post(
    f"{API_URL}/api/v1/genomics/query/encode",
    headers=headers,
    data=data
)
query_vector = response.json()["psi"]
```
## System Overview
What is the DON Research API?
The DON (Distributed Order Network) Research API provides access to proprietary quantum-enhanced algorithms for genomics data compression and feature extraction. The system combines classical preprocessing with quantum-inspired compression to generate high-quality 128-dimensional feature vectors from single-cell RNA-seq data.
Core Technologies
- DON-GPU: Fractal clustering processor with 8×-128× compression ratios
- QAC (Quantum Adjacency Code): Multi-layer quantum error correction
- TACE: Temporal Adjacency Collapse Engine for quantum-classical feedback
Validated Performance (PBMC3k Dataset)
## Authentication
API Token
Token Format: Bearer token (JWT-style)
Security: Never commit tokens to Git or share publicly
Using Your Token
HTTP Headers (curl):
```bash
curl -H "Authorization: Bearer YOUR_TOKEN" \
  https://don-research-api.onrender.com/health
```
Python requests:
```python
headers = {"Authorization": f"Bearer {YOUR_TOKEN}"}
response = requests.get(url, headers=headers)
```
Environment Variable (recommended):
```bash
export DON_API_TOKEN="your-token-here"
```

```python
import os
TOKEN = os.environ.get("DON_API_TOKEN")
```
## API Endpoints
Base URLs
- Production: https://don-research-api.onrender.com
- Interactive Docs: /docs (Swagger UI)
1. Health Check
GET /health
Verify API availability and authentication.
2. Build Feature Vectors
POST /api/v1/genomics/vectors/build
Generate 128-dimensional feature vectors from single-cell h5ad files.
Parameters:
- `file` (required): `.h5ad` file upload (AnnData format)
- `mode` (optional): `"cluster"` (default) or `"cell"`
  - cluster: One vector per cell type cluster (recommended)
  - cell: One vector per individual cell (for detailed analysis)
Example Request:
```python
import requests

with open("data/pbmc3k.h5ad", "rb") as f:
    files = {"file": ("pbmc3k.h5ad", f, "application/octet-stream")}
    data = {"mode": "cluster"}
    response = requests.post(
        "https://don-research-api.onrender.com/api/v1/genomics/vectors/build",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"Built {result['count']} vectors")
print(f"Saved to: {result['jsonl']}")
```
Response:
```json
{
  "ok": true,
  "mode": "cluster",
  "jsonl": "/path/to/pbmc3k.cluster.jsonl",
  "count": 10,
  "preview": [
    {
      "vector_id": "pbmc3k.h5ad:cluster:0",
      "psi": [0.929, 0.040, ...],  // 128 dimensions
      "space": "X_pca",
      "metric": "cosine",
      "type": "cluster",
      "meta": {
        "file": "pbmc3k.h5ad",
        "cluster": "0",
        "cells": 560,
        "cell_type": "NA",
        "tissue": "NA"
      }
    }
  ]
}
```
3. Encode Query Vector
POST /api/v1/genomics/query/encode
Convert biological queries (gene lists, cell types, tissues) into 128-dimensional vectors for searching.
Parameters:
- `gene_list_json`: JSON array of gene symbols, e.g., `["CD3E", "CD8A", "CD4"]`
- `cell_type`: Cell type name, e.g., `"T cell"`
- `tissue`: Tissue name, e.g., `"PBMC"`
Example:
```python
import json

# T cell marker query
t_cell_genes = ["CD3E", "CD8A", "CD4"]
data = {"gene_list_json": json.dumps(t_cell_genes)}
response = requests.post(
    "https://don-research-api.onrender.com/api/v1/genomics/query/encode",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=data
)
query_vector = response.json()["psi"]  # 128-dimensional vector
```
4. Search Vectors
POST /api/v1/genomics/vectors/search
Find similar cell clusters or cells using cosine similarity search.
Parameters:
- `jsonl_path`: Path to vectors JSONL file from `/vectors/build`
- `psi`: JSON array of 128 floats (query vector from `/query/encode`)
- `k`: Number of results to return (default: 10)
Distance Interpretation:
| Distance | Interpretation |
|---|---|
| 0.0 - 0.2 | Very similar (same cell type) |
| 0.2 - 0.5 | Similar (related cell types) |
| 0.5 - 0.8 | Moderately similar (different lineages) |
| 0.8+ | Dissimilar (unrelated cell types) |
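These bands translate directly into post-processing code. A minimal sketch, assuming hits have the `distance` and `meta` keys shown in the workflow example below (`interpret_distance` and `confident_hits` are illustrative helpers, not part of the API):

```python
def interpret_distance(distance: float) -> str:
    """Map a cosine distance to the qualitative bands in the table above."""
    if distance < 0.2:
        return "very similar (same cell type)"
    elif distance < 0.5:
        return "similar (related cell types)"
    elif distance < 0.8:
        return "moderately similar (different lineages)"
    return "dissimilar (unrelated cell types)"

def confident_hits(hits: list, max_distance: float = 0.5) -> list:
    """Keep only hits in the 'similar' band or better."""
    return [h for h in hits if h["distance"] <= max_distance]

# Mock hits; real ones come from /vectors/search
hits = [{"distance": 0.12, "meta": {"cluster": "3"}},
        {"distance": 0.61, "meta": {"cluster": "7"}}]
for h in confident_hits(hits):
    print(h["meta"]["cluster"], interpret_distance(h["distance"]))
```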
5. Generate Entropy Map
POST /api/v1/genomics/entropy-map
Visualize cell-level entropy (gene expression diversity) on UMAP embeddings.
Parameters:
- `file`: `.h5ad` file upload
- `label_key`: Cluster column in `adata.obs` (default: auto-detect)
- Higher entropy: More diverse/complex expression patterns
- Lower entropy: More specialized/differentiated cell states
- Collapse metric: Quantum-inspired cell state stability measure
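This endpoint has no example in this guide, so here is a minimal sketch following the same multipart pattern as `/vectors/build`. The response schema is not documented here, so the sketch returns the raw JSON; `build_entropy_payload` and `generate_entropy_map` are hypothetical helpers, not part of the API:

```python
API_URL = "https://don-research-api.onrender.com"

def build_entropy_payload(label_key=None):
    """Form fields for /entropy-map; omit label_key to let the server auto-detect."""
    return {} if label_key is None else {"label_key": label_key}

def generate_entropy_map(h5ad_path, token, label_key=None):
    import requests  # deferred import keeps the sketch importable without network deps
    with open(h5ad_path, "rb") as f:
        files = {"file": (h5ad_path, f, "application/octet-stream")}
        response = requests.post(
            f"{API_URL}/api/v1/genomics/entropy-map",
            headers={"Authorization": f"Bearer {token}"},
            files=files,
            data=build_entropy_payload(label_key),
        )
    return response.json()

# Usage (requires a valid token and h5ad file):
# print(generate_entropy_map("pbmc3k.h5ad", TOKEN, label_key="leiden"))
```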
## Workflow Examples
Example 1: Basic Cell Type Discovery
```python
import requests
import json

API_URL = "https://don-research-api.onrender.com"
TOKEN = "your-token-here"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Step 1: Build vectors
with open("my_dataset.h5ad", "rb") as f:
    files = {"file": ("my_dataset.h5ad", f, "application/octet-stream")}
    response = requests.post(
        f"{API_URL}/api/v1/genomics/vectors/build",
        headers=headers,
        files=files,
        data={"mode": "cluster"}
    )

vectors_result = response.json()
jsonl_path = vectors_result["jsonl"]
print(f"✓ Built {vectors_result['count']} cluster vectors")

# Step 2: Encode T cell query
t_cell_genes = ["CD3E", "CD8A", "CD4", "IL7R"]
query_data = {"gene_list_json": json.dumps(t_cell_genes)}
response = requests.post(
    f"{API_URL}/api/v1/genomics/query/encode",
    headers=headers,
    data=query_data
)
query_vector = response.json()["psi"]
print("✓ Encoded T cell query")

# Step 3: Search for matching clusters
search_data = {
    "jsonl_path": jsonl_path,
    "psi": json.dumps(query_vector),
    "k": 5
}
response = requests.post(
    f"{API_URL}/api/v1/genomics/vectors/search",
    headers=headers,
    data=search_data
)
results = response.json()["hits"]

print("\n✓ Top 5 T cell-like clusters:")
for i, hit in enumerate(results, 1):
    cluster_id = hit["meta"]["cluster"]
    distance = hit["distance"]
    cells = hit["meta"]["cells"]
    print(f"{i}. Cluster {cluster_id}: distance={distance:.4f}, cells={cells}")
```
Common Cell Type Markers
| Cell Type | Marker Genes |
|---|---|
| T cells | CD3E, CD8A, CD4, IL7R |
| B cells | MS4A1, CD79A, CD19, IGHM |
| NK cells | NKG7, GNLY, KLRD1, NCAM1 |
| Monocytes | CD14, FCGR3A, CST3, LYZ |
| Dendritic cells | FCER1A, CST3, CLEC10A |
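The marker panels above can be turned into `/query/encode` form payloads in a loop. A sketch with the network call left commented out (`MARKERS` and `encode_payload` are illustrative names):

```python
import json

MARKERS = {
    "T cells": ["CD3E", "CD8A", "CD4", "IL7R"],
    "B cells": ["MS4A1", "CD79A", "CD19", "IGHM"],
    "NK cells": ["NKG7", "GNLY", "KLRD1", "NCAM1"],
    "Monocytes": ["CD14", "FCGR3A", "CST3", "LYZ"],
    "Dendritic cells": ["FCER1A", "CST3", "CLEC10A"],
}

def encode_payload(cell_type: str, genes: list) -> dict:
    """Form fields for /query/encode combining a gene panel with its label."""
    return {"gene_list_json": json.dumps(genes), "cell_type": cell_type}

payloads = {name: encode_payload(name, genes) for name, genes in MARKERS.items()}

# One request per panel:
# for name, data in payloads.items():
#     r = requests.post(f"{API_URL}/api/v1/genomics/query/encode",
#                       headers=headers, data=data)
#     query_vectors[name] = r.json()["psi"]
```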
## Bio Module - ResoTrace Integration
The Bio module provides advanced single-cell analysis workflows including artifact export, signal synchronization, QC parasite detection, and evolution tracking. All endpoints support both synchronous (immediate) and asynchronous (background job) execution modes.
- Sync/Async Modes: Choose immediate results or background processing
- Memory Logging: All operations tracked with `trace_id` for audit trails
- Job Polling: Monitor long-running async jobs via `/bio/jobs/{job_id}`
- Project Tracking: Retrieve all traces for a project via `/bio/memory/{project_id}`
1. Export Artifacts
POST /api/v1/bio/export-artifacts
Convert H5AD files into ResoTrace-compatible collapse maps and vector collections. Exports cluster graphs, PAGA connectivity, and per-cell embeddings.
Parameters:
- `file` (required): `.h5ad` file upload (AnnData format)
- `cluster_key` (required): Column in `adata.obs` with cluster labels (e.g., "leiden")
- `latent_key` (required): Embedding key in `adata.obsm` (e.g., "X_umap", "X_pca")
- `paga_key` (optional): PAGA connectivity key (default: None)
- `sample_cells` (optional): Max cells to export (default: all cells)
- `sync` (optional): `true` for immediate, `false` for async (default: false)
- `seed` (optional): Random seed for reproducibility (default: 42)
- `project_id` (optional): Your project identifier for tracking
- `user_id` (optional): User identifier for audit logging
Example (Synchronous):
```python
import requests

with open("pbmc_processed.h5ad", "rb") as f:
    files = {"file": ("pbmc.h5ad", f, "application/octet-stream")}
    data = {
        "cluster_key": "leiden",
        "latent_key": "X_umap",
        "sync": "true",
        "project_id": "cai_lab_pbmc_study",
        "user_id": "researcher_001"
    }
    response = requests.post(
        f"{API_URL}/api/v1/bio/export-artifacts",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"✓ Exported {result['nodes']} clusters")
print(f"✓ {result['vectors']} cell vectors")
print(f"✓ Artifacts: {result['artifacts']}")
print(f"✓ Trace ID: {result.get('trace_id')}")
```
Example (Asynchronous):
```python
import time

import requests

# Submit job
with open("large_dataset.h5ad", "rb") as f:
    files = {"file": ("data.h5ad", f, "application/octet-stream")}
    data = {"cluster_key": "leiden", "latent_key": "X_pca", "sync": "false"}
    response = requests.post(
        f"{API_URL}/api/v1/bio/export-artifacts",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

job_id = response.json()["job_id"]
print(f"Job ID: {job_id}")

# Poll job status
while True:
    status_response = requests.get(
        f"{API_URL}/api/v1/bio/jobs/{job_id}",
        headers={"Authorization": f"Bearer {TOKEN}"}
    )
    job = status_response.json()
    status = job["status"]
    print(f"Status: {status}")
    if status == "completed":
        result = job["result"]
        print(f"✓ Complete! Nodes: {result['nodes']}, Vectors: {result['vectors']}")
        break
    elif status == "failed":
        print(f"✗ Failed: {job.get('error')}")
        break
    time.sleep(2)  # Poll every 2 seconds
```
Response (Sync):
```json
{
  "job_id": null,
  "nodes": 8,
  "edges": 0,
  "vectors": 2700,
  "artifacts": [
    "collapse_map.json",
    "collapse_vectors.jsonl"
  ],
  "status": "completed",
  "message": "Export completed successfully (trace: abc123...)"
}
```
2. Signal Sync
POST /api/v1/bio/signal-sync
Compute cross-artifact coherence and synchronization metrics between two collapse maps. Useful for comparing replicates or validating pipeline stability.
Parameters:
- `artifact1` (required): First collapse_map.json file
- `artifact2` (required): Second collapse_map.json file
- `coherence_threshold` (optional): Minimum coherence for sync (default: 0.8)
- `sync` (optional): Execution mode (default: false)
- `seed`, `project_id`, `user_id`: As above
Example:
```python
with open("run1_collapse_map.json", "rb") as f1, open("run2_collapse_map.json", "rb") as f2:
    files = {
        "artifact1": ("run1.json", f1, "application/json"),
        "artifact2": ("run2.json", f2, "application/json")
    }
    data = {"coherence_threshold": "0.7", "sync": "true"}
    response = requests.post(
        f"{API_URL}/api/v1/bio/signal-sync",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"Coherence Score: {result['coherence_score']:.3f}")
print(f"Node Overlap: {result['node_overlap']:.3f}")
print(f"Synchronized: {result['synchronized']}")
```
Interpretation:
| Coherence Score | Interpretation |
|---|---|
| 0.9 - 1.0 | Excellent consistency (technical replicates) |
| 0.7 - 0.9 | Good consistency (biological replicates) |
| 0.5 - 0.7 | Moderate similarity (related conditions) |
| < 0.5 | Low similarity (different conditions/batches) |
3. Parasite Detector (QC)
POST /api/v1/bio/qc/parasite-detect
Detect quality control "parasites" including ambient RNA, doublets, and batch effects. Returns per-cell flags and overall contamination score.
Parameters:
- `file` (required): `.h5ad` file upload
- `cluster_key` (required): Cluster column in `adata.obs`
- `batch_key` (required): Batch/sample column in `adata.obs`
- `ambient_threshold` (optional): Ambient RNA cutoff (default: 0.15)
- `doublet_threshold` (optional): Doublet score cutoff (default: 0.25)
- `batch_threshold` (optional): Batch effect cutoff (default: 0.3)
- `sync`, `seed`, `project_id`, `user_id`: As above
Example:
```python
with open("pbmc_raw.h5ad", "rb") as f:
    files = {"file": ("pbmc.h5ad", f, "application/octet-stream")}
    data = {
        "cluster_key": "leiden",
        "batch_key": "sample",
        "ambient_threshold": "0.15",
        "doublet_threshold": "0.25",
        "sync": "true"
    }
    response = requests.post(
        f"{API_URL}/api/v1/bio/qc/parasite-detect",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"Cells: {result['n_cells']}")
print(f"Flagged: {result['n_flagged']} ({result['n_flagged'] / result['n_cells'] * 100:.1f}%)")
print(f"Parasite Score: {result['parasite_score']:.1f}%")

# Flags: list of booleans, one per cell
print(f"\nFlagged cell indices: {[i for i, f in enumerate(result['flags']) if f][:10]}...")
```
Recommended Actions:
| Parasite Score | Action |
|---|---|
| 0-5% | Excellent quality - proceed |
| 5-15% | Good quality - minor filtering recommended |
| 15-30% | Moderate contamination - filter flagged cells |
| > 30% | High contamination - review QC pipeline |
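For the 15-30% band the recommended action is to drop flagged cells. Assuming `result['flags']` aligns positionally with cell order (as described above), the filter is a simple mask; shown here with plain lists so it works without an AnnData object (`keep_unflagged` is an illustrative helper):

```python
def keep_unflagged(items, flags):
    """Drop entries whose QC flag is True; flags align one-to-one with items."""
    if len(items) != len(flags):
        raise ValueError("flags must align one-to-one with cells")
    return [x for x, flagged in zip(items, flags) if not flagged]

# With AnnData the same mask applies directly (assuming flags match adata.obs order):
#   import numpy as np
#   adata_clean = adata[~np.array(result["flags"]), :]

cells = ["AAACATACAACCAC", "AAACATTGAGCTAC", "AAACATTGATCAGC"]
flags = [False, True, False]
print(keep_unflagged(cells, flags))  # the two unflagged barcodes remain
```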
4. Evolution Report
POST /api/v1/bio/evolution/report
Compare two pipeline runs to assess stability, drift, and reproducibility. Useful for parameter optimization and batch effect validation.
Parameters:
- `run1_file` (required): First run H5AD file
- `run2_file` (required): Second run H5AD file
- `run2_name` (required): Name/label for second run
- `cluster_key` (required): Cluster column
- `latent_key` (required): Embedding key
- `sync`, `seed`, `project_id`, `user_id`: As above
Example:
```python
with open("run1_leiden05.h5ad", "rb") as f1, open("run2_leiden10.h5ad", "rb") as f2:
    files = {
        "run1_file": ("run1.h5ad", f1, "application/octet-stream"),
        "run2_file": ("run2.h5ad", f2, "application/octet-stream")
    }
    data = {
        "run2_name": "leiden_resolution_1.0",
        "cluster_key": "leiden",
        "latent_key": "X_pca",
        "sync": "true"
    }
    response = requests.post(
        f"{API_URL}/api/v1/bio/evolution/report",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"Run 1: {result['run1_name']} ({result['n_cells_run1']} cells)")
print(f"Run 2: {result['run2_name']} ({result['n_cells_run2']} cells)")
print(f"Stability Score: {result['stability_score']:.1f}%")
print(f"\nDelta Metrics: {result['delta_metrics']}")
```
Stability Score Interpretation:
| Score | Interpretation |
|---|---|
| > 90% | Excellent stability (robust pipeline) |
| 70-90% | Good stability (acceptable variation) |
| 50-70% | Moderate drift (review parameters) |
| < 50% | High drift (investigate batch effects) |
Job Management
Poll Job Status
GET /api/v1/bio/jobs/{job_id}
```python
response = requests.get(
    f"{API_URL}/api/v1/bio/jobs/{job_id}",
    headers={"Authorization": f"Bearer {TOKEN}"}
)
job = response.json()
# Fields: job_id, endpoint, status, created_at, completed_at, result, error
```
Retrieve Project Memory
GET /api/v1/bio/memory/{project_id}
```python
response = requests.get(
    f"{API_URL}/api/v1/bio/memory/cai_lab_pbmc_study",
    headers={"Authorization": f"Bearer {TOKEN}"}
)
memory = response.json()
print(f"Project: {memory['project_id']}")
print(f"Total Traces: {memory['count']}")
for trace in memory['traces']:
    print(f"  {trace['event_type']}: {trace['metrics']}")
```
- Use `sync=true` for small datasets (<5k cells)
- Use `sync=false` for large datasets (>10k cells)
- Set `project_id` to group related operations
- Monitor `trace_id` for debugging and audit trails
- Check `parasite_score` before downstream analysis
## Understanding the Output
128-Dimensional Vector Structure
| Dimensions | Content | Purpose |
|---|---|---|
| 0-15 | Entropy signature | Gene expression distribution (16 bins) |
| 16 | HVG fraction | % of highly variable genes expressed |
| 17 | Mitochondrial % | Cell quality indicator |
| 18 | Total counts | Library size (normalized) |
| 22 | Silhouette score | Cluster separation quality (-1 to 1) |
| 27 | Purity score | Neighborhood homogeneity (0 to 1) |
| 28-127 | Biological tokens | Hashed cell type & tissue features |
Compression Example (PBMC3k)
- Raw data: 2,700 cells × 13,714 genes = 37,027,800 values
- Cluster vectors: 10 clusters × 128 dims = 1,280 values
- Compression ratio: 28,928× reduction
- Information retention: ~85-90% (via silhouette scores)
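The ratio follows directly from the two sizes; a quick arithmetic check:

```python
raw_values = 2_700 * 13_714        # cells × genes
compressed_values = 10 * 128       # clusters × vector dims
ratio = raw_values / compressed_values
print(raw_values, compressed_values, round(ratio))  # 37027800 1280 28928
```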
## Troubleshooting
Common Errors
1. Authentication Failed (401)
Error: `{"detail": "Invalid or missing token"}`

Solutions:
- Verify token is correct (check for extra spaces)
- Ensure `Authorization: Bearer TOKEN` header format
- Contact support if token expired
2. File Upload Failed (400)
Error: `{"detail": "Expected .h5ad file"}`

Solutions:
- Verify file extension is `.h5ad`
- Check file is valid AnnData: `sc.read_h5ad("file.h5ad")`
- Ensure file size < 500MB
3. Rate Limit Exceeded (429)
Error: `{"detail": "Rate limit exceeded"}`

Solutions:
- Wait 1 hour for rate limit reset
- Implement exponential backoff in your code
- Contact support for higher limits if needed
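A minimal backoff wrapper for 429 responses; the retry schedule (1 s, 2 s, 4 s, ...) and cap are illustrative choices, not API requirements:

```python
import time

def backoff_delays(retries: int = 5, base: float = 1.0, cap: float = 60.0):
    """Exponential backoff schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(base * (2 ** i), cap) for i in range(retries)]

def request_with_backoff(send, retries: int = 5):
    """Call `send()` (e.g. a lambda wrapping requests.post); retry on HTTP 429."""
    for delay in backoff_delays(retries):
        response = send()
        if response.status_code != 429:
            return response
        time.sleep(delay)
    return send()  # final attempt; caller handles a lingering 429

# Usage:
# response = request_with_backoff(
#     lambda: requests.get(f"{API_URL}/health", headers=headers))
```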
Best Practices
- Use cluster mode by default - Cell mode generates too many vectors for most use cases
- Cache query vectors - Encode once, search multiple times
- Preprocess h5ad files locally - Filter low-quality cells before uploading
- Monitor rate limits - Track API calls to stay under 1,000/hour
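The "cache query vectors" practice can be as simple as a dict keyed by the gene panel. In this sketch `encode_fn` stands in for the actual `/query/encode` call; the helper itself is illustrative, not part of the API:

```python
_query_cache = {}

def cached_query_vector(genes, encode_fn):
    """Encode a gene panel once; later lookups with the same panel hit the cache.

    `encode_fn` is any callable taking a gene list and returning the 128-dim
    vector (e.g. a wrapper around POST /query/encode).
    """
    key = tuple(sorted(genes))  # order-insensitive cache key
    if key not in _query_cache:
        _query_cache[key] = encode_fn(list(key))
    return _query_cache[key]

# Usage: the second call returns the cached vector without another API request.
# vec = cached_query_vector(["CD3E", "CD8A"],
#                           lambda g: requests.post(..., data={...}).json()["psi"])
```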
Data Preparation Tips
```python
import scanpy as sc

# Quality control
adata = sc.read_h5ad("raw_data.h5ad")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
# Assumes mitochondrial QC metrics exist, e.g. via sc.pp.calculate_qc_metrics
adata = adata[adata.obs['pct_counts_mt'] < 5, :]

# Normalization
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)

# Save cleaned data
adata.write_h5ad("cleaned_data.h5ad")
```
## Support & Resources
Research Liaison
research.request@donsystems.com
General inquiries, collaboration requests, token provisioning
Documentation Resources
- API Reference: Interactive Swagger UI
- GitHub Examples: github.com/DONSystemsLLC/don-research-api
- Research Paper: DON Stack: Quantum-Enhanced Genomics Compression (in preparation)
Office Hours
Monday-Friday, 9 AM - 5 PM CST
Response Time: < 24 hours for technical issues
Reporting Issues
Please include:
- Institution name (Texas A&M)
- API endpoint and parameters used
- Full error message (JSON response)
- Sample data file (if < 10MB) or description
- Expected vs. actual behavior