DON Research API - User Guide

Quantum-Enhanced Genomics for Academic Research | Texas A&M Cai Lab

🚀 Quick Start

Prerequisites

Supported Data Formats

Format              Example                         Use Case                              Endpoints
H5AD Files          pbmc3k.h5ad                     Direct upload of single-cell data     All genomics + Bio module
GEO Accessions      GSE12345                        Auto-download from NCBI GEO           /load
Direct URLs         https://example.com/data.h5ad   Download from external sources        /load
Gene Lists (JSON)   ["CD3E", "CD8A", "CD4"]         Encode cell type markers as queries   /query/encode
Text Queries        "T cell markers"                Natural language searches             /query/encode
📌 Format Notes:
• Genomics endpoints accept all formats above
• Bio module requires H5AD files only (for ResoTrace integration)
• GEO accessions and URLs are automatically converted to H5AD format

Installation

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install required packages
pip install requests scanpy anndata pandas numpy

First API Call

import requests

API_URL = "https://don-research-api.onrender.com"
TOKEN = "your-texas-am-token-here"  # Replace with your token

headers = {"Authorization": f"Bearer {TOKEN}"}

# Health check
response = requests.get(f"{API_URL}/health", headers=headers)
print(response.json())
# Expected: {"status": "ok", "timestamp": "2025-10-24T..."}
✓ System Ready! If the health check returns {"status": "ok"}, you're connected and authenticated.

Format-Specific Examples

Example 1: H5AD File Upload

with open("pbmc3k.h5ad", "rb") as f:
    files = {"file": ("pbmc3k.h5ad", f, "application/octet-stream")}
    response = requests.post(
        f"{API_URL}/api/v1/genomics/vectors/build",
        headers=headers,
        files=files,
        data={"mode": "cluster"}
    )
print(response.json())

Example 2: GEO Accession

# Automatically downloads from NCBI GEO
data = {"accession_or_path": "GSE12345"}
response = requests.post(
    f"{API_URL}/api/v1/genomics/load",
    headers=headers,
    data=data
)
h5ad_path = response.json()["h5ad_path"]

Example 3: Gene List Query

import json

# T cell markers
gene_list = ["CD3E", "CD8A", "CD4", "IL7R"]
data = {"gene_list_json": json.dumps(gene_list)}
response = requests.post(
    f"{API_URL}/api/v1/genomics/query/encode",
    headers=headers,
    data=data
)
query_vector = response.json()["psi"]  # 128-dimensional vector

Example 4: Text Query

# Natural language query
data = {
    "text": "T cell markers in PBMC tissue",
    "cell_type": "T cell",
    "tissue": "PBMC"
}
response = requests.post(
    f"{API_URL}/api/v1/genomics/query/encode",
    headers=headers,
    data=data
)
query_vector = response.json()["psi"]

📊 System Overview

What is the DON Research API?

The DON (Distributed Order Network) Research API provides access to proprietary quantum-enhanced algorithms for genomics data compression and feature extraction. The system combines classical preprocessing with quantum-inspired compression to generate high-quality 128-dimensional feature vectors from single-cell RNA-seq data.

Core Technologies

Validated Performance (PBMC3k Dataset)

• Input data: 2,700 cells × 13,714 genes
• Compression ratio: 28,928× (37M → 1.3K values)
• Processing time: <30s on standard hardware
• Information retention: 85-90% of biological signal preserved

🔐 Authentication

API Token

Rate Limit: 1,000 requests per hour
Token Format: Bearer token (JWT-style)
Security: Never commit tokens to Git or share publicly

Using Your Token

HTTP Headers (curl):

curl -H "Authorization: Bearer YOUR_TOKEN" \
     https://don-research-api.onrender.com/health

Python requests:

headers = {"Authorization": f"Bearer {YOUR_TOKEN}"}
response = requests.get(url, headers=headers)

Environment Variable (recommended):

export DON_API_TOKEN="your-token-here"

# In Python, read the token from the environment
import os
TOKEN = os.environ.get("DON_API_TOKEN")

🔌 API Endpoints

Base URLs

All endpoints in this guide use the base URL https://don-research-api.onrender.com.

1. Health Check

GET /health

Verify API availability and authentication.

2. Build Feature Vectors

POST /api/v1/genomics/vectors/build

Generate 128-dimensional feature vectors from single-cell h5ad files.

Parameters:

Example Request:

import requests

with open("data/pbmc3k.h5ad", "rb") as f:
    files = {"file": ("pbmc3k.h5ad", f, "application/octet-stream")}
    data = {"mode": "cluster"}
    response = requests.post(
        "https://don-research-api.onrender.com/api/v1/genomics/vectors/build",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"Built {result['count']} vectors")
print(f"Saved to: {result['jsonl']}")

Response:

{
  "ok": true,
  "mode": "cluster",
  "jsonl": "/path/to/pbmc3k.cluster.jsonl",
  "count": 10,
  "preview": [
    {
      "vector_id": "pbmc3k.h5ad:cluster:0",
      "psi": [0.929, 0.040, ...],  // 128 dimensions
      "space": "X_pca",
      "metric": "cosine",
      "type": "cluster",
      "meta": {
        "file": "pbmc3k.h5ad",
        "cluster": "0",
        "cells": 560,
        "cell_type": "NA",
        "tissue": "NA"
      }
    }
  ]
}
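
The jsonl value is used later as the jsonl_path parameter for /vectors/search; for quick local inspection you can iterate over the preview entries returned in the same response:

# Inspect the preview entries returned by /vectors/build
for vec in result["preview"]:
    meta = vec["meta"]
    print(f"{vec['vector_id']}: cluster {meta['cluster']}, "
          f"{meta['cells']} cells, {len(vec['psi'])} dimensions")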

3. Encode Query Vector

POST /api/v1/genomics/query/encode

Convert biological queries (gene lists, cell types, tissues) into 128-dimensional vectors for searching.

Parameters:

Example:

import json

# T cell marker query
t_cell_genes = ["CD3E", "CD8A", "CD4"]
data = {"gene_list_json": json.dumps(t_cell_genes)}

response = requests.post(
    "https://don-research-api.onrender.com/api/v1/genomics/query/encode",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=data
)

query_vector = response.json()["psi"]  # 128-dimensional vector

4. Search Vectors

POST /api/v1/genomics/vectors/search

Find similar cell clusters or cells using cosine similarity search.

Parameters:

Distance Interpretation:

Distance     Interpretation
0.0 - 0.2    Very similar (same cell type)
0.2 - 0.5    Similar (related cell types)
0.5 - 0.8    Moderately similar (different lineages)
0.8+         Dissimilar (unrelated cell types)
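
Example:

The sketch below reuses jsonl_path from /vectors/build and query_vector from /query/encode; k sets the number of hits returned. The same pattern appears in the full workflow example later in this guide.

import json

search_data = {
    "jsonl_path": jsonl_path,         # returned by /vectors/build
    "psi": json.dumps(query_vector),  # 128-dim vector from /query/encode
    "k": 5
}
response = requests.post(
    "https://don-research-api.onrender.com/api/v1/genomics/vectors/search",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=search_data
)
for hit in response.json()["hits"]:
    print(hit["meta"]["cluster"], hit["distance"])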

5. Generate Entropy Map

POST /api/v1/genomics/entropy-map

Visualize cell-level entropy (gene expression diversity) on UMAP embeddings.

Parameters:

Entropy Interpretation:
• Higher entropy: More diverse/complex expression patterns
• Lower entropy: More specialized/differentiated cell states
• Collapse metric: Quantum-inspired cell state stability measure
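
Example:

A minimal request sketch, assuming the same multipart H5AD upload pattern as the other genomics endpoints; no parameters beyond the file are shown here, and any additional options are not covered in this guide.

with open("pbmc3k.h5ad", "rb") as f:
    files = {"file": ("pbmc3k.h5ad", f, "application/octet-stream")}
    response = requests.post(
        "https://don-research-api.onrender.com/api/v1/genomics/entropy-map",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files
    )
print(response.json())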

🔬 Workflow Examples

Example 1: Basic Cell Type Discovery

import requests
import json

API_URL = "https://don-research-api.onrender.com"
TOKEN = "your-token-here"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Step 1: Build vectors
with open("my_dataset.h5ad", "rb") as f:
    files = {"file": ("my_dataset.h5ad", f, "application/octet-stream")}
    response = requests.post(
        f"{API_URL}/api/v1/genomics/vectors/build",
        headers=headers,
        files=files,
        data={"mode": "cluster"}
    )

vectors_result = response.json()
jsonl_path = vectors_result["jsonl"]
print(f"āœ“ Built {vectors_result['count']} cluster vectors")

# Step 2: Encode T cell query
t_cell_genes = ["CD3E", "CD8A", "CD4", "IL7R"]
query_data = {"gene_list_json": json.dumps(t_cell_genes)}
response = requests.post(
    f"{API_URL}/api/v1/genomics/query/encode",
    headers=headers,
    data=query_data
)
query_vector = response.json()["psi"]
print(f"āœ“ Encoded T cell query")

# Step 3: Search for matching clusters
search_data = {
    "jsonl_path": jsonl_path,
    "psi": json.dumps(query_vector),
    "k": 5
}
response = requests.post(
    f"{API_URL}/api/v1/genomics/vectors/search",
    headers=headers,
    data=search_data
)

results = response.json()["hits"]
print(f"\nāœ“ Top 5 T cell-like clusters:")
for i, hit in enumerate(results, 1):
    cluster_id = hit['meta']['cluster']
    distance = hit['distance']
    cells = hit['meta']['cells']
    print(f"{i}. Cluster {cluster_id}: distance={distance:.4f}, cells={cells}")

Common Cell Type Markers

Cell Type          Marker Genes
T cells            CD3E, CD8A, CD4, IL7R
B cells            MS4A1, CD79A, CD19, IGHM
NK cells           NKG7, GNLY, KLRD1, NCAM1
Monocytes          CD14, FCGR3A, CST3, LYZ
Dendritic cells    FCER1A, CST3, CLEC10A
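
These marker sets can be scanned against a built vector collection in a loop, pairing /query/encode with /vectors/search. A sketch: the marker dictionary mirrors the table above, and jsonl_path comes from Step 1 of the workflow example.

markers = {
    "T cells": ["CD3E", "CD8A", "CD4", "IL7R"],
    "B cells": ["MS4A1", "CD79A", "CD19", "IGHM"],
    "NK cells": ["NKG7", "GNLY", "KLRD1", "NCAM1"],
    "Monocytes": ["CD14", "FCGR3A", "CST3", "LYZ"],
    "Dendritic cells": ["FCER1A", "CST3", "CLEC10A"],
}

for cell_type, genes in markers.items():
    # Encode the marker list as a 128-dimensional query vector
    psi = requests.post(
        f"{API_URL}/api/v1/genomics/query/encode",
        headers=headers,
        data={"gene_list_json": json.dumps(genes)}
    ).json()["psi"]
    # Find the closest cluster for this cell type
    hits = requests.post(
        f"{API_URL}/api/v1/genomics/vectors/search",
        headers=headers,
        data={"jsonl_path": jsonl_path, "psi": json.dumps(psi), "k": 1}
    ).json()["hits"]
    print(f"{cell_type}: cluster {hits[0]['meta']['cluster']} "
          f"(distance {hits[0]['distance']:.4f})")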

🧬 Bio Module - ResoTrace Integration

The Bio module provides advanced single-cell analysis workflows including artifact export, signal synchronization, QC parasite detection, and evolution tracking. All endpoints support both synchronous (immediate) and asynchronous (background job) execution modes.

Key Features:
• Sync/Async Modes: Choose immediate results or background processing
• Memory Logging: All operations tracked with trace_id for audit trails
• Job Polling: Monitor long-running async jobs via /bio/jobs/{job_id}
• Project Tracking: Retrieve all traces for a project via /bio/memory/{project_id}

1. Export Artifacts

POST /api/v1/bio/export-artifacts

Convert H5AD files into ResoTrace-compatible collapse maps and vector collections. Exports cluster graphs, PAGA connectivity, and per-cell embeddings.

Parameters:

Example (Synchronous):

import requests

with open("pbmc_processed.h5ad", "rb") as f:
    files = {"file": ("pbmc.h5ad", f, "application/octet-stream")}
    data = {
        "cluster_key": "leiden",
        "latent_key": "X_umap",
        "sync": "true",
        "project_id": "cai_lab_pbmc_study",
        "user_id": "researcher_001"
    }
    
    response = requests.post(
        f"{API_URL}/api/v1/bio/export-artifacts",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"āœ“ Exported {result['nodes']} clusters")
print(f"āœ“ {result['vectors']} cell vectors")
print(f"āœ“ Artifacts: {result['artifacts']}")
print(f"āœ“ Trace ID: {result.get('trace_id')}")

Example (Asynchronous):

# Submit job
with open("large_dataset.h5ad", "rb") as f:
    files = {"file": ("data.h5ad", f, "application/octet-stream")}
    data = {"cluster_key": "leiden", "latent_key": "X_pca", "sync": "false"}
    
    response = requests.post(
        f"{API_URL}/api/v1/bio/export-artifacts",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

job_id = response.json()["job_id"]
print(f"Job ID: {job_id}")

# Poll job status
import time
while True:
    status_response = requests.get(
        f"{API_URL}/api/v1/bio/jobs/{job_id}",
        headers={"Authorization": f"Bearer {TOKEN}"}
    )
    
    job = status_response.json()
    status = job["status"]
    print(f"Status: {status}")
    
    if status == "completed":
        result = job["result"]
        print(f"āœ“ Complete! Nodes: {result['nodes']}, Vectors: {result['vectors']}")
        break
    elif status == "failed":
        print(f"āœ— Failed: {job.get('error')}")
        break
    
    time.sleep(2)  # Poll every 2 seconds

Response (Sync):

{
  "job_id": null,
  "nodes": 8,
  "edges": 0,
  "vectors": 2700,
  "artifacts": [
    "collapse_map.json",
    "collapse_vectors.jsonl"
  ],
  "status": "completed",
  "message": "Export completed successfully (trace: abc123...)"
}

2. Signal Sync

POST /api/v1/bio/signal-sync

Compute cross-artifact coherence and synchronization metrics between two collapse maps. Useful for comparing replicates or validating pipeline stability.

Parameters:

Example:

with open("run1_collapse_map.json", "rb") as f1,      open("run2_collapse_map.json", "rb") as f2:
    
    files = {
        "artifact1": ("run1.json", f1, "application/json"),
        "artifact2": ("run2.json", f2, "application/json")
    }
    data = {"coherence_threshold": "0.7", "sync": "true"}
    
    response = requests.post(
        f"{API_URL}/api/v1/bio/signal-sync",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"Coherence Score: {result['coherence_score']:.3f}")
print(f"Node Overlap: {result['node_overlap']:.3f}")
print(f"Synchronized: {result['synchronized']}")

Interpretation:

Coherence Score    Interpretation
0.9 - 1.0          Excellent consistency (technical replicates)
0.7 - 0.9          Good consistency (biological replicates)
0.5 - 0.7          Moderate similarity (related conditions)
< 0.5              Low similarity (different conditions/batches)

3. Parasite Detector (QC)

POST /api/v1/bio/qc/parasite-detect

Detect quality control "parasites" including ambient RNA, doublets, and batch effects. Returns per-cell flags and overall contamination score.

Parameters:

Example:

with open("pbmc_raw.h5ad", "rb") as f:
    files = {"file": ("pbmc.h5ad", f, "application/octet-stream")}
    data = {
        "cluster_key": "leiden",
        "batch_key": "sample",
        "ambient_threshold": "0.15",
        "doublet_threshold": "0.25",
        "sync": "true"
    }
    
    response = requests.post(
        f"{API_URL}/api/v1/bio/qc/parasite-detect",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"Cells: {result['n_cells']}")
print(f"Flagged: {result['n_flagged']} ({result['n_flagged']/result['n_cells']*100:.1f}%)")
print(f"Parasite Score: {result['parasite_score']:.1f}%")

# Flags: list of booleans, one per cell
print(f"\nFlagged cell indices: {[i for i, f in enumerate(result['flags']) if f][:10]}...")

Recommended Actions:

Parasite Score    Action
0-5%              Excellent quality - proceed
5-15%             Good quality - minor filtering recommended
15-30%            Moderate contamination - filter flagged cells
> 30%             High contamination - review QC pipeline
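
Because flags contains one boolean per cell, flagged cells can be removed locally before downstream analysis. A sketch, assuming the flags follow the cell order of the uploaded H5AD file and that the file is also available locally:

import numpy as np
import scanpy as sc

adata = sc.read_h5ad("pbmc_raw.h5ad")
flags = np.array(result["flags"], dtype=bool)   # True = flagged by the detector

# Keep only unflagged cells and save the filtered dataset
adata_clean = adata[~flags].copy()
adata_clean.write_h5ad("pbmc_filtered.h5ad")
print(f"Removed {int(flags.sum())} of {adata.n_obs} cells")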

4. Evolution Report

POST /api/v1/bio/evolution/report

Compare two pipeline runs to assess stability, drift, and reproducibility. Useful for parameter optimization and batch effect validation.

Parameters:

Example:

with open("run1_leiden05.h5ad", "rb") as f1,      open("run2_leiden10.h5ad", "rb") as f2:
    
    files = {
        "run1_file": ("run1.h5ad", f1, "application/octet-stream"),
        "run2_file": ("run2.h5ad", f2, "application/octet-stream")
    }
    data = {
        "run2_name": "leiden_resolution_1.0",
        "cluster_key": "leiden",
        "latent_key": "X_pca",
        "sync": "true"
    }
    
    response = requests.post(
        f"{API_URL}/api/v1/bio/evolution/report",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"Run 1: {result['run1_name']} ({result['n_cells_run1']} cells)")
print(f"Run 2: {result['run2_name']} ({result['n_cells_run2']} cells)")
print(f"Stability Score: {result['stability_score']:.1f}%")
print(f"\nDelta Metrics: {result['delta_metrics']}")

Stability Score Interpretation:

Score      Interpretation
> 90%      Excellent stability (robust pipeline)
70-90%     Good stability (acceptable variation)
50-70%     Moderate drift (review parameters)
< 50%      High drift (investigate batch effects)

Job Management

Poll Job Status

GET /api/v1/bio/jobs/{job_id}

response = requests.get(
    f"{API_URL}/api/v1/bio/jobs/{job_id}",
    headers={"Authorization": f"Bearer {TOKEN}"}
)

job = response.json()
# Fields: job_id, endpoint, status, created_at, completed_at, result, error

Retrieve Project Memory

GET /api/v1/bio/memory/{project_id}

response = requests.get(
    f"{API_URL}/api/v1/bio/memory/cai_lab_pbmc_study",
    headers={"Authorization": f"Bearer {TOKEN}"}
)

memory = response.json()
print(f"Project: {memory['project_id']}")
print(f"Total Traces: {memory['count']}")

for trace in memory['traces']:
    print(f"  {trace['event_type']}: {trace['metrics']}")
💡 Pro Tips:
• Use sync=true for small datasets (<5k cells)
• Use sync=false for large datasets (>10k cells)
• Set project_id to group related operations
• Monitor trace_id for debugging and audit trails
• Check parasite_score before downstream analysis

🧬 Understanding the Output

128-Dimensional Vector Structure

Dimensions    Content             Purpose
0-15          Entropy signature   Gene expression distribution (16 bins)
16            HVG fraction        % of highly variable genes expressed
17            Mitochondrial %     Cell quality indicator
18            Total counts        Library size (normalized)
22            Silhouette score    Cluster separation quality (-1 to 1)
27            Purity score        Neighborhood homogeneity (0 to 1)
28-127        Biological tokens   Hashed cell type & tissue features
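
For quick inspection, a returned psi vector can be sliced according to this layout (a sketch; only the dimensions documented in the table are labeled):

psi = result["preview"][0]["psi"]   # any 128-dimensional vector from /vectors/build

entropy_signature = psi[0:16]    # 16-bin gene expression distribution
hvg_fraction      = psi[16]      # % of highly variable genes expressed
mito_pct          = psi[17]      # mitochondrial % (cell quality indicator)
total_counts      = psi[18]      # normalized library size
silhouette        = psi[22]      # cluster separation quality (-1 to 1)
purity            = psi[27]      # neighborhood homogeneity (0 to 1)
bio_tokens        = psi[28:]     # hashed cell type & tissue features (100 dims)

print(f"silhouette={silhouette:.3f}, purity={purity:.3f}")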

Compression Example (PBMC3k)
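
Using the PBMC3k figures from the performance summary above, the compression ratio works out as follows (a back-of-the-envelope sketch: 10 cluster vectors of 128 dimensions replace the full expression matrix):

n_cells, n_genes = 2_700, 13_714
n_clusters, dims = 10, 128            # 10 cluster vectors from /vectors/build

raw_values = n_cells * n_genes        # ~37.0M expression values
compressed = n_clusters * dims        # 1,280 values (~1.3K)
print(f"{raw_values:,} -> {compressed:,} values "
      f"({raw_values / compressed:,.0f}x compression)")
# ~28,928x, consistent with the validated figure above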

🔧 Troubleshooting

Common Errors

1. Authentication Failed (401)

Error: {"detail": "Invalid or missing token"}

Solutions:
  • Verify token is correct (check for extra spaces)
  • Ensure Authorization: Bearer TOKEN header format
  • Contact support if token expired

2. File Upload Failed (400)

Error: {"detail": "Expected .h5ad file"}

Solutions:
  • Verify file extension is .h5ad
  • Check file is valid AnnData: sc.read_h5ad("file.h5ad")
  • Ensure file size < 500MB

3. Rate Limit Exceeded (429)

Error: {"detail": "Rate limit exceeded"}

Solutions:
  • Wait 1 hour for rate limit reset
  • Implement exponential backoff in your code (see the sketch below)
  • Contact support for higher limits if needed
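
A minimal retry helper with exponential backoff (a sketch; the API returns HTTP status 429 when the hourly limit is exceeded):

import time
import requests

def post_with_backoff(url, max_retries=5, **kwargs):
    """Retry a POST on HTTP 429, doubling the wait between attempts."""
    for attempt in range(max_retries):
        response = requests.post(url, **kwargs)
        if response.status_code != 429:
            return response
        wait = 2 ** attempt              # 1s, 2s, 4s, 8s, 16s
        print(f"Rate limited; retrying in {wait}s")
        time.sleep(wait)
    return response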

Best Practices

Data Preparation Tips

import scanpy as sc

# Quality control
adata = sc.read_h5ad("raw_data.h5ad")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# Compute mitochondrial QC metrics so 'pct_counts_mt' exists before filtering on it
adata.var['mt'] = adata.var_names.str.startswith('MT-')
sc.pp.calculate_qc_metrics(adata, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
adata = adata[adata.obs['pct_counts_mt'] < 5, :].copy()

# Normalization
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)

# Save cleaned data
adata.write_h5ad("cleaned_data.h5ad")

📞 Support & Resources

Research Liaison

research.request@donsystems.com

General inquiries, collaboration requests, token provisioning

Technical Support

support@donsystems.com

API errors, troubleshooting, integration assistance

Partnerships

partnerships@donsystems.com

Academic collaborations, research proposals

Documentation Resources

Office Hours

Monday-Friday, 9 AM - 5 PM CST

Response Time: < 24 hours for technical issues

Reporting Issues

Please include:

  1. Institution name (Texas A&M)
  2. API endpoint and parameters used
  3. Full error message (JSON response)
  4. Sample data file (if < 10MB) or description
  5. Expected vs. actual behavior