šŸ“˜ DON Research API - Complete User Guide

Comprehensive Tutorial for Texas A&M University Cai Lab

šŸ“– Web Interface Overview

What This Guide Covers

This comprehensive guide teaches you how to use the DON Research API web interface and its features:

šŸŽÆ Quick Links:
• Homepage: https://api.donsystems.com/ - Quick reference docs
• This Guide: https://api.donsystems.com/guide - Detailed tutorials
• Swagger UI: https://api.donsystems.com/docs - Interactive testing

Who Should Use This Guide?

This guide is designed for Texas A&M researchers who use the DON Research API for single-cell genomics analysis, whether through the browser or from Python scripts.

šŸ” Swagger UI Tutorial: Test APIs in Your Browser

What is Swagger UI?

Swagger UI is an interactive tool that lets you test API endpoints directly in your web browser without writing any code. It is ideal for exploring endpoints, validating parameters, and running quick sanity checks before committing to scripts.

Accessing Swagger UI

  1. Open your web browser (Chrome, Firefox, Safari, or Edge)
  2. Navigate to: https://api.donsystems.com/docs
  3. You'll see a list of all available API endpoints organized by category

Step-by-Step: Testing Your First Endpoint

Step 1: Authenticate with Your Token

  1. Look for the green "Authorize" button at the top right of the page
  2. Click it to open the authentication dialog
  3. In the "Value" field, enter: Bearer your-tamu-token-here
    • āš ļø Important: Include the word "Bearer " (with a space) before your token
    • Example: Bearer tamu_cai_lab_2025_HkRs17sgvbjnQax2KzD1iqYcHWbAs5xvZZ2ApKptWuc
  4. Click "Authorize" button
  5. Click "Close" to return to the main page
āœ“ Success Indicator: The padlock icons next to endpoints change from open šŸ”“ to closed šŸ”’

Step 2: Select an Endpoint to Test

Endpoints are organized into sections. For your first test, try the health check:

  1. Scroll down to find GET /health
  2. Click on it to expand the endpoint details

Step 3: Execute the Request

  1. Click the "Try it out" button (top right of the endpoint section)
  2. Click the "Execute" button (blue button)
  3. Wait a moment for the response...

Step 4: View the Results

After execution, you'll see three important sections:

Request URL:

https://api.donsystems.com/health

Response Body: (should look like this)

{
  "status": "healthy",
  "timestamp": "2025-10-27T15:30:45.123Z",
  "services": "operational"
}

Response Headers:

X-Trace-ID: tamu_20251027_abc123def
content-type: application/json
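Once the Swagger walkthrough works, the same health check can be scripted. A minimal sketch (the token value is a placeholder; `auth_headers` and `check_health` are illustrative helpers, not part of an official SDK):

```python
API_URL = "https://api.donsystems.com"

def auth_headers(token: str) -> dict:
    # The API expects the literal word "Bearer ", space included,
    # before the token value.
    return {"Authorization": f"Bearer {token}"}

def check_health(token: str):
    # Performs the same GET /health request as the Swagger UI walkthrough,
    # returning the JSON body and the X-Trace-ID response header.
    import requests
    response = requests.get(f"{API_URL}/health", headers=auth_headers(token))
    response.raise_for_status()
    return response.json(), response.headers.get("X-Trace-ID")
```

Calling `check_health("your-tamu-token-here")` should return the JSON body shown above along with the trace ID.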

Testing File Upload Endpoints

Example: Build Feature Vectors

  1. Find POST /api/v1/genomics/vectors/build
  2. Click to expand → Click "Try it out"
  3. file parameter: Click "Choose File" → Select your .h5ad file
  4. mode parameter: Select cluster from dropdown
  5. Click "Execute"
  6. View response with vector counts and preview data
āš ļø Best Practices:
• Use small datasets (< 5,000 cells) in Swagger UI
• For large files, use Python scripts instead
• Don't upload sensitive/unpublished data via browser
• Copy the curl command shown for use in scripts

Understanding Response Codes

• 200 Success: request completed successfully
• 400 Bad Request: check parameter format (e.g., file extension)
• 401 Unauthorized: check token format (must include "Bearer ")
• 429 Rate Limit Exceeded: wait 1 hour or send fewer requests
• 500 Server Error: contact support with the trace_id
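In scripts, these codes can drive a small error handler. `explain_status` below is an illustrative helper (not part of the API), with messages taken from the actions listed above:

```python
STATUS_ACTIONS = {
    200: "Success: request completed successfully",
    400: "Bad Request: check parameter format (e.g., file extension)",
    401: "Unauthorized: check token format (must include 'Bearer ')",
    429: "Rate limit exceeded: wait 1 hour or send fewer requests",
    500: "Server error: contact support with the trace_id",
}

def explain_status(code: int) -> str:
    # Fall back to a generic message for codes not covered in this guide.
    return STATUS_ACTIONS.get(code, f"Unexpected status {code}: see Swagger UI docs")
```

For example, `explain_status(response.status_code)` gives a human-readable next step after any request.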

šŸ“ Supported Data Formats

Format Overview

The DON Research API supports multiple input formats, not just H5AD files:

• H5AD Files (e.g., pbmc3k.h5ad): direct upload of single-cell data in AnnData format. Supported by all genomics endpoints and the Bio module.
• GEO Accessions (e.g., GSE12345): auto-download from the NCBI GEO database. Supported by /load.
• Direct URLs (e.g., https://example.com/data.h5ad): download from external HTTP/HTTPS sources. Supported by /load.
• Gene Lists (JSON) (e.g., ["CD3E", "CD8A", "CD4"]): encode cell type marker genes as query vectors. Supported by /query/encode.
• Text Queries (e.g., "T cell markers in PBMC"): natural language biological queries. Supported by /query/encode.
šŸ“Œ Important:
• Bio Module endpoints require H5AD files only (for ResoTrace compatibility)
• Genomics endpoints support all formats above
• GEO accessions and URLs are automatically downloaded and converted to H5AD

Format-Specific Usage Examples

1. H5AD Files (Most Common)

import requests

# Upload H5AD file directly
with open("pbmc_3k.h5ad", "rb") as f:
    files = {"file": ("pbmc_3k.h5ad", f, "application/octet-stream")}
    response = requests.post(
        "https://api.donsystems.com/api/v1/genomics/vectors/build",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data={"mode": "cluster"}
    )
print(response.json())

2. GEO Accessions

# Automatically downloads from NCBI GEO
data = {"accession_or_path": "GSE12345"}
response = requests.post(
    "https://api.donsystems.com/api/v1/genomics/load",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=data
)
h5ad_path = response.json()["h5ad_path"]
print(f"Downloaded to: {h5ad_path}")

3. Direct URLs

# Download from any HTTP/HTTPS source
data = {"accession_or_path": "https://example.com/dataset.h5ad"}
response = requests.post(
    "https://api.donsystems.com/api/v1/genomics/load",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=data
)
h5ad_path = response.json()["h5ad_path"]

4. Gene Lists (JSON)

import json

# T cell marker genes
gene_list = ["CD3E", "CD8A", "CD4", "IL7R", "CCR7"]
data = {"gene_list_json": json.dumps(gene_list)}

response = requests.post(
    "https://api.donsystems.com/api/v1/genomics/query/encode",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=data
)
query_vector = response.json()["psi"]  # 128-dimensional vector

5. Text Queries

# Natural language query
data = {
    "text": "T cell markers in PBMC tissue",
    "cell_type": "T cell",
    "tissue": "PBMC"
}

response = requests.post(
    "https://api.donsystems.com/api/v1/genomics/query/encode",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=data
)
query_vector = response.json()["psi"]

When to Use Each Format

In short: upload H5AD files directly for Bio module workflows, use GEO accessions or direct URLs with /load when the data lives elsewhere, and use gene lists or text queries with /query/encode to build query vectors.

🧬 Bio Module: ResoTrace Integration

Overview

The Bio Module provides advanced single-cell analysis workflows optimized for ResoTrace integration. Key capabilities:

Sync vs Async Execution Modes

Every Bio endpoint supports two execution modes:

• sync=true: small datasets (< 5K cells); immediate response (< 30 s). Best for quick validation, exploratory analysis, and Swagger UI testing.
• sync=false: large datasets (> 10K cells); runs as a background job. Best for production pipelines, batch processing, and automated workflows.
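Since the cutoff between modes is dataset size, the flag can be set programmatically. A minimal sketch (the 5,000-cell threshold mirrors this guide's guidance; it is not enforced by the API):

```python
def choose_sync_mode(n_cells: int, threshold: int = 5000) -> str:
    # The Bio endpoints take sync as a string form field ("true"/"false").
    return "true" if n_cells < threshold else "false"
```

In practice you might set `data["sync"] = choose_sync_mode(adata.n_obs)` before posting to a Bio endpoint.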

Feature 1: Export Artifacts

POST /api/v1/bio/export-artifacts

What it does: exports ResoTrace-compatible artifacts (cluster nodes and per-cell vectors) from an uploaded H5AD file, returning counts and a trace ID.

Key parameters (all shown in the example below): file, cluster_key, latent_key, sync, project_id, user_id.

Example (Synchronous):

with open("pbmc_3k.h5ad", "rb") as f:
    files = {"file": ("pbmc.h5ad", f, "application/octet-stream")}
    data = {
        "cluster_key": "leiden",
        "latent_key": "X_umap",
        "sync": "true",
        "project_id": "cai_lab_pbmc_study",
        "user_id": "researcher_001"
    }
    
    response = requests.post(
        "https://api.donsystems.com/api/v1/bio/export-artifacts",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"āœ“ Exported {result['nodes']} clusters")
print(f"āœ“ {result['vectors']} cell vectors")
print(f"āœ“ Trace ID: {result.get('trace_id')}")

Feature 2: Parasite Detector (QC)

POST /api/v1/bio/qc/parasite-detect

What it does: scores overall dataset quality and flags suspect ("parasite") cells, returning per-cell flags and an overall parasite_score (the percentage of cells flagged).

Recommended Actions:

• 0-5% (Excellent): proceed without filtering
• 5-15% (Good): minor filtering recommended
• 15-30% (Moderate): filter flagged cells
• > 30% (Poor): review QC pipeline
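These thresholds are easy to encode when automating QC decisions. `parasite_action` is an illustrative sketch, not an API feature:

```python
def parasite_action(score: float) -> str:
    # score is the parasite_score percentage returned by the endpoint.
    # Boundary values fall into the stricter (higher-quality) band.
    if score <= 5:
        return "proceed without filtering"
    if score <= 15:
        return "minor filtering recommended"
    if score <= 30:
        return "filter flagged cells"
    return "review QC pipeline"
```

For example, `parasite_action(result["parasite_score"])` turns an endpoint response into a recommended next step.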

For complete Bio module documentation, see the homepage or Swagger UI.

šŸ”¬ Complete Workflow Examples

Workflow 1: Cell Type Discovery with T Cells

Goal: Identify T cell clusters in PBMC dataset using marker genes

import requests
import json

API_URL = "https://api.donsystems.com"
TOKEN = "your-tamu-token-here"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Step 1: Build cluster vectors
print("Step 1: Building vectors...")
with open("pbmc_3k.h5ad", "rb") as f:
    files = {"file": ("pbmc_3k.h5ad", f, "application/octet-stream")}
    response = requests.post(
        f"{API_URL}/api/v1/genomics/vectors/build",
        headers=headers,
        files=files,
        data={"mode": "cluster"}
    )

vectors_result = response.json()
jsonl_path = vectors_result["jsonl"]
print(f"āœ“ Built {vectors_result['count']} cluster vectors")

# Step 2: Encode T cell query
print("\nStep 2: Encoding T cell markers...")
t_cell_genes = ["CD3E", "CD8A", "CD4", "IL7R"]
query_data = {"gene_list_json": json.dumps(t_cell_genes)}

response = requests.post(
    f"{API_URL}/api/v1/genomics/query/encode",
    headers=headers,
    data=query_data
)
query_vector = response.json()["psi"]
print("āœ“ Encoded query vector (128 dimensions)")

# Step 3: Search for matching clusters
print("\nStep 3: Searching for T cell-like clusters...")
search_data = {
    "jsonl_path": jsonl_path,
    "psi": json.dumps(query_vector),
    "k": 5
}
response = requests.post(
    f"{API_URL}/api/v1/genomics/vectors/search",
    headers=headers,
    data=search_data
)

results = response.json()["hits"]
print(f"\nāœ“ Top 5 T cell-like clusters:")
for i, hit in enumerate(results, 1):
    cluster_id = hit['meta']['cluster']
    distance = hit['distance']
    cells = hit['meta']['cells']
    print(f"{i}. Cluster {cluster_id}: distance={distance:.4f}, cells={cells}")

Workflow 2: QC Pipeline with Parasite Detection

Goal: Clean dataset by detecting and removing low-quality cells

import requests
import numpy as np
import scanpy as sc

API_URL = "https://api.donsystems.com"
TOKEN = "your-tamu-token-here"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Step 1: Detect parasites
print("Step 1: Detecting QC parasites...")
with open("pbmc_raw.h5ad", "rb") as f:
    files = {"file": ("pbmc_raw.h5ad", f, "application/octet-stream")}
    data = {
        "cluster_key": "leiden",
        "batch_key": "sample",
        "sync": "true"
    }

    response = requests.post(
        f"{API_URL}/api/v1/bio/qc/parasite-detect",
        headers=headers,
        files=files,
        data=data
    )

result = response.json()
flags = result["flags"]
parasite_score = result["parasite_score"]
print(f"āœ“ Parasite score: {parasite_score:.1f}%")

# Step 2: Filter flagged cells
print("\nStep 2: Filtering flagged cells...")
adata = sc.read_h5ad("pbmc_raw.h5ad")
adata = adata[~np.array(flags), :]
adata.write_h5ad("pbmc_cleaned.h5ad")
print(f"āœ“ Saved {adata.n_obs} clean cells")

šŸ”§ Troubleshooting Common Errors

Error 401: Authentication Failed

Error: {"detail": "Invalid or missing token"}

Solutions:
  • āœ… Verify token format includes "Bearer " prefix
  • āœ… Check for extra whitespace in token
  • āœ… Confirm token hasn't expired (valid 1 year)

Error 400: File Upload Failed

Error: {"detail": "Expected .h5ad file"}

Solutions:
  • āœ… Verify file has .h5ad extension
  • āœ… Validate AnnData format: sc.read_h5ad("file.h5ad")
  • āœ… Check file size < 500MB
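All three checks can be run locally before uploading. `h5ad_upload_problems` is a hypothetical helper; the 500MB limit mirrors the bullet above:

```python
MAX_UPLOAD_BYTES = 500 * 1024 * 1024  # 500 MB upload limit

def h5ad_upload_problems(path: str, size_bytes: int) -> list:
    # Returns human-readable problems; an empty list means safe to upload.
    problems = []
    if not path.endswith(".h5ad"):
        problems.append("file must have a .h5ad extension")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 500MB upload limit")
    return problems
```

Get the size with `os.path.getsize(path)`; for a deeper check, `sc.read_h5ad(path)` confirms the file parses as AnnData, as suggested above.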

Error 429: Rate Limit Exceeded

Error: {"detail": "Rate limit exceeded"}

Solutions:
  • āœ… Wait 1 hour for rate limit reset (1,000 req/hour)
  • āœ… Implement exponential backoff in scripts
  • āœ… Use cluster mode instead of cell mode
  • āœ… Contact support for higher limits if needed
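Exponential backoff is straightforward to wrap around any of the requests calls in this guide. A sketch (the doubling delays are a common convention, not mandated by the API):

```python
import time

def backoff_delays(max_retries: int, base: float = 1.0) -> list:
    # 1s, 2s, 4s, 8s, ... one delay per retry attempt.
    return [base * (2 ** i) for i in range(max_retries)]

def with_backoff(send, max_retries: int = 4, sleep=time.sleep):
    # send() is a zero-argument callable that performs the request and
    # returns a response object with a status_code attribute.
    response = send()
    for delay in backoff_delays(max_retries):
        if response.status_code != 429:
            break
        sleep(delay)
        response = send()
    return response
```

For example: `with_backoff(lambda: requests.post(url, headers=headers, data=data))` retries only on 429 responses.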

Contact Support

When reporting issues, include:

  1. Institution: Texas A&M University (Cai Lab)
  2. API endpoint and method (e.g., POST /vectors/build)
  3. Full error message (JSON response)
  4. Trace ID from response header (X-Trace-ID)
  5. Dataset description (cells, genes, file size)

Email: support@donsystems.com | Response time: < 24 hours

šŸ’¬ Chat API - Natural Language Queries

Overview

The Chat API allows you to ask questions about your genomics data using natural language. Query gene expression, identify cell types, and explore biological insights - the system automatically retrieves relevant context from your stored data and generates informed responses using state-of-the-art language models.

✨ Key Features:
  • 🧬 Natural Language: Ask questions in plain English
  • šŸ” Context-Aware: Automatically retrieves relevant data
  • šŸ” Your API Keys: Use your own Claude or OpenAI key (never stored)
  • šŸ’¬ Threading: Maintain context across multiple queries
  • šŸ“Š Transparent: See which data informed each answer

Quick Start Example

curl -X POST https://don-research-api.onrender.com/api/v1/genomics/chat \
  -H "Authorization: Bearer YOUR_INSTITUTION_TOKEN" \
  -H "X-LLM-API-Key: YOUR_CLAUDE_OR_OPENAI_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the top marker genes in cluster 0?",
    "llm_provider": "claude"
  }'

Swagger UI Testing

  1. Go to Swagger UI
  2. Navigate to POST /api/v1/genomics/chat
  3. Click "Try it out"
  4. Add headers:
    • Authorization: Bearer your_token
    • X-LLM-API-Key: sk-ant-... (Claude) or sk-... (OpenAI)
  5. Enter your query in the request body
  6. Click "Execute" to get your answer!

Example Queries

• Gene Expression: "Which genes are highly expressed in cluster 2?"
• Cell Type ID: "What cell types are present based on marker expression?"
• Comparative: "How does cluster 0 compare to cluster 3?"
• Biological Insight: "What pathways are enriched in upregulated genes?"

API Parameters

• query (string, required): your question (10-2000 characters)
• llm_provider (string, required): "claude" or "gpt4"
• conversation_id (string, optional): thread multiple queries together
• temperature (float, optional): response creativity (0.0-1.0, default 0.7)
• max_context_items (int, optional): maximum data items to retrieve (1-20, default 5)
šŸ” Security Note: Your LLM API key is NEVER logged or stored. It's used only for your request and discarded immediately. Get API keys from:

Response Format

{
  "answer": "Based on the PBMC data, cluster 0 shows high expression of...",
  "context_used": [
    {
      "text": "Cluster 0: CD3D, CD3E, IL7R (T cells)",
      "relevance": 0.89,
      "source": "pbmc3k_analysis"
    }
  ],
  "conversation_id": "auto-generated-uuid",
  "tokens_used": 245,
  "provider": "claude",
  "model": "claude-3-5-sonnet-20241022"
}
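The curl example above translates directly to Python. `chat_payload` enforces the documented 10-2000 character query limit, and `ask_chat` mirrors the curl request (same base URL and headers); both are illustrative sketches, not an official SDK:

```python
def chat_payload(query: str, llm_provider: str = "claude",
                 conversation_id: str = None, temperature: float = 0.7) -> dict:
    # The query must be 10-2000 characters (see API Parameters above).
    if not 10 <= len(query) <= 2000:
        raise ValueError("query must be 10-2000 characters")
    payload = {"query": query, "llm_provider": llm_provider, "temperature": temperature}
    if conversation_id:
        payload["conversation_id"] = conversation_id
    return payload

def ask_chat(token: str, llm_key: str, query: str, **kwargs) -> dict:
    # Same endpoint and headers as the curl quick start above.
    import requests
    response = requests.post(
        "https://don-research-api.onrender.com/api/v1/genomics/chat",
        headers={
            "Authorization": f"Bearer {token}",
            "X-LLM-API-Key": llm_key,
        },
        json=chat_payload(query, **kwargs),
    )
    return response.json()  # answer, context_used, conversation_id, tokens_used, ...
```

Pass the returned conversation_id back via `conversation_id=` on the next call to thread follow-up questions.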

Complete Documentation

For detailed usage, Python SDK, best practices, and troubleshooting, see the Complete Chat API Guide.

šŸ’” Pro Tips:
  • Use conversation_id to maintain context across multiple questions
  • Increase max_context_items for complex queries that need more data
  • Lower temperature (0.1-0.3) for factual, deterministic responses
  • Higher temperature (0.8-0.9) for creative hypothesis generation


🧠 DMP Memory System - AI-Assisted Research Collaboration

⚔ Live Compression Verification Dashboard

Real-time proof of 768D → 1D compression at 100% fidelity. Watch your data compress as you use the system.

• Compression Ratio: 768:1 (768D → 1D)
• Fidelity: 100.0% (perfect reconstruction)
• Storage per Memory: 310 B, regardless of complexity
• Query Time: last query, in ms (populated once DMP operations run)

šŸ’” How it works: Every store/retrieve operation below updates this dashboard in real time. Watch as your 768-dimensional semantic vectors compress to 1D with perfect fidelity. This violates Shannon entropy - but the proof is right here, live, on your data.

What is DMP Memory?

The DON Memory Protocol (DMP) uses substrate-driven compression to store ALL your research data - structured results, unstructured observations, failed experiments, and exploratory "exhaust" - and makes it coherent and queryable using natural language.

šŸš€ Revolutionary Capability for Cellular Research:
For the first time, you can store everything - not just polished results:

āœ… Structured Data: QC metrics, cluster assignments, gene expression values
āœ… Unstructured Notes: "Cluster 3 looks like cytotoxic T cells based on morphology"
āœ… Failed Experiments: "UMAP with cosine distance didn't separate well"
āœ… Parameter Exploration: "Tried resolutions 0.6, 0.8, 1.0 - 0.8 was cleanest"
āœ… Weird Observations: "Batch 2 has higher doublet rate, might be pipetting issue"
āœ… Literature Connections: "This matches the rare CD14+CD16+ intermediate monocytes from Smith et al. 2023"

DMP compresses it ALL with 96Ɨ compression and 99.5% fidelity, then makes it coherent.
The chaos becomes substrate. The substrate becomes queryable knowledge.

The more you store (even messy exploratory data), the smarter the system gets.

Why This Has Never Existed Before

Traditional tools force you to choose: either curate only polished, structured results, or let exploratory notes and failed experiments scatter across notebooks and chat threads where they can never be queried again.

DMP is different: It uses substrate physics to compress chaos into coherent, queryable memory. Dump in your raw thoughts, failed experiments, parameter sweeps, unexpected findings - DMP figures out the relationships and surfaces what matters when you ask.

What This Means for Your Research

Real Example: The Power of Storing Everything

# Month 1: Store everything as you explore
note("PBMC dataset: 10,247 cells initially", importance=5)
note("QC: removed 1,200 cells with >5% mito", importance=6)
note("Weird: Batch 2 has 2Ɨ doublet rate vs Batch 1", importance=7)
note("Tried PCA with 20, 30, 50 components - 30 gave cleanest UMAP", importance=6)
note("Resolution 0.8 gave 12 clusters, 1.0 gave 15 (too fragmented)", importance=7)
note("Cluster 3: CD3E, CD8A high - definitely cytotoxic T cells", importance=9)
note("Cluster 7: unexpected CD14+CD16+ - matches Smith 2023 intermediate monocytes", importance=9)
note("FAILED: harmony batch correction made things worse, kept default", importance=8)

# Month 3: Query across all that chaos
ask("What issues did we have with batch effects?")
# → DMP surfaces the doublet rate observation AND the harmony failure

ask("What parameter choices worked best for clustering?")
# → DMP finds resolution 0.8, PCA 30, and explains why

ask("Have we seen unusual monocyte populations before?")
# → DMP connects Cluster 7 to Smith 2023 reference you stored

ask("Why are our cell counts different from the paper?")
# → DMP retrieves the QC filtering decision with exact numbers

This is profound: Your research process - the messiness, the failures, the "I wonder if..." moments - becomes an asset instead of lost noise. DMP turns exploration into institutional memory.

šŸŽÆ Common Use Cases:
• Store QC metrics and decisions for each dataset
• Track parameter choices and optimization results
• Document unexpected findings or outliers
• Build a searchable lab knowledge base
• Enable LLM assistants to "remember" your previous work

Interactive Memory Tools

Use the forms below to store and retrieve memories directly from your browser. Your API token is used automatically.

šŸ“ Store a Memory

Save analysis results, observations, or research notes to persistent memory.

Example: pbmc3k_analysis_2025, t_cell_markers_study
Be specific - include numbers, gene names, conclusions
1 (Low) 5 10 (Critical)
Higher importance = retrieved first in queries

šŸ” Query Memories

Search your stored memories using natural language queries.

Leave blank to search all your memories

šŸ“Š Conversation Statistics

View memory counts and usage for a specific conversation.

Example Workflow: Store & Query

šŸ’” The "Brain-Dump Everything" Workflow

Stop filtering your thoughts. Store EVERYTHING.

During Analysis (store as you go):
# Raw observations
"Looking at the UMAP - cluster 3 is really tight, cluster 7 is diffuse"

# Parameter decisions
"Tried resolutions 0.6, 0.8, 1.0 - settled on 0.8 because 1.0 split cluster 3"

# Failures that matter
"Harmony batch correction actually made batch separation worse - skipping it"

# Unexpected findings
"Batch 2 has 12% doublet rate vs 5% in batch 1 - check pipetting protocol"

# Cross-references
"Cluster 7 CD14+CD16+ matches the intermediate monocytes from Smith et al. 2023"

# Future questions
"Need to check if cluster 8 appears in other PBMC datasets - looks weird"

# Technical notes
"Used scanpy.pp.highly_variable_genes with n_top_genes=2000 - worked great"
Weeks Later (natural language queries):
"What clustering parameters have worked well?"
"Show me all batch effect issues we've encountered"
"Have we seen CD14+CD16+ populations before?"
"What QC thresholds did we use for mitochondrial genes?"
"Why did we skip batch correction in the PBMC analysis?"

DMP finds the answers instantly because you stored the chaos as you went. The substrate learns your research patterns, technical choices, and domain knowledge - making future you (and your lab) exponentially smarter.

Standard Workflow Example

# Step 1: Store multiple memories during analysis
Stored memory 1: "PBMC3k dataset: 2700 cells, 13714 genes. QC: 95% viable, median 2000 genes/cell"
  → conversation_id: pbmc3k_project
  → trace_id: abc123...

Stored memory 2: "Leiden clustering: 8 clusters identified. Cluster 3 = CD8+ T cells (CD3E, CD8A high)"
  → conversation_id: pbmc3k_project
  → trace_id: def456...

Stored memory 3: "UMAP embedding reveals clear separation of T, B, NK, and myeloid populations"
  → conversation_id: pbmc3k_project
  → trace_id: ghi789...

# Step 2: Query memories weeks later
Query: "What were the T cell findings in PBMC project?"
Results:
  1. "Leiden clustering: 8 clusters identified. Cluster 3 = CD8+ T cells (CD3E, CD8A high)" (relevance: 0.95)
  2. "UMAP embedding reveals clear separation of T, B, NK, and myeloid populations" (relevance: 0.87)

# Step 3: Get conversation stats
Conversation: pbmc3k_project
  → Total memories: 15
  → Created: 2025-01-15T10:30:00Z
  → Last updated: 2025-01-20T14:22:00Z

API Endpoints Reference

• POST /api/v1/dmp/store: store a new memory. Returns trace_id, conversation_id, timestamp.
• POST /api/v1/dmp/retrieve: query memories by text. Returns an array of memories with relevance scores.
• GET /api/v1/dmp/stats/{conversation_id}: get conversation statistics. Returns total memories and timestamps.
• GET /api/v1/dmp/health: check DMP service status. Returns service health and version.
āš ļø Important Notes:
• All DMP endpoints require your TAMU API token in the Authorization header
• Memories are tenant-isolated - you only see your own data
• conversation_id is optional but recommended for organization
• Higher importance values (8-10) are retrieved first in queries
• The forms above handle authentication automatically in your browser

Python Code Examples

For programmatic access, here's how to use DMP in your Python scripts:

import requests

API_URL = "https://api.donsystems.com"
TOKEN = "your_tamu_token_here"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Store a memory
store_response = requests.post(
    f"{API_URL}/api/v1/dmp/store",
    headers=headers,
    json={
        "conversation_id": "pbmc3k_project",
        "memory_text": "Cluster 3 shows high CD3E, CD8A expression (cytotoxic T cells). QC: 95% viable.",
        "importance": 8
    }
)
print(f"Stored: {store_response.json()['trace_id']}")

# Query memories
retrieve_response = requests.post(
    f"{API_URL}/api/v1/dmp/retrieve",
    headers=headers,
    json={
        "conversation_id": "pbmc3k_project",  # optional
        "query": "What did we find about T cells?",
        "limit": 10
    }
)

memories = retrieve_response.json()["memories"]
for memory in memories:
    print(f"[{memory['relevance_score']:.2f}] {memory['text']}")

# Get conversation stats
stats_response = requests.get(
    f"{API_URL}/api/v1/dmp/stats/pbmc3k_project",
    headers=headers
)
stats = stats_response.json()
print(f"Conversation has {stats['total_memories']} memories")

LLM Integration Pattern

To enable AI assistants (like ChatGPT, Claude, or local models) to "remember" your research:

# 1. After each analysis step, store key findings
# (store_memory is a thin wrapper around POST /api/v1/dmp/store)
def after_qc_analysis():
    store_memory(
        conversation_id="current_project",
        memory_text=f"QC complete: {viable_cells} viable, {median_genes} median genes/cell",
        importance=7
    )

def after_clustering():
    store_memory(
        conversation_id="current_project",
        memory_text=f"Found {n_clusters} clusters. Cluster {i} = {cell_type} (markers: {genes})",
        importance=9
    )

# 2. Before asking LLM questions, retrieve relevant context
def ask_llm_about_project(user_question):
    # Get relevant memories
    memories = retrieve_memories(
        conversation_id="current_project",
        query=user_question,
        limit=5
    )
    
    # Build context for LLM
    context = "\n".join([f"- {m['text']}" for m in memories])
    
    # Construct LLM prompt
    prompt = f"""
    You are analyzing single-cell RNA-seq data. Here's what we know so far:
    
    {context}
    
    User question: {user_question}
    
    Provide a detailed answer based on the context above.
    """
    
    # Send to LLM
    return call_llm(prompt)

# Example usage
answer = ask_llm_about_project("What cell types did we identify and what are their markers?")
print(answer)

Best Practices

Use one conversation_id per project, write memory text that is specific (numbers, gene names, conclusions), and reserve importance 8-10 for key findings so they surface first in queries.

šŸ¤– Connecting Claude AI for Natural Language Queries

Want to query your compressed genomics data using natural language? Connect Claude (or other LLMs) to DMP memory for AI-assisted analysis:

šŸ’” The Power of LLM + DMP:
Instead of writing SQL queries or searching files, just ask:
• "What T cell markers did we find in the PBMC study?"
• "Which clusters had high mitochondrial content?"
• "Summarize the QC metrics across all my datasets"

Claude retrieves relevant memories from DMP and answers with your actual research data!

Quick Start: Python Script

Here's a complete script to connect Claude to your DMP memories:

#!/usr/bin/env python3
# Connect Claude AI to your DMP research memories.
# Install: pip install anthropic requests

import requests
from anthropic import Anthropic

# Configuration
API_URL = "https://api.donsystems.com"
TAMU_TOKEN = "your_tamu_token_here"
CLAUDE_KEY = "your_claude_api_key_here"

headers = {"Authorization": f"Bearer {TAMU_TOKEN}"}
claude = Anthropic(api_key=CLAUDE_KEY)

def ask_claude_about_research(question: str, project_id: str = None):
    """Ask Claude about your genomics research.

    Relevant context is retrieved from DMP and passed to Claude automatically.
    """
    
    # Step 1: Retrieve relevant memories from DMP
    retrieve_payload = {"query": question, "limit": 10}
    if project_id:
        retrieve_payload["conversation_id"] = project_id
    
    dmp_response = requests.post(
        f"{API_URL}/api/v1/dmp/retrieve",
        headers=headers,
        json=retrieve_payload
    )
    
    memories = dmp_response.json()["memories"]
    print(f"šŸ“š Retrieved {len(memories)} relevant memories")
    
    # Step 2: Build context from your research findings
    if not memories:
        return "No relevant findings found. Store some memories first!"
    
    context = "\n\n".join([
        f"[Relevance: {m['relevance_score']:.2f}] {m['text']}"
        for m in memories
    ])
    
    # Step 3: Ask Claude with your research context
    prompt = f"""You are a bioinformatics expert analyzing single-cell RNA-seq data.

Here are relevant findings from the researcher's previous analyses:

{context}

Researcher's Question: {question}

Provide a detailed answer using specific details (gene names, metrics, cluster numbers) from the findings above."""
    
    response = claude.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}]
    )
    
    return response.content[0].text

# Example queries
if __name__ == "__main__":
    print("\n" + "="*60)
    print("CLAUDE AI + DMP MEMORY DEMO")
    print("="*60 + "\n")
    
    # Query 1: Cell populations
    print("Q: What cell populations did we identify?")
    answer = ask_claude_about_research(
        "What cell populations did we identify in our analyses?",
        project_id="pbmc_study_2025"  # Optional: filter by project
    )
    print(f"\nšŸ¤– Claude: {answer}\n")
    
    # Query 2: QC metrics
    print("Q: What were the QC metrics?")
    answer = ask_claude_about_research(
        "What were the QC metrics for our datasets?",
        project_id="pbmc_study_2025"
    )
    print(f"\nšŸ¤– Claude: {answer}\n")
    
    # Query 3: Marker genes
    print("Q: What marker genes did we find for T cells?")
    answer = ask_claude_about_research(
        "What marker genes did we identify for T cell populations?",
        project_id="pbmc_study_2025"
    )
    print(f"\nšŸ¤– Claude: {answer}\n")

Example Conversation

Here's what a real conversation looks like:

šŸ‘¤ You: "What T cell populations did we identify and what are their marker genes?"


šŸ¤– Claude: "Based on your analysis, you identified cytotoxic T cells in Cluster 3. The key markers are:

  • CD3E (log2FC = 4.2) - T cell receptor component
  • CD8A (log2FC = 3.8) - Cytotoxic T cell marker
  • CD8B - Additional cytotoxic marker
  • GZMK (log2FC = 3.5) - Granzyme K (cytotoxic effector)
  • CCL5 (log2FC = 3.1) - Chemokine for T cell trafficking

These markers strongly indicate a cytotoxic CD8+ T cell population. The high expression of GZMK suggests these are effector memory T cells rather than naive T cells."

šŸ‘¤ You: "Were there any unusual findings in the clustering?"


šŸ¤– Claude: "Yes, you noted an unusual finding in Cluster 7, which showed dual expression of CD14 and CD16. This pattern is characteristic of intermediate monocytes, which are less common than classical (CD14+CD16-) or non-classical (CD14-CD16+) monocytes. This population represents about 5-10% of monocytes in healthy PBMC samples and may indicate an activated or transitional state."

Jupyter Notebook Integration

For interactive analysis, create a helper module:

# notebook_helper.py
import requests
from anthropic import Anthropic

API_URL = "https://api.donsystems.com"
TAMU_TOKEN = "your_token"
CLAUDE_KEY = "your_claude_key"

headers = {"Authorization": f"Bearer {TAMU_TOKEN}"}
claude = Anthropic(api_key=CLAUDE_KEY)

PROJECT_ID = "my_pbmc_study"

def note(finding: str, importance: int = 7):
    # Quick function to store findings
    response = requests.post(
        f"{API_URL}/api/v1/dmp/store",
        headers=headers,
        json={
            "conversation_id": PROJECT_ID,
            "memory_text": finding,
            "importance": importance
        }
    )
    result = response.json()
    print(f"āœ“ Stored: {result['trace_id']}")

def ask(question: str):
    # Quick function to query with Claude
    # Retrieve memories
    dmp_response = requests.post(
        f"{API_URL}/api/v1/dmp/retrieve",
        headers=headers,
        json={
            "query": question,
            "conversation_id": PROJECT_ID,
            "limit": 10
        }
    )
    
    memories = dmp_response.json()["memories"]
    context = "\n\n".join([
        f"[{m['relevance_score']:.2f}] {m['text']}"
        for m in memories
    ])
    
    # Ask Claude
    prompt = f"""Context from previous analyses:
{context}

Question: {question}

Answer with specific details from the context."""
    
    response = claude.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
    
    answer = response.content[0].text
    print(f"\nšŸ¤– {answer}\n")
    return answer

In your Jupyter notebook:

import scanpy as sc
from notebook_helper import note, ask, PROJECT_ID

# Load and analyze data
adata = sc.read_h5ad("pbmc_10k.h5ad")

# Store findings as you go
note("Loaded 10,247 cells, 20,000 genes")

# QC
sc.pp.filter_cells(adata, min_genes=200)
note(f"After QC: {adata.n_obs} cells retained")

# Clustering
sc.pp.neighbors(adata)
sc.tl.leiden(adata)
note(f"Leiden clustering: {len(adata.obs['leiden'].unique())} clusters")

# Find markers
sc.tl.rank_genes_groups(adata, 'leiden')
markers = sc.get.rank_genes_groups_df(adata, group='3')
note(f"Cluster 3 markers: {markers.head(5)['names'].tolist()}")

# Later: Ask Claude about your analysis
ask("What were the main findings from the PBMC clustering?")
ask("Which cluster represents T cells based on markers?")
ask("What QC steps did we perform?")

Other LLM Options

DMP works with any LLM. Here are examples for other models:

OpenAI GPT-4 (pip install openai):

from openai import OpenAI

client = OpenAI(api_key="...")
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are a bioinformatics expert."},
        {"role": "user", "content": f"{context}\n\n{question}"}
    ]
)

Local Ollama (ollama pull llama2):

import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",
        "prompt": f"{context}\n\n{question}",
        "stream": False
    }
)

Google Gemini (pip install google-generativeai):

import google.generativeai as genai

genai.configure(api_key="...")
model = genai.GenerativeModel('gemini-pro')
response = model.generate_content(
    f"{context}\n\n{question}"
)

Complete Workflow Example

Here's how a typical research workflow looks with LLM integration:

# Day 1: Initial analysis
# Compress dataset with DON Stack
compress_result = compress_dataset("pbmc_10k.h5ad")  # 96Ɨ compression

# Store findings as you analyze
note("Dataset: 10,247 cells, 20,000 genes", importance=6)
note("QC: 95% viable, median 2,150 genes/cell", importance=7)
note("Leiden: 12 clusters identified", importance=8)
note("Cluster 3: CD3E, CD8A, CD8B high (cytotoxic T cells)", importance=9)
note("Cluster 7: Unusual CD14+CD16+ (intermediate monocytes)", importance=9)

# Day 30: Come back weeks later, ask Claude
ask("What did we find in the PBMC analysis?")
# → Claude retrieves all your stored findings and summarizes them!

ask("Were there any unusual cell populations?")
# → Claude finds the intermediate monocytes finding

ask("What were the T cell markers?")
# → Claude retrieves Cluster 3 markers with exact gene names

# Day 60: Compare across multiple projects
ask("How do the T cell populations compare across all my PBMC studies?")
# → Claude searches ALL your conversation_ids and synthesizes insights!
šŸ“š Complete Documentation:
For more examples, troubleshooting, and advanced patterns, see:
docs/DMP_LLM_INTEGRATION.md