📘 DON Research API - Complete User Guide

Comprehensive Tutorial for Texas A&M University Cai Lab

📖 Web Interface Overview

What This Guide Covers

This comprehensive guide teaches you how to use the DON Research API web interface and its features:

🎯 Quick Links:
• Homepage: https://don-research.onrender.com/ - Quick reference docs
• This Guide: https://don-research.onrender.com/guide - Detailed tutorials
• Swagger UI: https://don-research.onrender.com/docs - Interactive testing

Who Should Use This Guide?

This guide is designed for Texas A&M researchers who:

🔍 Swagger UI Tutorial: Test APIs in Your Browser

What is Swagger UI?

Swagger UI is an interactive tool that lets you test API endpoints directly in your web browser without writing any code. It's perfect for:

Accessing Swagger UI

  1. Open your web browser (Chrome, Firefox, Safari, or Edge)
  2. Navigate to: https://don-research.onrender.com/docs
  3. You'll see a list of all available API endpoints organized by category

Step-by-Step: Testing Your First Endpoint

Step 1: Authenticate with Your Token

  1. Look for the green "Authorize" button at the top right of the page
  2. Click it to open the authentication dialog
  3. In the "Value" field, enter: Bearer your-tamu-token-here
    • ⚠️ Important: Include the word "Bearer " (with a space) before your token
    • Example: Bearer tamu_cai_lab_2025_HkRs17sgvbjnQax2KzD1iqYcHWbAs5xvZZ2ApKptWuc
  4. Click the "Authorize" button
  5. Click "Close" to return to the main page
✓ Success Indicator: The padlock icons next to endpoints change from open 🔓 to closed 🔒
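
In scripts, the same "Bearer " prefix has to be added to the Authorization header by hand. The sketch below shows one way to build that header defensively. The helper name bearer_header is hypothetical and not part of the API; it only normalizes the token string before you pass the header to requests.

```python
def bearer_header(token: str) -> dict:
    """Return an Authorization header, adding the "Bearer " prefix if it is missing."""
    cleaned = token.strip()  # stray whitespace is a common cause of 401 errors
    if cleaned.lower().startswith("bearer "):
        # The caller already included the prefix; keep only the token part
        cleaned = cleaned[len("bearer "):].strip()
    if not cleaned:
        raise ValueError("token is empty")
    return {"Authorization": f"Bearer {cleaned}"}


headers = bearer_header("  your-tamu-token-here ")
print(headers)
```

The resulting dictionary can be passed as headers= to any requests call in this guide.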

Step 2: Select an Endpoint to Test

Endpoints are organized into sections. For your first test, try the health check:

  1. Scroll down to find GET /health
  2. Click on it to expand the endpoint details

Step 3: Execute the Request

  1. Click the "Try it out" button (top right of the endpoint section)
  2. Click the "Execute" button (blue button)
  3. Wait a moment for the response...

Step 4: View the Results

After execution, you'll see three important sections:

Request URL:

https://don-research.onrender.com/health

Response Body: (should look like this)

{
  "status": "ok",
  "timestamp": "2025-10-27T15:30:45.123Z",
  "database": "connected",
  "don_stack": {
    "mode": "real",
    "don_gpu": true,
    "tace": true,
    "qac": true
  }
}

Response Headers:

X-Trace-ID: tamu_20251027_abc123def
content-type: application/json
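
If you call /health from a script instead of Swagger UI, you can check the same fields programmatically. The following is a minimal sketch that assumes the healthy values are the ones shown in the sample response above ("ok", "connected", and true for each don_stack component); the function name summarize_health is hypothetical.

```python
import json

# Sample payload copied from the response body shown above
SAMPLE = """
{
  "status": "ok",
  "timestamp": "2025-10-27T15:30:45.123Z",
  "database": "connected",
  "don_stack": {"mode": "real", "don_gpu": true, "tace": true, "qac": true}
}
"""


def summarize_health(payload: dict) -> list:
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if payload.get("status") != "ok":
        problems.append(f"status is {payload.get('status')!r}")
    if payload.get("database") != "connected":
        problems.append(f"database is {payload.get('database')!r}")
    stack = payload.get("don_stack", {})
    for component in ("don_gpu", "tace", "qac"):
        if not stack.get(component):
            problems.append(f"{component} is not available")
    return problems


print(summarize_health(json.loads(SAMPLE)))
```

With a live server, replace json.loads(SAMPLE) with the parsed body of a GET request to /health.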

Testing File Upload Endpoints

Example: Build Feature Vectors

  1. Find POST /api/v1/genomics/vectors/build
  2. Click to expand → Click "Try it out"
  3. file parameter: Click "Choose File" → Select your .h5ad file
  4. mode parameter: Select cluster from dropdown
  5. Click "Execute"
  6. View response with vector counts and preview data
⚠️ Best Practices:
• Use small datasets (< 5,000 cells) in Swagger UI
• For large files, use Python scripts instead
• Don't upload sensitive/unpublished data via browser
• Copy the curl command shown for use in scripts

Understanding Response Codes

Code | Meaning             | Action
200  | Success             | Request completed successfully
400  | Bad Request         | Check parameter format (e.g., file extension)
401  | Unauthorized        | Check token format (must include "Bearer ")
429  | Rate Limit Exceeded | Wait 1 hour or use fewer requests
500  | Server Error        | Contact support with trace_id
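
The table above can be turned into a small lookup for scripts. This is an illustrative sketch only: the function name explain_response is hypothetical, and it assumes the trace ID arrives in the X-Trace-ID response header as in the health check example.

```python
ACTIONS = {
    200: "Request completed successfully",
    400: "Check parameter format (e.g., file extension)",
    401: 'Check token format (must include "Bearer ")',
    429: "Wait 1 hour or use fewer requests",
    500: "Contact support with trace_id",
}


def explain_response(status_code: int, headers: dict) -> str:
    """Map a status code to the suggested action, appending the trace ID on errors."""
    action = ACTIONS.get(status_code, "Status code not listed in this guide")
    # Header names are case-insensitive, so normalize them before the lookup
    lowered = {name.lower(): value for name, value in headers.items()}
    trace_id = lowered.get("x-trace-id")
    if status_code >= 400 and trace_id:
        return f"{status_code}: {action} (trace ID: {trace_id})"
    return f"{status_code}: {action}"


print(explain_response(500, {"X-Trace-ID": "tamu_20251027_abc123def"}))
```

With requests, you would call explain_response(response.status_code, dict(response.headers)).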

📁 Supported Data Formats

Format Overview

The DON Research API supports multiple input formats, not just H5AD files:

Format | Example | Use Case | Supported Endpoints
H5AD Files | pbmc3k.h5ad | Direct upload of single-cell data (AnnData format) | All genomics + Bio module
GEO Accessions | GSE12345 | Auto-download from NCBI GEO database | /load
Direct URLs | https://example.com/data.h5ad | Download from external HTTP/HTTPS sources | /load
Gene Lists (JSON) | ["CD3E", "CD8A", "CD4"] | Encode cell type marker genes as query vectors | /query/encode
Text Queries | "T cell markers in PBMC" | Natural language biological queries | /query/encode
📌 Important:
• Bio Module endpoints require H5AD files only (for ResoTrace compatibility)
• Genomics endpoints support all formats above
• GEO accessions and URLs are automatically downloaded and converted to H5AD
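
To make the format table concrete, here is a rough sketch of how a script might decide which kind of input it has been given. Everything in it is an assumption for illustration: the function detect_format, the category names, and the GEO pattern (only the GSE prefix appears in this guide, so other accession types are not handled).

```python
import re

# Endpoint per input format, taken from the "Supported Endpoints" column above
ENDPOINTS = {
    "geo": "/load",
    "url": "/load",
    "gene_list": "/query/encode",
    "text": "/query/encode",
    "h5ad": "file upload (genomics and Bio module endpoints)",
}


def detect_format(value) -> str:
    """Classify an input as 'gene_list', 'url', 'geo', 'h5ad', or 'text'."""
    if isinstance(value, (list, tuple)):
        return "gene_list"
    text = str(value).strip()
    # Check URLs first so that https://.../data.h5ad is not mistaken for a local file
    if text.lower().startswith(("http://", "https://")):
        return "url"
    if re.fullmatch(r"GSE\d+", text):
        return "geo"
    if text.endswith(".h5ad"):
        return "h5ad"
    return "text"


for example in (["CD3E", "CD8A"], "GSE12345", "pbmc3k.h5ad", "T cell markers in PBMC"):
    kind = detect_format(example)
    print(example, "->", kind, "->", ENDPOINTS[kind])
```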

Format-Specific Usage Examples

1. H5AD Files (Most Common)

import requests

TOKEN = "your-tamu-token-here"  # replace with your API token

# Upload H5AD file directly
with open("pbmc_3k.h5ad", "rb") as f:
    files = {"file": ("pbmc_3k.h5ad", f, "application/octet-stream")}
    response = requests.post(
        "https://don-research.onrender.com/api/v1/genomics/vectors/build",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data={"mode": "cluster"}
    )
response.raise_for_status()  # stop early on 4xx/5xx responses
print(response.json())

2. GEO Accessions

# Automatically downloads from NCBI GEO
data = {"accession_or_path": "GSE12345"}
response = requests.post(
    "https://don-research.onrender.com/api/v1/genomics/load",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=data
)
h5ad_path = response.json()["h5ad_path"]
print(f"Downloaded to: {h5ad_path}")

3. Direct URLs

# Download from any HTTP/HTTPS source
data = {"accession_or_path": "https://example.com/dataset.h5ad"}
response = requests.post(
    "https://don-research.onrender.com/api/v1/genomics/load",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=data
)
h5ad_path = response.json()["h5ad_path"]

4. Gene Lists (JSON)

import json

# T cell marker genes
gene_list = ["CD3E", "CD8A", "CD4", "IL7R", "CCR7"]
data = {"gene_list_json": json.dumps(gene_list)}

response = requests.post(
    "https://don-research.onrender.com/api/v1/genomics/query/encode",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=data
)
query_vector = response.json()["psi"]  # 128-dimensional vector

5. Text Queries

# Natural language query
data = {
    "text": "T cell markers in PBMC tissue",
    "cell_type": "T cell",
    "tissue": "PBMC"
}

response = requests.post(
    "https://don-research.onrender.com/api/v1/genomics/query/encode",
    headers={"Authorization": f"Bearer {TOKEN}"},
    data=data
)
query_vector = response.json()["psi"]

When to Use Each Format

🧬 Bio Module: ResoTrace Integration

Overview

The Bio Module provides advanced single-cell analysis workflows optimized for ResoTrace integration. Key capabilities:

Sync vs Async Execution Modes

Every Bio endpoint supports two execution modes:

Mode | When to Use | Response Time | Best For
sync=true | Small datasets (< 5K cells) | Immediate (< 30s) | Quick validation, exploratory analysis, Swagger UI testing
sync=false | Large datasets (> 10K cells) | Background job | Production pipelines, batch processing, automated workflows
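
A script can pick the mode from the dataset size. The sketch below is one possible rule, not the API's own logic: the helper choose_sync is hypothetical, and because the table does not say what to do between 5K and 10K cells, it assumes the background mode for anything at or above 5,000 cells. It returns a string because the examples in this guide send sync as a form field.

```python
def choose_sync(n_cells: int, sync_limit: int = 5_000) -> str:
    """Return the form value for the sync parameter based on the number of cells."""
    if n_cells < 0:
        raise ValueError("n_cells must be non-negative")
    # Assumption: datasets in the unspecified 5K-10K range are treated as large
    return "true" if n_cells < sync_limit else "false"


for cells in (2_700, 7_500, 68_000):
    print(cells, "cells -> sync =", choose_sync(cells))
```

With scanpy, the cell count is available as adata.n_obs after reading the H5AD file.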

Feature 1: Export Artifacts

POST /api/v1/bio/export-artifacts

What it does:

Required parameters:

Example (Synchronous):

with open("pbmc_3k.h5ad", "rb") as f:
    files = {"file": ("pbmc.h5ad", f, "application/octet-stream")}
    data = {
        "cluster_key": "leiden",
        "latent_key": "X_umap",
        "sync": "true",
        "project_id": "cai_lab_pbmc_study",
        "user_id": "researcher_001"
    }
    
    response = requests.post(
        "https://don-research.onrender.com/api/v1/bio/export-artifacts",
        headers={"Authorization": f"Bearer {TOKEN}"},
        files=files,
        data=data
    )

result = response.json()
print(f"✓ Exported {result['nodes']} clusters")
print(f"✓ {result['vectors']} cell vectors")
print(f"✓ Trace ID: {result.get('trace_id')}")

Feature 2: Parasite Detector (QC)

POST /api/v1/bio/qc/parasite-detect

What it does:

Recommended Actions:

Parasite Score | Quality   | Action
0-5%           | Excellent | Proceed without filtering
5-15%          | Good      | Minor filtering recommended
15-30%         | Moderate  | Filter flagged cells
> 30%          | Poor      | Review QC pipeline
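
The thresholds above translate directly into a lookup. In this sketch the function recommend_action is hypothetical, the score is assumed to be a percentage between 0 and 100, and scores that fall exactly on a boundary (5, 15, 30) are assigned to the lower band, which is an assumption because the table's ranges overlap at their edges.

```python
def recommend_action(parasite_score: float) -> tuple:
    """Return (quality, action) for a parasite score given in percent."""
    if not 0 <= parasite_score <= 100:
        raise ValueError("parasite_score must be between 0 and 100")
    # Upper bounds are inclusive, consistent with the "> 30%" row for Poor
    if parasite_score <= 5:
        return ("Excellent", "Proceed without filtering")
    if parasite_score <= 15:
        return ("Good", "Minor filtering recommended")
    if parasite_score <= 30:
        return ("Moderate", "Filter flagged cells")
    return ("Poor", "Review QC pipeline")


quality, action = recommend_action(12.4)
print(f"{quality}: {action}")
```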

For complete Bio module documentation, see the homepage or Swagger UI.

🔬 Complete Workflow Examples

Workflow 1: Cell Type Discovery with T Cells

Goal: Identify T cell clusters in PBMC dataset using marker genes

import requests
import json

API_URL = "https://don-research.onrender.com"
TOKEN = "your-tamu-token-here"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Step 1: Build cluster vectors
print("Step 1: Building vectors...")
with open("pbmc_3k.h5ad", "rb") as f:
    files = {"file": ("pbmc_3k.h5ad", f, "application/octet-stream")}
    response = requests.post(
        f"{API_URL}/api/v1/genomics/vectors/build",
        headers=headers,
        files=files,
        data={"mode": "cluster"}
    )

vectors_result = response.json()
jsonl_path = vectors_result["jsonl"]
print(f"✓ Built {vectors_result['count']} cluster vectors")

# Step 2: Encode T cell query
print("\nStep 2: Encoding T cell markers...")
t_cell_genes = ["CD3E", "CD8A", "CD4", "IL7R"]
query_data = {"gene_list_json": json.dumps(t_cell_genes)}

response = requests.post(
    f"{API_URL}/api/v1/genomics/query/encode",
    headers=headers,
    data=query_data
)
query_vector = response.json()["psi"]
print("✓ Encoded query vector (128 dimensions)")

# Step 3: Search for matching clusters
print("\nStep 3: Searching for T cell-like clusters...")
search_data = {
    "jsonl_path": jsonl_path,
    "psi": json.dumps(query_vector),
    "k": 5
}
response = requests.post(
    f"{API_URL}/api/v1/genomics/vectors/search",
    headers=headers,
    data=search_data
)

results = response.json()["hits"]
print("\n✓ Top 5 T cell-like clusters:")
for i, hit in enumerate(results, 1):
    cluster_id = hit['meta']['cluster']
    distance = hit['distance']
    cells = hit['meta']['cells']
    print(f"{i}. Cluster {cluster_id}: distance={distance:.4f}, cells={cells}")

Workflow 2: QC Pipeline with Parasite Detection

Goal: Clean dataset by detecting and removing low-quality cells

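import requests
import scanpy as sc
import numpy as np

API_URL = "https://don-research.onrender.com"
TOKEN = "your-tamu-token-here"
headers = {"Authorization": f"Bearer {TOKEN}"}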

# Step 1: Detect parasites
print("Step 1: Detecting QC parasites...")
with open("pbmc_raw.h5ad", "rb") as f:
    files = {"file": ("pbmc_raw.h5ad", f, "application/octet-stream")}
    data = {
        "cluster_key": "leiden",
        "batch_key": "sample",
        "sync": "true"
    }
    
    response = requests.post(
        f"{API_URL}/api/v1/bio/qc/parasite-detect",
        headers=headers,
        files=files,
        data=data
    )

result = response.json()
flags = result["flags"]
parasite_score = result["parasite_score"]
print(f"✓ Parasite score: {parasite_score:.1f}%")

# Step 2: Filter flagged cells
print("\nStep 2: Filtering flagged cells...")
adata = sc.read_h5ad("pbmc_raw.h5ad")
# Cast to bool so that ~ is a logical NOT even if flags arrive as 0/1 integers
keep = ~np.asarray(flags, dtype=bool)
adata = adata[keep, :].copy()  # copy() materializes the view before writing
adata.write_h5ad("pbmc_cleaned.h5ad")
print(f"✓ Saved {adata.n_obs} clean cells")

🔧 Troubleshooting Common Errors

Error 401: Authentication Failed

Error: {"detail": "Invalid or missing token"}

Solutions:
  • ✅ Verify token format includes "Bearer " prefix
  • ✅ Check for extra whitespace in token
  • ✅ Confirm token hasn't expired (valid 1 year)

Error 400: File Upload Failed

Error: {"detail": "Expected .h5ad file"}

Solutions:
  • ✅ Verify file has .h5ad extension
  • ✅ Validate AnnData format: sc.read_h5ad("file.h5ad")
  • ✅ Check file size < 500MB
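
Two of these checks can be run locally before uploading. The sketch below is a hypothetical pre-flight helper (check_upload is not part of the API); it assumes "500MB" means 500 x 1024 x 1024 bytes and that the extension check is case-sensitive. It does not validate the AnnData contents, which still requires sc.read_h5ad.

```python
import os

MAX_BYTES = 500 * 1024 * 1024  # assumption: the 500MB limit is counted in MiB


def check_upload(path: str) -> list:
    """Return a list of problems that would likely cause a 400 error; empty if none."""
    problems = []
    if not path.endswith(".h5ad"):
        problems.append("file name does not end with .h5ad")
    if not os.path.isfile(path):
        problems.append("file does not exist")
    elif os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is 500 MB or larger")
    return problems


print(check_upload("pbmc_3k.h5ad"))
```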

Error 429: Rate Limit Exceeded

Error: {"detail": "Rate limit exceeded"}

Solutions:
  • ✅ Wait 1 hour for rate limit reset (1,000 req/hour)
  • ✅ Implement exponential backoff in scripts
  • ✅ Use cluster mode instead of cell mode
  • ✅ Contact support for higher limits if needed
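
The exponential backoff suggested above can be sketched as follows. This is a generic pattern, not code supplied by the API: the helper call_with_backoff and its parameters are hypothetical, and the delays are illustrative. Since the limit resets hourly, short retries only help with brief bursts.

```python
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or the retries run out.

    send must be a zero-argument callable returning an object with a
    status_code attribute, such as a requests.Response.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_retries:
            # Double the wait after each rate-limited attempt: 1s, 2s, 4s, ...
            sleep(base_delay * 2 ** attempt)
    return response  # still rate limited after the final attempt
```

A typical call would be call_with_backoff(lambda: requests.get(f"{API_URL}/health", headers=headers)). For file uploads, open the file inside the callable so that each retry sends it from the beginning.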

Contact Support

When reporting issues, include:

  1. Institution: Texas A&M University (Cai Lab)
  2. API endpoint and method (e.g., POST /vectors/build)
  3. Full error message (JSON response)
  4. Trace ID from response header (X-Trace-ID)
  5. Dataset description (cells, genes, file size)

Email: support@donsystems.com | Response time: < 24 hours