Unified Data Access for AI - Databases, Scientific APIs, and 62 Research Data Sources
v4.0.0 · Port 8892
Lightweight local databases for caching, configuration, and embedded data storage.
Enterprise-grade relational database with advanced features and JSON support.
Popular open-source database for web applications and analytics.
Enterprise database for mainframes, z/OS, and big data workloads.
Access 62 scientific research APIs and databases worldwide. Scientific connectors provide unified REST access to external data sources for AI analysis.
Response caching with configurable TTL (default 5 minutes) reduces redundant API calls and improves response times. Cache keys are generated from provider, endpoint, and parameters.
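As a rough illustration of the caching behaviour described above, the sketch below derives a cache key from provider, endpoint, and parameters and expires entries after a TTL. The names `cache_key` and `TTLCache`, the SHA-256 hashing, and the injectable clock are assumptions made for this example, not the Data Worker's actual implementation.

```python
import hashlib
import json
import time


def cache_key(provider: str, endpoint: str, params: dict) -> str:
    """Build a deterministic key from provider, endpoint, and parameters."""
    # sort_keys makes the key independent of the order parameters were given in
    canonical = json.dumps([provider, endpoint, params],
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds      # 300 s matches the documented 5-minute default
        self.clock = clock          # injectable so expiry can be tested without sleeping
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]    # expired: drop the entry and report a miss
            return None
        return value

    def put(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)
```

With this scheme, two requests that differ only in parameter order share one cache entry, so a repeated APOD lookup within five minutes would not reach the upstream API.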
Sliding window rate limiter respects each provider's API limits. Prevents rate limit errors and ensures fair usage across concurrent requests.
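A sliding-window limiter of the kind described above can be sketched as follows. This is a minimal sliding-window-log variant written for illustration; the class name `SlidingWindowLimiter`, the 60-second window, and the lock-based thread safety are assumptions, and the service's real limiter may work differently.

```python
import threading
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds interval."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock              # injectable for deterministic tests
        self._stamps = deque()          # timestamps of accepted requests, oldest first
        self._lock = threading.Lock()   # guards the deque across concurrent callers

    def try_acquire(self) -> bool:
        with self._lock:
            now = self.clock()
            # Discard timestamps that have slid out of the window
            while self._stamps and now - self._stamps[0] >= self.window:
                self._stamps.popleft()
            if len(self._stamps) < self.max_requests:
                self._stamps.append(now)
                return True
            return False


# One limiter per provider, keyed by provider id (limits here are illustrative)
limiters = {"ncbi": SlidingWindowLimiter(10), "cern": SlidingWindowLimiter(60)}
```

A caller would check `limiters[provider].try_acquire()` before forwarding a request and wait or reject when it returns `False`.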
Secure API key handling via environment variables or configuration. Keys are never logged or exposed in responses.
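The key-handling idea above might look roughly like this. The `<PROVIDER>_API_KEY` naming scheme and the helper names `load_api_key` and `redact` are hypothetical, chosen only to illustrate reading keys from the environment and scrubbing them before anything is logged.

```python
import os


def load_api_key(provider: str, env=None):
    """Return the provider's API key from the environment, or None if unset."""
    env = os.environ if env is None else env
    # Hypothetical convention: NASA -> NASA_API_KEY, NCBI -> NCBI_API_KEY, ...
    return env.get(f"{provider.upper()}_API_KEY")


def redact(text: str, secrets) -> str:
    """Replace every known secret in text so it never reaches logs or responses."""
    for secret in secrets:
        if secret:                      # skip None and empty strings
            text = text.replace(secret, "[REDACTED]")
    return text
```

For instance, passing an outgoing URL through `redact` before logging keeps the key out of log files even when the upstream API expects it as a query parameter.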
All 62 APIs accessible through a single endpoint format. Consistent request/response structure regardless of underlying provider.
APOD, NEO, Mars Rover photos, Exoplanet Archive, mission telemetry, Earth observation
Gaia star catalog, Copernicus Earth data, Rosetta/Herschel mission archives
Hayabusa asteroid samples, SLIM lunar data, Earth observation satellites
Chandrayaan lunar mission, Mangalyaan Mars orbiter data
Infrared observations, exoplanet atmospheres, deep field imaging
30+ years of optical/UV observations via MAST archive
X-ray astronomy: black holes, neutron stars, supernova remnants
Infrared legacy data, galaxy surveys, star formation regions
LHC collision data, Higgs boson events, CMS/ATLAS/LHCb experiments
Neutrino experiments, Tevatron legacy data, dark matter searches
PETRA III synchrotron, European XFEL experiments
Particle properties, decay modes, physical constants database
Gravitational wave detections, black hole/neutron star mergers, strain data
GW Open Science Center: event catalog, parameters, sky maps
European and Japanese GW detector data, joint observation runs
Real-time earthquake data, historical seismicity, geological surveys
Weather forecasts, climate data, ocean observations, solar activity
Global seismological network, waveform data, station metadata
Global biodiversity data, species occurrences, marine observations
GenBank sequences, RefSeq, PubMed literature, protein databases
Protein sequences, functional annotations, proteomics data
Genome browser, variant annotations, comparative genomics
Protein Data Bank: 3D structures, ligand binding, structural biology
Gene expression maps, neural connectivity, cell type databases
fMRI, EEG, MEG neuroimaging datasets, BIDS format
Neuron morphology database, 3D reconstructions
Clinical trial registry, protocols, recruitment status, results
35M+ biomedical citations, abstracts, full-text links
Drug labels, adverse events, recalls, device data
SNP-trait associations, disease genetics, risk variants
Crystal structures, band gaps, formation energies, stability
Chemical compounds, bioassays, bioactivity data
Crystallography Open Database: crystal structures, diffraction
Thermochemical properties, spectra, reference standards
Nuclear reactions, isotope properties, cross-sections, decay data
Fusion project data, plasma physics, tokamak engineering
Global crop production, land use, food trade statistics
Plant species database, conservation status, distribution
Fossil occurrences, taxonomy, extinction events, paleogeography
Archaeological excavation data, artifacts, site records
Query any scientific API through the unified REST interface:
```bash
# Get today's APOD
curl -X GET "http://dataworker:8892/api/v1/science/nasa/apod"

# Get APOD for specific date
curl -X GET "http://dataworker:8892/api/v1/science/nasa/apod?date=2024-01-15"

# Get recent earthquakes (magnitude 4.5+)
curl -X GET "http://dataworker:8892/api/v1/science/usgs/earthquakes?\
starttime=2024-01-01&endtime=2024-01-31&minmagnitude=4.5"

# Search for BRCA1 gene information
curl -X POST "http://dataworker:8892/api/v1/science/ncbi/search" \
  -H "Content-Type: application/json" \
  -d '{"db": "gene", "term": "BRCA1 human", "max_results": 5}'
```
```bash
# Query any provider via unified endpoint
curl -X POST "http://dataworker:8892/api/v1/science/query" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "uniprot",
    "endpoint": "search",
    "params": {
      "query": "insulin human",
      "format": "json",
      "size": "10"
    }
  }'
```
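The same unified query can be issued from a script. The sketch below only builds the request with Python's standard library; the helper name `build_query_request` is an assumption for this example, while the URL and body fields mirror the curl call above.

```python
import json
import urllib.request

BASE_URL = "http://dataworker:8892"  # host and port as used in the curl examples


def build_query_request(provider: str, endpoint: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST to the unified science query endpoint."""
    body = json.dumps(
        {"provider": provider, "endpoint": endpoint, "params": params}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/science/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "uniprot", "search",
    {"query": "insulin human", "format": "json", "size": "10"},
)
```

Passing `request` to `urllib.request.urlopen` would send it. Only the `provider`, `endpoint`, and `params` values change from one data source to another.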
```bash
# Get all configured scientific data connectors
curl -X GET "http://dataworker:8892/api/v1/science/connectors"
```

The response includes provider info, rate limits, and API key status:

```json
{
  "providers": [
    {"id": "nasa", "name": "NASA", "category": "Space", "rate_limit_rpm": 1000, "available": true},
    {"id": "ncbi", "name": "NCBI", "category": "Genomics", "rate_limit_rpm": 10, "available": true},
    {"id": "cern", "name": "CERN Open Data", "category": "Physics", "rate_limit_rpm": 60, "available": true},
    ...
  ]
}
```
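A client might use the connectors listing to decide which providers to call and how fast. The sketch below assumes the response shape shown above (a `providers` array with `id`, `category`, `rate_limit_rpm`, and `available` fields); the helper names are hypothetical.

```python
import json

# Sample payload limited to the three providers shown in the response above
SAMPLE = """
{"providers": [
  {"id": "nasa", "name": "NASA", "category": "Space", "rate_limit_rpm": 1000, "available": true},
  {"id": "ncbi", "name": "NCBI", "category": "Genomics", "rate_limit_rpm": 10, "available": true},
  {"id": "cern", "name": "CERN Open Data", "category": "Physics", "rate_limit_rpm": 60, "available": true}
]}
"""


def available_providers(payload: str, category=None) -> list:
    """Return ids of available providers, optionally filtered by category."""
    data = json.loads(payload)
    return [
        p["id"] for p in data["providers"]
        if p.get("available") and (category is None or p.get("category") == category)
    ]


def seconds_between_requests(payload: str, provider_id: str) -> float:
    """Average spacing that keeps a client under the provider's per-minute limit."""
    for p in json.loads(payload)["providers"]:
        if p["id"] == provider_id:
            return 60.0 / p["rate_limit_rpm"]
    raise KeyError(provider_id)
```

With the sample data, NCBI's limit of 10 requests per minute works out to one request every 6 seconds on average.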
| Pool Setting | Default | Description |
|---|---|---|
| `min_connections` | 2 | Minimum connections to keep warm |
| `max_connections` | 10 | Maximum concurrent connections |
| `connection_timeout_ms` | 5000 | Timeout waiting for available connection |
| `idle_timeout_ms` | 300000 | Close idle connections after 5 minutes |
| `max_lifetime_ms` | 3600000 | Maximum connection lifetime (1 hour) |
| `health_check_interval_ms` | 30000 | Health check every 30 seconds |
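For reference, the pool defaults above can be captured in a small settings object. This is an illustrative sketch only: the `PoolSettings` class and its sanity checks are assumptions, and how the Data Worker actually loads or validates these settings is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSettings:
    """Connection pool settings with the documented defaults."""
    min_connections: int = 2
    max_connections: int = 10
    connection_timeout_ms: int = 5_000
    idle_timeout_ms: int = 300_000          # 5 minutes
    max_lifetime_ms: int = 3_600_000        # 1 hour
    health_check_interval_ms: int = 30_000  # 30 seconds

    def __post_init__(self):
        # Assumed sanity checks, not rules stated by the documentation
        if not 0 <= self.min_connections <= self.max_connections:
            raise ValueError("min_connections must be between 0 and max_connections")
        if self.idle_timeout_ms > self.max_lifetime_ms:
            raise ValueError("idle_timeout_ms should not exceed max_lifetime_ms")


defaults = PoolSettings()
busy_pool = PoolSettings(min_connections=5, max_connections=50)
```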
| Method | Endpoint | Description |
|---|---|---|
| GET | `/health` | Health check status |
| GET | `/info` | Daemon info & available connectors |
| GET | `/metrics` | Performance metrics |
| GET | `/api/v1/data/sources` | List all data sources |
| POST | `/api/v1/data/sources` | Add a new data source |
| POST | `/api/v1/data/query` | Execute SELECT query |
| POST | `/api/v1/data/execute` | Execute INSERT/UPDATE/DELETE |
| GET | `/api/v1/data/sources/:id/schema` | Get database schema |
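Because reads and writes go to different endpoints, a client needs to pick the path from the statement type. The sketch below is one simple way to do that; the function name `endpoint_for` and the first-keyword heuristic are assumptions, and statements such as `WITH ... SELECT` are deliberately rejected rather than guessed at.

```python
import re

READ_PATH = "/api/v1/data/query"      # SELECT statements
WRITE_PATH = "/api/v1/data/execute"   # INSERT / UPDATE / DELETE statements


def endpoint_for(sql: str) -> str:
    """Choose the endpoint path from the first keyword of a SQL statement."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    keyword = match.group(1).upper() if match else ""
    if keyword == "SELECT":
        return READ_PATH
    if keyword in {"INSERT", "UPDATE", "DELETE"}:
        return WRITE_PATH
    raise ValueError(f"unsupported statement type: {keyword or '<empty>'}")
```

For example, `endpoint_for("select * from orders")` selects the query endpoint, while an `UPDATE` statement is routed to the execute endpoint.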
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/science/connectors` | List available scientific data connectors |
| GET | `/api/v1/science/nasa/apod` | NASA Astronomy Picture of the Day |
| GET | `/api/v1/science/nasa/neo` | Near Earth Objects data |
| GET | `/api/v1/science/usgs/earthquakes` | Recent earthquake data |
| GET | `/api/v1/science/ncbi/search` | Search NCBI databases (GenBank, etc.) |
| GET | `/api/v1/science/ncbi/fetch/:id` | Fetch sequence by accession |
| GET | `/api/v1/science/uniprot/:id` | Get UniProt protein entry |
| GET | `/api/v1/science/pdb/:id` | Get PDB structure |
| GET | `/api/v1/science/pubmed/search` | Search PubMed literature |
| GET | `/api/v1/science/pubchem/:compound` | Get compound information |
| GET | `/api/v1/science/materials/:id` | Materials Project data |
| GET | `/api/v1/science/cern/datasets` | CERN Open Data catalog |
| GET | `/api/v1/science/gwosc/events` | Gravitational wave events |
| GET | `/api/v1/science/noaa/weather` | Weather and climate data |
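Several of the routes above contain `:id`-style placeholders. A small helper like the following could fill them in and append query parameters; `science_url` is a hypothetical name, and the base URL is taken from the curl examples elsewhere in this document.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://dataworker:8892"  # host and port as used in the curl examples


def science_url(template: str, path_params=None, query=None) -> str:
    """Fill :name placeholders in a route template and append a query string."""
    path = template
    for name, value in (path_params or {}).items():
        # Percent-encode the value so it cannot introduce extra path segments
        path = path.replace(f":{name}", quote(str(value), safe=""))
    if ":" in path:
        raise ValueError(f"unfilled placeholder in {path!r}")
    url = BASE_URL + path
    if query:
        url += "?" + urlencode(query)
    return url


insulin_url = science_url("/api/v1/science/uniprot/:id", {"id": "P01308"})
apod_url = science_url("/api/v1/science/nasa/apod", query={"date": "2024-01-15"})
```

Here `P01308` is the UniProt accession for human insulin, used purely as sample input.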
Allow AI assistants to query business databases and generate insights.
Connect AI workers to IBM z/OS mainframe databases for enterprise data access.
Combine data from multiple databases for comprehensive AI context.
AI monitors database metrics and alerts on anomalies.
| Component | Port | Protocol | Description |
|---|---|---|---|
| Controller | 8880 | HTTP/REST | Cluster management & API |
| Router | 8881 | HTTP/REST | Request routing & load balancing |
| AI Worker | 8890 | HTTP/REST | LLM inference (Ollama/vLLM/TGI) |
| Data Worker | 8892 | HTTP/REST | Database connectivity service |
| Agent Worker | 8893 | HTTP/REST | Agentic RAG orchestration |
| Media Worker | 8894 | HTTP/REST | STT/TTS, Video processing |
| Comm Worker | 8895 | HTTP/REST | Messaging (Email, SMS, etc.) |