Eldric Data Worker

Unified Data Access for AI - Databases, Scientific APIs, and 62 Research Data Sources

v4.0.0 · Port 8892

Cluster Architecture

Eldric Distributed Cluster with Data Workers
[Diagram. Components: Controller (port 8880), Router (port 8881), AI Workers for Ollama, vLLM, and TGI (port 8890 each), and Data Workers for PostgreSQL and DB2 (port 8892 each). Backing databases: PostgreSQL (analytics DB), MySQL (app DB), IBM DB2 on z/OS (mainframe enterprise data), and SQLite (local cache). Arrows mark query requests.]

Database Connectors

SQLite

Built-in

Lightweight local databases for caching, configuration, and embedded data storage.

  • Zero configuration
  • File-based storage
  • Full SQL support
  • Always available

PostgreSQL

Optional

Enterprise-grade relational database with advanced features and JSON support.

  • Schema support
  • JSON/JSONB columns
  • Full-text search
  • libpq driver

MySQL / MariaDB

Optional

Popular open-source database for web applications and analytics.

  • High performance
  • Replication support
  • Wide compatibility
  • libmysqlclient driver

IBM DB2

Enterprise

Enterprise database for mainframes, z/OS, and big data workloads.

  • ODBC connectivity
  • Native CLI support
  • z/OS DRDA protocol
  • Mainframe integration

Scientific Data Connectors

Access 62 scientific research APIs and databases worldwide. Scientific connectors provide unified REST access to external data sources for AI analysis.

Scientific Data Connector Architecture
[Diagram. An AI Worker sends a science request (provider: "nasa", endpoint: "apod") to the Data Worker on port 8892. The ScientificConnector consists of a response cache (TTL: 5 minutes), a per-provider sliding-window rate limiter, an API key manager (environment / config), and a registry of 62 providers. Providers shown: NASA (APOD, NEO, Mars; api.nasa.gov; 1000 rpm), NCBI (GenBank, PubMed; eutils.ncbi.nlm.nih.gov; 10 rpm without a key), CERN Open Data Portal (opendata.cern.ch; 60 rpm), plus 59 more APIs including USGS, ESA, GWOSC, UniProt, PDB, NOAA, Materials Project, PubChem, and Allen Brain. Legend: data flow; no API key required; API key optional.]

Connector Features

🔄 Intelligent Caching

Response caching with configurable TTL (default 5 minutes) reduces redundant API calls and improves response times. Cache keys are generated from provider, endpoint, and parameters.

  • Per-request cache control
  • Automatic cache invalidation
  • Memory-efficient LRU eviction
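
The caching behaviour described above can be sketched in a few lines of Python. This is an illustration only, not the Data Worker's actual implementation: the names cache_key and TTLCache, the SHA-256 key format, and the 256-entry limit are assumptions; only the 5-minute default TTL and the key inputs (provider, endpoint, parameters) come from the text.

```python
import hashlib
import json
import time
from collections import OrderedDict


def cache_key(provider: str, endpoint: str, params: dict) -> str:
    """Derive a stable key from provider, endpoint, and parameters."""
    # sort_keys makes the key independent of parameter order.
    canonical = json.dumps([provider, endpoint, params], sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class TTLCache:
    """Small LRU cache whose entries expire after ttl_seconds (hypothetical)."""

    def __init__(self, max_entries=256, ttl_seconds=300.0, clock=time.monotonic):
        self._entries = OrderedDict()  # key -> (expires_at, value)
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key: str):
        item = self._entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: invalidate lazily on read
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
```

Injecting the clock keeps the expiry logic easy to exercise without waiting for real time to pass.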

⚡ Per-Provider Rate Limiting

A sliding-window rate limiter enforces each provider's API limits, which prevents rate-limit errors and keeps usage fair across concurrent requests.

  • Automatic request queuing
  • Provider-specific RPM limits
  • Graceful backoff on 429 errors
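
A sliding-window limiter of the kind described above might look like the following sketch. The class name SlidingWindowLimiter and its methods are hypothetical, and the real limiter may queue requests differently; the RPM values are the ones this document lists for NASA, NCBI, and CERN.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `rpm` requests in any rolling window (hypothetical sketch)."""

    def __init__(self, rpm: int, window_seconds: float = 60.0, clock=time.monotonic):
        self._rpm = rpm
        self._window = window_seconds
        self._clock = clock
        self._stamps = deque()  # send times of requests still inside the window

    def _prune(self, now: float) -> None:
        # Drop timestamps that have slid out of the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if the window has room."""
        now = self._clock()
        self._prune(now)
        if len(self._stamps) < self._rpm:
            self._stamps.append(now)
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next request would be admitted (0.0 if now)."""
        now = self._clock()
        self._prune(now)
        if len(self._stamps) < self._rpm:
            return 0.0
        return self._window - (now - self._stamps[0])


# One limiter per provider, using the RPM values listed in this document.
limiters = {
    "nasa": SlidingWindowLimiter(1000),
    "ncbi": SlidingWindowLimiter(10),
    "cern": SlidingWindowLimiter(60),
}
```

A caller that wants queuing rather than rejection can sleep for wait_time() and retry, which is one simple way to approximate the automatic request queuing listed above.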

🔑 API Key Management

Secure API key handling via environment variables or configuration. Keys are never logged or exposed in responses.

  • Environment variable lookup (e.g., NASA_API_KEY)
  • Runtime key updates via API
  • Optional keys for rate limit boost
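
As a rough illustration of the lookup described above, the sketch below resolves a key from an environment variable named after the provider (the NASA_API_KEY pattern comes from the text) and falls back to a configuration mapping. The function names, the fallback order, and the shape of the config dictionary are assumptions, not the Data Worker's documented behaviour.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    provider: str,
    config: Optional[Mapping[str, str]] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the key for a provider, or None if no key is configured."""
    env_name = f"{provider.upper()}_API_KEY"  # e.g. NASA_API_KEY
    key = env.get(env_name)
    if key:
        return key  # assumption: the environment wins over config
    return (config or {}).get(provider)


def describe_key(key: Optional[str]) -> str:
    """Log-safe description that never contains key material."""
    return "<redacted>" if key else "<unset>"
```

Returning None rather than raising reflects that keys are optional for many providers and only raise the rate limit.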

🌐 Unified REST Interface

All 62 APIs accessible through a single endpoint format. Consistent request/response structure regardless of underlying provider.

  • POST /api/v1/science/query
  • Query templates (search, lookup, details)
  • Standard JSON responses
  • Provider-agnostic error handling
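
A client for the unified endpoint could be sketched as follows, using only the Python standard library. The payload fields (provider, endpoint, params) follow the Generic Query Interface example in this document; the helper names, the dataworker hostname, and the 10-second timeout are illustrative assumptions.

```python
import json
import urllib.request

BASE_URL = "http://dataworker:8892"  # hostname as used in this document's examples


def build_science_query(provider: str, endpoint: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /api/v1/science/query."""
    body = json.dumps(
        {"provider": provider, "endpoint": endpoint, "params": params}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/science/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_science_query(provider: str, endpoint: str, params: dict, timeout: float = 10.0):
    """Send the request and decode the JSON response."""
    request = build_science_query(provider, endpoint, params)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling run_science_query("uniprot", "search", {"query": "insulin human"}) would send the same kind of request as the curl-based generic query; building the request separately makes it possible to inspect it without a running Data Worker.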

🚀 Space Agencies

NASA

APOD, NEO, Mars Rover photos, Exoplanet Archive, mission telemetry, Earth observation

ESA

Gaia star catalog, Copernicus Earth data, Rosetta/Herschel mission archives

JAXA

Hayabusa asteroid samples, SLIM lunar data, Earth observation satellites

ISRO

Chandrayaan lunar mission, Mangalyaan Mars orbiter data

🔭 Space Telescopes

JWST (James Webb)

Infrared observations, exoplanet atmospheres, deep field imaging

Hubble (MAST)

30+ years of optical/UV observations via MAST archive

Chandra X-ray

X-ray astronomy: black holes, neutron stars, supernova remnants

Spitzer (Archive)

Infrared legacy data, galaxy surveys, star formation regions

⚛️ Particle Physics

CERN Open Data

LHC collision data, Higgs boson events, CMS/ATLAS/LHCb experiments

Fermilab

Neutrino experiments, Tevatron legacy data, dark matter searches

DESY

PETRA III synchrotron, European XFEL experiments

HEPData / PDG

Particle properties, decay modes, physical constants database

🌊 Gravitational Waves

LIGO

Gravitational wave detections, black hole/neutron star mergers, strain data

GWOSC

GW Open Science Center: event catalog, parameters, sky maps

Virgo / KAGRA

European and Japanese GW detector data, joint observation runs

🌍 Earth Sciences

USGS

Real-time earthquake data, historical seismicity, geological surveys

NOAA

Weather forecasts, climate data, ocean observations, solar activity

IRIS Seismology

Global seismological network, waveform data, station metadata

GBIF / OBIS

Global biodiversity data, species occurrences, marine observations

🧬 Genomics & Life Sciences

NCBI

GenBank sequences, RefSeq, PubMed literature, protein databases

UniProt

Protein sequences, functional annotations, proteomics data

Ensembl

Genome browser, variant annotations, comparative genomics

PDB

Protein Data Bank: 3D structures, ligand binding, structural biology

🧠 Neuroscience

Allen Brain Atlas

Gene expression maps, neural connectivity, cell type databases

OpenNeuro

fMRI, EEG, MEG neuroimaging datasets, BIDS format

NeuroMorpho

Neuron morphology database, 3D reconstructions

🏥 Medical & Clinical

ClinicalTrials.gov

Clinical trial registry, protocols, recruitment status, results

PubMed

35M+ biomedical citations, abstracts, full-text links

OpenFDA

Drug labels, adverse events, recalls, device data

GWAS Catalog

SNP-trait associations, disease genetics, risk variants

🔬 Materials & Chemistry

Materials Project

Crystal structures, band gaps, formation energies, stability

PubChem / ChEMBL

Chemical compounds, bioassays, bioactivity data

COD

Crystallography Open Database: crystal structures, diffraction

NIST Chemistry

Thermochemical properties, spectra, reference standards

☢️ Nuclear & Fusion

IAEA Nuclear Data

Nuclear reactions, isotope properties, cross-sections, decay data

ITER

Fusion project data, plasma physics, tokamak engineering

🌾 Agriculture & Paleontology

FAOStat

Global crop production, land use, food trade statistics

USDA Plants

Plant species database, conservation status, distribution

Paleobiology DB

Fossil occurrences, taxonomy, extinction events, paleogeography

Open Context

Archaeological excavation data, artifacts, site records

Usage Examples

Query any scientific API through the unified REST interface:

NASA Astronomy Picture of the Day
# Get today's APOD
curl -X GET "http://dataworker:8892/api/v1/science/nasa/apod"

# Get APOD for specific date
curl -X GET "http://dataworker:8892/api/v1/science/nasa/apod?date=2024-01-15"

USGS Earthquake Data
# Get recent earthquakes (magnitude 4.5+)
curl -X GET "http://dataworker:8892/api/v1/science/usgs/earthquakes?\
starttime=2024-01-01&endtime=2024-01-31&minmagnitude=4.5"

NCBI Gene Search
# Search for BRCA1 gene information
curl -X POST "http://dataworker:8892/api/v1/science/ncbi/search" \
  -H "Content-Type: application/json" \
  -d '{"db": "gene", "term": "BRCA1 human", "max_results": 5}'

Generic Query Interface
# Query any provider via unified endpoint
curl -X POST "http://dataworker:8892/api/v1/science/query" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "uniprot",
    "endpoint": "search",
    "params": {
      "query": "insulin human",
      "format": "json",
      "size": "10"
    }
  }'

List Available Providers
# Get all configured scientific data connectors
curl -X GET "http://dataworker:8892/api/v1/science/connectors"

# Response includes provider info, rate limits, and API key status
{
  "providers": [
    {"id": "nasa", "name": "NASA", "category": "Space", "rate_limit_rpm": 1000, "available": true},
    {"id": "ncbi", "name": "NCBI", "category": "Genomics", "rate_limit_rpm": 10, "available": true},
    {"id": "cern", "name": "CERN Open Data", "category": "Physics", "rate_limit_rpm": 60, "available": true},
    ...
  ]
}

Connection Pooling

Connection Pool Architecture
[Diagram. AI Workers 1, 2, and 3 send query requests to the Data Worker (port 8892), which handles the request queue, license validation, and pool management. The connection pool holds four connections (Conn 1 and Conn 2 active, Conn 3 and Conn 4 idle) to a PostgreSQL database server on port 5432. A health monitor runs every 30 seconds and reconnects automatically on failure.]

Pool Setting               Default    Description
min_connections            2          Minimum connections to keep warm
max_connections            10         Maximum concurrent connections
connection_timeout_ms      5000       Timeout waiting for available connection
idle_timeout_ms            300000     Close idle connections after 5 minutes
max_lifetime_ms            3600000    Maximum connection lifetime (1 hour)
health_check_interval_ms   30000      Health check every 30 seconds
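
To make the interplay of these settings concrete, here is a small sketch that models the defaults as a dataclass and decides when a connection should be closed. The names PoolSettings and should_close are hypothetical, and the rule that idle connections are kept while the pool is at min_connections is an assumed reading of "keep warm", not confirmed behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSettings:
    """Pool defaults copied from the settings table."""

    min_connections: int = 2
    max_connections: int = 10
    connection_timeout_ms: int = 5000
    idle_timeout_ms: int = 300000
    max_lifetime_ms: int = 3600000
    health_check_interval_ms: int = 30000

    def __post_init__(self):
        if not 0 <= self.min_connections <= self.max_connections:
            raise ValueError("min_connections must be between 0 and max_connections")


def should_close(settings: PoolSettings, pool_size: int, age_ms: int, idle_ms: int) -> bool:
    """Decide whether one connection should be retired."""
    if age_ms >= settings.max_lifetime_ms:
        return True  # too old: recycle regardless of pool size
    # Assumption: idle connections are only reaped above the warm minimum.
    return idle_ms >= settings.idle_timeout_ms and pool_size > settings.min_connections
```

With the defaults, a connection idle for 5 minutes is closed when the pool holds four connections but kept when it holds only two.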

REST API

Method   Endpoint                           Description
GET      /health                            Health check status
GET      /info                              Daemon info & available connectors
GET      /metrics                           Performance metrics
GET      /api/v1/data/sources               List all data sources
POST     /api/v1/data/sources               Add a new data source
POST     /api/v1/data/query                 Execute SELECT query
POST     /api/v1/data/execute               Execute INSERT/UPDATE/DELETE
GET      /api/v1/data/sources/:id/schema    Get database schema
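
Because reads and writes go to different endpoints, a client has to pick one per statement. The sketch below shows one simple way to do that by looking at the first keyword. The helper names are hypothetical, the body fields (source_id, sql, timeout_ms) are taken from the curl examples in this document, and it is an assumption that /api/v1/data/execute accepts the same body as /api/v1/data/query.

```python
import json
from typing import Optional

READ_ENDPOINT = "/api/v1/data/query"
WRITE_ENDPOINT = "/api/v1/data/execute"


def endpoint_for(sql: str) -> str:
    """Pick the endpoint from the statement's first keyword."""
    words = sql.split(None, 1)
    keyword = words[0].upper() if words else ""
    if keyword == "SELECT":
        return READ_ENDPOINT
    if keyword in {"INSERT", "UPDATE", "DELETE"}:
        return WRITE_ENDPOINT
    # Anything else (DDL, CTEs starting with WITH, empty input) is rejected here.
    raise ValueError(f"unsupported statement type: {keyword or '<empty>'}")


def build_data_request_body(source_id: str, sql: str, timeout_ms: Optional[int] = None) -> bytes:
    """Serialize the JSON body for a data query or execute call."""
    body = {"source_id": source_id, "sql": sql}
    if timeout_ms is not None:
        body["timeout_ms"] = timeout_ms
    return json.dumps(body).encode("utf-8")
```

A keyword check like this is deliberately conservative; it is a routing aid, not a substitute for database-level permissions.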

Scientific Data Endpoints

Method   Endpoint                             Description
GET      /api/v1/science/connectors           List available scientific data connectors
GET      /api/v1/science/nasa/apod            NASA Astronomy Picture of the Day
GET      /api/v1/science/nasa/neo             Near Earth Objects data
GET      /api/v1/science/usgs/earthquakes     Recent earthquake data
GET      /api/v1/science/ncbi/search          Search NCBI databases (GenBank, etc.)
GET      /api/v1/science/ncbi/fetch/:id       Fetch sequence by accession
GET      /api/v1/science/uniprot/:id          Get UniProt protein entry
GET      /api/v1/science/pdb/:id              Get PDB structure
GET      /api/v1/science/pubmed/search        Search PubMed literature
GET      /api/v1/science/pubchem/:compound    Get compound information
GET      /api/v1/science/materials/:id        Materials Project data
GET      /api/v1/science/cern/datasets        CERN Open Data catalog
GET      /api/v1/science/gwosc/events         Gravitational wave events
GET      /api/v1/science/noaa/weather         Weather and climate data

Use Cases

1. AI-Powered Business Analytics

Allow AI assistants to query business databases and generate insights.

  1. User asks question
  2. AI generates SQL
  3. Data Worker executes
  4. AI analyzes results

-- User: "What were our top 5 products last month?"
-- AI generates and executes:
SELECT product_name, SUM(quantity) AS total
FROM orders
WHERE order_date >= '2024-12-01'
GROUP BY product_name
ORDER BY total DESC
LIMIT 5;

2. Mainframe Data Integration

Connect AI workers to IBM z/OS mainframe databases for enterprise data access.

{
  "id": "mainframe-db2",
  "type": "db2_cli",
  "host": "mainframe.corp.com",
  "port": 446,
  "database": "PRODDB",
  "location": "DSN1LOC",
  "is_zos": true,
  "use_ssl": true
}

3. Multi-Database RAG Pipeline

Combine data from multiple databases for comprehensive AI context.

PostgreSQL (customers) + MySQL (orders) + DB2 (inventory) → AI Analysis
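
One way to assemble context from several sources is sketched below. Everything here is illustrative: build_context is a hypothetical helper, the source IDs in the usage note are invented, and the assumption that each query yields a list of row dictionaries is not taken from the Data Worker's documented response format. The query function is injected so the same code works with any client.

```python
import json
from typing import Callable, Iterable, List, Tuple

# (source_id, sql) -> list of row dictionaries; supplied by the caller.
QueryFn = Callable[[str, str], List[dict]]


def build_context(run_query: QueryFn, requests: Iterable[Tuple[str, str, str]]) -> str:
    """Run one query per source and join the rows into a labelled text block."""
    sections = []
    for label, source_id, sql in requests:
        rows = run_query(source_id, sql)
        lines = [json.dumps(row, sort_keys=True) for row in rows]
        sections.append(f"[{label} from {source_id}]\n" + "\n".join(lines))
    return "\n\n".join(sections)
```

A caller might pass three requests such as ("customers", "crm-pg", "SELECT ..."), ("orders", "shop-mysql", "SELECT ..."), and ("inventory", "stock-db2", "SELECT ..."), then hand the returned text to the model as retrieval context.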

4. Real-time Data Monitoring

AI monitors database metrics and alerts on anomalies.

# Query executed every 5 minutes by AI Worker
curl -X POST http://dataworker:8892/api/v1/data/query \
  -H 'Content-Type: application/json' \
  -d '{
    "source_id": "monitoring-db",
    "sql": "SELECT COUNT(*) as error_count FROM logs WHERE level = '\''ERROR'\'' AND timestamp > NOW() - INTERVAL '\''5 minutes'\''",
    "timeout_ms": 5000
  }'
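
The alerting side of this use case is not specified here, so the following is only one possible sketch: compare each new error_count against the mean of recent samples. The class name ErrorRateMonitor and the window, factor, and floor parameters are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import mean


class ErrorRateMonitor:
    """Flag an error count that is far above the recent average (hypothetical)."""

    def __init__(self, window: int = 12, factor: float = 3.0, floor: int = 10):
        self._history = deque(maxlen=window)  # 12 samples = 1 hour at 5-minute polls
        self._factor = factor
        self._floor = floor

    def observe(self, error_count: int) -> bool:
        """Record a sample and return True if it looks anomalous."""
        baseline = mean(self._history) if self._history else None
        alert = (
            baseline is not None
            and error_count >= self._floor  # ignore noise at very low volumes
            and error_count > self._factor * baseline
        )
        self._history.append(error_count)
        return alert
```

Each polling cycle would pass the error_count value from the query result to observe() and raise an alert when it returns True.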

Quick Start

1. Install the Data Worker

# Download from core.at/eldric/downloads

# RHEL/Fedora:
sudo dnf install eldric-datad-4.0.0-1.el9.x86_64.rpm

# Ubuntu/Debian:
sudo dpkg -i eldric-datad_4.0.0-1_amd64.deb

2. Start Standalone

# List available connectors
./eldric-datad --list-connectors

# Start with a SQLite database
./eldric-datad --add-source '{"id":"local","type":"sqlite","path":"/data/app.db"}'

3. Register with Controller

# Start data worker and register with cluster
./eldric-datad -c http://controller:8880 -l database -l production \
  --add-source '{"id":"warehouse","type":"postgresql","host":"db.local","database":"analytics"}'

4. Test the API

# Health check
curl http://localhost:8892/health

# List data sources
curl http://localhost:8892/api/v1/data/sources

# Execute a query
curl -X POST http://localhost:8892/api/v1/data/query \
  -H 'Content-Type: application/json' \
  -d '{"source_id":"warehouse","sql":"SELECT * FROM customers LIMIT 10"}'

Port Reference

Component      Port   Protocol    Description
Controller     8880   HTTP/REST   Cluster management & API
Router         8881   HTTP/REST   Request routing & load balancing
AI Worker      8890   HTTP/REST   LLM inference (Ollama/vLLM/TGI)
Data Worker    8892   HTTP/REST   Database connectivity service
Agent Worker   8893   HTTP/REST   Agentic RAG orchestration
Media Worker   8894   HTTP/REST   STT/TTS, Video processing
Comm Worker    8895   HTTP/REST   Messaging (Email, SMS, etc.)