## What Got Done
### 1. Project Scaffold
Set up the Rust project with Cargo.

Read: tonic-helloworld-tutorial
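For reference, a minimal dependency sketch for a tonic project like this one (crate versions and feature flags here are illustrative assumptions, not copied from the repo):

```toml
[dependencies]
tonic = "0.12"
prost = "0.13"
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }
tokio-stream = "0.1"

[build-dependencies]
tonic-build = "0.12"
```

`tonic-build` goes in `[build-dependencies]` because the `.proto` file is compiled to Rust in `build.rs` at build time.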
### 2. Proto Contract — inference.proto

Defined the core service contract under the `nanoinfer` package:
- `Health` RPC — Simple health check returning a status string.
- `Infer` RPC — Server-streaming RPC. The client sends an `InferRequest` (prompt, max_tokens, temperature, top_p) and receives a stream of `InferResponse` tokens.
- `FinishReason` enum — Captures why generation stopped.
- `UsageMetrics` — Only sent with the final token to avoid per-token overhead. Reports prompt tokens, generated tokens, and total.
- `request_id` — Client-provided correlation ID for tracing requests.
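A sketch of what that contract could look like in proto3. The service and package names come from this log; the exact message names for the health RPC, the enum variants beyond `STOP_SEQUENCE`, and all field numbers are assumptions:

```protobuf
syntax = "proto3";

package nanoinfer;

service InferService {
  rpc Health(HealthRequest) returns (HealthResponse);
  // Server-streaming: one request in, a stream of tokens out.
  rpc Infer(InferRequest) returns (stream InferResponse);
}

message HealthRequest {}

message HealthResponse {
  string status = 1;
}

message InferRequest {
  string prompt = 1;
  uint32 max_tokens = 2;
  float temperature = 3;
  float top_p = 4;
  string request_id = 5;  // client-provided correlation ID
}

enum FinishReason {
  FINISH_REASON_UNSPECIFIED = 0;
  STOP_SEQUENCE = 1;
  MAX_TOKENS = 2;
}

message UsageMetrics {
  uint32 prompt_tokens = 1;
  uint32 generated_tokens = 2;
  uint32 total_tokens = 3;
}

message InferResponse {
  string token = 1;
  FinishReason finish_reason = 2;
  UsageMetrics usage = 3;  // only populated on the final token
}
```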
### 3. gRPC Service Implementation (grpc.rs)

Implemented `InferServiceImpl` with a simulated inference loop:
- Spawns a background `tokio` task that sends 5 tokens through a bounded `mpsc` channel (capacity 64).
- Each token is sent with a 100ms delay to simulate a ggml forward pass.
- The final token includes `FinishReason::StopSequence` and aggregated `UsageMetrics`.
- Gracefully handles client disconnects — if `tx.send()` fails, the worker logs and exits.
The server binds to `0.0.0.0:50052` and is ready to accept connections.
### 4. Test Script (test.sh)

Wrote a zsh test script using `grpcurl` for manual smoke testing:

```shell
./test.sh health                    # → {"status": "OK"}
./test.sh infer                     # → streams 5 tokens with usage stats
./test.sh infer "Custom prompt" 20
```

## Challenges
- Streaming response design — Chose `mpsc::channel` over `tokio_stream::iter` because it naturally supports backpressure and allows the inference worker to run independently of the gRPC send loop.
## Architecture So Far

```
┌─────────────┐      gRPC/HTTP2         ┌──────────────────┐
│   Client    │ ──────────────────────► │    nanoinfer     │
│  (grpcurl)  │ ◄── stream of tokens ── │     :50052       │
└─────────────┘                         │                  │
                                        │  InferService    │
                                        │  ├─ Health()     │
                                        │  └─ Infer()      │
                                        │     └─ tokio::   │
                                        │        spawn()   │
                                        └──────────────────┘
```