001

What Got Done

1. Project Scaffold

Set up the Rust project with Cargo.

Reference: tonic-helloworld-tutorial

2. Proto Contract: inference.proto

Defined the core service contract under the nanoinfer package:

  • Health RPC — Simple health check returning a status string.

  • Infer RPC — Server-streaming RPC. The client sends an InferRequest (prompt, max_tokens, temperature, top_p) and receives a stream of InferResponse tokens.

  • FinishReason enum — Captures why generation stopped.

  • UsageMetrics — Only sent with the final token to avoid per-token overhead. Reports prompt tokens, generated tokens, and total.

  • request_id — Client-provided correlation ID for tracing requests.
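To make the shape of the contract concrete, here is a minimal sketch of the messages as plain Rust types. It is a hand-written mirror, not the code generated from inference.proto. The field names prompt_tokens, generated_tokens, total_tokens, token, finish_reason and usage, the MaxTokens variant, and the helper methods are assumptions for illustration. Only prompt, max_tokens, temperature, top_p, request_id and FinishReason::StopSequence come from this log.

```rust
// Hand-written mirror of the messages described above. Real code would use
// the types generated from inference.proto; names not listed in the log are
// assumptions.

#[derive(Debug, Clone, PartialEq)]
pub struct InferRequest {
    pub request_id: String, // client-provided correlation ID
    pub prompt: String,
    pub max_tokens: u32,
    pub temperature: f32,
    pub top_p: f32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FinishReason {
    StopSequence,
    MaxTokens, // assumed variant, not named in the log
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UsageMetrics {
    pub prompt_tokens: u32,
    pub generated_tokens: u32,
    pub total_tokens: u32,
}

impl UsageMetrics {
    /// Builds the metrics so that the total is always the sum of the parts.
    pub fn new(prompt_tokens: u32, generated_tokens: u32) -> Self {
        Self {
            prompt_tokens,
            generated_tokens,
            total_tokens: prompt_tokens + generated_tokens,
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub struct InferResponse {
    pub token: String,
    // Both stay None on every message except the last one in the stream.
    pub finish_reason: Option<FinishReason>,
    pub usage: Option<UsageMetrics>,
}

impl InferResponse {
    /// A response is final exactly when it carries a finish reason.
    pub fn is_final(&self) -> bool {
        self.finish_reason.is_some()
    }
}

fn main() {
    let req = InferRequest {
        request_id: "req-1".to_string(),
        prompt: "Hello".to_string(),
        max_tokens: 5,
        temperature: 0.7,
        top_p: 0.9,
    };
    let last = InferResponse {
        token: "!".to_string(),
        finish_reason: Some(FinishReason::StopSequence),
        usage: Some(UsageMetrics::new(1, req.max_tokens)),
    };
    println!("{req:?}\n{last:?} final={}", last.is_final());
}
```

Modelling finish_reason and usage as Option keeps intermediate tokens small, which matches the intent of sending UsageMetrics only once.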

3. gRPC Service Implementation (grpc.rs)

Implemented InferServiceImpl with a simulated inference loop:

  • Spawns a background tokio task that sends 5 tokens through a bounded mpsc channel (capacity 64).
  • Each token is sent with a 100ms delay to simulate a ggml forward pass.
  • The final token includes FinishReason::StopSequence and aggregated UsageMetrics.
  • Gracefully handles client disconnects — if tx.send() fails, the worker logs and exits.

The server binds to 0.0.0.0:50052 and is ready to accept connections.
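The worker loop described in the list above can be sketched without any dependencies. To stay self-contained, this version swaps tokio::spawn and tokio's mpsc for std::thread and std::sync::mpsc::sync_channel, so it is not the code in grpc.rs. The function name spawn_worker, the token text, and the struct layouts are assumptions for illustration.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FinishReason {
    StopSequence,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UsageMetrics {
    prompt_tokens: u32,
    generated_tokens: u32,
    total_tokens: u32,
}

#[derive(Debug, Clone, PartialEq)]
struct InferResponse {
    token: String,
    finish_reason: Option<FinishReason>,
    usage: Option<UsageMetrics>,
}

/// Spawns a worker that emits `num_tokens` fake tokens, pausing `delay`
/// before each one to stand in for a forward pass.
fn spawn_worker(prompt_tokens: u32, num_tokens: u32, delay: Duration) -> Receiver<InferResponse> {
    // Bounded channel with the same capacity as in the log (64).
    let (tx, rx) = sync_channel(64);
    thread::spawn(move || {
        for i in 0..num_tokens {
            thread::sleep(delay);
            let is_last = i + 1 == num_tokens;
            let msg = InferResponse {
                token: format!("token_{i}"),
                // Only the final message carries a finish reason and usage.
                finish_reason: is_last.then_some(FinishReason::StopSequence),
                usage: is_last.then_some(UsageMetrics {
                    prompt_tokens,
                    generated_tokens: num_tokens,
                    total_tokens: prompt_tokens + num_tokens,
                }),
            };
            // send() fails once the receiver is dropped, i.e. the client left.
            if tx.send(msg).is_err() {
                eprintln!("receiver dropped; stopping worker");
                return;
            }
        }
    });
    rx
}

fn main() {
    let rx = spawn_worker(4, 5, Duration::from_millis(100));
    // Iteration ends when the worker finishes and drops the sender.
    for resp in rx {
        println!("{resp:?}");
    }
}
```

In the tonic version the receiving half would typically be wrapped in a stream and returned from Infer(); here the receiver is simply iterated in main.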

4. Test Script (test.sh)

Wrote a zsh test script using grpcurl for manual smoke testing:

./test.sh health     # → {"status": "OK"}
./test.sh infer      # → streams 5 tokens with usage stats
./test.sh infer "Custom prompt" 20

Challenges

  • Streaming response design — Chose mpsc::channel over tokio_stream::iter because it naturally supports backpressure and allows the inference worker to run independently of the gRPC send loop.
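The backpressure point can be illustrated with a bounded channel from the standard library. This is a sketch of the general idea, not the tokio code: with tokio's mpsc::channel, an awaited send() waits until capacity is available, while the std equivalent used here either blocks the thread (send) or reports a full buffer (try_send). The helper name fill_without_consumer is hypothetical.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Tries to push `n` items into a bounded channel that nobody reads from and
/// returns how many the channel accepted before reporting that it was full.
fn fill_without_consumer(capacity: usize, n: usize) -> usize {
    // `_rx` keeps the receiver alive so the channel stays connected.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    for i in 0..n {
        match tx.try_send(i) {
            Ok(()) => accepted += 1,
            // The buffer is full: a blocking or awaiting send would wait here
            // until the consumer catches up. That waiting is the backpressure.
            Err(TrySendError::Full(_)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

fn main() {
    // A small buffer fills up quickly when the consumer is slow or absent.
    println!("capacity 2:  accepted {} of 5", fill_without_consumer(2, 5));
    // With capacity 64, the five simulated tokens never hit the limit.
    println!("capacity 64: accepted {} of 5", fill_without_consumer(64, 5));
}
```

With only 5 tokens and a capacity of 64 the limit is never reached in the simulation; it would start to matter once generation outpaces a slow client.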

Architecture So Far

┌─────────────┐       gRPC/HTTP2        ┌──────────────────┐
│   Client    │ ──────────────────────► │  nanoinfer       │
│  (grpcurl)  │ ◄── stream of tokens ── │  :50052          │
└─────────────┘                         │                  │
                                        │  InferService    │
                                        │  ├─ Health()     │
                                        │  └─ Infer()      │
                                        │     └─ tokio::   │
                                        │        spawn()   │
                                        └──────────────────┘