INTERNAL — Protobuf Serialization Mismatch
Symptoms
- gRPC calls fail with `StatusCode.INTERNAL` and message: "failed to unmarshal the received message" or "unexpected wire type"
- Error started after updating the `.proto` file on one side (client or server) but not yet on the other
- Some fields contain garbage data or default values instead of the expected content
- The error is intermittent during a rolling deployment because requests land on a mix of old and new server instances
- `grpcurl` with the old proto definition can call the service; with the new definition it fails
Root Cause
- Client and server are using different versions of the .proto file
- A field number was reused after deleting a field (the old wire format collides with the new field's type)
- Incompatible field type change on the same field number (e.g., changing `int32 user_id = 1` to `string user_id = 1`)
- Generated stub code was not regenerated after the `.proto` file was modified
- Rolling deployment with mixed proto versions where some instances use old generated code and others use new
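To make the type-change cause concrete, here is a minimal sketch of the protobuf wire format using only the Python standard library. It hand-encodes field 1 as `int32` (varint, wire type 0) and then reads it the way a decoder expecting `string` (length-delimited, wire type 2) would. The helper names are hypothetical and not part of any protobuf library, and the error text is illustrative only.

```python
VARINT, LEN = 0, 2  # protobuf wire types used by int32 and string


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32_field(field_number: int, value: int) -> bytes:
    """The tag is (field_number << 3) | wire_type, followed by the payload."""
    return encode_varint((field_number << 3) | VARINT) + encode_varint(value)


def read_string_field(data: bytes, field_number: int) -> str:
    """Read one field the way a decoder that expects a string would."""
    tag = data[0]  # a single-byte tag is enough for field numbers below 16
    got_field, got_type = tag >> 3, tag & 0x07
    if got_field == field_number and got_type != LEN:
        raise ValueError(f"unexpected wire type {got_type} for field {got_field}")
    length = data[1]
    return data[2:2 + length].decode("utf-8")


# Old sender: int32 user_id = 1, value 42
payload = encode_int32_field(1, 42)

# New receiver: string user_id = 1
try:
    read_string_field(payload, 1)
    error = None
except ValueError as exc:
    error = str(exc)
```

Depending on the runtime, a mismatch like this may surface as a parse error or as the field silently falling back to its default value, which would match both symptoms listed above.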
Diagnosis
**Step 1 — Verify proto versions are in sync**
```bash
# Check the proto file hash on client and server
sha256sum api/proto/myservice.proto
# Must be identical on both sides
# Or use buf to detect breaking changes
buf breaking --against '.git#branch=main'
```
**Step 2 — Inspect field number history**
```bash
# Use git log -p to list every field declaration that was added or removed
git log -p api/proto/myservice.proto | grep '^[-+].*= [0-9]\+'
# Look for a field number that appears in a deletion AND an addition
```
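Scanning that output by eye gets tedious for long histories. The following is a rough sketch of how the same check could be scripted: it collects field declarations from diff lines and reports any field number that was both removed and added with a different type or name. The function name and regex are assumptions for illustration. The regex ignores message scope and does not handle `map<...>` fields, so treat hits as leads to verify rather than as proof.

```python
import re
from collections import defaultdict

# Matches diff lines such as "-  int32 user_id = 1;" or "+  string user_id = 1;"
FIELD_LINE = re.compile(
    r"^([-+])\s*(?:repeated\s+|optional\s+)?([\w.]+)\s+(\w+)\s*=\s*(\d+)\s*[;\[]"
)


def find_reused_numbers(diff_text: str) -> dict:
    """Return field numbers that were removed and re-added with a different declaration."""
    seen = defaultdict(lambda: {"-": set(), "+": set()})
    for line in diff_text.splitlines():
        if line.startswith(("---", "+++")):
            continue  # skip diff file headers
        match = FIELD_LINE.match(line)
        if match:
            sign, ftype, name, number = match.groups()
            seen[int(number)][sign].add(f"{ftype} {name}")
    return {
        number: sides
        for number, sides in seen.items()
        if sides["-"] and sides["+"] and sides["-"] != sides["+"]
    }


# Hypothetical diff excerpt, shaped like `git log -p` output
sample_diff = """\
--- a/api/proto/myservice.proto
+++ b/api/proto/myservice.proto
 message GetUserRequest {
-  int32 user_id = 1;
+  string user_id = 1;
+  string locale = 2;
 }
"""

reused = find_reused_numbers(sample_diff)
```

In practice the input would be the text produced by the `git log -p` command above rather than a hard-coded string.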
**Step 3 — Reproduce with grpcurl using mismatched protos**
```bash
# Call the service once with each proto version and compare the results
# (grpcurl uses TLS by default; add -plaintext if the server does not use TLS)
grpcurl -proto old/myservice.proto \
  -d '{"user_id": 42}' \
  localhost:50051 mypackage.MyService/GetUser
grpcurl -proto new/myservice.proto \
  -d '{"user_id": "user-42"}' \
  localhost:50051 mypackage.MyService/GetUser
```
**Step 4 — Check for stale generated code**
```bash
# Regenerate stubs and check for changes
make proto-gen
git diff -- '*_pb2*.py' '*.pb.go'
# If generated files changed, the stubs were stale
```
Resolution
**Fix 1 — Regenerate and redeploy stub code**
```bash
# Python
python -m grpc_tools.protoc \
-I ./proto \
--python_out=./generated \
--grpc_python_out=./generated \
./proto/myservice.proto
# Go
protoc --go_out=. --go-grpc_out=. proto/myservice.proto
```
Deploy the regenerated stubs on all services simultaneously.
**Fix 2 — Safely add new fields (always use a new field number)**
```protobuf
// SAFE: add a new field with a new field number
message GetUserResponse {
string name = 1;
string email = 2; // existing
int64 created_at = 3; // new — old clients ignore it, new clients read it
}
```
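The reason old clients can ignore the new field is that each field on the wire carries its own number and wire type, so a reader can skip numbers it does not recognise. The sketch below illustrates this with a tiny hand-written decoder; it handles only varint and length-delimited fields, treats every length-delimited value as UTF-8 text, and is not how a real protobuf runtime is structured. All names in it are hypothetical.

```python
def read_varint(data: bytes, pos: int) -> tuple:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_known_fields(data: bytes, known: dict) -> dict:
    """Decode fields listed in `known` (number -> name); skip all other numbers."""
    message, pos = {}, 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if field_number in known:
            message[known[field_number]] = value
        # Unknown field numbers are consumed and dropped here.
    return message


# Bytes for name = "Ada", email = "ada@example.com", created_at = 1700000000
wire = (
    b"\x0a\x03Ada"
    + b"\x12\x0fada@example.com"
    + b"\x18\x80\xe2\xcf\xaa\x06"
)

old_view = decode_known_fields(wire, {1: "name", 2: "email"})
new_view = decode_known_fields(wire, {1: "name", 2: "email", 3: "created_at"})
```

The old reader sees only `name` and `email`, while the new reader also gets `created_at`, from the same bytes.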
**Fix 3 — Reserve deleted field numbers to prevent reuse**
```protobuf
message GetUserResponse {
reserved 2; // was 'phone_number', never reuse field 2
reserved 'phone_number'; // also reserve the name
string name = 1;
string email = 3; // use a new number, not 2
}
```
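A quick way to check that nothing was deleted without a matching `reserved` entry is to compare the old and new proto text. The following is a minimal sketch under strong assumptions: it uses regexes rather than a real parser, assumes a single message per file, and understands `reserved N;`, comma-separated lists, and `N to M` ranges but not `to max`. The function names are made up for this example.

```python
import re

FIELD = re.compile(
    r"^\s*(?:repeated\s+|optional\s+)?[\w.]+\s+\w+\s*=\s*(\d+)\s*[;\[]", re.M
)
RESERVED = re.compile(r"^\s*reserved\s+([^;]+);", re.M)


def field_numbers(proto_text: str) -> set:
    """Collect the field numbers declared in the proto text."""
    return {int(n) for n in FIELD.findall(proto_text)}


def reserved_numbers(proto_text: str) -> set:
    """Collect numbers named in `reserved` statements (reserved names are ignored)."""
    numbers = set()
    for body in RESERVED.findall(proto_text):
        for item in body.split(","):
            item = item.strip()
            if re.fullmatch(r"\d+", item):
                numbers.add(int(item))
            elif m := re.fullmatch(r"(\d+)\s+to\s+(\d+)", item):
                numbers.update(range(int(m[1]), int(m[2]) + 1))
    return numbers


def unreserved_deletions(old_proto: str, new_proto: str) -> set:
    """Field numbers present in the old proto, gone from the new one, and not reserved."""
    removed = field_numbers(old_proto) - field_numbers(new_proto)
    return removed - reserved_numbers(new_proto)


old_proto = """
message GetUserResponse {
  string name = 1;
  string phone_number = 2;
}
"""

new_proto = """
message GetUserResponse {
  string name = 1;
  string email = 3;
}
"""

fixed_proto = """
message GetUserResponse {
  reserved 2;
  reserved "phone_number";
  string name = 1;
  string email = 3;
}
"""

missing = unreserved_deletions(old_proto, new_proto)
missing_after_fix = unreserved_deletions(old_proto, fixed_proto)
```

Here `missing` flags field 2 because it was dropped without being reserved, and `missing_after_fix` is empty once the `reserved` lines are in place.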
**Fix 4 — Multi-phase deployment for breaking proto changes**
1. Deploy server with both old and new field support
2. Deploy all clients to use the new fields
3. Remove old field support from server after all clients are updated
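Phase 1 of this rollout might look roughly like the sketch below, where the server accepts either field and prefers the new one. The dataclass is only a stand-in for a generated message, and `user_key` as well as the `user-<id>` mapping are hypothetical names chosen for illustration; the real field names and conversion depend on your schema.

```python
from dataclasses import dataclass


@dataclass
class GetUserRequest:
    """Stand-in for a generated message during the transition phase."""
    user_id: int = 0    # old field, e.g. int32 user_id = 1 (proto3 default is 0)
    user_key: str = ""  # hypothetical new field, e.g. string user_key = 2


def resolve_user_key(request: GetUserRequest) -> str:
    """Phase 1 server logic: prefer the new field, fall back to the old one."""
    if request.user_key:
        return request.user_key
    if request.user_id:
        return f"user-{request.user_id}"  # assumed mapping from old IDs to new keys
    raise ValueError("request carries neither user_key nor user_id")


from_old_client = resolve_user_key(GetUserRequest(user_id=42))
from_new_client = resolve_user_key(GetUserRequest(user_key="user-42"))
```

Once every client sends the new field (phase 2), the fallback branch can be removed and the old field number reserved (phase 3).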
Prevention
- Run `buf breaking` in CI on every PR to catch breaking proto changes before they reach production
- Always reserve deleted field numbers with `reserved N;` in the proto file
- Never change the type of an existing field — add a new field with a new number instead
- Automate proto stub regeneration in your build pipeline so stale generated code is impossible to deploy
- Use a proto registry (Buf Schema Registry, internal git repo) as the single source of truth, with versioned releases