rpc : do not wait for response when sending RPC_CMD_SET_TENSOR (llama/12943)

RPC_CMD_SET_TENSOR always returns an empty response and we send this 4
times per token. We can improve TG speed if we don't wait for this empty
response.

The performance impact of this change depends on the network latency.
This commit is contained in:
Radoslav Gerganov
2025-04-25 10:08:08 +03:00
committed by Georgi Gerganov
parent 33bdbfbb33
commit fe21ddf0dc
2 changed files with 13 additions and 7 deletions

View File

@ -7,7 +7,7 @@
extern "C" {
#endif
#define RPC_PROTO_MAJOR_VERSION 1
#define RPC_PROTO_MAJOR_VERSION 2
#define RPC_PROTO_MINOR_VERSION 0
#define RPC_PROTO_PATCH_VERSION 0
#define GGML_RPC_MAX_SERVERS 16