Radoslav Gerganov
6cc3b022ee
llama : offload to RPC in addition to other backends (llama/7640)
* llama : offload to RPC in addition to other backends
* fix copy_tensor being called on the src buffer instead of the dst buffer
* always initialize views in the view_src buffer
* add RPC backend to Makefile build
* add endpoint to all RPC object names
* add rpc-server to Makefile
* Update llama.cpp
Co-authored-by: slaren <slarengh@gmail.com>
2024-06-16 18:19:48 +03:00
Radoslav Gerganov
39b0640b09
rpc : resource management rework (llama/7562)
* rpc : resource management rework
* address review comments
2024-06-16 18:19:48 +03:00
Radoslav Gerganov
caeeb32b41
rpc : track allocated buffers (llama/7411)
* rpc : track allocated buffers
ref: #7407
* rpc : pack rpc_tensor tightly
2024-06-16 18:19:48 +03:00
Radoslav Gerganov
77d708fabb
rpc : set SO_REUSEADDR for the server socket (llama/7320)
ref: #7293
2024-06-16 18:19:48 +03:00
Radoslav Gerganov
7bd69349bf
rpc : add command line arg for specifying backend memory
ref: #7293
2024-06-16 18:19:48 +03:00
Radoslav Gerganov
c451080c8b
ggml : add RPC backend (llama/6829)
* ggml : add RPC backend
The RPC backend proxies all operations to a remote server which runs a
regular backend (CPU, CUDA, Metal, etc).
* set TCP_NODELAY
* add CI workflows
* Address review comments
* fix warning
* implement llama_max_devices() for RPC
* Address review comments
* Address review comments
* wrap sockfd into a struct
* implement get_alignment and get_max_size
* add get_device_memory
* fix warning
* win32 support
* add README
* readme : trim trailing whitespace
* Address review comments
* win32 fix
* Address review comments
* fix compile warnings on macos
2024-05-14 19:16:29 +03:00
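
Note on the socket options named in the log: one commit sets SO_REUSEADDR on the server socket and another sets TCP_NODELAY. The sketch below is not the ggml RPC code. It is a minimal Python illustration of what those two standard socket options look like in use, and the helper names `make_server_socket` and `connect_nodelay` are hypothetical. Applying TCP_NODELAY on the client side of the connection is also an assumption, since the log only says the option is set.

```python
import socket


def make_server_socket(host="127.0.0.1", port=0):
    """Create a listening TCP socket with SO_REUSEADDR enabled.

    SO_REUSEADDR lets the server rebind its port right after a restart,
    even while old connections linger in TIME_WAIT.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind() to have any effect.
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))  # port=0 asks the OS for any free port
    srv.listen(1)
    return srv


def connect_nodelay(host, port):
    """Connect to a server and disable Nagle's algorithm on the socket.

    TCP_NODELAY sends small writes immediately, which matters for
    request/response traffic made of many short messages.
    """
    cli = socket.create_connection((host, port))
    cli.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return cli


if __name__ == "__main__":
    server = make_server_socket()
    port = server.getsockname()[1]
    client = connect_nodelay("127.0.0.1", port)
    print("listening on port", port)
    client.close()
    server.close()
```

Binding to port 0 keeps the sketch from colliding with a port already in use; a real server would bind to a fixed, configured port.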