From 5981352bb5189b062c90b9953b8ceb8ca1da4b5e Mon Sep 17 00:00:00 2001 From: Karol Kontny <82021046+kkontny@users.noreply.github.com> Date: Sat, 8 Feb 2025 15:30:53 +0100 Subject: [PATCH] ggml: Fix data race in ggml threadpool (llama/11736) After the barrier in last iteration is executed, still the loop termination condition will be executed. However main thread can destroy the cgraph object and its nodes already, then another thread will access it, but the thing is already gone. Also trouble can happen when n_nodes == 0 or abort is called, but I'm not sure if the prior situation is possible. Last syncronization should be done after the loop to ensure the cgraph/cplan won't be accessed after the main thread exits from the function. --- ggml/src/ggml-cpu/ggml-cpu.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/ggml/src/ggml-cpu/ggml-cpu.c b/ggml/src/ggml-cpu/ggml-cpu.c index 59efaeb7..fdb430a4 100644 --- a/ggml/src/ggml-cpu/ggml-cpu.c +++ b/ggml/src/ggml-cpu/ggml-cpu.c @@ -13856,9 +13856,13 @@ static thread_ret_t ggml_graph_compute_thread(void * data) { tp->ec = GGML_STATUS_ABORTED; } - ggml_barrier(state->threadpool); + if (node_n + 1 < cgraph->n_nodes) { + ggml_barrier(state->threadpool); + } } + ggml_barrier(state->threadpool); + return 0; }