Ensure a race condition while starting servald only starts one process

2025-04-14 22:26:44 +00:00 · 2014-06-06 15:49:16 +09:30 · 2014-06-06 15:49:16 +09:30 · afd31fe12c
commit afd31fe12c
parent 0b0e4cc8b4
5 changed files with 79 additions and 147 deletions
--- a/overlay.c
+++ b/overlay.c
@@ -1,138 +0,0 @@
-/*
-Serval Distributed Numbering Architecture (DNA)
-Copyright (C) 2010 Paul Gardner-Stephen
- 
-This program is free software; you can redistribute it and/or
-modify it under the terms of the GNU General Public License
-as published by the Free Software Foundation; either version 2
-of the License, or (at your option) any later version.
- 
-This program is distributed in the hope that it will be useful,
-but WITHOUT ANY WARRANTY; without even the implied warranty of
-MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
-GNU General Public License for more details.
- 
-You should have received a copy of the GNU General Public License
-along with this program; if not, write to the Free Software
-Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
-*/
-
-/*
-  Serval Overlay Mesh Network.
-
-  Basically we use UDP broadcast to send link-local, and then implement a BATMAN-like protocol over the top of that.
-  
-  Each overlay packet can contain one or more encapsulated packets each addressed using Serval DNA SIDs, with source, 
-  destination and next-hop addresses.
-
-  The use of an overlay also lets us be a bit clever about using irregular transports, such as an ISM915 modem attached via ethernet
-  (which we are planning to build in coming months), by paring off the IP and UDP headers that would otherwise dominate.  Even on
-  regular WiFi and ethernet we can aggregate packets in a way similar to IAX, but not just for voice frames.
-
-  The use of long (relative to IPv4 or even IPv6) 256 bit Curve25519 addresses means that it is a really good idea to
-  have neighbouring nodes exchange lists of peer aliases so that addresses can be summarised, possibly using less space than IPv4
-  would have.
-  
-  One approach to handle address shortening is to have the periodic TTL=255 BATMAN-style hello packets include an epoch number.  
-  This epoch number can be used by immediate neighbours of the originator to reference the neighbours listed in that packet by
-  their ordinal position in the packet instead of by their full address.  This gets us address shortening to 1 byte in most cases 
-  in return for no new packets, but the periodic hello packets will now be larger.  We might deal with this issue by having these
-  hello packets reference the previous epoch for common neighbours.  Unresolved neighbour addresses could be resolved by a simple
-  DNA request, which should only need to occur occasionally, and other link-local neighbours could sniff and cache the responses
-  to avoid duplicated traffic.  Indeed, during quiet times nodes could preemptively advertise address resolutions if they wished,
-  or similarly advertise the full address of a few (possibly randomly selected) neighbours in each epoch.
-
-  Byzantine Robustness is a goal, so we have to think about all sorts of malicious failure modes.
-
-  One approach to help byzantine robustness is to have multiple signature shells for each hop for mesh topology packets.
-  Thus forging a report of closeness requires forging a signature.  As such frames are forwarded, the outermost signature
-  shell is removed. This is really only needed for more paranoid uses.
-
-  We want to have different traffic classes for voice/video calls versus regular traffic, e.g., MeshMS frames.  Thus we need to have
-  separate traffic queues for these items.  Aside from allowing us to prioritise isochronous data, it also allows us to expire old
-  isochronous frames that are in-queue once there is no longer any point delivering them (e.g., after holding them more than 200ms).
-  We can also be clever about round-robin fair-sharing or even prioritising among isochronous streams.  Since we also know about the
-  DNA isochronous protocols and the forward error correction and other redundancy measures we also get smart about dropping, say, 1 in 3
-  frames from every call if we know that this can be safely done.  That is, when traffic is low, we maximise redundancy, and when we
-  start to hit the limit of traffic, we start to throw away some of the redundancy.  This of course relies on us knowing when the
-  network channel is getting too full.
-
-  Smart-flooding of broadcast information is also a requirement.  The long addresses help here, as we can make any address that begins
-  with the first 192 bits all ones be broadcast, and use the remaining 64 bits as a "broadcast packet identifier" (BPI).  
-  Nodes can remember recently seen BPIs and not forward broadcast frames that have been seen recently.  This should get us smart flooding
-  of the majority of a mesh (with some node mobility issues being a factor).  We could refine this later, but it will do for now, especially
-  since for things like number resolution we are happy to send repeat requests.
-
-  This file currently seems to exist solely to contain this introduction, which is fine with me. Functions land in here until their
-  proper place becomes apparent.
-  
-*/
-
-#include "serval.h"
-#include "conf.h"
-#include "rhizome.h"
-#include "httpd.h"
-#include "strbuf.h"
-#include "keyring.h"
-#include "overlay_interface.h"
-#include "server.h"
-
-keyring_file *keyring=NULL;
-
-/* The caller must set up the keyring before calling this function, and the keyring must contain at
- * least one identity, otherwise MDP and routing will not work.
- */
-int overlayServerMode()
-{
-  IN();
-
-  /* Set up client API sockets before writing our PID file
-     We want clients to be able to connect to our sockets as soon 
-     as servald start has returned. But we don't want servald start
-     to take very long. 
-     Try to perform only minimal CPU or IO processing here.
-  */
-  overlay_mdp_setup_sockets();
-  monitor_setup_sockets();
-  // start the HTTP server if enabled
-  httpd_server_start(HTTPD_PORT, HTTPD_PORT_MAX);    
- 
-  /* record PID file so that servald start can return */
-  if (server_write_pid())
-    RETURN(-1);
-  
-  /* For testing, it can be very helpful to delay the start of the server process, for example to
-   * check that the start/stop logic is robust.
-   */
-  const char *delay = getenv("SERVALD_SERVER_START_DELAY");
-  if (delay){
-    time_ms_t milliseconds = atoi(delay);
-    INFOF("Sleeping for %"PRId64" milliseconds", (int64_t) milliseconds);
-    sleep_ms(milliseconds);
-  }
-  overlay_queue_init();
-  
-  time_ms_t now = gettime_ms();
-  
-  // Periodically check for server shut down
-  RESCHEDULE(&ALARM_STRUCT(server_shutdown_check), now, now+30000, now);
-  
-  overlay_mdp_bind_internal_services();
-  
-  olsr_init_socket();
-
-  /* Calculate (and possibly show) CPU usage stats periodically */
-  RESCHEDULE(&ALARM_STRUCT(fd_periodicstats), now+3000, now+30000, TIME_MS_NEVER_WILL);
-
-  cf_on_config_change();
-  
-  // log message used by tests to wait for the server to start
-  INFO("Server initialised, entering main loop");
-  
-  /* Check for activity and respond to it */
-  while((serverMode==1) && fd_poll());
-  
-  serverCleanUp();
-  RETURN(0);
-  OUT();
-}
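The smart-flooding scheme described in the overlay.c header comment (an address whose first 192 bits are all ones is a broadcast address, and its remaining 64 bits are a "broadcast packet identifier" that nodes remember so they do not re-forward the same frame) can be sketched in C as follows. This is an illustration only, not servald code: it assumes 32-byte addresses stored as byte arrays, reads the BPI big-endian, and uses a fixed-size ring buffer as the "recently seen" cache. The names `is_broadcast`, `broadcast_packet_id`, `struct bpi_cache`, `should_rebroadcast` and the cache size of 64 are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SID_BYTES        32  /* 256-bit address */
#define BROADCAST_PREFIX 24  /* first 192 bits must be all ones */
#define BPI_CACHE_SIZE   64  /* hypothetical: number of recent BPIs remembered */

/* True if the first 192 bits of the address are all ones. */
static bool is_broadcast(const uint8_t sid[SID_BYTES])
{
  for (size_t i = 0; i < BROADCAST_PREFIX; i++)
    if (sid[i] != 0xFF)
      return false;
  return true;
}

/* The remaining 64 bits, read big-endian (an assumption), form the BPI. */
static uint64_t broadcast_packet_id(const uint8_t sid[SID_BYTES])
{
  uint64_t bpi = 0;
  for (size_t i = BROADCAST_PREFIX; i < SID_BYTES; i++)
    bpi = (bpi << 8) | sid[i];
  return bpi;
}

/* Ring buffer of recently seen BPIs; zero-initialise before use. */
struct bpi_cache {
  uint64_t ids[BPI_CACHE_SIZE];
  size_t count;  /* number of valid entries, at most BPI_CACHE_SIZE */
  size_t next;   /* slot to overwrite next (the oldest entry once full) */
};

/* Returns true if a broadcast frame is new and should be flooded onwards.
 * Unicast addresses return false: they are routed, not flooded. */
static bool should_rebroadcast(struct bpi_cache *cache, const uint8_t sid[SID_BYTES])
{
  if (!is_broadcast(sid))
    return false;
  uint64_t bpi = broadcast_packet_id(sid);
  for (size_t i = 0; i < cache->count; i++)
    if (cache->ids[i] == bpi)
      return false;                 /* seen recently: drop */
  cache->ids[cache->next] = bpi;    /* remember it, evicting the oldest when full */
  cache->next = (cache->next + 1) % BPI_CACHE_SIZE;
  if (cache->count < BPI_CACHE_SIZE)
    cache->count++;
  return true;
}
```

A BPI is forgotten once `BPI_CACHE_SIZE` newer identifiers have been seen, which matches the comment's "recently seen" wording without needing timestamps; a time-based window would be an equally plausible reading.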
--- a/overlay_mdp.c
+++ b/overlay_mdp.c
@@ -170,12 +170,16 @@ int overlay_mdp_setup_sockets()

  if (mdp_sock.poll.fd == -1) {
    mdp_sock.poll.fd = mdp_bind_socket("mdp.socket");
+    if (mdp_sock.poll.fd == -1)
+      return -1;
    mdp_sock.poll.events = POLLIN;
    watch(&mdp_sock);
  }
  
  if (mdp_sock2.poll.fd == -1) {
    mdp_sock2.poll.fd = mdp_bind_socket("mdp.2.socket");
+    if (mdp_sock2.poll.fd == -1)
+      return -1;
    mdp_sock2.poll.events = POLLIN;
    watch(&mdp_sock2);
  }
@@ -214,8 +218,10 @@ int overlay_mdp_setup_sockets()
 	  WHY_perror("bind");
      }
      
-      if (fd!=-1)
+      if (fd!=-1){
 	close(fd);
+	return -1;
+      }
    }
  }
  return 0;
@@ -1107,7 +1113,6 @@ int overlay_mdp_address_list(struct overlay_mdp_addrlist *request, struct overla

 struct routing_state{
  struct socket_address *client;
-  int fd;
 };

 static int routing_table(struct subscriber *subscriber, void *context)
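With this change a failed bind is no longer ignored: `overlay_mdp_setup_sockets()` returns -1, and `server()` in server.c gives up before writing its PID file. The idea this relies on is that a named socket can be bound by only one process at a time, so the bind acts as a lock, and the loser of two concurrent `servald start` commands finds out immediately. The sketch below demonstrates that property for a Unix-domain datagram socket bound to a filesystem path. How `mdp_bind_socket()` actually names its socket is not visible in this diff, so the path-based binding and the helper `bind_unix_socket` are assumptions. Note also that a path-based socket leaves a file behind if its owner dies, which makes later binds fail with `EADDRINUSE` as well; real code has to deal with such stale sockets (on Linux, the abstract socket namespace avoids the problem because the name disappears when the last descriptor closes).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a datagram Unix-domain socket at `path` (hypothetical helper).
 * Returns the descriptor, or -1 with errno set. While the first descriptor
 * is open, a second bind to the same path fails with EADDRINUSE. */
static int bind_unix_socket(const char *path)
{
  struct sockaddr_un addr;
  if (strlen(path) >= sizeof addr.sun_path) {
    errno = ENAMETOOLONG;
    return -1;
  }
  memset(&addr, 0, sizeof addr);
  addr.sun_family = AF_UNIX;
  strcpy(addr.sun_path, path);  /* length checked above */

  int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
  if (fd == -1)
    return -1;
  if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == -1) {
    int saved = errno;
    close(fd);                  /* as in the patch: close the fd, report failure */
    errno = saved;
    return -1;
  }
  return fd;
}
```

Only the first call on a given path succeeds; the second returns -1 with `errno == EADDRINUSE`, which is the signal a second server instance would use to exit instead of writing its own PID file.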
--- a/server.c
+++ b/server.c
@@ -33,11 +33,14 @@ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
 #include "overlay_interface.h"
 #include "overlay_packet.h"
 #include "server.h"
+#include "keyring.h"

 #define PROC_SUBDIR	  "proc"
 #define PIDFILE_NAME	  "servald.pid"
 #define STOPFILE_NAME	  "servald.stop"

+keyring_file *keyring=NULL;
+
 static char pidfile_path[256];

 static int server_getpid = 0;
@@ -101,7 +104,59 @@ int server()
  sigaction(SIGHUP, &sig, NULL);
  sigaction(SIGINT, &sig, NULL);

-  overlayServerMode();
+  /* Set up client API sockets before writing our PID file
+     We want clients to be able to connect to our sockets as soon 
+     as servald start has returned. But we don't want servald start
+     to take very long. 
+     Try to perform only minimal CPU or IO processing here.
+  */
+  if (overlay_mdp_setup_sockets()==-1)
+    RETURN(-1);
+  
+  if (monitor_setup_sockets()==-1)
+    RETURN(-1);
+  
+  // start the HTTP server if enabled
+  if (httpd_server_start(HTTPD_PORT, HTTPD_PORT_MAX)==-1)
+    RETURN(-1);
+ 
+  /* For testing, it can be very helpful to delay the start of the server process, for example to
+   * check that the start/stop logic is robust.
+   */
+  const char *delay = getenv("SERVALD_SERVER_START_DELAY");
+  if (delay){
+    time_ms_t milliseconds = atoi(delay);
+    INFOF("Sleeping for %"PRId64" milliseconds", (int64_t) milliseconds);
+    sleep_ms(milliseconds);
+  }
+  
+  /* record PID file so that servald start can return */
+  if (server_write_pid())
+    RETURN(-1);
+  
+  overlay_queue_init();
+  
+  time_ms_t now = gettime_ms();
+  
+  // Periodically check for server shut down
+  RESCHEDULE(&ALARM_STRUCT(server_shutdown_check), now, now+30000, now);
+  
+  overlay_mdp_bind_internal_services();
+  
+  olsr_init_socket();
+
+  /* Calculate (and possibly show) CPU usage stats periodically */
+  RESCHEDULE(&ALARM_STRUCT(fd_periodicstats), now+3000, now+30000, TIME_MS_NEVER_WILL);
+
+  cf_on_config_change();
+  
+  // log message used by tests to wait for the server to start
+  INFO("Server initialised, entering main loop");
+  
+  /* Check for activity and respond to it */
+  while((serverMode==1) && fd_poll());
+  
+  serverCleanUp();

  RETURN(0);
  OUT();
@@ -322,6 +377,7 @@ void cf_on_config_change()
 DEFINE_ALARM(server_shutdown_check);
 void server_shutdown_check(struct sched_ent *alarm)
 {
+  // TODO we should watch a descriptor and quit when it closes
  /* If this server has been supplanted with another or Serval has been uninstalled, then its PID
      file will change or be inaccessible.  In this case, shut down without all the cleanup.
      Perform this check at most once per second.  */
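The TODO added to `server_shutdown_check()` suggests replacing periodic PID-file checks with a descriptor the server watches. One possible shape for that, sketched under assumptions the commit does not state: the launching process keeps the write end of a pipe open and the server holds the read end, so once every writer has closed, `poll()` reports hang-up or `read()` returns 0 (end of file). `watched_fd_closed` is a hypothetical helper; inside servald the descriptor would more naturally be registered with the existing `watch()`/`fd_poll()` machinery than polled directly, and the standalone `poll()` call here only keeps the sketch self-contained.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Returns true once every writer of the pipe behind `read_fd` has closed.
 * A timeout_ms of 0 makes this a non-blocking check. */
static bool watched_fd_closed(int read_fd, int timeout_ms)
{
  struct pollfd pfd = { .fd = read_fd, .events = POLLIN };
  int n = poll(&pfd, 1, timeout_ms);
  if (n <= 0)
    return false;                 /* timeout or error (e.g. EINTR): assume still open */
  if (pfd.revents & (POLLHUP | POLLERR | POLLNVAL))
    return true;                  /* hang-up, or the descriptor itself is gone */
  if (pfd.revents & POLLIN) {
    char buf[64];
    ssize_t r = read(read_fd, buf, sizeof buf);
    return r == 0;                /* EOF means the writer closed; data means still alive */
  }
  return false;
}
```

Compared with polling the PID file every second, this reacts as soon as the watched descriptor closes, although it requires whoever starts the server to hand it such a descriptor.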
--- a/sourcefiles.mk
+++ b/sourcefiles.mk
@@ -65,7 +65,6 @@ SERVAL_DAEMON_SOURCES = \
 	monitor-client.c \
 	monitor-cli.c \
 	nonce.c \
-	overlay.c \
 	overlay_address.c \
 	overlay_buffer.c \
 	overlay_interface.c \
--- a/tests/server
+++ b/tests/server
@@ -106,14 +106,25 @@ test_StartStart() {
   assert [ "$servald_pid" = "$start_pid" ]
 }

-doc_StartStopFast="Stop server before it finishes starting"
-setup_StartStopFast() {
+doc_StartTwice="Attempt to start the server twice at the same time"
+setup_StartTwice() {
   setup
-   export SERVALD_SERVER_START_DELAY=10000
+   export SERVALD_SERVER_START_DELAY=2000
+   set_instance +A
+   executeOk_servald config set debug.io on
 }
-test_StartStopFast() {
+start_other(){
   start_servald_server
+   echo $servald_pid > other_pid
+}
+test_StartTwice() {
+   fork %server start_other
+   start_servald_server
+   fork_wait %server
+   # both servald start commands should return success with the same PID
+   assertGrep other_pid "^$servald_pid$"
   stop_servald_server
+   assert_no_servald_processes
 }

 doc_RemovePid="Server stops when pid file removed"
@@ -124,7 +135,6 @@ setup_RemovePid() {
 test_RemovePid() {
   rm $instance_servald_pidfile
   wait_until ! kill -0 $servald_pid 2>/dev/null
-
 }

 doc_NoZombie="Server process does not become a zombie"