It turns out that if the DB is locked, sqlite_prepare_v2() call can return
SQLITE_BUSY. The retry logic (implemented for issue #2) only provided for
sqlite_step() to return SQLITE_BUSY. It was a fairly straightforward matter to
extend the retry logic to cover statement preparation in an equally general
fashion.
The problem was observed while diagnosing failures in the rhizomeprotocol
DirectPush test case: the "servald rhizome list" command was failing due to a
locked database. See issue #9.
Must be enabled by using rhizome.api.addfile.*
Certainly polishing to be done, including using filename supplied
during HTTP POST. Now to fix that, and make it all work with
final rhizomeprotocol test case.
rhizomeprotocol test cases 8 and 9 currently fail post-merge. #9
at the end, and log2(filesize) instead of filesize. Equally
importantly BAR construction and parsing now uses #defines for
field sizes and offsets instead of it being hardwired without
meaningful documentation.
WILL BREAK BACKWARD COMPATIBILITY WITH PREVIOUS BUILDS.
YOU MUST DELETE AND REBUILD YOUR RHIZOME DATABASE AS OLD-FORMAT
BIDs WILL BE IN THERE AND GET SENT, AND STRANGE THINGS WILL HAPPEN.
This break with backwards compatibility is only reasonable to
consider because we have not yet had an official build using the
new Rhizome with old BAR format. 0.08 uses old Rhizome. #9
Objective is to avoid having to call system("servald rhizome import ...") to
handle a Rhizome direct POST /rhizome/bundle request. Antiquated code in and
around rhizome_import_bundle() needs much cleaning up, as indicated by some
TODO comments. Invocations must unnecessarily write the manifest into a file,
when they already have it in memory, ready to pass to the function.
All the 'rhizomeops' tests pass, but two 'rhizomeprotocol' tests are broken
by the changes in this commit.
Now returns series of "I have [newer]"'s and "Please send me"'s,
consisting of a 1 byte ID (0x01 or 0x02 respectively), followed
by the 64bit BID prefix from the BAR. As with all of Rhizome
Direct at present, the geo bounding box is ignored for now.
For some reason finds the same manifest several times (size bin
filtering seems to not be working right).
Also sync doesn't realise it has finished, and so doesn't return
when done.
size of associated data in a bundle, so that we can synchronise
small things first. Also preliminary work on making a general
cursor-type wrapper function for get_bars() so that it is easy
for any rhizome direct transport driver to iterate over the
known bundles in a rhizome datastore. #9
Closes#2.
Rewrite all Rhizome db query code using new retry primitives defined in
"rhizome.h": sqlite_step_retry(), sqlite_retry(), sqlite_retry_done(), etc.
Replace all calls to sqlite3_prepare_v2() with sqlite_prepare() which does
proper error logging.
Fix bug: re-invoking sqlite3_blob_close() on SQLITE_BUSY return causes process
to abort. Use an explicit BEGIN...COMMIT around the blob writing code instead.
Tested using repeated invocations of batphone/tests/meshms1.
Delete deprecated Rhizome db code in rhizome_crypto.c that has been replaced
with keyring file.
Much refactoring and removal of cruft.
SQL query errors are now logged with the filename, line number and function
where they were invoked, not of the low-level function that discovered the
error. This makes use of the new __HERE__ notation introduced last commit.
Replaces (const char *file, unsigned int line, const char *function) arguments
to all logging functions, simplifies malloc/free tracking code in
overlay_buffer.c and Rhizome manifest alloc/free tracking in rhizome_bundle.c.
Use __HERE__ macro instead of (__FILE__, __LINE__, __FUNCTION__) everywhere.
Special __NOWHERE__ macro is equivalent to (NULL, 0, NULL).
Declare net.c functions in new "net.h" header, so log.c doesn't have to pull
in the entire "serval.h" just to use write_str().
Facilitates progress on issue #2.
All the queries that used sqlite_exec_void() and sqlite_exec_int64() and
sqlite_exec_strbuf() now do a sleep-retry while the Rhizome db is locked.
There are other queries that still need conversion, and some old infinite
retry logic that needs replacing.
Add sqlite_exec_void_retry() function, use it in
rhizome_update_file_priority(). This should be reviewed to ensure that the
server process never sleeps.
The general problem remains of what the servald process should do if the
database is locked when it tries to update. Simplest solution is to sleep and
retry, but that blocks all other services and would hurt VoMP. A better
solution would be for each Rhizome operation to collect its database updates
into a single transaction and place that in a work queue that gets called using
schedule() (or even watch() if a file-descriptor event can somehow be used when
the database becomes available). Another solution is perhaps to perform all
Rhizome operations in a dedicated process that can block indefinitely on the
database without affecting servald responsiveness.
Do not add 'filehash' var to manifest if filesize=0
Do not accept 'filehash' var when parsing manifest with filesize=0
When responding to a new rhizome advertisement, do not try to HTTP
request a payload if filesize=0, just import the manifest directly
Various operations, eg "rhizome file add", do not report 'filehash'
fields where 'filesize' is zero
Do not delete rows from MANIFESTS table which have empty filehash
Various related bug fixes
Remove need to nul-terminate the received buffers in HTTP fetch reply handling
and HTTP server request parsing.
Remove redundant copying of data.
More rigorous parsing code, probably less vulnerable to overrun exploits.
Better debug logging of requests and responses.
Rhizome manifest parser now parses and validates all known fields, informs
about unsupported fields, and unpacks fields into relevant struct manifest
elements where appropriate. Is also stricter about whitespace.
Rhizome fetch code now logs debug messages if DEBUG_RHIZOME_RX bit is on.
Was not transmitting actual HTTP server port in rhizome announcements, was
always transmitting port 4110.
When trying for a free HTTP server port, sometimes bind() succeeds but listen()
fails with EADDRINUSE, so new logic to deal with that.