mirror of https://github.com/tahoe-lafs/tahoe-lafs.git (synced 2025-01-18 18:56:28 +00:00)
webapi/deep-manifest t=JSON: don't return the (large) manifest/SI/verifycap lists unless the operation has completed, to avoid the considerable CPU+memory cost of creating the JSON (for 330k dirnodes, it could take two minutes to generate 275MB of JSON). These costs must be paid eventually, but not on every poll.
parent 39a089dc7e
commit 7ee336b274
@@ -213,6 +213,11 @@ GET /operations/$HANDLE?output=JSON (same)
 * whether the operation is complete, or if it is still running
 * how much of the operation is complete, and how much is left, if possible
 
+Note that the final status output can be quite large: a deep-manifest of a
+directory structure with 300k directories and 200k unique files is about
+275MB of JSON, and might take two minutes to generate. For this reason, the
+full status is not provided until the operation has completed.
+
 The HTML form will include a meta-refresh tag, which will cause a regular
 web browser to reload the status page about 60 seconds later. This tag will
 be removed once the operation has completed.
@@ -966,7 +971,10 @@ POST $DIRURL?t=start-manifest (must add &ophandle=XYZ)
 by a space.
 
 If output=JSON is added to the queryargs, then the results will be a
-JSON-formatted dictionary with six keys:
+JSON-formatted dictionary with six keys. Note that because large directory
+structures can result in very large JSON results, the full results will not
+be available until the operation is complete (i.e. until output["finished"]
+is True):
 
 finished (bool): if False then you must reload the page until True
 origin_si (base32 str): the storage index of the starting point
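From a client's point of view, the documented behavior means polling the operation handle and only reading the large keys once "finished" is true. The following is a minimal sketch of such a loop in Python 3, not code from this commit. The names `make_http_fetcher` and `wait_for_manifest`, the node address passed as `base_url`, and the 60-second default interval (borrowed from the meta-refresh delay mentioned in the docs) are all assumptions made for illustration.

```python
import json
import time
import urllib.parse
import urllib.request


def make_http_fetcher(base_url, handle):
    """Return a callable that fetches and parses the operation status.

    base_url is an assumed node address such as "http://127.0.0.1:3456";
    the path follows the GET /operations/$HANDLE?output=JSON form.
    """
    url = "%s/operations/%s?output=JSON" % (
        base_url.rstrip("/"), urllib.parse.quote(handle, safe=""))

    def fetch():
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8"))

    return fetch


def wait_for_manifest(fetch, interval=60.0, sleep=time.sleep, max_polls=None):
    """Poll until status["finished"] is true, then return the full status.

    While the operation is running, the response is small (no manifest,
    verifycaps, or storage-index lists), so each poll stays cheap.
    """
    polls = 0
    while True:
        status = fetch()
        polls += 1
        if status.get("finished"):
            return status
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("operation not finished after %d polls" % polls)
        sleep(interval)
```

A caller would pass `make_http_fetcher(base_url, handle)` as `fetch`; accepting any callable keeps the loop easy to exercise without a running node.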
@@ -728,13 +728,27 @@ class ManifestResults(rend.Page, ReloadMixin):
         inevow.IRequest(ctx).setHeader("content-type", "text/plain")
         m = self.monitor
         s = m.get_status()
-        status = {"manifest": s["manifest"],
-                  "verifycaps": list(s["verifycaps"]),
-                  "storage-index": list(s["storage-index"]),
-                  "stats": s["stats"],
-                  "finished": m.is_finished(),
-                  "origin": base32.b2a(m.origin_si),
-                  }
+
+        status = { "stats": s["stats"],
+                   "finished": m.is_finished(),
+                   "origin": base32.b2a(m.origin_si),
+                   }
+        if m.is_finished():
+            # don't return manifest/verifycaps/SIs unless the operation is
+            # done, to save on CPU/memory (both here and in the HTTP client
+            # who has to unpack the JSON). Tests show that the ManifestWalker
+            # needs about 1092 bytes per item, the JSON we generate here
+            # requires about 503 bytes per item, and some internal overhead
+            # (perhaps transport-layer buffers in twisted.web?) requires an
+            # additional 1047 bytes per item.
+            status.update({ "manifest": s["manifest"],
+                            "verifycaps": [i for i in s["verifycaps"]],
+                            "storage-index": [i for i in s["storage-index"]],
+                            })
+            # simplejson doesn't know how to serialize a set. We use a
+            # generator that walks the set rather than list(setofthing) to
+            # save a small amount of memory (4B*len) and a moderate amount of
+            # CPU.
         return simplejson.dumps(status, indent=1)
 
     def _si_abbrev(self):
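The effect of the change can be tried outside the web server with a small stand-alone sketch in Python 3. It is an illustration under stated assumptions, not the project's code: `FakeMonitor` and its sample data are hypothetical, the standard-library `json` module replaces `simplejson`, and `b32encode` is only a stand-in for `base32.b2a`. The `estimate_peak_bytes` helper simply adds the three per-item figures quoted in the code comment; treating them as additive at peak is an assumption.

```python
import json
from base64 import b32encode


class FakeMonitor:
    """Hypothetical stand-in for the monitor object used in the diff."""

    def __init__(self, finished, origin_si=b"\x00" * 16):
        self._finished = finished
        self.origin_si = origin_si

    def is_finished(self):
        return self._finished

    def get_status(self):
        # Sample data only; the real walker fills these in as it runs.
        return {
            "manifest": [["dir/file", "URI:EXAMPLE"]],
            "verifycaps": {"URI:VERIFY-EXAMPLE"},
            "storage-index": {"si-example"},
            "stats": {"count-files": 1},
        }


def build_status(monitor):
    """Build the JSON status, adding the large lists only when finished."""
    s = monitor.get_status()
    status = {
        "stats": s["stats"],
        "finished": monitor.is_finished(),
        "origin": b32encode(monitor.origin_si).decode("ascii"),
    }
    if monitor.is_finished():
        # Sets are not JSON-serializable, so convert them to lists here.
        status.update({
            "manifest": s["manifest"],
            "verifycaps": sorted(s["verifycaps"]),
            "storage-index": sorted(s["storage-index"]),
        })
    return json.dumps(status, indent=1)


# Per-item figures quoted in the code comment of the diff.
BYTES_PER_ITEM = {"walker": 1092, "json": 503, "overhead": 1047}


def estimate_peak_bytes(n_items):
    """Rough estimate: assumes the three per-item costs add up at peak."""
    return n_items * sum(BYTES_PER_ITEM.values())
```

Calling `build_status(FakeMonitor(finished=False))` yields only the "stats", "finished", and "origin" keys, while a finished monitor yields all six. Under the additive assumption, `estimate_peak_bytes(330000)` comes to roughly 870 MB for the 330k-item case mentioned in the commit message.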