Commit Graph

64 Commits

Author SHA1 Message Date
Brian Warner
437de4340b CheckResults: privatize remaining attributes 2012-06-02 11:39:10 -07:00
Brian Warner
0fcc054a61 CheckResults: use fat init, add type-checking assertions
Added assertions for sharemap, servermap, servers_responding,
list_corrupt_shares, and list_incompatible_shares.
2012-06-02 11:39:10 -07:00
Brian Warner
4867dca3f0 use the new CheckResult getters almost everywhere
The remaining get_data() calls are either in
web.check_results.json_check_results(), or functioning as repr()s in
various unit test failure cases.
2012-06-02 11:39:10 -07:00
Brian Warner
ccfcd4de37 change CheckResults to use a fat set_data()
i.e. change set_data() to accept lots of parameters, instead of taking
a single dictionary with lots of keys. Also Convert all CheckResults
creators to use it.
2012-06-02 11:39:10 -07:00
Brian Warner
e313cf6406 CheckResults: start hiding .data, first step to clean it up
The goal is to make CheckResults more strongly typed, and remove the
ambiguous ".data" field in favor of a bunch of specific counters and
sharelists, so I can changes .sharemap and .servermap to use IServer
instances instead of string serverids. By cleaning this up first, I hope
to get that task done with less debugging.
2012-06-02 11:39:09 -07:00
Brian Warner
17c5384f79 immutable.CiphertextFileNode.check_and_repair: simplify for refactoring
There were too many nested functions here, making some upcoming changes
too difficult, so let's refactor it first.
2012-06-02 11:39:09 -07:00
Brian Warner
3a1c02cfdf change UploadResults to return IServers, update users to match
This finally changes all callers of get_servermap()/get_sharemap() to
accept IServers, and changes UploadResults to provide them.
2012-05-21 21:18:37 -07:00
Brian Warner
29b11531b5 switch UploadResults to use getters, hide internal data, for all but .uri
This hides attributes with e.g. _sharemap, and creates getters like
get_sharemap() to access them, for every field except .uri . This will
make it easier to modify the internal representation of .sharemap
without requiring callers to adjust quite yet.

".uri" has so many users that it seemed better to update it in a
subsequent patch.
2012-05-21 21:14:28 -07:00
Brian Warner
9acf5beebd immutable repairer: populate servers-responding properly
If a server did not respond to the pre-repair filecheck, but did respond
to the repair, that server was not correctly added to the
RepairResults.data["servers-responding"] list. (This resulted from a
buggy usage of DictOfSets.union() in filenode.py).

In addition, servers to which filecheck queries were sent, but did not
respond, were incorrectly added to the servers-responding list
anyawys. (This resulted from code in the checker.py not paying attention
to the 'responded' flag).

The first bug was neatly masked by the second: it's pretty rare to have
a server suddenly start responding in the one-second window between a
filecheck and a subsequent repair, and if the server was around for the
filecheck, you'd never notice the problem. I only spotted the smelly
code while I was changing it for IServer cleanup purposes.

I added coverage to test_repairer.py for this. Trying to get that test
to fail before fixing the first bug is what led me to discover the
second bug. I also had to update test_corrupt_file_verno, since it was
incorrectly asserting that 10 servers responded, when in fact one of
them throws an error (but the second bug was causing it to be reported
anyways).
2012-05-16 16:55:09 -07:00
Brian Warner
fcc7e64759 Doc updates and cosmetic fixes for #1115 patch.
Removes the caveat from webapi.txt about count-good-share-hosts being wrong.

This series should close #1115.
2012-05-13 01:15:50 -07:00
Andrew Miller
04eb6086ad Fixed an error in previous commit where an empty servermap would throw an exception in 'count-good-share-hosts'. Augmented unit test.
Signed-off-by: Andrew Miller <amiller@dappervision.com>
2012-05-13 00:59:30 -07:00
Andrew Miller
4b7f34d179 Added tests for count-good-share-hosts under check and repair conditions. Patched the incorrect computation in immutable/filenode.py
Signed-off-by: Andrew Miller <amiller@dappervision.com>

Fixed missing import statements

Signed-off-by: Andrew Miller <amiller@dappervision.com>
2012-05-13 00:59:30 -07:00
Brian Warner
7da112ccef Rewrite download-status-timeline visualizer ('viz') with d3.js
* use d3.js v2.4.6
* add a "toggle misc events" button, to get hash/bitmap-checking details
* only draw data that's on screen, for speed
* add fragment-arg to fetch timeline data.json from somewhere else
2011-10-31 23:18:21 -07:00
Kevan Carstensen
c413a8fae1 immutable/filenode: fix pyflakes warnings 2011-08-06 17:45:14 -07:00
Kevan Carstensen
edf9858fb8 immutable/filenode: implement unified filenode interface 2011-08-01 19:09:05 -07:00
Brian Warner
fc5c2208fb prepare for viz: improve DownloadStatus events
consolidate IDownloadStatusHandlingConsumer stuff into DownloadNode
2011-06-29 15:25:42 -07:00
Brian Warner
ffd296fc5a Refactor StorageFarmBroker handling of servers
Pass around IServer instance instead of (peerid, rref) tuple. Replace
"descriptor" with "server". Other replacements:

 get_all_servers -> get_connected_servers/get_known_servers
 get_servers_for_index -> get_servers_for_psi (now returns IServers)

This change still needs to be pushed further down: lots of code is now
getting the IServer and then distributing (peerid, rref) internally.
Instead, it ought to distribute the IServer internally and delay
extracting a serverid or rref until the last moment.

no_network.py was updated to retain parallelism.
2011-02-20 17:58:04 -08:00
Brian Warner
5267fc17fe immutable/filenode.py: put off DownloadStatus creation until first read() call
This avoids spamming the "recent uploads and downloads" /status page from
FileNode instances that were created for a directory read but which nobody is
ever going to read from. I also cleaned up the way DownloadStatus instances
are made to only ever do it in the CiphertextFileNode, not in the
higher-level plaintext FileNode. Also fixed DownloadStatus handling of read
size, thanks to David-Sarah for the catch.
2010-08-09 15:50:55 -07:00
Brian Warner
2a05aa2d91 lazily create DownloadNode upon first read()/get_segment() 2010-08-04 00:28:08 -07:00
Brian Warner
797828f47f Rewrite immutable downloader (#798). This patch rearranges the rest of src/allmydata/immutable/ . 2010-08-04 00:26:39 -07:00
david-sarah
973f0afdd3 Change direct accesses to an_uri.storage_index to calls to .get_storage_index() (fixes #948) 2010-02-21 18:45:04 -08:00
david-sarah
6057bc02cc Prevent mutable objects from being retrieved from an immutable directory, and associated forward-compatibility improvements. 2010-01-26 22:44:30 -08:00
Brian Warner
96834da0a2 Simplify immutable download API: use just filenode.read(consumer, offset, size)
* remove Downloader.download_to_data/download_to_filename/download_to_filehandle
* remove download.Data/FileName/FileHandle targets
* remove filenode.download/download_to_data/download_to_filename methods
* leave Downloader.download (the whole Downloader will go away eventually)
* add util.consumer.MemoryConsumer/download_to_data, for convenience
  (this is mostly used by unit tests, but it gets used by enough non-test
   code to warrant putting it in allmydata.util)
* update tests
* removes about 180 lines of code. Yay negative code days!

Overall plan is to rewrite immutable/download.py and leave filenode.read() as
the sole read-side API.
2009-12-01 17:53:30 -05:00
Brian Warner
0cf320c2ab interface name cleanups: IFileNode, IImmutableFileNode, IMutableFileNode
The proper hierarchy is:
 IFilesystemNode
 +IFileNode
 ++IMutableFileNode
 ++IImmutableFileNode
 +IDirectoryNode

Also expand test_client.py (NodeMaker) to hit all IFilesystemNode types.
2009-11-19 23:52:55 -08:00
Brian Warner
d2badbea78 class name cleanups: s/FileNode/ImmutableFileNode/
also fix test/bench_dirnode.py for recent dirnode changes
2009-11-19 23:22:39 -08:00
Brian Warner
e046744f40 make get_size/get_current_size consistent for all IFilesystemNode classes
* stop caching most_recent_size in dirnode, rely upon backing filenode for it
* start caching most_recent_size in MutableFileNode
* return None when you don't know, not "?"
* only render None as "?" in the web "more info" page
* add get_size/get_current_size to UnknownNode
2009-11-18 11:16:24 -08:00
Brian Warner
131e05b155 clean up uri-vs-cap terminology, emphasize cap instances instead of URI strings
* "cap" means a python instance which encapsulates a filecap/dircap (uri.py)
 * "uri" means a string with a "URI:" prefix
 * FileNode instances are created with (and retain) a cap instance, and
   generate uri strings on demand
 * .get_cap/get_readcap/get_verifycap/get_repaircap return cap instances
 * .get_uri/get_readonly_uri return uri strings

* add filenode.download_to_filename() for control.py, should find a better way
* use MutableFileNode.init_from_cap, not .init_from_uri
* directory URI instances: use get_filenode_cap, not get_filenode_uri
* update/cleanup bench_dirnode.py to match, add Makefile target to run it
2009-11-11 14:26:19 -08:00
Brian Warner
0d5dc51617 Overhaul IFilesystemNode handling, to simplify tests and use POLA internally.
* stop using IURI as an adapter
* pass cap strings around instead of URI instances
* move filenode/dirnode creation duties from Client to new NodeMaker class
* move other Client duties to KeyGenerator, SecretHolder, History classes
* stop passing Client reference to dirnode/filenode constructors
  - pass less-powerful references instead, like StorageBroker or Uploader
* always create DirectoryNodes by wrapping a filenode (mutable for now)
* remove some specialized mock classes from unit tests

Detailed list of changes (done one at a time, then merged together)

always pass a string to create_node_from_uri(), not an IURI instance
always pass a string to IFilesystemNode constructors, not an IURI instance
stop using IURI() as an adapter, switch on cap prefix in create_node_from_uri()
client.py: move SecretHolder code out to a separate class
test_web.py: hush pyflakes
client.py: move NodeMaker functionality out into a separate object
LiteralFileNode: stop storing a Client reference
immutable Checker: remove Client reference, it only needs a SecretHolder
immutable Upload: remove Client reference, leave SecretHolder and StorageBroker
immutable Repairer: replace Client reference with StorageBroker and SecretHolder
immutable FileNode: remove Client reference
mutable.Publish: stop passing Client
mutable.ServermapUpdater: get StorageBroker in constructor, not by peeking into Client reference
MutableChecker: reference StorageBroker and History directly, not through Client
mutable.FileNode: removed unused indirection to checker classes
mutable.FileNode: remove Client reference
client.py: move RSA key generation into a separate class, so it can be passed to the nodemaker
move create_mutable_file() into NodeMaker
test_dirnode.py: stop using FakeClient mockups, use NoNetworkGrid instead. This simplifies the code, but takes longer to run (17s instead of 6s). This should come down later when other cleanups make it possible to use simpler (non-RSA) fake mutable files for dirnode tests.
test_mutable.py: clean up basedir names
client.py: move create_empty_dirnode() into NodeMaker
dirnode.py: get rid of DirectoryNode.create
remove DirectoryNode.init_from_uri, refactor NodeMaker for customization, simplify test_web's mock Client to match
stop passing Client to DirectoryNode, make DirectoryNode.create_with_mutablefile the normal DirectoryNode constructor, start removing client from NodeMaker
remove Client from NodeMaker
move helper status into History, pass History to web.Status instead of Client
test_mutable.py: fix minor typo
2009-08-15 04:28:46 -07:00
Zooko O'Whielacronx
22d390acbb immutable: base32-encode the keys to generate cache filenames that will work on all platforms 2009-07-08 08:26:33 -07:00
Zooko O'Whielacronx
c0d1e7deae directories: make initialization of the download cache lazy
If you open up a directory containing thousands of files, it currently computes the cache filename and checks for the cache file on disk immediately for each immutble file in that directory.  With this patch, it delays those steps until you try to do something with an immutable file that could use the cache.
2009-07-07 17:40:40 -07:00
Brian Warner
711c09bc5d clean up storage_broker interface: should fix #732 2009-06-21 16:51:19 -07:00
Brian Warner
c9803d5217 switch all foolscap imports to use foolscap.api or foolscap.logging 2009-05-21 17:38:23 -07:00
Brian Warner
bce4a5385b add --add-lease to 'tahoe check', 'tahoe deep-check', and webapi. 2009-02-17 19:32:43 -07:00
Brian Warner
fde2289e7b CLI #590: convert 'tahoe deep-check' to streaming form, improve display, add tests 2009-02-17 17:15:11 -07:00
Brian Warner
d8b3505cf5 filenode: add get_repair_cap(), which uses the read-write filecap for immutable files, and the verifycap for immutable files 2009-01-22 21:38:36 -07:00
Brian Warner
3920e6d1e7 immutable/download.py move recent-downloads history out of Downloader and into a separate class. upload/etc will follow soon. 2009-01-14 16:14:24 -07:00
Brian Warner
bf56e2bb51 deep-check-and-repair: improve results and their HTML representation 2009-01-12 18:56:19 -07:00
Zooko O'Whielacronx
25063688b4 immutable repairer
This implements an immutable repairer by marrying a CiphertextDownloader to a CHKUploader.  It extends the IDownloadTarget interface so that the downloader can provide some metadata that the uploader requires.
The processing is incremental -- it uploads the first segments before it finishes downloading the whole file.  This is necessary so that you can repair large files without running out of RAM or using a temporary file on the repairer.
It requires only a verifycap, not a readcap.  That is: it doesn't need or use the decryption key, only the integrity check codes.
There are several tests marked TODO and several instances of XXX in the source code.  I intend to open tickets to document further improvements to functionality and testing, but the current version is probably good enough for Tahoe-1.3.0.
2009-01-12 11:00:22 -07:00
Zooko O'Whielacronx
600196f571 immutable: refactor download to do only download-and-decode, not decryption
FileDownloader takes a verify cap and produces ciphertext, instead of taking a read cap and producing plaintext.
FileDownloader does all integrity checking including the mandatory ciphertext hash tree and the optional ciphertext flat hash, rather than expecting its target to do some of that checking.
Rename immutable.download.Output to immutable.download.DecryptingOutput. An instance of DecryptingOutput can be passed to FileDownloader to use as the latter's target.  Text pushed to the DecryptingOutput is decrypted and then pushed to *its* target.
DecryptingOutput satisfies the IConsumer interface, and if its target also satisfies IConsumer, then it forwards and pause/unpause signals to its producer (which is the FileDownloader).
This patch also changes some logging code to use the new logging mixin class.
Check integrity of a segment and decrypt the segment one block-sized buffer at a time instead of copying the buffers together into one segment-sized buffer (reduces peak memory usage, I think, and is probably a tad faster/less CPU, depending on your encoding parameters).
Refactor FileDownloader so that processing of segments and of tail-segment share as much code is possible.
FileDownloader and FileNode take caps as instances of URI (Python objects), not as strings.
2009-01-08 11:53:49 -07:00
Zooko O'Whielacronx
ecabcc674c immutable: Make more parts of download use logging mixins and know what their "parent msg id" is. 2009-01-08 11:25:30 -07:00
Zooko O'Whielacronx
5e6f90a015 rename "checker results" to "check results", because it is more parallel to "check-and-repair results" 2009-01-06 13:37:03 -07:00
Zooko O'Whielacronx
c35a6ee3a2 trivial: fix a bunch of pyflakes complaints 2009-01-06 08:00:54 -07:00
Zooko O'Whielacronx
6a12f316a4 immutable: new checker and verifier
New checker and verifier use the new download class.  They are robust against various sorts of failures or corruption.  They return detailed results explaining what they learned about your immutable files.  Some grotesque sorts of corruption are not properly handled yet, and those ones are marked as TODO or commented-out in the unit tests.
There is also a repairer module in this patch with the beginnings of a repairer in it.  That repairer is mostly just the interface to the outside world -- the core operation of actually reconstructing the missing data blocks and uploading them is not in there yet.
This patch also refactors the unit tests in test_immutable so that the handling of each kind of corruption is reported as passing or failing separately, can be separately TODO'ified, etc.  The unit tests are also improved in various ways to require more of the code under test or to stop requiring unreasonable things of it.  :-)
2009-01-05 18:28:18 -07:00
Zooko O'Whielacronx
c456ff8591 rename "get_verifier()" to "get_verify_cap()" 2008-12-08 12:44:11 -07:00
Zooko O'Whielacronx
b315619d6b download: refactor handling of URI Extension Block and crypttext hash tree, simplify things
Refactor into a class the logic of asking each server in turn until one of them gives an answer 
that validates.  It is called ValidatedThingObtainer.

Refactor the downloading and verification of the URI Extension Block into a class named 
ValidatedExtendedURIProxy.

The new logic of validating UEBs is minimalist: it doesn't require the UEB to contain any 
unncessary information, but of course it still accepts such information for backwards 
compatibility (so that this new download code is able to download files uploaded with old, and 
for that matter with current, upload code).

The new logic of validating UEBs follows the practice of doing all validation up front.  This 
practice advises one to isolate the validation of incoming data into one place, so that all of 
the rest of the code can assume only valid data.

If any redundant information is present in the UEB+URI, the new code cross-checks and asserts 
that it is all fully consistent.  This closes some issues where the uploader could have 
uploaded inconsistent redundant data, which would probably have caused the old downloader to 
simply reject that download after getting a Python exception, but perhaps could have caused 
greater harm to the old downloader.

I removed the notion of selecting an erasure codec from codec.py based on the string that was 
passed in the UEB.  Currently "crs" is the only such string that works, so 
"_assert(codec_name == 'crs')" is simpler and more explicit.  This is also in keeping with the 
"validate up front" strategy -- now if someone sets a different string than "crs" in their UEB, 
the downloader will reject the download in the "validate this UEB" function instead of in a 
separate "select the codec instance" function.

I removed the code to check plaintext hashes and plaintext Merkle Trees.  Uploaders do not 
produce this information any more (since it potentially exposes confidential information about 
the file), and the unit tests for it were disabled.  The downloader before this patch would 
check that plaintext hash or plaintext merkle tree if they were present, but not complain if 
they were absent.  The new downloader in this patch complains if they are present and doesn't 
check them.  (We might in the future re-introduce such hashes over the plaintext, but encrypt 
the hashes which are stored in the UEB to preserve confidentiality.  This would be a double-
check on the correctness of our own source code -- the current Merkle Tree over the ciphertext 
is already sufficient to guarantee the integrity of the download unless there is a bug in our 
Merkle Tree or AES implementation.) 

This patch increases the lines-of-code count by 8 (from 17,770 to 17,778), and reduces the 
uncovered-by-tests lines-of-code count by 24 (from 1408 to 1384).  Those numbers would be more 
meaningful if we omitted src/allmydata/util/ from the test-coverage statistics.
2008-12-05 08:17:54 -07:00
Brian Warner
6fa41e738b immutable: tolerate filenode.read() with a size= that's too big, rather than hanging 2008-11-04 15:29:19 -07:00
Brian Warner
ba019bfd3a #527: expire the cached files that are used to support Range: headers, every hour, when the file is unused and older than an hour 2008-10-30 13:39:09 -07:00
Brian Warner
b1db6d9ff2 web: add 'Repair' button to checker results when they indicate unhealthyness. Also add the object's uri to the CheckerResults instance. 2008-10-29 18:09:17 -07:00
Brian Warner
b1ca238176 #527: respond to GETs with early ranges quickly, without waiting for the whole file to download. Fixes the alacrity problems with the earlier code. Still needs cache expiration. 2008-10-28 17:56:18 -07:00
Brian Warner
37e3d8e47c #527: support HTTP 'Range:' requests, using a cachefile. Adds filenode.read(consumer, offset, size) method. Still needs: cache expiration, reduced alacrity. 2008-10-28 13:41:04 -07:00