Merge remote-tracking branch 'origin/master' into 3880-http-storage-logging

Itamar Turner-Trauring 2023-03-27 13:35:30 -04:00
commit 4a396309d2
211 changed files with 27061 additions and 3647 deletions

.circleci/circleci.txt Normal file

@@ -0,0 +1,78 @@
# A master build looks like this:
# BASH_ENV=/tmp/.bash_env-63d018969ca480003a031e62-0-build
# CI=true
# CIRCLECI=true
# CIRCLE_BRANCH=master
# CIRCLE_BUILD_NUM=76545
# CIRCLE_BUILD_URL=https://circleci.com/gh/tahoe-lafs/tahoe-lafs/76545
# CIRCLE_JOB=NixOS 21.11
# CIRCLE_NODE_INDEX=0
# CIRCLE_NODE_TOTAL=1
# CIRCLE_PROJECT_REPONAME=tahoe-lafs
# CIRCLE_PROJECT_USERNAME=tahoe-lafs
# CIRCLE_REPOSITORY_URL=git@github.com:tahoe-lafs/tahoe-lafs.git
# CIRCLE_SHA1=ed0bda2d7456f4a2cd60870072e1fe79864a49a1
# CIRCLE_SHELL_ENV=/tmp/.bash_env-63d018969ca480003a031e62-0-build
# CIRCLE_USERNAME=alice
# CIRCLE_WORKFLOW_ID=6d9bb71c-be3a-4659-bf27-60954180619b
# CIRCLE_WORKFLOW_JOB_ID=0793c975-7b9f-489f-909b-8349b72d2785
# CIRCLE_WORKFLOW_WORKSPACE_ID=6d9bb71c-be3a-4659-bf27-60954180619b
# CIRCLE_WORKING_DIRECTORY=~/project
# A build of an in-repo PR looks like this:
# BASH_ENV=/tmp/.bash_env-63d1971a0298086d8841287e-0-build
# CI=true
# CIRCLECI=true
# CIRCLE_BRANCH=3946-less-chatty-downloads
# CIRCLE_BUILD_NUM=76612
# CIRCLE_BUILD_URL=https://circleci.com/gh/tahoe-lafs/tahoe-lafs/76612
# CIRCLE_JOB=NixOS 21.11
# CIRCLE_NODE_INDEX=0
# CIRCLE_NODE_TOTAL=1
# CIRCLE_PROJECT_REPONAME=tahoe-lafs
# CIRCLE_PROJECT_USERNAME=tahoe-lafs
# CIRCLE_PULL_REQUEST=https://github.com/tahoe-lafs/tahoe-lafs/pull/1251
# CIRCLE_PULL_REQUESTS=https://github.com/tahoe-lafs/tahoe-lafs/pull/1251
# CIRCLE_REPOSITORY_URL=git@github.com:tahoe-lafs/tahoe-lafs.git
# CIRCLE_SHA1=921a2083dcefdb5f431cdac195fc9ac510605349
# CIRCLE_SHELL_ENV=/tmp/.bash_env-63d1971a0298086d8841287e-0-build
# CIRCLE_USERNAME=bob
# CIRCLE_WORKFLOW_ID=5e32c12e-be37-4868-9fa8-6a6929fec2f1
# CIRCLE_WORKFLOW_JOB_ID=316ca408-81b4-4c96-bbdd-644e4c3e01e5
# CIRCLE_WORKFLOW_WORKSPACE_ID=5e32c12e-be37-4868-9fa8-6a6929fec2f1
# CIRCLE_WORKING_DIRECTORY=~/project
# CI_PULL_REQUEST=https://github.com/tahoe-lafs/tahoe-lafs/pull/1251
# A build of a PR from a fork looks like this:
# BASH_ENV=/tmp/.bash_env-63d40f7b2e89cd3de10e0db9-0-build
# CI=true
# CIRCLECI=true
# CIRCLE_BRANCH=pull/1252
# CIRCLE_BUILD_NUM=76678
# CIRCLE_BUILD_URL=https://circleci.com/gh/tahoe-lafs/tahoe-lafs/76678
# CIRCLE_JOB=NixOS 21.05
# CIRCLE_NODE_INDEX=0
# CIRCLE_NODE_TOTAL=1
# CIRCLE_PROJECT_REPONAME=tahoe-lafs
# CIRCLE_PROJECT_USERNAME=tahoe-lafs
# CIRCLE_PR_NUMBER=1252
# CIRCLE_PR_REPONAME=tahoe-lafs
# CIRCLE_PR_USERNAME=carol
# CIRCLE_PULL_REQUEST=https://github.com/tahoe-lafs/tahoe-lafs/pull/1252
# CIRCLE_PULL_REQUESTS=https://github.com/tahoe-lafs/tahoe-lafs/pull/1252
# CIRCLE_REPOSITORY_URL=git@github.com:tahoe-lafs/tahoe-lafs.git
# CIRCLE_SHA1=15c7916e0812e6baa2a931cd54b18f3382a8456e
# CIRCLE_SHELL_ENV=/tmp/.bash_env-63d40f7b2e89cd3de10e0db9-0-build
# CIRCLE_USERNAME=
# CIRCLE_WORKFLOW_ID=19c917c8-3a38-4b20-ac10-3265259fa03e
# CIRCLE_WORKFLOW_JOB_ID=58e95215-eccf-4664-a231-1dba7fd2d323
# CIRCLE_WORKFLOW_WORKSPACE_ID=19c917c8-3a38-4b20-ac10-3265259fa03e
# CIRCLE_WORKING_DIRECTORY=~/project
# CI_PULL_REQUEST=https://github.com/tahoe-lafs/tahoe-lafs/pull/1252
# A build of a PR from a fork where the owner has enabled CircleCI looks
# the same as a build of an in-repo PR, except it runs in the owner's
# CircleCI namespace.
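#
# As a rough illustration only (not part of the CI configuration; the real
# detection logic lives in .circleci/lib.sh), the variables above can be
# used to classify a build along these lines:
#
#   if [ -n "${CIRCLE_PR_NUMBER:-}" ]; then
#       echo "PR from a fork"
#   elif [ -n "${CIRCLE_PULL_REQUEST:-}" ]; then
#       echo "in-repo PR"
#   else
#       echo "branch build (e.g. master)"
#   fi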


@@ -11,32 +11,92 @@
#
version: 2.1
# Every job that pulls a Docker image from Docker Hub must authenticate to
# it. Define a couple of yaml anchors that can be used to supply the necessary
# credentials.
# First is a CircleCI job context which makes Docker Hub credentials available
# in the environment.
#
# Contexts are managed in the CircleCI web interface:
#
# https://app.circleci.com/settings/organization/github/tahoe-lafs/contexts
dockerhub-context-template: &DOCKERHUB_CONTEXT
context: "dockerhub-auth"
# Next is a Docker executor template that gets the credentials from the
# environment and supplies them to the executor.
dockerhub-auth-template: &DOCKERHUB_AUTH
- auth:
username: $DOCKERHUB_USERNAME
password: $DOCKERHUB_PASSWORD
# A template that can be shared between the two different image-building
# workflows.
.images: &IMAGES
jobs:
- "build-image-debian-11":
<<: *DOCKERHUB_CONTEXT
- "build-image-ubuntu-20-04":
<<: *DOCKERHUB_CONTEXT
- "build-image-fedora-35":
<<: *DOCKERHUB_CONTEXT
- "build-image-oraclelinux-8":
<<: *DOCKERHUB_CONTEXT
# Restore later as PyPy38
#- "build-image-pypy27-buster":
# <<: *DOCKERHUB_CONTEXT
parameters:
# Control whether the image-building workflow runs as part of this pipeline.
# Generally we do not want this to run because we don't need our
# dependencies to move around all the time and because building the image
# takes a couple minutes.
#
# An easy way to trigger a pipeline with this set to true is with the
# rebuild-images.sh tool in this directory. You can also do so via the
# CircleCI web UI.
build-images:
default: false
type: "boolean"
# Control whether the test-running workflow runs as part of this pipeline.
# Generally we do want this to run because running the tests is the primary
# purpose of this pipeline.
run-tests:
default: true
type: "boolean"
workflows:
ci:
when: "<< pipeline.parameters.run-tests >>"
jobs:
# Start with jobs testing various platforms.
- "debian-10":
{}
- "debian-11":
{}
- "ubuntu-20-04":
{}
- "ubuntu-18-04":
requires:
- "ubuntu-20-04"
# Equivalent to RHEL 8; CentOS 8 is dead.
- "oraclelinux-8":
{}
- "nixos":
name: "NixOS 21.05"
nixpkgs: "21.05"
name: "<<matrix.pythonVersion>>"
nixpkgs: "22.11"
matrix:
parameters:
pythonVersion:
- "python310"
- "nixos":
name: "NixOS 21.11"
nixpkgs: "21.11"
name: "<<matrix.pythonVersion>>"
nixpkgs: "unstable"
matrix:
parameters:
pythonVersion:
- "python311"
# Eventually, test against PyPy 3.8
#- "pypy27-buster":
@@ -54,6 +114,9 @@ workflows:
{}
- "integration":
# Run even the slow integration tests here. We need the `--` to
# sneak past tox and get to pytest.
tox-args: "-- --runslow integration"
requires:
# If the unit test suite doesn't pass, don't bother running the
# integration tests.
@@ -65,66 +128,12 @@ workflows:
{}
images:
# Build the Docker images used by the ci jobs. This makes the ci jobs
# faster and takes various spurious failures out of the critical path.
triggers:
# Build once a day
- schedule:
cron: "0 0 * * *"
filters:
branches:
only:
- "master"
jobs:
# Every job that pulls a Docker image from Docker Hub needs to provide
# credentials. Use this first job to define a yaml anchor that can be
# used to supply a CircleCI job context which makes Docker Hub
# credentials available in the environment.
#
# Contexts are managed in the CircleCI web interface:
#
# https://app.circleci.com/settings/organization/github/tahoe-lafs/contexts
- "build-image-debian-10": &DOCKERHUB_CONTEXT
context: "dockerhub-auth"
- "build-image-debian-11":
<<: *DOCKERHUB_CONTEXT
- "build-image-ubuntu-18-04":
<<: *DOCKERHUB_CONTEXT
- "build-image-ubuntu-20-04":
<<: *DOCKERHUB_CONTEXT
- "build-image-fedora-35":
<<: *DOCKERHUB_CONTEXT
- "build-image-oraclelinux-8":
<<: *DOCKERHUB_CONTEXT
# Restore later as PyPy38
#- "build-image-pypy27-buster":
# <<: *DOCKERHUB_CONTEXT
<<: *IMAGES
# Build as part of the workflow but only if requested.
when: "<< pipeline.parameters.build-images >>"
jobs:
dockerhub-auth-template:
# This isn't a real job. It doesn't get scheduled as part of any
# workflow. Instead, it's just a place we can hang a yaml anchor to
# finish the Docker Hub authentication configuration. Workflow jobs using
# the DOCKERHUB_CONTEXT anchor will have access to the environment
# variables used here. These variables will allow the Docker Hub image
# pull to be authenticated and hopefully avoid hitting any rate limits.
docker: &DOCKERHUB_AUTH
- image: "null"
auth:
username: $DOCKERHUB_USERNAME
password: $DOCKERHUB_PASSWORD
steps:
- run:
name: "CircleCI YAML schema conformity"
command: |
# This isn't a real command. We have to have something in this
# space, though, or the CircleCI yaml schema validator gets angry.
# Since this job is never scheduled this step is never run so the
# actual value here is irrelevant.
codechecks:
docker:
- <<: *DOCKERHUB_AUTH
@@ -133,10 +142,10 @@ jobs:
steps:
- "checkout"
- run:
- run: &INSTALL_TOX
name: "Install tox"
command: |
pip install --user tox
pip install --user 'tox~=3.0'
- run:
name: "Static-ish code checks"
@@ -152,9 +161,7 @@ jobs:
- "checkout"
- run:
name: "Install tox"
command: |
pip install --user tox
<<: *INSTALL_TOX
- run:
name: "Make PyInstaller executable"
@@ -169,12 +176,7 @@ jobs:
command: |
dist/Tahoe-LAFS/tahoe --version
debian-10: &DEBIAN
docker:
- <<: *DOCKERHUB_AUTH
image: "tahoelafsci/debian:10-py3.7"
user: "nobody"
debian-11: &DEBIAN
environment: &UTF_8_ENVIRONMENT
# In general, the test suite is not allowed to fail while the job
# succeeds. But you can set this to "yes" if you want it to be
@@ -186,7 +188,7 @@ jobs:
# filenames and argv).
LANG: "en_US.UTF-8"
# Select a tox environment to run for this job.
TAHOE_LAFS_TOX_ENVIRONMENT: "py37"
TAHOE_LAFS_TOX_ENVIRONMENT: "py39"
# Additional arguments to pass to tox.
TAHOE_LAFS_TOX_ARGS: ""
# The path in which test artifacts will be placed.
@@ -254,15 +256,11 @@ jobs:
/tmp/venv/bin/codecov
fi
debian-11:
<<: *DEBIAN
docker:
- <<: *DOCKERHUB_AUTH
image: "tahoelafsci/debian:11-py3.9"
user: "nobody"
environment:
<<: *UTF_8_ENVIRONMENT
TAHOE_LAFS_TOX_ENVIRONMENT: "py39"
# Restore later using PyPy3.8
# pypy27-buster:
@@ -296,6 +294,14 @@ jobs:
integration:
<<: *DEBIAN
parameters:
tox-args:
description: >-
Additional arguments to pass to the tox command.
type: "string"
default: ""
docker:
- <<: *DOCKERHUB_AUTH
image: "tahoelafsci/debian:11-py3.9"
@@ -308,28 +314,15 @@ jobs:
# Disable artifact collection because py.test can't produce any.
ARTIFACTS_OUTPUT_PATH: ""
# Pass on anything we got in our parameters.
TAHOE_LAFS_TOX_ARGS: "<< parameters.tox-args >>"
steps:
- "checkout"
# DRY, YAML-style. See the debian-9 steps.
- run: *SETUP_VIRTUALENV
- run: *RUN_TESTS
ubuntu-18-04: &UBUNTU_18_04
<<: *DEBIAN
docker:
- <<: *DOCKERHUB_AUTH
image: "tahoelafsci/ubuntu:18.04-py3.7"
user: "nobody"
environment:
<<: *UTF_8_ENVIRONMENT
# The default trial args include --rterrors which is incompatible with
# this reporter on Python 3. So drop that and just specify the
# reporter.
TAHOE_LAFS_TRIAL_ARGS: "--reporter=subunitv2-file"
TAHOE_LAFS_TOX_ENVIRONMENT: "py37"
ubuntu-20-04:
<<: *DEBIAN
docker:
@@ -378,111 +371,34 @@ jobs:
Reference the name of a niv-managed nixpkgs source (see `niv show`
and nix/sources.json)
type: "string"
pythonVersion:
description: >-
Reference the name of a Python package in nixpkgs to use.
type: "string"
docker:
# Run in a highly Nix-capable environment.
- <<: *DOCKERHUB_AUTH
image: "nixos/nix:2.3.16"
environment:
# CACHIX_AUTH_TOKEN is manually set in the CircleCI web UI and
# allows us to push to CACHIX_NAME. We only need this set for
# `cachix use` in this step.
CACHIX_NAME: "tahoe-lafs-opensource"
executor: "nix"
steps:
- "run":
# The nixos/nix image does not include ssh. Install it so the
# `checkout` step will succeed. We also want cachix for
# Nix-friendly caching.
name: "Install Basic Dependencies"
command: |
nix-env \
--file https://github.com/nixos/nixpkgs/archive/nixos-<<parameters.nixpkgs>>.tar.gz \
--install \
-A openssh cachix bash
- "checkout"
- run:
name: "Cachix setup"
# Record the store paths that exist before we did much. There's no
# reason to cache these, they're either in the image or have to be
# retrieved before we can use cachix to restore from cache.
command: |
cachix use "${CACHIX_NAME}"
nix path-info --all > /tmp/store-path-pre-build
- "run":
# The Nix package doesn't know how to do this part, unfortunately.
name: "Generate version"
command: |
nix-shell \
-p 'python3.withPackages (ps: [ ps.setuptools ])' \
--run 'python setup.py update_version'
- "run":
name: "Build"
command: |
# CircleCI build environment looks like it has a zillion and a
# half cores. Don't let Nix autodetect this high core count
# because it blows up memory usage and fails the test run. Pick a
# number of cores that suits the build environment we're paying
# for (the free one!).
#
# Also, let it run more than one job at a time because we have to
# build a couple simple little dependencies that don't take
# advantage of multiple cores and we get a little speedup by doing
# them in parallel.
nix-build --cores 3 --max-jobs 2 --argstr pkgsVersion "nixpkgs-<<parameters.nixpkgs>>"
- "run":
name: "Test"
command: |
# Let it go somewhat wild for the test suite itself
nix-build --cores 8 --argstr pkgsVersion "nixpkgs-<<parameters.nixpkgs>>" tests.nix
- run:
# Send any new store objects to cachix.
name: "Push to Cachix"
when: "always"
command: |
# Cribbed from
# https://circleci.com/blog/managing-secrets-when-you-have-pull-requests-from-outside-contributors/
if [ -n "$CIRCLE_PR_NUMBER" ]; then
# I'm sure you're thinking "CIRCLE_PR_NUMBER must just be the
# number of the PR being built". Sorry, dear reader, you have
# guessed poorly. It is also conditionally set based on whether
# this is a PR from a fork or not.
#
# https://circleci.com/docs/2.0/env-vars/#built-in-environment-variables
echo "Skipping Cachix push for forked PR."
else
# If this *isn't* a build from a fork then we have the Cachix
# write key in our environment and we can push any new objects
# to Cachix.
#
# To decide what to push, we inspect the list of store objects
# that existed before and after we did most of our work. Any
# that are new after the work are probably useful things to have
# around, so push them to the cache. We exclude all derivation
# objects (.drv files) because they're cheap to reconstruct and
# by the time you know their cache key you've already done all
# the work anyway.
#
# This shell expression for finding the objects and pushing them
# was from the Cachix docs:
#
# https://docs.cachix.org/continuous-integration-setup/circleci.html
#
# but they seem to have removed it now.
bash -c "comm -13 <(sort /tmp/store-path-pre-build | grep -v '\.drv$') <(nix path-info --all | grep -v '\.drv$' | sort) | cachix push $CACHIX_NAME"
fi
- "nix-build":
nixpkgs: "<<parameters.nixpkgs>>"
pythonVersion: "<<parameters.pythonVersion>>"
buildSteps:
- "run":
name: "Unit Test"
command: |
# The dependencies are all built so we can allow more
# parallelism here.
source .circleci/lib.sh
cache_if_able nix-build \
--cores 8 \
--argstr pkgsVersion "nixpkgs-<<parameters.nixpkgs>>" \
--argstr pythonVersion "<<parameters.pythonVersion>>" \
nix/tests.nix
typechecks:
docker:
- <<: *DOCKERHUB_AUTH
image: "tahoelafsci/ubuntu:18.04-py3.7"
image: "tahoelafsci/ubuntu:20.04-py3.9"
steps:
- "checkout"
@@ -494,7 +410,7 @@ jobs:
docs:
docker:
- <<: *DOCKERHUB_AUTH
image: "tahoelafsci/ubuntu:18.04-py3.7"
image: "tahoelafsci/ubuntu:20.04-py3.9"
steps:
- "checkout"
@@ -545,15 +461,6 @@ jobs:
docker push tahoelafsci/${DISTRO}:${TAG}-py${PYTHON_VERSION}
build-image-debian-10:
<<: *BUILD_IMAGE
environment:
DISTRO: "debian"
TAG: "10"
PYTHON_VERSION: "3.7"
build-image-debian-11:
<<: *BUILD_IMAGE
@@ -562,14 +469,6 @@ jobs:
TAG: "11"
PYTHON_VERSION: "3.9"
build-image-ubuntu-18-04:
<<: *BUILD_IMAGE
environment:
DISTRO: "ubuntu"
TAG: "18.04"
PYTHON_VERSION: "3.7"
build-image-ubuntu-20-04:
<<: *BUILD_IMAGE
@@ -598,7 +497,6 @@ jobs:
# build-image-pypy27-buster:
# <<: *BUILD_IMAGE
# environment:
# DISTRO: "pypy"
# TAG: "buster"
@@ -606,3 +504,87 @@ jobs:
# # setting up PyPy 3 in the image building toolchain. This value is just
# # for constructing the right Docker image tag.
# PYTHON_VERSION: "2"
executors:
nix:
docker:
# Run in a highly Nix-capable environment.
- <<: *DOCKERHUB_AUTH
image: "nixos/nix:2.10.3"
environment:
# CACHIX_AUTH_TOKEN is manually set in the CircleCI web UI and allows us
# to push to CACHIX_NAME. CACHIX_NAME tells cachix which cache to push
# to.
CACHIX_NAME: "tahoe-lafs-opensource"
commands:
nix-build:
parameters:
nixpkgs:
description: >-
Reference the name of a niv-managed nixpkgs source (see `niv show`
and nix/sources.json)
type: "string"
pythonVersion:
description: >-
Reference the name of a Python package in nixpkgs to use.
type: "string"
buildSteps:
description: >-
The build steps to execute after setting up the build environment.
type: "steps"
steps:
- "run":
# Get cachix for Nix-friendly caching.
name: "Install Basic Dependencies"
command: |
NIXPKGS="https://github.com/nixos/nixpkgs/archive/nixos-<<parameters.nixpkgs>>.tar.gz"
nix-env \
--file $NIXPKGS \
--install \
-A cachix bash
# Activate it for "binary substitution". This sets up
# configuration that lets Nix download something from the cache
# instead of building it locally, if possible.
cachix use "${CACHIX_NAME}"
- "checkout"
- "run":
# The Nix package doesn't know how to do this part, unfortunately.
name: "Generate version"
command: |
nix-shell \
-p 'python3.withPackages (ps: [ ps.setuptools ])' \
--run 'python setup.py update_version'
- "run":
name: "Build Dependencies"
command: |
# CircleCI build environment looks like it has a zillion and a
# half cores. Don't let Nix autodetect this high core count
# because it blows up memory usage and fails the test run. Pick a
# number of cores that suits the build environment we're paying
# for (the free one!).
source .circleci/lib.sh
# nix-shell will build all of the dependencies of the target but
# not the target itself.
cache_if_able nix-shell \
--run "" \
--cores 3 \
--argstr pkgsVersion "nixpkgs-<<parameters.nixpkgs>>" \
--argstr pythonVersion "<<parameters.pythonVersion>>" \
./default.nix
- "run":
name: "Build Package"
command: |
source .circleci/lib.sh
cache_if_able nix-build \
--cores 4 \
--argstr pkgsVersion "nixpkgs-<<parameters.nixpkgs>>" \
--argstr pythonVersion "<<parameters.pythonVersion>>" \
./default.nix
- steps: "<<parameters.buildSteps>>"

.circleci/lib.sh Normal file

@@ -0,0 +1,119 @@
# Run a command, enabling cache writes to cachix if possible. The command is
# accepted as a variable number of positional arguments (like argv).
function cache_if_able() {
# Dump some info about our build environment.
describe_build
if is_cache_writeable; then
# If the cache is available we'll use it. This lets fork owners set
# up their own caching if they want.
echo "Cachix credentials present; will attempt to write to cache."
# The `cachix watch-exec ...` does our cache population. When it sees
# something added to the store (I guess) it pushes it to the named
# cache.
cachix watch-exec "${CACHIX_NAME}" -- "$@"
else
if is_cache_required; then
echo "Required credentials (CACHIX_AUTH_TOKEN) are missing."
return 1
else
echo "Cachix credentials missing; will not attempt cache writes."
"$@"
fi
fi
}
function is_cache_writeable() {
# We can only *push* to the cache if we have a CACHIX_AUTH_TOKEN. In-repo
# jobs will get this from CircleCI configuration but jobs from forks may
# not.
[ -v CACHIX_AUTH_TOKEN ]
}
function is_cache_required() {
# If we're building in tahoe-lafs/tahoe-lafs then we must use the cache.
# If we're building anything from a fork then we're allowed to not have
# the credentials.
is_upstream
}
# Return success if the origin of this build is the tahoe-lafs/tahoe-lafs
# repository itself (and so we expect to have cache credentials available),
# failure otherwise.
#
# See circleci.txt for notes about how this determination is made.
function is_upstream() {
# CIRCLE_PROJECT_USERNAME is set to the org the build is happening for.
# If a PR targets a fork of the repo then this is set to something other
# than "tahoe-lafs".
[ "$CIRCLE_PROJECT_USERNAME" == "tahoe-lafs" ] &&
# CIRCLE_BRANCH is set to the real branch name for in-repo PRs and
# "pull/NNNN" for pull requests from forks.
#
# CIRCLE_PULL_REQUESTS is set to a comma-separated list of the full
# URLs of the PR pages which share an underlying branch, with one of
# them ending with that same "pull/NNNN" for PRs from forks.
! any_element_endswith "/$CIRCLE_BRANCH" "," "$CIRCLE_PULL_REQUESTS"
}
# Return success if splitting $3 on $2 results in an array with any element
# that ends with $1, failure otherwise.
function any_element_endswith() {
suffix=$1
shift
sep=$1
shift
haystack=$1
shift
IFS="${sep}" read -r -a elements <<< "$haystack"
for elem in "${elements[@]}"; do
if endswith "$suffix" "$elem"; then
return 0
fi
done
return 1
}
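# Illustrative call (not used anywhere in CI): with the fork-PR values shown
# in circleci.txt,
#
#   any_element_endswith "/pull/1252" "," "https://github.com/tahoe-lafs/tahoe-lafs/pull/1252"
#
# succeeds because the single element of the split list ends with "/pull/1252".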
# Return success if $2 ends with $1, failure otherwise.
function endswith() {
suffix=$1
shift
haystack=$1
shift
case "$haystack" in
*${suffix})
return 0
;;
*)
return 1
;;
esac
}
function describe_build() {
echo "Building PR for user/org: ${CIRCLE_PROJECT_USERNAME}"
echo "Building branch: ${CIRCLE_BRANCH}"
if is_upstream; then
echo "Upstream build."
else
echo "Non-upstream build."
fi
if is_cache_required; then
echo "Cache is required."
else
echo "Cache not required."
fi
if is_cache_writeable; then
echo "Cache is writeable."
else
echo "Cache not writeable."
fi
}
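# Example usage, mirroring how .circleci/config.yml drives this library
# (the concrete argument values here are just an illustration):
#
#   source .circleci/lib.sh
#   cache_if_able nix-build \
#       --cores 8 \
#       --argstr pkgsVersion "nixpkgs-22.11" \
#       --argstr pythonVersion "python310" \
#       nix/tests.nix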


@@ -9,7 +9,7 @@ BASIC_DEPS="pip wheel"
# Python packages we need to support the test infrastructure. *Not* packages
# Tahoe-LAFS itself (implementation or test suite) needs.
TEST_DEPS="tox codecov"
TEST_DEPS="tox~=3.0 codecov"
# Python packages we need to generate test reports for CI infrastructure.
# *Not* packages Tahoe-LAFS itself (implementation or test suite) needs.

.circleci/rebuild-images.sh Executable file

@@ -0,0 +1,20 @@
#!/usr/bin/env bash
set -euo pipefail
# Get your API token here:
# https://app.circleci.com/settings/user/tokens
API_TOKEN=$1
shift
# Name the branch you want to trigger the build for
BRANCH=$1
shift
curl \
--verbose \
--request POST \
--url https://circleci.com/api/v2/project/gh/tahoe-lafs/tahoe-lafs/pipeline \
--header "Circle-Token: $API_TOKEN" \
--header "content-type: application/json" \
--data '{"branch":"'"$BRANCH"'","parameters":{"build-images":true,"run-tests":false}}'
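# Example invocation ($MY_CIRCLECI_TOKEN is a placeholder for a personal API
# token created at the URL above):
#
#   .circleci/rebuild-images.sh "$MY_CIRCLECI_TOKEN" master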


@@ -45,14 +45,15 @@ fi
# A prefix for the test command that ensures it will exit after no more than a
# certain amount of time. Ideally, we would only enforce a "silent" period
# timeout but there isn't obviously a ready-made tool for that. The test
# suite only takes about 5 - 6 minutes on CircleCI right now. 15 minutes
# seems like a moderately safe window.
# timeout but there isn't obviously a ready-made tool for that. The unit test
# suite only takes about 5 - 6 minutes on CircleCI right now. The integration
# tests are a bit longer than that. 45 minutes seems like a moderately safe
# window.
#
# This is primarily aimed at catching hangs on the PyPy job which runs for
# about 21 minutes and then gets killed by CircleCI in a way that fails the
# job and bypasses our "allowed failure" logic.
TIMEOUT="timeout --kill-after 1m 15m"
TIMEOUT="timeout --kill-after 1m 45m"
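# Presumably the test command further down in this script (not shown in this
# hunk) runs behind this prefix, along the lines of:
#
#   ${TIMEOUT} tox ...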
# Run the test suite as a non-root user. This is the expected usage; some
# small areas of the test suite assume non-root privileges (such as unreadable


@@ -6,6 +6,16 @@ on:
- "master"
pull_request:
# At the start of each workflow run, GitHub creates a unique
# GITHUB_TOKEN secret to use in the workflow. It is a good idea for
# this GITHUB_TOKEN to have the minimum of permissions. See:
#
# - https://docs.github.com/en/actions/security-guides/automatic-token-authentication
# - https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions
#
permissions:
contents: read
# Control to what degree jobs in this workflow will run concurrently with
# other instances of themselves.
#
@@ -38,73 +48,67 @@ jobs:
- windows-latest
- ubuntu-latest
python-version:
- "3.7"
- "3.8"
- "3.9"
- "3.10"
- "3.11"
include:
# On macOS don't bother with 3.7-3.8, just to get faster builds.
# On macOS don't bother with 3.8, just to get faster builds.
- os: macos-latest
python-version: "3.9"
- os: macos-latest
python-version: "3.10"
python-version: "3.11"
# We only support PyPy on Linux at the moment.
- os: ubuntu-latest
python-version: "pypy-3.7"
- os: ubuntu-latest
python-version: "pypy-3.8"
- os: ubuntu-latest
python-version: "pypy-3.9"
steps:
# See https://github.com/actions/checkout. A fetch-depth of 0
# fetches all tags and branches.
- name: Check out Tahoe-LAFS sources
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
# To use pip caching with GitHub Actions in an OS-independent
# manner, we need the `pip cache dir` command, which became
# available in pip v20.1. At the time of writing this,
# GitHub Actions offers pip v20.3.3 for both ubuntu-latest and
# windows-latest, and pip v20.3.1 for macos-latest.
- name: Get pip cache directory
id: pip-cache
run: |
echo "::set-output name=dir::$(pip cache dir)"
# See https://github.com/actions/cache
- name: Use pip cache
uses: actions/cache@v2
with:
path: ${{ steps.pip-cache.outputs.dir }}
key: ${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }}
restore-keys: |
${{ runner.os }}-pip-
cache: 'pip' # caching pip dependencies
- name: Install Python packages
run: |
pip install --upgrade codecov tox tox-gh-actions setuptools
pip install --upgrade codecov "tox<4" tox-gh-actions setuptools
pip list
- name: Display tool versions
run: python misc/build_helpers/show-tool-versions.py
- name: Run tox for corresponding Python version
if: ${{ !contains(matrix.os, 'windows') }}
run: python -m tox
# On Windows, a non-blocking pipe might respond (when emulating Unix-y
# API) with ENOSPC to indicate buffer full. Trial doesn't handle this
# well, so it breaks test runs. To attempt to solve this, we pipe the
# output through passthrough.py, which will hopefully be able to do the right
# thing by using Windows APIs.
- name: Run tox for corresponding Python version
if: ${{ contains(matrix.os, 'windows') }}
run: |
pip install twisted pywin32
python -m tox | python misc/windows-enospc/passthrough.py
- name: Upload eliot.log
uses: actions/upload-artifact@v1
uses: actions/upload-artifact@v3
with:
name: eliot.log
path: eliot.log
- name: Upload trial log
uses: actions/upload-artifact@v1
uses: actions/upload-artifact@v3
with:
name: test.log
path: _trial_temp/test.log
@@ -161,21 +165,22 @@ jobs:
strategy:
fail-fast: false
matrix:
os:
- windows-latest
- ubuntu-latest
python-version:
- 3.7
- 3.9
include:
# On macOS don't bother with 3.7, just to get faster builds.
- os: macos-latest
python-version: 3.9
python-version: "3.9"
force-foolscap: false
- os: windows-latest
python-version: "3.9"
force-foolscap: false
# 22.04 has some issues with Tor at the moment:
# https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3943
- os: ubuntu-20.04
python-version: "3.11"
force-foolscap: false
steps:
- name: Install Tor [Ubuntu]
if: matrix.os == 'ubuntu-latest'
if: ${{ contains(matrix.os, 'ubuntu') }}
run: sudo apt install tor
# TODO: See https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3744.
@@ -188,51 +193,51 @@ jobs:
- name: Install Tor [Windows]
if: matrix.os == 'windows-latest'
uses: crazy-max/ghaction-chocolatey@v1
uses: crazy-max/ghaction-chocolatey@v2
with:
args: install tor
- name: Check out Tahoe-LAFS sources
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Get pip cache directory
id: pip-cache
run: |
echo "::set-output name=dir::$(pip cache dir)"
- name: Use pip cache
uses: actions/cache@v2
with:
path: ${{ steps.pip-cache.outputs.dir }}
key: ${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }}
restore-keys: |
${{ runner.os }}-pip-
cache: 'pip' # caching pip dependencies
- name: Install Python packages
run: |
pip install --upgrade tox
pip install --upgrade "tox<4"
pip list
- name: Display tool versions
run: python misc/build_helpers/show-tool-versions.py
- name: Run "Python 3 integration tests"
if: "${{ !matrix.force-foolscap }}"
env:
# On macOS this is necessary to ensure unix socket paths for tor
# aren't too long. On Windows tox won't pass it through so it has no
# effect. On Linux it doesn't make a difference one way or another.
TMPDIR: "/tmp"
run: tox -e integration
run: |
tox -e integration
- name: Run "Python 3 integration tests (force Foolscap)"
if: "${{ matrix.force-foolscap }}"
env:
# On macOS this is necessary to ensure unix socket paths for tor
# aren't too long. On Windows tox won't pass it through so it has no
# effect. On Linux it doesn't make a difference one way or another.
TMPDIR: "/tmp"
run: |
tox -e integration -- --force-foolscap integration/
- name: Upload eliot.log in case of failure
uses: actions/upload-artifact@v1
uses: actions/upload-artifact@v3
if: failure()
with:
name: integration.eliot.json
@@ -253,31 +258,19 @@ jobs:
steps:
- name: Check out Tahoe-LAFS sources
uses: actions/checkout@v2
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
- name: Get pip cache directory
id: pip-cache
run: |
echo "::set-output name=dir::$(pip cache dir)"
- name: Use pip cache
uses: actions/cache@v2
with:
path: ${{ steps.pip-cache.outputs.dir }}
key: ${{ runner.os }}-pip-${{ hashFiles('**/setup.py') }}
restore-keys: |
${{ runner.os }}-pip-
cache: 'pip' # caching pip dependencies
- name: Install Python packages
run: |
pip install --upgrade tox
pip install --upgrade "tox<4"
pip list
- name: Display tool versions
@@ -291,7 +284,7 @@ jobs:
run: dist/Tahoe-LAFS/tahoe --version
- name: Upload PyInstaller package
uses: actions/upload-artifact@v2
uses: actions/upload-artifact@v3
with:
name: Tahoe-LAFS-${{ matrix.os }}-Python-${{ matrix.python-version }}
path: dist/Tahoe-LAFS-*-*.*


@@ -1,10 +0,0 @@
FROM python:2.7
ADD . /tahoe-lafs
RUN \
cd /tahoe-lafs && \
git pull --depth=100 && \
pip install . && \
rm -rf ~/.cache/
WORKDIR /root


@@ -1,25 +0,0 @@
FROM debian:9
LABEL maintainer "gordon@leastauthority.com"
RUN apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get -yq upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get -yq install build-essential python-dev libffi-dev libssl-dev python-virtualenv git
RUN \
git clone https://github.com/tahoe-lafs/tahoe-lafs.git /root/tahoe-lafs; \
cd /root/tahoe-lafs; \
virtualenv --python=python2.7 venv; \
./venv/bin/pip install --upgrade setuptools; \
./venv/bin/pip install --editable .; \
./venv/bin/tahoe --version;
RUN \
cd /root; \
mkdir /root/.tahoe-client; \
mkdir /root/.tahoe-introducer; \
mkdir /root/.tahoe-server;
RUN /root/tahoe-lafs/venv/bin/tahoe create-introducer --location=tcp:introducer:3458 --port=tcp:3458 /root/.tahoe-introducer
RUN /root/tahoe-lafs/venv/bin/tahoe start /root/.tahoe-introducer
RUN /root/tahoe-lafs/venv/bin/tahoe create-node --location=tcp:server:3457 --port=tcp:3457 --introducer=$(cat /root/.tahoe-introducer/private/introducer.furl) /root/.tahoe-server
RUN /root/tahoe-lafs/venv/bin/tahoe create-client --webport=3456 --introducer=$(cat /root/.tahoe-introducer/private/introducer.furl) --basedir=/root/.tahoe-client --shares-needed=1 --shares-happy=1 --shares-total=1
VOLUME ["/root/.tahoe-client", "/root/.tahoe-server", "/root/.tahoe-introducer"]
EXPOSE 3456 3457 3458
ENTRYPOINT ["/root/tahoe-lafs/venv/bin/tahoe"]
CMD []


@@ -224,3 +224,62 @@ src/allmydata/_version.py:
.tox/create-venvs.log: tox.ini setup.py
tox --notest -p all | tee -a "$(@)"
# to make a new release:
# - create a ticket for the release in Trac
# - ensure local copy is up-to-date
# - create a branch like "XXXX.release" from up-to-date master
# - in the branch, run "make release"
# - run "make release-test"
# - perform any other sanity-checks on the release
# - run "make release-upload"
# Note that several commands below hard-code "meejah"; if you are
# someone else please adjust them.
release:
@echo "Is checkout clean?"
git diff-files --quiet
git diff-index --quiet --cached HEAD --
@echo "Clean docs build area"
rm -rf docs/_build/
@echo "Install required build software"
python3 -m pip install --editable .[build]
@echo "Test README"
python3 setup.py check -r -s
@echo "Update NEWS"
python3 -m towncrier build --yes --version `python3 misc/build_helpers/update-version.py --no-tag`
git add -u
git commit -m "update NEWS for release"
# note that this always bumps the "middle" number, e.g. from 1.17.1 -> 1.18.0
# and produces a tag in the Git repository
@echo "Bump version and create tag"
python3 misc/build_helpers/update-version.py
@echo "Build and sign wheel"
python3 setup.py bdist_wheel
gpg --pinentry=loopback -u meejah@meejah.ca --armor --detach-sign dist/tahoe_lafs-`git describe | cut -b 12-`-py3-none-any.whl
ls dist/*`git describe | cut -b 12-`*
@echo "Build and sign source-dist"
python3 setup.py sdist
gpg --pinentry=loopback -u meejah@meejah.ca --armor --detach-sign dist/tahoe-lafs-`git describe | cut -b 12-`.tar.gz
ls dist/*`git describe | cut -b 12-`*
# basically just a bare-minimum smoke-test that it installs and runs
release-test:
gpg --verify dist/tahoe-lafs-`git describe | cut -b 12-`.tar.gz.asc
gpg --verify dist/tahoe_lafs-`git describe | cut -b 12-`-py3-none-any.whl.asc
virtualenv testmf_venv
testmf_venv/bin/pip install dist/tahoe_lafs-`git describe | cut -b 12-`-py3-none-any.whl
testmf_venv/bin/tahoe --version
rm -rf testmf_venv
release-upload:
scp dist/*`git describe | cut -b 12-`* meejah@tahoe-lafs.org:/home/source/downloads
git push origin_push tahoe-lafs-`git describe | cut -b 12-`
twine upload dist/tahoe_lafs-`git describe | cut -b 12-`-py3-none-any.whl dist/tahoe_lafs-`git describe | cut -b 12-`-py3-none-any.whl.asc dist/tahoe-lafs-`git describe | cut -b 12-`.tar.gz dist/tahoe-lafs-`git describe | cut -b 12-`.tar.gz.asc


@@ -5,6 +5,47 @@ User-Visible Changes in Tahoe-LAFS
==================================
.. towncrier start line
Release 1.18.0 (2022-10-02)
'''''''''''''''''''''''''''
Backwards Incompatible Changes
------------------------------
- Python 3.6 is no longer supported, as it has reached end-of-life and is no longer receiving security updates. (`#3865 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3865>`_)
- Python 3.7 or later is now required; Python 2 is no longer supported. (`#3873 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3873>`_)
- Share corruption reports stored on disk are now always encoded in UTF-8. (`#3879 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3879>`_)
- Record both the PID and the process creation-time:
a new kind of pidfile in `running.process` records both
the PID and the creation-time of the process. This facilitates
automatic discovery of a "stale" pidfile that points to a
currently-running process. If the recorded creation-time matches
the creation-time of the running process, then it is a still-running
`tahoe run` process. Otherwise, the file is stale.
The `twistd.pid` file is no longer present. (`#3926 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3926>`_)
Features
--------
- The implementation of SDMF and MDMF (mutables) now requires RSA keys to be exactly 2048 bits, aligning them with the specification.
Some code existed to allow tests to shorten this and it's
conceptually possible a modified client produced mutables
with different key-sizes. However, the spec says that they
must be 2048 bits. If you happen to have a capability with
a key-size different from 2048 you may use 1.17.1 or earlier
to read the content. (`#3828 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3828>`_)
- "make" based release automation (`#3846 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3846>`_)
Misc/Other
----------
- `#3327 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3327>`_, `#3526 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3526>`_, `#3697 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3697>`_, `#3709 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3709>`_, `#3786 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3786>`_, `#3788 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3788>`_, `#3802 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3802>`_, `#3816 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3816>`_, `#3855 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3855>`_, `#3858 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3858>`_, `#3859 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3859>`_, `#3860 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3860>`_, `#3867 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3867>`_, `#3868 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3868>`_, `#3871 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3871>`_, `#3872 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3872>`_, `#3875 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3875>`_, `#3876 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3876>`_, `#3877 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3877>`_, `#3881 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3881>`_, `#3882 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3882>`_, `#3883 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3883>`_, `#3889 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3889>`_, `#3890 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3890>`_, `#3891 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3891>`_, `#3893 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3893>`_, `#3895 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3895>`_, `#3896 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3896>`_, `#3898 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3898>`_, `#3900 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3900>`_, `#3909 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3909>`_, `#3913 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3913>`_, `#3915 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3915>`_, `#3916 <https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3916>`_
Release 1.17.1 (2022-01-07)
'''''''''''''''''''''''''''


@@ -56,7 +56,7 @@ Once ``tahoe --version`` works, see `How to Run Tahoe-LAFS <docs/running.rst>`__
🐍 Python 2
-----------
Python 3.7 or later is now required.
Python 3.8 or later is required.
If you are still using Python 2.7, use Tahoe-LAFS version 1.17.1.


@@ -0,0 +1,138 @@
"""
First attempt at benchmarking uploads and downloads.
To run:
$ pytest benchmarks/upload_download.py -s -v -Wignore
To add latency of e.g. 60ms on Linux:
$ tc qdisc add dev lo root netem delay 30ms
To reset:
$ tc qdisc del dev lo root netem
Frequency scaling can spoil the results.
To see the range of frequency scaling on a Linux system:
$ cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_available_frequencies
And to pin the CPU frequency to the lower bound found in these files:
$ sudo cpupower frequency-set -f <lowest available frequency>
TODO Parameterization (pytest?)
- Foolscap vs not foolscap
- Number of nodes
- Data size
- Number of needed/happy/total shares.
CAVEATS: The goal here isn't a realistic benchmark, or a benchmark that will be
measured over time, or is expected to be maintainable over time. This is just
a quick and easy way to measure the speed of certain operations, compare HTTP
and Foolscap, and see the short-term impact of changes.
Eventually this will be replaced by a real benchmark suite that can be run over
time to measure something more meaningful.
"""
from time import time, process_time
from contextlib import contextmanager
from tempfile import mkdtemp
import os
from twisted.trial.unittest import TestCase
from twisted.internet.defer import gatherResults
from allmydata.util.deferredutil import async_to_deferred
from allmydata.util.consumer import MemoryConsumer
from allmydata.test.common_system import SystemTestMixin
from allmydata.immutable.upload import Data as UData
from allmydata.mutable.publish import MutableData
@contextmanager
def timeit(name):
start = time()
start_cpu = process_time()
try:
yield
finally:
print(
f"{name}: {time() - start:.3f} elapsed, {process_time() - start_cpu:.3f} CPU"
)
class ImmutableBenchmarks(SystemTestMixin, TestCase):
"""Benchmarks for immutables."""
# To use Foolscap, change to True:
FORCE_FOOLSCAP_FOR_STORAGE = False
# Don't reduce HTTP connection timeouts, that messes up the more aggressive
# benchmarks:
REDUCE_HTTP_CLIENT_TIMEOUT = False
@async_to_deferred
async def setUp(self):
SystemTestMixin.setUp(self)
self.basedir = os.path.join(mkdtemp(), "nodes")
# 2 nodes
await self.set_up_nodes(2)
# 1 share
for c in self.clients:
c.encoding_params["k"] = 1
c.encoding_params["happy"] = 1
c.encoding_params["n"] = 1
print()
@async_to_deferred
async def test_upload_and_download_immutable(self):
# To test larger files, change this:
DATA = b"Some data to upload\n" * 10
for i in range(5):
# 1. Upload:
with timeit(" upload"):
uploader = self.clients[0].getServiceNamed("uploader")
results = await uploader.upload(UData(DATA, convergence=None))
# 2. Download:
with timeit("download"):
uri = results.get_uri()
node = self.clients[1].create_node_from_uri(uri)
mc = await node.read(MemoryConsumer(), 0, None)
self.assertEqual(b"".join(mc.chunks), DATA)
@async_to_deferred
async def test_upload_and_download_mutable(self):
# To test larger files, change this:
DATA = b"Some data to upload\n" * 10
for i in range(5):
# 1. Upload:
with timeit(" upload"):
result = await self.clients[0].create_mutable_file(MutableData(DATA))
# 2. Download:
with timeit("download"):
data = await result.download_best_version()
self.assertEqual(data, DATA)
@async_to_deferred
async def test_upload_mutable_in_parallel(self):
# To test larger files, change this:
DATA = b"Some data to upload\n" * 1_000_000
with timeit(" upload"):
await gatherResults([
self.clients[0].create_mutable_file(MutableData(DATA))
for _ in range(20)
])


@@ -1,8 +1,7 @@
let
# sources.nix contains information about which versions of some of our
# dependencies we should use. since we use it to pin nixpkgs and the PyPI
# package database, roughly all the rest of our dependencies are *also*
# pinned - indirectly.
# dependencies we should use. since we use it to pin nixpkgs, all the rest
# of our dependencies are *also* pinned - indirectly.
#
# sources.nix is managed using a tool called `niv`. as an example, to
# update to the most recent version of nixpkgs from the 21.11 maintenance
@@ -10,93 +9,41 @@ let
#
# niv update nixpkgs-21.11
#
# or, to update the PyPI package database -- which is necessary to make any
# newly released packages visible -- you likewise run:
#
# niv update pypi-deps-db
#
# niv also supports choosing a specific revision, following a different
# branch, etc. find complete documentation for the tool at
# https://github.com/nmattia/niv
sources = import nix/sources.nix;
in
{
pkgsVersion ? "nixpkgs-21.11" # a string which chooses a nixpkgs from the
pkgsVersion ? "nixpkgs-22.11" # a string which chooses a nixpkgs from the
# niv-managed sources data
, pkgs ? import sources.${pkgsVersion} { } # nixpkgs itself
, pypiData ? sources.pypi-deps-db # the pypi package database snapshot to use
# for dependency resolution
, pythonVersion ? "python310" # a string choosing the python derivation from
# nixpkgs to target
, pythonVersion ? "python37" # a string choosing the python derivation from
# nixpkgs to target
, extrasNames ? [ "tor" "i2p" ] # a list of strings identifying tahoe-lafs extras,
# the dependencies of which the resulting
# package will also depend on. Include all of the
# runtime extras by default because the incremental
# cost of including them is a lot smaller than the
# cost of re-building the whole thing to add them.
, extras ? [ "tor" "i2p" ] # a list of strings identifying tahoe-lafs extras,
# the dependencies of which the resulting package
# will also depend on. Include all of the runtime
# extras by default because the incremental cost of
# including them is a lot smaller than the cost of
# re-building the whole thing to add them.
, mach-nix ? import sources.mach-nix { # the mach-nix package to use to build
# the tahoe-lafs package
inherit pkgs pypiData;
python = pythonVersion;
}
}:
# The project name, version, and most other metadata are automatically
# extracted from the source. Some requirements are not properly extracted
# and those cases are handled below. The version can only be extracted if
# `setup.py update_version` has been run (this is not at all ideal but it
# seems difficult to fix) - so for now just be sure to run that first.
mach-nix.buildPythonPackage rec {
# Define the location of the Tahoe-LAFS source to be packaged. Clean up
# as many of the non-source files (eg the `.git` directory, `~` backup
# files, nix's own `result` symlink, etc) as possible to avoid needing to
# re-build when files that make no difference to the package have changed.
src = pkgs.lib.cleanSource ./.;
with (pkgs.${pythonVersion}.override {
packageOverrides = import ./nix/python-overrides.nix;
}).pkgs;
callPackage ./nix/tahoe-lafs.nix {
# Select whichever package extras were requested.
inherit extras;
inherit extrasNames;
# Define some extra requirements that mach-nix does not automatically detect
# from inspection of the source. We typically don't need to put version
# constraints on any of these requirements. The pypi-deps-db we're
# operating with makes dependency resolution deterministic so as long as it
# works once it will always work. It could be that in the future we update
# pypi-deps-db and an incompatibility arises - in which case it would make
# sense to apply some version constraints here.
requirementsExtra = ''
# mach-nix does not yet support pyproject.toml which means it misses any
# build-time requirements of our dependencies which are declared in such a
# file. Tell it about them here.
setuptools_rust
# Define the location of the Tahoe-LAFS source to be packaged (the same
# directory as contains this file). Clean up as many of the non-source
# files (eg the `.git` directory, `~` backup files, nix's own `result`
# symlink, etc) as possible to avoid needing to re-build when files that
# make no difference to the package have changed.
tahoe-lafs-src = pkgs.lib.cleanSource ./.;
# mach-nix does not yet parse environment markers (e.g. "python > '3.0'")
# correctly. It misses all of our requirements which have an environment marker.
# Duplicate them here.
foolscap
eliot
pyrsistent
collections-extended
'';
# Specify where mach-nix should find packages for our Python dependencies.
# There are some reasonable defaults so we only need to specify certain
# packages where the default configuration runs into some issue.
providers = {
};
# Define certain overrides to the way Python dependencies are built.
_ = {
# Remove a click-default-group patch for a test suite problem which no
# longer applies because the project apparently no longer has a test suite
# in its source distribution.
click-default-group.patches = [];
};
passthru.meta.mach-nix = {
inherit providers _;
};
doCheck = false;
}


@@ -1,49 +0,0 @@
version: '2'
services:
client:
build:
context: .
dockerfile: ./Dockerfile.dev
volumes:
- ./misc:/root/tahoe-lafs/misc
- ./integration:/root/tahoe-lafs/integration
- ./src:/root/tahoe-lafs/static
- ./setup.cfg:/root/tahoe-lafs/setup.cfg
- ./setup.py:/root/tahoe-lafs/setup.py
ports:
- "127.0.0.1:3456:3456"
depends_on:
- "introducer"
- "server"
entrypoint: /root/tahoe-lafs/venv/bin/tahoe
command: ["run", "/root/.tahoe-client"]
server:
build:
context: .
dockerfile: ./Dockerfile.dev
volumes:
- ./misc:/root/tahoe-lafs/misc
- ./integration:/root/tahoe-lafs/integration
- ./src:/root/tahoe-lafs/static
- ./setup.cfg:/root/tahoe-lafs/setup.cfg
- ./setup.py:/root/tahoe-lafs/setup.py
ports:
- "127.0.0.1:3457:3457"
depends_on:
- "introducer"
entrypoint: /root/tahoe-lafs/venv/bin/tahoe
command: ["run", "/root/.tahoe-server"]
introducer:
build:
context: .
dockerfile: ./Dockerfile.dev
volumes:
- ./misc:/root/tahoe-lafs/misc
- ./integration:/root/tahoe-lafs/integration
- ./src:/root/tahoe-lafs/static
- ./setup.cfg:/root/tahoe-lafs/setup.cfg
- ./setup.py:/root/tahoe-lafs/setup.py
ports:
- "127.0.0.1:3458:3458"
entrypoint: /root/tahoe-lafs/venv/bin/tahoe
command: ["run", "/root/.tahoe-introducer"]

docs/check_running.py Normal file

@@ -0,0 +1,47 @@
import psutil
import filelock
def can_spawn_tahoe(pidfile):
"""
Determine if we can spawn a Tahoe-LAFS for the given pidfile. That
pidfile may be deleted if it is stale.
:param pathlib.Path pidfile: the file to check, that is the Path
to "running.process" in a Tahoe-LAFS configuration directory
:returns bool: True if we can spawn `tahoe run` here
"""
lockpath = pidfile.parent / (pidfile.name + ".lock")
with filelock.FileLock(lockpath):
try:
with pidfile.open("r") as f:
pid, create_time = f.read().strip().split(" ", 1)
except FileNotFoundError:
return True
# somewhat interesting: we have a pidfile
pid = int(pid)
create_time = float(create_time)
try:
proc = psutil.Process(pid)
# most interesting case: there _is_ a process running at the
# recorded PID -- but did it just happen to get that PID, or
# is it the very same one that wrote the file?
if create_time == proc.create_time():
# _not_ stale! another instance is still running against
# this configuration
return False
except psutil.NoSuchProcess:
pass
# the file is stale
pidfile.unlink()
return True
from pathlib import Path
print("can spawn?", can_spawn_tahoe(Path("running.process")))


@@ -980,6 +980,9 @@ the node will not use an Introducer at all.
Such "introducerless" clients must be configured with static servers (described
below), or they will not be able to upload and download files.
.. _server_list:
Static Server Definitions
=========================


@@ -32,6 +32,7 @@ Contents:
gpg-setup
servers
managed-grid
helper
convergence-secret
garbage-collection

docs/managed-grid.rst Normal file

@@ -0,0 +1,342 @@
Managed Grid
============
This document explains the "Grid Manager" concept and the
`grid-manager` command. Someone operating a grid may choose to use a
Grid Manager. Operators of storage-servers and clients will then need
some additional configuration.
Overview and Motivation
-----------------------
In a grid using an Introducer, a client will use any storage-server
the Introducer announces (and the Introducer will announce any
storage-server that connects to it). This means that anyone with the
Introducer fURL can connect storage to the grid.
Sometimes, this is just what you want!
For some use-cases, though, you want to have clients only use certain
servers. One case might be a "managed" grid, where some entity runs
the grid; clients of this grid don't want their uploads to go to
"unmanaged" storage if some other client decides to provide storage.
One way to limit which storage servers a client connects to is via the
"server list" (:ref:`server_list`) (aka "Introducerless"
mode). Clients are given static lists of storage-servers, and connect
only to those. However, this means manually updating these lists
whenever the storage servers change.
Another method is for clients to use the `[client] peers.preferred=`
configuration option (:ref:`Client Configuration`), which suffers
from a similar disadvantage.
Grid Manager
------------
A "grid-manager" consists of some data defining a keypair (along with
some other details) and Tahoe sub-commands to manipulate the data and
produce certificates to give to storage-servers. Certificates assert
the statement: "Grid Manager X suggests you use storage-server Y to
upload shares to" (X and Y are public-keys). Such a certificate
consists of:
- the version of the format the certificate conforms to (`1`)
- the public-key of a storage-server
- an expiry timestamp
- a signature of the above
A client will always use any storage-server for downloads (expired
certificate, or no certificate) because clients check the ciphertext
and re-assembled plaintext against the keys in the capability;
"grid-manager" certificates only control uploads.
Clients make use of this functionality by configuring one or more Grid Manager public keys.
This tells the client to only upload to storage-servers that have a currently-valid certificate from any of the Grid Managers their client allows.
In case none are configured, the default behavior (of using any storage server) prevails.
Grid Manager Data Storage
-------------------------
The data defining the grid-manager is stored in an arbitrary
directory, which you indicate with the ``--config`` option (in the
future, we may add the ability to store the data directly in a grid,
at which time you may be able to pass a directory-capability to this
option).
If you don't want to store the configuration on disk at all, you may
use ``--config -`` (the last character is a dash) and write a valid
JSON configuration to stdin.
All commands require the ``--config`` option and they all behave
similarly for "data from stdin" versus "data from disk". A directory
(and not a file) is used on disk because in that mode, each
certificate issued is also stored alongside the configuration
document; in "stdin / stdout" mode, an issued certificate is only
ever available on stdout.
The configuration is a JSON document. It is subject to change as Grid
Manager evolves. It contains a version number in the
`grid_manager_config_version` key which will increment whenever the
document schema changes.
grid-manager create
```````````````````
Create a new grid-manager.
If you specify ``--config -`` then a new grid-manager configuration is
written to stdout. Otherwise, a new grid-manager is created in the
directory specified by the ``--config`` option. It is an error if the
directory already exists.
grid-manager public-identity
````````````````````````````
Print out a grid-manager's public key. This key is derived from the
private-key of the grid-manager, so a valid grid-manager config must
be given via ``--config``.
This public key is what is put in clients' configuration to actually
validate and use grid-manager certificates.
grid-manager add
````````````````
Takes two args: ``name pubkey``. The ``name`` is an arbitrary local
identifier for the new storage node (also sometimes called "a petname"
or "nickname"). The pubkey is the tahoe-encoded key from a ``node.pubkey``
file in the storage-server's node directory (minus any
whitespace). For example, if ``~/storage0`` contains a storage-node,
you might do something like this::
grid-manager --config ./gm0 add storage0 $(cat ~/storage0/node.pubkey)
This adds a new storage-server to a Grid Manager's
configuration. (Since it mutates the configuration, if you used
``--config -`` the new configuration will be printed to stdout). The
usefulness of the ``name`` is solely for reference within this Grid
Manager.
grid-manager list
`````````````````
Lists all storage-servers that have previously been added using
``grid-manager add``.
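For example, using the same on-disk configuration as the ``grid-manager
add`` example above::
grid-manager --config ./gm0 list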
grid-manager sign
`````````````````
Takes two args: ``name expiry_days``. The ``name`` is a nickname used
previously in a ``grid-manager add`` command and ``expiry_days`` is
the number of days in the future when the certificate should expire.
Note that this mutates the state of the grid-manager if it is on disk,
by adding this certificate to our collection of issued
certificates. If you used ``--config -``, the certificate isn't
persisted anywhere except to stdout (so if you wish to keep it
somewhere, that is up to you).
This command creates a new "version 1" certificate for a
storage-server (identified by its public key). The new certificate is
printed to stdout. If you stored the config on disk, the new
certificate will (also) be in a file named like ``alice.cert.0``.
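For example, to issue a certificate for the ``storage0`` server added
earlier, expiring in 365 days::
grid-manager --config ./gm0 sign storage0 365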
Enrolling a Storage Server: CLI
-------------------------------
tahoe admin add-grid-manager-cert
`````````````````````````````````
- `--filename`: the file to read the cert from
- `--name`: the name of this certificate
Import a "version 1" storage-certificate produced by a grid-manager. A
storage server may have zero or more such certificates installed; for
now just one is sufficient. You will have to re-start your node after
this. Subsequent announcements to the Introducer will include this
certificate.
.. note::
This command will simply edit the `tahoe.cfg` file and direct you
to re-start. In the Future(tm), we should consider (in exarkun's
words):
"A python program you run as a new process" might not be the
best abstraction to layer on top of the configuration
persistence system, though. It's a nice abstraction for users
(although most users would probably rather have a GUI) but it's
not a great abstraction for automation. So at some point it
may be better if there is CLI -> public API -> configuration
persistence system. And maybe "public API" is even a network
API for the storage server so it's equally easy to access from
an agent implemented in essentially any language and maybe if
the API is exposed by the storage node itself then this also
gives you live-configuration-updates, avoiding the need for
node restarts (not that this is the only way to accomplish
this, but I think it's a good way because it avoids the need
for messes like inotify and it supports the notion that the
storage node process is in charge of its own configuration
persistence system, not just one consumer among many ... which
has some nice things going for it ... though how this interacts
exactly with further node management automation might bear
closer scrutiny).
Enrolling a Storage Server: Config
----------------------------------
You may edit the ``[storage]`` section of the ``tahoe.cfg`` file to
turn on grid-management with ``grid_management = true``. You must then
also provide a ``[grid_manager_certificates]`` section in the
config-file which lists ``name = path/to/certificate`` pairs.
These certificate files are issued by the ``grid-manager sign``
command; they should be transmitted to the storage server operator
who includes them in the config for the storage server. Relative paths
are interpreted relative to the node directory. Example::
[storage]
grid_management = true
[grid_manager_certificates]
default = example_grid.cert
This will cause the storage server to give this certificate to any
Introducers it connects to (and subsequently, the Introducer will give
the certificate out to clients).
Enrolling a Client: Config
--------------------------
You may instruct a Tahoe client to use only storage servers approved
by particular Grid Managers. If no Grid Manager keys are configured,
any server is used (but see
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3979). If one or more
keys are configured, the client will only upload to a storage server
that holds a valid certificate from one of those Grid Managers.
To specify public-keys, add a ``[grid_managers]`` section to the
config. This consists of ``name = value`` pairs where ``name`` is an
arbitrary name and ``value`` is a public-key of a Grid
Manager. Example::
[grid_managers]
example_grid = pub-v0-vqimc4s5eflwajttsofisp5st566dbq36xnpp4siz57ufdavpvlq
See also https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3507 which
proposes a command to edit the config.
Example Setup of a New Managed Grid
-----------------------------------
This example creates an actual grid, but it all runs on one machine
with different "node directories" and a separate tahoe process for
each node. In a real deployment, of course, each storage server would
be on a separate computer.
Note that we use the ``daemonize`` command in the following, but that
is only one way to run a command in the background. You could instead
run the commands that start with ``daemonize ...`` in their own
shell/terminal windows, or manage them with something like ``systemd``.
We'll store our Grid Manager configuration on disk, in
``./gm0``. To initialize this directory::
grid-manager --config ./gm0 create
(If you already have a grid, you can :ref:`skip ahead <skip_ahead>`.)
First of all, create an Introducer. Note that we actually have to run
it briefly, since it only creates the "Introducer fURL" we need for
the next steps once it has started::
tahoe create-introducer --listen=tcp --port=5555 --location=tcp:localhost:5555 ./introducer
daemonize tahoe -d introducer run
Next, we attach a couple of storage nodes::
tahoe create-node --introducer $(cat introducer/private/introducer.furl) --nickname storage0 --webport 6001 --location tcp:localhost:6003 --port 6003 ./storage0
tahoe create-node --introducer $(cat introducer/private/introducer.furl) --nickname storage1 --webport 6101 --location tcp:localhost:6103 --port 6103 ./storage1
daemonize tahoe -d storage0 run
daemonize tahoe -d storage1 run
.. _skip_ahead:
We can now tell the Grid Manager about our new storage servers::
grid-manager --config ./gm0 add storage0 $(cat storage0/node.pubkey)
grid-manager --config ./gm0 add storage1 $(cat storage1/node.pubkey)
To produce a new certificate for each node, we do this::
grid-manager --config ./gm0 sign storage0 > ./storage0/gridmanager.cert
grid-manager --config ./gm0 sign storage1 > ./storage1/gridmanager.cert
Now, we want our storage servers to actually announce these
certificates into the grid. We do this by adding some configuration
(in ``tahoe.cfg``)::
[storage]
grid_management = true
[grid_manager_certificates]
default = gridmanager.cert
Add the above configuration to each storage node's ``tahoe.cfg`` and
re-start the storage nodes. (Alternatively, use the ``tahoe admin
add-grid-manager-cert`` command described earlier.)
Now try adding a new storage server ``storage2``. This server can join
the grid just fine, and announce itself to the Introducer as providing
storage::
tahoe create-node --introducer $(cat introducer/private/introducer.furl) --nickname storage2 --webport 6301 --location tcp:localhost:6303 --port 6303 ./storage2
daemonize tahoe -d storage2 run
At this point any client will upload to any of these three
storage-servers. Make a client "alice" and try!
::
tahoe create-client --introducer $(cat introducer/private/introducer.furl) --nickname alice --webport 6401 --shares-total=3 --shares-needed=2 --shares-happy=3 ./alice
daemonize tahoe -d alice run
tahoe -d alice put README.rst # prints out a read-cap
find storage2/storage/shares # confirm storage2 has a share
Now we want to make Alice only upload to the storage servers that the
grid-manager has given certificates to (``storage0`` and
``storage1``). We need the grid-manager's public key to put in Alice's
configuration::
grid-manager --config ./gm0 public-identity
Put the key printed out above into Alice's ``tahoe.cfg``, in a
``[grid_managers]`` section::
[grid_managers]
example_name = pub-v0-vqimc4s5eflwajttsofisp5st566dbq36xnpp4siz57ufdavpvlq
Now, re-start the "alice" client. Since we made Alice's parameters
require 3 storage servers to be reachable (``--happy=3``), all their
uploads should now fail (so ``tahoe put`` will fail) because they
won't use storage2 and thus can't "achieve happiness".
A proposal to expose more information about Grid Manager and
certificate status in the Welcome page is discussed in
https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3506
View File
@ -82,8 +82,9 @@ network: A
memory footprint: N/K*A
notes: Tahoe-LAFS generates a new RSA keypair for each mutable file that it
publishes to a grid. This takes up to 1 or 2 seconds on a typical desktop PC.
notes:
Tahoe-LAFS generates a new RSA keypair for each mutable file that it publishes to a grid.
This takes around 100 milliseconds on a relatively high-end laptop from 2021.
Part of the process of encrypting, encoding, and uploading a mutable file to a
Tahoe-LAFS grid requires that the entire file be in memory at once. For larger
View File
@ -3,7 +3,7 @@
Storage Node Protocol ("Great Black Swamp", "GBS")
==================================================
The target audience for this document is Tahoe-LAFS developers.
The target audience for this document is developers working on Tahoe-LAFS or on an alternate implementation intended to be interoperable.
After reading this document,
one should expect to understand how Tahoe-LAFS clients interact over the network with Tahoe-LAFS storage nodes.
@ -30,12 +30,12 @@ Glossary
introducer
a Tahoe-LAFS process at a known location configured to re-publish announcements about the location of storage servers
fURL
:ref:`fURLs <fURLs>`
a self-authenticating URL-like string which can be used to locate a remote object using the Foolscap protocol
(the storage service is an example of such an object)
NURL
a self-authenticating URL-like string almost exactly like a NURL but without being tied to Foolscap
:ref:`NURLs <NURLs>`
a self-authenticating URL-like string almost exactly like a fURL but without being tied to Foolscap
swissnum
a short random string which is part of a fURL/NURL and which acts as a shared secret to authorize clients to use a storage service
@ -64,6 +64,10 @@ Glossary
lease renew secret
a short secret string which storage servers required to be presented before allowing a particular lease to be renewed
The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL"
in this document are to be interpreted as described in RFC 2119.
Motivation
----------
@ -119,8 +123,8 @@ An HTTP-based protocol can make use of TLS in largely the same way to provide th
Provision of these properties *is* dependent on implementers following Great Black Swamp's rules for x509 certificate validation
(rather than the standard "web" rules for validation).
Requirements
------------
Design Requirements
-------------------
Security
~~~~~~~~
@ -189,6 +193,9 @@ Solutions
An HTTP-based protocol, dubbed "Great Black Swamp" (or "GBS"), is described below.
This protocol aims to satisfy the above requirements at a lower level of complexity than the current Foolscap-based protocol.
Summary (Non-normative)
~~~~~~~~~~~~~~~~~~~~~~~
Communication with the storage node will take place using TLS.
The TLS version and configuration will be dictated by an ongoing understanding of best practices.
The storage node will present an x509 certificate during the TLS handshake.
@ -237,10 +244,10 @@ When Bob's client issues HTTP requests to Alice's storage node it includes the *
.. note::
Foolscap TubIDs are 20 bytes (SHA1 digest of the certificate).
They are encoded with Base32 for a length of 32 bytes.
They are encoded with `Base32`_ for a length of 32 bytes.
SPKI information discussed here is 32 bytes (SHA256 digest).
They would be encoded in Base32 for a length of 52 bytes.
`base64url`_ provides a more compact encoding of the information while remaining URL-compatible.
They would be encoded in `Base32`_ for a length of 52 bytes.
`unpadded base64url`_ provides a more compact encoding of the information while remaining URL-compatible.
This would encode the SPKI information for a length of merely 43 bytes.
SHA1,
the current Foolscap hash function,
@ -329,15 +336,117 @@ and shares.
A particular resource is addressed by the HTTP request path.
Details about the interface are encoded in the HTTP message body.
String Encoding
~~~~~~~~~~~~~~~
.. _Base32:
Base32
!!!!!!
Where the specification refers to Base32 the meaning is *unpadded* Base32 encoding as specified by `RFC 4648`_ using a *lowercase variation* of the alphabet from Section 6.
That is, the alphabet is:
.. list-table:: Base32 Alphabet
:header-rows: 1
* - Value
- Encoding
- Value
- Encoding
- Value
- Encoding
- Value
- Encoding
* - 0
- a
- 9
- j
- 18
- s
- 27
- 3
* - 1
- b
- 10
- k
- 19
- t
- 28
- 4
* - 2
- c
- 11
- l
- 20
- u
- 29
- 5
* - 3
- d
- 12
- m
- 21
- v
- 30
- 6
* - 4
- e
- 13
- n
- 22
- w
- 31
- 7
* - 5
- f
- 14
- o
- 23
- x
-
-
* - 6
- g
- 15
- p
- 24
- y
-
-
* - 7
- h
- 16
- q
- 25
- z
-
-
* - 8
- i
- 17
- r
- 26
- 2
-
-
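As a non-normative illustration, this encoding matches what the following Python sketch (standard library only) produces::
from base64 import b32encode
def tahoe_base32(data: bytes) -> str:
    # RFC 4648 Base32, lowercased, with the "=" padding stripped.
    return b32encode(data).decode("ascii").lower().rstrip("=")
For example, ``tahoe_base32(b"\x00")`` is ``"aa"``.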
Message Encoding
~~~~~~~~~~~~~~~~
The preferred encoding for HTTP message bodies is `CBOR`_.
A request may be submitted using an alternate encoding by declaring this in the ``Content-Type`` header.
A request may indicate its preference for an alternate encoding in the response using the ``Accept`` header.
These two headers are used in the typical way for an HTTP application.
Clients and servers MUST use the ``Content-Type`` and ``Accept`` header fields as specified in `RFC 9110`_ for message body negotiation.
The only other encoding support for which is currently recommended is JSON.
The encoding for HTTP message bodies SHOULD be `CBOR`_.
Clients submitting requests using this encoding MUST include a ``Content-Type: application/cbor`` request header field.
A request MAY be submitted using an alternate encoding by declaring this in the ``Content-Type`` header field.
A request MAY indicate its preference for an alternate encoding in the response using the ``Accept`` header field.
A request which includes no ``Accept`` header field MUST be interpreted in the same way as a request including an ``Accept: application/cbor`` header field.
Clients and servers MAY support additional request and response message body encodings.
Clients and servers SHOULD support ``application/json`` request and response message body encoding.
For HTTP messages carrying binary share data,
this is expected to be a particularly poor encoding.
However,
@ -350,10 +459,23 @@ Because of the simple types used throughout
and the equivalence described in `RFC 7049`_
these examples should be representative regardless of which of these two encodings is chosen.
The one exception is sets.
For CBOR messages, any sequence that is semantically a set (i.e. no repeated values allowed, order doesn't matter, and elements are hashable in Python) should be sent as a set.
Tag 6.258 is used to indicate sets in CBOR; see `the CBOR registry <https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>`_ for more details.
Sets will be represented as JSON lists in examples because JSON doesn't support sets.
There are two exceptions to this rule.
1. Sets
!!!!!!!
For CBOR messages,
any sequence that is semantically a set (i.e. no repeated values allowed, order doesn't matter, and elements are hashable in Python) should be sent as a set.
Tag 6.258 is used to indicate sets in CBOR;
see `the CBOR registry <https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml>`_ for more details.
The JSON encoding does not support sets.
Sets MUST be represented as arrays in JSON-encoded messages.
2. Bytes
!!!!!!!!
The CBOR encoding natively supports a bytes type while the JSON encoding does not.
In JSON-encoded messages, bytes MUST be represented as strings giving the `Base64`_ representation of the original bytes value (both rules are illustrated in the sketch below).
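A non-normative sketch of these two rules, assuming the third-party ``cbor2`` library for the CBOR side and the standard library for JSON::
from base64 import b64encode
import json
import cbor2
share_numbers = {1, 5}          # semantically a set
share_data = b"\x00\x01\x02"    # binary share data
# CBOR: cbor2 encodes a Python set with tag 6.258 and bytes natively.
cbor_body = cbor2.dumps({"shares": share_numbers, "data": share_data})
# JSON: sets become arrays and bytes become Base64 strings.
json_body = json.dumps({
    "shares": sorted(share_numbers),
    "data": b64encode(share_data).decode("ascii"),
})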
HTTP Design
~~~~~~~~~~~
@ -368,55 +490,81 @@ one branch contains all of the share data;
another branch contains all of the lease data;
etc.
An ``Authorization`` header in requests is required for all endpoints.
The standard HTTP authorization protocol is used.
The authentication *type* used is ``Tahoe-LAFS``.
The swissnum from the NURL used to locate the storage service is used as the *credentials*.
If credentials are not presented or the swissnum is not associated with a storage service then no storage processing is performed and the request receives an ``401 UNAUTHORIZED`` response.
Clients and servers MUST use the ``Authorization`` header field,
as specified in `RFC 9110`_,
for authorization of all requests to all endpoints specified here.
The authentication *type* MUST be ``Tahoe-LAFS``.
Clients MUST present the `Base64`_-encoded representation of the swissnum from the NURL used to locate the storage service as the *credentials*.
There are also, for some endpoints, secrets sent via ``X-Tahoe-Authorization`` headers.
If these are:
If credentials are not presented or the swissnum is not associated with a storage service then the server MUST issue a ``401 UNAUTHORIZED`` response and perform no other processing of the message.
Requests to certain endpoints MUST include additional secrets in the ``X-Tahoe-Authorization`` headers field.
The endpoints which require these secrets are:
* ``PUT /storage/v1/lease/:storage_index``:
The secrets included MUST be ``lease-renew-secret`` and ``lease-cancel-secret``.
* ``POST /storage/v1/immutable/:storage_index``:
The secrets included MUST be ``lease-renew-secret``, ``lease-cancel-secret``, and ``upload-secret``.
* ``PATCH /storage/v1/immutable/:storage_index/:share_number``:
The secrets included MUST be ``upload-secret``.
* ``PUT /storage/v1/immutable/:storage_index/:share_number/abort``:
The secrets included MUST be ``upload-secret``.
* ``POST /storage/v1/mutable/:storage_index/read-test-write``:
The secrets included MUST be ``lease-renew-secret``, ``lease-cancel-secret``, and ``write-enabler``.
If these secrets are:
1. Missing.
2. The wrong length.
3. Not the expected kind of secret.
4. They are otherwise unparseable before they are actually semantically used.
the server will respond with ``400 BAD REQUEST``.
the server MUST respond with ``400 BAD REQUEST`` and perform no other processing of the message.
401 is not used because this isn't an authorization problem; it's a "you sent garbage and should know better" bug.
If authorization using the secret fails, then a ``401 UNAUTHORIZED`` response should be sent.
If authorization using the secret fails,
then the server MUST send a ``401 UNAUTHORIZED`` response and perform no other processing of the message.
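As a non-normative sketch, a client might assemble these header fields like this (the helper name and argument types are illustrative, not part of the protocol)::
from base64 import b64encode
def gbs_auth_headers(swissnum: bytes, secrets: dict[str, bytes]) -> list[tuple[str, str]]:
    # One Authorization field, plus one X-Tahoe-Authorization field per
    # secret, e.g. {"lease-renew-secret": b"...", "upload-secret": b"..."}.
    headers = [("Authorization", "Tahoe-LAFS " + b64encode(swissnum).decode("ascii"))]
    for kind, value in secrets.items():
        headers.append(("X-Tahoe-Authorization", kind + " " + b64encode(value).decode("ascii")))
    return headers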
Encoding
~~~~~~~~
* ``storage_index`` should be base32 encoded (RFC3548) in URLs.
* ``storage_index`` MUST be `Base32`_ encoded in URLs.
* ``share_number`` MUST be a decimal representation
General
~~~~~~~
``GET /v1/version``
!!!!!!!!!!!!!!!!!!!
``GET /storage/v1/version``
!!!!!!!!!!!!!!!!!!!!!!!!!!!
Retrieve information about the version of the storage server.
Information is returned as an encoded mapping.
For example::
This endpoint allows clients to retrieve some basic metadata about a storage server from the storage service.
The response MUST validate against this CDDL schema::
{ "http://allmydata.org/tahoe/protocols/storage/v1" :
{ "maximum-immutable-share-size": 1234,
"maximum-mutable-share-size": 1235,
"available-space": 123456,
"tolerates-immutable-read-overrun": true,
"delete-mutable-shares-with-zero-length-writev": true,
"fills-holes-with-zero-bytes": true,
"prevents-read-past-end-of-share-data": true,
"gbs-anonymous-storage-url": "pb://...#v=1"
},
"application-version": "1.13.0"
}
{'http://allmydata.org/tahoe/protocols/storage/v1' => {
'maximum-immutable-share-size' => uint
'maximum-mutable-share-size' => uint
'available-space' => uint
}
'application-version' => bstr
}
``PUT /v1/lease/:storage_index``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
The server SHOULD populate as many fields as possible with accurate information about its behavior.
For fields which relate to a specific API
the semantics are documented below in the section for that API.
For fields that are more general than a single API the semantics are as follows:
* available-space:
The server SHOULD use this field to advertise the amount of space that it currently considers unused and is willing to allocate for client requests.
The value is a number of bytes.
``PUT /storage/v1/lease/:storage_index``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Either renew or create a new lease on the bucket addressed by ``storage_index``.
@ -468,25 +616,41 @@ Immutable
Writing
~~~~~~~
``POST /v1/immutable/:storage_index``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
``POST /storage/v1/immutable/:storage_index``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Initialize an immutable storage index with some buckets.
The buckets may have share data written to them once.
A lease is also created for the shares.
The server MUST allow share data to be written to the buckets at most one time.
The server MAY create a lease for the buckets.
Details of the buckets to create are encoded in the request body.
The request body MUST validate against this CDDL schema::
{
share-numbers: #6.258([0*256 uint])
allocated-size: uint
}
For example::
{"share-numbers": [1, 7, ...], "allocated-size": 12345}
The request must include ``X-Tahoe-Authorization`` HTTP headers that set the various secrets—upload, lease renewal, lease cancellation—that will be later used to authorize various operations.
The server SHOULD accept a value for **allocated-size** that is less than or equal to the lesser of the server's version message's **maximum-immutable-share-size** and **available-space** values.
The request MUST include ``X-Tahoe-Authorization`` HTTP headers that set the various secrets—upload, lease renewal, lease cancellation—that will be later used to authorize various operations.
For example::
X-Tahoe-Authorization: lease-renew-secret <base64-lease-renew-secret>
X-Tahoe-Authorization: lease-cancel-secret <base64-lease-cancel-secret>
X-Tahoe-Authorization: upload-secret <base64-upload-secret>
The response body includes encoded information about the created buckets.
The response body MUST include encoded information about the created buckets.
The response body MUST validate against this CDDL schema::
{
already-have: #6.258([0*256 uint])
allocated: #6.258([0*256 uint])
}
For example::
{"already-have": [1, ...], "allocated": [7, ...]}
@ -504,7 +668,7 @@ Handling repeat calls:
Discussion
``````````
We considered making this ``POST /v1/immutable`` instead.
We considered making this ``POST /storage/v1/immutable`` instead.
The motivation was to keep *storage index* out of the request URL.
Request URLs have an elevated chance of being logged by something.
We were concerned that having the *storage index* logged may increase some risks.
@ -539,31 +703,39 @@ Rejected designs for upload secrets:
it must contain randomness.
Randomness means there is no need to have a secret per share, since adding share-specific content to randomness doesn't actually make the secret any better.
``PATCH /v1/immutable/:storage_index/:share_number``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
``PATCH /storage/v1/immutable/:storage_index/:share_number``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Write data for the indicated share.
The share number must belong to the storage index.
The request body is the raw share data (i.e., ``application/octet-stream``).
*Content-Range* requests are required; for large transfers this allows partially complete uploads to be resumed.
The share number MUST belong to the storage index.
The request body MUST be the raw share data (i.e., ``application/octet-stream``).
The request MUST include a *Content-Range* header field;
for large transfers this allows partially complete uploads to be resumed.
For example,
a 1MiB share can be divided in to eight separate 128KiB chunks.
Each chunk can be uploaded in a separate request.
Each request can include a *Content-Range* value indicating its placement within the complete share.
If any one of these requests fails then at most 128KiB of upload work needs to be retried.
The server must recognize when all of the data has been received and mark the share as complete
The server MUST recognize when all of the data has been received and mark the share as complete
(which it can do because it was informed of the size when the storage index was initialized).
The request must include a ``X-Tahoe-Authorization`` header that includes the upload secret::
The request MUST include a ``X-Tahoe-Authorization`` header that includes the upload secret::
X-Tahoe-Authorization: upload-secret <base64-upload-secret>
Responses:
* When a chunk that does not complete the share is successfully uploaded the response is ``OK``.
The response body indicates the range of share data that has yet to be uploaded.
That is::
* When a chunk that does not complete the share is successfully uploaded the response MUST be ``OK``.
The response body MUST indicate the range of share data that has yet to be uploaded.
The response body MUST validate against this CDDL schema::
{
required: [0* {begin: uint, end: uint}]
}
For example::
{ "required":
[ { "begin": <byte position, inclusive>
@ -574,29 +746,12 @@ Responses:
]
}
* When the chunk that completes the share is successfully uploaded the response is ``CREATED``.
* When the chunk that completes the share is successfully uploaded the response MUST be ``CREATED``.
* If the *Content-Range* for a request covers part of the share that has already been uploaded,
and the data does not match the already-written data,
the response is ``CONFLICT``.
At this point the only thing to do is abort the upload and start from scratch (see below).
``PUT /v1/immutable/:storage_index/:share_number/abort``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This cancels an *in-progress* upload.
The request must include a ``X-Tahoe-Authorization`` header that includes the upload secret::
X-Tahoe-Authorization: upload-secret <base64-upload-secret>
The response code:
* When the upload is still in progress and therefore the abort has succeeded,
the response is ``OK``.
Future uploads can start from scratch with no pre-existing upload state stored on the server.
* If the uploaded has already finished, the response is 405 (Method Not Allowed)
and no change is made.
the response MUST be ``CONFLICT``.
In this case the client MUST abort the upload.
The client MAY then restart the upload from scratch.
Discussion
``````````
@ -616,48 +771,85 @@ From RFC 7231::
PATCH method defined in [RFC5789]).
``POST /v1/immutable/:storage_index/:share_number/corrupt``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Advise the server the data read from the indicated share was corrupt. The
request body includes an human-meaningful text string with details about the
corruption. It also includes potentially important details about the share.
``PUT /storage/v1/immutable/:storage_index/:share_number/abort``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
This cancels an *in-progress* upload.
The request MUST include a ``X-Tahoe-Authorization`` header that includes the upload secret::
X-Tahoe-Authorization: upload-secret <base64-upload-secret>
If there is an incomplete upload with a matching upload-secret then the server MUST consider the abort to have succeeded.
In this case the response MUST be ``OK``.
The server MUST respond to all future requests as if the operations related to this upload did not take place.
If there is no incomplete upload with a matching upload-secret then the server MUST respond with ``Method Not Allowed`` (405).
The server MUST make no client-visible changes to its state in this case.
``POST /storage/v1/immutable/:storage_index/:share_number/corrupt``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Advise the server the data read from the indicated share was corrupt.
The request body includes a human-meaningful text string with details about the corruption.
It also includes potentially important details about the share.
The request body MUST validate against this CDDL schema::
{
reason: tstr .size (1..32765)
}
For example::
{"reason": u"expected hash abcd, got hash efgh"}
{"reason": "expected hash abcd, got hash efgh"}
.. share-type, storage-index, and share-number are inferred from the URL
The report pertains to the immutable share with a **storage index** and **share number** given in the request path.
If the identified **storage index** and **share number** are known to the server then the report SHOULD be accepted and made available to server administrators.
In this case the response SHOULD be ``OK``.
If the report is not accepted then the response SHOULD be ``Not Found`` (404).
The response code is OK (200) by default, or NOT FOUND (404) if the share
couldn't be found.
Discussion
``````````
The seemingly odd length limit on ``reason`` is chosen so that the *encoded* representation of the message is limited to 32768.
Reading
~~~~~~~
``GET /v1/immutable/:storage_index/shares``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
``GET /storage/v1/immutable/:storage_index/shares``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Retrieve a list (semantically, a set) indicating all shares available for the
indicated storage index. For example::
Retrieve a list (semantically, a set) indicating all shares available for the indicated storage index.
The response body MUST validate against this CDDL schema::
#6.258([0*256 uint])
For example::
[1, 5]
An unknown storage index results in an empty list.
If the **storage index** in the request path is not known to the server then the response MUST include an empty list.
``GET /v1/immutable/:storage_index/:share_number``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
``GET /storage/v1/immutable/:storage_index/:share_number``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Read a contiguous sequence of bytes from one share in one bucket.
The response body is the raw share data (i.e., ``application/octet-stream``).
The ``Range`` header may be used to request exactly one ``bytes`` range, in which case the response code will be 206 (partial content).
Interpretation and response behavior is as specified in RFC 7233 § 4.1.
Multiple ranges in a single request are *not* supported; open-ended ranges are also not supported.
The response body MUST be the raw share data (i.e., ``application/octet-stream``).
The ``Range`` header MAY be used to request exactly one ``bytes`` range,
in which case the response code MUST be ``Partial Content`` (206).
Interpretation and response behavior MUST be as specified in RFC 7233 § 4.1.
Multiple ranges in a single request are *not* supported;
open-ended ranges are also not supported.
Clients MUST NOT send requests using these features.
If the response reads beyond the end of the data, the response may be shorter than the requested range.
The resulting ``Content-Range`` header will be consistent with the returned data.
If the response reads beyond the end of the data,
the response MUST be shorter than the requested range.
It MUST contain all data up to the end of the share and then end.
The resulting ``Content-Range`` header MUST be consistent with the returned data.
If the response to a query is an empty range, the ``NO CONTENT`` (204) response code will be used.
If the response to a query is an empty range,
the server MUST send a ``No Content`` (204) response.
Discussion
``````````
@ -686,8 +878,8 @@ Mutable
Writing
~~~~~~~
``POST /v1/mutable/:storage_index/read-test-write``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
``POST /storage/v1/mutable/:storage_index/read-test-write``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
General purpose read-test-and-write operation for mutable storage indexes.
A mutable storage index is also called a "slot"
@ -696,13 +888,27 @@ The first write operation on a mutable storage index creates it
(that is,
there is no separate "create this storage index" operation as there is for the immutable storage index type).
The request must include ``X-Tahoe-Authorization`` headers with write enabler and lease secrets::
The request MUST include ``X-Tahoe-Authorization`` headers with write enabler and lease secrets::
X-Tahoe-Authorization: write-enabler <base64-write-enabler-secret>
X-Tahoe-Authorization: lease-cancel-secret <base64-lease-cancel-secret>
X-Tahoe-Authorization: lease-renew-secret <base64-lease-renew-secret>
The request body includes test, read, and write vectors for the operation.
The request body MUST include test, read, and write vectors for the operation.
The request body MUST validate against this CDDL schema::
{
"test-write-vectors": {
0*256 share_number : {
"test": [0*30 {"offset": uint, "size": uint, "specimen": bstr}]
"write": [* {"offset": uint, "data": bstr}]
"new-length": uint / null
}
}
"read-vector": [0*30 {"offset": uint, "size": uint}]
}
share_number = uint
For example::
{
@ -725,6 +931,14 @@ For example::
The response body contains a boolean indicating whether the tests all succeed
(and writes were applied) and a mapping giving read data (pre-write).
The response body MUST validate against this CDDL schema::
{
"success": bool,
"data": {0*256 share_number: [0* bstr]}
}
share_number = uint
For example::
{
@ -736,37 +950,57 @@ For example::
}
}
A test vector or read vector that read beyond the boundaries of existing data will return nothing for any bytes past the end.
As a result, if there is no data at all, an empty bytestring is returned no matter what the offset or length.
A client MAY send a test vector or read vector to bytes beyond the end of existing data.
In this case a server MUST behave as if the test or read vector referred to exactly as much data as exists.
For example,
consider the case where the server has 5 bytes of data for a particular share.
If a client sends a read vector with an ``offset`` of 1 and a ``size`` of 4 then the server MUST respond with all of the data except the first byte.
If a client sends a read vector with the same ``offset`` and a ``size`` of 5 (or any larger value) then the server MUST respond in the same way.
Similarly,
if there is no data at all,
an empty byte string is returned no matter what the offset or length.
Reading
~~~~~~~
``GET /v1/mutable/:storage_index/shares``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
``GET /storage/v1/mutable/:storage_index/shares``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Retrieve a set indicating all shares available for the indicated storage index.
For example (this is shown as list, since it will be list for JSON, but will be set for CBOR)::
The response body MUST validate against this CDDL schema::
#6.258([0*256 uint])
For example::
[1, 5]
``GET /v1/mutable/:storage_index/:share_number``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
``GET /storage/v1/mutable/:storage_index/:share_number``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Read data from the indicated mutable shares, just like ``GET /v1/immutable/:storage_index``
Read data from the indicated mutable shares, just like ``GET /storage/v1/immutable/:storage_index``.
The ``Range`` header may be used to request exactly one ``bytes`` range, in which case the response code will be 206 (partial content).
Interpretation and response behavior is as specified in RFC 7233 § 4.1.
Multiple ranges in a single request are *not* supported; open-ended ranges are also not supported.
The response body MUST be the raw share data (i.e., ``application/octet-stream``).
The ``Range`` header MAY be used to request exactly one ``bytes`` range,
in which case the response code MUST be ``Partial Content`` (206).
Interpretation and response behavior MUST be as specified in RFC 7233 § 4.1.
Multiple ranges in a single request are *not* supported;
open-ended ranges are also not supported.
Clients MUST NOT send requests using these features.
If the response reads beyond the end of the data, the response may be shorter than the requested range.
The resulting ``Content-Range`` header will be consistent with the returned data.
If the response reads beyond the end of the data,
the response MUST be shorter than the requested range.
It MUST contain all data up to the end of the share and then end.
The resulting ``Content-Range`` header MUST be consistent with the returned data.
If the response to a query is an empty range, the ``NO CONTENT`` (204) response code will be used.
If the response to a query is an empty range,
the server MUST send a ``No Content`` (204) response.
``POST /v1/mutable/:storage_index/:share_number/corrupt``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
``POST /storage/v1/mutable/:storage_index/:share_number/corrupt``
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Advise the server the data read from the indicated share was corrupt.
Just like the immutable version.
@ -774,12 +1008,15 @@ Just like the immutable version.
Sample Interactions
-------------------
This section contains examples of client/server interactions to help illuminate the above specification.
This section is non-normative.
Immutable Data
~~~~~~~~~~~~~~
1. Create a bucket for storage index ``AAAAAAAAAAAAAAAA`` to hold two immutable shares, discovering that share ``1`` was already uploaded::
POST /v1/immutable/AAAAAAAAAAAAAAAA
POST /storage/v1/immutable/AAAAAAAAAAAAAAAA
Authorization: Tahoe-LAFS nurl-swissnum
X-Tahoe-Authorization: lease-renew-secret efgh
X-Tahoe-Authorization: lease-cancel-secret jjkl
@ -792,23 +1029,25 @@ Immutable Data
#. Upload the content for immutable share ``7``::
PATCH /v1/immutable/AAAAAAAAAAAAAAAA/7
PATCH /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
Authorization: Tahoe-LAFS nurl-swissnum
Content-Range: bytes 0-15/48
X-Tahoe-Authorization: upload-secret xyzf
<first 16 bytes of share data>
200 OK
{ "required": [ {"begin": 16, "end": 48 } ] }
PATCH /v1/immutable/AAAAAAAAAAAAAAAA/7
PATCH /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
Authorization: Tahoe-LAFS nurl-swissnum
Content-Range: bytes 16-31/48
X-Tahoe-Authorization: upload-secret xyzf
<second 16 bytes of share data>
200 OK
{ "required": [ {"begin": 32, "end": 48 } ] }
PATCH /v1/immutable/AAAAAAAAAAAAAAAA/7
PATCH /storage/v1/immutable/AAAAAAAAAAAAAAAA/7
Authorization: Tahoe-LAFS nurl-swissnum
Content-Range: bytes 32-47/48
X-Tahoe-Authorization: upload-secret xyzf
@ -818,16 +1057,17 @@ Immutable Data
#. Download the content of the previously uploaded immutable share ``7``::
GET /v1/immutable/AAAAAAAAAAAAAAAA?share=7
GET /storage/v1/immutable/AAAAAAAAAAAAAAAA?share=7
Authorization: Tahoe-LAFS nurl-swissnum
Range: bytes=0-47
200 OK
Content-Range: bytes 0-47/48
<complete 48 bytes of previously uploaded data>
#. Renew the lease on all immutable shares in bucket ``AAAAAAAAAAAAAAAA``::
PUT /v1/lease/AAAAAAAAAAAAAAAA
PUT /storage/v1/lease/AAAAAAAAAAAAAAAA
Authorization: Tahoe-LAFS nurl-swissnum
X-Tahoe-Authorization: lease-cancel-secret jjkl
X-Tahoe-Authorization: lease-renew-secret efgh
@ -842,7 +1082,7 @@ The special test vector of size 1 but empty bytes will only pass
if there is no existing share,
otherwise it will read a byte which won't match `b""`::
POST /v1/mutable/BBBBBBBBBBBBBBBB/read-test-write
POST /storage/v1/mutable/BBBBBBBBBBBBBBBB/read-test-write
Authorization: Tahoe-LAFS nurl-swissnum
X-Tahoe-Authorization: write-enabler abcd
X-Tahoe-Authorization: lease-cancel-secret efgh
@ -874,7 +1114,7 @@ otherwise it will read a byte which won't match `b""`::
#. Safely rewrite the contents of a known version of mutable share number ``3`` (or fail)::
POST /v1/mutable/BBBBBBBBBBBBBBBB/read-test-write
POST /storage/v1/mutable/BBBBBBBBBBBBBBBB/read-test-write
Authorization: Tahoe-LAFS nurl-swissnum
X-Tahoe-Authorization: write-enabler abcd
X-Tahoe-Authorization: lease-cancel-secret efgh
@ -906,24 +1146,33 @@ otherwise it will read a byte which won't match `b""`::
#. Download the contents of share number ``3``::
GET /v1/mutable/BBBBBBBBBBBBBBBB?share=3&offset=0&size=10
GET /storage/v1/mutable/BBBBBBBBBBBBBBBB?share=3
Authorization: Tahoe-LAFS nurl-swissnum
Range: bytes=0-16
200 OK
Content-Range: bytes 0-15/16
<complete 16 bytes of previously uploaded data>
#. Renew the lease on previously uploaded mutable share in slot ``BBBBBBBBBBBBBBBB``::
PUT /v1/lease/BBBBBBBBBBBBBBBB
PUT /storage/v1/lease/BBBBBBBBBBBBBBBB
Authorization: Tahoe-LAFS nurl-swissnum
X-Tahoe-Authorization: lease-cancel-secret efgh
X-Tahoe-Authorization: lease-renew-secret ijkl
204 NO CONTENT
.. _Base64: https://www.rfc-editor.org/rfc/rfc4648#section-4
.. _RFC 4648: https://tools.ietf.org/html/rfc4648
.. _RFC 7469: https://tools.ietf.org/html/rfc7469#section-2.4
.. _RFC 7049: https://tools.ietf.org/html/rfc7049#section-4
.. _RFC 9110: https://tools.ietf.org/html/rfc9110
.. _CBOR: http://cbor.io/
.. [#]
@ -968,7 +1217,7 @@ otherwise it will read a byte which won't match `b""`::
spki_encoded = urlsafe_b64encode(spki_sha256)
assert spki_encoded == tub_id
Note we use `base64url`_ rather than the Foolscap- and Tahoe-LAFS-preferred Base32.
Note we use `unpadded base64url`_ rather than the Foolscap- and Tahoe-LAFS-preferred Base32.
.. [#]
https://www.cvedetails.com/cve/CVE-2017-5638/
@ -979,6 +1228,6 @@ otherwise it will read a byte which won't match `b""`::
.. [#]
https://efail.de/
.. _base64url: https://tools.ietf.org/html/rfc7515#appendix-C
.. _unpadded base64url: https://tools.ietf.org/html/rfc7515#appendix-C
.. _attacking SHA1: https://en.wikipedia.org/wiki/SHA-1#Attacks
View File
@ -124,6 +124,35 @@ Tahoe-LAFS.
.. _magic wormhole: https://magic-wormhole.io/
Multiple Instances
------------------
Running multiple instances against the same configuration directory isn't supported.
This will lead to undefined behavior and could corrupt the configuration or state.
We attempt to avoid this situation with a "pidfile"-style file in the config directory called ``running.process``.
There may be a parallel file called ``running.process.lock`` in existence.
The ``.lock`` file exists to make sure only one process modifies ``running.process`` at once.
The lock file is managed by the `lockfile <https://pypi.org/project/lockfile/>`_ library.
If you wish to make use of ``running.process`` for any reason you should also lock it and follow the semantics of lockfile.
If ``running.process`` exists then it contains the PID and the creation-time of the process.
When no such file exists, there is no other process running on this configuration.
If there is a ``running.process`` file, it may be a leftover file or it may indicate that another process is running against this config.
To tell the difference, determine if the PID in the file exists currently.
If it does, check the creation-time of the process versus the one in the file.
If these match, there is another process currently running and using this config.
Otherwise, the file is stale -- it should be removed before starting Tahoe-LAFS.
Some example Python code to check the above situations:
.. literalinclude:: check_running.py
A note about small grids
------------------------
View File
@ -267,7 +267,7 @@ How well does this design meet the goals?
value, so there are no opportunities for staleness
9. monotonicity: VERY: the single point of access also protects against
retrograde motion
Confidentiality leaks in the storage servers
@ -332,8 +332,9 @@ MDMF design rules allow for efficient random-access reads from the middle of
the file, which would give the index something useful to point at.
The current SDMF design generates a new RSA public/private keypair for each
directory. This takes considerable time and CPU effort, generally one or two
seconds per directory. We have designed (but not yet built) a DSA-based
directory. This takes some time and CPU effort (around 100 milliseconds on a
relatively high-end 2021 laptop) per directory.
We have designed (but not yet built) a DSA-based
mutable file scheme which will use shared parameters to reduce the
directory-creation effort to a bare minimum (picking a random number instead
of generating two random primes).
@ -363,7 +364,7 @@ single child, looking up a single child) would require pulling or pushing a
lot of unrelated data, increasing network overhead (and necessitating
test-and-set semantics for the modification side, which increases the chances
that a user operation will fail, making it more challenging to provide
promises of atomicity to the user).
promises of atomicity to the user).
It would also make it much more difficult to enable the delegation
("sharing") of specific directories. Since each aggregate "realm" provides
@ -469,4 +470,3 @@ Preventing delegation between communication parties is just as pointless as
asking Bob to forget previously accessed files. However, there may be value
to configuring the UI to ask Carol to not share files with Bob, or to
removing all files from Bob's view at the same time his access is revoked.
View File
@ -7,6 +7,8 @@ These are not to be confused with the URI-like capabilities Tahoe-LAFS uses to r
An attempt is also made to outline the rationale for certain choices about these URLs.
The intended audience for this document is Tahoe-LAFS maintainers and other developers interested in interoperating with Tahoe-LAFS or these URLs.
.. _furls:
Background
----------
@ -31,6 +33,8 @@ The client's use of the swissnum is what allows the server to authorize the clie
.. _`swiss number`: http://wiki.erights.org/wiki/Swiss_number
.. _NURLs:
NURLs
-----
@ -47,27 +51,27 @@ This can be considered to expand to "**N**\ ew URLs" or "Authe\ **N**\ ticating
The anticipated use for a **NURL** will still be to establish a TLS connection to a peer.
The protocol run over that TLS connection could be Foolscap though it is more likely to be an HTTP-based protocol (such as GBS).
Unlike fURLs, only a single net-loc is included, for consistency with other forms of URLs.
As a result, multiple NURLs may be available for a single server.
Syntax
------
The EBNF for a NURL is as follows::
nurl = scheme, hash, "@", net-loc-list, "/", swiss-number, [ version1 ]
scheme = "pb://"
nurl = tcp-nurl | tor-nurl | i2p-nurl
tcp-nurl = "pb://", hash, "@", tcp-loc, "/", swiss-number, [ version1 ]
tor-nurl = "pb+tor://", hash, "@", tcp-loc, "/", swiss-number, [ version1 ]
i2p-nurl = "pb+i2p://", hash, "@", i2p-loc, "/", swiss-number, [ version1 ]
hash = unreserved
net-loc-list = net-loc, [ { ",", net-loc } ]
net-loc = tcp-loc | tor-loc | i2p-loc
tcp-loc = [ "tcp:" ], hostname, [ ":" port ]
tor-loc = "tor:", hostname, [ ":" port ]
i2p-loc = "i2p:", i2p-addr, [ ":" port ]
i2p-addr = { unreserved }, ".i2p"
tcp-loc = hostname, [ ":" port ]
hostname = domain | IPv4address | IPv6address
i2p-loc = i2p-addr, [ ":" port ]
i2p-addr = { unreserved }, ".i2p"
swiss-number = segment
version1 = "#v=1"
@ -87,11 +91,13 @@ These differences are separated into distinct versions.
Version 0
---------
A Foolscap fURL is considered the canonical definition of a version 0 NURL.
In theory, a Foolscap fURL with a single netloc is considered the canonical definition of a version 0 NURL.
Notably,
the hash component is defined as the base32-encoded SHA1 hash of the DER form of an x509v3 certificate.
A version 0 NURL is identified by the absence of the ``v=1`` fragment.
In practice, real-world fURLs may have more than one netloc, so the lack of a version fragment will likely just mean dispatching the fURL to a different parser.
Examples
~~~~~~~~
@ -103,11 +109,8 @@ Version 1
The hash component of a version 1 NURL differs in three ways from the prior version.
1. The hash function used is SHA3-224 instead of SHA1.
The security of SHA1 `continues to be eroded`_.
Contrariwise SHA3 is currently the most recent addition to the SHA family by NIST.
The 224 bit instance is chosen to keep the output short and because it offers greater collision resistance than SHA1 was thought to offer even at its inception
(prior to security research showing actual collision resistance is lower).
1. The hash function used is SHA-256, to match RFC 7469.
The security of SHA1 `continues to be eroded`_; Latacora recommends `SHA-2`_.
2. The hash is computed over the certificate's SPKI instead of the whole certificate.
This allows certificate re-generation so long as the public key remains the same.
This is useful to allow contact information to be updated or extension of validity period.
@ -122,7 +125,7 @@ The hash component of a version 1 NURL differs in three ways from the prior vers
*all* certificate fields should be considered within the context of the relationship identified by the SPKI hash.
3. The hash is encoded using urlsafe-base64 (without padding) instead of base32.
This provides a more compact representation and minimizes the usability impacts of switching from a 160 bit hash to a 224 bit hash.
This provides a more compact representation and minimizes the usability impacts of switching from a 160 bit hash to a 256 bit hash.
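Combining these three differences, a non-normative sketch of computing the version 1 hash component, assuming the third-party ``cryptography`` package::
from base64 import urlsafe_b64encode
from hashlib import sha256
from cryptography import x509
from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat
def v1_hash_component(cert_pem: bytes) -> str:
    # SHA-256 over the DER-encoded SubjectPublicKeyInfo, then unpadded urlsafe base64.
    cert = x509.load_pem_x509_certificate(cert_pem)
    spki = cert.public_key().public_bytes(Encoding.DER, PublicFormat.SubjectPublicKeyInfo)
    return urlsafe_b64encode(sha256(spki).digest()).decode("ascii").rstrip("=")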
A version 1 NURL is identified by the presence of the ``v=1`` fragment.
Though the length of the hash string (38 bytes) could also be used to differentiate it from a version 0 NURL,
@ -140,7 +143,8 @@ Examples
* ``pb://azEu8vlRpnEeYm0DySQDeNY3Z2iJXHC_bsbaAw@localhost:47877/64i4aokv4ej#v=1``
.. _`continues to be eroded`: https://en.wikipedia.org/wiki/SHA-1#Cryptanalysis_and_validation
.. _`explored by the web community`: https://www.imperialviolet.org/2011/05/04/pinning.html
.. _`SHA-2`: https://latacora.micro.blog/2018/04/03/cryptographic-right-answers.html
.. _`explored by the web community`: https://www.rfc-editor.org/rfc/rfc7469
.. _Foolscap: https://github.com/warner/foolscap
.. [1] ``foolscap.furl.decode_furl`` is taken as the canonical definition of the syntax of a fURL.
View File
@ -264,3 +264,18 @@ the "tahoe-conf" file for notes about configuration and installing these
plugins into a Munin environment.
.. _Munin: http://munin-monitoring.org/
Scraping Stats Values in OpenMetrics Format
===========================================
Time Series DataBase (TSDB) software like Prometheus_ and VictoriaMetrics_ can
parse statistics in OpenMetrics_ format from a URL such as
http://localhost:3456/statistics?t=openmetrics. Software like Grafana_ can
then be used to graph and alert on these numbers. You can find a
pre-configured dashboard for
Grafana at https://grafana.com/grafana/dashboards/16894-tahoe-lafs/.
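As a quick way to inspect the raw values, a minimal sketch using the third-party ``requests`` library (the port assumes the example URL above)::
import requests
# Fetch the node's statistics in OpenMetrics text format.
response = requests.get("http://localhost:3456/statistics", params={"t": "openmetrics"})
response.raise_for_status()
print(response.text)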
.. _OpenMetrics: https://openmetrics.io/
.. _Prometheus: https://prometheus.io/
.. _VictoriaMetrics: https://victoriametrics.com/
.. _Grafana: https://grafana.com/
View File
@ -1,15 +1,6 @@
"""
Ported to Python 3.
"""
from __future__ import unicode_literals
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from future.utils import PY2
if PY2:
from future.builtins import filter, map, zip, ascii, chr, hex, input, next, oct, open, pow, round, super, bytes, dict, list, object, range, str, max, min # noqa: F401
import sys
import shutil
from time import sleep
@ -49,7 +40,6 @@ from .util import (
await_client_ready,
TahoeProcess,
cli,
_run_node,
generate_ssh_key,
block_with_timeout,
)
@ -66,6 +56,29 @@ def pytest_addoption(parser):
"--coverage", action="store_true", dest="coverage",
help="Collect coverage statistics",
)
parser.addoption(
"--force-foolscap", action="store_true", default=False,
dest="force_foolscap",
help=("If set, force Foolscap only for the storage protocol. " +
"Otherwise HTTP will be used.")
)
parser.addoption(
"--runslow", action="store_true", default=False,
dest="runslow",
help="If set, run tests marked as slow.",
)
def pytest_collection_modifyitems(session, config, items):
if not config.option.runslow:
# The --runslow option was not given; keep only collected items not
# marked as slow.
items[:] = [
item
for item
in items
if item.get_closest_marker("slow") is None
]
@pytest.fixture(autouse=True, scope='session')
def eliot_logging():
@ -380,7 +393,7 @@ def alice(
finalize=False,
)
)
await_client_ready(process)
pytest_twisted.blockon(await_client_ready(process))
# 1. Create a new RW directory cap:
cli(process, "create-alias", "test")
@ -410,10 +423,9 @@ alice-key ssh-rsa {ssh_public_key} {rwcap}
""".format(rwcap=rwcap, ssh_public_key=ssh_public_key))
# 4. Restart the node with new SFTP config.
process.kill()
pytest_twisted.blockon(_run_node(reactor, process.node_dir, request, None))
await_client_ready(process)
pytest_twisted.blockon(process.restart_async(reactor, request))
pytest_twisted.blockon(await_client_ready(process))
print(f"Alice pid: {process.transport.pid}")
return process
@ -427,7 +439,7 @@ def bob(reactor, temp_dir, introducer_furl, flog_gatherer, storage_nodes, reques
storage=False,
)
)
await_client_ready(process)
pytest_twisted.blockon(await_client_ready(process))
return process
View File
@ -3,11 +3,13 @@ Integration tests for getting and putting files, including reading from stdin
and stdout.
"""
from subprocess import Popen, PIPE
from subprocess import Popen, PIPE, check_output, check_call
import pytest
from pytest_twisted import ensureDeferred
from twisted.internet import reactor
from .util import run_in_thread, cli
from .util import run_in_thread, cli, reconfigure
DATA = b"abc123 this is not utf-8 decodable \xff\x00\x33 \x11"
try:
@ -47,6 +49,7 @@ def test_put_from_stdin(alice, get_put_alias, tmpdir):
assert read_bytes(tempfile) == DATA
@run_in_thread
def test_get_to_stdout(alice, get_put_alias, tmpdir):
"""
It's possible to upload a file, and then download it to stdout.
@ -62,3 +65,66 @@ def test_get_to_stdout(alice, get_put_alias, tmpdir):
)
assert p.stdout.read() == DATA
assert p.wait() == 0
@run_in_thread
def test_large_file(alice, get_put_alias, tmp_path):
"""
It's possible to upload and download a larger file.
We avoid stdin/stdout since that's flaky on Windows.
"""
tempfile = tmp_path / "file"
with tempfile.open("wb") as f:
f.write(DATA * 1_000_000)
cli(alice, "put", str(tempfile), "getput:largefile")
outfile = tmp_path / "out"
check_call(
["tahoe", "--node-directory", alice.node_dir, "get", "getput:largefile", str(outfile)],
)
assert outfile.read_bytes() == tempfile.read_bytes()
@ensureDeferred
async def test_upload_download_immutable_different_default_max_segment_size(alice, get_put_alias, tmpdir, request):
"""
Tahoe-LAFS used to have a default max segment size of 128KB, and is now
1MB. Test that an upload created when 128KB was the default can be
downloaded with 1MB as the default (i.e. old uploader, new downloader), and
vice versa, (new uploader, old downloader).
"""
tempfile = tmpdir.join("file")
large_data = DATA * 100_000
assert len(large_data) > 2 * 1024 * 1024
with tempfile.open("wb") as f:
f.write(large_data)
async def set_segment_size(segment_size):
await reconfigure(
reactor,
request,
alice,
(1, 1, 1),
None,
max_segment_size=segment_size
)
# 1. Upload file 1 with default segment size set to 1MB
await set_segment_size(1024 * 1024)
cli(alice, "put", str(tempfile), "getput:seg1024kb")
# 2. Download file 1 with default segment size set to 128KB
await set_segment_size(128 * 1024)
assert large_data == check_output(
["tahoe", "--node-directory", alice.node_dir, "get", "getput:seg1024kb", "-"]
)
# 3. Upload file 2 with default segment size set to 128KB
cli(alice, "put", str(tempfile), "getput:seg128kb")
# 4. Download file 2 with default segment size set to 1MB
await set_segment_size(1024 * 1024)
assert large_data == check_output(
["tahoe", "--node-directory", alice.node_dir, "get", "getput:seg128kb", "-"]
)
View File
@ -55,15 +55,18 @@ def i2p_network(reactor, temp_dir, request):
proto,
which("docker"),
(
"docker", "run", "-p", "7656:7656", "purplei2p/i2pd",
"docker", "run", "-p", "7656:7656", "purplei2p/i2pd:release-2.45.1",
# Bad URL for reseeds, so it can't talk to other routers.
"--reseed.urls", "http://localhost:1/",
# Make sure we see the "ephemeral keys message"
"--log=stdout",
"--loglevel=info"
),
)
def cleanup():
try:
proto.transport.signalProcess("KILL")
proto.transport.signalProcess("INT")
util.block_with_timeout(proto.exited, reactor)
except ProcessExitedAlready:
pass
View File
@ -31,7 +31,7 @@ def test_upload_immutable(reactor, temp_dir, introducer_furl, flog_gatherer, sto
happy=7,
total=10,
)
util.await_client_ready(edna)
yield util.await_client_ready(edna)
node_dir = join(temp_dir, 'edna')
View File
@ -42,8 +42,8 @@ if PY2:
def test_onion_service_storage(reactor, request, temp_dir, flog_gatherer, tor_network, tor_introducer_furl):
carol = yield _create_anonymous_node(reactor, 'carol', 8008, request, temp_dir, flog_gatherer, tor_network, tor_introducer_furl)
dave = yield _create_anonymous_node(reactor, 'dave', 8009, request, temp_dir, flog_gatherer, tor_network, tor_introducer_furl)
util.await_client_ready(carol, minimum_number_of_servers=2)
util.await_client_ready(dave, minimum_number_of_servers=2)
yield util.await_client_ready(carol, minimum_number_of_servers=2)
yield util.await_client_ready(dave, minimum_number_of_servers=2)
# ensure both nodes are connected to "a grid" by uploading
# something via carol, and retrieve it using dave.
121
integration/test_vectors.py Normal file
View File
@ -0,0 +1,121 @@
"""
Verify certain results against test vectors with well-known results.
"""
from __future__ import annotations
from functools import partial
from typing import AsyncGenerator, Iterator
from itertools import starmap, product
from attrs import evolve
from pytest import mark
from pytest_twisted import ensureDeferred
from . import vectors
from .vectors import parameters
from .util import reconfigure, upload, TahoeProcess
@mark.parametrize('convergence', parameters.CONVERGENCE_SECRETS)
def test_convergence(convergence):
"""
Convergence secrets are 16 bytes.
"""
assert isinstance(convergence, bytes), "Convergence secret must be bytes"
assert len(convergence) == 16, "Convergence secret must be 16 bytes"
@mark.slow
@mark.parametrize('case,expected', vectors.capabilities.items())
@ensureDeferred
async def test_capability(reactor, request, alice, case, expected):
"""
The capability that results from uploading certain well-known data
with certain well-known parameters results in exactly the previously
computed value.
"""
# rewrite alice's config to match params and convergence
await reconfigure(
reactor, request, alice, (1, case.params.required, case.params.total), case.convergence, case.segment_size)
# upload data in the correct format
actual = upload(alice, case.fmt, case.data)
# compare the resulting cap to the expected result
assert actual == expected
@ensureDeferred
async def skiptest_generate(reactor, request, alice):
"""
This is a helper for generating the test vectors.
You can re-generate the test vectors by fixing the name of the test and
running it. Normally this test doesn't run because it ran once and we
captured its output. Other tests run against that output and we want them
to run against the results produced originally, not a possibly
ever-changing set of outputs.
"""
space = starmap(
# segment_size could be a parameter someday but it's not easy to vary
# using the Python implementation so it isn't one for now.
partial(vectors.Case, segment_size=parameters.SEGMENT_SIZE),
product(
parameters.ZFEC_PARAMS,
parameters.CONVERGENCE_SECRETS,
parameters.OBJECT_DESCRIPTIONS,
parameters.FORMATS,
),
)
iterresults = generate(reactor, request, alice, space)
results = []
async for result in iterresults:
# Accumulate the new result
results.append(result)
# Then rewrite the whole output file with the new accumulator value.
# This means that if we fail partway through, we will still have
# recorded partial results -- instead of losing them all.
vectors.save_capabilities(results)
async def generate(
reactor,
request,
alice: TahoeProcess,
cases: Iterator[vectors.Case],
) -> AsyncGenerator[tuple[vectors.Case, str], None]:
"""
Generate all of the test vectors using the given node.
:param reactor: The reactor to use to restart the Tahoe-LAFS node when it
needs to be reconfigured.
:param request: The pytest request object to use to arrange process
cleanup.
:param alice: The Tahoe-LAFS node to use to generate the test vectors.
:param cases: The inputs for which to generate values.
:return: An async generator that yields ``(case, capability)`` pairs.
"""
# Share placement doesn't affect the resulting capability. For maximum
# reliability of this generator, be happy if we can put shares anywhere
happy = 1
for case in cases:
await reconfigure(
reactor,
request,
alice,
(happy, case.params.required, case.params.total),
case.convergence,
case.segment_size
)
# Give the format a chance to make an RSA key if it needs it.
case = evolve(case, fmt=case.fmt.customize())
cap = upload(alice, case.fmt, case.data)
yield case, cap
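# A consumption sketch for the generator above (names come from this module;
# this is essentially what skiptest_generate does, shown here only to make
# the async-generator protocol explicit, with an illustrative function name):
async def example_collect(reactor, request, alice: TahoeProcess) -> list[tuple[vectors.Case, str]]:
    collected = []
    async for case, cap in generate(reactor, request, alice, iter(vectors.capabilities)):
        collected.append((case, cap))
    return collected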

View File

@ -7,18 +7,9 @@ Most of the tests have cursory asserts and encode 'what the WebAPI did
at the time of testing' -- not necessarily a cohesive idea of what the
WebAPI *should* do in every situation. It's not clear the latter
exists anywhere, however.
Ported to Python 3.
"""
from __future__ import unicode_literals
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from future.utils import PY2
if PY2:
from future.builtins import filter, map, zip, ascii, chr, hex, input, next, oct, open, pow, round, super, bytes, dict, list, object, range, str, max, min # noqa: F401
from __future__ import annotations
import time
from urllib.parse import unquote as url_unquote, quote as url_quote
@ -27,12 +18,15 @@ import allmydata.uri
from allmydata.util import jsonbytes as json
from . import util
from .util import run_in_thread
import requests
import html5lib
from bs4 import BeautifulSoup
from pytest_twisted import ensureDeferred
@run_in_thread
def test_index(alice):
"""
we can download the index file
@ -40,6 +34,7 @@ def test_index(alice):
util.web_get(alice, u"")
@run_in_thread
def test_index_json(alice):
"""
we can download the index file as json
@ -49,6 +44,7 @@ def test_index_json(alice):
json.loads(data)
@run_in_thread
def test_upload_download(alice):
"""
upload a file, then download it via readcap
@ -78,6 +74,7 @@ def test_upload_download(alice):
assert str(data, "utf-8") == FILE_CONTENTS
@run_in_thread
def test_put(alice):
"""
use PUT to create a file
@ -97,6 +94,7 @@ def test_put(alice):
assert cap.needed_shares == int(cfg.get_config("client", "shares.needed"))
@run_in_thread
def test_helper_status(storage_nodes):
"""
successfully GET the /helper_status page
@ -109,6 +107,7 @@ def test_helper_status(storage_nodes):
assert str(dom.h1.string) == u"Helper Status"
@run_in_thread
def test_deep_stats(alice):
"""
create a directory, do deep-stats on it and prove the /operations/
@ -252,10 +251,18 @@ def test_status(alice):
assert found_download, "Failed to find the file we downloaded in the status-page"
def test_directory_deep_check(alice):
@ensureDeferred
async def test_directory_deep_check(reactor, request, alice):
"""
use deep-check and confirm the result pages work
"""
# Make sure the node is configured compatibly with expectations of this
# test.
happy = 3
required = 2
total = 4
await util.reconfigure(reactor, request, alice, (happy, required, total), convergence=None)
# create a directory
resp = requests.post(
@ -313,7 +320,7 @@ def test_directory_deep_check(alice):
)
def check_repair_data(checkdata):
assert checkdata["healthy"] is True
assert checkdata["healthy"]
assert checkdata["count-happiness"] == 4
assert checkdata["count-good-share-hosts"] == 4
assert checkdata["count-shares-good"] == 4
@ -417,6 +424,7 @@ def test_directory_deep_check(alice):
assert dom is not None, "Operation never completed"
@run_in_thread
def test_storage_info(storage_nodes):
"""
retrieve and confirm /storage URI for one storage node
@ -428,6 +436,7 @@ def test_storage_info(storage_nodes):
)
@run_in_thread
def test_storage_info_json(storage_nodes):
"""
retrieve and confirm /storage?t=json URI for one storage node
@ -442,6 +451,7 @@ def test_storage_info_json(storage_nodes):
assert data[u"stats"][u"storage_server.reserved_space"] == 1000000000
@run_in_thread
def test_introducer_info(introducer):
"""
retrieve and confirm /introducer URI for the introducer
@ -460,6 +470,7 @@ def test_introducer_info(introducer):
assert "subscription_summary" in data
@run_in_thread
def test_mkdir_with_children(alice):
"""
create a directory using ?t=mkdir-with-children

View File

@ -1,22 +1,19 @@
"""
Ported to Python 3.
General functionality useful for the implementation of integration tests.
"""
from __future__ import unicode_literals
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from future.utils import PY2
if PY2:
from future.builtins import filter, map, zip, ascii, chr, hex, input, next, oct, open, pow, round, super, bytes, dict, list, object, range, str, max, min # noqa: F401
from __future__ import annotations
from contextlib import contextmanager
from typing import Any
from typing_extensions import Literal
from tempfile import NamedTemporaryFile
import sys
import time
import json
from os import mkdir, environ
from os.path import exists, join
from io import StringIO, BytesIO
from functools import partial
from subprocess import check_output
from twisted.python.filepath import (
@ -26,18 +23,30 @@ from twisted.internet.defer import Deferred, succeed
from twisted.internet.protocol import ProcessProtocol
from twisted.internet.error import ProcessExitedAlready, ProcessDone
from twisted.internet.threads import deferToThread
from twisted.internet.interfaces import IProcessTransport, IReactorProcess
from attrs import frozen, evolve
import requests
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.serialization import (
Encoding,
PrivateFormat,
NoEncryption,
)
from paramiko.rsakey import RSAKey
from boltons.funcutils import wraps
from allmydata.util import base32
from allmydata.util.configutil import (
get_config,
set_config,
write_config,
)
from allmydata import client
from allmydata.interfaces import DEFAULT_IMMUTABLE_MAX_SEGMENT_SIZE
import pytest_twisted
@ -142,9 +151,40 @@ class _MagicTextProtocol(ProcessProtocol):
sys.stdout.write(data)
def _cleanup_tahoe_process(tahoe_transport, exited):
def _cleanup_process_async(transport: IProcessTransport, allow_missing: bool) -> None:
"""
Terminate the given process with a kill signal (SIGKILL on POSIX,
If the given process transport seems to still be associated with a
running process, send a SIGTERM to that process.
:param transport: The transport to use.
:param allow_missing: If ``True`` then it is not an error for the
transport to have no associated process. Otherwise, an exception will
be raised in that case.
:raise: ``ValueError`` if ``allow_missing`` is ``False`` and the transport
has no process.
"""
if transport.pid is None:
if allow_missing:
print("Process already cleaned up and that's okay.")
return
else:
raise ValueError("Process is not running")
print("signaling {} with TERM".format(transport.pid))
try:
transport.signalProcess('TERM')
except ProcessExitedAlready:
# The transport object thought it still had a process but the real OS
# process has already exited. That's fine. We accomplished what we
# wanted to. We don't care about ``allow_missing`` here because
# there's no way we could have known the real OS process already
# exited.
pass
def _cleanup_tahoe_process(tahoe_transport, exited, allow_missing=False):
"""
Terminate the given process with a kill signal (SIGTERM on POSIX,
TerminateProcess on Windows).
:param tahoe_transport: The `IProcessTransport` representing the process.
@ -153,14 +193,10 @@ def _cleanup_tahoe_process(tahoe_transport, exited):
:return: After the process has exited.
"""
from twisted.internet import reactor
try:
print("signaling {} with TERM".format(tahoe_transport.pid))
tahoe_transport.signalProcess('TERM')
print("signaled, blocking on exit")
block_with_timeout(exited, reactor)
print("exited, goodbye")
except ProcessExitedAlready:
pass
_cleanup_process_async(tahoe_transport, allow_missing=allow_missing)
print("signaled, blocking on exit")
block_with_timeout(exited, reactor)
print("exited, goodbye")
def _tahoe_runner_optional_coverage(proto, reactor, request, other_args):
@ -207,8 +243,33 @@ class TahoeProcess(object):
def kill(self):
"""Kill the process, block until it's done."""
print(f"TahoeProcess.kill({self.transport.pid} / {self.node_dir})")
_cleanup_tahoe_process(self.transport, self.transport.exited)
def kill_async(self):
"""
Kill the process, return a Deferred that fires when it's done.
"""
print(f"TahoeProcess.kill_async({self.transport.pid} / {self.node_dir})")
_cleanup_process_async(self.transport, allow_missing=False)
return self.transport.exited
def restart_async(self, reactor: IReactorProcess, request: Any) -> Deferred:
"""
Stop and then re-start the associated process.
:return: A Deferred that fires after the new process is ready to
handle requests.
"""
d = self.kill_async()
d.addCallback(lambda ignored: _run_node(reactor, self.node_dir, request, None, finalize=False))
def got_new_process(proc):
# Grab the new transport since the one we had before is no longer
# valid after the stop/start cycle.
self._process_transport = proc.transport
d.addCallback(got_new_process)
return d
def __str__(self):
return "<TahoeProcess in '{}'>".format(self._node_dir)
@ -237,19 +298,17 @@ def _run_node(reactor, node_dir, request, magic_text, finalize=True):
)
transport.exited = protocol.exited
tahoe_process = TahoeProcess(
transport,
node_dir,
)
if finalize:
request.addfinalizer(partial(_cleanup_tahoe_process, transport, protocol.exited))
request.addfinalizer(tahoe_process.kill)
# XXX abusing the Deferred; should use .when_magic_seen() pattern
def got_proto(proto):
transport._protocol = proto
return TahoeProcess(
transport,
node_dir,
)
protocol.magic_seen.addCallback(got_proto)
return protocol.magic_seen
d = protocol.magic_seen
d.addCallback(lambda ignored: tahoe_process)
return d
def _create_node(reactor, request, temp_dir, introducer_furl, flog_gatherer, name, web_port,
@ -300,6 +359,20 @@ def _create_node(reactor, request, temp_dir, introducer_furl, flog_gatherer, nam
u'log_gatherer.furl',
flog_gatherer,
)
force_foolscap = request.config.getoption("force_foolscap")
assert force_foolscap in (True, False)
set_config(
config,
'storage',
'force_foolscap',
str(force_foolscap),
)
set_config(
config,
'client',
'force_foolscap',
str(force_foolscap),
)
write_config(FilePath(config_path), config)
created_d.addCallback(created)
@ -357,6 +430,31 @@ class FileShouldVanishException(Exception):
)
def run_in_thread(f):
"""Decorator for integration tests that runs code in a thread.
Because we're using pytest_twisted, tests that rely on the reactor are
expected to return a Deferred and use async APIs so the reactor can run.
In the case of the integration test suite, it launches nodes in the
background using Twisted APIs. The nodes' stdout and stderr are read via
Twisted code. If the reactor doesn't run, reads don't happen, and
eventually the buffers fill up, and the nodes block when they try to flush
logs.
We can switch to Twisted APIs (treq instead of requests etc.), but
sometimes it's easier or expedient to just have a blocking test. So this
decorator allows you to run the test in a thread, and the reactor can keep
running in the main thread.
See https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3597 for tracking bug.
"""
@wraps(f)
def test(*args, **kwargs):
return deferToThread(lambda: f(*args, **kwargs))
return test
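# A minimal usage sketch of the decorator above, assuming the usual "alice"
# fixture and this module's blocking web_get helper (the test name below is
# hypothetical):
@run_in_thread
def example_test_blocking_fetch(alice):
    # The body may block freely: the decorator runs it in a thread via
    # deferToThread, so the reactor keeps servicing the node subprocesses.
    return web_get(alice, u"")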
def await_file_contents(path, contents, timeout=15, error_if=None):
"""
wait up to `timeout` seconds for the file at `path` (any path-like
@ -482,6 +580,7 @@ def web_post(tahoe, uri_fragment, **kwargs):
return resp.content
@run_in_thread
def await_client_ready(tahoe, timeout=10, liveness=60*2, minimum_number_of_servers=1):
"""
Uses the status API to wait for a client-type node (in `tahoe`, a
@ -549,26 +648,172 @@ def generate_ssh_key(path):
f.write(s.encode("ascii"))
def run_in_thread(f):
"""Decorator for integration tests that runs code in a thread.
Because we're using pytest_twisted, tests that rely on the reactor are
expected to return a Deferred and use async APIs so the reactor can run.
In the case of the integration test suite, it launches nodes in the
background using Twisted APIs. The nodes' stdout and stderr are read via
Twisted code. If the reactor doesn't run, reads don't happen, and
eventually the buffers fill up, and the nodes block when they try to flush
logs.
We can switch to Twisted APIs (treq instead of requests etc.), but
sometimes it's easier or expedient to just have a blocking test. So this
decorator allows you to run the test in a thread, and the reactor can keep
running in the main thread.
See https://tahoe-lafs.org/trac/tahoe-lafs/ticket/3597 for tracking bug.
"""
@wraps(f)
def test(*args, **kwargs):
return deferToThread(lambda: f(*args, **kwargs))
return test
@frozen
class CHK:
"""
Represent the CHK encoding sufficiently to run a ``tahoe put`` command
using it.
"""
kind = "chk"
max_shares = 256
def customize(self) -> CHK:
# Nothing to do.
return self
@classmethod
def load(cls, params: None) -> CHK:
assert params is None
return cls()
def to_json(self) -> None:
return None
@contextmanager
def to_argv(self) -> None:
yield []
@frozen
class SSK:
"""
Represent the SSK encodings (SDMF and MDMF) sufficiently to run a
``tahoe put`` command using one of them.
"""
kind = "ssk"
# SDMF and MDMF encode share counts (N and k) into the share itself as an
# unsigned byte. They could have encoded (share count - 1) to fit the
# full range supported by ZFEC into the unsigned byte - but they don't.
# So 256 is inaccessible to those formats and we set the upper bound at
# 255.
max_shares = 255
name: Literal["sdmf", "mdmf"]
key: None | bytes
@classmethod
def load(cls, params: dict) -> SSK:
assert params.keys() == {"format", "mutable", "key"}
return cls(params["format"], params["key"].encode("ascii"))
def customize(self) -> SSK:
"""
Return an SSK with a newly generated random RSA key.
"""
return evolve(self, key=generate_rsa_key())
def to_json(self) -> dict[str, str]:
return {
"format": self.name,
"mutable": None,
"key": self.key.decode("ascii"),
}
@contextmanager
def to_argv(self) -> None:
with NamedTemporaryFile() as f:
f.write(self.key)
f.flush()
yield [f"--format={self.name}", "--mutable", f"--private-key-path={f.name}"]
def upload(alice: TahoeProcess, fmt: CHK | SSK, data: bytes) -> str:
"""
Upload the given data to the given node.
:param alice: The node to upload to.
:param fmt: The name of the format for the upload. CHK, SDMF, or MDMF.
:param data: The data to upload.
:return: The capability for the uploaded data.
"""
with NamedTemporaryFile() as f:
f.write(data)
f.flush()
with fmt.to_argv() as fmt_argv:
argv = [alice, "put"] + fmt_argv + [f.name]
return cli(*argv).decode("utf-8").strip()
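# A short sketch of how the format objects above combine with upload()
# (the function and variable names here are illustrative only):
def example_uploads(alice: TahoeProcess) -> tuple[str, str]:
    # CHK needs no extra key material.
    chk_cap = upload(alice, CHK(), b"some immutable bytes")
    # SSK needs a concrete RSA key; customize() generates one.
    mdmf = SSK(name="mdmf", key=None).customize()
    mdmf_cap = upload(alice, mdmf, b"some mutable bytes")
    return chk_cap, mdmf_cap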
async def reconfigure(reactor, request, node: TahoeProcess,
params: tuple[int, int, int],
convergence: None | bytes,
max_segment_size: None | int = None) -> None:
"""
Reconfigure a Tahoe-LAFS node with different ZFEC parameters and
convergence secret.
TODO This appears to have issues on Windows.
If the current configuration is different from the specified
configuration, the node will be restarted so it takes effect.
:param reactor: A reactor to use to restart the process.
:param request: The pytest request object to use to arrange process
cleanup.
:param node: The Tahoe-LAFS node to reconfigure.
:param params: The ``happy``, ``needed``, and ``total`` ZFEC encoding
parameters.
:param convergence: If given, the convergence secret. If not given, the
existing convergence secret will be left alone.
:return: ``None`` after the node configuration has been rewritten, the
node has been restarted, and the node is ready to provide service.
"""
happy, needed, total = params
config = node.get_config()
changed = False
cur_happy = int(config.get_config("client", "shares.happy"))
cur_needed = int(config.get_config("client", "shares.needed"))
cur_total = int(config.get_config("client", "shares.total"))
if (happy, needed, total) != (cur_happy, cur_needed, cur_total):
changed = True
config.set_config("client", "shares.happy", str(happy))
config.set_config("client", "shares.needed", str(needed))
config.set_config("client", "shares.total", str(total))
if convergence is not None:
cur_convergence = config.get_private_config("convergence").encode("ascii")
if base32.a2b(cur_convergence) != convergence:
changed = True
config.write_private_config("convergence", base32.b2a(convergence))
if max_segment_size is not None:
cur_segment_size = int(config.get_config("client", "shares._max_immutable_segment_size_for_testing", DEFAULT_IMMUTABLE_MAX_SEGMENT_SIZE))
if cur_segment_size != max_segment_size:
changed = True
config.set_config(
"client",
"shares._max_immutable_segment_size_for_testing",
str(max_segment_size)
)
if changed:
# restart the node
print(f"Restarting {node.node_dir} for ZFEC reconfiguration")
await node.restart_async(reactor, request)
print("Restarted. Waiting for ready state.")
await await_client_ready(node)
print("Ready.")
else:
print("Config unchanged, not restarting.")
def generate_rsa_key() -> bytes:
"""
Generate a 2048 bit RSA key suitable for use with SSKs.
"""
return rsa.generate_private_key(
public_exponent=65537,
key_size=2048,
backend=default_backend()
).private_bytes(
encoding=Encoding.PEM,
format=PrivateFormat.TraditionalOpenSSL,
encryption_algorithm=NoEncryption(),
)
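# A small check of the helper's output framing (a sketch only; generating a
# 2048 bit key takes a moment):
def example_rsa_key_is_pem() -> None:
    pem = generate_rsa_key()
    assert pem.startswith(b"-----BEGIN RSA PRIVATE KEY-----")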

View File

@ -0,0 +1,30 @@
__all__ = [
"DATA_PATH",
"CURRENT_VERSION",
"MAX_SHARES",
"Case",
"Sample",
"SeedParam",
"encode_bytes",
"save_capabilities",
"capabilities",
]
from .vectors import (
DATA_PATH,
CURRENT_VERSION,
Case,
Sample,
SeedParam,
encode_bytes,
save_capabilities,
capabilities,
)
from .parameters import (
MAX_SHARES,
)

View File

@ -0,0 +1,58 @@
"""
Simple data type definitions useful in the definition/verification of test
vectors.
"""
from __future__ import annotations
from attrs import frozen
# CHK has a max of 256 shares. SDMF / MDMF have a max of 255 shares!
# Represent max symbolically and resolve it when we know what format we're
# dealing with.
MAX_SHARES = "max"
@frozen
class Sample:
"""
Some instructions for building a long byte string.
:ivar seed: Some bytes to repeat some times to produce the string.
:ivar length: The length of the desired byte string.
"""
seed: bytes
length: int
@frozen
class Param:
"""
Some ZFEC parameters.
"""
required: int
total: int
@frozen
class SeedParam:
"""
Some ZFEC parameters, almost.
:ivar required: The number of required shares.
:ivar total: Either the number of total shares or the constant
``MAX_SHARES`` to indicate that the total number of shares should be
the maximum number supported by the object format.
"""
required: int
total: int | str
def realize(self, max_total: int) -> Param:
"""
Create a ``Param`` from this object's values, possibly
substituting the given real value for total if necessary.
:param max_total: The value to use to replace ``MAX_SHARES`` if
necessary.
"""
if self.total == MAX_SHARES:
return Param(self.required, max_total)
return Param(self.required, self.total)
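# A brief illustration of realize(); the share counts are only examples:
def example_realize() -> None:
    assert SeedParam(1, 3).realize(max_total=255) == Param(1, 3)
    assert SeedParam(101, MAX_SHARES).realize(max_total=255) == Param(101, 255)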

View File

@ -0,0 +1,93 @@
"""
Define input parameters for test vector generation.
:ivar CONVERGENCE_SECRETS: Convergence secrets.
:ivar SEGMENT_SIZE: The single segment size that the Python implementation
currently supports without a lot of refactoring.
:ivar OBJECT_DESCRIPTIONS: Small objects with instructions which can be
expanded into a possibly large byte string. These are intended to be used
as plaintext inputs.
:ivar ZFEC_PARAMS: Input parameters to ZFEC.
:ivar FORMATS: Encoding/encryption formats.
"""
from __future__ import annotations
from hashlib import sha256
from .model import MAX_SHARES
from .vectors import Sample, SeedParam
from ..util import CHK, SSK
def digest(bs: bytes) -> bytes:
"""
Digest bytes to bytes.
"""
return sha256(bs).digest()
def hexdigest(bs: bytes) -> str:
"""
Digest bytes to text.
"""
return sha256(bs).hexdigest()
# Just a couple convergence secrets. The only thing we do with this value is
# feed it into a tagged hash. It certainly makes a difference to the output
# but the hash should destroy any structure in the input so it doesn't seem
# like there's a reason to test a lot of different values.
CONVERGENCE_SECRETS: list[bytes] = [
b"aaaaaaaaaaaaaaaa",
digest(b"Hello world")[:16],
]
SEGMENT_SIZE: int = 128 * 1024
# Exercise at least a handful of different sizes, trying to cover:
#
# 1. Some cases smaller than one "segment" (128k).
# This covers shrinking of some parameters to match data size.
# This includes one case of the smallest possible CHK.
#
# 2. Some cases right on the edges of integer segment multiples.
# Because boundaries are tricky.
#
# 3. Some cases that involve quite a few segments.
# This exercises merkle tree construction more thoroughly.
#
# See ``stretch`` for construction of the actual test data.
OBJECT_DESCRIPTIONS: list[Sample] = [
# The smallest possible. 55 bytes and smaller are LIT.
Sample(b"a", 56),
Sample(b"a", 1024),
Sample(b"c", 4096),
Sample(digest(b"foo"), SEGMENT_SIZE - 1),
Sample(digest(b"bar"), SEGMENT_SIZE + 1),
Sample(digest(b"baz"), SEGMENT_SIZE * 16 - 1),
Sample(digest(b"quux"), SEGMENT_SIZE * 16 + 1),
Sample(digest(b"bazquux"), SEGMENT_SIZE * 32),
Sample(digest(b"foobar"), SEGMENT_SIZE * 64 - 1),
Sample(digest(b"barbaz"), SEGMENT_SIZE * 64 + 1),
]
ZFEC_PARAMS: list[SeedParam] = [
SeedParam(1, 1),
SeedParam(1, 3),
SeedParam(2, 3),
SeedParam(3, 10),
SeedParam(71, 255),
SeedParam(101, MAX_SHARES),
]
FORMATS: list[CHK | SSK] = [
CHK(),
# These start out unaware of a key but various keys will be supplied
# during generation.
SSK(name="sdmf", key=None),
SSK(name="mdmf", key=None),
]
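# A quick sanity check on the size of the case space these lists induce
# (simple arithmetic; the integration test's generator is what actually
# enumerates the product):
def example_case_count() -> int:
    count = (
        len(ZFEC_PARAMS)
        * len(CONVERGENCE_SECRETS)
        * len(OBJECT_DESCRIPTIONS)
        * len(FORMATS)
    )
    assert count == 6 * 2 * 10 * 3 == 360
    return count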

(File diff suppressed because it is too large.)

View File

@ -0,0 +1,155 @@
"""
A module that loads pre-generated test vectors.
:ivar DATA_PATH: The path of the file containing test vectors.
:ivar capabilities: The capability test vectors.
"""
from __future__ import annotations
from typing import TextIO
from attrs import frozen
from yaml import safe_load, safe_dump
from base64 import b64encode, b64decode
from twisted.python.filepath import FilePath
from .model import Param, Sample, SeedParam
from ..util import CHK, SSK
DATA_PATH: FilePath = FilePath(__file__).sibling("test_vectors.yaml")
# The version of the persisted test vector data this code can interpret.
CURRENT_VERSION: str = "2023-01-16.2"
@frozen
class Case:
"""
Represent one case for which we want/have a test vector.
"""
seed_params: Param
convergence: bytes
seed_data: Sample
fmt: CHK | SSK
segment_size: int
@property
def data(self):
return stretch(self.seed_data.seed, self.seed_data.length)
@property
def params(self):
return self.seed_params.realize(self.fmt.max_shares)
def encode_bytes(b: bytes) -> str:
"""
Base64 encode some bytes to text so they are representable in JSON.
"""
return b64encode(b).decode("ascii")
def decode_bytes(b: str) -> bytes:
"""
Base64 decode some text to bytes.
"""
return b64decode(b.encode("ascii"))
def stretch(seed: bytes, size: int) -> bytes:
"""
Given a simple description of a byte string, return the byte string
itself.
"""
assert isinstance(seed, bytes)
assert isinstance(size, int)
assert size > 0
assert len(seed) > 0
multiples = size // len(seed) + 1
return (seed * multiples)[:size]
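# A tiny illustration of stretch() with example values:
def example_stretch() -> None:
    assert stretch(b"ab", 5) == b"ababa"
    assert stretch(b"xyz", 3) == b"xyz"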
def save_capabilities(results: list[tuple[Case, str]], path: FilePath = DATA_PATH) -> None:
"""
Save some test vector cases and their expected values.
This is logically the inverse of ``load_capabilities``.
"""
path.setContent(safe_dump({
"version": CURRENT_VERSION,
"vector": [
{
"convergence": encode_bytes(case.convergence),
"format": {
"kind": case.fmt.kind,
"params": case.fmt.to_json(),
},
"sample": {
"seed": encode_bytes(case.seed_data.seed),
"length": case.seed_data.length,
},
"zfec": {
"segmentSize": case.segment_size,
"required": case.params.required,
"total": case.params.total,
},
"expected": cap,
}
for (case, cap)
in results
],
}).encode("ascii"))
def load_format(serialized: dict) -> CHK | SSK:
"""
Load an encrypted object format from a simple description of it.
:param serialized: A ``dict`` describing either CHK or SSK, possibly with
some parameters.
"""
if serialized["kind"] == "chk":
return CHK.load(serialized["params"])
elif serialized["kind"] == "ssk":
return SSK.load(serialized["params"])
else:
raise ValueError(f"Unrecognized format: {serialized}")
def load_capabilities(f: TextIO) -> dict[Case, str]:
"""
Load some test vector cases and their expected results from the given
file.
This is logically the inverse of ``save_capabilities``.
"""
data = safe_load(f)
if data is None:
return {}
if data["version"] != CURRENT_VERSION:
print(
f"Current version is {CURRENT_VERSION}; "
f"cannot load version {data['version']} data."
)
return {}
return {
Case(
seed_params=SeedParam(case["zfec"]["required"], case["zfec"]["total"]),
segment_size=case["zfec"]["segmentSize"],
convergence=decode_bytes(case["convergence"]),
seed_data=Sample(decode_bytes(case["sample"]["seed"]), case["sample"]["length"]),
fmt=load_format(case["format"]),
): case["expected"]
for case
in data["vector"]
}
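# A round-trip sketch tying the two halves together (the function name is
# illustrative; it assumes DATA_PATH is writable):
def example_roundtrip(results: list[tuple[Case, str]]) -> dict[Case, str]:
    save_capabilities(results)
    with DATA_PATH.open() as f:
        return load_capabilities(f)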
try:
with DATA_PATH.open() as f:
capabilities: dict[Case, str] = load_capabilities(f)
except FileNotFoundError:
capabilities = {}

View File

@ -0,0 +1,95 @@
#
# this updates the (tagged) version of the software
#
# Any "options" are hard-coded in here (e.g. the GnuPG key to use)
#
author = "meejah <meejah@meejah.ca>"
import sys
import time
from datetime import datetime
from packaging.version import Version
from dulwich.repo import Repo
from dulwich.porcelain import (
tag_list,
tag_create,
status,
)
from twisted.internet.task import (
react,
)
from twisted.internet.defer import (
ensureDeferred,
)
def existing_tags(git):
versions = sorted(
Version(v.decode("utf8").lstrip("tahoe-lafs-"))
for v in tag_list(git)
if v.startswith(b"tahoe-lafs-")
)
return versions
def create_new_version(git):
versions = existing_tags(git)
biggest = versions[-1]
return Version(
"{}.{}.{}".format(
biggest.major,
biggest.minor + 1,
0,
)
)
async def main(reactor):
git = Repo(".")
st = status(git)
if any(st.staged.values()) or st.unstaged:
print("unclean checkout; aborting")
raise SystemExit(1)
v = create_new_version(git)
if "--no-tag" in sys.argv:
print(v)
return
print("Existing tags: {}".format("\n".join(str(x) for x in existing_tags(git))))
print("New tag will be {}".format(v))
# the "tag time" is seconds from the epoch .. we quantize these to
# the start of the day in question, in UTC.
now = datetime.now()
s = now.utctimetuple()
ts = int(
time.mktime(
time.struct_time((s.tm_year, s.tm_mon, s.tm_mday, 0, 0, 0, 0, s.tm_yday, 0))
)
)
tag_create(
repo=git,
tag="tahoe-lafs-{}".format(str(v)).encode("utf8"),
author=author.encode("utf8"),
message="Release {}".format(v).encode("utf8"),
annotated=True,
objectish=b"HEAD",
sign=author.encode("utf8"),
tag_time=ts,
tag_timezone=0,
)
print("Tag created locally, it is not pushed")
print("To push it run something like:")
print(" git push origin {}".format(v))
if __name__ == "__main__":
react(lambda r: ensureDeferred(main(r)))

View File

@ -33,20 +33,11 @@ a mean of 10kB and a max of 100MB, so filesize=min(int(1.0/random(.0002)),1e8)
"""
from __future__ import annotations
import os, sys, httplib, binascii
import urllib, json, random, time, urlparse
try:
from typing import Dict
except ImportError:
pass
# Python 2 compatibility
from future.utils import PY2
if PY2:
from future.builtins import str # noqa: F401
if sys.argv[1] == "--stats":
statsfiles = sys.argv[2:]
# gather stats every 10 seconds, do a moving-window average of the last
@ -54,9 +45,9 @@ if sys.argv[1] == "--stats":
DELAY = 10
MAXSAMPLES = 6
totals = []
last_stats = {} # type: Dict[str, float]
last_stats : dict[str, float] = {}
while True:
stats = {} # type: Dict[str, float]
stats : dict[str, float] = {}
for sf in statsfiles:
for line in open(sf, "r").readlines():
name, str_value = line.split(":")

View File

@ -5,7 +5,7 @@ from __future__ import print_function
import sys, math
from allmydata import uri, storage
from allmydata.immutable import upload
from allmydata.interfaces import DEFAULT_MAX_SEGMENT_SIZE
from allmydata.interfaces import DEFAULT_IMMUTABLE_MAX_SEGMENT_SIZE
from allmydata.util import mathutil
def roundup(size, blocksize=4096):
@ -26,7 +26,7 @@ class BigFakeString(object):
def tell(self):
return self.fp
def calc(filesize, params=(3,7,10), segsize=DEFAULT_MAX_SEGMENT_SIZE):
def calc(filesize, params=(3,7,10), segsize=DEFAULT_IMMUTABLE_MAX_SEGMENT_SIZE):
num_shares = params[2]
if filesize <= upload.Uploader.URI_LIT_SIZE_THRESHOLD:
urisize = len(uri.LiteralFileURI("A"*filesize).to_string())

View File

@ -0,0 +1,36 @@
"""
Writing to a non-blocking pipe can result in ENOSPC when using Unix APIs on
Windows. So, this program passes through data from stdin to stdout, using
Windows APIs instead of Unix-y APIs.
"""
from twisted.internet.stdio import StandardIO
from twisted.internet import reactor
from twisted.internet.protocol import Protocol
from twisted.internet.interfaces import IHalfCloseableProtocol
from twisted.internet.error import ReactorNotRunning
from zope.interface import implementer
@implementer(IHalfCloseableProtocol)
class Passthrough(Protocol):
def readConnectionLost(self):
self.transport.loseConnection()
def writeConnectionLost(self):
try:
reactor.stop()
except ReactorNotRunning:
pass
def dataReceived(self, data):
self.transport.write(data)
def connectionLost(self, reason):
try:
reactor.stop()
except ReactorNotRunning:
pass
std = StandardIO(Passthrough())
reactor.run()

View File

@ -1,3 +1,24 @@
[mypy]
ignore_missing_imports = True
plugins=mypy_zope:plugin
show_column_numbers = True
pretty = True
show_error_codes = True
warn_unused_configs =True
no_implicit_optional = True
warn_redundant_casts = True
strict_equality = True
[mypy-allmydata.test.cli.wormholetesting]
disallow_any_generics = True
disallow_subclassing_any = True
disallow_untyped_calls = True
disallow_untyped_defs = True
disallow_incomplete_defs = True
check_untyped_defs = True
disallow_untyped_decorators = True
warn_unused_ignores = True
warn_return_any = True
no_implicit_reexport = True
strict_equality = True
strict_concatenate = True

View File

@ -0,0 +1 @@
Tahoe-LAFS now includes a new "Grid Manager" specification and implementation, adding more options to control which storage servers a client will use for uploads.

View File

@ -1 +0,0 @@
Added support for Python 3.10. Added support for PyPy3 (3.7 and 3.8, on Linux only).

View File

@ -1,8 +0,0 @@
The implementation of SDMF and MDMF (mutables) now requires RSA keys to be exactly 2048 bits, aligning them with the specification.
Some code existed to allow tests to shorten this and it's
conceptually possible a modified client produced mutables
with different key-sizes. However, the spec says that they
must be 2048 bits. If you happen to have a capability with
a key-size different from 2048 you may use 1.17.1 or earlier
to read the content.

View File

@ -1 +0,0 @@
Python 3.6 is no longer supported, as it has reached end-of-life and is no longer receiving security updates.

View File

@ -1 +0,0 @@
Python 3.7 or later is now required; Python 2 is no longer supported.

View File

@ -1 +0,0 @@
Share corruption reports stored on disk are now always encoded in UTF-8.

View File

@ -0,0 +1 @@
The new HTTPS-based storage server is now enabled transparently on the same port as the Foolscap server. This will not have any user-facing impact until the HTTPS storage protocol is supported in clients as well.

View File

@ -0,0 +1,5 @@
`tahoe run ...` will now exit when its stdin is closed.
This facilitates subprocess management, specifically cleanup.
If a parent process that launched tahoe exits without time to do "proper" cleanup, at least the stdin descriptor will be closed.
Subsequently "tahoe run" notices this and exits.

View File

@ -0,0 +1 @@
Several minor errors in the Great Black Swamp proposed specification document have been fixed.

View File

@ -0,0 +1 @@
Work with (and require) newer versions of pycddl.

View File

@ -0,0 +1 @@
Uploading immutables will now better use available bandwidth, which should allow for faster uploads in many cases.

View File

@ -0,0 +1 @@
Downloads of large immutables should now finish much faster.

1
newsfragments/3961.other Normal file
View File

@ -0,0 +1 @@
The integration test suite now includes a set of capability test vectors (``integration/vectors/test_vectors.yaml``) which can be used to verify compatibility between Tahoe-LAFS and other implementations.

View File

@ -0,0 +1 @@
Mutable objects can now be created with a pre-determined "signature key" using the ``tahoe put`` CLI or the HTTP API. This enables deterministic creation of mutable capabilities. This feature must be used with care to preserve the normal security and reliability properties.

View File

@ -0,0 +1 @@
Python 3.7 is no longer supported, and Debian 10 and Ubuntu 18.04 are no longer tested.

View File

@ -0,0 +1 @@
Fix incompatibility with transitive dependency charset_normalizer >= 3 when using PyInstaller.

1
newsfragments/3971.minor Normal file
View File

@ -0,0 +1 @@
mypy.ini has been changed to make mypy stricter and to prevent future regressions.

0
newsfragments/3974.minor Normal file
View File

1
newsfragments/3975.minor Normal file
View File

@ -0,0 +1 @@
Fixes a truthy conditional in status.py.

1
newsfragments/3976.minor Normal file
View File

@ -0,0 +1 @@
Fixes a variable name that shadowed a built-in type.

View File

@ -0,0 +1 @@
Added support for Python 3.11.

0
newsfragments/3987.minor Normal file
View File

0
newsfragments/3988.minor Normal file
View File

View File

@ -0,0 +1 @@
tenacity is no longer a dependency.

0
newsfragments/3991.minor Normal file
View File

0
newsfragments/3993.minor Normal file
View File

0
newsfragments/3994.minor Normal file
View File

0
newsfragments/3996.minor Normal file
View File

View File

@ -0,0 +1 @@
Tahoe-LAFS is incompatible with cryptography >= 40 and now declares a requirement on an older version.

(Some files were not shown because too many files have changed in this diff.)