Merge pull request #463 from mapbox/stringpool

Limit the depth of the search in the string pool.
Eric Fischer 2017-09-08 10:16:19 -07:00 committed by GitHub
commit e000bcc261
10 changed files with 181 additions and 97 deletions

View File

@ -1,3 +1,7 @@
## 1.24.1
* Limit the size and depth of the string pool for better performance
## 1.24.0
* Add feature filters using the Mapbox GL Style Specification filter syntax

View File

@ -41,7 +41,7 @@ Usage
-----
```sh
$ tippecanoe -o file.mbtiles [file.json file.geobuf ...]
$ tippecanoe -o file.mbtiles [options] [file.json file.geobuf ...]
```
If no files are specified, it reads GeoJSON from the standard input.
@ -52,23 +52,39 @@ You can concatenate multiple GeoJSON features or files together,
and it will parse out the features and ignore whatever other objects
it encounters.
Docker Image
------------
Try this first
--------------
A tippecanoe Docker image can be built from source and executed as a task to
automatically install dependencies and allow tippecanoe to run on any system
supported by Docker.
If you aren't sure what options to use, try this:
```docker
$ docker build -t tippecanoe:latest .
$ docker run -it --rm \
-v /tiledata:/data \
tippecanoe:latest \
tippecanoe --output=/data/output.mbtiles /data/example.geojson
```sh
$ tippecanoe -o out.mbtiles -zg --drop-densest-as-needed in.geojson
```
The commands above will build a Docker image from the source and compile the
latest version. The image supports all tippecanoe flags and options.
The `-zg` option will make Tippecanoe choose a maximum zoom level that should be
high enough to reflect the precision of the original data. (If it turns out still
not to be as detailed as you want, use `-z` manually with a higher number.)
If the tiles come out too big, the `--drop-densest-as-needed` option will make
Tippecanoe try dropping what should be the least visible features at each zoom level.
(If it drops too many features, use `-x` to leave out some feature attributes that
you didn't really need.)
Examples
--------
Create a tileset of TIGER roads for Alameda County, to zoom level 13, with a custom layer name and description:
```sh
$ tippecanoe -o alameda.mbtiles -l alameda -n "Alameda County from TIGER" -z13 tl_2014_06001_roads.json
```
Create a tileset of all TIGER roads, at only zoom level 12, but with higher detail than normal,
with a custom layer name and description, and leaving out the `LINEARID` and `RTTYP` attributes:
```
$ cat tiger/tl_2014_*_roads.json | tippecanoe -o tiger.mbtiles -l roads -n "All TIGER roads, one zoom" -z12 -Z12 -d14 -x LINEARID -x RTTYP
```
Options
-------
@ -116,7 +132,7 @@ If your input is formatted as newline-delimited GeoJSON, use `-P` to make input
### Parallel processing of input
* `-P` or `--read-parallel`: Use multiple threads to read different parts of each input file at once.
* `-P` or `--read-parallel`: Use multiple threads to read different parts of each GeoJSON input file at once.
This will only work if the input is line-delimited JSON with each Feature on its
own line, because it knows nothing of the top-level structure around the Features. Spurious "EOF" error
messages may result otherwise.
@ -127,6 +143,8 @@ If the input file begins with the [RFC 8142](https://tools.ietf.org/html/rfc8142
parallel processing of input will be invoked automatically, splitting at record separators rather
than at all newlines.
Parallel processing will also be automatic if the input file is in Geobuf format.
### Projection of input
* `-s` _projection_ or `--projection=`_projection_: Specify the projection of the input data. Currently supported are `EPSG:4326` (WGS84, the default) and `EPSG:3857` (Web Mercator). In general you should use WGS84 for your input files if at all possible.
@ -142,8 +160,8 @@ than at all newlines.
### Tile resolution
* `-d` _detail_ or `--full-detail=`_detail_: Detail at max zoom level (default 12, for tile resolution of 4096)
* `-D` _detail_ or `--low-detail=`_detail_: Detail at lower zoom levels (default 12, for tile resolution of 4096)
* `-d` _detail_ or `--full-detail=`_detail_: Detail at max zoom level (default 12, for tile resolution of 2^12=4096)
* `-D` _detail_ or `--low-detail=`_detail_: Detail at lower zoom levels (default 12, for tile resolution of 2^12=4096)
* `-m` _detail_ or `--minimum-detail=`_detail_: Minimum detail that it will try if tiles are too big at regular detail (default 7)
All internal math is done in terms of a 32-bit tile coordinate system, so 1/(2^32) of the size of Earth,
@ -166,7 +184,7 @@ resolution is obtained than by using a smaller _maxzoom_ or _detail_.
Example: to find the Natural Earth countries with low `scalerank` but high `LABELRANK`:
```
tippecanoe -o filtered.mbtiles -j '{ "ne_10m_admin_0_countries": [ "all", [ "<", "scalerank", 3 ], [ ">", "LABELRANK", 5 ] ] }' ne_10m_admin_0_countries.geojson
tippecanoe -z5 -o filtered.mbtiles -j '{ "ne_10m_admin_0_countries": [ "all", [ "<", "scalerank", 3 ], [ ">", "LABELRANK", 5 ] ] }' ne_10m_admin_0_countries.geojson
```
### Dropping a fixed fraction of features by zoom level
@ -297,17 +315,6 @@ Environment
Tippecanoe ordinarily uses as many parallel threads as the operating system claims that CPUs are available.
You can override this number by setting the `TIPPECANOE_MAX_THREADS` environmental variable.
Example
-------
```sh
$ tippecanoe -o alameda.mbtiles -l alameda -n "Alameda County from TIGER" -z13 tl_2014_06001_roads.json
```
```
$ cat tiger/tl_2014_*_roads.json | tippecanoe -o tiger.mbtiles -l roads -n "All TIGER roads, one zoom" -z12 -Z12 -d14 -x LINEARID -x RTTYP
```
GeoJSON extension
-----------------
@ -437,6 +444,24 @@ sudo apt-get install -y g++-5
export CXX=g++-5
```
Docker Image
------------
A tippecanoe Docker image can be built from source and executed as a task to
automatically install dependencies and allow tippecanoe to run on any system
supported by Docker.
```docker
$ docker build -t tippecanoe:latest .
$ docker run -it --rm \
-v /tiledata:/data \
tippecanoe:latest \
tippecanoe --output=/data/output.mbtiles /data/example.geojson
```
The commands above will build a Docker image from the source and compile the
latest version. The image supports all tippecanoe flags and options.
Examples
------

View File

@ -255,6 +255,9 @@ std::vector<drawvec_type> readGeometry(protozero::pbf_reader &pbf, size_t dim, d
dv.dv = readMultiLine(coords, lengths, dim, e, true);
} else if (type == MULTIPOLYGON) {
dv.dv = readMultiPolygon(coords, lengths, dim, e);
} else {
// GeometryCollection
return ret;
}
dv.type = type / 2 + 1;

View File

@ -37,7 +37,7 @@ $ brew install tippecanoe
.PP
.RS
.nf
$ tippecanoe \-o file.mbtiles [file.json file.geobuf ...]
$ tippecanoe \-o file.mbtiles [options] [file.json file.geobuf ...]
.fi
.RE
.PP
@ -48,24 +48,42 @@ The GeoJSON features need not be wrapped in a FeatureCollection.
You can concatenate multiple GeoJSON features or files together,
and it will parse out the features and ignore whatever other objects
it encounters.
.SH Docker Image
.SH Try this first
.PP
A tippecanoe Docker image can be built from source and executed as a task to
automatically install dependencies and allow tippecanoe to run on any system
supported by Docker.
If you aren't sure what options to use, try this:
.PP
.RS
.nf
$ docker build \-t tippecanoe:latest .
$ docker run \-it \-\-rm \\
\-v /tiledata:/data \\
tippecanoe:latest \\
tippecanoe \-\-output=/data/output.mbtiles /data/example.geojson
$ tippecanoe \-o out.mbtiles \-zg \-\-drop\-densest\-as\-needed in.geojson
.fi
.RE
.PP
The commands above will build a Docker image from the source and compile the
latest version. The image supports all tippecanoe flags and options.
The \fB\fC\-zg\fR option will make Tippecanoe choose a maximum zoom level that should be
high enough to reflect the precision of the original data. (If it turns out still
not to be as detailed as you want, use \fB\fC\-z\fR manually with a higher number.)
.PP
If the tiles come out too big, the \fB\fC\-\-drop\-densest\-as\-needed\fR option will make
Tippecanoe try dropping what should be the least visible features at each zoom level.
(If it drops too many features, use \fB\fC\-x\fR to leave out some feature attributes that
you didn't really need.)
.SH Examples
.PP
Create a tileset of TIGER roads for Alameda County, to zoom level 13, with a custom layer name and description:
.PP
.RS
.nf
$ tippecanoe \-o alameda.mbtiles \-l alameda \-n "Alameda County from TIGER" \-z13 tl_2014_06001_roads.json
.fi
.RE
.PP
Create a tileset of all TIGER roads, at only zoom level 12, but with higher detail than normal,
with a custom layer name and description, and leaving out the \fB\fCLINEARID\fR and \fB\fCRTTYP\fR attributes:
.PP
.RS
.nf
$ cat tiger/tl_2014_*_roads.json | tippecanoe \-o tiger.mbtiles \-l roads \-n "All TIGER roads, one zoom" \-z12 \-Z12 \-d14 \-x LINEARID \-x RTTYP
.fi
.RE
.SH Options
.PP
There are a lot of options. A lot of the time you won't want to use any of them
@ -122,7 +140,7 @@ specified, the files are all merged into the single named layer, even if they tr
.SS Parallel processing of input
.RS
.IP \(bu 2
\fB\fC\-P\fR or \fB\fC\-\-read\-parallel\fR: Use multiple threads to read different parts of each input file at once.
\fB\fC\-P\fR or \fB\fC\-\-read\-parallel\fR: Use multiple threads to read different parts of each GeoJSON input file at once.
This will only work if the input is line\-delimited JSON with each Feature on its
own line, because it knows nothing of the top\-level structure around the Features. Spurious "EOF" error
messages may result otherwise.
@ -133,6 +151,8 @@ rather than a stream that can only be read sequentially.
If the input file begins with the RFC 8142 \[la]https://tools.ietf.org/html/rfc8142\[ra] record separator,
parallel processing of input will be invoked automatically, splitting at record separators rather
than at all newlines.
.PP
Parallel processing will also be automatic if the input file is in Geobuf format.
.SS Projection of input
.RS
.IP \(bu 2
@ -154,9 +174,9 @@ specified maximum zoom and to any levels added beyond that.
.SS Tile resolution
.RS
.IP \(bu 2
\fB\fC\-d\fR \fIdetail\fP or \fB\fC\-\-full\-detail=\fR\fIdetail\fP: Detail at max zoom level (default 12, for tile resolution of 4096)
\fB\fC\-d\fR \fIdetail\fP or \fB\fC\-\-full\-detail=\fR\fIdetail\fP: Detail at max zoom level (default 12, for tile resolution of 2^12=4096)
.IP \(bu 2
\fB\fC\-D\fR \fIdetail\fP or \fB\fC\-\-low\-detail=\fR\fIdetail\fP: Detail at lower zoom levels (default 12, for tile resolution of 4096)
\fB\fC\-D\fR \fIdetail\fP or \fB\fC\-\-low\-detail=\fR\fIdetail\fP: Detail at lower zoom levels (default 12, for tile resolution of 2^12=4096)
.IP \(bu 2
\fB\fC\-m\fR \fIdetail\fP or \fB\fC\-\-minimum\-detail=\fR\fIdetail\fP: Minimum detail that it will try if tiles are too big at regular detail (default 7)
.RE
@ -188,7 +208,7 @@ Example: to find the Natural Earth countries with low \fB\fCscalerank\fR but hig
.PP
.RS
.nf
tippecanoe \-o filtered.mbtiles \-j '{ "ne_10m_admin_0_countries": [ "all", [ "<", "scalerank", 3 ], [ ">", "LABELRANK", 5 ] ] }' ne_10m_admin_0_countries.geojson
tippecanoe \-z5 \-o filtered.mbtiles \-j '{ "ne_10m_admin_0_countries": [ "all", [ "<", "scalerank", 3 ], [ ">", "LABELRANK", 5 ] ] }' ne_10m_admin_0_countries.geojson
.fi
.RE
.SS Dropping a fixed fraction of features by zoom level
@ -363,19 +383,6 @@ tippecanoe \-o roads.mbtiles \-c 'if [ $1 \-lt 11 ]; then grep "\\"MTFCC\\": \\"
.PP
Tippecanoe ordinarily uses as many parallel threads as the operating system claims that CPUs are available.
You can override this number by setting the \fB\fCTIPPECANOE_MAX_THREADS\fR environmental variable.
.SH Example
.PP
.RS
.nf
$ tippecanoe \-o alameda.mbtiles \-l alameda \-n "Alameda County from TIGER" \-z13 tl_2014_06001_roads.json
.fi
.RE
.PP
.RS
.nf
$ cat tiger/tl_2014_*_roads.json | tippecanoe \-o tiger.mbtiles \-l roads \-n "All TIGER roads, one zoom" \-z12 \-Z12 \-d14 \-x LINEARID \-x RTTYP
.fi
.RE
.SH GeoJSON extension
.PP
Tippecanoe defines a GeoJSON extension that you can use to specify the minimum and/or maximum zoom level
@ -519,6 +526,24 @@ sudo apt\-get install \-y g++\-5
export CXX=g++\-5
.fi
.RE
.SH Docker Image
.PP
A tippecanoe Docker image can be built from source and executed as a task to
automatically install dependencies and allow tippecanoe to run on any system
supported by Docker.
.PP
.RS
.nf
$ docker build \-t tippecanoe:latest .
$ docker run \-it \-\-rm \\
\-v /tiledata:/data \\
tippecanoe:latest \\
tippecanoe \-\-output=/data/output.mbtiles /data/example.geojson
.fi
.RE
.PP
The commands above will build a Docker image from the source and compile the
latest version. The image supports all tippecanoe flags and options.
.SH Examples
.PP
Check out some examples of maps made with tippecanoe \[la]MADE_WITH.md\[ra]

View File

@ -6,7 +6,7 @@ struct memfile {
char *map;
long long len;
long long off;
long long tree;
unsigned long tree;
};
struct memfile *memfile_open(int fd);

View File

@ -1,6 +1,7 @@
#pragma once
#include <assert.h>
#include <math.h>
#include <cmath>
#if defined(_MSC_VER)
#include "msinttypes/stdint.h"
@ -379,10 +380,10 @@ inline void Prettify(std::string &buffer, int length, int k) {
inline std::string dtoa_milo(double value) {
std::string buffer;
if (isnan(value)) {
if (std::isnan(value)) {
return "nan";
}
if (isinf(value)) {
if (std::isinf(value)) {
if (value < 0) {
return "-inf";
} else {

View File

@ -2,47 +2,43 @@
#include <stdlib.h>
#include <string.h>
#include <limits.h>
#include <math.h>
#include "memfile.hpp"
#include "pool.hpp"
static unsigned char swizzle[256] = {
0x00, 0xBF, 0x18, 0xDE, 0x93, 0xC9, 0xB1, 0x5E, 0xDF, 0xBE, 0x72, 0x5A, 0xBB, 0x42, 0x64, 0xC6,
0xD8, 0xB7, 0x15, 0x74, 0x1C, 0x8B, 0x91, 0xF5, 0x29, 0x46, 0xEC, 0x6F, 0xCA, 0x20, 0xF0, 0x06,
0x27, 0x61, 0x87, 0xE0, 0x6E, 0x43, 0x50, 0xC5, 0x1B, 0xB4, 0x37, 0xC3, 0x69, 0xA6, 0xEE, 0x80,
0xAF, 0x9B, 0xA1, 0x76, 0x23, 0x24, 0x53, 0xF3, 0x5B, 0x65, 0x19, 0xF4, 0xFC, 0xDD, 0x26, 0xE8,
0x10, 0xF7, 0xCE, 0x92, 0x48, 0xF6, 0x94, 0x60, 0x07, 0xC4, 0xB9, 0x97, 0x6D, 0xA4, 0x11, 0x0D,
0x1F, 0x4D, 0x13, 0xB0, 0x5D, 0xBA, 0x31, 0xD5, 0x8D, 0x51, 0x36, 0x96, 0x7A, 0x03, 0x7F, 0xDA,
0x17, 0xDB, 0xD4, 0x83, 0xE2, 0x79, 0x6A, 0xE1, 0x95, 0x38, 0xFF, 0x28, 0xB2, 0xB3, 0xA7, 0xAE,
0xF8, 0x54, 0xCC, 0xDC, 0x9A, 0x6B, 0xFB, 0x3F, 0xD7, 0xBC, 0x21, 0xC8, 0x71, 0x09, 0x16, 0xAC,
0x3C, 0x8A, 0x62, 0x05, 0xC2, 0x8C, 0x32, 0x4E, 0x35, 0x9C, 0x5F, 0x75, 0xCD, 0x2E, 0xA2, 0x3E,
0x1A, 0xC1, 0x8E, 0x14, 0xA0, 0xD3, 0x7D, 0xD9, 0xEB, 0x5C, 0x70, 0xE6, 0x9E, 0x12, 0x3B, 0xEF,
0x1E, 0x49, 0xD2, 0x98, 0x39, 0x7E, 0x44, 0x4B, 0x6C, 0x88, 0x02, 0x2C, 0xAD, 0xE5, 0x9F, 0x40,
0x7B, 0x4A, 0x3D, 0xA9, 0xAB, 0x0B, 0xD6, 0x2F, 0x90, 0x2A, 0xB6, 0x1D, 0xC7, 0x22, 0x55, 0x34,
0x0A, 0xD0, 0xB5, 0x68, 0xE3, 0x59, 0xFD, 0xFA, 0x57, 0x77, 0x25, 0xA3, 0x04, 0xB8, 0x33, 0x89,
0x78, 0x82, 0xE4, 0xC0, 0x0E, 0x8F, 0x85, 0xD1, 0x84, 0x08, 0x67, 0x47, 0x9D, 0xCB, 0x58, 0x4C,
0xAA, 0xED, 0x52, 0xF2, 0x4F, 0xF1, 0x66, 0xCF, 0xA5, 0x56, 0xEA, 0x7C, 0xE9, 0x63, 0xE7, 0x01,
0xF9, 0xFE, 0x0C, 0x99, 0x2D, 0x0F, 0x3A, 0x41, 0x45, 0xA8, 0x30, 0x2B, 0x73, 0xBD, 0x86, 0x81,
};
int swizzlecmp(const char *a, const char *b) {
while (*a || *b) {
int aa = swizzle[(unsigned char) *a];
int bb = swizzle[(unsigned char) *b];
ssize_t alen = strlen(a);
ssize_t blen = strlen(b);
int cmp = aa - bb;
if (cmp != 0) {
return cmp;
}
a++;
b++;
if (strcmp(a, b) == 0) {
return 0;
}
return 0;
long long hash1 = 0, hash2 = 0;
for (ssize_t i = alen - 1; i >= 0; i--) {
hash1 = (hash1 * 37 + a[i]) & INT_MAX;
}
for (ssize_t i = blen - 1; i >= 0; i--) {
hash2 = (hash2 * 37 + b[i]) & INT_MAX;
}
int h1 = hash1, h2 = hash2;
if (h1 == h2) {
return strcmp(a, b);
}
return h1 - h2;
}
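Because the hunk above interleaves the removed byte-by-byte swizzle walk with the added hash-based ordering, here is the added comparison logic consolidated into a standalone sketch. The function name `hashcmp_sketch` is illustrative; the idea is the one visible in the diff: order strings by a cheap 31-bit rolling hash and fall back to a full `strcmp` only when the hashes collide.

```cpp
#include <climits>
#include <cstring>
#include <sys/types.h>

// Order strings by a 31-bit rolling hash, deferring to strcmp only on a
// hash collision. Hashing scatters keys that share long prefixes, which
// keeps the string-pool search tree from degenerating into long chains.
static int hashcmp_sketch(const char *a, const char *b) {
	if (strcmp(a, b) == 0) {
		return 0;  // identical strings must still compare equal
	}

	long long hash1 = 0, hash2 = 0;
	for (ssize_t i = (ssize_t) strlen(a) - 1; i >= 0; i--) {
		hash1 = (hash1 * 37 + a[i]) & INT_MAX;
	}
	for (ssize_t i = (ssize_t) strlen(b) - 1; i >= 0; i--) {
		hash2 = (hash2 * 37 + b[i]) & INT_MAX;
	}

	int h1 = hash1, h2 = hash2;
	if (h1 == h2) {
		return strcmp(a, b);  // rare collision: fall back to byte order
	}
	return h1 - h2;
}
```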
long long addpool(struct memfile *poolfile, struct memfile *treefile, const char *s, char type) {
long long *sp = &treefile->tree;
unsigned long *sp = &treefile->tree;
size_t depth = 0;
// In typical data, traversal depth generally stays under 2.5x
size_t max = 3 * log(treefile->off / sizeof(struct stringpool)) / log(2);
if (max < 30) {
max = 30;
}
while (*sp != 0) {
int cmp = swizzlecmp(s, poolfile->map + ((struct stringpool *) (treefile->map + *sp))->off + 1);
@ -58,6 +54,23 @@ long long addpool(struct memfile *poolfile, struct memfile *treefile, const char
} else {
return ((struct stringpool *) (treefile->map + *sp))->off;
}
depth++;
if (depth > max) {
// Search is very deep, so string is probably unique.
// Add it to the pool without adding it to the search tree.
long long off = poolfile->off;
if (memfile_write(poolfile, &type, 1) < 0) {
perror("memfile write");
exit(EXIT_FAILURE);
}
if (memfile_write(poolfile, (void *) s, strlen(s) + 1) < 0) {
perror("memfile write");
exit(EXIT_FAILURE);
}
return off;
}
}
// *sp is probably in the memory-mapped file, and will move if the file grows.
@ -78,6 +91,16 @@ long long addpool(struct memfile *poolfile, struct memfile *treefile, const char
exit(EXIT_FAILURE);
}
if (off >= LONG_MAX || treefile->off >= LONG_MAX) {
// Tree or pool is bigger than 2GB
static bool warned = false;
if (!warned) {
fprintf(stderr, "Warning: string pool is very large.\n");
warned = true;
}
return off;
}
struct stringpool tsp;
tsp.left = 0;
tsp.right = 0;
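The core of this commit is the depth cap in `addpool`: the search tree is probed to at most roughly 3·log2(number of pooled strings) levels (never fewer than 30), and a string still unmatched at that depth is treated as probably unique and appended to the pool without being linked into the tree. Below is a simplified, in-memory sketch of that policy. The names are illustrative and it uses plain `strcmp` ordering and STL containers rather than the memory-mapped `memfile` structures the real code works against.

```cpp
#include <cmath>
#include <cstring>
#include <string>
#include <vector>

// Illustrative, in-memory stand-in for the string pool: a binary search tree
// of offsets into a flat byte pool, with the same depth cap as addpool above.
struct PoolSketch {
	struct Node {
		size_t left, right;  // child slots; 0 means "no child"
		size_t off;          // offset of the string within `pool`
	};

	std::string pool;                               // concatenated NUL-terminated strings
	std::vector<Node> tree = std::vector<Node>(1);  // tree[0] is an unused sentinel

	size_t add(const char *s) {
		// Cap the traversal at ~3 * log2(tree size), but never below 30,
		// mirroring the `max` computation in the diff.
		size_t max = (size_t) (3 * std::log((double) tree.size()) / std::log(2.0));
		if (max < 30) {
			max = 30;
		}

		size_t *sp = &tree[0].right;  // stand-in for treefile->tree (the root slot)
		size_t depth = 0;
		while (*sp != 0) {
			int cmp = strcmp(s, pool.c_str() + tree[*sp].off);
			if (cmp < 0) {
				sp = &tree[*sp].left;
			} else if (cmp > 0) {
				sp = &tree[*sp].right;
			} else {
				return tree[*sp].off;  // already pooled: reuse the existing copy
			}

			if (++depth > max) {
				// Search is very deep, so the string is probably unique:
				// store it in the pool but leave it out of the search tree.
				size_t off = pool.size();
				pool.append(s, strlen(s) + 1);
				return off;
			}
		}

		// Normal case: append the string and link a new leaf into the tree.
		size_t off = pool.size();
		pool.append(s, strlen(s) + 1);
		Node n;
		n.left = 0;
		n.right = 0;
		n.off = off;
		*sp = tree.size();
		tree.push_back(n);
		return off;
	}
};
```

The apparent trade-off is that a string which hits the depth cap is stored again on each repeat rather than deduplicated, which is cheaper than continuing to search an already-deep, unbalanced tree.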

View File

@ -2,9 +2,9 @@
#define POOL_HPP
struct stringpool {
long long left;
long long right;
long long off;
unsigned long left;
unsigned long right;
unsigned long off;
};
long long addpool(struct memfile *poolfile, struct memfile *treefile, const char *s, char type);

View File

@ -425,7 +425,10 @@ int serialize_feature(struct serialization_state *sst, serial_feature &sf) {
if (sf.geometry.size() > 0 && (sf.bbox[2] < sf.bbox[0] || sf.bbox[3] < sf.bbox[1])) {
fprintf(stderr, "Internal error: impossible feature bounding box %llx,%llx,%llx,%llx\n", sf.bbox[0], sf.bbox[1], sf.bbox[2], sf.bbox[3]);
}
if (sf.bbox[2] - sf.bbox[0] > (2LL << (32 - sst->maxzoom)) || sf.bbox[3] - sf.bbox[1] > (2LL << (32 - sst->maxzoom))) {
if (sf.bbox[0] == LLONG_MAX) {
// No bounding box (empty geometry)
// Shouldn't happen, but avoid arithmetic overflow below
} else if (sf.bbox[2] - sf.bbox[0] > (2LL << (32 - sst->maxzoom)) || sf.bbox[3] - sf.bbox[1] > (2LL << (32 - sst->maxzoom))) {
inline_meta = false;
if (prevent[P_CLIPPING]) {

View File

@ -1,6 +1,6 @@
#ifndef VERSION_HPP
#define VERSION_HPP
#define VERSION "tippecanoe v1.24.0\n"
#define VERSION "tippecanoe v1.24.1\n"
#endif