Previous topic

Hustle Query Guide

Next topic

hustle – Hustle Distributed OLAP Database

This Page

Inserting Data To Hustle

The process of inserting data into a Hustle cluster is referred to as a distributed insert. It is distributed because the client machine does the heavy lifting of creating a Marble, which is a self-contained large grained database fragment, which is then pushed into the distributed file system DDFS, which is a relatively inexpensive HTTP operation. The write throughput to the Hustle cluster, then, is only bound by the number of machines inserting into it.

Hustle currently supports one JSON object per line style input, as well as Disco’s native results format.

Here is an example insert:

from hustle import Table, insert
impressions = Table.from_tag('impressions')
insert(impressions, './imprsions-june-8.json', server='disco://hustle')

Hustle provides a command-line tool for inserting data located at bin/insert. Here is the –help for it:

➜ hustle/bin > ./insert --help
usage: insert [-h] [-s SERVER] [-f INFILE] [-m MAXSIZE] [-t TMPDIR]
          [-p PROCESSOR] [--disco-chunk]
          TABLE FILES [FILES ...]

Hustle bulk load

positional arguments:
  TABLE                 The Hustle table to insert to
  FILES                 The JSON formated files to insert

optional arguments:
  -h, --help            show this help message and exit
  -s SERVER, --server SERVER
                        DDFS server destination
  -f INFILE             A file containing a list of all files to be inserted
  -m MAXSIZE, --maxsize MAXSIZE
                        Initial size of Hustle marble
  -t TMPDIR, --tmpdir TMPDIR
                        Temporary directory for Hustle marble creation
  -p PROCESSOR          a module.function for the Hustle import preprocessor
  --disco-chunk         Indicated if the input files are in Disco CHUNK format

See also

Page Hustle Integration Test Suite
Hustle’s Integration Test Suite for creating and inserting to partitioned Tables.

insert() function