snowshu.samplings.samplings package

Submodules

snowshu.samplings.samplings.brute_force_sampling module

class snowshu.samplings.samplings.brute_force_sampling.BruteForceSampling(probability: float = 0.1, min_sample_size: int = 1000, max_allowed_rows: int = 1000000)

Bases: snowshu.core.samplings.bases.base_sampling.BaseSampling

Heuristic sampling that uses a raw percentage of the population as the sample size, with Bernoulli sampling.

Parameters
  • probability – The desired sample size as a fraction of the population, expressed as a decimal from 0.01 to 0.99. Default 0.1 (10%).

  • min_sample_size – The minimum number of records to retrieve from the population. Default 1000.
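A minimal sketch of how the two documented parameters could combine, followed by a Bernoulli draw. It assumes that min_sample_size acts as a floor on probability × population and that the target is capped at the population size. The helper names brute_force_size and bernoulli_sample are hypothetical and are not part of snowshu. The sketch also ignores max_allowed_rows, whose exact role is not described here.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def brute_force_size(population: int, probability: float = 0.1,
                     min_sample_size: int = 1000) -> int:
    """Hypothetical helper: target size is a raw percentage of the population,
    floored at min_sample_size (assumption) and capped at the population."""
    target = max(round(population * probability), min_sample_size)
    return min(target, population)


def bernoulli_sample(rows: Iterable[T], keep_probability: float,
                     seed: int = 0) -> List[T]:
    """Bernoulli sampling: each row is kept independently with the same
    probability, so the realized sample size varies around the target."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < keep_probability]


population = list(range(50_000))
size = brute_force_size(len(population), probability=0.1)  # 5000
sample = bernoulli_sample(population, size / len(population))
print(size, len(sample))
```

Because Bernoulli sampling decides each row independently, the returned row count is close to, but generally not exactly, the target size.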

prepare(relation: Relation, source_adapter: BaseSourceAdapter) → None

Runs all necessary preparatory steps and instantiates the sample method.

prepare() is called before the primary query is compiled, so it can be used for any necessary pre-compile activities (such as collecting a histogram from the relation).

Parameters
  • relation – The Relation object to prepare.

  • source_adapter – The source adapter instance to use for executing prepare queries.

size: int = None
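The prepare-then-compile split described above can be illustrated with a self-contained sketch. Everything here is hypothetical: StubRelation and StubAdapter stand in for snowshu's Relation and BaseSourceAdapter, and the count_rows() method is invented for illustration. It also assumes, as the size attribute suggests, that prepare() is where the sample size gets filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubRelation:
    # Hypothetical stand-in for snowshu's Relation: a name and a row count.
    name: str
    population: int


class StubAdapter:
    # Hypothetical stand-in for BaseSourceAdapter; count_rows() is invented.
    def count_rows(self, relation: StubRelation) -> int:
        return relation.population


class PercentSamplingSketch:
    """Illustrative only: prepare() queries the source and stores state
    (here, size) that a later compile step depends on."""

    def __init__(self, probability: float = 0.1, min_sample_size: int = 1000):
        self.probability = probability
        self.min_sample_size = min_sample_size
        self.size: Optional[int] = None  # unset until prepare() runs

    def prepare(self, relation: StubRelation, source_adapter: StubAdapter) -> None:
        # Pre-compile activity: fetch the population size from the source.
        population = source_adapter.count_rows(relation)
        target = max(round(population * self.probability), self.min_sample_size)
        self.size = min(target, population)

    def compile_hint(self) -> str:
        # Hypothetical compile step that requires prepare() to have run.
        if self.size is None:
            raise RuntimeError("prepare() must be called before compiling")
        return f"sample {self.size} rows"


sampling = PercentSamplingSketch()
sampling.prepare(StubRelation("orders", 120_000), StubAdapter())
print(sampling.compile_hint())  # sample 12000 rows
```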

snowshu.samplings.samplings.default_sampling module

class snowshu.samplings.samplings.default_sampling.DefaultSampling(margin_of_error: float = 0.02, confidence: float = 0.99, min_sample_size: int = 1000, max_allowed_rows: int = 1000000)

Bases: snowshu.core.samplings.bases.base_sampling.BaseSampling

Basic sampling using Cochran's sample size formula and Bernoulli sampling.

This default sampling assumes high variability in the population.

Parameters
  • margin_of_error – The acceptable margin of error, expressed as a decimal from 0.01 to 0.10 (1% to 10%). Default 0.02 (2%). https://en.wikipedia.org/wiki/Margin_of_error

  • confidence – The confidence level for the sample, expressed as a decimal from 0.01 to 0.99 (1% to 99%). Default 0.99 (99%). http://www.stat.yale.edu/Courses/1997-98/101/confint.htm

  • min_sample_size – The minimum number of records to retrieve from the population. Default 1000.
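A minimal sketch of Cochran's sample size formula, n0 = z² · p(1 − p) / e². It assumes p = 0.5, which maximizes p(1 − p) and so models a highly variable population. It also applies the standard finite population correction, n = n0 / (1 + (n0 − 1) / N). Whether DefaultSampling applies that correction, caps the result at the population, or uses max_allowed_rows in this way is an assumption. The function name cochran_sample_size is hypothetical. With the defaults (margin_of_error 0.02, confidence 0.99), the uncorrected n0 comes to roughly 4,147 rows.

```python
import math
from statistics import NormalDist


def cochran_sample_size(population: int, margin_of_error: float = 0.02,
                        confidence: float = 0.99, min_sample_size: int = 1000,
                        p: float = 0.5) -> int:
    """Hypothetical helper: Cochran's sample size with finite population
    correction; p = 0.5 assumes the most variable population."""
    # Two-sided z-score for the confidence level (about 2.576 for 0.99).
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    # Finite population correction shrinks n0 when the population is small.
    n = n0 / (1 + (n0 - 1) / population)
    # Floor at min_sample_size, but never ask for more rows than exist.
    return min(population, max(math.ceil(n), min_sample_size))


print(cochran_sample_size(1_000_000))
print(cochran_sample_size(500))
```

Usage note: the correction matters only for small populations. For a million rows the result stays close to n0. For 500 rows the floor of 1000 exceeds the population, so the whole relation is returned.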

prepare(relation: Relation, source_adapter: BaseSourceAdapter) → None

Runs all necessary preparatory steps and instantiates the sample method.

prepare() is called before the primary query is compiled, so it can be used for any necessary pre-compile activities (such as collecting a histogram from the relation).

Parameters
  • relation – The Relation object to prepare.

  • source_adapter – The source adapter instance to use for executing prepare queries.

size: int = None

Module contents