• Given I have loaded {name}

    Step definition and RSpec integration test helpers cascade through a series of options for ensuring a dataset is ready for testing, ordered from fastest to most correct.

    # Each loader is wrapped in a lambda so the slower fallbacks only run
    # when a faster option misses.
    @survey = [
      -> { load_from_database(name) },
      -> { load_from_file(name) },
      -> { load_from_server(name) },
      -> { load_from_source_data(name) }
    ].lazy.map(&:call).detect(&:present?)

    Once the dataset is loaded, metadata that may have changed during a test and is cheap to reset (such as variables) is replayed over the dataset:

    reset_metadata(@survey)
  • def load_from_database(name)

    Because we’re not using SQL dumps, we don’t need to worry about ID collisions and can comfortably keep all datasets alongside each other.

    Survey.where(name: name).first
  • def load_from_file(name)

    The file is a compatible dataset containing both source data and calculated/generated data, cached on disk according to a versioning scheme.

    file = current_version_filename(name)

    # A filename hit is known-valid because the name encodes schema and code versions.
    FileAdapter.load(file) if File.exist?(file)
  • def load_from_server(name)

    Check whether a remote server has a valid copy of this version of the dataset. Given that CI will be doing most of the generating, the remote server will probably be the CI box.

    remote_name = current_version_filename(name)
    file = SCP.file("deploy@host/#{remote_name}")

    FileAdapter.load(file) if file.exists?
  • def load_from_source_data(name)

    This is the ultimate source of truth, used to generate a fully calculated dataset.

    survey = FileAdapter.load(source_dataset(name))
    configure(survey, name)
    survey.start_generating!
    saved_dataset = create_sss_archive(survey)
    upload_to_server saved_dataset
    survey
  • def reset_metadata(survey)

    The high-level concept here is to reset the more transient state, such as variables. It should be relatively easy to add other kinds of metadata here, like norms. The definitions should live in a text format such as YAML or JSON, plain Ruby/FactoryGirl, or even a DSL.
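    A minimal sketch of the replay, assuming the definitions live in a YAML file keyed by survey name; the fixture path and the set_variable accessor are hypothetical:

    require "yaml"

    def reset_metadata(survey)
      # Hypothetical fixture path; any serialisation format would do.
      definitions = YAML.load_file("spec/fixtures/survey_metadata.yml")
      metadata = definitions.fetch(survey.name, {})

      # Replay each variable over the dataset, discarding whatever a
      # previous test may have changed. set_variable is assumed here.
      metadata.fetch("variables", {}).each do |key, value|
        survey.set_variable(key, value)
      end
    end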

  • def current_version_filename(name)

    This filename needs to encode every reason the cached copy could become invalid. For example:

    • schema_version: bumped whenever a migration has been run
    • code_version: e.g. [dev, 545] - needs to increment. The only sane way to do that across branches is to include the branch name and an incrementing number or date.

    We could then mark the current cached datasets invalid (because we’ve created a new calculation) by doing:

    rake dataset:touch
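    A minimal sketch of how the key could be assembled, assuming hypothetical SCHEMA_VERSION and CODE_VERSION constants maintained by migrations and by rake dataset:touch respectively:

    def current_version_filename(name)
      # SCHEMA_VERSION bumps on migration; CODE_VERSION is [branch, counter],
      # bumped by `rake dataset:touch`. Both are assumptions for illustration.
      "#{name}-#{SCHEMA_VERSION}-#{CODE_VERSION.join('-')}.sss.zip"
    end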
  • def source_dataset(name)

    Any file adapter format, checked into the main repo.
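    As a sketch, this could be a simple path lookup; the fixtures directory (and the assumption of a Rails root) are hypothetical:

    def source_dataset(name)
      # Hypothetical location for source datasets checked into the repo.
      Rails.root.join("spec", "fixtures", "datasets", "#{name}.sss.zip")
    end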

  • def configure(survey, name)

    The aim here is to configure a freshly loaded dataset to the point where it can generate.

    Options include:

    • Using full-blown Cucumber features to describe the configuration
    • A simple text-based format - like JSON
    • A Ruby DSL, as in the hypothetical example below:

    dataset_metadata_factory :ipsos_defaults do
      norms "ipsos_norms.csv"
    end
    
    dataset_metadata_factory :uk_retailer do
      include_from :ipsos_defaults
    
      category "retailers"
      variable :market_size, 10_000_000
    
      question :attributes, "ATTR%"
      variable :barrier_groups, :something_smart
    end

  • def create_sss_archive(survey)

    Simply use a straight Sassy export to create a zipped SSS archive. Metadata needs to be stored in the repo in whatever serialisation format we decide on.
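    A rough sketch, assuming a placeholder Sassy.export entry point; the real export API may differ:

    def create_sss_archive(survey)
      filename = current_version_filename(survey.name)
      # Sassy.export is a placeholder for the real export call; it should
      # write the zipped SSS archive to `filename`.
      Sassy.export(survey, to: filename)
      filename
    end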

  • def upload_to_server(dataset)

    SCP.upload dataset, host, filename
{"cards":[{"_id":"35cd806c41f42afa74000018","treeId":"35cd804841f42afa74000015","seq":1,"position":1,"parentId":null,"content":"# Given I have loaded {name}\n\nStep definition and Rspec integration test helpers cascade through a variety of options for ensuring a dataset is ready for testing; in order of fastest to most correct.\n\n```\n@survey = [\n load_from_database {name},\n load_from_file {name},\n load_from_server {name},\n load_from_source_data {name}\n].detect &:present?\n```\n\nOnce, the dataset is loaded, metadata such as variables, etc which may have changed during a test and are cheap to reset are played over the dataset\n\n```\nreset_survey_metadata(@survey)\n```"},{"_id":"35cd8cbf41f42afa74000019","treeId":"35cd804841f42afa74000015","seq":1,"position":1,"parentId":"35cd806c41f42afa74000018","content":"### `def load_from_database(name)`\n\nBecause we're not using SQL dumps, we can avoid worrying about IDs and be comfortable having all datasets alongside each other.\n\n```\nSurvey.where(name: name).first\n```"},{"_id":"35ce1b6041f42afa74000034","treeId":"35cd804841f42afa74000015","seq":1,"position":1.5,"parentId":"35cd806c41f42afa74000018","content":"### `def load_from_file(name)`\n\nFile is a compatible dataset containing both source data and calculated/generated data. File is cached on disk according to a versioning scheme.\n\n```\nfile = current_version_filename(name)\n\nsurvey = if file.exists?\n FileAdapter.load(file)\nend\n```"},{"_id":"35ce2f5f41f42afa74000035","treeId":"35cd804841f42afa74000015","seq":1,"position":1,"parentId":"35ce1b6041f42afa74000034","content":"### `def current_version_filename(name)`\n\nThis filename needs to key itself from all possible reasons it should be marked invalid. For example:\n\n* **schema_version**: Have done a migration\n* **code_version**: [dev, 545] - needs to increment. Only sane way to do that across branches is to include the branch name and an incrementing number or date.\n\nWe could then mark the current cached datasets invalid (because we've created a new calculation) by doing:\n\n```\nrake dataset:touch\n```"},{"_id":"35cd91e641f42afa7400001a","treeId":"35cd804841f42afa74000015","seq":1,"position":2,"parentId":"35cd806c41f42afa74000018","content":"### `def load_from_server(name)`\n\nCheck if a remote server has a valid copy of this version of the dataset. 
Given that CI will probably be doing most of the generating, this will probably be CI.\n\n```\nremote_name = current_version_filename(name)\nfile = SCP.file(\"deploy@host/#{remote_name}\")\n\nsurvey = if file.exists?\n FileAdapter.load(file)\nend\n```"},{"_id":"35ce3b1641f42afa74000036","treeId":"35cd804841f42afa74000015","seq":1,"position":3,"parentId":"35cd806c41f42afa74000018","content":"### `def load_from_source_data(name)`\n\nThis is ultimate source of truth used to generate a fully calculated dataset.\n\n```\n survey = FileAdapter.load(source_dataset)\n configure(survey, name)\n survey.start_generating!\n saved_dataset = create_sss_archive(survey)\n upload_to_server saved_dataset\n return survey\n```"},{"_id":"35ce404f41f42afa74000037","treeId":"35cd804841f42afa74000015","seq":1,"position":1,"parentId":"35ce3b1641f42afa74000036","content":"### `def source_dataset(name)`\n\nAny file adapter format - checked into main repo"},{"_id":"35ce5c2041f42afa74000038","treeId":"35cd804841f42afa74000015","seq":1,"position":2,"parentId":"35ce3b1641f42afa74000036","content":"### `def configure(name)`\n\nAim here is to configure a freshly loaded dataset to the point it can generate.\n\nOptions include:\n* Using a full-blown cucumber to describe\n* Simple text-based format - like JSON\n* Ruby DSL\n\n```\ndataset_metadata_factory :ipsos_defaults do\n norms \"ipsos_norms.csv\"\nend\n\ndataset_metadata_factory :uk_retailer do\n include_from :ipsos_defaults\n\n category \"retailers\"\n variable :market_size, 10_000_000\n\n question :attributes, \"ATTR%\"\n variable :barrier_groups, :something_smart\nend\n```\nHypothetical DSL "},{"_id":"35ce5e3541f42afa74000039","treeId":"35cd804841f42afa74000015","seq":1,"position":3,"parentId":"35ce3b1641f42afa74000036","content":"### `def create_sss_archive(survey)`\n\nSimply use straight Sassy export to create zipped SSS export. Meta-data needs to be stored in repo in whatever serialisation format we decide."},{"_id":"35ce5edf41f42afa7400003a","treeId":"35cd804841f42afa74000015","seq":1,"position":4,"parentId":"35ce3b1641f42afa74000036","content":"### `def upload_to_server(dataset)`\n\n```\nSCP.upload dataset, host, filename\n```"},{"_id":"35d363af41f42afa7400003b","treeId":"35cd804841f42afa74000015","seq":1,"position":4,"parentId":"35cd806c41f42afa74000018","content":"### `def reset_metadata(survey)`\n\nThe high level concept here is to reset the more transient state such as variables. It should be relatively easy to add different types of stuff here, like norms. This should be in a text format like YAML, JSON or just plain Ruby/FactoryGirl, or even a DSL.\n\n\n"}],"tree":{"_id":"35cd804841f42afa74000015","name":"Ref data 2.0","publicUrl":"ref-data-20"}}