pica.io package

Submodules

pica.io.io module

pica.io.io.collate_training_data(genotype_records: List[pica.structure.records.GenotypeRecord], phenotype_records: List[pica.structure.records.PhenotypeRecord], group_records: List[pica.structure.records.GroupRecord], universal_genotype: bool = False, verb: bool = False) → List[pica.structure.records.TrainingRecord]

Returns a list of TrainingRecord built from a list of GenotypeRecord and a list of PhenotypeRecord, for use in training and cross-validation (CV) of TrexClassifier. Checks that a 1:1 mapping between phenotypes and genotypes exists and that all PhenotypeRecords pertain to the same trait.

Parameters

genotype_records – List[GenotypeRecord]
phenotype_records – List[PhenotypeRecord]
group_records – List[GroupRecord]; optional, only needed if leave-one-group-out is the split strategy
universal_genotype – Whether to use a universal genotype file.
verb – Toggle verbosity.

Returns

List[TrainingRecord]
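The two checks described above can be illustrated with a minimal sketch. It does not reproduce the library's implementation: the `Genotype` and `Phenotype` tuples and their field names (`identifier`, `features`, `trait_name`, `trait_sign`) are hypothetical stand-ins for the real record classes in pica.structure.records, and `check_one_to_one` is an illustrative helper, not part of pica.

```python
from collections import namedtuple

# Hypothetical stand-ins for the record classes; field names are assumptions.
Genotype = namedtuple("Genotype", ["identifier", "features"])
Phenotype = namedtuple("Phenotype", ["identifier", "trait_name", "trait_sign"])


def check_one_to_one(genotypes, phenotypes):
    """Pair genotypes with phenotypes, or raise ValueError if the inputs
    are not in 1:1 correspondence or cover more than one trait."""
    geno_ids = [g.identifier for g in genotypes]
    pheno_ids = [p.identifier for p in phenotypes]

    # Every identifier may occur only once on each side.
    if len(set(geno_ids)) != len(geno_ids) or len(set(pheno_ids)) != len(pheno_ids):
        raise ValueError("duplicate identifiers found")

    # Both sides must cover exactly the same identifiers.
    unmatched = sorted(set(geno_ids) ^ set(pheno_ids))
    if unmatched:
        raise ValueError(f"no 1:1 mapping; unmatched identifiers: {unmatched}")

    # All phenotype records must describe the same trait.
    traits = {p.trait_name for p in phenotypes}
    if len(traits) > 1:
        raise ValueError(f"more than one trait found: {sorted(traits)}")

    pheno_by_id = {p.identifier: p for p in phenotypes}
    return [(g, pheno_by_id[g.identifier]) for g in genotypes]
```

The pairs are returned in genotype order, so the order of the phenotype list does not matter in this sketch.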

pica.io.io.load_genotype_file(input_file: str) → List[pica.structure.records.GenotypeRecord]

Loads a genotype .tsv file and returns a list of GenotypeRecord for each entry.

Parameters: input_file – The path to the input genotype file.
Returns: List[GenotypeRecord] of records in the genotype file

pica.io.io.load_groups_file(input_file: str, selected_rank: str = None) → List[pica.structure.records.GroupRecord]

Loads a .tsv file that contains a group or taxid for each sample in the other training files. Group IDs may be NCBI taxon IDs or arbitrary group names. Taxon IDs are only used if a standard rank is selected; otherwise user-specified group IDs are assumed. Automatically classifies the [TODO missing text?]

Parameters

input_file – path to the file that is processed
selected_rank – the standard rank that is selected (optional). If not set, the input file is assumed to contain groups, i.e., each unique entry of the ID will be a new group.

Returns

a list of GroupRecords
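When no rank is selected, each unique entry becomes its own group. A minimal sketch of that rule follows; `assign_group_ids` is a hypothetical helper, and numbering the groups with integers in order of first appearance is an assumption made for illustration, not something the documentation above specifies.

```python
def assign_group_ids(group_names):
    """Map every distinct group name to an integer id, numbered in order
    of first appearance (the integer numbering is an assumption)."""
    ids = {}
    # setdefault returns the existing id, or stores and returns a new one.
    return [ids.setdefault(name, len(ids)) for name in group_names]


# Samples sharing a name end up in the same group.
print(assign_group_ids(["Bacillus", "Vibrio", "Bacillus"]))
```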

pica.io.io.load_phenotype_file(input_file: str, sign_mapping: Dict[str, int] = None) → List[pica.structure.records.PhenotypeRecord]

Loads a phenotype .tsv file and returns a list of PhenotypeRecord for each entry.

Parameters

input_file – The path to the input phenotype file.
sign_mapping – an optional Dict to change the mapping of trait signs. Default: {"YES": 1, "NO": 0}

Returns

List[PhenotypeRecord] of records in the phenotype file
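The effect of sign_mapping can be sketched as a plain dictionary lookup. Only the default mapping is taken from the documentation above; `map_trait_sign` is a hypothetical helper, and raising an error for unknown labels is an assumption about reasonable behaviour, not a statement about what the library does.

```python
DEFAULT_SIGN_MAPPING = {"YES": 1, "NO": 0}  # default documented above


def map_trait_sign(raw_sign, sign_mapping=None):
    """Translate a trait-sign label from the phenotype file into an integer."""
    mapping = DEFAULT_SIGN_MAPPING if sign_mapping is None else sign_mapping
    try:
        return mapping[raw_sign]
    except KeyError:
        # Assumption: unknown labels are treated as an error in this sketch.
        raise ValueError(
            f"unknown trait sign {raw_sign!r}; expected one of {sorted(mapping)}"
        ) from None
```

For a phenotype file that uses "+" and "-" instead of "YES" and "NO", a caller would pass a mapping such as `{"+": 1, "-": 0}`.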

pica.io.io.load_training_files(genotype_file: str, phenotype_file: str, groups_file: str = None, selected_rank: str = None, universal_genotype: bool = False, verb=False) → Tuple[List[pica.structure.records.TrainingRecord], List[pica.structure.records.GenotypeRecord], List[pica.structure.records.PhenotypeRecord], List[pica.structure.records.GroupRecord]]

Convenience function to load the genotype and phenotype files (and, optionally, a groups file) together and return the collated list of TrainingRecord along with the loaded records.

Parameters

genotype_file – The path to the input genotype file.
phenotype_file – The path to the input phenotype file.
groups_file – The path to the input groups file (optional).
selected_rank – The selected standard rank to use for taxonomic grouping (optional).
universal_genotype – Whether to use a universal genotype file.
verb – Toggle verbosity.

Returns

Tuple[List[TrainingRecord], List[GenotypeRecord], List[PhenotypeRecord], List[GroupRecord]]
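A possible calling pattern, based only on the signature shown above, is sketched here. `load_records` is a hypothetical wrapper. The import is placed inside the function so the snippet can be defined without pica installed; calling it requires the package and real input files.

```python
def load_records(genotype_file, phenotype_file, groups_file=None, selected_rank=None):
    """Load all training inputs and return the training and group records."""
    # Lazy import: pica is only needed once the function is actually called.
    from pica.io.io import load_training_files

    # The signature above declares a 4-tuple as the return value.
    training, genotypes, phenotypes, groups = load_training_files(
        genotype_file=genotype_file,
        phenotype_file=phenotype_file,
        groups_file=groups_file,
        selected_rank=selected_rank,
        verb=True,
    )
    return training, groups
```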

pica.io.io.write_cccv_accuracy_file(output_file: str, cccv_results)

Writes the CCCV accuracies in the exact format that PhenDB uses as input.

Parameters

output_file – path to the output file
cccv_results –

Returns

nothing

pica.io.io.write_misclassifications_file(output_file: str, records: List[pica.structure.records.TrainingRecord], misclassifications, use_groups: bool = False)

Writes the misclassifications file.

Parameters

output_file – name of the output file
records – List of TrainingRecord objects
misclassifications – List of percentages of misclassifications
use_groups – toggles averaging over groups and output of groups

Returns

nothing

pica.io.io.write_weights_file(weights_file: str, weights: Dict)

Writes the weights to the specified file in tab-separated format, with a header.

Parameters

weights_file – The path to the file to which the output will be written
weights – sorted dictionary storing weights, with feature names as keys

Returns

nothing
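A tab-separated file with a header can be produced with the standard csv module, as in the sketch below. It is not the library's implementation: `write_weights_sketch` is a hypothetical helper, the header names "Feature" and "Weight" are assumptions, and the function writes to an open handle rather than a path so it can be tried in memory.

```python
import csv
import io


def write_weights_sketch(handle, weights):
    """Write feature weights as tab-separated rows below a header line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Feature", "Weight"])  # assumed column names
    # Dicts keep insertion order, so a pre-sorted dict stays sorted.
    for feature_name, weight in weights.items():
        writer.writerow([feature_name, weight])


# In-memory usage; with a real file, pass open(path, "w", newline="").
buffer = io.StringIO()
write_weights_sketch(buffer, {"feature_a": 1.5, "feature_b": -0.25})
print(buffer.getvalue())
```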
