pica.io package
Submodules
pica.io.io module
-
pica.io.io.collate_training_data(genotype_records: List[pica.structure.records.GenotypeRecord], phenotype_records: List[pica.structure.records.PhenotypeRecord], group_records: List[pica.structure.records.GroupRecord], universal_genotype: bool = False, verb: bool = False) → List[pica.structure.records.TrainingRecord]
Builds a list of TrainingRecord from lists of GenotypeRecord and PhenotypeRecord (and, optionally, GroupRecord). Intended for training and cross-validation (CV) of TrexClassifier. Checks that a 1:1 mapping between phenotypes and genotypes exists and that all PhenotypeRecords pertain to the same trait.
- Parameters
genotype_records – List[GenotypeRecord]
phenotype_records – List[PhenotypeRecord]
group_records – List[GroupRecord]; optional, used when leave-one-group-out is the split strategy.
universal_genotype – Whether to use a universal genotype file.
verb – toggle verbosity.
- Returns
List[TrainingRecord]
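The two consistency checks described above (a 1:1 mapping between genotypes and phenotypes, and a single shared trait) can be sketched as follows. This is a minimal illustration, not the library's implementation: the `Genotype` and `Phenotype` tuples, their field names, and `check_collatable` are hypothetical stand-ins for the real record classes.

```python
from typing import List, NamedTuple


class Genotype(NamedTuple):
    """Hypothetical stand-in for GenotypeRecord (field names are assumptions)."""
    identifier: str
    features: List[str]


class Phenotype(NamedTuple):
    """Hypothetical stand-in for PhenotypeRecord (field names are assumptions)."""
    identifier: str
    trait_name: str
    trait_sign: int


def check_collatable(genotypes: List[Genotype], phenotypes: List[Phenotype]) -> None:
    """Raise ValueError unless the records map 1:1 and share a single trait."""
    gt_ids = [g.identifier for g in genotypes]
    pt_ids = [p.identifier for p in phenotypes]
    # Each sample may appear only once per record list.
    if len(set(gt_ids)) != len(gt_ids) or len(set(pt_ids)) != len(pt_ids):
        raise ValueError("duplicate sample identifiers")
    # Every genotype needs exactly one phenotype, and vice versa.
    if set(gt_ids) != set(pt_ids):
        raise ValueError("no 1:1 mapping between genotypes and phenotypes")
    # All phenotype records must describe the same trait.
    if len({p.trait_name for p in phenotypes}) > 1:
        raise ValueError("phenotype records pertain to more than one trait")
```

Running a check like this before collating makes mismatched input files fail early with a clear message instead of silently producing incomplete training data.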
-
pica.io.io.load_genotype_file(input_file: str) → List[pica.structure.records.GenotypeRecord]
Loads a genotype .tsv file and returns a list containing one GenotypeRecord per entry.
- Parameters
input_file – The path to the input genotype file.
- Returns
List[GenotypeRecord] of records in the genotype file
-
pica.io.io.load_groups_file(input_file: str, selected_rank: str = None) → List[pica.structure.records.GroupRecord]
Loads a .tsv file that contains a group or taxid for each sample in the other training files. Group IDs may be NCBI taxon IDs or arbitrary group names. Taxon IDs are only used if a standard rank is selected; otherwise user-specified group IDs are assumed. Automatically classifies the [TODO missing text?]
- Parameters
input_file – path to the file that is processed
selected_rank – the standard rank that is selected (optional). If not set, the input file is assumed to contain groups, i.e., each unique entry of the ID will be a new group.
- Returns
a list of GroupRecords
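When no rank is selected, each unique entry becomes its own group. A minimal sketch of that assignment is shown below; the function name, the `(sample_id, label)` row shape, and the use of integer group numbers are assumptions made for illustration and are not taken from the library.

```python
from typing import Dict, List, Tuple


def assign_group_ids(rows: List[Tuple[str, str]]) -> Dict[str, int]:
    """Map each sample identifier to a group number, one group per unique label.

    `rows` holds hypothetical (sample_id, group_label) pairs as they might be
    read from a groups .tsv file.
    """
    label_to_group: Dict[str, int] = {}
    sample_to_group: Dict[str, int] = {}
    for sample_id, label in rows:
        # The first occurrence of a label opens a new group; later ones reuse it.
        group = label_to_group.setdefault(label, len(label_to_group))
        sample_to_group[sample_id] = group
    return sample_to_group
```

Samples that share a label end up in the same group, which is what a leave-one-group-out split needs.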
-
pica.io.io.load_phenotype_file(input_file: str, sign_mapping: Dict[str, int] = None) → List[pica.structure.records.PhenotypeRecord]
Loads a phenotype .tsv file and returns a list containing one PhenotypeRecord per entry.
- Parameters
input_file – The path to the input phenotype file.
sign_mapping – an optional Dict to change the mapping of trait signs. Default: {"YES": 1, "NO": 0}
- Returns
List[PhenotypeRecord] of records in the phenotype file
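The effect of `sign_mapping` can be illustrated with a small sketch. Only the default mapping comes from the documentation above; the helper `map_trait_sign`, its exact-match lookup, and its error handling are assumptions, and the library may treat unknown or differently cased labels in another way.

```python
from typing import Dict, Optional

# Default documented for load_phenotype_file.
DEFAULT_SIGN_MAPPING: Dict[str, int] = {"YES": 1, "NO": 0}


def map_trait_sign(raw: str, sign_mapping: Optional[Dict[str, int]] = None) -> int:
    """Translate a raw trait label into its integer sign (hypothetical helper)."""
    mapping = DEFAULT_SIGN_MAPPING if sign_mapping is None else sign_mapping
    try:
        # Exact-match lookup; case handling in the real loader is not documented here.
        return mapping[raw]
    except KeyError:
        raise ValueError(f"unknown trait sign {raw!r}") from None
```

A file that encodes the trait as `+` and `-` could then be handled by passing a custom mapping such as `{"+": 1, "-": 0}`.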
-
pica.io.io.load_training_files(genotype_file: str, phenotype_file: str, groups_file: str = None, selected_rank: str = None, universal_genotype: bool = False, verb=False) → Tuple[List[pica.structure.records.TrainingRecord], List[pica.structure.records.GenotypeRecord], List[pica.structure.records.PhenotypeRecord], List[pica.structure.records.GroupRecord]]
Convenience function that loads the genotype and phenotype files (and, optionally, a groups file) together and returns a list of TrainingRecord along with the loaded genotype, phenotype, and group records.
- Parameters
genotype_file – The path to the input genotype file.
phenotype_file – The path to the input phenotype file.
groups_file – The path to the input groups file.
selected_rank – The selected standard rank to use for taxonomic grouping
universal_genotype – Whether to use a universal genotype file.
verb – toggle verbosity.
- Returns
Tuple[List[TrainingRecord], List[GenotypeRecord], List[PhenotypeRecord], List[GroupRecord]]
-
pica.io.io.write_cccv_accuracy_file(output_file: str, cccv_results)
Writes the cccv accuracies in the exact format that phendb uses as input.
- Parameters
output_file – path to the output file
cccv_results –
- Returns
nothing
-
pica.io.io.write_misclassifications_file(output_file: str, records: List[pica.structure.records.TrainingRecord], misclassifications, use_groups: bool = False)
Writes the misclassifications file.
- Parameters
output_file – name of the output file
records – List of TrainingRecord objects
misclassifications – List of percentages of misclassifications
use_groups – toggles average over groups and groups output
- Returns
-
pica.io.io.write_weights_file(weights_file: str, weights: Dict)
Writes the weights to the specified file in tab-separated format with a header.
- Parameters
weights_file – The path to the file to which the output will be written
weights – sorted dictionary storing weights with feature names as indices
- Returns
nothing
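As an illustration of a tab-separated weights file with a header, the sketch below writes a feature-to-weight dictionary using Python's `csv` module. The function name `write_weights_sketch` and the header columns `feature` and `weight` are assumptions; the actual column names and ordering written by `write_weights_file` are not documented here.

```python
import csv
from typing import Dict


def write_weights_sketch(path: str, weights: Dict[str, float]) -> None:
    """Write feature weights as a tab-separated file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        # Hypothetical header; the real file may use different column names.
        writer.writerow(["feature", "weight"])
        # Dict iteration preserves insertion order, so a pre-sorted dict stays sorted.
        for name, weight in weights.items():
            writer.writerow([name, weight])
```

Because the rows follow the dictionary's own order, sorting the weights before the call determines the order in the output file.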