Welcome to the phenotrex documentation!
phenotrex
Microbial Phenotype Prediction, re-implemented with Python 3.7 and scikit-learn
Supported platforms: Linux, macOS, Windows
Free software: MIT license
Installation
Stable release
To install phenotrex, run this command in your terminal:
$ pip install phenotrex
This is the preferred method to install phenotrex, as it will always install the most recent stable release.
If you don’t have pip installed, the Python installation guide can walk you through the process.
From sources
The sources for phenotrex can be downloaded from the GitHub repo.
You can either clone the public repository:
$ git clone https://github.com/univieCUBE/PICA2
Or download the tarball:
$ curl -OL https://github.com/univieCUBE/PICA2/tarball/master
Once you have a copy of the source, you can install it with:
$ python setup.py install
Usage
To use phenotrex in a project:
from phenotrex.io import ... # file I/O
from phenotrex.ml import ... # classifiers and training/CV functionality
from phenotrex.util import ... # plotting and util functions
phenotrex
phenotrex package
Subpackages
phenotrex.cli package
Submodules
phenotrex.cli.cccv module
phenotrex.cli.clf_opt module
phenotrex.cli.compute_genotype module
phenotrex.cli.cv module
phenotrex.cli.generic_func module
phenotrex.cli.generic_opt module
phenotrex.cli.get_weights module
phenotrex.cli.main module
phenotrex.cli.plot module
phenotrex.cli.predict module
phenotrex.cli.train module
Module contents
phenotrex.io package
Submodules
phenotrex.io.flat module
phenotrex.io.serialization module
Module contents
phenotrex.ml package
Subpackages
Submodules
phenotrex.ml.cccv module
phenotrex.ml.feature_select module
phenotrex.ml.trex_classifier module
phenotrex.ml.vectorizer module
Module contents
phenotrex.structure package
Submodules
phenotrex.structure.records module
class phenotrex.structure.records.GenotypeRecord(identifier: str, features: List[str])
Bases: object
Genomic features of a sample referenced by identifier.
features: List[str] = None
identifier: str = None
class phenotrex.structure.records.GroupRecord(identifier: str, group_name: Optional[str], group_id: Optional[int])
Bases: object
Group label of sample identifier.
Notes: Useful for leave-one-group-out cross-validation (LOGO-CV), for example, to take taxonomy into account.
group_id: Optional[int] = None
group_name: Optional[str] = None
identifier: str = None
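The leave-one-group-out idea mentioned in the notes above can be sketched in plain Python. This is a minimal illustration, not phenotrex code: the `GroupRecord` dataclass below is a hypothetical stand-in that only mirrors the documented fields, and `leave_one_group_out` and the sample identifiers are names invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class GroupRecord:
    """Hypothetical stand-in mirroring the documented GroupRecord fields."""
    identifier: str
    group_name: Optional[str]
    group_id: Optional[int]


def leave_one_group_out(
    records: List[GroupRecord],
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train_ids, test_ids), holding out one group_id per split."""
    by_group: Dict[Optional[int], List[str]] = defaultdict(list)
    for rec in records:
        by_group[rec.group_id].append(rec.identifier)
    for held_out, test_ids in by_group.items():
        # Everything outside the held-out group is used for training.
        train_ids = [i for g, ids in by_group.items() if g != held_out for i in ids]
        yield train_ids, test_ids


# Invented example data: two genomes share a taxonomic group.
records = [
    GroupRecord("genome_a", "Firmicutes", 0),
    GroupRecord("genome_b", "Firmicutes", 0),
    GroupRecord("genome_c", "Proteobacteria", 1),
]
splits = list(leave_one_group_out(records))
```

Because all samples of a group land on the same side of each split, closely related genomes never appear in both the training and the test set of one fold.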
class phenotrex.structure.records.PhenotypeRecord(identifier: str, trait_name: str, trait_sign: int)
Bases: object
Ground truth label of sample identifier, indicating presence or absence of trait trait_name:
0 if trait is absent
1 if trait is present
identifier: str = None
trait_name: str = None
trait_sign: int = None
class phenotrex.structure.records.TrainingRecord(identifier: str, group_name: Optional[str], group_id: Optional[int], trait_name: str, trait_sign: int, features: List[str])
Bases: phenotrex.structure.records.GenotypeRecord, phenotrex.structure.records.PhenotypeRecord, phenotrex.structure.records.GroupRecord
Sample combining Genotype-, Phenotype- and GroupRecord, suitable as machine learning input for a single observation.
features = None
identifier = None
Module contents
phenotrex.transforms package
Submodules
phenotrex.transforms.annotation module
phenotrex.transforms.resampling module
class phenotrex.transforms.resampling.TrainingRecordResampler(random_state: float = None, verb: bool = False)
Bases: object
Instantiates an object which can generate versions of a TrainingRecord resampled to defined completeness and contamination levels. Requires prior fitting with the full List[TrainingRecord] to get sources of contamination for both classes.
Parameters
random_state – Randomness seed to use while resampling
verb – Toggle verbosity
fit(records: List[phenotrex.structure.records.TrainingRecord])
Fit TrainingRecordResampler on the full TrainingRecord list to determine the sets of positive and negative features for contamination resampling.
Parameters
records – the full List[TrainingRecord] on which ML training will commence.
Returns
True if fitting was performed, else False.
get_resampled(record: phenotrex.structure.records.TrainingRecord, comple: float = 1, conta: float = 0) → phenotrex.structure.records.TrainingRecord
Resample a TrainingRecord to defined completeness and contamination levels. With comple=1 and conta=1, the size of the feature set doubles.
Parameters
comple – completeness of returned TrainingRecord features. Range: 0 - 1
conta – contamination of returned TrainingRecord features. Range: 0 - 1
record – the input TrainingRecord
Returns
a resampled TrainingRecord.
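To make the completeness and contamination semantics concrete, here is a rough sketch of one way such resampling could work. It is not the library's implementation: `resample_features`, the `contamination_pool` argument (a stand-in for the contamination sources that fitting determines), and the feature names are all assumptions made for illustration.

```python
import random
from typing import List, Optional


def resample_features(
    features: List[str],
    contamination_pool: List[str],
    comple: float = 1.0,
    conta: float = 0.0,
    seed: Optional[int] = None,
) -> List[str]:
    """Illustrative only: subsample to `comple`, then add `conta` worth of foreign features."""
    rng = random.Random(seed)
    n_original = len(features)
    # Completeness: keep a random fraction of the original features.
    n_keep = round(n_original * comple)
    kept = rng.sample(features, n_keep)
    # Contamination: draw foreign features, sized relative to the original set.
    n_contaminants = round(n_original * conta)
    contaminants = rng.choices(contamination_pool, k=n_contaminants)
    return kept + contaminants


# Invented feature identifiers for demonstration.
features = ["COG0001", "COG0002", "COG0003", "COG0004"]
pool = ["COG9001", "COG9002", "COG9003"]
doubled = resample_features(features, pool, comple=1.0, conta=1.0, seed=0)
halved = resample_features(features, pool, comple=0.5, conta=0.0, seed=0)
```

Under these assumptions, full completeness with full contamination yields twice as many features as the input, which matches the doubling behavior described above.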
phenotrex.util package
Submodules
phenotrex.util.helpers module
phenotrex.util.helpers.get_groups(records: List[phenotrex.structure.records.TrainingRecord]) → numpy.ndarray
Get groups from a list of TrainingRecords.
Parameters
records – a List[TrainingRecord]
Returns
the groups of the records as a numpy.ndarray
phenotrex.util.helpers.get_x_y_tn(records: List[phenotrex.structure.records.TrainingRecord]) → Tuple[numpy.ndarray, numpy.ndarray, str]
Get separate X and y from a list of TrainingRecord. Also infer the trait name from the first TrainingRecord.
Parameters
records – a List[TrainingRecord]
Returns
separate lists of features and targets, and the trait name
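The kind of unpacking these helpers perform can be sketched without phenotrex installed. Everything below is an assumption for illustration: `TrainingRecord` is a stand-in dataclass carrying the documented fields, `split_x_y_tn` and `collect_groups` are hypothetical names, plain lists replace the documented numpy arrays, and joining features into one space-separated string per sample is a guess at a convenient encoding rather than confirmed library behavior.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TrainingRecord:
    """Hypothetical stand-in with the documented TrainingRecord fields."""
    identifier: str
    group_name: Optional[str]
    group_id: Optional[int]
    trait_name: str
    trait_sign: int
    features: List[str]


def split_x_y_tn(records: List[TrainingRecord]) -> Tuple[List[str], List[int], str]:
    """Return per-sample feature strings, labels, and the first record's trait name."""
    x = [" ".join(rec.features) for rec in records]
    y = [rec.trait_sign for rec in records]
    return x, y, records[0].trait_name


def collect_groups(records: List[TrainingRecord]) -> List[Optional[int]]:
    """Return the group id of each record, in input order."""
    return [rec.group_id for rec in records]


# Invented example records.
records = [
    TrainingRecord("genome_a", "Firmicutes", 0, "T3SS", 1, ["COG0001", "COG0002"]),
    TrainingRecord("genome_b", "Proteobacteria", 1, "T3SS", 0, ["COG0002"]),
]
x, y, trait_name = split_x_y_tn(records)
groups = collect_groups(records)
```

Taking the trait name from the first record only works if all records describe the same trait, so a mixed list would silently be labeled with whichever trait comes first.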
phenotrex.util.logging module
phenotrex.util.plotting module
phenotrex.util.plotting.compleconta_plot(cccv_results: Union[Dict[float, Dict[float, Dict[str, float]]], List[Dict[float, Dict[float, Dict[str, float]]]]], conditions: List[str] = (), each_n: List[int] = None, title: str = '', fontsize: int = 16, figsize=(10, 7), plot_comple: bool = True, plot_conta: bool = True, colors: List = None, save_path: Union[str, pathlib.Path] = None, **kwargs)
Plots Compleconta CV results for one or multiple models. The mean balanced accuracy over folds is plotted for two cases: perfect completeness with variable contamination, and no contamination with variable completeness.
Parameters
cccv_results – a ComplecontaCV result, or list thereof
conditions – A list of condition names associated with cccv_results
each_n – A list of sample counts in datasets associated with cccv_results
title – The plot title
fontsize – The fontsize of the plot
figsize – The figure size (tuple of width, height)
plot_comple – Whether to plot completeness
plot_conta – Whether to plot contamination
colors –
save_path – The save path of the plot; if None, display it with plt.show()
kwargs – any further keyword arguments passed to plt.plot()
Returns
None
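The nested type of cccv_results, Dict[float, Dict[float, Dict[str, float]]], is easier to follow with a small example. The sketch below assumes the outer key is completeness, the inner key is contamination, and the innermost dictionary holds a metric under the key "score_mean". None of these details are confirmed by the signature, and the scores are invented. It extracts the two curves that the plot description refers to, without drawing anything.

```python
from typing import Dict, List, Tuple

CccvResult = Dict[float, Dict[float, Dict[str, float]]]

# Assumed layout: result[completeness][contamination] -> metrics (invented values).
cccv_result: CccvResult = {
    1.0: {0.0: {"score_mean": 0.95}, 0.5: {"score_mean": 0.80}, 1.0: {"score_mean": 0.70}},
    0.5: {0.0: {"score_mean": 0.85}, 0.5: {"score_mean": 0.75}, 1.0: {"score_mean": 0.65}},
}


def conta_curve(result: CccvResult, metric: str = "score_mean") -> List[Tuple[float, float]]:
    """Scores at perfect completeness (1.0) across contamination levels."""
    return sorted((conta, metrics[metric]) for conta, metrics in result[1.0].items())


def comple_curve(result: CccvResult, metric: str = "score_mean") -> List[Tuple[float, float]]:
    """Scores at zero contamination (0.0) across completeness levels."""
    return sorted((comple, by_conta[0.0][metric]) for comple, by_conta in result.items())


contamination_series = conta_curve(cccv_result)
completeness_series = comple_curve(cccv_result)
```

Each series is a sorted list of (level, score) pairs, which could be passed to any plotting routine.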
phenotrex.util.taxonomy module
Module contents
Module contents
Top-level package for phenotrex.
Credits
Development Lead
Lukas Lüftinger <lukas.lueftinger@outlook.com>
Contributors
Patrick Hyden <hydenp89@univie.ac.at>
Roman Feldbauer <roman.feldbauer@univie.ac.at>