Centogene AG
Biotech

Automate scientific discovery

HTT are increasingly producing too much data to analyse manually. Using machine learning and scalable infrastructure, we automated the analysis of large proteomic and genomic datasets, replacing a 3-week manual analysis with 5 minutes of computation.

Laboratory instruments for genomics and metabolomics are producing an increasing amount of data that is unsuitable for manual analysis.

By using machine learning, Centogene performs automated pattern recognition on large datasets, allowing them to shorten the analysis time needed to obtain new insights about rare diseases.

This in turn helps them to identify multiple biomarkers or additional biomarker patterns in order to accelerate the development of treatments for patients.

We are very pleased with the support provided by Data Revenue - in particular their knowledge, and the application of machine learning, project management, and client interaction.

Volkmar Weckesser, CTO

Tools we used

AWS - One stop for compute infrastructure.
Dask - Leaves Spark in the dust.
Docker - Portable, flexible and simple deployments.
Kubernetes - Faster, more efficient and agile infrastructure.
Luigi - For managing all tasks in an execution graph – like Airflow.
Pandas - Great for exploration and feature engineering.
Python - Our main development language.
scikit-learn - Grab-box of algorithms and more.
XGBoost - Gradient boosting powerhouse.