Detecting Disease from Genomics Data Using Intersect Labs

Capturing wild fish and shellfish can no longer meet the growing demand of human society, and overfishing is creating serious socio-economic and ecological problems. The Center for Aquaculture Technologies provides technology to the fish farming industry to help solve that problem, so that people can access affordable and healthy animal protein into the future.

Our genetics department works with farmers and breeders to help select traits of interest and identify what a fish will be like. For example, we use DNA information to predict whether a fish will exhibit a certain scale color or be vulnerable to disease.

Using advanced genetic sequencing techniques, we generate massive amounts of data, and interpreting that much data is difficult. Usually, a highly trained bioinformatician will use complex statistical models or rely on “expert companies” for analyses, which is expensive and comes with overwhelming technical detail.

We’re currently working on a project with Intersect Labs to identify a disease-related genetic condition in an offshore aquaculture species. Massive sequencing data were obtained from selected individuals and then formatted into a dataset with 32,000 columns. Each column corresponds to a Single Nucleotide Polymorphism (SNP), a single mutation at a specific position in the genome, so each column could, theoretically, be important.
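To make the shape of such a dataset concrete, here is a minimal Python sketch of one common way to turn genotype calls into a table with one row per fish and one column per SNP. It is an illustration only, not the format used in this project: the sample IDs, SNP names, the "A/G" call notation, and the helper names encode_genotype and build_matrix are all hypothetical.

```python
from typing import Dict, List, Optional


def encode_genotype(call: str, alt_allele: str) -> Optional[int]:
    """Count copies of the alternate allele in a diploid call such as 'A/G'.

    Returns 0, 1, or 2, or None when the call is missing or malformed.
    """
    alleles = call.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a in ("", ".", "N") for a in alleles):
        return None
    return sum(1 for a in alleles if a == alt_allele)


def build_matrix(
    calls: Dict[str, Dict[str, str]],
    alt_alleles: Dict[str, str],
) -> List[List[Optional[int]]]:
    """Build one row per sample (sorted by ID) and one column per SNP."""
    snps = list(alt_alleles)
    return [
        [encode_genotype(calls[sample].get(snp, "./."), alt_alleles[snp])
         for snp in snps]
        for sample in sorted(calls)
    ]


# Hypothetical toy data: three fish, two SNPs (the real dataset has 32,000).
alt_alleles = {"snp_0001": "G", "snp_0002": "T"}
calls = {
    "fish_A": {"snp_0001": "A/G", "snp_0002": "T/T"},
    "fish_B": {"snp_0001": "G|G", "snp_0002": "C/C"},
    "fish_C": {"snp_0001": "./.", "snp_0002": "C/T"},
}
matrix = build_matrix(calls, alt_alleles)
```

Each cell then holds the number of copies of the alternate allele that a fish carries at that SNP, with None marking a missing call that a later cleaning step would need to handle.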

Without any pre-processing, we simply dumped this data into Intersect Labs and built a classifier to detect our unwanted genetic condition. It took about two minutes for the platform to clean our data and train models — the final result was a classification model that’s over 99% accurate.

We still have a problem, though. We want to detect quickly whether a fish is going to resist that disease, and sequencing 32,000 SNPs is intensive work. By relying on Intersect Labs to find the most predictive features of our dataset, we’ve been able to whittle the set down to 56 SNPs without sacrificing the accuracy of the classifier.
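The idea of whittling a dataset down to its most predictive columns can be sketched in a few lines of Python. This is not how Intersect Labs selects features, which is not described here. It swaps in a deliberately simple score, the absolute difference in mean allele count between affected and unaffected fish, and the toy data and the names rank_snps and select_top_k are hypothetical.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def rank_snps(matrix: Sequence[Sequence[int]], labels: Sequence[int]) -> List[int]:
    """Return SNP column indices ordered from most to least associated.

    Score = |mean allele count in affected - mean allele count in unaffected|.
    Assumes both classes (1 = affected, 0 = unaffected) are present.
    """
    n_snps = len(matrix[0])
    scores = []
    for j in range(n_snps):
        affected = [row[j] for row, y in zip(matrix, labels) if y == 1]
        unaffected = [row[j] for row, y in zip(matrix, labels) if y == 0]
        scores.append(abs(mean(affected) - mean(unaffected)))
    return sorted(range(n_snps), key=lambda j: scores[j], reverse=True)


def select_top_k(
    matrix: Sequence[Sequence[int]], labels: Sequence[int], k: int
) -> Tuple[List[int], List[List[int]]]:
    """Keep only the k highest-scoring SNP columns."""
    keep = sorted(rank_snps(matrix, labels)[:k])
    reduced = [[row[j] for j in keep] for row in matrix]
    return keep, reduced


# Hypothetical toy data: six fish, four SNPs coded as allele counts 0/1/2.
matrix = [
    [2, 0, 1, 1],
    [2, 1, 0, 1],
    [1, 0, 2, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 2, 1],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = has the condition, 0 = does not

keep, reduced = select_top_k(matrix, labels, k=2)
print(keep)  # column indices of the retained SNPs
```

In practice the reduced panel would then be used to retrain the classifier, and its accuracy on held-out fish would be compared with the full model before a smaller panel is trusted.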

With Intersect Labs, we’re able to process a massive genome dataset far more quickly and accurately than with the traditional approach, with far less effort and high repeatability. The entire process took us less than a couple of hours, and the model is available to us whenever we need it.

Case Study written by
E Hu, Research Scientist
The Center for Aquaculture Technologies Inc.
