Working with the SIB

In Switzerland, the air is cold and clean, the food is expensive and the people seem to be friendly.

My new postdoctoral position came with a shiny new affiliation to the Swiss Institute of Bioinformatics (SIB), and I found myself invited to the SIB days conference before I had even begun working – a slightly embarrassing conversation starter when people ask who you work with and you don’t know.

The reason for this confusion is, of course, that working for the SIB means being under the banner of a Group Leader: a wise and extremely computer-literate young(ish) guru who provides you with advice, contacts and, most importantly, a Mac. These mighty Bioinformaticians run huge labs of researchers who range from the biologist who can hold their own with a Markov Model to the dedicated computer scientist who may not know exactly what a gene does but can tell you exactly how it is programmed into their databases and where the memory cache would be most efficient for retrieving complex queries.

Working with a group leader does not necessarily mean a lot of contact with them, as your lab head does not have to be your group leader. Hence my not knowing who my group leader was.

The conference began with a long talk on our duties; a bizarrely motivating experience where we were treated like a special operations unit tasked with being deployed (or embedded, as we say here) into the most inhospitable of biology labs in order to educate, advise and code. Fortunately, this mission statement has the desired effect of making you want to immediately find yourself a wet-lab worker and explain in fine detail how you could make their data entries more efficient, and how it is your duty, responsibility and most desperate desire to do so.

Basically they make you feel special.

With this in mind – now that it has finally been confirmed that I am special – I will start to update more regularly with anything that I have learned that might be worthwhile to others.

Unfortunately, seemingly as with most of science, there is never a perfect way of doing something. Or anything really. But perhaps knowing how this massive collective of Swiss Bioinformaticians do things might make others feel special too.

 

Posted in SIB

Assembling Genomes

As a Bioinformatician, you are expected to be efficient with large volumes of data. You should be able to tackle complex tasks on entire genomes as quickly as (if not more quickly than) someone working with a single gene. With a favorite scripting language firmly in hand, this is not usually a problem… until it comes to genome assembly.

 

Many programs exist that boast faster and more accurate results than ever before; academic programs will often explain how they work, commercial ones often will not. For the confused Bioinformatician the answer is usually to ‘build your own’. Unfortunately, this is not usually feasible for assembling a genome.

 

So how do you choose what will work best in a continuously developing field where the sequencing technologies and software are racing each other in an elaborate and confusing fashion?

 

After assembling 18 novel genomes, I would like to share the following wisdom:

optimization beats innovation

 

There are a multitude of products on the market, and as yet no one really knows which is best. Even when they do, it will still only be the best for a given genome in a given situation.

 

Most of the reliable programs are based on one of two algorithms: the Overlap Layout Consensus for long reads, and the de Bruijn approach for short reads. So just pick one (safe in the knowledge that they are basically the same) and spend your time optimising the hell out of it, because everyone’s data is different… even when it is supposed to be the same!
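To make the de Bruijn idea a little more concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular assembler is implemented: the function name, the toy reads and the choice of k are all made up for the example. Each read is chopped into k-mers, and each k-mer becomes an edge from its first k-1 bases to its last k-1 bases. The k-mer size is typically one of the parameters worth spending optimisation time on.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Build a toy de Bruijn graph.

    Each k-mer found in a read adds one edge from its (k-1)-base prefix
    to its (k-1)-base suffix. Repeated k-mers add repeated edges.
    """
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph


# Two overlapping toy reads (hypothetical data, not from a real genome).
reads = ["ACGTC", "CGTCA"]
graph = de_bruijn_edges(reads, k=3)

for node, successors in sorted(graph.items()):
    print(node, "->", successors)
```

A real assembler also has to deal with sequencing errors, reverse complements and repeats, which is where most of the tuning effort goes.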

…and here is how to do that:

 

  1. Know your data – Do you have short or long reads, paired or unpaired? What are your paired end distances? What direction do the pairs face? You WILL need to know this information!
  2. Filtering – Screen for vectors, screen for contamination. Do your Illumina reads contain “N’s”? Get rid of them! Low complexity? Get rid of it! Are your read ends low quality? Mask them! The more data that goes in, the more aggressive the filtering will need to be.
  3. Choose quickly - Software changes frequently, and often in ways that make very little difference to the overall result. Laboring for weeks over the choice of software is confusing and ultimately pointless, as is trialling vast numbers of programs (I say this from experience). Generally, well-established programs that are frequently updated are the most reliable.
  4. Know your program - Commercial software will usually use less memory and time than academic software, but here is my caution: Do you know what it is actually doing? Getting the best results out of your chosen program is usually more effective than trying different programs. Be sure you know what it expects, and what it is actually doing.
  5. Scaffold carefully - The largest danger with scaffolding is not knowing exactly what the software is doing and, more importantly, WHAT DOES IT EXPECT? Scaffolders (built-in and stand-alone) will expect pairs to be a certain distance apart and to face a certain direction. This is where steps 1 and 2 come into play… you MUST get it right. The scaffolder will not usually tell you if you have got it wrong, and it may not be until quite far down the line that you notice!
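The filtering in step 2 could be sketched roughly like this in Python. Treat it as a starting point under stated assumptions rather than a recipe: the function names and the thresholds (`min_qual`, `min_len`, `max_fraction`) are hypothetical, qualities are assumed to be already decoded into a list of Phred scores, and low-quality ends are trimmed here rather than masked, simply because that is easier to show.

```python
def trim_low_quality_ends(seq, quals, min_qual=20):
    """Trim bases from both ends while their quality is below min_qual."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_qual:
        start += 1
    while end > start and quals[end - 1] < min_qual:
        end -= 1
    return seq[start:end], quals[start:end]


def is_low_complexity(seq, max_fraction=0.8):
    """Crude check: flag reads dominated by a single base."""
    if not seq:
        return True
    most_common = max(seq.count(base) for base in "ACGT")
    return most_common / len(seq) > max_fraction


def filter_read(seq, quals, min_qual=20, min_len=30):
    """Return the trimmed (seq, quals), or None if the read is rejected."""
    seq, quals = trim_low_quality_ends(seq, quals, min_qual)
    # Reject reads that are too short, still contain N's, or are low complexity.
    if len(seq) < min_len or "N" in seq or is_low_complexity(seq):
        return None
    return seq, quals


# Toy read (hypothetical): two poor-quality N's at the start get trimmed away.
kept = filter_read("NNACGTACGT", [2, 2] + [30] * 8, min_len=5)
print(kept)
```

Vector and contamination screening are not shown; those usually need an alignment against known sequences rather than a per-read rule.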

 

So those are my steps for genome optimisation, and advice on assembly. I hope they are useful to someone, because many of them were painful to learn!

 

Posted in Assembly

Welcome!

Today marks the creation of my very own Bioinformatics website, which is likely to be constantly evolving in a glorious Lamarckian style!

I will be updating every month, or whenever there is something to say.

For now I will leave with my favourite scientific quote taken out of context (I wish I could remember where exactly it was from):

… nothing would change if time were to flow backwards…

Jordan et al., 2005

 

Posted in Welcome