Current Challenges in #Bioinformatics According to #ScienceTwitter

In my role as global product manager for the CLC portfolio at QIAGEN, I spend a lot of time fretting about “getting it right” when laying out our development roadmaps. This is not only so that the development team has a clear picture of where we’re going over the next 12–24 months, but also so that our customers (some 20k+ users globally) feel that CLC is “worth it”. It’s a tough challenge, as our software covers the entire field of genomics – which, as you might guess, results in many customer voices with differing and sometimes conflicting opinions on what we should be building next. If you’re a CLC user, you may have received a survey from me, or an email invitation directly from me to connect for a customer interview (I do a lot of these). Then it occurred to me – why not ask the folks who follow me on Twitter? So – a few weeks ago I asked just this…

The results were pretty interesting and IMHO capture some salient points that would be important to know for anyone who develops bioinformatics software – commercial or otherwise. I’ve tried to summarize the results below, as a very informal count of the number of times each topic was mentioned in the tweets replying to the initial question and in the subsequent replies.

Easier installation: 13
Better Documentation: 11
Crappy support / Abandonware: 10
Tutorials and Best Practice Recommendations: 6
Lack of clear / fair benchmarking studies or datasets: 5
Lack of Computational Resources: 4
Improved Curation of Existing Public Databases: 4
Bioinformatics as an afterthought during project or experiment planning: 3
Poor standards for file and data formats: 2
Too few bioinformaticians / training opportunities: 2
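
For anyone who prefers shares to raw counts, here is a minimal Python sketch that turns the informal tally above into percentages. The numbers are copied straight from the list; the dictionary and function names are my own, and since these are counts of mentions (not of people), the percentages are only a rough guide.

```python
# Informal mention counts, copied from the list above.
mentions = {
    "Easier installation": 13,
    "Better documentation": 11,
    "Crappy support / abandonware": 10,
    "Tutorials and best practice recommendations": 6,
    "Lack of clear / fair benchmarking studies or datasets": 5,
    "Lack of computational resources": 4,
    "Improved curation of existing public databases": 4,
    "Bioinformatics as an afterthought during planning": 3,
    "Poor standards for file and data formats": 2,
    "Too few bioinformaticians / training opportunities": 2,
}


def mention_shares(counts):
    """Return each topic's fraction of all mentions."""
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


shares = mention_shares(mentions)

# Print the topics from most to least mentioned.
for topic, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{share:6.1%}  {topic}")
```

The top three topics (installation, documentation, support) account for 34 of the 60 mentions, a bit over half.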

To me – there were some surprises in the results.

First, have we solved the file format issue in bioinformatics? Probably not, but there was a time when this was widely discussed as one of the biggest issues in our field. There have been major gains and improvements in this space – but the “Yet Another Standard” meme from XKCD still regularly shows up on Twitter and BioStars. 😉

The other end of the spectrum was also a surprise: Easier Installation topped the list. This underscores how important it is for bioinformatics developers to keep embracing containers (e.g. Docker) for their tools, so that people can spend more time on the science and less on adventures in dependency hell. In my own day-to-day work, I don’t have to worry about this – as CLC is built to be installed and run on essentially any OS (Mac, Windows, Linux, cloud, whatever). Our existing users also never complain about this issue – for the same reason, it just works. But my impression from my colleagues who are still immersed in open-source development has been that “everyone uses Docker” (or something similar) and installation is not really an issue anymore – which is great! But – given the results above – perhaps the practice of “Dockerizing” new algos is not as widespread as many bioinformaticians think? These results could (perhaps likely) stem from my own sampling bias (my followers being active scientists on Twitter, most of whom are in the microbiology or virology space). They could also reflect echo-chamber bias among bioinformaticians who interact mostly with each other inside their field. Perhaps there’s a gap in understanding how massive a challenge it can be for bench scientists with no command-line experience to simply run a dockerized tool.

I remember my own experience with Docker. It was probably about 4 or 5 years ago – sort of early in the “dockerize everything” days. A colleague of mine from another institution was like “Yeah, it’s a docker image. Just install docker and then you’re good to go.”

It took me several hours to get Docker properly installed on our system, and another few hours to get the dockerized tools to work how I wanted. (OK – I probably spent some of that time reading documentation, etc. – but still.) I know Docker is much easier to install, use and configure nowadays – but nonetheless for many bench scientists – with no coding / command-line experience – this can still be a major test of patience and requires a major time commitment (especially when you have other experiments running on the bench that simply can’t wait).

So – maybe the high frequency of “Installation” issues above indicates that Docker and other containers are not really solving the challenges faced by many users (or would-be users) of bioinformatics tools. Perhaps someone needs to develop a drag-and-drop Docker canvas whereby any tool can be run without any command-line interface. Visual programming for bioinformatics, perhaps. That would be cool actually… 🤔 🧐😉
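
To make the “no command line” idea a bit more concrete, here is a minimal Python sketch of the kind of plumbing such a canvas would have to hide: assembling a `docker run` command from a few choices the user makes (which tool, which folder, which arguments). The image name `example/aligner:1.0`, the tool arguments and the function name are all made up for illustration; only the `docker run` flags themselves (`--rm`, `-v`, `-w`) are real. The sketch just builds and prints the command, it does not launch anything.

```python
import shlex
from pathlib import Path


def build_docker_command(image, tool_args, data_dir):
    """Return the argument list for running one containerized tool.

    `image` and `tool_args` are whatever the user picked in the
    (hypothetical) graphical front end; `data_dir` is their data folder.
    """
    host_dir = Path(data_dir).resolve()
    return [
        "docker", "run",
        "--rm",                      # remove the container once the tool exits
        "-v", f"{host_dir}:/data",   # mount the user's folder at /data
        "-w", "/data",               # run the tool from inside that folder
        image,
        *tool_args,
    ]


cmd = build_docker_command(
    image="example/aligner:1.0",          # hypothetical image name
    tool_args=["align", "reads.fastq"],   # hypothetical tool arguments
    data_dir=".",
)

# Show the command a bench scientist would otherwise have to type.
print(shlex.join(cmd))

# To actually launch it (needs Docker installed and running):
# import subprocess
# subprocess.run(cmd, check=True)
```

Even this small example assumes the user knows about volume mounts and working directories, which is exactly the kind of detail a visual front end would need to take care of.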

What do you think are the biggest challenges in bioinformatics these days? Leave a comment below and share your viewpoint on this! Thanks for reading. – J