Comments on The Genome Factory: Minimum standards for bioinformatics command line tools
Torsten Seemann (updated 2024-03-27T16:30:10.302+11:00)

2024-01-31T22:32:58.327+11:00
<a href="https://pumasai.com/" rel="nofollow">Bioinformatics software</a> encompasses a broad array of computational tools and platforms designed to analyze, interpret, and visualize biological data.
PumasAi

2013-09-19T09:41:25.970+10:00
Nice read. Regarding #10, I'd recommend 'use autodie;' over 'use Fatal;'. According to the Fatal docs: "Fatal has been obsoleted by the new autodie pragma. Please use autodie in preference to Fatal. autodie supports lexical scoping, throws real exception objects, and provides much nicer error messages."
Anonymous

2013-08-31T08:22:19.883+10:00
One of the 'debates' on this issue is where to print the -help and -version information to: Stderr or Stdout? I use Stderr for _everything_ not related to algorithm output, but some people disagree.
Of course they are wrong ;-)
Torsten Seemann

2013-08-31T04:21:25.049+10:00
I'm glad you mentioned #4, one thing I hadn't paid much attention to.
Anonymous

2013-08-13T07:56:19.030+10:00
It's funny coz it's true :-P
Torsten Seemann

2013-08-13T05:40:34.510+10:00
Error: can't load /home/steven/work/biotool/data/vector.seq
# ARRRGGGGHHH!

LMFAO!!!
Jonathan Jacobs

2013-08-09T23:12:28.189+10:00
Hi Torsten, great post! This is one of those areas of software engineering that is almost completely determined by convention (e.g., what people expect based on their prior experience with command-line tools), so you don’t have to be a great programmer to do it right. In fact, writing command-line arguments is more like writing user documentation or tutorials. I find the best approach is to simply think through the possible scenarios that your end users will encounter, and pick what you think will be the least unexpected. Here are two additional points:

RE #1: there is a long-standing convention in UNIX that tools should be written to communicate with each other through pipes, which is why a lot of programs will default to accepting stdin.
But you are right that this is confusing. I've actually been confused by my own programs that have this behavior! One workaround is to print a help message when there are no arguments, but to accept the special filename ‘-’ in your input flag to indicate “read from stdin” (another UNIX convention), e.g.

$ cat bigfile | biotool -i -

Another workaround is to print a message to stderr indicating that the program is reading from stdin. This is what I do in SeqDB:

$ seqdb profile
seqdb-profile: profiling FASTQ records from ''

It would probably be even clearer if you printed another message “use - to exit” (one of the most frustrating things for a new UNIX user is not knowing how to exit a program and get back to the shell!).

RE #7: namespace pollution is a problem everywhere -- not just in bioinformatics -- and it's simply because more software is available now than 50 years ago when UNIX was invented and no one had laid claim to cat, head, tail, nm, more, etc. ImageMagick had some real balls to claim convert! I think the best solution here is the model that git uses, e.g. name your programs git-* then have a single wrapper called git that forwards the user to the appropriate program. This way, you only create a single entry (git) in the namespace that is likely to conflict with other software, and all of your other entries are derived from that name and unlikely to conflict with anything. It's like domains and sub-domains for the WWW.
For an example of a shell wrapper to do the forwarding, see

https://bitbucket.org/mhowison/seqdb/src/master/scripts/seqdb.in
https://bitbucket.org/caseywdunn/agalma/src/master/agalma/agalma.in

The other advantage of using a shell wrapper like this is you can set additional environment variables you need for the sub-programs.

Best,
Mark
Mark Howison

2013-08-09T18:01:20.509+10:00
Sounds like you are in favor of the getopts standard for command line parameters?
Egon Willighagen
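
Several of the conventions raised in these comments can be sketched together in a few lines. The Python below is only an illustration under stated assumptions, not anyone's actual tool: the name biotool and the -i flag come from Mark's example, while the line-counting "algorithm", the function names, and the exact wording of the stderr notice are made up for the sketch. It prints help to stderr when run with no arguments, accepts '-' to mean "read from stdin", and keeps everything that is not result output on stderr.

```python
import argparse
import sys


def build_parser():
    # "biotool" and "-i" follow the example in the comments; the rest is assumed.
    parser = argparse.ArgumentParser(
        prog="biotool",
        description="Hypothetical example: count the lines of the input.")
    parser.add_argument("-i", "--input", metavar="FILE", required=True,
                        help="input file, or '-' to read from stdin")
    return parser


def open_input(name, stdin, log):
    """Return a readable handle; the special name '-' means stdin."""
    if name == "-":
        # Say so on stderr, so a program waiting on the terminal is no mystery.
        print("biotool: reading from stdin", file=log)
        return stdin
    return open(name)


def main(argv=None, stdin=None, out=None, log=None):
    argv = sys.argv[1:] if argv is None else argv
    stdin = sys.stdin if stdin is None else stdin
    out = sys.stdout if out is None else out
    log = sys.stderr if log is None else log

    parser = build_parser()
    if not argv:
        # No arguments: show help on stderr instead of silently waiting on stdin.
        parser.print_help(log)
        return 1

    args = parser.parse_args(argv)
    handle = open_input(args.input, stdin, log)
    try:
        count = sum(1 for _ in handle)
    finally:
        if handle is not stdin:
            handle.close()

    # Only the result itself goes to stdout, so it can be piped onwards.
    print(count, file=out)
    return 0

# In a real script, finish with: sys.exit(main())
```

With that last line added, `cat bigfile | biotool -i -` would print the line count on stdout and the "reading from stdin" notice on stderr, while a bare `biotool` would print the usage text to stderr and exit with a non-zero status. Whether help belongs on stderr or stdout is exactly the debate mentioned above; swapping `log` for `out` in the `print_help` call gives the other behaviour.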