Bracketing part of a dna sequence in Javascript

Sometimes you just need a quick script to make life easier. This is an example of a small task: You have a dna sequence extracted from some bigger corpus, such as a reference genome sequence, and you know that there is something interesting going on 200 bp into the sequence. So you’ve downloaded a fasta file, and you want the computer to count 200 bases of flanking sequence and highlight the interesting thing for you. This problem arises naturally when designing primers, for example. You can use any scripting language you want, but if you want it to be used by a person who does not (yet) use command line based tools, if you’d like it to run on their computer that runs a different operating system than yours does, and if want to make a graphical user interface as quickly as possible — Javascript is actually an excellent weapon of choice.

Therefore, the other day, I wrote this little thing in Javascript. Mind you, it was ages since I last touched Javascript or html, and it’s certainly not a thing of beauty. But it does save time and I think I’ve avoided the most egregious errors. Everything is contained in the html page, so it’s actually a small bioinformatics script that runs in the browser.

Here is the Javascript part: a single procedure that extracts the parameters (the sequence and the number of flanking bases before and after), uses regular expressions to remove any fasta header, whitespaces and newlines, slices the resulting cleaned sequence up in three pieces, and outputs it with some extra numbers.

function bracketSequence (form) {
   var before = form.flank_before.value;
   var after = form.flank_after.value;
   var seq = form.sequence.value;

   // Find and remove fasta header
   fastaHeader = seq.match(/^>.*[\r\n|\n|\r]/, "");
   seq = seq.replace(/(^>.*(\r\n|\n|\r))/, "");

   // Remove whitespace and newlines
   seq = seq.replace(/(\r\n|\n|\r| )/gm, "");

   seqBefore = seq.slice(0, before);
   seqCore = seq.slice(before, seq.length - after);
   seqAfter = seq.slice(seq.length - after, seq.length);
   form.output.value = seqBefore + "[" + seqCore + "]" + seqAfter;

   form.fasta_header.value = fastaHeader;

   document.getElementById("core_length").innerHTML = seqCore.length;
   document.getElementById("before_length").innerHTML = seqBefore.length;
   document.getElementById("after_length").innerHTML = seqAfter.length;
}

I know, I know — the regular expressions look like emoticons that suffered through a serious accident. The fasta header expression ”^>.*[\r\n|\n|\r]/” matches the beginning of the file up to any linebreak character. The other expression ”(\r\n|\n|\r| )” just matches linebreaks and whitespace. This means the fasta header has to start with the ”>”; it cannot have any whitespace before. Other than whitespace, strange characters in the sequence should be preserved and counted.

The input and output uses a html form and a couple of named pieces of html (‘spans’). Here we put two textboxes to input the numbers, a big textarea for the sequence, and then a textarea, a textbox and some spans for the output. The onClick action of the button runs the procedure.

<form name="form" action="" method="get">

<p>Flanking bases before: <input type="text" name="flank_before"
value="" /></p>

<p>Flanking bases after: <input type="text" name="flank_after"
value="" /></p>

<p>Your sequence:</p>

<textarea name="sequence" rows="5" cols="80">
</textarea>

<p><input type="button" name="button" value="Go"
onClick="bracketSequence(this.form)" /></p>

<p>Output sequence:</p>

<textarea name="output" rows="5" cols="80">
</textarea>

<p>Fasta header: <input type="text" name="fasta_header" value="" /></p>

<p>Length of core sequence: <span id="core_length"></span></p>
<p>Length of flank before: <span id="before_length"></span></p>
<p>Length of flank after: <span id="after_length"></span></p>

</form>

On unicorns and genes

Martin Johnsson's blog about genetics and sundry things

Bracketing part of a dna sequence in Javascript

Dela detta:

Relaterade