10/25/2019 Geneious Blast Table One Hit Per Query
This setting will return a list of the top hits and an alignment for each one, plus a query-centric alignment. Leave the other settings as they are, then click the Search button. Geneious will then send your query to the NCBI and create a new search folder. This will appear as a subfolder of the folder that contains your query sequence.
Quickly scan through the individual assemblies and assess whether each disagreement (if present) needs a manual edit. A manual edit ONLY needs to be made if you feel the consensus sequence has been called incorrectly (or there is a gap that needs to be deleted). If Geneious calls the consensus sequence correctly, NO changes should be made to individual traces. To manually edit an assembly, click the “Allow Editing” button in the toolbar of the contig window (see image above). If you are unhappy with the trimmed portions, you can edit these manually by clicking on, and dragging, the red bar indicating the trimmed region.
Do not forget to save your edits. You will be prompted to do this when you try to close the assembly.
Note: TranslatorX only checks the forward reading frames, so you need to reverse-complement the matK sequences before putting them into this alignment program; otherwise you will receive errors. Export the consensus sequences (of good assemblies only) as a FASTA file, then import this file into the program. We suggest you leave the Protein Alignment Option method selected as “Muscle”.
In the Genetic Code box select the relevant reading frame and be sure to check the “Guess most likely reading frame” option. Then hit Submit Query. If the program runs OK and doesn’t encounter any errors, it will return an alignment of the nucleotides and also an alignment of the amino acids. You may download the FASTA file of both; however, the alignment of amino acids is what will be used for the second quality check.
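The reverse-complement step mentioned above for the matK sequences can also be done outside Geneious before uploading. The following is a minimal sketch using only the Python standard library; the function names, the `_rc` header suffix and the 60-character line width are illustrative assumptions, not part of Geneious or TranslatorX.

```python
# Complement table for IUPAC nucleotide codes (upper and lower case).
# Characters not in the table (gaps, S, W) are left unchanged.
_COMPLEMENT = str.maketrans(
    "ACGTRYKMBDHVNacgtrykmbdhvn",
    "TGCAYRMKVHDBNtgcayrmkvhdbn",
)


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def reverse_complement_fasta(text, width=60):
    """Reverse-complement every record and return new FASTA text."""
    out = []
    for header, seq in read_fasta(text):
        rc = reverse_complement(seq)
        out.append(">" + header + "_rc")  # hypothetical naming convention
        out.extend(rc[i:i + width] for i in range(0, len(rc), width))
    return "\n".join(out) + "\n"
```

Only the sequences that are in the reverse orientation should be passed through this; reverse-complementing a sequence that is already in the forward frame would reintroduce the problem.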
Import the FASTA file(s) of the alignments into Geneious for further analyses. Use the alignment to address any issue that you can see, e.g. a clear difference between one sequence and the others (remember this can be possible if the sequences are distantly related, but still cross-reference the alignment with the individual assemblies). Also, gaps must be assessed and resolved. Major differences in the alignment may also indicate that one or more of the sequences are contaminants (use BLAST to determine this). You may need to repeat the alignment step a number of times as you cross-reference the assemblies and make edits. Save the edits, re-export all the consensus sequences and create a new alignment with these new consensus FASTA files.
If more than a handful of edits need to be made to the consensus sequence, the assembly should be discarded and the sample re-sequenced. You need to make a judgement call on this.
Once you have made your selections, click the “Search” button in the “BLAST” window. The search progress appears in the Document Window.
If this is too slow, or you want to exit the search for whatever reason, click on the “Stop” button in the top left of the Document Window. Once complete, the results are saved in a subfolder (folder name ends with “- nr Megablast”) within the folder containing your query sequence(s). If you did a batch search, there will be further subfolders containing BLAST results for each of the sequences you entered into the BLAST search. In the results folder the BLAST results are displayed in the “Hit Table” tab. Various information is included, e.g. Hit Accession number, Query coverage, % Pairwise Identity, etc.
You can choose what is displayed by clicking on the manage columns icon found in the upper right of the table. Further information is found in the other tabs of the folder (Query Centric View, Annotations, Distances, Info).
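If a hit table has been saved as comma-separated text, reducing it to one hit per query can be sketched as below. This is an illustration under assumptions: the column names `query`, `hit` and `bit_score` are hypothetical and would need to be matched to whatever headers your exported table actually uses, and "best" is taken here to mean the highest bit score.

```python
import csv
import io


def load_hit_table(text):
    """Read comma-separated hit-table text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def best_hit_per_query(rows, query_key="query", score_key="bit_score"):
    """Keep only the highest-scoring row for each query.

    Ties keep the row that appears first; queries are returned in the
    order they are first seen in the table.
    """
    best = {}
    for row in rows:
        query = row[query_key]
        score = float(row[score_key])
        if query not in best or score > float(best[query][score_key]):
            best[query] = row
    return list(best.values())
```

For example, a table with two hits for `c1` and one for `c2` comes back as two rows, one per query.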
James wrote: Hello, I am running blastn, blastx and tblastx searches on NCBI's nt, est, nr and HTGS databases using transcriptome data containing 56,000 contigs. I have been able to produce biopython scripts to run these searches with only non-matching blast queries retrieved. Now I would like to retrieve blast queries which only show hits in the taxon 'caudata', i.e. protein-coding transcripts unique to urodeles. Is there a specific boolean query I can put in the entrez query parameter of the qblast function which will perform this? Or will I have to do something more intensive, such as performing the search specific to each taxon and finding the queries which only have hits in the caudata taxon? Thanks for any help. Regards, James.
Hi, thanks for the help, but I feel I may not have expressed the question correctly. For example, say Contig 1 only has a hit in the caudata taxon because the protein it produces is caudata-specific, whereas the protein Contig 2 produces can be found in many taxa as it is a universal protein needed for general organism growth.
Is there a way to filter the blast results to retrieve the queries which only have hits in the caudata taxon? I am using biopython to perform the BLAST searches and parse them. Thanks, James.
A respondent wrote: Since you are using the QBLAST API, just pass that the Entrez search term - something like this where the. That will be harder - one way would be to do a full unrestricted BLAST search and then filter the results. With BLAST+ 2.2.28 onwards you can get the taxid in the tabular output (assuming the database supports this, NR does) but I'm not sure if the BLAST XML includes this. Otherwise you'd perhaps have to filter on species names in the description, which is unreliable. Alternatively, it would be simpler to do two BLAST searches, one unrestricted and one against caudata only, and compare the two. This would be a bit slower as you are doing two BLAST searches, but the analysis should be much less complicated.
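The two-search comparison suggested above might look something like the following once both result sets have been parsed. This is only a sketch: it assumes each search has already been reduced to (query ID, hit ID) pairs by whatever parser you use, that both searches were run with the same thresholds and hit limits, and the function names are made up for illustration.

```python
def hits_by_query(pairs):
    """Group (query_id, hit_id) pairs into {query_id: set of hit_ids}."""
    grouped = {}
    for query_id, hit_id in pairs:
        grouped.setdefault(query_id, set()).add(hit_id)
    return grouped


def taxon_specific_queries(unrestricted, restricted):
    """Return queries whose unrestricted hits all appear in the restricted search.

    `unrestricted` and `restricted` are dicts as built by hits_by_query().
    Queries with no hits in the unrestricted search are not reported.
    """
    specific = []
    for query_id, all_hits in unrestricted.items():
        taxon_hits = restricted.get(query_id, set())
        if all_hits and all_hits <= taxon_hits:
            specific.append(query_id)
    return sorted(specific)
```

A query that has any hit in the unrestricted search which is missing from the caudata-only search is treated as not specific. If the two searches cap the number of reported hits differently, this comparison could misclassify queries, so it is worth checking the limits first.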
Your second method seems much easier to understand, and I'll have to implement something like that in the future. Currently I've ended up writing a script which takes the GI number from the BLAST hit description, converts this to the taxon ID and retrieves the full lineage of the BLAST hit. There I check if the string 'Caudata' is present and, if so, append the hit to a list; if the number of hits for the query is equal to the number of entries in the list, then the query is caudata-specific. Thanks for the help!
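The counting check described in this last post can be sketched roughly as follows. It assumes the lineage of every hit has already been retrieved (the GI to taxon ID to lineage lookup is not shown) and is held as a list of taxon names per hit; the function name and data layout are assumptions for illustration, not the poster's actual script.

```python
def is_taxon_specific(hit_lineages, taxon="Caudata"):
    """Return True if every hit's lineage contains the given taxon.

    `hit_lineages` holds one lineage per BLAST hit, each a list of taxon
    names. A query with no hits is not counted as specific.
    """
    if not hit_lineages:
        return False
    # Collect the hits whose lineage includes the taxon, then compare counts,
    # mirroring the "number of hits equals number of list entries" test.
    matching = [lineage for lineage in hit_lineages if taxon in lineage]
    return len(matching) == len(hit_lineages)


def specific_queries(lineages_by_query, taxon="Caudata"):
    """Return the sorted query IDs for which every hit falls within the taxon."""
    return sorted(
        query_id
        for query_id, lineages in lineages_by_query.items()
        if is_taxon_specific(lineages, taxon)
    )
```

Storing each lineage as a list of names means the membership test matches whole taxon names; if the lineage were kept as a single string, the same `in` test would become a substring match.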