Hart et al. (2013) provide a nice description and a set of tools for estimating the sequencing depth and number of samples you need. They also provide an Excel-based calculator for the number of samples. Their numbers are surprisingly large to me ;).
In a proposal for an exploratory effort to discover differentially expressed genes, I would suggest 3-5 biological replicates with 30-50 million reads each. More reads are usually cheaper than more replicates, so 50-100 million reads each may give you more power to resolve smaller fold changes.
If you do your sequencing at the MSU Core Facility, you’ll get an e-mail from them when your samples are ready. The e-mail will give you an FTP site, a username, and a password, as well as a URI. You can use these to download your data. For example, if you get:
hostname: titan.bch.msu.edu
username: rnaseqmodel
password: QecheJa6
URI: ftp://rnaseqmodel:QecheJa6@titan.bch.msu.edu
you can go to ftp://rnaseqmodel:QecheJa6@titan.bch.msu.edu in your Web browser; that is, the URI combines your username and password with the hostname so that the link opens directly.
In this case, you will see a ‘testdata’ directory. If you click on that, you’ll see a bunch of fastq.gz files. These are the files that you want to get onto the HPC.
To download these files onto the HPC, log into the HPC, go to the directory on the HPC you want to put the files in, and run a ‘wget’ – for example, on the HPC:
mkdir ~/testdata
cd ~/testdata
wget -r -np -nH ftp://rnaseqmodel:QecheJa6@titan.bch.msu.edu/testdata/
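This will download _all_ of the files in that directory (-r recurses into the directory, -np stops wget from ascending to the parent directory, and -nH skips creating a top-level directory named after the host). You can also do them one at a time, e.g. to get ‘Ath_Mut_1_R1.fastq.gz’, you would do
wget ftp://rnaseqmodel:QecheJa6@titan.bch.msu.edu/testdata/Ath_Mut_1_R1.fastq.gz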
Tada!
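Before moving on, it can be worth checking that the downloads were not truncated in transit. The following is a minimal sketch, not something the Core Facility provides: it runs `gzip -t` on every .fastq.gz file in a directory and reports which ones fail. The function name `check_fastq_gz` is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: verify that downloaded FASTQ archives are intact gzip files.
# check_fastq_gz is a hypothetical helper name, not part of any tool.

check_fastq_gz() {
    local dir="$1"
    local bad=0
    local f
    for f in "$dir"/*.fastq.gz; do
        # If the glob matched nothing, the pattern is left as a literal string.
        [ -e "$f" ] || { echo "no .fastq.gz files in $dir"; return 2; }
        # gzip -t tests the compressed stream without writing any output.
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            bad=1
        fi
    done
    # 0 if every file passed, 1 if at least one failed.
    return "$bad"
}
```

On the HPC you would then run something like `check_fastq_gz ~/testdata` and re-download anything reported as CORRUPT. Note that this only checks the gzip container; it says nothing about whether the reads themselves are any good.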
Even if all you plan to do is change the filenames you’re operating on, you’ll need to develop your own analysis pipeline. Here are some tips.
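As one illustration of what adapting the filenames can look like, here is a rough sketch that derives sample names from the files themselves instead of hard-coding them. It assumes paired files named like `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`; only the R1 name appears in the example above, so the R2 convention is an assumption, and `list_read_pairs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: derive sample names and R2 mates from R1 filenames.
# Assumes paired files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz;
# list_read_pairs is a hypothetical helper, not part of any tool.

list_read_pairs() {
    local dir="$1"
    local r1 r2 sample
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue             # glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"  # swap the R1 suffix for R2
        sample="$(basename "$r1" _R1.fastq.gz)"
        if [ -e "$r2" ]; then
            printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
        else
            # Flag samples whose mate file is absent instead of guessing.
            printf '%s\t%s\t%s\n' "$sample" "$r1" "MISSING"
        fi
    done
}
```

Running `list_read_pairs ~/testdata` prints one tab-separated line per sample (name, R1 path, R2 path or MISSING), which you can then feed into whatever commands your pipeline runs per sample.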