…start using R, from scratch!

Some time ago, once I was able to use R on my own, I came across colleagues and other people who wanted to learn R as well. I pointed them to the help pages and to the CRAN repositories… but some of them said they didn’t know how to start using those resources. Unsurprisingly, the main self-perceived obstacle for non-programmers is having to type “commands” (OK, many of us 80s kids will remember typing command lines to launch games such as Pac-Man or Frogger… :) ).

At the same time, they wanted to brush up on basic statistics and get a general picture of their data before asking a statistician for help. A quick way to help them was to write a few scripts that guide them through basic commands, let them see the results in real time, and can be reused with their own data.

If you have just started using R, these scripts may be useful to you too. I also recommend keeping one or more “plain text” files where you can paste your favorite commands, then copy and modify them to suit your needs. Remember to save the files somewhere you can find them later!

  • Tip: you can rename your mytext.txt file to mytext.R and tell Windows to keep opening it with Notepad. It is still a plain text document, but some text editors will recognize it as an “R script” and highlight its syntax accordingly.
  • Apart from Windows Notepad, there are plenty of other text/code editors that are more pleasant to use, for example RStudio and Notepad++.
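As an illustration of the kind of plain-text script described above (a sketch of my own, not one of the Gists), a file such as mytext.R might hold a few basic commands for a first look at a dataset. It assumes R's built-in `mtcars` data frame as stand-in data. The commented-out `read.csv()` line and its file name are hypothetical placeholders for your own data.

```r
# mytext.R: a hypothetical starter script for a first look at a dataset.
# The built-in 'mtcars' data frame stands in for your own data, e.g.
# my_data <- read.csv("my_file.csv")   # hypothetical file name
my_data <- mtcars

# Structure of the data (variable types) and the first six rows
str(my_data)
head(my_data)

# Basic descriptive statistics for every column
summary(my_data)

# Mean and standard deviation of one numeric variable
mean(my_data$mpg)
sd(my_data$mpg)

# Frequency table of a discrete variable
table(my_data$cyl)

# A quick histogram to see the shape of the distribution
hist(my_data$mpg, main = "Miles per gallon", xlab = "mpg")
```

You can paste the lines into the R console one at a time, or run the whole file with `source("mytext.R", echo = TRUE)` so that each command and its result are printed. To reuse the script, change the line that defines `my_data` and the column names.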

Copy the Gists below into your own text files, and begin playing with R!

Dealing with strings in R

FreshBiostats

As I mentioned in previous posts, I often work with Next Generation Sequencing data. This means dealing with several variables that hold text, that is, sequences of characters that may also contain spaces or numbers, e.g. gene names, functional categories or amino acid change annotations. In programming, this type of data is called a string.

Finding matches is one of the most common tasks involving strings. Along the way, it is sometimes necessary to format or recode these variables, as well as to search for patterns.

Some R functions I have found quite useful for handling this kind of data are:

  • colsplit() in the reshape package, which splits a column into several columns based on a regular expression.
  • grepl(), for subsetting based on string values that match a given pattern. Here again, a regular expression describes the pattern.
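A minimal sketch of both ideas on a small made-up table of "gene:change" annotation strings (the table is invented for illustration, not taken from the original post). To avoid a package dependency, it uses base R `strsplit()` as a stand-in for `colsplit()`. With the reshape package loaded, `colsplit()` should give a similar two-column split directly.

```r
# Hypothetical annotations in the form "GENE:change"
annot <- data.frame(
  variant = c("BRCA1:p.C61G", "TP53:p.R175H", "KRAS:p.G12D", "TP53:p.R273H"),
  stringsAsFactors = FALSE
)

# Split 'variant' on ":" (strsplit() treats 'split' as a regular expression
# by default) and bind the pieces row-wise into a two-column matrix
parts <- do.call(rbind, strsplit(annot$variant, split = ":"))
annot$gene   <- parts[, 1]
annot$change <- parts[, 2]

# grepl() returns TRUE/FALSE for each value, which we use to subset rows.
# "^TP53$" means "exactly TP53" (start of string, TP53, end of string).
tp53 <- annot[grepl("^TP53$", annot$gene), ]
print(tp53)

# The same idea on the change annotation: changes at glycine (G) residues,
# i.e. strings starting with "p.G" followed by digits. The dot is escaped
# because an unescaped "." matches any character in a regular expression.
gly_changes <- annot[grepl("^p\\.G[0-9]+", annot$change), ]
print(gly_changes)
```

The anchors `^` and `$` matter here. Without them, a pattern such as "TP53" would also match longer names that merely contain it.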

As you…
