Just how many (bad) -omics are there anyway? Let’s find out.
Update: code and data are now on GitHub.
1. Get the raw data
It would be nice if we could search PubMed for titles containing all -omics:
However, we cannot since leading wildcards don’t work in PubMed search. So let’s just grab all articles from 2013:
and save them in a format which includes titles. I went with “Send to…File”, “Format…CSV”, which returns 575 068 records in pubmed_result.csv, around 227 MB in size.
2. Extract the -omics
Titles are in column 1 and we only want the -omics, so:
Note: grep changed so the following now works.
Note: this approach misses the few cases where “omics” is preceded by a hyphen, including the classic stain-omics.
It also ignores the standalone term “-omics”, which is used quite often.
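For readers who would rather not depend on a particular grep version, here is a minimal Python sketch of the same idea. It is not the command used for this post. It assumes the title really is the first column of pubmed_result.csv and that the file is UTF-8. The names `OMICS`, `read_titles` and `extract_omics` are made up for illustration. Like the approach described above, it skips hyphenated forms such as stain-omics and the standalone “-omics”.

```python
import csv
import re
from collections import Counter

# One or more letters immediately followed by "omics", as a whole word.
# A hyphen is a word boundary, so "stain-omics" and "-omics" do not match:
# "omics" alone has no letters in front of it.
OMICS = re.compile(r"\b[a-z]+omics\b", re.IGNORECASE)


def read_titles(path, column=0):
    """Yield one title per CSV row (assumption: titles are in the first column)."""
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            if row:
                yield row[column]


def extract_omics(titles):
    """Count each lower-cased -omics word found in an iterable of titles."""
    counts = Counter()
    for title in titles:
        for word in OMICS.findall(title):
            counts[word.lower()] += 1
    return counts
```

Something like `extract_omics(read_titles("pubmed_result.csv")).most_common(20)` would then list the most frequent terms. Reading through the `csv` module rather than splitting on commas keeps titles that contain commas in one piece.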
Of course, this results in some…