If you use any of this data in your work, please cite:
See the Bergsma and Wang (2007) paper for full details on how the queries were collected, how the segmentations were annotated, and how the feature information was used. The queries were adapted from the AOL query dataset, which is available online (Greg Pass, Abdur Chowdhury, and Cayley Torgeson. 2006. A Picture of Search. In The First International Conference on Scalable Information Systems).
Please send an e-mail to firstname.lastname@example.org if you use the query segmentation data. We'd also be happy to help if you need any assistance.
The following frequency information was collected from the Google SOAP search API in March 2007. Each line in a file is an exact query (searched with quotation marks around it) followed by its page count. The feature that each file corresponds to should be fairly self-explanatory after reading the paper and looking at the contents of the file. The order of features in Table 2 roughly corresponds to the order listed here, with counts.ngrams providing the web-count and pair-count (and trigram-count, etc.) information.
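As a rough illustration of the count-file format described above, the following Python sketch reads such a file into a dictionary mapping each query to its page count. The exact line layout is an assumption, not something this README specifies: the sketch takes the page count to be the last whitespace-separated token on each line, treats everything before it as the query, and strips any surrounding double quotes. The function names `parse_counts` and `load_counts` are hypothetical and not part of the distribution.

```python
from typing import Dict, Iterable


def parse_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Map each exact-match query to its page count.

    Assumed (hypothetical) layout: "<query> <count>", where the count is the
    last whitespace-separated token and the query may contain spaces and
    optional surrounding double quotes.
    """
    counts: Dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split once from the right so multi-word queries stay intact.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            raise ValueError(f"unexpected line format: {raw!r}")
        query, count = parts
        # Drop the quotation marks used for exact-match searching, if present.
        counts[query.strip().strip('"')] = int(count)  # later duplicates win
    return counts


def load_counts(path: str) -> Dict[str, int]:
    """Read a count file (e.g. counts.ngrams) from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_counts(f)
```

For example, `parse_counts(['"new york" 12345', 'new york times\t678'])` returns `{'new york': 12345, 'new york times': 678}`. Splitting from the right keeps multi-word queries whole, whatever whitespace separates the query from its count; if the real files use a different delimiter, only the `rsplit` call needs to change.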
Thanks and good luck!