Evaluation Scripts/Data for Bilingual Lexicon Induction from Web Images

We provide the scripts and data used in the experiments for our IJCAI 2011 paper (below) on creating bilingual lexicons using the visual similarity of web images. We include the downloaded images, automatically extracted word lists, bilingual evaluation lexicons, and processing code and scripts. Processing details are provided in the paper. More details on the shared files are available in the included README.

[lexImage.share.tgz]
- This is the main collection of scripts and data, including all the shared data for the 500-word-list experiments.

[lexImage.share.v2.tgz]
- Updated January 23, 2012: added the missing ClusterCode/Makefile to the above collection (be sure to run 'make depend', then 'make').

[20,000-word image sets]
- The collections of downloaded images for the 20,000-word experiments (note: these files are quite large).

All files have been compressed and archived using "tar -czvf". To extract an archive, run "tar -xzvf"; for example, "tar -xzvf lexImage.share.tgz" creates a directory "Share" containing subdirectories with all the files.
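On systems without the tar command, the same extraction can be done with Python's standard tarfile module. The following is a minimal sketch, not part of the shared scripts: the function name extract_archive is hypothetical, and the check that rejects archive members resolving outside the destination directory is an added precaution of this sketch.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: str, dest_dir: str = ".") -> list[str]:
    """Extract a gzip-compressed tar archive, roughly like `tar -xzvf ARCHIVE`.

    Hypothetical helper (not part of the shared scripts). Returns the member
    names, mirroring the file list that `tar -v` prints.
    """
    dest = Path(dest_dir).resolve()
    with tarfile.open(archive_path, mode="r:gz") as tar:
        members = tar.getmembers()
        # Added precaution: refuse members whose paths would land outside
        # dest_dir (e.g. names containing "../" components).
        for member in members:
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        # Creates dest_dir and any needed subdirectories while extracting.
        tar.extractall(path=dest, members=members)
    return [m.name for m in members]
```

For example, extract_archive("lexImage.share.tgz") should create the "Share" directory in the current directory and return the names of the extracted files.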

Paper

If you use any of the provided data or scripts in your work, please cite as:




Shane Bergsma
June 13, 2011