
Semantic Code Search

Last semester, I took the course CS 585: Introduction to Natural Language Processing, taught by Mohit Iyyer. Because it is a graduate-level course, a significant portion of the curriculum is centered around a team project. In this post, I would like to share our team’s final report.

After hearing about the launch of GitHub’s CodeSearchNet Challenge, we chose to conduct a small survey of different word embedding techniques, each integrated into the semantic search pipeline provided by the GitHub team. For the tl;dr, we obtained quite interesting results with the Continuous Bag of Words embedding. The results of that model can be found on Weights & Biases.