IPARC Challenge

This project addressed the IPARC Challenge, which resembles ARC but comes with defined background knowledge. My initial attempts applied several search methods (exhaustive search, greedy search, beam search, and A* search) to the simple tasks in Category A. Although I experimented with different similarity metrics as heuristics, the large search space led to poor performance. I then used DreamCoder together with inductive logic to solve these tasks. The model initially solved 11 of 20 tasks (55%), and this improved to 15 of 20 tasks (75%) after experimenting with input pairs and triplets.
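To illustrate the kind of heuristic search described above, here is a minimal beam search sketch over compositions of grid operations, guided by a cell-wise similarity score. It is not the project's actual code: the primitives (`flip_h`, `flip_v`, `transpose`), the `similarity` metric, and all function names are assumptions chosen only to keep the example small, and they are not the IPARC operator set.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Grid = Tuple[Tuple[int, ...], ...]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


# Hypothetical primitives, used here only for illustration.
def flip_h(g: Grid) -> Grid:
    return tuple(tuple(reversed(row)) for row in g)


def flip_v(g: Grid) -> Grid:
    return tuple(reversed(g))


def transpose(g: Grid) -> Grid:
    return tuple(zip(*g))


PRIMITIVES: Dict[str, Callable[[Grid], Grid]] = {
    "flip_h": flip_h,
    "flip_v": flip_v,
    "transpose": transpose,
}


def similarity(a: Grid, b: Grid) -> float:
    """Fraction of matching cells; 0.0 if the shapes differ."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        return 0.0
    total = len(a) * len(a[0])
    same = sum(x == y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return same / total


def run(program: Sequence[str], g: Grid) -> Grid:
    """Apply the named primitives to the grid, left to right."""
    for name in program:
        g = PRIMITIVES[name](g)
    return g


def score(program: Sequence[str], examples: Sequence[Example]) -> float:
    """Mean similarity between program outputs and expected outputs."""
    return sum(similarity(run(program, i), o) for i, o in examples) / len(examples)


def beam_search(
    examples: Sequence[Example], beam_width: int = 3, max_depth: int = 4
) -> Optional[List[str]]:
    """Return a program that reproduces every example, or None."""
    if score((), examples) == 1.0:
        return []
    beam: List[Tuple[str, ...]] = [()]
    for _ in range(max_depth):
        # Extend every program in the beam by one primitive.
        candidates = [p + (name,) for p in beam for name in PRIMITIVES]
        candidates.sort(key=lambda p: score(p, examples), reverse=True)
        if score(candidates[0], examples) == 1.0:
            return list(candidates[0])
        # Keep only the best-scoring partial programs.
        beam = candidates[:beam_width]
    return None
```

With a larger primitive set the candidate list grows quickly at each depth, which is consistent with the search-space problem noted above: a narrow beam can prune the prefix that leads to the solution, while a wide beam approaches exhaustive search.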

Predicting ATP binding sites for protein sequences

Predicting binding sites between ATP and proteins is important in biology and medicine. Traditionally, research in this field relied on time- and resource-intensive ‘wet experiments’ conducted in laboratories. In recent years, however, there has been a shift towards computational methods, specifically deep learning and natural language processing (NLP) algorithms.

Code search and code clone detection

Built a preliminary code search model with a simple encoder-decoder architecture that ranks candidates by the cosine similarity between the query embedding and the code embeddings. Also fine-tuned the CodeBERT model for code search on C/C++. In addition, trained and fine-tuned a code clone detection model across multiple languages (Python, Java, and C/C++) to detect plagiarized code.
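The retrieval step of embedding-based code search can be sketched as follows. This is only an illustration under stated assumptions: the vectors are hand-written toy values standing in for encoder outputs, and the function names and snippet labels (`search`, `binary_search`, `quick_sort`, `read_file`) are hypothetical rather than taken from the project.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(
    query_vec: Sequence[float],
    corpus: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Rank code snippets by similarity of their embedding to the query."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values); a real system would obtain these
# from the trained encoder for each code snippet and for the query.
corpus = {
    "binary_search": [0.9, 0.1, 0.0],
    "quick_sort": [0.1, 0.9, 0.1],
    "read_file": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]
results = search(query, corpus, top_k=2)
```

Because cosine similarity ignores vector length, the ranking depends only on the direction of each embedding, so snippets of very different sizes can still be compared on the same scale.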