Inverted Index Search Engine

Built a search engine with an inverted index for efficient content retrieval.

This project implemented a search engine that uses an inverted index to index and retrieve content efficiently. The dataset consisted of 25,000 news articles in Farsi, with some passages in English. Handling the bidirectional text was particularly challenging because Farsi is written right to left and English left to right, so tokenization required careful design to stay accurate on mixed-script text. The resulting system processes queries and ranks results in under 0.3 seconds.
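The core idea can be illustrated with a minimal sketch. This is not the project's actual code: the function names (`tokenize`, `build_index`, `search`), the normalization table, and the boolean AND query semantics are all assumptions chosen for illustration. It shows one plausible way to tokenize mixed Farsi and English text and build a positional inverted index from it.

```python
import re
from collections import defaultdict

# Assumed normalization: map Arabic yeh/kaf to their Persian forms and drop the
# zero-width non-joiner (U+200C) so spelling variants index to the same term.
_NORMALIZE = str.maketrans({"\u064A": "\u06CC", "\u0643": "\u06A9", "\u200C": None})

# \w is Unicode-aware for str patterns, so it covers both Persian and Latin letters.
_TOKEN = re.compile(r"\w+")


def tokenize(text):
    """Normalize the text, lowercase Latin letters, and split into word tokens."""
    return _TOKEN.findall(text.translate(_NORMALIZE).lower())


def build_index(docs):
    """Map each term to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [set(index.get(term, {})) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()
```

Because tokens are stored in logical (typing) order, the right-to-left versus left-to-right distinction only affects display; the index itself needs consistent character normalization rather than any direction handling.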

The search engine achieved 90% recall and precision, demonstrating its effectiveness in content retrieval. Multiple similarity functions were implemented and compared for their effect on ranking quality. Developed for an Information Retrieval course, the project provided a thorough understanding of retrieval systems and performance evaluation.
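The description does not say which similarity functions were compared or how the evaluation was run, so the following is only a sketch under assumptions: it pairs cosine similarity over log-scaled TF-IDF weights with Jaccard similarity over term sets, two common choices in an Information Retrieval course, and adds a set-based precision and recall helper. All function names here are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vector(tokens, doc_freq, n_docs):
    """Build a sparse {term: weight} vector with log-scaled TF and IDF.

    doc_freq maps each term to the number of documents containing it;
    terms missing from doc_freq are skipped.
    """
    counts = Counter(tokens)
    return {
        term: (1 + math.log(count)) * math.log(n_docs / doc_freq[term])
        for term, count in counts.items()
        if doc_freq.get(term)
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity between two token collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Cosine over TF-IDF rewards rare terms and repeated occurrences, while Jaccard only measures vocabulary overlap, so ranking the same result set with each and comparing precision and recall is one simple way to see how the choice of function changes ranking quality.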

GitHub Repository