Chat2Find Publishes 255M+ Token Trilingual AI Corpus on Hugging Face and LankaData

COLOMBO, Sri Lanka: LankaData has announced the public release of the Chat2Find Corpus, a trilingual conversational dataset, marking a milestone in Sri Lanka’s growing AI ecosystem. The corpus contains more than 255 million tokens (approximately 279,248 records) in Sinhala, Tamil, and English, including code-mixed language such as Singlish and Tanglish. Designed for training
LankaData Net Launched by Chat2Find, Marking a Major Leap in AI-Driven Data Access in Sri Lanka

Sri Lanka’s data and knowledge ecosystem took a significant step forward this week with the official launch of LankaData Net by Chat2Find. The new network brings together years of systematically compiled and structured datasets spanning law, taxation, business, and multiple other critical sectors, positioning LankaData Net as a foundational digital infrastructure for reliable information access in the country.
About
LankaData is Sri Lanka’s pioneer in structured data collection and intelligent data access, built to transform how national information is stored, searched, and used. We systematically compile, digitise, and organise authoritative datasets across law, taxation, economics, business, and public policy, converting fragmented and complex information into clean, reliable, and searchable digital assets.
Building on these structured repositories, LankaData delivers an intelligent access layer powered by advanced retrieval-augmented AI models. Each dataset is paired with a dedicated domain-specific AI expert, enabling users to ask complex questions in natural language and receive precise, source-linked responses. This intelligence is seamlessly delivered through the Chat2Find platform, making LankaData not just a data provider, but a practical decision-support system for professionals




