- Chat2Find Publishes 255M+ Token Trilingual AI Corpus on Hugging Face and LankaData

COLOMBO, Sri Lanka: LankaData has announced the public release of the Chat2Find Corpus, a major trilingual conversational dataset, marking a key milestone in Sri Lanka’s growing AI ecosystem. The Chat2Find Corpus consists of over 255 million tokens (~279,248 records) in Sinhala, Tamil, and English, including code-mixed language such as Singlish and Tanglish. Designed for training
- Free Access to Sri Lanka’s Legal Data Expanded Through LankaData Network and LankaLaw Collaboration

LankaData Network has announced the free public availability of a comprehensive Sri Lankan legal dataset through a collaboration with LankaLaw, marking a significant step toward widening access to legal information in the country. Through this initiative, core sources of law—including Acts of Parliament and Ordinances from independence to the present, consolidated statutes, the Constitution, and key legislative
About
LankaData is Sri Lanka’s pioneer in structured data collection and intelligent data access, built to transform how national information is stored, searched, and used. We systematically compile, digitise, and organise authoritative datasets across law, taxation, economics, business, and public policy, converting fragmented and complex information into clean, reliable, and searchable digital assets.
Building on these structured repositories, LankaData delivers an intelligent access layer powered by advanced retrieval-augmented AI models. Each dataset is paired with a dedicated domain-specific AI expert, enabling users to ask complex questions in natural language and receive precise, source-linked responses. This intelligence is seamlessly delivered through the Chat2Find platform, making LankaData not just a data provider, but a practical decision-support system for professionals




