DATA MAPPING
OpenPIN Backend Data Harvester
The Open Public Information Network (OpenPIN) is the backend layer designed to harvest, collect, organize, and structure data from publicly available digital sources. It continuously indexes websites, records, publications, and open datasets, transforming fragmented information into a unified, searchable knowledge network. As the core backend component of Lanka Data Net, it powers the intelligent layer to deliver trusted, structured, and scalable intelligence for search, analytics, research, compliance, and decision-making.

OpenPIN
Open Public Information Network (OpenPIN) is the core backend framework of Lanka Data Net, designed to harvest, collect, and store Sri Lanka’s public data.
BASE DATA
Open Source Data Sets
Lanka Data Net (LDN) brings together a powerful Sri Lanka-focused data corpus exceeding 10 billion data points, forming the foundation for accurate trilingual capabilities across Sinhala, Tamil, and English. This is complemented by high-quality conversational and structured datasets spanning key domains, enabling precise, context-aware insights for real-world applications.
Data Corpus
The Chat2Find Corpus is a high-quality trilingual conversational dataset derived from real-world interactions on the Chat2Find platform. It contains approximately 255 million tokens in Sinhala (සිංහල), Tamil (தமிழ்), and English, including significant instances of Singlish and Tanglish.
Conversational Data
The full dataset is a premium, high-logic instruction dataset designed for training state-of-the-art conversational AI models. It contains 279,260 trilingual records optimized for complex problem-solving, chain-of-thought reasoning, and sophisticated tool-calling interactions in Sinhala, Tamil, and English.
BASE MODELS
Open Source Intelligent Layer
At its core, the Chat2Find base model is a robust large language model trained on extensive localized data, delivering strong multilingual understanding. Its fine-tuned trilingual models further enhance performance by capturing linguistic nuances and cultural context, resulting in highly natural, accurate, and reliable AI interactions.
Base Model
At the heart of Chat2Find lies a robust base model pre-trained on a rich, localized data corpus, enabling powerful multilingual understanding tailored specifically for Sri Lanka.

Fine Tuned LLM
Refined to perfection, Chat2Find’s fine-tuned models capture linguistic nuance and cultural depth, delivering seamless, natural, and highly accurate interactions in Sinhala, Tamil, and English.



















