Scale or Die at Accelerate 2025: Messari x Solana Dev

By Solana 🧭 Compass · May 20, 2025 · 6 min read

Learn how Messari is revolutionizing data-driven development on Solana with AI-powered tools and scalable solutions

The notes below are AI generated and may not be 100% accurate. Watch the video to be sure!

Messari's Diran Li presents tools and techniques for Solana developers building data-driven applications, covering AI-powered data products and scalable data engineering practices.

Summary

At Accelerate 2025, Diran Li from Messari presented a compelling talk on the importance of scalability and data-driven development in the Solana ecosystem. Li outlined the challenges faced by developers in providing curated, high-signal data at scale, including data fragmentation, ingestion difficulties, and the need to process multiple data types simultaneously.

Messari's journey from simple ETL pipelines to sophisticated ELT processes was highlighted, emphasizing the importance of storing raw data and implementing robust data observability practices. Li stressed the significance of proper data engineering in AI pipelining, enabling advanced techniques like fine-tuning and model training.

The presentation introduced two powerful tools for Solana developers: the Signal dataset for evaluating data curation pipelines, and the AI toolkit, which brings crypto knowledge into a single assistant. These tools, leveraging Messari's vast data warehouse, aim to provide real-time, source-grounded insights to enhance Solana-based protocols and applications.

Key Points:

Data Curation Challenges

Diran Li began by addressing the common challenges faced by developers in the blockchain space, particularly when it comes to providing valuable insights to users. He highlighted three main issues: data fragmentation across the industry and ecosystem, difficulties in data ingestion due to the noisy nature of blockchain data, and the complexity of producing data at scale from multiple sources simultaneously. These challenges often result in developers juggling numerous tabs and sources to make sense of daily blockchain activities.

Messari's Data Engineering Evolution

Li provided a brief history of Messari's data engineering journey, starting from simple ETL (Extract, Transform, Load) pipelines in 2018 to more complex systems. As demand grew, they added more jobs, services, and databases, which eventually led to issues with data fragmentation and identifying the source of truth. The proliferation of Large Language Models (LLMs) and AI in 2022 further complicated their data curation efforts, prompting a shift in their approach.

ELT: A Game-Changing Approach

One of the most significant revelations shared by Li was Messari's transition from ETL to ELT (Extract, Load, Transform) processes. This shift involved always storing raw data before transformation, allowing for a more traceable and reproducible data pipeline. This approach enables easier error detection and correction, as transformations can be replayed from specific points in the process. Li emphasized this as a crucial learning for anyone working with large-scale data transformation.
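
The store-raw-then-transform idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Messari's implementation: the `RawStore` class, the record fields, and both transform functions are hypothetical names invented for the example. It shows raw payloads being persisted untouched, a buggy transform being corrected, and the fix being replayed either in full or from a chosen offset.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RawStore:
    """Hypothetical append-only store of untransformed records (Load before Transform)."""
    records: list = field(default_factory=list)

    def load(self, payload: dict) -> int:
        # Persist the payload exactly as received and return its offset.
        self.records.append(json.dumps(payload, sort_keys=True))
        return len(self.records) - 1

    def replay(self, transform: Callable[[dict], dict], start: int = 0) -> list:
        # Re-run a transform over the stored raw records from a chosen offset.
        return [transform(json.loads(r)) for r in self.records[start:]]


def buggy_transform(rec: dict) -> dict:
    # Illustrative bug: reports lamports as if they were SOL.
    return {"account": rec["account"], "sol": rec["lamports"]}


def fixed_transform(rec: dict) -> dict:
    # 1 SOL = 1_000_000_000 lamports.
    return {"account": rec["account"], "sol": rec["lamports"] / 1_000_000_000}


store = RawStore()
store.load({"account": "A", "lamports": 2_000_000_000})
store.load({"account": "B", "lamports": 500_000_000})

bad = store.replay(buggy_transform)             # wrong output, raw data unaffected
good = store.replay(fixed_transform)            # full replay after the fix
tail = store.replay(fixed_transform, start=1)   # replay from a specific point
```

Because the raw records are never overwritten, correcting the transform does not require re-ingesting anything from the original source. In a plain ETL pipeline the raw input would already have been discarded by the time the bug was found.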

Data Observability and AI Integration

Li stressed the importance of data observability in managing complex pipelines. He showcased an example of a backend job, illustrating how clear visualization of data flows, job schedules, and dependencies can significantly improve data management. Furthermore, Li highlighted the connection between effective AI pipelining and robust data engineering practices. This includes the ability to process data at scale, maintain good data lineage, and ensure proper sourcing and citation for AI-generated insights.
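
As a rough sketch of what tracking job dependencies and lineage can look like, the Python below models a pipeline as a dependency graph. The job names and the `run_order` and `downstream` helpers are hypothetical and are not taken from the talk. It uses the standard library's `graphlib.TopologicalSorter` to derive a valid run order, then walks the graph to list every job affected when one upstream job fails.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs it depends on.
deps = {
    "ingest_raw": set(),
    "normalize": {"ingest_raw"},
    "sentiment": {"normalize"},
    "price_join": {"normalize"},
    "report": {"sentiment", "price_join"},
}


def run_order(deps: dict) -> list:
    # A valid execution order: every job appears after all of its dependencies.
    return list(TopologicalSorter(deps).static_order())


def downstream(deps: dict, failed: str) -> set:
    """All jobs that directly or transitively depend on `failed`."""
    affected, frontier = set(), {failed}
    while frontier:
        # Jobs with at least one dependency in the current frontier.
        frontier = {job for job, d in deps.items() if d & frontier} - affected
        affected |= frontier
    return affected


order = run_order(deps)
impacted = downstream(deps, "normalize")
```

Here `impacted` contains `sentiment`, `price_join`, and `report`, which is the set of outputs to distrust or recompute if `normalize` produced bad data.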

Developer Tools and Resources

The presentation concluded with an introduction to two tools Messari is making available to Solana developers. The Signal dataset allows for evaluation of data curation pipelines, leveraging the entire data warehouse to provide AI insights on trending topics, key opinion leaders, and asset sentiment. The AI toolkit, described as bringing all crypto knowledge into one assistant, offers real-time, source-grounded answers via API, including citations, tables, and charts. Li emphasized that Solana developers can try these tools through a free API tier and integrate them into their projects.

Facts + Figures

  • Messari has built a Solana portal providing insights on token unlocks, news, research, fundraising, and key events.
  • The company's data warehouse contains 170 terabytes of curated data.
  • Messari's journey in data engineering began in 2018 with simple ETL pipelines.
  • The shift from ETL to ELT occurred around 2022, coinciding with the proliferation of LLMs and AI.
  • Messari's AI toolkit pulls from 170 terabytes of curated data to provide real-time, source-grounded answers.
  • The AI toolkit is being integrated into projects like Coinbase's AgentKit and ElizaOS.
  • Messari offers a free tier for every developer on Solana to try out their API.

Top quotes

  1. "Scale or die. It's incredible to be among so many talented engineers who are pushing the boundaries on what's possible on Solana."
  2. "Data is fragmented. It's fragmented across the industry. It's fragmented across the ecosystem."
  3. "Everything changed in about 2022 where LLM started proliferating and AI became very popular."
  4. "We always store the raw data. No matter how big that is, we always store the raw data and then transform afterwards."
  5. "Doing AI pipelining well means doing data engineering well."
  6. "The AI toolkit brings all of crypto knowledge into one assistant."

Questions Answered

What are the main challenges in providing curated, high-signal data at scale in the blockchain space?

The main challenges include data fragmentation across the industry and ecosystem, difficulties in data ingestion due to the noisy nature of blockchain data, and the complexity of producing data at scale from multiple sources simultaneously. These issues make it challenging for developers to provide valuable insights to users without sifting through numerous sources and data points.

How has Messari's approach to data engineering evolved over time?

Messari started with simple ETL (Extract, Transform, Load) pipelines in 2018, primarily focusing on ingesting market data. As demand grew, they added more jobs, services, and databases, which eventually led to a complex system with data fragmentation issues. In 2022, with the rise of LLMs and AI, Messari shifted to an ELT (Extract, Load, Transform) approach, emphasizing the storage of raw data before transformation to improve traceability and error correction.

What is the significance of switching from ETL to ELT in data processing?

The switch from ETL to ELT is significant because it allows for better data lineage and error correction. By always storing raw data before transformation, Messari can trace the history of data transformations more easily. This approach enables them to locate errors more efficiently and replay transformations from specific points, ensuring a more robust and reliable data pipeline.

How does Messari's AI toolkit benefit Solana developers?

Messari's AI toolkit brings all crypto knowledge into one assistant, leveraging 170 terabytes of curated data. It provides real-time, source-grounded answers via API, including citations, tables, and charts. This tool allows Solana developers to integrate crypto intelligence into their protocols and applications, enhancing their functionality and user experience. Messari offers a free tier for Solana developers to try out the API and integrate it into their projects.

What is the Signal dataset, and how does it help in evaluating data curation pipelines?

The Signal dataset is a tool provided by Messari that leverages their entire data warehouse to provide AI insights on trending topics, key opinion leaders, and asset sentiment. It helps evaluate the output of data curation pipelines by offering a comprehensive view of what the community is talking about most, which key opinion leaders and assets are gaining mindshare, and overall sentiment trends. This tool is crucial for developers looking to build data-driven applications on Solana.


