Getting Started with DuckDB: A practical guide for accelerating your data science, data analytics, and data engineering workflows

Simon Aubury, Ned Letcher, Kris Jenkins

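  • Publisher: Packt Publishing
  • Publication date: 2024-06-24
  • List price: $2,150
  • VIP price: $2,043 (5% off)
  • Language: English
  • Pages: 382
  • Binding: Quality Paper - also called trade paper
  • ISBN: 1803241004
  • ISBN-13: 9781803241005
  • Related categories: Data Science
  • Overseas special-order title (requires separate checkout)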

Product Description

Analyze and transform data efficiently with DuckDB, a versatile, modern, in-process SQL database

Key Features

- Use DuckDB to rapidly load, transform, and query data across a range of sources and formats

- Gain practical experience using SQL, Python, and R to effectively analyze data

- Learn how open source tools and cloud services in the broader data ecosystem complement DuckDB's versatile capabilities

- Purchase of the print or Kindle book includes a free PDF eBook

Book Description

DuckDB is a fast in-process analytical database. Its ease of use, versatile feature set, and powerful analytical capabilities make DuckDB a valuable addition to the data practitioner's toolkit.

Getting Started with DuckDB offers a practical overview of DuckDB's fundamentals and guidance for effectively using its powerful capabilities. Through extensive hands-on examples, you'll learn how to use DuckDB to load, transform, and query a variety of data sources and formats, including CSV, JSON, and Parquet files, semi-structured data, remotely hosted files, and external databases. You'll also find out how to leverage DuckDB's performance optimizations and friendly SQL enhancements. You'll explore how to use DuckDB's extensions for specialized applications, such as geospatial analysis and text search over document collections. In addition to working through examples in SQL, Python, and R, you'll also analyze public datasets with DuckDB and discover the wider ecosystem of open-source tools and cloud services that supercharge DuckDB-powered workflows and applications.

Whether you're a seasoned data practitioner or new to working with analytical data, this book will rapidly get you up to speed with DuckDB's versatile and powerful capabilities, enabling you to apply them in your analytical workflows and projects.

What you will learn

- Understand the properties and applications of a columnar in-process database

- Use SQL to load, transform, and query a range of data formats

- Discover DuckDB's rich extensions and learn how to apply them

- Use nested data types to model semi-structured data and extract and model JSON data

- Integrate DuckDB into your Python and R analytical workflows

- Effectively leverage DuckDB's convenient SQL enhancements

- Explore the wider ecosystem and pathways for building DuckDB-powered data applications

Who this book is for

If you're interested in expanding your analytical toolkit, this book is for you. It will be particularly valuable for data analysts wanting to rapidly explore and query complex data, data and software engineers looking for a lean and versatile data processing tool, along with data scientists needing a scalable data manipulation library that integrates seamlessly with Python and R. You will get the most from this book if you have some familiarity with SQL and foundational database concepts, as well as exposure to a programming language such as Python or R.

Table of Contents

- An Introduction to DuckDB

- Loading Data into DuckDB

- Data Manipulation with DuckDB

- DuckDB Operations and Performance

- DuckDB Extensions

- Semi-Structured Data Manipulation

- Setting up the DuckDB Python Client

- Exploring DuckDB's Python API

- Exploring DuckDB's R API

- Using DuckDB Effectively

- Hands-On Exploratory Data Analysis with DuckDB

- DuckDB - The Wider Pond
