
LLM Book 2 - How to Preprocess, Tokenize, and Pretrain Language Models for Rookies


Table of Contents

1. Introduction

1.1 What are Language Models and Why are They Important?

1.2 Overview of the Book and Learning Objectives

2. Data Preprocessing

2.1 Data Cleaning and Normalization

2.2 Data Splitting and Sampling

2.3 Data Augmentation and Paraphrasing

3. Tokenization

3.1 What is Tokenization and Why is It Necessary?

3.2 Types of Tokenization Methods

3.3 How to Choose and Implement a Tokenizer

4. Pretraining

4.1 What is Pretraining and Why is It Beneficial?

4.2 Types of Pretraining Objectives and Architectures

4.3 How to Select and Fine-tune a Pretrained Model

5. Evaluation

5.1 How to Measure the Performance of Language Models

5.2 Common Evaluation Metrics and Benchmarks

5.3 How to Interpret and Report the Results

6. Conclusion

6.1 Summary of the Main Points and Takeaways

6.2 Future Directions and Challenges

6.3 Resources and References 

You will get a PDF file (512 KB).