Sunday, April 26, 2026

3.2 Gen AI: Models & Architecture

 


Course Structure & Feedback

  • Curriculum designed by the Michigan team in collaboration with Simplilearn

  • Instructors deliver the pre-designed content and cannot modify the curriculum

  • Feedback collection encouraged for curriculum team review

  • Student feedback drives potential content updates

Industry Context & Complexity

  • YouTube tutorials insufficient for production environments

  • Production deployment requires understanding of:

    • Platform integration

    • Security considerations

    • Scalability requirements

    • Cost optimization

  • Class accommodates varying experience levels (beginners to advanced)

Natural Language Processing Fundamentals

  • NLP enables machines to understand and generate human language, whether it arrives as text or as speech (including speech carried in audio and video)

  • Core capabilities:

    1. Natural Language Understanding (NLU)

    2. Natural Language Generation (NLG)

  • Applications: translation, speech recognition, sentiment analysis

  • Three model categories:

    1. Rule-based systems (if/else logic)

    2. Deep learning models (RNN, CNN)

    3. Statistical models (Markov chains)
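
To make the first category concrete, here is a minimal sketch of a rule-based sentiment classifier. The word lists and the function name `rule_based_sentiment` are made up for illustration; the point is only that every decision comes from hand-written if/else rules, with nothing learned from data.

```python
# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "hate"}


def rule_based_sentiment(text: str) -> str:
    """Classify text with hand-written rules; no training is involved."""
    # Lowercase, split on whitespace, and strip common punctuation.
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    elif score < 0:
        return "negative"
    return "neutral"


print(rule_based_sentiment("The course is great!"))
print(rule_based_sentiment("Terrible audio, bad slides"))
```

Rules like these are easy to read and debug, but any word outside the lists is ignored, which is one reason statistical and deep learning models took over.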

Markov Chains & N-grams

  • A first-order Markov chain predicts the next word based solely on the previous word

  • Limitations:

    • Short-term memory only

    • Cannot capture long-term dependencies

    • Lack semantic understanding

  • N-gram models generalize this: an n-gram model conditions on the previous n-1 words

    • Unigram: relies on no previous words (word frequency only)

    • Bigram: relies on 1 previous word

    • Trigram: relies on 2 previous words

  • Historical applications: speech recognition, spelling correction, early chatbots
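
The next-word idea from this section can be sketched as a tiny bigram model that counts which word follows which. The corpus and the helper names (`train_bigram`, `predict_next`, `next_word_probability`) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for every word, how often each other word follows it."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts: dict[str, Counter], word: str):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def next_word_probability(counts: dict[str, Counter], prev: str, nxt: str) -> float:
    """Estimate P(nxt | prev) as a relative frequency."""
    followers = counts.get(prev.lower(), Counter())
    total = sum(followers.values())
    return followers[nxt.lower()] / total if total else 0.0


# Toy corpus, made up for this example.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train_bigram(corpus)

print(predict_next(model, "the"))                  # "cat" follows "the" most often
print(next_word_probability(model, "the", "cat"))  # 2 of the 6 followers of "the"
print(predict_next(model, "unicorn"))              # None: the word was never seen
```

The prediction after "the" is the same no matter what came earlier in the sentence, which is the short-term memory limitation listed above.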

Large Language Models Architecture

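  • Model the joint probability of an entire token sequence by predicting each token conditioned on all preceding tokens in the context

  • Attend to the full context at once, rather than only the last one or two words as Markov chains and n-grams do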

  • Context window determines maximum input size

    • Gemini 1.5 Pro: 1 million tokens (~1,500 pages)

  • Key advantages over traditional models:

    • Parallel processing capability

    • Self-attention mechanism

    • Positional encoding to preserve word order (self-attention on its own ignores order)
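
The last two advantages can be illustrated with a minimal sketch of scaled dot-product self-attention plus sinusoidal positional encoding, written in plain Python. Several simplifications are assumptions of this sketch: the 4-dimensional embeddings are made-up numbers, queries, keys, and values are the input vectors themselves (real transformers apply learned projections), and no causal mask is applied.

```python
import math


def positional_encoding(position: int, dim: int) -> list[float]:
    """Sinusoidal encoding: even indices use sin, odd indices use cos."""
    enc = []
    for i in range(dim):
        angle = position / (10000 ** ((2 * (i // 2)) / dim))
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors: list[list[float]]):
    """Scaled dot-product attention with Q = K = V = input (no learned weights)."""
    dim = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * vectors[j][d] for j in range(len(vectors))) for d in range(dim)]
        )
    return outputs, weights


# Toy embeddings (made-up numbers, not from any trained model).
EMBEDDINGS = {
    "dog": [1.0, 0.0, 0.5, 0.0],
    "bites": [0.0, 1.0, 0.0, 0.5],
    "man": [0.5, 0.5, 1.0, 0.0],
}


def encode(tokens: list[str]) -> list[list[float]]:
    """Add the positional encoding for each token's position to its embedding."""
    return [
        [e + p for e, p in zip(EMBEDDINGS[t], positional_encoding(i, 4))]
        for i, t in enumerate(tokens)
    ]


order_a = ["dog", "bites", "man"]
order_b = ["man", "bites", "dog"]

# Without positions, "dog" gets the same output in both word orders.
plain_a, _ = self_attention([EMBEDDINGS[t] for t in order_a])
plain_b, _ = self_attention([EMBEDDINGS[t] for t in order_b])

# With positions added, the output for "dog" depends on where it appears.
with_pos_a, weights_a = self_attention(encode(order_a))
with_pos_b, weights_b = self_attention(encode(order_b))

print(plain_a[0], plain_b[2])
print(with_pos_a[0], with_pos_b[2])
```

Each output row is computed independently of the others, which is what makes the computation easy to parallelize across tokens.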

LLM Training Process

  • Data preparation & corpus cleaning

  • Tokenization (character, word, or sub-word based)

  • Embedding creation (vector representations)

  • Neural network training using the transformer architecture

  • Fine-tuning vs pre-training distinction:

    • Pre-training: trains on a large general corpus to create foundation models

    • Fine-tuning: adapts a foundation model to specific domains/use cases
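
The tokenization and embedding steps can be sketched as follows. This is a toy under stated assumptions: the sub-word vocabulary is hand-picked, the greedy longest-match splitter stands in for a real tokenizer (tokenizers such as BPE or WordPiece learn their vocabularies from data), and the embedding table holds random numbers where a trained model would hold learned values.

```python
import random

text = "Tokenization unlocks transformers"

# Character-level: every character is a token.
char_tokens = list(text)

# Word-level: split on whitespace.
word_tokens = text.lower().split()

# Hypothetical sub-word vocabulary, hand-picked for this example.
SUBWORD_VOCAB = {"token", "ization", "un", "lock", "s", "transform", "ers"}


def subword_tokenize(word: str, vocab: set[str]) -> list[str]:
    """Greedy longest-match-first split; falls back to single characters."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate until it is in the vocabulary or one character long.
        while end > start + 1 and word[start:end] not in vocab:
            end -= 1
        pieces.append(word[start:end])
        start = end
    return pieces


subword_tokens = [p for w in word_tokens for p in subword_tokenize(w, SUBWORD_VOCAB)]

# Map each vocabulary piece to an integer id, then to a vector.
vocab_ids = {piece: i for i, piece in enumerate(sorted(SUBWORD_VOCAB))}
rng = random.Random(0)  # fixed seed so the toy table is reproducible
EMBED_DIM = 4
embedding_table = [[rng.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in vocab_ids]


def embed(pieces: list[str]) -> list[list[float]]:
    """Look up one vector per token id."""
    return [embedding_table[vocab_ids[p]] for p in pieces]


print(subword_tokens)
print(embed(subword_tokens)[0])
```

Sub-word splitting keeps the vocabulary small while still covering unseen words: a word missing from the vocabulary is broken into known pieces instead of being dropped.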

Model Comparison Demo Results

  • Same mathematical word problem given to multiple LLMs

  • Inconsistent responses across models:

    • ChatGPT: “insufficient information” then “0 students”

    • DeepSeek: “0 students”

    • Claude Sonnet 4.6: “52 students”

    • Copilot: varied responses based on model version

  • Demonstrates the probabilistic nature of LLM outputs and the challenge this poses for production use
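
One source of this variability is that models sample from a probability distribution over candidate outputs instead of always returning the single most likely one. The sketch below illustrates that with temperature-scaled sampling. The logits are invented for illustration and were not measured from any of the models in the demo, and differences between models also come from their different weights and training data.

```python
import math
import random

# Made-up scores for three candidate answers (not taken from any real model).
LOGITS = {
    "0 students": 2.0,
    "52 students": 1.6,
    "insufficient information": 1.2,
}


def softmax_with_temperature(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Turn scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = {k: v / temperature for k, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scaled.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def sample(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one answer at random, weighted by its probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


rng = random.Random(42)
answers = [sample(LOGITS, 1.0, rng) for _ in range(10)]
print(answers)

# Lowering the temperature concentrates probability on the top-scoring answer.
print(softmax_with_temperature(LOGITS, 1.0))
print(softmax_with_temperature(LOGITS, 0.1))
```

For production use, this is why the same prompt should be tested repeatedly and why a low temperature is a common choice when consistent answers matter.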

Production Considerations

  • Security: avoid sharing PII/enterprise data with public models

  • Evaluation: need ground truth datasets for testing

  • Cost optimization: balance model capability with expense

  • Latency requirements: thinking (reasoning) models are slower but often more accurate on multi-step problems

  • Model selection factors:

    • Task specificity (general vs specialized)

    • Context length requirements

    • Licensing (open source vs proprietary)

    • Infrastructure needs for deployment
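
The evaluation and latency points can be sketched as a small harness that scores a model against a ground-truth dataset. Everything named here is hypothetical: `fake_model` is a stand-in for a real LLM call and returns canned answers (one deliberately wrong), the three-item dataset is made up, and exact match after normalization is only one of several possible scoring rules.

```python
import time

# Hypothetical ground-truth dataset: prompts paired with known-correct answers.
GROUND_TRUTH = [
    {"prompt": "What is 2 + 2?", "expected": "4"},
    {"prompt": "What is the capital of France?", "expected": "Paris"},
    {"prompt": "How many days are in a week?", "expected": "7"},
]


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; the last canned answer is deliberately wrong."""
    canned = {
        "What is 2 + 2?": "4",
        "What is the capital of France?": " paris ",
        "How many days are in a week?": "5",
    }
    return canned.get(prompt, "")


def normalize(answer: str) -> str:
    """Ignore case and surrounding whitespace when comparing answers."""
    return answer.strip().lower()


def evaluate(model, dataset: list[dict]) -> dict:
    """Return exact-match accuracy and mean latency per call in seconds."""
    correct = 0
    latencies = []
    for example in dataset:
        start = time.perf_counter()
        answer = model(example["prompt"])
        latencies.append(time.perf_counter() - start)
        if normalize(answer) == normalize(example["expected"]):
            correct += 1
    return {
        "accuracy": correct / len(dataset),
        "mean_latency_s": sum(latencies) / len(latencies),
    }


report = evaluate(fake_model, GROUND_TRUTH)
print(report)
```

Running the same harness against several candidate models gives comparable accuracy and latency numbers, which can then be weighed against cost when selecting a model.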


