3e8

assemble a wormhole network
- wormhole
- speculative
- requires a lot of overhead transport to set up?
- requires a complete understanding of wormholes, if they even exist
- use ASI to automate
ion thrusters
- ion thruster
- needs more energy on board to ionize the propellant and accelerate the ions
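
The on-board energy concern for ion thrusters can be made concrete with the ideal jet power relation P = F * v_e / 2, where v_e = Isp * g0. The sketch below is only an illustration: the function name `jet_power_watts` and the two specific impulse values are assumptions chosen for the example, and a real thruster is less than 100% efficient.

```python
G0 = 9.80665  # standard gravity in m/s^2, used to convert Isp to exhaust velocity


def jet_power_watts(thrust_n: float, isp_s: float) -> float:
    """Ideal (lossless) beam power needed for a given thrust and specific impulse."""
    exhaust_velocity = isp_s * G0  # m/s
    return 0.5 * thrust_n * exhaust_velocity


if __name__ == "__main__":
    # Illustrative values only: a chemical-like Isp versus an ion-thruster-like Isp.
    for isp in (300.0, 3000.0):
        kilowatts = jet_power_watts(1.0, isp) / 1000.0
        print(f"Isp {isp:6.0f} s -> {kilowatts:.2f} kW per newton of thrust")
```

Power per newton grows linearly with exhaust velocity, which is why a high specific impulse costs on-board electrical power.
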
antimatter propulsion
- antimatter

Energy

nuclear fusion
- tokamak
- railgun
  - Helion Energy
black hole harvesting
solar (during travel)
energy transfer
- electricity
  - superconductors
energy storage
- batteries
- stars
- black holes (to harvest later)

Solving Intelligence

understanding consciousness
- understand the brain
  - neuroscience + electrochemistry
- find aliens
  - long distance lens / communication
  - space travel and exploration near alien-like solar system clusters
    - cryogenic sleep
    - AI autopilot
    - sustainable energy for life support & propulsion (nuclear fusion)
ASI
- the bitter lesson
- compute infrastructure
  - robot construction
  - automated chip design
  - sustainable energy for compute
  - materials required for large clusters
    - asteroid mining
      - assembly theory level autonomy in the limit
      - on board energy source for propulsion
      - powerful AGI should be able to automate all of the non-physical aspects of this
        
        simulations
- self-improving AI
  - alignment / safety
    - what 2026 looks like? https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like
    - https://situational-awareness.ai/
    - situational awareness
      - official
    - lesswrong
    - https://www.alignmentforum.org/s/mzgtmmTKKn5MuCzFJ
    - rich sutton's views on alignment/safety
    - https://darioamodei.com/machines-of-loving-grace
    - mech interp
      - www.lesswrong.com/posts/jLAvJt8wuSFySN975/mechanistic-interpretability-quickstart-guide
      - Alignmentforum.org
      - Tools
        
        TransformerLens
        
        Neuroscope
      - 200 different problems in mech interp, as seen in the post
      - Superposition and Polysemanticity
        
        Neuron polysemanticity is the observed phenomenon that many neurons seem to fire (have large, positive activations) on multiple unrelated concepts. Superposition is a specific explanation for neuron (or attention head) polysemanticity, in which a neural network represents more sparse features than it has neurons (or attention heads, or attention head dimensions) by placing them in near-orthogonal directions.
      - Neel Nanda pioneered this stuff
      - Einsum and einops to avoid bugs
      - Induction heads
        
        The induction head pays attention to a token in the past (let's call it A).
        
        It then looks for the next occurrence of A in the sequence.
        
        If it finds A again, it predicts that the token that followed A the first time will likely follow A again.
        
        So it is sort of like how we "induce" something or do an induction proof. It's not direct, but we can use our axioms to declare that since something happened "over there", we infer that something similar happened "here".
        
        Example: "Also did you know..." - "Also let's jump back to..."
      - sparse autoencoders
        
        original paper
        
        grok 3's advice w/ deepthink
  - AGI
    - define AGI
      - books
        
        Superintelligence - Nick Bostrom
        
        Life 3.0 - Max Tegmark - Be sure to read "The Tale of the Omega Team"
      - papers / articles
        
        Levels of AGI: Operationalizing Progress on the Path to AGI
        
        What is Meant by AGI? On the Definition of Artificial General Intelligence
        
        Artificial General Intelligence: Concept, State of the Art, and Future Prospects
      - modalities
        
        text
        
        image
        
        audio
        
        video
        
        touch
      - creativity
        
        benching creativity
      - embodiment (requiring a human by its side to operate)
        
        Minecraft AIs (Reddit)
        
        OK, so I should probably go with Fabric if I'm not reverse engineering MineRL or Malmo (latest versions, top performance, modularity, works across all OSes).
        
        The idea is to replicate MineRL down to game ticks (I forgot to define that I was aiming for this from the start). Other options that are not perfectly synced work, but they won't make me feel complete.
        
        I'm starting with GitHub - FabricMC/fabric-example-mod: Example Fabric mod. It needs both the Fabric installer (run with java -jar filename...) and www.curseforge.com/minecraft/mc-mods/fabric-api/download/5897810
        
        Paste from Google Keep: for a local Minecraft NN, consider implementing implicit learning so I can simply press a certain key when I come across a type of mob. Then the neural net can learn what I'm looking at. The real problem comes when we need to draw a bounding box. If we only click when looking at the mob, the neural net may not learn the explicit features that define it (a green color change defines a creeper). Still useful though.
        
        humanoid
        
        dog
        
        car
        
        aircraft
    - non-performance (reasoning) breakthroughs >= transformers
    - other paradigms to make machines intelligent/learn
      - no backpropagation
      - no gradient descent
      - no loss functions
      - evolutionary algorithms
    - compute effect of using more powerful AIs to assist with AGI research
    - audio models
    - video models
    - AI agents
      - environments
        
        minerl & minerl github
        
        openai gym - openai gym github - paper
      - agents performance
        
        factorio systems engineering/design by hierarchical agents
    - vision models
      - yolo (you only look once)
        
        yolov8
        
        yolov11
        
        sota obj det models
      - vision transformers (ViTs)
        
        JARVIS-VLA (minecraft)
        
        ViT
        
        Swin Transformer
    - training / inference / CUDA
      - kernel generation
        
        - x.com/i/grok/share/dT7QTvzfbE2FW8l9vnXNJ8VHT
        
        MY QUESTIONS TO GROK: I want to completely automate CUDA kernel generation with reasoning LLMs. My thought process is a bit scrambled, but it looks like this:
        
        You ask the LLM to generate a faster kernel from a text prompt (optional) along with some PyTorch code. The code takes some input shape, does a bunch of operations, and produces an output shape for the final result. The idea is that instead of launching fast kernels separately, we fuse them into a SINGLE fast kernel.
        
        Going lower level, a model like DeepSeek-R1 would output some GPU code, and we would have a verifier to ensure the GPU results match the PyTorch results. We would also have a speedup counter to see how much better the current optimization is than the past state.
        
        If the results don't match, we feed the code back through a loop where the prompt is customized to fix the script and to encourage different approaches to figuring out why they don't match (error checking macros, stride/indexing errors, incorrect reductions, numerical instability, etc).
        
        Once the results do match, we compare performance. Performance is the other feedback loop, which consists of a different set of prompts to optimize further (I'll design this later). We could reformat the prompt each time, both to ensure correctness and to make sure the performance increase during the optimization phase is done properly. Ideally we would port each WORKING optimization to a different script with a name generated at random (adjective_noun).
        
        ANYWAYS... curious to hear your feedback on this.
        
        I suppose this would be easier if we either put more effort into fine-tuning (SFT) a reasoning model on the CUDA compatibility matrix, device stats like ./deviceQuery from cuda-samples, etc., or use RAG, or just have a system of agents that places this info directly into the context window and figures it out step by step (different agents have different system prompts for how they take the data and think up some good optimizations).
        
        We would actually only need maybe FlashAttention2 and 3, ThunderKittens, CUTLASS, and a couple (2) of other projects. The rest could be RL, because output verification and performance optimization are VERY EASY when it comes to crafting reward functions. See KernelBench from Stanford, for example: https://arxiv.org/pdf/2502.10517v1
        
        kernelbench
        
        github
        
        arxiv
      - guide for distributed computing (huggingface)
      - attention
        
        Native Sparse Attention - DeepSeek
        
        FlashAttention
        
        FlashAttention2
        
        FlashAttention3
        
        SageAttention
        
        Paged Attention
        
        FlexAttention
      - general
        
        Triton
        
        DeepSeek-V3 Technical Report
        
        DeepSpeed
      - inference only
        
        KV Cache
        
        Speculative Decoding
        
        DeepSpeed Inference
        
        vLLM
        
        llama.cpp
        
        TensorRT-LLM
        
        SGLang
      - optimizers
        
        ZeRO
    - existing techniques
      - distillation
        
        token-level distillation
        
        online logit distillation
      - Mixture-of-Experts
        
        Original MoE Paper
        
        Auxiliary-Loss-Free Load Balancing Strategy for MoEs
      - learning hacks to shorten training time
        
        GrokFast
      - quantization (int4/fp4 optimal?)
        
        k-bit scaling
        
        The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
      - tokenization hacks
        
        SpaceByte
        
        SuperBPE
      - test-time compute (reasoning & think tokens)
        
        DeepSeek-R1
      - reasoning in latent space
        
        Quiet-STaR
        
        Recurrent Depth Approach to Latent Reasoning
      - embeddings / pos enc
        
        V-JEPA
        
        Rotary Positional Embedding
      - RLHF (ppo/dpo)
        
        Learning to Summarize from Human Feedback
        
        Deep Reinforcement Learning from Human Preferences
        
        Fine-Tuning Language Models from Human Preferences
        
        Training Language Models to Follow Instructions with Human Feedback
        
        Scaling Laws for Reward Model Overoptimization
        
        Direct Preference Optimization: Your Language Model is Secretly a Reward Model
      - RL (in general)
        
        Proximal Policy Optimization Algorithms
      - diffusion models
        
        Diffusion Models Beat GANs on Image Synthesis
        
        Denoising Diffusion Probabilistic Models
      - synthetic data
        
        Improving the Scaling Laws of Synthetic Data with Deliberate Practice
        
        OpenAI VPT
      - base transformer additions
        
        differential attention (for attention noise reduction)
        
        Kolmogorov-Arnold-Network (KAN) - KANs - KAN 2.0 - FastKANs - paper - code - FasterKANs - code
        
        normalizations - layernorm - rmsnorm - postnorm vs prenorm
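
The normalization variants listed last (layernorm, rmsnorm, postnorm vs prenorm) differ in small ways that are easy to see in code. This is a minimal pure-Python sketch on a single vector, assuming no learned scale or shift parameters; the function names and the `sublayer` callable are illustrative, not taken from any library.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """LayerNorm without learned parameters: subtract the mean, divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """RMSNorm without learned parameters: divide by the root mean square, no centering."""
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]


def pre_norm_block(x: Vector, sublayer: Callable[[Vector], Vector],
                   norm: Callable[[Vector], Vector] = rms_norm) -> Vector:
    """Pre-norm residual: normalize the sublayer input, then add the raw residual."""
    return [a + b for a, b in zip(x, sublayer(norm(x)))]


def post_norm_block(x: Vector, sublayer: Callable[[Vector], Vector],
                    norm: Callable[[Vector], Vector] = layer_norm) -> Vector:
    """Post-norm residual: add the sublayer output first, then normalize the sum."""
    return norm([a + b for a, b in zip(x, sublayer(x))])
```

In the pre-norm form the residual stream itself is never normalized, while in the post-norm form every block output is.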
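
The superposition note under mech interp (more sparse features than dimensions, stored in near-orthogonal directions) can be illustrated with a toy example. This sketch assumes random unit vectors as feature directions, which is a simplification: trained networks learn their directions, and every name and size here is made up for the illustration.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def make_feature_directions(n_features: int, dim: int, seed: int = 0) -> List[Vector]:
    """Draw n_features random unit vectors in a dim-dimensional space (n_features may exceed dim)."""
    rng = random.Random(seed)
    directions = []
    for _ in range(n_features):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(dot(v, v))
        directions.append([c / norm for c in v])
    return directions


def encode(active: Dict[int, float], directions: List[Vector], dim: int) -> Vector:
    """Superpose the active features: sum of (feature value * feature direction)."""
    x = [0.0] * dim
    for idx, value in active.items():
        for i in range(dim):
            x[i] += value * directions[idx][i]
    return x


def readout(x: Vector, directions: List[Vector]) -> List[float]:
    """Project the activation onto every feature direction."""
    return [dot(x, d) for d in directions]


def max_interference(directions: List[Vector]) -> float:
    """Largest absolute cosine similarity between two different feature directions."""
    return max(abs(dot(a, b))
               for i, a in enumerate(directions)
               for b in directions[i + 1:])
```

With, say, 64 features in 16 dimensions, `readout(encode({5: 1.0}, dirs, 16), dirs)` peaks at index 5, and the nonzero readouts at the other indices are the interference that sparsity keeps tolerable.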
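
The induction head behavior described under mech interp (find an earlier occurrence of the current token, then predict whatever followed it) reduces to a simple copy rule. The sketch below captures only that input-output rule; it is not a model of the attention circuit that implements it, and `induction_predict` is a hypothetical name.

```python
from typing import List, Optional


def induction_predict(tokens: List[str]) -> Optional[str]:
    """Predict the next token by copying what followed the most recent
    earlier occurrence of the current (last) token. Returns None if the
    current token has not appeared before."""
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, excluding the current one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None
```

For the example in the notes, `induction_predict(["Also", "did", "you", "know", "Also"])` copies the token that followed the first "Also".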
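
The kernel generation idea in the questions to Grok (a correctness loop, then a performance loop, with working optimizations saved under adjective_noun names) can be sketched as a plain control loop. Everything here is a stand-in: `propose` is a hypothetical callable wrapping the LLM, `benchmark` is a hypothetical timer, kernels are ordinary Python functions rather than CUDA, and the word lists are invented.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence

Kernel = Callable[[Sequence[float]], List[float]]

ADJECTIVES = ["brisk", "quiet", "lucid"]  # invented word lists for random names
NOUNS = ["falcon", "river", "anvil"]


@dataclass
class Result:
    name: str       # random adjective_noun label for the saved optimization
    kernel: Kernel
    speedup: float  # relative to the previous best, not the original reference


def outputs_match(a: Sequence[float], b: Sequence[float], tol: float = 1e-5) -> bool:
    return len(a) == len(b) and all(abs(x - y) <= tol for x, y in zip(a, b))


def optimize(reference: Kernel,
             propose: Callable[[str], Kernel],
             benchmark: Callable[[Kernel], float],
             inputs: Sequence[float],
             max_rounds: int = 8,
             seed: int = 0) -> List[Result]:
    """Ask for candidates, verify them against the reference, keep the faster ones."""
    rng = random.Random(seed)
    expected = reference(inputs)
    best_time = benchmark(reference)
    kept: List[Result] = []
    feedback = "generate a faster fused kernel"
    for _ in range(max_rounds):
        candidate = propose(feedback)
        try:
            actual = candidate(inputs)
        except Exception as exc:  # correctness loop: the candidate crashed
            feedback = f"fix runtime error: {exc}"
            continue
        if not outputs_match(actual, expected):  # correctness loop: wrong results
            feedback = "outputs differ: check indexing, reductions, numerical stability"
            continue
        elapsed = benchmark(candidate)  # performance loop
        if elapsed < best_time:
            name = f"{rng.choice(ADJECTIVES)}_{rng.choice(NOUNS)}"
            kept.append(Result(name, candidate, best_time / elapsed))
            best_time = elapsed
            feedback = "correct and faster: optimize further"
        else:
            feedback = "correct but not faster: try a different approach"
    return kept
```

Keeping the verifier and the timer as injected callables mirrors the point in the notes that both reward signals are easy to define independently of the model.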