Get live statistics and analysis of Himanshu Singh's profile on X / Twitter

0 following · 244 followers


Impressions: 0 ($0)
Likes: 0 (0%)
Retweets: 0 (0%)
Replies: 0 (0%)
Bookmarks: 0 (0%)

Most engaged tweets of Himanshu Singh

Will OCR get replaced by this? 🤔 Attention + OCR = a crazy open-source model for you.

Paper: TokenOCR: An Attention-Based Foundational Model for Intelligent Optical Character Recognition.

This paper evolves the fundamental OCR technique: it shifts recognition from the character level to the token level by merging CNNs with transformers. Tokens are predicted in place of characters, which increases efficiency and accuracy even on complex documents.

Here's how it works:
1) Hybrid vision-language core: a ResNet-50 extracts spatial features, and encoder-decoder transformers use 2D positional embeddings and multi-head attention to align the text and image features.
2) Token prediction: it predicts semantic tokens using SentencePiece rather than predicting characters like classic OCR models, which reduces errors and improves context-based understanding.
3) Training: trained on 6M synthetic images along a curriculum from random letters → nonsensical words → real sentences, using adaptive learning-rate decay and dropout for better generalization.
4) Strategy & results: used TRDG-generated synthetic data with various distortions and outperformed existing models (Tesseract, and TrOCR on blurry data).

Why this matters:
- Combines visual and linguistic embeddings for better understanding
- Higher accuracy even on noisy images
- A 524M-parameter model

This model doesn't only read text; it understands it.

37k
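To make the hybrid CNN-plus-transformer idea in the TokenOCR summary above concrete, here is a minimal PyTorch sketch: a ResNet-50 backbone produces a spatial feature map, 2D positional embeddings are added, and a transformer encoder-decoder predicts subword tokens instead of characters. All module names, dimensions, and the vocabulary size are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of a token-level OCR model: CNN features -> 2D positions -> transformer
# encoder-decoder -> subword (SentencePiece-style) token logits.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class TokenLevelOCR(nn.Module):
    def __init__(self, vocab_size=32000, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        backbone = resnet50(weights=None)
        # Drop the classification head; keep the spatial feature map (B, 2048, H/32, W/32).
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)
        # Learned 2D positional embeddings (rows and columns up to 64 patches each).
        self.row_pos = nn.Parameter(torch.randn(1, 64, 1, d_model))
        self.col_pos = nn.Parameter(torch.randn(1, 1, 64, d_model))
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, images, target_tokens):
        feats = self.proj(self.cnn(images))              # (B, d, h, w)
        b, d, h, w = feats.shape
        feats = feats.permute(0, 2, 3, 1)                # (B, h, w, d)
        feats = feats + self.row_pos[:, :h] + self.col_pos[:, :, :w]
        memory = feats.reshape(b, h * w, d)              # image features as a sequence
        tgt = self.token_emb(target_tokens)              # (B, T, d)
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        out = self.transformer(memory, tgt, tgt_mask=tgt_mask)
        return self.lm_head(out)                         # logits over subword tokens
```

Predicting whole subword tokens shortens the output sequence and lets the decoder lean on linguistic context, which is the efficiency and accuracy argument the tweet makes.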

Will NLP take a back seat with this paper? 🤔 This paper could be as revolutionary as "Attention Is All You Need". (Give it a read.)

Paper: DeepSeek-OCR: Contexts Optical Compression

DeepSeek-OCR compresses long document contexts by turning them into images, combining vision and language for efficient, high-precision text recognition even on complex, multi-format documents.

Here's how it works:
1) DeepEncoder: the core module uses SAM as a base for windowed attention and CLIP (large) for global knowledge, connected by convolutional downsampling to reduce the token count and activations.
2) Multi-resolution processing: supports roughly 60–800 vision tokens, adjusted dynamically to the layout, preserving detail at resolutions up to 1280×1280.
3) MoE decoder: DeepSeek's 3B MoE activates about 570M parameters via expert routing, so it behaves like a 3B-parameter model with efficiency closer to a ~500M one.
4) Training: trained on 30M+ PDFs and 10M+ scene images (charts, formulas, multilingual text), which increases its robustness.
5) Performance: it achieves ~97% precision at 10x compression and retains ~60% at 20x compression, outperforming GOT-OCR2.0 and other benchmarks while using far fewer tokens.

Why this matters:
> Deepens our understanding of compression techniques for long-context text parsing in NLP and vision models.
> A more cost-efficient SOTA for documents with complex layouts and languages.
> Opens the door to multimodal AI with higher-throughput batch processing.

DeepSeek-OCR doesn't just decode text; it compresses it optically, letting you work through complex documents with ease.

2k
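The core "optical compression" trick in the DeepSeek-OCR summary above is reducing the number of vision tokens before the decoder ever sees the page. Below is a minimal sketch of that token-count reduction via convolutional downsampling; the class name, patch size, and dimensions are illustrative assumptions and stand in for (not reproduce) the actual DeepEncoder.

```python
# Sketch: patchify a document page into vision tokens, then convolutionally
# downsample so the decoder reads ~16x fewer tokens than raw patches.
import torch
import torch.nn as nn

class VisionTokenCompressor(nn.Module):
    def __init__(self, d_model=1024, patch=16):
        super().__init__()
        # One token per 16x16 patch (stand-in for the SAM/CLIP vision encoder).
        self.patchify = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        # Two stride-2 convolutions: 4x fewer tokens per axis, 16x fewer overall.
        self.compress = nn.Sequential(
            nn.Conv2d(d_model, d_model, kernel_size=3, stride=2, padding=1),
            nn.GELU(),
            nn.Conv2d(d_model, d_model, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, page):                        # page: (B, 3, 1280, 1280)
        tokens = self.patchify(page)                # (B, d, 80, 80)  -> 6400 patch tokens
        tokens = self.compress(tokens)              # (B, d, 20, 20)  ->  400 vision tokens
        b, d, h, w = tokens.shape
        return tokens.flatten(2).transpose(1, 2)    # (B, h*w, d) sequence for the decoder

compressor = VisionTokenCompressor()
page = torch.randn(1, 3, 1280, 1280)
print(compressor(page).shape)  # torch.Size([1, 400, 1024])
```

Compressing 6400 raw patches down to a few hundred tokens is what keeps a 1280×1280 page inside the ~60–800 vision-token budget the tweet describes.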

My brain cells froze when I went through this paper. A tiny 7M-parameter model that beats DeepSeek-R1, Gemini 2.5 Pro, and o3-mini at reasoning tasks on both ARC-AGI-1 and ARC-AGI-2. It's the Tiny Recursive Model (TRM) by Samsung. This model is damn good.

Here's how it works:
1. Draft an initial answer: unlike LLMs that write word by word, TRM first generates a quick, complete "draft" of the solution. Think of it as a rough sketch.
2. Create a "thought space": it keeps a separate latent state for its internal reasoning, the part of the model that keeps changing.
3. Inner self-audit: TRM enters a recursive loop, comparing the draft against the prompt and refining the logic inside the thought space over and over (around 6 times), asking itself "Does my logic hold up? Where's the flaw?"
4. Reconstruct the answer: it uses the improved internal reasoning to generate a second draft that is cleaner, sharper, and more accurate.
5. Repeat until mastery: this draft-reflect-revise process continues for up to 16 cycles, each time getting closer to sound reasoning.

Why this matters: this is what an algorithmic advantage looks like.
Business leaders: while competitors pay heavy inference costs for brute-force scale, a smarter, more efficient model can deliver superior reasoning at low cost.
Researchers: a win for neuro-symbolic design; the architecture shows the value of recursive thought before action.
Practitioners: SOTA reasoning in a model small enough to fit on your laptop. TRM offers a lean, recursive framework for building competent, domain-specific thinkers anywhere.

This is an architectural evolution: a model that thinks, not just predicts.

749
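The draft-refine-redraft loop in the TRM summary above maps naturally onto a few lines of code. Here is a minimal sketch: the inner-step and outer-cycle counts (6 and 16) follow the tweet's numbers, while the tiny linear cores, dimensions, and names are illustrative assumptions, not Samsung's implementation.

```python
# Sketch of recursive refinement: draft an answer, refine a latent "thought"
# state several times, redraft the answer, and repeat for several cycles.
import torch
import torch.nn as nn

class TinyRecursiveSketch(nn.Module):
    def __init__(self, d=128, inner_steps=6, outer_cycles=16):
        super().__init__()
        self.inner_steps = inner_steps
        self.outer_cycles = outer_cycles
        self.draft = nn.Linear(d, d)        # 1) quick complete draft from the prompt
        self.think = nn.Linear(3 * d, d)    # 3) refine latent from (prompt, answer, latent)
        self.redraft = nn.Linear(2 * d, d)  # 4) rewrite answer from (answer, latent)

    def forward(self, prompt_emb):                       # prompt_emb: (B, d)
        answer = self.draft(prompt_emb)                  # initial rough sketch
        latent = torch.zeros_like(answer)                # 2) latent "thought space"
        for _ in range(self.outer_cycles):               # 5) repeat up to 16 cycles
            for _ in range(self.inner_steps):            # inner self-audit loop
                latent = torch.tanh(
                    self.think(torch.cat([prompt_emb, answer, latent], dim=-1))
                )
            answer = self.redraft(torch.cat([answer, latent], dim=-1))
        return answer

model = TinyRecursiveSketch()
print(model(torch.randn(2, 128)).shape)  # torch.Size([2, 128])
```

The point the tweet makes is architectural: all the "thinking" happens by reusing the same small parameters many times, rather than by adding more of them.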
