Geographer and Lead GeoData Scientist at @HomesEngland. Daddy of 2, Raspberry Pi geek, and enthusiastic mountaineer. Also a Spaniard (Valencia) in the UK.
Transform document images into structured data with LlamaParse (automated validation) 📊
Converting document images such as receipts to structured spreadsheet data requires tedious typing and careful validation.
LlamaParse automates document data extraction by combining OCR parsing with schema validation, eliminating manual typing and human error.
Here is an example pipeline for extracting receipt data:
• Parse receipt images to markdown using LlamaParse OCR engine
• Define receipt structure with Pydantic models (company, date, items, totals)
• Extract structured data automatically with OpenAI integration
• Validate types and enforce business rules (positive prices, valid dates)
• Export to pandas DataFrames or spreadsheets for analysis
🚀 Full article: bit.ly/46ODSl3
☕️ Run this code: bit.ly/46SQrdF #DataScience #Python #MachineLearning #AI
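The validation and export steps in the pipeline above can be sketched without any dependencies. This is a minimal illustration only: it uses standard-library dataclasses as a stand-in for the Pydantic models the post mentions, it does not call LlamaParse or OpenAI, and the class and field names (`ReceiptItem`, `Receipt`, `purchase_date`, `to_rows`) are hypothetical, not taken from the linked article.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ReceiptItem:
    # Hypothetical item schema; field names are assumptions for illustration
    description: str
    quantity: int
    unit_price: float

    def __post_init__(self):
        # Business rules: quantities and prices must be positive
        if self.quantity <= 0:
            raise ValueError(f"quantity must be positive: {self.quantity}")
        if self.unit_price <= 0:
            raise ValueError(f"unit_price must be positive: {self.unit_price}")


@dataclass
class Receipt:
    company: str
    purchase_date: date
    items: list[ReceiptItem] = field(default_factory=list)

    def __post_init__(self):
        # Accept ISO strings (as an extractor might return) and coerce to date;
        # date.fromisoformat raises ValueError on malformed input
        if isinstance(self.purchase_date, str):
            self.purchase_date = date.fromisoformat(self.purchase_date)
        if self.purchase_date > date.today():
            raise ValueError("purchase_date cannot be in the future")

    @property
    def total(self) -> float:
        return round(sum(i.quantity * i.unit_price for i in self.items), 2)

    def to_rows(self) -> list[dict]:
        # One flat row per item, suitable for pandas.DataFrame(rows)
        return [
            {
                "company": self.company,
                "date": self.purchase_date.isoformat(),
                "description": i.description,
                "quantity": i.quantity,
                "unit_price": i.unit_price,
            }
            for i in self.items
        ]
```

Passing the output of `to_rows()` to `pandas.DataFrame` gives one row per receipt item; a negative price or a malformed date raises `ValueError` before anything reaches the spreadsheet.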
I’ve been deep in R for GIS mapping for years, but today marks a thrilling pivot: my very first Python tutorial is live!
We’ll extract Google’s latest satellite embeddings, run k-means analysis, and map our results.
Everyone, watch the tutorial 👉 youtu.be/WjoB7mou2n8
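For readers new to the k-means step, here is a rough dependency-free sketch of the idea. It is not the code from the tutorial: the `kmeans` function, the evenly spaced initialisation, and the toy two-dimensional `pixels` are all assumptions made for illustration. Real satellite embeddings have many more dimensions, and libraries such as scikit-learn use smarter initialisation.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on a list of equal-length numeric vectors."""
    # Simplistic deterministic initialisation: k evenly spaced input points
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy stand-ins for per-pixel embedding vectors
pixels = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
labels, centroids = kmeans(pixels, k=2)
print(labels)
```

Each pixel ends up with a cluster label, and mapping those labels back onto the pixel grid is what produces the classified map.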
Remember the Global Buildings Atlas with 2.75 BILLION footprints for 97% of the world's buildings?
You can view buildings for any city using this interactive map.
Link below 👇
THREAD: 1/4 If you have a retro console that has broken, I would honestly recommend @topvint. My OG Xbox stopped working and wouldn't even switch on. I contacted his business and he quickly put my mind at ease: he thought it would be a capacitor but would need to check.
3/4 With a lot of cowboys out there sometimes charging ridiculous prices just to open an Xbox, finding that #Topvint actually charges a normal price and actually cares about the repair was a blessing. Also super thorough with the repair and diagnostics.
4/4 So if you are in need of a #RetroConsole repair or some modding too, @topvint is the place to go. Very friendly but also professional, and for people like me who treasure their old consoles, trust is very important. Hope they get the credit they deserve #recommend
Does anybody else prefer the streamlit folium library to the native @streamlit pydeck? I've found more customisation and easier-to-render maps with folium. But I would like to hear other people's thoughts. Labels and widget controllers are a win in Folium, but 3D renders nicely in pydeck.
So awesome to have DeepMind 🧠 helping create datasets for #EarthEngine! Check out the Satellite Embedding dataset created with the AlphaEarth Foundations model 🛰️🌍👇
medium.com/google-earth/ai-p…
Our new AI model AlphaEarth Foundations is mapping the planet in astonishing detail. 🌏🔍
Scientists will now be able to track the impact of deforestation, monitor crop health, and more, significantly faster thanks to our new datasets. 🧵
I've prepared a short blog post on basic Markdown.
I use Markdown a lot in all of my Python projects.
And when I use it, 90% of the time it's those 4 commands over and over again.
The blog will be out soon, but if you can't be bothered to read it, this graphic pretty much sums it up!
Enjoy!
Had mixed feelings with VSCode and one of Copilot Pro's Agents. Tested the Agent mode and it seems to randomly change all the files and scripts despite being given specific instructions. Has anyone else had the same issue? Looking for tips or help. #AI #CopilotPro #Agent #Python
Amazing news with another new version of Streamlit. Can't wait to try the new features and use them in @databricks apps, which have just become available in public preview in the UK South region!
🎈 What's new in Streamlit 1.45?
👀 Announcing the general availability of st.user!
▲ Add new options to st.multiselect and st.selectbox
↔️ Set the width of text alerts and exceptions
⭕ Add an icon to text and number inputs
See more 👇 buff.ly/3xYLyZe
Have you seen pg_parquet yet?
Setting up `pg_parquet` lets you export tables or query results to Parquet format and read Parquet files into Postgres tables.
A typical workflow might look like this:
- Create partitions using pg_partman, dividing your data by day, week, month, etc.
- Sync partitions to Parquet for long-term archiving and analytics
- Drop partitions you no longer need in your transactional database, freeing up space
- Read from Parquet when you want to query the data lake. Create views to union multiple Parquet files.
github.com/CrunchyData/pg_pa…
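The sync-then-drop part of that workflow could be scripted roughly as below. This is a sketch under assumptions, not something from the pg_parquet project: the helper `archive_partition_sql`, the partition name, and the archive path are hypothetical, and the `COPY ... TO ... WITH (format 'parquet')` form follows my understanding of the pg_parquet README, so check it against the version you install. The function only builds SQL strings and never touches a database.

```python
def archive_partition_sql(partition: str, archive_dir: str) -> list[str]:
    """Build the SQL to export one partition to Parquet and then drop it."""
    # Very basic identifier check; a real script should quote identifiers properly
    if not partition.replace("_", "").isalnum():
        raise ValueError(f"unexpected partition name: {partition!r}")
    target = f"{archive_dir.rstrip('/')}/{partition}.parquet"
    return [
        # Export the partition's rows to a Parquet file (pg_parquet COPY hook)
        f"COPY {partition} TO '{target}' WITH (format 'parquet');",
        # Free space in the transactional database once the export has succeeded
        f"DROP TABLE {partition};",
    ]


# Hypothetical partition name and archive location
for statement in archive_partition_sql("events_p20240101", "/archive"):
    print(statement)
```

In practice the two statements should run separately, with the drop issued only after the export has been verified.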
A nice treat for the upcoming Easter break here in the UK. I'm looking forward to learning more from @__mharrison__'s latest book. I read Effective Pandas 2 and it was very good, so I'm hoping this one helps me further along my data science path with better strategies for viz.
When you see a normal data distribution on a train on your journey home, showing how the central limit theorem also applies when people get on the train. I'm a proper data science geek now and can't unsee what I've seen! #DataScience #DataEverywhere
I've been playing with @streamlit recently and I have to admit that it is a game changer in the way data scientists can showcase their capabilities. Their API and the examples in their documentation are extraordinary! A very powerful tool that every data scientist should know #python
I'm thinking about a Level 6 Data Science Apprenticeship to sharpen my skills and go much deeper into the maths and algorithms behind the scenes. Can anyone who has taken this path recently recommend providers or talk about their experience with something similar? TIA
I was quite surprised today using a @databricks GPU cluster with 16 cores and 110 GB of RAM that costs just 2.5 DBU for some geospatial data ingestion. I could easily have all the OS NGD from @OrdnanceSurvey in different notebooks running in parallel and processing close to 200 GB.