This won’t be news to people who already know how to do data science work in Python, but I came across a ‘new’ (to me) package the other day in a presentation at work: pandas-profiling. See https://github.com/pandas-profiling/pandas-profiling
Learning data science, I have been learning the fundamentals, a.k.a. how to do stuff from scratch. I think that’s very useful but it does take time. It’s always a pleasure to then find a way of doing all the stuff you have just learnt in one easy step. That’s what pandas-profiling tries to do.
Pandas by itself is a wonderful collection of really useful data wrangling/profiling tools, but so far I’m not missing having to do things one at a time.
This package is worth having in your tool kit, especially when you are working with unfamiliar data. It takes the ‘grind’ out of getting to know the data before moving on to finding insights 🙂
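To give a feel for the difference, here is a rough sketch (not taken from the presentation) of a few ‘one at a time’ checks in plain Pandas, next to the one-step report. The tiny dataset and the helper names `manual_profile` and `one_step_report` are made up for illustration. The one-step part assumes pandas-profiling is installed and importable as `pandas_profiling`; the project has since been renamed ydata-profiling, so the import name may differ on a newer install.

```python
import pandas as pd

# Tiny made-up dataset, purely for illustration.
df = pd.DataFrame({
    "age": [34, 45, None, 29],
    "city": ["Leeds", "York", "Leeds", None],
})


def manual_profile(frame):
    """Collect a few of the usual 'one at a time' checks in one dict."""
    return {
        "rows": len(frame),                             # how many records
        "missing": frame.isna().sum().to_dict(),        # missing values per column
        "dtypes": frame.dtypes.astype(str).to_dict(),   # column types as strings
        "numeric_summary": frame.describe(),            # count/mean/std/min/max etc.
    }


summary = manual_profile(df)


def one_step_report(frame, path="report.html"):
    """Same idea in one call; assumes pandas-profiling is installed."""
    # Imported lazily so the rest of the sketch runs without the package.
    from pandas_profiling import ProfileReport
    ProfileReport(frame, title="Quick profile").to_file(path)
```

Calling `one_step_report(df)` should write an HTML report you can open in a browser, covering the checks above and a good deal more.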
For the past few months, I have been on a sort of Magical Mystery Tour learning about ‘machine learning’, data science and AI. I still can’t tell you the exact differences between them all, but it does not matter to me; I just prefer to let art wash over me 😜
My journey began late last year when I set up an interest, nay, ‘working group’ with a few colleagues at work. It started small and now we have over 60 people in the working group. There’s not much ‘working’ going on 😂 but we have fun trying
Right now, the keen people in the group are all learning some variety of data science/ machine learning. We have 4 learning groups:
The Python and Azure groups are the most popular so far. Python already has a big fan base at work (as a more generalised programming and scripting language) and Microsoft’s Azure machine learning tools are very approachable. I think Microsoft have done a really good job! 👍😀
I am ‘leading’ the R machine learning learning group. It’s a bit like the ‘blind leading the blind’, as I didn’t know any R before I started, and now I’m halfway through a very good course on a site called datacamp.com #DataCamp. The course I am doing is called “Data science with R”.
I am also doing some other courses in parallel, and while I am enjoying it all, I find that I am spreading myself a bit thin. Recently I have had to pause some courses to focus on fewer at a time. There just aren’t enough hours in the day 🤓
So far, I am finding that I seem to have an affinity for ‘data wrangling’, and my background in data stuff gives me a good base to work from. The hard part for me is knowing what algorithms to use when it comes to making sense of the data. I just don’t have the maths/stats background for this, but I am hoping it is something I can grow into along the way. The most interesting part is really learning the ‘work flow’ involved and the ‘mindset’ around setting goals, trying out ideas, failing, learning and trying again.
Another great resource that I have been using is edX.org #edX.org. My journey started there with a course called “Data Science: R basics” and I have since done quite a few other courses, including “Introduction to AI” … I am currently contemplating doing a full course with one of the university providers on edX.org. HarvardX is one of the providers and looks to have a really good one, but I am still undecided as it would be a huge time commitment over and above what I’m doing now.
For now I am happy to do ‘bespoke’ modules on datacamp and edX.org and develop my overall skills and experience. One thing I really like about datacamp is access to supporting resources like their ‘data science cheat sheets’ – A3 size posters/ infographics that give you a nice summary of a topic to put on the wall or in a folder near your computer.
That’s about all for my opening blog on my machine learning/ data science magical mystery tour… more to follow as I continue my journey! 😎
Wow, it’s been quite a few months since I last blogged, where did the time go?!??
Over the past few months, I have been on a ‘machine learning’ magical mystery tour, and it’s still going 😂
I think I will blog about some of my experiences, starting now, while I sit here with my coffee and 2nd breakfast, typing away on my iPad Pro…
Read and learn 😀
Just a short post to promote a really nice ‘cheat sheet’ for Machine Learning algorithms from Microsoft Azure.
Other links on the same page include:
This one’s relatively old (2015), but it gives a good insight into one of the now ‘hot’ Python-based libraries used in machine learning work, Pandas.
Wow, is there any adjective you can’t put in front of the word ‘data’? 😉 Now it’s Hot Data
It’s an interesting article and worth reading. There is even an e-book if you want to join the mailing list and download it.