Herding Code 237: Tess Ferrandez on Three Real World Machine Learning Projects

Download / Listen: Herding Code 237: Tess Ferrandez on Three Real World Machine Learning Projects

At DevSum Stockholm, Jon talks with Tess Ferrandez about some machine learning applications she’s worked on recently, from sports to shoplifting to cancer detection. Tess talks about the specific ethical considerations that come up when classifying and predicting behavior, and how they worked with them in these real-life examples.

Topics:

  • (00:20) Tess has been working on some applied machine learning projects with large customers lately, all focused on computer vision. One project detects soccer goals using computer vision (saving money over hardware-based solutions), another detects cancer in microscopy slides, and the third detects shoplifting patterns to minimize
  • (02:55) Tess has been doing this work in Python rather than .NET. Jon asks if it’s possible to use ML.NET, but Tess says Python is necessary, both because the language is better suited and the community libraries are all in Python.
  • (04:35) Jon asks Tess about her experiences moving from .NET to Python, and Tess says it’s a struggle since it’s not strongly typed. You can use testing on the parts that handle data, but not on the machine learning parts.
  • (05:40) Jon asks how much of Tess’ work is done using Jupyter Notebooks. For data exploration, Jupyter works great, but for the actual execution you’ll want to use scripts so it’s testable.
  • (07:00) Jon asks more about how you can detect shoplifting behavior, since it’s an activity that happens over time. Tess says it’s also difficult because the prediction may be biased against a demographic, e.g. 20-40 year old men.
  • (07:54) Tess says ethics and machine learning are close to causing the third AI winter, and goes on to describe the previous two AI winters. We now have the machines and the data, but often the data is so unfair that it could lead to severe ripple effects. This can cause racial bias in predicting behavior, skewed medical analysis due to the sample source, etc.
  • (11:30) Jon and Tess discuss the dangers of creating bad feedback loops. Tess talks about an example where Amazon created a system to review CVs which was biased against women because historically women have had fewer software engineering positions, so this system would have reinforced that by preventing women from getting software engineering positions in the future.
  • (13:35) There’s also a danger of classifying people based on pictures, since we may assume the computer is unbiased even though the bias may have been introduced due to the sample data. Classifying based on pictures would imply that either people were born criminals or criminality changes their appearance, neither of which is an acceptable assumption.
  • (16:09) Going back to the shoplifting case, we need to make sure we’re detecting the action of shoplifting rather than classifying the individual’s appearance. For instance, detecting poses, whether the individual was alone. Pre-trained models for things like objects and activities help. There are also subtle sources of bias, for instance if all the source videos are from Christmas, the model may be biased against Santa Claus, so you also need to use pre-trained models for background subtraction.
  • (18:13) Jon asks how important it is to be able to understand how the decisions were made. Tess says it depends on the impact of the decision, and explains how in the case of cancer detection they determined that color differentiation could be used as a predictor, so the actual application didn’t require machine learning. In the case of football goal detection, there was such a large amount of data (time, video, and sound), it was possible to get very good results.
  • (21:26) Jon asks how developers can learn more. Tess says that software engineers don’t need to start with math – you can use pre-trained models and go from there. She recommends a book called Deep Learning with Python by Francois Chollet – it’s very approachable. Tess also recommends the Machine Learning at Microsoft YouTube channel.

Herding Code 236: Will Green on Going Serverless With AWS

Download / Listen: Herding Code 236: Will Green on Going Serverless With AWS

Kevin and Jon talk with Will Green (@hotgazpacho) about how his small team uses serverless development on the AWS platform to maximize their productivity.

Topics:

  • (00:20) Will’s team builds the FireEye Market, which enables you to “discover apps, extensions, and add-ons that integrate with and extend your FireEye experience.”
  • (02:51) FireEye is a relatively large company, but Will’s team is just four people, and they’re using serverless development to scale and get a lot done quickly. The FireEye Market is a greenfield development project. It’s primarily a single page application that uses GraphQL. When new apps are published, an external provider pings webhooks that kick off background processes that cache binaries, notify consumers, etc.
  • (07:05) Kevin asks about what pushed their team towards serverless technology. Will talks about how serverless lets them maximize the time they devote to delivering business value.
  • (08:30) Will talks about how they were able to successfully pitch the project internally. While there were some additional costs as they scaled up, they’ve also been able to take advantage of new AWS services that allow them to scale on demand, which has led to savings.
  • (11:10) Jon asks for more clarification of Apollo GraphQL’s role in their architecture.
  • (12:38) Kevin asks about the learning curve. Will says a lot of it was pretty natural since the team already had a Node background, but learning things like cold start took some work.
  • (14:25) They used the Serverless Framework, which helped take care of setting up tedious infrastructure. If they were starting today, they’d seriously look at AWS Amplify, which is a lot more feature rich and includes support for CI/CD.
  • (15:50) Jon asks how they handle failures, including both code errors and service outages.
  • (19:49) Kevin asks about concerns with vendor lock-in. Will explains why he prefers to just pick a cloud vendor and learn it.
  • (20:49) Kevin asks how they manage the complexity of many small services interacting; Will talks about the use of AWS Step Functions to manage state and workflow, and says that keeping diagrams updated really helps.
  • (22:40) Kevin asks about the local vs. cloud development experience. Will talks about some local development emulators from the community, but it’s not quite the same as actually hitting the real service.
  • (24:00) Kevin asks about the testing strategy.
  • (25:15) Jon asks how things work with version control. Will explains how AWS CodeBuild handles git push build and deploy for them.
  • (26:00) Jon asks how Will keeps up with all the different AWS services, especially since many aren’t intuitively named. Will defines all the different services they’re using.
  • (28:48) Will describes his bias against containers: you still have to worry about the underlying operating system, whereas with serverless that’s all abstracted away.
  • (30:00) Will explains how they designed the system, starting with diagrams on draw.io, continuing to work through requirements, and evolving the system.
  • (31:52) Will explains what’s different about working with DynamoDB. There’s a lot, especially access patterns.
  • (36:03) Jon asks how they handle versioning multiple services and data changes; Will talks about using Step Functions and handling data failures.
  • (38:25) Jon asks for advice for people who are getting started with serverless on AWS, and Will highly recommends AWS Amplify. There are lots of samples for serverless framework.
  • (40:39) Kevin asks if it’s possible to migrate an existing application to a serverless architecture. Will says it’s challenging, but you can use CloudFront as a router to start distributing work to serverless services based on URL path segmentation.
  • (41:50) Kevin asks about the experience of moving from Ruby development to JavaScript development.
  • (42:40) Will’s team is hiring right now, here’s the job listing: Senior Developer (US Remote – Prefer Eastern Time Zone).
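
The incremental migration idea from 40:39, routing requests by URL path segment, can be sketched as a longest-prefix lookup. This is only an illustration in plain Python, not CloudFront configuration and not the team's actual setup; the route table, the origin names, and the `pick_origin` function are all hypothetical.

```python
# Hypothetical route table: path prefixes already migrated to serverless
# origins. Anything unmatched falls through to the legacy application.
ROUTES = {
    "/api/search": "serverless-search",
    "/api": "serverless-api",
}
DEFAULT_ORIGIN = "legacy-app"


def pick_origin(path: str) -> str:
    """Return the origin for a request path using longest-prefix matching."""
    # Check longer prefixes first so "/api/search" wins over "/api".
    for prefix in sorted(ROUTES, key=len, reverse=True):
        # Match whole path segments only: "/api" must not match "/apiary".
        if path == prefix or path.startswith(prefix + "/"):
            return ROUTES[prefix]
    return DEFAULT_ORIGIN
```

Moving another slice of the application then means adding one entry to the route table, while everything else keeps going to the legacy origin.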

Herding Code 235: Matthew Renze on Data Science for Software Developers

Download / Listen: Herding Code 235: Matthew Renze on Data Science for Software Developers

At DevSum Stockholm, Jon talks to Matthew Renze (@matthewrenze) about data science practices to improve both the products they are creating and their software development practices.

Topics:

  • (00:20) Matthew explains how he’s been speaking to software developers about applying data science practices to improve both the products they are creating and their software development practices.
  • (00:40) Data science can add intelligence to applications, using machine learning to automate decision-making processes and deep learning to adapt the user interface through anticipatory design.
  • (03:57) The other side to this is using data science to help build software. The DevOps pipeline provides a lot of objective measures to help improve our software development processes and practices.
  • (05:51) Software telemetry data can help us prioritize the time we spend on features towards those that are actively used.
  • (07:12) Jon asks which terms he really needs to understand as a developer. Matthew defines data science, machine learning, deep learning, and reinforcement learning. They discuss how text suggestions and language understanding have progressed, and where generated text can and can’t help.
  • (13:55) Machine learning can be used for good and for evil – for instance, it’s now possible to forge video in a way that’s really tough to detect. What do we do now? Matthew talks about what we can do as developers to educate those around us and apply ethics to the software we contribute to.
  • (19:50) How do we handle things like legal liability for machines that are making decisions, like self-driving cars? Matthew puts it in historical context and talks about how we’ll need to adapt our society to accommodate.
  • (24:12) Jon asks where to get started applying data science today. Matthew gives some pointers on where to get started learning, and how to start with some quick wins like A/B testing and objective software quality metrics.
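
As a rough illustration of the A/B testing quick win mentioned at 24:12, the sketch below compares two conversion rates with a two-proportion z-test. It is not taken from the episode; the function name, its parameters, and the choice of this particular test are assumptions made for the example.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are conversion counts and n_a / n_b are visitor counts
    for variants A and B (hypothetical names for this sketch).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A small p-value suggests the difference between variants is unlikely to be chance alone; the normal approximation assumes reasonably large sample sizes.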

Herding Code 234: Dylan Beattie on Social Impacts of Technology and the Meaning of Developer Seniority

Download / Listen: Herding Code 234: Dylan Beattie on Social Impacts of Technology and the Meaning of Developer Seniority

At DevSum Stockholm, Jon talks to Dylan Beattie (@dylanbeattie) about the impacts our technology choices have on our world, different kinds of seniority for software developers, and how to get started as a conference speaker.

Topics:

  • (01:00) Dylan explains how he juggles writing and delivering several keynote presentations (and a bit about the Rockstar programming language). He talks about writing a presentation as an essay first, rather than starting with slides.
  • (06:52) Jon asks Dylan about the themes he’s hoping to bring up in his presentations. Dylan talks about the difference between the things we’re building software to do versus the actual important things we should be focusing on as humans. What is the cost of chasing the new and shiny things, and why can’t we be satisfied with the technology we have?
  • (13:10) Jon asks Dylan about how to convince people to act in the long term interests of humanity. Dylan talks about how YouTube’s perfect user is someone who watches movies nonstop for the rest of their life. Jon and Dylan discuss the effectiveness and difficulties of legislating technology.
  • (17:05) So what can we do? Dylan says a good place to start is explaining things just one level deeper to our non-technical friends. And… heresy alert… you don’t have to build software on the absolute newest technology, either. Jon and Dylan talk about how many of our modern application experiences are inferior to basic HTML.
  • (21:50) Jon asks how developers should advance their careers. Do we need to become managers? Dylan discusses the concept of a “senior developer” and describes four strands: management, leadership, expertise, and mentoring.
  • (24:55) Dylan talks about the example of how Linus Torvalds reacted when confronted over hostility on Linux mailing lists. One important thing is that Linus didn’t put the responsibility of telling him how to fix his behavior on those who confronted him over it.
  • (27:15) Jon asks Dylan how we can apply this to our careers. Dylan discusses the tradeoff – growing in one area will likely cause others to suffer. He explains how to progress in each of these areas, and explains how impactful mentorship doesn’t need to be a big time commitment.
  • (31:00) Jon asks for advice for developers who are interested in getting started with public speaking.

Herding Code 233: Dino Esposito on Blazor, ASP.NET Core, Writing Technical Books, and Machine Learning

Download / Listen: Herding Code 233: Dino Esposito on Blazor, ASP.NET Core, Writing Technical Books, and Machine Learning

Jon talks to Dino Esposito at dotNext (Saint Petersburg, Russia) about Blazor, ASP.NET Core, Writing Technical Books, and Machine Learning.

Topics:

  • (00:45) Blazor
  • (16:50) ASP.NET Core, the ASP.NET Core pipeline and proliferation of available endpoints
  • (27:45) Writing technical books
  • (30:05) Machine Learning and ML.NET

Herding Code 232: Scott Koon on getting out of Tech, GitHub Package Registry, Build 2019 Recap

Download / Listen: Herding Code 232: Scott Koon on getting out of Tech, GitHub Package Registry, Build 2019 Recap

Kevin, Scott K, and Jon talk about Scott Koon’s bold adventure out of the tech industry, GitHub Package Registry, and a Build 2019 Recap.