Posts

  • Consistent Hashing

    Suppose we want to provide a percentage-based rollout of features to a set of users without storing the full set of features for each user. It may sound like overkill, but if you have millions of users and 50 features, suddenly you are talking about storing 50M items, and that’s no small amount.
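    A minimal sketch of one way to do this without storing anything per user: hash the user and feature together into a stable bucket from 0 to 99 and compare it to the rollout percentage. The names `bucket` and `is_enabled`, the SHA-256 hash, and the 100-bucket split are assumptions for illustration, not necessarily what the post uses.

```python
import hashlib


def bucket(user_id: str, feature: str, buckets: int = 100) -> int:
    """Map a (feature, user) pair to a stable bucket in [0, buckets)."""
    # Hashing the feature name together with the user id gives each
    # feature its own independent slice of users.
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def is_enabled(user_id: str, feature: str, rollout_percent: int) -> bool:
    """True when the user's bucket falls below the rollout percentage."""
    return bucket(user_id, feature) < rollout_percent
```

    Because the bucket is derived from the ids alone, raising the percentage only ever adds users; nobody who already had the feature loses it.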

  • JIRA CLI Tooling

  • Typed react native router

  • Tools Matter

  • Thoughts on monorepos

  • Infra graphs with neo4j

    I spent some time recently mucking around with neo4j attempting to model infrastructure, incidents, teams, users, etc. Basically what does it take to answer questions about organizations.

  • 8 months of go

    For the past 8 months I’ve primarily been writing in go. I was hesitant to take a job that used go as its primary language for a lot of reasons, but I decided to give it a try because a lot of companies these days are using it, and it doesn’t hurt to broaden my skillset. In this post I’ll describe the pros and cons of using go from my own experience.

  • Productionalizing ECS

    This post was originally posted on my company’s engineering blog here: http://engineering.curalate.com/2018/05/16/productionalizing-ecs.html

  • Debugging "Maximum String literal length exceeded" with scala

    Today I ran into a fascinating bug. We use ficus as a HOCON auto parser for scala. It works great, because parsing configurations into strongly typed case classes is annoying. Ficus works by using a macro to invoke implicitly in scope Reader[T] classes for data types and recursively builds the nested parser.

  • AETR an open source workflow engine

    For the past several years I’ve been thinking about the idea of an open source workflow execution engine. Something like AWS workflow but simpler. No need to upload python, or javascript, or whatever. Just call an API with a callback url, and when the API completes its step, callback to the coordinator with a payload. Have the coordinator then send that payload to the next step in the workflow, etc.

  • Chaos monkey for docker

    I work at a mostly AWS shop, and while we still have services on raw EC2, nearly all of our new development is on Amazon ECS in docker. I like docker because it provides a unified unit of operation (a container) that makes it easy to build shared tooling regardless of language/application. It also lets you reproduce your applications locally in the same environment they run in remotely, as well as starting fast and deploying fast.

  • Tracking batch queue fanouts

    Edit: This code now exists at https://github.com/paradoxical-io/carlyle

  • Sbt sonatypeRelease on Travis

    I figured I’d drop a quick note here for anyone else running into an issue. If you are trying to do a sonatypeRelease via sbt 1.0.3 on travis and getting a

  • Functors in scala

    A coworker of mine and I frequently talk about higher kinded types, category theory, and lament about the lack of unified types in scala: namely functors. A functor is a fancy name for a thing that can be mapped on. Wanting to abstract over something that is mappable comes up more often than you think. I don’t necessarily care that it’s an Option, or a List, or a whatever. I just care that it has a map.
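    To make "I just care that it has a map" concrete, here is a rough duck-typed sketch in Python rather than scala. Python cannot express higher kinded types, so this only illustrates the idea; `Maybe`, `Many`, and `increment` are hypothetical names invented for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass(frozen=True)
class Maybe:
    """Holds zero or one value (None means empty in this sketch)."""
    value: Optional[Any] = None

    def map(self, f: Callable[[Any], Any]) -> "Maybe":
        return Maybe() if self.value is None else Maybe(f(self.value))


@dataclass(frozen=True)
class Many:
    """Holds any number of values."""
    items: Tuple[Any, ...] = ()

    def map(self, f: Callable[[Any], Any]) -> "Many":
        return Many(tuple(f(x) for x in self.items))


def increment(container):
    """Works on anything mappable; it never asks which container it got."""
    return container.map(lambda x: x + 1)
```

    `increment` is written once and works for both containers, which is the abstraction the post is asking a functor type to provide.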

  • Tracing High Volume Services

    This post was originally posted at engineering.curalate.com

  • Design patterns

    I was asked by a coworker to help write up some simple examples for junior engineers explaining some of the gang of four design patterns in a simpler, more digestible format. I took a stab at this this weekend and figured I’d share it with anyone who stumbles here. It’s hosted on github pages and available via github:

  • From Thrift to Finatra

    Originally posted on the curalate engineering blog

  • The HTTP driver pattern

    Yet another SOA blog post, this time about calling services. I’ve seen a lot of posts, articles, even books, on how to write services but not a good way about calling services. It may seem trivial, isn’t calling a service a matter of making a web request to one? Yes, it is, but in a larger organization it’s not always so trivial.

  • Bit packing Pacman

    Haven’t posted in a while, since I’ve been heads down in building a lot of cool tooling at work (blog posts coming), but had a chance to mess around a bit with something that came up in an interview question this week.

  • Strongly typed http headers in finatra

    When building service architectures one thing you need to solve is how to pass context between services. This is usually stuff like request id’s and other tracing information (maybe you use zipkin) between service calls. This means that if you set request id FooBar123 on an entrypoint to service A, if service A calls service B it should know that the request id is still FooBar123. The bigger challenge is usually making sure that all thread locals keep this around (and across futures/execution contexts), but before you attempt that you need to get it into the system in the first place.

  • Dont be afraid of dependency updates

    Lots of places I’ve worked at have had an irrational fear of upgrading their dependencies. I understand why: when you have something that works you don’t want to rock the boat. You want to focus on building your product, not dealing with potential runtime errors. Your ops team is happy, things are stable. Life is great.

  • Deployment the paradoxical way

    First and foremost, this is all Jake Swenson’s brainchild. But it’s just too cool to not share and write about. Thanks Jake for doing all the hard work :)

  • Coproducts and polymorphic functions for safety

    I was recently exploring shapeless and a coworker turned me onto the interesting features of coproducts and how they can be used with polymorphic functions.

  • CassieQ @ Cassandra Summit

    I had the great chance to talk at Cassandra summit 2016 this year about cassieq, the project I worked on with Jake Swenson at Paradoxical. For anyone interested, here’s the video!

  • Mocking nested objects with mockito

    Yes, I know it’s a code smell. But I live in the real world, and sometimes you need to mock nested objects. This is a scenario like:

  • Extracting scala method names from objects with macros

    I have a soft spot in me for ASTs ever since I went through the exercise of building my own language. Working in Java I missed the dynamic ability to get compile time information, though I knew it was available as part of the annotation processing pipeline during compilation (which is how lombok works). Scala has something similar in the concept of macros: a way to hook into the compiler, manipulate or inspect the syntax tree, and rewrite or inject whatever you want. It’s a wonderfully elegant system that reminds me of Lisp/Clojure macros.

  • Dealing with a bad symbolic reference in scala

    Every time this hits me I have to think about it. The compiler barfs at you with something ambiguous like

  • Scripting deployment of clusters in asgard

    We use asgard at work to do deployments in both qa and production. Our general flow is to check in, have jenkins build, an AMI is created, and then … we have to manually go to asgard and deploy it. That sucks.

  • Unit testing DNS failovers

    Something that’s come up a few times in my career is the difficulty of validating if and when your code can handle actual DNS changes. A lot of times testing that you have the right JVM settings and that your 3rd party clients can handle it involves mucking with hosts files, nameservers, or stuff like Route53 and waiting around. Then it’s hard to automate and deterministically reproduce. However, you can hook into the DNS resolution in the JVM to control what gets resolved to what. And this way you can tweak the resolution in a test and see what breaks! I found some info at this blog post and cleaned it up a bit for usage in scala.

  • CassieQ at the Seattle Cassandra Users Meetup

    Last night Jake and I presented CassieQ (the distributed message queue on cassandra) at the seattle cassandra users meetup at the Expedia building in Bellevue. Thanks for everyone who came out and chatted with us, we certainly learned a lot and had some great conversations regarding potential optimizations to include in CassieQ.

  • Consistent hashing for fun

    I think consistent hashing is pretty fascinating. It lets you define a ring of machines that shard out data by a hash value. Imagine that your hash space is 0 -> Int.Max, and you have 2 machines. Well one machine gets all values hashed from 0 -> Int.Max/2 and the other from Int.Max/2 -> Int.Max. Clever. This is one of the major algorithms of distributed systems like cassandra and dynamoDB.
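    A small sketch of the ring idea: each machine is placed on the ring by hashing its name, and a key belongs to the first machine at or after the key's hash, wrapping around at the end. `HashRing`, `hash_key`, and the 32-bit SHA-256 slice are assumptions for illustration; real systems such as cassandra add virtual nodes and replication on top of this.

```python
import bisect
import hashlib


def hash_key(key: str) -> int:
    """Hash a string onto a 32-bit ring position."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes):
        if not nodes:
            raise ValueError("ring needs at least one node")
        # Sort nodes by their position on the ring.
        self._ring = sorted((hash_key(n), n) for n in nodes)
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect_left(self._points, hash_key(key))
        # Past the last node, wrap around to the first one.
        return self._ring[idx % len(self._ring)][1]
```

    The useful property is that adding a machine only moves the keys that now fall to the new machine; every other key stays where it was.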

  • A toy generational garbage collector

    Had a little downtime today and figured I’d make a toy generational garbage collector, for funsies. A friend of mine was once asked this as an interview question so I thought it might make for some good weekend practice.

  • RMQ failures from suspended VMs

    My team recently ran into a bizarre RMQ partition failure in a production cluster. RMQ doesn’t handle partition failures well, and while you can set up auto recovery (such as suspension of minority groups) you need to manually recover from it. The one time I’ve encountered this I got a very useful message in the admin management page indicating that parts of the cluster were in partition failure, but this time things went weird.

  • Logging the easy way

    This is a cross post from the original posting at godaddy’s engineering blog. This is a project I have spent considerable time working on and leverage a lot.

  • Serialization of lombok value types with jackson

    For anyone who uses lombok with jackson, you should checkout jackson-lombok which is a fork from xebia that allows lombok value types (and lombok generated constructors) to be json creators.

  • Cassandra DB migrations

    When doing any application that involves a persistent data storage you usually need a way to upgrade and change your database using a set of scripts. Working with patterns like ActiveRecord you get easy up/down by version migrations. But with cassandra, which traditionally was schemaless, there aren’t that many tools out there to do this.

  • Dalloc - coordinating resource distribution using hazelcast

    A fun problem that has come up during the implementation of cassieq (a distributed queue based on cassandra) is how to evenly distribute resources across a group of machines. There is a scenario in cassieq where writes can be delayed, and as such there is a custom worker in the app (by queue) who watches a queue to see if a delayed write comes in and republishes the message to a bucket later on. It’s transparent to the user, but if we have multiple workers on the same queue we could potentially republish the message twice. While technically that falls within the SLA we’ve set for cassieq (at least once delivery) it’d be nice to avoid this particular race condition.

  • Leadership election with cassandra

    Cassandra has a neat feature that lets you expire data in a column. Using this handy little feature, you can create simple leadership election using cassandra. The whole process is described here which talks about leveraging Cassandra’s consensus and the column expiration to create leadership electors.

  • Plugin class loaders are hard

    Plugin based systems are really common. Jenkins, Jira, wordpress, whatever. Recently I built a plugin workflow for a system at work and have been mired in the joys of the class loader. For the uninitiated, a class in Java is identified uniquely by the class loader instance it is created from as well as its fully qualified class name. This means that foo.bar class loaded by class loader A is not the same as foo.bar class loaded by class loader B.

  • Project angelhair: Building a queue on cassandra

    Edit: this project has since been moved to CassieQ: https://github.com/paradoxical-io/cassieq

  • Dynamic HAProxy configs with puppet

    I’ve posted a little about puppet and our team’s ops in the past since my team has pretty heavily invested in the dev portion of the ops role. Our initial foray into ops included us building a pretty basic puppet role based system which we use to coordinate docker deployments of our java services.

  • Adventures in pretty printing JSON in haskell

    Today I gave atom haskell-ide a whirl and wanted to play with haskell a bit more. I’ve played with haskell in the past and always been put off by the tooling. To be fair, I still kind of am. I love the idea of the language but the tooling is just not there to make it an enjoyable exploratory experience. I spend half my time in the repl inspecting types, the other half on hoogle, and the 3rd half (yes I know) being frustrated that I can’t just type in package names and explore API’s in sublime or atom or wherever I am. Now that I’m on a mac, maybe I’ll give leksah another try. I tried it a while ago and it didn’t work well.

  • Automating deployments with salt, puppet, jenkins and docker

    I know, it’s a buzzword mouthful. My team has had good first success leveraging jenkins, salt, sensu, puppet, and docker to package and monitor distributed java services with a one click deployment story so I wanted to share how we’ve set things up.

  • Testing puppet with docker and python

    In all the past positions I’ve been in I’ve been lucky enough to have a dedicated ops team to handle service deployment, cluster health, and machine management. However, at my new company there is much more of a “self serve” mentality such that each team needs to handle things themselves. On the one hand this is a huge pain in my ass, since really the last thing I want to do is deal with clusters and machines. On the other hand though, because we have the ability to spin up openstack boxes in our data centers at the click of a button, each team has the flexibility to host their own infrastructure and stack.

  • Converting akka scala futures to java futures

    Back in akka land! I’m using the ask pattern to get results back from actors since I have a requirement to block and get a result (I can’t wait for an actor to push at a later date). That’s fine, but converting from scala futures to java completable futures is a pain. I also (as mentioned in another post) want to make sure that my async responses capture and set the MDC for proper logging.

  • Shareable zsh environment: EnvZ

    Introducing EnvZ.

  • Adding MDC logging to akka

    I’ve mentioned before, but I’m working heavily on a project that is leveraging akka. I am really enjoying the message passing model and so far things are great, but tying in an MDC for the SLF4J logging context proved complicated. I had played with the custom executor model described here but hadn’t attempted the akka custom dispatcher.

  • Getting battery percentage in zsh

    I’m on osx maverick still at home on my laptop and I spent part of today dicking around customizing my zsh shell. I wanted to be able to show my battery percentage in the shell and it’s really pretty easy.

  • Handling subclassed constraints with a DSL in java 8

    I really like doing all of my domain modeling with clean DSL’s (domain specific languages). Basically I want my code to read like a sentence, and to hide all the magic behind things. When things read clearly even a non professional can determine if something is wrong. The ideal scenario is to have your code read like pseudocode since nobody really cares what the internals are, what matters is your general solution.

  • Installing leiningen on windows

    Figured I’d spend part of the afternoon and play with clojure but was immediately thwarted trying to install leiningen on windows via powershell. I tried the msi installer but it didn’t seem to do anything, so I went to my ~/.lein/bin folder and ran

  • Simplifying class matching with java 8

    I’m knee deep in akka these days and it’s a great queueing framework, but unfortunately I’m stuck using java and not able to use scala (business decisions, not mine!) so pattern matching on incoming untyped events can be kind of nasty.

  • Auto scaling akka routers

    I’m working on a project where I need to multiplex many requests through a finite set of open sockets. For example, I have 200 messages, but I can only have at max 10 sockets open. To accomplish this I’ve wrapped the sockets in akka actors and am using an akka routing mechanism to “share” the 10 open sockets through a roundrobin queue.

  • Tiny types scala edition

    Previously I wrote about generating value type wrappers on top of C# primitives for better handling of domain level knowledge. This time I decided to try it out in scala as I’m jumping into the JVM world.

  • Simple log context wrapper

    I’m still toying around with the scala play! framework and I wanted to check out how I can make logging contextual information easy. In the past with .NET I’ve used and written libraries that wrap the current log provider and give you extra niceties with logging. One of my favorites was being able to do stuff like

  • Conditional injection with scala play and guice

    It’s been a crazy year for me. For those who don’t know I moved from the east coast to the west coast to work for a rather large software company in seattle (I’ll let you figure which one out) and after a few short weeks realized I made a horrible mistake and left the team. I then found a cool job at a smaller .net startup that was based in SF and met some awesome people and learned a lot. But, I’ve been poached by an old coworker and am now going to go work at a place that uses more open source things so I decided to kick into gear and investigate scala and play.

  • Quickly associate file types with a default program

    I use JuJuEdit to open all my log files since it starts up fast, is pretty bare bones, but better than notepad. The way my log4net appender is set up is that log files are kept for 10 days and get a .N appended to them for each backup. I.e.

  • Creating stronger value type contracts

    I’ve long been annoyed that value types don’t have strong semantic information attached to them such that the compiler would barf if I try and pass a value type that isn’t semantically the same as what the function wanted. For example, what does the following signature mean other than taking in 2 ints and returning a bool?

  • AngularJS for .Net developers

    A few months ago I was asked to be a technical reviewer on a new packt pub book called AngularJS for .Net developers. It mostly revolves around ServiceStack (not web API) and building a full stack application with angular. I actually really enjoyed reading it and thought it touched on a lot of great points that a developer who is serious needs to know about.

  • Leveraging message passing to do currying in ruby

    I’m not much of a ruby guy, but I had the inkling to play with it this weekend. The first thing I do when I’m in a new language is try to map constructs that I’m familiar with, from basic stuff like object instantiation, singletons, inheritance, to more complicated paradigms like lambdas and currying.

  • Sometimes you have to fail hard

    This was a post I wrote in the middle of 2013 but never published. I wanted to share this since it’s a common story across all technologies and developers of all skill levels. Sometimes things really just don’t work. As a post-script, I did come back to this project and had a lot of success. When in doubt, let time figure it out :)

  • wcf Request Entity Too Large

    I ran into a stupid issue today with WCF request entity too large errors. If you’re sure your bindings are set properly on both the server and client, make sure to double check that the service name and contracts are set properly in the server.

  • Short and sweet powershell prompt with posh-git

    My company has fully switched to git and it’s been great. Most people at work use SourceTree as a gui to manage their git workflow, some use only command line, and I use a mixture of posh-git in powershell with tortoise git when I need to visualize things.

  • Multiple SignalR clients and ASMX service calls from the same application

    I was writing a test application to simulate what multiple signalR clients to a server would act like. The clients were triggered by the server and then would initiate a sequence of asmx web service calls back to the server using a legacy web service. This way I was using signalR as a triggering mechanism and not as a data transport. For my purpose this worked out great.

  • Constraint based sudoku solver

    A few weekends ago I decided to give solving Sudoku a try. In case you aren’t familiar with Sudoku, here is what an unsolved board looks like

  • Creating futures

    Futures (and promises) are a fun and useful design pattern in that they help encapsulate asynchronous work into composable objects. That and they help hide away the actual asynchronous execution implementation. It doesn’t matter if the future is finally resolved on the threadpool, in a new thread, or in an event loop (like nodejs).
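    A bare-bones sketch of what such a composable object can look like. `Future`, `then`, and `run_in_thread` are hypothetical names for this example (Python's own `concurrent.futures.Future` is a separate, fuller implementation), and error propagation is deliberately left out: if the work raises, this future simply never resolves.

```python
import threading


class Future:
    """A value that will be available later, with chainable callbacks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = threading.Event()
        self._value = None
        self._callbacks = []

    def resolve(self, value):
        with self._lock:
            if self._done.is_set():
                raise RuntimeError("future already resolved")
            self._value = value
            self._done.set()
            callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback(value)

    def then(self, fn):
        """Return a new future holding fn(result of this future)."""
        next_future = Future()

        def run(value):
            next_future.resolve(fn(value))

        with self._lock:
            if not self._done.is_set():
                self._callbacks.append(run)
                return next_future
        run(self._value)  # already resolved: run right away
        return next_future

    def result(self, timeout=None):
        if not self._done.wait(timeout):
            raise TimeoutError("future not resolved in time")
        return self._value


def run_in_thread(fn):
    """One possible executor: resolve the future from a new thread."""
    future = Future()
    threading.Thread(target=lambda: future.resolve(fn())).start()
    return future
```

    The caller chains work with `run_in_thread(load).then(parse).result()` and never sees where it ran; swapping `run_in_thread` for a thread pool or event loop would not change the calling code.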

  • Instagram viewer with node and angular

    I have an artist buddy who is working on an art installation and asked me if there was a way to display a realtime view of an instagram hashtag feed on a projector.

  • Building a prefix trie

    Prefix tries are cool data structures that let you compress a dictionary of words based on their shared prefix. If you think about it, this makes a lot of sense. Why store abs, abbr, and abysmal when you only need to store a,b,b,r,s,y,s,m,a,l. Only storing what you have to (based on prefix) in this example gives you a 70% compression ratio! Not too bad, and it would only get better the more words you added.
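    The letter count in that example can be checked with a small dictionary-based trie. This is a sketch for illustration; `Trie`, `END`, and `letter_count` are names made up here.

```python
END = ""  # end-of-word marker; can never collide with a real character


class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            # Reuse the existing child when the prefix is already stored.
            node = node.setdefault(ch, {})
        node[END] = True

    def contains(self, word: str) -> bool:
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return END in node

    def letter_count(self) -> int:
        """Number of letters actually stored (one per trie node)."""
        def count(node):
            return sum(1 + count(child) for ch, child in node.items() if ch != END)
        return count(self.root)
```

    Inserting abs, abbr, and abysmal stores 10 letters instead of the 14 in the raw words, which is where the roughly 70% figure comes from.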

  • Avoiding nulls with expression trees

    I’ve blogged about this subject before, but I REALLY hate null refs. This is one of the reasons I love F# and other functional languages: null refs almost never happen. But, in the real world I work as a C# dev and have to live with C#’s… nuisances.

  • Strongly typed powershell csv parser

    Somehow I missed the powershell boat. I’ve been a .NET developer for years and I trudged through using the boring old cmd terminal, frequently mumbling about how much I missed zsh. But something snapped and I decided to really dive into powershell and learn why those who use it really love the hell out of it. After realizing that the reason everyone loves it is because everything is strongly typed and you can use .NET in your shell I was totally sold.

  • A simple templating engine

    I wanted to talk about templating, since templating is a common thing you run into. Often times you want to cleanly do a string replace on a bunch of text, and sometimes even need minimal language processing to do what you want. For example, Java has a templating engine called Velocity, but lots of languages have libraries that do this kind of work. I thought it’d be fun to create a small templating engine from scratch with F# as an after work exercise.

  • RxJava Observables and Akka actors

    I was playing with both akka and rxjava and came across the following post that described how to map rxjava observables from messages posted to akka actors.

  • Debugging F# NUnit equals for mixed type tuples

    Twitter user Richard Dalton asked a great question recently:

  • Single producer many consumer

    When I’m bored, I like to roll my own versions of things that already exist. That’s not to say I use them in production, but I find that they are great learning tools. If you read the blog regularly you probably have realized I do this A LOT. Anyways, today is no different. I was thinking about single producer, multiple consumer functions, like an SNS Topic, but for your local machine. In reality, the best way to do this would be to publish your event through an Rx stream and consume it with multiple subscribers, but that’s no fun. I want to roll my own!

  • Building LINQ in Java pt 2

    In my last post I discussed building a static class that worked as the fluent interface exposing different iterator sources that provide transformations. For 1:1 iterators, like take, skip, while, for, nth, first, last, windowed, etc, you just do whatever you need to do internally in the iterator by manipulating the output of the stream.

  • Logitech mx mouse zoom button middle click on Ubuntu

    Any good engineer has their own tools of their trade: keyboard, mouse, and licenses to their favorite editors (oh and a badass chair).

  • Filter on deep object properties in angularjs

    AngularJS provides a neat way of filtering arrays in an ng-repeat tag by piping elements to a built in filter filter which will filter based on a predicate. Doing it this way you can filter items based on a function, or an expression (evaluated to a literal), or by an object.

  • A daily programmer - nuts and bolts

    I’ve mentioned r/dailyprogrammer in previous posts, since I think they are fun little problems to solve when I have time on my hands. They’re also great problem sets to do when learning a new language.

  • Getting started with haskell

    I wanted to share how I’ve finally settled on my haskell development environment and how I got it set up, since the process in the end wasn’t that trivial. Hopefully anyone else starting in haskell can avoid the annoyances and pitfalls that I ran into and get up and running (and doing haskell) quickly.

  • Building LINQ in Java

    Now that Java 8 has lambdas, I decided to check out what kind of lazy collection support their streams functionality had. It had some cool stuff, like

  • Checking if a socket is connected

    Testing if a socket is still open isn’t as easy as it sounds. Anyone who has ever dealt with socket programming knows this is a hassle. The general pattern is to poll on the socket to see if it’s still available, usually by sitting in an infinite loop. However, with f# this can be done more elegantly using async and some decoupled functions.

  • Pulling back all repos of a github user

    I recently had to relinquish my trusty dev machine (my work laptop) since I got a new job, and as such am relegated to using my old mac laptop at home for development until I either find a new personal dev machine or get a new work laptop. For those who don’t know, I’m leaving the DC area and moving to Seattle to work for Amazon, so that’s pretty cool! Downside is that it’s Java and Java kind of sucks, but I can still do f#, haskell, and all the other fun stuff on the side.

  • F# utilities in haskell

    Slowly I am getting more familiar with Haskell, but there are some things that really irk me. For example, a lot of the point free functions are right to left, instead of left to right. Coming from an F# background this drives me nuts. I want to see what happens first first not last.

  • 24 hour time ranges

    Dealing with time is hard, it’s really easy to make a mistake. Whenever I’m faced with a problem that deals with time I tend to spend an inordinate amount of time making sure I’m doing things right.

  • Java lambdas

    I’m not a java person. I’ve never used it in production, nor have I spent any real time with it outside of my professional work. However, when a language dawns upon lambdas I am drawn to try out their implementation. I’ve long since despised Java for the reasons of verbosity, lack of real closures or events, type erasure in generics, and an over obsession with anonymous classes, so I’ve shied away from doing anything in it.

  • Reading socket commands

    A few weeks ago I was working on a sample application that would simulate a complex state machine. The idea is that there is one control room, and many slave rooms, where each slave room has its own state. The control room can dispatch a state advance or state reverse to any room or collection of rooms, as well as query room states, and other room metadata.

  • The Arrow operator

    Continuing my journey in functional programming, I decided to try doing the 99 haskell problems to wean my way into haskell. I’ve found this to be a lot of fun since they give you the answers to each problem and, even though I have functional experience, the haskell way is sometimes very different from what I would have expected.

  • Review of my first time experience with haskell editors

    When you start learning a new language the first hurdle to overcome is how to edit, compile, and debug an application. In my professional career I rely heavily on visual studio and intellij IDEA as my two IDE workhorses. Things just work with them. I use visual studio for C#, C++, and F# development and IDEA for everything else (including scala, typescript, javascript, sass, ruby, and python).

  • Machine Learning with disaster video posted

    A few weeks ago we had our second DC F# meetup with speaker Phil Trelford where he led a hands on session introducing decision trees. The goal of the meetup was to see how good of a predictor we could make of who would live and die on the titanic. Kaggle has an excellent data set that shows age, sex, ticket price, cabin number, class, and a bunch of other useful features describing Titanic passengers.

  • Till functions

    Just wanted to share a couple little functions that I was playing with since it made my code terse and readable. At first I needed a way to fold a function until a predicate. This way I could stop and didn’t have to continue through the whole list. Then I needed to be able to do the same kind of thing but choosing all elements up until a predicate.
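    A rough Python sketch of both ideas, assuming the predicate is checked before each element is consumed. The names `fold_till` and `take_till` and their argument order are guesses for illustration and not the functions from the post.

```python
from itertools import takewhile


def fold_till(predicate, folder, state, items):
    """Fold items into state, stopping as soon as predicate(state, item) holds."""
    for item in items:
        if predicate(state, item):
            break  # stop early; the rest of the sequence is never touched
        state = folder(state, item)
    return state


def take_till(predicate, items):
    """Collect items up to (not including) the first one matching predicate."""
    return list(takewhile(lambda item: not predicate(item), items))
```

    Since both stop at the first match, they also work on lazy or infinite sequences, for example `fold_till(lambda acc, x: acc + x > 10, lambda acc, x: acc + x, 0, itertools.count(1))`.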

  • Angular with typescript architecture

    Bear with me, this is going to be a long post.

  • Seq.unfold and creating bit masks

    In the course of working on ParsecClone I needed some code that could take in an arbitrary byte array and convert it to a corresponding bit array. The idea is if I have an array of

  • Thinking about haskell functors in .net

    I’ve been teaching myself haskell lately and came across an interesting language feature called functors. Functors are a way of describing a transformation when you have a boxed container. They have a generic signature of

  • ParsecClone on nuget

    Today I published the first version of ParsecClone to nuget. I blogged recently about creating my own parser combinator and it’s come along pretty well. While FParsec is more performant and better optimized, mine has other advantages (such as being able to work on arbitrary consumption streams such as binary or bit level) and work directly on strings with regex instead of character by character. Though I wouldn’t recommend using ParsecClone for production string parsing if you have big data sets, since the string parsing isn’t streamed. It works directly on a string. That’s still on the todo list, however the binary parsing does work on streams.

  • Machine learning from disaster

    If any of my readers are in the DC/MD/VA area you should all come to the next DC F# meetup that I’m organizing on september 16th (monday). The topic this time is machine learning from disaster, and we’ll get to find out who lives and dies on the Titanic! We’re bringing in guest speaker Phil Trelford so you know it’s going to be awesome! Phil is in the DC area on his way to the F# skills matters conference in NYC a few days later. I won’t be there but I expect that it will be top notch since all the big F# players are there (such as Don Syme and Tomas Petricek)!

  • Implementing the game "Arithmetic"

    There is a subreddit on reddit called /r/dailyprogrammer and while they don’t actually post exercises daily, they do sometimes post neat questions that are fun to solve. About a week ago, they posted a problem that I solved with F# that I wanted to share. For the impatient, my full source is available at this fssnip.

  • Tech talk: Pattern matching

    Today’s tech talk was about functional pattern matching. This was a really fun one since I’ve been sort of “evangelizing” functional programming at work, and it was a blast seeing everyone ask pointed and interesting questions regarding pattern matching.

  • Parse whatever with your own parser combinator

    In a few recent posts I talked about playing with fparsec to parse data into usable syntax trees. But, even after all the time spent fiddling with it, I really didn’t fully understand how combinators actually worked. With that in mind, I decided to build a version of fparsec from scratch. What better way to understand something than to build it yourself? I had one personal stipulation, and that was to not look at the fparsec source. To be fair, I cheated with one function (the very first one) so I kind of cheated a lot, but I didn’t peek at anything else, promise.

  • Coding Dojo: a gentle introduction to Machine Learning with F# review

    Recently I organized an F# meetup in DC, and for our first event we brought in a wonderful speaker (Mathias Brandewinder) whose topic was called: “Coding Dojo: a gentle introduction to Machine Learning with F#”.

  • F# class getter fun

    I was playing with Neo4J (following a recent post I stumbled upon by Sergey Tihon), and had everything wired up and ready to test out, but when I tried running my code I kept getting errors saying that I hadn’t connected to the neo4j database. This puzzled me because I had clearly called connect, but every time I tried to access my connection object I got an error.

  • Trees and continuation passing style

    For no reason in particular I decided to revisit tree traversal as a kind of programming kata. There are two main kinds of tree traversal:

  • Strongly typing SignalR

    I’m a big fan of strong typing. If you can leverage the compiler to give you an error (or warning) before you deploy code, all the better. That means you won’t, ideally, push a bug into the field. So I have a big problem with frameworks and libraries that rely on dynamic objects, or even worse, stringly typed things. Don’t get me wrong, sometimes dynamics are the only way to solve the problem, but whenever I run into one I’m always afraid that I’m going to get a runtime error since I don’t really know what I’m acting on till later.

  • F# and Machine learning Meetup in DC

    As you may have figured out, I like F# and I like functional languages. At some point I tweeted to the f# community lamenting that there was a dearth of F# meetups in the DC area. Lo and behold, tons of people replied saying they’d be interested in forming one, and some notable speakers piped up and said they’d come and speak if I set something up.

  • SignalR on ios and a single domain

    Safari on ios has a limitation that you can only have one concurrent request to a particular domain at a time. Normally this is fine, since once a request completes the next one that is queued up fires off. But what if you are using a realtime persistent connection library like signalR? In this case your one allowed connection is held up with the signalR request. If you’re not on a mac or linux and you use windows 7 or earlier you can’t use websockets so you’re stuck using http. Most suggestions involve buying a second domain, but sometimes that’s not possible, especially if your application is a distributable web app that can run on client machines. You can’t expect clients to have to buy a second domain just so your realtime push works.

  • Tech talk: CLR Memory Diagnostics

    In today’s tech talk we discussed the recent release from Microsoft of ClrMD that lets you attach and debug processes using an exposed API. You used to be able to do this in WinDbg using the SOS plugin, but now they’ve wrapped SOS in a managed dll that you can use to inspect CLR process information. The nice thing about this is you can now automate debugging inspections. It’s now as easy as

  • Reworking my language parser with fparsec

    Since I was playing with fparsec last week, I decided to (mostly) redo the parser for my homebrew language that I’ve previously posted about. Using fparsec made the parser surprisingly succinct and expressive. In fact I was able to do most of this in an afternoon, which is impressive considering my last C# attempt took 2 weeks to hammer out.

  • Locale parser with fparsec

    Localizing an application consists of extracting out user directed text and managing it outside of hardcoded strings in your code. This lets you tweak strings without having to recompile, and if done properly, allows you to support multiple languages. Localizing is no easy task: it messes up spacing, formatting, and name/date and other cultural information, but that’s a separate issue. The crux of localizing is text.

  • Linear separability and the boundary of wx+b

    In machine learning, everyone talks about weights and activations, often in conjunction with a formula of the form wx+b. While reading machine learning in action I frequently saw this formula but didn’t really understand what it meant. Obviously it’s a line of some sort, but what does the line mean? Where does w come from? I was able to muddle past this for decision trees, and naive bayes, but when I got to support vector machines I was pretty confused. I wasn’t able to follow the math and conceptually things got muddled.

  • Ordered Consumable

    I had the need for a specific collection type where I would only ever process an element once, but be able to arbitrarily jump around and process different elements. Once a jump happened, the elements would be processed in circular order: continue to the end, then loop around to the beginning and process any remaining items.
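    The behavior described above can be sketched roughly as follows. This is only an illustration in Python, not the post's implementation: the class name `OrderedConsumable` and the methods `jump_to` and `next` are hypothetical, and the sketch assumes a non-empty, fixed-size collection.

```python
class OrderedConsumable:
    """Hands out each item at most once, in circular order from the cursor."""

    def __init__(self, items):
        self._items = list(items)
        self._consumed = [False] * len(self._items)
        self._pos = 0

    def jump_to(self, index):
        """Move the cursor; later calls to next() continue circularly from here."""
        self._pos = index % len(self._items)  # assumes a non-empty collection

    def next(self):
        """Return the next unconsumed item, or None once everything is consumed."""
        n = len(self._items)
        for offset in range(n):
            i = (self._pos + offset) % n
            if not self._consumed[i]:
                self._consumed[i] = True
                self._pos = (i + 1) % n
                return self._items[i]
        return None


consumable = OrderedConsumable("abcde")
first = consumable.next()        # starts at the beginning
consumable.jump_to(3)            # jump ahead, then wrap around
rest = [consumable.next() for _ in range(4)]
```

    After the jump, the remaining items come out from the jump point to the end and then from the beginning, skipping anything already processed.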

  • Threadpooling in netduino

    Sometimes you want to do asynchronous work without holding up your current thread but the work that needs to be done doesn’t really warrant the cost of spinning up a new thread (though what the exact cost is on an embedded environment I’m not sure).

  • Qconn NYC 2013

    If anyone is at qconn this year come find me (I’m wearing an adult swim hoodie)! There won’t be a tech talk this week since I’m busy at the conf but things will return to normal next week.

  • Automatic fogbugz triage with naive bayes

    At my work we use fogbugz for our bugtracker and over our company's lifetime we have accumulated tens of thousands of cases. I was thinking recently that this is an interesting repository of historical data and I wanted to see what I could do with it. What if I was able to predict, to some degree of accuracy, who the case would be assigned to based solely on the case title? What about area? Or priority? Being able to predict who a case gets assigned to could alleviate a big time burden on the bug triager.

    Thankfully, I'm reading "Machine Learning In Action" and came across the naive bayes classifier, which seemed a good fit for me to use to try and categorize cases based on their titles. Naive bayes is most famously used as part of spam filtering algorithms. The general idea is you train the classifier with some known documents to seed the algorithm. Once you have a trained data set you can run new documents through it to see what they classify as (spam or not spam).

    For those who've never used Fogbugz, let me illustrate the data that's available to me. I've highlighted a few areas I'm going to use. The title is what we're going to use as the prediction value (highlighted blue), and the other red highlights are categories I want to predict (area, priority, and who the case is assigned to).

    [Image: FogBugz case screenshot with the title, area, priority, and assignee highlighted]

    For the impatient, full source code of my bayes classifier is available on my github.

    Conditional probability

    Conditional probability describes the probability of an item given you already know something about it. Formally it's described in the syntax of P(A | B), which is pronounced as "probability of A given B". A good example is provided in the machine learning book. Imagine you have 7 marbles: 3 white marbles and 4 black marbles. What's the probability of a white marble? It's 3/7. How about a black marble? It's 4/7.

    Now imagine you introduce two buckets: a blue bucket and a red bucket. In the red bucket, you have 2 white marbles and 2 black marbles. In the blue bucket you have 1 white marble and 2 black marbles. What's the probability of getting a white marble from the blue bucket? It's 1/3. There is only one white marble in the blue bucket, and 3 total marbles. So, P(white marble | blue bucket) is 1/3.

    [Image: the red and blue buckets of marbles]
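    The marble arithmetic can be checked with a few lines of Python. This is just a sketch of the counting; the names `buckets`, `p_color`, and `p_color_given_bucket` are made up for the illustration.

```python
from fractions import Fraction

# Marble counts per bucket, taken from the example above.
buckets = {
    "red": {"white": 2, "black": 2},
    "blue": {"white": 1, "black": 2},
}


def p_color(color):
    """P(color) across all 7 marbles, ignoring which bucket they sit in."""
    total = sum(sum(counts.values()) for counts in buckets.values())
    matching = sum(counts[color] for counts in buckets.values())
    return Fraction(matching, total)


def p_color_given_bucket(color, bucket):
    """P(color | bucket): the share of that color inside one bucket."""
    counts = buckets[bucket]
    return Fraction(counts[color], sum(counts.values()))


print(p_color("white"))                       # 3/7
print(p_color("black"))                       # 4/7
print(p_color_given_bucket("white", "blue"))  # 1/3
```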

    Bayes Formula

    This doesn't really help though. What you really want is to be able to calculate P(red bucket | white marble). This is where bayes rule comes into play:

    P(A | B) = P(B | A) * P(A) / P(B)

    This formula describes how items and their conditions relate (marbles and buckets).
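    As a worked example, here is bayes rule, P(A | B) = P(B | A) * P(A) / P(B), applied to the marbles. The sketch assumes a marble is drawn uniformly at random from all 7, so P(red bucket) is 4/7 because 4 of the 7 marbles sit in the red bucket; the function name `bayes` is only illustrative.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Bayes rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


p_white_given_red = Fraction(2, 4)  # 2 of the 4 marbles in the red bucket are white
p_red = Fraction(4, 7)              # assumption: each of the 7 marbles is equally likely
p_white = Fraction(3, 7)            # 3 of the 7 marbles are white

p_red_given_white = bayes(p_white_given_red, p_red, p_white)
print(p_red_given_white)  # 2/3
```

    The answer matches a direct count: of the 3 white marbles, 2 are in the red bucket.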

    Conditional Independence

    Naive bayes is called naive because it assumes that each item occurs independently of the others. Getting a white marble isn't dependent on first getting a black marble. To put it another way, it assumes the word "delicious" is just as likely to be next to "sandwich" as it is to "stupid". That's not really the case: in reality "delicious" is much more likely to be next to "sandwich" than "stupid".

    The naive portion is important to note, because it allows us to use the following property of conditionally independent data:

    P(A and B) = P(A) * P(B)

    What this formula means is that the probability of one thing AND another thing is the probability of each multiplied together. This applies to us since if the text is composed of words, and words are conditionally independent, then we can use the above property to determine the probability of text. In other words, you can expand P(text | spam) to be

    ```
    text = word1 ∪ word2 ∪ word3 ∪ ... ∪ wordN

    P(text | spam) = P(word1 | spam)*P(word2 | spam)...*P(wordN | spam)
    ```
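    In code, that expansion is just a product over the words. The sketch below is an illustration only: the per-word probabilities are invented numbers rather than values from a trained data set, and the function names are hypothetical. It also shows the same quantity computed as a sum of logs, a common way to avoid floating point underflow on long texts.

```python
import math

# Hypothetical values for P(word | spam); invented purely for illustration.
p_word_given_spam = {"cheap": 0.05, "pills": 0.02, "now": 0.10}


def p_text_given_class(words, p_word_given_class):
    """Naive product: P(text | class) = P(word1 | class) * ... * P(wordN | class)."""
    return math.prod(p_word_given_class[w] for w in words)


def log_p_text_given_class(words, p_word_given_class):
    """The same quantity in log space: a sum of logs instead of a product."""
    return sum(math.log(p_word_given_class[w]) for w in words)


words = ["cheap", "pills", "now"]
p = p_text_given_class(words, p_word_given_spam)
log_p = log_p_text_given_class(words, p_word_given_spam)
```

    A word that never appeared in training would raise a KeyError here; a real classifier needs some way to handle unseen words, such as smoothing.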

  • Tech talk: B-Trees

    Yesterday’s tech talk was on b-trees. B-trees are an interesting tree data structure that are used to minimize disk read access. Also, since they are self balancing, and optimized for sequential reads and inserts, they’re really good for file systems and databases. CouchDB, MongoDB, SQLite, SQL Server and other databases all use either a b-tree or a b+ tree as their data indexes, so it was interesting to discuss b-tree properties.

  • Working on a long term svn branch

    I work on a reasonably small team and for the most part everyone works in trunk. But sometimes you need to switch over to a long term feature branch (more than a week or two) that can last for months. The problem here is that your branch can easily diverge from trunk. If the intent is that the feature branch will eventually become the master (trunk) then you should merge the feature branch frequently. For me, this method has worked really well.

  • Building an ID3 decision tree

    After following Mathias Brandewinder’s series on converting the python from “Machine Learning in Action” to F#, I decided I’d give the book a try myself. Brandewinder’s blog is great and he went through chapter by chapter working through F# conversions. If you followed his series, this won’t be anything new. Still, I decided to do the same thing as a way to solidify the concepts for myself, and in order to differentiate my posts I am reworking the python code into C#. For the impatient, the full source is available at my github.

  • Tech Talk: Sorting of ratings

    Today’s tech talk discussed different ways to sort a ratings system. The topic revolved around a blog post we discovered a while ago breaking down different problems with star based sorts.

  • Byte arrays, typed values, binary reader, and fwrite

    I was trying to read a binary file created from a native app using the C# BinaryReader class but kept getting weird numbers. When I checked the hex in visual studio I saw that the bytes were backwards from what I expected, indicating endianness issues. This threw me for a loop since I was writing the file from C++ on the same machine that I was reading the file in C# in. Also, I wasn’t sending any data over the network so I was a little confused. Endianness is usually an issue across machine architectures or over the network.

  • Why \d is slower than [0-9]

    I learned an interesting thing today about regular expressions via this stackoverflow question. \d, commonly used as a shorthand for digits (which we usually think of as 0-9) actually checks against all valid unicode digits.
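    The stackoverflow question is about another regex engine, but Python's `re` module behaves similarly for `str` patterns, so it works as a quick illustration of the matching difference (this sketch does not measure speed). The variable name is made up for the example.

```python
import re

arabic_indic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a non-ASCII decimal digit

# On a str pattern, \d matches any Unicode decimal digit.
unicode_match = re.fullmatch(r"\d", arabic_indic_three) is not None

# [0-9] only matches the ten ASCII digits.
range_match = re.fullmatch(r"[0-9]", arabic_indic_three) is not None

# The re.ASCII flag restricts \d to [0-9].
ascii_flag_match = re.fullmatch(r"\d", arabic_indic_three, re.ASCII) is not None

print(unicode_match, range_match, ascii_flag_match)  # True False False
```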

  • Minimizing the null ref with dynamic proxies

    In a production application you frequently can find yourself working with objects that have a large accessor chain like

  • Bad image format "Invalid access to memory location"

    Wow, two bad image format posts in one day. So, the previous post talked about debugging 64bit vs 32 bit assemblies. But after that was solved I ran into another issue. This time with the message:

  • Determining 64bit or 32 bit .NET assemblies

    I work on a 64 bit machine but frequently deploy to 32 bit machines. The code I work on though has native hooks so I always need to deploy assembly entry points at 32 bit. This means I am usually paranoid about the build configuration. However, sometimes things slip up and a 64 bit dll gets sent out or an entrypoint is built with ANY CPU set. Usually this is caught on our continuous build server, with some cryptic reason for why a unit test that should be working is actually failing.

  • Streaming video to ios device with custom httphandler in asp.net

    I ran into an interesting tidbit just now while trying to dynamically stream a video file using a custom http handler. The idea here is to bypass the static handler for a file so that I can perform authentication/preprocessing/etc when a user requests a video resource and I don’t have to expose a static folder with potentially sensitive resources.

  • Users by connections in SignalR

    SignalR gives you events when users connect, disconnect, and reconnect, however the only identifying piece of information you have at this point is their connection ID. Unfortunately it’s not very practical to identify all your connected users strictly off their connectionIDs - usually you have some other identifier in your application (userID, email, etc).

  • Tech Talk: Path finding algorithms

    Today’s tech talk was about path finding algorithms. The topic was picked because of a recent link shared to reddit that visualized different algorithms. The neat thing about the link is that you can really see how different algorithms and heuristics modify the route.

  • Building better regular expressions

    Every software developer has at one point in time heard the adage

  • The largest mass problem

    I was recently asked to write some code to find the largest contiguous group of synonymous elements in a two dimensional array. The idea is that you want to find the largest “land mass” in a problem where you have a game board that looks something like

  • Capturing union values with fparsec

    I just started playing with fparsec which is a parser combinator library that lets you create chainable parsers to parse DSLs. After having built my own parser, lexer, and interpreter, playing with other libraries is really fun; I like seeing how others have done it. Unlike my mutable parser written in C#, with FParsec the idea is that it will encapsulate the underlying stream state and result into a parser object. Since F# is mostly immutable, this is how the underlying modified stream state gets captured and passed as a new stream to the next parser. I actually like this kind of workflow since you don’t need to create a grammar which is parsed and creates code for you (which is what ANTLR does). There’s something very appealing to have it be dynamic.

  • Tech Talk: AngularJS

    Today’s tech talk was a continuation on front-end discussions we’re having. Last week we talked about typescript (I forgot to write it up) and this week we discussed the basics of angular. Angular is a front-end MVC framework written by google that, at first glance, looks completely different from previous javascript/html development. The basic gist is to strongly decouple logic into encapsulated modules. But that’s not all there is, there’s a lot to it. Angular has a templating engine, dependency injection, double bindings between views and controllers, event dispatching, etc.

  • Debugging Serialization Exception: The constructor to deserialize an object was not found.

    Today I was debugging an exception that was occurring when remoting a data object between two .NET processes. I kept getting

  • Separation of concerns in node.js

    I’ve been playing with typescript and node.js and I wanted to talk a little about how I’ve broken up my app source. It’s always good to modularize an application into smaller bits, and while node lets you do a lot, quickly, with just a little bit of code, as your application grows you really can’t put all your logic in one big app.ts.

  • Images, memory leaks, GDI+, and the aggregate function

    I ran into a neat C# memory leak today that I wanted to share. It’s not often you get a clear undeniable leak in C# and so I really had fun figuring this one out.

  • A response to "Ten reasons to not use a functional programming language"

    If you haven’t read the top ten reasons to not use a functional programming language, I think you should. It’s a well written post and ironically debunks a lot of the major trepidations people have with functional languages.

  • Tech Talk: Text Editors

    Today’s tech talk was a little less tech but no less important. We got together and talked about the different text editors that we use and why we like them.

  • Command pattern with SignalR

    I’m using SignalR as a long poll mechanism between multiple .NET clients because part of my project’s requirements is to have everything over http/https. There’s no point in rolling my own socket based long poll since SignalR has already done all the heavy lifting. Unfortunately since the app I work on is distributed I can’t upgrade my SignalR version from what I have (0.5.2) since the newer SignalR versions aren’t backwards compatible. This means I have to make do with what this version of SignalR gives me.

  • Jon Skeet, C#, and Resharper

    Today, at 1pm EST, the venerable Jon Skeet had a goto meeting webinar sponsored by JetBrains reviewing weird and cool stuff about C# and Resharper. For those who aren’t in the know, Resharper is a static analysis tool for C# that is pretty much the best thing ever. Skeet’s a great speaker and my entire team at work and I watched the webinar in our conference room while eating lunch.

  • Capturing mutables in f#

    I was talking about F# with a coworker recently and we were discussing the merits of a stateless system. Both of us really like the enforcement of having to inject state, and when necessary, returning a new modified copy of state. Functional languages want you to work with this pattern, but like with all things software, it’s good to be able to break the rules. This is one of the things I like about F#, you can create mutables and do work imperatively if you need to.

  • Tech talk: Hacking droid

    Today’s tech talk was based off of a blog entry posted by facebook recently where they described the things they needed to do to get their mobile app running on android OS Froyo (v 2.2).

  • Advice to young engineers

    I had the opportunity to represent the company I work for at an engineering networking event at the University of Maryland today catered to young engineering students of all disciplines. The basic idea was to be available for students to ask questions they don’t normally get to ask of working professionals such as “what’s the day to day like?” [lots of coffee, followed by coding all day], “what advice would you give to someone looking to get into xyz field”, etc.

  • Flyweight Locking

    Locking is a necessary aspect of multithreading code: it prevents unpredictable behavior and makes sure code that is expected to run synchronously does so. Some situations can leverage lockless code, but not always. When you do need to do a lock you shouldn’t do it carelessly. If you lock a section of code that does some major work (such as database access) and it blocks other pending calls, you need to be cognizant that there could be a delay or bottleneck. However, just because we have to lock doesn’t mean we can’t do some simple optimizations depending on what our business logic is. If we only need to lock items per a defined group then we can leverage flyweight locking. Let’s go through an example to make this scenario clearer.
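    One way to picture locking per group is a table that hands out a single shared lock object per group key, so only callers working on the same group wait on each other. The sketch below is a minimal Python illustration under that assumption, not the post's own code; `KeyedLocks`, `lock_for`, and `add` are hypothetical names.

```python
import threading
from collections import defaultdict


class KeyedLocks:
    """Hands out one shared lock per group key (hypothetical helper)."""

    def __init__(self):
        self._guard = threading.Lock()             # protects the lock table itself
        self._locks = defaultdict(threading.Lock)  # group key -> shared lock

    def lock_for(self, key):
        with self._guard:
            return self._locks[key]


locks = KeyedLocks()
totals = {"a": 0, "b": 0}


def add(group, amount):
    # Work on group "a" never blocks work on group "b".
    with locks.lock_for(group):
        totals[group] += amount


threads = [threading.Thread(target=add, args=(g, 1)) for g in ["a", "b"] * 50]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

    Note that this sketch never removes locks from the table, so it only suits a bounded set of groups.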

  • Mongoose with TypeScript

    Mongoose is a library for node.js that wraps the mongoDB driver. Since I’ve been playing with typescript, I wanted to show a short demo of strongly typing mongoose with unit tests run in nodeunit all using typescript.

  • Tech talk: Javascript Memory Leaks and JSWhiz

    Today’s tech talk revolved around the recently published JSWhiz whitepaper from google. The paper discusses common javascript memory leak patterns. It also goes over how those leaks can be created and how google automated detection of them using Closure type annotations.

  • Merging two immutable dictionaries in F#

    If you ever need to merge two immutable dictionaries (maps) that may share the same key, here is how I did it

  • When to abort a thread

    When is it OK to abort a thread is a question that comes up every so often. Usually everyone jumps on the bandwagon that you should never ever do a thread abort, but I don’t agree. Certainly there are times when it’s valid and if you understand what you are doing then it’s ok to use.

  • Implementing partial functions

    This next section I had a lot of fun with, and originally I didn’t plan on implementing it at all. The only reason I did it is because I had a stroke of genius while in the shower one morning. Today, I’m going to talk about how I supported partial functions in my toy programming language.

  • Tech talk: Service stack

    In today’s tech talk the team and I talked about ServiceStack. I’ve heard a lot of hype about it but never really understood what it did or was about. Today, unfortunately, didn’t really clear any of that up.

  • Fixing "Calling LoadLibraryEx on ISAPI filter v4.0.30319 aspnet_filter.dll failed"

    Calling LoadLibraryEx on ISAPI filter "C:\Windows\Microsoft.NET\Framework\v4.0.30319\aspnet_filter.dll" failed

  • Adding static typing and scope references, part 3: solving forward references

    In an earlier post I gave a brief overview of the scope builder and its jobs. There I mentioned that supporting forward references required some extra work. In this post I’ll talk more about how I solved forward references.

  • Just another brainfuck interpreter

    Why?

    Honestly, why not?

    The entry point

    Not much to tell:

    ```csharp
    static void Main(string[] args)
    {
        // The classic brainfuck "Hello World!" program, passed to the post's own Parser class.
        var parser = new Parser("++++++++++[>+++++++>++++++++++>+++>+<<<<-]>++.>+.+++++++..+++.>++.<<+++++++++++++++.>.+++.------.--------.>+.>.");
        // ...
    }
    ```

  • Add scheduled task and run even if on battery power

    Just wanted to share a little helpful snippet in case anyone needs it. To add a scheduled task and make sure it starts even when on battery power do this:

  • Adding static typing and scope validation, part 2: type inference and validation

    This post continues my series describing how I solved certain problems while creating a toy programming language. Today I’ll discuss static typing and type inference.

  • Double encoding: URI and HTML encoding

    URLs have special characters, like % and &, that you need to encode if you want to use them as part of your GET URI. For example:

  • Adding static typing and scope validation into the language, part 1

    Continuing my series discussing the language I wrote, this next post is going to talk about the basics of static typing and scope rules. So far my language implementation closely follows Parr’s examples in his book Language Implementation Patterns, which is what gave me the inspiration to do this project.

  • Configure all the things

    I personally think that just about everything should be configurable, unless it’s absolutely never going to change. Even then, make it configurable, because it may change in the future. Think about your favorite command line tools, and the extensibility they have. They’re powerful because they are dynamic. They can be configured for a myriad of options and scenarios.

  • A handrolled language parser

    In my previous post about building a custom lexer I mentioned that, for educational purposes, I created a simple toy programming language (still unnamed). There, I talked about building a tokenizer and lexer from scratch. In this post I’ll discuss building a parser that is responsible for generating an abstract syntax tree (AST) for my language. This syntax tree can then be passed to other language components such as a scope and type resolver, and finally an interpreter.

  • Tech talk: Bloom Filters

    Each Thursday at work my team and I do a 45 minute to an hour discussion on any technical subject that we find interesting. We call these Thursday get togethers tech talks and I think they are awesome. We’ve been doing them for years and I’m hoping to start reposting our subjects and a blurb about our discussions each week after they happen.

  • Event emitters with success and fail methods for node.js

    When it comes to node.js you hear a lot of hype, good and bad, so I’ve finally decided to take the plunge and investigate for myself what the fuss is about. So far it’s been interesting.

  • Building a custom lexer

    As a software engineer I spend all day (hopefully) writing code. I love code and I love that there are languages that help me solve problems and create solutions. But as an engineer I always want to know more about the tools I work with so I recently picked up “Language Implementation Patterns” by Terence Parr and decided I was going to learn how to build a language. After reading through most of the book and working on examples for about 5 weeks I ended up building an interpreted toy general purpose language that has features like:

  • Thread Synchronization With Aspects

    This article was originally published at tech.blinemedical.com

  • IxD 2013: Rhythm, Flow, and Style

    This article was originally published at tech.blinemedical.com

  • IxD 2013 - Production ready CSS workshop

    This article was originally published at tech.blinemedical.com

  • K-Means Step by Step in F#

    This article was originally published at tech.blinemedical.com

  • Tracing computation expressions

    This article was originally published at tech.blinemedical.com

  • Reading input in F#

    This article was originally published at tech.blinemedical.com

  • Debugging piped operations in F#

    This article was originally published at tech.blinemedical.com

  • RESTful web endpoints on Netduino Plus

    This article was originally published at tech.blinemedical.com

  • Async producer/consumer the easy way

    This article was originally published at tech.blinemedical.com

  • Dropped packets with promiscuous raw sockets and winsock

    This article was originally published at tech.blinemedical.com

  • Run with real data

    This article was originally published at tech.blinemedical.com

  • Inter process locking

    This article was originally published at tech.blinemedical.com

  • A collection of simple AS3 string helpers

    This article was originally published at tech.blinemedical.com

  • Handle reconnections to signalR host

    This article was originally published at tech.blinemedical.com
