Mar 23, 2023

Create a playground with a custom bin/console script

When I was first learning Ruby, I was amazed at the power of Pry. In fact, Pry is what let me comprehend object-oriented programming. It helped me see how any point in the code understood the rest of the world. It helped me explore what was possible by putting methods, variables, documentation, and source code immediately at hand.

In addition to being an explorative learning tool, Pry is also a potent “get-stuff-done” tool. Ruby’s design makes it a killer tool at the command line. The APIs are robust, the syntax can be terse and flexible, and it’s designed to be chainable, which makes it easy to iterate against.

Often, when I need to do data work, I’ll hop into a Pry or IRB shell. I’ll connect to our SQL Server DB with Sequel, or make a series of requests with HTTParty or Ferrum. Ruby’s potent Enumerable module makes whipping data into shape a delight.

Recently I was introducing a colleague to Ruby for a data extraction task. While fluent in Python, he was generally new to Ruby syntax and to the style of hold-onto-your-butts, we’re-doin’-it-live exploration and development that tools like Pry can provide. Asking him to run pry and set up the environment to hack felt like a burden: lots of things to remember for someone not used to the syntax.

Enter: bin/console

I realized that I could set up a comfortable playground to explore our problem by creating a helpful entrypoint. Instead of starting from scratch with pry or irb each time, I could provide a richer experience with information and helpers. Just as a good citizen in the CLI world provides a helpful --help option, bin/console can prepare you for hacking. It can provide information and examples to set you up for success, and drop you into your environment with all of your favorite tools.
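To make the idea concrete, here is a minimal sketch of what such an entrypoint might look like. The Playground module, its banner text, and the parse_json and examples helpers are all hypothetical names made up for illustration; a real script would load whatever gems and connections the task needs.

```ruby
#!/usr/bin/env ruby
# bin/console (sketch): a playground entrypoint with a banner and helpers.
# Every name below is hypothetical and only for illustration.
require "json"

module Playground
  BANNER = <<~TEXT
    Welcome to the playground!

    Helpers:
      parse_json(text)  # JSON string -> Hash with symbol keys
      examples          # print this message again
  TEXT

  # Parse a JSON string into a Hash (or Array) with symbolized keys.
  def parse_json(text)
    JSON.parse(text, symbolize_names: true)
  end

  # Reprint the banner from inside the session.
  def examples
    puts BANNER
  end
end

# Mixing the helpers into the top level makes them callable at the prompt.
include Playground

# Only open a shell when run directly from an interactive terminal.
if __FILE__ == $PROGRAM_NAME && $stdin.tty?
  require "irb"
  puts Playground::BANNER
  binding.irb # swap for `binding.pry` (after `require "pry"`) if Pry is installed
end
```

After a chmod +x bin/console, running bin/console prints the banner and opens a session where the helpers are already defined, so a newcomer has something to try right away.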

more >

May 13, 2021

Better Testing and Organization for 1-off Block Transforms in Kiba

Kiba provides flexibility when defining transformations for your data. Common, reusable, configurable data transformations can be set up as Class Transforms. They are isolated from your pipeline definition, and as such, easily testable. Your pipeline code tends to be easier to read, as it just describes what it’s going to do, leaving the implementation behind the scenes.

Sometimes, you need something that is extremely specific to the pipeline, and would be harder to reason about in a separate class or file. Or maybe you are prototyping, and not really sure what things look like yet. Or, let’s be honest, maybe this is just a one-and-done throwaway script. For this, you can use Block Transforms. Block Transforms are very convenient, but can leave your pipeline definition hard to read. They also are harder to test, since they are inlined into the pipeline definition.

Tonight I came up with a strategy that makes those 1-off block transforms more testable, and the pipeline as a whole easier to read, while still not affording them the pomp and circumstance of their own file. In addition to Class Transforms and Block Transforms, by leveraging proc.to_block, you can also define transformations with lambdas or methods.
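The underlying Ruby mechanism is the & operator, which hands a lambda or a Method object to any method that expects a block. The sketch below uses a tiny stand-in class, TinyPipeline, so it runs without Kiba installed; the class and the row shapes are assumptions for illustration, not Kiba’s implementation.

```ruby
# Stand-in for a block-taking DSL method (hypothetical; not Kiba itself).
class TinyPipeline
  def initialize
    @steps = []
  end

  # Accepts a block, just like a block-style transform declaration would.
  def transform(&block)
    @steps << block
    self
  end

  # Run every row through each step in order.
  def run(rows)
    rows.map { |row| @steps.reduce(row) { |acc, step| step.call(acc) } }
  end
end

# A transform defined as a lambda: it has a name and can be tested alone.
upcase_name = ->(row) { row.merge(name: row[:name].upcase) }

# A transform defined as a plain method.
def add_greeting(row)
  row.merge(greeting: "Hello, #{row[:name]}")
end

pipeline = TinyPipeline.new
pipeline.transform(&upcase_name)           # lambda passed as a block
pipeline.transform(&method(:add_greeting)) # method passed as a block

result = pipeline.run([{ name: "ada" }])
# result == [{ name: "ADA", greeting: "Hello, ADA" }]
```

Because upcase_name and add_greeting are ordinary callables, a test can call them directly with a sample row, and the pipeline definition reads as a list of named steps.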

I’ll take a moment to caution here: I haven’t actually used this yet outside of prototyping and exploring for this post. I wanted to get it out on paper to force myself to think about it more. Judge for yourself whether it’s helpful; I’m biased by the excitement of discovery right now.

more >

Mar 18, 2020

Yield Multiple Rows with a Block Transform in Kiba

Kiba recently introduced the ability to yield multiple rows in a transform. This is great when you need to explode rows. For example, I get a lot of Excel files where a single cell holds comma-separated values. Aye!

# Given an incoming row like:
{
  url: 'www.example.org',
  zip_codes: '55802, 90210, 10108'
}

# I want to process three separate rows:
[
  {
    url: 'www.example.org',
    zip_code: '55802'
  },
  {
    url: 'www.example.org',
    zip_code: '90210'
  },
  {
    url: 'www.example.org',
    zip_code: '10108'
  }
]

A limitation is that it must be a class-based transformation, and not a block-based transformation. Sometimes this isn’t practical. I’ll show you how to snap this technical limitation into pieces with a small helper transformation!
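For reference, the class-based form could look roughly like the sketch below. The class name ExplodeZipCodes is hypothetical. As I understand Kiba’s streaming runner, a process method may yield any number of rows and return nil so that the original row is not passed along as well; treat that as an assumption and check it against the Kiba version you run. The example calls process directly with a block, so it does not need Kiba.

```ruby
# Hypothetical class-based transform that explodes one row into many.
class ExplodeZipCodes
  def process(row)
    row.fetch(:zip_codes).split(",").each do |zip|
      # Emit one row per zip code, trimming the space after each comma.
      yield({ url: row[:url], zip_code: zip.strip })
    end
    nil # nothing else to pass along for this input row
  end
end

incoming = { url: 'www.example.org', zip_codes: '55802, 90210, 10108' }

# Collect the yielded rows the way a runner would.
rows = []
ExplodeZipCodes.new.process(incoming) { |row| rows << row }
# rows now holds three hashes, one per zip code
```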

more >

Sep 26, 2019

Kiba ETL Patterns and Moves

I’ve been working with Kiba ETL a lot. I love this tool. It doesn’t do much itself, and its premise is so simple. It’s a DSL that asks for objects to implement a very simple interface to orchestrate an ETL pipeline. Sources should enumerate data with #each, transformations are performed by running an object through a #process method, and destinations get rows handed to #write. Kiba helps you compose many elements together in a way that is very nice to work with and extend. If you aren’t familiar with it, I’d love to recommend some of Thibaut’s content: https://github.com/thbar/kiba#useful-links
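That interface is small enough to sketch in plain Ruby. The three classes below are hypothetical examples, and the one-line loop at the end is a hand-rolled stand-in for a runner, not Kiba’s actual implementation; it only shows how the three method names fit together.

```ruby
# Source: enumerates rows with #each.
class ArraySource
  def initialize(rows)
    @rows = rows
  end

  def each(&block)
    @rows.each(&block)
  end
end

# Transform: takes a row in #process and returns the changed row.
class Doubler
  def process(row)
    row.merge(value: row[:value] * 2)
  end
end

# Destination: receives each finished row through #write.
class MemoryDestination
  attr_reader :rows

  def initialize
    @rows = []
  end

  def write(row)
    @rows << row
  end
end

source = ArraySource.new([{ value: 1 }, { value: 2 }])
transform = Doubler.new
destination = MemoryDestination.new

# Stand-in for the runner: read, transform, write.
source.each { |row| destination.write(transform.process(row)) }
# destination.rows == [{ value: 2 }, { value: 4 }]
```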

I wanted to jot down some of my moves and patterns for future reference. Each of the examples below is self-contained, and generally uses the kiba and awesome_print gems.

Since it’s a long post with lots of code, here’s a quick table of contents:

more >

Nov 9, 2018

Reflection on RubyConf

I’ve been studying Ruby for four years now, and use it as my secret weapon for getting all sorts of odds and ends done. As a scrape engineer at work, I am often working outside of our established product pipeline and implementing whatever technology I need to get my work done… but the rest of the company lives in a Windows world. All of our products and servers have roots and branches in Microsoft products: Windows Server, .NET, and SQL Server. This is simply to say that Ruby and Linux servers don’t exist in my company’s world.

more >

Aug 7, 2018

Using Simple Delegator for Page Objects

I manage a lot of web scraping where I work, and have stumbled into a pattern I very much enjoy using to maintain page parsers. This technique is a variant of the page object pattern, which helps represent web pages as full-fledged objects, each aware of its own capabilities. It uses a tool in the Ruby standard library called SimpleDelegator that helps keep sharp focus on those capabilities, while maintaining the generic features of your parser.
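As a rough sketch of the shape this can take: the ParsedPage struct below is a hypothetical stand-in for whatever your generic parser returns, and ProductListPage is a made-up page object. SimpleDelegator itself comes from the standard library’s delegate file.

```ruby
require "delegate"

# Hypothetical generic parser result with a few general-purpose readers.
ParsedPage = Struct.new(:url, :title, :links)

# Page object: adds page-specific capabilities and delegates everything
# else to the wrapped parser result.
class ProductListPage < SimpleDelegator
  # Only the links that point at product detail pages.
  def product_links
    links.select { |href| href.start_with?("/products/") }
  end
end

generic = ParsedPage.new(
  "https://www.example.org/catalog",
  "Catalog",
  ["/products/1", "/about", "/products/2"]
)

page = ProductListPage.new(generic)
page.title         # delegated to the wrapped ParsedPage
page.product_links # defined on the page object itself
```

The page object only has to describe what is special about this kind of page; title, url, and links keep working because the wrapper forwards them.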

more >

Mar 7, 2018

GitLab "Push to Create New Project" and a shortcut for Bash and Windows

Recently GitLab announced a really fantastic feature: Push to Create New Project. I was excited for this feature because it’s always a buzzkill when I’m about to commit+push a fresh project, and suddenly realize “This push is about to fail because I didn’t let GitLab know about this yet.” And so, tonight I upgraded our GitLab server from 10.3 to 10.5.

The command is a bit verbose, and very samey because it references your full GitLab path, and you weren’t able to just copy it from the root page. But your shells have you covered.

more >

Feb 27, 2018

Using a proxy with Watir + Chrome Remote

Getting a proxy to correctly register with Watir + Chrome can be difficult, as examples in the wild don’t always match the current reality, and documentation is a bit willy-nilly. The documentation for Watir says that you can pass a switches option to set the proxy in the Chrome startup arguments. This works when using Chrome directly, but does not when connecting to remote Selenium + Chrome with :remote (or, as :chrome but with a :url specified).

After a lot of trial and error, I got it sorted using the Selenium::WebDriver::Proxy object.

more >

Sep 5, 2017

A solution to ensure a string ends with X number '0's in SQL Server.

This is a very specific problem and solution, so its usefulness is questionable. It very likely makes bad data even worse. However, at times the things that are right and just in this world are not what needs to come out the other side of the machine at the end of the day.

While working in a SQL Server sproc that generates data for a client, I needed to:

Ensure this variable length nvarchar value has at least 4 trailing zeros. The given values had their 0s chopped off at times. And sometimes it’s not all of them.

(Dear God, why?)

Because the product codes are variable length, I can’t rely on the standard “pad this string until it is length 10” idea.
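To pin down the rule itself, here it is as a small Ruby sketch rather than T-SQL. This is not the sproc solution; the method name and sample codes are made up for illustration. It counts the zeros already at the end of the string and appends only what is missing.

```ruby
# Ensure a string ends with at least `min` zeros (hypothetical helper).
def ensure_trailing_zeros(code, min = 4)
  existing = code[/0*\z/].length         # zeros already at the end
  code + "0" * [min - existing, 0].max   # append only the missing ones
end

ensure_trailing_zeros("AB12")     # no trailing zeros: four are added
ensure_trailing_zeros("AB1200")   # two trailing zeros: two are added
ensure_trailing_zeros("AB100000") # already enough: returned unchanged
```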

Here’s how I’m working through it right now.

more >

Feb 22, 2017

A Quick Story of Tiny_TDS, Sequel, Ruby, MS SQL Server, and NVARCHAR(MAX)

Last night I spent hours debugging a situation where a result from DB['select huge_nvarchar from table'] was being truncated to 32256 characters.

I tried searching for people having this problem tossing all sorts of combinations of %w( tiny_tds sequel ruby sqlserver mssql nvarchar truncate 32256 ) about and got nowhere. Even things like nvarchar 32256, which I assumed would orient me in a new, fruitful direction led nowhere at all. These terms just don’t live together on m/any pages.

more >

Dec 3, 2016

MS SQL Server on Linux in Docker -- neat

Today I was looking into a .net core app we run on Windows, and wanted to see if I could create some disposable test / review app environments using Docker. In short, it didn’t work great because the .net core app uses Sentry/Raven for c#, and that library is not yet compatible with .net core – so we still have a 4.6 dependency that I had forgotten about. Bummer.

However, I got pretty close to standing it up, and I wanted to drop a few notes here for future me.

more >

Sep 22, 2016

Just a simple quick thing. (Okay, it got long.)

I’ve been learning and using Rails for about a year and a half now. I’ve built a lot of little internal Ruby apps and Rails web apps for managing our data, and getting some mustard cut. This week I earned a new achievement: I’ve launched a public facing web app for our company. It’s nothing to really link to, since it’s still an internal app for certain types of data management and reporting, but it’s hosted on the other side of my firewall (and it’s terrifying).

more >

Mar 7, 2016

Quick Tip: Bring upstream changes into your feature branch quickly

git pull --rebase origin master

And for advanced sorcery:

git config alias.update 'pull --rebase origin master'

Now you can just hit git update.

more >

Mar 2, 2016

Just a couple quick links to encrypt all the things

I’ve been using the free StartSSL service for my SSL certs for a few years now. A colleague recently reminded me of a newer service that aims to make the process of obtaining and renewing a certificate much easier. I don’t have time right now to write a big post about how to set up your web server to use SSL, but if you haven’t done it before, or generally used encryption keys before, there’s a lot of lingo to learn before you can really dive in. In addition, many tutorials have “their way” of doing it, which leaves out the idea that it isn’t the “only way”. When you go to those other tutorials, you are left thinking “Wait, but I thought I needed to…”

more >

Feb 12, 2016

I feel a bit like a magician deploying my rails app with Docker behind an NginX reverse proxy container.

This week I ended up propping up my first “Other people are going to use this application” rails app in production mode at work to help with normalizing and mapping some really ugly data. I’ve built a lot of half-baked tools for my own personal usage, but nothing yet that I’ve been comfortable or confident enough with to ask others to use.

more >

Feb 5, 2016

GitLab - Copy your backup to a mounted drive

By default gitlab-rake gitlab:backup:create creates a backup .tar at /var/opt/gitlab/backups.

If you change that folder to a mounted cifs partition, you end up with broken .tar files (“Error File changed while writing .tar”)

Alternatively, you can configure fog (included with GitLab by default for exactly this purpose) to instead copy the backup to a directory after it’s been assembled:

# /etc/gitlab/gitlab.rb
#####

# ...
# Upload the backup using the Fog library
gitlab_rails['backup_upload_connection'] = {
  'provider' => 'Local',
  'local_root' => '/mnt/gitlab_bak'
}
# ...

I’ve run into this problem a few times on my own installations, and I’ve also seen it in some questions in the wild. I’m leaving this here for future me to remember what I’m talking about.

more >

Nov 25, 2015

Docker - I got it.

I finally took the time to grok Docker. It took a day to get it, but I’m slinging containers like a Tupperware party now.

Docker: Build an environment one step at a time, with each step being committed with a diff commit. When you’re done, execute a single process to run in your new environment.

Docker-Compose: Link a bunch of awesome things together with magic. Real, actual magic. Orchestrate a whole symphony of services that can talk to each other with simple hostnames like “pg” and “redis” without having to actually configure anything at all.

Today I went from having no clue why I keep hearing about Docker, to learning about developing a rails app in a docker container. Cool! http://blog.codeship.com/running-rails-development-environment-docker/

more >

Oct 29, 2015

Gitlab CI. I can't believe how easy this is.

GitLab Continuous Integration always intimidated me. It used to be a separate server, which also seemed to require other separate servers. I’ve had GitLab running for about a year, but was always terrified of how much more server management implementing CI looked like it would take. These days (post ver 8), the CI server is baked into the omnibus installation of GitLab. I was still hesitant because there were all these “runners” to worry about. All of the documentation just said “set up a runner and go!” but it didn’t make sense. What is a runner? Does it need its own dedicated server? It sounds like a lot more extra management.

more >

Sep 8, 2015

Rails: Turning Complex HTML Container Snippets Into Partials

Today I struggled trying to understand the Rails Way of cleaning up some of my boilerplate styling code in my view. The terms “partial, layout, template, helper” all get tossed around a lot, but few of the examples hint towards creating your own html helper tags – condensing complex HTML containers into simple ruby expressions.

more >

Jul 3, 2015

Rails + Postgres + Vagrant

I’ve struggled immensely trying to get a Vagrant provision that would successfully set up Postgres so that I could jump into a rails project. I don’t know why it was so hard, but it was. People insisted on Puppet, Chef and Ansible. I don’t understand why generic shell scripting isn’t more popular for the Vagrantfile. None of these solutions resulted in a database that I could hook up to with rails.

more >

Jun 30, 2015

Vagrant VMs for development

Today I discovered how to use Vagrant. I’ve seen the term tossed about here and there, and gave a quick glance at it – even installed it – but never actually dug in. There were a few key concepts that weren’t upfront that made me uncomfortable with how it works.

more >

Jun 23, 2015

Today's Mountains

I dove back into a Ruby/Rails project I started about a month ago after reading a great book on Rails. It got me really excited to jump in, but I quickly lost myself the moment I wanted to bend away from the tutorials. The project I was working on today was a basic management tool for helping me manage various web services, the providers they live on, the projects they are attached to, and other similar connections.

more >

May 27, 2015

DEVIntersection May, 2015: asp.net 5 / vnext has literally blown my mind.

I work in a Microsoft shop, mostly because we use MS SQL Server for everything, and because a lot of the code practices started many years ago and haven’t really been updated. In the past, I’ve learned the basics of Python, Ruby, and some others and have come away feeling a strong distaste for the .net world I live in. Things seem so much easier and more programmer-friendly over there. That’s not just the Ruby Marketing Mantra talking – Ruby truly is a joy to program in. Aside from the lack of syntactic sugar, my biggest gripe is the licensing walls and requirements of “Windows this, IIS that.” I really despise not being able to toss together a quick linux VM for playing around with code, or to deploy small, individual tools to my intranet infrastructure without having to upgrade an entire Windows system to do it.

more >

May 6, 2015

Documentation for the Beast

Documentation

This week we have been discussing hiring some new people in the extraction department. Our current documentation situation is poor, at best. In the past few months, I’ve been experimenting with a GitLab installation to function as part code repo, and part documentation host. It’s not exactly tuned to be a documentation host, but it’s easy to use, and readily available. I appreciate the built in wiki features, and easy-to-adopt .md markdown language.

more >
