PayScale HackDay v0.12
What is hack day? Once a quarter, the tech team spends a day working on whatever they want, turning up the music, eating good food, then voting. Sometimes projects are about taking risks, experimenting, or learning a new technology. Oftentimes projects turn into features in production that our customers use every day, and I expect that some of these will too.
Rob: Graph Databases
I’m working on a very compact and efficient form for representing profile data, including title rollups (ours and ONet), the industry hierarchy, etc. The idea is to use a graph. In this model, each title, industry, skill, and cert exists only once in the database. This means that our profiles are order(s) of magnitude smaller and very efficient to query. The following simple database uses the Neo4J graph database and has two profiles (96, 97), each with skills and a job title; each title has rollups, both ours and ONet. Some new things that we can’t do today: the title itself has a relationship to skills, and the employer relationship contains the number of years you worked for that employer.
Queries are very simple:
- How many people have the skill Java:
MATCH (p:Profile)-[:hasSkill]->(s) WHERE s.name="Java" RETURN count(p)
This is very SQL-like. I hope that we could actually load the entire profile db into memory on a reasonably sized machine using this approach. There are still lots of details to work out.
- Oh, and you can do crazy queries, like asking what relationships exist between job titles and other entities.
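To make the "stored only once" idea concrete, here is a minimal in-memory sketch in Python. It is not Neo4J and not the actual prototype; the `Graph` class, the node ids, the "Acme" employer, and the sample skills are all made up for illustration. Only the two profile ids (96, 97) and the `hasSkill` relationship come from the description above.

```python
from collections import defaultdict


class Graph:
    """Tiny in-memory property graph: each node is stored once, edges carry a type."""

    def __init__(self):
        self.nodes = {}                 # node id -> {"label": ..., "name": ...}
        self.edges = defaultdict(list)  # source id -> [(edge type, target id, props)]

    def add_node(self, node_id, label, name=None):
        # Reuse the node if it already exists, so a skill or title is stored only once.
        self.nodes.setdefault(node_id, {"label": label, "name": name})
        return node_id

    def add_edge(self, source, edge_type, target, **props):
        self.edges[source].append((edge_type, target, props))

    def count_profiles_with_skill(self, skill_name):
        # Same intent as the Cypher query above: count Profile nodes that
        # have a hasSkill edge to a node with the given name.
        count = 0
        for source, outgoing in self.edges.items():
            if self.nodes[source]["label"] != "Profile":
                continue
            if any(edge_type == "hasSkill" and self.nodes[target]["name"] == skill_name
                   for edge_type, target, _ in outgoing):
                count += 1
        return count


g = Graph()
java = g.add_node("skill:java", "Skill", "Java")
sql = g.add_node("skill:sql", "Skill", "SQL")
dev = g.add_node("title:software-developer", "Title", "Software Developer")
acme = g.add_node("employer:acme", "Employer", "Acme")  # hypothetical employer

for profile_id in (96, 97):
    p = g.add_node(f"profile:{profile_id}", "Profile")
    g.add_edge(p, "hasTitle", dev)   # both profiles point at the same title node
    g.add_edge(p, "hasSkill", java)  # and at the same skill node

g.add_edge("profile:96", "hasSkill", sql)
g.add_edge("profile:96", "workedFor", acme, years=3)  # years live on the relationship
g.add_edge(dev, "hasSkill", java)                     # title-to-skill relationship

print(g.count_profiles_with_skill("Java"))
```

Because the title and skill nodes are shared, adding another profile only adds one node and a few edges, which is where the size savings would come from.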
AdamA: CAPTCHAs for matching job titles and tasks
I worked on prototyping a CAPTCHA that requires users to appropriately match job tasks with job titles in the system. The theory is that we can leverage our users to perhaps find more appropriate tasks for a job title than we otherwise would through task writers, while simultaneously slowing down spammers and other abusers of our site. Much like other CAPTCHA systems, it uses two titles, one for control and one that is unknown. With a large enough sample, we could see trends suggesting that a new task might be applicable to a job title that we haven’t already created.
The prototype used Node.js with the Restify framework and MongoDB in the backend, and Knockout.js for the frontend. If we decide to implement it in production, we could expand our internal portfolio of F/OSS technology by having a service that provides this CAPTCHA and runs independently of our MS stack. It still requires a bit of work to create the analysis tools that tell us whether a task is good, and to determine whether the task-matching task itself is too onerous for users compared to a normal CAPTCHA.
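The control/unknown idea can be sketched roughly like this. This is Python rather than the Node.js prototype, and the titles, tasks, `grade_challenge`, `agreement`, and the `min_votes` threshold are all hypothetical. The assumption here is that a user passes based only on the control pairing, and the answer for the unknown pairing is recorded as a vote only when the control answer was right.

```python
from collections import Counter

# Hypothetical known-good data: title -> tasks already accepted for it.
KNOWN_TASKS = {
    "Registered Nurse": {"Administer medication", "Monitor patient vitals"},
}

# Votes collected for unverified pairings: (title, task) -> Counter of "yes"/"no".
votes = {}


def grade_challenge(control, unknown):
    """Grade one challenge.

    control and unknown are (title, task, user_said_match) tuples.
    The user passes only on the control answer; the unknown answer is
    recorded as a vote, and only when the control answer was right.
    """
    title, task, answer = control
    expected = task in KNOWN_TASKS.get(title, set())
    passed = (answer == expected)
    if passed:
        u_title, u_task, u_answer = unknown
        tally = votes.setdefault((u_title, u_task), Counter())
        tally["yes" if u_answer else "no"] += 1
    return passed


def agreement(title, task, min_votes=3):
    """Return the share of 'yes' votes, or None until enough votes exist."""
    tally = votes.get((title, task), Counter())
    total = tally["yes"] + tally["no"]
    if total < min_votes:
        return None
    return tally["yes"] / total


control_ok = ("Registered Nurse", "Administer medication", True)
control_bad = ("Registered Nurse", "Administer medication", False)
candidate = ("Nurse Practitioner", "Update patient charts")

results = [
    grade_challenge(control_ok, candidate + (True,)),
    grade_challenge(control_ok, candidate + (True,)),
    grade_challenge(control_ok, candidate + (False,)),
    grade_challenge(control_bad, candidate + (True,)),  # fails control, vote ignored
]
```

Dropping votes from users who miss the control is one simple way to keep spammers from polluting the data; the real analysis tools would likely need something more careful.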
Ryan: Data Dashboard
This application lets you type in a job title and returns the number of profiles over the past two years for that specific title, the rollup the title is in, the number of profiles in the entire rollup, the total number of model profile counts, the entire task list, and the number of reports run for the rollup in PSP over the last year.
Basically, this keeps me from having to open the datadashboard files and from cycling through all of Scott’s export reports over the last year to find usage.
AdamP: PayScale Live Map (fail) then PayScale Glossary (success)
My first attempt this hackday was to rebuild the “live tracker” of profiles coming in. The plan was:
- Hook into the profile object, and when it’s saved (and marked as “finished”) push it to a queue in RabbitMQ
- Build a small Node.js app that reads items off of the queue
- The app pushes the messages down to the client via a web socket
- The client puts that profile onto a map
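The plan above can be sketched end to end without any infrastructure. In this Python sketch a standard-library `queue.Queue` stands in for RabbitMQ and plain lists stand in for web socket connections; the function names and the profile fields (`title`, `lat`, `lon`, `finished`) are assumptions made up for illustration.

```python
import json
import queue

profile_queue = queue.Queue()  # stand-in for a RabbitMQ queue
connected_clients = []         # stand-in for open web socket connections


def on_profile_saved(profile):
    """Step 1: publish a profile once it is saved and marked as finished."""
    if profile.get("finished"):
        # Only send what the map needs (field names are hypothetical).
        message = {"title": profile["title"], "lat": profile["lat"], "lon": profile["lon"]}
        profile_queue.put(json.dumps(message))


def drain_queue():
    """Steps 2 and 3: read items off the queue and push each one to every client."""
    delivered = 0
    while True:
        try:
            message = profile_queue.get_nowait()
        except queue.Empty:
            return delivered
        for client in connected_clients:
            client.append(message)  # a real app would send over the socket here
        delivered += 1


browser = []  # step 4 would plot these messages on a map
connected_clients.append(browser)
on_profile_saved({"title": "Data Analyst", "lat": 47.6, "lon": -122.3, "finished": True})
on_profile_saved({"title": "Draft", "lat": 0.0, "lon": 0.0, "finished": False})
delivered = drain_queue()
```

Only the finished profile reaches the client; the unfinished one never enters the queue.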
My second idea was to build a centralized “glossary” for PayScale. We have a lot of terms that (a) we’ve invented, or (b) aren’t common vernacular. We now have a place to store those definitions, and link to them. This is what it supports (or will, by the time we release it):
- Add definitions
- Show definitions
- Search definitions by synonyms or key word
- Request via JSON (to use in mouseovers)
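Here is a rough Python sketch of those four features, assuming a plain in-memory dictionary as the store. The function names and the two sample definitions are placeholders written for this example, not the real glossary entries or API.

```python
import json

# In-memory store; a real glossary would persist this somewhere.
GLOSSARY = {}


def add_definition(term, definition, synonyms=()):
    """Add (or overwrite) a definition, keyed by the lowercased term."""
    GLOSSARY[term.lower()] = {
        "term": term,
        "definition": definition,
        "synonyms": [s.lower() for s in synonyms],
    }


def show_definition(term):
    """Return the definition text, or None if the term is unknown."""
    entry = GLOSSARY.get(term.lower())
    return entry["definition"] if entry else None


def search(query):
    """Return entries matched by synonym, or by keyword in the term or definition."""
    q = query.lower()
    return [
        e for e in GLOSSARY.values()
        if q in e["synonyms"]
        or q in e["term"].lower()
        or q in e["definition"].lower()
    ]


def as_json(term):
    """JSON payload a mouseover could request; "null" when the term is unknown."""
    return json.dumps(GLOSSARY.get(term.lower()))


# Placeholder wording, for illustration only.
add_definition("Rollup", "A group of related job titles reported together.",
               synonyms=["title rollup"])
add_definition("Profile", "One person's submitted job and pay data.")
```

A mouseover would call something like `as_json("rollup")` and render the `definition` field.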
Alex: Modernizing the front-end build pipeline and architecture
The problem: Creating a beautiful, easy-to-use, feature-rich site is challenging, in part because we use older tools and approaches to build our site. Let’s take advantage of the great tools out there to let us move forward. These don’t necessarily replace what we have but instead complement it.
(Aside: The “front-end” roughly refers to all the code that our customers probably think of as the product: the user interface, the user experience, and manipulating data. The “back end” tends to refer to how we store and process our data, such as services for job title matching and salary calculations.)
Joe: PayScale Student
I made an app for students to explore career possibilities. PayScale Student tells you the maximum monthly payment you should make on a student loan for your planned job and location. You can use this figure to determine if your college choice and student loan is a good investment.
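The kind of arithmetic involved might look like the sketch below. The app's actual rule is not described here, so the 10% share of gross income is purely a hypothetical placeholder, as are the function names. The second function uses the standard loan amortization formula to turn a monthly payment into the largest loan it could pay off.

```python
def max_monthly_payment(annual_salary, share_of_income=0.10):
    """Cap the loan payment at a share of gross monthly pay.

    share_of_income is a hypothetical rule of thumb, not PayScale's formula.
    """
    return annual_salary / 12 * share_of_income


def affordable_principal(monthly_payment, annual_rate, years):
    """Largest loan a fixed monthly payment can fully repay.

    Standard amortization: P = M * (1 - (1 + r) ** -n) / r,
    where r is the monthly rate and n is the number of payments.
    """
    r = annual_rate / 12
    n = years * 12
    if r == 0:
        return monthly_payment * n  # no interest: principal is just the sum of payments
    return monthly_payment * (1 - (1 + r) ** -n) / r


payment = max_monthly_payment(60000)               # made-up salary for the planned job
loan = affordable_principal(payment, 0.06, 10)     # made-up 6% rate over 10 years
print(round(payment, 2), round(loan, 2))
```

Comparing that loan figure against the cost of a given college is one way to judge whether the choice is a good investment.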
Emmett: Protocol Buffers (to speed up accessing our data)
I did a prototype implementation of using “protocol buffers” for storing our report objects. Our report object is generally how we store all report data for Consumer, the Research Center, our internal analysis tools, and Insight.
- Protocol Buffers are a Google project for fast binary serialization. You create a definition file for your type, then code-gen classes for reading/writing that type in any language – C#, Python, etc.
- Using a sample of 322 TableReport objects, I was able to see load times drop 57%.
- However, it uses 4.4x as much storage as gzipped XML.
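To illustrate the tradeoff without the protobuf toolchain, here is a Python sketch that compares a hand-rolled, length-prefixed binary encoding (built with the standard `struct` module) against gzipped XML. This is a stand-in for the idea of binary serialization, not Protocol Buffers itself and not the prototype; the row layout and sample data are made up.

```python
import gzip
import struct
import xml.etree.ElementTree as ET

rows = [("Software Developer", 72000.0), ("Data Analyst", 58000.0)]  # made-up sample rows


def to_binary(rows):
    """Encode rows as: uint32 count, then per row a uint16 length, UTF-8 title, float64 pay."""
    out = [struct.pack("<I", len(rows))]
    for title, pay in rows:
        raw = title.encode("utf-8")
        out.append(struct.pack("<H", len(raw)) + raw + struct.pack("<d", pay))
    return b"".join(out)


def from_binary(data):
    """Decode the layout written by to_binary; no text parsing is needed."""
    (count,) = struct.unpack_from("<I", data, 0)
    offset = 4
    decoded = []
    for _ in range(count):
        (length,) = struct.unpack_from("<H", data, offset)
        offset += 2
        title = data[offset:offset + length].decode("utf-8")
        offset += length
        (pay,) = struct.unpack_from("<d", data, offset)
        offset += 8
        decoded.append((title, pay))
    return decoded


def to_gzipped_xml(rows):
    """Encode rows as XML attributes, then gzip the document."""
    root = ET.Element("report")
    for title, pay in rows:
        ET.SubElement(root, "row", title=title, pay=repr(pay))
    return gzip.compress(ET.tostring(root))


def from_gzipped_xml(data):
    """Decompress, parse the XML, and convert strings back to numbers."""
    root = ET.fromstring(gzip.decompress(data))
    return [(row.get("title"), float(row.get("pay"))) for row in root]


binary_blob = to_binary(rows)
xml_blob = to_gzipped_xml(rows)
print(len(binary_blob), len(xml_blob))
```

The binary reader skips decompression and text parsing, which is the general reason binary formats can load faster, while gzip can shrink repetitive XML well on larger reports. Timing both readers on real report objects would be needed to say anything about actual numbers.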
Eve: Upload Problems
Some corners of our product are especially difficult for our support team and customers. My hack was to make the darkest/hardest corner better. But it involves a lot of product details that we can’t share.
Jesse: Big Data Experiments
I did something that probably shouldn’t be talked about externally. Regardless, it’s cool, and our customers will love it.