
Roemer paper on Kantian cooperation, part 2: tax policy implications

As the previous post noted, John Roemer has a paper and book asserting that we should think about real world prisoner’s dilemmas in light of the possibility that people do not always act like selfish Nashian optimizers, but may instead base their decisions on the Kantian question of what decision would be best if adopted by everyone who is facing a given choice. Hence the classic prisoner in the dilemma may hold out rather than confess and implicate his colleagues, and an individual may choose (say) to recycle on the view that it’s better for everyone than no one to do so.

From this starting point, Roemer constructs a model in which universal Kantianism would lead to Pareto-superior outcomes that were better for everyone than the set of such outcomes that would result from selfish Nashian optimization. So neoclassical economics (with selfish optimizers but perfect markets) is not the only route to maximizing efficiency. And, in his model, efficiency and equity aims no longer need be in conflict. For example, in his tax instantiation, there will be no deadweight loss whether the labor income tax rate is 0 percent, 100 percent, or anywhere in between, because Kantian workers will ignore the tax rate in deciding how much labor (yielding such income) to supply.
How does he get there? Let’s start with a hypothetical society in which (a la Mirrlees) the government simply levies a labor income tax to fund a demogrant. Suppose initially that everyone has the same (1) “ability” or wage rate, (2) preferences, and indeed (3) labor income. Only, one of these individuals is now considering increasing her labor supply, thus increasing as well her income and her tax liability.
If she’s a selfish Nashian optimizer, she’ll evaluate the choice in light of the fact that she’ll only get to keep the after-tax income. While it will also increase the revenues that are used to fund the demogrant, her share of that, in a large society, is trivial.
But now suppose she’s a Kantian. She’ll ask herself: How would I be affected if EVERYONE increased labor supply, and thus income, by this amount? The answer, given the society’s assumed homogeneity, is that her added tax and her added demogrant would be exactly equal. E.g., suppose that, in a 10-person society with a 100% tax rate, she earns an extra $100, but the other 9 members do so as well. She’ll keep zero of the extra earnings after-tax but pre-grant, but her grant will go up by $100 (i.e., one-tenth of the newly generated $1,000).
So she bases her labor supply choice on pre-tax income, reflecting that with the demogrant (at any consistently applied tax rate, but everyone’s doing the same thing) the tax and demogrant will be a wash. By contrast, the selfish Nashian optimizer would ask: What if only I increase my income by $100? In this 10-person set-up, I get to keep only $10 from the increased demogrant.
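To make the arithmetic concrete, here is a minimal Python sketch of the 10-person example. The function name and its parameters are my own illustrative choices, not anything from John's paper, and it assumes a flat tax whose revenue is split equally as a demogrant.

```python
def net_gain(extra_income, tax_rate, population, everyone_joins):
    """Change in one worker's disposable income when she earns `extra_income` more.

    Hypothetical helper: assumes a flat labor income tax whose revenue is
    paid back as an equal per-person demogrant.
    If `everyone_joins` is True, every worker earns the same extra amount
    (the Kantian thought experiment); otherwise only she does (the Nash view).
    """
    earners = population if everyone_joins else 1
    kept_after_tax = extra_income * (1 - tax_rate)
    extra_revenue = extra_income * tax_rate * earners
    extra_demogrant = extra_revenue / population  # split equally among all
    return kept_after_tax + extra_demogrant


for rate in (0.0, 0.5, 1.0):
    kantian = net_gain(100, rate, 10, everyone_joins=True)
    nash = net_gain(100, rate, 10, everyone_joins=False)
    print(f"tax rate {rate:.0%}: Kantian counts ${kantian:.0f}, Nash counts ${nash:.0f}")
```

Under these assumptions the Kantian figure stays at the full pretax $100 whatever the rate, while the Nashian figure shrinks as the rate rises, reaching $10 at a 100% rate.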
Again, the Kantian’s motivation is not that maybe other people will in fact work more, too, if she does. Rather, it is that the moral, cooperative way to think about the question is to ask: What would be best if everyone made the same choice as I do?
Now let’s add heterogeneity to the picture. At first, just in wage rate. Suppose she can earn $100 an hour, while all other members of the society, being less “able,” can earn only $5 an hour. In John’s model this makes no difference, because what she asks herself is: What if everyone worked long enough to earn $100 more? Then I would still break even from the tax plus demogrant, and so pre-tax income is the right metric.
What if we add in heterogeneous tastes? Then two individuals who are both Kantians may make different marginal choices. One deems the disutility of working more to be adequately offset by getting to consume more, as determined based on pretax income. The other likes leisure more and market consumption less, so she decides not to work more. We end up getting a transfer, at the margin, from the first of these individuals to the second. This result, while not rebutting the claim of Pareto superiority from the system, strikes me as a bit perverse, in the sense that we are transferring $$ from one who subjectively values them more at the margin compared to leisure, to someone who subjectively values them less in this sense. But John isn’t claiming that the system yields overall welfare maximization or otherwise defined optimality, and he regards its effectively decentralizing decision-making from any central planning function to all of the individual workers as a virtue.
What if we now change the model so that the government is funding goods and services with its tax revenues, rather than demogrants? This doesn’t change things fundamentally, although it’s true that Kantians’ possibly varying beliefs regarding the benefit derived from government spending might result in differentiating their choices. But it does seem that here “pretax income” becomes a less precise statement of what the Kantian will be evaluating when making a marginal labor supply choice.
This brings us to the question from my post earlier today of exactly what sorts of questions the categorical imperative might be thought to demand that we ask. Again, “cooperate vs. defect” is easy; “how cooperate” less so. But an eminent NYU philosopher once told me that he personally decided on whether, say, to accept a consulting engagement, for which he was being offered $X, based on the pretax amount, not the after-tax amount. This appeared to reflect a Kantian feeling that it was morally wrong to look only at the after-tax amount, given that the tax payment wasn’t being lost – it was merely being transferred from his individual pocket to the collective one.
That strikes me as a more salient and intuitive way to think about Kantian labor supply behavior than doing so in terms of “efficiency units in labor supply,” in the manner of the Roemer paper. But why might we expect anyone to think about pretax versus after-tax income even in that way? Are people Kantian enough to do that, even assuming that it captures how they would be Kantian?
The paper notes evidence that tax compliance is higher than it “ought” to be given people’s actual economic incentives (at low audit levels) and risk preferences as otherwise discerned. And the tax compliance literature extensively shows that “tax morale” – reflecting, for example, perceptions regarding others’ compliance behavior, the tax system’s “fairness,” the overall political system’s fairness, and so forth – can have a major impact on compliance behavior, even holding constant the actual “audit lottery” odds (given penalties as well as audit levels).
So the extent to which people act as Kantians by focusing on pretax income, when they make labor supply choices, might likewise reflect considerations analogous to morale in compliance. But I’m not certain that they do, since for me there’s a lot of context-specific sociology to the question of how people who have non-zero Kantian inclinations will interpret the demands of a taste for cooperation in practice. There’s no reason to think that they (or I) do so in a universal and logically consistent fashion.
But if a degree of Kantianism here is plausible, then one might be able to reduce labor supply elasticity by addressing morale-type considerations about social solidarity, faith and trust in government, and others’ willingness to overlook tax planning considerations and focus on pretax income.
One last set of questions potentially raised by the paper goes to fleshing out how a Kantian might think about all the various choices that we face in making tax planning decisions. “Work more and thereby earn more” is only one possible choice. One could also try to apply the reasoning, say, to lawful tax avoidance (ranging from the clearly “intended” to the arguably “unintended” even if efficacious). But for now I will leave them to the reader to ponder, if he or she likes. 

Tax policy colloquium, week 4: John Roemer’s “A theory of cooperation in games with an application to market socialism”

Yesterday we were pleased to have John Roemer as our speaker, discussing this paper and his related forthcoming book: How We Cooperate: A Theory of Kantian Optimization. The basic thesis is intellectually important, and likely to get some attention from economists, as well as from philosophers who are willing to look over the walls of their silo, so I will discuss it in general terms here, before turning to the tax aspect that caused it to be a good fit for us in the Tax Policy Colloquium (where, of course, each week can be totally different from the ones before and after).

Prisoner’s dilemmas are pervasive in public policy. One gets them whenever there are positive or negative externalities that no institutions (be they Coasean markets or Pigovian taxes and subsidies) adequately address.

Pollution and over-fishing are among the classic examples. E.g., if I want to drive my car a lot, run the heat and AC to the max, etc, but everyone’s doing this causes catastrophic global warming, then, from a selfish standpoint, the best thing would be if everyone BUT me curtailed their activities suitably. But, given my individually trivial contribution to the overall problem, I’m best off defecting whether or not everyone else is cooperating, absent sanctions or other ways of internalizing to me the marginal cost of my causing carbon release.

With selfish players, a one-shot prisoner’s dilemma has a simple Nash answer: everyone defects, so everyone loses relative to the case where everyone cooperated. While there may be real world mitigating solutions, such as repeat play with sanctions from the other players, wouldn’t it be nice if people were willing to cooperate voluntarily, despite the selfish unilateral incentive to defect?

John answers: Not only would it be nice, but we do in fact frequently cooperate! So the Nashian view of people as always selfishly pursuing just their own welfare is inaccurate. Indeed, evolution has yielded in us a species that is unusually, and among the great apes uniquely, inclined towards cooperating with each other under suitable conditions (such as where we feel solidarity and trust towards fellow group members).

While sanctions for defection may play an important role in preserving cooperative non-Nash equilibria, they’re not the only reason we cooperate. Nor is altruism the main reason, as it tends to be limited to a much smaller core group (such as immediate family) than the set of people with whom one is willing to cooperate.

John also finds it largely unhelpful to posit exotic preferences, such as a “warm glow” achieved subjectively by cooperating, as the explanation for the behavior. It seems to him both too hand-tailored (like Ptolemaic epicycles to reconcile celestial movements to data) and backwards, in the sense that I don’t cooperate to get a warm glow, even if I in fact get one from cooperating. I cooperate because I believe it’s right to do so.

While I see his point here, I think the “warm glow” framing is intellectually helpful for a particular reason. Even if I cooperate because I think it’s right to do so, and that this differs from eating chocolate because I think it tastes good, real-world cooperators are likely to be trading off their desire to cooperate against other things they care about. Suppose I recycle because I think it’s right to do so, not because the city might find out and fine me if I don’t. I still would likely start recycling a lot less if, say, it took several hours a week.

John says that those who cooperate, rather than defect, in prisoner’s dilemmas are generally being Kantians, as I’ll discuss shortly. But while the paper we discussed yesterday doesn’t discuss Kantianism that’s limited by one’s trading it off against selfish preferences, it does discuss conditional Kantians – that is, those whose willingness to behave cooperatively depends on how prevalent they believe cooperative behavior is in the relevant population. (See Figure 1, at page 33 of the paper, for a visual depiction of an equilibrium at which the % actually cooperating equals the % that are willing to cooperate at that level of cooperation.)

I gather that philosophers have questioned this set-up, saying you aren’t actually a Kantian if you’re being conditional about it. While this is true as a matter of definition, once one has defined Kantians as they choose to, it is intellectually unhelpful, and would appear to be an instance of narrow-minded and retrograde siloing (an inclination that I’ve encountered from other disciplines, in my project on literature and high-end inequality).

Returning to prisoner’s dilemmas, a Kantian who faces one may ask: What is the decision that would be best if ALL of us made it? With the classic PD structure, the answer (of course) is Cooperate, don’t defect. So the Kantian does what would be best if all did it, simply because this is the right thing to do, and not based on any actual presumed effect of one’s own decision on what others will decide. So the Kantian (for example) recycles – and, I would think, also considers following a code with respect to carbon emissions that, if universalized, would properly curtail global warming and other adverse climate change.

But how does one identify the proper Kantian course of action? In a simple prisoner’s dilemma set-up, it’s obvious, since there are just two choices, Cooperate and Defect. Maybe one should think of recycling that way. As to global carbon abatement, it’s not as clear, not to mention that the motivation to cooperate (even assuming one can determine how) will be weaker if one is among John’s conditional Kantians.

John notes that many people do in fact recycle, beyond the point that sanctions and conventional incentives would seem to be inducing. There may also be a bit of Kantian behavior around carbon abatement. For example, while I am sure I do not do nearly enough in that regard, or as much as I would do if I were responding via standard incentives to a global carbon tax that had been set at an appropriate level, it is something I have in mind, and that induces me to disfavor what I feel is overly wasteful behavior. So yes, I am, upon reflection, somewhat of a Kantian, albeit a conditional one both in John’s sense of being influenced by what I think others are doing, and my sense of trading off my preference for doing what is right in the Kantian sense against more selfish considerations.

In calling my own behavior Kantian, however imperfectly so, I am agreeing with John about the underlying psychology. Whether or not the categorical imperative is exactly the right formulation, the underlying sentiment of fairness does appear to me (from self-reflection) to have something to do with symmetry and consistency between what people do for themselves and expect from others. And in my case, but I suspect for many other people as well, a lot of it is driven by notions of reciprocity. I neither want to be a sucker, who cooperates when everyone else is defecting, nor a jerk, who defects when everyone else is cooperating. This gives psychological appeal to conditional Kantianism. And it’s not just me, if tit-for-tat sentiments, embracing both the good and the bad, are more generally intuitive.

But what does all this have to do with tax? I’ll address that in a separate post.

Kantian background to discussing John Roemer paper

In my previous post, I set at 80 percent the probability that, at yesterday’s NYU Tax Policy Colloquium discussion of John Roemer’s A Theory of Cooperation in Games With an Application to Market Socialism, I would “end up recounting the tale of the unfair bad grade (worst of my career) that I got as a freshman on a Kant paper.” These subjective odds reflected that the story, which reading the paper had helped to return from long hibernation to the forefront of my mind, actually relates to issues of prime interest that the paper raises.

As it happens, I didn’t end up recounting the story either in the AM class or at the PM public session, as it would have taken too much airtime. But I’ll indulge myself by leading with it here, before turning more particularly to the paper in a follow-up post.

It’s September or perhaps very early in October 1974, and I’ve recently arrived at Princeton University as a 17-year-old freshman. (I later ascertained that 94% of the class was older than me – this in an era when 18 was the legal drinking age and there was an on-campus student pub at which you’d be carded.)

Having both a competitive nature and a family background that placed intense value on “intelligence” and academic achievement, I was eager to rate myself against the field, as well as judge myself against demanding self-expectations.  I also made a point from the start of taking classes in which there were frequent student papers, because I liked writing, along with the greater control over content that they offered relative to answering exam questions.

The first short paper I got back, presumably in history or political science, came out in accordance with my self-demands. But then came the second one, in Intro to Moral Philosophy. This was a lecture course taught by Thomas Scanlon, but my “preceptor” (as they called the leaders of the weekly small-group seminar meetings) was a graduate student in the philosophy department whose name I still recall.

This paper’s subject was Kant, and more particularly the categorical imperative, which might be stated (per Wikipedia) as follows: “Act according to the maxim that you would wish all other rational people to follow, as if it were a universal law.”

Intellectually unformed though I then was, I realized that, in interpreting it, one faces what I might today call a “level of generality” problem. The example I thought of was as follows: While it DOES mean, say, that I shouldn’t lie because if everyone lied we would lose the ability to have the truth believed, it surely DOESN’T mean that I can’t go to the Wawa Market on Alexander Street at 8 pm, on the ground that no one could go there if everyone tried to at the same time. So, in attempting to apply the categorical imperative, there is a broader issue, which may have no simple or obvious answer, regarding the level of generality at which one should state the maxims that one is testing for rational consistency.

To this day, I don’t think that’s bad for what was presumably a 2-page (or at the most 5-page) paper in an undergraduate Intro to Philosophy class. But I got it back with a grade of C+ and some sort of peremptory, even angry or at least disgusted / impatient, scrawl – which might as well have been in crayon – to the effect of: No, that’s wrong, that’s not what the categorical imperative says. No effort beyond that to engage or explain where or how the grad student thought I had gone wrong.

These days, when a student gets a poor grade and comes in to see me, I’ll try to reconstruct the reasons for it (if it’s an exam that doesn’t have comments like a graded paper), but I’ll also say very strongly if this appears to be among the student’s concerns: This DOESN’T mean you’re a bad student, or not good at law or at tax, etc. – it’s just a thing that happened one time in terms of answering one question that might have been either well or poorly chosen (and then graded) by me.

But I didn’t have the older me to tell me this at the time, nor did I go talk to the graduate student, towards whom I now felt hostile. (Plus, I knew it was generally bad form to complain about grades.) What I should have done, of course, is go see Scanlon – not to complain about the grade as such, but to get broader dialogue and feedback. But the thought of doing this never occurred to me. I think I viewed him, through no fault of his own, as too far removed and remote from me.

Taking the whole thing far too seriously, I was shaken by the grade, which hurt my self-confidence (hence, I told no one about it at the time), even though I felt that it was misguided, unfair, perhaps biased for some specific reason that I couldn’t fathom, and stupid. I also concluded that maybe I wasn’t fated to do as well in philosophy classes as in other liberal arts subjects. I responded by working more diligently for the rest of that semester than I ever would again. (Once I had restored my self-confidence via my final fall 1974 results, I continued to take my schoolwork, for the most part, reasonably seriously, but I developed a tendency to prefer pursuing my own intellectual interests to those of a particular course or instructor.)

Anyway, the very interesting Roemer paper raises, among other questions, that of how good Kantians should frame the maxims that they are hypothetically universalizing in their minds. Depending on the context, the answer to this question is sometimes clear, but other times much less so.

Everyone has a favorite Kant story (or maybe not)

At next Tuesday’s NYU Tax Policy Colloquium, we will be discussing with John Roemer a paper on Kantian cooperation and (inter alia) tax policy.

I see about an 80% chance that I will end up recounting the tale of the unfair bad grade (worst of my career) that I got as a freshman on a Kant paper. Not that I’m still brooding about it or anything!

NYU Tax Policy Colloquium, week 3: David Kamin’s Effects of Capital Gains Rate Uncertainty on Realization

Yesterday at the Tax Policy Colloquium, my colleague David Kamin presented his paper (coauthored by Jason Oh of UCLA Law School), The Effects of Capital Gains Rate Uncertainty on Realization. The piece capably addresses important issues that are known to be there but have been under-explored in prior literature.
The paper’s starting point is that, while one would expect capital gain realizations (and elasticities) to depend, not just on current CG rates but also on expected future CG rates, work in the field, including revenue estimates at different CG rates, has tended to under-appreciate how great the effect might be. It’s well-understood that a capital gains rate change has both short-term and long-term revenue effects, where the former might involve rushing to market before a rate increase, or the initial release of pent-up demand to sell where there’s a rate cut, but the issue merits modeling further out than that, and this is where the paper aims to add insight, such as by offering multiple models of how this might play out.

One core question, of course, is how we might want to get a handle on expected future CG rates, since this is a question of what investors actually anticipate. While this is unlikely to be a function just of the current rate and/or historical rates, two possible benchmarks, before one starts thinking about, say, which party looks strong in the next election (and what their platforms say about capital gains or tax rates / taxation of investment more generally), would be as follows:

Random walk from the current CG rate: Under this view, whatever the rate is now, so far as one can tell (leaving aside any particular political information like that noted above), it’s just as likely to go up or down.

Historically bounded CG rate range: Under this view, we’ve learned from history the basic range within which we (or rather, investors) might expect CG rates to continue fluctuating. Say this runs from about 15% to 30%. So if the current rate is towards the high end or the low end of this range, there’d be some lean towards expecting it to revert towards or even past the middle.

Since the paper presents alternative models rather than advocating one particular approach, it’s open to and potentially consistent with both, but it gives particular attention to the latter.

Anyway, here are some of my main thoughts in response to the current draft:

1) New view with a twist – By reason of its focus on expectations regarding future rates, the paper brought to mind for me the so-called new view of dividend repatriations (and of  corporate dividends in the domestic context). This was a positive association for me, as I consider the new view, properly understood – rather than improperly misunderstood, as sometimes it is – as a truly central organizing idea.

Only, the issue discussed by Kamin and Oh is the new view with a twist, as I’ll explain below.

Okay, let’s start with the new view itself, as applied to the international / dividend repatriation context. Suppose that a resident multinational isn’t currently taxable domestically on certain foreign source income (FSI) that is earned through foreign subsidiaries. But let’s further suppose that, as under U.S. international tax law pre-2017 act, dividend or other repatriations of the FSI are taxable to the domestic parent. Then, at least in theory, the FSI is domestically taxable, but benefits from deferral, because the tax awaits the repatriations.

One might think it obvious that deferral lowers the present value of the parent’s domestic liability with regard to the FSI. After all, isn’t that what deferral within an income tax usually does? But the new view (dating from a 1985 paper by David Hartman that drew on earlier work, regarding classical double corporate income taxation, by the likes of David Bradford, Alan Auerbach, Mervyn King, and William Andrews) showed that under certain conditions this is false. More specifically, under those conditions, deferral does NOT lower the present value of the ultimate domestic tax liability.

Suppose we assume the following: taxable repatriation will take place at some point, the repatriation tax rate is fixed and will never change, and the after-tax rate of return that is available domestically equals that which is available abroad. Then deferral does not lower the present value of the ultimate domestic tax liability, with the further consequence that there is no tax lock-out: the system isn’t discouraging companies from repatriating their foreign profits. (Note of course that it is a separate question whether, say, publicly traded companies might be reluctant to repatriate by reason of accounting rules that have built up around the tax rule.)

To show this algebraically, suppose X is the amount of foreign profits that are waiting to be repatriated, r is the globally available after-tax rate of return, and t is the unchanging repatriation tax rate. Given the above assumptions, immediate repatriation, followed by domestic reinvestment of the funds for a period, leaves the taxpayer with X(1 – t)(1 + r).  Repatriating at the end of the period leaves the taxpayer with X(1 + r)(1 – t).
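As a quick numerical check of this equivalence, here is a minimal Python sketch. The function names, the dollar amount, the return, the horizon, and the lower future rate are all illustrative assumptions of mine; only the formulas come from the algebra above, extended to multiple periods.

```python
import math


def value_if_repatriated_now(x, t, r, periods):
    """Pay the repatriation tax today, then reinvest at after-tax return r."""
    return x * (1 - t) * (1 + r) ** periods


def value_if_deferred(x, t_later, r, periods):
    """Keep the funds abroad earning r, then pay the repatriation tax at the end."""
    return x * (1 + r) ** periods * (1 - t_later)


# Illustrative inputs (assumptions, not data): 100 of foreign profits,
# a 5% after-tax return available both at home and abroad, a 10-period horizon.
x, r, periods = 100.0, 0.05, 10

now = value_if_repatriated_now(x, 0.35, r, periods)
later_same_rate = value_if_deferred(x, 0.35, r, periods)
later_lower_rate = value_if_deferred(x, 0.25, r, periods)

print(math.isclose(now, later_same_rate))  # fixed t: timing makes no difference
print(later_lower_rate > now)              # expected rate cut: deferral wins
```

With t fixed, the two strategies tie, whatever the horizon. Once the rate that will apply at the later date differs from today's, they no longer do, which is one way of seeing where to look for lock-out.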

One way of explaining the equivalence intuitively is that, while deferral lowers the present value of the tax that would be due if X were repatriated today, the amount to be repatriated, and hence the amount of the tax given t’s fixed character, keeps growing at the same interest rate. So it is crucial to the analysis that this is a one-and-done tax: Once FSI is repatriated, its further domestic growth is not subject to the repatriation tax, only to whatever domestic income tax there might happen to be for all domestic source income.

When I referred above to misuses of the new view based on misunderstanding it, I have in mind treatment of it as a specific empirical claim – i.e., that there is no lockout under a deferral regime, because in fact the assumptions about r and t are actually (or even necessarily) true. But that is simply not the right way to view it or use it. Indeed, that would be on a par with claiming that the Coase Theorem purports to show that it makes no difference who owns a particular legal entitlement, or that the Modigliani-Miller Theorem (MMT) shows that it makes no difference how one uses debt versus equity financing.

It’s become well-recognized that the Coase Theorem, by showing that it makes no difference who owns the entitlement under specified circumstances (i.e., zero transaction costs, and where pertinent to one’s use of it no relevant endowment / distributional effects), doesn’t show that the thing at issue doesn’t matter – rather, it shows where one would have to look in order for it to matter. Likewise, what MMT shows is that, for debt versus equity to matter, it must have something to do with its underlying assumptions – e.g., no bankruptcy or tax implications, and no effect on agency costs under asymmetric information. So once again, what one actually learns is where to look, in assessing whether the thing at issue matters.

In the case of the new view, we learn that, for expected tax burdens under a deferral system to influence repatriation behavior, something about r, or something about t, must be doing the dirty work. Hence, one now knows where to look. So one is applying the new view – not “refuting” it – when one observes that U.S. companies became extra-eager to avoid repatriations in light of (a) the 2004 foreign dividend tax holiday, (b) the clear pre-2017 prospect that the U.S. corporate rate would be lowered from its then 35% level, and (c) the clear pre-2017 prospect that the U.S. would adopt dividend exemption without fully replacing the forgiven future taxes via a deemed repatriation that accompanied its enactment.

Anyway, back to new view with a twist in the Kamin-Oh paper. How do we face a different issue than we did under the international new view? Here’s a short list:

a) The capital gains tax isn’t one-and-done if you sell a capital asset and invest the proceeds in a new capital asset. Instead, it starts accruing all over again. Hence, deferral does offer time value benefits to the taxpayer.

b) Given the step-up in basis at death under Code section 1014, the tax disappears if one is willing to wait long enough, rather than being inevitable at some point.

c) It’s not as clear in CG tax policy as it was in international tax policy that the current rate was likely to go down, rather than up.

d) Suppose the rate is about to change, and you want to beat it to market. In the CG realm, there may be times when this is difficult. E.g., suppose you have a unique business asset that’s hard to value and for which there is a thin market of potential buyers. By contrast, in international, generating a taxable dividend from the foreign sub to the U.S. parent (which need not be funded out of loose cash already on hand) should generally not have been that hard.
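Point (a) in the list above can be illustrated with a minimal Python sketch. It assumes a constant CG rate t, a constant pre-tax growth rate g, and that sale proceeds are reinvested in a new capital asset whose later gain is itself taxed on sale. The function names and numbers are hypothetical, and the step-up at death from point (b) is ignored.

```python
def sell_now_and_reinvest(value, basis, t, g, periods):
    """Realize the gain today, reinvest the after-tax proceeds, sell again at the end."""
    proceeds = value - t * (value - basis)
    final_value = proceeds * (1 + g) ** periods
    # The new asset's basis is the amount reinvested, so its gain is taxed too.
    return final_value - t * (final_value - proceeds)


def hold_then_sell(value, basis, t, g, periods):
    """Defer realization: let the whole pre-tax value grow, then sell once."""
    final_value = value * (1 + g) ** periods
    return final_value - t * (final_value - basis)


# Illustrative inputs (assumptions): an asset worth 100 with zero basis,
# a 20% CG rate, 5% annual growth, and a 10-year horizon.
value, basis, t, g, periods = 100.0, 0.0, 0.20, 0.05, 10

churn = sell_now_and_reinvest(value, basis, t, g, periods)
hold = hold_then_sell(value, basis, t, g, periods)
print(f"sell now and reinvest: {churn:.2f}")
print(f"hold, then sell:       {hold:.2f}")
print(f"benefit of deferral:   {hold - churn:.2f}")
```

In this set-up the benefit of deferral works out to t(1 – t)(value – basis)((1 + g)^periods – 1). It is positive whenever there is unrealized gain and positive growth, even though t never changes, which is the contrast with the one-and-done repatriation tax.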

2) Uncertainty versus optionality – This last difference brings to center stage an important distinction between two related concepts. One is uncertainty, insofar as taxpayers don’t and even can’t know what future capital gains rates are going to be. The other is optionality, insofar as taxpayers can deliberately plan to realize taxable gains more in low-rate periods and less in high-rate periods, including by selling just before a rate increase or just after a rate cut.

Uncertainty is bad for a risk-averse taxpayer. But optionality can only be good. An option that you possess can’t be worth less than zero. The option to wait for a lower future CG rate is worth more than otherwise if rates are volatile rather than stable. And it’s worth more than otherwise if the rate is more likely to go down than up. Thus, if we believe that future CG rates most likely will stay within the historically observed 15 to 30 percent range, the option is worth more, all else equal, if the current CG rate is in the neighborhood of 30 percent than 15 percent.
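A toy Python sketch may help show why the option is worth more under volatility, and more near the top of the range. It rests on strong simplifying assumptions of my own, not the paper's models: the taxpayer can always realize at the lower of today's rate and the future rate, and time value, risk aversion, and step-up at death are all ignored. Rates are in percentage points, and the scenario probabilities are made up for illustration.

```python
def expected_rate_with_option(current, scenarios):
    """Expected rate paid if the taxpayer can pick the lower of now vs. later.

    `scenarios` is a list of (probability, future_rate) pairs; hypothetical inputs.
    """
    return sum(p * min(current, future) for p, future in scenarios)


def option_value(current, scenarios):
    """Expected rate saving from being able to wait; never below zero here."""
    return current - expected_rate_with_option(current, scenarios)


# Stable versus volatile rates around a current rate of 20.
stable = [(1.0, 20.0)]
volatile = [(0.5, 15.0), (0.5, 25.0)]
print(option_value(20.0, stable), option_value(20.0, volatile))

# A bounded 15-to-30 range, with the future rate most likely near the middle.
bounded = [(0.25, 15.0), (0.5, 22.5), (0.25, 30.0)]
print(option_value(30.0, bounded), option_value(15.0, bounded))
```

Under these assumptions the option is worthless when rates are stable or when the current rate already sits at the bottom of the range, and it is worth the most when the current rate sits at the top.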

Still, because optionality is bound to be an important part of the picture for many or most real world holders of appreciated capital assets, and because an option can’t be worth less than zero, I think of capital gains rate uncertainty as likely, in the main, to put a thumb on the scales in favor of deferral (which, again, is already tax-favored in this setting), not against it.

Time for more cat pandering

In response to popular demand (i.e., a recent conversation), I’ve decided that it’s time for more cat pix. So here goes, with quiz question:

I wonder if one can tell just from these two pictures who is the calmer, and who the crazier, of these two brothers (who I would presume are genetically just half-siblings). Whether or not the pictures give it away, it’s not a close call.

Here’s Gary:

And here’s Sylvester:

So, whaddaya think?

NYU Tax Policy Colloquium, week 2: Rebecca Kysar’s Unraveling the Tax Treaty

Yesterday at the Tax Policy Colloquium, Rebecca Kysar of Fordham Law School presented Unraveling the Tax Treaty. Here are some of my thoughts regarding this very interesting paper and the issues it raises.

Bilateral tax treaties are the subject of a flourishing sub-literature that has often, and perhaps increasingly, criticized their impact on developing countries in particular.  The paper explores extending this critique to the United States, e.g., because, as a net capital importer these days, we may tend to lose current tax revenue from treaties’ standard practice of having the two treaty partners mutually surrender source taxing jurisdiction with respect to the other’s residents.
For purposes of thinking about the issues that the paper raises, I find the following outline of main treaty provisions helpful. Bilateral tax treaties frequently include the following six main types of provisions, among others:

1) Residency/LOB rules – Treaty benefits are accorded to residents of the two treaty partners. However, reflecting corporate residence’s inherently limited meaningfulness as a concept, there may be “limitation on benefit” (LOB) rules that aim to defeat “treaty-shopping” by denying treaty benefits to, say, entities that are residents of one of the jurisdictions but that are functioning as mere conduits.

2) Permanent establishment (PE) rules – Treaty partners generally agree not to tax each other’s residents’ business profits on a source basis in the absence of a “permanent establishment” (PE) in the source jurisdiction – e.g., an office with dependent agents working there.

3) Withholding taxes – Countries frequently tax domestic source dividend, interest, and/or royalty income that is paid to nonresidents. Given the collection and enforcement problems that might otherwise arise, this is done via withholding taxes that have a fixed rate and are gross basis (i.e., no deductions are allowed). For example, the U.S. by statute has a 30% withholding tax. In treaties, however, countries may reciprocally agree to charge a lower withholding tax rate (e.g., 15%, 5%, or even 0%) to each other’s residents.

4) Anti-“double taxation” rules– I’m putting “double taxation” in scare quotes here for a reason. As I discuss below, the concept at issue here is perhaps better described other than as one of double taxation.  But there can be good reason for one not to like the result whereby cross-border investment is tax-discouraged, such as by reason of its being fully taxed in both the residence jurisdiction from whence it came, and the source jurisdiction to which it went. Treaties may address this problem by requiring that each treaty partner, with respect to its residents, either exempt foreign source income (FSI) earned in the other jurisdiction, or grant foreign tax credits with respect to the taxes imposed by the source country. The tax treaties also typically provide mechanisms for achieving consistent treatment if otherwise each country would treat the same income as earned within its own borders.

5) Nondiscrimination – Treaty partners may agree not to have tax rules discriminating against each other’s residents for tax purposes (although the provisions tend to be quite narrowly drawn).

6) Information exchange, etc. – Treaties also contain lots more types of provisions, which I will here cavalierly funnel into a catch-all, albeit noting in particular information exchange between the two sovereigns as it’s an important aspect that treaty critics often praise.

A number of further common treaty provisions might merit separate coverage due to their potential importance. For example, treaties may indicate that one should apply an arm’s length standard to transactions between commonly owned affiliates, supporting the use of transfer pricing. But these provisions are not as extensively discussed in the Kysar paper as those listed above.

The paper’s main arguments are to the effect that (a) 2 and 3 are bad, (b) 4 and 5 are poorly defined and/or not needed, and (c) 6 can be done separately. For reasons of time and space, I’ll just discuss (a) here.

WHY CEDE SOURCE-BASED TAX JURISDICTION RE. NON-PE BUSINESS INCOME AND PASSIVE INCOME?

Features 2 and 3 above cede source-based tax jurisdiction with respect to non-PE business income and passive income. This leads to the question: Why would one ever reduce optionality / future flexibility by ceding something in advance? There are 3 main answers that one could offer in this context, pertaining in turn to pre-commitment, reciprocity, and coordination.

Pre-commitment – Retaining discretion for future policy choices is a bad thing if one is sufficiently likely to misuse it. (This of course is the “tie Odysseus to the mast” line of argument.) Governments may benefit, for example, from committing in advance against ex post expropriation of inbound investment. In the realm of free trade, if we posit that tariffs are generally a bad idea, but are too often produced by protectionist forces even when one doesn’t have a malignant clown at the helm, free trade treaties serve to pre-commit countries to do what they would benefit from doing anyway.

The best argument for applying that line of reasoning here is that small open economies may fail to benefit from taxing inbound capital. Absent market power, local rents, etc., the tax is likely to be borne by locals even if it is nominally paid by the foreigners, and may impose greater deadweight loss than the alternative tax instruments that might have been used instead. But the optics of literal tax payment by the outsiders may be unduly enticing.

This line of argument seems more likely to apply to withholding taxes on passive income, than to non-PE business income. For example, foreign multinationals with valuable IP or trade names may be in a position to earn local rents, so one might want to tax them without regard to whether they have been able to avoid crossing the PE threshold.

Reciprocity – A tax treaty’s two parties reciprocally recede. So one’s loss of source-based taxing jurisdiction is offset by the other side’s so receding with respect to one’s own residents.

In a pure symmetry case where investment flows, income earned, tax rates, etc., are completely the same, the revenues forgone equal those that the other side forgoes with respect to one’s own residents. The paper notes that, if net capital importers tend to lose in revenue terms (at least on a current basis) from the asymmetry between the value at stake on the two sides to the deal, that is an issue not just for developing countries, but also for developed net capital importers like the U.S. Agreed that this is relevant to the analysis.
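
The symmetry point is simple arithmetic, sketched below. The function name and all of the income and rate figures are hypothetical, and the sketch looks only at current-year amounts: what the home country gives up on inbound income versus what its own residents save in source tax abroad.

```python
def net_revenue_effect(inbound_income, outbound_income, home_rate_cut, partner_rate_cut):
    """Current-year effect of a reciprocal cut in source-based tax.

    inbound_income:   income the partner's residents earn in the home country
    outbound_income:  income home residents earn in the partner country
    home_rate_cut:    reduction in the home country's source-based rate
    partner_rate_cut: reduction in the partner's source-based rate

    Returns (home revenue forgone, source tax saved by home residents abroad, net).
    """
    forgone = inbound_income * home_rate_cut
    saved_abroad = outbound_income * partner_rate_cut
    return forgone, saved_abroad, saved_abroad - forgone


# Pure symmetry: equal flows and equal rate cuts, so the two sides offset.
print(net_revenue_effect(100, 100, 0.25, 0.25))

# Hypothetical net capital importer: twice as much inbound as outbound income.
print(net_revenue_effect(200, 100, 0.25, 0.25))
```

In the symmetric case the net figure is zero. In the second case the home country forgoes twice what its residents save abroad, which is the asymmetry the paper has in mind for net capital importers.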

Coordination – If countries attach positive value to coordinating their tax systems with each other’s, and thereby avoiding peculiar interactions or combined effects, then tax treaties may offer a valuable coordination device. There has increasingly been concern that countries’ go-it-alone responses to profit-shifting concerns (e.g., digital services tax, UK diverted profits tax, the BEAT, etc.) may interact with each other in undesired ways. But admittedly existing tax treaties may not help all that much in coordinating responses.

How much difference does it make? – To the extent that the U.S. is losing tax base by reason of treaty concessions with respect to non-PE businesses and withholding taxation, it’s worth noting that the treaties’ marginal effect is seemingly reduced by relevant aspects of domestic U.S. tax law. For example, we don’t tax inbound business income on a source basis unless there is a “U.S. trade or business.” The legal concept here overlaps considerably with that of finding a “permanent establishment.” Likewise, even if we presume that 30% is the truly “intended” withholding tax rate (rather than serving in part merely to give us something to negotiate away on behalf of U.S. taxpayers), the extent to which it can be avoided by, say, using derivative financial instruments (such as notional principal contracts) instead of directly realizing income that is subject to withholding tax – and the extent to which (despite recent regulatory tightening) this might be intended and reasonably so given the “small open economy” issue – is worth keeping in mind.

NYU Tax Policy Colloquium, week 1: Stefanie Stantcheva’s Taxation and Innovation in the 20th Century

Yesterday we kicked off the 24th NYU Tax Policy Colloquium, with Stefanie Stantcheva of the Harvard Economics Department presenting Taxation and Innovation in the 20th Century. Here are some of the main points that occurred to me about the paper. (I don’t comment here about the discussions, in order to preserve their off-the-record status – not that this often matters, but so that no one in attendance will ever need to worry about this question.)
1) Strong empirical results – This paper is a major success (for Stantcheva and her co-authors: Ufuk Akcigit, John Grigsby, and Tom Nicholas) because it generates strong and robust empirical findings, using new datasets that permit them to examine how state and local personal and corporate income taxes since 1920 may have affected innovation, as defined by patent quantity and “quality” (measured by citations). They find very significant tax elasticities for innovation quantity and quality, as thus measured, that are robust across a range of different specifications. And while a lot of what they find appears to be “business-stealing” – i.e., mere shifting of activity from one tax jurisdiction to another – they conclude that this is not the entire story; there appear also to be (lesser) effects on overall activity levels.

While the paper is science not advocacy, it strikes me as potentially important for a broader set of tax policy debates. A whole lot more would be needed to support some of the implications that I discuss below, and the authors make no claim that such support is likely to be found, but it’s nonetheless worth spelling out here why I think this line of research is of broader interest.

Other empirical results in the paper that are of interest include (a) its comparison of the effects of corporate versus personal income tax rates, and (b) its finding that agglomeration effects, such as the concentration of IP researchers in Silicon Valley, reduce the tax elasticity of the measured behaviors.

2) Contra Diamond and Saez, Ocasio-Cortez, et al? – Our era of rising high-end inequality has helped to shift the Overton window for academic debate, and perhaps even public political debate, regarding how high marginal rates at the top should be.

Optimal income taxation (OIT), a literature to which Stantcheva has (elsewhere) made significant contributions, was long thought to suggest having relatively flattish rates, for what is essentially a technical or logical reason.  OIT aims to maximize a measure of social welfare, typically based on utilitarian or other summations of individuals’ welfare, by trading off redistributive benefits (from declining marginal utility or a pro-egalitarian aggregation rule) against efficiency costs. However, the fact that rate brackets at the very top of the income distribution are inframarginal (because one is guaranteed to be above them) for so small a proportion of the people subject to them, as compared to rate brackets lower down the distribution, pushes against steep rate graduation. E.g., if I know that I will be earning $1 million or more, and the choices that I am considering relate only to the question of how much more, then rate brackets for the first $1 million of my income have no distortionary effects (i.e., only income effects) on what I decide to do.
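
The $1 million example can be made concrete with a small calculation. This is a sketch with made-up schedules: the bracket thresholds and rates are hypothetical and chosen only so that two schedules differ below $1 million and match above it.

```python
def tax_owed(income, brackets):
    """Tax under a graduated schedule.

    brackets: list of (lower_threshold, rate) pairs sorted by threshold;
    each rate applies only to income above its threshold and below the next.
    """
    tax = 0.0
    for i, (lower, rate) in enumerate(brackets):
        upper = brackets[i + 1][0] if i + 1 < len(brackets) else float("inf")
        if income > lower:
            tax += (min(income, upper) - lower) * rate
    return tax


def tax_on_extra_income(income, extra, brackets):
    """Additional tax triggered by earning `extra` on top of `income`."""
    return tax_owed(income + extra, brackets) - tax_owed(income, brackets)


# Two hypothetical schedules that differ only below $1 million.
flat_below = [(0, 0.25), (1_000_000, 0.5)]
steep_below = [(0, 0.125), (500_000, 0.25), (1_000_000, 0.5)]

for schedule in (flat_below, steep_below):
    print(tax_owed(1_200_000, schedule),
          tax_on_extra_income(1_200_000, 100_000, schedule))
```

For someone already earning $1.2 million, the total tax bill differs between the two schedules (an income effect), but the tax on the next $100,000 is the same under both, since only the top bracket bears on that choice.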

An influential paper from several years back, by Peter Diamond and Emmanuel Saez, nonetheless used an OIT methodology to support top marginal rates as high as 70%. Their conclusion relied heavily on (1) empirical evidence of low short-term labor supply elasticity, which suggested that the deadweight loss from so high a rate at the top might not be so great, and (2) a view that any lost utility to people at the very top from the high rates would be subjectively, and/or as a matter of social welfare function weighting, indistinguishable from zero. Under the Diamond-Saez paper’s model, one therefore might choose the revenue-maximizing rate for people at the top of the income distribution. One still, however, wouldn’t want a more than revenue-maximizing rate.
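
For readers who want the arithmetic, the revenue-maximizing top rate in this literature is commonly written as (1 - g) / (1 - g + a·e), where g is the social welfare weight on top earners, a is the Pareto parameter describing how thin the top tail of the income distribution is, and e is the elasticity of taxable income. The sketch below simply evaluates that expression. The parameter values are illustrative assumptions rather than figures taken from this post, and the function name is hypothetical.

```python
def top_rate(pareto_a, elasticity, welfare_weight=0.0):
    """Optimal top marginal rate (1 - g) / (1 - g + a * e).

    With welfare_weight (g) set to zero, i.e. no social value placed on the
    marginal dollar of top earners, this is the revenue-maximizing rate.
    """
    g = welfare_weight
    return (1.0 - g) / (1.0 - g + pareto_a * elasticity)


# Illustrative assumptions: a = 1.5, e = 0.25, g = 0.
print(round(top_rate(1.5, 0.25), 3))

# A higher assumed elasticity lowers the revenue-maximizing rate.
print(round(top_rate(1.5, 0.5), 3))
```

With a = 1.5, e = 0.25, and g = 0, the expression comes out at roughly 73 percent. The result is sensitive to the elasticity, which is why the low short-term labor supply elasticity matters so much to the conclusion.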

An admitted problem with this conclusion was our lack of good knowledge about long-term labor supply elasticities. However, one may derive additional support for very high rates, and indeed for more than revenue-maximizing rates at the top, if one believes (as I do) that extreme high-end wealth concentration can have negative externalities, e.g., by reason of its (in the words of an NYT op-ed today by Saez and Gabriel Zucman) undermining “democracy against oligarchy” and creating “corro[sion of] the social contract.”  Under that view, extremely high wealth and income concentration at the top can be like pollution, which a Pigovian tax would price at marginal cost rather than aiming at revenue maximization.
As Paul Krugman and Matthew Yglesias, among others, have noted, this academic research underlies Alexandria Ocasio-Cortez’s recent call for a 70 percent top marginal rate, which I view as desirably expanding the Overton window for public policy debate, although I wouldn’t go all the way there myself for reasons that it would be too digressive to address here.

So, how is the Stantcheva paper on innovation’s tax sensitivity relevant to all this? The answer is that it could point the way towards a new line of argument against very high marginal rates, in particular at the top where one might expect successful IP entrepreneurs to find themselves.  The point here is not just that innovation, as measured by the paper, appears to be highly tax-responsive, but also that there are arguments for its having great social value.
3) Is “innovation” special? – In my days (back in the mid-1980s) as a Legislation Attorney at the Joint Committee on Taxation, I had a colleague whose area of responsibility included the R&D credit. He personally hated the credit (not that he could do anything about this preference), despite the fact that our economists loved it, at least in principle, because they viewed R&D activity as having positive externalities. He liked to say, in support of his view, that too much of the activity that would end up qualifying for the credit was of the character of, say, McDonald’s or Burger King working on their sesame seed buns.

But even leaving aside knowledge spillovers from businesses’ research that may be more consequential than improving the sesame seed bun, we know there is some patentable innovation that appears to have strong positive externalities. Consider iPhones, or else advances in medical treatment that prolong people’s lives and wellbeing. There is “innovation” out there that (a) creates new consumer surplus, in that the subjective value to people of a newly available item exceeds its market price, and/or (b) increases workers’ productivity, thereby permitting them to reach higher levels of market consumption plus leisure than were previously available to them.

Insofar as high taxes reduce the amount of such socially valuable innovation, we have positive externalities from low rates to think about, not just the negative externalities that Saez and Zucman invoke. Thus, a finding that socially valuable innovation is reduced – not just shifted around between jurisdictions – by higher tax rates would complicate the OIT analysis and push back against the Diamond-Saez-Zucman / Ocasio-Cortez approach.

This is very far from being established by the Stantcheva paper (as Stantcheva herself would be the first to agree). But it helps to show why further research, along the lines that this paper significantly (albeit still preliminarily) advances, is of very broad interest.

One should also note the possibility of negative externalities that might be associated with innovation – e.g., if it increases high-end inequality, creates welfare losses from Schumpeterian “creative destruction,” and/or involves strategic and defensive patenting, patent trolling, etc. But in any event the paper helps to point out the need to think more about possible spillovers from “innovation,” as well as about long-term labor supply elasticities, when debating top marginal rates.

4) The paper’s quantity and quality measures of innovation – In keeping with much other IP literature, the paper uses patents as a measure of innovation quantity, and patent citations as a measure of innovation quality. This is inevitably open to challenge and imperfect, as a rich vein of IP literature has been exploring.

5) Policy implications if high tax rates reduce socially valuable innovation – Even if the paper’s possible and still speculative broader policy implications are confirmed, there is a whole lot of instrument choice to think about. E.g., lower rates vs. exempting the normal return to innovation vs. targeted subsidies for innovation vs. stronger IP protections. Each tool might have its own set of tradeoffs.

6) Fiscal federalism issues – Some of the positive spillovers that innovation might yield may operate at the global level. E.g., insofar as iPhones improve people’s lives, such improvement is not conditioned on the iPhone’s having been invented within one’s own taxing jurisdiction. So just as any one country, if it is acting purely selfishly and unilaterally, lacks the incentive to impose carbon taxes that are priced at global marginal cost, so it might be expected to disregard or at least undervalue innovation’s positive spillover effects on outsiders.

On the other hand, “business-stealing” may be locally beneficial even if not globally so. Thus, local incentives to encourage innovation might end up being either too high or too low in the aggregate, but seem likely to be misdirected from a global social welfare standpoint.

I’ve commented here mainly on some of the broader implications, albeit as yet unknown, that cause “Taxation and Innovation in the 20th Century” to be of especially broad interest. But I should also note one more time its yielding strong and robust empirical findings, of a sort that researchers more commonly wish for than achieve.

2019 NYU Tax Policy Colloquium

As it’s now less than two weeks to showtime, here is an update on our speaker schedule for the 2019 NYU Tax Policy Colloquium, now including most paper titles:

SCHEDULE FOR 2019 NYU TAX POLICY COLLOQUIUM

(All sessions meet from 4:00-5:50 pm in Vanderbilt 208, NYU Law School)

1.      Tuesday, January 22 – Stefanie Stantcheva, Harvard Economics Department. “Taxation and Innovation in the 20th Century.”

2.      Tuesday, January 29 – Rebecca Kysar, Fordham Law School. “Unraveling the Tax Treaty.”

3.      Tuesday, February 5 – David Kamin, NYU Law School.  TBD.

4.      Tuesday, February 12– John Roemer, Yale University Economics and Political Science Departments. “A Theory of Cooperation in Games With an Application to Market Socialism.”

5.      Tuesday, February 19 – Susan Morse, University of Texas at Austin Law School. “Government-to-Robot Enforcement.”

6.      Tuesday, February 26 – Li Liu, International Monetary Fund.  “At a Cost: The Real Effects of Transfer Pricing Regulations.”

7.      Tuesday, March 5 – Richard Reinhold, Willkie, Farr, and Gallagher LLP.  [The parsonage allowance and the establishment clause.]

8.      Tuesday, March 12 – Tatiana Homonoff, NYU Wagner School. “Encouraging Free Tax Preparation Among Paper Filers: Evidence from a Field Experiment.”

9.      Tuesday, March 26– Michelle Hanlon, MIT Sloan School of Management.  TBD.

10.  Tuesday, April 2– Omri Marian, University of California at Irvine School of Law. “The Making of International Tax Law: Empirical Evidence from Natural Language Processing.”

11.  Tuesday, April 9– Steven Bank, UCLA Law School. “Manufacturing Tax Populism.”

12.  Tuesday, April 16– Dayanand Manoli, University of Texas at Austin Department of Economics. “Tax Enforcement and Tax Policy: Evidence on Taxpayer Responses to EITC Correspondence Audits.”

13.  Tuesday, April 23– Sara Sternberg Greene, Duke Law School.  “A Theory of Poverty: Legal Immobility.”

14.  Tuesday, April 30– Wei Cui, University of British Columbia Law School. “The Digital Services Tax: A Conceptual Defense.”

AALS Tax Section panel

This past Saturday (January 5), at the Association of American Law Schools’ Annual Meeting (in New Orleans), I was among the four panelists at an AALS Tax Section panel. The panel was organized and moderated by Shu-Yi Oei, the other panelists were Karen Burke, Ajay Mehrotra, and Leigh Osofsky, and the topic was “The 2017 Tax Changes: One Year Later.” For more general background information about the panel, see here (from the Tax Prof blog).

We divided up in advance the particular topics to be discussed by each of us, and here is a very rough effort to reproduce in miniature my comments:

1) What have we learned in the past year about the economic impact of the 2017 tax act?
This morning, the Sun rose. Did we thereby learn something new  about the Solar System? No, because that is exactly what we expected. By contrast, we most definitely would have learned something new (assuming we were still around to reflect about it) if, for some extraordinary reason, the Sun HADN’T risen in the morning today.

For exactly that reason, we haven’t learned all that much, in the past year, about the economic impact of the 2017 tax act. For example, there was absolutely no dispute, among serious, responsible, and knowledgeable people, that the act was going to lose a lot of revenue. And so it has – perhaps slightly on the high side, relative to “dynamic” expectations, but that is what I expected for various reasons.

We also “learned” that it did not stimulate a flood of new U.S. investment and other economic activity. But the only thing that was seriously in dispute in this dimension remains so – what might be the effects on U.S. investment over a much longer time horizon – given, e.g., that the relationship between statutory and effective tax rates for multinationals is not perfectly understood (and may have changed in multiple ways by reason of the 2017 act), and that the long-term effect of rising debt overhang will need more time to be observed.

There also seems to have been, unsurprisingly, a bit of mild Keynesian stimulus at the wrong time, i.e., when it was not really much needed. The rising debt overhang may make it harder in the future to use Keynesian stimulus through budget deficits at times when it might be far more needed.

Whether or not the flood of stock buybacks by U.S. companies was expected, it should have been. What else to do with the money that one is no longer constrained from “repatriating” as an accounting matter? And big U.S. companies with overseas profits were not generally cash-constrained with regard to U.S. investment.

The buybacks gave a great talking point to critics of the 2017 act, because their occurrence seemed so contradictory to the ridiculous talking points that were being made by the act’s proponents. But were the buybacks as such bad? Not really. They presumably shifted funds from companies that had no particular current use for the $$ to shareholders who now might find it transactionally cheaper to direct as they liked the value that was paid out. This can be a good thing. And if the funds transfer was merely being delayed under prior law by international deferral, that wasn’t really doing anyone any particular good (including the U.S. tax authorities).

2) International changes
I’ve discussed the 2017 U.S. international tax changes in greater detail on other occasions. But 3 points I made are as follows:

(a) In the aftermath of GILTI and the BEAT, it’s clearer than ever that we’re in a “post-territorial” world, i.e., one in which the old “worldwide versus territorial” debate has been shown to be orthogonal to the issues of main interest to policymakers.

(b) Many U.S. tax lawyers with whom I have spoken have an aesthetic dislike for the shift in U.S. international tax law, and not just because it wiped out much of their knowledge and allowed their junior associates to be on a more even knowledge footing with them, going forward. GILTI, the BEAT, and FDII (to the extent that anyone actually cares about it) have devalued legal advice based on judgment, relative to clients’ running lots of scenarios to guide tax planning.

E.g., suppose the client is wondering about whether it will face the BEAT this year, rather than escaping it under the so-called 3 percent rule (under which the BEAT doesn’t apply if less than 3% of one’s deductions are “base erosion tax benefits”). Even if one can set the numerator for this computation with certainty – which may not be the case – one is highly unlikely to know the denominator with anything close to certainty, as it may depend on the uncertain course of various business outcomes. So rather than just ask the lawyers what the BEAT means, firms may base key planning choices on running lots of probabilistic scenarios. Whether or not this is any worse than the prior state of play for American or global welfare, it’s definitely much less fun for the tax lawyers.
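
To give a flavor of what “running lots of probabilistic scenarios” might look like, here is a minimal Monte Carlo sketch. It rests on loud simplifications: the numerator is treated as known, total deductions are drawn from a uniform range, and every other condition for BEAT liability is ignored. The function name, the dollar figures, and the choice of distribution are all hypothetical.

```python
import random


def prob_beat_applies(base_erosion_benefits, deduction_low, deduction_high,
                      threshold=0.03, trials=100_000, seed=0):
    """Share of simulated years in which the base erosion percentage reaches the threshold.

    Assumes, purely for illustration, a known numerator and total deductions
    drawn uniformly between deduction_low and deduction_high.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        total_deductions = rng.uniform(deduction_low, deduction_high)
        if base_erosion_benefits / total_deductions >= threshold:
            hits += 1
    return hits / trials


# Hypothetical firm: $30 million of base erosion tax benefits, with total
# deductions equally likely to land anywhere from $800 million to $1.2 billion.
print(prob_beat_applies(30, 800, 1200))
```

With these made-up numbers the 3 percent line is crossed whenever deductions come in at $1 billion or less, so the estimate should land near one half. A real planning model would presumably replace the uniform draw with scenario-specific business forecasts.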

(c) It’s been interesting to observe that a number of other countries appear to be intrigued by the idea of adopting their own versions of GILTI and the BEAT. While not a huge surprise, I didn’t regard this in advance as entirely certain.

3) Partial repeal of state and local tax (SALT) deductions
On this front, it’s been fun (if that’s the word for it)  to observe the fault lines in academic debate between people who might typically agree more with each other than they do on this issue.

In the broader policymaking world, I’ve been at least mildly surprised by:

(a) the extent to which blue states have stepped forward to devise what might be called workarounds (I think this reflects the legislation’s nasty red state vs. blue state optics).

(b) the extent to which the Treasury, in response, has seemingly been willing to back away from past limited giveaways to what were mostly red state (albeit more limited) workaround schemes. I had wondered if the Treasury might either (i) feel more constrained by past rulings that favored, e.g., the use of state law tax credit tricks to make private school tuition effectively deductible, or (ii) be willing to respond with baldfaced inconsistency as between past red state and post-2017 blue state planning responses.

4) Where might we be headed next?
This remains unclear, given both the long-term fiscal gap and pervasive U.S. political uncertainty. But future action may need to focus more on new revenue sources (such as from VATs, including disguised versions such as the BAT/DBCFT, and/or from carbon taxes and the like), and less on “tax reform.”

Indeed, I think the term “tax reform” is now dead, other than as a synonym for “changes that I, the speaker, happen to like.” And good riddance, as it had outlived its usefulness.

From at least the 1950s through the 1970s, “tax reform” mainly meant broadening the base so that high-end effective rates would tend to come closer to matching the era’s steeply graduated statutory rates.

Then in the 1980s, “tax reform” came to mean broadening the base and lowering the rates, in a manner that was meant to be net revenue-neutral and distribution-neutral. It might also involve switching from the current income tax to a far more comprehensive version of the consumption tax, although that definition didn’t really get very far off the ground until more recent decades, when it continued to lack political traction.

After the so-called 2017 “tax reform” that lost immense revenue, was extremely regressive, and in many respects narrowed the tax base (e.g., via the egregious passthrough rules), I think we can forget about the term’s being used in public policymaking without evoking derisive laughter. Whether or not 1986 tax reform was tragedy (I don’t think it was), 2017 was definitely farce, and this implies no third act for the concept.